Essentials of Business Analytics 1st Edition Camm
by Jeffrey D. Camm, James J. Cochran, Michael J. Fry, Jeffrey W. Ohlmann, David R. Anderson
Chapter 1: Introduction 1. The decisions concerning an organization’s goals and future plans are called a. financial decisions. b. tactical decisions. c. strategic decisions. d. operational decisions. Answer: C Difficulty: Easy LO: 1.1, Page 4 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Strategic decisions involve higher-level issues concerned with the overall direction of the organization. 2. Tactical decisions define: a. the day-to-day activities of the organization. b. the goals and plans of the organization. c. the domain of operations managers, who are close to the customer. d. the steps taken to achieve the goals and objectives. Answer: D Difficulty: Medium LO: 1.1, Page 4 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Tactical decisions concern how the organization should achieve the goals and objectives set by its strategy. 3. Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n) a. tactical decision. b. operational decision. c. strategic decision. d. financial decision. Answer: C Difficulty: Medium LO: 1.1, Page 4 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Strategic decisions involve higher-level issues concerned with the overall direction of the organization; these decisions define the organization’s overall goals and aspirations
for the future. Strategic decisions are usually the domain of higher-level executives and have a time horizon of three to five years. 4. _____ is the most critical step in a decision making process. a. Choosing an alternative b. Identifying and defining the problem c. Evaluating the alternatives d. Determining the set of alternatives Answer: B Difficulty: Medium LO: 1.2, Page 5 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Step 1 of decision making, identifying and defining the problem, is the most critical. Only if the problem is well-defined, with clear metrics of success or failure (step 2), can a proper approach for solving the problem (steps 3 and 4) be devised. Decision making concludes with the choice of an alternative (step 5). 5. Firms guided by data-driven decision making have a. higher market value. b. lower productivity. c. higher risk. d. lower profit. Answer: A Difficulty: Medium LO: 1.2, Page 5 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Firms guided by data-driven decision making have higher productivity and market value and increased output and profitability. 6. ______ encompasses reports, data dashboards, and descriptive statistics to describe the past data. a. Predictive analytics b. Descriptive analytics c. Prescriptive analytics d. Decision analysis Answer: B Difficulty: Easy LO: 1.3, Page 5 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC:
Feedback: Descriptive analytics encompasses the set of techniques that describes what has happened in the past. 7. Information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on extracted from the manufacturing plant’s database refers to ______. a. spreadsheet models b. data dashboards c. data mining d. data query Answer: D Difficulty: Medium LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A data query is a request for information with certain characteristics from a database. 8. Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen. a. simulation b. crosstabulation c. data dashboards d. tables Answer: C Difficulty: Medium LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: For corporate-level managers, daily data dashboards might summarize sales by region, current inventory levels, and other company-wide metrics. 9. ______ helps in constructing a mathematical model to predict the future sales, based on past data. a. Predictive analytics b. Decision analysis c. Prescriptive analytics d. Descriptive analytics Answer: A Difficulty: Easy LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC:
Feedback: Predictive analytics consists of techniques that use models constructed from past data to predict the future. 10. Which of the following techniques is used in predictive analytics? a. Data dashboards b. Linear regression c. Data visualization d. Optimization models Answer: B Difficulty: Medium LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Linear regression, time series analysis, some data-mining techniques, and simulation, often referred to as risk analysis, all fall under the banner of predictive analytics. 11. A retail store owner offers a discount on product A and predicts that, the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction. a. Data query b. Simulation c. Data mining d. Data dashboards Answer: C Difficulty: Medium LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Data mining is a technique used to find patterns or relationships among elements of the data in a large database. 12. ______ is used in the pharmaceutical industry to assess the risk of introducing a new drug. a. Data dashboards b. Charts c. Spreadsheet models d. Simulation Answer: D Difficulty: Medium LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC:
Feedback: Simulation involves the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision. 13. Which of the following analytical techniques helps us arrive at the best decision? a. Predictive analytics b. Data mining c. Prescriptive analytics d. Descriptive analytics Answer: C Difficulty: Medium LO: 1.3, Page 6 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Prescriptive analytics indicates a best course of action to take; that is, the output of a prescriptive model is a best decision. 14. Supply network design models provide the cost-minimizing plant and distribution center locations subject to meeting the customer service requirements. Such models are referred to as a. optimization models. b. forecasting models. c. data mining models. d. network models. Answer: A Difficulty: Easy LO: 1.3, Page 7 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Optimization models give the best decision subject to the constraints of the situation. 15. Simulation optimization helps: a. in identifying the constraints of the situation. b. to find good decisions in highly complex and highly uncertain settings. c. in assigning values to outcomes. d. to model certainty with optimization techniques. Answer: B Difficulty: Medium LO: 1.3, Page 7 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Simulation optimization combines the use of probability and statistics to model uncertainty with optimization techniques to find good decisions in highly complex and highly uncertain settings.
16. When a decision maker is faced with several decision alternatives and an uncertain set of future events, he or she uses ______ to develop an optimal strategy. a. utility theory b. predictive analytics c. data mining d. decision analysis Answer: D Difficulty: Medium LO: 1.3, Page 7 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The techniques of decision analysis can be used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events. 17. ______ assigns values to outcomes based on the decision maker’s attitude toward risk, loss, and other factors. a. Simulation optimization b. Utility theory c. Optimization model d. Data dashboard Answer: B Difficulty: Medium LO: 1.3, Page 8 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Decision analysis also employs utility theory, which assigns values to outcomes based on the decision maker’s attitude toward risk, loss, and other factors. 18. Walmart handles over one million purchase transactions per hour. Although the data represent opportunities, a. they also present analytical challenges from a processing point of view. b. they can be processed or analyzed in a reasonable amount of time. c. they have led to a decrease in the use of analytics. d. they seldom have valuable applications of analytics. Answer: A Difficulty: Medium LO: 1.4, Page 8 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Although big data represents opportunities, it also presents analytical challenges from a processing point of view.
19. Which of the following examples best describes big data? a. Facebook processes one thousand picture uploads per day. b. Five hundred cell-phone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the web on a daily basis. c. The amount of data created every 48 hours is equivalent to the entire amount of data created from the dawn of civilization. d. 9 percent of the data in the world today has been created in the last two years. Answer: C Difficulty: Medium LO: 1.4, Page 8 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Big data is simply a set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. 20. The term advanced analytics sometimes refers to: a. descriptive and prescriptive analytics. b. simulation. c. predictive and prescriptive analytics. d. decision analysis. Answer: C Difficulty: Medium LO: 1.5, Page 9 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Predictive and prescriptive analytics are sometimes referred to as advanced analytics. 21. In the financial sector, we use ______ to construct financial instruments such as derivatives. a. descriptive and prescriptive models b. predictive models c. descriptive models d. prescriptive models Answer: B Difficulty: Medium LO: 1.5, Page 9 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Predictive models are used to forecast future financial performance, to construct financial instruments such as derivatives, etc. 22. GE Asset Management uses optimization models to:
a. assess the risk of investment portfolios. b. forecast future financial performance. c. successfully manage commercial real estate risk. d. decide on how to invest its cash received from insurance policies. Answer: D Difficulty: Medium LO: 1.5, Page 9 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: GE Asset Management uses optimization models to decide how to invest its own cash received from insurance policies and other financial products. 23. In order to manage an organization’s human resource activities such as employee hiring, tracking and influencing employee retention, the HR personnel use ______. a. descriptive and predictive analytics b. prescriptive analytics c. predictive and prescriptive analytics d. predictive analytics Answer: A Difficulty: Medium LO: 1.5, Page 10 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The HR analytics team uses descriptive and predictive analytics to support employee hiring and to track and influence retention. 24. A better understanding of consumer behavior through analytics leads to a(n): a. decrease in demand. b. assured increase in sales. c. more effective pricing strategies. d. poor customer satisfaction. Answer: C Difficulty: Easy LO: 1.5, Page 10 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A better understanding of consumer behavior through analytics leads to the better use of advertising budgets, more effective pricing strategies, improved forecasting of demand, improved product line management, and increased customer satisfaction and loyalty. 25. All the three descriptive, predictive, and prescriptive analytics are used in:
a. Marketing and HR analytics. b. Marketing and Health Care analytics. c. Health Care and HR analytics. d. Health Care and Financial analytics.
Answer: B Difficulty: Easy LO: 1.5, Page 10 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Descriptive, predictive, and prescriptive analytics are all used in marketing and healthcare. 26. A women’s apparel manufacturer used descriptive analytics: a. to present supply chain to managers visually. b. to achieve efficiency in delivery of goods. c. to schedule staff and vehicles for delivery. d. to plan capacity utilization by incorporating the inherent uncertainty in commodities pricing. Answer: A Difficulty: Easy LO: 1.5, Page 11 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The women’s apparel manufacturer has successfully used descriptive analytics to present the status of its supply chain to managers visually. 27. The U.S. Internal Revenue Service uses ______ to identify patterns that distinguish questionable annual personal income tax filings. a. utility theory b. prescriptive analytics c. data mining d. decision analysis Answer: C Difficulty: Easy LO: 1.5, Page 12 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The U.S. Internal Revenue Service has used data mining to identify patterns that distinguish questionable annual personal income tax filings. 28. Franchises across several major sports dynamically adjust ticket prices throughout the season to reflect the relative attractiveness and potential demand for each game by using: a. data mining.
b. predictive analytics. c. simulation. d. prescriptive analytics. Answer: D Difficulty: Easy LO: 1.5, Page 12 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Using prescriptive analytics, franchises across several major sports dynamically adjust ticket prices throughout the season to reflect the relative attractiveness and potential demand for each game. 29. Web sites and social media sites use descriptive and advanced analytics to data collected in online experiments: a. to position ads for the promotion of products and services. b. to schedule staff and vehicle. c. to design its premium seating offerings. d. to decide how much to offer players in contract negotiations. Answer: A Difficulty: Easy LO: 1.5, Page 12 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Leading companies apply descriptive and advanced analytics to data collected in online experiments to determine the best way to configure Web sites, position ads, and utilize social networks for the promotion of products and services. 30. The disadvantage of online experimentation is: a. it fails to promote sales. b. it uses trial-and-error to statistically determine the difference in the Web site traffic and sales. c. its experiments are conducted with risk in disruption of the overall business. d. it fails to track the results of different versions of a Web site. Answer: B Difficulty: Easy LO: 1.5, Page 13 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Online experiments are proving to be invaluable because they enable the company to use trial-and-error in determining statistically what makes a difference in their Web site traffic and sales.
Chapter 2: Descriptive Statistics 1. _____ provide facts and figures that can be used for analysis and interpretation of a population of interest. a. Data b. Variables c. Range d. Query Answer: A Difficulty: Easy LO: 2.1, Page 16 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. 2. A variable is defined as a a. quantity of interest that can take on same values. b. set of values. c. quantity of interest that can take on different values. d. characteristic that takes on same values from a set of values. Answer: C Difficulty: Easy LO: 2.1, Page 16 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A characteristic or a quantity of interest that can take on different values is known as a variable. 3. A set of values corresponding to a set of variables is defined as a(n) _____. a. quantity b. event c. factor d. observation Answer: D Difficulty: Easy LO: 2.1, Page 16 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics
Feedback: An observation is a set of values corresponding to a set of variables. 4. The difference in a variable measured over observations (time, customers, items, etc.) is called as _____. a. observed differences b. variation c. variable change d. descriptive analytics Answer: B Difficulty: Moderate LO: 2.1, Page 16 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Variation is the difference in a variable measured over observations (time, customers, items, etc.). 5. A variable whose values are not known with certainty is called a _____. a. certain variable b. random variable c. constant variable d. decision variable Answer: B Difficulty: Moderate LO: 2.1, Page 17 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A quantity whose values are not known with certainty is called a random variable, or uncertain variable. 6. _____ act(s) as a representative of the population. a. The analytics b. The variance c. A sample d. The random variables Answer: C Difficulty: Easy LO: 2.2, Page 17 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics
Feedback: A subset of the population is known as a sample, and it acts as a representative of the population. 7. The act of collecting data that are representative of the population data is called a. random sampling. b. sample data. c. population sampling. d. applications of business analytics. Answer: A Difficulty: Easy LO: 2.2, Page 18 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A representative sample can be gathered by random sampling of the population data. 8. The data on grades (A, B, C, and D) scored by all students in a test is an example of a. quantitative data. b. sample data. c. categorical data. d. analytical data. Answer: C Difficulty: Easy LO: 2.2, Page 18 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: If arithmetic operations cannot be performed on the data, they are considered categorical data. 9. The data on the time taken by 10 students in a class to answer a test is an example of a. population data. b. categorical data. c. time series data. d. quantitative data. Answer: D Difficulty: Easy LO: 2.2, Page 18 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics
Feedback: Data are considered quantitative data if numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed on them. 10. _____ are collected from several entities at the same point in time. a. Time series data b. Categorical and quantitative data c. Cross-sectional data d. Random data Answer: C Difficulty: Moderate LO: 2.2, Page 18 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Cross-sectional data are collected from several entities at the same, or approximately the same, point in time. 11. Data collected from several entities over several time periods is a. categorical and quantitative data. b. time series data. c. source data. d. cross-sectional data. Answer: B Difficulty: Easy LO: 2.2, Page 18 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Time series data are collected over several time periods. 12. In a(n) _____, one or more variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest identified first. a. experimental study b. observational study c. categorical study d. variable study Answer: A Difficulty: Easy LO: 2.2, Page 18 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics
Feedback: In an experimental study, a variable of interest is first identified. Then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest. 13. The data collected from the customers in restaurants about the quality of food is an example of a. variable study. b. cross-sectional study. c. experimental study. d. observational study. Answer: D Difficulty: Moderate LO: 2.2, Page 19 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Nonexperimental, or observational, studies make no attempt to control the variables of interest. Some restaurants use observational studies to obtain data about customer opinions on the quality of food, quality of service, atmosphere, and so on. 14. When a data set is large and difficult to analyze all at once, which of the following features in Excel is used to make the data more manageable and to develop insights? a. Frequency table b. Sorting and filtering c. Fill color d. Charts Answer: B Difficulty: Easy LO: 2.3, Page 21 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Excel contains options to sort and filter data so that one can identify patterns in the data more easily. 15. A summary of data that shows the number of observations in each of several nonoverlapping bins is called a. a frequency distribution. b. a sample summary. c. a bin distribution. d. an observed distribution. Answer: A Difficulty: Easy LO: 2.4, Page 25
Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A frequency distribution is a summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins, when dealing with distributions. 16. Which of the following gives the proportion of items in each bin? a. Frequency b. Percent frequency c. Relative frequency d. Bin proportion Answer: C Difficulty: Easy LO: 2.4, Page 27 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The relative frequency of a bin equals the fraction or proportion of items belonging to a class. 17. Compute the relative frequencies for the data given in the table below:
Grades               A    B    C    D    Total
Number of students   16   28   33   13   90
a. 0.31, 0.14, 0.37, 0.18 b. 0.37, 0.14, 0.31, 0.18 c. 0.14, 0.31, 0.37, 0.18 d. 0.18, 0.31, 0.37, 0.14 Answer: D Difficulty: Moderate LO: 2.4, Page 27 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The relative frequency of a bin equals the fraction or proportion of items belonging to a class. Relative frequency of a bin = Frequency of the bin /n.
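The relative and percent frequencies for the grade table can be checked with a short script. This is only an illustrative sketch in Python (the test bank itself does not prescribe a tool); the variable names are assumptions made for the example.

```python
# Frequencies taken from the grade table in Question 17.
grade_counts = {"A": 16, "B": 28, "C": 33, "D": 13}

n = sum(grade_counts.values())  # total number of students

# Relative frequency of a bin = frequency of the bin / n.
relative = {grade: count / n for grade, count in grade_counts.items()}

# Percent frequency of a bin = relative frequency * 100.
percent = {grade: 100 * rel for grade, rel in relative.items()}

for grade in grade_counts:
    print(f"{grade}: relative = {relative[grade]:.2f}, percent = {percent[grade]:.0f}%")
```

Rounded to two decimals, the relative frequencies are 0.18, 0.31, 0.37, and 0.14, which agrees with option d.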
18. Consider the data below. What percentage of students scored grade C?
Grades               A    B    C    D    Total
Number of students   16   28   33   13   90
a. 33% b. 31% c. 37% d. 28% Answer: C Difficulty: Moderate LO: 2.4, Page 27 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A percent frequency distribution summarizes the percent frequency of the data for each bin. The percent frequency of a bin is the relative frequency multiplied by 100. 19. Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data? a. Number of nonoverlapping bins, width of each bin, and bin limits b. Width of each bin and bin lower limits c. Number of overlapping bins, width of each bin, and bin upper limits d. Width of each bin and number of bins Answer: A Difficulty: Moderate LO: 2.4, Page 28 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The three steps necessary to define the classes for a frequency distribution with quantitative data are: determine the number of nonoverlapping bins, determine the width of each bin, and determine the bin limits. 20. The purpose of using enough bins is to show the a. number of observations. b. number of variables. c. variation in the data.
d. correlation in the data. Answer: C Difficulty: Moderate LO: 2.4, Page 28 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The goal is to use enough bins to show the variation in the data, but not so many classes that some contain only a few data items. 21. _____ is a graphical summary of data previously summarized in a frequency distribution. a. Box plot b. Histogram c. Line chart d. Scatter chart Answer: B Difficulty: Easy LO: 2.4, Page 31 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A common graphical presentation of quantitative data is a histogram. This graphical summary can be prepared for data previously summarized in either a frequency, a relative frequency, or a percent frequency distribution. 22. Identify the shape of the distribution in the below figure.
a. Moderately skewed left b. Symmetric
c. Highly skewed right d. Moderately skewed right Answer: D Difficulty: Moderate LO: 2.4, Pages 33-34 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A histogram is said to be skewed to the right if its tail extends farther to the right than to the left. The given histogram is, therefore, moderately skewed to the right. 23. The _____ shows the number of data items with values less than or equal to the upper class limit of each class. a. cumulative frequency distribution b. frequency distribution c. percent frequency distribution d. relative frequency distribution Answer: A Difficulty: Easy LO: 2.4, Page 34 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The cumulative frequency distribution shows the number of data items with values less than or equal to the upper class limit of each class. 24. The _____ is a point estimate of the population mean for the variable of interest. a. sample mean b. median c. Sample d. geometric mean Answer: A Difficulty: Moderate LO: 2.5, Page 35 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The sample mean is a point estimate of the (typically unknown) population mean for the variable of interest. 25. Compute the mean of the following data: 56 42 37 29 45 51
30 25 34 57
a. 42.8 b. 52.1 c. 40.6 d. 39.4 Answer: C Difficulty: Moderate LO: 2.5, Page 35 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The mean provides a measure of central location for the data. It is computed as: Mean = $\frac{56+42+37+29+45+51+30+25+34+57}{10} = \frac{406}{10} = 40.6$.
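As a quick check of the arithmetic, the mean can be recomputed in a few lines. This is a minimal sketch in Python, not part of the original answer key; the variable names are illustrative.

```python
from statistics import mean

# Data from Question 25.
data = [56, 42, 37, 29, 45, 51, 30, 25, 34, 57]

total = sum(data)        # sum of the observations
n = len(data)            # number of observations
sample_mean = total / n  # mean = sum / n

# Cross-check against the standard library implementation.
assert abs(sample_mean - mean(data)) < 1e-9

print(f"sum = {total}, n = {n}, mean = {sample_mean:.1f}")
```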
26. Compute the median of the following data: 32 41 36 24 29 30 40 22 25 37 a. 28 b. 31 c. 40 d. 34
Answer: B Difficulty: Moderate LO: 2.5, Pages 36-37 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The median is the value in the middle when the data are arranged in ascending order (smallest to largest value). Computed as: Median = average of middle two values = $\frac{30+32}{2} = 31$.
27. Compute the mode for the following data: 12 16 19 10 12 11 21 12 21 10 a. 21 b. 11 c. 12 d. 10
Answer: C Difficulty: Moderate LO: 2.5, Page 37 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The mode is the value that occurs most frequently in a data set. The value 12 occurs with the greatest frequency of 3. Therefore, the mode is 12.
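The median and mode answers above can be verified with Python's standard library. The sketch below assumes the data values as listed in Questions 26 and 27; the variable names are hypothetical.

```python
from collections import Counter
from statistics import median

# Data from Question 26 (median) and Question 27 (mode).
median_data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
mode_data = [12, 16, 19, 10, 12, 11, 21, 12, 21, 10]

# With an even number of observations, the median is the average
# of the two middle values of the sorted data.
ordered = sorted(median_data)
middle_pair = ordered[len(ordered) // 2 - 1 : len(ordered) // 2 + 1]
med = median(median_data)

# The mode is the value with the highest frequency.
mode_value, mode_count = Counter(mode_data).most_common(1)[0]

print(f"middle values = {middle_pair}, median = {med}")
print(f"mode = {mode_value} (occurs {mode_count} times)")
```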
28. Compute the geometric mean for the following data on growth factors of an investment for 10 years: 1.10 0.50 0.70 1.21 1.25 1.12 1.16 1.11 1.13 1.22 a. 1.0221 b. 1.0148 c. 1.0363 d. 1.1475 Answer: B Difficulty: Moderate LO: 2.5, Pages 38-39 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The geometric mean is a measure of location that is calculated by finding the nth root of the product of n values. Geometric mean = $\sqrt[10]{(1.10)(0.50)(0.70)(1.21)(1.25)(1.12)(1.16)(1.11)(1.13)(1.22)} = 1.0148$.
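The nth-root calculation is tedious by hand, so a short script is a convenient check. This is a sketch in Python under the assumption that the ten growth factors are exactly those listed in Question 28; names are illustrative.

```python
import math

# Growth factors from Question 28.
growth = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

n = len(growth)
product = math.prod(growth)          # product of the n growth factors
geometric_mean = product ** (1 / n)  # nth root of the product

# Approximately 1.0148, in line with the answer key.
print(f"product = {product:.6f}, geometric mean = {geometric_mean:.4f}")
```

A geometric mean of about 1.0148 corresponds to an average growth of roughly 1.48 percent per year over the 10 years.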
29. The simplest measure of variability is the a. variance. b. standard deviation. c. coefficient of variation. d. range. Answer: D Difficulty: Easy LO: 2.6, Page 41 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The simplest measure of variability is the range. 30. The variance is based on the a. deviation about the median. b. number of variables. c. deviation about the mean. d. correlation in the data. Answer: C Difficulty: Easy LO: 2.6, Page 41 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics
Feedback: The variance is based on the deviation about the mean, which is the difference between the value of each observation ($x_i$) and the mean.
31. For the following sample data, compute the variance. 32 41 36 24 29 30 40 22 25 37
a. 45.6 b. 35.5 c. 41.04 d. 29.4 Answer: A Difficulty: Moderate LO: 2.6, Page 42 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The variance is based on the deviation about the mean, which is the difference between the value of each observation ($x_i$) and the mean. It is computed as $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{410.4}{9} = 45.6$.
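The sample variance above can be reproduced step by step. The following is a minimal sketch in Python, assuming the ten sample values listed in Question 31; it also takes the square root to show the corresponding sample standard deviation. Variable names are illustrative.

```python
import math
from statistics import variance

# Sample data from Question 31.
data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

n = len(data)
x_bar = sum(data) / n  # sample mean

# Sum of squared deviations about the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

# Sample variance divides by n - 1, not n.
s2 = sum_sq_dev / (n - 1)

# The sample standard deviation is the positive square root of the variance.
s = math.sqrt(s2)

# Cross-check against the standard library implementation.
assert math.isclose(s2, variance(data))

print(f"mean = {x_bar:.1f}, sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance = {s2:.1f}, standard deviation = {s:.2f}")
```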
32. Compute the standard deviation for the following sample data. 32 41 36 24 29 30 40 22 25 37 a. 5.96 b. 6.41 c. 5.42 d. 6.75
Answer: D Difficulty: Moderate LO: 2.6, Page 43 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The standard deviation is defined to be the positive square root of the variance. It is computed as $s = \sqrt{s^2} = \sqrt{45.6} = 6.75$.
33. Compute the coefficient of variation for the following sample data. 32 41 36 24 29 30 40 22 25 37 a. 18.64 percent b. 21.36 percent c. 20.28 percent d. 21.67 percent Answer: B
Difficulty: Moderate LO: 2.6, Page 44 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The coefficient of variation indicates how large the standard deviation is relative to the mean. The coefficient of variation is (6.75/31.6 × 100) = 21.36 percent.
34. Compute the 50th percentile for the following data: 10 15 17 21 25 12 16 11 13 22
a. 18.6 b. 13.3 c. 15.5 d. 17.7 Answer: C Difficulty: Moderate LO: 2.7, Pages 44-45 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A percentile is the value of a variable at which a specified (approximate) percentage of observations are below that value. 50th percentile = median = 15.5.
35. Compute the third quartile for the following data. 10 15 17 21 25 12 16 11 13 22 a. 21.25 b. 15.5 c. 21.5 d. 11.75
Answer: A Difficulty: Moderate LO: 2.7, Pages 45-46 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Quartiles divide data into four parts, with each part containing approximately one-fourth, or 25 percent, of the observations. The third quartile is 21.25.
36. Compute IQR for the following data. 10 15 17 21 25 12 16 11 13 22 a. 6.25 b. 7.75 c. 5.14
d. 9.50 Answer: D Difficulty: Moderate LO: 2.7, Page 46 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The difference between the third and first quartiles is often referred to as the interquartile range, or IQR. IQR = 21.25 – 11.75 = 9.50. 37. A _____ determines how far a particular value is from the mean relative to the data set’s standard deviation. a. coefficient of variation b. z-score c. variance d. percentile Answer: B Difficulty: Moderate LO: 2.7, Page 46 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A z-score helps us determine how far a particular value is from the mean relative to the data set’s standard deviation. 38. For data having a bell-shaped distribution, approximately _____ percent of the data values will be within one standard deviation of the mean. a. 95 b. 66 c. 68 d. 97 Answer: C Difficulty: Easy LO: 2.7, Page 48 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Approximately 68 percent of the data values will be within one standard deviation of the mean for data having a bell-shaped distribution. 39. Any data value with a z-score less than –3 or greater than +3 is treated as a(n) a. outlier. b. usual value.
c. whisker. d. z-score value. Answer: A Difficulty: Easy LO: 2.7, Page 49 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: Any data value with a z-score less than –3 or greater than +3 is treated as an outlier. 40. Which of the following graphs provide information on outliers and IQR of a data set? a. Histogram b. Line chart c. Scatter chart d. Box plot Answer: D Difficulty: Easy LO: 2.7, Page 49 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: A box plot is a graphical summary of the distribution of data and it is developed from the quartiles for a data set. Therefore, the information on the outliers and IQR can be obtained from a box plot. 41. If covariance between two variables is near 0, then it implies that a. there exists a positive relationship between the variables. b. the variables are not linearly related. c. the variables are negatively related. d. the variables are strongly related. Answer: B Difficulty: Easy LO: 2.8, Page 53 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: If the covariance between two variables is near 0, then the variables are not linearly related. 42. The correlation coefficient will always take values a. greater than 0. b. between –1 and 0. c. between –1 and +1. d. less than –1.
Answer: C Difficulty: Easy LO: 2.8, Page 55 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Descriptive Statistics Feedback: The correlation coefficient will always take values between –1 and +1.
Problems
1. A student who wishes to participate in a debate competition is required to fill out a registration form. State whether each of the following items of information about the participant provides categorical or quantitative data. a. What is your date of birth? b. Have you participated in any debate competition previously? c. If yes, how many debate competitions have you participated in so far? d. Have you won any of the competitions? e. If yes, how many have you won? Answer: a. Quantitative. b. Categorical. c. Quantitative. d. Categorical. e. Quantitative. Difficulty: Easy LO: 2.2, Page 18 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 2. The following table provides information on the number of billionaires in each country and the continent on which each country is located.
Nationality       Continent        Number of Billionaires
United States     North America    426
Brazil            South America    38
Russia            Europe           105
Mexico            North America    37
India             Asia             54
Turkey            Europe           40
United Kingdom    Europe           31
Hong Kong         Asia             39
Germany           Europe           57
Canada            North America    28
China             Asia             120
a. Sort the countries from largest to smallest based on the number of billionaires. What are the top 5 countries according to the number of billionaires? b. Filter the countries to display only the countries located in North America. Answer:
a.
Nationality       Continent        Number of Billionaires
United States     North America    426
China             Asia             120
Russia            Europe           105
Germany           Europe           57
India             Asia             54
Turkey            Europe           40
Hong Kong         Asia             39
Brazil            South America    38
Mexico            North America    37
United Kingdom    Europe           31
Canada            North America    28
The top five countries by number of billionaires are the United States, China, Russia, Germany, and India.
b.
Nationality       Continent        Number of Billionaires
United States     North America    426
Mexico            North America    37
Canada            North America    28
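The sort and filter steps in Problem 2 are meant to be done with Excel's tools; the same two operations can be sketched in Python as a cross-check. This is only an illustration, and the variable names (countries, by_count, top_five, north_america) are assumptions introduced here, not part of the problem.

```python
# Rows copied from the table in Problem 2: (nationality, continent, billionaires).
countries = [
    ("United States", "North America", 426),
    ("Brazil", "South America", 38),
    ("Russia", "Europe", 105),
    ("Mexico", "North America", 37),
    ("India", "Asia", 54),
    ("Turkey", "Europe", 40),
    ("United Kingdom", "Europe", 31),
    ("Hong Kong", "Asia", 39),
    ("Germany", "Europe", 57),
    ("Canada", "North America", 28),
    ("China", "Asia", 120),
]

# Part a: sort from largest to smallest by the number of billionaires.
by_count = sorted(countries, key=lambda row: row[2], reverse=True)
top_five = [name for name, _, _ in by_count[:5]]

# Part b: keep only the countries located in North America.
north_america = [row for row in countries if row[1] == "North America"]

print(top_five)
print([name for name, _, _ in north_america])
```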
Difficulty: Moderate LO: 2.3, Pages 21-23 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 3. The data on the percentage of visitors in the previous and current years at 12 well-known national parks of the United States are given below:
National Parks          Percentage of visitors    Percentage of visitors
                        previous year             current year
The Smokies             78.2%                     84.2%
The Grand Canyon        83.5%                     81.6%
Theodore Roosevelt      81.6%                     84.8%
Yosemite                74.2%                     78.4%
Yellowstone             77.9%                     76.2%
Olympic                 86.4%                     88.6%
The Colorado Rockies    84.3%                     85.4%
Zion                    76.7%                     78.9%
The Grand Tetons        84.6%                     87.8%
Cuyahoga Valley         85.1%                     86.7%
Acadia                  79.2%                     82.6%
Shenandoah              72.9%                     79.2%
a. Sort the parks in descending order by their current year’s visitor percentage. Which park has the highest number of visitors in the current year? Which park has the lowest number of visitors in the current year? b. Calculate the change in visitor percentage from the previous to the current year for each park. Use Excel’s conditional formatting to highlight the parks whose visitor percentage decreased from the previous year to the current year. c. Use Excel’s conditional formatting tool to create data bars for the change in visitor percentage from the previous year to the current year for each park calculated in part b. Answer: a. The sorted list of parks for the current year appears as below:
National Parks          Percentage of visitors    Percentage of visitors
                        previous year             current year
Olympic                 86.4%                     88.6%
The Grand Tetons        84.6%                     87.8%
Cuyahoga Valley         85.1%                     86.7%
The Colorado Rockies    84.3%                     85.4%
Theodore Roosevelt      81.6%                     84.8%
The Smokies             78.2%                     84.2%
Acadia                  79.2%                     82.6%
The Grand Canyon        83.5%                     81.6%
Shenandoah              72.9%                     79.2%
Zion                    76.7%                     78.9%
Yosemite                74.2%                     78.4%
Yellowstone             77.9%                     76.2%
Olympic has the highest number of visitors in the current year and Yellowstone has the lowest number of visitors in the current year.
b.
National Parks          Percentage of visitors    Percentage of visitors    Change in visitor
                        previous year             current year              percentage
The Smokies             78.2%                     84.2%                     6.00%
The Grand Canyon        83.5%                     81.6%                     -1.90%
Theodore Roosevelt      81.6%                     84.8%                     3.20%
Yosemite                74.2%                     78.4%                     4.20%
Yellowstone             77.9%                     76.2%                     -1.70%
Olympic                 86.4%                     88.6%                     2.20%
The Colorado Rockies    84.3%                     85.4%                     1.10%
Zion                    76.7%                     78.9%                     2.20%
The Grand Tetons        84.6%                     87.8%                     3.20%
Cuyahoga Valley         85.1%                     86.7%                     1.60%
Acadia                  79.2%                     82.6%                     3.40%
Shenandoah              72.9%                     79.2%                     6.30%
c. The output using Excel’s conditional formatting tool that created data bars for the change in visitor percentage from the previous year to the current year for each park appears as below.
Difficulty: Moderate LO: 2.3, Pages 21-25 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 4. The partial relative frequency distribution is given below:
Group    Relative Frequency
1        0.15
2        0.32
3        0.29
4
a. What is the relative frequency of group 4? b. The total sample size is 400. What is the frequency of group 4? c. Show the frequency distribution. d. Show the percent frequency distribution. Answer: a. The relative frequency of group 4 is obtained as 1.00 – 0.15 – 0.32 – 0.29 = 0.24. b. If the total sample size is 400, the frequency of group 4 is obtained as 0.24 × 400 = 96.
c.
Group    Relative Frequency    Frequency
1        0.15                  60
2        0.32                  128
3        0.29                  116
4        0.24                  96
Total    1.00                  400
d.
Group    Relative Frequency    % Frequency
1        0.15                  15
2        0.32                  32
3        0.29                  29
4        0.24                  24
Total    1.00                  100
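The arithmetic in Problem 4 can be checked with a short Python sketch. The dictionary names (known, relative, frequency, percent) are illustrative assumptions; rounding is used only to absorb floating-point noise.

```python
# Known relative frequencies from Problem 4; group 4 is missing from the table.
known = {1: 0.15, 2: 0.32, 3: 0.29}
sample_size = 400

# Relative frequencies must sum to 1, so group 4 takes the remainder.
relative = dict(known)
relative[4] = round(1.0 - sum(known.values()), 2)

# Frequency = relative frequency x sample size; percent = relative frequency x 100.
frequency = {group: round(rf * sample_size) for group, rf in relative.items()}
percent = {group: round(rf * 100) for group, rf in relative.items()}

print(relative[4], frequency, percent)
```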
Difficulty: Moderate LO: 2.4, Pages 25-28 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 5. A survey on the most preferred newspaper in the USA listed The New York Times (TNYT), Washington Post (WP), Daily News (DN), New York Post (NYP), and Los Angeles Times (LAT) as the top five most preferred newspapers. The table below shows the preferences of 50 citizens.
TNYT  DN    DN    NYP   LAT   WP    TNYT  LAT   WP    TNYT
WP    TNYT  LAT   TNYT  WP    DN    TNYT  LAT   WP    DN
NYP   LAT   TNYT  WP    DN    TNYT  LAT   NYP   TNYT  NYP
WP    WP    TNYT  LAT   WP    DN    TNYT  WP    DN    TNYT
TNYT  WP    NYP   NYP   LAT   DN    NYP   DN    TNYT  WP
a. Are these data categorical or quantitative? b. Provide frequency and percent frequency distributions. c. On the basis of the sample, which newspaper is preferred the most? Answer: a. The given data are categorical.
b.
Newspapers    Frequency    % Frequency
TNYT          14           28
WP            12           24
DN            9            18
NYP           7            14
LAT           8            16
Total         50           100
c. The most preferred newspaper is The New York Times. Difficulty: Moderate LO: 2.4, Pages 25-28 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 6. The mentor of a class researched the number of hours each student spent studying per week, in order to analyze the correlation between study hours and the marks obtained by each student. The data on the hours spent per week by 25 students are listed below:
13 12 13 17 24
14 19 16 18 20
16 21 18 23 14
15 22 25 16 22
12 19 21 12 15
a. What is the least amount of time a student spent per week on studying after school hours in this sample? The highest? b. Use a class width of 2 hours to prepare a frequency distribution, a relative frequency distribution, and a percent frequency distribution for the data. c. Prepare a histogram and comment on the shape of the distribution. Answer: a. The least time a student spends is 12 hours, and the highest is 25 hours.
b.
Hours in Study per Week    Frequency    Relative Frequency    % Frequency
12-13                      5            0.20                  20
14-15                      4            0.16                  16
16-17                      4            0.16                  16
18-19                      4            0.16                  16
20-21                      3            0.12                  12
22-23                      3            0.12                  12
24-25                      2            0.08                  8
Total                      25           1.00                  100
c. [Histogram: Hours in Study per Week. Horizontal axis: Hours, classes 12-13 through 24-25. Vertical axis: Frequency.]
The distribution is skewed to the right. Difficulty: Moderate LO: 2.4, Pages 28-34 Bloom’s: Application BUSPROG: Analytic Skills
DISC: Descriptive Statistics 7. The manager of an automobile showroom studied the time spent by each salesman interacting with the customer in a month apart from the other jobs assigned to them. The data in hours are given below. 17 18 20 15 19 10 26 13 17 24 14 26
13 16 24 19 12 16 27 23 15 20 21 24
Using classes 10−13, 14−17, and so on, show: a. The frequency distribution. b. The relative frequency distribution. c. The cumulative frequency distribution. d. The cumulative relative frequency distribution. e. The proportion of salesmen who spend 13 hours of time or less with the customers. f. Prepare a histogram and comment on the shape of the distribution. Answer:
Class    Frequency    Relative Frequency    Cumulative Frequency    Cumulative Relative Frequency
10-13    4            0.17                  4                       0.17
14-17    7            0.29                  11                      0.46
18-21    6            0.25                  17                      0.71
22-25    4            0.17                  21                      0.88
26-29    3            0.13                  24                      1.00 (approx.)
Total    24           ≈1
e. From the cumulative relative frequency distribution, 17% of the salesmen spend 13 hours or less with the customers. f.
The distribution is skewed to the right. Difficulty: Challenging LO: 2.4, Pages 28-35 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 8. The scores of a sample of students in a Math test are 20, 15, 19, 21, 22, 12, 17, 14, 24, 16 and in a Stat test are 16, 12, 19, 17, 22, 14, 20, 21, 24, 15, 13. a. Compute the mean and median scores for both the Math and the Stat tests. b. Compare the mean and median scores computed in part a. Comment. Answer: a. For Math test: Mean = 18. Median = 18. For Stat test: Mean = 17.5. Median = 17. b. The mean and the median scores for statistics are lower than that for mathematics. These lower values are because of an additional score 13 for statistics which is lower than the mean and the median scores for mathematics. Difficulty: Moderate LO: 2.5, Pages 35-37 Bloom’s: Application
BUSPROG: Analytic Skills DISC: Descriptive Statistics 9. Consider a sample on the waiting times (in minutes), at the billing counter in a grocery store, to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the mean, median, and mode. Answer: Mean = 18.53. Median = 19. Mode = 15.
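The three measures of location in Problem 9 can be verified with Python's standard statistics module. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Waiting times (in minutes) from Problem 9.
waiting_times = [15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, 21]

avg = mean(waiting_times)          # sum of values divided by n
middle = median(waiting_times)     # middle value of the sorted data (n = 15 is odd)
most_common = mode(waiting_times)  # value that occurs most often

print(f"Mean = {avg:.2f}, Median = {middle}, Mode = {most_common}")
```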
Difficulty: Moderate LO: 2.5, Pages 35-38 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 10. Suppose that you make a fixed deposit of $1,000 in Bank X, and $500 in Bank Y. The value of each investment at the end of each subsequent year is provided in the table:
Year    Bank X ($)    Bank Y ($)
1       1,320         560
2       1,510         620
3       1,750         680
4       2,090         740
5       2,240         790
6       2,470         820
7       2,830         870
8       3,220         910
9       3,450         950
10      3,690         990
Which of the two banks provides a better return over this time period? Answer:
Year              Bank X    Growth Factor    Bank Y    Growth Factor
Initial           1,000                      500
1                 1,320     1.32             560       1.12
2                 1,510     1.14             620       1.11
3                 1,750     1.16             680       1.10
4                 2,090     1.19             740       1.09
5                 2,240     1.07             790       1.07
6                 2,470     1.10             820       1.04
7                 2,830     1.15             870       1.06
8                 3,220     1.14             910       1.05
9                 3,450     1.07             950       1.04
10                3,690     1.07             990       1.04
Geometric Mean              1.1395                     1.0707
% of return                 13.95%                     7.07%
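The growth factors and geometric means in the table can be reproduced with a short Python sketch, assuming Python 3.8 or later for statistics.geometric_mean. The helper name growth_factors is a hypothetical one introduced here for illustration.

```python
from statistics import geometric_mean

# Initial deposit followed by the ten year-end values from Problem 10.
bank_x = [1000, 1320, 1510, 1750, 2090, 2240, 2470, 2830, 3220, 3450, 3690]
bank_y = [500, 560, 620, 680, 740, 790, 820, 870, 910, 950, 990]


def growth_factors(values):
    """Ratio of each year's value to the previous year's value."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


# The geometric mean of the growth factors is the mean annual growth factor.
gm_x = geometric_mean(growth_factors(bank_x))
gm_y = geometric_mean(growth_factors(bank_y))

print(f"Bank X: {gm_x:.4f} ({gm_x - 1:.2%} per year)")
print(f"Bank Y: {gm_y:.4f} ({gm_y - 1:.2%} per year)")
```

Because the growth factors telescope, the same result follows from (final value / initial value) raised to the power 1/10.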
We observe that Bank X provides a better return than Bank Y. Difficulty: Challenging LO: 2.5, Pages 38-40 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 11. Consider a sample of waiting times (in minutes) at the billing counter in a grocery store: 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the 25th, 50th, and 75th percentiles. Answer:
25th percentile = 15. 50th percentile = 19. 75th percentile = 21.
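As a cross-check, the sketch below uses statistics.quantiles with method="exclusive", which places the p-th percentile at position p(n + 1)/100 in the sorted data and interpolates between neighbors. Whether the textbook uses exactly this location rule is an assumption; it does reproduce the answers listed here, and it also reproduces the quartiles and IQR for the ten-value data set in questions 35 and 36.

```python
from statistics import quantiles

# Waiting times (in minutes) from Problem 11.
waiting_times = [15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, 21]

# n=4 returns the three cut points: 25th, 50th, and 75th percentiles.
p25, p50, p75 = quantiles(waiting_times, n=4, method="exclusive")
print(p25, p50, p75)

# Ten-value data set from multiple-choice questions 35 and 36.
mc_data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
q1, _, q3 = quantiles(mc_data, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range = third quartile minus first quartile
print(q1, q3, iqr)
```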
Difficulty: Moderate LO: 2.7, Pages 44-45 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 12. Suppose that the average time an employee takes to reach the office is 35 minutes. To address the issue of late comers, the mode of transport chosen by the employee is tracked: private transport (two-wheelers and four-wheelers) and public transport. The data on the average time (in minutes) taken using both a private transportation system and a public transportation system for a sample of employees are given below:
Private Transport    Public Transport
27                   30
33                   29
28                   25
32                   20
20                   27
34                   32
30                   37
28                   38
18                   21
29                   35
a. What are the mean and median travel times for employees using private transport? What are the mean and median travel times for employees using public transport? b. What are the variance and standard deviation of travel times for employees using private transport? What are the variance and standard deviation of travel times for employees using public transport? c. Comment. Answer: Travel times (in minutes) a. Using private transport: Mean = 27.9. Median = 28.5. Using public transport: Mean = 29.4. Median = 29.5. b. Using private transport: Variance = 27.43. Standard deviation = 5.24. Using public transport: Variance = 39.38. Standard deviation = 6.28. c. Employees using private transport have lower mean and median travel times, and less variable travel times, than employees using public transport. Difficulty: Moderate LO: 2.5 and 2.6, Pages 35-37 and 41-43 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 13. The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below:
Name         Time (in minutes)
Jack         25.3
Sam          28.2
Richard      26.8
Steve        29.5
Mc Cathay    22.4
Sergio       21.7
John         24.3
Mike         22.4
Lewis        26.8
Mark         29.4
Matt         23.6
Peter        26.4
Shaggy       23.5
Jeff         26.8
Gerald       28.1
a. What is the mean resolution time? b. What is the median resolution time? c. What is the mode for these 15 executives? d. What are the variance and standard deviation? e. What is the third quartile? Answer: a. Mean = 25.68. b. Median = 26.4. c. Mode = 26.8. d. Variance = 6.67; Standard deviation = 2.58. e. Third Quartile = 28.1. Difficulty: Moderate LOs: 2.5, 2.6, 2.7, Pages 35-38, 41-43, 45-46 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 14. Suppose that the average time an employee takes to reach the office is 35 minutes. To address the issue of late comers, the mode of transport chosen by the employee is tracked: private transport (two-wheelers and four-wheelers) and public transport. The data on the average time (in minutes) taken using both a private transportation system and a public transportation system for a sample of employees are given below:
Private Transport    Public Transport
27                   30
33                   29
28                   25
32                   20
20                   27
34                   32
30                   37
28                   38
18                   21
29                   35
a. Consider the travel times (in minutes) of employees using private transport. Compute the z-score for the tenth employee, with a travel time of 29 minutes. b. Consider the travel times (in minutes) of employees using public transport. Compute the z-score for the second employee, with a travel time of 29 minutes. How does this z-score compare with the z-score you calculated for part a? c. Based on z-scores, do the data for employees using private transport and public transport contain any outliers? Answer: a. For the tenth employee using private transport, the z-score is obtained as $z = \frac{29 - 27.9}{5.24} = 0.21$.
b. For the second employee using public transport, the z-score is obtained as $z = \frac{29 - 29.4}{6.28} = -0.06$.
Even though the employees had the same travel time, the z-score for the tenth employee in the sample who used private transport is much larger because that employee is part of a sample with a smaller mean and a smaller standard deviation.
c.
Travel Times using    z-score    Travel Times using    z-score
Private Transport                Public Transport
27                    -0.17      30                    0.10
33                    0.97       29                    -0.06
28                    0.02       25                    -0.70
32                    0.78       20                    -1.50
20                    -1.51      27                    -0.38
34                    1.16       32                    0.41
30                    0.40       37                    1.21
28                    0.02       38                    1.37
18                    -1.89      21                    -1.34
29                    0.21       35                    0.89
No z-score is less than –3.0 or above +3.0; therefore, the z-scores do not indicate the existence of any outliers in either sample. Difficulty: Challenging LO: 2.7, Pages 46-47 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 15. The results of a survey showed that on average, children spend 5.6 hours at PlayStation per week. Suppose that the standard deviation is 1.7 hours and that the number of hours at PlayStation follows a bell-shaped distribution. a. Use the empirical rule to calculate the percentage of children who spend between 2.2 and 9 hours at PlayStation per week. b. What is the z-value for a child who spends 7.5 hours at PlayStation per week? c. What is the z-value for a child who spends 4.5 hours at PlayStation per week? Answer: a. According to the empirical rule, approximately 95% of data values will be within two standard deviations of the mean. 2.2 is two standard deviations less than the mean and 9 is two standard deviations greater than the mean. Therefore, approximately 95% of children spend between 2.2 and 9 hours at PlayStation per week.
b. $z = \frac{7.5 - 5.6}{1.7} = 1.12$.
c. $z = \frac{4.5 - 5.6}{1.7} = -0.65$.
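The z-score calculations in Problems 14 and 15 can be sketched in Python as follows. The function name z_score is a hypothetical helper introduced for illustration; statistics.stdev computes the sample standard deviation (divisor n − 1), which matches the values 5.24 and 6.28 used above.

```python
from statistics import mean, stdev


def z_score(value, center, spread):
    """Number of standard deviations that value lies from the mean."""
    return (value - center) / spread


# Problem 14: sample travel times (in minutes).
private = [27, 33, 28, 32, 20, 34, 30, 28, 18, 29]
public = [30, 29, 25, 20, 27, 32, 37, 38, 21, 35]

z_private = z_score(29, mean(private), stdev(private))
z_public = z_score(29, mean(public), stdev(public))

# Problem 15: mean 5.6 hours and standard deviation 1.7 hours are given.
z_b = z_score(7.5, 5.6, 1.7)
z_c = z_score(4.5, 5.6, 1.7)

# Outlier check from Problem 14c: flag any |z| greater than 3.
all_z = [z_score(t, mean(s), stdev(s)) for s in (private, public) for t in s]
has_outlier = any(abs(z) > 3 for z in all_z)

print(round(z_private, 2), round(z_public, 2), round(z_b, 2), round(z_c, 2), has_outlier)
```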
Difficulty: Moderate LO: 2.7, Pages 46-48 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 16. A study finds that the average time students spend on the internet is 300 minutes, with a standard deviation of 102 minutes. Answer the following questions assuming a bell-shaped distribution and using the empirical rule. a. What percentage of students use the internet for more than 402 minutes? b. What percentage of students use the internet for more than 504 minutes? c. What percentage of students use the internet for between 198 minutes and 300 minutes? Answer: a. 402 is one standard deviation above the mean. The empirical rule states that 68% of data values will be within one standard deviation of the mean. Because a bell-shaped distribution is symmetric, 0.5 × (1 − 68%) = 16% of the data values will be greater than (mean + 1 × standard deviation) = 402. 16% of students use the internet for more than 402 minutes.
b. 504 is two standard deviations above the mean. The empirical rule states that 95% of data values will be within two standard deviations of the mean. Because a bell-shaped distribution is symmetric, 0.5 × (1 − 95%) = 2.5% of the data values will be greater than (mean + 2 × standard deviation) = 504. 2.5% of students use the internet for more than 504 minutes. c. 198 is one standard deviation below the mean. The empirical rule states that 68% of data values will be within one standard deviation of the mean, and we expect that 0.5 × (1 − 68%) = 16% of data values will be more than one standard deviation below the mean. 300 is the mean, so we expect that 50% of the data values will be below the mean. Therefore, we expect 50% − 16% = 34% of the data values will be between the mean 300 and one standard deviation below the mean 198. 34% of students use the internet for between 198 minutes and 300 minutes. Difficulty: Challenging LO: 2.7, Page 48 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 17. Eight observations taken for two variables are as follows:
$x_i$    $y_i$
11       35
13       32
17       26
18       25
22       20
24       17
26       11
28       10
a. Develop a scatter diagram with x on the horizontal axis. b. What does the scatter diagram developed in part a indicate about the relationship between the two variables? c. Compute and interpret the sample covariance. d. Compute and interpret the sample correlation coefficient. Answer: a.
b. There appears to be a negative linear relationship between the x and y variables.
c.
$x_i$    $y_i$    $(x_i - \bar{x})$    $(y_i - \bar{y})$    $(x_i - \bar{x})(y_i - \bar{y})$
11       35       -8.88                13                   -115.38
13       32       -6.88                10                   -68.75
17       26       -2.88                4                    -11.50
18       25       -1.88                3                    -5.63
22       20       2.13                 -2                   -4.25
24       17       4.13                 -5                   -20.63
26       11       6.13                 -11                  -67.38
28       10       8.13                 -12                  -97.50
Total                                                       -391
$\bar{x} = 19.88$, $\bar{y} = 22$
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-391}{7} = -55.86$.
The negative covariance confirms that there is a negative linear relationship between the x and y variables in this data set. d. $s_x = 6.13$, $s_y = 9.17$. Then the correlation coefficient is calculated as:
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-55.86}{(6.13)(9.17)} = -0.99$.
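The sample covariance and correlation coefficient for Problem 17 can be recomputed from the raw data with the sketch below. It uses only the formulas shown above (divisor n − 1); the variable names are illustrative.

```python
from math import sqrt

# The eight paired observations from Problem 17.
x = [11, 13, 17, 18, 22, 24, 26, 28]
y = [35, 32, 26, 25, 20, 17, 11, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of cross-products of deviations, divided by n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: covariance scaled by both standard deviations.
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, s_x = {s_x:.2f}, s_y = {s_y:.2f}, r_xy = {r_xy:.2f}")
```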
The correlation coefficient confirms a strong negative linear association between the x and y variables in this data set. Difficulty: Challenging LO: 2.8, Page 52-56 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics 18. Consider the following data on income and savings of a sample of residents in a locality:
Income ($ thousands)    Savings ($ thousands)
50                      10
51                      11
52                      13
55                      14
56                      15
58                      15
60                      16
62                      16
65                      17
66                      17
a. Compute the correlation coefficient. Is there a positive correlation between the income and savings? What is your interpretation? b. Show a scatter diagram of the relationship between the income and savings. Answer: a.
$x_i$    $y_i$    $(x_i - \bar{x})$    $(y_i - \bar{y})$    $(x_i - \bar{x})^2$    $(y_i - \bar{y})^2$    $(x_i - \bar{x})(y_i - \bar{y})$
50       10       -7.5                 -4.4                 56.25                  19.36                  33
51       11       -6.5                 -3.4                 42.25                  11.56                  22.1
52       13       -5.5                 -1.4                 30.25                  1.96                   7.7
55       14       -2.5                 -0.4                 6.25                   0.16                   1
56       15       -1.5                 0.6                  2.25                   0.36                   -0.9
58       15       0.5                  0.6                  0.25                   0.36                   0.3
60       16       2.5                  1.6                  6.25                   2.56                   4
62       16       4.5                  1.6                  20.25                  2.56                   7.2
65       17       7.5                  2.6                  56.25                  6.76                   19.5
66       17       8.5                  2.6                  72.25                  6.76                   22.1
Total                                                       292.5                  52.4                   116
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{116}{9} = 12.89$.
$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{292.5}{9}} = 5.70$.
$s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}} = \sqrt{\frac{52.4}{9}} = 2.41$.
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{12.89}{(5.70)(2.41)} = 0.938$.
This indicates that there is a strong positive relationship between income and savings. b. [Scatter diagram. Horizontal axis: Income ($ thousands). Vertical axis: Savings ($ thousands).]
Difficulty: Challenging LO: 2.8, Page 52-56 Bloom’s: Application BUSPROG: Analytic Skills DISC: Descriptive Statistics
Chapter 3: Data Visualization 1. _____ helps in designing effective tables and charts for data visualization. a. Data-ink ratio b. Crosstabulation c. PivotTable d. Scatter charts Answer: A Difficulty: Easy LO: 3.1, Page 73 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: One of the most helpful ideas for creating effective tables and charts for data visualization is the idea of the data-ink ratio. 2. Data-ink is the ink used in a table or chart that a. does not help in conveying the data to the audience. b. helps in presenting data when the audience need not know exact values. c. is necessary to convey the meaning of the data to the audience. d. increases the Non-data-ink ratio. Answer: C Difficulty: Easy LO: 3.1, Page 73 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Data-ink is the ink used in a table or chart that is necessary to convey the meaning of the data to the audience. 3. Deleting the grid lines in the table and the horizontal lines in the chart a. increases the data-ink ratio. b. decreases the data-ink ratio. c. increases the Non-data-ink ratio. d. does not affect the data-ink ratio. Answer: A Difficulty: Moderate LO: 3.1, Page 74 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Deleting the grid lines increases the data-ink ratio because a larger proportion of the ink used in the table is used to convey the information (the actual numbers). Similarly, deleting the unnecessary horizontal lines in the chart increases the data-ink ratio. 4. In many cases, white space in a chart can improve _____. a. complexity b. readability
c. functionality d. stability Answer: B Difficulty: Easy LO: 3.1, Page 74 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: In many cases, white space in a table or chart can improve readability. 5. Tables should be used when a. the reader need not refer to specific numerical values. b. the reader need not make precise comparisons between different values and not just relative comparisons. c. the values being displayed have different units or very different magnitudes. d. the reader need not differentiate the columns and rows. Answer: C Difficulty: Easy LO: 3.2, Page 75 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The tables should be used when the reader needs to refer to specific numerical values, when the reader needs to make precise comparisons between different values and not just relative comparisons, and when the values being displayed have different units or very different magnitudes. 6. In designing an effective table, a. avoid the use of unnecessary ink in tables. b. increase the number of horizontal and vertical lines in the table. c. avoid the use of necessary ink in tables. d. do not focus on the alignment of the text and numbers in the table. Answer: A Difficulty: Moderate LO: 3.2, Page 77 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: In designing an effective table, keep in mind the data-ink ratio and avoid the use of unnecessary ink in tables. 7. A useful type of table for describing data of two variables is a a. data table. b. bubble chart. c. crosstabulation. d. scatter chart. Answer: C
Difficulty: Easy LO: 3.2, Page 79 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A crosstabulation provides a tabular summary of data for two variables. 8. A crosstabulation in Microsoft Excel is known as a a. scatter plot. b. bar chart. c. histogram. d. PivotTable. Answer: D Difficulty: Moderate LO: 3.2, Page 80 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A crosstabulation in Microsoft Excel is known as a PivotTable.
9. ______ are visual methods of displaying data. a. Tables b. Charts c. Pivot tables d. Crosstabs Answer: B Difficulty: Easy LO: 3.3, Page 85 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Charts are used to display and analyze data. 10. The software package most commonly used for creating simple charts is a. Excel. b. XLMiner. c. SAS. d. R. Answer: A Difficulty: Easy LO: 3.3, Page 85 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Excel is the most commonly used software package for creating simple charts. 11. A _____ is a graphical presentation of the relationship between two quantitative variables.
a. histogram b. bar chart c. pie chart d. scatter chart Answer: D Difficulty: Moderate LO: 3.3, Page 85 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A scatter chart is a graphical presentation of the relationship between two quantitative variables. 12. A _____ is a line that provides an approximation of the relationship between the variables. a. line chart b. sparkline c. trendline d. gridline Answer: C Difficulty: Moderate LO: 3.3, Page 86 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: To obtain an approximate relationship between the variables, we add a trendline on a scatter chart. 13. If the scatter chart indicates a positive linear relationship between two variables, then their correlation coefficient is a. equal to –1. b. greater than 1. c. between 0 and +1. d. between –1 and 0. Answer: C Difficulty: Moderate LO: 3.3, Page 87 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: If the scatter chart indicates a positive linear relationship between two variables, then their covariance is positive and hence, their correlation coefficient is between 0 and +1.
14. A chart similar to a scatter chart, but uses a line to connect the points in the chart is called the a. line chart. b. scatter plot. c. trendline.
d. bar chart. Answer: A Difficulty: Moderate LO: 3.3, Page 87 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Line charts are similar to scatter charts, but a line connects the points in the chart. The line chart connects the points to show the continuity, and makes it easy for the reader to interpret changes over time. 15. A line chart displaying the data values collected over a period of time is termed a a. boxplot. b. frequency graph. c. dot plot. d. time series plot. Answer: D Difficulty: Moderate LO: 3.3, Page 87 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Line charts are very useful for time series data collected over a period of time (minutes, hours, days, years, etc.). Such line charts are often called time series plots. 16. A line chart that has no axes but is used to provide information on overall trends for time series data is called a a. time series plot. b. sparkline. c. trendline. d. bubble chart. Answer: B Difficulty: Moderate LO: 3.3, Page 89 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A sparkline is a minimalist type of line chart that provides information on the overall trends for time series data. 17. The charts that are helpful in making comparisons between categorical variables are a. bar charts and scatter charts. b. scatter charts and line charts. c. bar charts and column charts. d. column charts and line charts. Answer: C Difficulty: Moderate
LO: 3.3, Page 90 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Bar charts and column charts provide a graphical summary of categorical data and they are very helpful in making comparisons between categorical variables. 18. Bar charts use a. horizontal bars to display the magnitude of the quantitative variable. b. vertical bars to display the magnitude of the quantitative variable. c. horizontal and vertical bars to display the magnitude of the quantitative variable. d. vertical bars to display the magnitude of the qualitative variable. Answer: A Difficulty: Moderate LO: 3.3, Page 90 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Bar charts use horizontal bars to display the magnitude of the quantitative variable. 19. Making visual comparisons between categorical variables is difficult in a a. scatter chart. b. pie chart. c. line chart. d. column chart. Answer: B Difficulty: Easy LO: 3.3, Page 93 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Making visual comparisons is much easier in the bar chart than in the pie chart, particularly when using a limited number of colors for differentiation. 20. Using multiple lines on a line chart or employing multiple charts is an alternative to a a. column chart. b. line chart. c. two-dimensional graph. d. three-dimensional chart. Answer: D Difficulty: Moderate LO: 3.3, Page 93 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: As an alternative for a three-dimensional (3-D) chart, we consider the use of multiple lines on a line chart or employing multiple charts.
21. A chart that is recommended as an alternative to a pie chart is a a. bar chart. b. line chart. c. stacked column chart. d. box plot. Answer: A Difficulty: Moderate LO: 3.3, Page 93 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Making visual comparisons is much easier in the bar chart than in the pie chart. Pie charts are not recommended in most situations; bar charts are suggested for comparing categorical data instead. 22. In order to visualize three variables in a two-dimensional graph, we use a a. 2-D chart. b. 3-D chart. c. bubble chart. d. column chart. Answer: C Difficulty: Moderate LO: 3.3, Page 93 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A bubble chart is a graphical means of visualizing three variables in a two-dimensional graph and is therefore sometimes a preferred alternative to a 3-D graph. 23. The size of the bubble in a bubble chart can represent the a. two categorical variables. b. z-axis value. c. area covered by the categorical variables. d. intersection of the x-axis and the y-axis values. Answer: B Difficulty: Moderate LO: 3.3, Page 93 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: An alternative to a 3-D chart is to create a bubble chart, where the size of the bubble can represent the z-axis value. 24. A two-dimensional graph representing the data using different shades of color to indicate magnitude is called a ______. a. heat map b. bubble chart
c. column chart d. pie chart Answer: A Difficulty: Easy LO: 3.3, Page 95 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A heat map is a two-dimensional graphical representation of data that uses different shades of color to indicate magnitude. Heat maps depend strongly on the use of color to convey information over different areas, across time, or both. 25. To avoid problems in interpreting the differences in color, _____ can be added. a. a bubble chart. b. a pie chart. c. the heat maps. d. the sparklines. Answer: D Difficulty: Moderate LO: 3.3, Page 97 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: To avoid problems with interpreting differences in color, we can add the sparklines. 26. An effective way to show both trend and magnitude is achieved by using a combination of a a. time series plot and sparklines. b. line chart and trendlines. c. heat map and sparklines. d. bubble chart and trendlines. Answer: C Difficulty: Moderate LO: 3.3, Page 97 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The combination of a heat map and sparklines is an effective way to display both trend and magnitude. 27. A disadvantage of a stacked column and bar chart is a. it does not include all the values of the variable. b. it cannot be used to compare relative values of quantitative variables for the same category. c. it has difficulty perceiving small differences in areas. d. it is used when many quantitative variables need to be displayed. Answer: C
Difficulty: Moderate LO: 3.3, Page 98 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Stacked column and bar charts suffer from the same difficulties as pie charts because the human eye has difficulty perceiving small differences in areas. 28. An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a a. stacked bar chart. b. clustered column chart. c. pie chart. d. clustered bar chart. Answer: B Difficulty: Easy LO: 3.3, Page 98 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Experts often recommend against the use of stacked column and bar charts for more than a couple of quantitative variables in each category. An alternative chart for these same data is called a clustered column (or bar) chart. 29. A useful chart for displaying multiple variables is the a. stacked column and bar charts. b. scatter chart. c. scatter chart matrix. d. two-dimensional graph. Answer: C Difficulty: Moderate LO: 3.3, Page 98 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A scatter chart matrix is a useful chart for displaying multiple variables. 30. To generate a scatter chart matrix, we use a. native Excel functionality. b. Excel Add-In XLMiner. c. Excel Add-In MegaStat. d. all of the above. Answer: B Difficulty: Moderate LO: 3.3, Page 101 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC:
Feedback: It is not possible to generate a scatter chart matrix using native Excel functionality. A scatter chart matrix is generated using the Excel Add-In XLMiner. 31. To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs a. PivotCharts with PivotTables. b. stacked column charts with PivotTables. c. heat maps with trendline. d. bubble chart with trendline. Answer: A Difficulty: Moderate LO: 3.3, Page 101 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The PivotChart displays the same information as that of the PivotTable, but the column chart used here makes it easier to compare quantitative variables per category. 32. A PivotChart, in a few instances, is the same as a a. clustered column chart. b. bubble chart. c. stacked column chart. d. bar chart. Answer: A Difficulty: Moderate LO: 3.3, Page 101 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The PivotChart is the same as a clustered column chart in a study based on a couple of quantitative variables belonging to the same category. 33. The best way to differentiate chart elements is using a. too many colors. b. labels. c. bubbles. d. chart titles. Answer: B Difficulty: Moderate LO: 3.3, Page 102 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: In many cases, it is preferable to differentiate chart elements with dashed lines, patterns, or labels.
34. A _____ is used for examining data with more than two variables and it includes a different vertical axis for each variable. a. scatter plot. b. PivotChart. c. column chart. d. parallel coordinates plot. Answer: D Difficulty: Moderate LO: 3.4, Page 103 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: One type of helpful chart for examining data with more than two variables is the parallel coordinates plot, which includes a different vertical axis for each variable. 35. A _____ is useful for visualizing hierarchical data along multiple dimensions. a. heat map b. hierarchical map c. tree map d. map of multiple hierarchy Answer: C Difficulty: Moderate LO: 3.4, Page 103 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: In order to visualize hierarchical data along multiple dimensions, we use a tree map.
36. _____ merges maps and statistics to present data collected over different geographies. a. The heat map b. The geographic information system c. A geographical map d. The statistical information system Answer: B Difficulty: Easy LO: 3.4, Page 104 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The geographic information system (GIS) is a system that merges maps and statistics to present data collected over different geographies. 37. A data visualization tool that updates in real time and gives multiple outputs is called a. a data table. b. a metrics table. c. the GIS. d. a data dashboard.
Answer: D Difficulty: Moderate LO: 3.5, Pages 105, 109 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: A data dashboard is a data visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available. 38. In a business, the values indicating the business’s current operating characteristics, such as its financial position, the inventory on hand, customer service metrics, are typically known as a. company performance indicators. b. performance indicators. c. key performance indicators. d. business performance indicators. Answer: C Difficulty: Easy LO: 3.5, Page 106 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: In a business, the values such as its financial position, the inventory on hand, customer service metrics, and the like are often indicative of the business’s current operating characteristics. These values are typically known as key performance indicators (KPIs). 39. We create multiple dashboards a. to help the user scroll vertically and horizontally to see the entire dashboard. b. so that each dashboard can be viewed on a single screen. c. to make sure the KPIs are not displayed in the data dashboard. d. so that all dashboards can be viewed on a single screen. Answer: B Difficulty: Moderate LO: 3.5, Page 106 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: Rather than requiring the user to scroll vertically and horizontally to see the entire dashboard, it is better to create multiple dashboards so that each dashboard can be viewed on a single screen. 40. The data dashboard for a marketing manager may have KPIs related to a. current sales measures and sales by region. b. current financial standing of the company. c. vehicle’s current speed, fuel level, and engine temperature. d. overall performance of the company’s stock over the previous 52 weeks. Answer: A
Difficulty: Moderate LO: 3.5, Page 106 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The KPIs displayed in the data dashboard should convey meaning to its user and be related to the decisions the user makes. For example, the data dashboard for a marketing manager may have KPIs related to current sales measures and sales by region. 41. Consider the clustered bar chart of the dashboard developed to monitor the performance of a call center:
This chart allows the IT manager to a. identify a particular type of problem by the call volume. b. identify a particular type of problem by location. c. identify different types of problems (Email, Internet, or Software) in the call center. d. identify the frequency of each problem in the call center. Answer: B Difficulty: Moderate LO: 3.5, Page 107 Bloom’s: Knowledge BUSPROG: Analytic Skills DISC: Feedback: The clustered bar chart shows the call volume in the call center by type of problem (Email, Internet, or Software) for each of three cities in Texas. This chart allows the IT manager to quickly identify if there is a particular type of problem by location.
Problems 1. The following table shows the profit made by Hydro America, a water servicing company, for 5 different years.
Year  Total revenue ($)  Cost of revenue ($)  Gross profit ($)
2008  62723201           26256000             36467201
2009  67177612           37026005             30151607
2010  72648252           35054123             37594129
2011  71225185           35187462             36037723
2012  75847373           39298243             36549130
Reformat the table to improve readability and to help the manager identify the year with the highest profit. Answer: To improve the readability of the table, we remove unnecessary gridlines, right align the numerical columns, remove bolded font except for column titles, and add commas to dollar values to ease readability.
Year  Total revenue ($)  Cost of revenue ($)  Gross profit ($)
2008         62,723,201           26,256,000        36,467,201
2009         67,177,612           37,026,005        30,151,607
2010         72,648,252           35,054,123        37,594,129
2011         71,225,185           35,187,462        36,037,723
2012         75,847,373           39,298,243        36,549,130
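The same reformatting can be cross-checked outside Excel. The following is a minimal Python sketch, not the method used in this solution (which relies on Excel formatting); the variable names and the `format_row` helper are illustrative assumptions. It recomputes gross profit as total revenue minus cost of revenue, adds thousands separators, right-aligns the values, and picks out the year with the highest gross profit.

```python
# Data copied from the Hydro America table in this problem.
years = [2008, 2009, 2010, 2011, 2012]
total_revenue = [62723201, 67177612, 72648252, 71225185, 75847373]
cost_of_revenue = [26256000, 37026005, 35054123, 35187462, 39298243]

# Gross profit is total revenue minus cost of revenue.
gross_profit = [rev - cost for rev, cost in zip(total_revenue, cost_of_revenue)]

# Year with the highest gross profit.
best_year, best_profit = max(zip(years, gross_profit), key=lambda pair: pair[1])


def format_row(year, rev, cost, profit):
    """Right-align each dollar value in a 12-character field with commas."""
    return f"{year}  {rev:>12,}  {cost:>12,}  {profit:>12,}"


for row in zip(years, total_revenue, cost_of_revenue, gross_profit):
    print(format_row(*row))
print(f"Highest gross profit: {best_profit:,} in {best_year}")
```

The `:,` format specifier inserts the thousands separators, and `>12` right-aligns each value so the digits line up by place value.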
It is now easy to identify the year with the highest profit ($37,594,129), which is 2010. Difficulty: Easy LO: 3.2, Pages 77-79 Bloom’s: Application BUSPROG: Analytic Skills DISC: 2. Consider the table below and the line chart of the temperatures in 11 different states of the United States:
States          Temperature (degrees F)
Illinois        76
North Carolina  79
Florida         80
Indiana         80
New Jersey      85
Ohio            83
Pennsylvania    86
Texas           90
Virginia        87
Michigan        91
New York        85
a. What are the problems with the layout and display of this line chart? b. Create a new line chart for the given data. Format the chart to make it easy to read and interpret. Answer: a. The chart contains unnecessary gridlines, the y-axis label values are spaced too closely together, and the shading of the chart does not add value. b.
Small markers are added on the line chart at each data point. Difficulty: Easy LO: 3.3, Pages 87-89 Bloom’s: Application BUSPROG: Analytic Skills DISC:
3. The data on the scores obtained by students in 5 different entrance exams have been collected from 50 colleges and they are provided below. Create a PivotTable in Excel to display the number of students who had taken up each exam and the average score for students in each exam.
Exams  Scores  Exams  Scores
SAT    520     MCAT   487
ACT    400     GRE    267
MCAT   580     GMAT   455
GRE    280     SAT    528
GMAT   540     GMAT   536
ACT    356     SAT    469
MCAT   520     GRE    455
GRE    355     MCAT   520
GMAT   480     GRE    489
SAT    574     ACT    455
GRE    396     MCAT   589
GMAT   450     GRE    500
ACT    420     GMAT   500
MCAT   560     SAT    528
GRE    297     GMAT   480
GMAT   520     SAT    475
SAT    489     GMAT   570
GMAT   500     ACT    480
SAT    566     MCAT   567
GRE    451     GRE    546
GMAT   460     GMAT   544
ACT    422     SAT    420
MCAT   550     GMAT   453
GRE    310     SAT    510
ACT    384     GRE    473
a. Which exam did most students attempt? b. Which exam has the highest average score? c. Use the PivotTable to determine the exam attempted by the student with the highest score. What is the exam attempted by the student with the lowest score? Answer:
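The PivotTable summary for this problem (count, average, maximum, and minimum of Scores by exam) can be cross-checked with a short script. This is a minimal Python sketch under the assumption that the 50 exam/score pairs are read in the order listed in the table; the solution itself uses Excel, and names such as `by_exam` are illustrative.

```python
from collections import defaultdict
from statistics import mean

# The 50 observations from the problem, in the order they are listed.
exams = (
    "SAT ACT MCAT GRE GMAT ACT MCAT GRE GMAT SAT GRE GMAT ACT MCAT GRE GMAT SAT "
    "GMAT SAT GRE GMAT ACT MCAT GRE ACT "
    "MCAT GRE GMAT SAT GMAT SAT GRE MCAT GRE ACT MCAT GRE GMAT SAT GMAT SAT GMAT "
    "ACT MCAT GRE GMAT SAT GMAT SAT GRE"
).split()
scores = [int(s) for s in (
    "520 400 580 280 540 356 520 355 480 574 396 450 420 560 297 520 489 500 "
    "566 451 460 422 550 310 384 "
    "487 267 455 528 536 469 455 520 489 455 589 500 500 528 480 475 570 480 "
    "567 546 544 420 453 510 473"
).split()]

# Group scores by exam, as the PivotTable rows would.
by_exam = defaultdict(list)
for exam, score in zip(exams, scores):
    by_exam[exam].append(score)

# Part a: exam with the most students (Count of Scores).
most_attempted = max(by_exam, key=lambda e: len(by_exam[e]))
# Part b: exam with the highest average (Average of Scores).
highest_average = max(by_exam, key=lambda e: mean(by_exam[e]))
# Part c: exams holding the overall maximum and minimum scores.
top_exam = max(by_exam, key=lambda e: max(by_exam[e]))
bottom_exam = min(by_exam, key=lambda e: min(by_exam[e]))

for exam in sorted(by_exam):
    values = by_exam[exam]
    print(exam, len(values), round(mean(values), 1), max(values), min(values))
```

Switching the aggregate from `mean` to `max` or `min` plays the same role as changing the Value Field Settings in the PivotTable.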
a. Most students attempted the GMAT exam. The PivotTable shows that the GMAT exam has the greatest number of students, with 13 students. b. MCAT has the highest average score of 547 (approx.). c. By changing the Value Field Settings for Scores from Average to Max, we see that MCAT has the highest score of 589. By changing the Value Field Settings for Scores to Min, we see that GRE has the lowest score of 267. Difficulty: Moderate LO: 3.2, Pages 80-85 Bloom’s: Application BUSPROG: Analytic Skills DISC: 4. A local search service company surveys the number of service centres available in 3 major cities for different brands of automobiles with an objective to improve the services to its customers. The data on the 20 automobile brands and the number of service centres are given below:
Brands         Number of service centres
Audi           38
BMW            42
Mercedes-Benz  49
Rolls-Royce    25
Volkswagen     30
Toyota         30
Jaguar         42
Nissan         35
Ford           41
Fiat           35
Land Rover     34
Chevrolet      29
Ferrari        32
Hyundai        15
Porsche        35
Skoda          42
Tata           20
Honda          23
Renault        40
Subaru         10
a. How many automobile brands have centres between 20 and 29 in these 3 cities? b. How many automobile brands have more than 40 centres in these cities? Answer: We create a PivotTable to summarize the given data using classes 10-19, 20-29, 30-39, and 40-49.
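The grouping into classes can be sketched in a few lines of Python as a cross-check on the PivotTable. This is an illustrative sketch, not the Excel procedure described in this answer; `class_label` is a hypothetical helper that assigns each count to a class of width 10.

```python
from collections import Counter

# Number of service centres per brand, copied from the table in this problem.
service_centres = {
    "Audi": 38, "BMW": 42, "Mercedes-Benz": 49, "Rolls-Royce": 25,
    "Volkswagen": 30, "Toyota": 30, "Jaguar": 42, "Nissan": 35,
    "Ford": 41, "Fiat": 35, "Land Rover": 34, "Chevrolet": 29,
    "Ferrari": 32, "Hyundai": 15, "Porsche": 35, "Skoda": 42,
    "Tata": 20, "Honda": 23, "Renault": 40, "Subaru": 10,
}


def class_label(count, width=10):
    """Return the class (for example '20-29') that a count falls into."""
    lower = (count // width) * width
    return f"{lower}-{lower + width - 1}"


# Frequency of brands in each class, like Count of Number of service centres.
class_counts = Counter(class_label(n) for n in service_centres.values())

# Brands with strictly more than 40 centres (a brand with exactly 40 is excluded).
strictly_over_40 = sorted(b for b, n in service_centres.items() if n > 40)

for label in sorted(class_counts):
    print(label, class_counts[label])
print(len(strictly_over_40), strictly_over_40)
```

Note that a class such as 40-49 includes a value of exactly 40, so the class count and a strict "more than 40" count can differ.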
Use “Number of service centres” as the Columns, and use “Count of Number of service centres” as the Values in the PivotTable. Right click on the table and use the option Group to obtain the classes. We see that a. 4 automobile brands have between 20 and 29 centres in these cities. b. 6 brands fall in the 40-49 class, that is, have 40 or more centres; 5 of them have strictly more than 40, because Renault has exactly 40. Difficulty: Moderate LO: 3.2, Pages 82-83 Bloom’s: Application BUSPROG: Analytic Skills DISC: 5. A summary on commodities below lists the change in price on a particular day for each commodity belonging to 3 categories - Base Metals, Precious Metals, and Agricultural & Cattle Futures.
Commodity Summary
Commodity        Type of Commodity              Price   Change (%)
Aluminium        Base metals                    1700    0.0750
Gold             Precious metals                1229    -0.2300
Corn             Agricultural & Cattle Futures  400     0.0125
Silver           Precious metals                1975    -0.1800
Aluminium Alloy  Base metals                    1750    -0.1000
Wheat            Agricultural & Cattle Futures  640     -0.0425
Soybeans         Agricultural & Cattle Futures  1300    -0.1250
Copper           Base metals                    7012    -0.1700
Platinum         Precious metals                1357    -0.1900
Cocoa            Agricultural & Cattle Futures  2.8     0.0000
Coffee           Agricultural & Cattle Futures  109     -0.0085
Lead             Base metals                    2065    -0.1000
White Sugar      Agricultural & Cattle Futures  450     -0.0900
Sugar 11         Agricultural & Cattle Futures  20.19   -0.0087
Nickel           Base metals                    13300   -0.2500
Cotton           Agricultural & Cattle Futures  77.39   -0.0087
Oranges          Agricultural & Cattle Futures  139.62  -0.0040
Tin              Base metals                    22600   0.3000
Palladium        Precious metals                717     -0.0700
Palm oil         Agricultural & Cattle Futures  930     0.0500
Zinc             Base metals                    1800    -0.0100
a. Prepare a PivotTable that gives the frequency count of the data by Commodity Type (rows) and the Change (columns). Use classes of -0.25–(-0.15), -0.15–(-0.05), -0.05–0.05, 0.05–0.15, and 0.25–0.35 for the Change (%).
b. What conclusions can you draw about the commodity type and the change (%) in price for that particular day? Answer: a.
b. The Precious metal commodities had the lowest change (%) in price for that particular day and the Base metals had varied changes between -0.25% and 0.35%. No commodity of the Agricultural & Cattle futures and the Precious metals had a change greater than 0.25%.
Difficulty: Moderate LO: 3.2, Pages 80-83 Bloom’s: Application BUSPROG: Analytic Skills DISC: 6. The income levels vary by race and educational attainment. To examine this inequality in the income, data have been collected for 7 different years on the median income earned by an individual based on the race and education.
Year  Racial Demographic  Educational attainment  Median Income
2003  White               High School Graduate    $33,405
2003  White               Some college            $40,325
2003  White               Bachelor’s degree       $55,225
2003  White               Master’s degree         $67,295
2003  White               Doctorate degree        $77,521
2003  Black               High School Graduate    $25,741
2003  Black               Some college            $30,517
2003  Black               Bachelor’s degree       $46,851
2003  Black               Master’s degree         $52,106
2003  Black               Doctorate degree        $61,523
2003  Asian               High School Graduate    $33,654
2003  Asian               Some college            $25,749
2003  Asian               Bachelor’s degree       $51,752
2003  Asian               Master’s degree         $70,519
2003  Asian               Doctorate degree        $81,760
2003  Hispanic            High School Graduate    $22,547
2003  Hispanic            Some college            $25,403
2003  Hispanic            Bachelor’s degree       $35,482
2003  Hispanic            Master’s degree         $45,207
2003  Hispanic            Doctorate degree        $56,217
2000  White               High School Graduate    $32,451
2000  White               Some college            $39,410
2000  White               Bachelor’s degree       $53,178
2000  White               Master’s degree         $65,147
2000  White               Doctorate degree        $75,120
2000  Black               High School Graduate    $23,874
2000  Black               Some college            $29,415
2000  Black               Bachelor’s degree       $42,013
2000  Black               Master’s degree         $50,321
2000  Black               Doctorate degree        $60,741
2000  Asian               High School Graduate    $32,185
2000  Asian               Some college            $24,961
2000  Asian               Bachelor’s degree       $50,102
2000  Asian               Master’s degree         $75,410
2000  Asian               Doctorate degree        $80,164
2000  Hispanic            High School Graduate    $23,784
2000  Hispanic            Some college            $25,640
2000  Hispanic            Bachelor’s degree       $32,654
2000  Hispanic            Master’s degree         $44,891
2000  Hispanic            Doctorate degree        $55,617
1997  White               High School Graduate    $31,048
1997  White               Some college            $38,497
1997  White               Bachelor’s degree       $52,179
1997  White               Master’s degree         $62,498
1997  White               Doctorate degree        $74,614
1997  Black               High School Graduate    $22,981
1997  Black               Some college            $26,479
1997  Black               Bachelor’s degree       $43,578
1997  Black               Master’s degree         $48,521
1997  Black               Doctorate degree        $58,462
1997  Asian               High School Graduate    $30,148
1997  Asian               Some college            $23,647
1997  Asian               Bachelor’s degree       $49,521
1997  Asian               Master’s degree         $72,149
1997  Asian               Doctorate degree        $80,149
1997  Hispanic            High School Graduate    $22,156
1997  Hispanic            Some college            $23,641
1997  Hispanic            Bachelor’s degree       $31,560
1997  Hispanic            Master’s degree         $43,297
1997  Hispanic            Doctorate degree        $53,189
1994  White               High School Graduate    $33,405
1994  White               Some college            $40,325
1994  White               Bachelor’s degree       $55,225
1994  White               Master’s degree         $67,295
1994  White               Doctorate degree        $77,521
1994  Black               High School Graduate    $25,741
1994  Black               Some college            $30,517
1994  Black               Bachelor’s degree       $46,851
1994  Black               Master’s degree         $52,106
1994  Black               Doctorate degree        $61,523
1994  Asian               High School Graduate    $33,654
1994  Asian               Some college            $25,749
1994  Asian               Bachelor’s degree       $51,752
1994  Asian               Master’s degree         $70,519
1994  Asian               Doctorate degree        $81,760
1994  Hispanic            High School Graduate    $22,547
1994  Hispanic            Some college            $25,403
1994  Hispanic            Bachelor’s degree       $35,482
1994  Hispanic            Master’s degree         $45,207
1994  Hispanic            Doctorate degree        $56,217
1991  White               High School Graduate    $32,451
1991  White               Some college            $39,410
1991  White               Bachelor’s degree       $53,178
1991  White               Master’s degree         $65,147
1991  White               Doctorate degree        $75,120
1991  Black               High School Graduate    $23,874
1991  Black               Some college            $29,415
1991  Black               Bachelor’s degree       $42,013
1991  Black               Master’s degree         $50,321
1991  Black               Doctorate degree        $60,741
1991  Asian               High School Graduate    $32,185
1991  Asian               Some college            $24,961
1991  Asian               Bachelor’s degree       $50,102
1991  Asian               Master’s degree         $75,410
1991  Asian               Doctorate degree        $80,164
1991  Hispanic            High School Graduate    $23,784
1991  Hispanic            Some college            $25,640
1991  Hispanic            Bachelor’s degree       $32,654
1991  Hispanic            Master’s degree         $44,891
1991  Hispanic            Doctorate degree        $55,617
1988  White               High School Graduate    $31,048
1988  White               Some college            $38,497
1988  White               Bachelor’s degree       $52,179
1988  White               Master’s degree         $62,498
1988  White               Doctorate degree        $74,614
1988  Black               High School Graduate    $22,981
1988  Black               Some college            $26,479
1988  Black               Bachelor’s degree       $43,578
1988  Black               Master’s degree         $48,521
1988  Black               Doctorate degree        $58,462
1988  Asian               High School Graduate    $30,148
1988  Asian               Some college            $23,647
1988  Asian               Bachelor’s degree       $49,521
1988  Asian               Master’s degree         $72,149
1988  Asian               Doctorate degree        $80,149
1988  Hispanic            High School Graduate    $22,156
1988  Hispanic            Some college            $23,641
1988  Hispanic            Bachelor’s degree       $31,560
1988  Hispanic            Master’s degree         $43,297
1988  Hispanic            Doctorate degree        $53,189
1985  White               High School Graduate    $30,178
1985  White               Some college            $36,479
1985  White               Bachelor’s degree       $50,341
1985  White               Master’s degree         $60,278
1985  White               Doctorate degree        $72,369
1985  Black               High School Graduate    $20,149
1985  Black               Some college            $25,874
1985  Black               Bachelor’s degree       $42,987
1985  Black               Master’s degree         $42,687
1985  Black               Doctorate degree        $55,649
1985  Asian               High School Graduate    $29,741
1985  Asian               Some college            $22,648
1985  Asian               Bachelor’s degree       $45,321
1985  Asian               Master’s degree         $70,561
1985  Asian               Doctorate degree        $75,219
1985  Hispanic            High School Graduate    $20,498
1985  Hispanic            Some college            $22,647
1985  Hispanic            Bachelor’s degree       $30,489
1985  Hispanic            Master’s degree         $40,089
1985  Hispanic            Doctorate degree        $52,641
a. Sort the PivotTable data to display the years with the smallest sum of median income on top and the largest on the bottom. Which year had the smallest sum of median income? What is the total income in the year with the smallest sum of median income? b. Add the Racial Demographic to the Row Labels in the PivotTable. Sort the Racial Demographic by Sum of Median Income with the lowest values on top and the highest values on bottom. Filter the Row Labels so that only the year 2003 is displayed. Which racial demographic had the smallest sum of median income in the year 2003? Which racial demographic had the largest sum of median income in the year 2003? Answer: To sort data in a PivotTable in Excel, right-click any cell in the PivotTable that contains the data to be sorted, and select Sort.
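The two sorted summaries can also be reproduced outside Excel as a cross-check. The sketch below is a minimal Python illustration, assuming each year's 20 median incomes are listed in the order of the data table (White, Black, Asian, Hispanic, with five education levels each); the names `median_income` and `race_totals` are illustrative, not part of the original solution.

```python
# Racial demographics in table order; each has 5 education levels per year.
RACES = ["White", "Black", "Asian", "Hispanic"]

# Median incomes per year, in the order they appear in the data table.
median_income = {
    2003: [33405, 40325, 55225, 67295, 77521, 25741, 30517, 46851, 52106, 61523,
           33654, 25749, 51752, 70519, 81760, 22547, 25403, 35482, 45207, 56217],
    2000: [32451, 39410, 53178, 65147, 75120, 23874, 29415, 42013, 50321, 60741,
           32185, 24961, 50102, 75410, 80164, 23784, 25640, 32654, 44891, 55617],
    1997: [31048, 38497, 52179, 62498, 74614, 22981, 26479, 43578, 48521, 58462,
           30148, 23647, 49521, 72149, 80149, 22156, 23641, 31560, 43297, 53189],
    1994: [33405, 40325, 55225, 67295, 77521, 25741, 30517, 46851, 52106, 61523,
           33654, 25749, 51752, 70519, 81760, 22547, 25403, 35482, 45207, 56217],
    1991: [32451, 39410, 53178, 65147, 75120, 23874, 29415, 42013, 50321, 60741,
           32185, 24961, 50102, 75410, 80164, 23784, 25640, 32654, 44891, 55617],
    1988: [31048, 38497, 52179, 62498, 74614, 22981, 26479, 43578, 48521, 58462,
           30148, 23647, 49521, 72149, 80149, 22156, 23641, 31560, 43297, 53189],
    1985: [30178, 36479, 50341, 60278, 72369, 20149, 25874, 42987, 42687, 55649,
           29741, 22648, 45321, 70561, 75219, 20498, 22647, 30489, 40089, 52641],
}

# Part a: Sum of Median Income per year, sorted smallest first.
year_totals = sorted((sum(values), year) for year, values in median_income.items())
smallest_total, smallest_year = year_totals[0]


def race_totals(year):
    """Sum of median income per racial demographic for one year, lowest first."""
    values = median_income[year]
    totals = {race: sum(values[i * 5:(i + 1) * 5]) for i, race in enumerate(RACES)}
    return sorted(totals.items(), key=lambda item: item[1])


# Part b: filter to 2003 and sort the demographics by their totals.
ranked_2003 = race_totals(2003)

print(smallest_year, f"${smallest_total:,}")
print(ranked_2003)
```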
a.
The year 1985 had the smallest sum of median income with $846,845. b.
Hispanics had the lowest sum of median income and Whites had the highest sum of median income in the year 2003. Difficulty: Challenging LO: 3.2, Pages 80-85 Bloom’s: Application BUSPROG: Analytic Skills
DISC: 7. Consider a study on the number of accidents that occurred in 10 states of the United States of America in different cities for 3 consecutive years. Create a PivotTable in Excel to answer the following questions. The PivotTable should group the number of accidents into yearly bins and display the sum of accidents that occurred each year in columns of Excel. Row labels should include the accident locations and allow for grouping the locations into states or viewing by city. You should also sort the PivotTable so that the states with the greatest number of accidents between 2011 and 2013 appear at the top of the PivotTable.
State  City               Number of accidents  Year
GA     Rock Spring        52                   2011
GA     Doraville          44                   2011
GA     Ellaville          67                   2011
FL     Jacksonville       53                   2011
GA     Stockbridge        72                   2011
FL     Belleview          63                   2012
AZ     Phoenix            69                   2011
FL     Crestview          51                   2012
IA     Johnston           48                   2012
GA     Rockmart           44                   2012
CO     Greenwood Village  53                   2011
GA     Jonesboro          54                   2011
GA     Decatur            76                   2013
FL     Clearwater         76                   2013
GA     Gray               57                   2012
CA     Nevada City        76                   2013
FL     Milton             61                   2011
GA     Woodstock          78                   2013
GA     Cumming            70                   2012
GA     Statesboro         47                   2013
FL     Palm Beach         42                   2011
CO     Greeley            60                   2012
FL     Sarasota           40                   2011
FL     Apollo Beach       75                   2011
AZ     Prescott           40                   2012
FL     Port St. Lucie     61                   2012
GA     Stockbridge        78                   2012
GA     Atlanta            60                   2011
CO     Windsor            43                   2013
CO     Castle Rock        55                   2011
GA     Clayton            58                   2011
FL     Tampa              58                   2011
GA     Jackson            63                   2012
GA     Franklin           64                   2012
GA     Macon              71                   2011
FL     Cocoa Beach        76                   2011
GA     Valdosta           76                   2011
GA     Dallas             64                   2012
FL     Brooksville        41                   2013
FL     Winter Park        41                   2011
AL     Birmingham         74                   2013
AL     Birmingham         78                   2011
GA     East Ellijay       80                   2013
GA     Cartersville       66                   2011
CA     San Luis Obispo    57                   2011
CA     Napa               65                   2012
GA     Springfield        80                   2013
GA     Clarkesville       60                   2012
CA     Palm Springs       69                   2011
FL     Port Orange        77                   2013
GA     Watkinsville       48                   2012
GA     Roswell            65                   2013
CO     Louisville         62                   2012
CO     Denver             80                   2013
GA     McDonough          54                   2012
GA     Brunswick          48                   2013
AZ     Scottsdale         75                   2011
FL     Orlando            41                   2013
AR     Batesville         60                   2011
GA     Atlanta            49                   2012
GA     McCaysville        54                   2011
GA     Dawsonville        62                   2013
FL     Coral Gables       53                   2011
FL     Carrabelle         49                   2012
AZ     Scottsdale         42                   2012
GA     Vidalia            66                   2013
GA     Tifton             73                   2011
CA     Westminster        68                   2011
CA     Woodland Hills     72                   2013
AZ     Scottsdale         79                   2013
GA     Barnesville        54                   2011
GA     Gordon             47                   2013
FL     Tampa              62                   2012
FL     Jacksonville       68                   2013
FL     Crawfordville      56                   2013
FL     Ponte Vedra Beach  50                   2013
GA     Winder             61                   2012
GA     Douglasville       51                   2011
GA     Ellijay            40                   2011
FL     Bradenton          43                   2011
CA     Sonoma             67                   2012
CA     Solvang            65                   2013
CA     Chico              61                   2013
CA     Stockton           61                   2011
FL     Ocala              56                   2011
FL     Bartow             61                   2011
FL     Panama City Beach  45                   2012
FL     Port Saint Joe     68                   2011
GA     Acworth            42                   2013
GA     Jasper             48                   2011
FL     Lantana            72                   2012
FL     Clewiston          61                   2011
FL     Aventura           57                   2012
FL     Miami              67                   2012
GA     Savannah           46                   2012
FL     Englewood          63                   2012
CA     Granite Bay        64                   2012
FL     Tampa              60                   2013
FL     Naples             58                   2011
FL     Fort Lauderdale    57                   2012
GA     Saint Marys        62                   2013
CA     San Diego          61                   2012
AZ     Mesa               70                   2012
FL     Bonifay            47                   2011
CA     San Rafael         77                   2013
CA     Oakland            57                   2013
FL     Fort Pierce        48                   2011
FL     Clermont           49                   2013
FL     Palatka            70                   2011
AZ     Phoenix            52                   2013
GA     Cartersville       41                   2012
FL     Key West           54                   2013
GA     Carrollton         65                   2012
AL     Fort Deposit       68                   2013
GA     Hiawassee          58                   2013
GA     Ellijay            61                   2012
GA     Duluth             47                   2012
FL     Orlando            78                   2011
FL     Boca Raton         45                   2012
CA     La Jolla           45                   2011
FL     Marco Island       78                   2012
CA     Los Angeles        61                   2011
GA     Cornelia           75                   2011
FL     Immokalee          55                   2011
GA     Carrollton         56                   2013
FL     Miami              51                   2011
CA     Santa Monica       54                   2011
CA     La Jolla           40                   2013
AL     Irondale           80                   2012
FL     Panama City        76                   2012
GA     Atlanta            64                   2013
AZ     Mesa               41                   2011
FL     Miami              75                   2013
GA     Reidsville         77                   2011
GA     Norcross           52                   2013
GA     Alpharetta         54                   2011
GA     Alpharetta         58                   2012
FL     Boca Raton         78                   2011
CA     Newport Beach      55                   2012
CA     Pasadena           59                   2012
AR     Bentonville        67                   2012
GA     Alpharetta         50                   2013
CA     San Francisco      41                   2013
CA     Los Angeles        80                   2012
CA     San Diego          57                   2013
AZ     Phoenix            67                   2012
FL     Bradenton          77                   2012
FL     Naples             59                   2013
GA     Lawrenceville      58                   2012
FL     Ocala              51                   2013
CA     Bakersfield        68                   2011
CO     Pueblo             47                   2011
AZ     Flagstaff          42                   2013
IA     Sioux City         53                   2013
CA     Ventura            46                   2012
AL     Birmingham         58                   2012
GA     Newnan             45                   2013
AZ     Gilbert            40                   2011
AL     Montgomery         64                   2011
FL     Venice             71                   2013
FL     Sarasota           61                   2013
FL     Jupiter            66                   2012
GA     Gray               69                   2012
GA     Perry              72                   2012
GA     Macon              40                   2012
GA     Woodstock          78                   2012
GA     Suwanee            58                   2011
CA     Temecula           45                   2012
CA     Rancho Cucamonga   57                   2013
GA     Winder             77                   2013
CA     Los Angeles        66                   2013
CA     Irvine             74                   2013
GA     Newnan             63                   2011
GA     Villa Rica         67                   2012
GA     Fayetteville       55                   2013
FL     Coral Gables       75                   2013
CA     Calabasas          42                   2012
GA     Kennesaw           63                   2012
CO     Greeley            52                   2013
CO     Colorado Springs   79                   2013
GA     Stockbridge        47                   2013
GA     Commerce           65                   2012
FL     Cape Coral         69                   2013
CA     Merced             52                   2011
CA     Culver City        55                   2011
GA     McDonough          63                   2013
CA     Redlands           48                   2012
GA     Duluth             51                   2013
GA     Jackson            68                   2013
CA     Pomona             41                   2011
CA     Newport Beach      61                   2011
GA     Loganville         68                   2012
FL     Bradenton          44                   2013
FL     Tallahassee        60                   2012
CA     Torrance           42                   2013
CT     Stamford           70                   2011
AR     Gravette           44                   2013
HI     Honolulu           70                   2011
a. Which state had the greatest number of accidents between 2011 and 2013? b. How many accidents occurred in the state of Colorado (CO) in 2012? In what cities did these accidents occur? c. Use the PivotTable’s filter capability to view only the accidents in Alabama (AL), Arizona (AZ), and Arkansas (AR) for the years 2011 through 2013. What is the total number of accidents in these states between 2011 and 2013? d. Create a PivotChart to display a column chart that shows the total number of accidents in each year 2011 through 2013 in the state of California. Adjust the formatting of this column chart so that it best conveys the data. What does this column chart suggest about accidents between 2011 and 2013 in California? Discuss.
Hint: You may have to switch the row and column labels in the PivotChart to get the best presentation for your PivotChart. Answer: a.
Georgia (GA) had the greatest number of accidents between 2011 and 2013.
b.
The state of Colorado had 122 accidents in the year 2012 in the cities, Greeley and Louisville. c.
There were 1210 accidents between the years 2011 through 2013 in Alabama (AL), Arizona (AZ), and Arkansas (AR).
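The filtering in parts b and c can be cross-checked with a short script. This is a minimal Python sketch, not the PivotTable procedure itself; to stay short it includes only the rows for Alabama, Arizona, Arkansas, and Colorado from the data table, and `total_accidents` is a hypothetical helper.

```python
# Subset of the data table: only the AL, AZ, AR, and CO rows (state, city,
# number of accidents, year). Rows for other states are omitted on purpose.
rows = [
    ("AL", "Birmingham", 74, 2013), ("AL", "Birmingham", 78, 2011),
    ("AL", "Fort Deposit", 68, 2013), ("AL", "Irondale", 80, 2012),
    ("AL", "Birmingham", 58, 2012), ("AL", "Montgomery", 64, 2011),
    ("AZ", "Phoenix", 69, 2011), ("AZ", "Prescott", 40, 2012),
    ("AZ", "Scottsdale", 75, 2011), ("AZ", "Scottsdale", 42, 2012),
    ("AZ", "Scottsdale", 79, 2013), ("AZ", "Mesa", 70, 2012),
    ("AZ", "Phoenix", 52, 2013), ("AZ", "Mesa", 41, 2011),
    ("AZ", "Phoenix", 67, 2012), ("AZ", "Flagstaff", 42, 2013),
    ("AZ", "Gilbert", 40, 2011),
    ("AR", "Batesville", 60, 2011), ("AR", "Bentonville", 67, 2012),
    ("AR", "Gravette", 44, 2013),
    ("CO", "Greenwood Village", 53, 2011), ("CO", "Greeley", 60, 2012),
    ("CO", "Windsor", 43, 2013), ("CO", "Castle Rock", 55, 2011),
    ("CO", "Louisville", 62, 2012), ("CO", "Denver", 80, 2013),
    ("CO", "Pueblo", 47, 2011), ("CO", "Greeley", 52, 2013),
    ("CO", "Colorado Springs", 79, 2013),
]


def total_accidents(records, states, year=None):
    """Sum accidents for the given states, optionally restricted to one year."""
    return sum(
        count
        for state, _city, count, yr in records
        if state in states and (year is None or yr == year)
    )


# Part b: Colorado in 2012, with the cities involved.
co_2012_total = total_accidents(rows, {"CO"}, year=2012)
co_2012_cities = sorted(
    {city for state, city, _count, yr in rows if state == "CO" and yr == 2012}
)

# Part c: Alabama, Arizona, and Arkansas across 2011 through 2013.
al_az_ar_total = total_accidents(rows, {"AL", "AZ", "AR"})

print(co_2012_total, co_2012_cities)
print(al_az_ar_total)
```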
d.
The accidents have increased in the year 2013 compared to the past two years. Difficulty: Challenging LO: 3.2 and 3.3, Pages 80-85 and 101-102 Bloom’s: Application BUSPROG: Analytic Skills DISC: 8. The data on the distance walked per week by 20 people of different age groups are given in the table below:
Age  Distance walked/week
18   25
20   22
21   20
25   23
26   18
29   15
38   19
34   16
42   14
23   21
32   24
45   13
50   11
53   9
44   10
19   28
28   26
35   17
49   12
27   27
a. Create a scatter chart for these 20 observations. b. Fit a linear trendline to the 20 observations. What can you say about the relationship between the two quantitative variables? Answer: a.
[Figure: scatter chart titled “Distance walked”; x-axis: Age; y-axis: Distance walked/week]
b.
[Figure: scatter chart titled “Distance walked” with a linear trendline, Linear (Distance walked/week); x-axis: Age; y-axis: Distance walked/week]
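Excel's linear trendline is an ordinary least-squares line, so its direction can be checked by hand. The following is a minimal Python sketch of that calculation for the 20 observations in this problem; `fit_line` is an illustrative helper, not something defined in the original solution.

```python
# The 20 (age, distance walked per week) observations from the table.
ages = [18, 20, 21, 25, 26, 29, 38, 34, 42, 23,
        32, 45, 50, 53, 44, 19, 28, 35, 49, 27]
distances = [25, 22, 20, 23, 18, 15, 19, 16, 14, 21,
             24, 13, 11, 9, 10, 28, 26, 17, 12, 27]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, distances)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
# A negative slope means the fitted distance decreases as age increases.
print("negative relationship" if slope < 0 else "non-negative relationship")
```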
There appears to be a negative linear relationship between the age and the distance walked. Difficulty: Easy LO: 3.3, Pages 85-87 Bloom’s: Application BUSPROG: Analytic Skills
DISC: 9. Consider the data below on 30 different investments and their maturity values after 15 years.
Investment ($)  Future value ($)
1500            3119
2000            4158
2200            4574
2480            5156
2850            5925
3250            6757
3560            7401
3890            8088
4180            8690
4390            9127
4550            9460
4800            9979
5150            10707
5320            11060
5510            11455
5760            11975
6140            12765
6300            13098
6480            13472
6590            13701
6712            13954
6900            14345
7110            14782
7480            15551
7590            15780
7670            15946
7700            16008
7840            16299
8010            16653
8500            17671
a. Prepare a scatter diagram to show the relationship between the variables Investment and Future value. Comment on any relationship between the variables. b. Create a trendline for the relationship between Investment and Future value. What does the trendline indicate about this relationship? Answer: a.
[Figure: scatter chart titled “Future value”; x-axis: Investment; y-axis: Future value]
There appears to be a positive linear relationship between investment and future value. As investment increases, future value also increases. b.
[Figure: scatter chart titled “Future value” with a linear trendline, Linear (Future value); x-axis: Investment; y-axis: Future value]
The trendline confirms that there is a positive linear trend between investment and future value. Difficulty: Easy LO: 3.3, Pages 85-87 Bloom’s: Application BUSPROG: Analytic Skills DISC:
10. A survey on the average pass percentage achieved by 4 of the top-ranked colleges of a city for 5 different years was conducted to rate the quality of teaching in each of these colleges.
Colleges   Year 1  Year 2  Year 3  Year 4  Year 5
College 1  65      67      63      68      70
College 2  70      75      77      82      75
College 3  88      95      90      97      98
College 4  55      57      53      59      55
a. Construct a line chart for the time series data for years 1 through 5 showing the average pass percentage in each college. Show the time series for all four colleges on the same graph. b. What does the line chart indicate about the average pass percentage of the colleges between years 1 through 5? Discuss. c. Construct a clustered column chart showing average pass percentage in each college using the years 1 through 5 data. Represent the years along the horizontal axis, and cluster the average pass percentages for the four colleges in each year. Which college is leading in each year? Answer: a.
[Figure: line chart with one line per college (College 1 through College 4); x-axis: Year (Year 1 through Year 5); y-axis: Average Pass Percentage]
b. College 3 has the highest average pass percentage between years 1 through 5 followed by College 2, College 1, and College 4. This performance has been consistent throughout the 5 years. c.
[Figure: clustered column chart with one column per college (College 1 through College 4) in each year; x-axis: Year (Year 1 through Year 5); y-axis: Average Pass Percentage]
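The comparison that the clustered column chart supports can be verified directly from the table. The sketch below is a minimal Python illustration with assumed names (`pass_pct`, `ranking_for_year`); it ranks the four colleges within each year and reports the leader.

```python
# Average pass percentage per college for Year 1 through Year 5.
pass_pct = {
    "College 1": [65, 67, 63, 68, 70],
    "College 2": [70, 75, 77, 82, 75],
    "College 3": [88, 95, 90, 97, 98],
    "College 4": [55, 57, 53, 59, 55],
}


def ranking_for_year(index):
    """Colleges ordered from highest to lowest pass percentage in one year."""
    return sorted(pass_pct, key=lambda college: pass_pct[college][index], reverse=True)


# One ranking per year; the first entry of each ranking is that year's leader.
rankings = [ranking_for_year(i) for i in range(5)]
leaders = [ranking[0] for ranking in rankings]

for year, ranking in enumerate(rankings, start=1):
    print(f"Year {year}: {ranking}")
```

Because every yearly ranking comes out the same, the ordering of the colleges is consistent across the five years, which matches the reading of the line chart in part b.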
We observe that College 3 is leading in each year consistently. Difficulty: Moderate LO: 3.3, Pages 87-91 Bloom’s: Application BUSPROG: Analytic Skills DISC: 11. Growth is the primary focus for all companies. A key factor in analysing the growth of a company is the number of resources/employees working for the company over a period of time. One such study of a start-up company’s growth, in terms of the number of employees per month over a span of 2 years, is shown below:
Month  Number of employees
1      40
2      48
3      50
4      52
5      49
6      54
7      57
8      53
9      60
10     64
11     68
12     70
13     73
14     76
15     72
16     75
17     79
18     80
19     84
20     82
21     86
22     89
23     94
24     100
a. Create a line chart for these time series data. What interpretations can you make about the increase in the number of employees over these 24 months? b. Fit a linear trendline to the data. What does the trendline indicate about the increase in the number of employees over these 24 months? Answer: a.
[Line chart: Number of Employees (vertical axis) by Month (horizontal axis, months 1 through 24)]
There was only a slight change in the number of employees over the first 9 months. The number then increased rapidly through month 14, fell in month 15, and increased again through month 24. Overall, the number of employees increased over the 24 months.
b.
[Line chart with fitted linear trendline: Number of Employees (vertical axis) by Month (horizontal axis, months 1 through 24)]
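A linear trendline like the one requested in part b is an ordinary least squares line through the (month, employees) pairs. As a rough sketch outside Excel, the fit can be reproduced in plain Python with the data from the problem; the helper name `fit_trendline` is hypothetical and is not taken from the text.

```python
def fit_trendline(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Data from Problem 11: months 1 through 24 and the number of employees.
months = list(range(1, 25))
employees = [40, 48, 50, 52, 49, 54, 57, 53, 60, 64, 68, 70,
             73, 76, 72, 75, 79, 80, 84, 82, 86, 89, 94, 100]

slope, intercept = fit_trendline(months, employees)
print(f"employees ~= {intercept:.2f} + {slope:.2f} * month")
```

A positive slope corresponds to the overall upward trend described in the answer.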
The trendline confirms that there is an overall linear trend in the increase in the number of employees over these 24 months. Difficulty: Easy LO: 3.3, Page 87 Bloom’s: Application BUSPROG: Analytic Skills DISC: 12. The data on the runs scored in a match by top 5 players of a cricket team are given below: Players Player 1 Player 2 Player 3 Player 4 Player 5
Runs Scored 42 35 53 29 39
a. Create a column chart to display the information in the table above. Format the column chart to best display the data by adding axes labels, a chart title, etc. b. Sort the values in Excel so that the column chart is ordered from most runs scored to fewest. c. Insert data labels to display the runs scored by each player above the columns in the column chart obtained in part b. Answer: a.
[Column chart: "Runs scored in a match by top five players of the team"; Runs Scored (vertical axis) by Players (horizontal axis: Player 1, Player 2, Player 3, Player 4, Player 5)]
b. Sorting can be done by selecting the data in Excel and then using the Sort function in the Sort & Filter group under the DATA tab.
[Sorted column chart: "Runs scored in a match by top five players of the team"; Runs Scored (vertical axis) by Players (horizontal axis, ordered Player 3, Player 1, Player 5, Player 2, Player 4)]
c. Data labels can be added by right clicking on one of the columns in the chart and selecting Add Data Labels.
[Sorted column chart with data labels: "Runs scored in a match by top five players of the team"; Runs Scored (vertical axis) by Players (horizontal axis), with labels Player 3: 53, Player 1: 42, Player 5: 39, Player 2: 35, Player 4: 29]
Difficulty: Moderate LO: 3.3, Pages 90-93 Bloom’s: Application
BUSPROG: Analytic Skills DISC: 13. The total number of runs scored by the players in the previous problem is 198. The following pie chart shows the percentage of runs scored by each player:
[Pie chart: Runs Scored, with one slice each for Player 1 through Player 5 and percentage labels 20%, 21%, 14%, 18%, and 27%]
a. What are the problems with using a pie chart to display these data? b. What type of chart would be preferred for displaying the data in this pie chart? c. Use a different type of chart to display the percentage of runs scored by each player that conveys the data better than the pie chart. Format the chart and add data labels to improve the chart’s readability. Answer: a. In the pie chart, it is difficult to perceive differences in area. It can also be difficult to distinguish the different colors in the pie chart. Finally, it takes a lot of work for the reader to match the players to the different pieces of the pie chart. b. A sorted column or bar chart would be preferable to display the data in this pie chart. c.
[Sorted column chart with data labels: "Runs scored in a match by top five players of the team"; % of runs scored (vertical axis) by Players (horizontal axis), with labels Player 3: 27%, Player 1: 21%, Player 5: 20%, Player 2: 18%, Player 4: 15%]
Difficulty: Moderate LO: 3.3, Page 93 Bloom’s: Application BUSPROG: Analytic Skills DISC:
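The percentages behind a sorted column chart of this kind follow directly from the runs in Problem 12. The short sketch below recomputes each player's share of the 198 total runs and sorts the shares from largest to smallest; the variable names are illustrative only.

```python
# Runs scored by each player, from Problem 12.
runs = {"Player 1": 42, "Player 2": 35, "Player 3": 53,
        "Player 4": 29, "Player 5": 39}

total = sum(runs.values())

# Share of total runs for each player, as a percentage.
shares = {player: 100 * scored / total for player, scored in runs.items()}

# Sort from the largest share to the smallest, as in a sorted column chart.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

for player, pct in ranked:
    print(f"{player}: {pct:.0f}%")
```

Sorting before plotting is what lets a reader rank the players at a glance, which is the main weakness of the pie chart noted in part a.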
14. A study was conducted on a sample of 1000 males and 1000 females to determine the type of movie that men and women prefer to watch. The results are shown in the table below: Movie Type Action Comedy Horror Romance
Male 294 264 237 205
Female 226 276 200 298
a. Construct a clustered column chart with the type of movie as the horizontal variable. b. What can we infer from the clustered column chart in part a? Answer: a.
[Clustered column chart: Frequency (vertical axis) by Movie Type (horizontal axis: Action, Comedy, Horror, Romance), with one column each for Male and Female in every cluster]
b. From the chart, we observe that action movies are the most popular choice among men and romance movies are the most popular choice among women. Preferences for comedy movies are almost evenly split between the genders, and horror movies are preferred more by men than by women. Difficulty: Moderate LO: 3.3, Pages 97-99 Bloom’s: Application BUSPROG: Analytic Skills DISC: 15. Consider the following survey results regarding marital status by age: Age Category 18-24 25-34 35-44 45-54
Never Married (%) 49 44 28 22
Married (%) 35 35 45 58
Divorced (%) 16 21 27 20
a. Construct a stacked column chart to display the survey data on marital status. Use Age Category as the variable on the horizontal axis. b. Construct a clustered column chart to display the survey data. Use Age Category as the variable on the horizontal axis. c. What can you infer about the relationship between age and marital status from the column charts in parts a and b? Which column chart (stacked or clustered) is best for interpreting this relationship? Why? Answer: a.
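[Stacked column chart: Marital Status (%) (vertical axis) by Age Category (horizontal axis: 18-24, 25-34, 35-44, 45-54), with stacked segments for Never Married (%), Married (%), and Divorced (%)]
b.
[Clustered column chart: Marital Status (%) (vertical axis) by Age Category (horizontal axis: 18-24, 25-34, 35-44, 45-54), with one column each for Never Married (%), Married (%), and Divorced (%) in every cluster]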
c. Younger respondents are more likely to be never married and older respondents are more likely to be married. The clustered column chart makes it easier to compare the relative percent values within an age category. The percentage of respondents who are never married is highest in the 18-24 and 25-34 age groups, the percentage who are married is highest in the 45-54 age group, and the percentage who are divorced is highest in the 35-44 age group. Difficulty: Moderate LO: 3.3, Pages 97-99 Bloom’s: Application BUSPROG: Analytic Skills DISC: 16. The regional manager of a company wishes to determine the time spent at each division in the car production process. A study was undertaken over a month that resulted in the following data on the percentage of time spent at three divisions (car body construction, paint shop, and assembly) at four production plants.
Production Plants Michigan Kentucky Illinois Ohio
Car Body Construction (%) 35 37 33 36
Paint Shop (%) 45 41 39 40
Assembly (%) 20 22 28 24
a. Create a stacked bar chart with production plants along the vertical axis. Reformat the bar chart to best display these data by adding required labels and chart title.
b. Create a clustered bar chart with production plants along the vertical axis and clusters of divisions. Reformat the bar chart to best display these data by adding required labels and chart title. c. Create multiple bar charts where each production plant becomes a single bar chart showing the percentage of time spent at the divisions. Reformat the bar charts to best display these data by adding required labels and chart title. d. Which form of bar chart (stacked, clustered, or multiple) is preferable for these data? Why? Answer: a.
[Stacked bar chart: "Percent time spent at the divisions"; Production Plant (vertical axis: Michigan, Kentucky, Illinois, Ohio) by Time Spent (%) (horizontal axis), with stacked segments for Car Body Construction, Paint Shop, and Assembly]
b.
[Clustered bar chart: "Percent time spent at the divisions"; Production Plant (vertical axis: Michigan, Kentucky, Illinois, Ohio) by Time Spent (%) (horizontal axis), with one bar each for Car Body Construction, Paint Shop, and Assembly in every cluster]
c.
[Four separate bar charts, titled Michigan, Kentucky, Illinois, and Ohio; each shows Time Spent (%) on the horizontal axis with one bar each for Car Body Construction, Paint Shop, and Assembly]
d. Neither the stacked nor the clustered bar chart makes relative comparisons easy when there are many quantitative variables within each category, so individual bar charts are generally preferred. However, the clustered bar chart may be preferred in this case because it can make comparisons between the production plants easier. Difficulty: Challenging LO: 3.3, Pages 97-99 Bloom’s: Application BUSPROG: Analytic Skills DISC:
17. Three months after launching 5 new products, a consumer electronics company arrived at the following results: Products A B C D E
Profit (%) 19 28 15 22 16
Market share (%) 18 12 25 35 10
Cost ($) 4500 3000 8750 6250 2500
a. Create a bubble chart where the market share is along the horizontal axis, the profit is on the vertical axis, and the size of the bubbles represents the cost. Format this chart for best presentation by adding axes labels and labelling each bubble with the product name. b. The manager of the company is interested in producing the product that increases the profit for a given level of market share and cost. From the bubble chart in part a, identify the product which needs to be produced in larger quantity. c. From the bubble chart in part a, now identify the product which needs to be produced in larger quantity taking into account both its market share and cost and that can increase the profit.
Answer: a.
[Bubble chart: "Profits made by the products given the market share and cost"; Market share (%) on the horizontal axis, Profit (%) on the vertical axis, bubble size representing cost, and bubbles labeled A through E]
b. Product B makes the highest profit and hence, it needs to be produced in larger quantity for the given level of market share and cost. c. Product D can be produced in larger quantity when its market share, the profit, and its cost are taken into consideration. Difficulty: Challenging LO: 3.3, Pages 93-95 Bloom’s: Application BUSPROG: Analytic Skills DISC: 18. The project lead in an MNC decides to assign every member of his team to a new project and monitors their performance on a customized scale of scores. The data on their performance over a period of six months are shown below:
Performance Scores
Team member   Jan   Feb   Mar   Apr   May   Jun
 1              4     5     2     3    -1     3
 2              2     2     3     1     4     4
 3             -1     4     4     4     5     1
 4              5     1     4     2     5     3
 5              2     2     1     5     4     4
 6              4     1     2     1    -1     4
 7              1     5    -1     2     5     1
 8              1     2     5     5     4     2
 9              4     5     3     4     2     2
10              4     5     2     2    -1     5
11              5    -1     5     1     2     2
12              3     2    -1    -1     1     2
13             -1     1     4    -1     4     5
14              2     3     3     2    -1     4
15              5    -1     5     5     1     1
16              5     2     1     5     2    -1
17              5    -1     4    -1     2     1
18              3     4    -1     5     2     4
19              2     3     5     1     3     1
20              4     4     3     1    -1     5
21              3     1     1     4    -1     4
22              3     1     5     5     4     2
23              2     2    -1     3     2    -1
24              1     2     4     4     4     3
25              3     5    -1     2     5     4
a. Create a heat map in Excel that shades the cells with negative performance scores. Use Excel’s Conditional Formatting function to create this heat map. b. For each month, identify the team members who received negative scores. Which month has the most negative performance scores? Answer: a.
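The cells that conditional formatting would shade are simply those with a value below zero. As a rough cross-check outside Excel, the sketch below lists the team members with a negative score in each month. The scores are transcribed from the table above on the assumption that the February through May values are listed member by member; the helper name `negative_members` is hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# scores[i] holds the six monthly scores (Jan..Jun) for team member i + 1.
scores = [
    [4, 5, 2, 3, -1, 3], [2, 2, 3, 1, 4, 4], [-1, 4, 4, 4, 5, 1],
    [5, 1, 4, 2, 5, 3], [2, 2, 1, 5, 4, 4], [4, 1, 2, 1, -1, 4],
    [1, 5, -1, 2, 5, 1], [1, 2, 5, 5, 4, 2], [4, 5, 3, 4, 2, 2],
    [4, 5, 2, 2, -1, 5], [5, -1, 5, 1, 2, 2], [3, 2, -1, -1, 1, 2],
    [-1, 1, 4, -1, 4, 5], [2, 3, 3, 2, -1, 4], [5, -1, 5, 5, 1, 1],
    [5, 2, 1, 5, 2, -1], [5, -1, 4, -1, 2, 1], [3, 4, -1, 5, 2, 4],
    [2, 3, 5, 1, 3, 1], [4, 4, 3, 1, -1, 5], [3, 1, 1, 4, -1, 4],
    [3, 1, 5, 5, 4, 2], [2, 2, -1, 3, 2, -1], [1, 2, 4, 4, 4, 3],
    [3, 5, -1, 2, 5, 4],
]


def negative_members(table, month_index):
    """Return the 1-based team member numbers with a negative score in a month."""
    return [member for member, row in enumerate(table, start=1)
            if row[month_index] < 0]


negatives = {month: negative_members(scores, j) for j, month in enumerate(MONTHS)}
worst_month = max(MONTHS, key=lambda month: len(negatives[month]))

for month in MONTHS:
    print(month, negatives[month])
print("Month with the most negative scores:", worst_month)
```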
b. January: Team members 3 and 13. February: Team members 11, 15, and 17. March: Team members 7, 12, 18, 23, and 25. April: Team members 12, 13, and 17. May: Team members 1, 6, 10, 14, 20, and 21. June: Team members 16 and 23. We observe that May has the most team members with negative scores. Difficulty: Moderate LO: 3.3, Pages 95-97 Bloom’s: Application BUSPROG: Analytic Skills DISC: 19. The following table shows the average monthly distance travelled (in billion miles) by vehicles on urban highways for five different years. Urban Highways - Average Monthly Distance Travelled by Vehicles (Billion Miles)
Years Year 1 Year 2 Year 3 Year 4 Year 5
Jan 4.22 4.31 4.38 4.45 4.51
Feb 5.32 5.44 5.51 5.59 5.65
Mar 5.21 5.34 5.41 5.5 5.62
Apr 5.12 5.24 5.36 5.41 5.49
May 4.92 4.98 4.98 5.01 5.12
Jun 4.49 4.59 4.63 4.72 4.8
July 4.55 4.68 4.71 4.78 4.88
Aug 4.49 4.65 4.78 4.79 4.82
Sep 4.44 4.61 4.82 4.82 4.95
Oct 4.39 4.68 4.88 4.92 5.12
Nov 4.37 4.74 4.85 5.06 5.22
Dec 4.35 4.79 4.89 5.11 5.44
a. Use Excel to create sparklines for the average monthly vehicle distance travelled each year. b. Which year shows a decreasing trend in the average distance travelled? Which year shows an increasing trend? c. Use Excel to create a heat map for the average distance travelled by vehicles. Do you find the heat map or the sparklines to be better at communicating the trend of the average vehicle distance travelled over different years? Why? Answer: a.
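A sparkline is a small line chart scaled to the range of its own series. As a rough text approximation outside Excel, the sketch below maps each year's twelve monthly values onto eight Unicode block characters; this is an illustrative stand-in, not Excel's sparkline feature, and the function name `sparkline` is hypothetical. The values are transcribed from the table above.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to a block character scaled between the series min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[int((v - lo) / span * top + 0.5)] for v in values)


# Average monthly distance travelled (billion miles), Jan through Dec.
distance = {
    "Year 1": [4.22, 5.32, 5.21, 5.12, 4.92, 4.49, 4.55, 4.49, 4.44, 4.39, 4.37, 4.35],
    "Year 2": [4.31, 5.44, 5.34, 5.24, 4.98, 4.59, 4.68, 4.65, 4.61, 4.68, 4.74, 4.79],
    "Year 3": [4.38, 5.51, 5.41, 5.36, 4.98, 4.63, 4.71, 4.78, 4.82, 4.88, 4.85, 4.89],
    "Year 4": [4.45, 5.59, 5.50, 5.41, 5.01, 4.72, 4.78, 4.79, 4.82, 4.92, 5.06, 5.11],
    "Year 5": [4.51, 5.65, 5.62, 5.49, 5.12, 4.80, 4.88, 4.82, 4.95, 5.12, 5.22, 5.44],
}

for year, values in distance.items():
    print(year, sparkline(values))
```

Because each series is scaled to its own minimum and maximum, this view shows the shape of each year but not the relative magnitudes across years.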
b. Year 1 has a decreasing trend and Year 5 has an increasing trend. c.
It is difficult to create a heat map that effectively conveys the overall trend of the average monthly distance travelled for each year. The heat map shows the relative magnitude of the distance travelled by the vehicles which is absent from the sparklines. However, the trend for each year is less apparent in the heat map. Difficulty: Moderate LO: 3.3, Pages 95-97 Bloom’s: Application BUSPROG: Analytic Skills DISC: 20. The data on the ranks assigned to a random sample of students in a competitive exam based on scores and three different statuses are given below:
Name     Score   Rank   Status
Steve    80      1      DV
Joshua   75      2      DV
John     95      3      V
Alex     90      4      V
Jeff     85      5      V
Matt     80      6      NV
Chris    75      7      NV
a. Create a parallel coordinates plot using XLMiner for these data. Include vertical axes for the name, score, and rank. Color the lines by the type of status. b. According to the parallel coordinates plot, how are disabled veterans differentiated from veterans? Answer: a.
b. We observe that the disabled veterans are assigned the highest ranks though their scores are relatively low and the veterans with high scores are given the moderate ranks. Difficulty: Moderate LO: 3.4, Page 103 Bloom’s: Application BUSPROG: Analytic Skills DISC:
21. The owner of a grocery store is interested in providing better service to his customers with respect to the wait time at the billing counter. The data on 20 waiting customers are given below:
Customer Number   Wait Time (min)   Purchase Amount ($)   Customer Age   Credit Score
 1                2.3               518                   42             694
 2                2.8               592                   33             879
 3                3.2               598                   38             531
 4                3.4               845                   40             509
 5                3.4               648                   29             869
 6                4.2               695                   46             777
 7                3.2               844                   42             470
 8                1.4               470                   40             714
 9                6.4               488                   24             517
10                7.8               527                   37             794
11                6.5               843                   52             551
12                9.8               704                   43             673
13                5                 824                   56             846
14                1.8               570                   35             735
15                6.1               503                   39             816
16                3.4               483                   44             516
17                7.8               707                   33             729
18                2.8               796                   42             591
19                1.2               485                   46             866
20                9.5               727                   50             879
a. Use XLMiner to create a scatter chart matrix for these data. Include the variables wait time, purchase amount, customer age, and credit score. b. What can you infer about the relationships between these variables from the scatter chart matrix? Answer: a.
b. The waiting time appears to have a positive relationship with the purchase amount. The customers’ age seems to have a positive relationship with the purchase amount and credit score as well. Difficulty: Moderate LO: 3.3, Pages 98-101 Bloom’s: Application BUSPROG: Analytic Skills DISC:
Chapter 4: Linear Regression 1. _____ is a statistical procedure used to develop an equation showing how two variables are related. a. Regression analysis b. Data mining c. Time series analysis d. Factor analysis Answer: A Difficulty: Easy LO: 4.1, Page 125 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: Regression analysis is a statistical procedure used to develop an equation showing how variables are related. 2. A regression analysis involving one independent variable and one dependent variable is referred to as a _____. a. factor analysis b. time series analysis c. simple regression d. data mining Answer: C Difficulty: Easy LO: 4.1, Page 125 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: A regression analysis involving one independent variable and one dependent variable is referred to as a simple regression. 3. In a linear regression analysis, any one-unit change in the independent variable is assumed to: a. have the same change in the dependent variable. b. have no change in the dependent variable. c. have an inverse effect on the dependent variable. d. have a nullifying effect on the dependent variable. Answer: A Difficulty: Easy LO: 4.1, Page 125 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis
Feedback: A regression analysis for which any one unit change in the independent variable is assumed to result in the same change in the dependent variable is referred to as a linear regression. 4. A(n) _____ refers to a measurable factor that defines a characteristic of a population, process, or system. a. random variable b. expectation c. parameter d. residual Answer: C Difficulty: Easy LO: 4.1, Page 125 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: A parameter refers to a measurable factor that defines a characteristic of a population, process, or system. 5. In the simple linear regression model, the _____ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables. a. constant term b. error term c. model parameter d. residual Answer: B Difficulty: Easy LO: 4.1, Page 125 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: In the simple linear regression model, the error term accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables. 6. The graph of the simple linear regression equation is a(n) _____. a. ellipse b. hyperbola c. parabola d. straight line Answer: D Difficulty: Easy LO: 4.1, Page 126 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis
Feedback: The graph of the simple linear regression equation is a straight line. 7. In the graph of the simple linear regression equation, the parameter β0 represents the _____ of the regression line. a. slope b. x-intercept c. y-intercept d. end-point Answer: C Difficulty: Easy LO: 4.1, Page 126 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: In the graph of the simple linear regression equation, the parameter β0 is the y-intercept of the regression line. 8. In the graph of the simple linear regression equation, the parameter β1 is the _____ of the regression line. a. slope b. x-intercept c. y-intercept d. end-point Answer: A Difficulty: Easy LO: 4.1, Page 126 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: In the graph of the simple linear regression equation, the parameter β1 is the slope of the regression line. 9. When the mean value of the dependent variable is independent of variation in the independent variable, the slope of the regression line is _____. a. positive b. zero c. negative d. infinite Answer: B Difficulty: Moderate LO: 4.1, Page 126 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis Feedback: When the mean value of the dependent variable is independent of the variation in the independent variable, the slope of the regression line is zero.
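Questions 5 through 9 refer to the pieces of the simple linear regression model. For reference, the standard form of the model and of the regression equation for the mean of y is written out below, with $\varepsilon$ denoting the error term; the notation is the conventional one and is assumed to match the textbook's.

```latex
y = \beta_0 + \beta_1 x + \varepsilon,
\qquad
E(y \mid x) = \beta_0 + \beta_1 x
```

Here $\beta_0$ is the y-intercept and $\beta_1$ is the slope of the line. When $\beta_1 = 0$, the mean of y does not change with x.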
10. The procedure of using sample data to find the estimated regression equation is better known as _____. a. point estimation b. interval estimation c. the least squares method d. extrapolation Answer: C Difficulty: Moderate LO: 4.2, Page 127 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The least squares method is a procedure for using sample data to find the estimated regression equation. 11. A _____ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables. a. contingency table b. scatter chart c. Gantt chart d. pie chart Answer: B Difficulty: Easy LO: 4.2, Page 128 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: A scatter chart is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables. 12. The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the _____. a. constant term b. error term c. residual d. model parameter Answer: C Difficulty: Easy LO: 4.2, Page 129 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the residual.
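The least squares method and the residuals described in Questions 10 and 12 can be illustrated with a small example. The sketch below uses a hypothetical five-point data set (not from the text) and the standard closed-form least squares estimates for one independent variable; the names `b0`, `b1`, and `residuals` are illustrative.

```python
# Hypothetical sample data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope (b1) and y-intercept (b0).
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual = observed value minus the value predicted by the estimated equation.
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

print(f"estimated equation: y_hat = {b0:.1f} + {b1:.1f} x")
print("residuals:", [round(e, 2) for e in residuals])
```

With an intercept in the model, the least squares residuals sum to zero apart from floating-point rounding.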
13. The _____ is the range of values of the independent variables in the data used to estimate the regression model. a. confidence interval b. codomain c. experimental region d. validation set Answer: C Difficulty: Moderate LO: 4.2, Page 130 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The experimental region is the range of values of the independent variables in the data used to estimate the regression model. 14. Prediction of the value of the dependent variable outside the experimental region is called _____. a. interpolation b. forecasting c. averaging d. extrapolation Answer: D Difficulty: Easy LO: 4.2, Page 130 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: Prediction of the value of the dependent variable outside the experimental region is called extrapolation. 15. The _____ is a measure of the error in using the estimated regression equation to predict the values of the dependent variable in a sample. a. sum of squares due to regression (SSR) b. error term c. sum of squares due to error (SSE) d. residual Answer: C Difficulty: Easy LO: 4.3, Page 134 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The value of sum of squares due to error is a measure of the error in using the estimated regression equation to predict the values of the dependent variable in the sample.
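SSE is one of three related sums of squares: the total sum of squares (SST), the sum of squares due to regression (SSR), and the sum of squares due to error (SSE). The sketch below computes all three for a hypothetical five-point data set (not from the text), taking ŷ = 2.2 + 0.6x as given because it is the least squares line for these particular points.

```python
# Hypothetical sample data (not from the textbook) and its least squares line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares estimates for this specific data set

y_hat = [b0 + b1 * xi for xi in x]
y_bar = sum(y) / len(y)

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation

r_squared = ssr / sst  # coefficient of determination

print(f"SSE = {sse:.2f}, SSR = {ssr:.2f}, SST = {sst:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}, r^2 = {r_squared:.2f}")
```

For a least squares fit with an intercept, SST = SSR + SSE, so any one of the three quantities can be recovered from the other two.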
16. What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 25.32 and the sum of squares due to error (SSE) is 6.89? a. 31.89 b. 19.32 c. 18.43 d. 15.32 Answer: C Difficulty: Moderate LO: 4.3, Page 136 Bloom’s: Application BUSPROG: Analytic DISC: Regression Analysis Feedback: The three quantities are related as SST = SSR + SSE. Substituting the values, we get SSR = 18.43. 17. The coefficient of determination: a. takes values between -1 to +1. b. is equal to zero for a perfect fit. c. is equal to one for the poorest fit. d. is used to evaluate the goodness of fit. Answer: D Difficulty: Moderate LO: 4.3, Page 136 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis Feedback: The coefficient of determination is used to evaluate the goodness of fit for the estimated regression equation. 18. What would be the coefficient of determination if the total sum of squares (SST) is 23.29 and the sum of squares due to regression (SSR) is 10.03? a. 2.32 b. 0.43 c. 13.26 d. 0.89 Answer: B Difficulty: Moderate LO: 4.3, Page 136 Bloom’s: Application BUSPROG: Analytic DISC: Regression Analysis Feedback: The coefficient of determination r² = SSR/SST. Substituting the given values, we get r² = 0.43. 19. The process of making estimates and drawing conclusions about one or more characteristics of a population through analysis of sample data drawn from the population is known as _____.
a. inductive inference b. deductive inference c. statistical inference d. Bayesian inference
Answer: C Difficulty: Easy LO: 4.5, Page 144 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through analysis of sample data drawn from the population is known as statistical inference. 20. The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture is better known as: a. postulation. b. hypothesis testing. c. statistical inference. d. empirical research. Answer: B Difficulty: Easy LO: 4.5, Page 144 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: Hypothesis testing is the process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture. 21. _____ refers to the use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter. a. Interval estimation b. Hypothesis testing c. Statistical inference d. Point estimation Answer: A Difficulty: Easy LO: 4.5, Page 144 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis
Feedback: Interval estimation refers to the use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter. 22. A normally distributed error term with mean of zero would: a. have values that are symmetric about the variance. b. allow more accurate modelling. c. yield biased regression estimates. d. be a hyperbolic curve. Answer: B Difficulty: Moderate LO: 4.5, Page 144 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis Feedback: The practical implication of normally distributed errors with a mean of zero and a constant variation for any given combination of values of x1, x2, . . . , xq is that the regression estimates are unbiased, possess consistent accuracy, and tend to err in small amounts rather than in large amounts. 23. Which of the following inferences can be drawn from the scatter chart given below?
a. The residuals have a varying variance. b. The model captures the relationship between the variables accurately. c. The regression model follows the F probability distribution. d. The residual distribution is consistently scattered about zero.
Answer: A Difficulty: Challenging LO: 4.5, Page 146 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis
Feedback: The variation in the residuals e increases as the value of the independent variable x increases, suggesting that the residuals do not have a constant variance. 24. The following scatter chart would help conclude that:
a. the residuals have a constant variance. b. the model fails to capture the relationship between the variables accurately. c. the model underpredicts the value of the dependent variable for intermediate values of the independent variable. d. the residual is normally distributed. Answer: B Difficulty: Challenging LO: 4.5, Page 147 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis Feedback: The residuals are positive for small and large values of the independent variable x but are negative for the remaining values of the independent variable. This pattern suggests that the linear regression model underpredicts the value of the dependent variable for small and large values of the independent variable and overpredicts the value of the dependent variable for intermediate values of the independent variable. In this case, the regression model does not adequately capture the relationship between the independent variable x and the dependent variable y. 25. Which of the following inferences can be drawn from the scatter chart given below?
a. The residuals have a constant variance. b. The model captures the relationship between the variables accurately. c. The model underpredicts the value of the dependent variable for intermediate values of the independent variable. d. The residual distribution is not normally distributed. Answer: D Difficulty: Challenging LO: 4.5, Page 147 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis Feedback: The residuals in the given figure are not symmetrically distributed around zero; many of the negative residuals are relatively close to zero, while the relatively few positive residuals tend to be far from zero. This skewness suggests that the residuals are not normally distributed. 26. The following scatter chart would help conclude that:
a. the model is time-invariant. b. the model captures the relationship between the variables accurately. c. the residuals are interdependent. d. the residuals are normally distributed.
Answer: C Difficulty: Moderate LO: 4.5, Page 147 Bloom’s: Comprehension BUSPROG: Analytic DISC: Regression Analysis Feedback: In this figure, connected consecutive residuals allow us to see a distinct pattern across every set of four residuals; the second residual is consistently larger than the first and smaller than the third, whereas the fourth residual is consistently the smallest. This pattern, which occurs consistently over each set of four consecutive residuals in the chart, suggests that the residuals generated by this model are not independent. 27. _____ is used to test the hypothesis that the values of the regression parameters β1, β2, . . . , βq are all zero. a. An F test b. A t test c. The least squares method d. Extrapolation Answer: A Difficulty: Easy LO: 4.5, Page 150 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: An F test is used to test the hypothesis that the values of the regression parameters β1, β2, . . . , βq are all zero. Ref: 4.1 Use the data given below for Questions 28-30. A regression model with four independent variables and a sample set of 550 observations is found to have a SSR value of 15.48 and SSE value of 5.32. 28. What would be the mean square due to regression (MSR) in this case? a. 15.48 b. 10 c. 5.9 d. 3.87 Answer: D Difficulty: Moderate LO: 4.5, Page 150 Bloom’s: Application BUSPROG: Analytic
DISC: Regression Analysis Feedback: The MSR is given by the formula SSR/q. Substituting the given values, we get MSR = 3.87. 29. What would be the mean square error (MSE) in this case? a. 0.59 b. 0.01 c. 1.26 d. 545 Answer: B Difficulty: Moderate LO: 4.5, Page 150 Bloom’s: Application BUSPROG: Analytic DISC: Regression Analysis Feedback: The MSE is given by the formula SSE/(n - q - 1). Substituting the given values, we get MSE = 0.0098, which rounds to 0.01. 30. What would be the F test statistic in this case? a. 396.46 b. 400 c. 350.32 d. 298.55 Answer: A Difficulty: Moderate LO: 4.5, Page 150 Bloom’s: Application BUSPROG: Analytic DISC: Regression Analysis Feedback: The F test statistic is given by the formula MSR / MSE. Substituting the given values, we get F = 396.46. 31. The _____ is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating. a. residual b. tolerance factor c. confidence level d. accuracy level Answer: C Difficulty: Easy LO: 4.5, Page 152 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis
Feedback: The confidence level is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating. 32. _____ refers to the degree of correlation among independent variables in a regression model. a. Multicollinearity b. Tolerance c. Rank d. Confidence level Answer: A Difficulty: Easy LO: 4.5, Page 154 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: Multicollinearity is the degree of correlation among independent variables in a regression model. 33. A variable used to model the effect of categorical independent variables in a regression model is known as a _____. a. dependent variable b. response c. dummy variable d. predictor variable Answer: C Difficulty: Easy LO: 4.6, Page 161 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: A variable used to model the effect of categorical independent variables in a regression model is known as a dummy variable. 34. The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model is referred to as the _____. a. milestone b. breakpoint c. tipping point d. watchpoint Answer: B Difficulty: Easy LO: 4.7, Page 170 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis
Feedback: The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model is referred to as the knot or breakpoint. 35. _____ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable. a. Interaction b. Multicollinearity c. Autocorrelation d. Covariance Answer: A Difficulty: Easy LO: 4.7, Page 173 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: Interaction refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable. 36. Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population is termed as _____. a. approximation b. hypothesizing c. overfitting d. postulating Answer: C Difficulty: Easy LO: 4.8, Page 179 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population is termed as overfitting. 37. Assessing the regression model on data other than the sample data that was used to generate the model is known as _____. a. approximation b. cross-validation c. graphical validation d. postulation Answer: B Difficulty: Moderate LO: 4.8, Page 179 Bloom’s: Knowledge BUSPROG: Analytic
DISC: Regression Analysis Feedback: Assessing the regression model on data other than the sample data that was used to generate the model is known as cross-validation. 38. _____ is the data set used to build the candidate models. a. Range b. Codomain c. Validation set d. Training set Answer: D Difficulty: Easy LO: 4.8, Page 179 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The training set is the data set used to build the candidate models. 39. _____ refers to the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable. a. Codomain b. Training set c. Validation set d. Range Answer: C Difficulty: Easy LO: 4.8, Page 179 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The validation set is the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable. 40. The principle of using the simplest meaningful model possible without sacrificing accuracy is referred to as _____. a. Engel’s law b. KISS principle c. Murphy’s law d. Ockham’s razor Answer: D Difficulty: Easy LO: 4.8, Page 179 Bloom’s: Knowledge BUSPROG: Analytic DISC: Regression Analysis Feedback: The principle of using the simplest meaningful model possible without sacrificing accuracy is referred to as Ockham’s razor.
Problems 41. The data on profit and market capitalization for a sample of 15 firms in the U.S. are given below.
Profits ($ millions) y 296.2 -25 4085 6558 12525 3394 442.8 633.1 3528 2698 1200.65 11.987 641.8 5043 5206
Market Capitalization ($ millions) x 1936.9 1171.8 55135.8 97417.2 95198.9 53579.7 12466.3 8894.3 65872.4 25661.3 19854.7 195643.8 10447.8 66695.5 53558.4
a. Develop a scatter chart for the above data. What does this chart indicate about the relationship between market capitalization and profit? b. Use the data to develop an estimated regression equation that could be used to estimate a firm’s profit based on its market capitalization. What is the estimated regression model? c. What is the predicted profit for the market capitalization of 70721.3 ($ million)? Answer: a. The scatter chart with Market Capitalization as the independent variable follows.
This scatter chart indicates there may be a positive linear relationship between market capitalization and profit. b. The following Excel output provides the estimated regression equation that could be used to estimate the firm’s profit (y) based on its market capitalization (x).
The estimated simple linear regression equation is 𝑦̂ = 1916.4873 + 0.0229𝑥. c. The predicted profit for a market capitalization of 70721.3 will be 𝑦̂ = 1916.4873 + 0.0229 ∗ 70721.3 = 3536.57 or approximately 3537 ($ millions).
Difficulty: Moderate LO: 4.1; LO: 4.2; LO: 4.3; Pages 125-135 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 42. A research center is investigating the relationship between the height and age of children who are between 5 and 9 years old. A sample of 15 children is selected, and the data are given below. Age (in years) 7 8 5 8 8 7 7 7 9 8 5 8 6 5 8
Height (inches) 47.3 48.8 41.3 50.4 51 47.1 46.9 48 51.2 51.2 40.3 48.9 45.2 41.9 49.6
a. Develop a scatter chart with age as the independent variable. What does the scatter chart indicate about the relationship between the height and age of children? b. Use the data to develop an estimated regression equation that could be used to estimate the height based on the age. What is the estimated regression model? c. How much of the variation in the sample values of height does the model estimated in part b explain? Answer: a. The scatter chart with age as the independent variable follows.
This scatter chart indicates there may be a positive linear relationship between height and age of children who are 5 to 9 years old. b. The following Excel output provides the estimated regression equation that could be used to estimate a child’s height (y) based on age (x).
The estimated simple linear regression equation is 𝑦̂ = 27.9140 + 2.7395𝑥. c. The coefficient of determination R2 is 0.9402, so the regression model estimated in part (b) explains approximately 94% of the variation in the height in the sample. Difficulty: Challenging LO: 4.1; LO: 4.2; LO: 4.3; Pages 125-135
Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 43. A company’s sales in the period 2000 to 2011 along with the national income of the country, where the business is set up, are as below.
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
National Income (in millions of dollars) x 305 316 358 350 375 392 400 398 430 456 578 498
Company's sales (in thousands of dollars) y 470 485 499 515 532 532 556 576 583 587 601 605
a. Develop a scatter chart for the above data. What does this chart indicate about the relationship between the National Income and the Company's sales in the period 2000 to 2011? b. Use the data to develop an estimated regression equation that could be used to estimate the company’s sales based on the national income. What is the estimated regression model? Answer: a. The scatter chart with national income as the independent variable follows.
This scatter chart indicates there may be a positive linear relationship between national income and company’s sales. b. The following Excel output provides the estimated regression equation that could be used to estimate the company’s sales (y) based on the national income (x).
The estimated simple linear regression equation is 𝑦̂ = 328.9817 + 0.5340𝑥. Difficulty: Easy LO: 4.1; LO: 4.2; Pages 125-133
Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 44. A company’s sales in the period 2000 to 2011 along with the national income of the country, where the business is set up, are as below.
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
National Income (in millions of dollars) x 305 316 358 350 375 392 400 398 430 456 578 498
Company's sales (in thousands of dollars) y 470 485 499 515 532 532 556 576 583 587 601 605
Test whether each of the regression parameters β0 and β1 is equal to zero at a 0.05 level of significance. What are the correct interpretations of the estimated regression parameters? Answer: First we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and national income follows.
Because we are working with only 12 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. Excel output:
The p-value associated with the estimated regression parameter b1 is 8.9E-05, which is approximately equal to 0. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. Hence, we conclude that there is a relationship between national income and the company’s sales, and our best estimate is that a one million dollar increase in national income corresponds to an increase of $534.02 in the company’s sales. The company’s sales are expected to increase as national income increases, so this result is consistent with what is expected. The estimated regression parameter b0 suggests that when national income is zero, the company’s sales are about $328,982, which is not realistic. Difficulty: Challenging LO: 4.5; Pages 143-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 45. The data shown below are the average personal income and personal consumption expenditures based on a survey covering the years 1995 to 2009 in the U.S.
Personal income ($) 23,310 24,444 25,657 27,260 28,336 30,317 31,162 31,448 32,282 33,872 35,423 37,723 39,418 40,156 39,113
Personal consumption expenditures ($) 18,714 19,569 20,414 21,434 22,738 24,227 25,074 25,865 26,848 28,228 29,818 31,210 32,551 33,273 32,853
a. Develop a scatter chart for the above data. What does this chart indicate about the relationship between average personal income and personal consumption expenditure? b. Develop an estimated regression equation showing how personal consumption expenditure is related to personal income. c. What proportion of the variation in the sample values of personal consumption expenditure does this model explain? Answer: a. The scatter chart with personal income as the independent variable follows.
This scatter chart indicates there is a positive linear relationship between personal income and personal consumption expenditure.
b. The following Excel output provides the estimated regression equation that could be used to estimate the personal consumption expenditure (y) based on the personal income (x).
The estimated simple linear regression equation is 𝑦̂ = −2551.8095 + 0.8983𝑥. c. The coefficient of determination R2 is 0.9954, so the regression model estimated in part (b) explains approximately 99.5% of the variation in the personal consumption expenditure in the sample. Difficulty: Moderate LO: 4.1; LO: 4.2; LO: 4.3; Pages 125-135 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 46. The data shown below are the average personal income and personal consumption expenditures based on a survey covering the years 1995 to 2009 in the U.S.
Personal income ($) 23,310 24,444 25,657 27,260 28,336 30,317 31,162 31,448 32,282 33,872 35,423 37,723 39,418 40,156 39,113
Personal consumption expenditures ($) 18,714 19,569 20,414 21,434 22,738 24,227 25,074 25,865 26,848 28,228 29,818 31,210 32,551 33,273 32,853
a. What is the 95 percent confidence interval for the regression parameter β1? Based on this interval, what conclusion can you make about the hypothesis that the regression parameter β1 is equal to zero? b. What is the 95 percent confidence interval for the regression parameter β0? Based on this interval, what conclusion can you make about the hypothesis that the regression parameter β0 is equal to zero? Answer: a. First we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and personal income follows.
The scatter chart suggests that the regression model might underpredict the value of dependent variable. However, because we are working with only 15 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. So we will proceed with our inference. Excel Output:
The 95% confidence interval for the regression parameter β1 provided in the Excel output is (0.8616, 0.9349). Because this interval does not include zero, we reject the hypothesis that β1 = 0. Hence, we conclude that there is a relationship between personal income and personal consumption expenditure. Our best estimate is that a one dollar increase in personal income corresponds to an increase of $0.8983 in personal consumption expenditure. b. The 95% confidence interval for the regression parameter β0 provided in the Excel output is (−3740.5039, −1363.1151). Because this interval does not include zero, we reject the hypothesis that β0 = 0. Difficulty: Moderate LO: 4.5; Pages 143-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 47. A survey is conducted to determine whether the age of a car influences the annual maintenance cost. A sample of 10 cars is selected and the data is shown below.
Age of car (months) x 3 5 6 7 9 10 11 13 14 15
Annual Maintenance Cost ($) y 120 115 135 290 275 300 350 475 500 550
a. Develop a scatter chart for these data with age of cars as the independent variable. What does the scatter chart indicate about the relationship between age of a car and the annual maintenance cost? b. Use the data to develop an estimated regression equation that could be used to predict the annual maintenance cost given the age of the car. What is the estimated regression model? Answer: a. The scatter chart with age of cars as the independent variable follows.
This scatter chart indicates there is a positive linear relationship between age of a car and the annual maintenance cost. b. The following Excel output provides the estimated regression equation that could be used to estimate the annual maintenance cost (y) based on the age of a car (x).
The estimated simple linear regression equation is 𝑦̂ = −45.5955 + 38.3436𝑥.
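The least squares estimates behind this equation can be reproduced without Excel. The following is a minimal sketch, assuming the ten (age, cost) pairs listed in the problem; the function name fit_simple_ols and the variable names are illustrative and do not come from the text.

```python
# Data from the problem: age of car (x) and annual maintenance cost in dollars (y).
ages = [3, 5, 6, 7, 9, 10, 11, 13, 14, 15]
costs = [120, 115, 135, 290, 275, 300, 350, 475, 500, 550]


def fit_simple_ols(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_simple_ols(ages, costs)
print(f"y_hat = {b0:.4f} + {b1:.4f}x")
```

Rounded to four decimals, the slope and intercept printed here agree with the equation reported above.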
Difficulty: Moderate LO: 4.1; LO: 4.2; LO: 4.3; Pages 125-135 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 48. A survey is conducted to determine whether the age of car influences the annual maintenance cost. A sample of 10 cars is selected and the data is shown below.
Age of car (months) x 3 5 6 7 9 10 11 13 14 15
Annual Maintenance Cost ($) y 120 115 135 290 275 300 350 475 500 550
a. Test whether each of the regression parameters β0 and β1 is equal to zero at a 0.05 level of significance. b. Interpret the estimated regression parameters. Are these interpretations reasonable?
Answer: a. First we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and age of cars follows.
Because we are working with only 10 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. Excel output:
The p-value associated with the estimated regression parameter b1 is 4.14E-06 ≈ 0. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. We conclude that there is a linear relationship between the age of a car and the annual maintenance cost. Our estimate is that as the age of a car increases by one month, the maintenance cost increases by $38.34. The maintenance cost of a car is expected to increase as the age increases, so this result is consistent with what is expected.
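The slope test can also be checked by computing the t statistic directly. This is a sketch under stated assumptions, not the Excel procedure the answer relies on: it uses only the ten data pairs from the problem, and it compares the statistic with the two-tailed critical value 2.306 (t with 8 degrees of freedom at the 0.05 level, taken from a standard t table) instead of computing a p-value. All variable names are illustrative.

```python
import math

# Data from the problem: age of car (x) and annual maintenance cost in dollars (y).
ages = [3, 5, 6, 7, 9, 10, 11, 13, 14, 15]
costs = [120, 115, 135, 290, 275, 300, 350, 475, 500, 550]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(costs) / n
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, costs))

# Least squares estimates of the slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals and mean squared error with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, costs))
mse = sse / (n - 2)

# Standard error of the slope and the t statistic for H0: beta1 = 0.
se_b1 = math.sqrt(mse / s_xx)
t_stat = b1 / se_b1

# Assumed table value: two-tailed critical t for alpha = 0.05 and 8 degrees of freedom.
T_CRITICAL = 2.306
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```

A t statistic far above the critical value leads to the same decision as the very small p-value in the Excel output.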
The estimated regression parameter b0 suggests that when the age of a car is 0, the maintenance cost is -$45.60. This result is obviously not realistic, and the parameter estimate and the test of the hypothesis that β0 = 0 are meaningless because the y-intercept has been estimated through extrapolation (there is no car in the sample data with age zero). Difficulty: Moderate LO: 4.5; Pages 143-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 49. A research team at Gonzaga University is interested in predicting a student's overall university GPA if his/her high school GPA is known. Assume that a random sample of 20 students is selected and the data is given below. High School GPA 3.65 3.08 3.12 2.87 3.34 2.7 3.25 3.01 3.2 2.62 3.71 2.26 3.02 3.58 3.67 2.92 3.71 3.22 2.27 2.8
University GPA 3.72 3.21 3 2.67 3.57 2.97 2.83 2.89 3.8 2.43 3.82 2.66 3.62 3.83 3.74 2.64 3.53 2.99 2.37 2.69
a. Develop a scatter chart for these data with High School GPA as the independent variable. What does the scatter chart indicate about the relationship between high school GPAs and overall university GPA?
b. Develop an estimated regression equation showing how high school GPA is related to overall university GPA. What is the estimated regression model? c. What is the predicted overall university GPA of Sophia, a student who has been admitted to Gonzaga University, with 3.40 high school GPA? Answer: a. The scatter chart with high school GPA as the independent variable follows.
This scatter chart indicates there may be a positive linear relationship between high school GPAs and overall university GPA. b. The following Excel output provides the estimated regression equation that could be used to estimate the overall university GPA (y) based on student’s GPA scored in high school (x).
The estimated simple linear regression equation is 𝑦̂ = 0.1907 + 0.9543𝑥. c. The predicted overall university GPA of Sophia who has scored 3.40 in high school GPA will be 𝑦̂ = 0.1907 + 0.9543 ∗ 3.40 = 3.44 Difficulty: Moderate LO: 4.1; LO: 4.2; LO: 4.3; Pages 125-135 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis
50. A researcher wanted to study the effect of two factors, x1 and x2, on yield (y). The observations are given below. Observations 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
x1 42.5 43.5 43.9 44.8 46.8 47.5 50.1 51.9 54.7 54.8 57.1 57.8 62.3 66.7 71.2
x2 30.1 29.3 31.1 29.6 29.7 29.9 30.1 30.4 30.5 31 31.8 31.4 31.5 32.1 32.5
y 260.4 261.7 273.6 278.6 281.5 294.6 301.2 314.6 320.5 324.7 356.7 370.3 378 384.8 396.9
a. Develop an estimated linear regression equation with the factor x1 as the independent variable. Test for a significant relationship between factor x1 and yield at the 0.05 level of significance. b. How much of the variation in the sample values of yield does the model in part (a) explain? Answer: a. The following Excel output provides the estimated linear regression equation that can be used to predict yield (y) given the factor x1.
The estimated linear regression equation is 𝑦̂ = 44.5053 + 5.1917𝑥1 . Before testing the hypothesis β1 =0 for this regression model, we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and factor x1 follows.
Because we are working with only 15 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions. The p-value for the test of the hypothesis that β1 = 0 is 9.64E-10 ≈ 0. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0, and conclude that there is a relationship between yield and factor x1 at the 0.05 level of significance. b. The coefficient of determination R2 is 0.9483, so the regression model estimated in part (a) explains approximately 94.8% of the variation in the yield in the sample. Difficulty: Moderate LO: 4.1; LO: 4.2; LO: 4.3; Pages 125-135 Bloom’s: Application
BUSPROG: Analytic Skills DISC: Regression Analysis
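The coefficient of determination in part (b) can be reproduced from the sums of squares. The sketch below assumes the 15 observations of x1 and y listed in the problem and computes R2 as 1 − SSE/SST; the variable names are illustrative only.

```python
# Data from the problem: factor x1 and yield y for the 15 observations.
x1 = [42.5, 43.5, 43.9, 44.8, 46.8, 47.5, 50.1, 51.9,
      54.7, 54.8, 57.1, 57.8, 62.3, 66.7, 71.2]
y = [260.4, 261.7, 273.6, 278.6, 281.5, 294.6, 301.2, 314.6,
     320.5, 324.7, 356.7, 370.3, 378, 384.8, 396.9]

n = len(y)
x_bar = sum(x1) / n
y_bar = sum(y) / n

# Least squares slope and intercept for y_hat = b0 + b1 * x1.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x1, y))
      / sum((xi - x_bar) ** 2 for xi in x1))
b0 = y_bar - b1 * x_bar

# Total variation in y, and the part left unexplained by the fitted line.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x1, y))

# Share of the sample variation in y explained by the model.
r_squared = 1 - sse / sst

print(f"y_hat = {b0:.4f} + {b1:.4f}x1, R^2 = {r_squared:.4f}")
```

Because SST = SSR + SSE for a least squares fit with an intercept, 1 − SSE/SST is the same quantity as SSR/SST.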
51. A researcher wanted to study the effect of two factors, x1 and x2, on yield (y). The observations are given below.
Observations 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
x1 42.5 43.5 43.9 44.8 46.8 47.5 50.1 51.9 54.7 54.8 57.1 57.8 62.3 66.7 71.2
x2 30.1 29.3 31.1 29.6 29.7 29.9 30.1 30.4 30.5 31 31.8 31.4 31.5 32.1 32.5
y 260.4 261.7 273.6 278.6 281.5 294.6 301.2 314.6 320.5 324.7 356.7 370.3 378 384.8 396.9
a. Develop an estimated regression equation with both factors x1 and x2 as the independent variables. Is the overall regression statistically significant at the 0.05 level of significance? If so, then test whether each of the regression parameters β0, β1, and β2 is equal to zero at a 0.01 level of significance. What are the correct interpretations of the estimated regression parameters? b. How much of the variation in the sample values of y does the model in part (a) explain? Answer: a. The following Excel output provides the estimated multiple linear regression equation that could be used to predict the yield (y) given the two factors x1 and x2.
The estimated multiple linear regression equation is 𝑦̂ = −138.87 + 4.4942𝑥1 + 7.1704𝑥2 . Before testing for a significant overall regression relationship (that is, testing the hypothesis that β1 = β2 = 0), we check the conditions necessary for valid inference in regression. The Excel plots of the residuals and each of the two independent variables follow.
Because we are working with only 15 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. This scatter chart of the residuals versus factor x1 does not provide strong evidence of a violation of the conditions.
Similarly, this scatter chart of the residuals versus factor x2 does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. The p-value associated with the F test for an overall regression relationship is 9.87E-09 ≈ 0. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0. We conclude that there is an overall regression relationship at the 0.05 level of significance. The p-value associated with the estimated regression parameter b1 is 2.41E-05 ≈ 0. Because this p-value is less than the 0.01 level of significance, we reject the hypothesis that β1 = 0. We conclude that there is a relationship between yield and factor x1 at the 0.01 level of significance. The best estimate is that if we hold factor x2 constant, a 1-unit increase in factor x1 corresponds to an increase of 4.49 units in yield. The p-value associated with the estimated regression parameter b2 is 0.2607. Because this p-value is greater than the 0.01 level of significance, we do not reject the hypothesis that β2 = 0. We cannot conclude that there is a relationship between factor x2 and yield at the 0.01 level of significance. The estimated regression parameter b0 suggests that when factors x1 and x2 are both zero, the yield is −138.87 units. This result is obviously not realistic, as yield cannot be negative. b. The coefficient of determination R2 is 0.9537, so this regression model explains approximately 95% of the variation in the sample values of yield.
Difficulty: Moderate LO: 4.4; LO: 4.5; Pages 138-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis
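The multiple regression estimates for this problem can be reproduced by solving the normal equations (X'X)b = X'y. The sketch below is one way to do that using only the Python standard library, with the 15 observations from the problem; the helper names solve_linear_system and fit_ols are hypothetical, and this is not the method Excel uses internally.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(columns, y):
    """Return [b0, b1, ...] for y_hat = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


x1 = [42.5, 43.5, 43.9, 44.8, 46.8, 47.5, 50.1, 51.9,
      54.7, 54.8, 57.1, 57.8, 62.3, 66.7, 71.2]
x2 = [30.1, 29.3, 31.1, 29.6, 29.7, 29.9, 30.1, 30.4,
      30.5, 31, 31.8, 31.4, 31.5, 32.1, 32.5]
y = [260.4, 261.7, 273.6, 278.6, 281.5, 294.6, 301.2, 314.6,
     320.5, 324.7, 356.7, 370.3, 378, 384.8, 396.9]

b0, b1, b2 = fit_ols([x1, x2], y)

# Coefficient of determination from the residual and total sums of squares.
y_bar = sum(y) / len(y)
sse = sum((yi - (b0 + b1 * a + b2 * b)) ** 2 for a, b, yi in zip(x1, x2, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"y_hat = {b0:.2f} + {b1:.4f}x1 + {b2:.4f}x2, R^2 = {r_squared:.4f}")
```

For larger or more strongly correlated data sets, a QR-based least squares routine is generally preferred to the normal equations for numerical stability; the direct approach is used here only because it mirrors the textbook formulation.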
52. A student is interested in studying the impact of the number of books students referred to in a statistics course and the number of lectures they attended on the final grade in the course. A sample of 20 students is selected and the data is given below.
BOOKS 2 3 1 5 2 1 3 4 5 2 5 1 3 1 4 5 3 3 1 5
ATTEND 17 18 17 20 12 13 17 19 22 22 22 12 18 12 21 14 18 13 8 22
GRADE 60 54 62 59 44 40 96 90 97 54 91 48 91 65 82 61 54 46 64 90
a. Develop an estimated regression equation using number of books referred and the number of lectures attended to predict the final grade on the course. b. Joseph referred 4 books and attended 19 lectures. What is his predicted final score on the course? Answer: a. The following Excel output provides the estimated multiple linear regression equation with number of books referred by students on a statistics course and the number of lectures they attended as the independent variables.
The estimated multiple linear regression equation is 𝑦̂ = 28.75 + 4.30Books + 1.541Attend. b. If Joseph refers 4 books and attends 19 lectures, his predicted final score will be 𝑦̂ = 28.75 + 4.30 ∗ 4 + 1.541 ∗ 19 = 75.23, or approximately 75.
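The prediction in part (b) is a direct substitution into the estimated equation. As a small sketch, assuming the rounded coefficients reported above (the function name predict_grade is illustrative):

```python
# Rounded coefficients taken from the estimated equation reported in the answer.
INTERCEPT = 28.75
B_BOOKS = 4.30
B_ATTEND = 1.541


def predict_grade(books, attend):
    """Predicted final grade for a given number of books and lectures attended."""
    return INTERCEPT + B_BOOKS * books + B_ATTEND * attend


joseph_grade = predict_grade(books=4, attend=19)
print(round(joseph_grade, 2))
```

Because the coefficients are rounded, a prediction made from the unrounded Excel output could differ slightly in the last decimal place.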
Difficulty: Easy LO: 4.4; Pages 138-143 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 53. A student is interested in studying the impact of the number of books students referred to in a statistics course and the number of lectures they attended on the final grade in the course. A sample of 20 students is selected and the data is given below.
BOOKS 2 3 1 5 2 1 3 4 5 2 5 1 3 1 4 5 3 3 1 5
ATTEND 17 18 17 20 12 13 17 19 22 22 22 12 18 12 21 14 18 13 8 22
GRADE 60 54 62 59 44 40 96 90 97 54 91 48 91 65 82 61 54 46 64 90
a. Use the F test to determine the overall significance of the relationship. What is your conclusion at the 0.05 level of significance? Use the t test to determine the significance of each independent variable. What are your conclusions at the 0.05 level of significance? b. How much of the variation in the final grade does the model in part (a) explain? Answer: a. Before testing any hypotheses, we check the conditions necessary for valid inference in regression. The Excel plots of the residuals and each of the two independent variables follow.
Because we are working with only 20 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions.
Similarly, this scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. Excel Output:
The p-value associated with the F test for an overall regression relationship is 0.0151. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0. We conclude that there is an overall regression relationship at the 0.05 level of significance. The p-value associated with the estimated regression parameter b1 is 0.1945. Because this p-value is greater than the 0.05 level of significance, we do not reject the hypothesis that β1 = 0. We cannot conclude that there is a relationship between number of books referred and the final score at the 0.05 level of significance. The p-value associated with the estimated regression parameter b2 is 0.2081. Because this p-value is greater than the 0.05 level of significance, we do not reject the hypothesis that β2 = 0.
We cannot conclude that there is a relationship between the number of lectures attended and the final score of students. Difficulty: Challenging LO: 4.5; Pages 143-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 54. A research team conducted a survey to investigate how education level, tenure in current employment, and age are related to annual income. A sample of 20 employees is selected and the data is given below. Education (No. of years) 17 12 20 14 12 14 12 18 16 11 16 12 16 13 11 20 19 16 12 10
Length of tenure in current employment (No. of years) 8 12 9 4 1 9 8 10 12 7 14 4 17 7 6 4 7 12 2 6
Age (No. of years) 40 41 44 42 19 28 43 37 36 39 36 22 45 42 18 40 35 38 19 44
Annual income ($) 124,000 30,000 193,000 88,000 27,000 43,000 96,000 110,000 88,000 36,000 81,000 38,000 140,000 11,000 21,000 151,000 124,000 48,000 26,000 124,000
a. Determine the estimated multiple linear regression equation that can be used to predict the annual income given number of years school completed (Education), length of tenure in current employment, and age. b. Use the F test to determine the overall significance of the regression relationship. What is the conclusion at the 0.05 level of significance? Answer:
a. The following Excel output provides the estimated multiple linear regression equation with education (x1), length of tenure in current employment (x2), age (x3) as the independent variables.
The estimated multiple linear regression equation is 𝑦̂ = −143481.19 + 10011.92𝑥1 − 2193.88𝑥2 + 2689.24𝑥3. b. Before performing any hypothesis tests on the results, we check the conditions necessary for valid inference in regression. The Excel plots of the residuals and education, length of tenure in current employment, age follow.
Because we are working with only 20 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, none of these scatter charts provide strong evidence of a violation of the conditions necessary for valid inference in regression, so we will proceed with our inference. The p-value associated with the F test for an overall regression relationship is 0.00039. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = β3 = 0. We conclude that there is an overall regression relationship at the 0.05 level of significance. Difficulty: Moderate LO: 4.5; Pages 143-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 55. A research team conducted a survey to investigate how education level, tenure in current employment, and age are related to annual income. A sample of 20 employees is selected and the data is given below.
Education (No. of years) 17 12 20 14 12 14 12 18 16 11 16 12 16 13 11 20 19 16 12 10
Length of tenure in current employment (No. of years) 8 12 9 4 1 9 8 10 12 7 14 4 17 7 6 4 7 12 2 6
Age (No. of years) 40 41 44 42 19 28 43 37 36 39 36 22 45 42 18 40 35 38 19 44
Annual income ($) 124,000 30,000 193,000 88,000 27,000 43,000 96,000 110,000 88,000 36,000 81,000 38,000 140,000 11,000 21,000 151,000 124,000 48,000 26,000 124,000
a. Check if the F test leads to conclude that an overall regression relationship exists. If yes, use the t test to determine the significance of each independent variable. What is the conclusion for each test at the 0.05 level of significance? b. Remove all independent variables that are not significant at the 0.05 level of significance from the estimated regression equation. What is your estimated regression equation in this case? Answer: a. Using the following Excel output, we conclude that there is an overall regression relationship at the 0.05 level of significance.
The p-value associated with the estimated regression parameter b1 is 0.0013. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. We conclude that there is a relationship between the annual income and the education level at the 0.05 level of significance. Our best estimate is that if we hold the length of tenure in current employment and the age constant, a one year increase in the education level corresponds to an increase of $10,011.92 in annual income. The p-value associated with the estimated regression parameter b2 is 0.3246. Because this p-value is greater than the 0.05 level of significance, we do not reject the hypothesis that β2 = 0. We cannot conclude that there is a relationship between the length of tenure in current employment and the annual income at the 0.05 level of significance when controlling for education level and age. The p-value associated with the estimated regression parameter b3 is 0.0149. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β3 = 0. We conclude that there is a relationship between the age and the annual income at the 0.05 level of significance. Our best estimate is that if we hold the education level and length of tenure in current employment constant, an increase in age by a year corresponds to an increase of $2,689.24 in annual income. b. The following Excel output is obtained by removing the independent variable, length of tenure in current employment (x2), which is not significant at the 0.05 significance level. Hence, this output provides the estimated multiple linear regression equation with education (x1) and age (x3) as the independent variables.
The estimated multiple linear regression equation is 𝑦̂ = −138671.68 + 9588.89𝑥1 + 2234.56𝑥3 .
Difficulty: Challenging LO: 4.5; Pages 143-160 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 56. Consider the following data with the dependent variable y, independent variable x, and the dummy variable d.
y 100.0 100.0 50.0 75.0 65.0 90.0 85.0 65.0 50.0 75.0 75.0 65.0 95.0 95.0 90.0 80.0 75.0 95.0 100.0 90.0 70.0 100.0 40.0 90.0 75.0 40.0 45.0 55.0 100.0 90.0 80.0 80.0 90.0 60.0 95.0 45.0 65.0 85.0 60.0 75.0
d 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 1 0 0 1 1 1
x 9.3 6.5 4.2 7.4 6.0 7.6 9.6 6.5 6.0 9.9 8.6 4.9 7.2 9.9 7.8 7.2 7.0 8.7 9.9 9.3 9.1 8.9 5.4 8.8 8.8 5.0 6.5 5.2 10.0 8.4 6.4 6.7 8.5 5.9 8.9 6.3 7.5 6.8 5.7 4.5
a. Develop the estimated regression equation using all of the independent variables included in the data. b. Test for an overall regression relationship at the 0.05 level of significance. Is there a significant regression relationship? Answer: a. The following Excel output provides the estimated multiple linear regression equation that could be used to predict y given the dummy variable, d, and x.
The estimated multiple linear regression equation is 𝑦̂ = 19.918 − 5.954𝑑 + 8.010𝑥. b. Before testing any hypotheses for this regression model, we check the conditions necessary for valid inference in regression. Excel plots of the residuals and each independent variable follow.
The residuals appear to have a mean of zero and do not appear to be badly skewed at any value of any independent variable. We therefore will proceed with our inference.
The p-value associated with the F test for an overall regression relationship is 7.813E-07. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0 and conclude that there is an overall regression relationship at the 0.05 level of significance. Difficulty: Moderate LO: 4.6; Pages 161-165 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 57. Consider the following data with the dependent variable y, independent variable x, and the dummy variable d.
y 100.0 100.0 50.0 75.0 65.0 90.0 85.0 65.0 50.0 75.0 75.0 65.0 95.0 95.0 90.0 80.0 75.0 95.0 100.0 90.0 70.0 100.0 40.0 90.0 75.0 40.0 45.0 55.0 100.0 90.0 80.0 80.0 90.0 60.0 95.0 45.0 65.0 85.0 60.0 75.0
d 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 1 0 0 1 1 1
x 9.3 6.5 4.2 7.4 6.0 7.6 9.6 6.5 6.0 9.9 8.6 4.9 7.2 9.9 7.8 7.2 7.0 8.7 9.9 9.3 9.1 8.9 5.4 8.8 8.8 5.0 6.5 5.2 10.0 8.4 6.4 6.7 8.5 5.9 8.9 6.3 7.5 6.8 5.7 4.5
a. Check whether an overall regression relationship exists for the above data. If yes, test the relationship between each independent variable and the dependent variable at the 0.05 level of significance, and interpret the relationship between each of the independent variables and the dependent variable. b. How much of the variation in the sample values of y does the estimated regression equation explain? Answer: a. Using the following Excel output, we conclude that there is an overall regression relationship at the 0.05 level of significance.
We checked the conditions necessary for valid inference in regression in the previous question, so we proceed with our inference. The p-value for the test of the hypothesis that β1 = 0 is 0.1540. Because this p-value is greater than the 0.05 level of significance, we do not reject the hypothesis that β1 = 0, and we conclude that there is no significant relationship between y and the dummy variable d, controlling for the variable x. The p-value for the test of the hypothesis that β2 = 0 is 1.52E-07. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β2 = 0 and conclude that there is a relationship between y and the independent variable x at the 0.05 level of significance. We estimate that, holding the variable d constant, a 1-unit increase in x is associated with an 8.010-unit increase in y. b. The coefficient of determination is R² = 0.5324, so the regression model explains approximately 53% of the variation in the y values in the sample. Difficulty: Moderate LO: 4.5; LO: 4.6; Pages 143-165 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 58. A production company is studying the relationship between the average cost/unit and number of units produced in a batch. A sample of 10 batches is selected and the data is given below.
No. of units produced   Cost/unit
20                      37.7158
35                      35.0158
50                      30.8158
65                      25.8158
80                      20.0364
95                      16.9064
110                     16.3766
120                     13.8564
135                     13.9696
150                     13.847
a. Develop a scatter chart for these data. What does the scatter chart indicate about the relationship between average cost/unit and number of units produced? b. Develop an estimated simple linear regression equation for the data. How much variation in the sample values of cost/unit is explained by this regression model? Answer:
a. The scatter chart with number of units produced as the independent variable follows.
A simple linear regression model does not appear to be appropriate; there appears to be a curvilinear relationship between cost/unit and number of units produced in a batch. b. The following Excel output provides the estimated simple linear regression equation that could be used to predict cost/unit (y) given the number of units produced in a batch (x).
The estimated simple linear regression equation is 𝑦̂ = 39.8967 − 0.2030𝑥, and the coefficient of determination for this model is R² = 0.9177, so the regression model explains approximately 92% of the variation in the sample values of cost/unit. Difficulty: Easy LO: 4.1; LO: 4.2; Pages 125-133 Bloom’s: Application
BUSPROG: Analytic Skills DISC: Regression Analysis 59. A production company is studying the relationship between the average cost/unit and number of units produced in a batch. A sample of 10 batches is selected and the data is given below.
No. of units produced   Cost/unit
20                      37.7158
35                      35.0158
50                      30.8158
65                      25.8158
80                      20.0364
95                      16.9064
110                     16.3766
120                     13.8564
135                     13.9696
150                     13.847
a. Develop an estimated quadratic regression equation for the data. How much variation in the sample values of cost/unit does this regression model explain? b. Is the overall regression relationship significant at a 0.05 level of significance? If so, then test the relationship between the independent variable and the dependent variable at a 0.05 level of significance. Answer: a. The following Excel output provides the estimated quadratic regression equation that could be used to predict cost/unit (y) given the number of units produced (x).
The estimated quadratic regression equation is 𝑦̂ = 47.99 − 0.4535𝑥 + 0.0015𝑥², and the coefficient of determination for this model is R² = 0.9824, so the quadratic regression model explains approximately 98% of the variation in the sample values of cost/unit. b. Before testing any hypotheses about this regression model, we again check the conditions necessary for valid inference in regression. The Excel plot of the residuals and number of units produced follows.
These scatter charts do not provide strong evidence of a violation of the conditions, so we will proceed with our inference. The p-value associated with the F test for an overall regression relationship is 7.24E-07. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0 and conclude that there is an overall regression relationship at the 0.05 level of significance. The p-value for the test of the hypothesis that β1 = 0 is 4.32E-05. Because this p-value is less than the 0.05 level of significance, we again reject the hypothesis that β1 = 0. Similarly, the p-value for the test of the hypothesis that β2 = 0 is 0.0014. Because this p-value is less than the 0.05 level of significance, we reject the hypothesis that β2 = 0. We therefore conclude that there is a nonlinear relationship between cost/unit and number of units produced in the batch.
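The linear and quadratic fits discussed in questions 58 and 59 can be reproduced outside Excel. The following is a minimal sketch, assuming NumPy is available; the helper name fit_polynomial is hypothetical and is not part of the original solution, which relies on Excel output.

```python
import numpy as np

# Data from questions 58 and 59: units per batch and average cost per unit.
units = np.array([20, 35, 50, 65, 80, 95, 110, 120, 135, 150], dtype=float)
cost = np.array([37.7158, 35.0158, 30.8158, 25.8158, 20.0364,
                 16.9064, 16.3766, 13.8564, 13.9696, 13.847])


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit (hypothetical helper).

    Returns the coefficients, highest power first, and R^2.
    """
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    sse = np.sum((y - fitted) ** 2)    # variation left unexplained
    sst = np.sum((y - y.mean()) ** 2)  # total variation in y
    return coeffs, 1.0 - sse / sst


linear_coeffs, linear_r2 = fit_polynomial(units, cost, 1)
quad_coeffs, quad_r2 = fit_polynomial(units, cost, 2)

print("linear coefficients:", linear_coeffs, "R^2:", round(linear_r2, 4))
print("quadratic coefficients:", quad_coeffs, "R^2:", round(quad_r2, 4))
```

Adding the squared term can never lower R², so the comparison of the two R² values should be read together with the residual plot and the p-value for the squared term, as the answer above does.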
Difficulty: Easy LO: 4.7; Pages 165-177 Bloom’s: Application BUSPROG: Analytic Skills DISC: Regression Analysis 60. Consider the below data which is based on a company’s sales in the period 2000 to 2011 along with the national income of the country, where the business is set up.
Year   National Income (in millions of dollars) x   Company's sales (in thousands of dollars) y
2000   305                                          470
2001   316                                          485
2002   358                                          499
2003   350                                          515
2004   375                                          532
2005   392                                          532
2006   400                                          556
2007   398                                          576
2008   430                                          583
2009   456                                          587
2010   578                                          601
2011   498                                          605
a. Develop a scatter chart for these data, treating the national income as the independent variable. Does a simple linear regression model appear to be appropriate? b. Develop an appropriate estimated regression equation to predict the company's sales, given the national income. How much variation in the sample values of company’s sales is explained by this regression model? Answer: a. The scatter chart with national income as the independent variable follows.
A simple linear regression model does not appear to be appropriate; there appears to be a curvilinear relationship between national income and company’s sales. b. Based on the scatter chart, we estimate a quadratic regression equation that could be used to predict company’s sales (y) given national income (x).
The estimated quadratic regression equation is 𝑦̂ = −128.98 + 2.71𝑥 − 0.0024𝑥², and the coefficient of determination for this model is R² = 0.9380, so the regression model explains approximately 94% of the variation in the sample values of company’s sales. Difficulty: Easy LO: 4.7; Pages 165-177 Bloom’s: Application
BUSPROG: Analytic Skills DISC: Regression Analysis
Chapter 5: Time Series Analysis and Forecasting 1. A forecast is defined as a(n): a. prediction of future values of a time series. b. quantitative method used when historical data on the variable of interest are either unavailable or not applicable. c. set of observations on a variable measured at successive points in time. d. outcome of a random experiment. Answer: A Difficulty: Easy LO: 5.1, Page 204 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Forecast is defined as a prediction of future values of a time series. 2. Qualitative forecasting methods are used when: a. historical data on the variable being forecast are available. b. information on past values of the variable being measured is quantifiable. c. historical data on the variable being forecast are either unavailable or are not applicable. d. it is reasonable to assume that past is prologue. Answer: C Difficulty: Moderate LO: 5.1, Page 204 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: Qualitative forecasting methods are used when historical data on the variable being forecast are either unavailable or not applicable. 3. A set of observations on a variable measured at successive points in time or over successive periods of time constitute a _____. a. geometric series b. time invariant set c. time series d. logarithmic series Answer: C Difficulty: Easy LO: 5.1, Page 205 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: A time series is a sequence of observations on a variable measured at successive points in time or over successive periods of time. 4. Which of the following states the objective of time series analysis?
a. To study the variation of time with respect to increase in the variable value b. To analyze the time-dependent environmental factors that affected variable values in the past c. To use present variable values to study what should have been the ideal past values d. To uncover a pattern in the time series and then extrapolate the pattern into the future Answer: D Difficulty: Moderate LO: 5.1, Page 204 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The objective of time series analysis is to uncover a pattern in the time series and then extrapolate the pattern into the future. 5. Causal forecasting: a. does not depend upon historical values of a variable. b. assumes that the variable being forecast has a cause-effect relationship with one or more other variables. c. uses present variable values to study what should have been the ideal past values. d. uses time series plots to study if the variable values are centered around the mean. Answer: B Difficulty: Moderate LO: 5.1, Page 204 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: Causal or exploratory forecasting methods are based on the assumption that the variable being forecast has a cause-effect relationship with one or more other variables. 6. A _____ pattern exists when the data fluctuate randomly around a constant mean over time. a. vertical b. seasonal c. cyclical d. horizontal Answer: D Difficulty: Easy LO: 5.1, Page 205 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: A horizontal pattern exists when the data fluctuate randomly around a constant mean over time. 7. _____ is the term used for a time series whose statistical properties are independent of time. a. Cluster b. Stationary time series
c. Trend d. Constant time series Answer: B Difficulty: Easy LO: 5.1, Page 205 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Stationary time series is the term used for a time series whose statistical properties are independent of time. 8. Which of the following is true of a stationary time series? a. The process generating the data has a variable mean. b. The variability of the time series is constant over time. c. The time series plot for this case is a straight line. d. The fluctuations in values will always exhibit a cyclical pattern. Answer: B Difficulty: Moderate LO: 5.1, Page 205 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: For a stationary time series, 1. The process generating the data has a constant mean. 2. The variability of the time series is constant over time. 9. If a time series plot exhibits a horizontal pattern, then: a. it is evident that the time series is stationary. b. the data fluctuates randomly around a variable mean. c. there is no relationship between time and the time series variable. d. there is no sufficient evidence to conclude that the time series is stationary. Answer: D Difficulty: Moderate LO: 5.1, Page 205 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: A time series plot for a stationary time series will always exhibit a horizontal pattern with random fluctuations. However, simply observing a horizontal pattern is not sufficient evidence to conclude that the time series is stationary. 10. Trend refers to: a. the long-run shift or movement in the time series observable over several periods of time. b. the outcome of a random experiment. c. the recurring patterns observed over successive periods of time.
d. the short-run shift or movement in the time series observable at some specific period of time. Answer: A Difficulty: Moderate LO: 5.1, Page 207 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Trend refers to the long-run shift or movement in the time series observable over several periods of time. 11. Trends result from: a. rapidly-arising short-term factors. b. rapidly-arising long-term factors. c. slowly-varying short-term factors. d. slowly-varying long-term factors. Answer: D Difficulty: Moderate LO: 5.1, Page 207 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: A trend is usually the result of long-term factors such as population increases or decreases, shifting demographic characteristics of the population, improving technology, changes in the competitive landscape, and/or changes in consumer preferences. 12. Which of the following data patterns best describes the scenario shown in the below plot?
a. Time series with a linear trend pattern b. Time series with a nonlinear trend pattern c. Time series with no pattern d. Time series with a horizontal pattern
Answer: D
Difficulty: Easy LO: 5.1, Page 205 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The given scenario shows a time series plot with a horizontal pattern. 13. Which of the following data patterns best describes the scenario shown in the given time series plot?
a. Linear trend pattern b. Nonlinear trend pattern c. Seasonal pattern d. Cyclical pattern
Answer: A Difficulty: Easy LO: 5.1, Page 207 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The given time series plot shows a linear trend. 14. Which of the following data patterns best describes the scenario shown in the given time series plot?
a. Linear trend pattern b. Nonlinear trend pattern c. Seasonal pattern d. Cyclical pattern
Answer: B Difficulty: Easy LO: 5.1, Page 208 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The given time series plot shows a nonlinear trend. 15. Which of the following data patterns best describes the scenario shown in the given time series plot?
a. Linear trend pattern b. Logarithmic trend c. Exponential trend d. Seasonal pattern
Answer: D Difficulty: Easy LO: 5.1, Page 209 Bloom’s: Knowledge
BUSPROG: Analytic DISC: Time Series Data Feedback: The given time series plot shows a seasonal pattern. 16. Which of the following data patterns best describes the scenario shown in the given time series plot?
a. Linear trend and cyclical pattern b. Linear trend and horizontal pattern c. Seasonal and cyclical patterns d. Seasonal pattern and linear trend
Answer: D Difficulty: Easy LO: 5.1, Page 209 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The given time series plot exhibits both a seasonal pattern and a linear trend. 17. An exponential trend pattern is appropriate when: a. the amount of increase between periods in the value of the variable is constant. b. the percentage change between periods in the value of the variable is relatively constant. c. there is no relationship between the time series variable and time. d. there are random fluctuations in the variable value with time. Answer: B Difficulty: Moderate LO: 5.1, Page 209 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: An exponential trend pattern is appropriate when the percentage change in the value of the variable from one period to the next is relatively constant.
18. A time series that shows a recurring pattern over one year or less is said to follow a _____. a. horizontal pattern b. stationary pattern c. cyclical pattern d. seasonal pattern Answer: D Difficulty: Easy LO: 5.1, Page 209 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: A time series that shows a recurring pattern over one year or less is said to follow a seasonal pattern. 19. With reference to time series data patterns, a cyclical pattern is the component of the time series that: a. shows a periodic pattern over one year or less. b. does not vary with respect to time. c. results in periodic above-trend and below-trend behavior of the time series lasting more than one year. d. is characterized by a linear variation of the dependent variable with respect to time. Answer: C Difficulty: Moderate LO: 5.1, Page 211 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: With reference to time series data patterns, a cyclical pattern is the component of the time series that results in periodic above-trend and below-trend behavior of the time series lasting more than one year. 20. _____ is the amount by which the predicted value differs from the observed value of the time series variable. a. Mean forecast error b. Mean absolute error c. Smoothing constant d. Forecast error Answer: D Difficulty: Easy LO: 5.2, Page 213 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Forecast error is the amount by which the forecasted value differs from the observed value.
21. If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25, what is the forecast error in period 2? a. 3 b. 2 c. 2.5 d. –2.5 Answer: C Difficulty: Easy LO: 5.2, Page 213 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data Feedback: Forecast error is the amount by which the forecasted value differs from the observed value. For the given values, the forecast error in period 2 is computed as 25 – 22.5 = 2.5. 22. The measures of accuracy of the forecasts: a. check how well a particular forecasting method is able to reproduce the time series data that are already available. b. use the current value to estimate how well the model generates previous values correctly. c. predict the future values and wait for a pre-defined time period to examine how accurate the predictions were. d. check to see if the forecast error is negative. Answer: A Difficulty: Moderate LO: 5.2, Page 213 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: Measures of forecast accuracy are used to determine how well a particular forecasting method is able to reproduce the time series data that are already available. By selecting the method that is most accurate for the data already known, we hope to increase the likelihood that we will obtain more accurate forecasts for future time periods. 23. Forecast error: a. takes a positive value when the forecast is too high. b. cannot be negative. c. cannot take a value of zero. d. is associated with measuring forecast accuracy. Answer: D Difficulty: Moderate LO: 5.1, 5.2, Page 204 and 213 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: The forecast in time series analysis is based solely on past values of the variable and/or on past forecast errors, and forecast error is the key concept in measuring forecast accuracy. Because forecast error is the actual value minus the forecast, it is positive when the forecast is too low, negative when the forecast is too high, and zero when the forecast is exact.
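The forecast error in question 21 and the accuracy measures used in this chapter can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the mean forecast error (MFE) is included for comparison, and the ten-value series is the one given in Problem 1 of this chapter's Problems section, forecast with the naive method (the previous period's value).

```python
def forecast_errors(actual, forecast):
    """Forecast error for each period: actual value minus forecast."""
    return [a - f for a, f in zip(actual, forecast)]


def accuracy_measures(actual, forecast):
    """Return MFE, MAE, MSE, and MAPE (in percent) for paired values."""
    errors = forecast_errors(actual, forecast)
    n = len(errors)
    return {
        "MFE": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e ** 2 for e in errors) / n,
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100,
    }


# Question 21: forecast of 22.5 against an actual value of 25.
print(forecast_errors([25], [22.5]))  # [2.5]

# Series from Problem 1; the naive forecast for year t is the value in year t - 1.
values = [234, 287, 255, 310, 298, 250, 456, 412, 525, 436]
naive = accuracy_measures(values[1:], values[:-1])
print({name: round(value, 2) for name, value in naive.items()})
# {'MFE': 22.44, 'MAE': 72.44, 'MSE': 8263.11, 'MAPE': 18.86}
```

The MFE is much smaller in magnitude than the MAE here because positive and negative errors partly cancel, which is why the MFE alone says little about accuracy.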
24. Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another? a. Mean absolute error b. Mean forecast error c. Mean squared error d. Mean absolute percentage error Answer: B Difficulty: Moderate LO: 5.2, Page 214 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: Because positive and negative forecast errors tend to offset one another, the mean forecast error is not a very useful measure of forecast accuracy. 25. The moving averages method refers to a forecasting method that: a. moves up the average of every subsequent forecast by one. b. uses regression relationship based on past time series values to predict the future time series values. c. relates a time series to other variables that are believed to explain or cause its behavior. d. uses the average of the most recent data values in the time series as the forecast for the next period. Answer: D Difficulty: Easy LO: 5.3, Page 217 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The moving averages method refers to a forecasting method that uses the average of the most recent data values in the time series as the forecast for the next period. 26. The moving averages and exponential smoothing methods are appropriate for a time series exhibiting _____. a. horizontal pattern b. cyclical pattern c. trends d. seasonal effects Answer: A Difficulty: Easy LO: 5.3, Page 217 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The moving averages and exponential smoothing methods are appropriate for a time series exhibiting horizontal pattern.
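The moving averages method in question 25 can be sketched as follows. The function name and the short example series are hypothetical and chosen only so the arithmetic is easy to follow; the order k is the number of most recent values averaged.

```python
def moving_average_forecasts(values, k):
    """Moving average forecasts of order k (hypothetical helper).

    The first entry is the forecast for period k + 1 and the last entry is
    the forecast for the period after the final observation.
    """
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of observations")
    return [sum(values[t - k:t]) / k for t in range(k, len(values) + 1)]


# Hypothetical series with a roughly horizontal pattern.
series = [17, 21, 19, 23, 18, 16]
forecasts = moving_average_forecasts(series, k=3)
print(forecasts)  # [19.0, 21.0, 20.0, 19.0]
```

With k = 1 the method reduces to using the most recent value as the forecast. Larger values of k average over more periods, so the forecasts respond more slowly to a shift in the level of the series but are less affected by random fluctuations.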
27. Which of the following statements is the objective of the moving averages and exponential smoothing methods? a. To characterize the variable fluctuations by a smooth curve b. To smooth out random fluctuations in the time series c. To characterize the variable fluctuations by an exponential equation d. To transform a nonstationary time series into a stationary series Answer: B Difficulty: Moderate LO: 5.3, Page 217 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: The objective of the moving averages and exponential smoothing methods is to smooth out random fluctuations in the time series; they are also referred to as smoothing methods. 28. In the moving averages method, the order k determines the: a. error tolerance b. compensation for forecasting error c. number of time series values under consideration d. number of samples in each unit time period Answer: C Difficulty: Moderate LO: 5.3, Page 218 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: To use moving averages to forecast a time series, we must first select the order k, or the number of time series values to be included in the moving average. If only the most recent values of the time series are considered relevant, a small value of k is preferred. If a greater number of past values are considered relevant, then we generally opt for a larger value of k. 29. Using a large value for order k in the moving averages method is effective in: a. tracking changes in a time series more quickly. b. smoothing out random fluctuations. c. providing a more accurate forecast variable value. d. eliminating the effect of seasonal variations in the time series. Answer: B Difficulty: Moderate LO: 5.3, Page 218 Bloom’s: Comprehension BUSPROG: Analytic DISC: Time Series Data Feedback: A moving average will adapt to the new level of the series and continue to provide good forecasts in k periods. Thus a smaller value of k will track shifts in a time series more
quickly. On the other hand, larger values of k will be more effective in smoothing out random fluctuations. 30. _____ uses a weighted average of past time series values as the forecast. a. The qualitative method b. Exponential smoothing c. Correlation analysis d. The causal model Answer: B Difficulty: Easy LO: 5.3, Page 221 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Exponential smoothing uses a weighted average of past time series values as the forecast. 31. With reference to exponential forecasting models, a parameter that provides the weight given to the most recent time series value in the calculation of the forecast value is known as the _____. a. moving average b. regression coefficient c. smoothing constant d. mean forecast error Answer: C Difficulty: Easy LO: 5.3, Page 221 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: With reference to exponential forecasting models, a parameter that provides the weight given to the most recent time series value in the calculation of the forecast value is known as the smoothing constant. 32. The exponential smoothing forecast for period t + 1 is a weighted average of the: a. forecast value in period t with weight α and the actual value for period t with weight 1 – α. b. actual value in period t + 1 with weight α and the forecast for period t with weight 1 – α. c. forecast value in period t – 1 with weight α and the forecast for period t with weight 1 – α. d. actual value in period t with weight α and the forecast for period t with weight 1 – α. Answer: D Difficulty: Moderate LO: 5.3, Page 221 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data
Feedback: The exponential smoothing forecast for period t + 1 is a weighted average of the actual value in period t and the forecast for period t. The weight given to the actual value in period t is the smoothing constant α, and the weight given to the forecast in period t is 1 – α. 33. Which of the following is true of the exponential smoothing coefficient? a. It is a randomly generated value between –1 and +1. b. It is small for a time series that has relatively little random variability. c. It is chosen as the value that minimizes a selected measure of forecast accuracy such as the mean squared error. d. It is computed in relation with the order value, k, for the moving averages. Answer: C Difficulty: Easy LO: 5.3, Page 225 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The criterion we will use to determine a desirable value for the smoothing constant α is the same as that proposed for determining the order or number of periods of data to include in the moving averages calculation; that is, we choose the value of α that minimizes the MSE. 34. The process of _____ might be used to determine the value of smoothing constant that minimizes the mean squared error. a. quantization b. nonlinear optimization c. clustering d. curve fitting Answer: B Difficulty: Easy LO: 5.3, Page 225 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Trial and error is often used to determine whether a different smoothing constant α can provide more accurate forecasts, but we can avoid trial and error and determine the value of α that minimizes MSE through the use of nonlinear optimization. 35. Autoregressive models: a. use the average of the most recent data values in the time series as the forecast for the next period. b. are used to smooth out random fluctuations in time series. c. relate a time series to other variables that are believed to explain or cause its behavior. d. occur whenever all the independent variables are previous values of the same time series. Answer: D Difficulty: Easy LO: 5.4, Page 228 Bloom’s: Knowledge
BUSPROG: Analytic DISC: Time Series Data Feedback: Autoregressive models occur whenever all the independent variables are previous values of the same time series. 36. A time series with a seasonal pattern can be modeled by treating the season as a _____. a. predictor variable b. dependent variable c. dummy variable d. categorical variable Answer: C Difficulty: Easy LO: 5.4, Page 228 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: A time series with a seasonal pattern can be modeled by treating the season as a dummy variable. 37. Causal models: a. provide evidence of a causal relationship between an independent variable and the variable to be forecast. b. use the average of the most recent data values in the time series as the forecast for the next period. c. occur whenever all the independent variables are previous values of the same time series. d. relate a time series to other variables that are believed to explain or cause its behavior. Answer: D Difficulty: Easy LO: 5.4, Page 232 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: Causal models relate a time series to other variables that are believed to explain or cause its behavior. 38. A causal model provides evidence of _____ between an independent variable and the variable to be forecast. a. a causal relationship b. association c. no relationship d. a seasonal relationship Answer: B Difficulty: Easy LO: 5.4, Page 232 Bloom’s: Knowledge BUSPROG: Analytic
DISC: Time Series Data Feedback: The causal forecasting model provides evidence only of association between an independent variable and the variable to be forecast. The model does not provide evidence of a causal relationship between an independent variable and the variable to be forecast, and the conclusion that a causal relationship exists must be based on practical experience. 39. The value of an independent variable from the prior period is referred to as a _____. a. lagged variable b. dummy variable c. predictor variable d. categorical variable Answer: A Difficulty: Easy LO: 5.4, Page 235 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: The value of an independent variable from the prior period is referred to as a lagged variable. 40. For causal modeling, _____ are used to detect linear or nonlinear relationships between the independent and dependent variables. a. descriptive statistics on the data b. scatter charts c. contingency tables d. pie charts Answer: B Difficulty: Easy LO: 5.5, Page 236 Bloom’s: Knowledge BUSPROG: Analytic DISC: Time Series Data Feedback: For causal modeling, scatter charts can indicate whether strong linear or nonlinear relationships exist between the independent and dependent variables.
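Questions 32 through 34 describe the exponential smoothing forecast as α times the actual value in period t plus (1 – α) times the forecast for period t, with α chosen to minimize the MSE. The sketch below illustrates that recursion under two assumptions that are not stated in the questions: the forecast for period 2 is set equal to the first observation, and a simple grid search over α stands in for nonlinear optimization. All function names are hypothetical.

```python
def exponential_smoothing(values, alpha):
    """Forecasts for periods 2 through n + 1.

    Uses forecast[t + 1] = alpha * actual[t] + (1 - alpha) * forecast[t].
    Assumption: the forecast for period 2 equals the first observation.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecasts = [values[0]]  # forecast for period 2
    for actual in values[1:]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts  # the last entry is the forecast for period n + 1


def smoothing_mse(values, alpha):
    """Mean squared error of the forecasts for periods 2 through n."""
    forecasts = exponential_smoothing(values, alpha)
    errors = [a - f for a, f in zip(values[1:], forecasts[:-1])]
    return sum(e ** 2 for e in errors) / len(errors)


def best_alpha(values, step=0.01):
    """Grid search over alpha; a stand-in for nonlinear optimization."""
    candidates = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    return min(candidates, key=lambda a: smoothing_mse(values, a))


# Series from Problem 1 of this chapter's Problems section.
values = [234, 287, 255, 310, 298, 250, 456, 412, 525, 436]
alpha = best_alpha(values)
print("alpha:", alpha, "MSE:", round(smoothing_mse(values, alpha), 2))
print("forecast for year 11:", round(exponential_smoothing(values, alpha)[-1], 2))
```

Setting α = 1 reproduces the naive method, since each forecast then equals the most recent actual value. A grid search is used here only because it is easy to read; a numerical optimizer would serve the same purpose.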
Problems 1. Consider the following time series data:
Year   Value
1      234
2      287
3      255
4      310
5      298
6      250
7      456
8      412
9      525
10     436
Using the naïve method (most recent value) as the forecast for the next year, compute the following measures of forecast accuracy: a. Mean absolute error b. Mean squared error c. Mean absolute percentage error d. What is the forecast for year 11? Answer: The following table shows the calculations for parts (a), (b), and (c).
Year    Value   Forecast   Forecast Error   Absolute Value of Forecast Error   Squared Forecast Error   Percentage Error   Absolute Value of Percentage Error
1       234
2       287     234        53               53                                 2809                     18.4669            18.4669
3       255     287        -32              32                                 1024                     -12.5490           12.5490
4       310     255        55               55                                 3025                     17.7419            17.7419
5       298     310        -12              12                                 144                      -4.0268            4.0268
6       250     298        -48              48                                 2304                     -19.2000           19.2000
7       456     250        206              206                                42436                    45.1754            45.1754
8       412     456        -44              44                                 1936                     -10.6796           10.6796
9       525     412        113              113                                12769                    21.5238            21.5238
10      436     525        -89              89                                 7921                     -20.4128           20.4128
Total                                       652                                74368                                       169.7764
a. MAE = 652/9 = 72.44 b. MSE = 74368/9 = 8263.11 c. MAPE = 169.7764/9 = 18.86% d. The forecast for year 11 is 𝑦̂11 = 436. Difficulty: Easy
LO: 5.2, Pages 212-217 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 2. Consider the following time series data:
Year   Value
1      234
2      287
3      255
4      310
5      298
6      250
7      456
8      412
9      525
10     436
Using the average of all the historical data as a forecast for the next year, compute the following measures of forecast accuracy: a. Mean absolute error b. Mean squared error c. Mean absolute percentage error d. What is the forecast for year 11? Answer: The following table shows the calculations for parts (a), (b), and (c).
Year    Value   Forecast   Forecast Error   Absolute Value of Forecast Error   Squared Forecast Error   Percentage Error   Absolute Value of Percentage Error
1       234
2       287     234.00     53.00            53.00                              2809.00                  18.4669            18.4669
3       255     260.50     -5.50            5.50                               30.25                    -2.1569            2.1569
4       310     258.67     51.33            51.33                              2635.11                  16.5591            16.5591
5       298     271.50     26.50            26.50                              702.25                   8.8926             8.8926
6       250     276.80     -26.80           26.80                              718.24                   -10.7200           10.7200
7       456     272.33     183.67           183.67                             33733.44                 40.2778            40.2778
8       412     298.57     113.43           113.43                             12866.04                 27.5312            27.5312
9       525     312.75     212.25           212.25                             45050.06                 40.4286            40.4286
10      436     336.33     99.67            99.67                              9933.44                  22.8593            22.8593
Total                                       772.15                             108477.84                                   187.8924
a. MAE = 772.15/9 = 85.79
b. MSE = 108477.84/9 = 12053.09
c. MAPE = 187.8924/9 = 20.88%
d. The forecast for year 11 is 𝑦̂11 = (y1 + y2 + y3 + y4 + y5 + y6 + y7 + y8 + y9 + y10) / 10 = (234 + 287 + 255 + 310 + 298 + 250 + 456 + 412 + 525 + 436) / 10 = 346.3. Difficulty: Easy LO: 5.2, Pages 212-217 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 3. The monthly sales revenue (in hundreds of dollars) of a company for one year is listed below.
Month       Sales ($100s)
January     12,354
February    13,657
March       14,536
April       13,478
May         16,590
June        19,790
July        17,987
August      18,657
September   19,765
October     18,678
November    20,678
December    23,675
a. Compute MSE using the most recent value as the forecast for the next period. What is the forecast for the next month? b. Compute MSE using the average of all the data available as the forecast for the next period. What is the forecast for the next month? c. Which method appears to provide the better forecast? Answer: a.
Month Sales ($100s) Forecast Forecast Error Squared Forecast Error
January 12,354
February 13,657 12,354 1303 1697809
March 14,536 13,657 879 772641
April 13,478 14,536 -1058 1119364
May 16,590 13,478 3112 9684544
June 19,790 16,590 3200 10240000
July 17,987 19,790 -1803 3250809
August 18,657 17,987 670 448900
September 19,765 18,657 1108 1227664
October 18,678 19,765 -1087 1181569
November 20,678 18,678 2000 4000000
December 23,675 20,678 2997 8982009
Total 42605309
MSE = 42605309/11 = 3873209.91 ≈ 3873210 The forecast (in $100s) for the next month is 𝑦̂13 = ydec = 23,675. b.
Month Sales ($100s) Forecast Forecast Error Squared Forecast Error
January 12,354
February 13,657 12,354.00 1303.00 1697809.00
March 14,536 13,005.50 1530.50 2342430.25
April 13,478 13,515.67 -37.67 1418.78
May 16,590 13,506.25 3083.75 9509514.06
June 19,790 14,123.00 5667.00 32114889.00
July 17,987 15,067.50 2919.50 8523480.25
August 18,657 15,484.57 3172.43 10064303.04
September 19,765 15,881.13 3883.88 15084485.02
October 18,678 16,312.67 2365.33 5594801.78
November 20,678 16,549.20 4128.80 17046989.44
December 23,675 16,924.55 6750.45 45568636.57
Total 147548757.1847
MSE = 147548757.1847/11 = 13413523.38 Forecast (in $100s) for next month is 𝑦̂13 = (y1 + y2 +… + y11 + y12) / 12 = (12,354 + 13,657 + … + 20,678 + 23,675) / 12 = 17,487.08. c. The most recent value method in part (a) is better because MSE is smaller. Difficulty: Moderate LO: 5.2, Pages 212-217
Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 4. Consider the following time series data: Year 1 2 3 4 5 6 7 8 9 10
Value 234 287 255 310 298 250 302 267 225 336
a. Construct a time series plot. What type of pattern exists in the data? b. Develop a three-year moving average for this time series. Compute MSE and a forecast for the year 11. Answer: a.
[Time series plot: Time Series Value versus Year (t)]
The time series data appear to follow a horizontal pattern. b.
Year 1 2 3 4 5 6 7 8 9 10
Value 234 287 255 310 298 250 302 267 225 336
Forecast 258.67 284.00 287.67 286.00 283.33 273.00 264.67 Total
Forecast Error 51.33 14.00 -37.67 16.00 -16.33 -48.00 71.33
Squared Forecast Error 2635.11 196.00 1418.78 256.00 266.78 2304.00 5088.44 12165.11
MSE = 12165.11/7 = 1737.87 The forecast for year 11 is 𝑦̂11 = (y8 + y9 + y10) / 3 = (267 + 225 + 336) / 3 = 276.00. Difficulty: Easy LO: 5.3, Pages 217-221 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 5. Consider the following time series data:
Year 1 2 3 4 5 6 7 8 9 10
Value 234 287 255 310 298 250 302 267 225 336
a. Use α = 0.2 to compute the exponential smoothing values for the time series. Compute MSE and a forecast for year 11.
b. Use trial and error to find a value of the exponential smoothing coefficient α that results in a smaller MSE than what you calculated for α = 0.2. c. Compute the forecast for year 11 using the smoothing coefficient α selected by trial and error. Answer: a. Smoothing constant α = 0.2
Year 1 2 3 4 5 6 7 8 9 10
Value 234 287 255 310 298 250 302 267 225 336
Forecast 234.00 244.60 246.68 259.34 267.08 263.66 271.33 270.46 261.37 Total
Forecast Error 53.00 10.40 63.32 38.66 -17.08 38.34 -4.33 -45.46 74.63
Squared Forecast Error 2809.00 108.16 4009.42 1494.29 291.56 1469.94 18.73 2066.84 5569.64 17837.58
MSE = 17837.58/9 = 1981.95
The forecast for year 11 is 𝑦̂11 = αy10 + (1 - α)𝑦̂10 = (0.2)(336) + (1 - 0.2)(261.37) = 276.30.
b. Several values of α will yield an MSE smaller than the MSE associated with α = 0.2. The table below shows the resulting MSE from several different α.
α 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
MSE 2285.29 1981.95 1928.71 1978.37 2081.20 2219.57 2387.01 2580.70
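The trial-and-error search over α can be sketched in a few lines. This is an illustrative sketch rather than the method used to build the table: the function name `smooth` and the 0.01 grid spacing are assumptions, and the first forecast is initialized to y1 as in the α = 0.2 table.

```python
# Time series from question 5 (years 1-10).
values = [234, 287, 255, 310, 298, 250, 302, 267, 225, 336]

def smooth(series, alpha):
    """Return (MSE, next-period forecast) for simple exponential smoothing."""
    forecast = series[0]                  # forecast for period 2 is y1
    sq_errors = []
    for y in series[1:]:
        sq_errors.append((y - forecast) ** 2)
        # Update: weight alpha on the newest value, 1 - alpha on the old forecast.
        forecast = alpha * y + (1 - alpha) * forecast
    return sum(sq_errors) / len(sq_errors), forecast

mse_02, forecast_02 = smooth(values, 0.2)
print(f"alpha = 0.2: MSE = {mse_02:.2f}, forecast for year 11 = {forecast_02:.2f}")

# Assumed grid: alpha = 0.01, 0.02, ..., 0.99.
grid = [a / 100 for a in range(1, 100)]
best_alpha = min(grid, key=lambda a: smooth(values, a)[0])
best_mse, best_forecast = smooth(values, best_alpha)
print(f"best alpha on grid = {best_alpha:.2f}, MSE = {best_mse:.2f}")
```

A finer grid or a numerical optimizer would refine the choice of α, but a coarse search like this is usually enough for a hand-checked exercise.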
The value of α that yields the minimum MSE is α = 0.29, which yields an MSE of 1928.08.
α = 0.29
Year 1 2 3 4 5 6 7 8 9 10
Value 234 287 255 310 298 250 302 267 225 336
Forecast 234.00 249.37 251.00 268.11 276.78 269.01 278.58 275.22 260.66 Total
Forecast Error 53.00 5.63 59.00 29.89 -26.78 32.99 -11.58 -50.22 75.34
Squared Forecast Error 2809.00 31.70 3480.68 893.30 717.14 1088.11 134.09 2522.20 5676.53 17352.74
MSE = 17352.74/9 = 1928.08
The forecast for year 11 is 𝑦̂11 = αy10 + (1 - α)𝑦̂10 = (0.29)(336) + (1 - 0.29)(260.66) = 282.51.
Difficulty: Challenging LO: 5.3, Pages 221-225 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
6. The monthly market shares of General Electric Company for 12 consecutive months follow. 21.51, 22.43, 23.02, 23.03, 22.1, 23.37, 23.21, 24.6, 23.31, 23.94, 26.05, 26.65
a. Construct a time series plot. What type of pattern exists in the data? b. Develop three-month and four-month moving averages for this time series. Does the three-month or the four-month moving average provide the better forecasts based on MSE? Explain. c. What is the moving average forecast for the next month? Answer: a.
[Time series plot: Market Shares versus Month (t)]
The data appear to follow a horizontal pattern. b.
Month 1 2 3 4 5 6 7 8 9 10 11 12
Market shares 21.51 22.43 23.02 23.03 22.1 23.37 23.21 24.6 23.31 23.94 26.05 26.65
3-Month Moving Average Forecast 22.32 22.83 22.72 22.83 22.89 23.73 23.71 23.95 24.43 Total
Error 0.71 -0.73 0.65 0.38 1.71 -0.42 0.23 2.10 2.22
(Error)2 0.50 0.53 0.43 0.14 2.91 0.17 0.05 4.41 4.91 14.07
4-Month Moving Average Forecast 22.50 22.65 22.88 22.93 23.32 23.62 23.77 24.48 Total
Error -0.40 0.72 0.33 1.67 -0.01 0.32 2.29 2.18
(Error)2 0.16 0.53 0.11 2.80 0.00 0.10 5.22 4.73 13.64
MSE (3-Month) = 14.07/ 9 = 1.56 MSE (4-Month) = 13.64/ 8 = 1.71 The 3-Month moving average provides the better forecasts because the MSE for the 3-Month moving average is smaller.
c. Using the 3-Month moving average, the forecast for the next month is 𝑦̂13 = (y10 + y11 + y12) / 3 = (23.94 + 26.05 + 26.65) / 3 = 25.55. Difficulty: Moderate LO: 5.3, Pages 217-221 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 7. The following time series shows the sales of a particular commodity over the past 15 weeks. Week 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Sales 1123 1157 1138 1120 1130 1132 1188 1151 1129 1118 1125 1147 1162 1190 1137
a. Construct a time series plot. What type of pattern exists in the data? b. Use α = 0.3 to develop the exponential smoothing values for the time series and compute the forecast of demand for the next week. c. Use trial and error to find a value of the exponential smoothing coefficient α that results in a relatively small MSE. Answer: a.
[Time series plot: Sales versus Week (t)]
The time series plot shows a horizontal pattern. b. α = 0.3
Week 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Sales 1123 1157 1138 1120 1130 1132 1188 1151 1129 1118 1125 1147 1162 1190 1137
Forecast 1123.00 1133.20 1134.64 1130.25 1130.17 1130.72 1147.91 1148.83 1142.88 1135.42 1132.29 1136.71 1144.29 1158.01 Total
Forecast Error 34.00 4.80 -14.64 -0.25 1.83 57.28 3.09 -19.83 -24.88 -10.42 14.71 25.29 45.71 -21.01
Squared Forecast Error 1156.00 23.04 214.33 0.06 3.34 3280.82 9.58 393.37 619.19 108.54 216.30 639.84 2089.08 441.23 9194.72
MSE = 9194.72/14 = 656.77
The forecast for week 16 is 𝑦̂16 = αy15 + (1- α)𝑦̂15 = 0.3(1137) + 0.7(1158.01) = 1151.70 c. MSE values for exponential smoothing forecasts with several different values of α appear below. α 0.05 0.1 0.2 0.21 0.3 0.4 0.5
MSE 767.11 686.51 646.06 645.92 656.77 679.49 703.75
The value of α that yields the smallest possible MSE is α = 0.21, which yields an MSE of 645.92.
α = 0.21
Week 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Sales 1123 1157 1138 1120 1130 1132 1188 1151 1129 1118 1125 1147 1162 1190 1137
Forecast 1123.00 1130.14 1131.79 1129.31 1129.46 1129.99 1142.17 1144.03 1140.87 1136.07 1133.74 1136.53 1141.88 1151.98 Total
Forecast Error 34.00 7.86 -11.79 0.69 2.54 58.01 8.83 -15.03 -22.87 -11.07 13.26 25.47 48.12 -14.98
Squared Forecast Error 1156.00 61.78 139.02 0.47 6.46 3364.90 77.90 225.82 523.11 122.51 175.72 648.83 2315.82 224.49 9042.83
MSE = 9042.83/14 = 645.92
Difficulty: Challenging LO: 5.3, Pages 221-225 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
8. The following time series shows the demand for a particular product over the past 10 months.
Month 1 2 3 4 5 6 7 8 9 10
Demand 324 311 303 314 323 313 302 315 312 326
a. Construct a time series plot. What type of pattern exists in the data? b. Develop a three-month moving average for this time series. Compute MSE and a forecast for month 11. Answer: a.
[Time series plot: Demand versus Month (t)]
The data appear to follow a horizontal pattern. b.
Month 1 2 3 4 5 6 7 8 9 10
Demand 324 311 303 314 323 313 302 315 312 326
Forecast 312.67 309.33 313.33 316.67 312.67 310.00 309.67 Total
Forecast Error 1.33 13.67 -0.33 -14.67 2.33 2.00 16.33
Squared Forecast Error 1.78 186.78 0.11 215.11 5.44 4.00 266.78 680.00
MSE = 680.00/7 = 97.14
The forecast for month 11 is 𝑦̂11 = (y8 + y9 + y10) / 3 = (315 + 312 + 326) / 3 = 317.67.
Difficulty: Easy LO: 5.3, Pages 217-221 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
9. The following time series shows the demand for a particular product over the past 10 months.
Month 1 2 3 4 5 6 7 8 9 10
Value 324 311 303 314 323 313 302 315 312 326
a. Use α = 0.2 to compute the exponential smoothing values for the time series. Compute MSE and a forecast for month 11. b. Compare the three-month moving average forecast with the exponential smoothing forecast using α = 0.2. Which appears to provide the better forecast based on MSE? Answer: a. Smoothing constant α = 0.2
Month 1 2 3 4 5 6 7 8 9 10
Value 324 311 303 314 323 313 302 315 312 326
Forecast 324.00 321.40 317.72 316.98 318.18 317.14 314.12 314.29 313.83
Forecast Error -13.00 -18.40 -3.72 6.02 -5.18 -15.14 0.88 -2.29 12.17
Squared Forecast Error 169.00 338.56 13.84 36.29 26.84 229.36 0.78 5.26 148.01 Total = 967.94
MSE = 967.94/9 = 107.55
The forecast for month 11 is 𝑦̂11 = αy10 + (1 - α)𝑦̂10 = 0.2(326) + (1 - 0.2)313.83 = 316.27.
b. Comparing the MSE for three-month moving average (calculated in the previous problem) and the MSE for exponential smoothing, the three-month moving average provides a better forecast as it has a smaller MSE.
Difficulty: Moderate LO: 5.3, Pages 217-225 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
10. The following data shows the quarterly profit (in thousands of dollars) made by a particular company in the past 3 years.
Year Quarter Profit ($1000s)
1 1 45
1 2 51
1 3 72
1 4 50
2 1 49
2 2 45
2 3 79
2 4 54
3 1 42
3 2 58
3 3 70
3 4 56
a. Construct a time series plot. What type of pattern exists in the data? b. Develop a three-period moving average for this time series. Compute MSE and a forecast of profit (in $1000s) for the next quarter. Answer: Rewrite the data as below:
t 1 2 3 4 5 6 7 8 9 10 11 12
Profit ($1000s) 45 51 72 50 49 45 79 54 42 58 70 56
a.
[Time series plot: Profit ($1000s) versus t]
The data appear to follow a horizontal pattern. b.
Year 1 1 1 1 2 2 2 2 3 3 3 3
Quarter 1 2 3 4 1 2 3 4 1 2 3 4
t 1 2 3 4 5 6 7 8 9 10 11 12
Profit ($1000s) 45 51 72 50 49 45 79 54 42 58 70 56
Forecast 56.00 57.67 57.00 48.00 57.67 59.33 58.33 51.33 56.67
Forecast Error -6.00 -8.67 -12.00 31.00 -3.67 -17.33 -0.33 18.67 -0.67
Squared Forecast Error 36.00 75.11 144.00 961.00 13.44 300.44 0.11 348.44 0.44 Total = 1879.00
MSE = 1879/9 = 208.78 The forecast of profit (in $1000s) for the next quarter is 𝑦̂13 = (y10 + y11 + y12) / 3 = (58 + 70 + 56) / 3 = 61.33. Difficulty: Easy LO: 5.3, Pages 217-221 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
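The three-period moving average in question 10 can be checked with a small helper. This is a minimal sketch; the function name `moving_average` and its return shape are assumptions introduced for illustration.

```python
# Quarterly profit ($1000s) from question 10, t = 1..12.
profit = [45, 51, 72, 50, 49, 45, 79, 54, 42, 58, 70, 56]

def moving_average(series, k):
    """Return (MSE, next-period forecast) for a k-period moving average."""
    sq_errors = []
    for t in range(k, len(series)):
        forecast = sum(series[t - k:t]) / k    # mean of the k most recent values
        sq_errors.append((series[t] - forecast) ** 2)
    mse = sum(sq_errors) / len(sq_errors)
    next_forecast = sum(series[-k:]) / k       # forecast one period past the data
    return mse, next_forecast

ma_mse, ma_forecast = moving_average(profit, 3)
print(f"MSE = {ma_mse:.2f}, forecast for quarter 13 = {ma_forecast:.2f}")
```

Calling `moving_average(series, 4)` on another series gives the four-period variant, which is how the three-month and four-month averages in question 6 could be compared.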
11. The following data shows the quarterly profit (in thousands of dollars) made by a particular company in the past 3 years.
Year Quarter Profit ($1000s)
1 1 45
1 2 51
1 3 72
1 4 50
2 1 49
2 2 45
2 3 79
2 4 54
3 1 42
3 2 58
3 3 70
3 4 56
a. Use α = 0.3 to compute the exponential smoothing values for the time series. Compute MSE and the forecast of profit (in $1000s) for the next quarter. b. Compare the three-period moving average forecast with the exponential smoothing forecast using α = 0.3. Which appears to provide the better forecast based on MSE? Answer: a. Smoothing constant α = 0.3
Year 1 1 1 1 2 2 2 2 3 3 3 3
Quarter 1 2 3 4 1 2 3 4 1 2 3 4
t 1 2 3 4 5 6 7 8 9 10 11 12
Profit ($1000s) 45 51 72 50 49 45 79 54 42 58 70 56
Forecast 45.000 46.800 54.360 53.052 51.836 49.785 58.550 57.185 52.629 54.241 58.968 Total
Forecast Error 6.000 25.200 -4.360 -4.052 -6.836 29.215 -4.550 -15.185 5.371 15.759 -2.968
Squared Forecast Error 36.000 635.040 19.010 16.419 46.736 853.488 20.701 230.581 28.843 248.359 8.811 2143.988
MSE = 2143.988/11 = 194.91 The forecast of profit (in $1000s) for quarter 13 is 𝑦̂13 = αy12 + (1 - α)𝑦̂12 = 0.3(56) + (1 - 0.3)58.968 = 58.08. b. Compared to the three-period moving average forecast (calculated in the previous problem), the exponential smoothing forecast is better because it has a smaller MSE.
Difficulty: Moderate LO: 5.3, Pages 217-225 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 12. The below time series gives the indices of Industrial Production in U.S for 10 consecutive years. Year 1 2 3 4 5 6 7 8 9 10
IP 79.62 86.54 88.14 89.23 93.45 97.4 99.34 96.98 100.22 103.56
a. Construct a time series plot. What type of pattern exists in the data? b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series. c. What is the forecast for t = 11? Answer: a.
[Time series plot: IP versus Year (t)]
The time series plot shows a linear trend.
b. Excel output:
From the above output, the regression estimates for the y-intercept and slope that minimize MSE for this time series are b0 = 80.458 and b1 = 2.36, which result in the following forecasts, errors, and MSE:
Year 1 2 3 4 5 6 7 8 9 10
IP 79.62 86.54 88.14 89.23 93.45 97.4 99.34 96.98 100.22 103.56
Forecast 82.81981818 85.18163636 87.54345455 89.90527273 92.26709091 94.62890909 96.99072727 99.35254545 101.71436364 104.07618182 Total
Forecast Error -3.200 1.358 0.597 -0.675 1.183 2.771 2.349 -2.373 -1.494 -0.516
Squared Forecast Error 10.239 1.845 0.356 0.456 1.399 7.679 5.519 5.629 2.233 0.266 35.622
MSE = 35.622/10 = 3.56.
c. 𝑦̂11 = b0 + b1t = 80.458 + 2.3618(11) = 106.438 (using the unrounded slope b1 ≈ 2.3618).
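The Excel output is not reproduced here, but the estimates b0 and b1 for question 12 can be recovered from the closed-form least-squares formulas. The following is a minimal sketch with assumed variable names; MSE is computed by dividing by n = 10, matching the solution's convention.

```python
# Industrial Production index from question 12, t = 1..10.
ip = [79.62, 86.54, 88.14, 89.23, 93.45, 97.4, 99.34, 96.98, 100.22, 103.56]
t = list(range(1, len(ip) + 1))

n = len(ip)
t_bar = sum(t) / n
y_bar = sum(ip) / n

# Least-squares slope and intercept for the trend line y = b0 + b1 * t.
b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, ip))
      / sum((ti - t_bar) ** 2 for ti in t))
b0 = y_bar - b1 * t_bar

fitted = [b0 + b1 * ti for ti in t]
trend_mse = sum((yi - fi) ** 2 for yi, fi in zip(ip, fitted)) / n
forecast_11 = b0 + b1 * 11

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, MSE = {trend_mse:.2f}")
print(f"Forecast for t = 11: {forecast_11:.3f}")
```

Using the unrounded slope is what gives 106.438; plugging the rounded value 2.36 into the formula gives a slightly smaller number.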
Difficulty: Moderate LO: 5.4, Pages 226-228 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 13. The monthly sales (in hundreds of dollars) of a company are listed below. Month January February March April May June July August September October November December
Sales ($100s) 12,354 13,657 14,536 13,478 16,590 19,790 17,987 18,657 19,765 18,678 20,678 23,675
a. Construct a time series plot. What type of pattern exists in the data? b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series. c. What is the sales forecast (in hundreds of dollars) for next month? Answer: a.
[Time series plot: Sales ($100s) versus Month (t)]
The time series plot shows a linear trend. b. Excel output:
From the above output, the regression estimates for the y-intercept and slope that minimize MSE for this time series are b0 = 11747.38 and b1 = 883.03, which result in the following forecasts, errors, and MSE:
Month January February March April May June July August September October November December
t 1 2 3 4 5 6 7 8 9 10 11 12
Sales ($100s) 12,354 13,657 14,536 13,478 16,590 19,790 17,987 18,657 19,765 18,678 20,678 23,675
Forecast 12630.41026 13513.44172 14396.47319 15279.50466 16162.53613 17045.56760 17928.59907 18811.63054 19694.66200 20577.69347 21460.72494 22343.75641
Forecast Error -276.410 143.558 139.527 -1801.505 427.464 2744.432 58.401 -154.631 70.338 -1899.693 -782.725 1331.244
Squared Forecast Error 76402.630 20608.978 19467.730 3245419.047 182725.360 7531909.203 3410.669 23910.603 4947.434 3608835.292 612658.334 1772209.495 Total = 17102504.775
MSE = 17102504.775/12 = 1425208.73. c. The forecast (in $100s) is 𝑦̂13 = b0 + b1t = 11747.38 + 883.03(13) = 23,226.79.
Difficulty: Moderate LO: 5.4, Pages 226-228 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 14. Consider the following time series: t 1 2 3 4 5 6 7 8
yt 1234 1201 1103 987 945 891 817 734
a. Construct a time series plot. What type of pattern exists in the data? b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series. c. What is the forecast for t = 9? Answer: a.
[Time series plot: yt versus t]
The data appear to show a linear trend with a decreasing pattern.
b. Excel output:
The regression estimates for the y-intercept and slope that minimize MSE for the given time series are b0 = 1315.68 and b1 = -72.60, which result in the following forecasts, errors, and MSE:
t 1 2 3 4 5 6 7 8
yt 1234 1201 1103 987 945 891 817 734
Forecast 1243.083333 1170.488095 1097.892857 1025.297619 952.702381 880.1071429 807.5119048 734.9166667
Forecast Error -9.083 30.512 5.107 -38.298 -7.702 10.893 9.488 -0.917
Squared Forecast Error 82.507 930.976 26.083 1466.708 59.327 118.654 90.024 0.840 Total = 2775.119
MSE = 2775.119/8 = 346.89. c. 𝑦̂9 = b0 + b1t = 1315.68 - 72.60(9) = 662.32 Difficulty: Moderate LO: 5.4, Pages 226-228 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 15.
The yearly sales (in millions of dollars) of an automobile manufacturing company during the period 2000-2011 are given below:
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
Sales ($millions) y 470 485 499 515 532 532 556 576 583 587 601 605
a. Construct a time series plot. What type of pattern exists in the data? b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series. c. What is the sales forecast (in millions of dollars) for the year 2012? Answer: a.
[Time series plot: Sales (in millions of dollars) versus Year (t)]
The time series plot shows a linear trend. b. Excel output:
The regression estimates for the y-intercept and slope that minimize MSE for the given time series are b0 = -24986.47 and b1 = 12.73, which result in the following forecasts, errors, and MSE:
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
Sales ($millions) y 470 485 499 515 532 532 556 576 583 587 601 605
Forecast 475.0641026 487.7948718 500.525641 513.2564103 525.9871795 538.7179487 551.4487179 564.1794872 576.9102564 589.6410256 602.3717949 615.1025641
Forecast Error -5.064 -2.795 -1.526 1.744 6.013 -6.718 4.551 11.821 6.090 -2.641 -1.372 -10.103
Squared Forecast Error 25.645 7.811 2.328 3.040 36.154 45.131 20.714 139.725 37.085 6.975 1.882 102.062 Total = 428.551
MSE = 428.551/12 = 35.713. c. The sales forecast (in millions of dollars) for the year 2012: 𝑦̂2012 = b0 + b1t = -24986.47 + 12.73(2012) = 627.83.
Difficulty: Moderate LO: 5.4, Pages 226-228 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
16. Consider the following time series data:
t 1 2 3 4 5 6 7 8 9 10
yt 0.345 0.366 0.398 0.356 0.456 0.478 0.543 0.596 0.634 0.698
a. Construct a time series plot. What type of pattern exists in the data? b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series. c. What is the forecast for t = 11? Answer: a.
[Time series plot: yt versus t]
This time series plot shows an upward linear trend. b. Excel output:
The regression estimates that minimize MSE for this time series are b0 = 0.2661 and b1 = 0.0402, which result in the following forecasts, errors, and MSE:
t 1 2 3 4 5 6 7 8 9 10
yt 0.345 0.366 0.398 0.356 0.456 0.478 0.543 0.596 0.634 0.698
Forecast 0.306290909 0.346448485 0.386606061 0.426763636 0.466921212 0.507078788 0.547236364 0.587393939 0.627551515 0.667709091
Forecast Error 0.039 0.020 0.011 -0.071 -0.011 -0.029 -0.004 0.009 0.006 0.030
Squared Forecast Error 0.001 0.000 0.000 0.005 0.000 0.001 0.000 0.000 0.000 0.001 Total = 0.009
MSE = 0.009/10 = 0.0009. c. The forecast for t = 11 is 𝑦̂11 = b0 + b1t = 0.2661 + 0.0402(11) = 0.708. Difficulty: Moderate LO: 5.4, Pages 226-228 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 17. Consider the following quarterly time series:
Quarter 1 2 3 4
Year 1 923 1056 1124 992
Year 2 1112 1156 1124 1078
Year 3 1243 1301 1254 1198
a. Construct a time series plot. What type of pattern exists in the data? b. Use a multiple regression model with dummy variables as follows to develop an equation to account for seasonal effects in the data. Qtr1 = 1 if quarter 1, 0 otherwise; Qtr2 = 1 if quarter 2, 0 otherwise; Qtr3 = 1 if quarter 3, 0 otherwise. c. Compute the quarterly forecasts for next year based on the model developed in part b. Answer: a.
[Time series plot: Time Series Values versus Quarter (t)]
The above time series plot reveals a horizontal pattern with a seasonal pattern in the data. For instance, in each year the value increases from quarter 1 to quarter 2 and drops from quarter 3 to quarter 4. b. Rewrite the data using the dummy variables in the following format:
Year 1 1 1 1 2 2 2 2 3 3 3 3
Quarter 1 2 3 4 1 2 3 4 1 2 3 4
Qtr1 1 0 0 0 1 0 0 0 1 0 0 0
Qtr2 0 1 0 0 0 1 0 0 0 1 0 0
Qtr3 0 0 1 0 0 0 1 0 0 0 1 0
Time Series Value, yt 923 1056 1124 992 1112 1156 1124 1078 1243 1301 1254 1198
We can use Excel’s Regression tool to find the regression model that accounts for the seasonal effects in the data. Excel output:
From the above output, the regression model that minimizes MSE for the given time series is: 𝑦̂𝑡 = 1089.33 + 3.33Qtr1 + 81.67Qtr2 + 78Qtr3 c. Based on the model in part (b), the quarterly forecasts for next year are as follows: Quarter 1 forecast = 1089.33 + 3.33(1) + 81.67(0) + 78(0) = 1092.67 Quarter 2 forecast = 1089.33 + 3.33(0) + 81.67(1) + 78(0) = 1171.00 Quarter 3 forecast = 1089.33 + 3.33(0) + 81.67(0) + 78(1) = 1167.33 Quarter 4 forecast = 1089.33 + 3.33(0) + 81.67(0) + 78(0) = 1089.33 Difficulty: Challenging LO: 5.4, Pages 228-230
Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 18. Consider the following time series: Quarter 1 2 3 4
Year 1 923 1056 1124 992
Year 2 1112 1156 1124 1078
Year 3 1243 1301 1254 1198
a. Use a multiple regression model to develop an equation to account for linear trend and seasonal effects in the data. To capture seasonal effects, use the dummy variables Qtr1 = 1 if quarter 1, 0 otherwise; Qtr2 = 1 if quarter 2, 0 otherwise; Qtr3 = 1 if quarter 3, 0 otherwise; and create a variable t such that t = 1 for quarter 1 in year 1, t = 2 for quarter 2 in year 1, … ,t = 12 for quarter 4 in year 3. b. Compute the quarterly forecasts for next year based on the model developed in part a. Answer: a. Rewrite the data using dummy variables and variable t in the following format: Year 1 1 1 1 2 2 2 2 3 3 3 3
Quarter 1 2 3 4 1 2 3 4 1 2 3 4
Qtr1 1 0 0 0 1 0 0 0 1 0 0 0
Qtr2 0 1 0 0 0 1 0 0 0 1 0 0
Qtr3 0 0 1 0 0 0 1 0 0 0 1 0
t 1 2 3 4 5 6 7 8 9 10 11 12
yt 923 1056 1124 992 1112 1156 1124 1078 1243 1301 1254 1198
Use Excel’s Regression tool to find the regression model that accounts for both the trend and seasonal effects in the data. Excel output:
The regression model that minimizes MSE for the given time series is: 𝑦̂𝑡 = 864.08 + 87.80Qtr1 + 137.98Qtr2 + 106.16Qtr3 + 28.16t b. Based on the model in part (a), the quarterly forecasts for next year are as follows: Quarter 1 forecast = 864.08 + 87.80(1) + 137.98(0) + 106.16(0) + 28.16(13) = 1317.92 Quarter 2 forecast = 864.08 + 87.80(0) + 137.98(1) + 106.16(0) + 28.16(14) = 1396.25 Quarter 3 forecast = 864.08 + 87.80(0) + 137.98(0) + 106.16(1) + 28.16(15) = 1392.58 Quarter 4 forecast = 864.08 + 87.80(0) + 137.98(0) + 106.16(0) + 28.16(16) = 1314.58 Difficulty: Moderate LO: 5.4, Pages 230-231 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 19. The following table shows the average monthly distance traveled (in billion miles) by vehicles on urban highways for five different years.
Urban Highways - Average Monthly Distance Traveled by Vehicles (Billion Miles)
Years Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
Year 1 4.22 5.32 5.21 5.12 4.92 4.49 4.55 4.49 4.44 4.39 4.37 4.35
Year 2 4.31 5.44 5.34 5.24 4.98 4.59 4.68 4.65 4.61 4.68 4.74 4.79
Year 3 4.38 5.51 5.41 5.36 4.98 4.63 4.71 4.78 4.82 4.88 4.85 4.89
Year 4 4.45 5.59 5.5 5.41 5.01 4.72 4.78 4.79 4.82 4.92 5.06 5.11
Year 5 4.51 5.65 5.62 5.49 5.12 4.8 4.88 4.82 4.95 5.12 5.22 5.44
a. Construct a time series plot. What type of pattern exists in the data? b. Use a multiple regression model with dummy variables as follows to develop an equation to account for seasonal effects in the data. Jan = 1 if Month is January, 0 otherwise; Feb = 1 if month is February, 0 otherwise; …; Nov = 1 if month is November, 0 otherwise. c. Compute the forecast (in billion miles) for next three months based on the model developed in part b. Answer: a.
[Time series plot: Distance Traveled (in Billion Miles) versus Month (t)]
This time series plot shows a horizontal pattern in the data; however, there is a seasonal pattern as well. For instance, the lowest value occurs in January and the highest in February. b. Rewrite the data with dummy variables in the following format:
yt 4.22 5.32 5.21 5.12 4.92 4.49 4.55 4.49 4.44 4.39 4.37 4.35 4.31 5.44 5.34 5.24 4.98 4.59 4.68 4.65 4.61 4.68 4.74 4.79 4.38 5.51 5.41 5.36 4.98 4.63 4.71 4.78 4.82 4.88 4.85 4.89 4.45 5.59 5.5 5.41 5.01 4.72 4.78 4.79 4.82 4.92 5.06 5.11 4.51 5.65 5.62 5.49 5.12 4.8 4.88 4.82 4.95 5.12 5.22 5.44
Jan 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Feb 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mar 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Apr 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
May 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
June 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
July 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Aug 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Sep 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
Oct 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Nov 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
Use Excel’s Regression tool to find the regression model that accounts for the seasonal effects in the data. Excel output:
From the above output, the regression model that minimizes MSE for this time series is: 𝑦̂𝑡 = 4.916 – 0.542Jan + 0.586Feb + 0.5Mar + 0.408Apr + 0.086May – 0.27June – 0.196July – 0.21Aug – 0.188Sep – 0.118Oct – 0.068Nov c. Based on the model in part (b), the monthly forecasts for the next three months are as follows:
Year 6, January forecast (in billion miles) = 4.916 – 0.542(1) +0.586(0) +0.5(0) + 0.408(0) + 0.086(0) – 0.27(0) – 0.196(0) – 0.21(0) – 0.188(0) – 0.118(0) – 0.068(0) = 4.374. Year 6, February forecast (in billion miles) = 4.916 – 0.542(0) +0.586(1) +0.5(0) + 0.408(0) + 0.086(0) – 0.27(0) – 0.196(0) – 0.21(0) – 0.188(0) – 0.118(0) – 0.068(0) = 5.502. Year 6, March forecast (in billion miles) = 4.916 – 0.542(0) +0.586(0) +0.5(1) + 0.408(0) + 0.086(0) – 0.27(0) – 0.196(0) – 0.21(0) – 0.188(0) – 0.118(0) – 0.068(0) = 5.416. Difficulty: Challenging LO: 5.4, Pages 228-230 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data 20. The following table shows the average monthly distance traveled (in billion miles) by vehicles on urban highways for five different years.
Urban Highways - Average Monthly Distance Traveled by Vehicles (Billion Miles)
Years Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
Year 1 4.22 5.32 5.21 5.12 4.92 4.49 4.55 4.49 4.44 4.39 4.37 4.35
Year 2 4.31 5.44 5.34 5.24 4.98 4.59 4.68 4.65 4.61 4.68 4.74 4.79
Year 3 4.38 5.51 5.41 5.36 4.98 4.63 4.71 4.78 4.82 4.88 4.85 4.89
Year 4 4.45 5.59 5.5 5.41 5.01 4.72 4.78 4.79 4.82 4.92 5.06 5.11
Year 5 4.51 5.65 5.62 5.49 5.12 4.8 4.88 4.82 4.95 5.12 5.22 5.44
a. Use a multiple regression model to develop an equation to account for seasonal effects and any linear trend in the data. To capture seasonal effects, use the dummy variables Jan = 1 if month is January, 0 otherwise; Feb = 1 if month is February, 0 otherwise; …; Nov = 1 if month is November, 0 otherwise; and create a variable t such that t = 1 for January of year 1, t = 2 for February of year 1, …, t = 60 for December of year 5. b. Compute the forecast (in billion miles) for next three months based on the model developed in part a. Answer: a. Rewrite the data using the dummy variables and the variable t in the following format:
yt 4.22 5.32 5.21 5.12 4.92 4.49 4.55 4.49 4.44 4.39 4.37 4.35 4.31 5.44 5.34 5.24 4.98 4.59 4.68 4.65 4.61 4.68 4.74 4.79 4.38 5.51 5.41 5.36 4.98 4.63 4.71 4.78 4.82 4.88 4.85 4.89 4.45 5.59 5.5 5.41 5.01 4.72 4.78 4.79 4.82 4.92 5.06 5.11 4.51 5.65 5.62 5.49 5.12 4.8 4.88 4.82 4.95 5.12 5.22 5.44
Jan 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Feb 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mar 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Apr 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
May 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
June 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
July 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Aug 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Sep 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
Oct 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Nov 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
We can use Excel’s Regression tool to find the regression model that accounts for both the trend and seasonal effects in the data. Excel output:
From the above output, the regression model that minimizes MSE for this time series is:
𝑦̂𝑡 = 4.576 – 0.438Jan + 0.681Feb + 0.585Mar + 0.484Apr + 0.152May – 0.213June – 0.149July – 0.172Aug – 0.1608Sep – 0.099Oct – 0.059Nov + 0.0095t
b. Based on the model in part (a), the monthly forecasts for the next three months are as follows:
Year 6, January forecast (in billion miles) = 4.576 – 0.438(1) + 0.681(0) + 0.585(0) + 0.484(0) + 0.152(0) – 0.213(0) – 0.149(0) – 0.172(0) – 0.1608(0) – 0.099(0) – 0.059(0) + 0.0095(61) = 4.714.
Year 6, February forecast (in billion miles) = 4.576 – 0.438(0) + 0.681(1) + 0.585(0) + 0.484(0) + 0.152(0) – 0.213(0) – 0.149(0) – 0.172(0) – 0.1608(0) – 0.099(0) – 0.059(0) + 0.0095(62) = 5.842.
Year 6, March forecast (in billion miles) = 4.576 – 0.438(0) + 0.681(0) + 0.585(1) + 0.484(0) + 0.152(0) – 0.213(0) – 0.149(0) – 0.172(0) – 0.1608(0) – 0.099(0) – 0.059(0) + 0.0095(63) = 5.756.
Difficulty: Challenging LO: 5.4, Pages 228-231 Bloom’s: Application BUSPROG: Analytic DISC: Time Series Data
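The solution to question 20 relies on Excel's Regression tool, whose output is not reproduced here. As a rough cross-check, the same trend-plus-seasonal model can be fitted in plain Python by solving the normal equations. This is a sketch under stated assumptions: the helper names (`design_row`, `solve`, `fit_ols`, `forecast`) are hypothetical, and Gaussian elimination stands in for whatever solver Excel uses.

```python
# Question 20 data: one row per year, columns Jan..Dec (billion miles).
data = [
    [4.22, 5.32, 5.21, 5.12, 4.92, 4.49, 4.55, 4.49, 4.44, 4.39, 4.37, 4.35],
    [4.31, 5.44, 5.34, 5.24, 4.98, 4.59, 4.68, 4.65, 4.61, 4.68, 4.74, 4.79],
    [4.38, 5.51, 5.41, 5.36, 4.98, 4.63, 4.71, 4.78, 4.82, 4.88, 4.85, 4.89],
    [4.45, 5.59, 5.50, 5.41, 5.01, 4.72, 4.78, 4.79, 4.82, 4.92, 5.06, 5.11],
    [4.51, 5.65, 5.62, 5.49, 5.12, 4.80, 4.88, 4.82, 4.95, 5.12, 5.22, 5.44],
]

def design_row(t):
    """Regressors for period t: intercept, Jan..Nov dummies, trend t."""
    month = (t - 1) % 12                       # 0 = Jan, ..., 11 = Dec (base)
    dummies = [1.0 if month == m else 0.0 for m in range(11)]
    return [1.0] + dummies + [float(t)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):             # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

y = [v for year in data for v in year]         # t = 1..60 in calendar order
X = [design_row(t) for t in range(1, 61)]
coef = fit_ols(X, y)                           # [b0, Jan, ..., Nov, trend]

def forecast(t):
    return sum(c * x for c, x in zip(coef, design_row(t)))

for t in (61, 62, 63):                         # Jan, Feb, Mar of year 6
    print(t, round(forecast(t), 3))
```

Dropping the trend column from `design_row` would give the seasonal-only model of question 19, in which the intercept equals the December mean and each coefficient is the difference between that month's mean and December's.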
Chapter 6: Data Mining 1. Which of the following reasons is responsible for the increase in the use of data-mining techniques in business? a. The lack of methods to electronically track data b. The dearth of information to analyze and interpret c. The ability to electronically warehouse data d. The ability to manually analyze all the data Answer: C Difficulty: Moderate LO: 6.1, Page 252 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The increase in the use of data-mining techniques in business has been caused largely by three events: the explosion in the amount of data being produced and electronically tracked, the ability to electronically warehouse these data, and the affordability of computer power to analyze the data. 2. Observation refers to the: a. estimated continuous outcome variable. b. set of recorded values of variables associated with a single entity. c. goal of predicting a categorical outcome based on a set of variables. d. mean of all variable values associated with one particular entity. Answer: B Difficulty: Moderate LO: 6.1, Page 252 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An observation is defined as the set of recorded values of variables associated with a single entity. It is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables. 3. _____ is a category of data-mining techniques in which an algorithm learns how to predict or classify an outcome variable of interest. a. Supervised learning b. Unsupervised learning c. Dimension reduction d. Data sampling Answer: A Difficulty: Easy LO: 6.1, Page 252 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: Supervised learning is a category of data-mining techniques in which an algorithm learns how to predict or classify an outcome variable of interest. Linear regression is a well-known supervised learning approach from classical statistics. 4. The estimation of the value for a continuous outcome is done during _____. a. classification b. prediction c. data preparation d. data sampling Answer: B Difficulty: Easy LO: 6.1, Page 252 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The estimation of the value for a continuous outcome is done during prediction. 5. _____ is the process of estimating the value of a categorical outcome variable. a. Sampling b. Prediction c. Classification d. Validation Answer: C Difficulty: Easy LO: 6.1, Page 252 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Classification is the process of estimating the value of a categorical outcome variable. 6. _____ methods do not attempt to predict an output value but are rather used to detect patterns and relationships in the data. a. Supervised learning b. Machine learning c. Artificial intelligence d. Unsupervised learning Answer: D Difficulty: Easy LO: 6.1, Page 252 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Unsupervised learning methods do not attempt to predict an output value but are rather used to detect patterns and relationships in the data.
7. In which of the following data-mining process steps is the data manipulated to make it suitable for formal modeling? a. Data sampling b. Data preparation c. Model construction d. Model assessment Answer: B Difficulty: Easy LO: 6.1, Page 253 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The data preparation step involves manipulation of data to put it in a form suitable for formal modeling. 8. A sample is representative of the entire data population only if it: a. includes all the observations as the original data repository. b. can be used to draw the same conclusions as the database. c. is drawn sequentially from the given database. d. is small enough to be manipulated quickly. Answer: B Difficulty: Moderate LO: 6.1, Page 253 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A sample is representative if the analyst can make the same conclusions from it as from the entire population of data. The sample of data must be large enough to contain significant information, yet small enough to be manipulated quickly. 9. Which of the following methods is used by the analyst to decide if a particular variable needs to be retained in the sample during the sampling process? a. Descriptive statistics and data visualization b. Regression c. Outlier analysis d. Data Testing Answer: A Difficulty: Moderate LO: 6.1, Page 253 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: When obtaining a representative sample, it is also important not to carelessly discard variables from consideration. It is generally best to include as many variables as possible in the sample. After exploring the data with descriptive statistics and visualization, the analyst can eliminate variables that are not of interest.
10. The process of reducing the number of variables to consider in a data-mining approach without losing any crucial information is termed as _____. a. dimension reduction b. data sampling c. data reduction d. aggregation Answer: A Difficulty: Easy LO: 6.2, Page 254 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The process of reducing the number of variables to consider in a data-mining approach without losing any crucial information is termed as dimension reduction. 11. Which of the following is true of unsupervised learning? a. Its objective is to predict the outcome of a variable. b. Its error tolerance is tightly controlled by accuracy measures. c. Qualitative assessments are used to confirm the definite accuracy measures. d. It detects patterns and relationships in the data. Answer: D Difficulty: Moderate LO: 6.3, Page 255 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In an unsupervised learning application, there is no outcome variable to predict; rather, the goal is to use the variable values to identify relationships between observations. Without an explicit outcome variable, there is no definite measure of accuracy. Instead, qualitative assessments, such as how well the results match expert judgment, are used to assess unsupervised learning methods. 12. The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is _____. a. data visualization b. cluster analysis c. market analysis d. supervised learning Answer: B Difficulty: Easy LO: 6.3, Page 256 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: Clustering can be employed during the data preparation step to identify variables or observations that can be aggregated or removed from consideration. Cluster analysis is commonly used in marketing to divide consumers into different homogeneous groups, a process known as market segmentation. 13. Which of the following is true of hierarchical clustering? a. All observations are put in a mega-cluster to begin with. b. Each of the large clusters is broken down iteratively. c. It is a bottom-up approach to clustering. d. At the end of the process, observations in the same cluster have maximum distance. Answer: C Difficulty: Moderate LO: 6.3, Page 256 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Bottom-up hierarchical clustering starts with each observation belonging to its own cluster and then sequentially merges the most similar clusters to create a series of nested clusters. 14. k-means clustering is the process of: a. agglomerating observations into a series of nested groups based on a measure of similarity. b. organizing observations into one of a number of groups based on a measure of similarity. c. reducing the number of variables to consider in a data-mining approach. d. estimating the value of a continuous outcome variable. Answer: B Difficulty: Easy LO: 6.3, Page 256 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: k-means clustering is the process of organizing observations into one of k groups based on a measure of similarity. 15. Which of the following is true of Euclidean distances? a. It is used to measure dissimilarity between categorical variable observations. b. It is not affected by the scale on which variables are measured. c. It increases with the increase in similarity between variable values. d. It is susceptible to distortions from outlier measurements. Answer: D Difficulty: Moderate LO: 6.3, Pages 256-257 Bloom’s: Comprehension BUSPROG: Analytic DISC:
Feedback: Euclidean distance is the geometric measure of dissimilarity between observations based on the Pythagorean theorem. It is susceptible to distortions from outlier measurements. 16. The simplest measure of similarity between observations consisting solely of categorical variables is given by _____. a. the Euclidean distance b. the Ward’s distance c. matching coefficient d. Jaccard’s coefficient Answer: C Difficulty: Easy LO: 6.3, Page 257 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: When clustering observations solely on the basis of categorical variables encoded as 0–1 (or dummy variables), a better measure of similarity between two observations can be achieved by counting the number of variables with matching values. The simplest overlap measure is called the matching coefficient. 17. Jaccard’s coefficient is different from the matching coefficient in that the: a. former measures overlap while the latter measures dissimilarity. b. former does not count matching zero entries while the latter does. c. former deals with categorical variables while the latter deals with continuous variables. d. former is affected by the scale used to measure variables while the latter is not. Answer: B Difficulty: Moderate LO: 6.3, Page 258 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Jaccard’s coefficient is a measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries. 18. Single linkage is a measure of calculating dissimilarity between clusters by: a. considering only the two most dissimilar observations in the two clusters. b. computing the average dissimilarity between every pair of observations between the two clusters. c. considering only the two closest observations in the two clusters. d. considering the distance between the cluster centroids. Answer: C Difficulty: Moderate LO: 6.3, Page 258 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: Single linkage is a measure of calculating dissimilarity between clusters by considering only the two closest observations in the two clusters. 19. _____ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters. a. Single linkage b. Complete linkage c. Average linkage d. Average group linkage Answer: B Difficulty: Easy LO: 6.3, Page 259 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Complete linkage is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters. 20. Average linkage is a measure of calculating dissimilarity between clusters by: a. considering only the two most dissimilar observations in the two clusters. b. computing the average dissimilarity between every pair of observations between the two clusters. c. considering only the two closest observations in the two clusters. d. considering the distance between the cluster centroids. Answer: B Difficulty: Easy LO: 6.3, Page 259 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Average linkage is a measure of calculating dissimilarity between clusters by computing the average dissimilarity between every pair of observations between the two clusters. 21. _____ is a measure of calculating dissimilarity between clusters by considering the distance between the cluster centroids. a. Single linkage b. Complete linkage c. Average linkage d. Average group linkage Answer: D Difficulty: Easy LO: 6.3, Page 259 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: Average group linkage is a measure of calculating dissimilarity between clusters by considering the distance between the cluster centroids. 22. _____ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation. a. Single linkage b. Ward’s method c. Average group linkage d. Dendrogram Answer: B Difficulty: Easy LO: 6.3, Page 259 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Ward’s method is a procedure that can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation. 23. A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____. a. dendrogram b. scatter chart c. decile-wise lift chart d. cumulative lift tree Answer: A Difficulty: Easy LO: 6.3, Pages 260-261 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a dendrogram. 24. A cluster’s _____ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram. a. dimension b. affordability c. durability d. span Answer: C Difficulty: Moderate LO: 6.3, Page 262 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: A cluster’s durability (or strength) can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram. 25. The endpoint of a k-means clustering algorithm occurs when: a. Euclidean distance between clusters is minimum. b. Euclidean distance between observations in a cluster is maximum. c. no further changes are observed in cluster structure and number. d. all of the observations are encompassed within a single large cluster with mean k. Answer: C Difficulty: Challenging LO: 6.3, Page 262 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The k-means algorithm repeats the process (calculate cluster centroid, assign observation to cluster with nearest centroid) until there is no change in the clusters or a specified maximum number of iterations is reached. 26. In which of the following scenarios would it be appropriate to use hierarchical clustering? a. When the number of observations in the dataset is relatively high b. When it is not necessary to know the nesting of clusters c. When the number of clusters is known beforehand d. When binary or ordinal data needs to be clustered Answer: D Difficulty: Challenging LO: 6.3, Page 265 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: If one has a small data set and wants to easily examine solutions with increasing numbers of clusters, one may want to use hierarchical clustering. Hierarchical clusters are also convenient if one wants to observe how clusters are nested. k-means clustering partitions the observations, which is appropriate if trying to summarize the data with k “average” observations that describe the data with the minimum amount of error. Because Euclidean distance is the standard metric for k-means clustering, it is generally not as appropriate for binary or ordinal data for which an “average” is not meaningful.
27. An analysis of items frequently co-occurring in transactions is known as _____. a. market segmentation b. market basket analysis c. regression analysis d. cluster analysis Answer: B Difficulty: Easy LO: 6.3, Page 265 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An analysis of items frequently co-occurring in transactions is known as market basket analysis. 28. A _____ refers to the number of times that a collection of items occur together in a transaction data set. a. test set b. validation count c. support count d. training set Answer: C Difficulty: Easy LO: 6.3, Page 266 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The number of times that a collection of items occur together in a transaction data set is known as the support count. 29. The lift ratio of an association rule with a confidence value of 0.43 and in which the consequent occurs in 6 out of 10 cases is: a. 1.40 b. 0.54 c. 1.00 d. 0.72 Answer: D Difficulty: Moderate LO: 6.3, Page 267 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The lift ratio is given by confidence / (support of consequent / total number of transactions). Substituting the given values gives 0.43 / (6/10) ≈ 0.72. 30. Which of the following is a commonly used supervised learning method? a. k-means clustering b. k-nearest neighbors c. hierarchical clustering d. association rule development Answer: B Difficulty: Easy LO: 6.4, Page 269 Bloom’s: Knowledge BUSPROG: Analytic
DISC: Feedback: There are three commonly used supervised learning methods: k-nearest neighbors, classification and regression trees, and logistic regression. 31. _____ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data. a. Underfitting b. Overfitting c. Oversampling d. Undersampling Answer: B Difficulty: Easy LO: 6.4, Page 270 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Overfitting occurs when the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data. 32. Test set is the data set used to: a. build the data mining model. b. estimate accuracy of candidate models on unseen data. c. estimate accuracy of final model on unseen data. d. show counts of actual versus predicted class values. Answer: C Difficulty: Moderate LO: 6.4, Page 270 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Test set is the data set used to estimate accuracy of final model on unseen data. 33. One minus the overall error rate is often referred to as the _____ of the model. a. sensitivity b. accuracy c. specificity d. cutoff value Answer: B Difficulty: Easy LO: 6.4, Page 273 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The percentage of misclassified observations is expressed as the overall error rate. One minus the overall error rate is often referred to as the accuracy of the model.
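Two of the measures in this block of questions reduce to short arithmetic: the lift ratio from question 29 and the accuracy (one minus the overall error rate) from question 33. The sketch below is only illustrative; the function names and the confusion-matrix counts are hypothetical and do not come from the text.

```python
def lift_ratio(confidence: float, consequent_count: int, total_transactions: int) -> float:
    """Confidence divided by the share of transactions containing the consequent."""
    consequent_support = consequent_count / total_transactions
    return confidence / consequent_support


def accuracy(confusion: dict) -> float:
    """One minus the overall error rate; confusion[actual][predicted] holds counts."""
    total = sum(sum(row.values()) for row in confusion.values())
    misclassified = sum(
        count
        for actual, row in confusion.items()
        for predicted, count in row.items()
        if predicted != actual
    )
    overall_error_rate = misclassified / total
    return 1 - overall_error_rate


# Question 29: confidence 0.43, consequent in 6 of 10 transactions.
print(round(lift_ratio(0.43, 6, 10), 2))

# Hypothetical confusion matrix (made-up counts, for illustration only).
example = {1: {1: 40, 0: 10}, 0: {1: 5, 0: 45}}
print(round(accuracy(example), 2))
```

A lift ratio below 1, as in question 29, indicates that the rule identifies the consequent less often than picking transactions at random would.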
34. An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed as a(n) _____. a. false negative b. false positive c. residual d. outlier Answer: B Difficulty: Easy LO: 6.4, Page 273 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed as a false positive. 35. Separate error rates with respect to the false negative and false positive cases are computed to take into account the: a. asymmetric costs in misclassification. b. symmetric weights of these two cases. c. distortions due to outliers. d. effect of sampling error. Answer: A Difficulty: Challenging LO: 6.4, Page 274 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Separate error rates with respect to the false negative and false positive cases are computed to take into account the asymmetric costs in misclassification. 36. In the k-nearest neighbors method, when the value of k is set to 1: a. the classification or prediction of a new observation is based solely on the single most similar observation from the training set. b. the new observation’s class is naïvely assigned to the most common class in the training set. c. the new observation’s prediction is used to estimate the anticipated error rate on future data over the entire training set. d. the classification or prediction of a new observation is subject to the smallest possible classification error. Answer: A Difficulty: Moderate LO: 6.4, Pages 277-278 Bloom’s: Comprehension BUSPROG: Analytic DISC:
Feedback: In the k-nearest neighbors method, when the value of k is set to 1, the classification or prediction of a new observation is based solely on the single most similar observation from the training set. 37. _____ is a measure of the heterogeneity of observations in a classification tree. a. Sensitivity b. Specificity c. Accuracy d. Impurity Answer: D Difficulty: Easy LO: 6.4, Page 283 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Impurity is a measure of the heterogeneity of observations in a classification tree. 38. A _____ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules. a. regression tree b. scatter chart c. classification tree d. classification confusion matrix Answer: C Difficulty: Easy LO: 6.4, Page 283 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A classification tree classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules. 39. The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for _____. a. regression trees b. time-series plots c. classification trees d. cumulative lift charts Answer: A Difficulty: Easy LO: 6.4, Page 283 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: For regression trees, the impurity of a group of observations is based on the variance of the outcome value for the observations in the group.
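The variance-based impurity described for regression trees in question 39 can be illustrated with a small sketch. It uses the satisfaction ratings and wait times of customers 1 to 10 from the problem data later in this chapter. The 5-minute split point, the use of population variance, and the size-weighted average of the two groups are assumptions made for illustration, not details given in the text.

```python
from statistics import pvariance

# Customers 1-10 from the chapter's problem data.
wait_time = [2.3, 2.8, 3.2, 3.4, 3.4, 4.2, 3.2, 1.4, 6.4, 7.8]
rating = [7, 6, 5, 5, 6, 4, 5, 8, 3, 4]


def impurity(outcomes):
    """Impurity of a group, taken here as the population variance of the outcome."""
    return pvariance(outcomes)


def split_impurity(left, right):
    """Size-weighted average impurity of the two groups produced by a split."""
    n = len(left) + len(right)
    return (len(left) * impurity(left) + len(right) * impurity(right)) / n


# Hypothetical split rule: wait time of at most 5 minutes versus more than 5.
left = [r for w, r in zip(wait_time, rating) if w <= 5]
right = [r for w, r in zip(wait_time, rating) if w > 5]

print("impurity before split:", round(impurity(rating), 4))
print("impurity after split: ", round(split_impurity(left, right), 4))
```

A split is worthwhile in this sense when the weighted impurity after the split is lower than the impurity of the unsplit group.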
40. _____ is a generalization of linear regression for predicting a categorical outcome variable. a. Multiple linear regression b. Logistic regression c. Discriminant analysis d. Cluster analysis Answer: B Difficulty: Easy LO: 6.4, Page 299 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Logistic regression is a generalization of linear regression for predicting a categorical outcome variable. Problems 1. As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected the following data on 100 customers who visited the store. Customer Number
Wait Time (min)
Purchase Amount ($)
Customer Age
Customer Satisfaction Rating
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
2.3 2.8 3.2 3.4 3.4 4.2 3.2 1.4 6.4 7.8 6.5 9.8 5 1.8 6.1 3.4 7.8
436 408 432 431 456 537 456 430 663 839 659 836 543 419 700 432 845
42 33 38 40 29 46 42 40 24 37 52 43 56 35 39 44 33
7 6 5 5 6 4 5 8 3 4 5 2 4 8 6 7 5
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
2.8 1.2 9.5 8.2 7.6 5.4 6.7 9.6 11.4 2.1 5.6 3.7 4.9 6.4 9.3 10.6 6.5 5.4 7.6 3.2 2.4 1 0.2 2.4 5.7 6.4 6 3.7 8.7 6.9 9.8 10 9.5 6.3 7.4 2.3 4.6 4.9 5.7 7.4 6.8 9.6 6.4
467 425 848 808 674 547 691 847 826 426 535 521 513 645 846 730 786 523 654 443 409 400 418 498 532 663 681 543 800 673 856 756 854 672 698 434 544 523 546 676 662 1000 678
42 46 50 55 35 52 38 53 48 52 32 43 44 53 52 51 53 46 36 48 54 39 51 30 32 44 39 54 51 45 43 44 43 50 47 43 40 53 55 42 36 40 46
6 8 4 3 3 4 5 4 2 7 7 8 6 5 4 3 3 5 6 7 8 6 7 6 5 7 8 5 5 5 4 4 6 6 7 7 4 6 6 8 6 5 5
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
7.2 5.6 9.7 2.3 4.3 5.7 2.4 6.7 2.4 9.8 4.5 6.7 7.2 3.5 8.9 9.7 3.5 4.7 8.5 9.7 2.7 5.7 7.6 4.4 7.8 9.4 4.9 7.1 5.4 6.7 8.6 4.5 6.1 5.3 6.7 8.1 6.3 7.4 8.8 9.6
655 535 833 498 508 542 435 665 387 845 532 687 643 424 836 876 456 523 818 845 401 554 648 540 839 845 534 693 512 665 825 548 704 509 672 824 632 689 839 847
32 36 35 30 41 49 39 41 54 34 40 30 33 49 47 31 47 49 35 54 55 43 51 31 45 48 36 44 39 49 36 30 31 31 35 36 30 35 50 35
4 5 3 7 6 6 8 5 9 7 6 5 4 7 5 4 7 6 5 4 7 6 7 6 5 4 5 4 3 5 5 7 5 6 5 4 4 2 4 2
Apply k-means clustering with k = 5 using Wait Time (min), Purchase Amount ($), Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Analyze the resultant clusters. What is the smallest cluster? What is the least dense cluster (as measured by the average distance in the cluster)? What reasons do you see for low customer satisfaction ratings? Answer: We specify # Iterations = 50 and # Starts = 10. We use the default fixed seed of 12345. We see that the size of the clusters does not vary much. Cluster sizes range from 6 to 36 customers. The smallest cluster, Cluster 4, has 6 customers. The least dense cluster is the 36-customer cluster, Cluster 5, which includes customers with waiting times ranging from 6.1 to 11.4 min, purchase amounts ranging from $654 to $1000, ages between 31 and 55, and customer satisfaction ratings ranging from 2 to 7. From the output below, it appears that longer waiting times and high purchase amounts are the reasons for low customer satisfaction ratings. The high purchase amounts can be attributed to high prices of the products in the store.
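For readers without XLMiner, the procedure in this problem (normalize, then run k-means with a cap on iterations and several random starts) can be sketched in plain Python. This is not XLMiner's implementation and will not reproduce its output. It uses only customers 1 to 10 and k = 2 to stay short, and the z-score normalization with population standard deviation, the seed value, and all function names are assumptions.

```python
import math
import random
from statistics import mean, pstdev

# Customers 1-10: wait time (min), purchase amount ($), age, satisfaction rating.
customers = [
    (2.3, 436, 42, 7), (2.8, 408, 33, 6), (3.2, 432, 38, 5), (3.4, 431, 40, 5),
    (3.4, 456, 29, 6), (4.2, 537, 46, 4), (3.2, 456, 42, 5), (1.4, 430, 40, 8),
    (6.4, 663, 24, 3), (7.8, 839, 37, 4),
]


def normalize(rows):
    """Z-score each column so that no variable dominates the Euclidean distance."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def k_means(points, k, iterations=50, starts=10, seed=0):
    """Return the labels of the best run (lowest within-cluster sum of squares)."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        centroids = rng.sample(points, k)  # random initial centroids
        labels = None
        for _ in range(iterations):
            # Assign every observation to the cluster with the nearest centroid.
            new_labels = [
                min(range(k), key=lambda j: math.dist(p, centroids[j]))
                for p in points
            ]
            if new_labels == labels:  # no change in the clusters: stop
                break
            labels = new_labels
            # Recompute each centroid as the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    centroids[j] = [mean(c) for c in zip(*members)]
        sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, labels)
    return best[1]


labels = k_means(normalize(customers), k=2)
print(labels)
```

Running several random starts and keeping the lowest-error result mirrors the # Starts setting above; a single start can settle on a poor local solution.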
Difficulty: Easy LO: 6.3, Pages 262-264 Bloom’s: Application BUSPROG: Analytic DISC: 2. As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected the following data on 100 customers who visited the store.
Customer Number
Wait Time (min)
Purchase Amount ($)
Customer Age
Customer Satisfaction Rating
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2.3 2.8 3.2 3.4 3.4 4.2 3.2 1.4 6.4 7.8 6.5 9.8 5 1.8 6.1 3.4
436 408 432 431 456 537 456 430 663 839 659 836 543 419 700 432
42 33 38 40 29 46 42 40 24 37 52 43 56 35 39 44
7 6 5 5 6 4 5 8 3 4 5 2 4 8 6 7
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
7.8 2.8 1.2 9.5 8.2 7.6 5.4 6.7 9.6 11.4 2.1 5.6 3.7 4.9 6.4 9.3 10.6 6.5 5.4 7.6 3.2 2.4 1 0.2 2.4 5.7 6.4 6 3.7 8.7 6.9 9.8 10 9.5 6.3 7.4 2.3 4.6 4.9 5.7 7.4 6.8 9.6
845 467 425 848 808 674 547 691 847 826 426 535 521 513 645 846 730 786 523 654 443 409 400 418 498 532 663 681 543 800 673 856 756 854 672 698 434 544 523 546 676 662 1000
33 42 46 50 55 35 52 38 53 48 52 32 43 44 53 52 51 53 46 36 48 54 39 51 30 32 44 39 54 51 45 43 44 43 50 47 43 40 53 55 42 36 40
5 6 8 4 3 3 4 5 4 2 7 7 8 6 5 4 3 3 5 6 7 8 6 7 6 5 7 8 5 5 5 4 4 6 6 7 7 4 6 6 8 6 5
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
6.4 7.2 5.6 9.7 2.3 4.3 5.7 2.4 6.7 2.4 9.8 4.5 6.7 7.2 3.5 8.9 9.7 3.5 4.7 8.5 9.7 2.7 5.7 7.6 4.4 7.8 9.4 4.9 7.1 5.4 6.7 8.6 4.5 6.1 5.3 6.7 8.1 6.3 7.4 8.8 9.6
678 655 535 833 498 508 542 435 665 387 845 532 687 643 424 836 876 456 523 818 845 401 554 648 540 839 845 534 693 512 665 825 548 704 509 672 824 632 689 839 847
46 32 36 35 30 41 49 39 41 54 34 40 30 33 49 47 31 47 49 35 54 55 43 51 31 45 48 36 44 39 49 36 30 31 31 35 36 30 35 50 35
5 4 5 3 7 6 6 8 5 9 7 6 5 4 7 5 4 7 6 5 4 7 6 7 6 5 4 5 4 3 5 5 7 5 6 5 4 4 2 4 2
Apply hierarchical clustering with 5 clusters using Wait Time (min), Purchase Amount ($), Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward’s method as the clustering method. a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the five clusters in the hierarchical clustering. b. Identify the cluster with the largest average waiting time. Using all the variables, how would you characterize this cluster? c. Identify the smallest cluster. d. By examining the dendrogram on the HC_Dendrogram worksheet (as well as the sequence of clustering stages in HC_Output1), what number of clusters seems to be the most natural fit based on the distance? Answer: a. Below is the PivotTable obtained on the data in the “HC_Clusters1” worksheet.
b. Cluster 5 has the largest average waiting time (approx. 9.35 min). This cluster is a collection of 11 customers characterized by the largest average purchase amount of about $823, the oldest average customer age, and the lowest average customer satisfaction rating of 3.36. c. We see that the size of the clusters does not vary much. However, Cluster 5 is the smallest cluster with a collection of 11 customers. d. From the figure below, 4 clusters appear to be a natural fit for this data. When there are more than 4 clusters, mergers result in a small marginal increase in distance, but when there are fewer than 4 clusters, mergers lead to a large marginal increase in distance.
Difficulty: Moderate LO: 6.3, Pages 255-262 Bloom’s: Application BUSPROG: Analytic DISC: 3. As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected the following data on 100 customers who visited the store.
Customer Number
Wait Time (min)
Purchase Amount ($)
Customer Age
Customer Satisfaction Rating
1 2 3 4 5
2.3 2.8 3.2 3.4 3.4
436 408 432 431 456
42 33 38 40 29
7 6 5 5 6
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
4.2 3.2 1.4 6.4 7.8 6.5 9.8 5 1.8 6.1 3.4 7.8 2.8 1.2 9.5 8.2 7.6 5.4 6.7 9.6 11.4 2.1 5.6 3.7 4.9 6.4 9.3 10.6 6.5 5.4 7.6 3.2 2.4 1 0.2 2.4 5.7 6.4 6 3.7 8.7 6.9 9.8
537 456 430 663 839 659 836 543 419 700 432 845 467 425 848 808 674 547 691 847 826 426 535 521 513 645 846 730 786 523 654 443 409 400 418 498 532 663 681 543 800 673 856
46 42 40 24 37 52 43 56 35 39 44 33 42 46 50 55 35 52 38 53 48 52 32 43 44 53 52 51 53 46 36 48 54 39 51 30 32 44 39 54 51 45 43
4 5 8 3 4 5 2 4 8 6 7 5 6 8 4 3 3 4 5 4 2 7 7 8 6 5 4 3 3 5 6 7 8 6 7 6 5 7 8 5 5 5 4
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
10 9.5 6.3 7.4 2.3 4.6 4.9 5.7 7.4 6.8 9.6 6.4 7.2 5.6 9.7 2.3 4.3 5.7 2.4 6.7 2.4 9.8 4.5 6.7 7.2 3.5 8.9 9.7 3.5 4.7 8.5 9.7 2.7 5.7 7.6 4.4 7.8 9.4 4.9 7.1 5.4 6.7 8.6
756 854 672 698 434 544 523 546 676 662 1000 678 655 535 833 498 508 542 435 665 387 845 532 687 643 424 836 876 456 523 818 845 401 554 648 540 839 845 534 693 512 665 825
44 43 50 47 43 40 53 55 42 36 40 46 32 36 35 30 41 49 39 41 54 34 40 30 33 49 47 31 47 49 35 54 55 43 51 31 45 48 36 44 39 49 36
4 6 6 7 7 4 6 6 8 6 5 5 4 5 3 7 6 6 8 5 9 7 6 5 4 7 5 4 7 6 5 4 7 6 7 6 5 4 5 4 3 5 5
92 93 94 95 96 97 98 99 100
4.5 6.1 5.3 6.7 8.1 6.3 7.4 8.8 9.6
548 704 509 672 824 632 689 839 847
30 31 31 35 36 30 35 50 35
7 5 6 5 4 4 2 4 2
a. Apply hierarchical clustering with 5 clusters using Wait Time (min) and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure, and specify single linkage as the clustering method. Analyze the resulting clusters by computing the cluster size. It may be helpful to use a PivotTable on the data in the HC_Clusters worksheet generated by XLMiner to compute descriptive measures of the Wait Time and Customer Satisfaction Rating variables in each cluster. You can also visualize the clusters by creating a scatter plot with Wait Time (min) as the x-variable and Customer Satisfaction Rating as the y-variable. b. Repeat part a using average linkage as the clustering method. Compare the clusters to the previous method.
Answer: a. Single linkage results in clusters with extreme sizes. There are three single-customer clusters (customer 40, customer 70, and customer 98). There is one 90-customer cluster with waiting times ranging from 1 min to 11.4 min.
b. Average linkage results in two clusters that each contain two customers. Some of the single linkage clusters are closely related to the average linkage clusters. For example, Cluster 1 from single linkage is the merger of Clusters 1, 2, and 3 from average linkage, and Cluster 4 from single linkage is similar to Cluster 5 from average linkage.
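The difference between the linkage methods compared in this problem comes down to how the distance between two clusters is computed. The sketch below follows the definitions in questions 18 to 21 using Euclidean distance. The two clusters are a hypothetical grouping of a few customers' (wait time, satisfaction rating) pairs, and the values are left unnormalized for brevity, unlike the XLMiner procedure.

```python
import math
from itertools import product
from statistics import mean


def single_linkage(a, b):
    """Distance between the two closest observations, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the two most dissimilar observations."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Average distance over every pair of observations between the clusters."""
    return mean(math.dist(p, q) for p, q in product(a, b))


def average_group_linkage(a, b):
    """Distance between the two cluster centroids."""
    centroid_a = [mean(c) for c in zip(*a)]
    centroid_b = [mean(c) for c in zip(*b)]
    return math.dist(centroid_a, centroid_b)


# Hypothetical clusters of (wait time in min, satisfaction rating) pairs.
cluster_a = [(2.3, 7), (2.8, 6), (3.2, 5)]  # customers 1, 2, 3
cluster_b = [(6.4, 3), (7.8, 4)]            # customers 9, 10

for name, fn in [
    ("single", single_linkage),
    ("complete", complete_linkage),
    ("average", average_linkage),
    ("average group", average_group_linkage),
]:
    print(f"{name:>14}: {fn(cluster_a, cluster_b):.3f}")
```

Because single linkage looks only at the closest pair, it tends to keep absorbing nearby observations into one large cluster, which is consistent with the 90-customer cluster and the single-customer clusters reported in part a.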
Difficulty: Moderate LO: 6.3, Pages 255-262 Bloom’s: Application BUSPROG: Analytic DISC: 4. As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected the following data on 100 customers who visited the store. Customer Number
Wait Time (min)
Purchase Amount ($)
Customer Age
Customer Satisfaction Rating
1 2 3 4 5 6 7 8
2.3 2.8 3.2 3.4 3.4 4.2 3.2 1.4
436 408 432 431 456 537 456 430
42 33 38 40 29 46 42 40
7 6 5 5 6 4 5 8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
6.4 7.8 6.5 9.8 5 1.8 6.1 3.4 7.8 2.8 1.2 9.5 8.2 7.6 5.4 6.7 9.6 11.4 2.1 5.6 3.7 4.9 6.4 9.3 10.6 6.5 5.4 7.6 3.2 2.4 1 0.2 2.4 5.7 6.4 6 3.7 8.7 6.9 9.8 10 9.5 6.3
663 839 659 836 543 419 700 432 845 467 425 848 808 674 547 691 847 826 426 535 521 513 645 846 730 786 523 654 443 409 400 418 498 532 663 681 543 800 673 856 756 854 672
24 37 52 43 56 35 39 44 33 42 46 50 55 35 52 38 53 48 52 32 43 44 53 52 51 53 46 36 48 54 39 51 30 32 44 39 54 51 45 43 44 43 50
3 4 5 2 4 8 6 7 5 6 8 4 3 3 4 5 4 2 7 7 8 6 5 4 3 3 5 6 7 8 6 7 6 5 7 8 5 5 5 4 4 6 6
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
7.4 2.3 4.6 4.9 5.7 7.4 6.8 9.6 6.4 7.2 5.6 9.7 2.3 4.3 5.7 2.4 6.7 2.4 9.8 4.5 6.7 7.2 3.5 8.9 9.7 3.5 4.7 8.5 9.7 2.7 5.7 7.6 4.4 7.8 9.4 4.9 7.1 5.4 6.7 8.6 4.5 6.1 5.3
698 434 544 523 546 676 662 1000 678 655 535 833 498 508 542 435 665 387 845 532 687 643 424 836 876 456 523 818 845 401 554 648 540 839 845 534 693 512 665 825 548 704 509
47 43 40 53 55 42 36 40 46 32 36 35 30 41 49 39 41 54 34 40 30 33 49 47 31 47 49 35 54 55 43 51 31 45 48 36 44 39 49 36 30 31 31
7 7 4 6 6 8 6 5 5 4 5 3 7 6 6 8 5 9 7 6 5 4 7 5 4 7 6 5 4 7 6 7 6 5 4 5 4 3 5 5 7 5 6
95 96 97 98 99 100
6.7 8.1 6.3 7.4 8.8 9.6
672 824 632 689 839 847
35 36 30 35 50 35
5 4 4 2 4 2
For the above data, apply k-means clustering using Wait Time (min) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters for waiting time. a. For the observations composing the cluster which has the low waiting time, apply hierarchical clustering with Ward’s method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster. b. For the observations composing the cluster which has the medium waiting time, apply hierarchical clustering with Ward’s method to form three clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster. c. For the observations composing the cluster which has the high waiting time, apply hierarchical clustering with Ward’s method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster. Answer: Below is the PivotTable on the data in KM_Cluster1.
a. The interval with the low waiting time is separated into two clusters with respect to Purchase Amount, Customer Age, and Customer Satisfaction Rating.
b. The interval with the medium waiting time is separated into three clusters. Two of them, with 22 and 19 customers, have roughly similar Customer Age and Customer Satisfaction Rating. The third cluster differs from these two primarily in terms of Customer Age and Customer Satisfaction Rating.
c. The interval with the high waiting time is separated into two clusters of 14 and 9 customers, which have similar Purchase Amount and Customer Satisfaction Rating.
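The first stage of this problem (normalize Wait Time, run k-means with k = 3, then split the data by cluster) can be sketched in Python as follows. This is an illustration only, not XLMiner's algorithm: the z-score form of normalization, the choice of initial centers, the use of the seed 12345, and all function names are assumptions. The wait times are those of customers 1 through 20 from the table above.

```python
import random
from statistics import mean, pstdev

def zscore(values):
    """Normalize to mean 0 and standard deviation 1 (population form assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def assign(values, centers):
    """Index of the nearest center for each value."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]

def kmeans_1d(values, k, iterations=50, starts=10, seed=12345):
    """Keep the best of several random starts, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        centers = rng.sample(values, k)
        for _ in range(iterations):
            labels = assign(values, centers)
            updated = []
            for c in range(k):
                members = [v for v, lab in zip(values, labels) if lab == c]
                updated.append(mean(members) if members else centers[c])
            if updated == centers:   # converged before the iteration limit
                break
            centers = updated
        labels = assign(values, centers)
        sse = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
        if best is None or sse < best[0]:
            best = (sse, centers, labels)
    return best[1], best[2]

def split_by_wait(waits, k=3):
    """Return k lists of raw wait times, ordered from lowest to highest center."""
    centers, labels = kmeans_1d(zscore(waits), k)
    order = sorted(range(k), key=lambda c: centers[c])
    return [[w for w, lab in zip(waits, labels) if lab == c] for c in order]

# Wait times (min) of customers 1-20.
waits = [2.3, 2.8, 3.2, 3.4, 3.4, 4.2, 3.2, 1.4, 6.4, 7.8,
         6.5, 9.8, 5, 1.8, 6.1, 3.4, 7.8, 2.8, 1.2, 9.5]
low, medium, high = split_by_wait(waits)
print(len(low), len(medium), len(high))
```

Each of the three lists would then be passed to a separate hierarchical clustering run, as parts a through c describe.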
Difficulty: Challenging LO: 6.3, Pages 255-264 Bloom’s: Application BUSPROG: Analytic DISC: 5. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below. LandValue ($) 18100 23600 25900 22100 23900 22400 24100 26300 24900 13600 36100 19500 38800 23500
BuildingValue ($) 92500 152700 134300 129600 168700 118300 123300 133800 139400 87200 210400 101300 224700 139000
Acres 0.5 0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22
Age 53.9 19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8
Price ($) 114885 180895 162038 154496 196973 145075 151480 164762 166528 105762 250170 125082 265066 166697
26300 21900 23400 15000 15000 9200 9200 5600 9000 21000 23500 36000 23700 22000 19900 22100 24600 21500 15000 15700 14200 10700 16600 25500 15100 7400 28500 25100 50100 83300 124500 47000 64600 33900 41100 29100 56400 45400 23800 52800 25100 27200 28100
164200 122400 149600 102200 102200 22000 22000 48000 58800 109600 165900 262500 114900 102700 95800 116300 165500 113400 81100 129200 81600 49700 72700 110700 74300 55500 129400 83900 164600 276000 552300 214400 185000 138800 156300 96400 256400 219200 92100 172800 99200 152600 102900
0.35 0.17 0.22 0.12 0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18
3.9 15.7 15.7 97.8 97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8
194881 146818 176048 119584 121759 34947 35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661 137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936 679795 264115 254075 173987 200251 130214 316874 267672 119769 229499 128456 181102 132977
28800 33400 20700 25600 25800 29300 26000 25900 32800 31100 25800 27200 25000 29200 30000 20400 23600 16200 29300 27000 25600 46200 22900 27100 30700 29100 34700 20000 35700 35100 33700 33700 36400 33200 39200 33100 16000 24900 22000 20000 33900 22100 22800
98800 103900 95600 101900 110700 147700 116000 73500 125000 166800 105300 94800 105900 117500 93300 112000 83400 85800 123900 97800 86300 220500 160000 105200 107100 102400 150400 80400 159400 161500 162500 162500 176100 122300 169200 180100 98400 63800 121300 107600 230800 153800 111100
0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23
53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52
131411 139697 120046 131026 141202 181575 144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399 105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349 198580 200228 215634 157208 212662 217543 118491 91539 147802 131948 268444 180464 137326
24700 38700 25800 31700 82200 19500 24400 22500 25900 22700 21200 34000 18900 33900 23800 23900 18500 36300 47300 36600
117800 118700 108000 140500 171700 147600 132000 119800 117100 95000 56700 163800 118000 151600 133500 119000 110500 122500 298800 238700
0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28
48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5
145115 159644 135049 174475 257467 169311 157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575 162270 348138 278839
Apply k-means clustering with k = 10 using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. What is the smallest cluster? What is the least dense cluster (as measured by the average distance in the cluster)? Answer: We specify # Iterations = 50 and # Starts = 10. We use the default fixed seed of 12345. The cluster sizes vary widely. There are two single-home clusters, Cluster 3 and Cluster 6, which are the smallest clusters. The least dense cluster is the 7-home cluster, Cluster 2, which includes homes with Age ranging from 21.7 to 91 and Price ranging from $159,644 to $360,936.
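Cluster size and density can be computed directly from the cluster assignments. The sketch below assumes that "average distance in the cluster" means the mean Euclidean distance from each member to the cluster centroid in normalized units; the function name and the toy points are hypothetical.

```python
from collections import defaultdict
from math import dist

def cluster_summary(points, labels):
    """Map each cluster label to (size, average distance to the centroid)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    summary = {}
    for label, members in groups.items():
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        avg_distance = sum(dist(p, centroid) for p in members) / len(members)
        summary[label] = (len(members), avg_distance)
    return summary

# Hypothetical normalized points: cluster 1 has two members, cluster 2 has one.
points = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
labels = [1, 1, 2]
print(cluster_summary(points, labels))
# {1: (2, 1.0), 2: (1, 0.0)}
```

A single-member cluster always has an average distance of 0 under this definition, so the least dense cluster has to be one of the multi-home clusters.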
Difficulty: Easy LO: 6.3, Pages 262-264 Bloom’s: Application BUSPROG: Analytic DISC: 6. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below. LandValue ($) 18100 23600 25900
BuildingValue ($) 92500 152700 134300
Acres 0.5 0.22 0.3
Age 53.9 19.7 15.9
Price ($) 114885 180895 162038
22100 23900 22400 24100 26300 24900 13600 36100 19500 38800 23500 26300 21900 23400 15000 15000 9200 9200 5600 9000 21000 23500 36000 23700 22000 19900 22100 24600 21500 15000 15700 14200 10700 16600 25500 15100 7400 28500 25100 50100 83300 124500 47000
129600 168700 118300 123300 133800 139400 87200 210400 101300 224700 139000 164200 122400 149600 102200 102200 22000 22000 48000 58800 109600 165900 262500 114900 102700 95800 116300 165500 113400 81100 129200 81600 49700 72700 110700 74300 55500 129400 83900 164600 276000 552300 214400
0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22 0.35 0.17 0.22 0.12 0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22
41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8 3.9 15.7 15.7 97.8 97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9
154496 196973 145075 151480 164762 166528 105762 250170 125082 265066 166697 194881 146818 176048 119584 121759 34947 35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661 137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936 679795 264115
64600 33900 41100 29100 56400 45400 23800 52800 25100 27200 28100 28800 33400 20700 25600 25800 29300 26000 25900 32800 31100 25800 27200 25000 29200 30000 20400 23600 16200 29300 27000 25600 46200 22900 27100 30700 29100 34700 20000 35700 35100 33700 33700
185000 138800 156300 96400 256400 219200 92100 172800 99200 152600 102900 98800 103900 95600 101900 110700 147700 116000 73500 125000 166800 105300 94800 105900 117500 93300 112000 83400 85800 123900 97800 86300 220500 160000 105200 107100 102400 150400 80400 159400 161500 162500 162500
0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21
91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8
254075 173987 200251 130214 316874 267672 119769 229499 128456 181102 132977 131411 139697 120046 131026 141202 181575 144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399 105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349 198580 200228
36400 33200 39200 33100 16000 24900 22000 20000 33900 22100 22800 24700 38700 25800 31700 82200 19500 24400 22500 25900 22700 21200 34000 18900 33900 23800 23900 18500 36300 47300 36600
176100 122300 169200 180100 98400 63800 121300 107600 230800 153800 111100 117800 118700 108000 140500 171700 147600 132000 119800 117100 95000 56700 163800 118000 151600 133500 119000 110500 122500 298800 238700
0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28
8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5
215634 157208 212662 217543 118491 91539 147802 131948 268444 180464 137326 145115 159644 135049 174475 257467 169311 157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575 162270 348138 278839
Apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward’s method as the clustering method. a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the clusters in the hierarchical clustering. b. Identify the cluster with the largest average price. Using all the variables, how would you characterize this cluster? c. Identify the smallest cluster. Answer: a. Below is the PivotTable obtained on the data in the “HC_Clusters1” worksheet.
b. Cluster 7 has the largest average price (about $296,295). This cluster is a collection of 6 homes characterized by a cluster center with a moderate average land value of $41,500, the second largest average building value of $251,983, a relatively low average Acres value of 0.33, and a relatively low average age of about 25 years. c. Clusters 9 and 10 are the smallest clusters, each with a single home. Difficulty: Moderate LO: 6.3, Pages 255-262 Bloom’s: Application BUSPROG: Analytic DISC: 7. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below. LandValue ($) 18100 23600 25900 22100 23900 22400 24100 26300 24900 13600 36100 19500 38800 23500 26300 21900 23400 15000
BuildingValue ($) 92500 152700 134300 129600 168700 118300 123300 133800 139400 87200 210400 101300 224700 139000 164200 122400 149600 102200
Acres 0.5 0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22 0.35 0.17 0.22 0.12
Age 53.9 19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8 3.9 15.7 15.7 97.8
Price ($) 114885 180895 162038 154496 196973 145075 151480 164762 166528 105762 250170 125082 265066 166697 194881 146818 176048 119584
15000 9200 9200 5600 9000 21000 23500 36000 23700 22000 19900 22100 24600 21500 15000 15700 14200 10700 16600 25500 15100 7400 28500 25100 50100 83300 124500 47000 64600 33900 41100 29100 56400 45400 23800 52800 25100 27200 28100 28800 33400 20700 25600
102200 22000 22000 48000 58800 109600 165900 262500 114900 102700 95800 116300 165500 113400 81100 129200 81600 49700 72700 110700 74300 55500 129400 83900 164600 276000 552300 214400 185000 138800 156300 96400 256400 219200 92100 172800 99200 152600 102900 98800 103900 95600 101900
0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2
97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8
121759 34947 35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661 137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936 679795 264115 254075 173987 200251 130214 316874 267672 119769 229499 128456 181102 132977 131411 139697 120046 131026
25800 29300 26000 25900 32800 31100 25800 27200 25000 29200 30000 20400 23600 16200 29300 27000 25600 46200 22900 27100 30700 29100 34700 20000 35700 35100 33700 33700 36400 33200 39200 33100 16000 24900 22000 20000 33900 22100 22800 24700 38700 25800 31700
110700 147700 116000 73500 125000 166800 105300 94800 105900 117500 93300 112000 83400 85800 123900 97800 86300 220500 160000 105200 107100 102400 150400 80400 159400 161500 162500 162500 176100 122300 169200 180100 98400 63800 121300 107600 230800 153800 111100 117800 118700 108000 140500
0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34
51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6
141202 181575 144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399 105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349 198580 200228 215634 157208 212662 217543 118491 91539 147802 131948 268444 180464 137326 145115 159644 135049 174475
82200 19500 24400 22500 25900 22700 21200 34000 18900 33900 23800 23900 18500 36300 47300 36600
171700 147600 132000 119800 117100 95000 56700 163800 118000 151600 133500 119000 110500 122500 298800 238700
1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28
56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5
257467 169311 157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575 162270 348138 278839
a. Apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure, and specify complete linkage as the clustering method. Analyze the resulting clusters by computing the cluster size. It may be helpful to use a PivotTable on the data in the HC_Clusters worksheet generated by XLMiner. You can also visualize the clusters by creating a scatter plot with Acres as the x-variable and Price ($) as the y-variable. b. Repeat part a using average group linkage as the clustering method. Compare the clusters to the previous method. Answer: a. Complete linkage results in clusters with extreme sizes. There are two single-home clusters (home-45 and home-105). There is one 43-home cluster, Cluster 3, whose average price is $124,927.
b. Average group linkage results in three single-home clusters. Only one complete linkage cluster is identical to an average group linkage cluster: Cluster 10 is the same under both methods.
Difficulty: Moderate LO: 6.3, Pages 255-262 Bloom’s: Application BUSPROG: Analytic DISC: 8. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below. LandValue ($)
BuildingValue ($)
Acres
Age
18100 23600 25900 22100 23900 22400 24100 26300 24900 13600 36100 19500
92500 152700 134300 129600 168700 118300 123300 133800 139400 87200 210400 101300
0.5 0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16
53.9 19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8
Price ($) 114885 180895 162038 154496 196973 145075 151480 164762 166528 105762 250170 125082
38800 23500 26300 21900 23400 15000 15000 9200 9200 5600 9000 21000 23500 36000 23700 22000 19900 22100 24600 21500 15000 15700 14200 10700 16600 25500 15100 7400 28500 25100 50100 83300 124500 47000 64600 33900 41100 29100 56400 45400 23800 52800 25100
224700 139000 164200 122400 149600 102200 102200 22000 22000 48000 58800 109600 165900 262500 114900 102700 95800 116300 165500 113400 81100 129200 81600 49700 72700 110700 74300 55500 129400 83900 164600 276000 552300 214400 185000 138800 156300 96400 256400 219200 92100 172800 99200
0.44 0.22 0.35 0.17 0.22 0.12 0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19
21.7 10.8 3.9 15.7 15.7 97.8 97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7
265066 166697 194881 146818 176048 119584 121759 34947 35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661 137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936 679795 264115 254075 173987 200251 130214 316874 267672 119769 229499 128456
27200 28100 28800 33400 20700 25600 25800 29300 26000 25900 32800 31100 25800 27200 25000 29200 30000 20400 23600 16200 29300 27000 25600 46200 22900 27100 30700 29100 34700 20000 35700 35100 33700 33700 36400 33200 39200 33100 16000 24900 22000 20000 33900
152600 102900 98800 103900 95600 101900 110700 147700 116000 73500 125000 166800 105300 94800 105900 117500 93300 112000 83400 85800 123900 97800 86300 220500 160000 105200 107100 102400 150400 80400 159400 161500 162500 162500 176100 122300 169200 180100 98400 63800 121300 107600 230800
0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27
16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10
181102 132977 131411 139697 120046 131026 141202 181575 144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399 105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349 198580 200228 215634 157208 212662 217543 118491 91539 147802 131948 268444
22100 22800 24700 38700 25800 31700 82200 19500 24400 22500 25900 22700 21200 34000 18900 33900 23800 23900 18500 36300 47300 36600
153800 111100 117800 118700 108000 140500 171700 147600 132000 119800 117100 95000 56700 163800 118000 151600 133500 119000 110500 122500 298800 238700
0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28
46.8 52 48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5
180464 137326 145115 159644 135049 174475 257467 169311 157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575 162270 348138 278839
For the above data, apply k-means clustering using Price ($) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters of price. a. For the observations composing the cluster with low home price, apply hierarchical clustering with Ward’s method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster. b. For the observations composing the cluster with medium home price, apply hierarchical clustering with Ward’s method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster. c. Comment on the cluster with high home price. Answer: Below is the PivotTable on the data in KM_Cluster1.
a. The interval with the low home price is separated into three clusters with respect to Acres and Age. The characteristics of each cluster are as below.
b. The interval with the medium home price is separated into three clusters with respect to Acres and Age. The characteristics of each cluster are as below.
c. The third cluster, which has the high home price, is a single-home cluster with Acres = 1.05, Age = 5.7, and a price of $679,795. Difficulty: Challenging LO: 6.3, Pages 255-264 Bloom’s: Application BUSPROG: Analytic DISC: 9. A retailer is interested in analyzing the shopping trends of men for the following items: Formal Shirts, Formal Pants, Jeans, t-shirts, Formal Shoes, and Belts. A sample of 50 male customers is selected and the data are given below. t-shirt Formal Shirt Formal Pants Formal Pants Formal Shirt Formal Pants Formal Shoes Formal Shoes Formal Pants Belt Formal Shirt Belt
Formal Pants t-shirt Formal Shoes Formal Pants Formal Shirt Formal Shirt Jeans t-shirt Jeans t-shirt Formal Pants
Belt Belt Jeans t-shirt Formal Pants t-shirt Jeans Formal Shoes Formal Shirt
t-shirt Formal Shoes
Formal Shoes
Belt
Formal Shirt
Belt
Formal Shoes
Belt Formal Shoes
Formal Shoes Formal Shoes t-shirt Formal Shirt Jeans Formal Shoes Formal Shirt Formal Shoes Belt t-shirt Formal Pants Formal Pants Formal Shirt Belt Formal Shirt Formal Pants Formal Shoes Jeans Formal Pants Formal Shoes Formal Pants Formal Shoes Belt Jeans Formal Shirt Formal Pants t-shirt Formal Shoes Belt Formal Pants Formal Shoes Jeans Formal Pants Belt Formal Shirt Jeans Formal Shirt Formal Shirt
Belt Formal Pants Jeans Formal Pants Belt Formal Pants Formal Pants Formal Pants t-shirt Formal Pants Formal Shirt Formal Shoes Formal Pants t-shirt Formal Shoes Formal Pants Formal Pants Formal Shoes Formal Pants t-shirt Formal Pants Formal Shoes t-shirt Formal Pants Formal Shirt Jeans Formal Pants Formal Shoes Formal Shirt t-shirt Formal Shoes Formal Shoes Formal Pants Jeans Formal Pants Jeans Jeans
t-shirt
Formal Shirt
Formal Shoes
Jeans
Formal Shirt t-shirt Formal Shirt Jeans Formal Shirt Formal Shoes
t-shirt Formal Shoes Belt Formal Shoes Formal Shoes
Jeans Formal Shoes
Formal Pants
Jeans Formal Shoes Belt Formal Shirt
Formal Shirt
Belt Formal Shirt Formal Pants Jeans Formal Shoes Formal Shirt Belt Formal Pants Formal Shoes Belt t-shirt Jeans t-shirt Belt Formal Shoes Formal Pants
Formal Shirt Formal Shirt
Jeans Belt
Belt
t-shirt Formal Pants
Formal Shirt
Jeans
t-shirt
Formal Shoes
Formal Shirt t-shirt
Belt
Formal Shirt
Belt t-shirt
Formal Shoes
Formal Shoes
Belt
Formal Shirt
a. Using a minimum support of 20 transactions and a minimum confidence of 50 percent, use XLMiner to generate a list of association rules. How many rules satisfy these criteria?
b. Using the list of rules from part a, consider the rule with the largest lift ratio. Interpret what this rule is saying about the relationship between the antecedent item set and consequent item set. c. Interpret the support count of the item set composed of all the items involved in the rule with the largest lift ratio. d. Interpret the confidence of the rule with the largest lift ratio. e. Interpret the lift ratio of the rule with the largest lift ratio. Answer: a. 14 rules have a support count of at least 20 and a confidence of at least 50%. b. Antecedent: Formal Pants, Formal Shoes; Consequent: Formal Shirt. If a customer purchases Formal Pants and Formal Shoes, then he also tends to purchase a Formal Shirt. c. The support count of the item set involved in this rule is 23, meaning that Formal Pants, Formal Shoes, and Formal Shirt were purchased together in 23 transactions. d. The confidence of this rule is 79.31%, which means that of the 29 transactions in which Formal Pants and Formal Shoes were purchased, 23 also included a Formal Shirt. e. The lift ratio of this rule is 1.37, which means that a customer who purchased Formal Pants and Formal Shoes is 37% more likely to have also purchased a Formal Shirt than a randomly selected customer.
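The confidence and lift figures in parts d and e follow from three support counts. The short Python check below uses the counts stated in the answer (50 transactions, 23 containing all three items, 29 containing the antecedent). The consequent count of 29 transactions containing a Formal Shirt is an assumption: it is not stated in the answer but is the value implied by a lift ratio of 1.37. The function name is hypothetical.

```python
def rule_metrics(n_transactions, support_both, support_antecedent, support_consequent):
    """Return (confidence, lift ratio) of the rule antecedent -> consequent."""
    confidence = support_both / support_antecedent
    lift = confidence / (support_consequent / n_transactions)
    return confidence, lift

# Rule: {Formal Pants, Formal Shoes} -> {Formal Shirt}
# support_consequent=29 is assumed (implied by the reported lift, not stated).
confidence, lift = rule_metrics(
    n_transactions=50, support_both=23, support_antecedent=29, support_consequent=29
)
print(f"confidence = {confidence:.2%}, lift = {lift:.2f}")
# confidence = 79.31%, lift = 1.37
```

The lift ratio compares the confidence (23/29) with the share of all customers who bought a Formal Shirt, which is why the interpretation in part e is made relative to a randomly selected customer.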
Difficulty: Easy LO: 6.3, Pages 265-269 Bloom’s: Application BUSPROG: Analytic DISC: 10. A bank is interested in identifying different attributes of its customers and below is the sample data of 150 customers. In the data table for the dummy variable Gender, 0 represents Male and 1 represents Female. And for the dummy variable Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Age
Gender
Work experience
Income (in 1000 $)
Family size
Personal loan
47 26
0 1
22 3
53 22
3 1
1 1
38 37 44 55 44 30 63 34 52 55 52 63 51 41 37 46 30 48 50 56 35 39 48 51 27 57 33 58 46 32 56 35 47 50 57 38 52 56 47 54 25 40 61
0 0 0 1 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 1 0 0 1 1
16 12 22 30 23 5 35 8 26 25 28 29 30 18 14 23 6 25 22 31 9 13 22 21 3 32 12 33 21 6 28 12 23 23 25 15 24 31 24 31 2 16 30
29 32 32 45 50 22 56 23 29 34 45 23 32 21 43 23 18 34 21 24 23 29 34 39 26 49 39 32 45 23 45 28 38 32 32 25 22 19 34 45 21 34 49
4 6 3 7 2 2 2 4 1 2 3 3 4 5 4 3 3 2 1 4 3 5 6 2 1 2 3 2 3 5 3 4 1 3 4 5 2 3 4 2 1 6 2
1 1 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 0 1
29 52 56 61 26 60 37 39 46 59 54 27 54 42 64 33 65 38 48 31 39 58 51 41 36 39 58 42 44 65 40 33 45 38 32 51 32 46 44 41 54 26 33
1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0
6 25 31 33 4 30 12 14 21 30 31 4 30 18 35 8 34 13 23 7 15 28 26 18 12 12 29 15 21 35 13 11 24 12 10 30 11 22 21 15 29 3 8
34 39 54 43 23 56 23 39 34 39 28 22 45 36 46 32 36 32 26 32 35 45 23 34 32 22 34 45 32 54 23 41 29 24 28 43 34 39 25 28 54 24 26
1 3 2 2 2 3 5 4 5 2 1 1 2 4 2 6 1 4 5 3 5 2 3 3 2 4 2 1 3 2 5 4 5 4 3 2 1 4 3 2 4 2 3
1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1
45 63 55 49 64 26 42 48 64 52 41 40 40 55 49 46 59 51 45 39 55 63 53 39 50 27 29 61 57 56 35 25 57 47 60 32 27 39 26 46 28 64 65
1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 1
20 30 31 25 35 3 19 22 34 28 16 14 16 32 27 21 32 26 23 17 32 35 27 16 25 4 6 30 27 34 6 3 27 25 35 8 4 18 1 25 4 37 33
34 54 49 34 54 19 23 43 45 32 34 34 26 37 39 29 56 23 43 26 37 43 24 43 56 32 23 45 34 54 45 32 36 54 33 45 32 39 22 34 23 43 49
4 2 3 5 2 1 1 3 2 4 3 5 4 2 3 1 2 3 4 4 5 2 1 6 3 1 2 1 2 4 5 2 2 4 3 2 1 4 1 5 2 4 2
0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0
47 27 25 25 64 44 65 54 51 51 28 56 57 35 47 54 28 45 43
1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1
25 5 3 2 30 21 25 31 26 21 4 32 26 9 21 27 5 22 18
37 28 34 24 53 48 47 55 43 46 25 55 49 27 54 45 29 45 43
4 2 1 1 2 5 2 1 3 1 2 3 2 6 4 2 3 5 2
0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Classify the data using k-nearest neighbors with up to k = 10. Use Age, Gender, Work experience, Income (in 1000 $), and Family size as input variables and Personal loan as the output variable. In Step 2 of XLMiner’s k-nearest neighbors Classification procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. a. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data? Explain the difference in the overall error rate on the training, validation, and test data. b. Examine the decile-wise lift chart on the test data. Identify and interpret the first decile lift. c. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, what are the corresponding Class 1 error rates and Class 0 error rates on the validation data? Answer: a. The overall error rate is minimized at k = 2. The overall error rate for the training, validation, and test sets is 26.67%, 42.22%, and 43.33%, respectively. The overall error rate is the lowest on the training data since a training set observation’s set of k nearest neighbors will always include itself, artificially lowering the error rate. For k = 2, the overall error rate on the validation data is biased since this overall error rate is the lowest error rate over all values of k. Thus, applying k = 2 on the test data will typically result in a larger (and more representative) overall error rate because we are not using the test data to find the best value of k.
b. The first decile lift is 1.14. For this test data set of 30 observations, 21 of which are customers who actually took the personal loan, if we randomly select 3 customers, on average 2.1 of them would have taken the personal loan. However, if we use k-NN with k = 2 to identify the top 3 observations most likely to have taken a personal loan, then (2.1)(1.14) ≈ 2.4 of them would have taken the personal loan. This can be confirmed from the Detailed Scoring report on the test data by observing that there are 5 observations with a predicted probability of 100% of taking a personal loan, but only 4 of these actually took a loan. Thus, of the top 3 observations recommended by k-NN with k = 2, there would be on average (3)(4/5) = 2.4 that actually took the loan.
c. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, the corresponding Class 1 error rates and Class 0 error rates on the validation data are as follows:
Cutoff Value | Class 1 Error Rate | Class 0 Error Rate
0.5 | 37.04% | 50.22%
0.4 | 37.04% | 50.22%
0.3 | 37.04% | 50.22%
0.2 | 37.04% | 50.22%
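The identical rows in this table are what one would expect with k = 2: each predicted probability is the share of the 2 nearest neighbors in Class 1, so it can only be 0, 0.5, or 1. The sketch below illustrates this under two assumptions that are not stated in the answer: an observation is assigned to Class 1 when its probability is at least the cutoff, and the labels and probabilities are made-up illustrative values rather than the bank data.

```python
def class_error_rates(actual, prob, cutoff):
    """Return (Class 1 error rate, Class 0 error rate) for one cutoff.

    Assumption: an observation is classified as Class 1 when prob >= cutoff.
    """
    predicted = [1 if p >= cutoff else 0 for p in prob]
    n1 = sum(1 for a in actual if a == 1)
    n0 = len(actual) - n1
    # Class 1 errors: actual 1 predicted as 0; Class 0 errors: actual 0 predicted as 1.
    false_neg = sum(1 for a, y in zip(actual, predicted) if a == 1 and y == 0)
    false_pos = sum(1 for a, y in zip(actual, predicted) if a == 0 and y == 1)
    return false_neg / n1, false_pos / n0


# Hypothetical labels and k = 2 neighbor vote shares (illustrative only).
actual = [1, 1, 1, 0, 0, 1, 0, 1]
prob = [1.0, 0.5, 0.0, 0.5, 0.0, 1.0, 1.0, 0.5]

rates_by_cutoff = {c: class_error_rates(actual, prob, c) for c in (0.5, 0.4, 0.3, 0.2)}
for cutoff, (err1, err0) in rates_by_cutoff.items():
    print(f"cutoff {cutoff}: Class 1 error {err1:.2%}, Class 0 error {err0:.2%}")
```

Because no probability falls strictly between 0 and 0.5, every cutoff in that range produces the same classifications and therefore the same error rates.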
Difficulty: Moderate LO: 6.4, Pages 269-280 Bloom’s: Application BUSPROG: Analytic DISC:
11. A bank is interested in identifying different attributes of its customers. Below are sample data for 150 customers. For the dummy variable Gender, 0 represents Male and 1 represents Female. For the dummy variable Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan.
Age
Gender
Work experience
Income (in 1000 $)
Family size
Personal loan
47 26 38 37 44 55 44
0 1 0 0 0 1 1
22 3 16 12 22 30 23
53 22 29 32 32 45 50
3 1 4 6 3 7 2
1 1 1 1 0 0 0
30 63 34 52 55 52 63 51 41 37 46 30 48 50 56 35 39 48 51 27 57 33 58 46 32 56 35 47 50 57 38 52 56 47 54 25 40 61 29 52 56 61 26
1 0 1 0 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0
5 35 8 26 25 28 29 30 18 14 23 6 25 22 31 9 13 22 21 3 32 12 33 21 6 28 12 23 23 25 15 24 31 24 31 2 16 30 6 25 31 33 4
22 56 23 29 34 45 23 32 21 43 23 18 34 21 24 23 29 34 39 26 49 39 32 45 23 45 28 38 32 32 25 22 19 34 45 21 34 49 34 39 54 43 23
2 2 4 1 2 3 3 4 5 4 3 3 2 1 4 3 5 6 2 1 2 3 2 3 5 3 4 1 3 4 5 2 3 4 2 1 6 2 1 3 2 2 2
0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1
60 37 39 46 59 54 27 54 42 64 33 65 38 48 31 39 58 51 41 36 39 58 42 44 65 40 33 45 38 32 51 32 46 44 41 54 26 33 45 63 55 49 64
1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1
30 12 14 21 30 31 4 30 18 35 8 34 13 23 7 15 28 26 18 12 12 29 15 21 35 13 11 24 12 10 30 11 22 21 15 29 3 8 20 30 31 25 35
56 23 39 34 39 28 22 45 36 46 32 36 32 26 32 35 45 23 34 32 22 34 45 32 54 23 41 29 24 28 43 34 39 25 28 54 24 26 34 54 49 34 54
3 5 4 5 2 1 1 2 4 2 6 1 4 5 3 5 2 3 3 2 4 2 1 3 2 5 4 5 4 3 2 1 4 3 2 4 2 3 4 2 3 5 2
0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 1 1 0
26 42 48 64 52 41 40 40 55 49 46 59 51 45 39 55 63 53 39 50 27 29 61 57 56 35 25 57 47 60 32 27 39 26 46 28 64 65 47 27 25 25 64
0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 0
3 19 22 34 28 16 14 16 32 27 21 32 26 23 17 32 35 27 16 25 4 6 30 27 34 6 3 27 25 35 8 4 18 1 25 4 37 33 25 5 3 2 30
19 23 43 45 32 34 34 26 37 39 29 56 23 43 26 37 43 24 43 56 32 23 45 34 54 45 32 36 54 33 45 32 39 22 34 23 43 49 37 28 34 24 53
1 1 3 2 4 3 5 4 2 3 1 2 3 4 4 5 2 1 6 3 1 2 1 2 4 5 2 2 4 3 2 1 4 1 5 2 4 2 4 2 1 1 2
0 1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0
44 65 54 51 51 28 56 57 35 47 54 28 45 43
1 0 0 0 0 1 1 1 1 1 1 1 0 1
21 25 31 26 21 4 32 26 9 21 27 5 22 18
48 47 55 43 46 25 55 49 27 54 45 29 45 43
5 2 1 3 1 2 3 2 6 4 2 3 5 2
1 0 1 0 1 1 1 1 1 1 0 1 0 1
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Fit a classification tree using Age, Gender, Work experience, Income (in 1000 $), and Family size as input variables and Personal loan as the output variable. In Step 2 of XLMiner’s Classification Tree procedure, be sure to Normalize input data and to set the Minimum #records in a terminal node to 1. In Step 3 of XLMiner’s Classification Tree procedure, set the maximum number of levels to seven. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate lift charts for both the validation data and the test data.
a. Interpret the set of rules implied by the best pruned tree that characterize the customers who have taken a personal loan.
b. For the default cutoff value of 0.5, what are the overall error rate, Class 1 error rate, and Class 0 error rate of the best pruned tree on the test data? Interpret these respective measures.
c. Examine the decile-wise lift chart for the best pruned tree on the test data. What is the first decile lift? Interpret this value.
Answer: a. The rules of the best pruned tree can be distilled to characterize a customer who has taken the personal loan as:
i. Age < 39.5 years, Female, and Income > $24,000 OR
ii. Age between 46.5 and 57.5 years and Family size < 3 OR
iii. Age between 46.5 and 57.5 years, Family size > 2, and Income > $49,000
b. For the default cutoff value of 0.5 on the best pruned tree on the test data, the overall error rate is 46.67%, the class 1 error rate is 61.90%, and the class 0 error rate is 11.11%. That is, the best pruned tree misclassifies a randomly selected observation in the test data 46.67% of the time. For a randomly selected customer who has taken a personal loan, the best pruned tree will misclassify it 61.90% of the time. For a randomly selected customer who has not taken a personal loan, the best pruned tree will misclassify it only 11.11% of the time.
c. The first decile lift of the best pruned tree on the test data is 1.22. For this test data set of 30 observations, 21 of which are customers who actually took a personal loan, if we randomly select 3 customers, on average 2.1 of them would have taken a personal loan. However, if we use the best pruned tree to identify the top 3 observations most likely to have taken a personal loan, then (1.22)(2.1) ≈ 2.57 of them would have taken a personal loan. This can be confirmed from the Detailed Scoring report by observing that the best pruned tree rates 7 observations as having a 100% probability of taking a loan, but only 6 of these actually took a loan. Therefore, out of the top 3 recommendations from the best pruned tree, only (3)(6/7) ≈ 2.57 observations on average will have taken a loan.
Difficulty: Moderate LO: 6.4, Pages 269-291 Bloom’s: Application BUSPROG: Analytic DISC:
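The first decile lift arithmetic in part c can be checked with a short sketch. It assumes, as the answer's own calculation suggests, that the 7 observations tied at a 100% predicted probability are treated as interchangeable, so the hit rate within the top decile is the average hit rate of the tied group. The function and variable names are illustrative, not XLMiner output.

```python
def first_decile_lift(hits_in_top_group, top_group_size, total_positives, total_observations):
    """Ratio of the model's hit rate among its top-rated observations to the base rate."""
    model_rate = hits_in_top_group / top_group_size
    base_rate = total_positives / total_observations
    return model_rate / base_rate


decile_size = 3  # 10 percent of the 30 test observations

lift = first_decile_lift(hits_in_top_group=6, top_group_size=7,
                         total_positives=21, total_observations=30)
expected_random = decile_size * 21 / 30  # expected loan takers when picking at random
expected_model = decile_size * 6 / 7     # expected loan takers among the tree's top picks

print(f"lift = {lift:.2f}, random = {expected_random:.2f}, model = {expected_model:.2f}")
```

The product of the lift and the random-selection expectation reproduces the model-based expectation, which is the relationship the answer relies on.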
12. A bank is interested in identifying different attributes of its customers. Below are sample data for 150 customers. For the dummy variable Gender, 0 represents Male and 1 represents Female. For the dummy variable Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan.
Age
Gender
Work experience
Income (in 1000 $)
Family size
Personal loan
47 26 38 37 44 55 44 30 63 34 52 55 52 63 51 41 37 46 30 48 50 56 35 39 48 51
0 1 0 0 0 1 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1
22 3 16 12 22 30 23 5 35 8 26 25 28 29 30 18 14 23 6 25 22 31 9 13 22 21
53 22 29 32 32 45 50 22 56 23 29 34 45 23 32 21 43 23 18 34 21 24 23 29 34 39
3 1 4 6 3 7 2 2 2 4 1 2 3 3 4 5 4 3 3 2 1 4 3 5 6 2
1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1
27 57 33 58 46 32 56 35 47 50 57 38 52 56 47 54 25 40 61 29 52 56 61 26 60 37 39 46 59 54 27 54 42 64 33 65 38 48 31 39 58 51 41
1 1 1 0 0 0 1 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0
3 32 12 33 21 6 28 12 23 23 25 15 24 31 24 31 2 16 30 6 25 31 33 4 30 12 14 21 30 31 4 30 18 35 8 34 13 23 7 15 28 26 18
26 49 39 32 45 23 45 28 38 32 32 25 22 19 34 45 21 34 49 34 39 54 43 23 56 23 39 34 39 28 22 45 36 46 32 36 32 26 32 35 45 23 34
1 2 3 2 3 5 3 4 1 3 4 5 2 3 4 2 1 6 2 1 3 2 2 2 3 5 4 5 2 1 1 2 4 2 6 1 4 5 3 5 2 3 3
1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0
36 39 58 42 44 65 40 33 45 38 32 51 32 46 44 41 54 26 33 45 63 55 49 64 26 42 48 64 52 41 40 40 55 49 46 59 51 45 39 55 63 53 39
1 0 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0
12 12 29 15 21 35 13 11 24 12 10 30 11 22 21 15 29 3 8 20 30 31 25 35 3 19 22 34 28 16 14 16 32 27 21 32 26 23 17 32 35 27 16
32 22 34 45 32 54 23 41 29 24 28 43 34 39 25 28 54 24 26 34 54 49 34 54 19 23 43 45 32 34 34 26 37 39 29 56 23 43 26 37 43 24 43
2 4 2 1 3 2 5 4 5 4 3 2 1 4 3 2 4 2 3 4 2 3 5 2 1 1 3 2 4 3 5 4 2 3 1 2 3 4 4 5 2 1 6
0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1
50 27 29 61 57 56 35 25 57 47 60 32 27 39 26 46 28 64 65 47 27 25 25 64 44 65 54 51 51 28 56 57 35 47 54 28 45 43
1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1
25 4 6 30 27 34 6 3 27 25 35 8 4 18 1 25 4 37 33 25 5 3 2 30 21 25 31 26 21 4 32 26 9 21 27 5 22 18
56 32 23 45 34 54 45 32 36 54 33 45 32 39 22 34 23 43 49 37 28 34 24 53 48 47 55 43 46 25 55 49 27 54 45 29 45 43
3 1 2 1 2 4 5 2 2 4 3 2 1 4 1 5 2 4 2 4 2 1 1 2 5 2 1 3 1 2 3 2 6 4 2 3 5 2
0 1 1 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Use logistic regression to classify observations as Personal loan taken (or not taken) using Age, Gender, Work experience, Income (in 1000 $), and Family size as input variables and Personal loan as the output variable. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2.
a. From the generated set of logistic regression models, select one that you believe is a good fit. Express the model as a mathematical equation relating the output variable to the input variables.
b. Increases in which variables increase the chance that a customer takes the personal loan? Increases in which variables decrease that chance?
c. Using the default cutoff value of 0.5 for your logistic regression model, what is the overall error rate on the test data?
Answer: a.
Using Mallows’ Cp statistic to guide the selection, we see that the model using 3 independent variables appears to be a viable candidate. We will select this model with 3 variables (4 coefficients including the intercept).
The resulting model is: log odds of the event (personal loan) = 5.009 - 0.194*Age + 0.210*Work experience - 0.282*Family size.
b. From the regression output, an increase in Work experience increases the likelihood of a customer taking the personal loan. Increases in Age and Family size decrease the likelihood of a customer taking the personal loan.
c. The overall error rate is 53.33%.
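To see how the fitted equation turns into a classification, the log odds can be converted to a probability with the logistic function and compared against the 0.5 cutoff. The sketch below uses the coefficients reported in the answer; the customer values (age 40, 15 years of work experience, family size 3) are hypothetical and chosen only for illustration.

```python
import math


def loan_probability(age, work_experience, family_size):
    """Convert the reported log-odds equation into a probability of taking the loan."""
    log_odds = 5.009 - 0.194 * age + 0.210 * work_experience - 0.282 * family_size
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical customer, not taken from the data set.
p = loan_probability(age=40, work_experience=15, family_size=3)
predicted_class = 1 if p >= 0.5 else 0  # default cutoff of 0.5
print(f"estimated probability = {p:.3f}, predicted class = {predicted_class}")

# The signs of the coefficients match part b: more experience raises the
# probability, while greater age or a larger family lowers it.
print(loan_probability(40, 16, 3) > p, loan_probability(41, 15, 3) < p)
```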
Difficulty: Moderate LO: 6.4, Pages 269-307 Bloom’s: Application BUSPROG: Analytic DISC:
13. A bank wants to better understand the details of customers who are likely to default on a loan. To analyze this, data from a random sample of 200 customers are given below:
Average Balance 1222.3 6291.0 1051.0 1118.3 1176.8 1052.0 1314.6 439.7 1232.7 1855.4 322.4 1570.7 2729.0 1397.8 1464.1 40.3 1296.4 2142.7
Age 36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58
Gender 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0
Married Divorced 0 0 1 1 1 0 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0
Family size 1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4
Loan Default 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0
2756.3 1451.1 1003.9 1245.7 3011.1 1222.3 2225.9 2708.2 2341.6 1817.7 1417.3 4291.6 1310.7 1144.0 1088.4 1341.9 1269.1 1435.5 113.7 4646.5 1003.9 1773.5 3349.1 647.0 3901.6 1603.2 1308.1 4061.5 2283.1 1023.0 1083.2 1158.6 1052.0 592.2 6834.4 1505.7 1170.0 1509.6 1061.0 517.2 1661.7
32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46
1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1
1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1
0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1
2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0
1279.5 1656.5 1319.8 1227.5 1748.8 1060.0 1119.6 1135.0 2777.1 1535.6 352.0 1605.8 5737.2 3354.3 10096.1 9164.0 6796.7 2108.9 265.2 1097.0 1041.0 1224.9 1557.7 3202.2 1173.0 1794.3 2423.5 171.8 12157.9 4107.0 887.9 1165.1 643.5 1529.1 2142.7 1035.0 1003.9 1509.6 1118.3 1124.8 1891.8
57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47 44 50 45 59 54 48 43 37 46 49
0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 1 1 1
1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1
0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2 3 1 1 1 2 3 1 2 2 1
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1
6796.7 1709.8 1011.7 1270.4 1663.0 1648.7 1887.9 1244.4 2465.1 6086.9 1262.6 1513.5 1170.3 1557.7 2454.7 710.8 3711.8 1748.8 1248.3 1002.6 1130.0 1040.3 1595.4 1144.0 1582.4 1049.0 1577.2 561.0 3349.1 1704.6 1245.7 16191.8 2185.6 1167.7 1535.6 1319.8 1145.6 1304.2 1851.5 2099.8 1152.0
48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37 40 34 50 49 53 49 40 34
1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0
1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0
1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3 1 2 1 1 3 3 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1219.7 1235.3 1811.2 732.5 5630.6 2420.9 2454.7 1557.7 4017.3 4017.3 1351.0 1507.0 1050.7 1657.8 1115.0 245.9 1058.5 1377.0 1079.3 1456.3 2063.4 1106.6 1119.6 2496.3 1578.5 1284.7 1409.5 1085.8 1083.0 1556.4 1080.6 1457.6 1478.4 1690.3 1458.9 1465.4 1002.6 1728.0 1015.6 1163.8 1299.0
49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56 46 51 34 31 48 59
0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 0
1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1
0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3 3 1 1 2 1 2
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1400.4 1005.2 1341.9 1032.5 1236.6 1087.1 1170.3 1237.9 1296.4 1182.0 1133.0 1629.2 1830.7 1137.8 2011.4 170.3 1135.2 195.0
38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29
1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1
1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1
1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0
1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
In XLMiner’s Partition with Oversampling procedure, partition the data so that 50 percent of the observations in the training set are successes (Loan default) and 40 percent of the validation data is taken away as test data. Classify the data using k-nearest neighbors with up to k = 10. Use Loan default as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s k-nearest neighbors Classification procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data.
a. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data?
b. What is the overall error rate on the test data? Interpret this measure.
c. What are the Class 1 error rate and the Class 0 error rate on the test data?
d. Compute and interpret the sensitivity and specificity for the test data.
e. Examine the decile-wise lift chart on the test data. What is the first decile lift on the test data? Interpret this value.
Answer: a. A value of k = 1 minimizes the overall error rate on the validation set.
b. The overall error rate on the test data is 32.50%. For a randomly selected observation from the test data, k-NN with k = 1 will misclassify it 32.50% of the time.
c. The class 1 error rate is 50.00% and the class 0 error rate is 29.41% for the test data.
d. Sensitivity = 1 - class 1 error rate = 50.00%. This means that the model can correctly identify 50% of the customers in the test data who had defaulted on their loan. Specificity = 1 - class 0 error rate = 70.59%. This means that the model can correctly identify 70.59% of the customers in the test data who had not defaulted on their loan.
e. The first decile lift is 1.54. For this test data set of 40 customers, 6 of whom actually defaulted on their loan, if we randomly selected 4 customers, on average 0.6 of the customers would have defaulted on their loan. However, if we use k-NN with k = 1 to identify the top 4 customers, then (0.6)(1.54) ≈ 0.92 customers would have defaulted on their loan. This can be confirmed from the Detailed Scoring report on the test data by observing that there are 13 observations with a predicted probability of 100% of defaulting, but only 3 of these actually defaulted. Thus, of the top 4 observations recommended by k-NN with k = 1, there would be on average (4)(3/13) ≈ 0.92 that actually defaulted.
Difficulty: Moderate LO: 6.4, Pages 269-280 Bloom’s: Application BUSPROG: Analytic DISC:
14. A bank wants to better understand the details of customers who are likely to default on a loan. To analyze this, data from a random sample of 200 customers are given below:
Average Balance 1222.3 6291.0 1051.0 1118.3 1176.8 1052.0 1314.6 439.7 1232.7 1855.4 322.4 1570.7 2729.0 1397.8 1464.1 40.3 1296.4 2142.7 2756.3 1451.1 1003.9 1245.7 3011.1 1222.3 2225.9 2708.2 2341.6 1817.7 1417.3 4291.6 1310.7
Age 36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34
Gender 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1
Married Divorced 0 0 1 1 1 0 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 1 0
Family size 1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2
Loan Default 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1144.0 1088.4 1341.9 1269.1 1435.5 113.7 4646.5 1003.9 1773.5 3349.1 647.0 3901.6 1603.2 1308.1 4061.5 2283.1 1023.0 1083.2 1158.6 1052.0 592.2 6834.4 1505.7 1170.0 1509.6 1061.0 517.2 1661.7 1279.5 1656.5 1319.8 1227.5 1748.8 1060.0 1119.6 1135.0 2777.1 1535.6 352.0 1605.8 5737.2
38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48
1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0
1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0
3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2
0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
3354.3 10096.1 9164.0 6796.7 2108.9 265.2 1097.0 1041.0 1224.9 1557.7 3202.2 1173.0 1794.3 2423.5 171.8 12157.9 4107.0 887.9 1165.1 643.5 1529.1 2142.7 1035.0 1003.9 1509.6 1118.3 1124.8 1891.8 6796.7 1709.8 1011.7 1270.4 1663.0 1648.7 1887.9 1244.4 2465.1 6086.9 1262.6 1513.5 1170.3
45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29
0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3
0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1557.7 2454.7 710.8 3711.8 1748.8 1248.3 1002.6 1130.0 1040.3 1595.4 1144.0 1582.4 1049.0 1577.2 561.0 3349.1 1704.6 1245.7 16191.8 2185.6 1167.7 1535.6 1319.8 1145.6 1304.2 1851.5 2099.8 1152.0 1219.7 1235.3 1811.2 732.5 5630.6 2420.9 2454.7 1557.7 4017.3 4017.3 1351.0 1507.0 1050.7
45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60
1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1
1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1
0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1657.8 1115.0 245.9 1058.5 1377.0 1079.3 1456.3 2063.4 1106.6 1119.6 2496.3 1578.5 1284.7 1409.5 1085.8 1083.0 1556.4 1080.6 1457.6 1478.4 1690.3 1458.9 1465.4 1002.6 1728.0 1015.6 1163.8 1299.0 1400.4 1005.2 1341.9 1032.5 1236.6 1087.1 1170.3 1237.9 1296.4 1182.0 1133.0 1629.2 1830.7
36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36
0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0
1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1
1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1137.8 2011.4 170.3 1135.2 195.0
36 56 36 39 29
1 0 0 1 1
1 1 1 1 1
0 1 1 0 0
3 3 1 3 1
0 0 1 1 1
In XLMiner’s Partition with Oversampling procedure, partition the data so that 50 percent of the observations in the training set are successes (Loan default) and 40 percent of the validation data is taken away as test data. Fit a classification tree using Loan Default as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s Classification Tree procedure, be sure to Normalize input data and to set the Minimum #records in a terminal node to 1. In Step 3 of XLMiner’s Classification Tree procedure, set the maximum number of levels to 7. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate lift charts for both the validation data and test data.
a. Why is partitioning with oversampling advised in this case?
b. Interpret the set of rules implied by the best pruned tree that characterize loan defaulters.
c. For the default cutoff value of 0.5, what are the overall error rate, Class 1 error rate, and Class 0 error rate of the best pruned tree on the test data?
d. Examine the decile-wise lift chart for the best pruned tree on the test data. What is the first decile lift? Interpret this value.
Answer: a. Loan default observations make up only 15.5% of the data set. By oversampling the Loan default observations in the training set, a data mining algorithm can better learn how to classify them.
b. Customers who defaulted on their loan are characterized by:
i. An average balance less than $951.75 OR
ii. An average balance less than $1216.30 and a family size greater than 2.
c. The overall error rate is 10.00%. The class 1 error rate is 50.00% and the class 0 error rate is 2.94%.
d. The first decile lift is 5. For this test data set of 40 customers, 6 of whom actually defaulted on their loan, if we randomly selected 4 customers, on average 0.6 customers would have defaulted on their loan. However, if we use the classification tree to identify the top 4 customers, then (0.6)(5) = 3 of the customers would have defaulted on their loan. This can be confirmed from the Detailed Scoring report by observing that among the top 4 observations in the test set rated by the best pruned tree to be most likely to default, 3 actually defaulted.
Difficulty: Moderate LO: 6.4, Pages 269-291 Bloom’s: Application BUSPROG: Analytic DISC:
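The three error rates reported for the best pruned tree can be tied together through a confusion matrix. The counts in the sketch below are not given in the answer; they are inferred as the whole-number counts consistent with 40 test customers, 6 defaulters, and the reported rates, so treat them as an assumption.

```python
def error_summary(true_pos, false_neg, false_pos, true_neg):
    """Compute error rates, sensitivity, and specificity from confusion-matrix counts."""
    n1 = true_pos + false_neg  # actual Class 1 (defaulters)
    n0 = false_pos + true_neg  # actual Class 0 (non-defaulters)
    return {
        "overall_error": (false_neg + false_pos) / (n1 + n0),
        "class1_error": false_neg / n1,
        "class0_error": false_pos / n0,
        "sensitivity": true_pos / n1,
        "specificity": true_neg / n0,
    }


# Inferred counts (assumption): 3 of 6 defaulters missed, 1 of 34 non-defaulters flagged.
summary = error_summary(true_pos=3, false_neg=3, false_pos=1, true_neg=33)
for name, value in summary.items():
    print(f"{name}: {value:.2%}")
```

Under these counts the tree flags 4 customers as defaulters, 3 of them correctly, which agrees with the first decile described in part d.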
15. A bank wants to better understand the details of customers who are likely to default on a loan. To analyze this, data from a random sample of 200 customers are given below:
Average Balance 1222.3 6291.0 1051.0 1118.3 1176.8 1052.0 1314.6 439.7 1232.7 1855.4 322.4 1570.7 2729.0 1397.8 1464.1 40.3 1296.4 2142.7 2756.3 1451.1 1003.9 1245.7 3011.1 1222.3 2225.9 2708.2
Age 36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40
Gender 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1
Married Divorced 0 0 1 1 1 0 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 1 0
Family size 1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2
Loan Default 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
2341.6 1817.7 1417.3 4291.6 1310.7 1144.0 1088.4 1341.9 1269.1 1435.5 113.7 4646.5 1003.9 1773.5 3349.1 647.0 3901.6 1603.2 1308.1 4061.5 2283.1 1023.0 1083.2 1158.6 1052.0 592.2 6834.4 1505.7 1170.0 1509.6 1061.0 517.2 1661.7 1279.5 1656.5 1319.8 1227.5 1748.8 1060.0 1119.6 1135.0
30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59
1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0
1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1
1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
2777.1 1535.6 352.0 1605.8 5737.2 3354.3 10096.1 9164.0 6796.7 2108.9 265.2 1097.0 1041.0 1224.9 1557.7 3202.2 1173.0 1794.3 2423.5 171.8 12157.9 4107.0 887.9 1165.1 643.5 1529.1 2142.7 1035.0 1003.9 1509.6 1118.3 1124.8 1891.8 6796.7 1709.8 1011.7 1270.4 1663.0 1648.7 1887.9 1244.4
50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37
1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1
0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3
0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
2465.1 6086.9 1262.6 1513.5 1170.3 1557.7 2454.7 710.8 3711.8 1748.8 1248.3 1002.6 1130.0 1040.3 1595.4 1144.0 1582.4 1049.0 1577.2 561.0 3349.1 1704.6 1245.7 16191.8 2185.6 1167.7 1535.6 1319.8 1145.6 1304.2 1851.5 2099.8 1152.0 1219.7 1235.3 1811.2 732.5 5630.6 2420.9 2454.7 1557.7
36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46
0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1
1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2
0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
4017.3 4017.3 1351.0 1507.0 1050.7 1657.8 1115.0 245.9 1058.5 1377.0 1079.3 1456.3 2063.4 1106.6 1119.6 2496.3 1578.5 1284.7 1409.5 1085.8 1083.0 1556.4 1080.6 1457.6 1478.4 1690.3 1458.9 1465.4 1002.6 1728.0 1015.6 1163.8 1299.0 1400.4 1005.2 1341.9 1032.5 1236.6 1087.1 1170.3 1237.9
72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55
1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0
1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1
0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1296.4 1182.0 1133.0 1629.2 1830.7 1137.8 2011.4 170.3 1135.2 195.0
51 37 39 52 36 36 56 36 39 29
0 1 0 0 0 1 0 0 1 1
1 0 0 1 1 1 1 1 1 1
1 0 0 0 0 0 1 1 0 0
2 1 1 3 3 3 3 1 3 1
0 0 0 0 0 0 0 1 1 1
In XLMiner’s Partition with Oversampling procedure, partition the data so that 50 percent of the observations in the training set are successes (Loan default) and 40 percent of the validation data is taken away as test data. Construct a logistic regression model using Loan default as the output variable and all the other variables as input variables. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2. Generate lift charts for both the validation data and test data.
a. From the generated set of logistic regression models, select one that is a good fit. Express the model as a mathematical equation relating the output variable to the input variables. Do the relationships suggested by the model make sense? Try to explain them.
b. Using the default cutoff value of 0.5 for your logistic regression model, what is the overall error rate on the test data?
c. Examine the decile-wise lift chart for your model on the test data. What is the first decile lift? Interpret this value.
Answer: a. Using Mallows’ Cp statistic to guide the selection, we see that the model using 2 independent variables appears to be a viable candidate. We will select this model with 2 variables (3 coefficients including the intercept).
Resulting model: log odds of event (Loan default) = 1.556 - 0.004*Average Balance + 1.569*Family size
As the average balance in the account increases, the chance of a customer defaulting on the loan decreases, and as the family size increases, the chance of a customer defaulting on the loan increases.
b. The overall error rate is 22.50%.
c. The first decile lift is 5. For this test data set of 40 customers, 6 of whom actually defaulted on their loan, if we randomly selected 4 customers, on average 0.6 customers would have defaulted on their loan. However, if we use the logistic regression model to identify the top 4 customers, then (0.6)(5) = 3 of the customers would have defaulted on their loan.
Difficulty: Moderate LO: 6.4, Pages 269-307 Bloom’s: Application BUSPROG: Analytic DISC:
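One way to judge whether the relationships in the fitted model make sense is to express each coefficient as an odds ratio, since exponentiating a log-odds coefficient gives the multiplicative change in the odds of default per one-unit increase in that variable. The sketch below uses the two coefficients reported in the answer; the $100 balance increment is an illustrative choice, not part of the original solution.

```python
import math

# Coefficients from the reported model (log odds of Loan default).
coefficients = {"Average Balance": -0.004, "Family size": 1.569}

# Odds ratio per one-unit increase in each input variable.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# Odds ratio for a $100 increase in average balance (illustrative increment).
odds_ratio_per_100 = math.exp(coefficients["Average Balance"] * 100)

for name, ratio in odds_ratios.items():
    print(f"{name}: odds of default multiply by {ratio:.3f} per unit increase")
print(f"Average Balance: odds multiply by {odds_ratio_per_100:.3f} per $100 increase")
```

An odds ratio below 1 for Average Balance and above 1 for Family size matches the direction of the interpretation given in part a.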
16. To examine the local housing market in a particular region, data on a sample of 120 homes sold during a year were collected. The data are given below:
LandValue ($) | BuildingValue ($) | Acres | Baths | Toilets | Fireplaces | Bedrooms | Age | Sale Price ($)
18100 | 92500 | 0.5 | 1 | 1 | 1 | 4 | 53.9 | 114885
23600 25900 22100 23900 22400 24100 26300 24900 13600 36100 19500 38800 23500 26300 21900 23400 15000 15000 9200 9200 5600 9000 21000 23500 36000 23700 22000 19900 22100 24600 21500 15000 15700 14200 10700 16600 25500 15100 7400 28500 25100 50100 83300
152700 134300 129600 168700 118300 123300 133800 139400 87200 210400 101300 224700 139000 164200 122400 149600 102200 102200 22000 22000 48000 58800 109600 165900 262500 114900 102700 95800 116300 165500 113400 81100 129200 81600 49700 72700 110700 74300 55500 129400 83900 164600 276000
0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22 0.35 0.17 0.22 0.12 0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61
2 2 2 2 2 1 2 2 1 2 1 2 1 2 2 2 1 1 1 1 1 1 2 2 3 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 3
1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 2 1 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 0 0 0 1
3 3 4 4 3 4 3 4 3 2 2 4 3 3 3 3 3 3 4 4 3 3 3 4 4 4 3 2 3 4 3 2 3 3 2 4 3 3 2 4 3 3 2
19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8 3.9 15.7 15.7 97.8 97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9
180895 162038 154496 196973 145075 151480 164762 166528 105762 250170 125082 265066 166697 194881 146818 176048 119584 121759 34947 35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661 137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936
124500 47000 64600 33900 41100 29100 56400 45400 23800 52800 25100 27200 28100 28800 33400 20700 25600 25800 29300 26000 25900 32800 31100 25800 27200 25000 29200 30000 20400 23600 16200 29300 27000 25600 46200 22900 27100 30700 29100 34700 20000 35700 35100
552300 214400 185000 138800 156300 96400 256400 219200 92100 172800 99200 152600 102900 98800 103900 95600 101900 110700 147700 116000 73500 125000 166800 105300 94800 105900 117500 93300 112000 83400 85800 123900 97800 86300 220500 160000 105200 107100 102400 150400 80400 159400 161500
1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25
4 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 3 1 1 2 1 1 2 2
1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 1 1 1 0 1 1 1 2 0 1 1 0 1 1 0 0 1 1 0 1 2 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 2 0 1 1
4 4 4 4 3 1 3 3 4 2 3 3 3 3 4 3 2 3 4 3 2 3 2 3 3 3 3 2 3 2 2 4 3 3 4 3 3 3 2 3 3 4 3
5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8
679795 264115 254075 173987 200251 130214 316874 267672 119769 229499 128456 181102 132977 131411 139697 120046 131026 141202 181575 144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399 105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349
33700 33700 36400 33200 39200 33100 16000 24900 22000 20000 33900 22100 22800 24700 38700 25800 31700 82200 19500 24400 22500 25900 22700 21200 34000 18900 33900 23800 23900 18500 36300 47300 36600
162500 162500 176100 122300 169200 180100 98400 63800 121300 107600 230800 153800 111100 117800 118700 108000 140500 171700 147600 132000 119800 117100 95000 56700 163800 118000 151600 133500 119000 110500 122500 298800 238700
0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28
2 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 2 1 2 2 2 2 1 3 2
1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 2 2 1 2
4 4 4 3 3 4 4 2 4 3 3 3 3 3 3 2 3 3 2 3 3 3 3 2 4 3 3 3 3 4 3 4 3
8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5
198580 200228 215634 157208 212662 217543 118491 91539 147802 131948 268444 180464 137326 145115 159644 135049 174475 257467 169311 157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575 162270 348138 278839
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using k-nearest neighbors with up to k = 10. Use Sale Price as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s k-Nearest Neighbors Prediction procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate a Detailed Scoring report for all three sets of data. a. What value of k minimizes the root mean squared error (RMSE) on the validation data? b. What is the RMSE on the validation data and test data? c. What is the average error on the validation data and test data? What does this suggest? Answer: a. A value of k = 2 minimizes the RMSE on validation data.
b. The RMSE on the validation set is $22,873.11 and the RMSE on the test data is $27,987.05.
c. The average error of -778.37 on the validation data suggests a slight tendency to overestimate the output variable in the validation data. The average error of 8324.75 on the test data suggests a tendency to underestimate the output variable in the test data. The difference in sign of these two average error estimates suggests that there is no systematic bias in the predictions. Difficulty: Easy
LO: 6.4, Pages 269-283 Bloom’s: Application BUSPROG: Analytic DISC:
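The procedure in question 16 (normalize the inputs, score each k on the validation data, keep the k with the lowest RMSE) can be sketched in plain Python. This is a minimal illustration, not XLMiner's implementation: the function names are assumptions, only two inputs (BuildingValue and Age) are used, and the first eight homes in the data stand in for the training and validation partitions, so the results will not match the answer above.

```python
import math


def zscore_params(rows):
    """Column means and standard deviations from the training rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 when a column is constant to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def normalize(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k):
    """Average the outputs of the k training records nearest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def rmse_and_average_error(actual, predicted):
    errors = [a - p for a, p in zip(actual, predicted)]  # actual - predicted
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, sum(errors) / len(errors)


def best_k(train_x, train_y, valid_x, valid_y, max_k):
    """Return (k, validation RMSE) for the k in 1..max_k with lowest RMSE."""
    scored = []
    for k in range(1, max_k + 1):
        preds = [knn_predict(train_x, train_y, q, k) for q in valid_x]
        scored.append((rmse_and_average_error(valid_y, preds)[0], k))
    score, k = min(scored)
    return k, score


# (BuildingValue, Age) and Sale Price for the first eight homes in the data.
train_raw = [(92500, 53.9), (152700, 19.7), (134300, 15.9),
             (129600, 41.0), (168700, 39.9)]
train_y = [114885, 180895, 162038, 154496, 196973]
valid_raw = [(118300, 41.8), (123300, 70.9), (133800, 37.8)]
valid_y = [145075, 151480, 164762]

means, stds = zscore_params(train_raw)
train_x = [normalize(r, means, stds) for r in train_raw]
valid_x = [normalize(r, means, stds) for r in valid_raw]

k_best, valid_rmse = best_k(train_x, train_y, valid_x, valid_y, max_k=3)
print(k_best, round(valid_rmse, 2))
```

Normalization parameters are taken from the training rows only, so the validation records are scaled the same way without influencing the fit.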
17. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below:
LandValue ($): 18100 23600 25900 22100 23900 22400 24100 26300 24900 13600 36100 19500 38800 23500 26300 21900 23400 15000 15000 9200 9200 5600 9000 21000 23500 36000 23700 22000 19900 22100 24600
BuildingValue ($): 92500 152700 134300 129600 168700 118300 123300 133800 139400 87200 210400 101300 224700 139000 164200 122400 149600 102200 102200 22000 22000 48000 58800 109600 165900 262500 114900 102700 95800 116300 165500
Acres: 0.5 0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22 0.35 0.17 0.22 0.12 0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29
Baths: 1 2 2 2 2 2 1 2 2 1 2 1 2 1 2 2 2 1 1 1 1 1 1 2 2 3 1 1 2 1 1
Toilets: 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fireplaces: 1 1 1 1 1 1 1 1 1 0 2 1 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 1
Bedrooms: 4 3 3 4 4 3 4 3 4 3 2 2 4 3 3 3 3 3 3 4 4 3 3 3 4 4 4 3 2 3 4
Age: 53.9 19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8 3.9 15.7 15.7 97.8 97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43
Sale Price ($): 114885 180895 162038 154496 196973 145075 151480 164762 166528 105762 250170 125082 265066 166697 194881 146818 176048 119584 121759 34947 35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661
21500 15000 15700 14200 10700 16600 25500 15100 7400 28500 25100 50100 83300 124500 47000 64600 33900 41100 29100 56400 45400 23800 52800 25100 27200 28100 28800 33400 20700 25600 25800 29300 26000 25900 32800 31100 25800 27200 25000 29200 30000 20400 23600
113400 81100 129200 81600 49700 72700 110700 74300 55500 129400 83900 164600 276000 552300 214400 185000 138800 156300 96400 256400 219200 92100 172800 99200 152600 102900 98800 103900 95600 101900 110700 147700 116000 73500 125000 166800 105300 94800 105900 117500 93300 112000 83400
0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16
1 1 2 1 1 1 1 1 1 1 1 2 3 4 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 1 1 1 0 0 0 0 1 2 2 1 1 1 0 1 1 1 2 0 1 1 0 1 1 0 0 1 1 0 1 2 0 0 1 1 0 1 0
3 2 3 3 2 4 3 3 2 4 3 3 2 4 4 4 4 3 1 3 3 4 2 3 3 3 3 4 3 2 3 4 3 2 3 2 3 3 3 3 2 3 2
44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7
137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936 679795 264115 254075 173987 200251 130214 316874 267672 119769 229499 128456 181102 132977 131411 139697 120046 131026 141202 181575 144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399
16200 29300 27000 25600 46200 22900 27100 30700 29100 34700 20000 35700 35100 33700 33700 36400 33200 39200 33100 16000 24900 22000 20000 33900 22100 22800 24700 38700 25800 31700 82200 19500 24400 22500 25900 22700 21200 34000 18900 33900 23800 23900 18500
85800 123900 97800 86300 220500 160000 105200 107100 102400 150400 80400 159400 161500 162500 162500 176100 122300 169200 180100 98400 63800 121300 107600 230800 153800 111100 117800 118700 108000 140500 171700 147600 132000 119800 117100 95000 56700 163800 118000 151600 133500 119000 110500
0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19
1 1 1 1 2 3 1 1 2 1 1 2 2 2 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 2 1 2 2 2 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 0 0 1 1 0 0 0 2 0 1 1 1 1 1 0 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 2
2 4 3 3 4 3 3 3 2 3 3 4 3 4 4 4 3 3 4 4 2 4 3 3 3 3 3 3 2 3 3 2 3 3 3 3 2 4 3 3 3 3 4
67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2
105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349 198580 200228 215634 157208 212662 217543 118491 91539 147802 131948 268444 180464 137326 145115 159644 135049 174475 257467 169311 157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575
36300 47300 36600
122500 298800 238700
0.61 0.36 0.28
1 3 2
1 1 1
2 1 2
3 4 3
56.2 31.4 25.5
162270 348138 278839
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using a regression tree. Use Sale Price as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s Regression Tree procedure, be sure to Normalize input data, to set the Maximum #splits for input variables to 59, to set the Minimum #records in a terminal node to 1, and specify Using Best prune tree as the scoring option. In Step 3 of XLMiner’s Regression Tree procedure, set the maximum number of levels to 7. Generate the Full tree and Best pruned tree. a. In terms of number of decision nodes, compare the size of the full tree to the size of the best pruned tree. b. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data? c. What is the average error on the validation data and test data? What does this suggest? d. By examining the best pruned tree, what are the critical variables in predicting the sale price of a home? Answer: a. There are 59 decision nodes in the full tree and 41 decision nodes in the best pruned tree. b. The RMSE on the validation set is $13,757.85 and the RMSE on the test data is $16,054.64.
c. The average error on the validation set is $5464.69 and the average error on the test data is $5294.11. There is slight evidence of systematic underestimation of home sale price. d. The best pruned tree for the pre-crisis data contains decision nodes on BuildingValue, LandValue, Acres, Fireplaces, and Age.
Difficulty: Moderate LO: 6.4, Pages 269-298 Bloom’s: Application BUSPROG: Analytic DISC:
18. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below:
Acres: 0.5 0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22 0.35 0.17 0.22 0.12 0.12 0.17
Baths: 1 2 2 2 2 2 1 2 2 1 2 1 2 1 2 2 2 1 1 1
Toilets: 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fireplaces: 1 1 1 1 1 1 1 1 1 0 2 1 1 0 0 1 1 0 0 0
Bedrooms: 4 3 3 4 4 3 4 3 4 3 2 2 4 3 3 3 3 3 3 4
Age: 53.9 19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8 3.9 15.7 15.7 97.8 97.7 120.9
Sale Price ($): 114885 180895 162038 154496 196973 145075 151480 164762 166528 105762 250170 125082 265066 166697 194881 146818 176048 119584 121759 34947
0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2
1 1 1 2 2 3 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 3 4 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1
0 0 0 0 1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 0 0 0 1 2 2 1 1 1 0 1 1 1 2 0 1 1 0 1 1 0 0 1
4 3 3 3 4 4 4 3 2 3 4 3 2 3 3 2 4 3 3 2 4 3 3 2 4 4 4 4 3 1 3 3 4 2 3 3 3 3 4 3 2 3 4
120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9
35214 57142 72192 133848 194079 300407 141700 128866 119189 141018 193661 137308 99817 148909 100701 65082 92614 137889 91180 64119 160139 113043 217684 360936 679795 264115 254075 173987 200251 130214 316874 267672 119769 229499 128456 181102 132977 131411 139697 120046 131026 141202 181575
0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53
1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 3 1 1 2 1 1 2 2 2 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 1 2 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
1 0 1 2 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 2 0 1 1 1 1 1 0 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1
3 2 3 2 3 3 3 3 2 3 2 2 4 3 3 4 3 3 3 2 3 3 4 3 4 4 4 3 3 4 4 2 4 3 3 3 3 3 3 2 3 3 2
44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6 56.4 28.2
144513 100953 160546 199970 134647 124311 133543 151392 124476 136599 110399 105027 157819 129675 115952 268552 187870 135549 142738 135284 189790 105302 196936 201349 198580 200228 215634 157208 212662 217543 118491 91539 147802 131948 268444 180464 137326 145115 159644 135049 174475 257467 169311
0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28
2 2 2 1 1 2 1 2 2 2 2 1 3 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 0 0 0 1 0 1 1 1 2 2 1 2
3 3 3 3 2 4 3 3 3 3 4 3 4 3
14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5
157570 143676 146960 121175 81869 199361 139981 186637 161123 146054 130575 162270 348138 278839
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using multiple linear regression. Use Sale Price as the output variable and all the other variables as input variables. To generate a pool of models to consider, execute the following steps. In Step 2 of XLMiner’s Multiple Linear Regression procedure, click the Best subset option. In the Best Subset dialog box, check the box next to Perform best subset selection, enter 6 in the box next to Maximum size of best subset:, enter 1 in the box next to Number of best subsets:, and check the box next to Exhaustive search. a. From the generated set of multiple linear regression models, select one that you believe is a good fit. Express the model as a mathematical equation relating the output variable to the input variables. b. For your model, what is the RMSE on the validation data and test data? c. What is the average error on the validation data and test data? What does this suggest? Answer: a. Goodness-of-fit measures such as Mallows’ Cp statistic and adjusted R² indicate several viable models. After comparing several of the candidate models based on prediction error on the validation set, we suggest the following model with 4 independent variables as a good candidate.
Sale Price estimate = 53,752 + 145,163*Acres + 61,950*Baths + 24,936*Fireplaces – 565*Age
b. The RMSE on the validation set is $42,320 and the RMSE on the test data is $47,188.
c. The average error of -$13,518 on the validation data and -$24,878 on the test data suggest a slight tendency to overestimate the output variable. Difficulty: Moderate LO: 6.4, Pages 269-307 Bloom’s: Application BUSPROG: Analytic DISC:
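To illustrate how the fitted equation in part a is used, the sketch below applies it to the first home listed in the data (Acres 0.5, Baths 1, Fireplaces 1, Age 53.9, Sale Price 114,885). The function name is an assumption, and error is taken as actual minus predicted, the convention used in the answers to questions 16 and 19.

```python
def predict_sale_price(acres, baths, fireplaces, age):
    """Fitted model from part a of the answer to question 18."""
    return (53752 + 145163 * acres + 61950 * baths
            + 24936 * fireplaces - 565 * age)


predicted = predict_sale_price(acres=0.5, baths=1, fireplaces=1, age=53.9)
error = 114885 - predicted   # actual minus predicted
print(round(predicted), round(error))
```

For this record the prediction is about $182,766, so the error is negative: the model predicts a price above the actual sale price.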
19. A research team wanted to assess the relationship between age, systolic blood pressure, smoking, and risk of stroke. A sample of 150 patients who had a stroke is selected and the data collected are given below. Here, for the variable Smoker, 1 represents smokers and 0 represents nonsmokers.
Age: 86
Blood Pressure: 177
Smoker: 1
Risk: 45
76 56 78 67 77 60 66 80 62 59 72 70 73 67 64 67 64 62 75 59 64 71 78 65 67 69 67 75 72 71 70 68 63 70 75 66 64 71 58 70 63 65 63
189 155 98 145 209 199 166 125 117 83 134 145 188 163 87 123 204 145 213 196 124 145 120 156 167 143 187 193 85 152 89 132 165 221 173 145 132 167 155 134 92 143 143
1 0 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 0 1
65 16 45 7 34 67 54 67 56 12 32 45 26 67 87 23 23 34 56 76 34 54 43 56 13 76 54 34 6 23 45 34 68 23 87 67 56 45 26 65 76 28 45
66 65 80 67 60 60 62 66 57 68 66 72 70 70 61 60 82 70 63 75 60 72 72 65 72 60 64 65 70 75 75 62 67 69 73 68 67 61 61 71 60 67 67
87 154 135 156 187 125 176 187 152 154 134 165 173 132 167 165 119 184 167 132 176 78 93 154 77 134 165 187 234 123 99 103 114 156 160 107 142 165 141 128 138 117 147
1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0
34 39 72 24 22 34 59 67 54 26 34 56 27 76 45 34 76 12 56 89 76 56 12 54 56 78 38 59 6 47 21 28 39 13 47 56 43 26 45 87 45 34 8
73 67 71 67 63 73 72 70 67 69 69 62 74 71 75 63 71 66 66 71 71 70 66 68 66 63 75 61 70 72 63 61 71 66 68 60 64 65 70 67 74 61 72
135 154 174 126 142 167 159 133 147 157 175 125 150 124 176 173 172 112 130 125 104 125 102 176 167 156 187 113 142 105 140 137 142 105 149 134 128 111 106 101 170 130 164
1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
56 76 43 56 52 23 45 52 45 32 67 65 49 58 12 52 49 64 69 61 34 58 64 58 54 47 34 56 37 57 12 34 67 54 36 5 43 47 26 37 48 59 62
65 62 75 65 62 67 67 62 67 74 61 75 75 61 65 74 75 70 64 76
123 211 98 67 145 132 145 132 154 167 156 187 193 132 156 123 156 167 165 123
1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1
68 9 56 53 45 49 39 46 56 63 12 59 39 47 52 57 34 12 48 41
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the Risk of stroke using k-nearest neighbors with up to k = 20. Use Risk as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s k-Nearest Neighbors Prediction procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate a Detailed Scoring report for all three sets of data. a. What value of k minimizes the root mean squared error (RMSE) on the validation data? b. What is the RMSE on the validation data and test data? c. What is the average error on the validation data and test data? What does this suggest? Answer: a. A value of k = 10 minimizes the RMSE on validation data.
b. The RMSE on the validation set is 15.31 and the RMSE on the test data is 14.65.
c. The average error of 4.84 on the validation data suggests a slight tendency to underestimate the output variable in the validation data. The average error of -1.32 on the test data suggests a slight tendency to overestimate the output variable in the test data. The difference in sign of these two average error estimates suggests that there is no systematic bias in the predictions. Difficulty: Easy LO: 6.4, Pages 269-283 Bloom’s: Application BUSPROG: Analytic DISC:
20. A research team wanted to assess the relationship between age, systolic blood pressure, smoking, and risk of stroke. A sample of 150 patients who had a stroke is selected and the data collected are given below. Here, for the variable Smoker, 1 represents smokers and 0 represents nonsmokers.
Age: 86 76 56 78 67 77 60 66 80 62 59 72 70 73 67 64 67 64 62 75 59 64 71 78 65 67 69 67 75 72 71 70 68
Blood Pressure: 177 189 155 98 145 209 199 166 125 117 83 134 145 188 163 87 123 204 145 213 196 124 145 120 156 167 143 187 193 85 152 89 132
Smoker: 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1
Risk: 45 65 16 45 7 34 67 54 67 56 12 32 45 26 67 87 23 23 34 56 76 34 54 43 56 13 76 54 34 6 23 45 34
63 70 75 66 64 71 58 70 63 65 63 66 65 80 67 60 60 62 66 57 68 66 72 70 70 61 60 82 70 63 75 60 72 72 65 72 60 64 65 70 75 75 62
165 221 173 145 132 167 155 134 92 143 143 87 154 135 156 187 125 176 187 152 154 134 165 173 132 167 165 119 184 167 132 176 78 93 154 77 134 165 187 234 123 99 103
1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 1
68 23 87 67 56 45 26 65 76 28 45 34 39 72 24 22 34 59 67 54 26 34 56 27 76 45 34 76 12 56 89 76 56 12 54 56 78 38 59 6 47 21 28
67 69 73 68 67 61 61 71 60 67 67 73 67 71 67 63 73 72 70 67 69 69 62 74 71 75 63 71 66 66 71 71 70 66 68 66 63 75 61 70 72 63 61
114 156 160 107 142 165 141 128 138 117 147 135 154 174 126 142 167 159 133 147 157 175 125 150 124 176 173 172 112 130 125 104 125 102 176 167 156 187 113 142 105 140 137
1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
39 13 47 56 43 26 45 87 45 34 8 56 76 43 56 52 23 45 52 45 32 67 65 49 58 12 52 49 64 69 61 34 58 64 58 54 47 34 56 37 57 12 34
71 66 68 60 64 65 70 67 74 61 72 65 62 75 65 62 67 67 62 67 74 61 75 75 61 65 74 75 70 64 76
142 105 149 134 128 111 106 101 170 130 164 123 211 98 67 145 132 145 132 154 167 156 187 193 132 156 123 156 167 165 123
1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1
67 54 36 5 43 47 26 37 48 59 62 68 9 56 53 45 49 39 46 56 63 12 59 39 47 52 57 34 12 48 41
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the Risk of stroke using a regression tree. Use Risk as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s Regression Tree procedure, be sure to Normalize input data, to set the Maximum #splits for input variables to 74, to set the Minimum #records in a terminal node to 1, and specify Using Best prune tree as the scoring option. In Step 3 of XLMiner’s Regression Tree procedure, set the maximum number of levels to 7. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate a Detailed Scoring report for all three sets of data. a. In terms of number of decision nodes, compare the size of the full tree to the size of the best pruned tree.
b. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data? c. What is the average error on the validation data and test data? What does this suggest? d. By examining the best pruned tree, what are the critical variables in predicting the risk? Answer: a. There are 67 decision nodes in the full tree and 1 decision node in the best pruned tree. b. The RMSE on the validation set is 13.66 and the RMSE on the test data is 14.78.
c. The average error of 3.54 on the validation data suggests a slight tendency to underestimate the output variable in the validation data. The average error of -0.91 on the test data suggests a slight tendency to overestimate the output variable in the test data. The difference in sign of these two average error estimates suggests that there is no systematic bias in the predictions. d. The best pruned tree only contains a decision node on Smoker.
Difficulty: Moderate
LO: 6.4, Pages 269-298 Bloom’s: Application BUSPROG: Analytic DISC:
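A tree with a single decision node on a binary variable such as Smoker predicts the mean Risk of each of the two groups. The sketch below shows the idea on the first eight patients in the data. It is an illustration only: the function name is an assumption, and XLMiner fits the tree on the training partition rather than on these eight records.

```python
def fit_single_split(groups, outputs):
    """Mean output per group: the predictions of a one-node tree."""
    totals = {}
    for g, y in zip(groups, outputs):
        running_sum, count = totals.get(g, (0.0, 0))
        totals[g] = (running_sum + y, count + 1)
    return {g: s / n for g, (s, n) in totals.items()}


# Smoker and Risk for the first eight patients listed in the data.
smoker = [1, 1, 0, 1, 0, 1, 1, 1]
risk = [45, 65, 16, 45, 7, 34, 67, 54]

leaf_means = fit_single_split(smoker, risk)
predictions = [leaf_means[s] for s in smoker]
print(leaf_means)
```

Every smoker receives the same predicted Risk and every nonsmoker receives the other, which is why a tree this small cannot distinguish patients by Age or Blood Pressure.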
Chapter 7: Spreadsheet Models 1. Which of the following is true of spreadsheet packages used in business analytics? a. They are more expensive than specialized packages. b. They require substantial user training. c. They come preloaded on computers. d. They do not have specialized functions to perform detailed analyses. Answer: C Difficulty: Moderate LO: 7.1, Page 321 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: As spreadsheet packages are less expensive, often come preloaded on computers, and are fairly easy to use, they are without question the most used business analytics tool. 2. Spreadsheet models are referred to as what-if models because they: a. are mathematical and logic-based models. b. allow easy instantaneous recalculation for a change in model inputs. c. come preloaded on computers. d. have specialized functions to perform detailed analysis. Answer: B Difficulty: Moderate LO: 7.1, Page 321 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Spreadsheet models are referred to as what-if models because they allow easy instantaneous recalculation for a change in model inputs using easy-to-use, sophisticated mathematical and logical functions. 3. In _____ decision making, companies have to decide whether they should manufacture a product or outsource its production to another firm. a. goal seek b. two-way c. voting-based d. make-versus-buy Answer: D Difficulty: Easy LO: 7.1, Page 322 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a make-versus-buy decision, companies have to decide whether they should manufacture a product or outsource its production to another firm.
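A make-versus-buy comparison is a simple what-if model: change an input and the output is recalculated at once. The sketch below is a hypothetical illustration; the cost structure (a fixed cost plus a per-unit variable cost for making, a per-unit purchase price for buying), the function names, and all numbers are assumptions rather than values from the text.

```python
def make_cost(volume, fixed_cost, variable_cost_per_unit):
    """Cost of manufacturing in-house."""
    return fixed_cost + variable_cost_per_unit * volume


def buy_cost(volume, purchase_price_per_unit):
    """Cost of outsourcing production."""
    return purchase_price_per_unit * volume


def savings_from_buying(volume, fixed_cost, variable_cost_per_unit,
                        purchase_price_per_unit):
    """Positive result: outsourcing is cheaper at this volume."""
    return (make_cost(volume, fixed_cost, variable_cost_per_unit)
            - buy_cost(volume, purchase_price_per_unit))


# What-if analysis: vary the production volume, hold other inputs fixed.
for volume in (1000, 5000, 10000):
    print(volume, savings_from_buying(volume, 20000, 3.0, 6.0))
```

With these made-up inputs, buying is cheaper at low volume and making is cheaper at high volume, the kind of trade-off such a model is built to expose.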
4. The modeling process begins with the framing of the _____ that shows the relationships between the various parts of the problem being modeled. a. mathematical model b. conceptual model c. circular model d. correlation model Answer: B Difficulty: Easy LO: 7.1, Page 322 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: It is often useful to begin the modeling process with a conceptual model that shows the relationships between the various parts of the problem being modeled. 5. The conceptual model: a. helps in organizing the data requirements. b. controls the model inputs. c. has tools defined to identify the optimal solution. d. explores the effects of changing model parameters. Answer: A Difficulty: Moderate LO: 7.1, Page 322 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The conceptual model helps in organizing the data requirements and provides a road map for eventually constructing a mathematical model. A conceptual model also provides a clear way to communicate the model to others. 6. A(n) _____ is a visual representation that shows which entities influence others in a model. a. decision tree diagram b. influence diagram c. entity chart d. time series plot Answer: B Difficulty: Easy LO: 7.1, Page 322 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An influence diagram is a visual representation that shows which entities influence others in a model. 7. What do nodes in an influence diagram represent? a. Model parts
b. Influence levels c. Road maps d. Environmental factors Answer: A Difficulty: Easy LO: 7.1, Page 322 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Nodes in an influence diagram represent the model parts. 8. The influence in an influence diagram is visually depicted by: a. a circular symbol b. an arrow c. a straight line d. the height of the influence diagram Answer: B Difficulty: Easy LO: 7.1, Page 322 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The influence in an influence diagram is visually depicted by an arrow. 9. Which of the following approaches is a good way to proceed with the influence diagram building for a problem? a. The influence diagram for the entire problem is built first and then separate portions are clustered to form separate models. b. The influence diagram for all the model parts at the same level are built in parallel to reduce the likelihood of error. c. The influence diagram is reverse engineered: the diagram is developed in the opposite direction starting with the model output. d. The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled. Answer: D Difficulty: Moderate LO: 7.1, Page 322 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled. 10. The modularity approach of building the influence diagram for a portion of the problem first and then expanding until the total problem is conceptually modeled: a. improves complexity of the modeling process.
b. reduces the likelihood of error. c. results in the propagation of errors. d. helps avoid construction of the mathematical and spreadsheet models. Answer: B Difficulty: Easy LO: 7.1, Page 322 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The modularity approach of building the influence diagram for a portion of the problem first and then expanding until the total problem is conceptually modeled reduces the likelihood of error. Reference - 7.1: Use the influence diagram given below to answer questions 11-13.
11. Reference - 7.1. Which of the following would be a likely mathematical expression for Total Cost? a. Total Cost = Total Variable Cost × Material Cost per unit + Fixed Cost b. Total Cost = Fixed Cost + Total Variable Cost c. Total Cost = Total Variable Cost + Total Revenue × Production Volume d. Total Cost = Fixed Cost × Total Variable Cost – Production Volume Answer: B Difficulty: Moderate LO: 7.1, Pages 322-324 Bloom’s: Application BUSPROG: Analytic
DISC: Feedback: Total Cost = Fixed Cost + Total Variable Cost is the correct expression according to the given conceptual model. 12. Which of the following would be a likely mathematical expression for Total Variable Cost? a. Total Variable Cost = Production Volume × Revenue per Unit b. Total Variable Cost = Material Cost per Unit × Labor Cost per Unit c. Total Variable Cost = Total Cost – (Material Cost per Unit + Labor Cost per Unit) d. Total Variable Cost = (Material Cost per Unit + Labor Cost per Unit) × Production Volume Answer: D Difficulty: Moderate LO: 7.1, Pages 322-324 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: Total Variable Cost = (Material Cost per Unit + Labor Cost per Unit) × Production Volume. 13. Which of the following would be a likely mathematical expression for Total Revenue? a. Total Revenue = Production Volume + Revenue per Unit b. Total Revenue = Profit – Production Volume × Revenue per Unit c. Total Revenue = Production Volume × Revenue per Unit d. Total Revenue = Total Variable Cost + Production Volume + Revenue per Unit Answer: C Difficulty: Easy LO: 7.1, Pages 322-324 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: Total Revenue = Production Volume × Revenue per Unit. 14. With reference to a what-if model, an uncontrollable model input is known as a(n) _____. a. decision variable b. dummy variable c. parameter d. outlier Answer: C Difficulty: Easy LO: 7.1, Page 325 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a what-if model, a parameter refers to an uncontrollable model input. 15. A(n) _____ refers to a model input that the decision maker can control in a what-if model. a. decision variable
b. outlier c. parameter d. dummy variable Answer: A Difficulty: Easy LO: 7.1, Page 325 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A model input the decision maker can control is referred to as a decision variable. 16. Which of the following design guidelines, if followed, enables the user to update the model parameters without the risk of mistakenly creating an error in a formula? a. Separating the parameters from the spreadsheet model b. Documenting the spreadsheet model c. Using numbers in the spreadsheet formula d. Using simple formulas Answer: A Difficulty: Moderate LO: 7.1, Page 326 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Separating the parameters from the model enables the user to update the model parameters without the risk of mistakenly creating an error in a formula. 17. Navigation in a spreadsheet model can be facilitated by: a. using different spreadsheets for each formula in the model. b. using long calculations in the cells. c. using clear labels and proper formatting and alignment. d. referencing data by using hyperlinks to the problem statement. Answer: C Difficulty: Easy LO: 7.1, Page 326 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Clear labels and proper formatting and alignment in a spreadsheet model facilitate navigation and understanding. 18. An Excel _____ quantifies the impact of changing the value of a specific input on an output of interest. a. Watch Window b. Data Table c. Goal Seek d. Chart
Answer: B Difficulty: Moderate LO: 7.2, Page 327 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An Excel Data Table quantifies the impact of changing the value of a specific input on an output of interest. 19. A one-way data table summarizes: a. a single input’s impact on the output of interest. b. multiple input’s impact on a single output of interest. c. values of the input cells that will cause the single output value to equal zero. d. values of cells when not all of the model is observable on the screen. Answer: A Difficulty: Easy LO: 7.2, Page 327 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A one-way data table summarizes a single input’s impact on the output of interest. 20. The impact of two inputs on the output of interest is given by a _____. a. Goal Seek b. Watch Window c. multiple-way data table d. two-way data table Answer: D Difficulty: Easy LO: 7.2, Page 327 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An Excel two-way data table summarizes two inputs’ impact on the output of interest. 21. Excel’s _____ tool allows the user to determine the value of an input cell that will cause the value of a related output cell to equal some specified value. a. Goal Seek b. Watch Window c. Data Validation d. XLMiner Answer: A Difficulty: Easy LO: 7.2, Page 331 Bloom’s: Knowledge
BUSPROG: Analytic DISC: Feedback: Excel’s Goal Seek tool allows the user to determine the value of an input cell that will cause the value of a related output cell to equal some specified value. 22. The SUM function in Excel: a. adds up all the numbers in the cells diagonally. b. adds up only positive numbers in a range of cells. c. adds up all the numbers in a range of cells. d. adds up the cells specified by a given condition or criteria. Answer: C Difficulty: Easy LO: 7.3, Page 332 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The SUM function adds up all of the numbers in a range of cells. 23. The _____ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results. a. SUM b. SUMPRODUCT c. SUMIF d. VLOOKUP Answer: B Difficulty: Easy LO: 7.3, Page 332 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The SUMPRODUCT function returns the sum of the products of elements in a set of arrays. 24. With reference to the SUMPRODUCT function, which of the following statements is true? a. The range of cells for each array must contain only nonzero values. b. Any cell that does not satisfy the specified given condition or criteria will not be considered. c. The array appearing as the first argument must be sorted in ascending order. d. The arrays that appear as arguments must be of the same dimension. Answer: D Difficulty: Moderate LO: 7.3, Page 333 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The arrays that appear as arguments to the SUMPRODUCT function must be of the same dimension.
25. The _____ function is used for the conditional computation of expressions in Excel. a. MAX b. IF c. SUMSQ d. NOT Answer: B Difficulty: Easy LO: 7.3, Page 335 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The IF function is used for the conditional computation of expressions in Excel. 26. The arguments supplied to the IF function are: a. the condition for execution, the result if condition is true, and the result if condition is false. b. the range of cells and the condition for execution. c. the array1 of data cells, the array2 of data cells, and the condition for execution. d. the condition for execution only. Answer: A Difficulty: Moderate LO: 7.3, Page 335 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The arguments supplied to the IF function are the condition for execution, the result if condition is true, and the result if condition is false. 27. Within a given range of cells, the number of times a particular condition is satisfied is computed by using the _____ function. a. SUMIF b. IF c. VLOOKUP d. COUNTIF Answer: D Difficulty: Easy LO: 7.3, Page 335 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The COUNTIF function is used to compute the number of times a particular condition is satisfied within a given range of cells. 28. The _____ function allows the user to pull a subset of data from a larger table of data based on some criterion. a. VLOOKUP
b. IF c. SUMIF d. COUNTIF Answer: A Difficulty: Easy LO: 7.3, Page 337 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The VLOOKUP function allows the user to pull a subset of data from a larger table of data based on some criterion. 29. The condition that VLOOKUP assumes is that: a. there are no nonzero values in the range. b. all the arguments are of the same dimension. c. the first column of the table is sorted in ascending order. d. the columns with empty cells are to be neglected. Answer: C Difficulty: Moderate LO: 7.3, Page 337 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The condition that VLOOKUP assumes is that the first column of the table is sorted in ascending order. 30. The VLOOKUP with range set to _____ takes the first argument and searches the first column of the table for the last row that is strictly less than the first argument. a. FALSE b. TRUE c. LESS d. NULL Answer: B Difficulty: Easy LO: 7.3, Page 337 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The VLOOKUP with range set to TRUE takes the first argument and searches the first column of the table for the last row that is strictly less than the first argument. 31. Excel searches for an exact match of the first argument in the first column of the data when the range in the VLOOKUP function is _____. a. TRUE b. NULL c. FALSE
d. EXACT Answer: C Difficulty: Easy LO: 7.3, Page 337 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: If the range in the VLOOKUP function is FALSE, Excel searches for an exact match of the first argument in the first column of the data. 32. The _____ button, located in the Formula Auditing group, creates arrows pointing to the selected cell from cells that are part of the formula in that cell. a. Trace Dependents b. Trace Precedents c. Error Checking d. Watch Window Answer: B Difficulty: Easy LO: 7.4, Page 339 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Trace Precedents button, located in the Formula Auditing group, creates arrows pointing to the selected cell from cells that are part of the formula in that cell. 33. Arrows pointing from the selected cell to cells that depend on the selected cell are generated by using the _____ button of the Formula Auditing group. a. Error Checking b. Trace Precedents c. Trace Dependents d. Watch Window Answer: C Difficulty: Easy LO: 7.4, Page 339 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Trace Dependents button of the Formula Auditing group shows arrows pointing from the selected cell to cells that depend on the selected cell. 34. The function of Trace Precedents and Trace Dependents is to: a. highlight errors in copying and formula construction. b. ascertain how model parts are segregated. c. trace the range of cells included in the Watch Window box list. d. investigate the cell calculations in great detail.
Answer: A Difficulty: Moderate LO: 7.4, Page 340 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The function of Trace Precedents and Trace Dependents is to highlight errors in copying and formula construction by showing that incorrect sections of the worksheet are referenced. 35. The _____ button in the Formula Auditing group allows the user to inspect each formula in detail in its cell location. a. Evaluate Formula b. Error Checking c. Watch Window d. Show Formulas Answer: D Difficulty: Easy LO: 7.4, Page 340 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The use of Show Formulas button allows the user to inspect each formula in detail in its cell location. 36. The calculations of a cell can be investigated in great detail by using the _____ button. a. Calculation Options b. Evaluate Formula c. Error Checking d. Show Formulas Answer: B Difficulty: Easy LO: 7.4, Page 340 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Evaluate Formula button allows the user to investigate the calculations of a cell in great detail. 37. Which of the following tools provides an excellent means of identifying the exact location of an error in a formula? a. Error Checking b. Function Library c. Show Formulas d. Evaluate Formula Answer: D
Difficulty: Moderate LO: 7.4, Page 341 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Evaluate Formula tool provides an excellent means of identifying the exact location of an error in a formula. 38. The _____ button provides an automatic means of checking for mathematical errors within formulas of a worksheet. a. Error Checking b. Trace Precedents c. Watch Window d. Math & Trig Answer: A Difficulty: Easy LO: 7.4, Page 341 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Error Checking button provides an automatic means of checking for mathematical errors within formulas of a worksheet. 39. The user can monitor how listed cells change with a change in the model without searching through the worksheet or changing from one worksheet to another by using the _____ functionality. a. Goal Seek b. Evaluate Formula c. Watch Window d. Trial-and-error Answer: C Difficulty: Easy LO: 7.4, Page 342 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The user can monitor how listed cells change with a change in the model without searching through the worksheet or changing from one worksheet to another by using the Watch Window functionality. 40. The Watch Window is observable: a. only when the complete model is observable on the screen. b. only in the same worksheet of a workbook. c. across different worksheets of a workbook. d. across different workbooks in the same folder. Answer: C
Difficulty: Moderate LO: 7.4, Page 343 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The Watch Window is observable across different worksheets of a workbook. Problems 1. The Gatson manufacturing company has estimated the following components for a new product. Fixed cost = $50,000 Material cost per unit = $2.15 Labor cost per unit = $2.00 Revenue per unit = $7.50 Note that fixed cost is incurred regardless of the amount produced. Per-unit material and labor cost together make up the variable cost per unit. Profit is calculated by subtracting the fixed cost and total variable cost from total revenue, assuming the company sells all its produced goods. a. Build an influence diagram that illustrates how to calculate profit. b. Using mathematical notation, give a mathematical model for calculating profit. c. Implement your model from part (b) in Excel using the principles of good spreadsheet design. d. If the company decides to make 70,000 units of the new product, what will be the resulting profit?
Answer: a.
b. Let, q = production volume R = revenue per unit FC = the fixed costs of production MC = material cost per unit LC = labor cost per unit P(q) = total profit for producing (and selling) q units P(q) = (R) × (q) – FC – (MC) × (q) – (LC) × (q) c.
d. Profit of $184,500 will be earned from a production volume of 70,000. Difficulty: Easy LO: 7.1, Pages 322-327 Bloom’s: Application BUSPROG: Analytic DISC:
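The profit model from Problem 1(b) can also be cross-checked outside Excel. The following is a minimal Python sketch, not the spreadsheet solution itself; the function name `profit` and its keyword arguments are illustrative choices, and the default values are the parameters stated in the problem.

```python
def profit(q, revenue_per_unit=7.50, fixed_cost=50_000.0,
           material_cost=2.15, labor_cost=2.00):
    """P(q) = R*q - FC - MC*q - LC*q, with the Problem 1 parameters as defaults."""
    return revenue_per_unit * q - fixed_cost - material_cost * q - labor_cost * q


# Part (d): profit at a production volume of 70,000 units.
print(round(profit(70_000), 2))  # 184500.0
```

Keeping the parameters as named arguments mirrors the spreadsheet guideline of separating parameters from the formulas that use them.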
2. The Gatson manufacturing company has estimated the following components for a new product. Fixed cost = $50,000 Material cost per unit = $2.15 Labor cost per unit = $2.00 Revenue per unit = $7.50 Note that per-unit material and labor cost together make up the variable cost per unit. a. Using the spreadsheet model, construct a one-way data table with production volume as the column input and profit as the output. Breakeven occurs when profit goes from a negative to a positive value, that is, breakeven is when total revenue = total cost, yielding a profit of zero. Vary production volume from 0 to 100,000 in increments of 10,000. In which interval of production volume does breakeven occur? b. Use Goal Seek to find the exact breakeven point. Assign Set cell: equal to the location of profit, To value: = 0, and By changing cell: equal to the location of the production volume in your model.
Answer: a.
Breakeven appears in the production volume interval of 10,000 to 20,000 units. b.
The breakeven point is 14,925 units. Difficulty: Moderate LO: 7.2, Pages 327-332 Bloom’s: Application BUSPROG: Analytic DISC: 3. The Gatson manufacturing company has estimated the following components for a new product. Fixed cost = $50,000 Material cost per unit = $2.15 Labor cost per unit = $2.00 Revenue per unit = $7.50 Note that per-unit material and labor cost together make up the variable cost per unit. Use a two-way data table to show how the profit changes as a function of different production volumes and different values of material cost per unit. Vary the production volume from 0 to 100,000 in increments of 10,000. The five different material costs are $1.50, $1.95, $2.15, $2.85, and $3.25. Answer:
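A two-way data table like the one requested in Problem 3 evaluates the same profit formula over a grid of two inputs. The sketch below shows one way to reproduce that grid in Python, assuming the Gatson parameters given in the problem; the names `profit`, `volumes`, `material_costs`, and `table` are illustrative and do not come from the original spreadsheet.

```python
def profit(q, material_cost, revenue_per_unit=7.50,
           fixed_cost=50_000.0, labor_cost=2.00):
    """Profit for volume q at a given material cost per unit."""
    return (revenue_per_unit - material_cost - labor_cost) * q - fixed_cost


volumes = range(0, 100_001, 10_000)                # row input: production volume
material_costs = [1.50, 1.95, 2.15, 2.85, 3.25]    # column input: material cost per unit

# One row per volume, one column per material cost.
table = {q: [round(profit(q, mc), 2) for mc in material_costs] for q in volumes}

print("Volume".rjust(8) + "".join(f"{mc:>12.2f}" for mc in material_costs))
for q, row in table.items():
    print(f"{q:>8}" + "".join(f"{p:>12.2f}" for p in row))
```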
Difficulty: Easy LO: 7.2, Pages 327-330 Bloom’s: Application BUSPROG: Analytic DISC: 4. A company asked one of its analysis teams to analyze and create models that help decide whether it should manufacture a particular product or outsource its production. The different components are given below: Fixed Cost, FC = $25,000 Material Cost per Unit, MC = $2.15 Labor Cost per Unit, LC = $2.00 Outsourcing Cost per Unit, O = $4.50 Note that per-unit material and labor cost together make up the variable cost per unit. a. Build an influence diagram that illustrates how to calculate the difference in cost of manufacturing and outsourcing. b. Using mathematical notation, give a mathematical model for calculating the difference in cost of manufacturing and outsourcing. c. Implement your model from part (b) in Excel using the principles of good spreadsheet design. d. If the company wants to make 30,000 units of a particular product, what are the savings due to outsourcing? Answer: a.
b. The cost-volume model for producing q units is TMC = FC + ((MC + LC) × q) And, a mathematical model for outsourcing q units is TOC = O × q Here, TMC = Total Manufacturing Cost TOC = Total Outsource Cost c.
d. The model shows that the cost of manufacturing 30,000 units is $149,500 and the cost of outsourcing the same 30,000 units is $135,000. The savings from outsourcing is $14,500. Difficulty: Moderate LO: 7.1, Pages 322-327 Bloom’s: Application
BUSPROG: Analytic DISC: 5. A company asked one of its analysis teams to analyze and create models that help decide whether it should manufacture a particular product or outsource its production. The different components are given below: Fixed Cost, FC = $25,000 Material Cost per Unit, MC = $2.15 Labor Cost per Unit, LC = $2.00 Outsourcing Cost per Unit, O = $4.50 Note that per-unit material and labor cost together make up the variable cost per unit. a. Using the spreadsheet model, construct a one-way data table with production volume as the column input and savings due to outsourcing as the output. Breakeven occurs when the savings due to outsourcing go from a positive to a negative value, that is, breakeven is when total outsource cost = total manufacturing cost, yielding savings due to outsourcing equal to zero. Vary production volume from 0 to 100,000 in increments of 10,000. In which interval of production volume does breakeven occur? b. Use Goal Seek to find the exact breakeven point. Assign Set cell: equal to the location of savings due to outsourcing, To value: = 0, and By changing cell: equal to the location of the production volume in your model. Answer: a.
Breakeven appears in the production volume interval of 70,000 to 80,000 units.
b.
The breakeven point is approximately 71,429 units. Difficulty: Moderate LO: 7.2, Pages 327-332 Bloom’s: Application BUSPROG: Analytic DISC: 6. A company asked one of its analysis teams to analyze and create models that help decide whether it should manufacture a particular product or outsource its production. The different components are given below: Fixed Cost, FC = $25,000 Material Cost per Unit, MC = $2.15 Labor Cost per Unit, LC = $2.00 Outsourcing Cost per Unit, O = $4.50
Note that per-unit material and labor cost together make up the variable cost per unit. Use a two-way data table to show how the savings due to outsourcing changes as a function of different production volume and different bids on per-unit cost for outsourcing. Vary the production volume from 0 to 100,000 in increments of 10,000. The six bids are $3.11, $3.49, $4.50, $4.98, $5.12, and $5.45. Answer:
Difficulty: Easy LO: 7.2, Pages 327-330 Bloom’s: Application BUSPROG: Analytic DISC: 7. A clothing retail store offers a discount at the rate of 10% on the customer bill if the purchase exceeds $100. The owner of the store wishes to know the total amount that has been discounted on the customers’ purchases on a particular day. The purchase amount for each of the 12 customers who visited the store on that day is given below. Use the IF and SUM functions to find the total amount discounted on this single day purchases.
Customer   Purchase Amount ($)
1          123
2          114
3          43
4          123
5          32
6          72
7          119
8          52
9          89
10         116
11         176
12         9
Answer:
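The IF and SUM logic asked for in Problem 7 can be sketched in Python as a cross-check: each purchase gets a 10% discount only if it exceeds $100, and the per-customer discounts are then summed. This is an illustration under the problem's stated rule, not the spreadsheet itself; the names `purchases`, `DISCOUNT_RATE`, and `THRESHOLD` are chosen here for readability.

```python
purchases = [123, 114, 43, 123, 32, 72, 119, 52, 89, 116, 176, 9]

DISCOUNT_RATE = 0.10   # 10% discount
THRESHOLD = 100        # applies only when the purchase exceeds $100

# Equivalent of =IF(amount > 100, 0.10 * amount, 0) in each row.
discounts = [DISCOUNT_RATE * amount if amount > THRESHOLD else 0.0
             for amount in purchases]

# Equivalent of =SUM(...) over the discount column.
total_discount = sum(discounts)
print(f"{total_discount:.2f}")  # 77.10
```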
The total amount discounted on a particular day is $77.10. Difficulty: Moderate LO: 7.3, Pages 332-336 Bloom’s: Application BUSPROG: Analytic DISC: 8. Given below is a sample list of 20 products in a grocery store with the product code, the price, and the associated discount rates.
Product code   Price ($)   Discount (%)
A003           4.00        5
A345           2.70        0
B985           4.50        5
C765           1.50        0
F302           3.00        5
B453           6.80        10
A109           9.50        10
F432           4.80        5
D234           5.40        10
B432           2.60        0
D765           6.90        10
A406           2.60        5
D203           5.40        10
F405           3.60        0
C432           5.20        5
C106           3.20        5
D324           1.30        0
F456           5.20        10
A156           2.50        5
B654           1.10        0
a. Use the VLOOKUP function to find the price of the products A109, F432, B985, D203, C432, B654, and A345. b. Use the COUNTIF function to determine the number of products associated with each discount rate - 0%, 5%, and 10%, from the provided list. Answer: a.
b.
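Both parts of Problem 8 can be cross-checked outside Excel. In the sketch below, a dictionary lookup plays the role of VLOOKUP with an exact match (the product codes in the list are not sorted, so an approximate-match lookup would not be appropriate), and `collections.Counter` plays the role of COUNTIF. The variable names are illustrative and the data are copied from the problem statement.

```python
from collections import Counter

# code: (price in $, discount in %), as listed in Problem 8
products = {
    "A003": (4.00, 5),  "A345": (2.70, 0),  "B985": (4.50, 5),  "C765": (1.50, 0),
    "F302": (3.00, 5),  "B453": (6.80, 10), "A109": (9.50, 10), "F432": (4.80, 5),
    "D234": (5.40, 10), "B432": (2.60, 0),  "D765": (6.90, 10), "A406": (2.60, 5),
    "D203": (5.40, 10), "F405": (3.60, 0),  "C432": (5.20, 5),  "C106": (3.20, 5),
    "D324": (1.30, 0),  "F456": (5.20, 10), "A156": (2.50, 5),  "B654": (1.10, 0),
}

# Part (a): exact-match lookup of the price for each requested code.
lookup_codes = ["A109", "F432", "B985", "D203", "C432", "B654", "A345"]
prices = {code: products[code][0] for code in lookup_codes}

# Part (b): number of products at each discount rate.
discount_counts = Counter(discount for _, discount in products.values())

for code, price in prices.items():
    print(f"{code}: ${price:.2f}")
for rate in (0, 5, 10):
    print(f"{rate}%: {discount_counts[rate]} products")
```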
Difficulty: Moderate LO: 7.3, Pages 335-338 Bloom’s: Application BUSPROG: Analytic DISC:
9. The average cost/unit for the production of a particular component at a manufacturing plant varies with the number of units produced in each batch. The data are given below:
Number of units produced   Cost/unit
0-49                       $37.72
50-100                     $25.02
Suppose the selling price of each unit is $35. a. Build a model to calculate the profit of the manufacturing industry if the demand is 20. b. Construct a data table that shows the profit per unit as a function of demand if the demand ranges between 20 units through 80 units in increments of 10 units. Answer: a.
b.
Difficulty: Moderate LO: 7.2, Pages 327-330 Bloom’s: Application BUSPROG: Analytic DISC:
10. The average cost/unit for the production of a particular component at a manufacturing plant varies with the number of units produced in each batch. The data are given below.
Number of units produced   Cost/unit
0-49                       $37.72
50-100                     $25.02
Suppose the selling price of each unit is $35.
Use a two-way data table to show how the profit changes as a function of demand and the selling price of the product. Vary the demand from 20 units to 80 units in increments of 10 units and selling price from $30 to $40 in increments of $2. Answer:
Difficulty: Easy LO: 7.2, Pages 327-330 Bloom’s: Application BUSPROG: Analytic DISC:
11. An electronics store sells two models of television. The sales of these two models, X and Y, are dependent, that is, if the price of one increases, the demand for the other increases. A study is made to find the relationship between the demand (D) and the price (P) in order to maximize the revenue from these products. The result of the study is shown below: DX = 476 – 0.54 PX + 0.22 PY DY = 601 + 0.12 PX – 0.54 PY a. Construct a model for the total revenue and implement it on a spreadsheet. b. Develop a two-way data table to estimate the optimal prices of each of the two products in order to maximize the total revenue. Vary price of each product from $600 to $900 in increments of $50. Answer: a.
b. From the table shown below, the maximum revenue occurs at prices $700 and $800 for TV models X and Y, respectively.
Difficulty: Challenging LO: 7.2, Pages 327-330 Bloom’s: Application BUSPROG: Analytic DISC:
12. The selling price of each product sold in a furnishing showroom and the number of units of each of these products sold during a period of one month are given below. The rental cost of the showroom is $225 and the other costs incurred are included in the cost/unit.
Product code   Price/Unit ($)   Cost/Unit ($)   Units
AD12           232              162.4           12
FD23           334              233.8           24
BD34           342              239.4           5
AG56           267              186.9           11
ET76           345              241.5           15
FA56           235              164.5           23
DE78           546              382.2           34
BF32           245              171.5           22
The manager would like to know the profit gained in this month. There are several ways to get this information from the given data set. One way is to use the SUMPRODUCT function. The SUMPRODUCT function returns the sum of the products of elements in a set of arrays. The general form of the function is =SUMPRODUCT(array1, [array2], [array3], …) Use this SUMPRODUCT function to find the profit earned by the showroom in a month. Answer:
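As a cross-check on the SUMPRODUCT approach described above, the same calculation can be sketched in Python. The helper `sumproduct` below is a hypothetical stand-in for the Excel function (it is not a library call), and the sketch assumes profit is the sum of per-unit margin times units sold, less the $225 rental cost.

```python
import math


def sumproduct(*arrays):
    """Sum of element-wise products; all arrays must have the same length."""
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("all arrays must have the same dimension")
    return sum(math.prod(values) for values in zip(*arrays))


price = [232, 334, 342, 267, 345, 235, 546, 245]
cost = [162.4, 233.8, 239.4, 186.9, 241.5, 164.5, 382.2, 171.5]
units = [12, 24, 5, 11, 15, 23, 34, 22]
RENTAL_COST = 225

# Margin per unit for each product, then margin x units summed over products.
margin = [p - c for p, c in zip(price, cost)]
monthly_profit = sumproduct(margin, units) - RENTAL_COST
print(f"{monthly_profit:.2f}")  # 14769.30
```

The length check mirrors the Excel requirement that the arrays passed to SUMPRODUCT have the same dimension.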
The profit earned is $14,769.30. Difficulty: Moderate LO: 7.3, Pages 332-333 Bloom’s: Application BUSPROG: Analytic DISC: 13. Suppose a company supplies four of its products, A, B, C, and D, to five different regions. The management wants to know the total number of all products supplied to each region and the total units of each product supplied. The data collected over a period of one month are given below:
Region     Product   Number of units
Region 1   A         1784
Region 2   A         2170
Region 3   A         415
Region 4   A         2040
Region 5   A         2991
Region 1   B         947
Region 2   B         2111
Region 3   B         1234
Region 4   B         2061
Region 5   B         607
Region 1   C         2907
Region 2   C         4790
Region 3   C         2191
Region 4   C         1942
Region 5   C         220
Region 1   D         2557
Region 2   D         2980
Region 3   D         1518
Region 4   D         2957
Region 5   D         4462
There are several ways to get this information from the given data set. One way is to use the SUMIF function. The SUMIF function extends the SUM function by allowing the user to add the values of cells meeting a logical condition. The general form of the function is =SUMIF(test range, condition, range to be summed) Using the SUMIF function, find the total volume by each region and total volume by each product. Answer:
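The SUMIF pattern described above (test range, condition, range to be summed) can be sketched in Python to cross-check the totals. The helper `sumif` is a hypothetical stand-in that supports only an equality condition, which is all this problem needs; the data are copied from the problem statement.

```python
def sumif(test_range, condition, sum_range):
    """Add the values in sum_range whose matching test_range entry equals condition."""
    return sum(v for t, v in zip(test_range, sum_range) if t == condition)


products = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5
regions = [f"Region {i}" for i in range(1, 6)] * 4
units = [1784, 2170, 415, 2040, 2991,
         947, 2111, 1234, 2061, 607,
         2907, 4790, 2191, 1942, 220,
         2557, 2980, 1518, 2957, 4462]

by_region = {r: sumif(regions, r, units) for r in sorted(set(regions))}
by_product = {p: sumif(products, p, units) for p in sorted(set(products))}

for name, total in {**by_region, **by_product}.items():
    print(f"{name}: {total}")
```

Both groupings partition the same 20 rows, so the region totals and the product totals should add up to the same grand total, which is a quick sanity check on the ranges used.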
Difficulty: Moderate LO: 7.3, Pages 332-338 Bloom’s: Application BUSPROG: Analytic DISC:
14. John would like to establish a retirement plan that returns an amount of $100,000 after a period of 20 years from now. Build a spreadsheet model to calculate the amount John must contribute at the end of each year towards his retirement fund, assuming an annual interest rate of 6%. Use the Excel function =PMT(rate, nper, pv, fv, type) The arguments of this function are rate = the interest rate for the loan nper = the total number of payments pv = present value (the amount borrowed which is 0 in this case) fv = future value (in the formula, indicate this value as negative as the future value command assumes a stream of payments not deposits) type = payment type (0 = end of period, 1 = beginning of the period)
Also, construct a one-way table with interest rate as the column variable and the amount contributed at the end of each year as the output. Vary the interest rate from 4% to 7% in increments of 0.5%. Answer:
Key cell formula:
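As a cross-check outside Excel, the payment calculation can be sketched in Python. The function `pmt` below is a hypothetical re-implementation of the standard annuity relation that Excel's PMT is documented to follow, with the same argument order as in the problem statement; it is not the spreadsheet's key cell formula. Following the problem's instruction, the future value is entered as a negative number so the payment comes out positive.

```python
def pmt(rate, nper, pv=0.0, fv=0.0, when=0):
    """Periodic payment solving pv*(1+r)^n + pmt*(1+r*when)*((1+r)^n - 1)/r + fv = 0.

    when = 0 for end-of-period payments, 1 for beginning-of-period payments.
    """
    if rate == 0:
        return -(pv + fv) / nper
    growth = (1 + rate) ** nper
    return -(fv + pv * growth) * rate / ((1 + rate * when) * (growth - 1))


# John's plan: $100,000 after 20 years, 6% annual interest, end-of-year deposits.
payment = pmt(0.06, 20, 0, -100_000, 0)
print(f"{payment:.2f}")

# One-way data table: interest rate from 4% to 7% in steps of 0.5%.
rates = [r / 1000 for r in range(40, 71, 5)]
table = {rate: round(pmt(rate, 20, 0, -100_000, 0), 2) for rate in rates}
for rate, amount in table.items():
    print(f"{rate:.1%}  {amount:,.2f}")
```

The required contribution falls as the interest rate rises, since more of the target is covered by accumulated interest.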
Difficulty: Moderate LO: 7.1, 7.2, and 7.3, Pages 324-338 Bloom’s: Application BUSPROG: Analytic DISC: 15. Starsystems is a small information systems company that employs 50 workers. The employee details for a particular month are given below:
Name          Age (in years)   Gender   Work experience (in years)   Income (in 1000 $)   Number of leaves taken in a month
John          47               M        22                           53                   0
Olivia        26               F        3                            22                   3
Gabriel       38               M        16                           29                   1
Logan         37               M        12                           32                   4
James         44               M        22                           32                   0
Ava           55               F        30                           45                   5
Isabella      44               F        23                           50                   0
Sophia        30               F        5                            22                   4
Joshua        63               M        35                           56                   2
Abigail       34               F        8                            23                   3
Anthony       52               M        26                           29                   2
Matthew       55               M        25                           34                   0
Jayden        52               M        28                           45                   1
Emily         63               F        29                           23                   1
Alexis        51               F        30                           32                   0
Angel         41               M        18                           21                   4
Ryan          37               M        14                           43                   0
Michael       46               M        23                           23                   3
Grace         30               F        6                            18                   0
Julia         48               F        25                           34                   5
Ella          50               F        22                           21                   0
Noah          56               M        31                           24                   1
Madison       35               F        9                            23                   1
Tyler         39               M        13                           29                   2
Jose          48               M        22                           34                   2
Samantha      51               F        21                           39                   3
Lily          27               F        3                            26                   1
Elizabeth     57               F        32                           49                   0
Anna          33               F        12                           39                   4
Luis          58               M        33                           32                   3
Jackson       46               M        21                           45                   1
Aiden         32               M        6                            23                   4
Madison       56               F        28                           45                   0
Lillian       35               F        12                           28                   0
Natalie       47               F        23                           38                   1
Christopher   50               M        23                           32                   3
Taylor        57               F        25                           32                   1
Wyatt         38               M        15                           25                   2
Chloe         52               F        24                           22                   3
Jack          56               M        31                           19                   3
Sarah         47               F        24                           34                   2
Mason         54               M        31                           45                   1
Mason         25               M        2                            21                   4
Alanis        40               F        16                           34                   5
Brooklyn      61               F        30                           49                   0
Jessica       29               F        6                            34                   4
Chase         52               M        25                           39                   0
Aiden         56               M        31                           54                   1
David         61               M        33                           43                   0
Andrew        26               M        4                            23                   2
a. The administrative manager of the company wants to know the total number of employees who were on leave for 4 days and for 5 days in this month. Use the COUNTIF function to determine this. b. Now, the manager wants the details of the employees Ava, Julia, and Alanis. Use the VLOOKUP function to get these employees’ details. Answer: a. A part of the spreadsheet model is given below:
Key cell formulas:
b.
A part of the spreadsheet model is given below:
Key cell formulas:
Difficulty: Moderate LO: 7.3, Pages 333-338 Bloom’s: Application BUSPROG: Analytic DISC:
16. Starsystems is a small information systems company which employs 50 workers. The employee details are given below:
Name          Age (in years)   Gender   Work experience (in years)   Income (in 1000 $)   Number of leaves taken in a month
John          47               M        22                           53                   0
Olivia        26               F        3                            22                   3
Gabriel       38               M        16                           29                   1
Logan         37               M        12                           32                   4
James         44               M        22                           32                   0
Ava           55               F        30                           45                   5
Isabella      44               F        23                           50                   0
Sophia        30               F        5                            22                   4
Joshua        63               M        35                           56                   2
Abigail       34               F        8                            23                   3
Anthony       52               M        26                           29                   2
Matthew       55               M        25                           34                   0
Jayden        52               M        28                           45                   1
Emily         63               F        29                           23                   1
Alexis        51               F        30                           32                   0
Angel         41               M        18                           21                   4
Ryan          37               M        14                           43                   0
Michael       46               M        23                           23                   3
Grace         30               F        6                            18                   0
Julia         48               F        25                           34                   5
Ella          50               F        22                           21                   0
Noah          56               M        31                           24                   1
Madison       35               F        9                            23                   1
Tyler         39               M        13                           29                   2
Jose          48               M        22                           34                   2
Samantha      51               F        21                           39                   3
Lily          27               F        3                            26                   1
Elizabeth     57               F        32                           49                   0
Anna          33               F        12                           39                   4
Luis          58               M        33                           32                   3
Jackson       46               M        21                           45                   1
Aiden         32               M        6                            23                   4
Madison       56               F        28                           45                   0
Lillian       35               F        12                           28                   0
Natalie       47               F        23                           38                   1
Christopher   50               M        23                           32                   3
Taylor        57               F        25                           32                   1
Wyatt         38               M        15                           25                   2
Chloe         52               F        24                           22                   3
Jack          56               M        31                           19                   3
Sarah         47               F        24                           34                   2
Mason         54               M        31                           45                   1
Mason         25               M        2                            21                   4
Alanis        40               F        16                           34                   5
Brooklyn      61               F        30                           49                   0
Jessica       29               F        6                            34                   4
Chase         52               M        25                           39                   0
Aiden         56               M        31                           54                   1
David         61               M        33                           43                   0
Andrew        26               M        4                            23                   2
a. Find the total number of male and female employees who are working in this company. Use the COUNTIF function. b. Using the SUMIF function, find the average incomes of both male and female employees who are working in this company. Answer: a. A part of the spreadsheet model is given below:
Key cell formulas:
b. A part of the spreadsheet model is given below:
Key cell formulas:
Difficulty: Challenging LO: 7.3, Pages 332-338 Bloom’s: Application BUSPROG: Analytic DISC:
17. Anna operates a consignment shop where she sells clothes for women and children. The average number of consignments sold per month is 1000. The average material cost and the selling price of each consignment are $8 and $20, respectively. The monthly fixed costs to run this business are given below: Rental cost: $750 Utilities: $150 Advertising: $35 Insurance: $100 Labor cost: $4000 a. Build an influence diagram that illustrates how to calculate profit. b. Using mathematical notation, give a mathematical model for calculating profit. c. Implement your model from part (b) in Excel using the principles of good spreadsheet design. Answer: a.
b.
Let R - Rental cost U - Utilities A - Advertising I - Insurance L - Labor cost M - Material cost per consignment C - Consignments sold per month S - Selling price Profit = (C × S) - ((M × C) + (R + U + A + I + L))
c.
Difficulty: Moderate LO: 7.1, Pages 322-327 Bloom’s: Application BUSPROG: Analytic DISC: 18. Anna operates a consignment shop where she sells clothes for women and children. The average number of consignments sold per month is 1000. The average material cost and the selling price of each consignment are $8 and $20, respectively. The monthly fixed costs to run this business are given below: Rental cost: $750 Utilities: $150 Advertising: $35 Insurance: $100 Labor cost: $4000 a. Using the spreadsheet model, construct a one-way data table with number of consignments sold per month as the column input and profit as the output. Breakeven occurs when profit goes from a negative to a positive value. Vary number of consignments sold per month from 400 to 1200 in increments of 100. In which interval does breakeven occur? b. Use Goal Seek to find the exact breakeven point. Assign Set cell: equal to the location of profit, To value: = 0, and By changing cell: equal to the location of the number of sold consignments in your model.
Answer: a.
Breakeven point occurs in the interval of 400 to 500 consignments sold per month. b.
The breakeven point is approximately 420 consignments. Difficulty: Moderate LO: 7.2, Pages 327-332 Bloom’s: Application BUSPROG: Analytic DISC:
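The breakeven search in Problem 18 can be imitated outside Excel with a simple root finder. The sketch below uses bisection as a stand-in for Goal Seek (Excel's tool uses its own iterative method, so this is a substitute technique, not a description of how Goal Seek works internally). The names `profit` and `goal_seek` are illustrative, and the parameters are those given for Anna's shop.

```python
def profit(consignments, price=20.0, material_cost=8.0,
           fixed_costs=(750.0, 150.0, 35.0, 100.0, 4000.0)):
    """Profit = (C x S) - ((M x C) + (R + U + A + I + L))."""
    return consignments * price - (material_cost * consignments + sum(fixed_costs))


def goal_seek(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target, by bisection."""
    f_lo = f(lo) - target
    f_hi = f(hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid) - target
        if abs(f_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return (lo + hi) / 2


# The data table places breakeven between 400 and 500 consignments.
breakeven = goal_seek(profit, 0.0, 400, 500)
print(round(breakeven, 2))
```

Because the model is linear, the same point also follows directly from dividing the total fixed cost by the margin per consignment; the search is shown only to mirror the Goal Seek workflow.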
19. Anna operates a consignment shop where she sells clothes for women and children. The average number of consignments sold per month is 1000. The average material cost and the selling price of each consignment are $8 and $20, respectively. The monthly fixed costs to run this business are given below: Rental cost: $750 Utilities: $150 Advertising: $35 Insurance: $100 Labor cost: $4000 Use a two-way data table to show how profit changes as a function of the number of consignments sold per month and the material cost. Vary the number of consignments from 400 to 1200 in increments of 100. The eight different material costs are $5.45, $6.23, $6.95, $7.54, $8.23, $8.88, $9, and $9.45. Answer:
Difficulty: Easy LO: 7.2, Pages 327-330 Bloom’s: Application BUSPROG: Analytic DISC: 20. Suppose you have $1100 and decide to purchase a new model of television that costs $1100. You find an electronics store where a gift voucher worth $50 is offered with this TV model if payment is made in full at the time of purchase, or the purchase can be financed at 0 percent interest for 5 months with a monthly payment of $220. You now have two options: either invest your money at an annual interest rate of 10% and opt for the 0 percent financing option for the TV purchase, or choose the full payment option. Develop a spreadsheet model to find the option that results in the greater savings. Also, find the discount rate for the 0 percent financing option. Hint: Use Goal Seek to find the discount rate that makes the net present value of the payments = $1050. Answer:
Key cell formulas:
The 0 percent interest option saves $45.83 whereas the full payment option saves $50. Hence, it would be better if full payment is made at the time of purchase. The discount rate for 0 percent financing option is 28.58%. Difficulty: Challenging LO: 7.1, 7.2, and 7.3, Pages 324-338 Bloom’s: Application BUSPROG: Analytic DISC:
Chapter 8: Linear Optimization Models 1. The term _____ refers to the expression that defines the quantity to be maximized or minimized in a linear programming model. a. objective function b. problem formulation c. decision variable d. association rule Answer: A Difficulty: Easy LO: 8.1, Page 353 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The term objective function refers to the expression that defines the quantity to be maximized or minimized in a linear programming model. 2. Constraints are: a. quantities to be maximized in a linear programming model. b. quantities to be minimized in a linear programming model. c. restrictions that limit the settings of the decision variables. d. input variables that can be controlled during optimization. Answer: C Difficulty: Easy LO: 8.1, Page 353 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Constraints are restrictions that limit the settings of the decision variables. 3. _____ is the process of translating a verbal statement of a problem into a mathematical statement. a. Problem-solving approach b. Data preparation c. Data structuring d. Problem formulation Answer: D Difficulty: Easy LO: 8.1, Page 355 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Problem formulation is the process of translating a verbal statement of a problem into a mathematical statement. 4. A controllable input for a linear programming model is known as a _____.
a. parameter b. decision variable c. dummy variable d. constraint
Answer: B Difficulty: Easy LO: 8.1, Page 355 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A controllable input for a linear programming model is known as a decision variable. 5. In problem formulation, the: a. objective is expressed in terms of the decision variables. b. constraints are expressed in terms of the obtained objective function coefficients. c. nonnegativity constraints are always ignored. d. optimal solution is decided upon. Answer: A Difficulty: Moderate LO: 8.1, Page 355 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In problem formulation, the objective is expressed in terms of the decision variables. 6. When formulating a constraint, care must be taken to ensure that: a. all the objective function coefficients are included. b. there are no inequalities in the mathematical expression. c. the decision variables are set at either maximum or minimum values. d. the units of measurement on both sides of the constraint match. Answer: D Difficulty: Moderate LO: 8.1, Page 356 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: When formulating a constraint, the units of measurement on the left-hand side of the constraint must match the units of measurement on the right-hand side. 7. Nonnegativity constraints ensure that a. the problem modeling includes only nonnegative values in the constraints. b. the solution to the problem will contain only nonnegative values for the decision variables. c. the objective function of the problem always returns maximum quantities. d. there are no inequalities in the constraints. Answer: B
Difficulty: Moderate LO: 8.1, Page 357 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Nonnegativity constraints ensure that the solution to the problem will contain only nonnegative values for the decision variables. 8. A mathematical function in which each variable appears in a separate term and is raised to the first power is known as a _____. a. power function b. linear function c. what-if function d. nonlinear function Answer: B Difficulty: Easy LO: 8.1, Page 357 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A mathematical function in which each variable appears in a separate term and is raised to the first power is known as a linear function. 9. The _____ assumption necessary for a linear programming model to be appropriate, means that the contribution to the objective function and the amount of resources used in each constraint are in accordance to the value of each decision variable. a. proportionality b. divisibility c. additivity d. negativity Answer: A Difficulty: Easy LO: 8.1, Page 357 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The proportionality assumption necessary for a linear programming model to be appropriate, means that the contribution to the objective function and the amount of resources used in each constraint are in accordance to the value of each decision variable. 10. The assumption that is necessary for a linear programming model to be appropriate and that ensures that the value of the objective function and the total resources used can be found by summing the objective function contribution and the resources used for all decision variables is known as _____. a. proportionality b. negativity
c. additivity d. divisibility Answer: C Difficulty: Easy LO: 8.1, Page 357 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The additivity assumption means that the value of the objective function and the total resources used can be found by summing the objective function contribution and the resources used for all decision variables. 11. In a linear programming model, the _____ assumption plus the nonnegativity constraints mean that decision variables can take on any value greater than or equal to zero. a. proportionality b. divisibility c. additivity d. negativity Answer: B Difficulty: Easy LO: 8.1, Page 357 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a linear programming model, the assumption divisibility means that the decision variables are continuous. The divisibility assumption plus the nonnegativity constraints mean that decision variables can take on any value greater than or equal to zero. 12. A(n) _____ solution satisfies all the constraint expressions simultaneously. a. feasible b. objective c. infeasible d. extreme Answer: A Difficulty: Easy LO: 8.2, Page 358 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution satisfies all the constraints simultaneously. 13. In the case of a linear model with two decision variables, if the constraints are in the form of inequalities they are visually represented by regions called as: a. half spaces. b. curves. c. 2-spaces.
d. regions of linear intersection. Answer: A Difficulty: Moderate LO: 8.2, Page 358 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: When we have only two decision variables and the functions of these variables are linear, they form lines in 2-space. If the constraints are inequalities, the constraint cuts the space into two, with the line and the area on one side of the line being the space that satisfies that constraint. These subregions are called half spaces. 14. The intersections of half spaces represent_____. a. objective functions b. feasible solutions c. decision variables d. 2-spaces Answer: B Difficulty: Moderate LO: 8.2, Page 358 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The intersections of half spaces represent feasible solutions. 15. The nonnegativity constraints create a feasible region that is: a. unbound by the horizontal axis only. b. an area with no point satisfying all the constraints. c. symmetric about the vertical axis around the origin. d. bound by the horizontal and vertical axes. Answer: D Difficulty: Moderate LO: 8.2, Page 358 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The nonnegativity constraints create a feasible region that is bound by the horizontal and vertical axes. 16. A(n) _____ refers to a set of points that yield a fixed value of the objective function. a. objective function coefficient b. infeasible solution c. objective function contour d. feasible region Answer: C
Difficulty: Easy LO: 8.2, Page 358 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An objective function contour refers to a set of points that yield a fixed value of the objective function. 17. The points where constraints intersect on the boundary of the feasible region are termed as _____. a. feasible points b. extreme vertices c. extreme points d. feasible edges Answer: C Difficulty: Easy LO: 8.2, Page 358 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The points where constraints intersect on the boundary of the feasible region are termed as extreme points. 18. Which algorithm, developed by George Dantzig, is effective at investigating extreme points in an intelligent way to find the optimal solution to even very large linear programs? a. The ellipsoidal algorithm b. The complex algorithm c. The trial-and-error algorithm d. The simplex algorithm Answer: D Difficulty: Easy LO: 8.2, Page 360 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The simplex algorithm, developed by George Dantzig, is quite effective at investigating extreme points in an intelligent way to find the optimal solution to even very large linear programs. 19. _____ is an Excel tool that utilizes Dantzig’s simplex algorithm to solve linear programs by systematically finding which set of constraints form the optimal extreme point of the feasible region. a. Data Analysis b. Goal Seeker c. Excel Solver d. Watch Window
Answer: C Difficulty: Easy LO: 8.2, Page 360 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Excel Solver is the software that utilizes Dantzig’s simplex algorithm to solve linear programs by systematically finding which set of constraints form the optimal extreme point of the feasible region. 20. A _____ refers to a constraint that holds as equality at the optimal solution. a. dummy variable b. first class constraint c. slack variable d. binding constraint Answer: D Difficulty: Easy LO: 8.2, Page 363 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A binding constraint refers to a constraint that holds as an equality at the optimal solution. 21. Geometrically, binding constraints intersect to form the _____. a. subspace b. optimal point c. decision cell d. zero slack Answer: B Difficulty: Moderate LO: 8.2, Page 363 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Geometrically, binding constraints intersect to form the optimal point. 22. The _____ value for each less-than-or-equal-to constraint indicates the difference between the left-hand and right-hand values for a constraint. a. objective function coefficient b. slack c. unbounded d. surplus Answer: B Difficulty: Easy LO: 8.2, Page 363
Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The slack value for each less-than-or-equal-to constraint indicates the difference between the left-hand and right-hand values for a constraint. 23. The slack value for binding constraints is: a. always a positive integer. b. zero. c. a negative integer. d. equal to the sum of the optimal points in the solution. Answer: B Difficulty: Moderate LO: 8.2, Pages 363-364 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The slack value for binding constraints is zero. 24. The _____ Report generated by Excel Solver gives information on the objective function value when variables are set to their limits. a. Answer b. Sensitivity c. Classical d. Limits Answer: D Difficulty: Easy LO: 8.2, Page 364 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Limits Report generated by Excel Solver gives information on the objective function value when variables are set to their limits. 25. A variable subtracted from the left-hand side of a greater-than-or-equal to constraint to convert the constraint into an equality is known as a(n) _____. a. surplus variable b. slack variable c. unbounded variable d. binding constraint Answer: A Difficulty: Easy LO: 8.3, Page 365 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: A variable subtracted from the left-hand side of a greater-than-or-equal to constraint to convert the constraint into an equality is known as a surplus variable. 26. A scenario in which the optimal objective function contour line coincides with one of the binding constraint lines on the boundary of the feasible region leads to _____ solutions. a. infeasible b. alternative optimal c. binding d. unique optimal Answer: B Difficulty: Easy LO: 8.4, Page 367 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A special case in which the optimal objective function contour line coincides with one of the binding constraint lines on the boundary of the feasible region leads to alternative optimal solutions; in such cases, more than one solution provides the optimal value for the objective function. 27. _____ is the situation in which no solution to the linear programming problem satisfies all the constraints. a. Unboundedness b. Divisibility c. Infeasibility d. Optimality Answer: C Difficulty: Easy LO: 8.4, Page 368 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Infeasibility is the situation in which no solution to the linear programming problem satisfies all the constraints, including the nonnegativity constraints. 28. Problems with infeasible solutions arise in practice because: a. management doesn’t specify enough restrictions. b. too many restrictions have been placed on the problem. c. of errors in objective function formulation. d. there are too few decision variables. Answer: B Difficulty: Moderate LO: 8.4, Page 368 Bloom’s: Comprehension BUSPROG: Analytic DISC:
Feedback: Problems with no feasible solution do arise in practice, most often because management’s expectations are too high or because too many restrictions have been placed on the problem. 29. The situation in which the value of the solution may be made infinitely large in a maximization linear programming problem or infinitely small in a minimization problem without violating any of the constraints is known as _____. a. infeasibility b. unbounded c. infiniteness d. semi-optimality Answer: B Difficulty: Easy LO: 8.4, Page 370 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Unbounded refers to the situation in which the value of the solution may be made infinitely large in a maximization linear programming problem or infinitely small in a minimization problem without violating any of the constraints. 30. Which of the following error messages is displayed in Excel Solver when attempting to solve an unbounded problem? a. Solver could not find a feasible solution. b. Solver cannot improve the current solution. All constraints are satisfied. c. Solver could not find a bounded solution. d. Objective Cell values do not converge. Answer: D Difficulty: Moderate LO: 8.4, Page 370 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Whenever you attempt to solve an unbounded problem using Excel Solver, you will receive a message in the Solver Results dialog box telling you that the “Objective Cell values do not converge.” 31. In linear programming models of real problems, the occurrence of an unbounded solution means that the: a. resultant values of the decision variables have no bounds. b. mathematical models sufficiently represent the real-world problems. c. problem formulation is improper. d. constraints have been excessively used in modeling. Answer: C Difficulty: Moderate LO: 8.4, Page 370
Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In linear programming models of real problems, the occurrence of an unbounded solution means that the problem has been improperly formulated. 32. The study of how changes in the input parameters of a linear programming problem affect the optimal solution is known as_____. a. regression analysis b. cluster analysis c. optimality analysis d. sensitivity analysis Answer: D Difficulty: Easy LO: 8.5, Page 372 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Sensitivity analysis is the study of how changes in the input parameters of a linear programming problem affect the optimal solution. 33. The change in the optimal objective function value per unit increase in the right-hand side of a constraint is given by the _____. a. objective function coefficient b. shadow price c. restrictive cost d. right-hand side allowable increase Answer: B Difficulty: Easy LO: 8.5, Page 372 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The change in the optimal objective function value per unit increase in the right-hand side of a constraint is given by the shadow price. 34. The shadow price of nonbinding constraints a. will always be zero. b. always take positive values. c. can never be equal to zero. d. is no longer valid if the right-hand side of the constraint remains the same. Answer: A Difficulty: Moderate LO: 8.5, Page 372 Bloom’s: Comprehension BUSPROG: Analytic
DISC: Feedback: Nonbinding constraints will always have a shadow price of zero. 35. The _____ for a decision variable is the shadow price of the nonnegativity constraint for that variable. a. range of optimality b. slack value c. reduced cost d. range of feasibility Answer: C Difficulty: Easy LO: 8.5, Page 374 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The reduced cost for a decision variable is the shadow price of the nonnegativity constraint for that variable. 36. The reduced cost indicates the change in the optimal objective function value that results from changing the right-hand side of the nonnegativity constraint from: a. 1 to 0. b. 0 to 1. c. 1 to ∞. d. ∞ to 1. Answer: B Difficulty: Moderate LO: 8.5, Page 374 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The reduced cost indicates the change in the optimal objective function value that results from changing the right-hand side of the nonnegativity constraint from 0 to 1. Reference - 8.1: Use the information given below to answer questions 37-38. Rob is a financial manager with Sharez, an investment advisory company. He must select specific investments—for example, stocks and bonds—from a variety of investment alternatives. 37. Reference - 8.1. Which of the following statements is most likely to be the objective function in this scenario? a. Minimization of the number of stocks held b. Maximization of expected return c. Minimization of tax dues d. Maximization of investment risk Answer: B Difficulty: Moderate
LO: 8.6, Page 375 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The objective function for portfolio selection problems usually is maximization of expected return or minimization of risk. 38. Reference - 8.1. Restrictions on the type of permissible investments would be a _____ in this case. a. feasible solution b. surplus variable c. slack variable d. constraint Answer: D Difficulty: Moderate LO: 8.6, Page 375 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The constraints, in portfolio selection problems, usually take the form of restrictions on the type of permissible investments, state laws, company policy, maximum permissible risk, and so on. Reference - 8.2: Use the information given below to answer questions 39-40. A canned food manufacturer has its manufacturing plants in three locations across a state. Their product has to be transported to three central distribution centers, which in turn disperse the goods to seventy-two stores across the state. 39. Reference - 8.2: Which of the following is most likely to be the objective function in this scenario? a. Increasing the number of goods manufactured at the plant b. Decreasing the cost of their raw material sourcing c. Minimizing the cost of shipping goods from the plant to the store d. Minimizing the quantity of goods distributed across the stores Answer: C Difficulty: Moderate LO: 8.6, Page 378 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The usual objective in a transportation problem is to minimize the cost of shipping goods from the origins to the destinations. 40. Reference - 8.2: Which of the following visualization tools could help understand this problem better?
a. A time-series plot b. A scatter chart c. A network graph d. A contour plot
Answer: C Difficulty: Moderate LO: 8.6, Page 379 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The given scenario can be visualized using a network graph. Problems: 1. Gatson manufacturing company produces 2 types of tires: Economy tire; Premium tire. The manufacturing time and the profit contribution per tire are given in the following table:
Manufacturing Time (Hours)
Operation             Economy tires   Premium tires   Time Available (Hours)
Material Preparation  4/3             1/2             600
Tire Building         4/5             1               650
Curing                1/2             2/4             580
Final Inspection      1/5             1/3             120
Profit/Tire           $12             $10
Answer the following assuming that the company is interested in maximizing the total profit contribution. a. What is the linear programming model for this problem? b. Develop a spreadsheet model and find the optimal solution using Excel Solver. How many tires of each model should Gatson manufacture? c. What is the total profit contribution Gatson can earn with the optimal production quantities? Answer: a. Let
E = number of economy tires manufactured
P = number of premium tires manufactured
Max 12E + 10P
s.t.
4/3E + 1/2P ≤ 600 – Material Preparation
4/5E + P ≤ 650 – Tire Building
1/2E + 2/4P ≤ 580 – Curing
1/5E + 1/3P ≤ 120 – Final Inspection
E, P ≥ 0
b.
Gatson should manufacture about 406 Economy tires and about 116 Premium tires. c. With the optimal production quantities, the profit Gatson can earn is approximately $6039. Difficulty: Moderate LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC: 2. Hire-a-Car System rents three types of cars at two different locations. The profit made per day for each car type at each of the two locations is listed below:
            Car Type
Location    Type 1   Type 2   Type 3
A           $25      $40      $10
B           $30      $35      $45
The management forecasts the demand per day by car type. A linear programming model developed to maximize profit is used to determine how many reservations to accept for each type of car. The demand forecast for a particular day is 125 rentals for Type 1 cars, 55 rentals for Type 2 cars, and 40 rentals for Type 3 cars. The company has 100 cars at location A and 120 cars at location B. Use linear programming to determine how many reservations to accept for each car type and how the reservations should be allocated to the different locations. Is the demand for any car type not satisfied? Explain. Answer: Let
A1 = number of reservations accepted for Type 1 cars at location A
A2 = number of reservations accepted for Type 2 cars at location A
A3 = number of reservations accepted for Type 3 cars at location A
B1 = number of reservations accepted for Type 1 cars at location B
B2 = number of reservations accepted for Type 2 cars at location B
B3 = number of reservations accepted for Type 3 cars at location B
Max 25A1 + 40A2 + 10A3 + 30B1 + 35B2 + 45B3
s.t.
A1 + A2 + A3 ≤ 100
B1 + B2 + B3 ≤ 120
A1 + B1 ≤ 125
A2 + B2 ≤ 55
A3 + B3 ≤ 40
A1, A2, …, B3 ≥ 0
The optimal solution obtained using Excel Solver shows the reservations to accept for each car type and their allocation to the different locations. Also, the demand for all the car types is satisfied. Difficulty: Moderate LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC: 3. The supervisor of a production company is trying to determine how many units of two assembly parts, Part X and Part Y, to produce per day in three different sections of the plant. The time required for production and the profit contribution for each part are given in the following table:
Time required (Minutes/Unit)
              Part X   Part Y   Available time (minutes)
Section 1     50       80       3600
Section 2     30       45       2500
Section 3     18       22       1200
Profit/Unit   $2       $3
Each part made (X and Y) must be processed in each of the three sections. No more than 60 units of Part X and no more than 70 units of Part Y can be produced per day. The company already has orders for 30 units of Part Y that must be satisfied. a. Develop a linear programming model and solve the model to determine the optimal production quantities of parts X and Y. b. If more time could be made available in Section 2, how much would it be worth? Answer: a. Let
P1 = number of units of Part X produced
P2 = number of units of Part Y produced
Max 2P1 + 3P2
s.t.
50P1 + 80P2 ≤ 3600
30P1 + 45P2 ≤ 2500
18P1 + 22P2 ≤ 1200
P1 ≤ 60
P2 ≤ 70
P2 ≥ 30
P1, P2 ≥ 0
b. Nothing, because Section 2 has 430 minutes of slack time at the optimal solution, so its constraint is nonbinding and its shadow price is zero. Difficulty: Moderate LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC: 4. A beverage can manufacturer makes 3 types of soft drink cans that beverage producers fill with soft drinks of three different volumes. The machine time available per day is limited to 90 hours, and the supply of metal is limited to 120 kg per day. The following table provides the details of the inputs needed to manufacture one batch of 100 cans.
Cans      Metal (kg)/batch   Machines’ Time (hr)/batch   Profit/batch
Large     9                  4.4                         $50
Medium    6                  4.2                         $45
Small     5                  4                           $42
Maximum   120                90
Formulate and solve for the recommended production quantities of the three types of cans by maximizing the profit. Answer: Let
L = number of batches of Large cans produced
M = number of batches of Medium cans produced
S = number of batches of Small cans produced
Max 50L + 45M + 42S
s.t.
9L + 6M + 5S ≤ 120
4.4L + 4.2M + 4S ≤ 90
L, M, S ≥ 0
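Because the optimum of a linear program lies at an extreme point, a model this small can be checked by brute force. The sketch below is not the simplex method used by Excel Solver: it enumerates every intersection of three constraint boundaries (including the nonnegativity bounds), keeps the feasible ones, and picks the most profitable. The helper names `solve_square` and `best_extreme_point` are hypothetical, and the approach assumes the model is feasible and bounded.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Gauss-Jordan elimination with exact fractions; None if singular."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def best_extreme_point(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0 by checking every extreme point.

    Only sensible for tiny models that are feasible and bounded.
    """
    n = len(c)
    # Write x >= 0 as -x <= 0 so every constraint has the same form.
    bounds = [[Fraction(-1 if i == j else 0) for j in range(n)] for i in range(n)]
    rows = [list(r) for r in A] + bounds
    rhs = list(b) + [Fraction(0)] * n
    best = None
    for idx in combinations(range(len(rows)), n):
        x = solve_square([rows[i] for i in idx], [rhs[i] for i in idx])
        if x is None:
            continue
        if all(sum(a * v for a, v in zip(row, x)) <= r for row, r in zip(rows, rhs)):
            value = sum(ci * v for ci, v in zip(c, x))
            if best is None or value > best[0]:
                best = (value, x)
    return best


F = Fraction
c = [F(50), F(45), F(42)]             # profit per batch: L, M, S
A = [[F(9), F(6), F(5)],              # metal (kg) per batch
     [F("4.4"), F("4.2"), F(4)]]      # machine time (hr) per batch
b = [F(120), F(90)]
profit, (L, M, S) = best_extreme_point(c, A, b)
print(f"L={float(L)}, M={float(M)}, S={float(S)}, profit={float(profit)}")
```

For this model the enumeration selects L = 0, M = 10, and S = 12 batches, for a profit of $954. Exact fractions avoid rounding trouble when testing whether a point sits on a constraint boundary.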
Difficulty: Easy LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC: 5. Robin Tires, Inc. makes two types of tires, for SUVs and hatchbacks. The firm has 500 hours of production time, 250 hours of packaging time, and 150 hours of shipping time available. The time required per tire in each department and the profit per tire are given in the following table:
                  Production Hours
Type              Production   Packaging   Shipping   Profit/Tire
SUV tires         2            1.5         1          $22
Hatchback tires   1.5          1           0.5        $12
Assuming that the company is interested in maximizing the total profit contribution, answer the following: a. What is the linear programming model for this problem? b. Develop a spreadsheet model and find the optimal solution using Excel Solver. How many tires of each model should Robin manufacture? c. What is the total profit contribution Robin can earn with the optimal production quantities? Answer: a. Let
S = number of SUV tires manufactured
H = number of Hatchback tires manufactured
Max 22S + 12H
s.t.
2S + 1.5H ≤ 500 – Production
1.5S + H ≤ 250 – Packaging
S + 0.5H ≤ 150 – Shipping
S, H ≥ 0
b.
Robin should manufacture about 100 SUV tires and about 100 Hatchback tires. c. With the optimal production quantities, the total profit Robin can earn is 100(22) + 100(12) = $3400. Difficulty: Moderate LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC:
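The geometric idea behind this answer (the optimum sits at an extreme point of the feasible region) can be checked directly for a two-variable model. The sketch below is an illustration rather than the Solver procedure: it intersects every pair of constraint boundary lines with Cramer's rule, discards infeasible points, and compares profit at the rest. The names `intersection` and `feasible` are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Each constraint is (a, b, rhs), meaning a*S + b*H <= rhs.
constraints = {
    "Production": (F(2), F("1.5"), F(500)),
    "Packaging": (F("1.5"), F(1), F(250)),
    "Shipping": (F(1), F("0.5"), F(150)),
    "S >= 0": (F(-1), F(0), F(0)),
    "H >= 0": (F(0), F(-1), F(0)),
}
profit = (F(22), F(12))


def intersection(c1, c2):
    """Point where two constraint boundary lines cross (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel boundaries never cross
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(point):
    """True if the point satisfies every constraint, including S, H >= 0."""
    s, h = point
    return all(a * s + b * h <= rhs for a, b, rhs in constraints.values())


extreme_points = set()
for c1, c2 in combinations(constraints.values(), 2):
    p = intersection(c1, c2)
    if p is not None and feasible(p):
        extreme_points.add(p)

best = max(extreme_points, key=lambda p: profit[0] * p[0] + profit[1] * p[1])
best_profit = profit[0] * best[0] + profit[1] * best[1]
print(f"S = {float(best[0])}, H = {float(best[1])}, profit = {float(best_profit)}")
```

The best extreme point is the intersection of the Packaging and Shipping boundaries, which agrees with the plan of 100 SUV tires and 100 Hatchback tires and the $3400 profit stated above.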
6. Robin Tires, Inc. makes two types of tires, for SUVs and hatchbacks. The firm has 500 hours of production time, 250 hours of packaging time, and 150 hours of shipping time available. The time required per tire in each department and the profit per tire are given in the following table:
                  Production Hours
Type              Production   Packaging   Shipping   Profit/Tire
SUV tires         2            1.5         1          $22
Hatchback tires   1.5          1           0.5        $12
Assuming that the company is interested in maximizing the total profit contribution, find the optimal solution using Excel Solver and answer the following: a. How many hours of production time will be scheduled in each department? b. What is the slack time in each department? c. If one more hour is available for packaging, what is the change in profit? d. What is the change in profit if one more hour is available for shipping? Answer: Let
S = number of SUV tires manufactured
H = number of Hatchback tires manufactured
Max 22S + 12H
s.t.
2S + 1.5H ≤ 500 – Production
1.5S + H ≤ 250 – Packaging
S + 0.5H ≤ 150 – Shipping
S, H ≥ 0
a. Production: 100(2) + 100(1.5) = 350
Packaging: 100(1.5) + 100(1) = 250
Shipping: 100(1) + 100(0.5) = 150
b.
Hours        Time used   Available time   Unused (Slack) time
Production   350         500              150
Packaging    250         250              0
Shipping     150         150              0
c. The Sensitivity Report:
The constraint is: 1.5S + H ≤ 250. This constraint is binding and its shadow price is 4. If an additional hour is available for packaging, that is if the constraint is changed from 1.5S + H ≤ 250 to 1.5S + H ≤ 251, the optimal objective function value will increase by $4; that is, the new optimal solution will have objective function value or the profit equal to $3400 + $4 = $3404. d. The constraint is: S + 0.5H ≤ 150. This constraint is binding and its shadow price is 16. If an additional hour is available for shipping, that is if the constraint is changed from S + 0.5H ≤ 150 to S + 0.5H ≤ 151, the optimal objective function value will increase by $16; that is, the new optimal solution will have objective function value equal to $3400 + $16 = $3416. Difficulty: Moderate LO: 8.2, 8.5, Pages 358-364, 372-374 Bloom’s: Application BUSPROG: Analytic DISC:
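The slack values and shadow prices quoted in this answer can be cross-checked without the Sensitivity Report. The sketch below assumes the optimal plan S = 100, H = 100 from the answer and relies on a standard duality fact: when both variables are positive and exactly two constraints are binding, the shadow prices of those two constraints solve a 2-by-2 linear system, and nonbinding constraints have a shadow price of zero. The variable names are hypothetical.

```python
from fractions import Fraction as F

# Rows: (name, coefficient of S, coefficient of H, hours available).
departments = [
    ("Production", F(2), F("1.5"), F(500)),
    ("Packaging", F("1.5"), F(1), F(250)),
    ("Shipping", F(1), F("0.5"), F(150)),
]
profit_s, profit_h = F(22), F(12)
s_opt, h_opt = F(100), F(100)  # optimal plan reported in the answer

# Slack = hours available minus hours used at the optimal plan.
slack = {name: rhs - (a * s_opt + b * h_opt) for name, a, b, rhs in departments}
binding = [d for d in departments if slack[d[0]] == 0]

# Shadow prices y1, y2 of the two binding constraints satisfy
#   a1*y1 + a2*y2 = profit_s   and   b1*y1 + b2*y2 = profit_h.
(n1, a1, b1, _), (n2, a2, b2, _) = binding
det = a1 * b2 - a2 * b1
shadow = {
    n1: (profit_s * b2 - a2 * profit_h) / det,
    n2: (a1 * profit_h - profit_s * b1) / det,
}
# Nonbinding constraints have a shadow price of zero.
shadow.update({name: F(0) for name in slack if name not in shadow})

for name in slack:
    print(f"{name}: slack = {float(slack[name])}, shadow price = {float(shadow[name])}")
```

The result agrees with the answer: Packaging and Shipping are binding with shadow prices of 4 and 16, while Production has 150 hours of slack and a shadow price of zero.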
7. The manager of a quality testing team wants to test different lots of products using three resources, R1, R2, and R3. Each lot can be tested for quality using any one of three procedures, P1, P2, or P3. Once tested, the product is sent for packaging. The profit contributions per lot for these procedures are $4, $5, and $8, respectively. Resource R1 requires 2 hours, 3 hours, and 4 hours to test a lot using procedures P1, P2, and P3, respectively. Resource R2 requires 3 hours, 2 hours, and 3 hours using procedures P1, P2, and P3, respectively. Lastly, resource R3 requires 2 hours, 3 hours, and 4 hours using procedures P1, P2, and P3, respectively. The available times for these three resources are 80 hours, 90 hours, and 65 hours, respectively. Formulate a linear program and solve for the optimal solution by maximizing the profit. a. What will be the change in total profit if resource R3 is given an extra hour of testing time? Answer: Let
x11: number of lots tested by R1 using procedure P1
x12: number of lots tested by R1 using procedure P2
. . .
x33: number of lots tested by R3 using procedure P3
Max 4(x11 + x21 + x31) + 5(x12 + x22 + x32) + 8(x13 + x23 + x33)
s.t.
2x11 + 3x12 + 4x13 ≤ 80
3x21 + 2x22 + 3x23 ≤ 90
2x31 + 3x32 + 4x33 ≤ 65
x11, x12, x13, …, x33 ≥ 0
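In the formulation above, each resource has its own constraint and its own three variables, so the model separates into three one-constraint problems. The sketch below uses that observation instead of a general LP solver: for each resource, every hour is best spent on the procedure with the highest profit per hour, and that rate is also the value of one extra hour. It assumes fractional lots are allowed (the divisibility assumption), and the dictionary names are hypothetical.

```python
from fractions import Fraction as F

profit_per_lot = {"P1": F(4), "P2": F(5), "P3": F(8)}
hours_per_lot = {
    "R1": {"P1": F(2), "P2": F(3), "P3": F(4)},
    "R2": {"P1": F(3), "P2": F(2), "P3": F(3)},
    "R3": {"P1": F(2), "P2": F(3), "P3": F(4)},
}
hours_available = {"R1": F(80), "R2": F(90), "R3": F(65)}


def best_rate(resource):
    """Highest profit per hour over the three procedures for one resource."""
    return max(profit_per_lot[p] / h for p, h in hours_per_lot[resource].items())


# Each resource is an independent one-constraint problem, so its optimum
# spends all available hours at the best profit-per-hour rate.
rate = {r: best_rate(r) for r in hours_per_lot}
total_profit = sum(rate[r] * hours_available[r] for r in rate)

print(f"total profit = {float(total_profit)}")
print(f"value of one extra hour on R3 = {float(rate['R3'])}")
```

Under this formulation, one extra hour on R3 is worth $2 of additional profit. For R1 and R3, procedures P1 and P3 tie on profit per hour, so the model has alternative optimal solutions.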
Difficulty: Challenging LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC: 8. Clever Sporting Equipment, Inc. makes two types of balls: Soccer balls and Cork balls. Each soccer ball and cork ball requires 3 hours and 4 hours of production time, respectively. For the next month, a total of 500 production hours is available. Also, the combined production quantity for these two balls must be at least 150 units. The objective of this linear programming model is to fulfill the given production requirements at a minimum total production cost. The production cost is $9 for each Soccer ball and $7 for each Cork ball. Formulate and solve for the recommended production quantities. Answer: Let
S = number of Soccer balls manufactured
C = number of Cork balls manufactured
Min 9S + 7C
s.t.
S + C ≥ 150
3S + 4C ≤ 500
S, C ≥ 0
Difficulty: Easy LO: 8.3, Pages 364-367 Bloom’s: Application BUSPROG: Analytic DISC: 9. A Cake & Pastry shop makes 3 types of cakes that share three key ingredients; the other ingredients vary from cake to cake. The amounts of these ingredients needed to make each cake are provided in the table below:
Cake        Plain flour (Ounce)   Caster sugar (Ounce)   Cocoa powder (Ounce)   Profit/Unit
Small       8                     18                     3                      $18
Medium      16                    22                     5                      $25
Large       21                    25                     11                     $32
Available   400                   550                    150
Develop and solve a linear programming model to maximize the profit. What is the optimal solution for this problem? Answer: Let
S = Number of small cakes made
M = Number of medium cakes made
L = Number of large cakes made
Max 18S + 25M + 32L
s.t.
8S + 16M + 21L ≤ 400
18S + 22M + 25L ≤ 550
3S + 5M + 11L ≤ 150
S, M, L ≥ 0
Difficulty: Moderate LO: 8.1, 8.2, Pages 354-364 Bloom’s: Application BUSPROG: Analytic DISC: 10. Two mining fields, field A and field B of a coal mining company produce Lignite and Bituminous coal. The operating cost per day for field A and field B are $55,000 and $45,000, respectively. The recent records at the company indicate that the mining field A can produce 250 tons of Lignite along with 300 tons of Bituminous coal per day whereas the mining field B can produce 200 tons of Lignite along with 450 tons of Bituminous coal per day. The demand for Lignite is expected to be 120,000 tons and for Bituminous coal, it is expected to be 170,000 tons. The expected demand should be met. To minimize the operating costs of the mining fields, how many days does the company need to operate each of these fields? Answer:
Let
A = number of days Field A operates
B = number of days Field B operates
Min 55,000A + 45,000B
s.t.
250A + 200B = 120000
300A + 450B = 170000
A, B ≥ 0
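Because both demand constraints in this formulation are equalities, they pin down A and B by themselves, so the model reduces to solving two equations in two unknowns. A minimal sketch of that arithmetic follows; the helper name `solve_2x2` is hypothetical, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two equations are not independent")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - a21 * b1, det)
    return x, y


# Lignite row first, then the Bituminous row, exactly as in the model above.
days_a, days_b = solve_2x2(250, 200, 120_000, 300, 450, 170_000)
total_cost = 55_000 * days_a + 45_000 * days_b

if __name__ == "__main__":
    print(f"A = {float(days_a):.2f} days, B = {float(days_b):.2f} days")
    print(f"operating cost = ${float(total_cost):,.2f}")
```

Both values come out nonnegative (about 381 days for Field A and about 124 days for Field B), so this single point is feasible and therefore optimal for the model as written. If the demand rows were modeled as "at least" constraints instead, a solver would be needed to compare corner points.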
Difficulty: Easy LO: 8.3, Pages 364-367 Bloom’s: Application BUSPROG: Analytic DISC: 11. Sunseel Industries produces two types of raw materials, A and B, with production costs of $4 and $8 per unit, respectively. The combined production of raw materials A and B must be at least 700 units per month. At least 400 units of raw material B and not more than 1200 units of raw material A must be produced per month. The processing time is 5 hours per unit for raw material A and 4 hours per unit for raw material B. A total of 3000 such hours are available per month. How much of each raw material should be produced in order to minimize the cost? Develop a linear program that Sunseel Industries can use to determine how many units of each raw material to produce to minimize the total cost. Answer: Let
A = number of units of Raw material-A produced per month
B = number of units of Raw material-B produced per month
Min 4A + 8B
s.t.
5A + 4B ≤ 3000
A ≤ 1200
B ≥ 400
A + B ≥ 700
A, B ≥ 0
Difficulty: Moderate LO: 8.3, Pages 364-367 Bloom’s: Application BUSPROG: Analytic DISC: 12. Michael has decided to invest $40,000 in three types of funds. Fund A has projected an annual return of 8 percent, Fund B has projected an annual return of 10 percent, and Fund C has projected an annual return of 9 percent. He has decided to invest no more than 30 percent of the total amount in Fund B and no more than 40 percent of the total amount in Fund C. a. Formulate a linear programming model that can be used to determine the amount of investments Michael should allocate to each type of fund to maximize the total annual return. b. How much should be allocated to each type of fund? What is the total annual return? Answer: a. Let A = amount invested in Fund A
B = amount invested in Fund B
C = amount invested in Fund C
Max 0.08A + 0.10B + 0.09C
s.t.
A + B + C = 40000
B ≤ 0.3(A + B + C) → B ≤ 12000
C ≤ 0.4(A + B + C) → C ≤ 16000
A, B, and C ≥ 0
b. Spreadsheet formulas: Cell 18 =SUM(B15:B17) Cell 20 =SUMPRODUCT(B4:B6,B15:B17)
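Only the spreadsheet formulas for part b survive in this text, so the following is a hedged way to reproduce the numbers. Since every dollar counts equally against the single $40,000 budget and the only other limits are per-fund caps, filling the funds in decreasing order of return is enough for this particular model. The function name `allocate` and the tuple layout are assumptions made for illustration.

```python
def allocate(total, funds):
    """Fill the highest-return funds first, respecting each fund's cap.

    funds is a list of (name, annual_return_percent, cap_in_dollars).
    """
    remaining = total
    plan = {}
    for name, _pct, cap in sorted(funds, key=lambda f: f[1], reverse=True):
        amount = min(cap, remaining)
        plan[name] = amount
        remaining -= amount
    if remaining > 0:
        raise ValueError("the caps are too tight to invest the full amount")
    annual_return = sum(plan[name] * pct for name, pct, _cap in funds) / 100
    return plan, annual_return


TOTAL = 40_000
FUNDS = [
    ("A", 8, TOTAL),               # Fund A has no separate cap
    ("B", 10, TOTAL * 30 // 100),  # at most 30 percent of the total
    ("C", 9, TOTAL * 40 // 100),   # at most 40 percent of the total
]

if __name__ == "__main__":
    plan, annual_return = allocate(TOTAL, FUNDS)
    print(plan, annual_return)
```

Under these assumptions the allocation is $12,000 to Fund A, $12,000 to Fund B, and $16,000 to Fund C, for a total annual return of $3,600. The greedy shortcut would not be valid for models with constraints that link funds to one another.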
Difficulty: Moderate LO: 8.6, Pages 374-378 Bloom’s: Application BUSPROG: Analytic DISC: 13. Northwest California Ventures Ltd. has decided to provide capital to startups in five market areas. The investment consultant for the venture capital company has projected an annual rate of return based on the market risk, the product, and the size of the market.
Market Area   Annual Rate of Return on Capital (%)
Electronics   12
Software      18
Logistics     15
Education     12
Retail        17
The maximum capital provided will be $5 million. The consultant has imposed conditions on the allotment of capital based on the risk involved in each market.
• The capital provided to Retail should be at most 40 percent of the total capital.
• The capital for Education should be 26 percent of the total of the other four markets (Electronics, Software, Logistics, and Retail).
• Logistics should be at least 15 percent of the total capital.
• The capital allocated for Software plus Logistics should be no more than the capital allotted for Electronics.
• The capital allocated for Logistics plus Education should not be greater than that allocated to Retail.
Calculate the expected annual return based on the allocation of capital to each market area that maximizes the return on the capital provided. Also, show the allocation of capital for each market area. Answer: Let
x1 = investment in Electronics
x2 = investment in Software
x3 = investment in Logistics
x4 = investment in Education
x5 = investment in Retail
Max 0.12x1 + 0.18x2 + 0.15x3 + 0.12x4 + 0.17x5
s.t.
x5 ≤ 0.4(5000000)
x4 = 0.26(x1 + x2 + x3 + x5)
x3 ≥ 0.15(5000000)
x2 + x3 ≤ x1
x3 + x4 ≤ x5
x1 + x2 + x3 + x4 + x5 = 5000000
x1, x2, x3, x4, and x5 ≥ 0
The expected annual return, based on the allocation of capital to each of the five market areas shown below, is approximately $736,547.62.
Market Area        Allocation (approx.)
Electronics (x1)   $984,127
Software (x2)      $234,127
Logistics (x3)     $750,000
Education (x4)     $1,031,746
Retail (x5)        $2,000,000
Total              $5,000,000
Difficulty: Challenging LO: 8.6, Pages 374-378
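The reported allocation can be checked against the model by substituting it into each constraint and into the objective. The sketch below does only that; it is a verification aid, not a solver. The names `ALLOC`, `check_constraints`, and `annual_return` are hypothetical, and a $1 tolerance is assumed because the published amounts are rounded.

```python
TOTAL = 5_000_000
RATE_PCT = {"x1": 12, "x2": 18, "x3": 15, "x4": 12, "x5": 17}
ALLOC = {
    "x1": 984_127,    # Electronics
    "x2": 234_127,    # Software
    "x3": 750_000,    # Logistics
    "x4": 1_031_746,  # Education
    "x5": 2_000_000,  # Retail
}


def check_constraints(a, tol=1.0):
    """Return a pass/fail flag for each constraint of the model."""
    others = a["x1"] + a["x2"] + a["x3"] + a["x5"]
    return {
        "retail at most 40%": a["x5"] <= 0.4 * TOTAL + tol,
        "education is 26% of others": abs(a["x4"] - 0.26 * others) <= tol,
        "logistics at least 15%": a["x3"] >= 0.15 * TOTAL - tol,
        "software + logistics <= electronics": a["x2"] + a["x3"] <= a["x1"] + tol,
        "logistics + education <= retail": a["x3"] + a["x4"] <= a["x5"] + tol,
        "budget fully used": abs(sum(a.values()) - TOTAL) <= tol,
    }


def annual_return(a):
    """Objective value in dollars, computed from whole-percent rates."""
    return sum(RATE_PCT[k] * a[k] for k in a) / 100


if __name__ == "__main__":
    for name, ok in check_constraints(ALLOC).items():
        print(f"{name}: {'ok' if ok else 'VIOLATED'}")
    print(f"annual return = ${annual_return(ALLOC):,.2f}")
```

With the rounded figures every constraint holds within the tolerance and the objective evaluates to $736,547.62, which agrees with the value stated in the answer.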
Bloom’s: Application BUSPROG: Analytic DISC: 14. Jackson just obtained $240,000 by selling mutual funds and is now looking for other investment opportunities for these funds. His financial consultant recommends that all new investments be made in the stocks of industries such as Agriculture, Health Care, Banking, Manufacturing, and Real Estate. The projected annual rates of return for the investments are as follows:
Expected Annual Returns of the Stocks
Stocks          Return (%)
Agriculture     11
Health Care     6.50
Banking         9
Manufacturing   12
Real Estate     8.50
His consultant has set constraints on the investments based on the calculated risks involved with the industries:
1) Neither the Agriculture nor the Manufacturing industry should receive more than $100,000.
2) Neither Health Care nor Banking should receive more than $50,000.
3) The amount invested in the Manufacturing industry should not be more than 45 percent of the sum of the investments in the Banking and Health Care sectors.
4) The amount invested in Real Estate should be at least 20 percent of the sum of the investments in the Banking and Health Care sectors.
What portfolio recommendations (investments and amounts) should be made for the available $240,000? Answer: Let
X1 = amount invested in Agriculture
X2 = amount invested in Health Care
X3 = amount invested in Banking
X4 = amount invested in Manufacturing
X5 = amount invested in Real Estate
Max 0.11X1 + 0.065X2 + 0.09X3 + 0.12X4 + 0.085X5
s.t.
X1 + X2 + X3 + X4 + X5 = 240000
X1 ≤ 100000
X4 ≤ 100000
X2 ≤ 50000
X3 ≤ 50000
X4 ≤ 0.45(X2 + X3)
X5 ≥ 0.20(X2 + X3)
X1, X2, X3, X4, X5 ≥ 0
Difficulty: Challenging LO: 8.6, Pages 374-378 Bloom’s: Application BUSPROG: Analytic DISC: 15. A soft drink manufacturing company has three factories, one in each of three cities (Orlando, Tampa, and Port St. Lucie), and supplies the soft drink bottles it produces to three warehouses located in the city of Miami. The associated per-unit transportation cost between the factories and the warehouses is provided in the table below:
Transportation Costs of Factories ($)
Factories/Warehouses   W1   W2   W3
Orlando                7    4    5
Tampa                  7    6    4
Port St. Lucie         5    5    6
The factory at Orlando has a capacity of 14,000 units. The factory at Tampa has a capacity of 25,000 units. The factory at Port St. Lucie has a capacity of 23,000 units. The requirements of the warehouses are:
Warehouse   Requirement (Bottles)
W1          18,000
W2          19,000
W3          22,000
Determine how much of the company’s production should be shipped from each factory to each warehouse in order to minimize the total transportation cost. Answer: Let
x11: number of bottles shipped from Orlando to W1
x12: number of bottles shipped from Orlando to W2
. . .
x33: number of bottles shipped from Port St. Lucie to W3
Min 7x11 + 4x12 + 5x13 + 7x21 + 6x22 + 4x23 + 5x31 + 5x32 + 6x33
s.t.
x11 + x12 + x13 ≤ 14000
x21 + x22 + x23 ≤ 25000
x31 + x32 + x33 ≤ 23000
x11 + x21 + x31 = 18000
x12 + x22 + x32 = 19000
x13 + x23 + x33 = 22000
x11, x12, x13, …, x33 ≥ 0
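The answer above gives the formulation but no shipping plan. One hedged way to confirm a candidate plan without a solver is a duality check: if a feasible plan and a feasible set of dual prices give the same objective value, the plan is optimal. The plan `PLAN` and the prices `U` and `V` below were worked out by hand for this illustration and are assumptions, not values taken from the test bank.

```python
COST = [[7, 4, 5],   # Orlando to W1, W2, W3
        [7, 6, 4],   # Tampa
        [5, 5, 6]]   # Port St. Lucie
SUPPLY = [14_000, 25_000, 23_000]
DEMAND = [18_000, 19_000, 22_000]

# Hypothetical candidate plan: PLAN[i][j] bottles from factory i to warehouse j.
PLAN = [[0, 14_000, 0],
        [0, 0, 22_000],
        [18_000, 5_000, 0]]

# Hypothetical dual prices: U for the capacity rows, V for the demand rows.
U = [-1, 0, 0]
V = [5, 5, 4]


def plan_is_feasible(plan):
    """Capacity rows are <=, demand columns are =, and shipments are >= 0."""
    rows = all(sum(plan[i]) <= SUPPLY[i] for i in range(3))
    cols = all(sum(plan[i][j] for i in range(3)) == DEMAND[j] for j in range(3))
    nonneg = all(x >= 0 for row in plan for x in row)
    return rows and cols and nonneg


def plan_cost(plan):
    return sum(COST[i][j] * plan[i][j] for i in range(3) for j in range(3))


def dual_is_feasible(u, v):
    """Need u[i] <= 0 and u[i] + v[j] <= COST[i][j] for every route."""
    signs = all(ui <= 0 for ui in u)
    routes = all(u[i] + v[j] <= COST[i][j] for i in range(3) for j in range(3))
    return signs and routes


def dual_value(u, v):
    return (sum(SUPPLY[i] * u[i] for i in range(3))
            + sum(DEMAND[j] * v[j] for j in range(3)))


if __name__ == "__main__":
    print(plan_is_feasible(PLAN), plan_cost(PLAN))
    print(dual_is_feasible(U, V), dual_value(U, V))
```

Both values come to $259,000, so under these assumptions the candidate plan (Orlando ships 14,000 to W2; Tampa ships 22,000 to W3; Port St. Lucie ships 18,000 to W1 and 5,000 to W2) is a minimum-cost plan. Other plans with the same cost may exist.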
Difficulty: Moderate LO: 8.6, Pages 378-381 Bloom’s: Application BUSPROG: Analytic DISC: 16. A soft drink manufacturing company has three factories, one in each of three cities (Orlando, Tampa, and Port St. Lucie), and supplies the soft drink bottles it produces to three warehouses located in the city of Miami. The associated per-unit transportation cost table is provided below:
Transportation Costs ($)
Factories/Warehouse (W)   W1   W2   W3
Orlando                   4    3    7
Tampa                     7    6    4
Port St. Lucie            3    6    6
The factory at Orlando has a capacity of 15,000 units. The factory at Tampa has a capacity of 18,000 units. The factory at Port St. Lucie has a capacity of 8,000 units. The requirements of the warehouses are:
Warehouse   Requirement (Bottles)
W1          18,000
W2          12,000
W3          5,000
a. Determine how much of the company’s production should be shipped from each factory to each warehouse in order to minimize the total transportation cost. b. Find an alternative optimal solution for this transportation problem. Hint: Use the procedure described in section 8.7. Answer: a. Let
x11: number of bottles shipped from Orlando to W1
x12: number of bottles shipped from Orlando to W2
. . .
x33: number of bottles shipped from Port St. Lucie to W3
Min 4x11 + 3x12 + 7x13 + 7x21 + 6x22 + 4x23 + 3x31 + 6x32 + 6x33
s.t.
x11 + x12 + x13 ≤ 15000
x21 + x22 + x23 ≤ 18000
x31 + x32 + x33 ≤ 8000
x11 + x21 + x31 = 18000
x12 + x22 + x32 = 12000
x13 + x23 + x33 = 5000
x11, x12, x13, …, x33 ≥ 0
b. To find an alternative optimal solution, solve a second problem: maximize the sum of the shipments that are zero in the above solution, subject to the total cost remaining at its optimal value.
Max x13 + x22 + x32 + x33
s.t.
4x11 + 3x12 + 7x13 + 7x21 + 6x22 + 4x23 + 3x31 + 6x32 + 6x33 = minimum total cost found in part a
x11 + x12 + x13 ≤ 15000
x21 + x22 + x23 ≤ 18000
x31 + x32 + x33 ≤ 8000
x11 + x21 + x31 = 18000
x12 + x22 + x32 = 12000
x13 + x23 + x33 = 5000
x11, x12, x13, …, x33 ≥ 0
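To make the idea of an alternative optimum concrete, the sketch below compares two shipping plans for this problem. Both plans were derived by hand for illustration and are assumptions: `PLAN_A` ships nothing on the four routes named in the Max objective, while `PLAN_B` moves some Warehouse 2 supply from Orlando to Tampa. The helper names are hypothetical.

```python
COST = [[4, 3, 7],   # Orlando to W1, W2, W3
        [7, 6, 4],   # Tampa
        [3, 6, 6]]   # Port St. Lucie
SUPPLY = [15_000, 18_000, 8_000]
DEMAND = [18_000, 12_000, 5_000]

# Hypothetical plans: PLAN[i][j] bottles from factory i to warehouse j.
PLAN_A = [[3_000, 12_000, 0],
          [7_000, 0, 5_000],
          [8_000, 0, 0]]
PLAN_B = [[10_000, 5_000, 0],
          [0, 7_000, 5_000],
          [8_000, 0, 0]]


def is_feasible(plan):
    """Capacity rows are <=, demand columns are =, shipments are >= 0."""
    rows = all(sum(plan[i]) <= SUPPLY[i] for i in range(3))
    cols = all(sum(plan[i][j] for i in range(3)) == DEMAND[j] for j in range(3))
    return rows and cols and all(x >= 0 for row in plan for x in row)


def total_cost(plan):
    return sum(COST[i][j] * plan[i][j] for i in range(3) for j in range(3))


def secondary_objective(plan):
    """x13 + x22 + x32 + x33, the quantity maximized in part b."""
    return plan[0][2] + plan[1][1] + plan[2][1] + plan[2][2]


if __name__ == "__main__":
    for name, plan in (("A", PLAN_A), ("B", PLAN_B)):
        print(name, is_feasible(plan), total_cost(plan), secondary_objective(plan))
```

Both plans are feasible and cost $141,000, but the secondary objective is 0 for the first and 7,000 for the second, so they are different solutions with the same cost. Whether $141,000 matches the solver output for part a is an assumption here, since that output is not reproduced in the text.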
Difficulty: Challenging LO: 8.7, Pages 386-388 Bloom’s: Application BUSPROG: Analytic DISC: 17. Ethan Steel, Inc. has two factories that manufacture steel components for four different rail projects located at four different sites. The demand for the steel components for the four projects, Project A, Project B, Project C, and Project D, are 3220, 3675, 4125, and 2975, respectively. The shipping details are as below: Production Details:
Factory   Maximum Capacity
1         6500
2         8500
Shipping Details (with per-unit shipping cost):
Project   Factory 1   Factory 2
A         $7          $6
B         $7          $5
C         $8          $7
D         $4          $3
What is the optimal (cost minimizing) distribution plan for this transportation problem? Answer: Let
x11: number of steel components shipped from Factory 1 to Project A
x12: number of steel components shipped from Factory 1 to Project B
. . .
x24: number of steel components shipped from Factory 2 to Project D
Min 7x11 + 7x12 + 8x13 + 4x14 + 6x21 + 5x22 + 7x23 + 3x24
s.t.
x11 + x21 = 3220
x12 + x22 = 3675
x13 + x23 = 4125
x14 + x24 = 2975
x11 + x12 + x13 + x14 ≤ 6500
x21 + x22 + x23 + x24 ≤ 8500
x11, x12, x13, …, x24 ≥ 0
Difficulty: Moderate LO: 8.6, Pages 378-381 Bloom’s: Application BUSPROG: Analytic DISC: 18. Ethan Steel, Inc. has two factories that manufacture steel components for four different rail projects located at four different sites. The demand for the steel components for the four projects, Project A, Project B, Project C, and Project D, are 3220, 3675, 4125, and 2975, respectively. The shipping details are as below: Production details:
Factory   Maximum Capacity
1         6500
2         8500
Shipping Details (with per-unit shipping cost):
Project   Factory 1   Factory 2
A         $7          $6
B         $7          $5
C         $8          $7
D         $4          $3
Find an alternative optimal solution for this transportation problem. Hint: Use the procedure described in section 8.7.
Answer: First, solve the problem by minimizing the total shipping cost. Let
x11: number of steel components shipped from Factory 1 to Project A
x12: number of steel components shipped from Factory 1 to Project B
. . .
x24: number of steel components shipped from Factory 2 to Project D
Min 7x11 + 7x12 + 8x13 + 4x14 + 6x21 + 5x22 + 7x23 + 3x24
s.t.
x11 + x21 = 3220
x12 + x22 = 3675
x13 + x23 = 4125
x14 + x24 = 2975
x11 + x12 + x13 + x14 ≤ 6500
x21 + x22 + x23 + x24 ≤ 8500
x11, x12, x13, …, x24 ≥ 0
To find an alternative optimal solution, solve a second problem: maximize the sum of the shipments that are zero in the above solution, subject to the total cost remaining at its optimal value.
Max x21 + x12 + x13
s.t.
7x11 + 7x12 + 8x13 + 4x14 + 6x21 + 5x22 + 7x23 + 3x24 = minimum total shipping cost found in the first model
x11 + x21 = 3220
x12 + x22 = 3675
x13 + x23 = 4125
x14 + x24 = 2975
x11 + x12 + x13 + x14 ≤ 6500
x21 + x22 + x23 + x24 ≤ 8500
x11, x12, x13, …, x24 ≥ 0
Difficulty: Challenging LO: 8.7, Pages 386-388 Bloom’s: Application BUSPROG: Analytic DISC: 19. Three plants P1, P2, and P3 of a gas corporation supply gasoline to three of their distributors located in the city at three different locations A, B, and C. The plants’ daily capacities are 4500, 3000, and 5000, gallons respectively, while the distributors’ daily requirements are 5500, 2500, and 4200 gallons. The per-gallon transportation costs (in $) are provided in the table below:
Distributor   Plant P1   Plant P2   Plant P3
A             0.8        0.7        0.5
B             0.5        0.65       0.45
C             1          0.8        0.7
Because of an earlier failure to deliver the expected supply, distributors A, B, and C have this time decided to charge a penalty of $0.45, $0.55, and $0.5 per gallon, respectively, to avoid any further delays. Determine the optimum supply of gasoline to the distributors in order to minimize the total transportation cost as well as the charges payable as penalty. Answer: Let
x11: number of gallons of gasoline supplied from Plant 1 to Distributor A
x12: number of gallons of gasoline supplied from Plant 1 to Distributor B
. . .
x33: number of gallons of gasoline supplied from Plant 3 to Distributor C
Min 0.8x11 + 0.5x12 + 1x13 + 0.7x21 + 0.65x22 + 0.8x23 + 0.5x31 + 0.45x32 + 0.7x33
s.t.
x11 + x12 + x13 ≤ 4500
x21 + x22 + x23 ≤ 3000
x31 + x32 + x33 ≤ 5000
x11 + x21 + x31 = 5500
x12 + x22 + x32 = 2500
x13 + x23 + x33 = 4200
x11, x12, …, x33 ≥ 0
Difficulty: Moderate LO: 8.6, Pages 378-381 Bloom’s: Application BUSPROG: Analytic DISC: 20. Three plants P1, P2, and P3 of a gas corporation supply gasoline to three of their distributors in the city, located at locations A, B, and C. The plants’ daily capacities are 4500, 3000, and 5000 gallons, respectively, while the distributors’ daily requirements are 5500, 2500, and 4200 gallons. The per-gallon transportation costs (in $) are provided in the table below:
Distributor   Plant P1   Plant P2   Plant P3
A             0.8        0.7        0.5
B             0.5        0.65       0.45
C             1          0.8        0.7
Because of an earlier failure to deliver the expected supply, the distributors have this time decided to charge a penalty of $0.45, $0.55, and $0.5 per gallon, respectively, for locations A, B, and C to avoid any further delays. Find an alternative optimal solution for this transportation problem. Hint: Use the procedure described in section 8.7. Answer: First, solve the problem by minimizing the total transportation cost. Let
x11: number of gallons of gasoline supplied from Plant 1 to Distributor A
x12: number of gallons of gasoline supplied from Plant 1 to Distributor B
. . .
x33: number of gallons of gasoline supplied from Plant 3 to Distributor C
Min 0.8x11 + 0.5x12 + 1x13 + 0.7x21 + 0.65x22 + 0.8x23 + 0.5x31 + 0.45x32 + 0.7x33
s.t.
x11 + x12 + x13 ≤ 4500
x21 + x22 + x23 ≤ 3000
x31 + x32 + x33 ≤ 5000
x11 + x21 + x31 = 5500
x12 + x22 + x32 = 2500
x13 + x23 + x33 = 4200
x11, x12, …, x33 ≥ 0
To find the alternative optimal solution, solve a second problem: maximize the sum of the supplies that are zero in the above solution, subject to the total cost remaining at its optimal value.
Max x13 + x21 + x22 + x32
s.t.
0.8x11 + 0.5x12 + 1x13 + 0.7x21 + 0.65x22 + 0.8x23 + 0.5x31 + 0.45x32 + 0.7x33 = minimum total transportation cost found in the first model
x11 + x12 + x13 ≤ 4500
x21 + x22 + x23 ≤ 3000
x31 + x32 + x33 ≤ 5000
x11 + x21 + x31 = 5500
x12 + x22 + x32 = 2500
x13 + x23 + x33 = 4200
x11, x12, …, x33 ≥ 0
Difficulty: Challenging LO: 8.7, Pages 386-388 Bloom’s: Application BUSPROG: Analytic DISC:
Chapter 9: Integer Linear Optimization Models 1. The imposition of integer restriction is necessary for models where: a. nonnegativity constraints are needed. b. variables can take negative values. c. the decision variables cannot take fractional values. d. possible values of variables are restricted to particular intervals. Answer: C Difficulty: Moderate LO: 9.1, Pages 406-407 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The imposition of integer restriction is necessary for models where the decision variables cannot take fractional values. 2. The objective function for a linear optimization problem is: Max 3x + 5y, with one of the constraints being x, y ≥ 0 and integer. x and y are the only decision variables. This is an example of a(n) _____. a. all-integer linear program b. mixed-integer linear program c. nonlinear program d. binary integer linear program Answer: A Difficulty: Moderate LO: 9.1, Page 407 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: If all variables are required to be integer in a linear program, then it is an all-integer linear program. 3. The linear program that results from dropping the integer requirements for the variables in an integer linear program is known as _____. a. convex hull b. a mixed-integer linear program c. LP relaxation d. a binary integer linear program Answer: C Difficulty: Easy LO: 9.1, Page 407 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The linear program that results from dropping the integer requirements for the variables in an integer linear program is known as LP relaxation.
4. The objective function for an optimization problem is: Min 3x – 2y, with one of the constraints being x, y ≥ 0 and integer. If the integer restriction on the variables is removed, this would be a familiar two-variable linear program; however, it would be an example of _____. a. convex hull of the linear program b. a mixed-integer linear program c. LP relaxation of the integer linear program d. a binary integer linear program Answer: C Difficulty: Moderate LO: 9.1, Page 407 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The linear program that results from dropping the integer requirements for the variables in an integer linear program is known as LP relaxation. 5. The objective function for an optimization problem is: Max 5x – 3y, with one of the constraints being x, y ≥ 0 and y integer. x and y are the only decision variables. This is an example of a(n) _____. a. all-integer linear program b. mixed-integer linear program c. LP relaxation of the integer linear program d. binary integer linear program Answer: B Difficulty: Moderate LO: 9.1, Page 407 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: If some, but not necessarily all, variables are required to be integer in a linear program, then it is a mixed-integer linear program. 6. In a binary integer linear program, the integer variables take only the values: a. 0 or 1. b. 0 or ∞. c. 1 or ∞. d. 1 or –1. Answer: A Difficulty: Easy LO: 9.1, Page 407 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a binary integer linear program, the integer variables take only the values 0 or 1.
7. Which of the following is true of rounding the solution to an integer? a. It always produces the most optimal integer solution. b. It never produces a feasible solution. c. It does not affect the objective function. d. It may or may not be feasible. Answer: D Difficulty: Moderate LO: 9.2, Page 409 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Rounding to an integer solution is a trial-and-error approach. Each rounded solution must be evaluated for feasibility as well as for its impact on the value of the objective function. Even when a rounded solution is feasible, we do not have a guarantee that we have found the optimal integer solution. 8. The _____ of a set of points is the smallest intersection of linear inequalities that contain the set of points. a. concave hull b. slope c. convex hull d. geometry Answer: C Difficulty: Easy LO: 9.2, Page 409 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The convex hull of a set of points is the smallest intersection of linear inequalities that contain the set of points. 9. The optimal solution to the integer program will be an extreme point of the _____. a. convex hull b. objective contour c. cutting plane d. slope Answer: A Difficulty: Moderate LO: 9.2, Page 409 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The optimal solution to the integer program will be an extreme point of the convex hull.
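The point made in question 7 about rounding can be seen on a very small model. The sketch below uses a hypothetical two-variable program (Max 5x + 8y subject to x + y ≤ 6, 5x + 9y ≤ 45, x, y ≥ 0 and integer), which is not taken from the test bank. It solves the LP relaxation by corner enumeration and the integer program by brute force; all function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

OBJ = (5, 8)
# Each row (a, b, rhs) means a*x + b*y <= rhs.
ROWS = [(1, 1, 6), (5, 9, 45), (-1, 0, 0), (0, -1, 0)]


def feasible(x, y):
    return all(a * x + b * y <= rhs for a, b, rhs in ROWS)


def value(x, y):
    return OBJ[0] * x + OBJ[1] * y


def lp_relaxation():
    """Best feasible corner point when the integer requirement is dropped."""
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(ROWS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:  # parallel boundaries never meet
            continue
        x = Fraction(r1 * b2 - r2 * b1, det)  # Cramer's rule
        y = Fraction(a1 * r2 - a2 * r1, det)
        if feasible(x, y) and (best is None or value(x, y) > value(*best)):
            best = (x, y)
    return best


def integer_optimum():
    """Brute force over the grid; x <= 6 and y <= 5 follow from the rows."""
    grid = product(range(7), range(6))
    return max((p for p in grid if feasible(*p)), key=lambda p: value(*p))


if __name__ == "__main__":
    x, y = lp_relaxation()
    print("LP relaxation:", float(x), float(y), float(value(x, y)))
    print("rounded point feasible?", feasible(round(x), round(y)))
    print("integer optimum:", integer_optimum(), value(*integer_optimum()))
```

In this example the relaxation is optimal at (2.25, 3.75) with value 41.25. Rounding to the nearest integers gives (2, 4), which violates the second constraint; rounding down to (2, 3) is feasible but worth only 34, whereas the best integer point is (0, 5) with value 40. Because this is a maximization problem, the relaxation value of 41.25 is an upper bound on the integer optimum; for a minimization problem it would be a lower bound.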
10. Which of the following is true of the relationship between the value of the optimal integer solution and the value of the optimal solution to the LP Relaxation? a. For integer linear programs involving minimization, the value of the optimal solution to the LP Relaxation provides an upper bound on the value of the optimal integer solution. b. For integer linear programs involving maximization, the value of the optimal solution to the LP Relaxation provides a lower bound on the value of the optimal integer solution. c. For integer linear programs involving minimization, the value of the optimal solution to the LP Relaxation provides a lower bound on the value of the optimal integer solution. d. For any linear program involving either minimization or maximization, the value of the optimal solution to the LP Relaxation provides an infeasible value for the optimal integer solution. Answer: C Difficulty: Moderate LO: 9.2, Page 410 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: For integer linear programs involving minimization, the value of the optimal solution to the LP Relaxation provides a lower bound on the value of the optimal integer solution. 11. The _____ approach to solving integer linear optimization problems breaks the feasible region of the LP Relaxation into subregions until the subregions have integer solutions or it is determined that the solution cannot be in the subregion. a. cutting plane b. trial-and-error c. breaking region d. branch-and-bound Answer: D Difficulty: Easy LO: 9.2, Page 410 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The branch-and-bound approach to solving integer linear optimization problems breaks the feasible region of the LP Relaxation into subregions until the subregions have integer solutions or it is determined that the solution cannot be in the subregion. 12. 
Which of the following approaches to solving integer linear optimization problems tries to identify the convex hull by adding a series of new constraints that do not exclude any feasible integer points? a. Branch-and-bound approach b. Cutting plane approach c. Trial-and-error approach d. Convex hull approach Answer: B Difficulty: Easy
LO: 9.2, Page 410 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Cutting plane approaches try to identify the convex hull by adding a series of new constraints that do not exclude any feasible integer points. 13. The worksheet formulation for integer linear programs and linear programming problems is exactly the same except that the _____ for integer linear programs. a. objective function using Set Objective in the Solver Parameters dialog box is set to Value Of option b. decision variables need not be added in By Changing Variable Cells in the Solver Parameters dialog box c. decision variables must be added in By Changing Variable Cells in the Solver Parameters dialog box along with selecting the Ignore Integer Constraints in the Integer Options dialog box d. constraints must be added in the Solver Parameters dialog box to identify the integer variables and the value for Tolerance in the Integer Options dialog box may need to be adjusted Answer: D Difficulty: Moderate LO: 9.3, Pages 410-411 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The worksheet formulation for integer linear programs is similar to that for linear programming problems. Actually the worksheet formulation is exactly the same, but some additional information must be provided when setting up the Solver Parameters and Options dialog boxes. Constraints must be added in the Solver Parameters dialog box to identify the integer variables. In addition, the value for Tolerance in the Integer Options dialog box may need to be adjusted to obtain a solution. 14. Binary variables are identified with the _____designation in the Solver Parameters dialog box. a. bin b. 0 and 1 c. int d. dif Answer: A Difficulty: Easy LO: 9.3, Page 413 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Binary variables are identified with the bin designation in the Solver Parameters dialog box.
15. The importance of _____ for integer linear programming problems is often intensified by the fact that a small change in one of the coefficients in the constraints can cause a relatively large change in the value of the optimal solution. a. objective function b. decision variables c. sensitivity analysis d. optimization analysis Answer: C Difficulty: Moderate LO: 9.3, Page 415 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Sensitivity analysis often is more crucial for integer linear programming problems than for linear programming problems. A small change in one of the coefficients in the constraints can cause a relatively large change in the value of the optimal solution. 16. Which of the following is true about the sensitivity analysis for integer optimization problems? a. Sensitivity reports are readily available for integer optimization problems similar to the linear programming problems. b. Because of the discrete nature of the integer optimization, Excel Solver takes much more time to calculate objective function coefficient ranges, shadow prices, and right-hand-side ranges. c. The sensitivity analysis is not important for integer problems. d. To determine the sensitivity of the solution to changes in model inputs for integer optimization problems, the data must be changed and the problem must be re-solved. Answer: D Difficulty: Moderate LO: 9.3, Page 415 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Sensitivity reports are not available for integer optimization problems. To determine the sensitivity of the solution to changes in model inputs, you must change the data and re-solve the problem. 17. In order to choose the best solution for implementation, practitioners usually recommend resolving the integer linear program several times with variations in the_____. a. objective function b. decision variables c. constraint coefficients d. integer constraints Answer: C Difficulty: Moderate LO: 9.3, Page 415 Bloom’s: Comprehension
BUSPROG: Analytic DISC: Feedback: Because of the extreme sensitivity of the value of the optimal solution to the constraint coefficients, practitioners usually recommend re-solving the integer linear program several times with variations in the constraint coefficients before attempting to choose the best solution for implementation. 18. In cases where Excel Solver experiences excessive run times when solving integer linear problems, the Integer Optimality is set to _____. a. 5% b. 0% c. infinity d. a value equal to the number of integer constraints Answer: A Difficulty: Moderate LO: 9.3, Page 415 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: If an optimal solution cannot be found within a reasonable amount of time, the Integer Optimality (%) can be reset to 5 percent or some higher value so that the search procedure may stop when a near optimal solution (within the tolerance of being optimal) has been found. In general, unless you are experiencing excessive run times, it is recommended that the Integer Optimality (%) is set to 0. 19. The objective function for a linear optimization problem is: Max 3x + 2y, with one of the constraints being x, y = 0, 1. x and y are the only decision variables. This is an example of a _____. a. nonlinear program b. mixed-integer linear program c. LP relaxation of the integer linear program d. binary integer linear program Answer: D Difficulty: Easy LO: 9.4, Page 416 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In some applications, the integer variables may take on only the values 0 or 1. Then we have a binary integer linear program. 20. _____ is a binary integer programming problem that involves choosing which possible projects or activities provide the best investment return. a. Capital budgeting problem b. Fixed-cost problem c. Market share optimization problem d. Location problem
Answer: A Difficulty: Easy LO: 9.4, Page 416 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Capital budgeting problem is a binary integer programming problem that involves choosing which possible projects or activities provide the best investment return. 21. In a production application involving a fixed cost and a variable cost, the use of _____ makes including the fixed cost possible in a production model. a. location variables b. noninteger constraints c. objective function coefficients d. binary variables Answer: D Difficulty: Moderate LO: 9.4, Page 416 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In a production application involving a fixed cost and a variable cost, the use of binary variables makes including the fixed cost possible in a production model. 22. A binary mixed-integer programming problem in which the binary variables represent whether an activity, such as a production run, is undertaken or not is known as the _____. a. capital budgeting problem b. share of choice problem c. fixed-cost problem d. covering problem Answer: C Difficulty: Easy LO: 9.4, Page 417 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A binary mixed-integer programming problem in which the binary variables represent whether an activity, such as a production run, is undertaken or not is known as the fixed-cost problem. 23. In a fixed-cost model, each fixed cost is associated with a binary variable and a specification of the: a. upper bound for the corresponding production variable. b. upper bound for each of the binary variable. c. integer constraints involving the corresponding production variables. d. objective function involving these binary variables only.
Answer: A Difficulty: Moderate LO: 9.4, Page 419 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In a fixed-cost model, each fixed cost is associated with a binary variable and a specification of upper bound for the corresponding production variable. 24. Which of the following is a likely constraint on the production quantity x associated with a maximum value and a setup variable y in a fixed-cost problem? a. x ≥ My b. x ≤ My c. Mx ≤ y d. xy ≥ M Answer: B Difficulty: Moderate LO: 9.4, Page 420 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: x ≤ My is a likely constraint on the production quantity x associated with a maximum value and a setup variable y in a fixed-cost problem. 25. In a fixed-cost problem, choosing excessively large values for the maximum production quantity will result in: a. all reasonable levels of production. b. no production. c. no solution at all. d. possibly a slow solution procedure. Answer: D Difficulty: Moderate LO: 9.4, Page 420 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In a fixed-cost problem, the value of the maximum production quantity M should be large enough to allow for all reasonable levels of production, but choosing values of M excessively large will slow the solution procedure. 26. For a location problem, if the variables are defined as xi = 1 if an outlet store is established in region i and 0 otherwise, the objective function is best defined by _____ for i = 1, 2, …, n number of outlet stores included in the problem. a. Min(∑xi) b. Max(∑xi) c. Min(πxi) d. Max(πxi)
Answer: A Difficulty: Moderate LO: 9.4, Page 421 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: For a location problem, if the variables are defined as xi = 1 if an outlet store is established in region i and 0 otherwise, the objective function is best defined by Min(∑xi) for i = 1, 2, …, n number of outlet stores included in the problem. 27. _____ is a market research technique that can be used to learn how prospective buyers of a product value the product’s attributes. a. Part-worth analysis b. Conjoint analysis c. Regression analysis d. Sensitivity analysis Answer: B Difficulty: Easy LO: 9.4, Page 424 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Conjoint analysis is a market research technique that can be used to learn how prospective buyers of a product value the product’s attributes. 28. The _____ is the utility value that a consumer attaches to each level of each attribute in a conjoint analysis model. a. weightage b. share of choice c. part-worth d. share of market Answer: C Difficulty: Easy LO: 9.4, Page 424 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The part-worth is the utility value that a consumer attaches to each level of each attribute in a conjoint analysis model. 29. The part-worth for each of the attribute levels in a conjoint analysis is determined by _____. a. regression analysis b. sensitivity analysis c. online surveys d. word-of-mouth Answer: A
Difficulty: Moderate LO: 9.4, Page 424 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The part-worth for each of the attribute levels in a conjoint analysis is determined by regression analysis. 30. Coming up with a product design that will have the highest utility for a sufficient number of people to ensure sufficient sales to justify making the product is known as the _____ in marketing literature. a. capital budgeting problem b. share of choice problem c. fixed-cost problem d. traveling-salesman problem Answer: B Difficulty: Easy LO: 9.4, Page 425 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Coming up with a product design that will have the highest utility for a sufficient number of people to ensure sufficient sales to justify making the product is known as the share of choice problem in marketing literature. Reference - 9.1: Use the information given below to answer questions 31-35. An apparel designing company is planning to enter the women’s trousers market. They are in the process of developing a product that will appeal most to customers. 31. Reference - 9.1. What category does the above objective fall under? a. Capital budgeting problem b. Covering problem c. Fixed-cost problem d. Product design and market share optimization problem Answer: D Difficulty: Moderate LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The process of designing a new product that is likely to appeal to most customers falls under the product design and market share optimization problem. 32. Reference - 9.1. The available size of the trousers will be a(n) _____ in an integer programming model for this problem. a. binary variable
b. constraint c. attribute d. regression constant Answer: C Difficulty: Moderate LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The available size of the trousers will be an attribute in an integer programming model for this problem. 33. Reference - 9.1. Pink, green, and black will be _____ of the color attribute. a. levels b. constraints c. regression constants d. utility values Answer: A Difficulty: Moderate LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: Pink, green, and black will be levels of the color attribute. 34. Reference - 9.1. The levels – small, medium, and large of the size attribute are modeled using: a. objective function coefficients. b. slack variables. c. binary variables. d. nonlinear coefficients. Answer: C Difficulty: Moderate LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The levels – small, medium, and large of the size attribute are modeled using binary variables. 35. Reference - 9.1. The part-worths for each of the attribute levels obtained from an initial customer survey and the subsequent regression analysis can be used to determine the: a. customer utility value. b. optimal solution for the regression analysis. c. overall profit for the company. d. overall sales achieved by the company.
Answer: A Difficulty: Moderate LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The part-worths for each of the attribute levels obtained from an initial customer survey and the subsequent regression analysis can be used to determine the customer utility value. 36. According to the _____ constraint, the sum of two or more binary variables must be equal to one. a. conditional b. corequisite c. multiple-choice d. mutually exclusive Answer: C Difficulty: Moderate LO: 9.5, Page 427 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A multiple-choice constraint requires that the sum of two or more binary variables equals one. 37. The sum of two or more binary variables must be less than or equal to one in _____ constraint. a. corequisite b. conditional c. multiple-choice d. mutually exclusive Answer: D Difficulty: Moderate LO: 9.5, Page 427 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a mutually exclusive constraint, the sum of two or more binary variables must be less than or equal to one. 38. A constraint involving binary variables that does not allow certain variables to equal one unless certain other variables are equal to one is known as a _____. a. conditional constraint b. corequisite constraint c. k out of n alternatives constraint d. mutually exclusive constraint Answer: A
Difficulty: Easy LO: 9.5, Page 428 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A constraint involving binary variables that does not allow certain variables to equal one unless certain other variables are equal to one is known as a conditional constraint. 39. _____ is a constraint requiring that two binary variables be equal and that thus are both either in or out of the solution together. a. Conditional constraint b. Corequisite constraint c. k out of n alternatives constraint d. Mutually exclusive constraint Answer: B Difficulty: Easy LO: 9.5, Page 428 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Corequisite constraint is a constraint requiring that two binary variables be equal and that thus are both either in or out of the solution together. 40. Which of the following is true about generating alternatives in binary optimization? a. If the second-best solution is very close to optimal, it is always preferred over the true optimal solution because of factors outside the model. b. If alternative solutions exist, it would not help management because some factors that make one alternative are not preferred over the factors that make another alternative. c. If the solution is a unique optimal solution, it would be good for management to know how much worse the second-best solution is than the unique optimal solution. d. If any alternative solution exists, it would only be a second-best next to the optimal solution because there is no third-best or an alternative second-best solution to any binary integer programming problem. Answer: C Difficulty: Moderate LO: 9.6, Page 428 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: If alternative optimal solutions exist, it would be good for management to know this because some factors that make one alternative preferred over another might not be included in the model. 
Also, if the solution is a unique optimal solution, it would be good to know how much worse the second-best solution is than the unique optimal solution. If the second-best solution is very close to optimal, it might be preferred over the true optimal solution because of factors outside the model. There could be a third-best solution or an alternative second-best solution to a binary optimization problem.
Problems
1. A manufacturer makes two types of rubber, Butadiene and Polyisoprene. The plant has two machines, Machine-1 and Machine-2, and both of them are used to make the rubber strips. Machine-1 is available 180 hours per month and Machine-2 is available 200 hours per month. Manufacturing one strip of Butadiene requires 2.75 hours on Machine-1 and 3 hours on Machine-2. For processing one strip of Polyisoprene, it takes 3.5 hours on Machine-1 and 4 hours on Machine-2. Formulate an all-integer model that will determine how many units of each type of rubber should be produced to maximize the manufacturer’s contribution to profit, given a profit of $20 per unit of Butadiene and $26 per unit of Polyisoprene.
Answer:
Let B = Number of units of Butadiene produced
P = Number of units of Polyisoprene produced
Max 20B + 26P
s.t.
2.75B + 3.5P ≤ 180
3B + 4P ≤ 200
B, P ≥ 0 and integer
Difficulty: Easy LO: 9.1, 9.2, Pages 406-408 Bloom’s: Application BUSPROG: Analytic DISC:
2. A manufacturer makes two types of rubber, Butadiene and Polyisoprene. The plant has two machines, Machine-1 and Machine-2, and both of them are used to make the rubber strips. Machine-1 is available 180 hours per month and Machine-2 is available 200 hours per month. Manufacturing one strip of Butadiene requires 2.75 hours on Machine-1 and 3 hours on Machine-2. For processing one strip of Polyisoprene, it takes 3.5 hours on Machine-1 and 4 hours on Machine-2. Formulate an all-integer model that will determine how many units of each type of rubber should be produced to maximize the manufacturer’s contribution to profit, given a profit of $20 per unit of Butadiene and $26 per unit of Polyisoprene.
Answer:
Let B = Number of units of Butadiene produced
P = Number of units of Polyisoprene produced
Max 20B + 26P
s.t.
2.75B + 3.5P ≤ 180
3B + 4P ≤ 200
B, P ≥ 0 and integer
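The all-integer rubber model is small enough to check by direct enumeration. The sketch below is only an illustration, not the solution method used in the text; the function name `solve_rubber` and the loop bounds (derived from the Machine-2 constraint) are assumptions made for this example.

```python
# Brute-force check of the all-integer rubber model (illustrative sketch).
def solve_rubber():
    """Return (profit, B, P) for the best integer production plan."""
    best = (0, 0, 0)
    for b in range(0, 67):        # 3B <= 200 implies B <= 66
        for p in range(0, 51):    # 4P <= 200 implies P <= 50
            machine1_ok = 2.75 * b + 3.5 * p <= 180
            machine2_ok = 3 * b + 4 * p <= 200
            if machine1_ok and machine2_ok:
                profit = 20 * b + 26 * p
                if profit > best[0]:
                    best = (profit, b, p)
    return best


if __name__ == "__main__":
    print(solve_rubber())  # (1320, 40, 20)
```

Under these data both machine constraints are tight at B = 40, P = 20, so the linear relaxation already has an integer optimum and the integrality requirement costs nothing here.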
Difficulty: Moderate LO: 9.3, Pages 410-414 Bloom’s: Application BUSPROG: Analytic DISC: 3. A coffee manufacturing company has two different plants that roast imported coffee beans. After roasting, the plants produce three types of coffee beans, A, B, and C. The company has contracted with a chain of cafes to provide per week, 20 tons of coffee bean A, 11 tons of coffee bean B, and 18 tons of coffee bean C. The two plants have the same capacity, but diverse operational features as below:
Manufacturing Cost per Ton ($)
Plant     A      B      C      Capacity
P1        900    1125   875    25
P2        850    1200   950    25
Demand    20     11     18
Formulate and solve an all-integer model that will determine how many tons of each type of coffee bean are produced in each plant so as to minimize the total cost.
Answer:
Let Xij = number of tons of coffee bean type j produced in plant i; i = 1, 2 and j = 1, 2, 3 (types A, B, and C)
Min 900X11 + 1125X12 + 875X13 + 850X21 + 1200X22 + 950X23
s.t.
X11 + X12 + X13 ≤ 25
X21 + X22 + X23 ≤ 25
X11 + X21 = 20
X12 + X22 = 11
X13 + X23 = 18
Xij ≥ 0 and integer; i = 1, 2 and j = 1, 2, 3
Difficulty: Moderate LO: 9.3, Pages 410-414 Bloom’s: Application BUSPROG: Analytic DISC:
4. FinFone Paper Mill is a small-scale paper-making company which has four machines to produce four different types of paper. Each type of paper must go through processing on each of the four machines. The manufacturing time (in minutes) per unit of paper produced is in the following table:
Time required (in minutes)
Paper Type   Machine 1   Machine 2   Machine 3   Machine 4
A            2.4         2.1         1.6         2.5
B            1.2         2.4         0.9         2.5
C            2.7         3.2         2.6         3.2
D            3.2         3.3         5.1         6.5
The maximum time allotted for each machine is 30 hours per week and at least 100 units of each type of paper should be made using these machines during the week. Profit per unit is:
Paper Type   Profit ($)
A            0.25
B            0.32
C            0.44
D            0.5
Develop and solve an all-integer model that will determine, using the available machine time, the number of units of each paper type to be produced in order to meet the weekly demand and to maximize the profit. Answer: Let A = Number of units of Type A paper made B = Number of units of Type B paper made C = Number of units of Type C paper made D = Number of units of Type D paper made Max 0.25A + 0.32B + 0.44C + 0.5D s.t. 2.4A + 1.2B + 2.7C + 3.2D ≤ 1800 2.1A + 2.4B + 3.2C + 3.3D ≤ 1800 1.6A + 0.9B + 2.6C + 5.1D ≤ 1800 2.5A + 2.5B + 3.2C + 6.5D ≤ 1800 A, B, C, D ≥ 100 A, B, C, D ≥ 0 and integer
Difficulty: Moderate LO: 9.2, 9.3, Pages 407-414 Bloom’s: Application BUSPROG: Analytic DISC: 5. The following questions refer to an advertisement budgeting problem involving printing of five magazines represented by binary variables M1, M2, M3, M4, and M5. a. Write a constraint modeling a situation in which two of the magazines M1, M4, and M5 must be printed. b. Write a constraint modeling a situation in which, if M2 or M3 is printed, they must both be printed. c. Write a constraint modeling a situation in which magazine M1 or M3 must be printed, but not both. d. Write constraints modeling a situation where M2 cannot be printed unless magazine M3 and M5 also are printed. e. Write a constraint in which not more than 4 of all the five magazines have to be printed.
f. Write a constraint in which exactly five of the magazines are printed.
Answer:
a. M1 + M4 + M5 = 2
b. M2 - M3 = 0
c. M1 + M3 = 1
d. M2 ≤ M3
   M2 ≤ M5
e. M1 + M2 + M3 + M4 + M5 ≤ 4
f. M1 + M2 + M3 + M4 + M5 = 5
Difficulty: Easy LO: 9.4, Pages 415-420 Bloom’s: Application BUSPROG: Analytic DISC: 6. Delisshious Toasty Chocolates Company largely produces two types of chocolate bars, Almond Tasty and Cashew Crunchy. The maintenance costs per day incurred to produce the two types of chocolate bars are $100 and $120, respectively. The manufacturing cost per chocolate bar is $2 for Almond Tasty and $2.5 for Cashew Crunchy. The daily production capacity of Almond Tasty and Cashew Crunchy chocolate bars are 1100 and 1250, respectively. There is only one machine to produce whichever type of chocolate bar will be produced that day. Only one type of bar can be made on a given day. Let C1 = the number of Almond Tasty chocolate bars produced C2 = the number of Cashew Crunchy chocolate bars are produced Y1 = 1 if the machine produces Almond Tasty; 0, otherwise Y2 = 1 if the machine produces Cashew Crunchy; 0, otherwise a. Write a constraint that sets the next day’s maximum production of Almond Tasty to 1100. b. Write a constraint that sets the next day’s maximum production of Cashew Crunchy to 1250. c. Write a constraint that requires that production be set up for exactly one of the two chocolates bars. d. Write the cost function to be minimized. Answer:
a. C1 ≤ 1100Y1 b. C2 ≤ 1250Y2 c. Y1 + Y2 = 1 d. Min 2C1 + 2.5C2 + 100Y1 + 120Y2 Difficulty: Easy LO: 9.4, Pages 415-416 Bloom’s: Application BUSPROG: Analytic DISC: 7. A chocolate making company largely produces one particular type of crunchy chocolate bar. There are two machines in the plant that produces this chocolate bar. The maintenance costs per day incurred on these two machines are $100 and $120, respectively. The manufacturing cost per chocolate bar is $2.5 for Machine-1 and $2 for Machine-2. The maximum daily production capacity for Machine-1 and Machine-2 are 1100 and 1250, respectively, and the company must produce at least 1000 chocolate bars per day. Develop and solve an integer programming model for minimizing the total cost. Answer: Let C1 = the number of chocolate bars produced by Machine-1 C2 = the number of chocolate bars produced by Machine-2 Y1 = 1 if Machine-1 produces chocolate bar; 0, otherwise Y2 = 1 if Machine-2 produces chocolate bar; 0, otherwise Min 2.5C1 + 2C2 + 100Y1 + 120Y2 s.t. C1 ≤ 1100Y1 C2 ≤ 1250Y2 Y1 + Y2 = 1 C1 + C2 ≥ 1000 C1, C2 ≥ 0 and integer Y1, Y2 = 0, 1
Difficulty: Challenging LO: 9.4, Pages 415-420 Bloom’s: Application BUSPROG: Analytic DISC: 8. Four shipping containers can altogether take a maximum of 20 tons for one shipment order. The following table provides details on the weight (in tons) and value of each container: Container Weight of container (tons) Value / Container
1 5 $6000
2 6 $5500
3 9 $7500
4 7 $6000
Develop a binary integer model that will determine the two containers and the quantity that should be considered for shipment using these two containers to maximize the value of the shipment.
Answer:
Let C1 = 1 if Container 1 is considered for shipment; 0 otherwise
C2 = 1 if Container 2 is considered for shipment; 0 otherwise
C3 = 1 if Container 3 is considered for shipment; 0 otherwise
C4 = 1 if Container 4 is considered for shipment; 0 otherwise
Max 6000C1 + 5500C2 + 7500C3 + 6000C4
s.t.
5C1 + 6C2 + 9C3 + 7C4 ≤ 20
C1 + C2 + C3 + C4 = 2
Ci = 0, 1 (for i = 1, 2, 3, 4)
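With only four binary variables, the container model can be checked by listing every pair of containers. The following is a minimal sketch under the data in the table; the names `WEIGHT`, `VALUE`, and `best_pairs` are introduced here for illustration and do not come from the text.

```python
from itertools import combinations

# Container data from the problem statement: weight in tons, value in dollars.
WEIGHT = {1: 5, 2: 6, 3: 9, 4: 7}
VALUE = {1: 6000, 2: 5500, 3: 7500, 4: 6000}


def best_pairs(capacity=20, k=2):
    """Return (best value, list of all k-container selections achieving it)."""
    feasible = [
        combo for combo in combinations(WEIGHT, k)
        if sum(WEIGHT[i] for i in combo) <= capacity
    ]
    top = max(sum(VALUE[i] for i in combo) for combo in feasible)
    winners = [combo for combo in feasible
               if sum(VALUE[i] for i in combo) == top]
    return top, winners


if __name__ == "__main__":
    print(best_pairs())  # (13500, [(1, 3), (3, 4)])
```

The enumeration suggests that this instance has two alternative optimal selections with the same shipment value, which is the kind of situation where factors outside the model can break the tie.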
Difficulty: Moderate LO: 9.4, Pages 415-420 Bloom’s: Application BUSPROG: Analytic DISC: 9. BBest Ink Printing Co. received an order to print a minimum of 50,000 tickets for a concert. They have three printing machines available to meet the order they received. The set-up cost of these machines and the unit cost/ticket printed using each machine along with their maximum production are provided in the table below: Machine A B C
Set-up cost $7000 $4000 $5400
Cost per unit $18 $21 $24
Maximum Production 30,000 25,000 30,000
a. Formulate a binary integer linear programming model to find two machines that could be used to print the required number of tickets in order to minimize the cost.
b. Solve the problem in part a.
Answer:
a. Let P1 = Number of tickets printed by Machine A
P2 = Number of tickets printed by Machine B
P3 = Number of tickets printed by Machine C
A = 1 if Machine A is used to print; 0 if not
B = 1 if Machine B is used to print; 0 if not
C = 1 if Machine C is used to print; 0 if not
Min 18P1 + 21P2 + 24P3 + 7000A + 4000B + 5400C
s.t.
P1 + P2 + P3 ≥ 50,000
P1 ≤ 30,000A
P2 ≤ 25,000B
P3 ≤ 30,000C
P1, P2, P3 ≥ 0 and integer
A, B, C = 0, 1
b.
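One way to explore the fixed-cost ticket model is to enumerate which machines are set up and then fill the order on the cheapest open machines first. This is a sketch under the stated data, not the text's solution procedure; the helper names `cost_of` and `best_subset` are hypothetical. The greedy fill is valid here only because, once the setup decisions are fixed, a single demand constraint with per-machine upper bounds remains.

```python
from itertools import combinations

# Machine data from the table: (set-up cost, cost per ticket, maximum production).
MACHINES = {
    "A": (7000, 18, 30000),
    "B": (4000, 21, 25000),
    "C": (5400, 24, 30000),
}
DEMAND = 50000


def cost_of(subset):
    """Total cost of meeting DEMAND with the given open machines, or None."""
    if sum(MACHINES[m][2] for m in subset) < DEMAND:
        return None  # the open machines cannot print enough tickets
    total = sum(MACHINES[m][0] for m in subset)  # set-up costs
    remaining = DEMAND
    # Fill the cheapest per-unit machine first.
    for m in sorted(subset, key=lambda name: MACHINES[name][1]):
        qty = min(remaining, MACHINES[m][2])
        total += qty * MACHINES[m][1]
        remaining -= qty
    return total


def best_subset():
    """Return (machines opened, minimum total cost) over all subsets."""
    options = {}
    for k in range(1, len(MACHINES) + 1):
        for subset in combinations(MACHINES, k):
            cost = cost_of(subset)
            if cost is not None:
                options[subset] = cost
    return min(options.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(best_subset())  # (('A', 'B'), 971000)
```

Under these assumptions the cheapest plan opens Machines A and B, printing 30,000 tickets on A and 20,000 on B.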
Difficulty: Moderate LO: 9.4, Pages 415-420 Bloom’s: Application BUSPROG: Analytic DISC:
10. Light-Twilight Inc. currently has four workshops for manufacturing light bulbs and five warehouses where the produced light bulbs are shipped from the workshops for storage. The cost, demand, and volume details are given as below:
Shipping cost/1000 bulbs ($)
                      Warehouse
Workshop              1     2     3     4     5     Volume (in 1000’s)   Fixed Cost ($)
A                     78    25    39    77    48    90                   32,700
B                     49    72    27    17    29    85                   35,000
C                     31    90    31    25    42    80                   40,000
D                     28    20    18    27    21    86                   20,000
Demand (in 1000’s)    40    36    16    23    29
a. Formulate a mixed-integer programming model to identify two workshops the management should retain in order to fulfil the estimated demand at minimum cost.
b. Solve the model you formulated in part a. What is the optimum cost?
Answer:
a. Let Xij = Number of light bulbs (in thousands) shipped from workshop i to warehouse j
A = 1 if the workshop A is retained; 0 if not
B = 1 if the workshop B is retained; 0 if not
C = 1 if the workshop C is retained; 0 if not
D = 1 if the workshop D is retained; 0 if not
Min 78X11 + 25X12 + 39X13 + 77X14 + 48X15 + 49X21 + 72X22 + 27X23 + 17X24 + 29X25 + 31X31 + 90X32 + 31X33 + 25X34 + 42X35 + 28X41 + 20X42 + 18X43 + 27X44 + 21X45 + 32700A + 35000B + 40000C + 20000D
s.t.
X11 + X12 + X13 + X14 + X15 ≤ 90A
X21 + X22 + X23 + X24 + X25 ≤ 85B
X31 + X32 + X33 + X34 + X35 ≤ 80C
X41 + X42 + X43 + X44 + X45 ≤ 86D
X11 + X21 + X31 + X41 = 40
X12 + X22 + X32 + X42 = 36
X13 + X23 + X33 + X43 = 16
X14 + X24 + X34 + X44 = 23
X15 + X25 + X35 + X45 = 29
Xij ≥ 0, where i = 1, 2, 3, 4; j = 1, 2, 3, 4, 5
A, B, C, D = 0, 1
b.
The optimal cost is $56,736. Difficulty: Challenging LO: 9.4, Pages 415-423 Bloom’s: Application BUSPROG: Analytic DISC: 11. Sansuit Investments is deciding on future investment for the coming two years. The company considered four bonds to choose from and their investment details for the next two years are given in the table below:
Investment Requirements ($)
Bond      Year 1    Year 2
Bond A    32,000    35,000
Bond B    15,000    21,000
Bond C    8000      9500
Bond D    10,000    7000
The net worth of these four bonds is $75,000, $40,000, $25,500, and $18,000, respectively. The funds available with the company for Year 1 and Year 2 are $35,000 and $62,000, respectively. Develop and solve a binary integer programming model for maximizing the net worth.
Answer:
Let X1 = 1 if Bond A is selected for investment; 0 if it is not
X2 = 1 if Bond B is selected for investment; 0 if it is not
X3 = 1 if Bond C is selected for investment; 0 if it is not
X4 = 1 if Bond D is selected for investment; 0 if it is not
Max 75000X1 + 40000X2 + 25500X3 + 18000X4
s.t.
32000X1 + 15000X2 + 8000X3 + 10000X4 ≤ 35000
35000X1 + 21000X2 + 9500X3 + 7000X4 ≤ 62000
X1, X2, X3, X4 = 0, 1
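The bond-selection model has only 16 possible 0/1 assignments, so it can be solved by checking each one against the two budget constraints. The sketch below assumes the data as tabulated; `best_portfolio` and the list names are illustrative choices, not part of the original answer.

```python
from itertools import product

# Bond data in the order A, B, C, D.
YEAR1 = [32000, 15000, 8000, 10000]   # Year 1 investment requirement ($)
YEAR2 = [35000, 21000, 9500, 7000]    # Year 2 investment requirement ($)
WORTH = [75000, 40000, 25500, 18000]  # net worth ($)


def dot(coeffs, x):
    """Inner product of a coefficient list with a 0/1 selection vector."""
    return sum(c * xi for c, xi in zip(coeffs, x))


def best_portfolio(budget1=35000, budget2=62000):
    """Return (net worth, selection) maximizing net worth within both budgets."""
    best = (0, (0, 0, 0, 0))
    for x in product((0, 1), repeat=4):
        if dot(YEAR1, x) <= budget1 and dot(YEAR2, x) <= budget2:
            worth = dot(WORTH, x)
            if worth > best[0]:
                best = (worth, x)
    return best


if __name__ == "__main__":
    print(best_portfolio())  # (83500, (0, 1, 1, 1))
```

Under these data, Bond A on its own uses almost the whole Year 1 budget, so the enumeration favors Bonds B, C, and D together for a net worth of $83,500.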
Difficulty: Moderate LO: 9.4, Pages 415-420 Bloom’s: Application BUSPROG: Analytic DISC: 12. Sansuit Investments is deciding on future investment for the coming two years. The company considered four bonds to choose from and their investment details for the next two years are given in the table below:
Investment Requirements ($)
Bond      Year 1    Year 2
Bond A    32,000    35,000
Bond B    15,000    21,000
Bond C    8000      9500
Bond D    10,000    7000
The net worth of these four bonds is $75,000, $40,000, $25,500, and $18,000, respectively. The funds available with the company for Year 1 and Year 2 are $35,000 and $62,000, respectively.
a. Develop and solve a binary integer programming model for maximizing the net worth assuming that only one of the bonds can be considered. b. Suppose the investment has to be made on Bond B, and only two of the four bonds can be considered for investment. Modify your formulation from part a. to reflect this new situation. Of these two options, which is better? Answer: a. Let X1 = 1 if Bond A is selected for investment; 0 if it is not X2 = 1 if Bond B is selected for investment; 0 if it is not X3 = 1 if Bond C is selected for investment; 0 if it is not X4 = 1 if Bond D is selected for investment; 0 if it is not Max 75000X1 + 40000X2 + 25500X3 + 18000X4 s.t. 32000X1 + 15000X2 + 8000X3 + 10000X4 ≤ 35000 35000X1 + 21000X2 + 9500X3 + 7000X4 ≤ 62000 X1 + X2 + X3 + X4 = 1 X1, X2, X3, X4 = 0, 1
The optimal solution is $75,000. b. Let X1 = 1 if Bond A is selected for investment; 0 if it is not X2 = 1 if Bond B is selected for investment; 0 if it is not X3 = 1 if Bond C is selected for investment; 0 if it is not X4 = 1 if Bond D is selected for investment; 0 if it is not Max 75000X1 + 40000X2 + 25500X3 + 18000X4 s.t. 32000X1 + 15000X2 + 8000X3 + 10000X4 ≤ 35000 35000X1 + 21000X2 + 9500X3 + 7000X4 ≤ 62000 X2 = 1 X1 + X3 + X4 = 1 X1, X2, X3, X4 = 0, 1
Here, the optimal solution is obtained as $65,500 which is less than $75,000, the optimal solution obtained for part a. Hence, the option a. of investing in bonds is better than option b. Difficulty: Challenging LO: 9.4, Pages 415-420 Bloom’s: Application BUSPROG: Analytic DISC: 13. A manufacturer wants to construct warehouses in six different locations of the city to supply dry cells to his customers on time. The manufacturer wants to construct the minimum number of warehouses, to meet the market demand on time, such that at least one warehouse is within 40 miles from each other warehouse. The following table provides the distance (in miles) between the locations.
Distance (in miles)
              From
To            Location A   Location B   Location C   Location D   Location E   Location F
Location A    0
Location B    35           0
Location C    40           35           0
Location D    45           40           45           0
Location E    60           70           50           40           0
Location F    70           75           50           50           30           0
Formulate and solve an integer linear program that can be used to determine the minimum number of warehouses needed to be constructed. What are their locations?
Answer:
Let X1, X2, X3, X4, X5, and X6 be the variables indicating the locations A, B, C, D, E, and F, respectively, where the warehouses are to be constructed.
Xi = 1 if warehouse is constructed in Location i; 0, if not, where i = 1, 2, …, 6
Min X1 + X2 + X3 + X4 + X5 + X6
s.t.
X1 + X2 + X3 ≥ 1 (Location A constraint)
X1 + X2 + X3 + X4 ≥ 1 (Location B constraint)
X1 + X2 + X3 ≥ 1 (Location C constraint)
X2 + X4 + X5 ≥ 1 (Location D constraint)
X4 + X5 + X6 ≥ 1 (Location E constraint)
X5 + X6 ≥ 1 (Location F constraint)
Xi = 0, 1 (i = 1, 2, 3, 4, 5, 6)
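Each covering constraint above lists the locations within 40 miles of the location being covered. The sketch below rebuilds that coverage from the lower-triangular distance table and searches for the smallest covering sets by enumeration. It is an illustration only; `LOWER`, `dist`, and `min_covers` are names assumed for this example.

```python
from itertools import combinations

SITES = "ABCDEF"
# Lower-triangular distance table (miles); row i holds distances to sites 0..i.
LOWER = [
    [0],
    [35, 0],
    [40, 35, 0],
    [45, 40, 45, 0],
    [60, 70, 50, 40, 0],
    [70, 75, 50, 50, 30, 0],
]


def dist(i, j):
    """Symmetric lookup into the lower-triangular table."""
    return LOWER[max(i, j)][min(i, j)]


def min_covers(limit=40):
    """Return (k, covers): the smallest k and every k-site set covering all sites."""
    n = len(SITES)
    for k in range(1, n + 1):
        found = [
            "".join(SITES[i] for i in combo)
            for combo in combinations(range(n), k)
            if all(any(dist(i, j) <= limit for i in combo) for j in range(n))
        ]
        if found:
            return k, found
    return n, [SITES]


if __name__ == "__main__":
    print(min_covers())
```

Under this reading of the table, no single site covers everything and several two-site covers exist, one of which is locations B and E.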
The minimum number of warehouses to be constructed is two at the locations B and E. Difficulty: Moderate LO: 9.4, Pages 420-424 Bloom’s: Application BUSPROG: Analytic DISC:
14. ROFiL Pizza Delivery is planning to open new stores in other regions so that the excess demand for the pizza home delivery services is met. The locations under consideration and the potential areas (coded in numbers 1-9) that can be reached quickly from the considered locations for home delivery are given in the following table:
Store Location   Potential Areas Covered
L1               3, 4, 6, 8
L2               1, 5, 9
L3               1, 4, 7
L4               2, 6, 7
L5               3, 4, 9
L6               2, 7, 9
L7               4, 8, 9
L8               1, 2, 5, 6
L9               3, 6, 8
Formulate an integer programming model that could be used to find the minimum number of stores to open in order to cover customers of all areas for the home delivery service.
Answer:
Let L1, L2, L3, L4, L5, L6, L7, L8, and L9 be the variables indicating the locations L1, L2, L3, L4, L5, L6, L7, L8, and L9, respectively.
Li = 1 if the location i is selected for the store; 0 if not
Min L1 + L2 + L3 + L4 + L5 + L6 + L7 + L8 + L9
s.t.
L2 + L3 + L8 ≥ 1 (Potential Area 1)
L4 + L6 + L8 ≥ 1 (Potential Area 2)
L1 + L5 + L9 ≥ 1 (Potential Area 3)
L1 + L3 + L5 + L7 ≥ 1 (Potential Area 4)
L2 + L8 ≥ 1 (Potential Area 5)
L1 + L4 + L8 + L9 ≥ 1 (Potential Area 6)
L3 + L4 + L6 ≥ 1 (Potential Area 7)
L1 + L7 + L9 ≥ 1 (Potential Area 8)
L2 + L5 + L6 + L7 ≥ 1 (Potential Area 9)
Li = 0, 1 where i = 1, 2, …, 9
Difficulty: Moderate LO: 9.4, Pages 415-423 Bloom’s: Application BUSPROG: Analytic DISC:
15. ROFiL Pizza Delivery is planning to open new stores in other regions so that the excess demand for the pizza home delivery services is met. The locations under consideration and the potential areas (coded in numbers 1-9) that can be reached quickly from the considered locations for home delivery are given in the following table:
Store Location   Potential Areas Covered
L1               3, 4, 6, 8
L2               1, 5, 9
L3               1, 4, 7
L4               2, 6, 7
L5               3, 4, 9
L6               2, 7, 9
L7               4, 8, 9
L8               1, 2, 5, 6
L9               3, 6, 8
Solve an integer programming model that could be used to find the minimum number of stores to open in order to cover customers of all areas for the home delivery service.
Answer:
Note: There are alternative optima for this model.
Let L1, L2, L3, L4, L5, L6, L7, L8, and L9 be the variables indicating the locations L1, L2, L3, L4, L5, L6, L7, L8, and L9, respectively.
Li = 1 if the location i is selected for the store; 0 if not
Min L1 + L2 + L3 + L4 + L5 + L6 + L7 + L8 + L9
s.t.
L2 + L3 + L8 ≥ 1 (Potential Area 1)
L4 + L6 + L8 ≥ 1 (Potential Area 2)
L1 + L5 + L9 ≥ 1 (Potential Area 3)
L1 + L3 + L5 + L7 ≥ 1 (Potential Area 4)
L2 + L8 ≥ 1 (Potential Area 5)
L1 + L4 + L8 + L9 ≥ 1 (Potential Area 6)
L3 + L4 + L6 ≥ 1 (Potential Area 7)
L1 + L7 + L9 ≥ 1 (Potential Area 8)
L2 + L5 + L6 + L7 ≥ 1 (Potential Area 9)
Li = 0, 1 where i = 1, 2, …, 9
The minimum number of stores needed to open is 3 and the locations are L1, L6, and L8. Difficulty: Moderate LO: 9.4, Pages 415-423 Bloom’s: Application BUSPROG: Analytic DISC:
16. A coffee maker vendor has set up two coffee machines, Machine 1 and Machine 2, inside an organization. The service cost incurred on Machine 1 and Machine 2 is $100 and $80, respectively. The production cost of coffee is $2 per mug for Machine 1 and $3 per mug for Machine 2. There is no service provided by the vendor on Sundays. The weekly production capacity is 1000 mugs for Machine 1 and 1200 mugs for Machine 2, and thereafter the machine needs to be serviced before any extra mug of coffee is to be served. Because of Christmas, employee attendance at the organization is going to be low, so only one machine is to be used to serve at least 800 mugs of coffee that week at minimum total cost. Formulate and solve the integer programming model that could be used to determine the coffee machine which minimizes the cost.
Answer:
A = 1 if Machine 1 is used to serve coffee; 0, if not
B = 1 if Machine 2 is used to serve coffee; 0, if not
M1 = Number of mugs of coffee served by Machine 1
M2 = Number of mugs of coffee served by Machine 2
Min 100A + 80B + 2M1 + 3M2
s.t.
M1 ≤ 1000A
M2 ≤ 1200B
M1 + M2 ≥ 800
A + B = 1
M1, M2 ≥ 0 and integer
A, B = 0, 1
Difficulty: Challenging LO: 9.3, 9.4, Pages 410-423 Bloom’s: Application BUSPROG: Analytic DISC: 17. Greenbell Software Inc. conducted a study on its smartphone products in the market to determine which phone has the best features in terms of three prominent attributes: operating system of the phone (A or B), RAM (512MB or 1GB), and the rear camera specifications (3MP, 5MP, or 7MP) provided with the phone. Eight sample customers have participated in the study and provided the following part-worths for each of the above attributes.
            Operating System    RAM                Camera
Consumer    A       B           512 MB    1 GB     3MP    5MP    7MP
1           25      29          20        45       18     18     15
2           35      32          30        35       27     19     13
3           45      35          30        30       15     25     30
4           55      65          25        45       25     20     35
5           40      40          35        40       20     25     30
6           30      47          40        35       15     20     15
7           35      45          30        30       20     20     20
8           25      25          25        30       25     30     30
Suppose the overall utility (sum of part-worths) of the current favorite Greenbell smartphone is 100 for each consumer. What new product design will maximize the share of choice for the eight consumers in the sample? Answer: Note: There are alternative optima for this model. Lij = 1 if Greenbell chooses level i for attribute j; 0 otherwise Yk = 1 if consumer k prefers the new Greenbell product; 0 otherwise Max Y1 + Y2 + . . . + Y8 s.t. 25L11 + 29L21 + 20L12 + 45L22 + 18L13 + 18L23 + 15L33 ≥ 1 + 100Y1 35L11 + 32L21 + 30L12 + 35L22 + 27L13 + 19L23 + 13L33 ≥ 1 + 100Y2 45L11 + 35L21 + 30L12 + 30L22 + 15L13 + 25L23 + 30L33 ≥ 1 + 100Y3 55L11 + 65L21 + 25L12 + 45L22 + 25L13 + 20L23 + 35L33 ≥ 1 + 100Y4 40L11 + 40L21 + 35L12 + 40L22 + 20L13 + 25L23 + 30L33 ≥ 1 + 100Y5 30L11 + 47L21 + 40L12 + 35L22 + 15L13 + 20L23 + 15L33 ≥ 1 + 100Y6 35L11 + 45L21 + 30L12 + 30L22 + 20L13 + 20L23 + 20L33 ≥ 1 + 100Y7 25L11 + 25L21 + 25L12 + 30L22 + 25L13 + 30L23 + 30L33 ≥ 1 + 100Y8 L11 + L21 = 1 L12 + L22 = 1 L13 + L23 + L33 = 1
The optimal solution obtained using Excel Solver shows L21 = L22 = L23 = 1. This indicates that a smartphone with operating system B, 1GB of RAM, and a 5MP camera will maximize the share of choice. The optimal solution also has Y4 = Y5 = Y6 = 1, which indicates that consumers 4, 5, and 6 will prefer this new smartphone. Note: An alternative optimal solution is L11 = L12 = L33 = 1. Difficulty: Challenging LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC:
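Because the design space has only 2 × 2 × 3 = 12 combinations, the share of choice model can be cross-checked by scoring every design against each consumer's part-worths. The sketch below is an illustration under the tabulated data; `share_of_choice` and the dictionary layout are assumptions of this example, and a consumer is counted as preferring the new design when its utility is at least one unit above the current product's utility, mirroring the "≥ 1 + 100Yk" constraints.

```python
from itertools import product

# Part-worths per attribute level, listed for consumers 1..8.
OS = {"A": [25, 35, 45, 55, 40, 30, 35, 25],
      "B": [29, 32, 35, 65, 40, 47, 45, 25]}
RAM = {"512MB": [20, 30, 30, 25, 35, 40, 30, 25],
       "1GB": [45, 35, 30, 45, 40, 35, 30, 30]}
CAM = {"3MP": [18, 27, 15, 25, 20, 15, 20, 25],
       "5MP": [18, 19, 25, 20, 25, 20, 20, 30],
       "7MP": [15, 13, 30, 35, 30, 15, 20, 30]}


def share_of_choice(current):
    """Return (max share, {design: winning consumers}) for the best designs.

    `current` lists each consumer's utility for the current favorite product.
    """
    results = {}
    for o, r, c in product(OS, RAM, CAM):
        winners = [
            k + 1 for k in range(8)
            if OS[o][k] + RAM[r][k] + CAM[c][k] >= current[k] + 1
        ]
        results[(o, r, c)] = winners
    top = max(len(w) for w in results.values())
    return top, {d: w for d, w in results.items() if len(w) == top}


if __name__ == "__main__":
    top, best = share_of_choice([100] * 8)
    print(top)
    for design, winners in best.items():
        print(design, winners)
```

With a current utility of 100 for every consumer, the enumeration reports a maximum share of three consumers and more than one design achieving it, consistent with the note on alternative optima. Passing a different `current` list lets the same function handle consumer-specific utilities.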
18. Greenbell Software Inc. conducted a study on its smartphone products in the market to determine which phone has the best features in terms of three prominent attributes: operating system of the phone (A or B), RAM (512MB or 1GB), and the rear camera specifications (3MP, 5MP, or 7MP) provided with the phone. Eight sample customers have participated in the study and provided the following part-worths for each of the above attributes.
            Operating System    RAM                Camera
Consumer    A       B           512 MB    1 GB     3MP    5MP    7MP
1           25      29          20        45       18     18     15
2           35      32          30        35       27     19     13
3           45      35          30        30       15     25     30
4           55      65          25        45       25     20     35
5           40      40          35        40       20     25     30
6           30      47          40        35       15     20     15
7           35      45          30        30       20     20     20
8           25      25          25        30       25     30     30
Assume the overall utility (sum of part-worths) of the current favorite Greenbell smartphone for customers 1 to 5 is 105 and customers 6 to 8 is 90. What new product design will maximize the share of choice for the eight consumers in the sample? Answer: Note: There are alternative optima for this model. Lij = 1 if Greenbell chooses level i for attribute j; 0 otherwise Yk = 1 if consumer k prefers the new Greenbell product; 0 otherwise Max Y1 + Y2 + . . . + Y8 s.t. 25L11 + 29L21 + 20L12 + 45L22 + 18L13 + 18L23 + 15L33 ≥ 1 + 105Y1 35L11 + 32L21 + 30L12 + 35L22 + 27L13 + 19L23 + 13L33 ≥ 1 + 105Y2 45L11 + 35L21 + 30L12 + 30L22 + 15L13 + 25L23 + 30L33 ≥ 1 + 105Y3 55L11 + 65L21 + 25L12 + 45L22 + 25L13 + 20L23 + 35L33 ≥ 1 + 105Y4 40L11 + 40L21 + 35L12 + 40L22 + 20L13 + 25L23 + 30L33 ≥ 1 + 105Y5 30L11 + 47L21 + 40L12 + 35L22 + 15L13 + 20L23 + 15L33 ≥ 1 + 90Y6 35L11 + 45L21 + 30L12 + 30L22 + 20L13 + 20L23 + 20L33 ≥ 1 + 90Y7
25L11 + 25L21 + 25L12 + 30L22 + 25L13 + 30L23 + 30L33 ≥ 1 + 90Y8 L11 + L21 = 1 L12 + L22 = 1 L13 + L23 + L33 = 1
The optimal solution obtained using Excel Solver shows L21 = L22 = L33 = 1. This indicates that a smartphone with operating system B, 1GB of RAM, and a 7MP camera will maximize the share of choice. The optimal solution also has Y4 = Y5 = Y6 = Y7 = 1, which indicates that consumers 4, 5, 6, and 7 will prefer this new smartphone. Difficulty: Moderate LO: 9.4, Pages 424-426 Bloom’s: Application BUSPROG: Analytic DISC:
19. Andrew is ready to invest $200,000 in stocks and he has been provided nine different alternatives by his financial consultant. The following stocks belong to three different industrial sectors and each sector has three varieties of stocks, each with a different expected rate of return. The average rate of return taken for the past ten years is provided with each of the nine stocks.
Stock   Industry      Annual Return
1       Airlines      18.24%
2       Airlines      28.75%
3       Airlines      11.08%
4       Banking       20.12%
5       Banking       14.00%
6       Banking       26.17%
7       Agriculture   23.67%
8       Agriculture   18.25%
9       Agriculture   16.50%
The decision will be based on the constraints provided below:
o Exactly 5 alternatives should be chosen.
o Any stock chosen can have a maximum investment of $55,000.
o Any stock chosen must have a minimum investment of at least $25,000.
o For the Airlines sector, the maximum number of stocks chosen should be two.
o The total amount invested in Banking must be at least as much as the amount invested in Agriculture.
Now, formulate a model that will decide Andrew’s investment strategy to maximize his expected annual return.
Answer: Let X1, X2, X3, X4, X5, X6, X7, X8, and X9 be the amount (in dollars) invested in Stocks 1, 2, 3, …, 9, respectively. Let Yi = 1 if Andrew invests in Stock i; 0, otherwise; where i = 1, 2, …, 9.
Max 0.1824X1 + 0.2875X2 + 0.1108X3 + 0.2012X4 + 0.1400X5 + 0.2617X6 + 0.2367X7 + 0.1825X8 + 0.1650X9
s.t.
Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7 + Y8 + Y9 = 5
Xi ≤ 55,000Yi; where i = 1, 2, …, 9
Xi ≥ 25,000Yi; where i = 1, 2, …, 9
Y1 + Y2 + Y3 ≤ 2
X4 + X5 + X6 ≥ X7 + X8 + X9
X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 = 200,000
Xi ≥ 0; where i = 1, 2, …, 9
Difficulty: Easy
LO: 9.5, Pages 426-428 Bloom’s: Application BUSPROG: Analytic DISC: 20. Andrew is ready to invest $200,000 in stocks, and his financial consultant has provided him with nine alternatives. The stocks belong to three different industrial sectors, and each sector has three stocks, each with a different expected rate of return. The average rate of return over the past ten years is provided for each of the nine stocks.
Stock   Industry      Annual Return
1       Airlines      18.24%
2       Airlines      28.75%
3       Airlines      11.08%
4       Banking       20.12%
5       Banking       14.00%
6       Banking       26.17%
7       Agriculture   23.67%
8       Agriculture   18.25%
9       Agriculture   16.50%
The decision will be based on the constraints provided below:
- Exactly 5 alternatives should be chosen.
- Any stock chosen can have a maximum investment of $55,000.
- Any stock chosen must have a minimum investment of at least $25,000.
- For the Airlines sector, the maximum number of stocks that can be chosen is two.
- The total amount invested in Banking must be at least as much as the amount invested in Agriculture.
Formulate and solve a model that will decide Andrew’s investment strategy to maximize his expected annual return.
Answer: Let X1, X2, X3, X4, X5, X6, X7, X8, and X9 be the amount (in dollars) invested in Stocks 1, 2, 3, …, 9, respectively. Let Yi = 1 if Andrew invests in Stock i; 0, otherwise; where i = 1, 2, …, 9.
Max 0.1824X1 + 0.2875X2 + 0.1108X3 + 0.2012X4 + 0.1400X5 + 0.2617X6 + 0.2367X7 + 0.1825X8 + 0.1650X9
s.t.
Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7 + Y8 + Y9 = 5
Xi ≤ 55,000Yi; where i = 1, 2, …, 9
Xi ≥ 25,000Yi; where i = 1, 2, …, 9
Y1 + Y2 + Y3 ≤ 2
X4 + X5 + X6 ≥ X7 + X8 + X9
X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 = 200,000
Xi ≥ 0; where i = 1, 2, …, 9
Difficulty: Moderate LO: 9.3, 9.4, 9.5, Pages 410-428 Bloom’s: Application BUSPROG: Analytic DISC:
Chapter 10: Nonlinear Optimization Models 1. In a nonlinear optimization problem: a. the objective function is a nonlinear function of the constraints. b. all the constraints are nonlinear only when the objective is to maximize the function of the decision variables. c. at least one term in the objective function or a constraint is nonlinear. d. both the objective function and the constraints must have all nonlinear terms. Answer: C Difficulty: Moderate LO: 10.1, Page 449 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A nonlinear optimization problem is any optimization problem in which at least one term in the objective function or a constraint is nonlinear. 2. A nonlinear function with term to the power of two is known as a _____. a. hyperbolic function b. quadratic function c. logarithmic function d. cubic function Answer: B Difficulty: Easy LO: 10.1, Page 450 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A nonlinear function with term to the power of two is known as a quadratic function. 3. A _____ is the shadow price of a binding simple lower or upper bound on the decision variable. a. reduced gradient b. binding constraint c. binary variable d. local optimum Answer: A Difficulty: Easy LO: 10.1, Page 454 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A reduced gradient is the shadow price of a binding simple lower or upper bound on the decision variable. 4. The reduced gradient is analogous to the _____ for linear models. a. binary variable
b. binding constraint c. reduced cost d. objective coefficient Answer: C Difficulty: Moderate LO: 10.1, Page 454 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The reduced gradient for nonlinear models is analogous to the reduced cost for linear models. 5. The Lagrangian multiplier is the _____ for a constraint in a nonlinear problem. a. shadow price b. payoff value c. reducing gradient d. reduced cost Answer: A Difficulty: Easy LO: 10.1, Page 454 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Lagrangian multiplier is the shadow price for a constraint in a nonlinear problem. 6. In a nonlinear problem, the rate of change of the objective function with respect to the right-hand side of a constraint is given by the ____. a. slope of the contour line b. local optimum c. reducing gradient d. Lagrangian multiplier Answer: D Difficulty: Moderate LO: 10.1, Page 454 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In a nonlinear problem, the rate of change of the objective function with respect to the right-hand side of a constraint is given by the Lagrangian multiplier. 7. The _____ of a solution is a mathematical concept that refers to the set of points within a relatively close proximity of the solution. a. objective function contour b. neighborhood c. regression equation d. Lagrangian multiplier
Answer: B Difficulty: Easy LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The neighborhood of a solution is a mathematical concept that refers to the set of points within a relatively close proximity of the solution. 8. A feasible solution is a(n) _____ if there are no other feasible solutions with a better objective function value in the immediate neighborhood. a. efficient frontier b. local optimum c. global maximum d. diverging function Answer: B Difficulty: Easy LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution is a local optimum if there are no other feasible solutions with a better objective function value in the immediate neighborhood. 9. If there are no other feasible solutions with a larger objective function value in the immediate neighborhood, then the feasible solution is known as _____. a. a global maximum b. infeasible c. a nonlinear solution d. a local maximum Answer: D Difficulty: Easy LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution is a local maximum if there are no other feasible solutions with a larger objective function value in the immediate neighborhood. 10. A feasible solution is a local minimum if there are no other feasible solutions with a: a. smaller objective function value in the immediate neighborhood. b. same objective function value in the immediate neighborhood. c. set of points defining the minimum possible risk in the entire feasible region. d. same objective function value in the entire feasible region. Answer: A Difficulty: Easy
LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution is a local minimum if there are no other feasible solutions with a smaller objective function value in the immediate neighborhood. 11. A feasible solution is _____ if there are no other feasible points with a better objective function value in the entire feasible region. a. infeasible b. unbounded c. nonlinear d. a global optimum Answer: D Difficulty: Easy LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution is a global optimum if there are no other feasible points with a better objective function value in the entire feasible region. 12. If there are no other feasible points with a larger objective function value in the entire feasible region, a feasible solution is _____. a. an efficient frontier b. a global maximum c. not a local maximum d. a global minimum Answer: B Difficulty: Easy LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution is a global maximum if there are no other feasible points with a larger objective function value in the entire feasible region. 13. A feasible solution is _____ if there are no other feasible points with a smaller objective function value in the entire feasible region. a. a global minimum b. not a local maximum c. not a local minimum d. bowl-shaped Answer: A Difficulty: Easy LO: 10.2, Page 455
Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A feasible solution is a global minimum if there are no other feasible points with a smaller objective function value in the entire feasible region. 14. A global minimum a. is also a local maximum. b. need not be a local maximum, but vice versa is true. c. is also a local minimum. d. need not be local minimum, but vice versa is true. Answer: C Difficulty: Moderate LO: 10.2, Page 455 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A global minimum is also a local minimum. 15. A function that is bowl-shaped down is called a _____ function. a. concave b. convex c. conic d. linear Answer: A Difficulty: Easy LO: 10.2, Page 455 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A function that is bowl-shaped down is called a concave function. Reference - 10.1: Use the graph given below to answer questions 16-17.
16. Reference - 10.1. Which of the following functions is most likely to yield the above shape? a. f(X, Y) = X^2 + Y^2 b. f(X, Y) = –X – Y c. f(X, Y) = –X^2 – Y^2
d. f(X, Y) = Xsin(5πX) + Ysin(5πY) Answer: C Difficulty: Moderate LO: 10.2, Page 456 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The given figure represents a concave function f(X, Y) = –X^2 – Y^2. 17. Reference - 10.1. The point (0, 0) is a(n) _____ for the given concave function. a. local maximum b. local minimum c. convergence point d. endpoint Answer: A Difficulty: Moderate LO: 10.2, Page 456 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The maximum value for the given function is 0, and the point (0, 0) gives the optimal value of 0. The point (0, 0) is a local maximum; but it is also a global maximum because no point gives a larger function value. 18. If all the squared terms in a quadratic function have a negative coefficient and there are no cross-product terms, then the function is a _____ function. a. convex quadratic b. nonlinear objective c. concave quadratic d. negative elliptical Answer: C Difficulty: Easy LO: 10.2, Page 456 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: If all the squared terms in a quadratic function have a negative coefficient and there are no cross-product terms, then the function is a concave quadratic function. 19. A function that is bowl-shaped up is called a(n) _____ function. a. concave b. optimal c. convex d. elliptical Answer: C
Difficulty: Easy LO: 10.2, Page 456 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A function that is bowl-shaped up is called a convex function. Reference - 10.2: Use the graph given below to answer questions 20-21.
20. Reference - 10.2. Which of the following functions is most likely to yield the above shape? a. f(X, Y) = X^2 + Y^2 b. f(X, Y) = Xsin(2πY) + Ysin(2πX) c. f(X, Y) = –X^2 – Y^2 d. f(X, Y) = Xsin(5πX) + Ysin(5πY) Answer: A Difficulty: Moderate LO: 10.2, Page 456 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The given figure represents a convex function f(X, Y) = X^2 + Y^2. 21. Reference - 10.2. What is the minimum value for this function? a. –∞ b. 0 c. –1 d. 1 Answer: B Difficulty: Moderate LO: 10.2, Page 456 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The minimum value for the convex function is 0. Reference - 10.3: Use the graph given below to answer questions 22-24.
22. Reference - 10.3. Which of the following equations is most likely to yield the above curve? a. f(X, Y) = Xlog(2πY) + Ylog(2πX) b. f(X, Y) = X – Y c. f(X, Y) = –X^2 – Y^2 d. f(X, Y) = Xsin(5πX) + Ysin(5πY) Answer: D Difficulty: Moderate LO: 10.2, Page 456 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The function shown in the graph is f(X, Y) = Xsin(5πX) + Ysin(5πY). 23. Reference - 10.3. The feasible region for the function represented in the graph is: a. –1 ≤ X ≤ 1, –1 ≤ Y ≤ 1. b. –1.5 ≤ X ≤ 1, 0 ≤ Y ≤ ∞. c. –1.5 ≤ X ≤ 2.0, –1.5 ≤ Y ≤ 2.0. d. 0 ≤ X ≤ 1, 0 ≤ Y ≤ 1. Answer: D Difficulty: Moderate LO: 10.2, Page 456 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The feasible region for the given function is 0 ≤ X ≤ 1, 0 ≤ Y ≤ 1. 24. Reference - 10.3. Which of the following is true of the above function? a. It has a single local minimum. b. It has multiple local optima. c. It has a single local maximum. d. It has no maxima and minima.
Answer: B Difficulty: Moderate LO: 10.2, Page 457 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The given function has multiple local optima. 25. The _____ option is helpful when the solution to a problem appears to depend on the starting values for the decision variables. a. Restart b. Convergence c. Derivatives d. Multistart Answer: D Difficulty: Easy LO: 10.2, Page 459 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Multistart option is helpful when the solution to a problem appears to depend on the starting values for the decision variables. 26. Solving nonlinear problems with local optimal solutions is performed using _____, in Excel Solver, which is based on more classical optimization techniques. a. Goal Seeker b. Linear Regression c. GRG Nonlinear d. Simplex LP Answer: C Difficulty: Easy LO: 10.2, Page 459 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Solving nonlinear problems with local optimal solutions is performed using GRG Nonlinear, in Excel Solver, which is based on more classical optimization techniques. Evolutionary Solver may be useful for more complex nonlinear models that involve Excel functions such as VLOOKUP and IF. 27. Excel Solver’s _____ is based on a method that searches for an optimal solution by iteratively adjusting a population of candidate solutions. a. Evolutionary Solver b. Goal Seeker c. Simplex LP d. GRG Nonlinear
Answer: A Difficulty: Easy LO: 10.2, Page 459 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Excel Solver’s Evolutionary Solver is based on a method that searches for an optimal solution by iteratively adjusting a population of candidate solutions. 28. A portfolio optimization model used to construct a portfolio that minimizes risk subject to a constraint requiring a minimum level of return is known as the _____. a. capital budgeting pricing model b. market share optimization model c. Hauck maximum variance portfolio model d. Markowitz mean-variance portfolio model Answer: D Difficulty: Easy LO: 10.4, Page 461 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A portfolio optimization model used to construct a portfolio that minimizes risk subject to a constraint requiring a minimum level of return is known as the Markowitz mean-variance portfolio model. 29. The measure of risk most often associated with the Markowitz portfolio model is the a. expected return of the portfolio. b. annual interest on the portfolio. c. variance of the portfolio’s return. d. number of investments listed in the portfolio. Answer: C Difficulty: Moderate LO: 10.4, Page 462 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The measure of risk most often associated with the Markowitz portfolio model is the variance of the portfolio’s return. 30. The portfolio variance is the: a. sum of the squares of the deviations from the mean value under each scenario. b. average of the sum of the squares of the deviations from the mean value under each investment scenario. c. average of the product of the squares of the deviations from the mean value under each scenario. d. average of the sum of the deviations from the mean value under each investment scenario.
Answer: B Difficulty: Easy LO: 10.4, Page 463 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The portfolio variance is the average of the sum of the squares of the deviations from the mean value under each investment scenario. 31. If the portfolio variance were equal to zero, the amount of risk would be _____. a. unity b. a positive number greater than 1 c. negative always d. zero Answer: D Difficulty: Moderate LO: 10.4, Page 463 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: If the portfolio variance were equal to zero, then every scenario return would be equal, and there would be no risk. 32. One of the ways to formulate the Markowitz model is to: a. maximize the variance of the portfolio subject to a constraint on the expected return of the portfolio. b. minimize the expected return of the portfolio subject to a constraint on variance. c. minimize the variance of the portfolio subject to a constraint on the expected return of the portfolio. d. minimize the expected return of the portfolio with no constraint on variance. Answer: C Difficulty: Moderate LO: 10.4, Page 463 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: One of the ways to formulate the Markowitz model is to minimize the variance of the portfolio subject to a constraint on the expected return of the portfolio. 33. Which of the following is a second way of formulating the Markowitz model? a. Maximizing the expected return of the portfolio subject to a constraint on variance b. Minimizing the expected return of the portfolio subject to a constraint on variance. c. Maximizing the variance of the portfolio subject to a constraint on the expected return of the portfolio d. Maximizing the variance of the portfolio with no constraint needed for the expected return of the portfolio
Answer: A Difficulty: Moderate LO: 10.4, Page 463 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Two basic ways to formulate the Markowitz model are (1) to minimize the variance of the portfolio subject to a constraint on the expected return of the portfolio and (2) to maximize the expected return of the portfolio subject to a constraint on variance. 34. A(n) _____ is a set of points defining the minimum possible risk for a set of return values. a. Contour b. Efficient frontier c. Unity constraint d. Reduced gradient Answer: B Difficulty: Easy LO: 10.4, Page 465 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An efficient frontier is a set of points defining the minimum possible risk for a set of return values. 35. The _____forecasting model uses nonlinear optimization to forecast the adoption of innovative and new technologies in the marketplace. a. Hauck b. LMS c. Markowitz d. Bass Answer: D Difficulty: Easy LO: 10.5, Page 465 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Bass forecasting model uses nonlinear optimization to forecast the adoption of innovative and new technologies in the marketplace. 36. In the Bass forecasting model, parameter m: a. measures the likelihood of adoption due to a potential adopter being influenced by someone who has already adopted the product. b. measures the likelihood of adoption, assuming no influence from someone who has already adopted the product. c. refers to the number of people estimated to eventually adopt the new product. d. refers to the number of people who have already adopted the new product.
Answer: C Difficulty: Easy LO: 10.5, Page 465 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In the Bass forecasting model, parameter m refers to the number of people estimated to eventually adopt the new product. 37. In the Bass forecasting model, the _____ measures the likelihood of adoption due to a potential adopter being influenced by someone who has already adopted the product. a. coefficient of innovation b. coefficient of imitation c. coefficient of regression d. coefficient of the objective function Answer: B Difficulty: Moderate LO: 10.5, Page 465 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In the Bass forecasting model, the coefficient of imitation measures the likelihood of adoption due to a potential adopter being influenced by someone who has already adopted the product. 38. In the Bass forecasting model, the _____ measures the likelihood of adoption, assuming no influence from someone who has already purchased (adopted) the product. a. coefficient of correlation b. coefficient of imitation c. coefficient of independence d. coefficient of innovation Answer: D Difficulty: Moderate LO: 10.5, Page 465 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In the Bass forecasting model, the coefficient of innovation measures the likelihood of adoption, assuming no influence from someone who has already purchased (adopted) the product. 39. Which of the following conclusions can be drawn from the below figure using the Bass forecasting model? (Note: The Bass forecasting model is given by F_t = (p + q[C_{t–1}/m])(m – C_{t–1}), where m = the number of people estimated to eventually adopt the new product, C_{t–1} = the number of people who have adopted the product through time t – 1,
q = the coefficient of imitation, and p = the coefficient of innovation.)
a. q < p b. q > p c. m < q d. p > m
Answer: A Difficulty: Moderate LO: 10.5, Page 466 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In this graph, the innovation factor appears to dominate the imitation factor, and hence, q < p. 40. One of the ways to use the Bass forecasting model is to wait until several periods of data for the problem under consideration are available. This is known as the _____ approach. a. branch-and-bound b. cutting plane c. rolling-horizon d. sensible-period Answer: C Difficulty: Easy LO: 10.5, Page 469 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: One of the ways to use the Bass forecasting model is to wait until several periods of data for the problem under consideration are available. This is known as the rolling-horizon approach.
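The Bass formula quoted in question 39 can be traced period by period. The sketch below is a minimal illustration, assuming that nobody has adopted before period 1 (C_0 = 0); the function name bass_forecast and the parameter values are hypothetical and are not taken from the test bank.

```python
def bass_forecast(p, q, m, periods):
    """Return (per-period adoptions F_t, cumulative adoptions C_t) for t = 1..periods."""
    forecasts, cumulative = [], []
    adopted = 0.0  # C_0: assumed to be zero before the first period
    for _ in range(periods):
        # F_t = (p + q * C_{t-1} / m) * (m - C_{t-1})
        f_t = (p + q * adopted / m) * (m - adopted)
        adopted += f_t
        forecasts.append(f_t)
        cumulative.append(adopted)
    return forecasts, cumulative


# Hypothetical parameters where imitation dominates innovation (q > p).
imitation_led, imitation_total = bass_forecast(p=0.03, q=0.38, m=1000.0, periods=15)
# Hypothetical parameters where innovation dominates imitation (q < p).
innovation_led, innovation_total = bass_forecast(p=0.30, q=0.10, m=1000.0, periods=15)

peak_when_q_larger = imitation_led.index(max(imitation_led)) + 1
peak_when_p_larger = innovation_led.index(max(innovation_led)) + 1
print(peak_when_q_larger, peak_when_p_larger)
```

With these illustrative values, adoptions peak in the first period when q < p and later when q > p, which is the pattern question 39 asks the reader to recognize in the figure.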
Problems 1. Jeff is willing to invest $5,000 in buying shares and bonds of a company to gain maximum returns. From his past experience, he estimates the relationship between returns and investments made in this company to be:
R = −2S^2 − 9B^2 − 4SB + 20S + 30B
where
R = total returns in thousands of dollars
S = thousands of dollars spent on shares
B = thousands of dollars spent on bonds
Jeff would like to develop a strategy that will lead to maximum return subject to the restriction on the amount available for investment.
a. What is the value of return if $3,000 is invested in shares and $2,000 is invested in bonds of the company?
b. Formulate an optimization problem that can be solved to maximize the returns subject to investing no more than $5,000 in both shares and bonds.
c. Determine the optimal amount to invest in shares and bonds of the company. How much return will Jeff gain? Round all your answers to two decimal places.
Answer: a. With $3,000 invested in shares and $2,000 invested in bonds, we can simply substitute these values into the returns function (remembering that the variables are defined in thousands of dollars).
R = −2S^2 − 9B^2 − 4SB + 20S + 30B = −2(3^2) − 9(2^2) − 4(3)(2) + 20(3) + 30(2) = 42
A return of $42,000 will be realized with this allocation of the investment.
b. and c. We simply add an investment constraint and a non-negativity constraint to the return function that is to be maximized.
Max −2S^2 − 9B^2 − 4SB + 20S + 30B
s.t.
S + B ≤ 5
S, B ≥ 0
The solution is S = $4,290 and B = $710 with return of $53,570. The spreadsheet model is:
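The Solver result can be cross-checked by hand. The return function is a concave quadratic, so setting both partial derivatives to zero gives the linear system 4S + 4B = 20 and 4S + 18B = 30; its solution happens to satisfy S + B ≤ 5, so it is also the constrained optimum. The sketch below is an illustrative check, not the spreadsheet model, and the function names are hypothetical.

```python
def total_return(s, b):
    """Return R in thousands of dollars for s (shares) and b (bonds), both in thousands."""
    return -2 * s**2 - 9 * b**2 - 4 * s * b + 20 * s + 30 * b


def stationary_point():
    """Solve dR/dS = 0 and dR/dB = 0, i.e. 4S + 4B = 20 and 4S + 18B = 30."""
    # Subtracting the first equation from the second eliminates S: 14B = 10.
    b = 10.0 / 14.0
    s = (20.0 - 4.0 * b) / 4.0
    return s, b


part_a = total_return(3, 2)
s_opt, b_opt = stationary_point()
r_opt = total_return(s_opt, b_opt)
print(part_a, round(s_opt, 2), round(b_opt, 2), round(r_opt, 2))
```

Because the unconstrained maximizer lies exactly on the budget line, the budget constraint does not reduce the achievable return in this problem.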
Difficulty: Easy LO: 10.1; Pages 449-459 Bloom’s: Application BUSPROG: Analytic Skills DISC: 2. Consider the objective function
Y = A × L^α × K^β
where
Y = total output
A = total-factor productivity
L = labor input
K = capital input
α = share of contribution of the labor input L
β = share of contribution of the capital input K
a. Assume α = 0.33, β = 0.67, A = 10 and each unit of labor costs $45 and each unit of capital costs $55. With $50,000 available in the budget, develop an optimization model to determine the number of units of capital and labor required in order to maximize output.
b. Find the optimal solution to the model you formulated in part a. Round all your answers to two decimal places. (Hint: When using Excel Solver, use the Multistart option with bounds 0 ≤ L ≤ 700 and 0 ≤ K ≤ 1000.)
Answer: a. The optimization model is
Max 10 × L^0.33 × K^0.67
s.t.
45L + 55K ≤ 50000
L, K ≥ 0
b. The optimal solution is L = 366.67, K = 609.09 with the total output of $5151.65. The spreadsheet model is:
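For a production function of this form with a single budget constraint, the optimum has a well-known closed form: the budget is split between labor and capital in proportion to the exponents. The sketch below uses that closed form as a cross-check on the Solver answer; it is not the Multistart search suggested in the hint, and the function name cobb_douglas_optimum is hypothetical.

```python
def cobb_douglas_optimum(a, alpha, beta, labor_cost, capital_cost, budget):
    """Maximize a * L**alpha * K**beta subject to labor_cost*L + capital_cost*K <= budget."""
    total_share = alpha + beta
    # Spend the fraction alpha/(alpha+beta) of the budget on labor and the rest on capital.
    labor = (alpha / total_share) * budget / labor_cost
    capital = (beta / total_share) * budget / capital_cost
    output = a * labor**alpha * capital**beta
    return labor, capital, output


labor, capital, output = cobb_douglas_optimum(
    a=10, alpha=0.33, beta=0.67, labor_cost=45, capital_cost=55, budget=50000
)
print(round(labor, 2), round(capital, 2), round(output, 2))
```

The closed form assumes positive exponents and prices, so the whole budget is spent at the optimum.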
Difficulty: Moderate LO: 10.1; Pages 449-459 Bloom’s: Application BUSPROG: Analytic Skills DISC: 3. The profit function for two types of iPod is:
Profit = −5x1^2 + 38x1 − 5x2^2 + 44x2 + 520
where x1 and x2 represent the number of units of production of basic and advanced iPods, respectively. Production time required for the basic iPod is 6 hours per unit, and production time required for the advanced iPod is 8 hours per unit. Currently, 50 hours are available. The cost of hours is already factored into the profit function.
a. Formulate an optimization problem that can be used to find the optimal production quantity of basic and advanced iPods.
b. Solve the optimization model you formulated in part a. How much should be produced?
Answer: a. The optimization model is
Max −5x1^2 + 38x1 − 5x2^2 + 44x2 + 520
s.t.
6x1 + 8x2 ≤ 50
x1, x2 ≥ 0
b. The optimal solution is x1 = 3.32 and x2 = 3.76 with a profit of $685.80. The spreadsheet model is:
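This problem also illustrates the Lagrangian multiplier from the multiple-choice section. The unconstrained maximizer (x1 = 3.8, x2 = 4.4) needs 58 hours, so the 50-hour constraint binds and the multiplier is positive. The sketch below solves the first-order conditions directly; it is an illustrative check rather than the Solver model, it does not enforce non-negativity (not needed for these data), and the function name plan_production is hypothetical.

```python
def plan_production(hours_available):
    """Maximize sum(-a_i*x_i**2 + b_i*x_i) + constant subject to sum(h_i*x_i) <= hours_available."""
    a = [5.0, 5.0]      # magnitudes of the squared-term coefficients
    b = [38.0, 44.0]    # linear-term coefficients
    hours = [6.0, 8.0]  # production hours per unit
    constant = 520.0

    # Start from the unconstrained maximizer x_i = b_i / (2 * a_i).
    x = [bi / (2 * ai) for ai, bi in zip(a, b)]
    multiplier = 0.0
    if sum(h * xi for h, xi in zip(hours, x)) > hours_available:
        # The constraint binds: solve b_i - 2*a_i*x_i = multiplier * h_i
        # together with sum(h_i * x_i) = hours_available.
        numerator = sum(h * bi / (2 * ai) for ai, bi, h in zip(a, b, hours)) - hours_available
        denominator = sum(h * h / (2 * ai) for ai, h in zip(a, hours))
        multiplier = numerator / denominator
        x = [(bi - multiplier * h) / (2 * ai) for ai, bi, h in zip(a, b, hours)]
    profit = sum(-ai * xi**2 + bi * xi for ai, bi, xi in zip(a, b, x)) + constant
    return x, multiplier, profit


x, multiplier, profit = plan_production(50.0)
_, _, profit_with_extra_hour = plan_production(51.0)
print([round(v, 2) for v in x], round(multiplier, 2), round(profit, 2))
```

The multiplier is the rate at which profit improves per additional hour at the current right-hand side, so one full extra hour adds slightly less than the multiplier because the rate falls as hours increase.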
Difficulty: Moderate LO: 10.1; Pages 451-454 Bloom’s: Application BUSPROG: Analytic Skills DISC: 4. Roger is willing to promote and sell two types of smart watches, X and Y, at his outlet. The demands for these two watches are as follows:
DX = -0.45PX + 0.34PY + 242
DY = 0.2PX - 0.58PY + 282
where DX is the demand for watch X, PX is the selling price of watch X, DY is the demand for watch Y, and PY is the selling price of watch Y. Roger wishes to determine the selling prices that maximize revenue for these two products. Develop the revenue function for these two models, and find the revenue-maximizing prices.
Answer: The revenue function is:
PXDX + PYDY = PX(-0.45PX + 0.34PY + 242) + PY(0.2PX - 0.58PY + 282)
This is an example of an unconstrained optimization problem because no constraints are required here. The optimal solution is: PX = $575.49 and PY = $511.00 with optimal revenue of $141,686.18. The spreadsheet model is:
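Because the revenue function is a concave quadratic in the two prices, the optimum can be checked by setting both partial derivatives to zero, which gives 0.9PX − 0.54PY = 242 and −0.54PX + 1.16PY = 282. The sketch below solves this 2 × 2 system with Cramer's rule as a cross-check on the Solver answer; the function names are hypothetical.

```python
def revenue(px, py):
    """Total revenue PX*DX + PY*DY for the two demand functions."""
    dx = -0.45 * px + 0.34 * py + 242
    dy = 0.2 * px - 0.58 * py + 282
    return px * dx + py * dy


def revenue_maximizing_prices():
    """Solve 0.9*PX - 0.54*PY = 242 and -0.54*PX + 1.16*PY = 282 by Cramer's rule."""
    a11, a12, r1 = 0.9, -0.54, 242.0
    a21, a22, r2 = -0.54, 1.16, 282.0
    det = a11 * a22 - a12 * a21
    px = (r1 * a22 - a12 * r2) / det
    py = (a11 * r2 - a21 * r1) / det
    return px, py


px, py = revenue_maximizing_prices()
best_revenue = revenue(px, py)
print(round(px, 2), round(py, 2), round(best_revenue, 2))
```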
Difficulty: Moderate LO: 10.1; Pages 450-451 Bloom’s: Application BUSPROG: Analytic Skills DISC: 5. A steel manufacturing company has two production facilities that manufacture dishwashers. Production costs at the two facilities differ because of varying labor costs, local property taxes, type of material used, volume, and so on. For Plant A, the weekly cost of producing dishwashers is expressed as the function
TCA(X) = X^2 - 2X + 12000
where X is the weekly production volume and TCA(X) is the weekly cost for Plant A. Plant B’s weekly production costs are given by
TCB(Y) = Y^2 + 8Y + 10000
where Y is the weekly production volume and TCB(Y) is the weekly cost for Plant B. The manufacturer would like to produce 50 dishwashers per week at the lowest possible cost.
a. Formulate a mathematical model that can be used to determine the optimal number of dishwashers to produce each week at each facility.
b. Solve the optimization model to determine the optimal number of dishwashers to produce at each facility.
Answer: a. If X is the weekly production volume at Plant A and Y is the weekly production volume at Plant B, then the optimization model is
Min X^2 - 2X + 12000 + Y^2 + 8Y + 10000
s.t.
X + Y = 50
X, Y ≥ 0
b. The optimal solution is X = 27.5 and Y = 22.5 for an optimal cost of $23,388. These are all in thousands. The spreadsheet model is:
Difficulty: Moderate LO: 10.1; Pages 450-451 Bloom’s: Application BUSPROG: Analytic Skills DISC: 6. An electrical company has two manufacturing plants. The cost in dollars of producing amplifiers at each of the two plants is given below. The cost of producing Q1 amplifiers at the first plant is
65Q1 + 4Q1^2 + 90
and the cost of producing Q2 amplifiers at the second plant is
20Q2 + 2Q2^2 + 120
The company needs to manufacture at least 60 amplifiers to meet the received orders. How many amplifiers should be produced at each plant to minimize the total production cost? Round the answers to two decimal places and the total cost to the nearest dollar value.
Answer: If Q1 and Q2 are the number of amplifiers manufactured at the first and second plant, respectively, then the optimization model is
Min (65Q1 + 4Q1^2 + 90) + (20Q2 + 2Q2^2 + 120)
s.t.
Q1 + Q2 ≥ 60
Q1, Q2 ≥ 0
The optimal solution to this model is to produce 16.25 amplifiers at plant 1 for a production cost of $2,202.50 and 43.75 amplifiers at plant 2 for a production cost of $4,823.13. The total cost is $7,026. The spreadsheet model follows.
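Problems 5 and 6 share the same structure: two plants with quadratic costs and a fixed total output. At the optimum the marginal costs of the two plants are equal, which gives a closed-form split. The sketch below applies that condition as a cross-check; it assumes the demand constraint holds with equality (reasonable in problem 6 because both marginal costs are positive, so producing more than 60 units only adds cost), and the function name split_production is hypothetical.

```python
def split_production(plant_1, plant_2, demand):
    """Split `demand` between two plants with costs a*q**2 + b*q + c, given as (a, b, c) tuples."""
    a1, b1, c1 = plant_1
    a2, b2, c2 = plant_2
    # Equal marginal cost: 2*a1*q1 + b1 = 2*a2*(demand - q1) + b2.
    q1 = (2 * a2 * demand + b2 - b1) / (2 * (a1 + a2))
    q1 = min(max(q1, 0.0), demand)  # keep both quantities non-negative
    q2 = demand - q1
    cost_1 = a1 * q1**2 + b1 * q1 + c1
    cost_2 = a2 * q2**2 + b2 * q2 + c2
    return q1, q2, cost_1, cost_2


# Problem 6: amplifiers, at least 60 units (treated as exactly 60).
amp_q1, amp_q2, amp_cost_1, amp_cost_2 = split_production((4, 65, 90), (2, 20, 120), 60)
# Problem 5: dishwashers, exactly 50 units.
dish_x, dish_y, dish_cost_a, dish_cost_b = split_production((1, -2, 12000), (1, 8, 10000), 50)
print(amp_q1, amp_q2, round(amp_cost_1 + amp_cost_2), dish_x, dish_y, dish_cost_a + dish_cost_b)
```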
Difficulty: Moderate LO: 10.1; Pages 450-451 Bloom’s: Application BUSPROG: Analytic Skills DISC:
7. The exponential smoothing model is given by
ŷ_{t+1} = α y_t + (1 − α) ŷ_t
where
ŷ_{t+1} = forecast of sales for period t + 1
y_t = actual sales for period t
ŷ_t = forecast of sales for period t
α = smoothing constant, 0 ≤ α ≤ 1
This model is used to predict the future based on the past data values.
a. The observed values with the smoothing constant α = 0.45 are given in the table below. The third column of the table displays the forecast values obtained using the above model. The forecast error y_t − ŷ_t is calculated in the fourth column, and the square of the forecast error and the sum of squared forecast errors are given in the fifth column. Construct this table in your spreadsheet model using the formula above. (Hint: The first forecast value is the same as the observed value.)
Alpha = 0.45
Day   Observed Value   Forecast   Forecast Error   Squared Forecast Error
1     15               15.00       0.0000           0.0000
2     12               15.00      -3.0000           9.0000
3     14               13.65       0.3500           0.1225
4     18               13.81       4.1925          17.5771
5     15               15.69      -0.6941           0.4818
6     16               15.38       0.6182           0.3822
7     21               15.66       5.3400          28.5159
8     18               18.06      -0.0630           0.0040
9     12               18.03      -6.0346          36.4169
10    19               15.32       3.6809          13.5494
11    22               16.98       5.0245          25.2458
12    14               19.24      -5.2365          27.4211
13    21               16.88       4.1199          16.9737
14    20               18.73       1.2660           1.6026
15    19               19.30      -0.3037           0.0922
Sum of the squared forecast error: 177.39
b. The value of α is often chosen by minimizing the sum of squared forecast errors. Use Excel Solver to find the value of α that minimizes the sum of squared forecast errors. Answer:
a.
b.
Min Σ_{t=2}^{15} (y_t − ŷ_t)^2
s.t.
ŷ_1 = y_1
ŷ_{t+1} = ŷ_t + α(y_t − ŷ_t) for t = 1, 2, …, 14
y_1 = 15, y_2 = 12, y_3 = 14, y_4 = 18, y_5 = 15, y_6 = 16, y_7 = 21, y_8 = 18, y_9 = 12, y_10 = 19, y_11 = 22, y_12 = 14, y_13 = 21, y_14 = 20, y_15 = 19
0 ≤ α ≤ 1
The optimal solution is α = 0.2221. The spreadsheet model follows.
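The table in part a and the minimization in part b can be reproduced outside a spreadsheet. The sketch below rebuilds the forecasts from the recursion and then searches for α on a fine grid over [0, 1]; the grid search is a simple stand-in for Excel Solver's GRG Nonlinear method, not a reimplementation of it, and the function names are hypothetical.

```python
OBSERVED = [15, 12, 14, 18, 15, 16, 21, 18, 12, 19, 22, 14, 21, 20, 19]


def forecasts(alpha, observed=OBSERVED):
    """Return the forecasts for periods 1..n, with the first forecast set to the first observation."""
    result = [float(observed[0])]
    for y in observed[:-1]:
        # Forecast for the next period: alpha * y_t + (1 - alpha) * forecast_t.
        result.append(alpha * y + (1 - alpha) * result[-1])
    return result


def sum_squared_errors(alpha, observed=OBSERVED):
    """Sum of squared forecast errors over all periods (the period-1 error is zero)."""
    return sum((y - f) ** 2 for y, f in zip(observed, forecasts(alpha, observed)))


def best_alpha(steps=10000):
    """Pick the alpha in {0, 1/steps, ..., 1} with the smallest sum of squared errors."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=sum_squared_errors)


sse_at_045 = sum_squared_errors(0.45)
alpha_star = best_alpha()
print(round(sse_at_045, 2), alpha_star, round(sum_squared_errors(alpha_star), 2))
```

The value found by the grid can be compared with the Solver result reported above; a grid with 10,000 steps resolves α to four decimal places.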
Difficulty: Challenging LO: 10.1; Pages 450-455 Bloom’s: Application BUSPROG: Analytic Skills DISC: 8. Consider the EOQ model for multiple products that are independent except for a budget restriction. The following model describes this situation: Let
D_k = annual demand for product k
C_k = unit cost of product k
S_k = cost per order placed for product k
i = inventory carrying charge as a percentage of the cost per unit
B = the maximum amount of investment in goods
N = number of products
The decision variables are Q_k, the amount of product k to order. The model is:
Minimize Σ_{k=1}^{N} [C_k Q_k + S_k D_k / Q_k + i C_k Q_k / 2]
s.t.
Σ_{k=1}^{N} C_k Q_k ≤ B
Q_k ≥ 0 for k = 1, 2, …, N
a. Set up a spreadsheet model for the following data:
                Product 1   Product 2   Product 3
Annual Demand   1250        1550        1450
Product Cost    $120        $90         $105
Order Cost      $110        $175        $140
B = $30,000
i = 0.3
b. Solve the problem using Excel Solver. (Hint: For Solver to find a solution, you need to start with decision variable values that are greater than 0.) Answer: a.
b. The optimal solution is 𝑄1 = 31.565, 𝑄2 = 51.194, and 𝑄3 = 41.002 with a total minimum cost of $29,211.
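The reported order quantities can be cross-checked without Solver. For the objective as stated in the problem, setting the derivative with respect to each Q_k to zero gives Q_k = sqrt(S_k D_k / (C_k (1 + i/2))), which is valid only if the budget constraint turns out not to be binding. The sketch below assumes exactly that and then verifies the assumption; the function names are illustrative.

```python
import math

# Data from the problem statement.
demand = [1250, 1550, 1450]     # D_k
unit_cost = [120, 90, 105]      # C_k
order_cost = [110, 175, 140]    # S_k
carry = 0.3                     # i
budget = 30_000                 # B


def product_cost(q, d, c, s, i=carry):
    """One product's term of the objective: C*Q + S*D/Q + i*C*Q/2."""
    return c * q + s * d / q + i * c * q / 2


def unconstrained_q(d, c, s, i=carry):
    """Stationary point of product_cost: c + i*c/2 - s*d/q**2 = 0."""
    return math.sqrt(s * d / (c * (1 + i / 2)))


q = [unconstrained_q(d, c, s) for d, c, s in zip(demand, unit_cost, order_cost)]
spend = sum(c * qk for c, qk in zip(unit_cost, q))
total = sum(product_cost(qk, d, c, s)
            for qk, d, c, s in zip(q, demand, unit_cost, order_cost))

print("Q =", [round(qk, 3) for qk in q])
print("investment =", round(spend, 2), "within budget:", spend <= budget)
print("total cost =", round(total, 2))
```

Because the required investment stays below B, the budget constraint is slack and the unconstrained quantities are also optimal for the constrained model.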
Difficulty: Challenging LO: 10.1; Pages 450-455 Bloom’s: Application BUSPROG: Analytic Skills DISC:
9. Gatson Manufacturing Company wants to promote two types of tires, the Economy tire and the Premium tire. The two tires are independent of each other in terms of demand, cost, price, and so on. The company’s analytics team has estimated the following profit functions:
Monthly profit for Economy tire = 49.2415 LN(XA) + 180.414
Monthly profit for Premium tire = 84.344 LN(XB) - 150.112
where XA and XB are the advertising amounts allocated to the Economy tire and the Premium tire, respectively, and LN is the natural logarithm function. The advertising budget is $200,000, and management has dictated that at least $20,000 must be allocated to each of the two tires. (Hint: To compute a natural logarithm for the value X in Excel, use the formula =LN(X). For Solver to find an answer, you also need to start with decision variable values greater than 0 in this problem.) Develop and solve an optimization model that will prescribe how the company should allocate its advertising budget to maximize profit.
Answer:
Max 49.2415 LN(XA) + 180.414 + 84.344 LN(XB) - 150.112
s.t.
XA + XB ≤ 200,000
XA ≥ 20,000
XB ≥ 20,000
The optimal solution is XA = $73,722.82 and XB = $126,277.18 with a profit of $1,572.93. The spreadsheet model follows:
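The allocation can also be checked analytically. Both profit functions increase with advertising, so the full budget is spent, and equating the marginal profits 49.2415/XA and 84.344/XB splits the budget in proportion to the two coefficients. The sketch below assumes the $20,000 minimums are not binding and checks that afterwards; the constant and function names are illustrative.

```python
import math

A_COEF, A_CONST = 49.2415, 180.414    # Economy tire profit function
B_COEF, B_CONST = 84.344, -150.112    # Premium tire profit function
BUDGET, MINIMUM = 200_000, 20_000


def profit(xa, xb):
    """Total monthly profit for advertising amounts xa and xb."""
    return A_COEF * math.log(xa) + A_CONST + B_COEF * math.log(xb) + B_CONST


# With the budget binding, A_COEF/xa = B_COEF/xb at the optimum,
# so each tire receives a share proportional to its coefficient.
xa = BUDGET * A_COEF / (A_COEF + B_COEF)
xb = BUDGET - xa
best_profit = profit(xa, xb)

print(f"XA = {xa:,.2f}, XB = {xb:,.2f}, profit = {best_profit:,.2f}")
print("minimums satisfied:", xa >= MINIMUM and xb >= MINIMUM)
```

Shifting money in either direction, for example `profit(xa + 1000, xb - 1000)`, gives a lower total, which is consistent with the reported optimum.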
Difficulty: Challenging LO: 10.1; Pages 450-455 Bloom’s: Application BUSPROG: Analytic Skills DISC:
10. Jim is trying to solve a problem where point A should be within a radius of 15 cm of each of the points B, C, D, E, and F. The decision variables are defined as below.
X = horizontal coordinate of point A
Y = vertical coordinate of point A
The coordinates of the points are given below:
Point   Horizontal Coordinate   Vertical Coordinate
B       9                       11
C       14                      18
D       18                      22
E       13                      16
F       17                      21
Formulate a model to find the optimal location of the point A.
Answer: Let
X = horizontal coordinate of point A
Y = vertical coordinate of point A
\[
\begin{aligned}
\min\ & \sqrt{(X-9)^2+(Y-11)^2} + \sqrt{(X-14)^2+(Y-18)^2} + \sqrt{(X-18)^2+(Y-22)^2} \\
& + \sqrt{(X-13)^2+(Y-16)^2} + \sqrt{(X-17)^2+(Y-21)^2} \\
\text{s.t.}\ & \sqrt{(X-9)^2+(Y-11)^2} \le 15 \\
& \sqrt{(X-14)^2+(Y-18)^2} \le 15 \\
& \sqrt{(X-18)^2+(Y-22)^2} \le 15 \\
& \sqrt{(X-13)^2+(Y-16)^2} \le 15 \\
& \sqrt{(X-17)^2+(Y-21)^2} \le 15
\end{aligned}
\]
The optimal solution is Horizontal Coordinate (X) = 14, Vertical Coordinate (Y) = 18, with an objective function value of 20.738. The spreadsheet model follows:
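The reported solution can be verified by evaluating the objective and the radius constraints at (14, 18). The following is a minimal sketch with illustrative function names; it checks the answer rather than reproducing the Solver run.

```python
import math

# Coordinates of points B..F from the problem statement.
POINTS = {"B": (9, 11), "C": (14, 18), "D": (18, 22), "E": (13, 16), "F": (17, 21)}
RADIUS = 15


def distances(x, y):
    """Euclidean distance from (x, y) to each fixed point."""
    return {name: math.hypot(x - px, y - py) for name, (px, py) in POINTS.items()}


def total_distance(x, y):
    """Objective: sum of the distances to all fixed points."""
    return sum(distances(x, y).values())


def feasible(x, y):
    """True when every fixed point is within RADIUS of (x, y)."""
    return all(d <= RADIUS for d in distances(x, y).values())


best = total_distance(14, 18)
print(f"objective at (14, 18) = {best:.3f}, feasible = {feasible(14, 18)}")

# Nearby locations should not give a smaller total distance.
for dx, dy in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print((14 + dx, 18 + dy), round(total_distance(14 + dx, 18 + dy), 3))
```

The optimum coincides with point C, so the distance to C contributes nothing to the objective.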
Difficulty: Moderate LO: 10.3; Pages 459-461 Bloom’s: Application BUSPROG: Analytic Skills DISC:
11. Jim is trying to solve a problem where point A should be within a radius of 15 cm of each of the points B, C, D, E, and F. The decision variables are defined as below.
X = horizontal coordinate of point A
Y = vertical coordinate of point A
The coordinates of the points are given below:
Point   Horizontal Coordinate   Vertical Coordinate
B       9                       11
C       14                      18
D       18                      22
E       13                      16
F       17                      21
Formulate and solve a model that minimizes the maximum distance from point A to each of the points B, C, D, E, and F. Round all your answers to three decimal places.
Answer: Let
X = horizontal coordinate of point A
Y = vertical coordinate of point A
d = the maximum distance from point A to points B, C, D, E, and F
\[
\begin{aligned}
\min\ & d \\
\text{s.t.}\ & d \ge \sqrt{(X-9)^2+(Y-11)^2} \\
& d \ge \sqrt{(X-14)^2+(Y-18)^2} \\
& d \ge \sqrt{(X-18)^2+(Y-22)^2} \\
& d \ge \sqrt{(X-13)^2+(Y-16)^2} \\
& d \ge \sqrt{(X-17)^2+(Y-21)^2}
\end{aligned}
\]
The optimal solution is Horizontal Coordinate (X) = 13.498 and Vertical Coordinate (Y) = 16.502, with a minimized maximum distance of 7.106. The spreadsheet model follows:
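A quick geometric check of this answer is possible. By the triangle inequality, no location can have a maximum distance smaller than half the distance between the two points that are farthest apart. If the midpoint of that pair attains this bound, it is optimal. The sketch below tests that condition for the given data; it is a check specific to this data set, not a general minimax solver, and the names are illustrative.

```python
import math
from itertools import combinations

# Coordinates of points B..F from the problem statement.
POINTS = {"B": (9, 11), "C": (14, 18), "D": (18, 22), "E": (13, 16), "F": (17, 21)}


def max_distance(x, y):
    """Largest distance from (x, y) to any of the fixed points."""
    return max(math.hypot(x - px, y - py) for px, py in POINTS.values())


# Farthest pair of points and the lower bound it implies.
p, q = max(combinations(POINTS.values(), 2),
           key=lambda pair: math.dist(pair[0], pair[1]))
lower_bound = math.dist(p, q) / 2

# Candidate location: midpoint of the farthest pair.
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
d = max_distance(*mid)

print("farthest pair:", p, q)
print(f"midpoint = {mid}, max distance = {d:.3f}, lower bound = {lower_bound:.3f}")
```

Here the midpoint of B and D reaches the lower bound, which agrees with the reported objective value of 7.106; the reported coordinates differ from the midpoint only in the third decimal place.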
Difficulty: Moderate LO: 10.3; Pages 459-461 Bloom’s: Application BUSPROG: Analytic Skills DISC:
12. The manager of a supermarket estimates the average number of trips made to the warehouse from each of the 5 outlets, and he wants the warehouse to be closer to the outlets that have a high average number of trips. The locations of the outlets are given as horizontal and vertical coordinates. The decision variables are X = horizontal coordinate of the warehouse and Y = vertical coordinate of the warehouse. The data are shown below.
Outlet     Horizontal Coordinate   Vertical Coordinate   Demand
Outlet 1   2.5                     3.2                   10
Outlet 2   3                       3.8                   16
Outlet 3   2                       2.8                   13
Outlet 4   3.5                     4.2                   17
Outlet 5   2.8                     3.6                   19
a. Develop an unconstrained model that minimizes the sum of the demand-weighted distances, where each demand-weighted distance is the product of the demand (measured in number of trips) and the distance to the warehouse.
b. Solve the model you developed in part a.
Answer:
a. The demand-weighted objective is:
\[
\begin{aligned}
\min\ & 10\sqrt{(X-2.5)^2+(Y-3.2)^2} + 16\sqrt{(X-3)^2+(Y-3.8)^2} + 13\sqrt{(X-2)^2+(Y-2.8)^2} \\
& + 17\sqrt{(X-3.5)^2+(Y-4.2)^2} + 19\sqrt{(X-2.8)^2+(Y-3.6)^2}
\end{aligned}
\]
b. The optimal solution is Horizontal Coordinate (X) = 2.8, Vertical Coordinate (Y) = 3.6, with an objective function value of 39.907. The spreadsheet model follows:
Difficulty: Challenging LO: 10.3; Pages 459-461 Bloom’s: Application BUSPROG: Analytic Skills DISC:
13. Mark and his friends are planning a holiday party. Data on latitude, longitude, and number of friends at each of the 10 locations are given below. Mark would like to identify the location for the holiday party that minimizes the demand-weighted distance, where demand is the number of friends at each location. Find the optimal location for the party. The distance between two cities can be approximated by the following formula
\[ 50\sqrt{(lat_1 - lat_2)^2 + (long_1 - long_2)^2} \]
where lat1 and long1 are the latitude and longitude of city 1, and lat2 and long2 are the latitude and longitude of city 2. (Hint: Notice that all longitude values given for this problem are negative. Make sure that you do not check the option for Make Unconstrained Variables Non-Negative in Solver.)
Location   Latitude   Longitude   Friends
Ohio       26.782     -77.639     12
TN         38.952     -76.164     2
Mass       36.961     -85.921     1
Iowa       33.216     -81.753     8
New York   36.499     -84.575     10
Virginia   44.934     -72.798     7
NJ         40.850     -76.657     5
Wyoming    42.901     -75.514     6
Maryland   41.019     -120.491    1
CA         43.623     -119.626    9
Answer: Let
X = the latitude of the optimal location for the holiday party
Y = the longitude of the optimal location for the holiday party
$R_i$ = number of friends at location i (demand)
$lat_i$, $long_i$ = the latitude and longitude of location i
\[ \min \sum_{i=1}^{10} R_i \left( 50\sqrt{(X - lat_i)^2 + (Y - long_i)^2} \right) \]
The optimal solution is X = 35.772, Y = -81.999, with an objective function value of 37,747.156. The spreadsheet model follows:
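For readers without Solver, the demand-weighted location can be approximated with Weiszfeld's iteration, a standard fixed-point method for weighted distance problems of this kind. This is a swapped-in technique, not the method used for the answer above, and the function names, iteration count, and starting point (the demand-weighted centroid) are assumptions of the sketch.

```python
import math

# (latitude, longitude, friends) for the 10 locations in the table.
LOCATIONS = [
    (26.782, -77.639, 12), (38.952, -76.164, 2), (36.961, -85.921, 1),
    (33.216, -81.753, 8), (36.499, -84.575, 10), (44.934, -72.798, 7),
    (40.850, -76.657, 5), (42.901, -75.514, 6), (41.019, -120.491, 1),
    (43.623, -119.626, 9),
]


def objective(x, y):
    """Demand-weighted distance using the 50 * Euclidean approximation."""
    return sum(r * 50 * math.hypot(x - lat, y - lon) for lat, lon, r in LOCATIONS)


def centroid():
    """Demand-weighted average location, used as the starting point."""
    total = sum(r for _, _, r in LOCATIONS)
    return (sum(r * lat for lat, _, r in LOCATIONS) / total,
            sum(r * lon for _, lon, r in LOCATIONS) / total)


def weiszfeld(iterations=1000, eps=1e-12):
    """Repeatedly re-weight each location by demand / current distance."""
    x, y = centroid()
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for lat, lon, r in LOCATIONS:
            dist = math.hypot(x - lat, y - lon)
            if dist < eps:  # iterate landed on a data point; stop here
                return x, y
            w = r / dist
            num_x += w * lat
            num_y += w * lon
            denom += w
        x, y = num_x / denom, num_y / denom
    return x, y


x_opt, y_opt = weiszfeld()
print(f"X = {x_opt:.3f}, Y = {y_opt:.3f}, objective = {objective(x_opt, y_opt):,.3f}")
```

The constant factor of 50 scales the objective but does not change the minimizing location, which is why it appears only in `objective`.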
Difficulty: Challenging LO: 10.3; Pages 459-461 Bloom’s: Application BUSPROG: Analytic Skills DISC:
14. Consider the return scenarios for 3 types of mutual funds, shown in the following table:
Scenario   Mutual Fund X   Mutual Fund Y   Mutual Fund Z
1          -33.8           -45.5           -11.8
2          12.1            15.8            128.8
3          129.9           58.9            164.4
4          137.7           34.6            17.8
5          -55.5           -39.8           -43.4
6          16.3            -64.5           -32.3
a. Develop the Markowitz portfolio model for these data with a required expected return of at least 20 percent. Assume that the six scenarios are equally likely to occur. b. Solve the model developed in part a. Answer:
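a. Let
X = the fraction of the portfolio to invest in Mutual Fund X
Y = the fraction of the portfolio to invest in Mutual Fund Y
Z = the fraction of the portfolio to invest in Mutual Fund Z
$R_s$ = the return of the portfolio in scenario s
$\bar{R}$ = the expected return of the portfolio
\[
\begin{aligned}
\min\ & \tfrac{1}{6}\sum_{s=1}^{6}(R_s-\bar{R})^2 \\
\text{s.t.}\ & -33.8X - 45.5Y - 11.8Z = R_1 \\
& 12.1X + 15.8Y + 128.8Z = R_2 \\
& 129.9X + 58.9Y + 164.4Z = R_3 \\
& 137.7X + 34.6Y + 17.8Z = R_4 \\
& -55.5X - 39.8Y - 43.4Z = R_5 \\
& 16.3X - 64.5Y - 32.3Z = R_6 \\
& X + Y + Z = 1 \\
& \tfrac{1}{6}\sum_{s=1}^{6} R_s = \bar{R} \\
& \bar{R} \ge 20 \\
& X, Y, Z \ge 0
\end{aligned}
\]
b.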
Difficulty: Moderate LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
15. Consider the stock return data given below.
Month     Stock A   Stock B   Stock C   Stock D
Month 1   12.07     15.95     30.52     32.42
Month 2   10.12     4.16      16.51     21.36
Month 3   14.54     6.31      34.25     13.84
Month 4   46.58     -2.74     45.62     8.12
Month 5   -19.34    6.54      -27.21    -6.84
a. Construct the Markowitz model that maximizes expected return subject to a maximum variance of 35. b. Solve the model developed in part a. Round all your answers to three decimal places. Answer:
a. Let
A = proportion of portfolio invested in Stock A
B = proportion of portfolio invested in Stock B
C = proportion of portfolio invested in Stock C
D = proportion of portfolio invested in Stock D
$\bar{R}$ = the expected return of the portfolio
$R_s$ = the return of the portfolio in month s
\[
\begin{aligned}
\max\ & \bar{R} \\
\text{s.t.}\ & 12.07A + 15.95B + 30.52C + 32.42D = R_1 \\
& 10.12A + 4.16B + 16.51C + 21.36D = R_2 \\
& 14.54A + 6.31B + 34.25C + 13.84D = R_3 \\
& 46.58A - 2.74B + 45.62C + 8.12D = R_4 \\
& -19.34A + 6.54B - 27.21C - 6.84D = R_5 \\
& A + B + C + D = 1 \\
& \tfrac{1}{5}\sum_{s=1}^{5} R_s = \bar{R} \\
& \tfrac{1}{5}\sum_{s=1}^{5} (R_s - \bar{R})^2 \le 35 \\
& A, B, C, D \ge 0
\end{aligned}
\]
b.
Difficulty: Moderate LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
16. Consider the following data on the returns from bonds:
Year   Bond 1   Bond 2   Bond 3
1      0.20     0.128    0.167
2      0.126    0.21     0.27
3      0.321    0.325    0.426
4      -0.39    -0.243   -0.84
5      -0.67    0.169    0.143
6      0.135    0.125    -0.46
7      0.52     0.304    0.147
8      0.75     0.286    0.704
a. Construct the Markowitz portfolio model using a required expected return of at least 15 percent. Assume that the 8 scenarios are equally likely to occur. b. Solve the model using Excel Solver.
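Answer:
a. Let
X = the fraction of the portfolio to invest in Bond 1
Y = the fraction of the portfolio to invest in Bond 2
Z = the fraction of the portfolio to invest in Bond 3
$R_s$ = the return of the portfolio in scenario (year) s
$\bar{R}$ = the expected return of the portfolio
\[
\begin{aligned}
\min\ & \tfrac{1}{8}\sum_{s=1}^{8}(R_s-\bar{R})^2 \\
\text{s.t.}\ & 0.20X + 0.128Y + 0.167Z = R_1 \\
& 0.126X + 0.21Y + 0.27Z = R_2 \\
& 0.321X + 0.325Y + 0.426Z = R_3 \\
& -0.39X - 0.243Y - 0.84Z = R_4 \\
& -0.67X + 0.169Y + 0.143Z = R_5 \\
& 0.135X + 0.125Y - 0.46Z = R_6 \\
& 0.52X + 0.304Y + 0.147Z = R_7 \\
& 0.75X + 0.286Y + 0.704Z = R_8 \\
& X + Y + Z = 1 \\
& \tfrac{1}{8}\sum_{s=1}^{8} R_s = \bar{R} \\
& \bar{R} \ge 0.15 \\
& X, Y, Z \ge 0
\end{aligned}
\]
b.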
Difficulty: Moderate LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
17. Consider the following data on the returns from bonds:
Year   Bond 1   Bond 2   Bond 3
1      0.200    0.128    0.067
2      0.026    0.100    0.700
3      0.121    0.125    0.226
4      -0.139   -0.243   -0.184
5      -0.167   0.269    0.234
6      0.135    0.225    -0.146
7      0.152    0.204    0.047
8      0.175    0.186    0.604
Develop and solve the Markowitz portfolio model using a required expected return of at least 15 percent. Assume that the 8 scenarios are equally likely to occur. Use this model to construct an efficient frontier by varying the expected return from 2 to 18 percent in increments of 2 percent and solving for the variance. Round all your answers to three decimal places.
Answer: Let
X = the fraction of the portfolio to invest in Bond 1
Y = the fraction of the portfolio to invest in Bond 2
Z = the fraction of the portfolio to invest in Bond 3
$R_s$ = the return of the portfolio in scenario (year) s
$\bar{R}$ = the expected return of the portfolio
\[
\begin{aligned}
\min\ & \tfrac{1}{8}\sum_{s=1}^{8}(R_s-\bar{R})^2 \\
\text{s.t.}\ & 0.20X + 0.128Y + 0.067Z = R_1 \\
& 0.026X + 0.1Y + 0.7Z = R_2 \\
& 0.121X + 0.125Y + 0.226Z = R_3 \\
& -0.139X - 0.243Y - 0.184Z = R_4 \\
& -0.167X + 0.269Y + 0.234Z = R_5 \\
& 0.135X + 0.225Y - 0.146Z = R_6 \\
& 0.152X + 0.204Y + 0.047Z = R_7 \\
& 0.175X + 0.186Y + 0.604Z = R_8 \\
& X + Y + Z = 1 \\
& \tfrac{1}{8}\sum_{s=1}^{8} R_s = \bar{R} \\
& \bar{R} \ge 0.15 \\
& X, Y, Z \ge 0
\end{aligned}
\]
The efficient frontier is shown below. As the required expected return increases, the minimum variance (the risk of the portfolio) also increases, and the increase becomes evident for expected returns above 8 percent.
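The frontier can be approximated without Solver by brute force. The sketch below evaluates every portfolio on a 1 percent grid of weights, keeps those that meet the required expected return, and records the smallest variance for each target from 2 to 18 percent. The grid resolution, tolerance, and function names are assumptions for illustration, and a grid search only approximates the exact nonlinear optimum.

```python
# Scenario returns (Bond 1, Bond 2, Bond 3) for years 1..8, from the table.
RETURNS = [
    (0.200, 0.128, 0.067), (0.026, 0.100, 0.700), (0.121, 0.125, 0.226),
    (-0.139, -0.243, -0.184), (-0.167, 0.269, 0.234), (0.135, 0.225, -0.146),
    (0.152, 0.204, 0.047), (0.175, 0.186, 0.604),
]


def mean_and_variance(weights):
    """Expected return and variance over the equally likely scenarios."""
    scenario = [sum(w * r for w, r in zip(weights, row)) for row in RETURNS]
    mean = sum(scenario) / len(scenario)
    variance = sum((r - mean) ** 2 for r in scenario) / len(scenario)
    return mean, variance


def min_variance(target, steps=100):
    """Best (variance, mean, weights) on the grid with mean >= target."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            mean, variance = mean_and_variance(weights)
            if mean >= target - 1e-12 and (best is None or variance < best[0]):
                best = (variance, mean, weights)
    return best


# Required expected returns of 2%, 4%, ..., 18%.
frontier = [(t / 100, min_variance(t / 100)) for t in range(2, 20, 2)]
for target, (variance, mean, weights) in frontier:
    print(f"target {target:.2f}: variance {variance:.3f}, "
          f"return {mean:.3f}, weights {weights}")
```

Because each higher target shrinks the set of admissible portfolios, the minimum variance in the printed table can never decrease from one row to the next.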
Difficulty: Moderate LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
18. Consider the following data on the returns of four types of mutual funds and of the S&P 500.
Year     Large-Cap Growth   Large-Cap Value   Small-Cap Growth   Small-Cap Value   S&P 500 Return
Year 1   41.54              32.45             26.13              37.56             33.15
Year 2   36.18              44.78             7.04               18.53             27.62
Year 3   32.76              28.61             -23.97             27.53             15.84
Year 4   -20.63             38.49             45.67              -5.48             30.42
a. Develop an optimization model that will give the fraction of the portfolio to invest in each of the funds so that the return of the resulting portfolio matches as closely as possible the return of the S&P 500 Index. (Hint: Minimize the sum of the squared deviations between the portfolio’s return and the S&P 500 Index return for each year in the data set.)
b. Solve the model developed in part a.
Answer:
a. Let
LG = proportion of portfolio invested in large-cap growth fund
LV = proportion of portfolio invested in large-cap value fund
SG = proportion of portfolio invested in small-cap growth fund
SV = proportion of portfolio invested in small-cap value fund
$D_s$ = the difference between the portfolio return and the S&P 500 return in year s
\[
\begin{aligned}
\min\ & D_1^2 + D_2^2 + D_3^2 + D_4^2 \\
\text{s.t.}\ & 41.54LG + 32.45LV + 26.13SG + 37.56SV - 33.15 = D_1 \\
& 36.18LG + 44.78LV + 7.04SG + 18.53SV - 27.62 = D_2 \\
& 32.76LG + 28.61LV - 23.97SG + 27.53SV - 15.84 = D_3 \\
& -20.63LG + 38.49LV + 45.67SG - 5.48SV - 30.42 = D_4 \\
& LG + LV + SG + SV = 1 \\
& LG, LV, SG, SV \ge 0
\end{aligned}
\]
b.
Difficulty: Moderate LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
19. Develop a model that minimizes semivariance for the data given below with a required return of 15 percent. Define a variable $d_s$ for each scenario and let $d_s \ge \bar{R} - R_s$ with $d_s \ge 0$. Then make the objective function: Min $\tfrac{1}{4}\sum_{s=1}^{4} d_s^2$.
Mutual Fund        Year 1   Year 2   Year 3   Year 4
Large-Cap Growth   41.54    36.18    32.76    -20.63
Large-Cap Value    32.45    44.78    28.61    38.49
Small-Cap Growth   26.13    7.04     -23.97   45.67
Small-Cap Value    37.56    18.53    27.53    -5.48
Solve the model you developed with a required expected return of at least 15 percent.
Answer: Let
LG = proportion of portfolio invested in large-cap growth fund
LV = proportion of portfolio invested in large-cap value fund
SG = proportion of portfolio invested in small-cap growth fund
SV = proportion of portfolio invested in small-cap value fund
$\bar{R}$ = the expected return of the portfolio
$R_s$ = the return of the portfolio in year s
$d_s$ = the amount by which the return in year s falls below the expected portfolio return (zero if it does not)
\[
\begin{aligned}
\min\ & \tfrac{1}{4}\left(d_1^2 + d_2^2 + d_3^2 + d_4^2\right) \\
\text{s.t.}\ & 41.54LG + 32.45LV + 26.13SG + 37.56SV = R_1 \\
& 36.18LG + 44.78LV + 7.04SG + 18.53SV = R_2 \\
& 32.76LG + 28.61LV - 23.97SG + 27.53SV = R_3 \\
& -20.63LG + 38.49LV + 45.67SG - 5.48SV = R_4 \\
& \tfrac{1}{4}\sum_{s=1}^{4} R_s = \bar{R} \\
& d_s \ge \bar{R} - R_s, \quad s = 1, 2, 3, 4 \\
& d_s \ge 0, \quad s = 1, 2, 3, 4 \\
& LG + LV + SG + SV = 1 \\
& \bar{R} \ge 15 \\
& LG, LV, SG, SV \ge 0
\end{aligned}
\]
Difficulty: Moderate LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
20. Consider the stock return data given below.
Month     Stock A   Stock B   Stock C   Stock D
Month 1   12.07     15.95     30.52     32.42
Month 2   10.12     4.16      16.51     21.36
Month 3   14.54     6.31      34.25     13.84
Month 4   46.58     -2.74     45.62     8.12
Month 5   -19.34    6.54      -27.21    -6.84
Develop and solve the Markowitz model that maximizes expected return subject to a maximum variance of 35. Use this model to construct an efficient frontier by varying the maximum allowable variance from 25 to 55 in increments of 5 and solving for the maximum return for each.
Answer: Let
A = proportion of portfolio invested in Stock A
B = proportion of portfolio invested in Stock B
C = proportion of portfolio invested in Stock C
D = proportion of portfolio invested in Stock D
$\bar{R}$ = the expected return of the portfolio
$R_s$ = the return of the portfolio in month s
\[
\begin{aligned}
\max\ & \bar{R} \\
\text{s.t.}\ & 12.07A + 15.95B + 30.52C + 32.42D = R_1 \\
& 10.12A + 4.16B + 16.51C + 21.36D = R_2 \\
& 14.54A + 6.31B + 34.25C + 13.84D = R_3 \\
& 46.58A - 2.74B + 45.62C + 8.12D = R_4 \\
& -19.34A + 6.54B - 27.21C - 6.84D = R_5 \\
& A + B + C + D = 1 \\
& \tfrac{1}{5}\sum_{s=1}^{5} R_s = \bar{R} \\
& \tfrac{1}{5}\sum_{s=1}^{5} (R_s - \bar{R})^2 \le 35 \\
& A, B, C, D \ge 0
\end{aligned}
\]
The output is shown below. As the maximum variance increases, the expected return increases, but at a decreasing rate. This curve is known as the efficient frontier.
Difficulty: Easy LO: 10.4; Pages 461-465 Bloom’s: Application BUSPROG: Analytic Skills DISC:
Chapter 11: Monte Carlo Simulation
1. A _____ uses repeated random sampling to represent uncertainty in a model representing a real system and computes the values of model outputs. a. Monte Carlo simulation b. what-if analysis c. deterministic model d. discrete event simulation Answer: A Difficulty: Easy LO: 11.1, Page 486 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A Monte Carlo simulation uses repeated random sampling to represent uncertainty in a model representing a real system and computes the values of model outputs.
2. A simulation model extends spreadsheet modeling by a. extending the range of parameters for which solutions are computed. b. using real-time values for parameters from the application to formulate solutions. c. replacing the use of single values for parameters with a range of possible values. d. using historical data to make predictions about future values and expected trends. Answer: C Difficulty: Moderate LO: 11.1, Page 486 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A simulation model extends spreadsheet modeling by replacing the use of single values for parameters with a range of possible values.
3. A description of the range and relative likelihood of possible values of an uncertain variable is known as a _____. a. risk analysis b. probability distribution c. base-case scenario d. simulation optimization Answer: B Difficulty: Easy LO: 11.1, Page 486 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A description of the range and relative likelihood of possible values of an uncertain variable is known as a probability distribution.
4. A(n) _____ is an input to a simulation model whose value is uncertain and described by a probability distribution. a. identifier b. constraint c. random variable d. decision variable Answer: C Difficulty: Easy LO: 11.1, Page 486 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A random variable is an input to a simulation model whose value is uncertain and described by a probability distribution. 5. The outcome of a simulation experiment is a(n) a. objective function. b. probability distribution for one or more output measures. c. single number. d. what-if scenario. Answer: B Difficulty: Moderate LO: 11.1, Page 487 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A simulation experiment produces a distribution of output values that correspond to the randomly generated values of the uncertain input variables. This probability distribution of the output values describes the range of possible outcomes, as well as the relative likelihood of each outcome. 6. An input to a simulation model that is selected by the decision maker is known as a _____. a. random variable b. nonnegativity constraint c. probable input d. controllable input Answer: D Difficulty: Easy LO: 11.1, Page 487 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An input to a simulation model that is selected by the decision maker is known as a controllable input.
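The terms tested in questions 1 through 6 can be seen together in a very small simulation. In the sketch below, the order quantity plays the role of the controllable input, demand is the random variable, and the list of simulated profits is the output distribution. The price, unit cost, and demand range are hypothetical values chosen only for illustration; they do not come from the text.

```python
import random
import statistics


def simulate_profit(order_quantity, trials=10_000, seed=42):
    """Monte Carlo sketch: order_quantity is the controllable input."""
    rng = random.Random(seed)
    price, unit_cost = 40.0, 30.0          # hypothetical parameters
    profits = []
    for _ in range(trials):
        # Random variable: demand drawn from a hypothetical uniform distribution.
        demand = rng.uniform(70, 130)
        sold = min(order_quantity, demand)
        profits.append(price * sold - unit_cost * order_quantity)
    return profits


profits = simulate_profit(order_quantity=100)
mean_profit = statistics.mean(profits)
prob_loss = sum(p < 0 for p in profits) / len(profits)

print(f"mean profit = {mean_profit:.2f}")
print(f"estimated probability of a loss = {prob_loss:.3f}")
```

The result is a distribution of profits rather than a single number, so both an average outcome and the likelihood of an undesirable outcome can be estimated from the same run.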
7. The process of evaluating a decision in the face of uncertainty by quantifying the likelihood and magnitude of an undesirable outcome is known as _____. a. risk analysis b. regression analysis c. data mining d. decision tree analysis Answer: A Difficulty: Easy LO: 11.1, Page 487 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The process of evaluating a decision in the face of uncertainty by quantifying the likelihood and magnitude of an undesirable outcome is known as risk analysis. 8. In a base-case scenario, the output is determined by assuming a. worst values that can be expected for the random variables of a model. b. the mean trial values for the random variables of a model. c. best values that can be expected for the random variables of a model. d. the most likely values for the random variables of a model. Answer: D Difficulty: Easy LO: 11.1, Page 487 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a base-case scenario, the output is determined by assuming the most likely values for the random variables of a model. 9. A _____ analysis involves considering alternative values for the random variables and computing the resulting value for the output. a. random b. what-if c. risk d. cluster Answer: B Difficulty: Easy LO: 11.1, Page 488 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A what-if analysis involves considering alternative values for the random variables and computing the resulting value for the output. 10. A disadvantage of the simple what-if analyses is that a. there are errors induced as a result of rounding.
b. the optimal solutions are not guaranteed. c. there is no indication of the likelihood of various output values. d. it cannot compute alternate optimal solutions. Answer: C Difficulty: Moderate LO: 11.1, Page 488 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: A disadvantage of the simple what-if analyses is that there is no indication of the likelihood of the various output values. 11. The values for random variables in a Monte Carlo simulation are a. selected manually. b. generated randomly from probability distributions. c. taken from forecasting analysis. d. derived secondarily using formulas. Answer: B Difficulty: Moderate LO: 11.2, Page 489 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Instead of manually selecting the values for the random variables, a Monte Carlo simulation randomly generates values for the random variables so that the values used reflect what might be observed in practice. 12. The choice of the probability distribution for a random variable can be guided by a. an objective function. b. likelihood factors. c. forecasting. d. historical data. Answer: D Difficulty: Moderate LO: 11.2, Page 489 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The analyst can use historical data and knowledge of the random variable to specify the probability distribution. 13. Which of the following inferences about a variable of interest can be drawn from the graph given below?
a. The variable is equally likely to take any value between 20 and 40. b. The variable is more likely to take the value 20 than 40. c. The variable is more likely to take any value outside the range of 20 and 40. d. The variable can only take the value 30.
Answer: A Difficulty: Moderate LO: 11.2, Pages 490-491 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: This is a uniform probability distribution for the variable of interest with the range being 20 to 40. Therefore, the variable is equally likely to take any value between 20 and 40. Reference - 11.1: Use the graph given below to answer questions 14-15.
14. Reference - 11.1: The type of distribution shown in the graph is a. uniform. b. normal. c. exponential. d. beta. Answer: B Difficulty: Easy LO: 11.2, Page 491
Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The graph shows a normal distribution. 15. Reference - 11.1: Which of the following inferences can be drawn about the monthly salary? a. The average monthly salary is $3000. b. The monthly salary is always less than $3000. c. The monthly salary is always greater than $3000. d. The range of the monthly salary distribution is $3000 to $5000. Answer: A Difficulty: Moderate LO: 11.2, Page 491 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: This is a normal distribution centered on 3. The average monthly salary is $3000. 16. In a _____, a random variable can take any value in a specified range. a. discrete probability distribution b. cumulative distribution c. relative frequency distribution d. continuous probability distribution Answer: D Difficulty: Easy LO: 11.2, Page 491 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a continuous probability distribution, a random variable can take any value in a specified range (not just a discrete set of values). 17. A distribution of a random variable for which values extremely larger or smaller than the mean are increasingly unlikely can possibly be modeled as a(n) _____ probability distribution. a. binomial b. normal c. exponential d. gamma Answer: B Difficulty: Moderate LO: 11.2, Page 491 Bloom’s: Comprehension BUSPROG: Analytic DISC:
Feedback: The normal probability distribution is a continuous probability distribution in which any value is possible, but values extremely larger or smaller than the mean are increasingly unlikely. 18. In simulation analysis, the _____ of random variables can be adjusted to determine the impact of the assumptions about the shape of the uncertainty on the results. a. probability distributions b. ranges c. relative frequencies d. manual generations Answer: A Difficulty: Moderate LO: 11.2, Page 491 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In simulation analysis, the probability distributions of random variables can be adjusted to determine the impact of the assumptions about the shape of the uncertainty on the results. 19. A set of values for the random variables is called a(n) _____. a. event b. permutation c. trial d. combination Answer: C Difficulty: Easy LO: 11.2, Page 491 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A set of values for the random variables is called a trial. 20. The range of computer-generated random numbers is _____. a. [-∞, ∞] b. [-∞, 0) c. [1, ∞] d. [0, 1) Answer: D Difficulty: Moderate LO: 11.2, Page 491 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Computer-generated random numbers are randomly selected numbers from 0 up to, but not including, 1; this interval is denoted [0,1).
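Random numbers on [0, 1) are the raw material for the other distributions covered in the questions that follow. As a sketch, Python's `random.random()` stands in for Excel's RAND() here, and the two helper functions mirror the spreadsheet formulas lower bound + (upper bound - lower bound) × RAND() and LN(RAND())*(-m). The helper names are assumptions made for illustration.

```python
import math
import random


def uniform_value(u, lower, upper):
    """Map a random number u in [0, 1) to a uniform value on [lower, upper)."""
    return lower + (upper - lower) * u


def exponential_value(u, mean):
    """Map a random number u in (0, 1] to an exponential value with the given mean."""
    return -mean * math.log(u)


rng = random.Random(0)
draws = [rng.random() for _ in range(10_000)]   # each draw lies in [0, 1)

print("smallest draw:", min(draws), "largest draw:", max(draws))
print("weekly demand for u = 0.5 on [70, 83]:", uniform_value(0.5, 70, 83))

# 1 - u lies in (0, 1], which avoids taking the logarithm of zero.
service_times = [exponential_value(1 - u, mean=4.0) for u in draws]
print("average simulated service time:", sum(service_times) / len(service_times))
```

Using `1 - u` instead of `u` in the exponential case is a small design choice: both are uniformly distributed, but only the former is guaranteed to be strictly positive.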
21. All the values of computer-generated random numbers are _____. a. Poisson distributed b. lognormally distributed c. uniformly distributed d. normally distributed Answer: C Difficulty: Moderate LO: 11.2, Page 491 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: All values of the computer-generated random numbers are equally likely and so the values are uniformly distributed over the interval from 0 to 1.
22. The _____ function is used to generate a pseudorandom number in Excel. a. FREQUENCY() b. RAND() c. NORM.INV() d. ROUND() Answer: B Difficulty: Easy LO: 11.2, Page 491 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The RAND() function is used to generate a pseudorandom number in Excel.
23. Which of the following parameters is required to convert a computer-generated random variable into a uniform random variable? a. Range of the distribution b. Mean of the distribution c. Variance of the distribution d. Moments of the distribution Answer: A Difficulty: Moderate LO: 11.2, Page 492 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: To generate a value for a random variable characterized by a continuous uniform distribution, the following Excel formula is used: Value of uniform random variable = lower bound + (upper bound – lower bound) × RAND().
24. The weekly demand for an item in a retail store follows a uniform distribution over the range 70 to 83. What would be the weekly demand if its corresponding computer-generated value is 0.5? a. 90.1 b. 83 c. 76.5 d. 50.85
Answer: C Difficulty: Moderate LO: 11.2, Page 492 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: To generate a value for a random variable characterized by a continuous uniform distribution, the following Excel formula is used: Value of uniform random variable = lower bound + (upper bound – lower bound) × RAND(). Upon substituting the given values, the weekly demand is obtained as 76.5. 25. For a given mean and standard deviation, the _____ function in Excel is used to generate a value for the random variable characterized by a normal distribution. a. NORM.INV b. RAND c. VLOOKUP d. FREQUENCY Answer: A Difficulty: Easy LO: 11.2, Page 493 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: For a given mean and standard deviation, the NORM.INV function in Excel is used to generate a value for the random variable characterized by a normal distribution. 26. The _____ function in Excel is used to compute the statistics required to create a histogram. a. NORM.INV b. RAND c. FREQUENCY d. STDEV.S Answer: C Difficulty: Easy LO: 11.2, Page 498 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The FREQUENCY function in Excel is used to compute the statistics required to create a histogram. 27. The random variables corresponding to the interarrival times of customers and the service times of customers are commonly described by a(n) _____ distribution.
a. Poisson b. Rayleigh c. lognormal d. exponential
Answer: D Difficulty: Moderate LO: 11.2, Page 498 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The exponential distribution is a common distribution for random variables corresponding to the interarrival times of customers, the service times of customers, and so forth. 28. In Excel, the expression LN(RAND())*(-m) would generate a(n) _____ random variable with mean m. a. lognormal b. logarithmic c. normal d. exponential Answer: D Difficulty: Moderate LO: 11.2, Page 498 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In Excel, the expression LN(RAND())*(-m) would generate an exponential random variable with mean m. 29. The Excel function _____ generates integer values between lower and upper bounds. a. RAND b. RANDBETWEEN c. LOWER d. UPPER Answer: B Difficulty: Easy LO: 11.2, Page 498 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Excel function RANDBETWEEN generates integer values between lower and upper bounds. 30. The Excel add-in _____ is used to design a spreadsheet simulation model. a. Analytic Solver Platform b. Goal Seek
c. Solver d. GPSS Answer: A Difficulty: Easy LO: 11.3, Page 498 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The Excel add-in Analytic Solver Platform is used to design a spreadsheet simulation model. 31. In a simulation process, the error of the estimates of the output can be reduced by a. increasing the range of possible values for the random variables. b. increasing the number of random variables. c. increasing the number of trials per simulation. d. conducting regression analysis to forecast values. Answer: C Difficulty: Moderate LO: 11.3, Page 505 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In a simulation process, the error of the estimates of the output can be reduced by increasing the number of trials per simulation. 32. According to the _____, the sum of independent random variables can be approximated by a normal probability distribution. a. what-if analysis b. central limit theorem c. simulation optimization approach d. discrete-event simulation method Answer: B Difficulty: Easy LO: 11.3, Page 518 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: According to the central limit theorem, the sum of independent random variables can be approximated by a normal probability distribution. 33. Which of the following is true of simulation optimization using Analytic Solver Platform (ASP)? a. It is computationally simple and cheap. b. It can be executed in a very short time. c. It is an iterative process. d. It guarantees an optimal solution.
Answer: C Difficulty: Moderate LO: 11.4, Page 518 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: In general, ASP searches for optimal values of the decision variables by iteratively adjusting the values of the decision variables. Solving a simulation optimization model can be very computationally expensive and take a long time. Furthermore, ASP cannot guarantee an optimal solution to a simulation optimization model. 34. ASP cannot guarantee an optimal solution to a simulation optimization model because a. the optimization problem will have only one local optimal solution due to the linear relationship. b. of sampling error resulting from nonlinear relationships. c. it is not possible to compare solutions across runs. d. of sampling error resulting from the presence of the random variables. Answer: D Difficulty: Moderate LO: 11.4, Pages 518-519 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: ASP cannot guarantee an optimal solution to a simulation optimization model because of sampling error resulting from the simulation trials and because the optimization problem may have multiple local optimal solutions due to nonlinear relationships. Therefore, it is helpful to solve the model multiple times and compare the solutions across runs. 35. The process of determining that a computer program implements a simulation model as it is intended is known as _____. a. validation b. verification c. correlation d. optimization Answer: B Difficulty: Easy LO: 11.5, Page 524 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The process of determining that a computer program implements a simulation model as it is intended is known as verification. 36. Which of the following is true of verification? a. It is largely a debugging task. b. It requires an agreement among analysts and managers.
c. It deals with the accurate modeling of real system operations. d. It is performed prior to the development of the computer procedure for simulation. Answer: A Difficulty: Moderate LO: 11.5, Page 524 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Verification is largely a debugging task to make sure that no errors are in the computer procedure that implements the simulation. 37. _____ is the process of determining that a simulation model provides an accurate representation of a real system. a. Regression b. Verification c. Consideration d. Validation Answer: D Difficulty: Easy LO: 11.5, Page 524 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Validation is the process of determining that a simulation model provides an accurate representation of a real system. 38. Which of the following is a disadvantage of using simulation? a. Experimenting directly with a simulation model is often not feasible. b. Each simulation run provides only a sample of how the real system will operate. c. The simulation models are used to describe systems without requiring the assumptions that are required by mathematical models. d. Simulation models warn against poor decision strategies by projecting disastrous outcomes such as system failures, large financial losses, and so on. Answer: B Difficulty: Moderate LO: 11.5, Pages 524-525 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Simulation is not without disadvantages. Each simulation run provides only a sample of how the real system will operate. As such, the summary of the simulation data provides only estimates or approximations about the real system. 39. _____ is a measure of dependence between two random variables. a. Approximation b. Verification
c. Correlation d. Validation Answer: C Difficulty: Easy LO: Appendix 11.1, Pages 537-538 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Correlation is a measure of dependence between two random variables. 40. In a(n) _____ relationship between two quantities, either one quantity never increases as the other increases, or one quantity never decreases as the other increases. a. monotonic b. random c. classified d. independent Answer: A Difficulty: Moderate LO: Appendix 11.1, Page 538 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: In a monotonic relationship between two quantities, either one quantity never increases as the other increases, or one quantity never decreases as the other increases. Problems 1. Sunseel Industries produces different types of raw materials and it is interested in using simulation to estimate the profit per unit for its new product X. The selling price for the product will be $40 per unit. Probability distributions for the raw material cost, the production cost, and the marketing cost are estimated as follows:
Raw Material Cost ($) 16 18 20 22
Probability 0.20 0.30 0.35 0.15
Production Cost ($) 10 11 12
Probability 0.25 0.45 0.30
Marketing Cost ($) 5 6
Probability 0.40 0.60
a. Compute profit per unit for the base case, worst case, and best case. b. Construct a simulation model to estimate the mean profit per unit. c. Management believes the project may not be sustainable if the profit per unit is less than $2. Use simulation to estimate the probability the profit per unit will be less than $2.
Answer: a. Profit = Selling Price – Raw Material Cost – Production Cost – Marketing Cost
Base-case scenario (using most likely costs): Profit = 40 - 20 - 11 - 6 = $3/unit
Worst-case scenario: Profit = 40 - 22 - 12 - 6 = $0/unit
Best-case scenario: Profit = 40 - 16 - 10 - 5 = $9/unit
b. The average profit from the simulation model (see below) should be approximately $4.45.
c. As seen in the chart below, there is approximately a 0.09 probability that profit per unit will be less than $2.
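The spreadsheet model for Problem 1 can also be sketched in Python. This is a minimal sketch, not the textbook's model: it assumes the three costs are independent and reads the marketing cost table as $5 with probability 0.40 and $6 with probability 0.60. The names `exact_summary` and `simulate` are illustrative. Because there are only 24 cost combinations, the exact mean and probability can be enumerated and compared with the simulated estimates.

```python
import itertools
import random

PRICE = 40
RAW = {16: 0.20, 18: 0.30, 20: 0.35, 22: 0.15}
PRODUCTION = {10: 0.25, 11: 0.45, 12: 0.30}
MARKETING = {5: 0.40, 6: 0.60}


def exact_summary():
    """Enumerate all 4 x 3 x 2 cost combinations (costs assumed independent)."""
    mean = 0.0
    p_below_2 = 0.0
    for (r, pr), (c, pc), (m, pm) in itertools.product(
            RAW.items(), PRODUCTION.items(), MARKETING.items()):
        prob = pr * pc * pm
        profit = PRICE - r - c - m
        mean += prob * profit
        if profit < 2:
            p_below_2 += prob
    return mean, p_below_2


def draw(dist, rng):
    """Draw one value from a {value: probability} table."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values])[0]


def simulate(trials=100_000, seed=1):
    """Monte Carlo estimate of mean profit and P(profit < $2)."""
    rng = random.Random(seed)
    profits = [
        PRICE - draw(RAW, rng) - draw(PRODUCTION, rng) - draw(MARKETING, rng)
        for _ in range(trials)
    ]
    mean = sum(profits) / trials
    p_below_2 = sum(p < 2 for p in profits) / trials
    return mean, p_below_2


if __name__ == "__main__":
    print("exact:", exact_summary())
    print("simulated:", simulate())
```

The enumeration gives the values that the simulation estimates should approach as the number of trials grows.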
Difficulty: Moderate LO: 11.1; LO: 11.3, Pages 487-518 Bloom’s: Application BUSPROG: Analytic DISC: 2. Salemach Corporation is a start-up company that manufactures simple machines. It is interested in analyzing the profit from a new machine. It estimates that the selling price will be $150 per unit and the setup and advertising costs will total $250,000. The company estimates that the per-unit raw material cost is uniformly distributed between $50 and $80. The demand is normally distributed with a mean of 12,000 units and a standard deviation of 3000 units. The probability distribution for the labor cost per unit is given below.
Labor Cost $52 $53 $54 $55 $56
Probability 0.05 0.25 0.40 0.25 0.05
a. Obtain estimates for the mean profit, maximum profit, minimum profit, and standard deviation of profit. b. What is your estimate of the probability of a loss?
Answer: a. Estimated mean profit = $121,702; maximum profit = $725,148; minimum profit = – $232,635; profit standard deviation = $140,713. b. Estimated probability of a loss = 0.2106
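A Monte Carlo version of Problem 2 can be sketched as follows. This is a sketch under stated assumptions rather than the textbook's spreadsheet: profit is taken as (price - raw material cost - labor cost) × demand - fixed cost, the three inputs are assumed independent, and demand is not truncated at zero. The function names are illustrative. Minimum and maximum profit depend strongly on the sample drawn, so they will differ from run to run.

```python
import random
import statistics

PRICE = 150.0
FIXED_COST = 250_000.0
LABOR = [52, 53, 54, 55, 56]
LABOR_PROB = [0.05, 0.25, 0.40, 0.25, 0.05]


def simulate_profit(trials=100_000, seed=1):
    """Return one simulated profit per trial."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        raw = rng.uniform(50.0, 80.0)                # raw material cost per unit
        labor = rng.choices(LABOR, weights=LABOR_PROB)[0]
        demand = rng.gauss(12_000.0, 3_000.0)        # assumption: not truncated at 0
        profits.append((PRICE - raw - labor) * demand - FIXED_COST)
    return profits


def summarize(profits):
    """Summary statistics asked for in parts a and b."""
    return {
        "mean": statistics.fmean(profits),
        "stdev": statistics.stdev(profits),
        "min": min(profits),
        "max": max(profits),
        "p_loss": sum(p < 0 for p in profits) / len(profits),
    }


if __name__ == "__main__":
    for name, value in summarize(simulate_profit()).items():
        print(f"{name}: {value:,.4f}")
```

As a sanity check, the mean should sit near (150 - 65 - 54) × 12,000 - 250,000 = $122,000, which is in line with the estimate quoted in the answer.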
Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 3. A company has produced a new battery with an estimated mean lifetime of 60 hours. Management also believes that the standard deviation is 4.5 hours and that battery hours are normally distributed. To promote the new battery, the management has offered to refund some money if the battery fails to reach 50 hours before the battery needs to be recharged. Specifically, for batteries with a lifetime below 50 hours, the management will refund a customer $50 per hour short of 50 hours. a. For each battery sold, what is the expected cost of the promotion? b. What hours should the company set the promotion claim if it wants the expected cost to be $0.50? Answer: a. The average cost of the promotion per battery is approximately $1.03.
b. The solution is obtained using a simulation optimization model with the objective of setting the expected cost (in cell B12) to $0.50, with cell B4 as the decision variable. The solution obtained will vary slightly across optimization runs, but when rounded, a promotion claim of 49 hours will result in an expected cost of approximately $0.50.
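Because battery life is normal, the expected refund in Problem 3 also has a closed form, which gives a check on the simulated answers. The sketch below swaps simulation for the standard normal partial-expectation formula E[max(c - X, 0)] = (c - μ)Φ(z) + σφ(z) with z = (c - μ)/σ, and uses bisection in place of simulation optimization. The function names and the search bounds of 40 and 60 hours are assumptions made for illustration.

```python
import math

MEAN_LIFE = 60.0        # hours
SD_LIFE = 4.5           # hours
REFUND_PER_HOUR = 50.0  # dollars per hour short of the claim


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_refund(claim_hours):
    """Expected refund per battery: REFUND_PER_HOUR * E[max(claim - life, 0)]."""
    z = (claim_hours - MEAN_LIFE) / SD_LIFE
    expected_shortfall = ((claim_hours - MEAN_LIFE) * normal_cdf(z)
                          + SD_LIFE * normal_pdf(z))
    return REFUND_PER_HOUR * expected_shortfall


def claim_for_target_cost(target, lo=40.0, hi=60.0, tol=1e-6):
    """Bisection search; the expected refund increases with the claim."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_refund(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(f"expected refund at 50 hours: ${expected_refund(50.0):.2f}")
    print(f"claim for a $0.50 expected cost: {claim_for_target_cost(0.50):.2f} hours")
```

The claim found by bisection is not a whole number of hours; rounding it gives the 49-hour claim quoted in part b.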
Difficulty: Easy LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 4. An investment firm offers free financial planning seminars at major hotels for groups of 30 individuals. Each seminar costs the firm $4,000 and the average first-year commission for each new enrollment is $6,000. The firm estimates that for each individual attending the seminar, there is a 0.05 probability that he/she will enroll. a. Determine the equation for computing the profit per seminar, given values of the relevant parameters. b. Construct a spreadsheet simulation model to analyze the profitability of the seminars. Would you recommend that the investment firm continue running the seminars? Answer: a. Profit = (New Enrollments × 6000) – 4000. b. The expected profit from a seminar is $5,000. Hence, the investment firm can continue conducting seminars.
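The seminar model can be sketched in Python as a check on the $5,000 figure. The sketch assumes each of the 30 attendees enrolls independently with probability 0.05, so the number of new enrollments is binomial; the function names are illustrative. It also reports the chance that a seminar loses money, which happens only when nobody enrolls.

```python
import random

ATTENDEES = 30
P_ENROLL = 0.05
COMMISSION = 6_000
SEMINAR_COST = 4_000


def seminar_profit(new_enrollments):
    """Profit equation from part a."""
    return new_enrollments * COMMISSION - SEMINAR_COST


def simulate(trials=50_000, seed=1):
    """Return (mean profit, probability of a loss) over simulated seminars."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        # Each attendee enrolls independently (assumption).
        enrollments = sum(rng.random() < P_ENROLL for _ in range(ATTENDEES))
        profits.append(seminar_profit(enrollments))
    mean_profit = sum(profits) / trials
    p_loss = sum(p < 0 for p in profits) / trials
    return mean_profit, p_loss


if __name__ == "__main__":
    print("profit at the expected 1.5 enrollments:",
          seminar_profit(ATTENDEES * P_ENROLL))
    print("simulated (mean profit, P(loss)):", simulate())
```

The probability of a loss equals 0.95^30, roughly 0.21, so a positive expected profit still leaves a noticeable chance that any single seminar loses its $4,000 cost.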
Difficulty: Easy LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC:
5. The stock price of Robin Tires, Inc., listed on the Stock Exchange is currently $20. The following probability distribution shows how the price per share is expected to change over a three-month period:
Stock Price Change ($) -3 -2 -1 0 +1 +2 +3
Probability 0.25 0.2 0.05 0.15 0.1 0.15 0.1
a. Construct a spreadsheet simulation model that computes the value of the stock price in 3 months, 6 months, 9 months, and 12 months under the assumption that the change in price over any 3-month period is independent of the change in price over any other 3-month period. b. With the current price of $20 per share, simulate the price per share for the next four 3-month periods. What is the average price in 12 months? What is the standard deviation of the price in 12 months? Answer: a. Refer to the screenshot below for the Spreadsheet Simulation Model.
b. The mean stock price after 12 months is $18 and the standard deviation is $4.20. Difficulty: Moderate
LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 6. A football tournament is conducted between Team A and Team B of a college, with the winner being the first team to win four of seven games. The probabilities that Team A wins each game are as follows:
Game Probability of Win
1 0.48
2 0.5
3 0.45
4 0.6
5 0.55
6 0.40
7 0.55
a. Set up a spreadsheet simulation model in which whether Team A wins each game is a random variable. b. What is the probability that Team A wins the tournament? c. What is the average number of games played regardless of winner? Answer: a. Refer to the screenshot below for the Spreadsheet Simulation Model.
b. Team A has approximately 0.507 probability of winning the football tournament.
c. The average length of the tournament is approximately 5.757 games.
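The tournament in Problem 6 can be sketched in Python in two ways: a Monte Carlo simulation that mirrors the spreadsheet approach, and an exact calculation that tracks the probability of every score. The sketch assumes games are independent and that game 7 has a win probability of 0.55 for Team A; the function names are illustrative. Simulated estimates, including those quoted in the answer, will differ from the exact values by sampling error.

```python
import random

P_WIN = [0.48, 0.50, 0.45, 0.60, 0.55, 0.40, 0.55]  # P(Team A wins game i)
WINS_NEEDED = 4


def play_series(rng):
    """Play one tournament; return (Team A won?, number of games played)."""
    a = b = 0
    for p in P_WIN:
        if rng.random() < p:
            a += 1
        else:
            b += 1
        if a == WINS_NEEDED or b == WINS_NEEDED:
            break
    return a == WINS_NEEDED, a + b


def simulate(trials=100_000, seed=1):
    """Monte Carlo estimates of P(Team A wins) and the average series length."""
    rng = random.Random(seed)
    a_titles = games = 0
    for _ in range(trials):
        a_won, length = play_series(rng)
        a_titles += a_won
        games += length
    return a_titles / trials, games / trials


def exact():
    """Exact values from the probability of each (A wins, B wins) score."""
    states = {(0, 0): 1.0}
    p_a = expected_games = 0.0
    for game, p in enumerate(P_WIN, start=1):
        nxt = {}
        for (a, b), prob in states.items():
            for na, nb, q in ((a + 1, b, p), (a, b + 1, 1.0 - p)):
                if na == WINS_NEEDED or nb == WINS_NEEDED:
                    expected_games += game * prob * q  # series ends here
                    if na == WINS_NEEDED:
                        p_a += prob * q
                else:
                    nxt[(na, nb)] = nxt.get((na, nb), 0.0) + prob * q
        states = nxt
    return p_a, expected_games


if __name__ == "__main__":
    print("simulated:", simulate())
    print("exact:", exact())
```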
Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 7. The quality of a device should be examined in the inspection department sequentially in three steps before it is sent to packaging department. The probability distributions for the time required to complete each of the activities are as follows:
Step Time (minutes) Probability
1 3 0.15
1 5 0.25
1 7 0.35
1 9 0.25
2 11 0.25
2 13 0.30
2 15 0.45
3 8 0.35
3 10 0.20
3 12 0.45
a. Construct a spreadsheet simulation model to estimate the average time spent in the inspection department and the standard deviation of the time spent in the inspection department. b. What is the estimated probability that the inspection will be completed in 32 minutes or less? Answer: a. The expected duration is 30 minutes with a standard deviation of 3.19 minutes. b. The probability of completing the inspection in 32 minutes or less is 0.8171.
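With only 4 × 3 × 3 = 36 combinations of step times, the inspection problem can be enumerated exactly, which gives a benchmark for the simulated figures. The sketch below assumes the three steps are independent and reads the table as step 1 taking 3 to 9 minutes, step 2 taking 11 to 15 minutes, and step 3 taking 8 to 12 minutes (the grouping in which each step's probabilities sum to 1). The function names are illustrative.

```python
import itertools
import math

# One {time in minutes: probability} table per inspection step.
STEPS = [
    {3: 0.15, 5: 0.25, 7: 0.35, 9: 0.25},
    {11: 0.25, 13: 0.30, 15: 0.45},
    {8: 0.35, 10: 0.20, 12: 0.45},
]


def total_time_distribution():
    """Exact distribution of total inspection time (steps assumed independent)."""
    dist = {}
    for combo in itertools.product(*(step.items() for step in STEPS)):
        total = sum(time for time, _ in combo)
        prob = math.prod(p for _, p in combo)
        dist[total] = dist.get(total, 0.0) + prob
    return dist


def summarize(dist, limit=32):
    """Return (mean, standard deviation, P(total <= limit))."""
    mean = sum(t * p for t, p in dist.items())
    variance = sum((t - mean) ** 2 * p for t, p in dist.items())
    p_within = sum(p for t, p in dist.items() if t <= limit)
    return mean, math.sqrt(variance), p_within


if __name__ == "__main__":
    mean, sd, p32 = summarize(total_time_distribution())
    print(f"mean = {mean:.2f} min, sd = {sd:.2f} min, P(T <= 32) = {p32:.4f}")
```

The simulated standard deviation and probability in the answer are estimates, so they should be close to these exact values without matching them to every digit.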
Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 8. Team X is scheduled to play against Team Y in an upcoming game in Baseball’s World Series. Assume that each player’s point production can be represented as an integer uniform variable with the ranges provided in the following table:
Player Team X Team Y
1 [4,10] [2,4]
2 [2,6] [14,30]
3 [7,20] [2,20]
4 [3,5] [1,10]
5 [6,20] [8,20]
6 [5,10] [7,12]
7 [7,10] [14,20]
8 [12,40] [3,5]
9 [9,20] [14,25]
a. Develop a spreadsheet model that simulates the points scored by each team. b. What are the average and standard deviation of points scored by Team X? What is the shape of the distribution of points scored by Team X? c. What are the average and standard deviation of points scored by Team Y? What is the shape of the distribution of points scored by Team Y? d. Let Point Differential = Team X points – Team Y points. What is the average point differential between the two teams? What is the standard deviation in the point differential? What is the shape of the point differential distribution? e. What is the probability that the Team X scores more points than the Team Y? Answer: a. Refer to the screenshot below for the Spreadsheet Simulation Model.
b. Team X scores an average of approximately 98 points with a standard deviation of 11.25 points. The distribution of points is bell-shaped (approximately normal) as a result of the Central Limit Theorem.
c. Team Y scores an average of approximately 105.50 points with a standard deviation of 9.87 points. The distribution of points is bell-shaped (approximately normal) as a result of the Central Limit Theorem.
d. The average point differential is -7.5 points with a standard deviation of 14.95 points. The distribution of the point differential is bell-shaped (approximately normal). These observations are consistent with the fact that the difference of two independent normal random variables is normally distributed with a mean equal to the difference in the means of the two underlying random variables and a variance equal to the sum of the variances of the two underlying random variables.
e. Team X has approximately a 0.2991 probability of scoring more points than Team Y.
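The team totals in Problem 8 can be checked in Python. For an integer uniform variable on [lo, hi], the mean is (lo + hi)/2 and the variance is ((hi - lo + 1)^2 - 1)/12, so the team means and standard deviations follow directly by summing over players; only the probability that Team X outscores Team Y needs simulation. The sketch assumes all players score independently, and the function names are illustrative.

```python
import math
import random

# Integer uniform point ranges for players 1-9, read from the problem table.
TEAM_X = [(4, 10), (2, 6), (7, 20), (3, 5), (6, 20), (5, 10), (7, 10), (12, 40), (9, 20)]
TEAM_Y = [(2, 4), (14, 30), (2, 20), (1, 10), (8, 20), (7, 12), (14, 20), (3, 5), (14, 25)]


def exact_mean_sd(ranges):
    """Mean and standard deviation of a sum of independent integer uniforms."""
    mean = sum((lo + hi) / 2 for lo, hi in ranges)
    variance = sum(((hi - lo + 1) ** 2 - 1) / 12 for lo, hi in ranges)
    return mean, math.sqrt(variance)


def team_points(ranges, rng):
    """Simulate one game's total points for a team."""
    return sum(rng.randint(lo, hi) for lo, hi in ranges)


def prob_x_beats_y(trials=100_000, seed=1):
    """Monte Carlo estimate of P(Team X points > Team Y points)."""
    rng = random.Random(seed)
    wins = sum(
        team_points(TEAM_X, rng) > team_points(TEAM_Y, rng) for _ in range(trials)
    )
    return wins / trials


if __name__ == "__main__":
    print("Team X (mean, sd):", exact_mean_sd(TEAM_X))
    print("Team Y (mean, sd):", exact_mean_sd(TEAM_Y))
    print("P(X > Y):", prob_x_beats_y())
```

The standard deviation of the point differential is the square root of the sum of the two team variances, which is how the figure in part d relates to parts b and c.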
Difficulty: Challenging LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 9. A toy company designs a new toy car this season. The fixed cost to produce the car is $120,000. The variable cost, which includes raw materials, production, and shipping costs, is $40 per car. The company will sell the car for $48 each. A distributor has agreed to pay the toy company $12 for each car remaining after the retail selling season. Forecasts are for expected sales of 55,000 toy cars with a standard deviation of 9000. The normal probability distribution is assumed to be a good description of the demand. The management has tentatively decided to produce 55,000 units (the same as average demand), but it wants to conduct an analysis regarding this production quantity before finalizing the decision. a. Create a what-if spreadsheet model using formulas that relate the values of production quantity, demand, sales, revenue from sales, amount of surplus, revenue from sales of surplus, total cost, and net profit. What is the profit corresponding to average demand (55,000 units)? b. Modeling demand as a normal random variable with a mean of 55,000 and a standard deviation of 9000, simulate the sales of the toy car using a production quantity of 55,000 units. What is the estimate of the average profit associated with the production quantity of 55,000 cars? How does this compare to the profit corresponding to the average demand (as computed in part a)? c. Before making a final decision on the production quantity, management wants an analysis of a more aggressive 65,000-unit production quantity and a more conservative 45,000-unit production quantity. Run your simulation with these two production quantities. What is the mean profit associated with each? Answer: a. Profit equals $320,000 when demand is equal to its average of 55,000 units.
b. Average profit is approximately $190,742. Average profit is less than the profit corresponding to average demand. This phenomenon is often called the Flaw of Averages.
c. When ordering 45,000 units, the average profit is approximately $218,251. When ordering 65,000 units, the average profit is $18,252. Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 10. A tourist bus can accommodate 80 people and currently books up to 80 reservations. Past data shows that the tourist bus always accommodates all 80 reservations but that, on average, two people do not show up. To capture additional profit, the travel agent is considering an overbooking strategy in which he would accept 82 reservations even though the tourist bus can accommodate only 80 people. The travel agent believes that he will be able to always book all 82 reservations. The probability distribution for the number of people showing up when 82 reservations are accepted is estimated as follows:
People Showing Up 78 79 80 81 82
Probability 0.05 0.3 0.5 0.1 0.05
The travel agent receives a marginal profit of $110 for each passenger who books a reservation (regardless whether they show up). The travel agent will also incur a cost for any passenger denied seating on the bus. This cost covers added expenses of rescheduling the passenger as well as loss of goodwill, estimated to be $160 per passenger. Develop a spreadsheet simulation model for this overbooking system. Simulate the number of passengers showing up. a. What is the average net profit for each tourist bus with the overbooking strategy? b. What is the probability that the net profit with the overbooking strategy will be less than the net profit without overbooking (80*$110 = $8,800)? c. Explain how your simulation model could be used to evaluate other overbooking levels such as 81, 83, and 84 and for recommending a best overbooking strategy. Answer: a. The average net profit for each tourist bus with the overbooking strategy is $8,988. b. There is a .05 probability that the overbooking strategy will result in less than $8,800 net profit (the net profit resulting from no overbooking).
c. The same spreadsheet design can be used to simulate other overbooking strategies, including accepting 81, 83, and 84 passenger reservations. In each case, the travel agent needs to estimate the distribution of the number of passengers showing up and rerun the simulation model. This would enable the agency to evaluate the other overbooking alternatives and determine the most beneficial overbooking policy.
Alternatively, the distribution of passengers showing up could be modeled as a binomial random variable in which n = the reservation limit and p = the probability that an individual passenger boards the bus. Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 11. The manager of a company decides to host a party to celebrate a promotion and has invited 50 guests for dinner. The following table contains information on the number of RSVP’ed guests. He assumes that the 12 invitees who responded with 0 guests will not turn up. He also estimates that each of the 13 guests planning to come solo has a 65 percent chance of attending alone, a 30 percent chance of not attending, and a 5 percent chance of bringing a companion. For each of the 20 guests who plan to bring a companion, there is a 75 percent chance that she or he will attend with a companion, a 10 percent chance of attending solo, and a 15 percent chance of not attending at all. For the 5 people who have not responded, the manager assumes that there is an 85 percent chance that each will not attend, a 10 percent chance they will attend alone, and a 5 percent chance they will attend with a companion.
Guests 0 1 2 No response
Number of Invitations 12 13 20 5
a. Assist the manager by constructing a spreadsheet simulation model to determine the expected number of guests who will attend the party. b. Use the Monte Carlo simulation model to determine X, the minimum number of guests for whom the dinner needs to be ordered, so that there is at least a 95 percent chance that the actual attendance is less than or equal to X. What is the best estimate for the value of X? Answer: a. The expected number of attendees is 42.75 ≈ 43. b. P(actual attendance ≤ 49) ≈ 96%. So, 49 guests would be a relatively safe number on which to base dinner preparations.
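The party attendance model can be sketched in Python as follows. This is a minimal sketch, not the textbook's spreadsheet: it assumes every invitation behaves independently and that the 12 invitees who responded with 0 guests never attend. Each group is described by the probabilities of 0, 1, or 2 people arriving per invitation; the names `GROUPS`, `simulate`, and `smallest_order` are illustrative.

```python
import random

# (number of invitations, probabilities of 0, 1, or 2 attendees per invitation)
GROUPS = [
    (12, (1.00, 0.00, 0.00)),  # responded with 0 guests: assumed not to attend
    (13, (0.30, 0.65, 0.05)),  # plan to come solo
    (20, (0.15, 0.10, 0.75)),  # plan to bring a companion
    (5, (0.85, 0.10, 0.05)),   # no response
]


def expected_attendance():
    """Exact expected attendance, summed over invitation groups."""
    return sum(n * (p1 + 2 * p2) for n, (_, p1, p2) in GROUPS)


def simulate(trials=50_000, seed=1):
    """Return one simulated total attendance per trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0
        for n, probs in GROUPS:
            total += sum(rng.choices((0, 1, 2), weights=probs, k=n))
        totals.append(total)
    return totals


def smallest_order(totals, coverage=0.95):
    """Smallest X with an estimated P(attendance <= X) of at least `coverage`."""
    n = len(totals)
    for x in range(min(totals), max(totals) + 1):
        if sum(t <= x for t in totals) / n >= coverage:
            return x
    return max(totals)


if __name__ == "__main__":
    totals = simulate()
    print("expected attendance:", expected_attendance())
    print("smallest X with at least 95% coverage:", smallest_order(totals))
```

The same `simulate` output can be reused to report the estimated coverage for any other order size the manager wants to consider.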
Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 12. A distributor has generated a rough estimate of aftershave demand at its retail store. The distributor is confident that demand will range from 100 to 650. The following table lists weights for demand values within this range.
Demand Weight
230 0.15
330 0.25
430 0.35
530 0.25
The distributor pays a wholesale price of $21 per aftershave and then sells at a retail price of $31. a. Construct a spreadsheet model that computes net profit corresponding to a given level of demand and specified order quantity. Model demand as a random variable with ASP’s custom general distribution. b. Using simulation optimization, determine the order quantity that maximizes expected profit. What is the probability of running out of aftershave at this order quantity?
c. How many aftershaves does the distributor need to order so that the probability of running out of aftershaves is only 25 percent? How much expected profit will the distributor lose if he orders this amount rather than the amount from part b? Answer: a. Refer to the screenshot below for the Spreadsheet Model.
b. An order quantity of about 345 maximizes the expected profit at a value of approximately $2,638. There is a 68% chance of running out of aftershaves at this order quantity. c. As shown in the figure below, the 75th percentile of demand is 489. Therefore, an order quantity of 489 ensures only a 25% chance of running out of aftershaves. Re-running the simulation with this order quantity results in an expected profit of $1,653. Thus, the distributor will lose $985 in expected profit if it decides to achieve a 75% service level rather than maximizing expected profit.
Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 13. Consider the table below with information regarding each activity, immediate predecessors, and duration estimates (in minutes) for each activity.
Activity  Immediate Predecessors  Minimum Time  Likely Time  Maximum Time
A         —                       6             7            9
B         —                       4             9            11
C         A                       3             9            12
D         B                       2             6            12
E         B, C                    3             5            10
F         D                       4             7            9
G         E                       4             7            15
a. Using the PERT distribution in ASP to represent the duration of each activity, construct a simulation model to compute the total time to complete the task. b. What is the expected duration of the entire project? What is the standard deviation of the project duration? c. What is the likelihood that the project will be complete in 26 minutes? Answer: a. Refer to the screenshot below for the Spreadsheet Simulation Model.
b. The expected project duration is 29.03 minutes. The standard deviation of the project duration is 2.9 minutes. c. The project has an estimated 0.1528 probability of being completed within 26 minutes. Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC:
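The project network in Problem 13 can be sketched in Python. The sketch assumes the standard beta-PERT form, in which a duration with minimum a, most likely value m, and maximum b is a + (b - a) × Beta(α, β) with α = 1 + 4(m - a)/(b - a) and β = 1 + 4(b - m)/(b - a). Whether ASP's PERT distribution uses exactly this parameterization is an assumption, as is independence between activities. The precedence reading is C after A, D after B, E after B and C, F after D, and G after E; the function names are illustrative.

```python
import random
import statistics

# activity: (immediate predecessors, minimum, most likely, maximum), in minutes
ACTIVITIES = {
    "A": ((), 6, 7, 9),
    "B": ((), 4, 9, 11),
    "C": (("A",), 3, 9, 12),
    "D": (("B",), 2, 6, 12),
    "E": (("B", "C"), 3, 5, 10),
    "F": (("D",), 4, 7, 9),
    "G": (("E",), 4, 7, 15),
}


def pert_sample(rng, low, likely, high):
    """Draw from a beta-PERT distribution (shape weight of 4, assumed)."""
    alpha = 1 + 4 * (likely - low) / (high - low)
    beta = 1 + 4 * (high - likely) / (high - low)
    return low + (high - low) * rng.betavariate(alpha, beta)


def project_duration(rng):
    """One simulated completion time; activities are listed in precedence order."""
    finish = {}
    for name, (preds, low, likely, high) in ACTIVITIES.items():
        start = max((finish[p] for p in preds), default=0.0)
        finish[name] = start + pert_sample(rng, low, likely, high)
    return max(finish.values())


def simulate(trials=50_000, seed=1, deadline=26.0):
    """Return (mean, standard deviation, P(duration <= deadline))."""
    rng = random.Random(seed)
    durations = [project_duration(rng) for _ in range(trials)]
    p_on_time = sum(d <= deadline for d in durations) / trials
    return statistics.fmean(durations), statistics.stdev(durations), p_on_time


if __name__ == "__main__":
    mean, sd, p26 = simulate()
    print(f"mean = {mean:.2f} min, sd = {sd:.2f} min, P(T <= 26) = {p26:.4f}")
```

Under these assumptions the path A-C-E-G dominates the other paths in almost every trial, which is why the project mean sits close to the sum of that path's expected activity times.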
14. A specialty hedge fund is considering the purchase of a Jackson Pollock painting. It estimates the value of the painting to be $185 million. In an auction, both the number of competing bids and the amount of the competing bids are uncertain. The hedge fund has maintained a file summarizing 10 recent art auctions that it believes are similar to the upcoming auction. It is considering a bid of $163 million and would like to evaluate its chances of winning the upcoming auction with this bid.
Company 1 2 3 4 5 6 7 8 9 10
1 0.817 0.771 0.804 0.880 0.890 0.851 0.881 0.804 0.819 0.756
Bid Amount (As Fraction of Estimated Share Value) 2 5 6 3 4 0.884 0.756 0.863 0.825 0.819 0.851 0.786 0.851 0.786 0.851 0.756 0.874 0.877 0.910 0.804 0.819 0.860 0.880 0.880 0.786 0.896 0.784 0.792 0.792 0.786 0.804 0.819 0.819 0.860 0.880 0.773 0.896 0.877 0.860 0.784 0.819 0.804 0.786 0.786 0.819 0.885
7
0.824 0.880
a. Construct a spreadsheet simulation model to determine the likelihood of the hedge fund winning the auction. Use a discrete uniform distribution between the minimum and maximum number of bidders in the 10 observed auctions to model the number of bidders in the Jackson Pollock auction. Fit a realistic distribution to the bid data to generate values of competing bid amounts. b. For a bid amount of $163 million, estimate the probability of the hedge fund winning the auction. Answer: a. Refer to the screenshot below for the Spreadsheet Simulation Model.
b. The probability of the hedge fund winning the auction at a bid amount of $163 million is 0.34. Difficulty: Challenging LO: 11.3, Pages 495-502
Bloom’s: Application BUSPROG: Analytic DISC: 15. A specialty hedge fund is considering the purchase of a Jackson Pollock painting. It estimates the value of the painting to be $185 million. In an auction, both the number of competing bids and the amount of the competing bids are uncertain. The hedge fund has maintained a file summarizing 10 recent art auctions that it believes are similar to the upcoming auction. It is considering a bid of $163 million and would like to evaluate its chances of winning the upcoming auction with this bid.
Company 1 2 3 4 5 6 7 8 9 10
1 0.817 0.771 0.804 0.880 0.890 0.851 0.881 0.804 0.819 0.756
Bid Amount (As Fraction of Estimated Share Value) 2 5 6 3 4 0.884 0.756 0.863 0.825 0.819 0.851 0.786 0.851 0.786 0.851 0.756 0.874 0.877 0.910 0.804 0.819 0.860 0.880 0.880 0.786 0.896 0.784 0.792 0.792 0.786 0.804 0.819 0.819 0.860 0.880 0.773 0.896 0.877 0.860 0.784 0.819 0.804 0.786 0.786 0.819 0.885
7
0.824 0.880
a. Construct a spreadsheet simulation model for this auction. Use a discrete uniform distribution between the minimum and maximum number of bidders in the 10 observed auctions to model the number of bidders in the Jackson Pollock auction. Fit a realistic distribution to the bid data to generate values of competing bid amounts. Use ASP to apply simulation optimization to determine the hedge fund’s bid amount that maximizes the expected return = P(winning auction)*(185 – bid amount). Hint: Placing reasonable bounds on the highest and lowest possible bid amount will greatly assist the optimization algorithm. b. What is the probability that the hedge fund wins the auction if it bids the amount that maximizes its expected return? Answer: a. Refer to the screenshot below for the Spreadsheet Simulation Model.
A bid amount of about $168.35 million maximizes the expected return on the auction at a value of approximately $16.65 million. b. A bid amount of $168.35 million has nearly a 100% chance of winning the auction. This is because the maximum competitor bid is .91*185 = $168.35 million. Difficulty: Moderate LO: 11.3, Pages 495-518 Bloom’s: Application BUSPROG: Analytic DISC: 16. A store is offering a discount on 800 pairs of basketball shoes. The amount of the discount varies and is not revealed to the customer until paying for the shoes. The distribution of discounts is given in the table below:
Discount Rate (%) 5 20 35 50 65 90
Number of Tags 250 220 120 80 70 60
How many pairs of shoes does a customer have to buy so that, on average, he has purchased five containing a 65% or 90% discount? (Hint: Use the hypergeometric distribution in ASP to answer this question.) Answer: A customer needs to buy 31 pairs of shoes in order to have purchased an average of at least five pairs carrying a 65% or 90% discount.
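The answer to Problem 16 follows from the mean of the hypergeometric distribution: when n pairs are drawn without replacement from N = 800 tags of which K = 130 carry a 65% or 90% discount, the expected number of high-discount pairs is nK/N. The sketch below computes that mean both from the formula and from the probability mass function, then searches for the smallest n that reaches five. It uses exact arithmetic in place of ASP, and the function names are illustrative.

```python
import math

TAGS = {5: 250, 20: 220, 35: 120, 50: 80, 65: 70, 90: 60}  # discount %: tag count


def counts(min_rate):
    """Return (population size N, number of tags with discount >= min_rate)."""
    population = sum(TAGS.values())
    successes = sum(n for rate, n in TAGS.items() if rate >= min_rate)
    return population, successes


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k high-discount pairs among `draws` pairs bought)."""
    return (math.comb(successes, k) * math.comb(population - successes, draws - k)
            / math.comb(population, draws))


def expected_high_discount_pairs(draws, min_rate=65):
    """Mean of the hypergeometric distribution, summed from its pmf."""
    population, successes = counts(min_rate)
    return sum(k * hypergeom_pmf(k, population, successes, draws)
               for k in range(min(draws, successes) + 1))


def pairs_needed(target=5, min_rate=65):
    """Smallest number of pairs whose expected high-discount count reaches target."""
    draws = 0
    while expected_high_discount_pairs(draws, min_rate) < target:
        draws += 1
    return draws


if __name__ == "__main__":
    n = pairs_needed()
    print(n, "pairs give an expected", expected_high_discount_pairs(n))
```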
Difficulty: Moderate LO: 11.4, Pages 518-524 Bloom’s: Application BUSPROG: Analytic DISC: 17. A store is offering a discount on 800 pairs of basketball shoes. The amount of the discount varies and is not revealed to the customer until paying for the shoes. The distribution of discounts is given in the table below:
Discount Rate (%) 5 20 35 50 65 90
Number of Tags 250 220 120 80 70 60
Use the negative binomial distribution to approximate the average number of pairs of shoes that a customer has to buy before purchasing two pairs with a discount of at least 50%. Answer: On average, a customer will buy 5.62 pairs of shoes with less than a 50% discount before having purchased two pairs of shoes with a 50%, 65%, or 90% discount. Thus, on average, a customer purchases 5.62 + 2 = 7.62 pairs of shoes to obtain two pairs with a discount of at least 50%.
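The negative binomial calculation in Problem 17 can be sketched in Python. With success probability p = 210/800 (tags with at least a 50% discount) and r = 2 required successes, the expected number of failures before the r-th success is r(1 - p)/p. The sketch treats purchases as independent draws with a constant p, which is the same approximation the negative binomial makes for sampling without replacement from 800 tags; the function names are illustrative.

```python
import random

TAGS = {5: 250, 20: 220, 35: 120, 50: 80, 65: 70, 90: 60}  # discount %: tag count


def success_probability(min_rate=50):
    """Share of tags with a discount of at least `min_rate` percent."""
    return sum(n for rate, n in TAGS.items() if rate >= min_rate) / sum(TAGS.values())


def expected_failures(successes_needed=2, min_rate=50):
    """Negative binomial mean: lower-discount pairs bought before the r-th success."""
    p = success_probability(min_rate)
    return successes_needed * (1 - p) / p


def simulate_total_pairs(trials=100_000, seed=1, successes_needed=2):
    """Average total pairs bought until the required successes are reached."""
    rng = random.Random(seed)
    p = success_probability()
    total = 0
    for _ in range(trials):
        successes = bought = 0
        while successes < successes_needed:
            bought += 1
            if rng.random() < p:  # constant p: draws treated as independent
                successes += 1
        total += bought
    return total / trials


if __name__ == "__main__":
    failures = expected_failures()
    print(f"expected lower-discount pairs: {failures:.2f}")
    print(f"expected total pairs: {failures + 2:.2f}")
    print(f"simulated total pairs: {simulate_total_pairs():.2f}")
```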
Difficulty: Moderate LO: 11.4, Pages 518-524 Bloom’s: Application BUSPROG: Analytic DISC: 18. An entrepreneur who operates a cellular phone store orders inventory of cell phones based on four internal memory specifications – 8 GB, 16 GB, 32 GB, and 64 GB. She wants to evaluate her inventory ordering policy for the phones with four different amounts of internal memory. Any phone unsold at the end of a period is kept in inventory for the next sales period and incurs a holding cost expressed as 10% of the cost per unit per period. If demand during a period exceeds supply for a phone, then sale is lost. The data on the cost and selling prices of the cell phones categorized by these memory specifications are known, and representative data on the past sales are also available. Internal Memory 8 GB 16 GB 32 GB 64 GB
Cost ($) 420 500 590 670
Selling Price ($) 430 520 620 700
Past Sales: Period 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
8 GB 276 302 299 261 260 306 227 228 205 236 279 303 266 263 210 247 314 283 283 252
16 GB 274 211 276 258 244 252 213 217 215 237 251 271 252 306 250 230 285 304 245 237
32 GB 327 264 306 232 278 262 290 308 246 272 301 271 276 285 328 317 289 312 294 262
64 GB 332 329 239 277 220 252 253 250 277 284 282 282 317 310 253 340 332 256 272 323
a. Construct a spreadsheet simulation model to estimate the total profit the entrepreneur earns in a period when ordering 300 units of each cell phone. To model the respective cell phone demands, fit a realistic distribution to the sales data. What is the average total profit? What is the estimated likelihood that the entrepreneur makes less than $10,000 next period? b. Using Spearman rank correlation, compute the correlations between the demands for the cell phones based on the four memory specifications. Incorporate a correlation matrix to capture the interrelationships between the demands for each cell phone type. What is the average total profit? What is the estimated likelihood that the entrepreneur makes less than $10,000 next period? Comparing these answers to (a), conclude how the correlated demand affects the model’s implications. Answer: a. The spreadsheet simulation model is shown below. The average total profit earned by the entrepreneur is about $17,515. Also, the estimated probability that the entrepreneur makes less than $10,000 next period is 0.0278.
b. The calculation of ranks, using sales data, for computing the correlations between the demands for the cell phones based on the four memory specifications is given below:
The average total profit earned by the entrepreneur is about $17,515. Also, there is a 0.0435 probability that the total profit made by the entrepreneur is less than $10,000 next period. Comparing these results with the answers in part (a), we see that considering the correlation between demands does not have a large impact on the profit estimation, but correlated demand does suggest a slightly higher chance of more extreme outcomes.
Difficulty: Moderate LO: 11.4 and Appendix 11.1, Pages 518-524 and 537-540 Bloom’s: Application BUSPROG: Analytic DISC: 19. A branded store has outlets around the world that generate profit in the British pound, the New Zealand kiwi, and the Japanese yen. At the end of each quarter, the store converts the revenue from these three international outlets back into U.S. dollars, exposing itself to exchange rate risk. The current exchange rates are US$1.56 per £1, US$0.85 per NZD$1, and US$0.02 per ¥1. The management of the store wants to construct a simulation model to assess its vulnerability to uncertain exchange rate fluctuations. The quarterly profits earned in British pounds, New Zealand kiwis, and Japanese yen are £150,000, NZD$200,000, and ¥9,000,000, respectively. The data is given below (60 observations per exchange rate, listed in table order).
USD/GBP: 6.85% 2.74% 4.60% 4.73% 2.99% 6.62% 8.78% -0.84% 3.91% 6.38% 1.86% -0.36% -3.73% -0.87% 2.67% 4.51% 6.57% 2.30% 2.90% 3.71% 2.91% 3.91% -1.47% -0.01% -0.37% -17.09% -8.32% 8.58% 8.02% 3.22% -3.20% 2.96% 3.34% 4.32% 5.92% 2.41% 2.44% 2.09% 3.04% -0.11% 1.62% 3.39% 8.36% 1.93% 0.23% 0.17% 2.03% 2.66% 3.71% 0.13% 2.02% 1.34% 1.05% 2.33% 2.99% 4.20% 1.81% -1.13% -1.79% 3.78%
USD/NZD: 4.25% 6.96% 13.54% 6.42% 3.27% 7.41% 10.64% -5.30% 7.04% 8.44% 5.01% 0.22% 0.41% 3.73% -0.31% -4.64% 3.28% 9.32% 4.39% 7.45% 1.16% 5.73% 5.18% 0.47% -7.40% -17.84% -5.49% 15.48% 14.63% 11.91% -2.46% 4.21% 4.85% 3.97% 4.99% 6.19% 7.99% 2.77% 0.27% 5.53% 3.78% 4.44% 2.71% -0.92% 2.67% -4.61% -0.93% -3.25% -4.95% -2.08% 3.74% 5.73% -5.03% 7.73% 7.79% 5.74% 1.54% -1.63% 1.50% -0.48%
USD/JPY: 8.84% -1.83% 6.38% 3.59% 2.63% 9.23% 2.88% -1.84% 2.25% 5.22% 2.35% -0.68% -2.14% -2.97% 2.64% 4.86% 1.49% 1.66% 2.34% -0.63% 6.10% 4.66% 8.43% 3.82% 0.39% 12.47% 6.72% -0.85% 5.80% 7.38% 0.31% 0.67% 6.23% 7.53% 0.28% 6.75% 15.23% -9.44% -3.46% -2.33% -1.30% 1.73% -0.52% -3.51% 1.62% 3.39% -2.95% 1.19% -4.55% -0.15% 18.55% 4.33% -0.45% 3.39% 7.23% 4.89% -2.70% -0.68% -1.37% 3.48%
a. If exchange rates stay at their current values, what is the total quarterly profit in U.S. dollars? b. Model the uncertainty in the quarterly changes of the exchange rates between U.S. dollars and British pounds, New Zealand kiwis, and Japanese yen using a SLURP. Use your simulation model to estimate the average total quarterly profit in U.S. dollars. What is the probability that the total quarterly profit will be lower than the answer in part a? Answer: a. If exchange rates do not change, total revenue in U.S. dollars will be 150,000 × 1.56 + 200,000 × 0.85 + 9,000,000 × 0.02 = 234,000 + 170,000 + 180,000 = $584,000. b. Expected total revenue in U.S. dollars is about $599,000. The probability that the total revenue in U.S. dollars will be less than $584,000 is 0.28. Thus, the store faces significant downside risk due to exchange rate fluctuations.
Difficulty: Moderate LO: 11.4, Pages 518-524 Bloom’s: Application BUSPROG: Analytic DISC: 20. A company has improved its anti-virus program and has released a new version. Assume there is a 0.6 probability that the users of this anti-virus will upgrade the version in any particular year. That is, the upgrade year of the user is a geometric random variable. The revenue generated from the upgrade (when it occurs) follows a normal distribution with a mean of $80,000 and a standard deviation of $22,000. a. Complete a simulation model that analyzes the net present value of the revenue from the user upgrade. Use an annual discount rate of 8 percent. b. What is the average net present value earned by the company? c. What is the standard deviation of net present value? Answer: a.
b. Average NPV is about $76,253. c. Standard deviation of NPV is about $21,816. Difficulty: Moderate LO: 11.4, Pages 518-524 Bloom’s: Application BUSPROG: Analytic
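The spreadsheet model for part a is not reproduced in this text. As a rough illustration only, the simulation in problem 20 could be sketched in Python as follows. The sketch assumes the upgrade year is counted from 0 (the number of non-upgrade years before the upgrade), which is an assumption and not something the problem states; the function name, trial count, and seed are likewise hypothetical. Under that assumption the simulated mean and standard deviation land near the figures quoted in parts b and c.

```python
import random
import statistics


def simulate_upgrade_npv(trials=100_000, p_upgrade=0.6, mean_revenue=80_000.0,
                         sd_revenue=22_000.0, discount_rate=0.08, seed=42):
    """Monte Carlo sketch of the NPV of the upgrade revenue.

    Assumption (not stated in the problem): the upgrade year starts at 0,
    i.e. it equals the number of years without an upgrade before the upgrade.
    """
    rng = random.Random(seed)
    npvs = []
    for _ in range(trials):
        # Geometric draw: count failed years until the first upgrade.
        year = 0
        while rng.random() >= p_upgrade:
            year += 1
        # Revenue at the time of upgrade is normally distributed.
        revenue = rng.gauss(mean_revenue, sd_revenue)
        # Discount the revenue back to the present.
        npvs.append(revenue / (1 + discount_rate) ** year)
    return statistics.mean(npvs), statistics.stdev(npvs)


mean_npv, sd_npv = simulate_upgrade_npv()
print(f"Average NPV: {mean_npv:,.0f}")
print(f"Std. dev. of NPV: {sd_npv:,.0f}")
```

Because the result depends on random draws, the exact values change with the seed and the number of trials.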
Chapter 12: Decision Analysis 1. An uncertain future event affecting the consequence associated with a decision is known as a _____. a. chance event b. decision alternative c. decision node d. payoff Answer: A Difficulty: Easy LO: 12.1, Page 552 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An uncertain future event affecting the consequence, or payoff, associated with a decision is known as a chance event. 2. A(n) _____ refers to the result obtained when a decision alternative is chosen and a chance event occurs. a. payoff table b. outcome c. state of nature d. node Answer: B Difficulty: Easy LO: 12.1, Page 552 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An outcome refers to the result obtained when a decision alternative is chosen and a chance event occurs. 3. _____ are possible outcomes for chance events that affect the consequences associated with a decision alternative. a. Payoffs b. Forecasts c. Decision trees d. States of nature Answer: D Difficulty: Easy LO: 12.1, Page 552 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: States of nature are possible outcomes for chance events that affect the payoff associated with a decision alternative.
4. No more than one state of nature can occur at a given time for a chance event. This indicates that the states of nature are defined such that they are _____. a. collectively exhaustive b. mutually exclusive c. independent outcomes d. conservative events Answer: B Difficulty: Easy LO: 12.1, Page 552 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The states of nature are defined so that they are mutually exclusive (no more than one can occur). 5. The states of nature are defined so that they are _____. This means that at least one state of nature must occur at a given time for a chance event. a. collectively exhaustive b. mutually exclusive c. certain events d. optimistic outcomes Answer: A Difficulty: Easy LO: 12.1, Page 552 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The states of nature are defined so that they are mutually exclusive (no more than one can occur) and collectively exhaustive (at least one must occur). 6. A measure of the outcome of a decision such as profit, cost, or time is known as a _____. a. branch b. payoff c. regret d. forecasting index Answer: B Difficulty: Easy LO: 12.1, Page 553 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A payoff is a measure of the outcome of a decision such as profit, cost, or time. 7. _____ refer to graphical representations of the decision problems that show the sequential nature of the decision-making process.
a. Influence diagrams b. Utility functions c. Decision trees d. Payoff tables
Answer: C Difficulty: Easy LO: 12.1, Page 553 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A decision tree is a graphical representation of the decision problem that shows the sequential nature of the decision-making process. 8. Which of the following is true of decision trees when used to solve a complex problem? a. They provide a useful way to decompose the problem. b. They are used to compute a decision maker’s risk tolerance. c. They can be converted into truth tables. d. They can be used only when the decision maker is risk neutral. Answer: A Difficulty: Moderate LO: 12.1, Page 554 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: The first step in solving a complex problem is to decompose the problem into a series of smaller subproblems. Decision trees provide a useful way to decompose a problem and illustrate the sequential nature of the decision process. 9. An intersection or junction point of a decision tree is called a(n) _____. a. node b. stem c. intercept d. branch Answer: A Difficulty: Easy LO: 12.1, Page 553 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: An intersection or junction point of a decision tree is called a node. 10. Chance nodes are a. nodes provided at the end of the states-of-nature branches. b. nodes indicating points where an uncertain event will occur. c. nodes provided at the end of the decision alternative branches where a payoff is shown. d. nodes indicating points where a decision is made.
Answer: B Difficulty: Easy LO: 12.1, Page 553 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: Chance nodes indicate points where an uncertain event will occur. 11. Lines showing the alternatives from decision nodes and the outcomes from chance nodes are called _____. a. weights b. payoffs c. diagonals d. branches Answer: D Difficulty: Easy LO: 12.1, Page 553 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Lines showing the alternatives from decision nodes and the outcomes from chance nodes are called branches. 12. The _____ approach evaluates each decision alternative in terms of the best payoff that can occur. a. conservative b. optimistic c. maximin regret d. expected value Answer: B Difficulty: Easy LO: 12.2, Page 554 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The optimistic approach evaluates each decision alternative in terms of the best payoff that can occur. The decision alternative that is recommended is the one that provides the best possible payoff. 13. For a maximization problem, the optimistic approach often is referred to as the _____ approach. a. minimin b. maximin c. minimax d. maximax Answer: D Difficulty: Moderate
LO: 12.2, Page 554 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: For a maximization problem, the optimistic approach often is referred to as the maximax approach. 14. For a minimization problem, the optimistic approach often is referred to as the _____ approach. a. maximin b. minimax c. minimin d. maximax Answer: C Difficulty: Easy LO: 12.2, Page 554 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: For a minimization problem, the optimistic approach often is referred to as the minimin approach. 15. Choosing a decision alternative that maximizes the minimum profit is a feature of the _____ approach. a. expected value b. optimistic c. conservative d. maximin regret Answer: C Difficulty: Easy LO: 12.2, Page 555 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The conservative approach evaluates each decision alternative in terms of the worst payoff that can occur. It leads to choosing the decision alternative that provides the best of the worst possible payoffs. 16. For a maximization problem, the conservative approach often is referred to as the _____ approach. a. maximin b. maximax c. minimax d. minimin Answer: A Difficulty: Easy LO: 12.2, Page 555
Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: For a maximization problem, the conservative approach often is referred to as the maximin approach. 17. For a minimization problem, the conservative approach often is referred to as the _____ approach. a. maximin b. minimax c. minimin d. maximax Answer: B Difficulty: Easy LO: 12.2, Page 555 Bloom’s: Comprehension BUSPROG: Analytic DISC: Feedback: For a minimization problem, the conservative approach often is referred to as the minimax approach. Reference - 12.1: Use the payoff table given below for a maximization problem to answer questions 18-19.
Payoff Table
Decision Alternative    State of nature 1    State of nature 2
D1                       5                    7
D2                      –4                    1
D3                       1                   –3
D4                      10                    2
D5                       6                    4
18. Reference - 12.1: Which is the recommended decision alternative using the optimistic approach? a. D1 b. D4 c. D2 d. D5 Answer: B Difficulty: Moderate LO: 12.2, Page 555 Bloom’s: Application BUSPROG: Analytic DISC:
Feedback: For a maximization problem, the optimistic approach advocates choosing the decision alternative with the maximum payoff, which is given by the decision alternative D4. 19. Reference - 12.1: Which is the recommended decision alternative using the conservative approach? a. D1 b. D5 c. D2 d. D3 Answer: A Difficulty: Moderate LO: 12.2, Page 555 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The conservative approach evaluates each decision alternative in terms of the worst payoff that can occur. The decision alternative recommended is the one that provides the best of the worst possible payoffs. For this maximization problem, the conservative approach recommends decision alternative D1 that maximizes the minimum payoff. 20. The amount of loss (lower profit or higher cost) from not making the best decision for each state of nature is known as _____. a. best payoff b. opportunity loss c. risk profile d. utility Answer: B Difficulty: Easy LO: 12.2, Page 556 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The amount of loss (lower profit or higher cost) from not making the best decision for each state of nature is known as regret or opportunity loss. 21. The minimax regret approach is a. purely optimistic. b. purely conservative. c. both purely optimistic and purely conservative. d. neither purely optimistic nor purely conservative. Answer: D Difficulty: Moderate LO: 12.2, Page 556 Bloom’s: Comprehension BUSPROG: Analytic DISC:
Feedback: The minimax regret approach is neither purely optimistic nor purely conservative. 22. For a particular maximization problem, the payoff for best decision alternative is $15.7 million while the payoff for one of the other alternatives is $12.9 million. The regret associated with the alternate decision would be a. $28.6 million. b. $15.7 million. c. $0.129 million. d. $2.8 million. Answer: D Difficulty: Moderate LO: 12.2, Page 556 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: In decision analysis, the regret for a particular decision alternative is the absolute difference between the payoff associated with the best decision alternative and the payoff associated with the chosen decision alternative. In this case, the regret would be $15.7 million – $12.9 million = $2.8 million. 23. The weighted average of the payoffs for a chance node is known as the_____. a. median value b. variance of the node c. risk measure d. expected value Answer: D Difficulty: Easy LO: 12.3, Page 557 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The weighted average of the payoffs for a chance node is known as the expected value. 24. Which of the following tools is used to create decision trees in Excel? a. MegaStat b. Analytic Solver Platform c. Arena d. Data Analysis Answer: B Difficulty: Easy LO: 12.3, Page 558 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Analytic Solver Platform is used to create decision trees in Excel.
25. _____ is the study of the possible payoffs and probabilities associated with a decision alternative or a decision strategy in the face of uncertainty. a. Risk analysis b. Cost analysis c. Certainty analysis d. Optimization Answer: A Difficulty: Easy LO: 12.3, Page 559 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Risk analysis is the study of the possible payoffs and probabilities associated with a decision alternative or a decision strategy in the face of uncertainty. 26. The study of how changes in the probability assessments for the states of nature or changes in the payoffs affect the recommended decision alternative is known as _____. a. uncertainty analysis b. cost analysis c. sensitivity analysis d. probability analysis Answer: C Difficulty: Easy LO: 12.3, Page 560 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Sensitivity analysis is the study of how changes in the probability assessments for the states of nature or changes in the payoffs affect the recommended decision alternative. 27. New information obtained through research or experimentation that enables an updating or revision of the state-of-nature probabilities is known as _____. a. joint probability b. sample information c. conditional probability d. expected utility Answer: B Difficulty: Easy LO: 12.4, Page 561 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: New information obtained through research or experimentation that enables an updating or revision of the state-of-nature probabilities is known as sample information.
28. _____ refer to the probabilities of the states of nature after revising the prior probabilities based on sample information. a. Preliminary probabilities b. Perfect probabilities c. Joint probabilities d. Posterior probabilities Answer: D Difficulty: Easy LO: 12.4, Page 561 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Posterior probabilities refer to the probabilities of the states of nature after revising the prior probabilities based on sample information. 29. What would be the value added by a market analysis undertaken, if the expected value with sample information is $8.56 million and the expected value without sample information is $6.39 million? a. $8.56 million b. $6.39 million c. $2.17 million d. $14.95 million Answer: C Difficulty: Moderate LO: 12.4, Page 566 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The difference between the expected value associated with sample information about the states of nature and the expected value associated without sample information about the states of nature gives the required expected value of sample information as $2.17 million. In other words, conducting the market research study adds a value of $2.17 million. 30. A special case of sample information where the information tells the decision maker exactly which state of nature is going to occur is known as_____ information. a. perfect b. mutual c. conditional d. prior Answer: A Difficulty: Easy LO: 12.4, Page 567 Bloom’s: Knowledge BUSPROG: Analytic DISC:
Feedback: A special case of sample information where the information tells the decision maker exactly which state of nature is going to occur is known as perfect information. 31. _____ refers to the probability of one event, given the known outcome of a (possibly) related event. a. Joint probability b. A priori probability c. Decisive probability d. Conditional probability Answer: D Difficulty: Easy LO: 12.5, Page 568 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Conditional probability refers to the probability of one event, given the known outcome of a (possibly) related event. 32. The probabilities of both sample information and a particular state of nature occurring simultaneously are termed as _____. a. joint probabilities b. probable probabilities c. preliminary probabilities d. posterior probabilities Answer: A Difficulty: Easy LO: 12.5, Page 570 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The probabilities of both sample information and a particular state of nature occurring simultaneously are termed as joint probabilities. 33. Bayes’ theorem a. can be used only for cases where conditional probabilities are unknown. b. cannot be used to calculate posterior probabilities. c. enables the use of sample information to revise prior probabilities. d. is useful for determining optimal decisions without requiring knowledge of probabilities of the states of nature. Answer: C Difficulty: Moderate LO: 12.5, Pages 568-571 Bloom’s: Comprehension BUSPROG: Analytic DISC:
Feedback: Bayes’ theorem is a theorem that enables the use of sample information to revise prior probabilities. Reference - 12.2: Use the data below and Bayes’ theorem to answer questions 34-35.
States of Nature (Sj)    Prior Probabilities P(Sj)    Conditional Probabilities P(U|Sj)
S1                       0.65                         0.75
S2                       0.20                         0.35
S3                       0.15                         0.20
Total                    1.00
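The Bayes’ theorem calculation behind the Reference 12.2 data can be sketched in a few lines of Python. This is only an illustrative sketch; the function name bayes_update is hypothetical. It multiplies each prior by its conditional probability to get the joint probabilities, sums them to get P(U), and divides each joint probability by that sum to get the posterior probabilities.

```python
def bayes_update(priors, conditionals):
    """Return (joint probabilities, P(U), posterior probabilities)."""
    # Joint probability P(U and Sj) = P(Sj) * P(U | Sj)
    joints = [p * c for p, c in zip(priors, conditionals)]
    # Marginal probability of the sample result U
    p_u = sum(joints)
    # Posterior probability P(Sj | U) = P(U and Sj) / P(U)
    posteriors = [j / p_u for j in joints]
    return joints, p_u, posteriors


priors = [0.65, 0.20, 0.15]        # P(S1), P(S2), P(S3)
conditionals = [0.75, 0.35, 0.20]  # P(U|S1), P(U|S2), P(U|S3)

joints, p_u, posteriors = bayes_update(priors, conditionals)
print([round(j, 4) for j in joints])
print(round(p_u, 4))
print([round(p, 2) for p in posteriors])
```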
34. Reference - 12.2: What would be the joint probabilities, P(U ∩ Sj)? a. 0.83, 0.12, and 0.05 b. 0.49, 0.07, and 0.03 c. 0.47, 0.49, and 0.04 d. 1.00, 0.59, and 1.00 Answer: B Difficulty: Moderate LO: 12.5, Pages 570-571 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: The joint probabilities are computed by multiplying the prior probability values by the corresponding conditional probability values. Thus, the required probabilities are 0.49 (0.4875 rounded to 2 decimal places), 0.07, and 0.03, respectively. 35. Reference - 12.2: Which of the following would be the posterior probabilities, P(Sj|U)? a. 0.83, 0.12, and 0.05 b. 0.49, 0.07, and 0.03 c. 0.47, 0.49, and 0.04 d. 1.00, 0.59, and 1.00 Answer: A Difficulty: Moderate LO: 12.5, Pages 570-571 Bloom’s: Application BUSPROG: Analytic DISC: Feedback: To obtain the revised or posterior probabilities, divide each joint probability by their sum. The required probabilities are, therefore, 0.49/0.59, 0.07/0.59, 0.03/0.59 or about 0.83, 0.12, 0.05, respectively. 36. _____ is a measure of the total worth of a consequence reflecting a decision maker’s attitude toward considerations such as profit, loss, and risk. a. Cost-to-company
b. Utility c. Decision value d. Regret Answer: B Difficulty: Easy LO: 12.6, Page 572 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: Utility is a measure of the total worth of a consequence reflecting a decision maker’s attitude toward considerations such as profit, loss, and risk. 37. A _____ is a decision maker who would choose a guaranteed payoff over a lottery with a better expected payoff. a. risk taker b. risk-neutral c. risk avoider d. risk-creator Answer: C Difficulty: Easy LO: 12.6, Page 574 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A risk avoider is a decision maker who would choose a guaranteed payoff over a lottery with a better expected payoff. 38. The utility function for money is a curve that depicts the relationship between a. decision alternative and utility. b. branch probabilities and utility. c. regret and utility. d. monetary value and utility. Answer: D Difficulty: Easy LO: 12.6, Page 578 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: A curve that depicts the relationship between monetary value and utility is known as the utility function for money. 39. Exponential utility functions indicate that the decision maker is _____. a. risk monitor b. risk averse c. risk neutral d. risk taker
Answer: B Difficulty: Easy LO: 12.6, Page 580 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: All exponential utility functions indicate that the decision maker is risk averse. 40. The parameter R in an exponential utility function represents a. the decision maker’s risk tolerance. b. the utility function’s error tolerance. c. the posterior probability. d. the likely profit/loss from the investment. Answer: A Difficulty: Easy LO: 12.6, Page 580 Bloom’s: Knowledge BUSPROG: Analytic DISC: Feedback: The parameter R in an exponential utility function represents the decision maker’s risk tolerance.
Problems
1. Emil Hansen is interested in leasing a sports-utility vehicle and has contacted three automobile dealers for pricing information. Each dealer offered Emil a 24-month lease with no down payment due at the time of signing. Each lease includes a monthly cost, a mileage allowance, and a cost for additional miles; the details are given in the table below.
Dealer          Monthly Cost ($)    Mileage Allowance    Cost per Additional Mile ($)
True Vehicle    300                 40,000               0.30
FCO             360                 46,000               0.35
Jack’s Auto     410                 50,000               0.15
Emil decided to choose the lease option that will minimize his total 24-month cost. Emil is not sure how many miles he will drive in the next two years. Hence, for the purpose of decision, assume that Emil wants to evaluate options of driving 20,000 miles per year, 23,000 miles per year, and 25,000 miles per year. a. What is the decision, and what is the chance event? b. Construct a payoff table for Emil’s problem. Answer:
a. The decision faced by Emil is to select the best lease option from three alternatives (True Vehicle, FCO, and Jack’s Auto). The chance event is the number of miles Emil will drive.
b. The payoff for any combination of alternative and chance event is the sum of the total monthly charges and the total additional mileage cost.
For the True Vehicle lease option:
40,000 miles (20,000 miles per year for 2 years): 24($300) + $0.30(40000 – 40000) = $7,200
46,000 miles (23,000 miles per year for 2 years): 24($300) + $0.30(46000 – 40000) = $9,000
50,000 miles (25,000 miles per year for 2 years): 24($300) + $0.30(50000 – 40000) = $10,200
For the FCO lease option:
40,000 miles (20,000 miles per year for 2 years): 24($360) + $0.35*Max((40000 – 46000),0) = $8,640
46,000 miles (23,000 miles per year for 2 years): 24($360) + $0.35*Max((46000 – 46000),0) = $8,640
50,000 miles (25,000 miles per year for 2 years): 24($360) + $0.35*Max((50000 – 46000),0) = $10,040
For the Jack’s Auto lease option:
40,000 miles (20,000 miles per year for 2 years): 24($410) + $0.15*Max((40000 – 50000),0) = $9,840
46,000 miles (23,000 miles per year for 2 years): 24($410) + $0.15*Max((46000 – 50000),0) = $9,840
50,000 miles (25,000 miles per year for 2 years): 24($410) + $0.15*Max((50000 – 50000),0) = $9,840
Below is the payoff table for Emil’s problem:
                Actual Miles Driven Annually
Dealer          20,000     23,000     25,000
True Vehicle    $7,200     $9,000     $10,200
FCO             $8,640     $8,640     $10,040
Jack’s Auto     $9,840     $9,840     $9,840
Difficulty: Moderate LO: 12.1, Pages 552-554 Bloom’s: Application BUSPROG: Analytic DISC:
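The payoff calculations in problem 1 can be reproduced with a short Python sketch. The names LEASES, ANNUAL_MILES, and total_lease_cost are hypothetical and chosen only for this illustration; the numbers come from the lease table in the problem.

```python
# Lease terms from the problem:
# (monthly cost in $, mileage allowance, cost per additional mile in $)
LEASES = {
    "True Vehicle": (300, 40_000, 0.30),
    "FCO": (360, 46_000, 0.35),
    "Jack's Auto": (410, 50_000, 0.15),
}
ANNUAL_MILES = [20_000, 23_000, 25_000]  # the three chance outcomes
LEASE_MONTHS = 24


def total_lease_cost(monthly, allowance, per_mile, annual_miles):
    """Total 24-month cost: monthly charges plus any excess-mileage charge."""
    total_miles = annual_miles * LEASE_MONTHS // 12
    excess_miles = max(total_miles - allowance, 0)
    return round(LEASE_MONTHS * monthly + per_mile * excess_miles, 2)


# One row of costs per dealer, one column per annual-mileage outcome.
payoff_table = {
    dealer: [total_lease_cost(*terms, miles) for miles in ANNUAL_MILES]
    for dealer, terms in LEASES.items()
}

for dealer, costs in payoff_table.items():
    print(dealer, costs)
```

The max(..., 0) term plays the same role as the Max function in the worked answer: miles below the allowance earn no refund.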
2. Emil Hansen is interested in leasing a sports-utility vehicle and has contacted three automobile dealers for pricing information. Each dealer offered Emil a 24-month lease with no down payment due at the time of signing. Each lease includes a monthly cost, a mileage allowance, and a cost for additional miles; the details are given in the table below.
Dealer          Monthly Cost ($)    Mileage Allowance    Cost per Additional Mile ($)
True Vehicle    300                 40,000               0.30
FCO             360                 46,000               0.35
Jack’s Auto     410                 50,000               0.15
Emil decided to choose the lease option that will minimize his total 24-month cost. Emil is not sure how many miles he will drive in the next two years. Hence, for the purpose of decision, assume that Emil wants to evaluate options of driving 20,000 miles per year, 23,000 miles per year, and 25,000 miles per year. a. Construct a decision tree based on the payoff table constructed in the previous problem. b. Recommend a decision based on the use of the optimistic, conservative, and minimax regret approaches. Answer: a. The payoff table for the cost is:
                Actual Miles Driven Annually
Dealer          20,000     23,000     25,000
True Vehicle    7,200      9,000      10,200
FCO             8,640      8,640      10,040
Jack’s Auto     9,840      9,840      9,840
b.
Decision Alternatives    Maximum Cost    Minimum Cost
True Vehicle             $10,200         $7,200
FCO                      $10,040         $8,640
Jack’s Auto              $9,840          $9,840
Optimistic approach: Select True Vehicle lease option which has the smallest minimum cost.
Conservative approach: Select Jack’s Auto lease option which has the smallest maximum cost.
Regret or opportunity loss table:
                Actual Miles Driven Annually
Dealer          20,000     23,000     25,000     Maximum regret
True Vehicle    $0         $360       $360       $360
FCO             $1,440     $0         $200       $1,440
Jack’s Auto     $2,640     $1,200     $0         $2,640
Minimax Regret results in the selection of the True Vehicle lease option (which has the smallest maximum regret of the three alternatives: $360). Difficulty: Challenging LO: 12.1; LO: 11.2, Pages 553-557 Bloom’s: Application BUSPROG: Analytic DISC:
3. Greentrop Pharmaceutical Products is the world leader in the area of sleep aids. Its major product is “Dozealot”. The Research-and-Development Division has defined two alternatives to improve the quality of the product, which are simple reformulations of the product to minimize the side effects and to improve the product efficacy. To conduct an analysis, management has decided to consider the possible demands for the drug under each alternative. The following payoff table shows the projected profit in millions of dollars.
                         Demand
Decision Alternatives    Low      Medium    High
d1                       $500     $350      $525
d2                       $875     $300      $765
a. Construct a decision tree for this problem. b. If the decision maker knows nothing about the probabilities of three states of nature, what is the recommended decision using the optimistic, conservative, and minimax regret approaches? Answer: a.
b.
Decision Alternatives    Maximum Profit    Minimum Profit
d1                       $525              $350
d2                       $875              $300
Optimistic approach: select d2 which has the largest maximum profit.
Conservative approach: select d1 which has the largest minimum profit.
Regret or opportunity loss table:
                         Demand
Decision Alternatives    Low      Medium    High     Maximum Regret
d1                       $375     0         $240     $375
d2                       0        $50       0        $50
Minimax Regret: The decision alternative is d2, which has the minimum of the maximum regret values, $50. Difficulty: Moderate LO: 12.1; LO: 11.2, Pages 553-557 Bloom’s: Application BUSPROG: Analytic DISC:
4. Meega Airlines decided to offer direct service from Akron to Clearwater Beach, Florida. Management must decide between full-price service using the company’s new fleet of jet aircraft and a discount service using smaller-capacity commuter planes. Management developed estimates of the contribution to profit for each type of service based upon three possible levels of demand for service to Clearwater Beach: high, medium, and low. The following table shows the estimated quarterly profits (in thousands of dollars):
              Demand for service
Service       High    Medium    Low
Full price    900     760       –430
Discount      710     650       350
a. If the demand probabilities are 0.3, 0.5, and 0.2, what is the best decision using the expected value approach? b. Construct a risk profile for the optimal decision in part a. What is the probability of the profit exceeding $700,000?
Answer: a. EV(Full) = 0.3(900) + 0.5(760) + 0.2(–430) = 564 → $564,000 EV(Discount) = 0.3(710) + 0.5(650) + 0.2(350) = 608 → $608,000 Optimal Decision: Discount service b. The risk profile in tabular form is shown. The probability that the profit exceeds $700,000 is 0.3.
Profit    Probability
350       0.2
650       0.5
710       0.3
Difficulty: Moderate LO: 12.3, Pages 557-560 Bloom’s: Application BUSPROG: Analytic DISC:
5. The following payoff table shows the profit for a decision problem with three states of nature and three decision alternatives:
                        State of Nature
Decision Alternative    s1    s2    s3
d1                      7     3     4
d2                      2     4     5
d3                      8     2     3
a. Suppose P(s1) = 0.1, P(s2) = 0.3, and P(s3) = 0.6. What is the best decision using the expected value approach? b. Suppose that the probabilities of the states of nature s1, s2, and s3 change to 0.4, 0.2, and 0.4, respectively. What is the best decision using the expected value approach in this case? Answer: a.
EV(d1) = 0.1(7) + 0.3(3) + 0.6(4) = 4 EV(d2) = 0.1(2) + 0.3(4) + 0.6(5) = 4.4 EV(d3) = 0.1(8) + 0.3(2) + 0.6(3) = 3.2 Therefore, the best decision alternative is d2.
b. EV(d1) = 0.4(7) + 0.2(3) + 0.4(4) = 5 EV(d2) = 0.4(2) + 0.2(4) + 0.4(5) = 3.6 EV(d3) = 0.4(8) + 0.2(2) + 0.4(3) = 4.8 Therefore, the best decision alternative is d1. Difficulty: Easy LO: 12.3, Pages 557-559 Bloom’s: Application BUSPROG: Analytic DISC:
6. Visual Park is considering marketing one of its two television models for the coming Christmas season: Model A or Model B. Model A is a uniquely featured television and appears to have no competition. Estimated profits (in thousands of dollars) under high, medium, and low demand are given below:
               Demand
Model A        High    Medium    Low
Profit         1200    900       500
Probability    0.2     0.6       0.2
Visual Park is optimistic about TV Model B. However, the concern is that profitability will be affected if a competitor launches a TV model with features similar to those of Model B. Estimated profits (in thousands of dollars) with and without competition are as follows:
Model B with competition
               Demand
               High    Medium    Low
Profit         1200    900       500
Probability    0.2     0.3       0.5
Model B without competition
               Demand
               High    Medium    Low
Profit         1600    1100      700
Probability    0.6     0.2       0.2
a. Develop a decision tree for the Visual Park problem. b. For planning purposes, Visual Park believes there is a 0.7 probability that its competitor will launch a TV model similar to Model B. Given this probability of competition, the director of planning recommends marketing Model A. Using expected value, what is your recommended decision? c. Show a risk profile for your recommended decision. d. Use sensitivity analysis to determine what the probability of competition for Model B would have to be for you to change your recommended decision alternative. Answer: a.
b. EV(node 2) = 0.2(1200) + 0.6(900) + 0.2(500) = 880 EV(node 4) = 0.2(1200) + 0.3(900) + 0.5(500) = 760 EV(node 5) = 0.6(1600) + 0.2(1100) + 0.2(700) = 1320 EV(node 3) = 0.7EV(node 4) + 0.3EV(node 5) = 0.7(760) + 0.3(1320) = 928 Model B is recommended as the expected value of $928,000 is $48,000 better than Model A. c. Risk profile:
Profit    Calculation    Probability
1600      0.3*0.6        0.18
1200      0.7*0.2        0.14
1100      0.3*0.2        0.06
900       0.7*0.3        0.21
700       0.3*0.2        0.06
500       0.7*0.5        0.35
d. Let p = probability of competition.
p = 1 → EV(Model B) = EV(node 4) = 760
p = 0 → EV(Model B) = EV(node 5) = 1320
Setting the expected values of both decisions equal to each other gives us: EV(Model B) = EV(Model A) 1320 – p(1320 – 760) = 880 560p = 440 p = 440/560 = 0.7857 For p > 0.7857, the EV of Model A is greater; for p < 0.7857, the EV of Model B is greater. Therefore, the probability of competition would have to be greater than 0.7857 before we would change to Model A. Difficulty: Challenging LO: 12.3; LO: 12.4, Pages 560-566 Bloom’s: Application BUSPROG: Analytic DISC:
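The expected values and the break-even probability in problem 6 can be checked with a short Python sketch. The helper names expected_value and ev_model_b are hypothetical and used only for this illustration; the payoffs and probabilities are those given in the problem.

```python
def expected_value(payoffs, probabilities):
    """Probability-weighted average of the payoffs at a chance node."""
    return sum(x * p for x, p in zip(payoffs, probabilities))


# Profits in thousands of dollars, ordered High, Medium, Low demand.
ev_model_a = expected_value([1200, 900, 500], [0.2, 0.6, 0.2])         # node 2
ev_b_competition = expected_value([1200, 900, 500], [0.2, 0.3, 0.5])   # node 4
ev_b_no_competition = expected_value([1600, 1100, 700], [0.6, 0.2, 0.2])  # node 5


def ev_model_b(p_competition):
    """Expected value of Model B (node 3) for a given probability of competition."""
    return (p_competition * ev_b_competition
            + (1 - p_competition) * ev_b_no_competition)


# Break-even: solve ev_model_b(p) = ev_model_a for p.
break_even_p = (ev_b_no_competition - ev_model_a) / (ev_b_no_competition - ev_b_competition)

print(round(ev_model_a), round(ev_model_b(0.7)))
print(round(break_even_p, 4))
```

Evaluating ev_model_b at values of p on either side of the break-even point shows where the recommendation switches from Model B to Model A.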
7. The Golden Jill Mining Company is interested in procuring 10,000 acres of coal mines in Powder River Basin. The mining company is considering two payment-plan options to buy the mines: I. 100% Payment II. Installment-Payment The payoff received will be based on the quality of coal obtained from the mines, which has been categorized as High, Normal, and Poor Quality, as well as the payment plan. The profit payoffs in millions of dollars resulting from the various combinations of options and quality are provided below:
                        Quality
Payment-Plan Options    High    Normal    Poor
100% Payment            450     320       –250
Installment-Payment     350     300       –110
a. What is the decision to be made, what is the chance event, and what is the consequence for this problem? How many decision alternatives are there? How many outcomes are there for the chance event? b. If nothing is known about the probabilities of the chance outcomes, what is the recommended decision using the optimistic, conservative, and minimax regret approaches? Answer: a. The decision to be made is to choose the payment-plan option to buy the mines. The chance event is the quality of the coal obtained from the mines which has been categorized as High, Normal, and Poor. The consequence is the amount of profit. There are two decision alternatives (100% Payment and Installment-Payment). There are three outcomes for the chance event (High, Normal, and Poor). b.
Payment-Plan Options    Maximum Profit    Minimum Profit
100% Payment            450               –250
Installment-Payment     350               –110
Optimistic Approach: 100% Payment
Conservative Approach: Installment-Payment
Opportunity Loss or Regret Table
                        Quality
Payment-Plan Options    High    Normal    Poor    Maximum Regret
100% Payment            0       0         140     140
Installment-Payment     100     20        0       100
Each regret is the best payoff in the column minus the payoff of the alternative, for example 450 – 350 = 100 for Installment-Payment under High quality.
Minimax Regret Approach: Installment-Payment (its maximum regret of 100 is smaller than the maximum regret of 140 for 100% Payment)
Difficulty: Moderate LO: 12.1; LO: 11.2, Pages 553-557 Bloom’s: Application BUSPROG: Analytic DISC:
8. The Golden Jill Mining Company is interested in procuring 10,000 acres of coal mines in Powder River Basin. The mining company is considering two payment-plan options to buy the mines: I. 100% Payment II. Installment-Payment The payoff received will be based on the quality of coal obtained from the mines, which has been categorized as High, Normal, and Poor Quality, as well as the payment plan. The profit payoffs in millions of dollars resulting from the various combinations of options and quality are provided below:
                        Quality
Payment-Plan Options    High    Normal    Poor
100% Payment            450     320       –250
Installment-Payment     350     300       –110
a. Suppose that management believes that the probability of obtaining High Quality coal is 0.55, probability of Normal Quality Coal is 0.35, and probability of Poor Quality Coal is 0.1. Use the expected value approach to determine an optimal decision. b. Suppose that management believes that the probability of High Quality coal is 0.25, probability of Normal Quality Coal is 0.4, and probability of Poor Quality Coal is 0.35. What is the optimal decision using the expected value approach? Answer: a. EV(100% Payment) = 0.55(450) + 0.35(320) + 0.10(-250) = 334.5 EV(Installment - Payment) = 0.55(350) + 0.35(300) + 0.10(-110) = 286.5 Optimal Decision: 100% Payment b. EV(100% Payment) = 0.25(450) + 0.40(320) + 0.35(-250) = 153 EV (Installment - Payment) = 0.25(350) + 0.40(300) + 0.35(-110) = 169 Optimal Decision: Installment – Payment
Difficulty: Easy LO: 12.3; Pages 557-560 Bloom’s: Application BUSPROG: Analytic DISC:
9. Meega Airlines has decided to offer direct service from Akron to Clearwater Beach, Florida. Management must decide between a full-price service using the company’s new fleet of jet aircraft and a discount service using smaller-capacity commuter planes. Management developed estimates of the contribution to profit for each type of service based upon three possible levels of demand for service to Clearwater Beach: high, medium, and low. The following table shows the estimated quarterly profits (in thousands of dollars):
              Demand for Service
Service       High    Medium    Low
Full price    900     760       –430
Discount      710     650       350
The probabilities for the demand levels are P(High) = 0.3, P(Medium) = 0.5, and P(Low) = 0.2.
a. What is the optimal decision strategy if perfect information were available?
b. What is the expected value for the decision strategy developed in part a?
c. Using the expected value approach, what is the recommended decision without perfect information? What is its expected value?
d. What is the expected value of perfect information?
Answer:
a. If demand is High or Medium, select the decision alternative Full price; if demand is Low, select the decision alternative Discount.
b. EVwPI = 0.3(900) + 0.5(760) + 0.2(350) = 720 → $720,000
c. EV(Full price) = 0.3(900) + 0.5(760) + 0.2(–430) = 564 → $564,000
EV(Discount) = 0.3(710) + 0.5(650) + 0.2(350) = 608 → $608,000
Thus, the recommended decision is Discount. Hence, EVwoPI = $608,000.
d. EVPI = EVwPI – EVwoPI = $720,000 – $608,000 = $112,000
Difficulty: Moderate LO: 12.4; Pages 561-568 Bloom’s: Application BUSPROG: Analytic DISC:
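The EVPI steps in question 9 follow a fixed pattern: take the best payoff in each state for EVwPI, take the best expected value across alternatives for EVwoPI, and subtract. The sketch below illustrates that pattern with the question 9 data; the function names are assumptions made for this example.

```python
# Quarterly profits (in $ thousands) from the table in question 9.
payoffs = {
    "Full price": {"High": 900, "Medium": 760, "Low": -430},
    "Discount": {"High": 710, "Medium": 650, "Low": 350},
}
probs = {"High": 0.3, "Medium": 0.5, "Low": 0.2}


def ev_with_perfect_info(payoffs, probs):
    """Weight the best payoff in each state by that state's probability."""
    return sum(p * max(row[s] for row in payoffs.values()) for s, p in probs.items())


def ev_without_perfect_info(payoffs, probs):
    """Largest expected value over the decision alternatives."""
    return max(sum(probs[s] * row[s] for s in probs) for row in payoffs.values())


def evpi(payoffs, probs):
    """Expected value of perfect information: EVwPI minus EVwoPI."""
    return ev_with_perfect_info(payoffs, probs) - ev_without_perfect_info(payoffs, probs)


print(round(ev_with_perfect_info(payoffs, probs), 2))
print(round(ev_without_perfect_info(payoffs, probs), 2))
print(round(evpi(payoffs, probs), 2))
```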
10. The following table provides information about the profit payoff of an investment strategy.
                        State of Nature
Decision Alternative    s1      s2      s3      s4
d1                      52      44      44      36
d2                      52      68      60      40
d3                      36      36      36      44
Probability             0.3     0.2     0.4     0.1
a. What is the optimal decision strategy if perfect information were available?
b. What is the expected value for the decision strategy developed in part a?
c. Using the expected value approach, what is the recommended decision without perfect information? What is its expected value?
d. What is the expected value of perfect information?
Answer:
a. If s1, select d1 or d2 and receive a payoff of 52. If s2, select d2 and receive a payoff of 68. If s3, select d2 and receive a payoff of 60. If s4, select d3 and receive a payoff of 44.
b. EVwPI = 0.3(52) + 0.2(68) + 0.4(60) + 0.1(44) = 57.6
c. EV(d1) = 0.3(52) + 0.2(44) + 0.4(44) + 0.1(36) = 45.6
EV(d2) = 0.3(52) + 0.2(68) + 0.4(60) + 0.1(40) = 57.2
EV(d3) = 0.3(36) + 0.2(36) + 0.4(36) + 0.1(44) = 36.8
Thus, the recommended decision is d2. Hence, EVwoPI = 57.2.
d. EVPI = EVwPI – EVwoPI = 57.6 – 57.2 = 0.4.
Difficulty: Moderate LO: 12.4; Pages 561-568 Bloom’s: Application BUSPROG: Analytic DISC:
11. Consider a decision situation with four possible states of nature: s1, s2, s3, and s4. The prior probabilities are P(s1) = 0.35, P(s2) = 0.15, P(s3) = 0.20, P(s4) = 0.30. The conditional probabilities are
P(C|s1) = 0.2, P(C|s2) = 0.09, P(C|s3) = 0.15, and P(C|s4) = 0.20. Find the revised (posterior) probabilities P(s1|C), P(s2|C), P(s3|C), and P(s4|C).
Answer:
State of Nature    P(sj)    P(C|sj)    P(C ∩ sj)    P(sj|C)
s1                 0.35     0.20       0.0700       0.4035
s2                 0.15     0.09       0.0135       0.0778
s3                 0.20     0.15       0.0300       0.1729
s4                 0.30     0.20       0.0600       0.3458
                            P(C) =     0.1735       1.0000
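The tabular Bayes calculation above (prior times conditional gives the joint probability; each joint divided by their sum gives the posterior) can be reproduced as follows. This is a minimal sketch using the question 11 data; `posterior` is a hypothetical helper name.

```python
def posterior(priors, likelihoods):
    """Return (P(C), {state: P(state | C)}) from priors and P(C | state)."""
    # Joint probabilities P(C and state) = P(state) * P(C | state).
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    # Total probability of the information C.
    total = sum(joint.values())
    return total, {s: j / total for s, j in joint.items()}


# Prior and conditional probabilities stated in question 11.
priors = {"s1": 0.35, "s2": 0.15, "s3": 0.20, "s4": 0.30}
likelihoods = {"s1": 0.20, "s2": 0.09, "s3": 0.15, "s4": 0.20}

p_c, post = posterior(priors, likelihoods)
print("P(C) =", round(p_c, 4))
for state, value in post.items():
    print(state, round(value, 4))
```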
Difficulty: Easy LO: 12.5; Pages 569-571 Bloom’s: Application BUSPROG: Analytic DISC:
12. A construction company must decide on the size of the shopping mall (Large, Medium, or Small) to be constructed on its acquired plot in the suburban area of Seattle. Depending on market conditions, the number of visitors to the mall will be High, Moderate, or Low. The level of response and the size of the mall will determine the return on investment from the mall. The profit payoff table for management (in millions of dollars) after 5 years is provided below.
                    Number of visitors
Size of the mall    High    Moderate    Low
Large               25      15          –20
Medium              20      12          –10
Small               15      13          5
The probabilities are P(High) = 0.35, P(Moderate) = 0.40, and P(Low) = 0.25.
a. Use a decision tree to recommend a decision.
b. Use EVPI to determine whether the construction company should attempt to obtain a better estimate of the response.
Answer:
a. Let
d1 = Size of the shopping mall is large
d2 = Size of the shopping mall is medium
d3 = Size of the shopping mall is small
s1 = High demand
s2 = Moderate demand
s3 = Low demand
EV(node 2) = (0.35)(25) + (0.40)(15) + (0.25)(–20) = 9.75
EV(node 3) = (0.35)(20) + (0.40)(12) + (0.25)(–10) = 9.30
EV(node 4) = (0.35)(15) + (0.40)(13) + (0.25)(5) = 11.7
Recommended decision: d3 (Small shopping mall).
b. Optimal decision strategy with perfect information: If s1, select d1; if s2, select d1; if s3, select d3.
Expected value of this strategy is (0.35)(25) + (0.40)(15) + (0.25)(5) = 16
EVPI = EVwPI – EVwoPI = 16 – 11.7 = 4.3 or $4.3 million. In other words, $4.3 million represents the additional expected value that can be obtained if perfect information were available about the states of nature.
Difficulty: Moderate LO: 12.4; Pages 561-568 Bloom’s: Application BUSPROG: Analytic DISC:
13. A construction company must decide on the size of the shopping mall (Large, Medium, or Small) to be constructed on its acquired plot in the suburban area of Seattle. Depending on market conditions, the number of visitors to the mall will be High, Moderate, or Low. The level of response and the size of the mall will determine the return on investment from the mall. The profit payoff table for management (in millions of dollars) after 5 years is provided below.
                    Number of visitors
Size of the mall    High    Moderate    Low
Large               25      15          –20
Medium              20      12          –10
Small               15      13          5
The probabilities for the state of nature are P(High) = 0.35, P(Moderate) = 0.40, and P(Low) = 0.25.
a. A test market study of the potential response for the mall in that area is expected to report either a favorable (F) or unfavorable (U) condition. The relevant conditional probabilities are as follows:
P(F|High) = 0.35; P(U|High) = 0.65
P(F|Moderate) = 0.45; P(U|Moderate) = 0.55
P(F|Low) = 0.20; P(U|Low) = 0.80
What is the probability that the market research report will be favorable?
b. Show the decision tree for this problem.
Answer:
a. Let
d1 = Size of the shopping mall is large
d2 = Size of the shopping mall is medium
d3 = Size of the shopping mall is small
s1 = High demand
s2 = Moderate demand
s3 = Low demand
If F - Favorable
State of Nature    P(sj)    P(F|sj)    P(F ∩ sj)    P(sj|F)
s1 (High)          0.35     0.35       0.1225       0.3475
s2 (Moderate)      0.40     0.45       0.1800       0.5106
s3 (Low)           0.25     0.20       0.0500       0.1418
                            P(F) =     0.3525       1.0000
If U - Unfavorable
State of Nature    P(sj)    P(U|sj)    P(U ∩ sj)    P(sj|U)
s1 (High)          0.35     0.65       0.2275       0.3514
s2 (Moderate)      0.40     0.55       0.2200       0.3398
s3 (Low)           0.25     0.80       0.2000       0.3089
                            P(U) =     0.6475       1.0000
The probability the report will be favorable is P(F) = 0.3525.
b. Assuming the test market study is used, a portion of the decision tree is shown below.
Difficulty: Challenging LO: 12.5; Pages 568-571 Bloom’s: Application BUSPROG: Analytic DISC:
14. Three decision makers have assessed payoffs for the following decision problem (payoff in dollars):
                        State of Nature
Decision Alternative    s1      s2      s3
d1                      15      40      –20
d2                      60      80      –80
The indifference probabilities are as follows:
          Indifference Probability (p)
Payoff    Decision Maker A    Decision Maker B    Decision Maker C
80        Does not apply      Does not apply      Does not apply
60        0.7                 0.95                0.87
40        0.5                 0.9                 0.74
15        0.3                 0.8                 0.59
–20       0.15                0.6                 0.37
–80       Does not apply      Does not apply      Does not apply
a. Plot the utility function for money for each decision maker.
b. Classify each decision maker as a risk avoider, a risk taker, or risk neutral.
c. For the payoff of 40, what is the premium that the risk avoider will pay to avoid risk? What is the premium that the risk taker will pay to have the opportunity of the high payoff?
Answer:
a.
b. A - Risk taker
B - Risk avoider
C - Risk neutral
c. For risk avoider B: at $40, indifference probability p = 0.9. Thus, EV(Lottery) = 0.9(80) + 0.1(–80) = $64. Therefore, B will pay $64 – $40 = $24.
For risk taker A: at $40, indifference probability p = 0.5. Thus, EV(Lottery) = 0.5(80) + 0.5(–80) = $0. Therefore, A will pay $40 – $0 = $40.
Difficulty: Moderate LO: 12.6; Pages 571-579 Bloom’s: Application BUSPROG: Analytic DISC:
15. Three decision makers have assessed payoffs for the following decision problem (payoff in dollars):
                        State of Nature
Decision Alternative    s1      s2      s3
d1                      15      40      –20
d2                      60      80      –80
The indifference probabilities are as follows:
          Indifference Probability (p)
Payoff    Decision Maker A    Decision Maker B    Decision Maker C
80        Does not apply      Does not apply      Does not apply
60        0.7                 0.95                0.85
40        0.5                 0.9                 0.7
15        0.3                 0.8                 0.55
–20       0.15                0.6                 0.35
–80       Does not apply      Does not apply      Does not apply
If P(s1) = 0.30, P(s2) = 0.55, and P(s3) = 0.15, find a recommended decision for each of the three decision makers.
Answer: For each of the cases, assume that the utilities for the best and worst payoffs are 10 and 0, respectively.
For Decision Maker A:
Utility Table
                        State of Nature
Decision Alternative    s1      s2      s3
d1                      3       5       1.5
d2                      7       10      0
EU(d1) = 0.30(3) + 0.55(5) + 0.15(1.5) = 3.875
EU(d2) = 0.30(7) + 0.55(10) + 0.15(0) = 7.60
The recommended decision is d2.
For Decision Maker B:
Utility Table
                        State of Nature
Decision Alternative    s1      s2      s3
d1                      8       9       6
d2                      9.5     10      0
EU(d1) = 0.30(8) + 0.55(9) + 0.15(6) = 8.25
EU(d2) = 0.30(9.5) + 0.55(10) + 0.15(0) = 8.35
The recommended decision is d2.
For Decision Maker C:
Utility Table
                        State of Nature
Decision Alternative    s1      s2      s3
d1                      5.5     7       3.5
d2                      8.5     10      0
EU(d1) = 0.30(5.5) + 0.55(7) + 0.15(3.5) = 6.025
EU(d2) = 0.30(8.5) + 0.55(10) + 0.15(0) = 8.05
The recommended decision is d2.
Difficulty: Moderate LO: 12.6; Pages 575-579 Bloom’s: Application BUSPROG: Analytic DISC:
16. A manufacturing company introduces two product alternatives. The table below provides profit payoffs in thousands of dollars.
                        State of Nature (Demand)
Decision Alternative    Up      Stable    Down
Product A               11      8         8
Product B               8       10        12
The probabilities for the state of nature are P(Up) = 0.35, P(Stable) = 0.35, and P(Down) = 0.30.
a. Use a decision tree to recommend a decision.
b. Use EVPI to determine whether the manufacturing company should attempt to obtain a better estimate of the response.
Answer:
a.
EV(node 2) = (0.35)(11) + (0.35)(8) + (0.30)(8) = 9.05, or 9.05 thousand dollars
EV(node 3) = (0.35)(8) + (0.35)(10) + (0.30)(12) = 9.9, or 9.9 thousand dollars
Recommended decision: d2 (Product B).
b. Optimal decision strategy with perfect information: If demand is up, select Product A; if demand is stable, select Product B; if demand is down, select Product B.
Expected value of this strategy is (0.35)(11) + (0.35)(10) + (0.30)(12) = 10.95, or 10.95 thousand dollars.
EVPI = EVwPI – EVwoPI = 10.95 – 9.9 = 1.05, or 1.05 thousand dollars. In other words, 1.05 thousand dollars represents the additional expected value that can be obtained if perfect information were available about the states of nature.
Difficulty: Moderate LO: 12.4; Pages 561-568 Bloom’s: Application BUSPROG: Analytic DISC:
17. A manufacturing company introduces two product alternatives. The table below provides profit payoffs in thousands of dollars.
                        State of Nature (Demand)
Decision Alternative    Up      Stable    Down
Product A               11      8         8
Product B               8       10        12
The probabilities for the state of nature are P(Up) = 0.35, P(Stable) = 0.35, and P(Down) = 0.30. A test market study of the potential demand for the product is expected to report either a favorable (F) or unfavorable (U) condition. The relevant conditional probabilities are as follows: P(F|Up) = 0.5; P(F|Stable) = 0.3; P(F|Down) = 0.2 P(U|Up) = 0.2; P(U|Stable) = 0.3; P(U|Down) = 0.5 Use Bayes’ theorem to compute the conditional probability of the demand being up, stable, or down, given each market research outcome. Answer: Let
s1 = Demand is up s2 = Demand is stable s3 = Demand is down
$$P(s_1 \mid F) = \frac{P(F \mid s_1)P(s_1)}{P(F \mid s_1)P(s_1) + P(F \mid s_2)P(s_2) + P(F \mid s_3)P(s_3)} = \frac{0.5 \times 0.35}{0.5 \times 0.35 + 0.3 \times 0.35 + 0.2 \times 0.3} = 0.5147$$
$$P(s_2 \mid F) = \frac{P(F \mid s_2)P(s_2)}{P(F \mid s_1)P(s_1) + P(F \mid s_2)P(s_2) + P(F \mid s_3)P(s_3)} = \frac{0.3 \times 0.35}{0.5 \times 0.35 + 0.3 \times 0.35 + 0.2 \times 0.3} = 0.3088$$
$$P(s_3 \mid F) = \frac{P(F \mid s_3)P(s_3)}{P(F \mid s_1)P(s_1) + P(F \mid s_2)P(s_2) + P(F \mid s_3)P(s_3)} = \frac{0.2 \times 0.3}{0.5 \times 0.35 + 0.3 \times 0.35 + 0.2 \times 0.3} = 0.1765$$
$$P(s_1 \mid U) = \frac{P(U \mid s_1)P(s_1)}{P(U \mid s_1)P(s_1) + P(U \mid s_2)P(s_2) + P(U \mid s_3)P(s_3)} = \frac{0.2 \times 0.35}{0.2 \times 0.35 + 0.3 \times 0.35 + 0.5 \times 0.3} = 0.2154$$
$$P(s_2 \mid U) = \frac{P(U \mid s_2)P(s_2)}{P(U \mid s_1)P(s_1) + P(U \mid s_2)P(s_2) + P(U \mid s_3)P(s_3)} = \frac{0.3 \times 0.35}{0.2 \times 0.35 + 0.3 \times 0.35 + 0.5 \times 0.3} = 0.3231$$
$$P(s_3 \mid U) = \frac{P(U \mid s_3)P(s_3)}{P(U \mid s_1)P(s_1) + P(U \mid s_2)P(s_2) + P(U \mid s_3)P(s_3)} = \frac{0.5 \times 0.3}{0.2 \times 0.35 + 0.3 \times 0.35 + 0.5 \times 0.3} = 0.4615$$
Difficulty: Moderate LO: 12.5; Pages 568-568 Bloom’s: Application BUSPROG: Analytic DISC:
18. Harold has visited a casino and paid an entry fee of $20,000 to play the game of cards. Below is the payoff table in terms of the decision to play or not to play the game (Note: Harold will not pay the entry fee if he does not want to play, and the payoff table below includes the entry fee):
                             State of Nature
Decision                     Win ($)    Lose ($)
Play the game, d1            50,000     –20,000
Do not play the game, d2     0          0
a. In his previous visits, Harold has won 1 out of every 5 games that he has played. Use the expected value approach to recommend a decision.
b. Assume that the utilities for 50,000 and –20,000 are 10 and 0, respectively. If a particular decision maker assigns an indifference probability of 0.0001 to the $0 payoff, would Harold play the game? Use expected utility to justify your answer.
Answer:
a. P(Win) = 1/5 = 0.2; P(Lose) = 4/5 = 0.8
EV(d1) = (1/5)(50000) + (4/5)(–20000) = –6000
EV(d2) = 0
Therefore, the best decision under the EV approach is d2 – Do not play the game.
b. The utility of the $0 payoff is U(0) = 0.0001(10) + 0.9999(0) = 0.001.
Utility Table:
                             State of Nature
Decision                     Win      Lose
Play the game, d1            10       0
Do not play the game, d2     0.001    0.001
EU(d1) = (1/5)(10) + (4/5)(0) = 2
EU(d2) = 0.001
Therefore, the best decision under the EU approach is d1 – Play the game.
Difficulty: Moderate LO: 12.6; Pages 571-580 Bloom’s: Application BUSPROG: Analytic DISC:
19. Translate the following monetary payoffs into utilities for a decision maker whose utility function is described by an exponential function with R = 6450: –$3000, –$1500, $0, $1500, $3000, $4500, $6000, $7500, $9000.
Answer:
Monetary Payoff, x    Utility, U(x)
–3000                 –0.592
–1500                 –0.262
0                     0.000
1500                  0.207
3000                  0.372
4500                  0.502
6000                  0.606
7500                  0.687
9000                  0.752
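The utilities in question 19 follow from the exponential utility function U(x) = 1 − e^(−x/R) with R = 6450. A short sketch that recomputes the table is given below; the function and variable names are illustrative only.

```python
import math


def exponential_utility(x, risk_tolerance):
    """Exponential utility U(x) = 1 - exp(-x / R)."""
    return 1.0 - math.exp(-x / risk_tolerance)


R = 6450  # risk tolerance stated in question 19
monetary_payoffs = [-3000, -1500, 0, 1500, 3000, 4500, 6000, 7500, 9000]

# Utilities rounded to three decimals, as in the answer table.
utility_table = {x: round(exponential_utility(x, R), 3) for x in monetary_payoffs}

for x, u in utility_table.items():
    print(f"{x:>6}  {u:+.3f}")
```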
Difficulty: Moderate LO: 12.6; Pages 580-581 Bloom’s: Application BUSPROG: Analytic DISC:
20. Consider an advertising company which has to decide on investing with the current team that has a 50 percent chance of earning a net profit of $35,000 and a 50 percent chance of losing the $17,500 invested.
a. Write the equation for the exponential function that approximates the advertising company’s utility function.
b. Plot the exponential utility function for this advertising company for x values between –30,000 and 45,000. Is the management for the advertising company risk seeking, risk neutral, or risk averse?
c. Suppose the management would like to invest more in marketing and would actually be willing to make an investment that has a 50 percent chance of earning $50,000 and a 50 percent chance of losing $25,000. Plot the exponential function that approximates this utility function and compare it to the utility function from part b. Is the management becoming more risk seeking or more risk averse?
Answer:
a. The exponential utility function for the advertising company is $U(x) = 1 - e^{-x/35000}$.
b. The utility function values and plot of $U(x) = 1 - e^{-x/35000}$ are shown below.
x          Utility, U(x)
–30000     –1.356
–25000     –1.043
–20000     –0.771
–15000     –0.535
–10000     –0.331
–5000      –0.154
0          0.000
5000       0.133
10000      0.249
15000      0.349
20000      0.435
25000      0.510
30000      0.576
35000      0.632
40000      0.681
45000      0.724
[Figure: plot of U(x) = 1 – EXP(–x/35000); horizontal axis x from –40000 to 60000, vertical axis Utility, U(x) from –1.500 to 1.000]
The decision maker is risk averse.
c.
[Figure: plots of U(x) = 1 – EXP(–x/35000) and U(x) = 1 – EXP(–x/50000); horizontal axis x from –40000 to 50000, vertical axis Utility, U(x) from –1.500 to 1.000]
The plots of the two exponential utility functions are shown here. We observe that the new utility function is “flatter” than the utility function from part b. Therefore, the management here is more risk seeking (less risk averse) than in part b. While the management is still, in general, risk averse, it is willing to accept more risk in part c.
Difficulty: Moderate LO: 12.6; Pages 580-581 Bloom’s: Application BUSPROG: Analytic DISC:
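One way to see the comparison in part c numerically is to evaluate the expected utility of the part c gamble (50 percent chance of earning $50,000, 50 percent chance of losing $25,000) under both utility functions and compare it with U(0) = 0, the utility of not investing. The sketch below is an illustration added for this purpose, not part of the original solution; the function names are assumptions.

```python
import math


def exponential_utility(x, risk_tolerance):
    """Exponential utility U(x) = 1 - exp(-x / R)."""
    return 1.0 - math.exp(-x / risk_tolerance)


def gamble_expected_utility(gain, loss, risk_tolerance, p_gain=0.5):
    """Expected utility of winning `gain` with probability p_gain, else losing `loss`."""
    return (p_gain * exponential_utility(gain, risk_tolerance)
            + (1.0 - p_gain) * exponential_utility(-loss, risk_tolerance))


# Part c gamble evaluated with R = 35,000 (part b) and R = 50,000 (part c).
eu_part_b = gamble_expected_utility(50_000, 25_000, 35_000)
eu_part_c = gamble_expected_utility(50_000, 25_000, 50_000)

print("R = 35,000:", round(eu_part_b, 4))
print("R = 50,000:", round(eu_part_c, 4))
```

With R = 35,000 the gamble has an expected utility clearly below 0, so that decision maker would decline it; with R = 50,000 the expected utility is close to 0, which is consistent with the management in part c being roughly indifferent to the gamble and therefore less risk averse.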