A01_BENG9800_01_SE_FM.indd 6
31/08/11 5:40 PM
About the Author
Farouk A. Benghezal has been teaching management science, statistics, operations management, and other business courses for more than thirty years. He received his BA and MA in Economics from the University of Algiers. He was awarded an AMIDEAST scholarship to study at Michigan State University, where he completed an MSc in Operations Research and Statistics and a PhD in Management Science. He has previously taught at Michigan State University, the School of Statistics and Planning in Algiers, the University of Algiers, and several universities in the United Arab Emirates (Ajman University of Science and Technology, Abu Dhabi University, and the University of Sharjah). In addition, he has held various positions at research institutions and consultancy firms. He is currently based at The American University in the Emirates. Dr Benghezal has previously authored a textbook in French, Programmation Linéaire (Linear Programming). He has published numerous papers on modeling and has co-authored several monographs.
Brief Contents
Chapter 1: Introduction to Statistics
Chapter 2: Descriptive Statistics
Chapter 3: Probability Concepts and Theory
Chapter 4: Discrete Probability Distributions
Chapter 5: Continuous Probability Distributions
Chapter 6: Sampling Distributions
Chapter 7: Estimation and Confidence Intervals
Chapter 8: One-Sample Hypothesis Tests
Chapter 9: Inference from Two Samples
Chapter 10: Chi-Square Tests
Chapter 11: Analysis of Variance
Chapter 12: Simple Linear Regression
Chapter 13: Nonparametric Tests (Part A)
Chapter 14: Statistical Quality Control
ADDITIONAL CHAPTERS ON CD:
Chapter 15: Multiple Linear Regression
Chapter 16: Nonparametric Tests (Part B)
Chapter 17: Time Series, Forecasting, and Index Numbers
Contents
Preface
Acknowledgments

Chapter 1 Introduction to Statistics
What is Statistics?
Data Collection
Concepts in Statistics
Levels of Data Measurement
Types of Statistics
Descriptive Statistics
Inferential Statistics
Sampling Methods
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
Frequency Distribution
Qualitative Data
Quantitative Data
Check your Understanding
Graphic Presentations of a Frequency Distribution
Bar Chart
Histogram
Technology: Template for Histograms
Frequency Polygon
Technology: Template for Frequency Polygons
Ogive
Technology: Template for Ogives
Pie Chart
Technology: Template for Pie Charts
Stem and Leaf Display
Other Graphic Presentations of Data
Time Series
Technology: Template for Time Series
Scatter Plots
Technology: Template for Scatter Plots
Pareto Chart
Technology: Template for Pareto Chart
Check your Understanding
Chapter Summary
Key Terms
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 2 Descriptive Statistics
Measures of Central Tendency
Mean
Weighted Mean
Median
Midrange
Mode
Geometric Mean
Trimmed Mean
Harmonic Mean
Technology: Template for Measures of Central Tendency
Check your Understanding
Measures of Dispersion
Range
Variance
Standard Deviation
Coefficient of Variation
Chebyshev’s Theorem
Technology: Template for Measures of Dispersion
Check your Understanding
Measures of Location
Z-Score
Percentile
Quartiles
Technology: Template for Percentile Graphs
Exploratory Data Analysis
Outliers
Box Plots
Measures of Shape
Skewness
Kurtosis
Technology: Template for Measures of Location and Shape
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problem D
Problems
Miniprojects

Chapter 3 Probability Concepts and Theory
The Concept of Probability
Classical Approach
Empirical Approach
Subjective Approach
Counting Rules
The Multiplication Rule
The Permutation Rule
The Combination Rule
Technology: Template for Counting Rules
Check your Understanding
Laws of Probabilities
Addition Law of Probability
Conditional Law of Probability
Relationship among Joint, Conditional and Marginal Probabilities
Technology: Template for Conditional Probabilities
Check your Understanding
Posterior Probabilities and Bayes’ Theorem
Technology: Template for Bayesian Probabilities
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 4 Discrete Probability Distributions
Random Variables
Probability Distribution
Discrete Probability Distributions
Mean, Variance and Standard Deviation of a Probability Distribution
Technology: Template for Discrete Random Variables
The Binomial Distribution
Binomial Probability Tables
Mean of the Binomial Distribution
Variance of the Binomial Distribution
Technology: Template for the Binomial Distribution
Check your Understanding
The Negative Binomial Distribution
Mean and Variance of the Negative Binomial Distribution
Technology: Template for the Negative Binomial Distribution
The Geometric Distribution
Mean and Variance of the Geometric Distribution
Technology: Template for the Geometric Distribution
Check your Understanding
The Hypergeometric Distribution
Mean and Variance of the Hypergeometric Distribution
Technology: Template for the Hypergeometric Distribution
The Poisson Distribution
Poisson Probability Tables
Mean and Variance of the Poisson Distribution
Poisson Approximation to the Binomial
Technology: Template for the Poisson Distribution
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 5 Continuous Probability Distributions
The Uniform Distribution
Mean and Variance of the Uniform Distribution
Technology: Template for the Uniform Distribution
The Exponential Distribution
Mean and Variance of the Exponential Distribution
Technology: Template for the Exponential Distribution
Check your Understanding
The Normal Distribution
Standard Normal Table
Finding Probabilities of the Normal Distribution
Finding Values of Z Given Probabilities
The Inverse Transformation
Approximation of the Binomial Distribution by the Normal Distribution
Technology: Template for the Normal Distribution
Technology: Template for the Normal Approximation to Binomial Distributions
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problem D
Problems
Miniprojects

Chapter 6 Sampling Distributions
Sampling
Population Parameters and Sample Statistics
Reasons for Sampling
Random Sampling
Sampling Distribution of the Mean
The Central Limit Theorem
Technology: Template for the Sampling Distribution of the Mean
Check your Understanding
Sampling Distribution of the Sample Proportion
Technology: Template for the Sampling Distribution of the Proportion
The Correction Factor
Technology: Template for Finite Correction Factor
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problems
Miniprojects

Chapter 7 Estimation and Confidence Intervals
Estimation
Confidence Interval for the Population Mean
Confidence Interval for the Population Mean when σ is Known
Finite Correction Factor
Technology: Template for Confidence Intervals for Means with Known σ
Confidence Interval for the Mean when σ is Unknown
Technology: Template for Confidence Intervals for Means with Unknown σ
Check your Understanding
Confidence Interval for a Proportion
Technology: Confidence Intervals for Proportions
Confidence Interval for the Variance
Using the Chi-Square Table
Confidence Intervals with the Chi-Square Distribution
Technology: Confidence Intervals for Variances
Check your Understanding
Estimation of the Sample Size
Sample Size for Estimating μ when σ is Known
Sample Size for Estimating μ when σ is Unknown
Sample Size when Estimating the Population Proportion
Technology: Template for Sample Size Determination
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 8 One-Sample Hypothesis Tests
Hypothesis Testing: a Preview
Hypothesis Testing Procedure
Types of Hypothesis Tests
One-Tailed Test
Two-Tailed Test
Test for a Population Mean with Known Variance
The Critical Value Approach
The p-Value Approach
Technology: Template for Hypothesis Test on the Mean with Known Variance
Test for a Population Mean with Unknown Variance
Technology: Template for Hypothesis Test on the Mean with Unknown Variance
Check your Understanding
Test for a Population Proportion
Technology: Template for Hypothesis Tests for Proportions
Check your Understanding
Test for a Population Variance
Technology: Template for Hypothesis Tests for Variances
Check your Understanding
Confidence Interval versus Hypothesis Test
Test of Type II Errors
Technology: Template for Beta and Power
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 9 Inference from Two Samples
One-sample versus Two-sample Test
Testing the Difference between Two Means
Testing the Difference between Two Means for Large and Independent Samples with Known Variances
Testing the Difference between Two Means for Large and Independent Samples with Unknown Variances
Testing the Difference between Two Means for Small and Independent Samples with Unknown and Unequal Variances
Testing the Difference between Two Means for Paired Samples
Confidence Intervals for the Difference of Two Means
Confidence Intervals for the Difference between Two Means for Paired Samples
Technology: Templates for Testing the Difference between Two Means
Check your Understanding
Testing the Difference between Two Proportions
Confidence Intervals for the Difference of Two Proportions
Technology: Template for Testing the Difference between Two Proportions
Check your Understanding
Testing the Difference between Two Variances
Use of F-Tables
The F-Test for Two Population Variances
Technology: Template for Testing the Difference between Two Variances
Check your Understanding
Testing the Difference between Two Means for Small and Independent Samples when the Variances are Unknown and Equal
Confidence Intervals for Means with Equal Variances
Technology: Template for Testing the Difference between Two Means for Small Samples and Equal Variances
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 10 Chi-Square Tests
Test for Goodness of Fit
Application to a Uniform Distribution
Application to a Multinomial Distribution
Application to a Normal Distribution
Application to a Poisson Distribution
Technology: Templates for Goodness-of-Fit Test
Check your Understanding
Contingency Analysis: a Chi-Square Test for Independence
Technology: Template for Contingency Analysis: a Chi-Square Test for Independence
Contingency Analysis: a Test for Homogeneity of Proportions
Technology: Template for Contingency Analysis: a Test for Homogeneity of Proportions
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 11 Analysis of Variance
One-Way Analysis of Variance
Technology: Template for One-Way ANOVA
Multiple Comparison Tests
Test of Homogeneity of Variances
Technology: Template for One-Way ANOVA with Tukey-Kramer Criterion
Check your Understanding
Randomized Complete Block ANOVA
Technology: Template for Randomized Complete Block ANOVA
Two-Way ANOVA with Replication
A Word about Interaction
Technology: Template for Two-Way ANOVA
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problems
Miniprojects

Chapter 12 Simple Linear Regression
Linear Regression: a Preview
Simple Linear Regression
Scatter Diagram
Least-squares Line
Technology: Template for Simple Linear Regression
Check your Understanding
The Standard Error
The Coefficient of Determination
The Coefficient of Correlation
Inference about the Regression Relationship
Tests of Hypotheses
Confidence Intervals
Analysis of Variance and the F-test of the Regression Model
Check your Understanding
Prediction of Y Using the Regression Model
Analysis of Residuals
Normality Assumption
Constant Variance Assumption
Independence Assumption
Technology: Template for Linear Regression Model
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problems
Miniprojects

Chapter 13 Nonparametric Tests (Part A)
Nonparametric Tests
The Sign Test
Tests on Categorical Data
Tests on the Median
Technology: Templates for the Sign Test
The Runs Test
Small Samples
Large Samples
Technology: Template for the Runs Test
Check your Understanding
The Wilcoxon Signed-Rank Test for Paired Data
Small Samples
Large Samples
Technology: Template for the Wilcoxon Signed-Rank Test
The Mann–Whitney U-Test for Independent Samples
Small Samples
Large Samples
Technology: Template for the Mann–Whitney U-Test
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 14 Statistical Quality Control
A Brief History of Modern Quality Management
Tools of Total Quality Management
Process Map
Check Sheets
Histograms
Scatter Diagrams
Pareto Analysis
Cause-and-Effect Diagrams
Control Charts
Check your Understanding
Statistical Process Control
Causes of Variation
Statistical Process Control Charts
Statistical Process Control Charts for Variables
Technology: Template for x and R Charts
Technology: Template for x and MR Charts
Technology: Template for x and S Charts
Check your Understanding
Statistical Process Control Charts for Attributes
Technology: Template for the p Chart
Technology: Template for the c Chart
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 15 Multiple Linear Regression
Multiple Linear Regression Model
The F-Test for Overall Significance
The Coefficient of Determination
Significance Tests for Regression Parameters
Confidence Intervals for Regression Coefficients
Technology: Multiple Linear Regression with Excel
Technology: Multiple Linear Regression with Minitab
Check your Understanding
Prediction using the Multiple Regression Model
Binary Independent Variables
Multicollinearity
Model Building
Stepwise Regression
Forward Selection
Backward Elimination
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problems
Miniprojects

Chapter 16 Nonparametric Tests (Part B)
The Kruskal–Wallis Test
Technology: Template for the Kruskal–Wallis Test
Check your Understanding
The Friedman Test
Technology: Template for the Friedman Test
Check your Understanding
The Spearman Rank Correlation Test
Technology: Template for the Spearman Rank Correlation Test
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Chapter 17 Time Series, Forecasting, and Index Numbers
Forecasting
Qualitative Forecasting Methods
Group Averaging
Group Consensus
Historical Analogy
Delphi Method
Time Series Forecasting Methods
Time Series Forecasting Based on Averages
Technology: Template for Simple Moving Average
Technology: Template for Weighted Moving Average
Technology: Templates for Single Exponential Smoothing
Check your Understanding
Time Series Forecasting Based on Trend
Technology: Template for Simple Linear Regression
Exponential Trend Model
Quadratic Trend Model
Technology: Templates for Double Exponential Smoothing
Technology: Templates for Ratio-to-Moving-Average Model with Seasonality
Check your Understanding
Time Series Forecasting Based on Seasonal Patterns
Technology: Templates for Linear Trend Model with Seasonality
Check your Understanding
Causal Models
Controlling the Forecast
Tracking Signal
Control Chart
Technology: Template for Tracking Signal and Control Chart
Check your Understanding
Index Numbers
Unweighted Aggregate Price Index
Weighted Aggregate Price Index
Laspeyres Price Index
Paasche Price Index
Fisher’s Ideal Price Index
Technology: Template for Index Numbers
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
Problem A
Problem B
Problem C
Problems
Miniprojects

Answers to Selected Odd-Numbered Problems
List of Appendix Tables
Bibliography
Preface

Statistics is playing an increasingly vital role in practically all professions, and some familiarity with the subject is now an essential element of any college education. Most colleges include a semester of statistics in their curricula. The need to accommodate a growing list of academic requirements often means that the coverage must be succinct. With these conditions in mind, this book provides students with a first exposure to the main ideas of modern statistics.

This book introduces and develops statistical methods in business contexts and is intended for those working or studying in the area of business. It presents the general principles and supports them with many worked examples. Most of the examples have been designed to demonstrate the relevance of statistics to making decisions in a day-to-day business environment. In each chapter, application-oriented problems test the student’s ability to use the tools learned to solve typical problems. The applications cover different areas of business, including accounting, economics, finance, management, marketing, and operations. Examples and problems are drawn from a wide range of applications from all facets of life and specifically from business. Many examples and problems are based on real data from the Arab world, and the sources of these data are cited.

Managers make decisions every day. Some are routine and involve little thought, but many are more complex and depend on numbers to suggest and justify subsequent courses of action. Good data contain information and, when carefully interpreted, increase knowledge. Statistical methods coupled with sound organizational practice can be a key to good management. Throughout the text, the discussion of statistical methods is accompanied by practical business applications to help students see the significance of statistics to their daily lives.

At the end of the book, an English–Arabic glossary gives brief explanations of the key terms, and each key term is translated into Arabic.

Statistics for Business is written with the Arab student in mind. Names of persons, companies, and places refer to the student’s environment. The book is clearly written, with a vocabulary that is readable and understandable for students whose mother tongue is not English, while still keeping to scientific standards. The text emphasizes clarity and concision of presentation, and the approach is user-friendly and easy to understand. There are no formal proofs or mathematical derivations in the text; the mathematics is kept to a minimum to motivate the reader.

All students have either personal computers or access to computing facilities in a campus lab. Problem solving in Statistics for Business is spreadsheet-oriented wherever possible. Spreadsheets have become a primary medium of instruction in statistics. Microsoft Office is the standard, and Excel in particular is ideal for manipulating quantitative data. This book aims to provide students with the skills to use Excel as a spreadsheet tool, as this is likely to be the prevalent software they will employ in the workplace. Excel templates are presented in special ‘Technology’ sections within each chapter. Screen captures are used to help the student become familiar with the nature of the software output.

Statistics for Business is written for one- or two-semester courses in statistics. The text is intended for students who do not have a background in mathematics. The only prerequisite is knowledge of elementary algebra.
Key Features
This book places an emphasis on problem solving. I believe people learn best when provided with motivation and structure. In order to facilitate the learning process, several pedagogical features are incorporated:
• Learning objectives are presented at the beginning of every chapter, providing students with clear goals and direction, whilst also providing a list of the key topics that are covered.
• Each chapter begins with an opening case, which describes an interesting and relevant real-world application of the material covered in the chapter. Four of these openers include real data from the Arab world.

Sample chapter opener (Chapter 6, Sampling Distributions):

In this chapter, we focus on the properties of two important sampling distributions: the sampling distribution of the sample mean and the sampling distribution of the sample proportion. We also present the central limit theorem, which plays a crucial role in statistical analysis.

Learning objectives
1. Review sampling methods.
2. Distinguish between population parameters and sample statistics.
3. Explain the central limit theorem.
4. Use the sampling distributions of $\bar{X}$ and $\hat{p}$.

Why use statistics?
[Opening case, cropped at the right margin in the source. Residents of downtown Tunis complain that parking spaces are hard to find. The municipality installed parking meters with a maximum stay of two hours, a decision welcomed by people coming downtown for business but not by residents. A petition from downtown inhabitants proposes designated residents' parking paid for by a monthly subscription. Before implementing the proposal, the mayor decides to conduct a survey; for similar proposals, 55% of residents were in favor. Using the sampling distribution methods of this chapter, the mayor can calculate that there is a 7.21% chance that he will accept the proposal.]
• Examples throughout the text are presented with a step-by-step approach to enable students to follow techniques easily and then solve other problems. More than one example is provided for each new concept. More than 240 examples are presented and solved in 17 chapters. They cover all aspects of the concepts introduced. Each example is followed by a clear and concise solution that develops the step-by-step methodology.

Sample pages (Chapter 10, Chi-Square Tests, pp. 428–429):

Test for Goodness of Fit

where $n$ is the sample size and $p$ is the probability that an element belongs to that category if the null hypothesis is true. In a goodness-of-fit test, the number of degrees of freedom is $df = k - 1$, where $k$ is the number of categories (or outcomes) for the experiment. In our case, $k = 4$, the number of entrances. Next, we present the procedure for performing a goodness-of-fit test, which involves the same five steps that were used in the previous chapters. The chi-square statistic is computed as follows:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (10.1) \]

where $O_i$ = observed frequency for category $i$, and $E_i$ = expected frequency for category $i$, which is given by $np_i$, where $p_i$ is the theoretical probability of an element being in category $i$. The chi-square test for goodness of fit is always a right-tailed test because the $O_i - E_i$ values are squared. To perform a goodness-of-fit test, the sample size should be large enough so that the expected frequency for each category is at least 5. This is known as Cochran’s Rule.

Example 10.1
Recall the above example of door entrances to a mosque. At the 5% significance level, can we reject the null hypothesis that the distribution of people entering the mosque is uniform across the four entrances?

Solution
Step 1. State the hypotheses. Because the number of people entering the mosque across the four entrances is supposed to be the same (i.e. uniformly distributed), we can write the null and alternative hypotheses as
H0: distribution of people is uniform across the four entrances;
H1: distribution of people is not uniform across the four entrances.
Step 2. Select $\alpha$, the level of significance: $\alpha = 0.05$.
Step 3. Select the test statistic. We use Formula 10.1 to compute the test statistic:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

The observed frequencies are given in Table 10.1. The expected frequency for each category is 100. Table 10.2 shows the computations necessary to obtain $\chi^2$.

Table 10.2 Computations for goodness-of-fit test
Entrance                      A      B      C      D      Total
Observed frequency $O_i$      96     108    84     112
Expected frequency $E_i$      100    100    100    100
$(O_i - E_i)^2$               16     64     256    144
$(O_i - E_i)^2 / E_i$         0.16   0.64   2.56   1.44   4.80

$\chi^2 = 4.80$

Step 4. Decision rule. A 5% significance level corresponds to the area in the right tail of the chi-square distribution being $\alpha = 0.05$. The number of degrees of freedom is $df = k - 1 = 4 - 1 = 3$. From the chi-square table (Appendix E), the critical value of $\chi^2$ for $df = 3$ and an area of 0.05 is 7.815. If the computed value of $\chi^2$ is greater than the critical value, we reject the null hypothesis.
Step 5. Make a decision. Since the computed $\chi^2$ value 4.80 is less than the critical value 7.815, we do not reject H0. There is not enough evidence to conclude that the number of people entering the mosque differs across the entrances; the data are consistent with a uniform distribution.

We just saw an application of the chi-square goodness-of-fit test to the uniform distribution. In the following sections, we will see how the same test can be applied to multinomial, normal, and Poisson distributions.

Application to a Multinomial Distribution

The example presented in the previous section is an application of the goodness-of-fit test when the expected distribution is uniform. A chi-square test can be used to compare sample frequencies with any probability distribution. A multinomial distribution is defined by $k$ probabilities $p_1, p_2, \ldots, p_k$ that sum to 1. For example, the theoretical probabilities of possible outcomes when rolling a die are $p_1 = p_2 = \cdots = p_6 = 1/6$. The following example illustrates this situation.
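As a check on Example 10.1 above, the arithmetic of Table 10.2 can be sketched in a few lines of Python. This is only an illustration and not one of the book's Excel templates: the variable names are assumptions, and the critical value 7.815 is copied from the text rather than computed.

```python
# Observed frequencies for the four entrances, as in Table 10.2.
observed = {"A": 96, "B": 108, "C": 84, "D": 112}

n = sum(observed.values())   # sample size
k = len(observed)            # number of categories
expected = n / k             # uniform null hypothesis: E_i = n * (1/k)

# Formula 10.1: sum of (O_i - E_i)^2 / E_i over all categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

df = k - 1
CRITICAL_VALUE = 7.815       # chi-square table value for df = 3, alpha = 0.05 (from the text)

# Right-tailed test: reject H0 only if the statistic exceeds the critical value.
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

The expected frequency here is 100 for every entrance, so Cochran’s Rule (each expected frequency at least 5) is comfortably satisfied.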
• Check Your Understanding problems are found at the end of each major section of the chapter, giving students the chance to practise what they have learnt and ensure understanding before moving on to the next topic.
• For each chapter, a set of Excel spreadsheets for the Technology sections can be used to solve problems. The Excel templates are available on the CD that accompanies the text. Although templates are useful and efficient aids to solving problems, illustrations of manual calculations have been retained so that students can calculate manually any result found in the templates.
• Throughout the text, important formulas and main results are highlighted, signaling to readers that this material is particularly relevant for their understanding.
Sample pages (Chapter 2, Descriptive Statistics, pp. 106–107):

Measures of Shape
Figure 2.7 Kurtosis Types: Mesokurtic, Leptokurtic, Platykurtic

Technology
Template 2.4 Template for Measures of Location and Shape
If we want to obtain the data value that corresponds to a given percentile, we enter the percentile in cell C21, C22, or C23 and the result appears in the adjacent cell D21, D22, or D23. If we want to obtain the percentile for a given data value, we enter the data value in cell F21, F22, or F23 and the result appears in the adjacent cell G21, G22, or G23. The quartiles and IQR are computed in cells D26:D29. The coefficients of skewness and kurtosis are computed respectively in cells D33 and D34.

Check your Understanding
Review Problems

2.29 Consider the following set of scores: 48, 45, 10, 26, 33, 40, and 47. What value corresponds to the 60th percentile?

2.30 A final exam for a history course has a mean of 81 and a standard deviation of 3. Find the corresponding z-score for each raw data value.
a) 83
b) 76
c) 90
d) 73
e) 79

2.31 Dr. Souad gives a 15-point test to 10 students. The scores are
13, 7, 10, 1, 2, 3, 14, 6, 5, 8
a) Find the percentile rank of the grade 7.
b) Find the percentile rank for the score of 10.

2.32 Assume that the data shown below represent the number of hours 12 part-time employees at Muskham Shopping Center (Ramallah, Palestine) worked during the week before and after Eid:
Before: 27, 32, 29, 35, 13, 9, 28, 27, 21, 32, 15, 21
After: 13, 15, 19, 9, 23, 21, 29, 11, 15, 12, 9, 15
Construct a box plot for each set of data and compare the distributions.

2.33 The total number of goals scored in one Football World Cup is as follows.
Number of goals   0    1    2    3    4    5    7
Frequency         4    12   12   18   11   6    1
Compute
a) the first quartile,
b) the third quartile.

2.34 Construct a box plot for the number of TV sets sold during a randomly selected week at Carrefour: 6, 10, 21, 3, 7, 13, 1. Comment on the shape of the distribution.

2.35 Golden Pizza claims your pizza is free if the delivery time is more than 30 minutes. An investigator monitored 20 consecutive deliveries. The delivery times in minutes are reported below.
15, 30, 21, 18, 26, 24, 42, 9, 27, 17, 28, 18, 25, 33, 19, 24, 22, 17, 26, 27
a) Construct a box plot for the time it takes to deliver a pizza.
b) Does the distribution show any outliers?

2.36 The accumulated quantity of gold by December 31, 2010 (in tons) at the following Arab countries’ central banks is displayed in the following table.
Country         Gold (tons)
Saudi Arabia    322.9
Lebanon         286.8
Algeria         173.6
Libya           143.8
Kuwait          79.0
Egypt           75.6
Syria           25.8
Morocco         22.0
Jordan          12.8
Qatar           12.4
Tunisia         6.8
Bahrain         4.7
Yemen           1.6
Mauritania      0.4
Source: www.gold.org/government_affairs/gold_reserves
a) Construct a box plot.
b) Are there any outliers?
Key Features
•
At the end of the chapter, a chapter summary reviews the main topics covered, a key terms list gives all bolded terms in the chapter, key formulas are listed to make them easy for a reader to locate, and Solved Problems provide a comprehensive review of the concepts and procedures for tackling problems.
5 246  Chapter Continuous probability
solveD problems
Distributions
Chapter Summary
• In this chapter, we have discussed continuous probability distributions described by continuous probability curves. In this case, a probability is represented by an area under the probability curve. We studied three continuous probability distributions: uniform, exponential, and normal.
• For the uniform distribution, the height of the curve is the same over an interval defined by two parameters a and b, the lower and upper limits; the mean of this distribution is the average of its parameters (Formula 5.1).
• The exponential distribution describes waiting times between occurrences of two events; its parameter is $\lambda$, the mean number of occurrences per unit of time (Formula 5.4). The exponential distribution and the Poisson distribution are related: the parameter $\lambda$ is the mean of the Poisson distribution and $1/\lambda$ the mean of the exponential distribution; both of these distributions are widely used in analysis of waiting times.
Normal Distribution
f (X) =
• The normal distribution, defined by two parameters, $\mu$, its mean, and $\sigma$, its standard deviation, has a bell-shaped curve (Formula 5.9). Because different values of $\mu$ and $\sigma$ give different normal distributions, we apply the transformation $Z = (X - \mu)/\sigma$ to get a standard normal distribution with $\mu = 0$ and $\sigma = 1$. A table is available for the standard normal distribution. The normal distribution can be used to approximate a binomial distribution when $n\pi$ and $n(1 - \pi)$ are both at least 5, where $\pi$ is the probability of success. Because the binomial distribution is a discrete distribution, we use a correction for continuity by adding or subtracting 0.5 to the X value being analyzed.
-
1 2
a
X - 2 b
for - 6 X 6 +
5.9
22
Standard normal variable: Z =
Variance = 2 X -
5.10
Normal Approximation to the Binomial Distribution = n
5.12
2 = n (1 - )
5.13
Solved problemS Problem A A taxi driver claims that the minimum time it takes him to reach downtown Amman is 7 minutes. a) Assuming a uniform distribution, what is the upper limit if the probability it takes this cab driver more than 15 minutes to reach downtown is 0.40? b) What is the probability density function?
Key termS Approximation to the binomial distribution Density function Distribution function
e
Mean =
Solution a) We use Goal Seek. 237 215 215
Exponential distribution Normal distribution Standard normal distribution Uniform distribution
219 225 228 214
Set cell B3 to 7 and cell A9 to 15. In ‘Set cell’ enter D9, in ‘To value’ enter 0.4, and in ‘By changing cell’ enter C4. The result is: b = 20.33 minutes. b) f(X) = 1/(20.33 - 7) = 0.075 for X between 7 and 20.33.
Problem b
Key FormulaS
Assume that it takes an exponential time with a mean of 2.5 minutes to answer phone calls at the call center of Garyounis University in Benghazi in Libya. a) What is the probability density function of X? b) What is the probability that the length of a phone call is no more than 4 minutes? c) What is the probability that the length of a phone call is at least 3 minutes? d) What is the probability that the length of a phone call is between 2 and 5 minutes? e) What is the probability that the length of a phone call is at most 45 seconds?
Uniform Distribution
114 
Chapter 2 DesCriptive statistiCs
$f(X) = 1/(b - a)$ for $a \le X \le b$; $f(X) = 0$ otherwise (5.1)
Mean: $\mu = (a + b)/2$ (5.2)
Variance: $\sigma^2 = (b - a)^2/12$ (5.3)
for X 7 0
5.4
Solution = 1/2.5 = 0.4.
Exponential Distribution f(X) = e
Chapter review problems 2.37 Prima Sport was recently opened in Lattakia to provide equipment and supplies to players and teams. During the six months of operations, the manager kept track of the number of purchases made by customers each day. Assume she selected a random sample of 30 days. Below is the number of invoices issued for the sample. 14 14 19 19 21 23 24 23 22 25 24 27 28 29 27 25 24 27 30 33 30 28 29 31 38 31 28 24 29 23 Compute the mean, median, mode, mid range, 10% trimmed mean, and the average growth rate for these data and the standard deviation. 2.38 A Bank conducted a study of its customers to determine the total credit card debt. Suppose that a sample of 40 customers shows the following results in dollars. 1,245 3,723 5,216
968 2,218
2,261 1,845 3,167
0
0 3,824 3,316 5,830 1,792
2,010 4,390 3,978 1,549 4,056 420 2,045 4,174 3,156 3,621 1,268
647 1,941
0
947
0 4,210
0 2,843
0 1,642
339
867
759
0 1,324
a) Compute the mean, median, mode, and midrange for these data. b) Develop a box and whiskers plot for these data. c) Based on the box and whiskers plot, does it appear that the distribution of credit card debt is skewed? If so, in which direction is it skewed? Discuss.
•
2.41 File Syria Stock Exchange represents data on the volume traded between January 3, 2011 and March 22, 2011 Source: www.dse.sy.
a) b) c) d)
What is the daily average of volume traded? What is the standard deviation of volume traded? Develop a box plot. Are there any outliers?
2.42 Consider Problem 2.2 on p. 81. Compute the mean and the standard deviation. What can you say about the shape of this distribution? 2.43 Consider the aircraft movement in Manama airport (Problem 1.52 on p. 55). Compute the mean, median, mode and range.
•
Listed below is the time in minutes for a sample of 20 customers. 8
3
5
10
9
22
15
8
12
15
4
7
16
13
6
25
14
26
17
7
a) Find the mean, median, and mode of the times. b) Determine the range and the standard deviation.
E(X) = 1/
5.7
V(X) = 1/ 2
5.8
f(X) = P(X ‌ P(X Ú P(2 ‌ P(X ‌
0.4e-0.4X for X 7 0. 4) = 0.7981. 3) = 0.3012. X ‌ 5) = 0.314. 0.75) = 0.2592.
2.44 Consider the case of expatriate students in Riyadh schools (Problem 1.55 on p. 56). Compute the mean and the standard deviation. 2.45 The following table contains the annual returns (in percent) of two stocks.
Stock A
Stock B
Stock A
21.5445
–3.3014
2.6765
Stock B 2.9415
16.0590
–0.3612
–4.5580
19.1065 13.5680
15.2640
5.8565
10.7590
2.9415
13.2330
–2.8090
2.6765
19.1065
6.3468
–38.3190
–4.5580
13.5680
6.9523
16.9865
10.7590
a) Compute the mean return and skewness coefficient for each stock. b) Interpret the differences.
after tax of MEC, a Jordanian company.
Descrip tion Net profit after tax
2008
2007
2006
2005
2004
5,750,109 10,779,521 9,876,915 22,035,397 5,428,297
Source: MEC Annual Report 2008
1 54  Chapter IntroduCtIon to StatIStICS
Miniprojects
Miniproject 1.1 Consider data file “DJIA 2000–2006”. In this miniproject, you are asked to compute the change in percentage between two consecutive values of the closing price. a) Construct a frequency distribution of the percentage change of the closing price. b) Draw a histogram. c) Draw a frequency polygon.
Miniproject 1.2 For this miniproject, you are asked to collect a data set of interest to you that you will use for this chapter and the following chapters. Your data set should contain at least one qualitative variable and at least one quantitative variable. The data should contain between 50 and 100 observations. Examples of data sets could be cars. Quantitative variables could be price, mileage, age of a car, etc. Qualitative variables could include the model, type (compact, SUV, minivan, etc.). Another example could be demographics data where quantitative variables may consist of income, family size, birth and death rates, etc. The qualitative variables may include the regions, cities, gender, etc. Other examples from sports or other areas may be used for this miniproject. Write a short report that answers the following questions: a) Describe the variables that you collected information on. b) Is your data set a sample or a population? c) If it is a sample, is it a random sample? d) Is your sample representative of the population? e) For each quantitative variable, state whether it is continuous or discrete. f) Describe the meaning of variable, measurement. g) Set up an appropriate type of frequency distribution table of this data set for one of the quantitative variables and then compute the relative
a) b) c) d) e)
Chapter Review Problems at the end of the chapter contain problems that are more involved and cover all sections. Experience has shown that when students are asked to solve the problems at the end of each section, they tend to have less difficulty because they already know which formulas are directly involved in the solution.
At the end of each chapter, Miniprojects provide supplementary illustrations of the applications of statistics. Their purpose is to give students the opportunity to carry out research projects in statistics by using appropriate statistical procedures. There are 76 real cases (54 from the Arab world) out of 97 Miniprojects.
2.39 Alexandria Electronics manufactures CDs at a rate of 1000/hour. A machine tests every hour the total production. The following are data concerning the number of defects recorded during 9 hours of operations. 8 6 4 5 9 3 8 10 12 a) Is the above data set a sample or a population? b) Compute the deviation from the mean. c) Verify that their sum is equal to zero ($\sum (X - \bar{X}) = 0$).
2.40 The manager of Aden Bank of Commerce is studying the time spent by customers in the bank.
2.46 Assume that the following data represent weekly salaries received by part-time consultants at a Beirut-based consulting firm. 1,550 1,300 1,350 1,550 1,450 1,350 1,350 1,450 1,350 5,850 1,550 1,450 5,350 1,700 a) Compute the mean salary. b) What is the median salary? c) What is the modal salary? d) Which of the above three measures is the most representative? Justify your answer.
2.47 Assume that the following data represent profit
h) Build a histogram, a pie chart, and a stem and leaf.
Miniproject 1.3 Below are 50 names of students with their major taking a General Statistics class in Ibn Khaldun Business School. See data file “Ibn Khaldun”. Select a sample of nine students from this alphabetical list by using a) a simple random sample, b) systematic sampling, c) a cluster sample. Try to ensure that every student has an equal chance of being selected. Which sampling method seems most appropriate?
Miniproject 1.4 Consider the data file Qatar-Labor-ForceSample-Survey-2009. Answer the following questions. a) Construct a pie chart for the labor force “economically active” using Table 1. b) Construct a relative frequency distribution of the educational status using Table 8. c) Construct a cumulative frequency distribution of Qatari females according to their occupation using Table 19. d) Construct a Pareto chart of non-Qatari females according to their educational status using Table 21.
Miniproject 1.5 Consider data file Yemen Trade by Country. Select a continent of your choice: Africa, America, Asia, or Europe. Next, select a region of at least ten countries. Use the data of total export in terms of value to answer the following questions. a) Construct a relative frequency distribution. b) Construct a pie chart.
Other Features Several other features make this book a valuable resource. Answers to odd-numbered problems are given at the end of the book. Extensive solutions are provided for some problems, so that students may review the tools and procedures used to solve these problems. A set of statistical tables is also provided in the appendix. A comprehensive index at the end of the book enables readers to use the book as a reference for their continued studies, while a bibliography provides a current selection of more advanced books and interesting articles.
CD
An accompanying CD contains Excel templates as well as Excel data sets for problems and Miniprojects indicated by the CD icon. This can save the student from having to enter data by hand, which takes up valuable time and increases the chances of error. The CD also contains the chapter slides and the additional material, Chapters 15 through 17.
Instructor Supplements Instructors can access a variety of print, digital, and presentation resources available with this text in downloadable format at the Instructors Resource Center, accessible via the link: www.pearsoned.co.uk/awe/benghezal. Registration is simple and gives you immediate access to new titles and new editions. As a registered faculty member, you can download resource files and receive immediate access to and instructions for installing course management content on your campus server. The following supplements are available for download to instructors adopting this textbook:
• Solutions Manual
• TestGen (test-generating program)
• PowerPoint slides
While the book is comprehensive, its organization allows instructors to choose topics and depth of coverage as desired. The text contains more material than one could normally hope to cover in a one-semester course. Topics presented near the end of each chapter can be considered optional and hence be skipped without loss of continuity. One of the primary objectives in writing this book is to provide you, the reader, with a book that enhances your learning experience in quantitative business analysis. However, the degree of success you achieve in your quantitative business analysis studies will depend in large measure on the effectiveness of your learning habits. The better you can explain the “how” and “why” of key concepts the more thorough will be your understanding.
Acknowledgements
I am grateful to my colleagues at the University of Algiers, Ajman University of Science and Technology, Abu Dhabi University and the University of Sharjah for their advice and help. I should not forget all those former students who had a hard time with Statistics and other quantitative topics. I am sure they will recognize themselves because I used their names in this book. I want to thank Dr. Maher who kindly agreed to translate glossary terms into Arabic. The Pearson staff deserves my gratitude for their professionalism, guidance, suggestions and encouragement throughout what has been a hard, but rewarding, task. Special thanks to Rasheed Roussan, the Pearson Acquisitions Editor, who contacted me to participate in the Arab World Publishing Program. I extend my most heartfelt thanks to Sophie Bulbrook, my development editor, who has done a fantastic job. I would also like to thank Kate Sherington, Project Editor, and Fay Gibbons, Editor. I wish to acknowledge the continuing support, understanding, patience, and encouragement that I receive so generously from all the members of my family. My sons, Amin, Sami, and Rochdi, provided the necessary inspiration to undertake and complete this project. I would like to thank the following reviewers for their thoughtful comments and suggestions:
Dr. Idries Al-Jarrah, University of Jordan, Jordan
Fadi Awawdeh, Hashemite University, Jordan
Edgard A. Rizk, Lebanese German University, Lebanon
Professor Fathi M. Allan, United Arab Emirates University, UAE
Professor Medhat Hassanein, American University in Cairo, Egypt
Dr. George Fahmy Rezk, Arab Academy for Science & Technology, Egypt
Prof. Dr. Akram M. Chaudhry, University of Bahrain, Bahrain
Dr. Kastoori Srinivas, Osmania University, India
Farouk Benghezal
Chapter seven
Estimation and Confidence Intervals
In this chapter, we focus on the properties of estimators. We will use the estimates obtained from sampling to determine confidence interval estimates for parameters.
Learning objectives
1. Define estimators and describe their properties.
2. Determine confidence intervals for a mean.
3. Determine confidence intervals for a proportion.
4. Determine confidence intervals for a variance and a standard deviation.
5. Compute the minimum sample size for estimating a parameter given a confidence level.
M07_BENG9800_01_SE_C07.indd 282
24/08/11 7:42 PM
Why use Statistics?
Almarai is the largest integrated dairy foods company in the Arab world. It produces a wide range of dairy products and juice. Almarai Laban, one of their dairy products, is available in four different sizes: 2 liters, 1 liter, 500 ml, and 200 ml. As part of quality assurance, the company conducts tests every day on Laban to make sure that it meets the product standards. The nutrition standards per 100 ml serving are as shown in the following panel:

Nutrition Information (per 100 ml serving)
Vitamin D3: 400 I.U/L
Calcium: 100 mg
Carbohydrate: 4.7 g
Protein: 3.1 g
Source: "Our Products: Fresh Laban," Almarai website, www.almarai.com/main.html#/en/Our%20Products/4.

How could Almarai be sure that its Laban meets these standards? One way would be to take a sample of 100 ml of Laban at different times every day. Suppose that a quality engineer, Saud, measures the content of calcium in the sample. A sample of 14 measurements yields the following: 99.7, 100.2, 100.3, 99.6, 100, 101.4, 99.5, 100.2, 99.7, 98.9, 100.3, 100.1, 99.6, and 100.1 mg of calcium. In this example, eight values are at or above the standard requirement (100 mg) and six values are below it. Is Saud going to reject the day’s production because there are six values below the industry standard? There are tolerance limits within which products are accepted by the Dairy Association: Laban’s calcium content must be between 99 and 101 mg. One observation is below the lower limit of 99 and one is above the upper limit. Does this mean that Almarai has failed to meet the industry standards? How is the quality engineer going to use these data to check whether today’s Laban production meets the industry standards? If the standards are not met, Saud can conclude that the process of producing Laban is out of adjustment. To make a decision about accepting the day’s production, Saud can compute the following values: the sample mean of 99.97 mg and the standard deviation of 0.568 mg.
Using these two values he is able to conclude that the day’s production of Laban meets the industry standards. How did he do that when the mean 99.97 is below the standard? In this chapter, we will learn that the statistics 99.97 and 0.568 are defined as point estimates. Saud computed the following interval: [99.64, 100.30], which he compared to the industry tolerance limits (99 and 101 mg). Then, he concluded that the production meets the industry standards for calcium content and the production process does not need to be adjusted. The interval [99.64, 100.30] is a 95% confidence interval. In this chapter, you will learn how to compute confidence intervals.
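The opening example does not say how Saud obtained his interval. One reading that reproduces the quoted point estimates is sketched below in Python. It assumes a Student's t critical value of 2.160 (13 degrees of freedom, 95% confidence, taken from a standard t table), because the sample is small and the population standard deviation is not given. That choice is an assumption of this sketch, not something the example states.

```python
import math
import statistics

# Calcium measurements (mg per 100 ml) from the Almarai example.
calcium_mg = [99.7, 100.2, 100.3, 99.6, 100, 101.4, 99.5,
              100.2, 99.7, 98.9, 100.3, 100.1, 99.6, 100.1]

n = len(calcium_mg)
mean = statistics.mean(calcium_mg)
std_dev = statistics.stdev(calcium_mg)  # sample standard deviation (divides by n - 1)

# Assumption: t critical value for 95% confidence with n - 1 = 13
# degrees of freedom, read from a standard t table.
T_CRITICAL_95_DF13 = 2.160

margin = T_CRITICAL_95_DF13 * std_dev / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:.2f} mg, s = {std_dev:.3f} mg")
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
```

Both endpoints fall inside the tolerance limits of 99 and 101 mg, which is the comparison described in the example.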
In Chapter 6, we used samples to obtain values such as the sample mean, sample variance, sample standard deviation, and sample proportion. These statistics give us an idea about the parameters of the population. For example, the sample mean is a statistic that indicates the value of the population mean, provided the sample properly represents the population. In general, we calculate a sample mean or a sample proportion to make an inference about a population parameter (mean $\mu$ or proportion p). Different samples give different sample means or sample proportions. For example, suppose 20 owners of Mercedes Class C200 cars are asked to drive on a highway for a distance of 100 km. The fuel consumption of each car is recorded. Average consumption of fuel for this sample is computed to be 7.2 liters/100 km. So in this example, 7.2 liters/100 km would be the estimate of the mean fuel consumption of the population of Mercedes Class C200 cars. If three other samples of the same size were taken, we may get different estimates, such as 6.9, 7.6, and 6.7 liters/100 km. These values are estimates of the population mean $\mu$. Because these estimates differ, they are subject to error. To overcome this problem, and based on our knowledge of the sampling distribution (using the central limit theorem of Chapter 6), we can develop an interval estimate constructed around the sample mean so that we are reasonably sure that this interval contains the population mean. This interval is known as a confidence interval, which has a specific probability of containing the population parameter we want to estimate: $\mu$.
Estimation
Estimation is one important aspect of inferential statistics.
Estimation
Assignment of value(s) to a population parameter based on a value of a sample statistic.
Generally, populations are large, and a sample allows us to collect data from the population of interest and determine an estimate of the true population parameters. One question arises: How large should the sample size be in order to make an accurate estimate of a population parameter? The answer to this question depends on factors such as the desired precision and the probability of making an accurate estimate. The sample size will be determined according to the desired level of accuracy we want our estimate to have. We will be able to show that the mean of a random sample is a sufficient estimator of the mean of a normal population with known variance, and this implies that there is nothing to be gained for this purpose by actually specifying the individual values of the sample or the order in which they are obtained.
Estimate
A statistic obtained from a sample to infer the value of a population parameter.
A point estimate is the value of the estimator in a given sample.
Point estimate
A value of a sample statistic that is used to estimate a population parameter.
For example, assume that you go to the airport to pick up your father, who is coming home from a business trip. You check the arrivals board to see if the flight is late. Fortunately, the flight is not late but you notice that 3 other flights are late out of the 15 flights displayed on the board. The proportion of late flights is 3/15 = 0.2. Thus, you have just computed a point estimate of the proportion of late flights from a sample of 15 arrivals. If we use a sample mean to estimate the mean of a population, a sample proportion to estimate the parameter p of a binomial distribution, or a sample variance to estimate the variance of a population, we are in each case using an estimate of the parameter in question. In this section, we present some important properties of a good estimator: unbiasedness, efficiency, consistency, and sufficiency. Let us look at unbiasedness first.
Unbiasedness
The property of an estimator that its expected value is equal to the population parameter it estimates.
In other words, it would seem desirable that the expected value of an estimator be equal to the parameter it is supposed to estimate. Recall that in Example 6.1 on p. 259 we computed the mean of all possible samples of size 2 taken from the population (refer to Table 6.3 on p. 260). When we computed the mean of the sample means, we reached the following conclusion: the mean of the sample means, $\mu_{\bar{X}}$, is equal to the population mean $\mu$. Therefore, the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$. This indicates that if we keep taking samples from this population and compute $\bar{X}$ for each of the samples, in the long run the average value of the $\bar{X}$s will be the parameter $\mu$. The next property of good estimators is efficiency.
Efficiency
The property of an estimator that it has a relatively small variance.
If we compare two estimators, one estimator is relatively more efficient than the other if its variance is smaller. Another desirable property of estimators is consistency.
Consistency
The property of an estimator that its probability of being close to the parameter it estimates increases as the sample size increases.
For example, as we have seen in Chapter 6, the sample mean $\bar{X}$ has a variance $\sigma_{\bar{X}}^2 = \sigma^2/n$. As the sample size n increases, the variance of $\bar{X}$ decreases, and therefore the probability of being close to the parameter increases: a consistent estimator converges towards the parameter it estimates. The last property of good estimators is sufficiency.
Sufficiency
The property of an estimator that it utilizes all the information that is contained in a sample.
For example, we will be able to show that the mean of a random sample is a sufficient estimator of the mean of a normal population with known variance; this indicates that there is nothing to be gained by specifying the order in which the individual values of the sample were obtained. Similarly, we will be able to show that the sample proportion is a sufficient estimator of the parameter p of the binomial distribution; this means that there is nothing to be gained by actually specifying the order in which successes and failures were obtained. We want our estimators, such as the sample mean, sample proportion, and sample variance, to have the properties of a good estimator (unbiasedness, efficiency, consistency, and sufficiency) so that we can use them to make inferences about the population parameters $\mu$, p, and $\sigma^2$.
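Unbiasedness and the shrinking variance of the sample mean can be checked by brute force on a small population. The sketch below uses a hypothetical four-value population (not the data of Example 6.1) and enumerates every ordered sample of size n drawn with replacement. The function name `sample_mean_summary` is introduced here for illustration only.

```python
from itertools import product
from statistics import fmean, pvariance

# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8]
mu = fmean(population)            # population mean
sigma_sq = pvariance(population)  # population variance

def sample_mean_summary(n):
    """Mean and variance of the sample mean over every ordered
    sample of size n drawn with replacement from the population."""
    means = [fmean(sample) for sample in product(population, repeat=n)]
    return fmean(means), pvariance(means)

for n in (2, 3, 4):
    avg_of_means, var_of_means = sample_mean_summary(n)
    print(f"n={n}: mean of sample means = {avg_of_means:.4f}, "
          f"variance = {var_of_means:.4f}, sigma^2/n = {sigma_sq / n:.4f}")
```

For each n, the average of the sample means matches the population mean, and the variance of the sample means matches $\sigma^2/n$, so it falls as n grows.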
Confidence Interval for the Population Mean
When we select a sample from a population, we compute the mean of this sample by dividing the sum of the values by the sample size. If a second sample is taken and a mean is calculated, it is very likely that the second sample will provide a different mean. Further samples will yield more (different) values for the sample mean. We note that the population stays the same during this process. Therefore, the point estimate assigns a value to $\mu$, the population parameter, that will almost always be different from the true parameter. So, instead of assigning a single value to a population parameter, an interval is constructed around the point estimate and then a probabilistic statement that this interval contains the population parameter is made. The probabilistic statement is given by the confidence level.
Confidence level
A degree of certainty, expressed as a percentage, that an interval would include the population parameter (for example, a 95% confidence level).
An interval constructed based on a confidence level is called a confidence interval.
Confidence interval
A range of values within which we can declare, with some level of confidence, the population parameter lies.
When calculating confidence intervals for population means, we need to consider whether the population standard deviation $\sigma$ is known or not. We will look at both cases: where $\sigma$ is known and where $\sigma$ is unknown.
Confidence Interval for the Population Mean when $\sigma$ is Known
To present the concept of confidence interval by means of an example, let us consider again the sampling distribution of $\bar{X}$ for random samples of size n taken from a population with a mean of $\mu$ and a known standard deviation of $\sigma$. We know (by the central limit theorem) that the random variable $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ when n is at least 30. The following Z-formula for sample means can be used to find probabilities:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (6.4)$$

The value of $\mu$ can be obtained by rearranging this formula algebraically:

$$\mu = \bar{X} - Z\,\sigma/\sqrt{n}$$

As the sample mean can be greater than or less than the population mean $\mu$, Z can be positive or negative. Therefore, we write the preceding expression in the form

$$\mu = \bar{X} \pm Z\,\sigma/\sqrt{n}$$

The interval in which $\mu$ is contained is

$$\bar{X} - Z\,\sigma/\sqrt{n} \le \mu \le \bar{X} + Z\,\sigma/\sqrt{n}$$
Figure 7.1 displays this interval; the area under the curve is the confidence level or probability that $\mu$ is contained within the confidence interval.
[Figure 7.1 Confidence Interval for $\mu$ for Known $\sigma$: the area under the curve between $\bar{X} - Z\,\sigma/\sqrt{n}$ and $\bar{X} + Z\,\sigma/\sqrt{n}$ is the confidence level; the range between these two endpoints is the confidence interval.]
The value of Z depends on the probability with which we want $\mu$ to fall in this interval. We know that the area under the normal curve is 0.95 between $Z = \pm 1.96$. The probability that the mean $\mu$ is located in the interval between $Z = \pm 1.96$ is shown in Figure 7.2:

$$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

This indicates that 0.05 of the area under the normal curve is located in the tails. Let us denote by $z_{\alpha/2}$ the point on the horizontal axis under the standard normal curve that yields a right-hand tail equal to $\alpha/2$. As illustrated in Figure 7.2, the area under the standard normal curve to the right of $z_{\alpha/2}$ is $\alpha/2$ and, by symmetry of the standard normal distribution, the area under the curve to the left of $-z_{\alpha/2}$ is also $\alpha/2$. Therefore, the area under the standard normal curve between $-z_{\alpha/2}$ and $+z_{\alpha/2}$ is $1 - \alpha$.
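The cut-off value $z_{\alpha/2}$ for any confidence level can be looked up in a table or computed directly. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_cutoff` is introduced here for illustration and is not part of the text.

```python
from statistics import NormalDist

def z_cutoff(confidence_level):
    """Return z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # The area to the left of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_cutoff(level):.3f}")
```

For a 95% confidence level, this gives the familiar value of about 1.96.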
[Figure 7.2 A 95% Confidence Interval for $\mu$ for Known $\sigma$: the central area between $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and $\bar{X} + 1.96\,\sigma/\sqrt{n}$ is 0.95, with an area of 0.025 in each tail.]
In general, the confidence level is expressed as a percentage: for example, 95% or 99%. A $(1 - \alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is known and sampling is done from a normal population is the interval bounded by

$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad (7.1)$$

Formula 7.1 can be used as long as the population is normally distributed and the standard deviation is known. Moreover, there is no restriction on the sample size, whether small or large. The value $z_{\alpha/2}$ is called the cut-off value for the $(1 - \alpha)100\%$ confidence level.
Example 7.1
Dr. Abbas from the University of Baghdad selects a sample of 16 grades from a population that he assumes to be normally distributed with a standard deviation of 3. The sample mean turns out to be 76. Construct a 95% confidence interval for the population mean $\mu$.
Solution
We have $\bar{X} = 76$, $\sigma = 3$, and $n = 16$. A 95% confidence interval gives a cut-off value $z_{\alpha/2} = 1.96$:

$$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

The confidence interval is

$$76 - 1.96\,(3/\sqrt{16}) \le \mu \le 76 + 1.96\,(3/\sqrt{16})$$
$$74.53 \le \mu \le 77.47$$

There is a 95% chance that such an interval would include the population mean $\mu$.
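The calculation in Example 7.1 can be reproduced with a few lines of Python. This is a minimal sketch of Formula 7.1; the function name `z_confidence_interval` is an illustrative choice, and the cut-off value is computed with `statistics.NormalDist` rather than read from a table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Interval of Formula 7.1: sample_mean +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example 7.1: sample mean 76, sigma = 3, n = 16, 95% confidence.
lower, upper = z_confidence_interval(76, 3, 16)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Passing a different `confidence_level`, such as 0.99, widens the interval because the cut-off value is larger.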
When a large sample is selected from a population that is not normally distributed, we can use the results of the central limit theorem, which states that for large samples ($n \ge 30$) the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ whatever the shape of the population. If $\sigma$ is unknown, then S, the sample standard deviation, can be substituted for $\sigma$; when n is large, we can use Formula 7.2:

A $(1 - \alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is unknown and the sample size is large ($n \ge 30$) is the interval bounded by

$$\bar{X} \pm z_{\alpha/2}\,\frac{S}{\sqrt{n}} \qquad (7.2)$$

where S is the sample standard deviation.
Example 7.2
Assume that a sample of 49 accounts of credit card holders from Gulf International Bank yields a monthly mean balance of $675 and a standard deviation of $121. Construct a 90% confidence interval for the population mean $\mu$.
Solution
Here we have $\bar{X} = 675$, $S = 121$, and $n = 49$. Since the sample size is large ($n \ge 30$), we can use the normal distribution approximation and apply Formula 7.2. For a 90% confidence interval, $z_{\alpha/2} = 1.645$:

$$P\left(\bar{X} - 1.645\,S/\sqrt{n} \le \mu \le \bar{X} + 1.645\,S/\sqrt{n}\right) = 0.90$$

The confidence interval is

$$675 - 1.645\,(121/\sqrt{49}) \le \mu \le 675 + 1.645\,(121/\sqrt{49})$$
$$646.6 \le \mu \le 703.4$$

Gulf International Bank can be 90% confident that the average monthly balance of credit card holders is between $646.60 and $703.40.
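As a check on Example 7.2, the same arithmetic can be sketched in Python, with the sample standard deviation S standing in for $\sigma$ as in Formula 7.2. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Example 7.2: n = 49 accounts, mean balance $675, sample standard deviation $121.
sample_mean, sample_sd, n = 675, 121, 49
confidence_level = 0.90

z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # about 1.645
margin = z * sample_sd / math.sqrt(n)                     # Formula 7.2
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z:.3f}")
print(f"${lower:.1f} <= mu <= ${upper:.1f}")
```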
Finite Correction Factor
Recall from Chapter 6 that if the sample is taken from a finite population, a finite population correction factor may be used to increase the accuracy of the solution. In the case of confidence intervals, the finite correction factor is used to reduce the width of the interval. As stated in Chapter 6, if the sample size is less than 5% of the population, the finite correction factor does not significantly change the solution. The following is Formula 7.1 modified to include the finite correction factor:
$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} \qquad (7.3)$$
Example 7.3 Imagine that a sample of 40 employees is taken from Kwality Ice Cream, a Saudi ice-cream manufacturer that employs 500 people. Suppose that a random sample indicates that the average number of days of absence in a year is 5.3. Records show that the company has experienced in the past a standard deviation of 3.2 days of absence. Construct a 95% confidence interval to estimate the average number of days of absence of all employees in this company.
Solution This example involves a finite population. The sample size is $n = 40$, which is greater than 5% of the population ($N = 500$). The sample mean is $\bar{X} = 5.3$ days, and the population standard deviation is $\sigma = 3.2$ days. The $z_{\alpha/2}$ value for a 95% confidence interval is 1.96. Substituting into Formula 7.3, we obtain
$$5.3 \pm 1.96\,\frac{3.2}{\sqrt{40}}\sqrt{\frac{500 - 40}{500 - 1}} = 5.3 \pm 0.95$$
The 95% confidence interval for the mean number of days of absence for the population of employees in this company is
$$4.35 \le \mu \le 6.25$$
Without the finite correction factor, the result would have been
$$5.3 \pm 1.96\,\frac{3.2}{\sqrt{40}} = 5.3 \pm 0.99$$
$$4.31 \le \mu \le 6.29$$
The interval is wider. The square root of (500 – 40)/(500 – 1) is 0.96. Multiplying the standard error $3.2/\sqrt{40}$, and with it the margin of error $1.96\,(3.2/\sqrt{40})$, by this factor reduces it by 4% (i.e. 1 – 0.96).
This reduction in the size of the standard error yields a smaller range of values for estimating the population mean. The larger the sample size the greater will be the reduction in the standard error. For example, if the sample size is 80, you can check that the reduction in the standard error is 8.3%.
If the sample size is large enough ($n \ge 30$) and the population standard deviation is unknown, we can substitute in Formula 7.3 the sample standard deviation $S$ for the population standard deviation $\sigma$.
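The effect of the finite correction factor can be checked numerically. The sketch below is illustrative only: the helper name `z_interval_finite` is an assumption of ours, and the cut-off value 1.96 is passed in as it would be read from the normal table.

```python
from math import sqrt


def z_interval_finite(mean, sigma, n, N, z):
    """Interval of Formula 7.3: z-interval with the finite correction factor."""
    fpc = sqrt((N - n) / (N - 1))          # finite correction factor
    margin = z * sigma / sqrt(n) * fpc     # corrected margin of error
    return mean - margin, mean + margin, fpc


# Data of Example 7.3: 40 employees sampled out of N = 500.
lower, upper, fpc = z_interval_finite(mean=5.3, sigma=3.2, n=40, N=500, z=1.96)
print(f"factor = {fpc:.2f}, interval = [{lower:.2f}, {upper:.2f}]")

# Reduction in the standard error if the sample size were 80 instead of 40.
reduction_80 = 1 - sqrt((500 - 80) / (500 - 1))
print(f"reduction with n = 80: {reduction_80:.1%}")
```

With n = 40 the factor is about 0.96 and the limits are about 4.35 and 6.25; with n = 80 the reduction is roughly 8.3%, in line with the figures quoted above.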
Technology Template 7.1A
Confidence Intervals for Means with σ Known
Figure 7.3 presents the case where the standard deviation is known: we use the z-statistic. The population standard deviation, sample size, and sample mean are entered in cells C4, C5, and C6 respectively. The lower and upper limits (cells C10:D13) of the confidence interval are automatically computed for different levels of confidence (cells B10:B13). If the population is not normally distributed but the sample size n is at least 30, we use Formula 7.2 and enter in cell C4 the sample standard deviation as an estimate of the population standard deviation.(i) If the population is finite and the sample size is greater than 5% of the population size, we use the finite correction factor to compute the confidence interval. We enter the population size in cell J4. The finite correction factor is automatically calculated in cell J5 and incorporated into the formula that computes the confidence intervals.
(i) You can check this for Example 7.2.
Confidence Interval for the Mean when σ is Unknown
In this section, we consider the case of a small sample taken from a population that is normally distributed with an unknown standard deviation.
Student’s t-Distribution
When the population standard deviation is known, the sampling distribution of the mean has only one unknown parameter: its mean $\mu$. This is estimated by $\bar{X}$. In real sampling situations, however, the population standard deviation $\sigma$ is rarely known. The reason for this is that both $\mu$ and $\sigma$ are population parameters. When we select a sample from a normal population with the purpose of estimating its unknown mean $\mu$, the other parameter of the population, the standard deviation $\sigma$, is very unlikely to be known. If we use $S$, the sample standard deviation, as an estimate of the normal population parameter $\sigma$, we obtain the following random variable:
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \qquad (7.4)$$
This random variable follows a distribution known as the t-distribution with (n – 1) degrees of freedom.(ii)
t-distribution: A distribution that describes the standardized sample mean for small samples ($n < 30$) taken from a normal population with an unknown standard deviation.
Degrees of freedom: The number of observations in a sample (n) minus the number of parameters being estimated.
(ii) In 1908, W. S. Gosset discovered the distribution of the random variable in Formula 7.4 and published his findings under the pen name Student.
We will explain shortly the concept of degrees of freedom. The t-distribution has the following characteristics.
a) The t-distribution and the standard normal distribution are shown graphically in Figure 7.3. Both distributions are continuous and bell-shaped, but the t-distribution is flatter than the normal distribution. This is because the standard deviation of the t-distribution is larger.
Figure 7.3 Normal and t-Distributions (curves shown: standard normal distribution, t-distribution with df = 22, and t-distribution with df = 6, all centered at 0)
b) The t-distribution is characterized by its degrees-of-freedom parameter, denoted by df. For each degree of freedom df = 1, 2, …, there is a corresponding t-distribution. Therefore, there is not only one t-distribution but rather a "family" of t-distributions. The mean of the t-distribution is zero, but the standard deviation depends on the degrees of freedom, and for $df > 2$ it is equal to $\sqrt{df/(df - 2)}$.
c) As the number of degrees of freedom increases, the t-distribution approaches the standard normal distribution, because the errors in using $S$ to estimate $\sigma$ decrease with larger samples. This is shown in Figure 7.3.
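A short numerical check makes the approach to the standard normal distribution concrete. The sketch below assumes the standard result that a t-distribution with df > 2 has standard deviation $\sqrt{df/(df - 2)}$; the function name `t_std_dev` is ours.

```python
from math import sqrt


def t_std_dev(df):
    """Standard deviation of a t-distribution with df degrees of freedom."""
    if df <= 2:
        raise ValueError("the standard deviation is defined only for df > 2")
    return sqrt(df / (df - 2))


# The value shrinks toward 1, the standard deviation of the standard normal.
for df in (6, 22, 100):
    print(df, round(t_std_dev(df), 3))
```

The values decrease toward 1 as df grows, which is why the curve for df = 22 lies closer to the standard normal curve than the one for df = 6.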
Degrees of Freedom
Previously, we stated that the t-distribution has (n – 1) degrees of freedom. The degrees of freedom are associated with the sample standard deviation $S$. Recall that the sample standard deviation $S$ given by Formula 2.14 is equal to
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
When we compute $S$, we need the n values for $X$ and the sample mean $\bar{X}$, which is the sum of the n values of $X$ divided by the total number of values n. If we know (n – 1) values of $X$ and the sample mean $\bar{X}$, we are able to determine the nth value. The nth value is equal to
$$n\bar{X} - \sum_{i=1}^{n-1} X_i$$
In this expression, we can freely select (n – 1) values and the nth is obtained if we know $\bar{X}$. For example, assume we know $\bar{X} = 8$ and only five values out of six: 7, 12, 9, 5, and 10. The last value is computed as
$$6(8) - (7 + 12 + 9 + 5 + 10) = 48 - 43 = 5$$
The nth value depends on our choice for the (n – 1) values. If the values selected freely are 4, 12, 8, 6, and 14, then the last value is 4.
This is the reason why the number of degrees of freedom is (n – 1): the nth value is determined by the fact that we know the statistic $\bar{X}$.
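The idea that only (n – 1) values are free can be illustrated in code. This is a minimal sketch with a hypothetical helper, `last_value`, that recovers the remaining observation from the sample mean and the other (n – 1) observations.

```python
def last_value(sample_mean, n, known_values):
    """Return the nth observation implied by the mean and the other n - 1 values."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n - 1 values must be supplied")
    # n * mean is the total of all n values; subtract the part already known.
    return n * sample_mean - sum(known_values)


print(last_value(8, 6, [7, 12, 9, 5, 10]))   # 48 - 43 = 5
print(last_value(8, 6, [4, 12, 8, 6, 14]))   # 48 - 44 = 4
```

Whatever five values are chosen, the sixth is forced once the mean is fixed at 8.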
Using the t-Table
To find a value in the t-distribution table (Appendix D) requires that we know the sample size n. The t-distribution table is a compilation of many t-distributions, where each line represents a different sample size. However, the sample size must be converted to degrees of freedom before determining a table value. For each distribution, the table gives values that correspond to areas under the curve. To find a value of t, we use Appendix D, a portion of which is reproduced in Table 7.1. A value of 2.3646 corresponds to seven degrees of freedom. This value also corresponds to
• a confidence level of 95%
• a one-tailed $\alpha$ of 0.025
• a two-tailed $\alpha$ of 0.05
Figure 7.4 gives the location of the t-statistic and the corresponding probabilities. The area under the t-curve between –2.3646 and 2.3646 is 95%. The area under the t-curve to the right of 2.3646 corresponds to one tail, i.e. $\alpha/2 = 0.025$. Finally, the areas under the t-curve to the right of 2.3646 and to the left of –2.3646 correspond to two tails, i.e. $\alpha = 0.05$.
Table 7.1 A portion of the t-distribution

Confidence intervals   50%      80%      90%      95%      98%      99%
One-tailed α           0.25     0.1      0.05     0.025    0.01     0.005
Two-tailed α           0.5      0.2      0.1      0.05     0.02     0.01
df
1                      1.0000   3.0777   6.3137   12.706   31.821   63.656
2                      0.8165   1.8856   2.9200   4.3027   6.9645   9.9250
3                      0.7649   1.6377   2.3534   3.1824   4.5407   5.8408
4                      0.7407   1.5332   2.1318   2.7765   3.7469   4.6041
5                      0.7267   1.4759   2.0150   2.5706   3.3649   4.0321
6                      0.7176   1.4398   1.9432   2.4469   3.1427   3.7074
7                      0.7111   1.4149   1.8946   2.3646   2.9979   3.4995
8                      0.7064   1.3968   1.8595   2.3060   2.8965   3.3554
9                      0.7027   1.3830   1.8331   2.2622   2.8214   3.2498
10                     0.6998   1.3722   1.8125   2.2281   2.7638   3.1693
11                     0.6974   1.3634   1.7959   2.2010   2.7181   3.1058
12                     0.6955   1.3562   1.7823   2.1788   2.6810   3.0545
13                     0.6938   1.3502   1.7709   2.1604   2.6503   3.0123
Figure 7.4 Values of t and Probabilities (df = 7; central area 1 – α = 0.95 between –2.3646 and 2.3646; area α/2 = 0.025 in each tail)
Confidence Intervals with the t-Distribution
When the standard deviation $\sigma$ is unknown, the sample size is small, and the sample is drawn from a normal distribution, we use Formula 7.4
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
to build a confidence interval to estimate the mean $\mu$. This formula can be manipulated to yield the following boundaries for the interval containing $\mu$:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ corresponds to the cut-off value for the specified $(1 - \alpha)100\%$ confidence level for (n – 1) degrees of freedom.
A $(1 - \alpha)100\%$ confidence interval for the mean $\mu$ when the standard deviation $\sigma$ is unknown and sampling is done from a normal population with sample size n less than 30 is the interval bounded by
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \qquad (7.5)$$
Example 7.4 The following data represent the average production of drinking water (in liters/second) per well in a sample of 24 Algerian governorates.
17.59   10.04   12.59   13.01    5.53   18.06
13.42   11.60   10.02   21.64   15.93   15.26
23.93   15.23   12.82   25.44   12.61   42.19
11.67   20.33   25.70    9.00   26.40   10.48
Source: Ministere des Ressources en Eau, www.ons.dz
Assume that the data are normally distributed; construct a 90% confidence interval for the population mean of this set of data.
Solution First, we compute the mean and standard deviation of this sample of 24 data: $\bar{X} = 16.69$, $S = 7.9$. The number of degrees of freedom for t is 24 – 1 = 23. The t-value for a 90% confidence interval when the number of degrees of freedom is 23, obtained from the t-table (Appendix D), is $t_{0.05;23} = 1.714$. The confidence interval has the limits
$$16.69 \pm 1.714\left(\frac{7.9}{\sqrt{24}}\right) = 16.69 \pm 2.76$$
$$13.93 \le \mu \le 19.45$$
$$P(13.93 \le \mu \le 19.45) = 0.90$$
We are 90% confident that the average production of drinking water per well is between 13.93 and 19.45 liters/second.
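The steps of Example 7.4 can be sketched in Python using only the standard library. The function name `t_interval` is an assumption of ours, and the cut-off value 1.714 is supplied by hand from the t-table, since the standard library offers no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# Drinking-water production (liters/second) for the 24 governorates.
water = [
    17.59, 10.04, 12.59, 13.01, 5.53, 18.06,
    13.42, 11.60, 10.02, 21.64, 15.93, 15.26,
    23.93, 15.23, 12.82, 25.44, 12.61, 42.19,
    11.67, 20.33, 25.70, 9.00, 26.40, 10.48,
]


def t_interval(data, t_crit):
    """Interval of Formula 7.5; t_crit is the table value t_(alpha/2, n-1)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                 # sample standard deviation (divides by n - 1)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = t_interval(water, t_crit=1.714)   # 90% level, df = 23
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Because the code works with unrounded values of the mean and standard deviation, the limits may differ from 13.93 and 19.45 in the second decimal place.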
Example 7.5 Suppose that Europcar Aden, a car rental agency in Yemen, collected the following mileages for a sample of 15 cars rented in January 2011. What assumption should the manager make before constructing a 95% confidence interval for the population mean? Interpret the results. Note that data are given in km/day.
 85    48    74   146    75
 92    59    72   114   230
 83    55   172    95    67
Solution First, the manager must assume that the population of mileages is approximately normally distributed in order to use the t-distribution to compute the confidence interval. To construct the confidence interval he needs to compute the sample mean and sample standard deviation: $\bar{X} = 97.8$, $S = 49.6$. The number of degrees of freedom for t is 15 – 1 = 14. The t-value for a 95% confidence interval when the number of degrees of freedom is 14, obtained from the t-table (Appendix D), is $t_{0.025;14} = 2.145$. The confidence interval has the limits
$$97.8 \pm 2.145\left(\frac{49.6}{\sqrt{15}}\right) = 97.8 \pm 27.5$$
$$70.3 \le \mu \le 125.3$$
$$P(70.3 \le \mu \le 125.3) = 0.95$$
Europcar Aden could therefore be 95% confident that the average daily mileage is between 70.3 and 125.3 km.
Recall the chapter opening case of Almarai, where a sample of 14 observations on the calcium content of laban gives a sample mean of 99.97 mg and a standard deviation of 0.568 mg. If we assume that the sample is taken from a normal population with unknown variance, we can use the t-distribution to compute the 95% confidence interval: $\bar{X} = 99.97$, $S = 0.568$. The number of degrees of freedom for t is 14 – 1 = 13. The t-value for a 95% confidence interval when the degrees of freedom is 13, obtained from the t-table (Appendix D), is $t_{0.025;13} = 2.1604$. The confidence interval is computed as
$$99.97 \pm 2.1604\left(\frac{0.568}{\sqrt{14}}\right) = 99.97 \pm 0.33$$
$$99.64 \le \mu \le 100.30$$
There are tolerance limits accepted by the Dairy Association requiring that the calcium content of laban be between 99 and 101 mg. The confidence interval [99.64, 100.30] is inside the tolerance limits accepted by the Dairy Association; therefore, Almarai is meeting the required standard.
Students sometimes have difficulty deciding whether to use $z_{\alpha/2}$ or $t_{\alpha/2}$ values when finding confidence intervals for the mean:
• As stated previously, when $\sigma$ is known, $z_{\alpha/2}$ values can be used no matter what the sample size is, as long as the variable is normally distributed or $n \ge 30$.
• When $\sigma$ is unknown and $n \ge 30$, $S$ can be used as in Formula 7.2, and $z_{\alpha/2}$ values can be used.
• Finally, when $\sigma$ is unknown and $n < 30$, we use $S$ and $t_{\alpha/2}$ values as in Formula 7.5, as long as the variable is approximately normally distributed.
This is summarized in Figure 7.5.
Figure 7.5 When to Use Z- or t-Distributions (flowchart)
Is the population normal?
  No: Is n ≥ 30?
    No: Use an appropriate nonparametric test
    Yes: Use the Z distribution
  Yes: Is the population variance known?
    No: Use the t distribution
    Yes: Use the Z distribution
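The decision flow of Figure 7.5 can be written as a small function. This is a sketch only: the name `choose_distribution` and its return labels are ours, and the function follows the flowchart literally. Note that for σ unknown and n ≥ 30 the text also allows z-values with S as an approximation (Formula 7.2).

```python
def choose_distribution(population_normal, sigma_known, n):
    """Return 'z', 't', or 'nonparametric' following the flow of Figure 7.5."""
    if not population_normal:
        # Non-normal population: rely on the central limit theorem if n >= 30.
        return "z" if n >= 30 else "nonparametric"
    # Normal population: the choice depends on whether sigma is known.
    return "z" if sigma_known else "t"


print(choose_distribution(population_normal=True, sigma_known=True, n=16))    # Example 7.1
print(choose_distribution(population_normal=True, sigma_known=False, n=24))   # Example 7.4
print(choose_distribution(population_normal=False, sigma_known=False, n=49))  # Example 7.2
```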
Technology Template 7.1B
Confidence Intervals for Means with σ Unknown
Figure 7.7 presents the case where the standard deviation is unknown and the sample is taken from a population that is normally distributed: we use the t-statistic. The sample size, the sample mean, and the sample standard deviation are entered in cells C17, C18, and C19 respectively. The confidence interval is automatically computed for different levels of confidence (cells B23:B26).
Example 7.6 A sample of hospitals in 20 regions of Saudi Arabia yielded a national average length of stay for inpatients of 3.8 days, with a standard deviation of 0.96 days. Assume that the distribution is normal. Construct a 95% confidence interval for the length of stay in Saudi hospitals.(iii)
(iii) Source: 2009 Health Statistical Yearbook
Solution $\bar{X} = 3.8$, $S = 0.96$, $n = 20$. The number of degrees of freedom for t is 20 – 1 = 19. The t-value for a 95% confidence interval when the degrees of freedom is 19, obtained from the t-table (Appendix D), is $t_{0.025,19} = 2.093$. The confidence interval is given by
$$3.8 \pm 2.093\left(\frac{0.96}{\sqrt{20}}\right) = 3.8 \pm 0.45$$
$$3.35 \le \mu \le 4.25$$
In this case, there can be 95% confidence that the mean length of stay is between 3.35 days and 4.25 days.
Check your Understanding
Review Problems
7.1 Find a confidence interval for $\mu$, assuming that each sample is taken from a normal population.
a) $\bar{X} = 14$, $\sigma = 4$, n = 5, 95% confidence level.
b) $\bar{X} = 22$, $\sigma = 6$, n = 12, 90% confidence level.
c) $\bar{X} = 56$, $\sigma = 9$, n = 22, 99% confidence level.
7.2 Find the t-values for the following levels and degrees of freedom: $t_{0.10,29}$, $t_{0.025,13}$, $t_{0.05,18}$, $t_{0.90,20}$, $t_{0.95,25}$, $t_{0.10,40}$.
7.3 Suppose that a German car manufacturer wants to estimate the average km/liter highway rating for a new model. Suppose that a previous study indicated that the standard deviation for similar models is 2.5 km/liter. A random sample of 81 highway runs yields a sample mean of 15.34 km/liter. Find a 95% confidence interval for the population mean km/liter highway rating.
7.4 A gas station owner in Abu Dhabi would like to estimate the mean number of liters of gasoline sold to his customers. From past records, he selects a random sample of 70 sales and finds a mean of 36 liters and a standard deviation of 8 liters. Find a 99% confidence interval for the population mean. What did you assume about the population?
7.5 Alexandria Travel Service owns 500 rental cars. The manager is interested in estimating the mean number of kilometers for which cars are used during weekends. She selects a random sample of 35 cars for a particular weekend. The survey indicates an average of 128 km and a standard deviation of 31.5 km. Construct a 90% confidence interval for the population mean.
7.6 A survey of 26 Indian executives reveals that they spend an average of 52 hours in the office. The standard deviation is 4.3 hours. Assume that the sample is selected from a normal population. Find a 95% confidence interval for the population mean. Can we assume that the mean of the population is 54 hours?
7.7 A sample of six hotels shows that the average rate of occupancy of all types of hotels is 53.6% with a standard deviation of 7.28%. Assume the data comes from a normal distribution; what is the 95% confidence interval for the population mean rate of occupancy?
Source: Oman Ministry of Tourism
7.8 Forty-nine items are randomly selected from a population of 400 items. The sample mean is 38 and the sample standard deviation is 8. Find a 90% confidence interval for the population mean.
7.9 The following data represent a sample of the monthly average amount of spending ($) per tourist in 2010 in Jordan.
3,261.84   1,730.98   5,557.91   1,374.25
2,445.00   1,794.79   4,318.81   2,144.99
1,743.35   3,247.81   2,492.62   2,553.72
Assume a normal distribution; construct a 90% confidence interval for the population mean of monthly spending.
Sources: Jordan Ministry of Tourism and Central Bank
7.10 Habib Cortas selects a sample of 18 2-liter bottles of orange juice to check whether the filling machine is operating correctly. The sample, taken from a normal population, gives a mean of 1.96 liters and a standard deviation of 0.05. Construct a 98% confidence interval for the mean content of a 2-liter bottle. Is the filling machine out of adjustment?
Confidence Interval for a Proportion
Methods similar to those presented in the previous sections can be applied to estimate the population proportion. In Chapter 5, we mentioned that for a binomial random variable, approximation by a normal distribution works well and, in general, offers a good estimate when both $np$ and $n(1 - p)$ are greater than 5. In Chapter 6, we also indicated that the central limit theorem applies to sample proportions provided that $np > 5$ and $n(1 - p) > 5$. The mean of the sample proportion over all samples of size n randomly selected from a population is p (the population proportion), and the standard deviation of the sample proportion is (Formula 6.9 on p. 269)
$$\sqrt{\frac{p(1 - p)}{n}}$$
Probabilities concerning sample proportions are computed using Formula 6.6 on p. 269, namely
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
where
$\hat{p}$ = sample proportion = x/n
p = population proportion
x = number of successes
n = sample size
In Formula 6.6, we are trying to estimate p. However, the standard deviation of the sample proportion $\sqrt{p(1 - p)/n}$ requires that p be known. To overcome this problem, we must estimate the standard deviation of the sample proportion by substituting the sample proportion $\hat{p}$ for the population proportion p, but this is valid only for large samples.
A $(1 - \alpha)100\%$ confidence interval for the population proportion p is an interval bounded by
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \qquad (7.6)$$
where the sample proportion is $\hat{p} = x/n$ (with x being the number of successes in a sample of n trials).
We illustrate the use of Formula 7.6 in the following example.
Example 7.7 Imagine that a market research firm based in Dubai wants to estimate the share that local companies have in the Gulf market for food products. If a study of 110 randomly selected consumers revealed that 40 of them buy local products, find a 95% confidence interval for the share of local products in the market.
Solution We have n = 110, x = 40. The sample proportion estimate is $\hat{p} = 40/110 = 0.364$. A 95% confidence interval for p has the limits
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.364 \pm 1.96\sqrt{\frac{0.364(1 - 0.364)}{110}} = 0.364 \pm 1.96(0.0459) = 0.364 \pm 0.09$$
The firm can be 95% confident that local products represent anywhere from 27.4% to 45.4% of the market.
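Formula 7.6 translates directly into code. The following is a minimal sketch; the function name `proportion_interval` is hypothetical, and the large-sample condition is checked with the sample proportion in place of the unknown p, which is our own assumption about how to apply the rule in practice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Large-sample interval for a proportion (Formula 7.6)."""
    p_hat = x / n
    # Rule of thumb from the text, applied to p_hat because p is unknown.
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Data of Example 7.7: 40 of 110 consumers buy local products.
lower, upper = proportion_interval(x=40, n=110, confidence=0.95)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The limits come out at roughly 0.274 and 0.454, matching the 27.4% to 45.4% range of the example.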
Example 7.8 A sample shows that the Lebanese Central Bank has increased the interest rate 11 times from January to February and decreased it 17 times. The data concern the period between January 1982 and December 2010.(iv) Find a 90% confidence interval for the proportion of interest rate increases.
Solution We have n = 28, x = 11. The sample proportion estimate is $\hat{p} = 11/28 = 0.393$. A 90% confidence interval for p is bounded by
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.393 \pm 1.645\sqrt{\frac{0.393(1 - 0.393)}{28}} = 0.393 \pm 1.645(0.092) = 0.393 \pm 0.15$$
We are 90% confident that interest rate increases represent anywhere from 24.3% to 54.3% of the changes.
(iv) Source: Banque du Liban.
If the sample is selected from a finite population and if the sample size is greater than 5% of the population, we incorporate the finite correction factor $\sqrt{\dfrac{N - n}{N - 1}}$ into Formula 7.6 as we did in Formula 7.3.
Example 7.9 Imagine that an airline asked all its passengers boarding a flight out of Egypt to fill in a questionnaire about the company. The number of questionnaires completed was 796. Hafez, a marketing manager, selected a sample of 45 questionnaires and recorded the answers to the question about the quality of food served on board. Nine passengers responded that the food was excellent. Construct a 98% confidence interval for the proportion of passengers that found the food excellent.
Solution We have N = 796, n = 45, x = 9. The sample proportion estimate is $\hat{p} = 9/45 = 0.2$. Since $n/N = 45/796 = 0.057 > 0.05$, we must use the correction factor to get the 98% confidence interval for p:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\sqrt{\frac{N - n}{N - 1}} = 0.2 \pm 2.33\sqrt{\frac{0.2(1 - 0.2)}{45}}\sqrt{\frac{796 - 45}{796 - 1}} = 0.2 \pm 0.135$$
Hafez can be 98% confident that the proportion of passengers that found the food served on board excellent lies in the range from 6.5% to 33.5%.
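The 5% rule and the corrected interval of Example 7.9 can be sketched as follows. The function name `proportion_interval_finite` is ours; the code applies the correction factor only when n/N exceeds 0.05, which is one way of reading the rule stated above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval_finite(x, n, N, confidence):
    """Interval for a proportion, corrected when n exceeds 5% of N."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    if n / N > 0.05:
        margin *= sqrt((N - n) / (N - 1))   # finite correction factor
    return p_hat - margin, p_hat + margin


# Data of Example 7.9: 9 of 45 sampled questionnaires, out of N = 796.
lower, upper = proportion_interval_finite(x=9, n=45, N=796, confidence=0.98)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The limits are close to 0.065 and 0.335; small differences from the text arise because the code uses an unrounded cut-off value instead of 2.33.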
Technology Template 7.2
Confidence Intervals for Proportions
The sample size and sample proportion are entered in cells C4 and C5 respectively. The confidence intervals are given in cells C9:D12 for different confidence levels. This template has provision for the finite correction factor too.
Confidence Interval for the Variance
In this section, we explain how to obtain a confidence interval for the variance and the standard deviation. For example, when items that fit together, such as pipes, are manufactured, it is essential to keep the variations in the diameters as small as possible; otherwise the items will not fit correctly and will be scrapped. In Chapter 2, we used Formula 2.12 to compute the sample variance, namely
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where the sum of squared deviations from the mean, $\sum (X - \bar{X})^2$, is divided by (n – 1) rather than by n. The reason concerns the degrees of freedom for the deviations.
In the previous section, the t random variable offered us a method of constructing confidence intervals for the mean of a normal population when the standard deviation is unknown and is replaced by its estimate $S$. Another such continuous distribution that allows us to find the confidence interval of the variance and standard deviation of a normal distribution is the chi-square distribution, $\chi^2$ (pronounced 'kai square').
Chi-square distribution: A skewed continuous distribution whose shape depends on the number of degrees of freedom.
The chi-square distribution with various degrees of freedom is displayed in Figure 7.6. There is a theorem in statistics, the proof of which is beyond this text, that states that if $S^2$ is the variance of a random sample of size n drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, then
$$\frac{(n - 1)S^2}{\sigma^2}$$
has a chi-square distribution with (n – 1) degrees of freedom. The chi-square distribution's shape varies according to the number of degrees of freedom, as illustrated in Figure 7.6.
Figure 7.6 Chi-Square Distributions (curves shown for df = 1, df = 5, and df = 15; horizontal axis: χ², starting at 0)
Using the χ² Table
Appendix E gives values of the chi-square statistic $\chi^2$ with different degrees of freedom, for given tail probabilities. A truncated version of the table is shown in Table 7.2 for different tail values. The table provides $\chi^2$ for different right tails (probabilities) and for given degrees of freedom (df).

Table 7.2 A portion of the χ² distribution

df   0.995     0.99     0.975    0.95     0.9      0.1      0.05     0.025    0.01     0.005
1    0.00005   0.0002   0.001    0.0039   0.0158   2.7055   3.8415   5.0239   6.6349   7.8794
2    0.0100    0.0201   0.0506   0.1026   0.2107   4.6052   5.9915   7.3778   9.2104   10.597
3    0.0717    0.1148   0.2158   0.3518   0.5844   6.2514   7.8147   9.3484   11.345   12.838
4    0.207     0.2971   0.4844   0.7107   1.0636   7.7794   9.4877   11.143   13.277   14.86
5    0.4118    0.5543   0.8312   1.1455   1.6103   9.2363   11.07    12.832   15.086   16.75
6    0.6757    0.8721   1.2373   1.6354   2.2041   10.645   12.592   14.449   16.812   18.548
7    0.9893    1.239    1.6899   2.1673   2.8331   12.017   14.067   16.013   18.475   20.278
8    1.3444    1.6465   2.1797   2.7326   3.4895   13.362   15.507   17.535   20.09    21.955
9    1.7349    2.0879   2.7004   3.3251   4.1682   14.684   16.919   19.023   21.666   23.589
10   2.1558    2.5582   3.247    3.9403   4.8652   15.987   18.307   20.483   23.209   25.188
Example 7.10 Using the chi-square distribution with 10 degrees of freedom, find $P(\chi^2 > 18.307)$ and $P(\chi^2 < 3.247)$.
Solution Table 7.2 shows the right-tail area (probability).
$$P(\chi^2 > 18.307) = 0.05$$
This probability is shown in Figure 7.7.
Figure 7.7 $P(\chi^2 > 18.307) = 0.05$ (df = 10; right-tail area α = 0.05 beyond 18.307)
This can be written as $\chi^2_{0.05,10} = 18.307$.
For $\chi^2 = 3.247$, Table 7.2 tells us that the area to the right of 3.247 is 0.975. Because the total area is 1, the area to the left of 3.247 is 1 – 0.975 = 0.025 and so $P(\chi^2 < 3.247) = 0.025$. This probability is shown in Figure 7.8.
Figure 7.8 $P(\chi^2 < 3.247)$ (df = 10; left-tail area α = 0.025 below 3.247; area 0.975 to the right)
Next, we will use the chi-square distribution to construct confidence intervals for variances and standard deviations.
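The table values used in Example 7.10 can be verified without special software. For an even number of degrees of freedom, the right-tail area of the chi-square distribution has a closed form, $P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{df/2 - 1} (x/2)^k / k!$. The sketch below relies on that standard identity; it is not taken from the text, it does not cover odd df, and the function name is ours.

```python
from math import exp, factorial


def chi2_right_tail_even_df(x, df):
    """P(chi-square > x) for an even, positive number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form holds only for even, positive df")
    half = x / 2
    # Finite sum of Poisson-type terms, valid because df / 2 is an integer.
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))


right = chi2_right_tail_even_df(18.307, df=10)      # area to the right of 18.307
left = 1 - chi2_right_tail_even_df(3.247, df=10)    # area to the left of 3.247
print(f"P(chi2 > 18.307) = {right:.3f}, P(chi2 < 3.247) = {left:.3f}")
```

Both results agree with the table: about 0.05 for the right tail and about 0.025 for the left tail.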
Confidence Intervals with the χ² Distribution
To derive a confidence interval for $\sigma^2$, we use the fact that $\dfrac{(n - 1)S^2}{\sigma^2}$ follows a chi-square distribution with (n – 1) degrees of freedom; therefore, the sampling distribution for $S^2$ can be defined using the chi-square distribution:
$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2} \qquad (7.7)$$
We can also write
$$\sigma^2 = \frac{(n - 1)S^2}{\chi^2}$$
A $(1 - \alpha)100\%$ confidence interval for the population variance $\sigma^2$ (where the population is normally distributed) is
$$\frac{(n - 1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{1 - \alpha/2}} \qquad (7.8)$$
where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with (n – 1) degrees of freedom such that the area under the curve to its right is $\alpha/2$, and $\chi^2_{1 - \alpha/2}$ is the value of the chi-square distribution with (n – 1) degrees of freedom such that the area under the curve to its left is $\alpha/2$ (or the area to its right is $1 - \alpha/2$).
Example 7.11 Flour comes in 50-kg bags. Suppose the packaging unit of Union Mills Co., based in Aleppo, Syria, is concerned about the variation in weight of 50-kg bags since it acquired new packaging equipment. Suppose that a random sample of 15 bags yields the following weights:
51.2   47.5   50.8   51.5   49.5
51.1   51.3   50.7   46.7   49.2
52.1   48.3   51.6   49.2   51.5
Find a 90% confidence interval for $\sigma^2$ and for $\sigma$. Assume that the bag weights are normally distributed. A previous study made by a Turkish company shows that a 95% confidence interval for their population standard deviation is [1.35, 2.78]. How do you compare the two companies?
Solution The mean and standard deviation for these data are $\bar{X} = 50.15$, $S = 1.65$. The corresponding 90% confidence interval for $\sigma^2$ is
$$\frac{(15 - 1)(1.65)^2}{\chi^2_{0.05}} \le \sigma^2 \le \frac{(15 - 1)(1.65)^2}{\chi^2_{0.95}}$$
$$\frac{(14)(1.65)^2}{23.685} \le \sigma^2 \le \frac{(14)(1.65)^2}{6.571}$$
$$1.61 \le \sigma^2 \le 5.80$$
The corresponding 90% confidence interval for $\sigma$ is
$$1.27 \le \sigma \le 2.41$$
The packaging unit can estimate with 90% confidence that the population standard deviation of the weight of flour bags is between 1.27 and 2.41 kg. We cannot compare the two companies' results because the confidence levels are different: the first is 90% and the second is 95%. To make comparisons, we must use a common confidence level. The 95% confidence interval for Union Mills Co. is [1.21, 2.60], whose limits are both lower than those of the Turkish company's interval. These results suggest that Union Mills Co. is performing better than the Turkish company.
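Formula 7.8 is easy to evaluate once the two chi-square values have been read from the table. The sketch below uses a hypothetical helper, `variance_interval`, and takes the table values as inputs. The 90% values (23.685 and 6.571) are those of Example 7.11; the 95% values (26.119 and 5.629) are standard table entries for 14 degrees of freedom that we supply as an assumption, since Table 7.2 stops at df = 10.

```python
from math import sqrt


def variance_interval(n, s, chi2_upper, chi2_lower):
    """Interval of Formula 7.8 for the variance.

    chi2_upper is the table value with area alpha/2 to its right,
    chi2_lower the table value with area 1 - alpha/2 to its right.
    """
    ss = (n - 1) * s**2
    return ss / chi2_upper, ss / chi2_lower


# 90% interval for the data of Example 7.11 (n = 15, S = 1.65).
var_lo, var_hi = variance_interval(15, 1.65, chi2_upper=23.685, chi2_lower=6.571)
print(f"variance: [{var_lo:.2f}, {var_hi:.2f}]")
print(f"std dev:  [{sqrt(var_lo):.2f}, {sqrt(var_hi):.2f}]")

# 95% interval, using the assumed table values for df = 14.
v95_lo, v95_hi = variance_interval(15, 1.65, chi2_upper=26.119, chi2_lower=5.629)
print(f"std dev (95%): [{sqrt(v95_lo):.2f}, {sqrt(v95_hi):.2f}]")
```

Taking square roots of the variance limits gives the interval for the standard deviation, which is how the interval [1.21, 2.60] quoted in the solution is obtained.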
A higher confidence level results in a wider confidence interval.
Caution: The confidence intervals of the variance and standard deviation require that the population be normally distributed. We cannot use the central limit theorem for $S^2$. The confidence intervals will not be accurate if the population is not normal.
Technology Template 7.3
Confidence Intervals for Variances
The sample size and sample variance are entered in cells C4 and C5 respectively. Confidence intervals are shown in cells C9:D12 for different confidence levels. This template has provision for calculating confidence intervals for standard deviations; see cells H9:I12.
Check your Understanding
Review Problems
7.11 Use the chi-square table to determine the following values:
a) $\chi^2_{0.1,10}$
b) $\chi^2_{0.025,30}$
c) $\chi^2_{0.95,15}$
d) $\chi^2_{0.01,26}$
7.12 For each of the following cases, find $\hat{p}$ and construct a confidence interval.
a) n = 60, x = 35, confidence level 90%
b) n = 150, x = 85, confidence level 95%
c) n = 120, x = 72, confidence level 98%
d) n = 90, x = 42, confidence level 99%
7.13 For each of the following situations, check if the sample size is large enough to use the normal distribution to make a confidence interval for p:
a) n = 50, $\hat{p}$ = 0.35
b) n = 150, $\hat{p}$ = 0.06
c) n = 350, $\hat{p}$ = 0.45
d) n = 70, $\hat{p}$ = 0.08
7.14 Suppose Al-Baghdadia TV has recently introduced a new television series for its afternoon programming. The program manager wants to know how the audience likes the series. She randomly selects 100 people who watched this series during the previous week. Of the sample, 71 people liked the series. Construct a 95% confidence interval for the proportion of all people who like this series.
7.15 A sample of 20 observations selected from a normal distribution yields a sample variance of 33. Construct a confidence interval for $\sigma^2$ for each of the confidence levels in a) to c):
a) 99%
b) 95%
c) 90%
d) What happens to the confidence interval of $\sigma^2$ as the confidence level decreases?
7.16 A nationwide study claimed that 21% of the 13- to 15-year-old age group in Jordan are smokers. A sample of 120 teenagers selected from a large school revealed that 19% were smokers. Find a 95% confidence interval for the proportion and compare this with the results of the study.
Source: Jordan Times
7.17 Faisal, the quality engineer of a manufacturer of light bulbs, periodically takes a random sample of 25 light bulbs for testing the lifetime. A sample yields a variance of 4,600 hours. Assume that the life of light bulbs is normally distributed. Construct a 95% confidence interval for the variance of lifetimes of all light bulbs of this manufacturer.
7.18 Professor Mazbout's one-hour lectures vary in length. A sample of 20 of these lectures yields a standard deviation of 2.5 minutes. Assume that the lengths of Professor Mazbout's lectures are normally distributed. Construct and interpret a 98% confidence interval for the population variance and standard deviation of the lengths of all one-hour lectures by Professor Mazbout.
7.19 The sugar content (in grams) of a random sample of 25 cl containers of orange juice is as follows.
3.6   4.5   6.2   4.3   6
5.3   4.6   5.7   3.9   3.7
7.1   4.7   5.8   3.9   5.7
6.6   4.5   5.1   5.3   4.9
Construct a 99% confidence interval for the population standard deviation. Assume a normal distribution of sugar content.
Estimation of the Sample Size
A large sample generally provides a better representation of the population than does a smaller one. But acquiring a large sample can be costly and time consuming: why obtain a sample of size n = 500 if a sample of size n = 200 will provide sufficient accuracy in estimating a population parameter such as mean, proportion, or variance? This section demonstrates how to determine what sample size is necessary for estimating the different parameters.
Sample Size for Estimating \(\mu\) when \(\sigma\) is Known
When we want to estimate the mean of a population, the sample size can be determined by using the Z-formula for sample means to solve for n:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{6.4} \]
How close do we want our sample estimate \(\bar{X}\) to be to the unknown parameter \(\mu\)? The answer lies in the error of estimation that results from the sampling process.
Error of estimation
The difference between the sample mean \(\bar{X}\) and the population mean \(\mu\).
Let us define \(E = |\bar{X} - \mu|\) as the error of estimation. Substituting E into Formula 6.4 yields
\[ Z = \frac{E}{\sigma/\sqrt{n}} \]
If we know the critical value \(z_{\alpha/2}\) for a given level of confidence, we can solve for the sample size:
\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \tag{7.9} \]
Example 7.12 Suppose that a Jeddah-based market research firm wants to determine the sample size required to estimate household spending on grocery products. The company specified that any estimate must be based on a 95% confidence level. Further, suppose that the margin of error must not exceed ±$8. Given these requirements, what sample size is needed if the standard deviation of spending is $35?
Solution We use Formula 7.9 to estimate the sample size n. For this, we need \(z_{\alpha/2}\) for a confidence level of 95%; this corresponds to \(z_{0.025} = 1.96\). Also, E = 8 and \(\sigma = 35\).
Substituting these values into Formula 7.9, we obtain
\[ n = \left(\frac{1.96(35)}{8}\right)^2 = 73.5, \text{ which is rounded up to } 74 \]
Thus, to meet the requirement, a sample of 74 customers should be selected.
The more we increase the confidence level, the larger the sample will need to be, as can be seen from Formula 7.9.
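Formula 7.9 is easy to check numerically. The following is a minimal Python sketch, not part of the book's Excel templates; the function name `sample_size_for_mean` is our own, and the code assumes that \(\sigma\) is known and that the critical value is taken from the standard normal distribution. It reproduces Example 7.12 and shows how the required sample grows with the confidence level.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, error, confidence):
    """Smallest n satisfying Formula 7.9: n = (z * sigma / E)^2, rounded up.

    Hypothetical helper for illustration; sigma is assumed known.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value z_{alpha/2}
    return math.ceil((z * sigma / error) ** 2)

# Example 7.12 data (sigma = 35, E = 8) at three confidence levels
for level in (0.90, 0.95, 0.99):
    print(level, sample_size_for_mean(sigma=35, error=8, confidence=level))
```

At the 95% level the sketch returns 74, as in Example 7.12; the 90% and 99% levels give a smaller and a larger sample respectively.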
Sample Size for Estimating \(\mu\) when \(\sigma\) is Unknown
Formula 7.9 assumes we know the population standard deviation \(\sigma\). Most of the time, the population standard deviation \(\sigma\) is unknown and must be estimated. There are different approaches to estimating the population standard deviation.
• We can select a pilot sample from the population of interest of a smaller size than the anticipated sample size. This pilot sample should provide us with an estimate of the population standard deviation. Then, the pilot sample standard deviation S is used in Formula 7.9 to obtain the sample size n.
• The second option is to use the fact that about 95% of the values of a normal population are located within two standard deviations of the mean (refer to Figure 7.2). In this case, the upper bound is estimated at \(\mu + 2\sigma\) and the lower bound at \(\mu - 2\sigma\). Thus, the range is \(4\sigma\). An estimate for \(\sigma\) is range/4. For example, if the maximum value is 35 and the minimum value is 15, an estimate for \(\sigma\) is (35 − 15)/4 = 5. Some statisticians use the fact that more than 99% (about 99.7%) of the values of a normal population fall within three standard deviations of the mean. In this case, the range is divided by 6 to obtain an estimate of \(\sigma\). For our example, we would have \(\sigma\) = (35 − 15)/6 = 3.33.
• Suppose we are dealing with a Poisson distribution, such as the number of arrivals at a bank; in this case we know that the standard deviation \(\sigma\) is \(\sqrt{\lambda}\), where \(\lambda\) is the mean arrival rate (Formula 4.17 on p. 197). If the arrival rate is 25 customers per hour, then an estimate for the standard deviation is \(\sigma = \sqrt{25} = 5\).
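The range-based and Poisson-based estimates in the list above can be sketched in a few lines of Python. The helper names below are hypothetical and chosen only for illustration; the numbers are the ones used in the bullets.

```python
import math

def sigma_from_range(minimum, maximum, divisor=4):
    """Rough estimate of sigma from the range.

    divisor=4 uses the two-standard-deviation rule; divisor=6 uses the
    three-standard-deviation rule. Both assume a roughly normal population.
    """
    return (maximum - minimum) / divisor

def sigma_from_poisson_rate(rate):
    """For a Poisson count, the standard deviation is the square root of the mean."""
    return math.sqrt(rate)

print(sigma_from_range(15, 35))                 # range/4
print(round(sigma_from_range(15, 35, 6), 2))    # range/6
print(sigma_from_poisson_rate(25))              # 25 arrivals per hour
```

Whichever estimate is chosen is then used in place of \(\sigma\) in Formula 7.9.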
Example 7.13 Ramzi, the owner of a Lebanese food company, wants to estimate the mean weight content of one kilogram tomato cans that are filled by a machine, with 95% confidence and an error of 8 grams. Assume that a pilot sample of 30 cans shows a sample standard deviation of 36 grams. What should the sample size be to estimate the mean weight content?
Solution For a 95% confidence level we set z = 1.96. We use S = 36 in place of \(\sigma\) and set the desired error E to 8 to obtain the required sample size:
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left[\frac{(1.96)(36)}{8}\right]^2 = 77.8, \text{ or 78 cans.} \]
Example 7.14 In Example 7.13, assume that the largest weight content is 1,085 g and the smallest is 890 g. Use a range of \(6\sigma\) to obtain a sample size with a 95% confidence level.
Solution For a 95% confidence level, we set z = 1.96.
S = range/6 = (1,085 − 890)/6 = 32.5
E = 8
The required sample size is
\[ n = \left[\frac{(1.96)(32.5)}{8}\right]^2 = 63.4, \text{ or 64 cans} \]
Note that if we had used a range of \(4\sigma\), the estimated standard deviation would have been S = (1,085 − 890)/4 = 48.75 and the sample size 143. Therefore, a larger estimate of the standard deviation leads to a larger sample size.
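To see how sensitive the required sample size is to the estimate of the standard deviation, the short Python sketch below compares the three estimates used in Examples 7.13 and 7.14. The helper `n_required` is a hypothetical name; the critical value 1.96 and the error of 8 grams are taken from the examples.

```python
import math

Z_95 = 1.96   # critical value used in Examples 7.13 and 7.14
ERROR = 8     # allowable error in grams

def n_required(sigma_estimate, z=Z_95, error=ERROR):
    """Formula 7.9 with an estimate substituted for sigma, rounded up."""
    return math.ceil((z * sigma_estimate / error) ** 2)

estimates = {
    "pilot sample S": 36,
    "range/6": (1085 - 890) / 6,
    "range/4": (1085 - 890) / 4,
}
for label, s in estimates.items():
    print(f"{label}: sigma estimate = {s:.2f}, n = {n_required(s)}")
```

The three estimates lead to 78, 64 and 143 cans respectively, which matches the worked solutions.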
Caution: E, the allowable error, must be expressed in the same units as \(\bar{X}\) or \(\sigma\).
Sample Size when Estimating the Population Proportion
Suppose we want to estimate a population proportion with an error of estimation within ±E. What sample size should we require? In order to find the sample size needed to determine a confidence interval we use Formula 7.6. The error of estimation is
\[ E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
The sample size n is
\[ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) \tag{7.10} \]
where \(\hat{p} = x/n\) is an estimate of the population proportion.
Example 7.15 What sample size would be needed to estimate the true proportion of households in Muscat that own a plasma TV, with 90% confidence and an error of ±2%, when a previous sample gave a proportion of 0.25?
Solution We set E = 0.02; a 90% confidence level corresponds to z = 1.645. The sample proportion is \(\hat{p} = 0.25\). Applying Formula 7.10, we obtain
\[ n = \left(\frac{1.645}{0.02}\right)^2 0.25(1 - 0.25) = 1268.4 \]
The sample size should be 1,269.
Note: The unit used for E is not a percentage but a proportion: use 0.02 instead of 2%.
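Formula 7.10 can be checked with a short Python sketch. The function name is hypothetical, and the critical value is passed in explicitly so that the table value 1.645 from Example 7.15 is used. The second call is our own addition, not part of the example: it uses \(\hat{p} = 0.5\), the value that maximizes \(\hat{p}(1-\hat{p})\) and therefore gives the most conservative sample size when no prior estimate is available.

```python
import math

def sample_size_for_proportion(p_hat, error, z):
    """Formula 7.10: n = (z / E)^2 * p_hat * (1 - p_hat), rounded up.

    `error` must be a proportion (0.02), not a percentage (2).
    """
    return math.ceil((z / error) ** 2 * p_hat * (1 - p_hat))

print(sample_size_for_proportion(0.25, 0.02, 1.645))  # Example 7.15
print(sample_size_for_proportion(0.50, 0.02, 1.645))  # most conservative case
```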
Technology
Template 7.4
Sample Size Determination
This spreadsheet can be used in conjunction with Excel’s Goal Seek to find the required confidence level for a given sample size n and a specified error E. For example, suppose we want to know the confidence level for a sample of size 100. We call up Goal Seek and define the ‘Set cell’ as C9; the value of the sample size, 100, is entered in ‘To value’. The ‘By changing cell’ is B9. Once this is done we obtain a confidence level of 98%. Suppose instead we want to know the error that corresponds to a given confidence level (say 95%) and sample size (say 100). We again use Goal Seek. The ‘Set cell’ is C11 because it corresponds to the 95% confidence level. Next, we set ‘To value’ to 100. The ‘By changing cell’ corresponds to the error E: cell C6. The result is an error of 6.86.
You should enter the values for each example to see how Goal Seek works.
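Outside Excel, the two Goal Seek searches amount to rearranging Formula 7.9. The Python sketch below is only an illustration with hypothetical function names. It assumes \(\sigma = 35\) and E = 8, the data of Example 7.12; the template's inputs are not stated in the text, so this is an inference from the reported results (98% and 6.86), not a documented fact.

```python
import math
from statistics import NormalDist

SIGMA = 35  # assumption: the template appears to reuse the sigma of Example 7.12

def confidence_for(n, error, sigma=SIGMA):
    """Confidence level at which a sample of size n has margin of error `error`."""
    z = error * math.sqrt(n) / sigma        # Formula 7.9 solved for z
    return 2 * NormalDist().cdf(z) - 1      # two-sided confidence level

def error_for(n, confidence, sigma=SIGMA):
    """Margin of error for a sample of size n at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)         # Formula 7.9 solved for E

print(round(confidence_for(100, 8), 2))     # confidence level for n = 100, E = 8
print(round(error_for(100, 0.95), 2))       # error for n = 100 at 95%
```

Under these assumptions the sketch gives a confidence level of about 98% and an error of about 6.86, in line with the Goal Seek results described above.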
Check your Understanding
Review Problems
7.20 Find the sample size necessary to estimate \(\mu\) when
a) \(\sigma\) = 25 and E = 6 at a 95% confidence level,
b) \(\sigma\) = 3.8 and E = 1.6 at a 90% confidence level.
7.21 Find the sample size necessary to estimate p when
a) \(\hat{p}\) = 0.35 and E = 0.04 at a 98% confidence level,
b) \(\hat{p}\) = 0.72 and E = 0.03 at a 90% confidence level.
7.22 You want to estimate the mean spending of households in Aden. A pilot sample of 20 families yields a standard deviation of $65. You want to be 98% confident and you want your estimate to be within $13. How many households should you interview?
7.23 Amin, a quality inspector, wants to estimate the percentage of defects from a production process. He wants to be 95% confident about the result, with an error margin of 0.03. A previous sample of 60 products yielded two defects. How large should the sample be?
7.24 Ghada, a registrar at the University of Basrah, wants to determine the size of the sample needed to estimate the proportion of students who register in the College of Business. Of last year’s sample of 48 students admitted to the university, 18 elected to study business. She wants to be 90% confident with an error of 0.1. Calculate the size of the sample for Ghada.
7.25 The mean arrival rate of customers at a certain bank last year was 25 customers per hour on Sunday mornings. How large a sample would be required to estimate this year’s mean arrival rate with a 90% confidence interval and an error of 2? Explain your assumption about \(\sigma\).
7.26 Fatima wants to invest in a school. She wants to be 95% confident about the estimate of the mean number of admissions. She decides to take a sample of seven schools that opened in the last three years for her pilot study. The data on admissions are as follows.
284  326  290  352  315  298  274
How large should the sample be if she wants the error to be no more than 12 admissions?
7.27 Consider Problem 7.24. What could be the maximum error Ghada makes if she wanted to take a sample of 52 students at the same confidence level (90%)?
7.28 Consider Problem 7.22. What is the maximum error you can make if you take a sample of only 70 at a confidence level of 90%?
Chapter Summary
• An estimator is a statistic obtained from a sample to infer the value of a population parameter. A good estimator has the following characteristics.
■ Unbiasedness: its expected value is equal to the value of the population parameter.
■ Efficiency: it has a small variance.
■ Consistency: its probability of being close to the population parameter increases as n increases.
■ Sufficiency: it utilizes all the information contained in a sample.
• We developed confidence intervals for three parameters: \(\mu\), p and \(\sigma\). When the sample size is at least 30, we applied the normal approximation (central limit theorem) to get the confidence interval for \(\mu\) when the distribution is not normal (Formulas 7.1 and 7.2).
• If the sample size is less than 30 and the sample is taken from a normal distribution with unknown variance, we used the t-distribution (with n − 1 degrees of freedom) to obtain the confidence interval for the parameter \(\mu\) (Formula 7.5).
• If the sample is large, we use the central limit theorem to get a confidence interval for the proportion (Formula 7.6). If the sample is taken from a finite population and the sample size is greater than 5% of the population, we incorporate the finite correction factor \(\sqrt{(N-n)/(N-1)}\) into Formula 7.6 as we did in Formula 7.3.
• To construct a confidence interval for variances, we use the chi-square distribution (with n − 1 degrees of freedom) if the sample comes from a normal distribution (Formula 7.8).
• The minimum sample size n is computed based on a given confidence level (percentage of time that the confidence interval contains the parameter) and the error of estimation E, defined as \(|\bar{X} - \mu|\) (Formula 7.9) or \(|\hat{p} - p|\) (Formula 7.10).
Key Terms
Chi-square distribution 302
Confidence interval 286
Confidence level 286
Consistency 285
Degrees of freedom 291
Efficiency 285
Error of estimation 307
Estimation 284
Estimate 284
Point estimate 284
Sufficiency 286
t-distribution 291
Unbiasedness 285
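Key Formulas
Confidence interval for \(\mu\) when \(\sigma\) is known has the limits
\[ \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \tag{7.1} \]
Confidence interval for \(\mu\) when \(\sigma\) is unknown and the sample size is large (n ≥ 30) has the limits
\[ \bar{X} \pm z_{\alpha/2}\,\frac{S}{\sqrt{n}} \tag{7.2} \]
Confidence interval for \(\mu\) when \(\sigma\) is known and the sample is taken from a finite population has the limits
\[ \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \tag{7.3} \]
t-statistic
\[ t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{7.4} \]
Confidence interval for \(\mu\) when \(\sigma\) is unknown and the sample size is small (n < 30) has the limits
\[ \bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \tag{7.5} \]
Confidence interval for the population proportion p has the limits
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \tag{7.6} \]
Chi-square statistic
\[ \chi^2 = \frac{(n-1)S^2}{\sigma^2} \tag{7.7} \]
Confidence interval for the population variance \(\sigma^2\)
\[ \frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}} \tag{7.8} \]
Sample size when estimating \(\mu\)
\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \tag{7.9} \]
Sample size when estimating p
\[ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) \tag{7.10} \]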
Solved Problems
Problem A
To attract candidates to its MBA program, a university claims to be accepting 35% of candidates. During the previous year, of the 245 candidates who applied 70 were accepted. Construct a 95% confidence interval for the proportion of acceptances to this program. What do you think of the university’s claim?
Solution
\(\hat{p} = x/n = 70/245 = 0.286\)
We use Formula 7.6 to obtain the confidence interval for the population proportion. A 95% confidence interval for p is given by:
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.286 \pm 1.96\sqrt{\frac{0.286(1-0.286)}{245}} = 0.286 \pm 1.96(0.029) \]
confidence interval = [0.2294, 0.3426]
The 35% acceptance claim is outside the confidence interval.
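The interval in Problem A can be verified with a few lines of Python. This is a sketch with a hypothetical helper name, using z = 1.96 as in the solution; because it carries \(\hat{p}\) at full precision instead of rounding to 0.286, the limits may differ from the printed ones in the fourth decimal place.

```python
import math

def proportion_ci(x, n, z):
    """Large-sample confidence limits for a proportion (Formula 7.6)."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(70, 245, 1.96)
print(f"[{low:.4f}, {high:.4f}]")
print("claim of 0.35 inside interval:", low <= 0.35 <= high)
```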
Problem B A Geant manager in Tunis wants to estimate the average amount of money that customers spend in the mall. Suppose that 144 customers are randomly selected and the sample results yield an average of $125 and a standard deviation of $29. Use the appropriate template to answer the following questions. a) Construct a 90% confidence interval for the mean spending in the mall. b) If the manager wanted to reduce the margin of error in part a), what options exist to do so?
Solution We use Template 7.1B.
a) The 90% confidence interval is [121, 129].
b) To reduce the margin of error, the manager can reduce the confidence level to, say, 80%, and this will narrow the confidence interval to [121.9, 128.1]; or the manager can increase the sample size to, say, 200, which would shrink the confidence interval to [121.6, 128.4], assuming that the sample mean and standard deviation remain the same.
Problem C
In the automotive industry, an ‘early car replacement’ is defined as replacing a car within the first three years of its life and a ‘late car replacement’ as replacing a car after at least seven years. A Gulf consumer agency surveyed 250 early replacement buyers and obtained an average of 2.7 years and a standard deviation of 0.55 years. Another sample of 220 late replacement buyers yielded a mean of 8.3 years and a standard deviation of 1.2 years.
a) Compute a 95% confidence interval for early car replacements.
b) Compute a 90% confidence interval for late car replacements.
c) How large a sample of early car replacement buyers is required to be 90% confident that the sample mean \(\bar{X}\) is within 0.07 of \(\mu\)?
d) How large a sample of late car replacement buyers is required to be 95% confident that the sample mean \(\bar{X}\) is within 0.1 of \(\mu\)?
Solution
a) A 95% confidence interval for early car replacements is given by
\[ P\left(\bar{X} - 1.96\,S/\sqrt{n} \le \mu \le \bar{X} + 1.96\,S/\sqrt{n}\right) = 0.95 \]
\[ P\left(2.7 - 1.96(0.55)/\sqrt{250} \le \mu \le 2.7 + 1.96(0.55)/\sqrt{250}\right) = 0.95 \]
confidence interval = [2.63, 2.77]
b) A 90% confidence interval for late car replacements is given by
\[ P\left(\bar{X} - 1.64\,S/\sqrt{n} \le \mu \le \bar{X} + 1.64\,S/\sqrt{n}\right) = 0.90 \]
\[ P\left(8.3 - 1.64(1.2)/\sqrt{220} \le \mu \le 8.3 + 1.64(1.2)/\sqrt{220}\right) = 0.90 \]
confidence interval = [8.17, 8.43]
c) \( n = \left(\dfrac{1.64(0.55)}{0.07}\right)^2 = 166.04 \), so the sample size should be at least 167.
d) \( n = \left(\dfrac{1.96(1.2)}{0.1}\right)^2 = 553.2 \), so the sample size should be at least 554.
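The four parts of Problem C can be recomputed with the Python sketch below. The function names are hypothetical, and the critical values 1.96 and 1.64 are passed in explicitly because they are the table values used in the solution; a more precise 90% value such as 1.645 can change a borderline result like part c) by one unit.

```python
import math

def mean_ci(xbar, s, n, z):
    """Large-sample confidence limits for a mean (Formula 7.2)."""
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

def n_for_mean(s, error, z):
    """Formula 7.9 with S in place of sigma, rounded up to a whole unit."""
    return math.ceil((z * s / error) ** 2)

print([round(v, 2) for v in mean_ci(2.7, 0.55, 250, 1.96)])  # part a)
print([round(v, 2) for v in mean_ci(8.3, 1.2, 220, 1.64)])   # part b)
print(n_for_mean(0.55, 0.07, 1.64))                          # part c)
print(n_for_mean(1.2, 0.1, 1.96))                            # part d)
```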
Chapter Review Problems
7.29 Salalah College of Business wants to install a photocopy machine for staff. From experience at other colleges, the dean believes the number of documents is normally distributed with a daily standard deviation of 44 copies. The machine is tested for five days and the resulting daily mean is 345 copies.
a) Give a 99% confidence interval for the mean number of pages copied per day.
b) Suppose the dean will install the copier if she can be confident that the daily average number of copies will exceed 290. Does the result of a) justify purchasing a copier? Explain.
7.30 Professor Bin Tifor gave three tests last week in a large class. The standard deviation was \(\sigma\) = 6 for all three tests and the scores were normally distributed. Below are 10 randomly selected scores on each test. Find a 95% confidence interval for the mean score on each exam. Do the confidence intervals overlap? If so, what does this suggest?
Test 1: 76, 69, 78, 71, 80, 72, 76, 82, 76, 70
Test 2: 73, 94, 85, 83, 72, 89, 80, 77, 66, 71
Test 3: 65, 64, 69, 67, 72, 64, 59, 56, 70, 64
7.31 Suppose that the finance department of Orascom group, a large Egyptian corporation with several branches, conducted a survey to determine the mean travel spending of its salespeople. If a sample of 64 travel expenses yields a weekly average of $256 and a standard deviation of $82:
a) What is the estimate of the population mean?
b) Determine a 95% confidence interval for \(\mu\). Explain what it indicates.
c) Assume the finance department selects another sample of 81 travel expenses. If the sample mean and sample standard deviation remain the same, what is the 95% confidence interval for the population mean? Explain why this confidence interval is narrower.
7.32 A sample of observations selected from a normal distribution yields a sample variance of 55. Construct a 95% confidence interval for \(\sigma^2\) in each of the cases a) to c):
a) n = 12,
b) n = 16,
c) n = 25.
d) What happens to the confidence interval of \(\sigma^2\) as the sample size increases?
7.33 The new manager of the Al Buhaira Bowling Club would like to know how long current members have been members of the club. He selects a sample of 45 current members. The mean length of membership of the sample is 6.38 years and the sample standard deviation is 1.85 years.
a) What is the mean of the population?
b) Construct a 90% confidence interval for the population mean.
c) The former manager reported a mean of about 7.5 years. Does the sample information support this claim? Explain.
7.34 Suppose there are 2,500 students that are eligible to vote for the students’ council at the University of Sharjah College of Business. A sample of 150 students revealed that 92 planned to vote for the current president of the students’ council. Construct a 99% confidence interval for the proportion of eligible voters that plan to vote for the current president. From this sample information, can you confirm that the current president will be re-elected?
7.35 Hessa is interested in estimating the average purchase amount at convenience stores in the city of Doha. She selected a random sample of 36 purchases from several convenience stores. Use the following data to construct and interpret a 90% confidence interval for the population mean purchase amount.
7  3  42  21  18  12  13  6  33  24  21  15
12  4  9  17  27  11  9  16  30  24  14  19
15  8  19  15  23  31  7  26  14  35  5  28
7.36 Leila wants to estimate the average time taken to travel to work in the city of Cairo. Using a confidence level of 95%, what kind of confidence intervals, based on the following random sample of commuters, can she construct?
20  54  12  17  33  39  18  36  49  55  24  20  17
8  21  36  29  11  65  39  28  17  12  25  35  50
24  29  15  19  36  29  44  28  19  47  51  37  35
7.37 Suppose Azur Airlines wants to estimate the proportion of business people traveling from Paris to Beirut, a new route. A sample of 196 passengers revealed that 128 were on a business trip. Construct a 90% confidence interval for the proportion of business travelers on this new route.
7.38 Assume that there are 2,320 students at an Islamic university. Currently, classrooms are segregated. To cut costs, the university management is considering offering nonsegregated classes at senior level. A survey of 340 students yields 124 students who favor no segregation in classrooms. Develop a 95% confidence interval for the proportion of students who favor no segregation. Management claims that the proportion of students who favor no segregation is at most 25%. What do you think of this claim?
7.39 A machine produces chips for alarms. A quality control inspector checks samples of the chips produced by the machine. If too many chips are defective, the production process is stopped to readjust the machine. If a random sample of 58 chips results in 5 defects, give a 98% confidence interval for the population proportion of defective chips made by this machine.
7.40 A hospital has just been informed by a major pharmaceutical group that its new drug may have some undesirable side effects, specifically an increase in heart rate. Twenty patients were selected who were prescribed the drug. All patients in the sample showed a heart rate of 58 prior to taking the medication. The following heart rates were recorded after taking the drug for a week.
52  72  62  72  92  78  74  54  82  87
57  70  74  83  44  56  68  80  60  74
a) Based on the sample data, construct a 90% confidence interval to estimate the mean heart rate, assuming a normal population.
b) Referring to your answer in a), can the estimate be applied to all potential patients taking the drug?
c) Referring to your computations in part a), if the average heart rate increased, determine the probability that a sample mean would be at least as large as the one obtained from the sample, assuming that the beginning mean rate is 62.
7.41 The bad debt ratio for a bank is defined as the ratio of the amount of loans defaulted on to the total amount of loans. Suppose that a random sample of nine banks in a certain city yields the following bad debt ratios expressed in percent.
4  5  4  2  5  4  3  5  3
a) Assuming that the bad debt ratios are normally distributed, determine a 95% confidence interval for the mean bad debt ratio.
b) The bank association claims that the average bad debt ratio for all banks of the country is 2.5 and that the mean bad debt ratio for this city is higher. Using a 95% confidence interval, can we be 95% confident that this claim is true? Using a 99% confidence interval, can we be 99% confident that this claim is true?
7.42 Musabah, a chemical engineer, would like to determine whether a new catalyst increases the output of a chemical process, which is currently 450 kg per day. To test this new catalyst, nine trials are made with the following results.
475  523  489  512  548  464  471  498  510
a) Assuming the output is normally distributed, determine a 95% confidence interval for the mean output obtained using this new catalyst.
b) Based on the confidence interval of part a), can we be 95% confident that the mean output obtained with the new catalyst exceeds 450?
c) Construct a 99% confidence interval for the population variance.
7.43 The following data are a sample of the number of liters of gasoline purchased at a gas station.
40  51  43  48  44  57  54  39  42  48  45  39  43
a) Assuming this sample is taken from a normal population, construct a 95% confidence interval to estimate the mean value of the population.
b) What is the point estimate for the mean?
c) Construct a 90% confidence interval for the population variance.
7.44 Suppose the marketing department of the Marrakech Chamber of Commerce wants to estimate the average number of customers who enter Semmarine Souq every 10 minutes. Aliya, a research assistant, is tasked to select 10-minute intervals and count the number of arrivals at the mall during each interval. Assume that she obtains the following data.
68  42  51  57  66  90  55  39  42  88
a) The analyst assumes the number of arrivals is normally distributed. Compute a 90% confidence interval for the mean arrivals for all 10-minute intervals.
b) Construct a 95% confidence interval for the population variance.
7.45 During the last two weeks of Ramadan, a fashion department store in Lebanon ran a promotion campaign to boost its sales during the pre-Eid period. The results were impressive on the first few days. Arwa, the manager, wants to estimate the average amount customers spent during this two-week period. Assume that she randomly selects a sample of 24 bills, which yields the following customer spending:
321  546  449  540  125  987  519  350
764  467  582  762  420  328  557  865
408  326  547  910  373  502  842  586
a) Assume that the data are normally distributed. Construct a 98% confidence interval for the mean spending of all customers during this two-week period.
b) Construct a 98% confidence interval for the population variance.
7.46 Suppose that Cirta Engineering, a small water pump company, produced 2,000 water pumps in 1995. In an effort to promote the quality of its product, the company decided to conduct a multiyear study of its 1995 water pumps. A sample of 250 owners of these water pumps was selected randomly. The owners were asked to contact an 800 number when the first major repair was required for their water pump. After several years, 214 water pump owners had reported. The other 36 were disqualified because they no longer owned the 1995 water pump. The average number of years before the first major repair occurred was 6.2 years with a standard deviation of 1.53 years for the 214 owners who reported. If the company wants to advertise the average number of repair-free years of life expectancy for its water pumps, what is the point estimate? Construct a 90% confidence interval for the average number of years until the first major repair.
7.47 A sample of size 14 from a normally distributed population yields the following sample statistic:
\[ \sum (X - \bar{X})^2 = 135.8 \]
a) Construct a 95% confidence interval for the population variance.
b) Construct a 95% confidence interval for the population standard deviation.
7.48 A sample of size six from a normally distributed population yields the following sample statistics:
\[ \sum X^2 = 326 \quad\text{and}\quad \sum X = 42 \]
a) Find a 90% confidence interval for the variance.
b) Find a 90% confidence interval for the standard deviation.
7.49 Let \(\mu\) be the mean weekly wage for workers in a large construction company. A random sample of such workers (with n > 30) yielded a 95% confidence interval for \(\mu\) of $258 to $326 using a normal distribution with a known population standard deviation.
a) What is the sample mean \(\bar{X}\)?
b) Construct a 99% confidence interval for \(\mu\) based on this sample.
7.50 Hamed selected a first sample of 25 observations and obtained a 90% confidence interval of [148, 175]. Then, he selected another independent sample of 38 observations and obtained, at the same confidence level, a confidence interval of [156, 181]. What is the probability that neither sample includes the population mean?
7.51 Last semester, the minimum and the maximum times to complete an exam in finance were 39 and 62 minutes respectively.
a) Based on this information, how large a sample would be needed to estimate this semester’s mean time to complete the finance exam with 95% confidence and an error of 3 minutes? Explain your assumption about \(\sigma\).
b) What would be your answer for a 99% confidence interval?
7.52 Suppose you want to estimate the average age of all Toyota Corolla cars still on the road in your city. You want to be 95% confident and you want your estimate to be accurate to within 1.5 years. The Corolla was first sold in your country 35 years ago and you believe that there are no cars older than 24 years on the road. How large should your sample be?
7.53 A survey conducted among 900 two-child families revealed that 75% of them own family-size cars.
a) Use this information to determine a 90% confidence interval for the true proportion of two-child families who own family-size cars.
b) Compute the largest margin of error that could occur when estimating this proportion.
7.54 The following data represent the daily price index changes of real estate companies listed on the Amman stock market between January 2, 2010 and March 22, 2011.
–0.1  –0.9  6.8  15.0  –29.1  59.2  –18.5  30.6
–1.3  –39.1  –41.2  –57.4  –28.0  8.4  –29.5  2.3
13.6  14.6  –22.2  –47.5  –45.8  0.6  19.9  –60.5
–38.8  –26.4  26.6  –12.1  12.5  –15.1  –42.1  –17.9
4.7  –7.0  8.5  –32.4  –45.6  –10.2  –16.6  14.5
3.1  31.6  –16.5  14.7  48.0  –41.8  42.5  –2.5
26.9  10.1  –22.9  –2.2  –66.0  –17.3  5.2  –25.7
Source: www.ase.com.jo
Assume that the population is normally distributed. Use this information to determine a 90% confidence interval for the true mean price index change.
7.55 Albustan Safety supplies electronic devices for alarm systems in Bahrain. Imagine that, as part of the company’s quality control efforts, it wishes to estimate the mean number of days a particular electronic device is used before repair is needed. Suppose that a pilot sample of 40 electronic devices indicates a sample standard deviation of 200 days. The company wishes its estimates to have a margin of error of no more than 50 days and the confidence level must be 95%.
a) Given this information, how many additional devices should be sampled?
b) The pilot study was initiated because of the costs involved in sampling. Each sampled observation costs approximately $3.20 to obtain. Originally, it was thought that the population’s standard deviation might be as large as 300. Determine the amount of money saved by obtaining the pilot sample. (Hint: figure the total cost of obtaining the required sample for each method.)
7.56 Suppose that Naftal, an Algerian petroleum company, is considering building a gas station at a given intersection. The company would like to estimate the average number of cars that go past this location per hour in the afternoon. The company thinks that the number of cars passing this intersection per hour has a population standard deviation of 90 during the afternoon.
a) On how many randomly selected afternoons should the number of cars passing the intersection be observed so that the company can be 95% confident that the estimate will be within 50 cars of the true average?
b) Suppose the company finds out that the population standard deviation of the number of cars passing the location per hour is not 90 but 143. If the company has already taken the sample of size calculated in part a), what confidence can the company have that the point estimate is within 50 cars of the true average?
c) If the company has already taken the sample of size calculated in part a) and later finds out that the population standard deviation of the number of cars passing the intersection per hour is actually 112, the company can be 95% confident that the point estimate is within how much of the true average?
Miniprojects Miniproject 7.1 Consider the data file Profits of Top 500 Companies for 2006. a) Randomly sample 20 companies and find the 95% confidence interval for , the mean profit. Assume that the population is normally distributed. b) Repeat question a) for samples of 35 and 50 companies. c) Compare the widths of your three confidence intervals. d) Compute the mean of the population of all companies. Which confidence intervals contain the population mean ?
Miniproject 7.2 A manufacturer of mixed (roasted and salted) nuts claims that the proportion of cashews in a bag is 18%. Any bag of mixed nuts contains six types of nuts: cashews, almonds, peanuts, pecans, Brazil nuts, and hazelnuts. To verify the claim, you buy a 2.5 kg bag with a couple of friends and perform the following experiment. You select 30 samples of 20 nuts with replacement and count the number of cashews in each sample. (Note: avoid eating nuts before you finish sampling.) Use the sample proportion of each sample to compute a 95% confidence interval for the proportion of cashews in all 2.5 kg bags of mixed nuts produced by this manufacturer.
a) What percentage of confidence intervals contain the population proportion 0.18?
b) Is this percentage close to the confidence level of 95%?
c) What happens if your sample size increases to 40 and then to 80?
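Before opening a real bag, the experiment in Miniproject 7.2 can be rehearsed by simulation. The sketch below assumes that the manufacturer's claim of 0.18 is the true proportion, that nuts are drawn independently, and that the normal-approximation interval p̂ ± z√(p̂(1 − p̂)/n) is used; the function names and the random seed are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def coverage(true_p: float, n: int, num_samples: int, seed: int = 1) -> float:
    """Share of simulated intervals that contain the assumed true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Each nut is a cashew with probability true_p (sampling with replacement).
        cashews = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(cashews, n)
        hits += low <= true_p <= high
    return hits / num_samples


# 30 samples each, for the three sample sizes mentioned in the miniproject.
for n in (20, 40, 80):
    print(n, coverage(true_p=0.18, n=n, num_samples=30))
```

With only 30 intervals the observed percentage moves in steps of about 3.3 points, so some distance from 95% is to be expected even when the method works as intended.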
Miniproject 7.3 For this project, select a football league of your choice (in your country or a European one). Obtain data on the heights and ages of all players in a given year.
a) Take a random sample of 20 players and find a 95% confidence interval for μ, the population mean age. Assume that the ages of these players are normally distributed.
b) Redo part a) for samples of size 30 and 50, respectively.
c) Compare the widths of these three confidence intervals.
d) Compute the mean of the population of ages of all players. Which confidence intervals contain μ?
Miniproject 7.4 Use the data you obtained for Miniproject 7.3. Take 15 random samples of 20 players.
a) Compute a 95% confidence interval for the proportion of players who scored five goals during the season for each sample.
b) Compute the population proportion p of all players who scored five goals during the season.
c) What percentage of confidence intervals contain the population proportion p computed in b)?
d) Is this percentage close to the confidence level of 95%?
Miniproject 7.5 Use the data file Brazilian Football Players. Select the variables height and age of players.
a) Take a random sample of 20 players and find a 95% confidence interval for μ, the population mean age. Assume that the ages of these players are normally distributed.
b) Redo part a) for samples of size 30 and 50, respectively.
c) Compare the widths of these three confidence intervals.
d) Compute the mean of the population of ages of all players. Which confidence intervals contain μ?
Miniproject 7.6 Use the data file Brazilian Football Players. Take 15 random samples of 20 players. Select the variable that indicates which foot a player prefers to use: left or right.
a) Compute a 95% confidence interval for the proportion of players who are left-footed for each sample.
b) Compute the population proportion p of all players who are left-footed.
c) What percentage of confidence intervals contain the population proportion p computed in b)?
d) Is this percentage close to the confidence level of 95%?
Miniproject 7.7 For this project use either the ADCB Trading file or the Kuwait Stock Market file. Define a random variable as (Close Price – Open Price) for ADCB and (Close Market Index – Open Market Index) for the Kuwait Stock Market. The difference will be positive if an increase occurs, zero if there is no change, or negative if a decrease occurs. Take a sample (using random numbers) of 35 values of the random variable.
a) Find a 95% confidence interval for μ.
b) Find a 90% confidence interval for σ.
c) What is the number of observations that you must collect if you want to be 95% confident that your error of estimation does not exceed 0.02 for ADCB or 17 for the Kuwait Stock Market? What assumption did you make?
d) Define a new random variable: increase or decrease (disregard no change); let p be the proportion of increases. Find a 90% confidence interval for this proportion.
Miniproject 7.8 For this project, you need to interview at least 50 customers doing their weekly shopping in a large supermarket. To conduct these interviews, form a team of three or four students. Prepare a questionnaire about weekly spending on groceries that contains a list of questions (no more than six, so that your interview will not take more than three minutes). Among the questions, ask the following:
• Does this shopping correspond to your weekly budget for groceries?
• If yes, what is the amount of the budget?
• What is the size of the family you are shopping for?
• What is the family income? (Do not insist if the customer does not want to answer.)
The data collected may be used for future miniprojects; hence it is important that you execute this task well! Note that you may be required to get the approval of the supermarket manager. Once you have your survey results, work out the following:
a) Are the data collected a population or a sample? How many observations did you collect?
b) Graph the data and discuss the shape of the distribution.
c) What is the mean?
d) What is the standard deviation?
e) Find a 90% and a 95% confidence interval for the average spending and state what assumption you are making.
f) Find a 90% and a 95% confidence interval for the variance and state what assumption you are making.
g) What is the number of observations that you must collect if you want to be 95% confident that your error of estimation does not exceed $15? What assumption did you make?
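Parts c), d), e), and g) of Miniproject 7.8 can be organized as a short script once the survey data are in hand. The sketch below uses simulated figures in place of real interviews (the 50 values are hypothetical, not survey results) and assumes that with n ≥ 50 the large-sample interval x̄ ± z·s/√n is acceptable; with real data the t-distribution table in the appendices could be used instead.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical survey: 50 simulated weekly grocery budgets in dollars.
rng = random.Random(7)
spending = [round(rng.gauss(180, 45), 2) for _ in range(50)]

n = len(spending)
x_bar = mean(spending)   # part c): sample mean
s = stdev(spending)      # part d): sample standard deviation (n - 1 divisor)


def mean_interval(confidence: float):
    """Large-sample interval x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Part e): 90% and 95% intervals for the average spending.
ci_90 = mean_interval(0.90)
ci_95 = mean_interval(0.95)
print(ci_90, ci_95)

# Part g): observations needed for an error of at most $15,
# using s as a stand-in for the unknown population standard deviation.
z_95 = NormalDist().inv_cdf(0.975)
n_needed = ceil((z_95 * s / 15) ** 2)
print(n_needed)
```

The 95% interval is always wider than the 90% interval computed from the same data, which is a quick sanity check on the output.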
Answers to Selected Odd-Numbered Problems
5.15 a) .0945. b) .7147. c) .4766.
5.17 a) .9525. b) .7967. c) .4595.
5.19 366.
5.21 .0436.
5.23 a) .6814. b) .1497. c) .4835.
Chapter 6
6.1 16 samples.
6.3 = 89.34, = 182.55.
6.5 a) μ_X̄ = 20, σ²_X̄ = 0.36, σ_X̄ = 0.6. b) μ_X̄ = 70, σ²_X̄ = 0.64, σ_X̄ = 0.8. c) μ_X̄ = 5, σ²_X̄ = 0.0225, σ_X̄ = 0.15. d) μ_X̄ = 45, σ²_X̄ = 0.0156, σ_X̄ = 0.125.
6.7 Normal distribution (n ≥ 30). a) μ_X̄ = 25, σ_X̄ = 6/7. b) .0099. c) .0516.
6.9 a) 25. b) 100.
6.11 a) Yes. b) No. c) Yes.
6.13 a) .84. b) -.56. c) 1.13. d) -1.141.
6.15 .0005.
6.17 .0382.
6.19 .0148.
Chapter 7
7.1 a) [10.49, 17.51]. b) [19.15, 24.85]. c) [51.06, 60.94].
7.3 [14.80, 15.88].
7.5 [119.55, 136.45].
7.7 [45.96, 61.24].
7.9 [2091.73, 3352.61].
7.11 a) 15.987. b) 46.979. c) 7.261. d) 45.642.
7.13 a) Yes. b) Yes. c) Yes. d) Yes.
7.15 a) 99% CI = [16.25, 91.61]. b) 95% CI = [19.09, 70.40]. c) 90% CI = [20.80, 61.97]. d) As the level of confidence decreases, the confidence interval becomes narrower.
7.17 [2804.59, 8902.40].
7.19 [0.6914, 1.6415].
7.21 a) 770. b) 606.
7.23 138.
7.25 17, Poisson.
7.27 Goal seek, .110.
Chapter 8
8.1 a) Reject in left tail. b) Reject in both tails. c) Reject in right tail.
8.3 a) Type II error. b) Type I error.
8.5 a) H0: μ ≤ 8, H1: μ > 8 (claim), right-tailed test. b) H0: μ ≤ 5 (claim), H1: μ > 5, right-tailed test. c) H0: μ ≥ 400, H1: μ < 400 (claim), left-tailed test.
8.7 H0: μ ≤ 25, H1: μ > 25 (claim), t = 3.254, reject H0.
8.9 H0: μ ≥ 4 (claim), H1: μ < 4, t = 1.688, do not reject H0.
8.11 H0: μ ≤ 3000 (claim), H1: μ > 3000, t = -.791, do not reject H0.
8.13 a) -1.645. b) 1.282. c) ±2.575.
8.15 z = 1.371, do not reject H0.
8.17 z = -1.28, do not reject H0.
8.19 χ² = 36.1909.
8.21 a) 4.5748. b) 24.769. c) 6.5706 and 23.6848.
8.23 c) χ² = 31.043, do not reject.
8.25 b) χ² = 24.5, do not reject.
8.27 a) False. b) False.
8.29 = 0.1251.
Chapter 9
9.1 a) Independent. b) Paired samples. c) Independent.
9.3 a) Two-tailed test. c) z = -2.92. e) p-value = .0036.
9.5 z = .69, do not reject H0.
9.7 t = -2.113, do not reject H0.
9.9 a) [11.85, 23.15]. b) [50.08, 61.72]. c) [25.66, 32.94].
9.11 [57.52, 330.48].
9.13 c) p-value = .0537, do not reject H0.
9.15 a) p-value = .407, do not reject H0.
9.17 [-0.1899, -0.1101]; the parameter of population 1 is less than that of population 2.
9.19 p-value = .163, do not reject H0.
9.21 a) 3.49. b) 2.61.
9.23 F = 1.8025, do not reject H0.
9.25 F = 6.84, do not reject H0.
9.27 p-value 0.1188.
9.29 a) t = -8.297 is outside the interval [-2.724, 2.724], reject H0. b) t = -2.64, do not reject H0.
Chapter 10
10.1 χ² = 2.810, do not reject H0.
10.3 χ² = 8.26, do not reject H0.
10.5 χ² = 122, reject H0 Type I.
10.7 χ² = 3.519, do not reject H0.
10.9 χ² = 19.104, reject H0.
10.11 χ² = 1.573, do not reject H0.
10.13 χ² = 2.11, do not reject H0.
10.15 χ² = 12.76, do not reject H0.
10.17 χ² = 25.3, not independent.
10.19 χ² = 5.08, homogeneous.
10.21 χ² = 10542, reject H0. and Group 3.
Chapter 11
11.1
Source of variation     Sum of squares   Degrees of freedom   Mean square   F-statistic
Between treatments      352              2                    176           9.17
Within treatments       748              39                   19.18
Total                   1100             41
11.3 F = 3.592, do not reject H0.
11.5 F = 5.44, reject H0.
11.7 a) Reject H0. b) p-value = .0000. c) Means differ significantly. d)s.
11.9 b) F = .482, do not reject H0. c) p-value = .621. d) Reject H0.
11.11
Source of variation     Sum of squares   Degrees of freedom   Mean square   F-statistic
Between blocks          223.5            5                    44.7          1.4
Between treatments      75.6             3                    25.2          .79
Within treatments       478.3            15                   31.89
Total                   777.4            23
11.13
Source of variation     Sum of squares   Degrees of freedom   Mean square   F-statistic
Factor A                279.626          2                    139.813       17.6309
Factor B                409.77           3                    136.59        17.205
AB interaction          33.2             6                    5.533         0.697
Within-treatments SSE   952.7            120                  7.939
Total SST               1675.3           131
11.15 F(SSBL) = 6.056, F(SSB) = 5.444, reject H0.
11.17 a) 60 observations. b) 3 levels for factor A and 4 levels for factor B. c) 5 replications. d) Reject H0.
Chapter 12
12.1 a) x = number of hours. b) y = 25 + 12x. c) $85.
12.3 ŷ = -3.514 + 0.264x.
12.5 ŷ = 3091.91 + 0.103x, where y is the food spending.
12.7 a) ŷ = 3.77 + .27x, where y is the number of injuries. b) The slope is not meaningful.
12.9 ŷ = -375.126 + 301, where y is the purchase amount.
12.11 a)
12.13 a) r = .875. b) t = 5.116, reject H0.
12.15 a) S = +1559.7. b) r² = .404, 59.6% not explained.
12.17 a) r = -.996. b) t = -36.57, reject H0.
12.19 a) Yes. b) r = .636. c) t = 2.018, do not reject H0.
12.21 a) [0.257, 0.284]. b) p-value = .00009, reject H0. c) r² = .995.
12.23 a) Yes. b) r = .938. c) t = 7.643, reject H0.
12.25 a) [107.6, 118.8]. b) [92.7, 133.7].
12.27 a) t = 59.8, reject H0. b) [84.8, 87.2]. c) [80.9, 91.1]. d) DW = 2.262, negative autocorrelation.
Chapter 13
13.1 a) Reject if T ≥ 11. b) Reject if T ≤ 5. c) Reject if T ≤ 2 or T ≥ 10.
13.3 Reject if T ≤ 1 or T ≥ 8. If T = 6, we do not reject H0.
APPENDICES
Appendix A: Binomial Distribution
Appendix B: Poisson Distribution
Appendix C: Areas for the Standard Normal Distribution
Appendix D: The t-Distribution
Appendix E: The χ²-Distribution
Appendix F: The F-Distribution
Appendix G: Critical Values of the Studentized Range Distribution
Appendix H: Critical Values of Hartley’s Fmax Test
Appendix I: Distribution Function of the Number of Runs
Appendix J: Critical Values of T for the Wilcoxon Signed-Rank Test
Appendix K: Cumulative Distribution of the Mann-Whitney U-Statistic
Appendix L: Critical Values of Spearman’s Rank Correlation Coefficient
Appendix M: Matrix Approach to Multiple Regression
Appendix N: Random Number Table
Appendix O: Table of Factors for Control Limits
Appendix P: Stepwise Regression
Glossary
Adjusted coefficient of determination معامل التحديد الضابط a modified value of the coefficient of determination that avoids inflating R² when new independent variables are included in the regression model.
Aggregate price index اجمالي مؤشر الأسعار used to express the relative change from a base year for a set of at least two items.
Alternative hypothesis (H1) الفرضية البديلة a claim (or a statement) that will be true if the null hypothesis is false.
Analysis of variance (ANOVA) تحليل التباين a procedure for comparing three or more means simultaneously in a single test.
Approximation to the binomial distribution التقريب للتوزيع ذو الحدين for large values of n, the approximation works well and in general offers a good estimate when both nπ ≥ 5 and n(1 - π) ≥ 5.
Attribute السمة a piece of data that is counted.
Autocorrelation الارتباط الذاتي occurs when successive error terms, or residuals, are correlated in regression analysis.
Backward elimination الازالة بأثر رجعي a procedure in regression modeling where all variables are entered at once. Nonsignificant independent variables are deleted one at a time from the model. The procedure ends when all independent variables left have significant t values.
Bar chart الأعمدة البيانية a graphical display of categorical data represented by bars on the horizontal and vertical axes.
Bayes’ theorem نظرية بايز a theorem used to compute posterior probabilities by revising prior probabilities.
Beta (β) بيتا probability of committing a type II error.
Binary variable المتغير الرقمي a variable that is assigned a value equal to either one or zero, depending on whether the observation possesses a given characteristic.
Binomial distribution التوزيع ذو الحدين the probability distribution of r successes in n independent trials.
Blocks a set of units having similar characteristics in terms of the blocking variables.
Box plot or box and whiskers plot خانات أو حواجز a diagram that incorporates the quartiles Q1, Q3, the median, and the two extreme values to graphically display quantitative data.
c chart خريطة سي used to control the number of times a particular characteristic appears in a sampling unit.
Causal forecasting model نموذج التنبؤ السببي a model that considers several variables that are related to the variable being predicted.
Cause and effect diagram (or fishbone diagram or Ishikawa diagram) مخطط هيكل السمكة أو مخطط إيشيكاوا/مخطط السبب والتأثير a diagram that looks like a fish skeleton, with the problem being the head of the fish, major causes being the “ribs” of the fish, and subcauses forming smaller bones off the ribs.
Centered moving average المتوسط المركزي المتحرك average of two consecutive moving averages.
Central limit theorem نظرية النهائية المركزية if samples of size n are drawn randomly from a population with mean μ and standard deviation σ, the sample means X̄ are approximately normally distributed with mean μ and standard deviation σ/√n for sufficiently large samples (n ≥ 30), regardless of the shape of the population distribution.
Chebyshev’s theorem نظرية شيبيشيف regardless of how the data are distributed, at least (1 - 1/k²) of the values will fall within k standard deviations of the mean, where k is a number greater than 1.
Check sheet قائمة مراجعة a data-gathering tool that can be used for problem identification in quality control.
Chi-square distribution توزيع مربع كاي a skewed continuous distribution whose shape depends on the number of degrees of freedom.
Chi-square goodness-of-fit test أختبار جودة التوفيق
Chi-square test of homogeneity أختبار التجانس لمربع كاي test of equality of proportions across several populations.
Chi-square test of independence أختبار الاستقلال لمربع كاي a test applied to analyze the frequencies of two variables having several characteristics to check whether the two variables are independent.
Classical approach to probability الإحتمال الكلاسيكي the probability of an event is equal to the number of outcomes where the event occurs divided by the total number of possible outcomes.
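The central limit theorem and Chebyshev’s theorem entries above can both be checked numerically. The sketch below is an illustration under stated assumptions: the population is a hypothetical exponential distribution with mean 10 (so its standard deviation is also 10), samples have size n = 30, and all names and the seed are chosen for the example only.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)

# Hypothetical skewed population: exponential with mean 10 and sd 10.
MU, SIGMA, N = 10.0, 10.0, 30

# 5000 sample means, each computed from a random sample of size N.
sample_means = [
    mean(rng.expovariate(1 / MU) for _ in range(N))
    for _ in range(5000)
]

# Central limit theorem: the sample means should center on mu with a
# standard deviation close to sigma / sqrt(n).
m, s = mean(sample_means), pstdev(sample_means)
print(m, s, SIGMA / sqrt(N))


def chebyshev_bound(k: float) -> float:
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2


# Chebyshev's theorem: the observed share within k standard deviations
# can never fall below the bound, whatever the shape of the data.
for k in (1.5, 2, 3):
    share = sum(abs(x - m) <= k * s for x in sample_means) / len(sample_means)
    print(k, round(chebyshev_bound(k), 3), round(share, 3))
```

For roughly bell-shaped data such as these sample means, the observed shares are typically well above the Chebyshev bound, which is deliberately conservative.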
Cluster sampling المعاينة العنقودية a method by which the population is divided into groups, or clusters, that are considered as mini populations. A random sample of m clusters is selected and a sample is collected by randomly selecting from each cluster.
Coefficient of determination (r²) معامل التحديد proportion of variability of the dependent variable that is explained by the independent variable.
Coefficient of variation (CV) معامل الاختلاف the ratio of the standard deviation to the mean, expressed as a percentage, in a set of observations.
Combinations التوافيق the possible selections of k items from a group of n items regardless of the order of selection.
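The coefficient of variation and combinations entries above each reduce to a one-line computation. The following minimal sketch assumes the sample standard deviation (n - 1 divisor) is intended for the CV and uses a small hypothetical data set; the function name is illustrative.

```python
from math import comb
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return stdev(values) / mean(values) * 100


observations = [12, 15, 11, 14, 13]  # hypothetical measurements
print(round(coefficient_of_variation(observations), 2))

# Number of ways to select k = 2 items from n = 5 when order does not matter.
print(comb(5, 2))  # 10
```

Because the CV is unit-free, it can be used to compare the relative spread of data sets measured on different scales.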
Complement متممة the complement of an event E, denoted Ē (read as E bar), is the event that includes all the outcomes of an experiment that are not in E.
Conditional probability الاحتمال الشرطي the probability that an event will occur given that another event has already occurred.
Confidence interval فترات الثقة a range of values within which we can declare with some confidence that the population parameter lies.
Confidence level مستوى الثقة او حدود الثقة a degree of certainty, expressed as a percentage, that an interval would include the population parameter.
Consistency التجانسية a property an estimator has if its probability of being close to the parameter it estimates increases as the sample size increases.
Consumer’s risk مخاطرة المستهلك probability that a nonconforming product will be available for sale.
Contingency table جدول الاقتران a cross-tabulation of frequencies into rows and columns.
Continuous random variable المتغير العشوائي المتصل a variable that can take on any value in an interval of numbers.
Continuous variables المتغير المتصل result from measuring (weight, length, etc.) and assume all values between any two specific values.
Control chart خارطة التحكم a graphical display of measurements (generally means of many samples of measurements) over time through repeated observation.
Control chart (in forecasting) خارطة التحكم للتنبؤ sets lower and upper limits for individual forecast errors using multiples of the square root of MSE.
Correction factor or continuity correction معامل التصحيح a correction made when a binomial distribution (discrete variable) problem is approximated by the normal distribution (continuous variable).
Correlation analysis تحليل الارتباط the process of determining a measure of the strength of the linear relationship of the variables.
Correlation coefficient معامل الارتباط measures the strength of the linear relationship that exists within a sample of n pairs of data.
Critical value القيمة الحرجة separates the region where the null hypothesis is rejected from the rest of the distribution.
Critical value approach القيمة الحرجة a method of testing hypotheses in which the sample statistic is compared to a critical value in order to reach a conclusion about rejecting or failing to reject the null hypothesis.
Cumulative frequency التكرار التراكمي the sum of all frequencies up to and including a given value (or class or category).
Cutoff point نقطة التقاطع the separation between rejecting and not rejecting the null hypothesis.
Cyclical component المركبات الدائرية describes patterns in the data that occur every several years. They are related to business cycles.
Data set مجموعة البيانات a collection of observations on one or more variables.
Degrees of freedom (df) درجات الحرية the number of observations in a sample (n) minus the number of parameters being estimated.
Delphi method طريقة دلفاي a method that allows each member to benefit from the experience and knowledge of other members without meeting face-to-face or knowing the other members’ identities, so that personality conflicts are avoided.
Dependent variable المتغير التابع a variable being predicted in regression analysis.
Dependent samples العينة التابعة two samples drawn from two populations where the selection of one sample from one population does have an influence on the selection of the second sample from the second population.
Descriptive statistics الاحصاء الوصفي methods of organizing, summarizing, and presenting data in an informative way by using tables, graphs, and summary measures.
Deseasonalizing (the data) التخلص من أثر الموسم process of removing seasonal effects from the actual data.
Discrete probability distribution التوزيع الاحتمالي المتقطع a listing of all outcomes of an experiment and the probability associated with each outcome.
Discrete variables المتغير المتقطع result from counting and can be assigned values such as 0, 1, 2, 3, and so on.
Discrete random variable المتغير العشوائي المتقطع a random variable that can take on only integer values.
Double exponential smoothing التجانس الأسي المزدوج a smoothing model that incorporates a second smoothing constant to account for the trend in a time series.
Durbin-Watson test اختبار ديربين واتسون a statistical test for determining whether significant autocorrelation is present when the regression analysis uses a sample of time series data.
Efficiency الكفاءة a property an estimator has if it has a relatively small variance.
Empirical approach النهج التجريبي based on defining probabilities from statistical data collected from historical occurrences.
Error of estimation خطأ التخمين the difference between the sample mean X̄ and the population mean μ.
Arab World Edition
Statistics for Business
Statistics for Business is an excellent resource for introductory students of statistics, written specifically for students in the Arab region. Drawing on examples from Arab companies, and using both regional and international case studies, this textbook helps students understand the importance of statistics in day-to-day business operations. Using thorough analysis and detailed explanations, this unique textbook is an essential tool for both students and teachers of statistics in the Arab world.
Statistics for Business Farouk Benghezal