Contents
To Teachers: About This Book xv
To Students: What Is Statistics? xxv
About the Authors xxix
Data Table Index xxxi
Beyond the Basics Index xxxiii
PART I Looking at Data
CHAPTER 1 Looking at Data—Distributions
1
Introduction 1
Variables 2
Measurement: know your variables 5
1.1 Displaying Distributions with Graphs 6
Graphs for categorical variables 6
Data analysis in action: don’t hang up on me 7
Stemplots 9
Histograms 12
Examining distributions 15
Dealing with outliers 17
Time plots 18
Beyond the basics: decomposing time series* 19
Section 1.1 Summary 22
Section 1.1 Exercises 22
1.2 Describing Distributions with Numbers 30
Measuring center: the mean 30
Measuring center: the median 32
Mean versus median 34
Measuring spread: the quartiles 34
The five-number summary and boxplots 36
The 1.5 × IQR rule for suspected outliers 38
Measuring spread: the standard deviation 40
Properties of the standard deviation 42
Choosing measures of center and spread 43
Changing the unit of measurement 45
Section 1.2 Summary 47
Section 1.2 Exercises 48
1.3 Density Curves and Normal Distributions 53
Density curves 55
Measuring center and spread for density curves 56
Normal distributions 58
The 68–95–99.7 rule 59
Standardizing observations 61
Normal distribution calculations 62
Using the standard Normal table 64
Inverse Normal calculations 66
Normal quantile plots 68
Beyond the basics: density estimation 71
Section 1.3 Summary 71
Section 1.3 Exercises 72
Chapter 1 Exercises 78
CHAPTER 2 Looking at Data—Relationships
83
Introduction 83
Examining relationships 84
2.1 Scatterplots 86
Interpreting scatterplots 88
Adding categorical variables to scatterplots 89
More examples of scatterplots 90
Beyond the basics: scatterplot smoothers 92
Categorical explanatory variables 93
Section 2.1 Summary 94
Section 2.1 Exercises 95
2.2 Correlation 101
The correlation r 102
Properties of correlation 102
Section 2.2 Summary 105
Section 2.2 Exercises 105
2.3 Least-Squares Regression 108
Fitting a line to data 110
Prediction 111
Least-squares regression 112
Interpreting the regression line 115
Sections marked with an asterisk are optional.
Correlation and regression 115
*Understanding r² 118
Beyond the basics: transforming relationships 119
Section 2.3 Summary 121
Section 2.3 Exercises 122
2.4 Cautions about Correlation and Regression 125
Residuals 126
Outliers and influential observations 129
Beware the lurking variable 132
Beware correlations based on averaged data 135
The restricted-range problem 135
Beyond the basics: data mining 136
Section 2.4 Summary 137
Section 2.4 Exercises 137
2.5 Data Analysis for Two-Way Tables 142
The two-way table 142
Joint distribution 144
Marginal distributions 145
Describing relations in two-way tables 146
Conditional distributions 146
Simpson’s paradox 148
The perils of aggregation 151
Section 2.5 Summary 151
Section 2.5 Exercises 152
2.6 The Question of Causation 154
Explaining association: causation 154
Explaining association: common response 155
Explaining association: confounding 156
Establishing causation 157
Section 2.6 Summary 159
Section 2.6 Exercises 159
Chapter 2 Exercises 161
CHAPTER 3 Producing Data 171
Introduction 171
Anecdotal data 171
Available data 172
Sample surveys and experiments 173
3.1 Design of Experiments 178
Comparative experiments 180
Randomization 181
Randomized comparative experiments 183
How to randomize 184
Cautions about experimentation 188
Matched pairs designs 189
Block designs 190
Section 3.1 Summary 191
Section 3.1 Exercises 192
3.2 Sampling Design 197
Simple random samples 200
Stratified samples 202
Multistage samples 203
Cautions about sample surveys 204
Section 3.2 Summary 207
Section 3.2 Exercises 208
3.3 Toward Statistical Inference 212
Sampling variability 213
Sampling distributions 214
Bias and variability 217
Sampling from large populations 219
Why randomize? 219
Beyond the basics: capture-recapture sampling 220
Section 3.3 Summary 221
Section 3.3 Exercises 221
3.4 Ethics 224
Institutional review boards 225
Informed consent 226
Confidentiality 227
Clinical trials 228
Behavioral and social science experiments 230
Section 3.4 Summary 232
Section 3.4 Exercises 232
Chapter 3 Exercises 234
PART II Probability and Inference
CHAPTER 4 Probability: The Study of Randomness 237
Introduction 237
4.1 Randomness 237
The language of probability 239
Thinking about randomness 240
The uses of probability 241
Section 4.1 Summary 241
Section 4.1 Exercises 241
4.2 Probability Models 242
Sample spaces 243
Probability rules 245
Assigning probabilities: finite number of outcomes 248
Assigning probabilities: equally likely outcomes 249
Independence and the multiplication rule 251
Applying the probability rules 254
Section 4.2 Summary 255
Section 4.2 Exercises 255
4.3 Random Variables 258
Discrete random variables 259
Continuous random variables 263
Normal distributions as probability distributions 265
Section 4.3 Summary 267
Section 4.3 Exercises 267
4.4 Means and Variances of Random Variables 270
The mean of a random variable 270
Statistical estimation and the law of large numbers 273
Thinking about the law of large numbers 275
Beyond the basics: more laws of large numbers 277
Rules for means 277
The variance of a random variable 279
Rules for variances and standard deviations 281
Section 4.4 Summary 285
Section 4.4 Exercises 286
4.5 General Probability Rules* 289
General addition rules 290
Conditional probability 293
General multiplication rules 298
Tree diagrams 299
Bayes’s rule 301
Independence again 302
Section 4.5 Summary 302
Section 4.5 Exercises 303
Chapter 4 Exercises 307
CHAPTER 5 Sampling Distributions 311
Introduction 311
5.1 Sampling Distributions for Counts and Proportions 313
The binomial distributions for sample counts 314
Binomial distributions in statistical sampling 315
Finding binomial probabilities: software and tables 316
Binomial mean and standard deviation 319
Sample proportions 321
Normal approximation for counts and proportions 322
The continuity correction* 326
Binomial formula* 327
Section 5.1 Summary 330
Section 5.1 Exercises 331
5.2 The Sampling Distribution of a Sample Mean 335
The mean and standard deviation of x̄ 337
The central limit theorem 339
A few more facts 343
Beyond the basics: Weibull distributions 344
Section 5.2 Summary 346
Section 5.2 Exercises 346
Chapter 5 Exercises 350
CHAPTER 6 Introduction to Inference 353
Introduction 353
Overview of inference 354
6.1 Estimating with Confidence 356
Statistical confidence 356
Confidence intervals 358
Confidence interval for a population mean 360
How confidence intervals behave 363
Choosing the sample size 364
Some cautions 366
Beyond the basics: the bootstrap 368
Section 6.1 Summary 368
Section 6.1 Exercises 369
6.2 Tests of Significance 372
The reasoning of significance tests 372
Stating hypotheses 374
Test statistics 376
P-values 377
Statistical significance 379
Tests for a population mean 382
Two-sided significance tests and confidence intervals 386
P-values versus fixed α 388
Section 6.2 Summary 390
Section 6.2 Exercises 390
6.3 Use and Abuse of Tests 394
Choosing a level of significance 395
What statistical significance does not mean 396
Don’t ignore lack of significance 397
Statistical inference is not valid for all sets of data 398
Beware of searching for significance 398
Section 6.3 Summary 399
Section 6.3 Exercises 399
6.4 Power and Inference as a Decision* 401
Power 401
Increasing the power 405
Inference as decision* 406
Two types of error 406
Error probabilities 407
The common practice of testing hypotheses 409
Section 6.4 Summary 410
Section 6.4 Exercises 410
Chapter 6 Exercises 412
CHAPTER 7 Inference for Distributions 417
Introduction 417
7.1 Inference for the Mean of a Population 418
The t distributions 418
The one-sample t confidence interval 420
The one-sample t test 422
Matched pairs t procedures 428
Robustness of the t procedures 432
The power of the t test* 433
Inference for non-Normal populations* 435
Section 7.1 Summary 440
Section 7.1 Exercises 441
7.2 Comparing Two Means 447
The two-sample z statistic 448
The two-sample t procedures 450
The two-sample t significance test 451
The two-sample t confidence interval 454
Robustness of the two-sample procedures 456
Inference for small samples 457
Software approximation for the degrees of freedom* 460
The pooled two-sample t procedures* 461
Section 7.2 Summary 466
Section 7.2 Exercises 467
7.3 Optional Topics in Comparing Distributions* 473
Inference for population spread 473
The F test for equality of spread 474
Robustness of Normal inference procedures 476
The power of the two-sample t test 477
Section 7.3 Summary 479
Section 7.3 Exercises 479
Chapter 7 Exercises 481
CHAPTER 8 Inference for Proportions 487
Introduction 487
8.1 Inference for a Single Proportion 488
Large-sample confidence interval for a single proportion 488
Beyond the basics: the plus four confidence interval for a single proportion 491
Significance test for a single proportion 493
Confidence intervals provide additional information 496
Choosing a sample size 498
Section 8.1 Summary 501
Section 8.1 Exercises 502
8.2 Comparing Two Proportions 505
Large-sample confidence interval for a difference in proportions 506
Beyond the basics: plus four confidence interval for a difference in proportions 509
Significance test for a difference in proportions 511
Beyond the basics: relative risk 515
Section 8.2 Summary 516
Section 8.2 Exercises 517
Chapter 8 Exercises 519
PART III Topics in Inference
CHAPTER 9 Analysis of Two-Way Tables 525
Introduction 525
9.1 Inference for Two-Way Tables 526
The hypothesis: no association 529
Expected cell counts 529
The chi-square test 530
The chi-square test and the z test 533
Beyond the basics: meta-analysis 534
Section 9.1 Summary 536
9.2 Formulas and Models for Two-Way Tables* 536
Computations 536
Computing conditional distributions 537
Computing expected cell counts 540
The X² statistic and its P-value 540
Models for two-way tables 541
Concluding remarks 544
Section 9.2 Summary 545
9.3 Goodness of Fit* 545
Section 9.3 Summary 548
Chapter 9 Exercises 548
CHAPTER 10 Inference for Regression 559
Introduction 559
10.1 Simple Linear Regression 560
Statistical model for linear regression 560
Data for simple linear regression 561
Estimating the regression parameters 565
Confidence intervals and significance tests 570
Confidence intervals for mean response 572
Prediction intervals 574
Beyond the basics: nonlinear regression 576
Section 10.1 Summary 578
10.2 More Detail about Simple Linear Regression* 579
Analysis of variance for regression 579
The ANOVA F test 581
Calculations for regression inference 583
Inference for correlation 590
Section 10.2 Summary 593
Chapter 10 Exercises 594
CHAPTER 11 Multiple Regression 607
Introduction 607
11.1 Inference for Multiple Regression 607
Population multiple regression equation 607
Data for multiple regression 608
Multiple linear regression model 609
Estimation of the multiple regression parameters 610
Confidence intervals and significance tests for regression coefficients 611
ANOVA table for multiple regression 612
Squared multiple correlation R² 613
11.2 A Case Study 615
Preliminary analysis 615
Relationships between pairs of variables 616
Regression on high school grades 618
Interpretation of results 619
Residuals 620
Refining the model 621
Regression on SAT scores 622
Regression using all variables 623
Test for a collection of regression coefficients 623
Beyond the basics: multiple logistic regression 625
Chapter 11 Summary 627
Chapter 11 Exercises 628
CHAPTER 12 One-Way Analysis of Variance 637
Introduction 637
12.1 Inference for One-Way Analysis of Variance 638
Data for one-way ANOVA 638
Comparing means 639
The two-sample t statistic 640
An overview of ANOVA 641
The ANOVA model 644
Estimates of population parameters 646
Testing hypotheses in one-way ANOVA 648
The ANOVA table 649
The F test 652
12.2 Comparing the Means 655
Contrasts 655
Multiple comparisons 661
Software 665
Power* 666
Section 12.2 Summary 669
Chapter 12 Exercises 670
CHAPTER 13 Two-Way Analysis of Variance 683
Introduction 683
13.1 The Two-Way ANOVA Model 684
Advantages of two-way ANOVA 684
The two-way ANOVA model 688
Main effects and interactions 689
13.2 Inference for Two-Way ANOVA 694
The ANOVA table for two-way ANOVA 694
Section 13.2 Summary 698
Chapter 13 Exercises 699
Companion Chapters (on the IPS Web site www.whfreeman.com/ips6e and CD-ROM)
CHAPTER 14 Logistic Regression 14-1
Introduction 14-1
14.1 The Logistic Regression Model 14-1
Binomial distributions and odds 14-2
Odds for two samples 14-3
Model for logistic regression 14-4
Fitting and interpreting the logistic regression model 14-6
14.2 Inference for Logistic Regression 14-8
Confidence intervals and significance tests 14-8
Multiple logistic regression 14-14
Section 14.2 Summary 14-16
Chapter 14 Exercises 14-17
Chapter 14 Notes 14-23
CHAPTER 15 Nonparametric Tests 15-1
Introduction 15-1
15.1 The Wilcoxon Rank Sum Test 15-3
The rank transformation 15-4
The Wilcoxon rank sum test 15-5
The Normal approximation 15-7
What hypotheses does Wilcoxon test? 15-8
Ties 15-10
Rank, t, and permutation tests 15-12
Section 15.1 Summary 15-14
Section 15.1 Exercises 15-15
15.2 The Wilcoxon Signed Rank Test 15-17
The Normal approximation 15-20
Ties 15-21
Section 15.2 Summary 15-23
Section 15.2 Exercises 15-23
15.3 The Kruskal-Wallis Test* 15-26
Hypotheses and assumptions 15-28
The Kruskal-Wallis test 15-28
Section 15.3 Summary 15-30
Section 15.3 Exercises 15-31
Chapter 15 Exercises 15-33
Chapter 15 Notes 15-35
CHAPTER 16 Bootstrap Methods and Permutation Tests 16-1
Introduction 16-1
Software 16-2
16.1 The Bootstrap Idea 16-3
The big idea: resampling and the bootstrap distribution 16-4
Thinking about the bootstrap idea 16-9
Using software 16-10
Section 16.1 Summary 16-11
Section 16.1 Exercises 16-11
16.2 First Steps in Using the Bootstrap 16-13
Bootstrap t confidence intervals 16-13
Bootstrapping to compare two groups 16-17
Beyond the basics: the bootstrap for a scatterplot smoother 16-20
Section 16.2 Summary 16-21
Section 16.2 Exercises 16-22
16.3 How Accurate Is a Bootstrap Distribution?* 16-24
Bootstrapping small samples 16-26
Bootstrapping a sample median 16-28
Section 16.3 Summary 16-28
Section 16.3 Exercises 16-30
16.4 Bootstrap Confidence Intervals 16-30
Bootstrap percentile confidence intervals 16-31
More accurate bootstrap confidence intervals: BCa and tilting 16-32
Confidence intervals for the correlation 16-35
Section 16.4 Summary 16-38
Section 16.4 Exercises 16-38
16.5 Significance Testing Using Permutation Tests 16-41
Using software 16-45
Permutation tests in practice 16-45
Permutation tests in other settings 16-49
Section 16.5 Summary 16-52
Section 16.5 Exercises 16-52
Chapter 16 Exercises 16-56
Chapter 16 Notes 16-59
CHAPTER 17 Statistics for Quality: Control and Capability 17-1
Introduction 17-1
Use of data to assess quality 17-2
17.1 Processes and Statistical Process Control 17-3
Describing processes 17-3
Statistical process control 17-6
x̄ charts for process monitoring 17-8
s charts for process monitoring 17-12
Section 17.1 Summary 17-17
Section 17.1 Exercises 17-17
17.2 Using Control Charts 17-21
x̄ and R charts 17-22
Additional out-of-control rules 17-23
Setting up control charts 17-25
Comments on statistical control 17-30
Don’t confuse control with capability! 17-33
Section 17.2 Summary 17-34
Section 17.2 Exercises 17-35
17.3 Process Capability Indexes* 17-39
The capability indexes Cp and Cpk 17-41
Cautions about capability indexes 17-44
Section 17.3 Summary 17-45
Section 17.3 Exercises 17-46
17.4 Control Charts for Sample Proportions 17-49
Control limits for p charts 17-50
Section 17.4 Summary 17-54
Section 17.4 Exercises 17-54
Chapter 17 Exercises 17-56
Chapter 17 Notes 17-57
Data Appendix D-1
Tables T-1
Answers to Odd-Numbered Exercises A-1
Notes and Data Sources N-1
Photo Credits C-1
Index I-1