SOLUTIONS MANUAL for Statistical Methods, 4th Edition, by Donna Mohr, William Wilson & Rudolf Freund.



CHAPTER 1

EXERCISE 1 a) Mean = 17.00, SD = 5.53, median = 17.00, Range = 22, IQR = 20 – 13 = 7. b) See boxplot.

c) A typical value for X is 17, which is both the mean and the median. About half the values are between 13 and 20. The boxplot shows that the upper tail is slightly longer, but the difference is small and the mean and median are equal. This distribution is roughly symmetric.



EXERCISE 2 a)

Variable   Mean     Median   Variance    Std. Dev.   Shape
WATER      7.125    1.500    452.864     21.28       extremely positively skewed
VEG        1.120    0.00     4.327       2.080       extremely positively skewed
FOWL       75.635   11.5     42197.33    205.420     extremely positively skewed

[Stem-and-leaf plots and boxplots for WATER, VEG, and FOWL omitted; all three show a dense cluster of values near zero with a few extreme high outliers.]

b) Frequency distribution for FOWL:

Fowl              Frequency   Midpoint
0 ≤ x < 100       41          50.0
100 ≤ x < 200     6           150.0
200 ≤ x < 300     3           250.0
300 ≤ x < 400     1           350.0
1400 ≤ x < 1500   1           1450.0

approximate mean = 5500/52 = 105.8; approximate variance = 50746

c) [Scatterplot of FOWL versus VEG omitted.]
Aside from the single point with extraordinary number of waterfowl, there is little relationship.



EXERCISE 3 a) INDEX: mean = -0.149, median = -0.155, variance = 1.771, IQR = 1.59 (show stem-and-leaf, histogram, or boxplot as part of the answer)

FUTURE: mean = -0.208, median = -0.3, variance = 0.601, IQR = 0.81 (show stem-and-leaf, histogram, or boxplot as part of the answer)

b) The scatterplot shows a clear trend for positive changes in the FUTURE contract value to be associated with positive changes in the NYSE INDEX. Yes, FUTURE can be used to help predict changes in the NYSE index.

[Scatterplot of INDEX versus FUTURE omitted.]


EXERCISE 4 [Stem-and-leaf plots and boxplots for Y1–Y4 omitted.]

Y1 is slightly negatively (or left) skewed; Y2 is nearly symmetrically distributed, though without a well-defined peak; Y3 is positively (or right) skewed; Y4 is extremely positively skewed.

Comparison to the empirical rule: actual percentage of observations in each interval.

Variable   Mean    Std. Dev.   Shape                ±1 SD (expect 68%)   ±2 SD (expect 95%)   ±3 SD (expect all)
Y1         4.840   2.107       slight left skew     68%                  100%                 100%
Y2         4.748   1.262       near symm.           60%                  100%                 100%
Y3         3.696   2.434       right skew           76%                  96%                  100%
Y4         2.328   1.888       extreme right skew   64%                  96%                  100%

The empirical rule works reasonably well even for the skewed distributions.



EXERCISE 5 a) DAYS: mean = 15.854, median = 17, variance = 24.324; TEMP: mean = 39.348, median = 40, variance = 11.152

b) [Scatterplot of DAYS versus TEMP omitted.]


From the scatterplot, there appears to be no definitive relationship between the average temperature and the number of rainy January days.

EXERCISE 6 a) The distribution of EXPEND is extremely right skewed with a mean ($1816M) that is much greater than the median ($1169M), so the best measure of location is the median and the best measure of dispersion is the IQR ($1938M). There are three outlier states with California being an extreme outlier at $15348M. b) Per Capita expenditures are also right skewed, but slightly less so than expenditures. The median is $0.269k ($2,690) and the IQR is $0.139k. c) For Per Capita expenditures, there is one state that is an extreme outlier (Alaska) and another that is a moderate outlier (Delaware). Both these are states with small populations, which may mean that there are a lot of fixed costs in the systems.

EXERCISE 7 [Scatterplots of DFOOT vs. HCRN, DFOOT vs. HT, and HCRN vs. HT omitted.]

The strongest relationship exists between DFOOT, the diameter of the tree at one foot above ground level, and HT, the total height of the tree. (Notice from the scatterplot that it is closest to a linear relationship.) One would expect that as the base of the tree increases in diameter, the tree would increase in height as well.



EXERCISE 8 a and b) The histogram for the ages of the cases is slightly left-tailed (it would be less so except for the artificial truncation of the uppermost class). But the histogram for the ages of those who died is extremely left skewed – almost all the deaths are among older cases.

c) The death rate increases dramatically after age 50.



EXERCISE 9 a) Median was 249 and mean 292.4. The mean is larger than (to the right of) the median, indicating a distribution skewed to the right. Yes, both the stem and leaf plot and the box plot reveal the skewness of the distribution. b) The outliers 955 and 1160 may have resulted from younger patients or from patients diagnosed earlier or in whom the disease was less severe. c) 38 out of 51 patients were in remission for less than one year, so approximately 75%.

EXERCISE 10 a) For the combined group, scores are almost evenly distributed over the interval from 0 to 100.

[Histogram of the combined scores omitted.]


b) [Histograms of score for each status group omitted.]

As expected, the placement scores among those who passed the class (Status=0) tend to be higher than the scores of those who failed the class (Status=1). Surprisingly, the scores for those who withdrew (Status=2) are as high as those for those who passed. Perhaps this is because those who withdrew do so for a variety of reasons frequently unrelated to their math ability.

NOTE TO INSTRUCTOR: The differences between the groups might best be displayed with boxplots, though boxplots have a tendency to hide multiple modes.

[Side-by-side boxplots of score by status omitted.]


EXERCISE 11 a) Since the data are sequential by year, use a time series plot.

b) Prices have been increasing, with an especially sharp jump around 2007.

EXERCISE 12 a) Reliability (36.7%) and Enthusiasm (25%) are the traits most frequently thought of as most important.

[Bar chart of relative frequencies by trait omitted.]


EXERCISE 13 a) [Scatterplot of half-life of drug versus dosage in mg per kg omitted.]

The initial dosage seems to be higher for drug A, but the half-lives are not much different. b) There is not any strong relationship between half-life and dosage, either overall or within the individual drugs. c) Drug A: M = 9.21, SD = 1.14; Drug B: M = 2.67, SD = 0.44. Typical dosage is much higher for Drug A. This supports the conclusion in part (a).



EXERCISE 14 a) IQR = 119-18 = 101, upper fence = 119+1.5*101 = 270.5, upper whisker will stop at 253 and there is one outlier at 300. See boxplot below. This distribution is very right (or positively) skewed. A small number of homes linger on the market for a long time. b) The mean will be greater than the median.

EXERCISE 15. The boxplots show that typical wind speeds are higher at 4PM than at 4AM. There is not much difference in the dispersions (the IQR is slightly higher at 4AM but the range slightly higher at 4PM). Both distributions are very right skewed, with numerous outliers on the right side.



CHAPTER 2

EXERCISE 1 a)

Y      50,000      10,000      1,000       10              0
P(y)   1/150,000   5/150,000   25/150,000  1,000/150,000   148,969/150,000

b) mean = (1/150,000)(50,000) + (5/150,000)(10,000) + (25/150,000)(1,000) + (1,000/150,000)(10) + (148,969/150,000)(0) = $0.90

c) No; the expected value is only $0.90, so there would be an expected (average over many tickets) loss of $0.10 per ticket.

d) σ² = (50,000 − 0.9)²(1/150,000) + (10,000 − 0.9)²(5/150,000) + (1,000 − 0.9)²(25/150,000) + (10 − 0.9)²(1,000/150,000) + (0 − 0.9)²(148,969/150,000) = 20,166.5233, so σ = $142.01. This distribution is very skewed, so the standard deviation does not reflect the very large (but low-probability) prizes, even if you use an interval of mean ± 3 standard deviations.
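The expected value and standard deviation can be rechecked with a short script (a sketch; the prize values and ticket counts are taken from the table in part a):

```python
# Verify the lottery expected value and standard deviation (Exercise 1).
from math import sqrt

prizes = [50_000, 10_000, 1_000, 10, 0]
counts = [1, 5, 25, 1_000, 148_969]
n = sum(counts)  # 150,000 tickets in all

mean = sum(y * c for y, c in zip(prizes, counts)) / n
var = sum((y - mean) ** 2 * c for y, c in zip(prizes, counts)) / n

print(mean)       # expected winnings per ticket, $0.90
print(sqrt(var))  # standard deviation, about $142
```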

EXERCISE 2 a) 0.5, using area of rectangle with base from 0 to 1. b) using formulas for uniform distribution in section 2.4, µ = 1, Variance = 4/12 = 1/3.

EXERCISE 3 a) mean=(0*.3277)+ (1*.4096) + (2*.2048) + (3*.0512) + (4*.0064) + (5*.0003) = 0.9999 Variance= 0.7997 b) µ= np = (5)(.2) = 1

σ² = np(1-p) = (5)(.2)(.8)= .8

Yes, they agree except for small rounding error.

EXERCISE 4 The system will fail if at least one component fails (since they are in series). Hence, it fails unless all components work. Probability of failing = 1 – probability all work = 1 – (0.999)^10 = 0.00996
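A quick check of the complement calculation (assuming 10 independent components, each working with probability 0.999):

```python
# Probability a 10-component series system fails (Exercise 4):
# the system works only if every component works.
p_work = 0.999
n_components = 10

p_all_work = p_work ** n_components
p_fail = 1 - p_all_work
print(round(p_fail, 5))  # about 0.00996
```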



EXERCISE 5 Arrangement 1: probability of failure = P[(A1 fails and A2 fails) OR (B1 fails and B2 fails)] = (0.01)² + (0.01)² – (0.01)^4 = 0.0001999. Arrangement 2: probability of failure = P[(A1 fails or B1 fails) AND (A2 fails or B2 fails)] = (0.01 + 0.01 – 0.01²)² = 0.000396. Arrangement 1 is more reliable.
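A quick numeric check of both arrangements (a sketch; the components are assumed independent, each with failure probability 0.01):

```python
# Compare the two redundancy arrangements of Exercise 5.
p = 0.01  # failure probability of each component

# Arrangement 1: system fails if (A1 and A2 fail) or (B1 and B2 fail).
pair_fails = p * p
arr1 = pair_fails + pair_fails - pair_fails ** 2  # inclusion-exclusion

# Arrangement 2: system fails if (A1 or B1 fails) and (A2 or B2 fails).
stage_fails = p + p - p * p
arr2 = stage_fails ** 2

print(arr1, arr2)  # arrangement 1 has the smaller failure probability
```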

EXERCISE 6 If you use a scientific calculator rather than Table A.1, your values may differ slightly. a) 0.1587 b) 0.8413 c) 0.5 – 0.1587 = 0.3413 d) 1 – 0.9332 = 0.0668 e) 0.9808 – 0.1635 = 0.8173 f) Look for 0.05 in the column for probability; A is between 1.64 and 1.65 (or use Table A.1A to find it is 1.645, or an inverse normal function on a scientific calculator or software). g) Look for 0.025 in the column for probability; C is 1.96.

EXERCISE 7 a) P(Y > 15) = P(Z > 1) = 0.1587; on a TI-84, normalcdf(15,1E99,10,5)

b) P(8 < Y < 12) = P(-2/5 < Z < 2/5) = .6554 - .3446 = 0.3108; on a TI-84, normalcdf(8,12,10,5) = 0.3108

c) P(Z < c) = .90, so P(Z > c) = .1 and c = 1.282 = (Y – 10)/5, giving Y = 16.41. On a TI-84, invNorm(0.90,10,5) = 16.41
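If a TI-84 is not at hand, the same values come from Python's standard-library `statistics.NormalDist` (a sketch with µ = 10, σ = 5 as given in the exercise):

```python
from statistics import NormalDist

y = NormalDist(mu=10, sigma=5)

p_a = 1 - y.cdf(15)         # P(Y > 15) ≈ 0.1587
p_b = y.cdf(12) - y.cdf(8)  # P(8 < Y < 12) ≈ 0.3108
y90 = y.inv_cdf(0.90)       # 90th percentile ≈ 16.41

print(p_a, p_b, y90)
```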

EXERCISE 8 Treating grades as a continuous random variable. a) Prob (Y < 60) = P(Z < (60-76)/14) = P(Z < -1.14) = 0.1265 b) (Y – 76)/14 = 1.28, so lowest score for an A is 93.92, or 94. c) Adding points changes the mean, not the standard deviation. If only 5% are to fall at or below 60, then the new mean must be such that

Chapter 2 – Page 2


(60 - µ)/14 = -1.645, or µ = 60+1.645*14 = 83.03. Since the original mean was 76, we need to add 7 points.

EXERCISE 9 The sample proportion favoring the increase has mean = p = 0.20 and variance = p(1-p)/n = (0.2 × 0.8)/250, so std dev = 0.025298. z = (0.24 - 0.2)/0.025298 = 1.581. P(Z > 1.581) = 0.0571 (or on a TI-84, 0.0569).
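These figures are reproducible with the standard library (a sketch; using the exact normal cdf gives 0.0569, while the table value 0.0571 comes from rounding z to 1.58):

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.20, 250
se = sqrt(p0 * (1 - p0) / n)       # ≈ 0.0253
z = (0.24 - p0) / se               # ≈ 1.581
p_value = 1 - NormalDist().cdf(z)  # ≈ 0.0569
print(se, z, p_value)
```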

EXERCISE 10 a) µ = 0(0.368)+1(0.368)+2(0.184)+3(0.061)+4(0.015)+5(0.003)+6(0.001) = 1.00. Variance = (0-1)²(0.368)+(1-1)²(0.368)+ . . . +(6-1)²(0.001) = 1.004. b) The mean is correct, but the variance is off a little bit; perhaps it is rounding error.

EXERCISE 11 Let O be the number of operators and Y = number of calls arriving. If the number of calls exceeds the number of operators, callers get a busy signal. We want to choose O so that P(Y > O) = 0.05. z = 1.645 = (O - 40)/sqrt(40) → O = 50.4, so we need 50 or 51 operators.

EXERCISE 12 We set λ = np = 200 × 0.015 = 3. The probability of 0 cases is e^(-3)·3^0/0! = 0.0498.
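A one-line check of the Poisson calculation (λ = np = 3):

```python
from math import exp

lam = 200 * 0.015                # λ = np = 3
p_zero = exp(-lam) * lam ** 0 / 1  # Poisson P(Y = 0); 0! = 1
print(round(p_zero, 4))          # 0.0498
```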

EXERCISE 13 Standard error of the sample mean = 25/sqrt(100) = 2.5. z1 = (138-140)/2.5 = -0.8 and z2 = (142-140)/2.5 = 0.8, so P(138 < Y < 142) = P(-.8 < Z < .8) = .7881 - .2119 = 0.5762. On a TI-84, normalcdf(138,142,140,2.5) = 0.5762.

EXERCISE 14 We replace products with lifetime less than A, where P(X < A ) = 0.10. The z-score that corresponds is -1.28, so (A – 1000)/150 = -1.28, hence A = 808. So the manufacturer should guarantee the product for 808 days.

EXERCISE 15 P(Z < z1) = .1 and P(Z > z2) = .1, so z1 = -1.282 and z2 = 1.282.

z1 = -1.282 = (60 - µ)/σ and z2 = 1.282 = (90 - µ)/σ

Solving the two equations for the two unknowns yields: µ = 75, σ = 11.70, σ² = 136.9

EXERCISE 16 a) If treatment not effective, probability of survival still 0.3. Probability of at least 2 out of 3 using binomial is 0.216. b) Probability of at least 4 out of 6 using binomial is 0.070. c) In both cases, at least 2/3 of victims survived, but the chance of this happening in a sample of 6 if the treatment is NOT effective is much smaller, and less likely to be due to chance.

EXERCISE 17 For a normal distribution, the quartiles are about 0.67 above and below the mean, so the IQR is about 1.34. The upper fence starts at 0.67 + 1.5*(1.34) = 2.68. The probability beyond this is 0.0037. Counting the lower tail as well, there is about 0.0074 probability (0.74%) that a value will be an outlier.
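These figures follow from the standard-normal quartile z ≈ 0.6745 (the text rounds to 0.67, which is why it reports 0.74% rather than the slightly smaller exact value of about 0.70%); a sketch:

```python
from statistics import NormalDist

z = NormalDist()
q3 = z.inv_cdf(0.75)                # upper quartile ≈ 0.6745
iqr = 2 * q3                        # IQR ≈ 1.349
fence = q3 + 1.5 * iqr              # upper fence ≈ 2.698
p_outlier = 2 * (1 - z.cdf(fence))  # both tails
print(q3, fence, p_outlier)         # outlier probability about 0.70%
```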

EXERCISE 18 The upper limit for the control chart is 0.05 + 3·sqrt(0.05 × 0.95/100) = 0.115. The process was in control throughout the first shift. During the second half of the second shift an upward trend began in the fraction nonconforming. No single sample was listed as 'out of control' until the third shift, but the persistent high nonconforming rates during the second shift could be viewed as a warning.

[Control chart of proportion nonconforming versus sample number omitted.]

EXERCISE 19 The number of different tickets is 53·52·51·50·49·48/6! = 22,957,480. a) There is only one choice of numbers that will win the grand prize; probability = 1/22,957,480 = 4.356×10^-8. b) 1.233×10^-5. There are 6 choices for a number to get wrong, and 47 numbers to insert in its place. c) 0.01484
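The ticket count is the binomial coefficient C(53, 6), which `math.comb` confirms (a quick check):

```python
from math import comb

n_tickets = comb(53, 6)    # ways to choose 6 numbers out of 53
p_grand = 1 / n_tickets    # grand-prize probability
print(n_tickets, p_grand)  # 22957480, about 4.356e-08
```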

EXERCISE 20 The control limits are 4 ± 3·sqrt(0.25/5) = (3.33, 4.67). The process was initially producing windshields that were too thin, shifted to ones that were too thick, and then back to windshields that were too thin. Perhaps the production control is over-compensating in its attempt to correct the process.

[Control chart of mean thickness versus sample number omitted.]

EXERCISE 21 a) Use binomial with p = 0.05. A: 0.939, B: 0.921 b) Use binomial with p = 0.15. A: 0.322, B: 0.097 c) Plan B is more expensive but has a smaller probability of missing an increase in the true error rate.

EXERCISE 22 a) All people who have recently filed for unemployment benefits in your county. b) If the true proportion p is 10%, then the number in our sample should be binomial with n = 68 and p = 0.10. Denote the number of those with these feelings in the sample as X. If you are using a calculator or program that has the binomial cdf, then Prob(X ≥ 12) = 1 – Prob(X ≤ 11) = 1 - 0.9638 = 0.0362. c) If the true proportion in the population is 10%, it would be unusual to find 12 or more in a sample of 68. I would conclude that the proportion of unemployed people who have had feelings of sadness recently is most likely more than 10%. d) The sample proportion would be nearly normally distributed with a mean of 0.10 and a standard deviation of sqrt(0.1 × 0.9/68) = 0.0364. Using the normal distribution, the probability that the sample proportion will be less than 0.05 or more than 0.15 is 0.0848 + 0.0848 = 0.1696.



EXERCISE 23 a) 119.2 b) 0.4972 (TI84: 0.4950) c) standard error = 15/sqrt(20) = 3.354, probability = 0.3734 (TI84: 0.3711)

EXERCISE 24 a) If n = 50 and p = 0.03, then the expected number of birth defects is 50(0.03) = 1.5. Using the binomial distribution, the probability of X > 3 (note ‘more than twice’) is the same as 1 – Prob (X ≤ 3) = 1 – 0.9372 = 0.0628. b) If n = 150 and p = 0.03, then the expected number of birth defects is 150(0.03) = 4.5. Using the binomial distribution, the probability of X > 9 = 1 – prob (X ≤ 9) = 1 – 0.9845 = 0.0155. Though the decision rules are very similar, our chances of a ‘false’ spike in the observed proportion of birth defects is much smaller when we have a large sample.

EXERCISE 25 Inspecting Table A.4 with 5 and 5 df, P(F > 5) = about 5%. On TI84 calculator, get 0.051 = Fcdf(5,1E99,5,5).



CHAPTER 3 Answers may differ slightly due to rounding. EXERCISE 1 a) The most serious error would be to need the umbrella and not have it, so Ho: it will rain versus H1: it will not rain. A Type I error is when it truly will rain but you say there is evidence it will not. This is the situation where you would not have an umbrella when you needed it, and you will control the probability of this error at a low level. b) The most serious error would be to fire a good employee, so the null hypothesis is that the employee should not be fired. c) Usually we consider that there needs to be evidence for a merit raise; hence the null hypothesis is no merit and the alternative hypothesis is that the employee merits a raise. d) One must show evidence for a claim, so the null hypothesis is that the claim is not correct. e) One must show evidence for guilt, so the null hypothesis is innocent. f) The most serious problem is to have the battery run out during a test, so the null hypothesis is that the battery needs to be changed. g) Highly unpopular to restrict driving! One would need to justify it with strong evidence that it improves achievement. The null hypothesis is that restricting driving does not improve achievement.

EXERCISE 2 a) Ho: µ = 8, H1: µ ≠ 8; reject Ho if x̄ < 7.9 or x̄ > 8.1. Since the standard error is 0.15/sqrt(16) = 0.0375, the probability of rejecting Ho when the mean is truly 8 (that is, Ho true) is P(Z < -2.67 or Z > 2.67) = 2 × 0.00383 = 0.00766 = α. b) The following type of calculation must be done for several choices of µ, then the resulting β graphed against the choices of µ. When µ = 8.05, the probability of not rejecting Ho = P(7.9 ≤ x̄ ≤ 8.1) = P(-4 ≤ Z ≤ 1.33) = 0.9087 = β

EXERCISE 3 a) Do not reject Ho if -1.96 ≤ z ≤ 1.96 → 9.616 ≤ x̄ ≤ 11.184. Assuming µ = 10.0 and standard error of the mean 2/sqrt(25) = 0.4, the probability of this happening is P(-0.96 ≤ z ≤ 2.96) = β = 0.8300. On a scientific calculator, you may get 0.8299. b) Do not reject Ho if 9.3696 ≤ x̄ ≤ 11.4304; β = 0.9428 (you may get 0.9423 on a scientific calculator). c) When α = 0.05, accept Ho if 10.416 ≤ x̄ ≤ 11.984, β = 0.1492; when α = 0.01, β = 0.3372.



d) When α = 0.05, accept Ho if x̄ ≥ 10.542, β = 0.0877; when α = 0.01, β = 0.2514. e) As you reduce the chance of Type I error (reduce α), you increase the chance of Type II error (increase β). As the discrepancy between the true value of µ and the value hypothesized in the null hypothesis increases, your chance of failing to reject Ho decreases – that is, it is more likely that you will be able to 'prove' strong effects. One-tailed tests have a smaller β than two-tailed tests.

EXERCISE 4 n = 100, µ = 10, σ = 2, standard error = 0.2

a) Ho: µ = 10.4 versus H1: µ ≠ 10.4 at α = 0.05. Do not reject Ho if 10.008 ≤ x̄ ≤ 10.792; β = P(0.04 ≤ z ≤ 3.96) = 0.4840. b) At α = 0.01, do not reject Ho if 9.8848 ≤ x̄ ≤ 10.9152; β = P(-0.576 ≤ z ≤ 4.576) = 0.7190. c) Ho: µ = 11.2 versus H1: µ ≠ 11.2. For α = 5%, do not reject if 10.808 ≤ x̄ ≤ 11.592; β = P(4.04 ≤ z ≤ 7.96) ≈ 0.000. For α = 1%, do not reject Ho if 10.685 ≤ x̄ ≤ 11.715; β = P(3.4 ≤ z ≤ 8.58) ≈ 0.000. d) Ho: µ = 11.2 versus H1: µ < 11.2 at α = 0.05: do not reject Ho if x̄ > 10.871; β = P(Z > 4.36) ≈ 0.000. At α = 0.01: do not reject Ho if x̄ > 10.735; β = P(Z > 3.67) = 0.0001. e) Comparing the answers here to those of Exercise 3, we see that increasing the sample size reduces β.

EXERCISE 5 a) Ho: µ = 100 versus H1: µ ≠ 100. z = (92-100) / (10/sqrt(30)) = -4.38, reject Ho, there is significant evidence that the mean in this class differs from 100. b) P(|z| > 4.38) is approximately 0



c) Since the test shows this class has grades that are systematically less than is typical, it may be that the teaching methods need to be re-evaluated.

EXERCISE 6 a) Ho: µ = 48750 versus H1: µ ≠ 48750. Reject Ho if |z| > 1.96. z = 1.739 (p value = 0.082). There is no significant evidence that the mean income has changed. b) 51200 ± 1.645 × 12200/sqrt(75) = (48883, 53517). With confidence 90%, the true mean income is between $48,883 and $53,517. c) Accept Ho if 45988.88 ≤ x̄ ≤ 51511.12; β = 0.1453.

EXERCISE 7 a) Ho: p = .4 vs. H1: p = .6. α = probability of getting 4 red in 4 tries assuming Ho is true = (0.4)^4 = .0256. b) β = probability of not rejecting Ho assuming H1 is true = P(Y ≤ 3) = 1 – P(Y = 4) = 1 – (.6)^4 = 0.8704
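Both error probabilities follow directly from the four independent draws (a quick check):

```python
# Exercise 7: reject Ho (p = 0.4) in favor of H1 (p = 0.6)
# only if all four draws are red.
alpha = 0.4 ** 4     # P(4 reds | Ho true) = 0.0256
beta = 1 - 0.6 ** 4  # P(fewer than 4 reds | H1 true) = 0.8704
print(alpha, beta)
```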

EXERCISE 8 a) P(x̄ < 76) assuming µ = 80 = P(z < -5.56) ≈ 0.000. b) Reject Ho if x̄ < 80 - 1.645(0.72) = 78.816

EXERCISE 9 a) E = (1.96)(8)/12 = 1.31. b) 79.6 ± (1.96)(8/12) = (78.29, 80.91). With confidence 95%, the true mean reaction time is between 78.29 and 80.91. c) n = (1.96)²(8)²/(1)² = 245.8, so n = 246.
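The arithmetic here implies a sample size of n = 144 (so that sqrt(n) = 12, matching the divisor in part a); under that assumption, a quick check of the margin of error, interval, and required sample size:

```python
from math import ceil, sqrt

z, sigma, n = 1.96, 8, 144  # n = 144 is inferred from sqrt(n) = 12
E = z * sigma / sqrt(n)     # margin of error ≈ 1.31
ci = (79.6 - E, 79.6 + E)   # ≈ (78.29, 80.91)

# Sample size for a margin of error of 1:
n_needed = ceil((z * sigma / 1) ** 2)  # 245.86 rounded up → 246
print(E, ci, n_needed)
```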

EXERCISE 10 a) Ho: µ = 0.3, H1: µ > 0.3; reject Ho if z > 1.645. z = (0.35-0.3)/(0.4/sqrt(20)) = 0.56, p value = 0.288. There is no significant evidence that the drug produces a meaningful decrease. b) Use α = 1%. The lower the significance level, the stronger the required evidence.


EXERCISE 11 a) Ho: µ = 100 versus H1: µ < 100. Reject Ho if z < -1.645. z = (91.6-100)/(20/sqrt(8)) = -1.188. There is no significant evidence that the true mean is less than 100.

Rejection region shown as shaded area to the left of -1.645, which has area 0.05.

b) p value = P(z < -1.188) = 0.1174

The p value is the area to the left of -1.188, shown as the shaded area in the graph. Note it is the same shape as the classic rejection region, but using the observed -1.188 as the dividing point.

EXERCISE 12 Ho: p = 0.2 versus H1: p < 0.2, where p is the probability of exceeding the threshold. Using the binomial with n = 8 and p = .2, p value = P(Y ≤ 1) = 0.503. There is no significant evidence that the probability is less than 20%.

EXERCISE 13 Ho: µ = 12.3 versus H1: µ < 12.3 (note, however, that many statisticians prefer the two-tailed test). z = (10.9 – 12.3)/(3.5/sqrt(100)) = -4.0. The p value is .00003, much less than 0.05. There is significant evidence that the mean processing time has been reduced.



EXERCISE 14 The population is all batches of loans that could have been produced by the simulation. In this population, with confidence 95%, the mean error rate would be between 4.57 and 4.63 per batch.

EXERCISE 15 n = (1.96)²*(0.5)²/(.01)²= 9604
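This is the standard worst-case sample-size formula n = z²·p(1−p)/E² with p = 0.5; a quick check:

```python
z, E = 1.96, 0.01
p = 0.5  # worst case, maximizes p(1 - p)
n = z ** 2 * p * (1 - p) / E ** 2
print(round(n))  # 9604
```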

EXERCISE 16 a) p = true probability of birth defect in this community Ho: p = 0.03 versus H1: p > 0.03 b) p value = P(Y ≥ 3) assuming p = 0.03, = 1 – P(Y ≤ 2 ) = 0.1178 c) Since the p value is not less than α=0.05, there is no significant evidence that the birth defect rate in this community is elevated.

EXERCISE 17 a) Assuming Ho true (p = .03), the probability of rejecting Ho is P(Y ≥ 4) = 1 – P(Y ≤ 3) = 1 – 0.9686 = 0.0314, using the binomial distribution with n = 40. b) P(Y ≤ 3) assuming p = 0.10 = 0.4231

EXERCISE 18 a) µ = mean calorie consumption among teenage boys in your county Ho: µ = 2700 versus H1: µ ≠ 2700 Reject Ho if |z|>1.96. z = (2620-2700) / (450/sqrt(36))= -1.07 There is no significant evidence that the mean calorie consumption among teenage boys in this county differs from the national mean. (The p value is 0.286.) b) Reject Ho if x < 2553 or x > 2847. If µ = 2600, the probability of this happening is P(z < -0.63 or z > 3.29) = 0.2643 + 0.0005 = 0.2648.

EXERCISE 19 a) (2473, 2767) b) about 138 or 139



EXERCISE 20 a) Use a one-sided confidence interval for µ, the mean overpayment among all claims from the chain of clinics. Lower limit = 21.32 – 1.645(32.45/sqrt(100)) = $15.98. b) It is quite likely that the individual overpayments have a very positively skewed distribution (most overpayments near 0, but a few very large). The large size of the standard deviation versus the mean supports this notion. Hence, we need a large sample so that x̄ will be normally distributed.

EXERCISE 21 a) P(Y ≥ 4) using the Poisson distribution with λ = 1.8 = 0.1087. b) Binomial with n = 12 and p = 0.1087: P(Y ≥ 1) = 0.7486

EXERCISE 22 a) Ho: p = .3 versus H1: p > 0.3, Here are some probabilities, by inspection, of the binomial distribution with n = 10 and p = .3 P(Y > 4) = 1 - .8497 = 0.1503, P(Y > 5) = 1 - .9527 = 0.0473. There is no value with α exactly .10, so we would go as close as we can without exceeding 0.10. So our rejection rule is ‘There is significant evidence of effectiveness if more than 5 (or at least 6) victims survive.’ b) There is no significant evidence that the treatment is effective. c) Power = P(Y > 5) assuming p = .6, power = 0.6331



CHAPTER 4

EXERCISE 1 µ = mean weight loss among all people if put on the diet. Ho: µ = 0 versus H1: µ > 0. t = (2.4583 - 0)/[2.1339/sqrt(12)] = 3.9907. With 11 df, reject Ho if t > 1.7959. So reject Ho; yes, a mean weight loss was achieved. OR: p value = 0.0011, which is less than 0.05, so reject Ho. (The peril of a one-tailed test is that if the subjects actually gained significant amounts of weight, we would not catch it!)
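The t statistic can be rechecked from the quoted summary statistics (a sketch; the critical value and p value still need a t table or statistical software, which the standard library does not provide):

```python
from math import sqrt

xbar, s, n = 2.4583, 2.1339, 12  # summary statistics from the exercise
mu0 = 0
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 3))  # about 3.991
```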

EXERCISE 2 x̄ = 2.458, s = 2.134. µ is the mean weight loss of all people if put on this diet program. Ho: µ = 1 versus H1: µ > 1. Reject Ho if t > 2.718 using 11 df. t = 2.367 (p value = 0.0187). At α = 1%, there is no significant evidence that the mean weight loss exceeds 1 pound.

EXERCISE 3 Without assuming the true mean is 0, SS = 0.9217 calculated about the sample mean, and we have 14 df. Ho: σ² = 0.01 versus H1: σ² < 0.01 (range/4 = 0.1, so we want std dev < 0.1). χ² = 0.9217/0.01 = 92.17. Reject Ho if χ² < 6.571. Do not reject Ho. There is no significant evidence that the watches have the claimed accuracy. Note that it is also reasonable to take the SS calculated about the claimed value of 0, in which case there are 15 df.

EXERCISE 4 x̄ = 1.87, s = 1.719. With confidence 95%, the mean value among all claims is between 1.066 and 2.675 (in thousands of dollars). Since the distribution is quite skewed and the sample size small, this interval may not have the claimed confidence level. This interval is extremely wide, so it is not very helpful in establishing the true value of the population mean.



EXERCISE 5 a) Ho: p = 0.2, H1: p < 0.2. np₀ = 6, n(1 − p₀) = 24, so yes. b) z = (0.167 - 0.2)/sqrt(0.2 × 0.8/30) = -0.46, p value = P(z < -0.46) = 0.324. No significant evidence the mill meets the requirement. c) z = -2.28, p value = 0.011. Now there is evidence the mill meets the requirement.

EXERCISE 6 Ho: µ = 500 versus H1: µ < 500. ȳ = 452, s = 56.471, t = -4.66, p value < 0.0001. There is significant evidence that the mean is less than 500.

EXERCISE 7 With 29 df, the critical values from the chi-squared distribution are 16.0471 and 45.722 (using the Microsoft Excel function CHISQ.INV). The confidence interval for the variance is (29)(56.471²)/45.722 ≤ σ² ≤ (29)(56.471²)/16.0471. To get the confidence interval for the standard deviation, take the square roots of the limits, giving that the standard deviation is between 44.97 and 75.91, with confidence 95%.

EXERCISE 8 Ho: µ = 129 versus H1: µ > 129 We have selected a one-tailed alternative in order to detect high blood pressures. Reject Ho if t > 2.7181. t = (133.0-129)/[13.941/sqrt(12)] = 0.994 (p value 0.1708) There is no significant evidence that the mean blood pressure in this community is higher than normal.

EXERCISE 9 p = proportion of all voters who prefer candidate X. Ho: p = 0.50 versus H1: p > 0.50. z = 0.65328; reject Ho if z > 1.645. Do not reject Ho; she should not reduce campaign funds. Asymptotic requirement fulfilled: 150(0.5) = 150(1 - 0.5) = 75.

EXERCISE 10 p̂ = 79/150 = 0.5267

With confidence 95%, the true proportion of voters favoring candidate X is between 0.4468 and 0.6066.


This data set is sufficiently large for the normal approximation: np̂ = 79 ≥ 5 and n(1 − p̂) = 71 ≥ 5.

EXERCISE 11 µ = mean weight of all healthy newborns in this neighborhood Ho: µ=7.5 versus H1: µ < 7.5

t = -1.079

Reject Ho if t < -2.8214, so do not reject Ho. We cannot conclude that the babies from this neighborhood are, on average, underweight.

EXERCISE 12 x̄ = 7.080, s = 1.231. With confidence 99%, the mean weight of all newborns in this neighborhood is between 5.815 and 8.345 pounds.

EXERCISE 13 We need the standard deviation to be less than 0.2/2.576 = 0.07764. Ho: σ = 0.07764 versus H1: σ < 0.07764; reject Ho if χ² < 10.117 (lower tail with 19 df). SS = 0.1445, χ² = 23.97. Do not reject Ho. There is no evidence that the units satisfy the regulation.

EXERCISE 14 s² = 0.0076, SS = 0.1445. The upper limit is 0.1445/8.907 = 0.0162 and the lower limit is 0.1445/32.852 = 0.0044. With confidence 95%, the variance in the weights of all units is between 0.0044 and 0.0162.

EXERCISE 15 Management will routinely sample washers and calculate the mean and standard deviation. If the null hypothesis that the standard deviation is at most 0.002 is rejected, then they will have to stop production and fix the machine. The choice of significance level depends on how difficult or costly the adjustment is. If it is a quick and cheap adjustment, then they may accept a high chance of a Type I error (unnecessary adjustments) in order to keep quality high. This would equate to a high choice for α. But if the adjustment is very difficult or costly, then they would want to avoid Type I errors by setting α low. A low α would mean more Type II errors, where the machine really does need adjustment but they do not take any action.

EXERCISE 16 Ho: σ = 0.002 versus H1: σ > 0.002

χ² = [24 × (0.0037)²]/(0.002)² = 82.14


Using 24 df and a TI-84 calculator or Microsoft Excel, the p value is < 0.0001; this would be significant even at a very small α. So the machine should be adjusted.

EXERCISE 17 The normality assumption is violated: the distribution is very right (positively) skewed, so the assumptions are not valid. Try a transformation such as the logarithm of claim.

EXERCISE 18 [Stem-and-leaf plot and boxplot omitted.]

The distribution appears reasonably symmetric, but perhaps slightly long tailed (two moderate outliers). If the sample were larger, we might base our test on the proportion of observations more than 0.2 from the 10.0 target value.

EXERCISE 19 1.87546 ± (2.0796)(0.6299/sqrt(22)) = 1.87546 ± (2.0796)(0.134299) (1.596, 2.155) With confidence 95%, the true mean half-life of this drug is between 1.596 and 2.155.



EXERCISE 20 SS = 8.33275. Using the Microsoft Excel function CHISQ.INV to find the 5th and 95th percentiles with 21 df, we get 11.591 and 32.671. The lower limit is 8.33275/32.671 and the upper limit is 8.33275/11.591. With confidence 90%, the variance of the half-life is between 0.2551 and 0.7189.

EXERCISE 21 p = true proportion of customers who also drink another brand. Ho: p = 0.10 versus H1: p > 0.10 (H1 contradicts the claim). Reject Ho if z > 1.645. z = 2.667. Reject Ho; there is evidence that the proportion is more than 10%. It is OK to use the z-test here because np₀ = 100 × 0.10 = 10 and n(1 − p₀) = 90, both ≥ 5.

EXERCISE 22 Ho: p = 0.1 versus H1: p > 0.10 Using α = 5%, we will reject the null hypothesis if the sample proportion is more than 0.1+1.645*sqrt(.1*.9/100) = 0.1506. That is, we will reject the null hypothesis if X ≥15, where X is the number in the sample of 100 who drink another brand. Using the binomial distribution to calculate power = P(X ≥ 15) with n = 100 and p = several choices at or above 0.10 gives:

Note that this critical region is only approximately of significance level 5%. The actual probability that we will reject Ho if p = 0.1 is 7.3% rather than 5%. This is because we used the normal approximation to set the critical region. The binomial is a discrete distribution and there is no rejection region with probability exactly 5%.
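The exact binomial calculation described in Exercise 22 can be sketched as follows; the cutoff X ≥ 15 and n = 100 come from the solution:

```python
# Power of the test: P(X >= 15) for X ~ Binomial(100, p), exact calculation
from math import comb

def power(p, n=100, cutoff=15):
    # sum the upper tail of the binomial pmf
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(cutoff, n + 1))

# at p = 0.10 this is the actual significance level (about 7.3%, not 5%)
print(round(power(0.10), 3))
```

Evaluating `power` at p = 0.15, 0.20, … reproduces the power column described in the solution.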



EXERCISE 23 Ho: σ² = 0.01 versus H1: σ² < 0.01. However, since the mean difference should be 0, we need to take the sums of squares around 0 (the uncorrected sums of squares). Hence, the χ² should be based on 15 degrees of freedom. SS = 0.5032, χ² = 50.32. Reject Ho if χ² < 7.261 if we use α = 5%. The device does not meet the standard for accuracy.

EXERCISE 24 a) The 6.285 is an extreme outlier. There are three other values that are moderate outliers: 5.52 on the high side and 5.115 and 5.125 on the low side. [Boxplot of PH, scale 5.00 to 6.50, omitted.]

b) The extreme positive outlier causes the distribution to be skewed. Hence, medians would be a better descriptor of typical pH.

c) M = population median. Ho: M ≥ 5.40 versus H1: M < 5.40. The number of observations strictly larger than 5.40 is 4. If the median were 5.4, the probability an observation would exceed 5.4 would be 0.50. Using the binomial distribution, the probability of 4 or fewer successes in 20 tries when p = 0.50 is 0.0059. This is the p value. There is significant evidence that the median is less than 5.40. Note that if this had been a two-tailed alternative, we would have doubled this to get a p value of 0.0118. This data set is large enough to use a z test for the proportion, Ho: p = 0.5, z = -2.68, p value = 0.0036. The answers differ slightly because the z test for a proportion becomes more accurate when samples are really large.

EXERCISE 25 p̂ = 22/44 = 0.50, and 0.50 ± 1.645·sqrt(0.5(1 − 0.5)/44) = (0.376, 0.624)

With confidence 90%, the percentage of all Kintyre residents who would rate windfarms this way is between 37.6% and 62.4%.
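The interval in Exercise 25 follows the usual large-sample formula p̂ ± z·sqrt(p̂(1 − p̂)/n); a quick check:

```python
# 90% confidence interval for a proportion (22 of 44 respondents)
from math import sqrt

p_hat = 22 / 44
n = 44
z = 1.645  # z value for 90% confidence

margin = z * sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat - margin, 3), round(p_hat + margin, 3))  # 0.376 0.624
```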

EXERCISE 26 Ho: µ = 90 versus H1: µ > 90. Reject Ho if t > 1.383.

a) t = -2.68, p value 0.987. Do not reject Ho. There is no significant evidence the mean sound level exceeds 90 dBa.

b) t = 2.19, p value 0.028. Reject Ho. There is significant evidence that the mean sound level exceeds 90 dBa. (Pay attention to negative signs!)

c) Given the potential for hearing damage, we want to require protection if there is even modest evidence the sound level exceeds the threshold.

EXERCISE 27 H0: p = 0.10 versus H1: p > 0.10. Since np0 = 70 × 0.1 = 7 ≥ 5 and n(1 − p0) = 70 × 0.9 = 63 ≥ 5, we can use the z test: z = 1.195, p value = 0.116. There is no significant evidence that the proportion of times the noise is greater than the threshold exceeds 10%. The company does not need to begin requiring ear protection.

EXERCISE 28 a) p = proportion defective among all parts from this vendor. Ho: p = 0.05 versus H1: p > 0.05.

b) npo = 20 × 0.05 = 1, which is less than 5. Hence, we cannot use the z test for proportions.

c) Using the binomial with n = 20, the probability of 4 or more defects in 20 parts when p = 0.05 is 0.0159.

d) Since the p value is less than α = 0.05, there is significant evidence that the proportion of defectives exceeds 5%.

EXERCISE 29 a) χ² = 2.18 with 11 df. Critical value from Table A.3 = 3.053. There is evidence the variance is less than 0.25.

b) The boxplot appears symmetric, so normality is reasonable. In the normal probability plot, the points cluster along a straight line, so normality is reasonable.

EXERCISE 30 p = probability a survey respondent will be Hispanic. Ho: p = 0.59 versus H1: p ≠ 0.59. We can use the z test for proportions because 454 × 0.59 = 268 and 454 × (1 − 0.59) = 186, which are both at least 5. Reject Ho if z < -2.576 or z > 2.576.

z = (0.36 − 0.59) / sqrt(0.59 × 0.41 / 454) = −9.96

The proportion of Hispanics in the survey differs from the census figure by more than can be attributed to chance. It is apparently much lower than it should be.

EXERCISE 31 a) p = probability of improving. Ho: p = 0.5 versus H1: p ≠ 0.5. z = 2.49, p value = 0.0127. At α = 5%, there is significant evidence that the proportion improving differs from chance.

b) Using the binomial distribution with n = 9 and p = 0.5, the probability of 6 or more out of 9 improving is 0.2539. The two-tailed p value is 0.5078. There is no significant evidence that the proportion improving in Putnam differs from that due to chance.



CHAPTER 5 EXERCISE 1

Ho: µA = µB versus H1: µA ≠ µB.

Using the pooled t test, there are 23 df, so reject Ho if |t| > 2.0687. Sp² = 77.885; t = 4.321/3.5329 = 1.223 with 23 df, p value = 0.2337. Do not reject Ho. There is no significant evidence that there is a difference between methods. Alternatively, the unequal variance test has t' = 1.219 with 22.5 df, p value = 0.235. Same conclusion.

EXERCISE 2 Using Class B – Class A, the confidence interval is 4.321 ± 2.0687·sqrt(77.885(1/12 + 1/13)) = 4.321 ± 7.309

With confidence 95%, the difference in the mean scores when taught by the two methods is between -2.99 and 11.63. This answer uses the pooled version of the test statistic. If you used the unequal variance version on a scientific calculator or statistical software you would get -3.02 to 11.66. However, if you did the unequal variance version and guessed the df as 12-1 = 11, then you would get a wider interval.
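The pooled-variance interval in Exercise 2 can be checked with a short calculation; a sketch using the summary numbers from the solution:

```python
# Pooled-variance CI for a difference of means
from math import sqrt

diff = 4.321       # mean(Class B) - mean(Class A)
sp2 = 77.885       # pooled variance
n1, n2 = 12, 13
t_crit = 2.0687    # t(0.025) with 23 df

margin = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(diff - margin, 2), round(diff + margin, 2))  # -2.99 11.63
```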

EXERCISE 3

Ho: µA = µB versus H1: µA ≠ µB.

Pooled t test: There are 14 df. Reject Ho if |t| > 2.1448. Sp² = 26.2201/14 = 1.873, t = 1.479, p value 0.1616. Unequal variance t test: t' = 1.478, df = 14, p value 0.1616. Do not reject Ho. There is no significant evidence the mean pollution indexes differ for the two areas.

EXERCISE 4 It is now a paired t test. By using pairing, we can eliminate day-to-day variability in the indices and focus just on the difference in the areas. Taking differences as Area B – Area A, Ho: µd = 0 versus H1: µd ≠ 0. d̄ = -1.011, sd = 0.1974, t = -14.49 with 7 df.

Reject Ho if |t| > 2.3646. There is significant evidence the areas differ in their mean index.

EXERCISE 5

Ho: µNew = µReg versus H1: µNew > µReg, i.e., µNew − µReg > 0

Pooled t test has 16 df. Reject Ho if t > 1.7459. Sp² = 1578.393;

t = 3.136, p value 0.0032


Unequal variance t test has 13.8 df, t' = 3.08, p value 0.0042. Reject Ho. The new diet does seem to result in higher mean weights. (The pooled t test is reasonable here; the F test comparing variances had p value 0.18.)

EXERCISE 6 Ho: µNew − µReg = 25 versus H1: µNew − µReg > 25

ȳold = 845.5, ȳnew = 904.6, snew = 43.28, sold = 36.73. Pooled t test has 16 df. Reject Ho if t > 1.7459. Sp² = 1578.393;

t = 1.81, p value = 0.0446

Unequal variance t test has 13.8 df, t’ = 1.775, p value = 0.0488 Reject Ho. There is evidence the new diet increases mean weight by more than 25 pounds.

EXERCISE 7

p1, p2 are the true probabilities that the two machines will produce a defective.

Ho: p1 = p2 versus H1: p1 ≠ p2. p̂1 = 0.0786, p̂2 = 0.05, p̄ = 0.0618, so z = 1.077, p value 0.281.

There is no significant difference in the reliability of the two machines at any reasonable significance level. The asymptotic requirement is satisfied because in the smallest sample, np̄ = 8.65 ≥ 5 and n(1 − p̄) = 131.3 ≥ 5.

EXERCISE 8 Paired t test; differences are mpg with device – mpg without device. Ho: µD = 0 versus H1: µD > 0. Assume α = 5%; reject Ho if t > 1.7959. d̄ = 0.7667, sd = 1.3647, t = 1.95, p value = 0.0388

At α=5%, there is significant evidence that the device improves mean mileage. [If the device is expensive, it is worth noting that there is NOT evidence if α = 2.5%.]

EXERCISE 9

Ho: (new) = (standard) versus H1: (new) < (standard),

or H1: (standard)/(new) > 1 2 2 F= sstan / snew = (3.9049)²/(3.1862)² = 1.502 with 15, 15 df.

P value = 0.2200, or Reject Ho if F > 2.40. Do not reject Ho. No significant evidence that the methods differ in their consistency.



EXERCISE 10 a) This is a paired t test. If d = after − before, then Ho: µd = 0 versus H1: µd > 0. d̄ = 2.167, sd = 1.722, t = 3.08, p value = 0.0137.

There is significant evidence that the participants feel the seminar improved their knowledge.

b) However, it is very possible that only the participants who had the most positive attitude about the seminar were the ones returning the follow-up survey. That is, those who did not feel it was helpful are not represented in the sample. This would cause the sample to appear better than the group as a whole.

EXERCISE 11 Ho: µ1 = µ2 versus H1: µ1 ≠ µ2. If α = 5% and with 8 df, reject Ho if |t| > 2.306. Sp² = 3755.227

t = -1.1798, p value 0.2720. Do not reject Ho. No significant difference in mean level of hydrocarbons. An F test for the null hypothesis of equal variances had F = 2.21 and p value 0.41, so the equal variance assumption is reasonable.

EXERCISE 12 a) Independent sample t test. Ho: µped can = µobst versus H1: µped can ≠ µobst. Unequal variance t' = 1.887, df = 11.4, p value = 0.0848. Pooled t = 1.860, df = 13, p value = 0.0857. There is no significant evidence the groups differ in their mean burnout value.

b) F test. Ho: σ²ped can/σ²obst = 1 versus H1: σ²ped can/σ²obst ≠ 1. F = 1.148, p value = 0.820. There is no significant evidence that the variances differ.

c) Samples are very small, and the F test is particularly sensitive to the normality assumption.

EXERCISE 13 Ho: p1 = p2 versus H1: p1 ≠ p2, where p is the probability of improving. z = -1.27, p value 0.2038. There is no significant evidence that the counties differ in their probability of school improvement. Asymptotic requirement satisfied: 89 × 0.261 = 23.3 ≥ 5, 89 × (1 − 0.261) = 65.7 ≥ 5.

EXERCISE 14 Paired t test. d = treated − untreated. If the treatment works, expect d to be positive. Ho: µd = 0 versus H1: µd > 0.

d̄ = 1.425, sd = 1.722, t = 2.341, p value = 0.0259.

There is significant evidence (at α = 5%) that the treatment improves the mean number of surviving fish. (Note, however, that if we had a two-tailed alternative, there would not be significant evidence at 5%!)

EXERCISE 15 Ho: µa = µg versus H1: µa ≠ µg. The F test shows moderate evidence that the variances differ (F(21,20) = 2.51, p value = 0.0441). Use the unequal variance version of the t test. Df are at least 20, at which we would reject if |t| > 1.7247, but may be as much as 41, at which we would reject if |t| > 1.6839. t = -0.89 with 35.7 df. Scientific calculators, Excel, or stat software will give a p value of 0.3814.

There is no significant evidence that the drugs differ in the mean of their half-lives.

EXERCISE 16 (Refers to data for Exercise 15)

The distribution of half-lives for Amikacin ("A") is nearly symmetric, while that for Gentamicin ("G") has a very slight positive skew. Fortunately, the t test is robust against mild skew.

EXERCISE 17 a) p1 = probability a manager in industry 1 will rate reliability as most important. Ho: p1 = p2 versus H1: p1 ≠ p2. Reject Ho if |z| > 1.96. p̄ = (44 + 60)/(120 + 150) = 0.385

z = -0.0333/0.0596 = -0.56

Do not reject Ho; the proportions rating reliability as most important are not significantly different for the two industries. Or: p value = 2·P(|z| > 0.56) = 0.575, no evidence the proportions differ. Asymptotic requirement fulfilled: 120 × 0.385 = 46.2, 120 × (1 − 0.385) = 73.8.

b) The difference in sample proportions is normal with mean 0.1, and the true standard deviation will be sqrt(0.3 × 0.7/120 + 0.4 × 0.6/150) = 0.0579. Following the instructions in the problem, assume the estimated standard error is very close to this.

Power = P(z ≤ −1.96 or z ≥ 1.96) = P(|p̂1 − p̂2| ≥ 0.1134) = 0.0001 + 0.4085 = 0.4086

EXERCISE 18 a) Paired t test (final − initial): t = 3.06, one-tailed p value = 0.0189. There is significant evidence of an increase in mean BUN in the control group.

b) Paired t test (final − initial): t = 0.56, one-tailed p value = 0.3036. There is no significant evidence of an increase in mean BUN in the intervention group.

c) Independent samples t test comparing the mean change in the two groups. The variances do not differ significantly. Equal variance t = 2.49 with 8 df, p value = 0.0377. There is significant evidence that the groups differ in their mean change. The mean change was apparently less in the intervention group.

EXERCISE 19 a) p = probability of requiring ICU or death. Ho: pHCQ = pnoHCQ. z = -0.399, p value = 0.69. There is no significant evidence that patients on HCQ differ in their probability of a worsening condition. Asymptotic requirement fulfilled: 84 × 0.215 = 18, 84 × (1 − 0.215) = 66.

b) With confidence 95%, the probability of developing a heart problem is between 3.2% and 15.8%. Asymptotic requirement fulfilled: y = 8, n − y = 76.

c) No clear benefit (as shown in part a) and a potential side effect (part b), so this data does not support use of the drug.

EXERCISE 20 p = probability of making the choice that detracts from the imbalance Ho: ppeople = pcontrol versus Ho: ppeople ≠ pcontrol z = -2.73 with p value 0.0063 There is significant evidence the groups differ in their probability of making the choice that detracts from the imbalance.

EXERCISE 21 p = probability of spending more time looking at the new goal events.

Ho: preach = pfirst versus H1: preach ≠ pfirst. z = 2.556, p value = 0.0106. At α = 5%, there is significant evidence that the proportions differ by more than would be expected by chance, but not at α = 1%. Check for sufficient sample size: p̄ = 0.50, np̄ = n(1 − p̄) = 7.5 ≥ 5 in each group.

EXERCISE 22 Independent sample t tests (unequal variance version) are used to compare the means in each group. Exhaustion: t' = -8.06, p value ≈ 0. Cynicism: t' = -2.30, p value = 0.0216. Efficacy: t' = 0.39, p value = 0.6966. This is consistent with the authors' statement. Police were significantly different (apparently lower) for both exhaustion and cynicism. The difference does seem to be greater for the exhaustion scale.

EXERCISE 23 McNemar's test. There are 36 discordant pairs, split 11 to 25. Testing Ho: p = 0.5 using the z test for proportions, z = -2.33, p value = 0.0196. There is significant evidence of a difference, with the wife apparently more likely to favor background checks than the husband. Asymptotic requirement: 36 × 0.5 = 18 ≥ 5.
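The McNemar calculation in Exercise 23 reduces to a one-sample z test on the discordant pairs; a sketch using the 11-to-25 split from the solution:

```python
# McNemar's test as a z test for a proportion on the discordant pairs
from math import sqrt, erf

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

n_disc = 36           # discordant pairs
p_hat = 11 / n_disc   # share in one discordant cell

z = (p_hat - 0.5) / sqrt(0.5 * 0.5 / n_disc)
p_value = 2 * norm_cdf(-abs(z))
print(round(z, 2), round(p_value, 4))  # -2.33 0.0196
```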



CHAPTER 6 EXERCISE 1 There are 40 observations and T = 5 groups, so dfB = 5 − 1 = 4 and dfW = 40 − 5 = 35.

Source    Df   SS    MS      F
Between    4    25   6.25    1.411
Within    35   155   4.429
Total     39   180

F(4,35)=1.411, p value = 0.2506. There is no significant evidence that the mean number of words memorized differs for any group.

EXERCISE 2 Since there are 4 groups, dfB = 4 − 1 = 3 and dfW = 60 − 4 = 56.

Source    Df   SS     MS      F
Between    3    155   51.67   3.23
Within    56    896   16.0
Total     59   1051

F(3,56) = 3.23 with p value = 0.029 (using a TI-84 calculator or Excel). There is significant evidence that at least one type of insulation gives a different mean change in energy consumption.

EXERCISE 3 TSS = 29 × (3.424)² = 339.99, SSW = 9 × (2.5² + 3² + 3²) = 218.25

Source    Df   SS       MS      F
Between    2   121.74   60.87   7.53
Within    27   218.25    8.08
Total     29   339.99

F(2,27)=7.53, p value = 0.0025. There is significant evidence that the mean number of words memorized differs for at least one group.
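The table in Exercise 3 can be built entirely from the summary statistics; a sketch assuming 3 groups of 10 with overall SD 3.424 and group SDs 2.5, 3, and 3, as in the solution:

```python
# One-way ANOVA table from summary statistics
n_per, groups = 10, 3
tss = (n_per * groups - 1) * 3.424**2        # total SS from the overall SD
ssw = (n_per - 1) * (2.5**2 + 3**2 + 3**2)   # within SS from the group SDs
ssb = tss - ssw                              # between SS by subtraction

f = (ssb / (groups - 1)) / (ssw / (n_per * groups - groups))
print(round(ssb, 2), round(f, 2))  # 121.74 7.53
```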

EXERCISE 4

SSW = 5(4.2² + 5.3² + 5.0² + 4.9²) = 473.7, overall mean = 23.55, SSB = 6[(15.5 − 23.55)² + … + (31.8 − 23.55)²] = 6 × 135.53 = 813.18

Source    Df   SS       MS       F
Between    3   813.18   271.06   11.44
Within    20   473.7     23.685
Total     23



p value = 0.0001. There is significant evidence that at least one type of insulation has a different mean change in energy consumption.

EXERCISE 5 Using Tukey's procedure with 3 groups and a family-wise significance level of 5% gives q between 3.52 (25 df) and 3.49 (30 df). We will use 3.52 for the calculations here. Declare two groups different if |ȳa − ȳb| > (3.52/√2)·sqrt(8.08(2/10)) = 2.489 × 1.271 = 3.164. Examining the table of means, we see that Group 2 is higher than 1 and 3. Groups 1 and 3 do not differ significantly.

Group   1       3       2
Mean    8.5 a   9.0 a   13.0 b
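The Tukey cutoff in Exercise 5 can be checked directly; a sketch using q = 3.52, MSW = 8.08, and 10 observations per group:

```python
# Tukey HSD cutoff: (q / sqrt(2)) * sqrt(MSW * (1/n + 1/n))
from math import sqrt

q = 3.52     # studentized range value (3 groups, ~25-30 df)
msw = 8.08   # mean square within
n = 10       # observations per group

hsd = (q / sqrt(2)) * sqrt(msw * (1 / n + 1 / n))
print(round(hsd, 3))  # 3.164
```

Any pair of group means farther apart than this cutoff (here, Group 2 versus Groups 1 and 3) is declared significantly different.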

EXERCISE 6 The researchers should use Dunnett's test, since they are comparing to a control. Reject Ho if |t| > 2.54, using Table A.5. All t statistics have denominator sqrt(23.685(1/6 + 1/6)) = 2.81.

Insul. 2 vs 1: t = (22.3 − 15.5)/2.81 = 2.42
Insul. 3 vs 1: t = (24.6 − 15.5)/2.81 = 3.24
Insul. 4 vs 1: t = (31.8 − 15.5)/2.81 = 5.80

Insulations 3 and 4 are significantly better than the standard insulation #1.

EXERCISE 7

a) Ho: mean length of stay is the same in all groups; H1: not all means are equal.

F = 3.5791667/0.2335526 = 15.3249 with 3 and 76 df. At α = 0.05 the critical value is 2.73, so reject Ho (p value < 0.0001). There is significant evidence that the mean length of stay is different for at least one group.

b) Since the variable is discrete, it is unlikely to follow a normal distribution. The moderate sample size (20 per group) may make up for this if there are no extreme outliers.

c) The normal probability plot of residuals and boxplots indicate a serious failure of the normality assumption. The Levene test shows unequal variances (p value = 0.0276).



[Plot: residuals versus normal quantiles.]

d) Tukey's HSD indicates groups 1 and 2 have shorter mean stays than groups 3 and 4. However, patients may not be suitable to give treatments 1 and 2.

EXERCISE 8 a) Ho: all the colors have the same mean time (there is no effect due to door color). F = 20.01 with 2 and 12 df, p value = 0.0002. There is significant evidence of an effect due to door color.

b) All pairwise comparisons should be made, so it is appropriate to use Tukey's HSD test. Green has a significantly different (apparently higher) mean than either Red or Black. Red and Black are not significantly different from each other.

c) This could be interpreted as meaning that we want to test the contrast Ho: µred – 2µgreen + µblack = 0, which has F = 38.64 (t = 6.22). If the contrast was decided after-the-fact, its critical value should be determined using Scheffe's method (critical value for F = 2 × 3.89 = 7.78, for t = 2.79). There is significant evidence that the mean for green differs from the combined mean of the other two colors.

EXERCISE 9

a) Ho: variance among suppliers is 0; H1: variance among suppliers is not 0.

Note that supplier is a random effect; however, the test statistic is computed the same way as for the usual one-way ANOVA. F = 19.04 with 3 and 12 df, p value < 0.0001. There is significant evidence that the variance in mean tensile strength among suppliers is greater than 0.

b) est(σ²) = 139.65, est(σs²) = (2659.4 – 139.65)/4 = 629.94



EXERCISE 10 a) Ho: mean compression resistance is the same for all percent sand. F = 14.87 with 4 and 20 df, p value < 0.0001. There is significant evidence mean compression resistance is different for at least one sand percentage.

b) Since the researchers wish to explore ideas, they need to use Scheffe's test. A contrast will be significant if F ≥ (5 − 1)(2.87) = 11.48 (equivalently, |t| ≥ 3.39), where 2.87 comes from the table of the F distribution using α = 5% and 4 and 20 df. For Medium versus Low, F = 33.91. For Medium versus High, F = 25.10. Medium sand groups appear to do better than either very low or very high sand groups.

EXERCISE 11 a) F = 53.55 with 4 and 20 df, p value < 0.0001. There is significant evidence that at least one group has a different mean, so there is evidence that one or more of these instructors should be rewarded. Tukey's HSD shows that instructors 5 and 3 are significantly higher than all other instructors, and 2 is lower than all other instructors.

Instrct   2        1         4         3        5
Mean      7.82 c   10.68 b   12.12 b   14.3 a   14.58 a

b) CONTROL: L1 = µ1 + µ3 + µ4 + µ5 − 4µ2. F = 104.04/0.7266 = 143.19, p value < 0.0001. There is significant evidence that the average of the means with some type of additive differs from the mean with no additive.

MFG: L2 = µ1 + µ3 − µ4 − µ5. F = 5.09, p value = 0.0354. There is significant evidence that the average of the means for manufacturer I differs from that for manufacturer II.

ADD: L3 = µ1 + µ4 − µ3 − µ5. F = 63.59, p < 0.0001. There is significant evidence that the average of the means for Additive type A differs from that for Additive type B.

EXERCISE 12 Using the pooled version of the independent sample t test in Exercise 3 of Chapter 5, t = 1.48 with 14 df, and the p value was 0.1616.



Using an ANOVA, F = 2.18 with 1 and 14 df, and the p value was 0.1616. The value of t² = 1.48² = 2.19 ≈ F, with the difference attributable to round-off. The p values are identical.

EXERCISE 13 Ho: all treatments have the same mean hours of sleep; H1: not all treatments have the same mean hours of sleep. MSB = 16.582, MSE = 1.1024, F = 15.0413 with 2 and 15 df, p value = 0.0003.

There is significant evidence that at least one treatment has a different mean number of hours of sleep. Since one group is a placebo, it is reasonable to compare both drugs to the placebo (and not make the comparison between the two drugs). Using Dunnett's test, both the standard drug and the experimental drug showed significantly greater mean sleep than the placebo. Dunnett's test will not compare the standard and experimental drug; for that you need to use Tukey's test throughout for all comparisons. Tukey's shows that the two drugs do better than the placebo but don't differ significantly from each other.

EXERCISE 14
Source    SS           df   MS      F
Between    452.6853     3   150.9   1.69
Within    1784.0020    20    89.2

Reject Ho if F > 3.10. There is no significant evidence that the paints differ in their mean time to peel.

EXERCISE 15 MSB = 162.93, MSW = 9.058, F = 17.99, p value < 0.0001.

There is significant evidence that at least one insecticide has a different mean number of dead. Using Tukey's HSD, insecticides D and C have a different (apparently higher) mean number of dead than A and B. Boxplots of the data show that the groups with the small means tend to have much larger variability. To stabilize the variances, the arcsine(sqrt(proportion)) transformation was applied. The variances are more similar, though one group still shows an outlier. F = 23.48; again there is significant evidence that at least one insecticide has a different median number of dead. Using Tukey's HSD, insecticide D now appears to have a significantly higher median number of dead than C, which is in turn greater than B or A. Insecticides B and A do not differ significantly from each other.

EXERCISE 16 There are 30 – 5 = 25 dfW.

a) Ordinary p value from the t distribution (2-tailed) = 2 × (0.0255) = 0.0510. On a TI-84 calculator, the right tail is tcdf(2.05,1E99,25).

b) Adjusted p value using Bonferroni's method = 4 × 0.0510 = 0.204.

c) Scheffe's method: F* = (2.05)²/(5 − 1) = 1.051. The p value is the area to the right of 1.051 for the F distribution with 4 and 25 df: p value = 0.4011.

d) Tukey's method (not available in Excel or most scientific calculators). Recall that the scale for the studentized range statistic is for |t|·√2 = 2.05·√2 = 2.899. Using Probmc('RANGE',2.899,.,25,5) in SAS, the right-tail probability is 1 − 0.7274 = 0.2726.

If you tried to use Bonferroni for all pairwise comparisons, there are 5 × 4/2 = 10, so the p value would be at most 10 × 0.0510, which is an extremely conservative estimate.
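The Bonferroni and Scheffe adjustments in parts (b) and (c) are simple arithmetic; a sketch using the raw p value and t statistic from the exercise:

```python
# Multiple-comparison adjustments for t = 2.05, raw two-tailed p = 0.0510
raw_p = 0.0510
t = 2.05
groups = 5

bonferroni_p = min(1.0, 4 * raw_p)   # 4 planned comparisons
scheffe_f = t**2 / (groups - 1)      # compare to F with 4 and 25 df
print(round(bonferroni_p, 3), round(scheffe_f, 3))  # 0.204 1.051
```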

EXERCISE 17 Cadmium: MSB = 0.192845, MSW = 0.0033247, F = 58, p value < 0.0001.

There is significant evidence that at least one soil type has a different mean absorption of cadmium. Using Tukey's HSD, soil type D seems to have the lowest mean absorption, followed by C and E (C and E do not differ significantly from each other). Levene's test shows no significant evidence that the variances differ (F = 2.02, p value = 0.1227). The Q-Q plot of the residuals shows one or two points below the line at the upper end, indicating some mild departures from normality. These are not outliers, but rather evidence of a 'short tail' at the upper end.

Lead (PB): MSB = 27.3667, MSW = 1.98, F = 13.82, p value < 0.0001.

There is significant evidence that at least one soil type has a different mean absorption of lead. Using Tukey's HSD, soil types B, C, and E do not differ significantly from each other, but are lower than A and D. Levene's test shows significant evidence that the variances differ (F = 3.31 with p value = 0.0263), and the Q-Q plot shows a possible moderate outlier. Hence, this data might benefit from some type of transformation. A logarithmic transform shows somewhat more similar variances (except for soil C). The conclusions are similar to those from the original lead contents.

EXERCISE 18 a) There is extremely strong evidence that the mean fungus diameter is different for at least one of the mediums (F(6,21) = 168.61, p value < 0.0001). (Note the style in which the df are given inside parentheses for the F statistic.)

b) Using Tukey's HSD with a family-wise significance level of 5%:

Medium      WA       TWA       PCA       CMA       NA       RDA      PDA
Mean diam   4.25 a   5.175 b   6.125 c   6.275 c   6.80 d   7.00 d   7.250 e

The fungus grows best in medium PDA, and next best in RDA/NA (statistically tied).

EXERCISE 19 UDL = 53.775 + 2.68 × 9.445 × sqrt(3/24) = 62.72; LDL = 53.775 – 2.68 × 9.445 × sqrt(3/24) = 44.83. All means fall in this range. This is consistent with the results of Exercise 14.

EXERCISE 20 The mean number of defects across all lines is c̄ = 150 and the estimated standard deviation is s = sqrt(150) = 12.25. If α = 5%, the value of h = 2.56 and the control limits are

UDL = 150 + 2.56 × 12.25 × sqrt(4/5) = 178.05
LDL = 150 – 2.56 × 12.25 × sqrt(4/5) = 121.95

Line 4 is significantly worse than the average; the manufacturer should focus on improving that line. Line 5 is substantially better than average; try to find out what is being done right on that line!
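The decision limits in Exercise 20 can be sketched as below (the solution's 178.05 and 121.95 use s rounded to 12.25; carrying full precision shifts the limits by about 0.01):

```python
# Analysis-of-means style decision limits for count data
from math import sqrt

c_bar = 150          # mean defect count across lines
s = sqrt(c_bar)      # Poisson-based SD estimate, about 12.25
h, t = 2.56, 5       # critical value h and number of lines t

half_width = h * s * sqrt((t - 1) / t)
print(round(c_bar - half_width, 2), round(c_bar + half_width, 2))
```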

EXERCISE 21 a) 28*0.05 = 1.4 b) Using Bonferroni’s Method, they could use 0.05/28 = 0.0018 as a comparison-wise  for each test. c) No, neither of the tests would have been significant.

EXERCISE 22 b) Grand mean = 64.77/125 = 0.5182. SSB = 0.2881, SSW = 8.5062, F(2,122) = 2.07 with p value = 0.13 (using Excel). There is no significant evidence that any of the means differ; that is, there is no significant evidence that the instructions on Focus affected the mean IAT.

c) The p values are the same as those that would be cited for an ordinary independent samples t test with 122 df. They do not appear to have been adjusted in any way for the multiple comparison problem, not even by requiring the overall F test to be significant.



CHAPTER 7 EXERCISE 1 a) F(1,13) = 1.8148, p value = 0.2009. No significant evidence of an association.

b) t = −sqrt(1.8148) = −1.347 = −0.6/std.err., so std.err. = 0.445. The confidence interval for the expected change is 4 × (−0.6 ± 2.160 × 0.445) = (−6.2, 1.4). Note this interval contains 0, consistent with the test of no association. (Show a stem and leaf, histogram, or boxplot as part of the answer.)

c) 1.6

d) 1.6 ± 2.16·sqrt(1.22(1 + 1/15 + 4/(14 × 3²))) = 1.6 ± 2.72

EXERCISE 2 µ(sugar|days) = 12.04189 – 0.82162 × Days

a) The t test for the slope has t = -1.61 with p value = 0.1587. There is no significant evidence of a linear relationship between days after picking and sugar content.

b) The plot of the residuals versus the predicted values suggests a nonlinear model might be better.

EXERCISE 3

a) µ(y|x) = 14.39557 + 0.76509x. F(1,13) = 19.41, with p value 0.0007, so there is significant evidence of a linear relationship between the scores on the midterm and on the final.

b)
Obs   final   predict   resid     lower lim   upper lim
1     76      77.1328   -1.1328   51.2937     102.972
2     83      70.2470   12.7530   44.6196      95.874
3     89      87.0789    1.9211   60.1782     113.980
4     76      64.8914   11.1086   39.1221      90.661
Prediction intervals (limits in last 2 columns) are very wide, so an individual student’s final exam can vary wildly around the expected value. A final is necessary.

c) Plot of the actual observations and the fitted line: the points are widely scattered about the line.


d) The predicted final is 77.1, which matches the height of the line if you draw a vertical up from a midterm of 82.

e) r = 0.7739, R-square = 0.5989, F = (15 − 2)(0.5989)/(1 − 0.5989) = 19.41, which agrees with (a).
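The identity used in part (e) is worth checking numerically; for a simple linear regression, F = (n − 2)R²/(1 − R²):

```python
# Recovering the overall F statistic from R-square (n = 15 students)
n = 15
r2 = 0.5989

f = (n - 2) * r2 / (1 - r2)
print(round(r2**0.5, 4), round(f, 2))  # 0.7739 19.41
```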

EXERCISE 4 a) β̂1 = 1.5, t = 2.65 with 12 df, p value = 0.0213. Note that as X increases by one unit, we expect an increase in the mean of Y of size 1.5. As X increases two units from -1 to 1, we expect an increase of size 3. This is the same as the difference between the sample means for the two groups.

b) That X takes on just two values suggests an independent samples t test comparing Y when X = -1 to Y when X = 1. The equal variance version of the test has t = 2.65 with 12 df and p value 0.0213; that is, it is exactly the same as the regression result!



EXERCISE 5

a) The fitted equation is estimated Range = -6.48 + 0.75 × Latitude. That is, the discrepancy between the maximum and minimum increases as one moves further north.

b) The residual plot shows 5 large outliers on the negative side (less range than expected). Inspecting the listing of the city names, all of these outliers correspond to cities on the West Coast near the Pacific Ocean. Perhaps being right on the ocean keeps the temperatures less variable, by keeping it cooler in summer and/or warmer in winter than would be expected.

[Residual plot versus Latitude omitted.]

EXERCISE 6 An initial regression of FOWL on WATER showed strong evidence that the expected number of birds increased as open water increased: FOWL = 14.74 + 8.54 × Water, F(1,50) = 181.26, p value < 0.0001. However, the residual plot showed strong evidence of curvilinearity, and one point with exceptionally large water and fowl that was 'controlling' the regression.

After some experimentation, a better model is obtained by using LNFOWL=LN(Fowl+1) and LNWATER=LN(Water+1). Adding 1 is necessary because some observations have values of 0. LN(fowl+1) = 1.262+0.952*LN(Water+1), F(1,50)=15.34, p value 0.0003.



EXERCISE 7 a) Apparently, electricity usage increases as average temperature increases. [Scatterplot of KWH versus average temperature omitted.]

b) Predicted Kwh = -97.92389 + 2.00101(Tavg), σ̂² = 70.355. F value = 110.2, p value < 0.0001; there is significant evidence of a linear relationship between daily mean temperature and electricity consumption. For every increase of 1 degree in the mean temperature, the expected electricity consumption increases by 2 Kwh.

c) Residual plot: no evidence of nonlinearity, nonconstant variance, or extreme outliers.


EXERCISE 8 a) Using LNPE (as in Chapter 5, to stabilize variances) we get LNPE = 3.3173 + 0.352X, t = 1.84, p value = 0.0744. The intercept is the expected LNPE when x = 0, that is, for the NYSE exchange. The slope (estimated at 0.352) is the increase in expected LNPE as x goes from 0 to 1, that is, the difference between the NYSE and NASDAQ exchanges.

b) The sample mean for NYSE is 3.3173, consistent with the interpretation in (a), and the difference between the two means is 0.352. The pooled t test has t = 1.84, p = 0.0744, the same as the regression result. The unequal variance t is a little different: t' = 1.83 with 32 df, p value 0.0766.

EXERCISE 9

N = 217 and r = 0.36, so F(1,215) = (217 − 2)(0.36²)/(1 − 0.36²) = 32.01, p value < 0.0001. There is strong evidence of an association. The large sample size makes it possible to detect this association, even though it is too weak to make accurate prediction possible.

EXERCISE 10 a) Amikacin: predicted half-life = 1.75241 + 0.01336(Dose_mg_kg). Gentamicin: predicted half-life = 2.11150 - 0.03537(Dose_mg_kg). While one slope is positive and the other is negative, both are so close to 0 that these are essentially parallel (horizontal) lines. b) For the null hypothesis that β1 = 0: Amikacin: t = 0.11, p value = 0.9148, there is no significant evidence of a relationship. Gentamicin: t = -0.17, p value = 0.8661, there is no significant evidence of a relationship. As related to part (a), neither predicted line has a slope which differs significantly from 0.

Chapter 7 – Page 6


c) The degree of scatter about the two lines makes it obvious why neither relationship appeared significant.

[Scatterplot: Half Life (0–4) versus Dose in mg/kg (0–12), with A = Amikacin and G = Gentamicin.]

EXERCISE 11 Index = 0.1357 + 1.3672*Future. F(1,44) = 76.37, p value < 0.0001 showing strong evidence of an association, though sqrt(MSE) = 0.814 indicates very far from perfect prediction! The residual plot shows no evidence of curvilinearity or heteroscedasticity. There are no obvious outliers.

EXERCISE 12 a) F = (n − 2)r² / (1 − r²) = 176(0.18)/0.82 = 38.63 with 1 and 176 df. There is significant evidence of a linear relationship between the two, so the authors are justified in their claim. b) t = −√F = −6.215 = −0.62/std.err. Hence, the standard error is approximately 0.10, and a confidence interval would be −0.62 ± 1.96 × 0.10 = (−0.82, −0.42). Since this interval is entirely negative, it allows the authors to describe the relationship as a decline.

Chapter 7 – Page 7


EXERCISE 13 The residual plot shown to the right shows a failure of the regression assumptions. Apparently, the magnitude of the residuals is greater for the houses of larger size. That is, there is more variability in house price for the very large homes than for the smaller homes.

[Residual plot: Residuals (-100 to 200) versus Size in 1000 sf (0–5).]

The Q-Q plot shows an apparent failure of the normality assumption. However, this same pattern can also be caused by a set of residuals coming from a much larger variance.

A plot of the price per square foot versus size shows both that the price per sf is generally rising as size increases, and becoming more variable.

[Normal Q-Q plot: Residuals (-100 to 200) versus Normal Quantiles (-3 to 3).]

Chapter 7 – Page 8


EXERCISE 14 a) For X1 and X2, F(1,13) = 6.26, which exceeds the critical value from the F table for 1 and 13 df with α=5% (4.67). So this correlation coefficient is significantly different from 0. For X1 and X3, F(1,13) = 0.44, which does not exceed the critical value. So this correlation coefficient is not significantly different from 0. b) Yes, these statements are consistent with the hypothesis tests in (a).

EXERCISE 15 a) Increases in reading times are associated with decreases in response times. b) Since the t statistic is the ratio of the estimate to its standard error, we can solve for the standard error as −0.03/−2.11 = 0.0142. Then 20 × (−0.03 ± 2.1009 × 0.0142) = −1.196 to −0.034. With 95% confidence, if reading time increases by 20 seconds, the mean response time will decrease by between 0.034 and 1.196 ms.
c) t² = F = 4.45, and F = 18r²/(1 − r²), so 4.45 − 4.45r² = 18r², that is, 4.45 = 22.45r² and r² = 0.198. No.
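Part (c) can be sketched numerically (names are ours); it recovers r² from the reported t statistic with 18 error df:

```python
# t^2 = F for a single slope, and F = df * r^2 / (1 - r^2) with df = n - 2 = 18,
# so r^2 = F / (df + F).
t, df = -2.11, 18
F = t**2
r2 = F / (df + F)
print(round(r2, 3))
```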

Chapter 7 – Page 9


EXERCISE 16 (a & b) The relationship for the original variables is clearly not linear. After logarithms, the relationship is more linear for the majority of the time period, but at the end it levels off.

[Left panel: scatterplot of Water (0–12) versus Soaking Time (0–6). Right panel: scatterplot of Ln(Water) (1.5–2.4) versus Ln(Soaking Time) (-2 to 2).]

c) LN(water) = 1.979 + 0.260LN(Stime). However, the residual plot clearly indicates the remaining nonlinearity in the model.

[Residual plot: Residuals (-0.15 to 0.15) versus Predicted (1.6–2.5).]

Chapter 7 – Page 10


CHAPTER 8

EXERCISE 1 For the X'X matrix, the sum of the squares of the X1 values yields 378; that is the X1/X1 entry. For the X'Y matrix, the sum of the products of the X1 and Y values is 544.9; that is the X1/Y entry of the table. To verify the inverse, note that the first row of the X'X inverse matrix times the first column of the X'X matrix is 12.76103*8 – 0.762244*48 – 1.89706*34 = 1.0005 (should be 1, but there is some roundoff). The third row of the X'X inverse matrix times the first column of the X'X matrix is -1.89706*8 + 0.1078431*48 + 0.2941176*34 = -0.00001 (should be 0, but there is some roundoff).
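The round-off check can be reproduced with a few lines, using only the inverse rows and X'X column quoted above (variable names are ours):

```python
# Rows 1 and 3 of the X'X inverse, and the first column of X'X, as quoted.
inv_row1 = [12.76103, -0.762244, -1.89706]
inv_row3 = [-1.89706, 0.1078431, 0.2941176]
col1 = [8, 48, 34]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
print(round(dot(inv_row1, col1), 4))  # diagonal entry: should be close to 1
print(round(dot(inv_row3, col1), 4))  # off-diagonal entry: should be close to 0
```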

EXERCISE 2 a) F(10,489)=(500-10-1)*(0.07)/[10*(1-0.07)]=3.681, p value = 0.00009. Yes, it is 'significant'. b) There is evidence that at least one of the variables is linearly related to socioeconomic status, because the F test in part (a) tests the null hypothesis that none are related. c) No, individual prediction is very poor; R-square is only 0.07, meaning that the independent variables only explain 7% of the variability in socioeconomic status. d) F(10,39)=(50-10-1)*0.07/[10*(1-0.07)]=0.294, p value = 0.979, no.
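The F statistics in (a) and (d) can be verified with a small helper (the function name is ours):

```python
# Overall regression F from R^2: F = [(n - k - 1) * R^2] / [k * (1 - R^2)].
def f_from_r2(r2, n, k):
    return (n - k - 1) * r2 / (k * (1 - r2))

print(round(f_from_r2(0.07, 500, 10), 3))  # part (a), n = 500
print(round(f_from_r2(0.07, 50, 10), 3))   # part (d), n = 50
```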

EXERCISE 3
Result for Stress: µ(y1|x) = 700.61805 - 1.52568X1 + 175.98394X2 - 6.69714X3
The fitted regression shows that the stress at failure apparently decreases as percent binder increases; however, this particular effect is not significant. The material becomes significantly weaker as the ambient temperature increases (holding other variables constant). However, it becomes significantly stronger as the loading rate increases (also holding other variables constant). The residual plot does not show any indication of specification error. There are no features of curvature or flare, nor are there any obvious outliers.

[Residual plot: Residual stress (-200 to 300) versus Predicted stress (-200 to 1200).]

Chapter 8 – Page 1


Result for Strain: µ(y2|x) = -5.61130 + 0.66754X1 - 1.23535X2 + 0.07319X3
This shows that the strain at failure significantly increases as the percent binder increases (that is, this type of strength increases with binder, holding other variables constant). It significantly weakens as the loading rate increases, unlike the stress at failure. It becomes significantly stronger as the ambient temperature increases. However, the residual plot shows clear indication of a nonlinear relationship, and possibly an increase in variances. A transformation of the dependent variable might improve the model.

[Residual plot: Residual strain (-4 to 6) versus Predicted strain (-3 to 10).]

Chapter 8 – Page 2


EXERCISE 4 a) The residuals from the regression of PASS on the quantitative variables show clear misspecification error. The variances seem to grow as the predicted size grows.

[Residual plot: Residual (-50 to 90) versus Predicted value (-20 to 140).]

b) The logarithmic transformation makes sense, as it often helps when variances grow with predicted size. The residual plot is enormously improved, showing reasonable linearity and reasonably stable variance. The fitted equation is
Predicted ln(passengers) = -6.47 - 0.44Lmiles + 2.91Linm + .78Lins + .43Lpopm + .39Lpops + .71Lairl

[Residual plot: Residual (-2 to 2) versus Predicted value (0–5).]

It makes sense that the number of passengers should increase with the population size, but less sense that it decreases with distance. One might think that there would be more incentive to fly than drive, but perhaps the more the distance, the less strong the ties between the cities and the fewer the people who need to make the trip.

c) Using airlines as an independent variable and passengers as dependent (as in parts A and B) is treating capacity as the given variable. But treating Airlines as the dependent variable is thinking of it as if airlines are reacting to demand by moving into the market.

Chapter 8 – Page 3


EXERCISE 5
a) µ(y|x) = -379.248 + 170.220(DBH) + 1.900(height) + 8.156(age) – 1192.868(grav)
Judging from the residuals, this model is clearly mis-specified and not very useful. A transformation would be helpful. Since both nonlinearity and nonconstant variance appear, perhaps a multiplicative model.

[Residual plot: Residuals (-300 to 400) versus Predicted values (-200 to 1600).]

b) All variables were transformed by taking logarithms. The regression showed that log(grav) was not significant, so it was dropped. The fitted model was µ(y|x) = -1.727 + 2.147Ldbh + 0.996Lht − 0.152Lage. The residual plot is much improved with respect to linearity, but still shows some residuals with greater magnitude on the left.

[Residual plot: Residual logweight (-0.5 to 0.4) versus Predicted logweight (4–8).]

Log(age) does not contribute significantly to the regression. Since it is harder to measure than Height or DBH, it could be dropped and the model would still have an R-square of 98%.

Chapter 8 – Page 4


EXERCISE 6 a)

                        Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4         409928367      102482092     241.78    <.0001
Error             16           6781702         423856
Corrected Total   20         416710068

                        Parameter Estimates
Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept    1             530.26514         263.57743       2.01      0.0614                     0
value        1               1.29583           0.51457       2.52      0.0228                 56.71
doct         1               0.46388           1.71375       0.27      0.7901                121.31
nurse        1              -0.88013           0.90711      -0.97      0.3464                 77.55
vn           1               2.14219           0.68198       3.14      0.0063                 31.67

The number of deaths appears to increase as the number of doctors and vocational nurses increases, but decreases when there are more nurses! The former relationships don’t make sense. However, since large population regions have more deaths, but also tend to have more doctors and nurses, the results are probably the result of population. That is also the cause of the multicollinearity problem.

b) Converting all variables to per capita basis by dividing by population should help, as it should put large population regions and small ones on the same footing.

c) Converting to per capita values

                        Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4          24.29002        6.07251       1.94    0.1533
Error             16          50.16061        3.13504
Corrected Total   20          74.45063

                        Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1              7.11622           2.56261       2.78      0.0135                     0
pcvalue       1             -0.84243           1.51528      -0.56      0.5859                  2.87
pcdoct        1              0.17416           2.95725       0.06      0.9538                  3.16
pcnurse       1             -0.28438           1.00699      -0.28      0.7813                  1.56
pcvn          1              1.49954           0.65146       2.30      0.0351                  1.20

Chapter 8 – Page 5


The multicollinearity is greatly reduced, but now there is no significant evidence that any of the variables are related to death rates. Most likely, the relationships seen in part A were just due to the relationship of all variables to population. You cannot compare the R-squared since you have changed the dependent variable.

EXERCISE 7 a) µ(y|x) = 219.27461 + 77.72454(time)
Distance significantly increases with time, but a linear model is probably not satisfactory. The residual plot shows nonlinearity. It also suggests that the 18th observation may have been misrecorded.

[Residual plot: Residual (-200 to 100) versus Predicted (200–2000).]

b) µ(y|x) = 178.07841 + 93.10596(time) − 0.72874(time²). The addition of the square term does improve the model. The size of the residuals is somewhat reduced, but still shows curvature (see the residual plot for c). The negative coefficient for the quadratic term shows that the effect of an increase in time is less at higher times than at lower times (the rate of increase is slowing).

c) Residual plot for quadratic model: A model using time and square of time was better, but makes it clear that the 18th observation was probably incorrect.

[Residual plot: Residual (-110 to 90) versus Predicted (200–2000).]

Note: replacing square of time with log(time) or square root of time gives even better models.

Chapter 8 – Page 6


Chapter 8 – Page 7


EXERCISE 8 a) Fitted goalmade = -67.365 – 0.142*dash100 + 1.203*height – 0.011*weight, MSE = 0.57508, F(3,21)=127.10 b) The largest VIF is 1.011, so there is no multicollinearity (surprising since both height and weight are present).

c) The third observation (height = 70, weight = 170, goalmade = 11) is an outlier, using the jackknifed residuals.

[Plot: Jackknifed residual versus Predicted value.]

d) The residual plot suggests some mild nonlinearity, which can be improved by fitting a full quadratic model for height, weight, their squares, and the interaction height*weight. (Dash100 is not helpful, and was dropped.) However, the outlier remains, and one suspects the outlier is partially responsible for the apparent nonlinearity.

EXERCISE 9 a) COOL SEASON
coolweight = -2.63755 + 0.43937(coolwidth) + 0.11038(coolheight)
The residual plot shows nonconstant variance and nonlinearity, suggesting a transformation of the dependent variable and possibly the independent variables.

[Residual plot: Residuals (-2 to 5) versus Predicted value (-1 to 9).]

Chapter 8 – Page 8


WARM SEASON
Warmweight = -2.11678 + 0.20691(warmwidth) + 0.11846(warmheight)
This residual plot also shows nonconstant variance and possible nonlinearity.

[Residual plot: Residuals (-5 to 5) versus Predicted value (-1 to 11).]

b) Transforming all variables to logarithms.

COOL SEASON
logcoolweight = -4.59650 + 1.57149(logcoolwidth) + 0.74670(logcoolheight)
The residual plot is greatly improved. Weight increases as width and height increase, which makes sense.

[Residual plot: Residuals (-2 to 2) versus Predicted logweight (-2 to 3).]

WARM SEASON
logwarmweight = -4.42125 + 1.66939(logwarmwidth) + 0.20887(logwarmheight)
The residual plot is greatly improved. Weight increases as width and height increase, which makes sense. Note that the coefficient with respect to height seems somewhat less in the warm season than in the cool.

[Residual plot: Residuals (-2 to 2) versus Predicted logweight (-3 to 3).]

Chapter 8 – Page 9


EXERCISE 10 a) The fitted model suggests increasing survival time with increasing CLOT, PROG, ENZ, and LIV (though the latter is not significant). However, as the residual plot shows, the relationship is both nonlinear and has variance that grows with the magnitude of the predicted value, suggesting a logarithmic transformation.

b) The best residual plot is from a model using log(time), but leaving the other variables unchanged (see below). All variables except LIV have significant positive impacts on log survival time. You cannot compare R-square or MSE to the results of part (A) because the dependent variable has been changed, but the residual plot is now a featureless blob, indicating we have reduced curvilinearity and heteroscedasticity.

Predicted ln(Surv) = 1.1254 + 0.1578*clot + 0.213*Prog + 0.0218*Enz + 0.0044*Liv
F(4,49) = 430.98, MSE = 0.01188

[Residual plot: Residuals (-0.3 to 0.4) versus Predicted (3–7).]

Chapter 8 – Page 10


EXERCISE 11 crimes = -10.30475 + 0.37775(age) + 2.29391(sex) + 0.17933(college) + 0.29339(income)
The residual plot shows some evidence of nonconstant variance.

[Residual plot: Residual (-5 to 7) versus Predicted value (5–23), points labeled by sex (F = 0, M = 1).]

A transformation of all variables except SEX using logarithms stabilized the variances but suggested some nonlinearity. After exploration, a somewhat better model was obtained by dropping LCOLLEGE (not significant, probably due to multicollinearity with LAGE). The residual plot is shown to the right. There is still some suggestion of nonlinearity.

Log(crimes) = -3.5074 + 0.597Log(age) + .1514Sex + 1.083Log(Income)

[Residual plot: Residual (-0.4 to 0.4) versus Predicted value (1.6–3.2), points labeled by sex.]

Holding other variables constant, the perception of crimes increases with age and income. Men tend to perceive more crimes than women of the same age and income.

EXERCISE 12 a) Fitted model: predicted ln(Water) = 2.055 + 0.301X − 0.071X² with R² = 0.9895. b) Fitted model: predicted ln(Water) = 2.047 + 0.304T − 0.061T² with R² = 0.9914.

Chapter 8 – Page 11


c) The second model fits slightly better, since it does a better job of mimicking the ‘flattening’ of the curve on the far right.

Chapter 8 – Page 12


EXERCISE 13 a) HT = 15.994 + 2.331*DFOOT + 0.309*HCRN, F(2,61)=24.41, p value < 0.0001. The plot of residuals versus predicted value appears featureless, so curvilinearity and heteroscedasticity are minimal. A model using sqrt(DFOOT) is also sensible, but is very similar. b) However, with an R-square of only 0.4445, this model does not do a good job of accurately predicting HT.

EXERCISE 14 a) Note: price is in 1000s of dollars and size is in 1000s of sq ft. The selected model is Price = 36.93 - 0.319(AGE) – 11.868(BED) + 61.774(SIZE). The coefficients are similar to those in the full model discussed in Example 8.2. The increase of price with size and the decrease of price with age, other values being held constant, makes sense. The negative coefficient for BED was explained in Example 8.2.

aii) The full data set (Table 1.2). Expressing price in 1000s of dollars and house size and lot size in 1000s of sq ft, the selected model is Price = -42.466 – 1.05(AGE) + 90.623(SIZE) + 0.303(LOT). This model makes sense regarding the signs of the coefficients, but is much different from the model fit on the lower priced homes. Its residual plot shows evidence of nonlinearity and increased variance on the right side. If I were looking for only lower priced homes, I would use the first model. If I were looking for higher priced homes, I would re-fit a model using only more expensive homes.

aiii) A large positive residual may signal a home that is overpriced, while a very large negative residual may signal a bargain (or a home with a problem!). However, until a suitable model without curvilinearity is built, the residuals don't mean very much. As this problem stands, the only exceptionally large residuals are for the large homes that are showing higher prices than we would expect.

b) Price = -42.14 + 79.9(SIZE) - 1.01(AGE) + 0.29(LOT) + 13.0(GARAGE). Garage slightly improved prediction, but the model still shows nonlinearity. Log price showed slightly better residual plots with less curvilinearity, and some possible 'bargains' at the low end of the price range.

EXERCISE 15 H0: β2 = β3 = 0, H1: at least one coefficient is nonzero. F(2,46) = [(932-901)/2] / (901/46) = 0.79, p value 0.4599, not significant. Model 2 does not fit significantly better.
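The Exercise 15 comparison is the standard full-versus-restricted F test, which can be sketched as follows (the function name is ours):

```python
# F = [(SSE_restricted - SSE_full) / q] / (SSE_full / df_full), where q is the
# number of coefficients set to zero under the null hypothesis.
def partial_f(sse_restricted, sse_full, q, df_full):
    return ((sse_restricted - sse_full) / q) / (sse_full / df_full)

print(round(partial_f(932, 901, 2, 46), 2))
```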

Chapter 8 – Page 13


EXERCISE 16 a) F = [(256 − 194)/1] / (194/46) = 14.7 with 1 and 46 df. Since this is well beyond the critical value of the F table even using α = 1%, there is significant evidence that X3 is related to Y, after controlling for X1 and X2. b) t = ±√F with the sign matching the sign of the parameter estimate. So t = 3.83 with 46 df. Once again, there is significant evidence that X3 is related to Y, after controlling for X1 and X2. c) t = β̂3 / std.error = 3.83 = 2.1 / std.error, so the standard error must be 0.548. A 95% confidence interval would be 2.1 ± 2.013 × 0.548, using the value from the t-table with 46 df (you can use a TI-84 or Excel to get this t critical value). With confidence 95%, the expected increase in Y if X3 increases by 1 and other variables remain constant is between 0.997 and 3.203.
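The Exercise 16 arithmetic can be reproduced in a few lines (a sketch; names are ours):

```python
import math

# Partial F for adding X3, then t = sqrt(F), the implied standard error of the
# estimate 2.1, and the 95% CI using the t critical value 2.013 (46 df).
F = ((256 - 194) / 1) / (194 / 46)
t = math.sqrt(F)              # the sign would match the estimate's sign
se = 2.1 / t
ci = (2.1 - 2.013 * se, 2.1 + 2.013 * se)
print(round(F, 1), round(se, 3), round(ci[0], 3), round(ci[1], 3))
```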

EXERCISE 17 a)

[Plot: predicted score (4–16) versus x2 (0–5), with separate lines for boys (B) and girls (G).]

b) For low values of X2, girls tend to score higher than boys. But for high values of X2, boys tend to score higher than girls.
c) girls − boys = (β0 + β1 + 3β2 + 3β3) − (β0 + 3β2) = β1 + 3β3
d) 5 − 1.5*3 = 0.5

Chapter 8 – Page 14


EXERCISE 18 a)

[Plot: Predicted Y (3–10) versus X1 (0–2), with separate lines for x2 = -1 and x2 = 1.]

b) When X2 is very low, the value of Y will decrease slightly as X1 increases. However, when X2 is high, the value of Y will increase steeply as X1 increases.
c) No, β̂1 = 1.5 is the expected increase in Y if X1 increases by 1, provided that X2 = 0.
d) Now the value of Y tends to increase as X1 increases, no matter what the value of X2.

[Plot: Predicted Y (3–8) versus X1 (0–2), with separate lines for x2 = -1 and x2 = 1.]
Chapter 8 – Page 15


EXERCISE 19 a) Model 1: R-squared = 0.07 = SSR/45.778, therefore SSR = 3.20, SSE = 42.574, F(2,97) = 3.65, p value 0.0298. There is significant evidence that at least one of the two independent variables is related to RO.
Model 2: R-squared = 0.19 = SSR/45.778, therefore SSR = 8.70, SSE = 37.08, F(4,95) = 5.57, p value 0.0005. There is significant evidence that at least one of the 4 independent variables is related to RO.
b) t(95) = -3.29, significant. Since the coefficient is negative, workers in diversion have higher expected RO than those in non-diversion, all other independent variables held constant.
c) CI for the coefficient for type of work = -0.56 ± 1.985*0.17 = (-0.897, -0.222).
d) F(2,95) = [(42.574 - 37.08)/2] / (37.08/95) = 7.04, p value 0.0014. There is significant evidence that at least one of type of work or employment length is related to mean RO, after controlling for social support and cultural competency.

EXERCISE 20 Model 1 must have SSE = (1-0.05)*33.7 = 32.015 with 217 df. Model 2 must have SSE = (1-0.34)*33.7 = 22.242 with 214 df. F = [(32.015-22.242) / (217-214)] / (22.242/214) = 31.34 with 3 and 214 df. There is significant evidence that at least one of the burnout scores is related to frequency of complaints, after controlling for gender and age.
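A sketch of the Exercise 20 computation (names are ours):

```python
# SSEs recovered from R^2 and the total SS, then the incremental F test.
tss = 33.7
sse1, df1 = (1 - 0.05) * tss, 217   # Model 1: gender and age
sse2, df2 = (1 - 0.34) * tss, 214   # Model 2: adds the three burnout scores
F = ((sse1 - sse2) / (df1 - df2)) / (sse2 / df2)
print(round(F, 2))
```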

Chapter 8 – Page 16


CHAPTER 9

EXERCISE 1 The two-way ANOVA indicates significant evidence for each type of effect.

Source    DF    Type III SS    Mean Square    F Value    Pr > F
a          1    56.25000000    56.25000000      85.88    <.0001
t          3     8.66750000     2.88916667       4.41    0.0414
a*t        3    18.62000000     6.20666667       9.48    0.0052

The contrast of the Control treatment versus the average of the other three treatments (coefficients 3, -1, -1, -1) shows that the average of the other three treatments does differ significantly from the control. The p value here has not been adjusted for the multiple comparison problem.

Contrast      DF    Contrast SS    Mean Square    F Value    Pr > F
vs control     1     8.16750000     8.16750000      12.47    0.0077
Using Tukey’s HSD to compare the Treatment main effects shows that M, N, and Q do not differ significantly from each other. While Q is different from C, M and N are intermediate and not significantly different from the control. Inspection of a profile plot suggests that Factor A has very little effect in the control treatment, but for the other treatments level A yields a higher value on average than level B. This is the source of the interaction.

EXERCISE 2

                        Sum of
Source            DF       Squares       Mean Square    F Value    Pr > F
Model             11    79.54833333       7.23166667      10.37    0.0002
Error             12     8.37000000       0.69750000
Corrected Total   23    87.91833333

Source    DF    Type III SS    Mean Square    F Value    Pr > F
a          2    38.20333333    19.10166667      27.39    <.0001
c          3    33.54833333    11.18277778      16.03    0.0002
a*c        6     7.79666667     1.29944444       1.86    0.1691

Chapter 9 – Page 1


Since we do not have significant interactions, we can compare main effects only using Tukey’s HSD. For Factor A, Level M is significantly lower than R or P (which do not differ from each other). For Factor C, Level C is significantly higher than all the others. B appears next highest, but B and D do not differ, though B is higher than A.

EXERCISE 3 An ANOVA (using Proc GLM) with dependent variable log(time) showed a significant main effect for LIVCAT (two categories).

A cell mean plot shows that those with low liver function tended to have shorter log(time).

The plot suggests that the effect of liver function is greater among those with better prognosis, but this interaction was not strong enough to be significant. To test whether LIV has any kind of effect (either main or interaction), construct F test based on full and restricted model. Full model SSE=10.753, df=50; Restricted Model (w/o LIV or interact.) SSE=15.579, df=52, F(2,50)=11.2, p value < 0.0001. Yes, knowing LIV does improve prediction of survival time, after controlling for PROG.

Chapter 9 – Page 2


EXERCISE 4.

Source        DF    Type III SS    Mean Square    F Value    Pr > F
period         1    13.22500000    13.22500000      26.02    <.0001
day            4    13.65000000     3.41250000       6.71    0.0006
day*period     4     9.65000000     2.41250000       4.75    0.0044

There was a change due to period (before and after the rules), but according to the significant interaction, the change was greater on some days than on others. Examining the cell mean plot, the reduction in addons was primarily on Mondays.

Chapter 9 – Page 3


EXERCISE 5 Since the dependent variable was in the form of percentages, it was transformed to Y = arcsin(sqrt(survive/100)).

                        Sum of
Source            DF       Squares       Mean Square    F Value    Pr > F
Model              5    4.42484933       0.88496987       8.08     0.0001
Error             24    2.62772434       0.10948851
Corrected Total   29    7.05257367

Source         DF    Type III SS    Mean Square    F Value    Pr > F
concen          1     2.23508411     2.23508411      20.41    0.0001
fung            2     1.28497456     0.64248728       5.87    0.0084
concen*fung     2     0.90479067     0.45239534       4.13    0.0287

The profile plot (still using the transformed values of the dependent variable) helps us interpret the ANOVA. The high concentration results in significantly higher proportions of seeds that survive. This is the main effect for concentration. The effect of concentration is much greater for fungicide A than for the other fungicides. This explains the interactions. It is also the source of the main effect for Fungicide. At low concentrations, the fungicides don't differ very much, but at high concentrations, fungicide A is much better.

[Profile plot: Mean Y (0–1.2) versus Fungicide (a, b, c), with H = high and L = low concentration.]

Tukey's HSD was used to compare the 6 combinations of fungicide and concentration. Fungicide A, at concentration 1000, has a significantly higher mean (transformed) survival than any other combination.

Chapter 9 – Page 4


EXERCISE 6
The scatter plot suggests a curvilinear relationship of growth rate to nutrient concentration. A quadratic regression yielded
Predicted growth = 7.5238 + 1.2964(NC) − 0.0143(NC²)
with SSE = 1040.54 and 33 df. A one-way ANOVA using 9 groups had SSE = 959.32 with 27 df.

[Scatterplot: Growth (0–50) versus Nutrient Concen. (0–50).]

The lack of fit test gave F = [(1040.54 − 959.32)/6] / (959.32/27) = 0.38 with 6 and 27 df. There is no significant evidence that the quadratic model does not fit. We may take the quadratic as our model.
NOTE: Comparing the one-way ANOVA model to a simpler linear model had lack-of-fit F(7,27) = 1.81, p value = 0.126. Not significantly worse than the 'full' model, but the quadratic model is significantly better than the linear model.
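The lack-of-fit computation can be sketched as (names are ours):

```python
# Compare the quadratic model's SSE to the pure-error SSE from the one-way
# ANOVA on the 9 concentration groups.
sse_quad, df_quad = 1040.54, 33
sse_pure, df_pure = 959.32, 27
F = ((sse_quad - sse_pure) / (df_quad - df_pure)) / (sse_pure / df_pure)
print(round(F, 2))
```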

EXERCISE 7 The profile plot, where the plotting symbol is the number of the level for TEMPR, is shown to the right. Clearly, the effect of TEMPR is greater for higher levels of CLEAN than it is at low levels of CLEAN. At higher levels of CLEAN, the higher values of ELAST seem to be found at intermediate levels of TEMPR (levels #2 = 1.65 or #3 = 2.38) rather than the highest or lowest levels. The ANOVA confirms both the main effects and interactions.

[Profile plot: Mean Elast (0–150) versus CLEAN (0–2), plotted by TEMPR level (0–4).]

SSE = 872.69, df = 50

Chapter 9 – Page 5


Source         DF      Type I SS    Mean Square    F Value    Pr > F
clean           4    86093.77441    21523.44360    1233.17    <.0001
tempr           4    22554.07993     5638.51998     323.05    <.0001
clean*tempr    16    24042.93605     1502.68350      86.09    <.0001

Given the strength of the interactions, to further investigate the comparisons, we would want to treat this as a very large one-way ANOVA with 25 groups. The precise list of comparisons would depend on the goal. If we wish to identify the combination of TEMPR and CLEAN which gives the highest ELAST, we should make the pairwise comparisons among all 25 groups (using Tukey’s HSD on the 25 groups, CLEAN=2.0 and TEMPR =1.65 or 2.38 are significantly better than any other combination). If, however, we wish to describe the best value of TEMPR within each selection of CLEAN, then we have a different set of comparisons to make.

Alternatively, we can treat this as a multiple regression. µ(ELAST|x) = -8.03515 - 30.95195(clean) + 36.27533(tempr) + 23.78724(clean²) - 12.46042(tempr²) + 17.65392(clean*tempr). However, the SSE of this model is 13313.57 with 69 df. The lack of fit test has F(19,50) = [(13313.57 - 872.69)/(69-50)] / (872.69/50) = 37.5. By comparison to the two-way ANOVA, we see this regression does NOT fit the data. Higher order polynomial and cross product terms are needed.

Chapter 9 – Page 6


EXERCISE 8. Since we do not have the original replicated data, we in essence only have one observation per cell. Hence, we must use the SS for the highest order interaction to estimate the error variance. Also, the interactions with the nitrogen application rate seem weak, so we will delete them also. The ANOVA for the final model is shown below. The effect of nitrogen is relatively simple to describe: the more the better. Tukey’s HSD applied to the main effects shows the 150 rate gave significantly higher yields than the 60 rate. The 120 and 90 rates are intermediate and not significantly different from any other rate. A profile plot is helpful in understanding the effect of variety and location. At most locations, the varieties are fairly similar. However, at location E, variety L is much better.

We can fit a linear trend in Nitrogen application: t = 3.92 with 33 df. There is a significant linear trend in nitrogen.

                              Sum of
Source            DF        Squares    Mean Square    F Value    Pr > F
Model             14    68117055.67     4865503.98      22.05    <.0001
Error             33     7281838.33      220661.77
Corrected Total   47    75398894.00

Source              DF    Type III SS    Mean Square    F Value    Pr > F
variety              2    12448648.62     6224324.31      28.21    <.0001
nitapp               3     3439716.67     1146572.22       5.20    0.0047
location             3    34391297.50    11463765.83      51.95    <.0001
variety*location     6    17837392.87     2972898.81      13.47    <.0001

Chapter 9 – Page 7


EXERCISE 9 The sums of squares can be obtained from the means and standard deviations given in the table. For example, TSS = 119(4.902)² = 2859.5 and SSE = 29(3.7² + 4.1² + 4.4² + 4.8²) = 2114.1; some small rounding errors will occur when compared to the values obtained from squaring the cell means.

Source         df     SS        MS        F        P value
Model           3     745.4     248.5     13.43    < 0.0001
Political       1     270       270       14.81    0.0002
CommDisSt       1     367.5     367.5     20.16    < 0.0001
Interaction     1     107.9     107.9      5.87    0.0169
Error         116    2114.1      18.225
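The sums of squares can be reproduced from the summary statistics (a sketch; the four cell sizes of 30 are implied by the 29 multiplier and the 116 error df):

```python
# TSS from the overall SD (N = 120), SSE from the four cell SDs (n = 30 each),
# and the model SS by subtraction.
N, n_cell = 120, 30
tss = (N - 1) * 4.902**2
sse = (n_cell - 1) * sum(s**2 for s in (3.7, 4.1, 4.4, 4.8))
ss_model = tss - sse
print(round(tss, 1), round(sse, 1), round(ss_model, 1))
```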

Examining the cell means, we see that those living in Hot Spots will have significantly lower scores than those in less affected communities. There is an interaction, with the effect of community disease status being smaller among those who identify as liberal. In general, liberals tend to score lower than conservatives.

EXERCISE 10. The residuals from the first ANOVA using weights as the dependent variable showed a tendency towards greater variance at higher predicted values. Two transforms, log(1+weight) and sqrt(weight), were tried, and the square root transform seemed slightly better. The profile plot for sqrt(weight) is shown to the right. Clearly, all the varieties are better at the lower temperatures, and this is reflected in the significant evidence for a main effect of temperature.

[Profile plot: Mean sqrt weight (0.0–0.7) versus Temperature (15–30), with lines for varieties N, K, R, B.]

Given the lack of significant evidence for an interaction, primary interest is on the varieties that have the highest main effect. Using Tukey's HSD, variety BUR has overall lower mean weight than NOR or KEN, but not significantly lower than RLS. The other varieties do not differ significantly among themselves.

Chapter 9 – Page 8


Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             15   3.42854201       0.22856947    5.15      <.0001
Error             80   3.54720955       0.04434012
Corrected Total   95   6.97575156

Source     DF   Type III SS   Mean Square   F Value   Pr > F
temp        3   2.42938098    0.80979366    18.26     <.0001
var         3   0.55899201    0.18633067     4.20     0.0082
temp*var    9   0.44016902    0.04890767     1.10     0.3704

EXERCISE 11 Using only treatments 1 through 9, we can perform a two-way ANOVA. There are significant effects due to PREP but not for the interaction with GRAIN.

Source       DF   Type I SS     Mean Square   F Value   Pr > F
grain         2    621.262344   310.631172    2.93      0.0635
prep          2   1551.674144   775.837072    7.33      0.0018
grain*prep    4    784.949511   196.237378    1.85      0.1353

Pairwise comparisons of the main effects for PREP show that BSB has a significantly higher expected value than either WHOLE or DECORT (which do not differ significantly from each other). To compare to the control group, we used a one-way ANOVA with all 10 treatment groups. The contrast compared the average of the means in the first 9 groups to the control group; the coefficients for TRT were -1, -1, -1, -1, -1, -1, -1, -1, -1, 9. F(1,50) = 38.51, p < 0.0001. There is significant evidence that the average of the mean values in the treatment groups differs from that in the Control group. Inspecting the sample averages, the Control group has a HIGHER mean biovalue. Note that this analysis has a different MSE than the first analysis.

Chapter 9 – Page 9


EXERCISE 12 There are no significant interaction effects, so we can focus on the significant main effects for program.

Using Tukey's HSD to compare the main effects, we find that the mean Risk for those trained by Program A is significantly worse than either Program B or C. Programs B and C do not differ significantly; you may choose either of those.

Program   Mean Risk
A         24.90 a
B         15.30 b
C         16.20 b

Means with the same letter are not significantly different.

EXERCISE 13 a)

Source        df   SS    MS      F
Model          5    92   18.4     4.94
Test           2    15    7.5     2.01
Distraction    1    69   69      18.5
Test*Dist.     2    17    8.5     2.28
Error         22    82    3.727
Corrected     27   174

b) At α = 5%, the only significant effect is the main effect for Distraction. Scores in one Distraction condition were consistently higher than in the other condition, over all versions of the test. But without the cell means, we don't know which distraction condition was higher. c) The SS for the main and interaction effects don't sum to the Model SS.

Chapter 9 – Page 10


EXERCISE 14

Source       df    SS     MS        F
Model         7   1840   262.857   3.613
Medication    3    910   303.333   4.170
Dose          1    500   500.000   6.873
Med*Dose      3    430   143.333   1.970
Error        40   2910    72.750
Corrected    47   4750

Using EXCEL to assign p values to the F statistics, there is significant evidence of main effects for Medication and Dose. However, there is no significant evidence of an interaction.

EXERCISE 15 a) [Profile plot: mean outcome satisfaction (1–7) vs. Outcome (Equal, Better, Worse), with H and L marking the High and Low Cognitive Busyness groups.]

b) The plot and test statistics are consistent in showing a very strong main effect of Outcome, with participants showing the highest mean satisfaction when Outcome is perceived as Equal. There is a weak main effect for Cog. Busyness, with a tendency for those with Low Busyness to have less satisfaction. However, this tendency seems strongest in the Better Outcome category, leading to a significant interaction. c) 6*5/2 = 15 d) Yes, as noted in (b), the difference between the High and Low Busyness categories is most pronounced in the Better outcome category. There is less of a difference in the Equal and Worse categories. Means when the Outcome is Equal are significantly better than in any other group.

Chapter 9 – Page 11


EXERCISE 16 a and b) The plot uses 1 to mark the Majority source cell means, and 2 to mark the Minority source cell means. [Profile plot: mean Behavioral Intention (1–5) vs. Attitude Change (No, Yes); 1 = Majority source, 2 = Minority source.]

The plot shows strong interactions. Among those who did not change their attitudes, those who were told their source was a Majority opinion had slightly higher mean Behavioral Intention. Among those who did change their attitude, those who were told their source was a Minority opinion had much higher mean Behavioral Intention. The significant interaction shows that there is significant evidence that the effect of source status differs among those who did and did not change their intention. While the main effects are significant, the interactions are so strong that the main effects have little meaning.

c) Labeling the cell means using subscripts that correspond to the table, the contrasts are:

#1: Ho: μ11 − μ21 = 0, L̂ = 3.00 − 2.29 = 0.71, t = 0.71 / √(2.248 · (1/16 + 1/17)) = 1.36
#2: Ho: μ12 − μ22 = 0, L̂ = 2.40 − 5.00 = −2.60, t = −2.60 / √(2.248 · (1/10 + 1/10)) = −3.88

Each t-statistic has 53 − 4 = 49 df. To control the experimentwise error rate at 5%, apply Bonferroni's method and set the comparisonwise rate for each hypothesis at 2.5%. Reject Ho if |t| > 2.31. There is no significant difference in the means by source status for those who did not change their attitude, but there is a significant difference among those who did change their attitude. Yes, these results are consistent with the authors' statements.
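The two Bonferroni-protected contrast t statistics can be recomputed directly; MSE = 2.248 and the cell sizes (16, 17, 10, 10) are taken from the solution above:

```python
import math

# t = L_hat / sqrt(MSE * (1/n_i + 1/n_j)) for a difference of two cell means
mse = 2.248
t1 = (3.00 - 2.29) / math.sqrt(mse * (1/16 + 1/17))  # contrast #1
t2 = (2.40 - 5.00) / math.sqrt(mse * (1/10 + 1/10))  # contrast #2
```

Both values agree with the hand calculation (1.36 and −3.88).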

EXERCISE 17 Full model with all factors and their interactions had SSE = 447.50 and df = 72. Reduced model dropping all terms involving TIME had SSE = 542.822, df = 88.

F(16,72) = [(542.822 − 447.5)/(88 − 72)] / [447.5/72] = 0.96

p value is 0.509. There is no significant evidence that TIME has any kind of effect.
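The full-versus-reduced model F statistic used here can be recomputed from the two SSE values given above:

```python
# F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full)
sse_full, df_full = 447.50, 72
sse_red, df_red = 542.822, 88

f_stat = ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)
# A small F (about 0.96 on 16 and 72 df) gives no evidence of a TIME effect.
```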

Chapter 9 – Page 12


EXERCISE 18 Applying the coefficients for the linear trend to each sample cell mean, L̂ = 267.7900, t = 267.7900 / √(20.2241 · (60/3)) = 13.3. Applying the coefficients for the quadratic trend to each sample cell mean, L̂ = −72.8399, t = −72.8399 / √(20.2241 · (12/3)) = −8.1.

The t statistic has 24 df. Using α = 2.5% (to control for multiple comparisons using Bonferroni) reject Ho if |t| > 2.39. Reject Ho for each contrast. The significant linear trend has a positive coefficient; apparently the expected yield increases with increasing phosphorous. However, there is significant evidence of a quadratic trend and its coefficient is negative, meaning that the increase in yield per unit increase of P is not constant and tends to fall off as P increases. Both these statements are consistent with the profile plot.
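The two trend-contrast t statistics can be recomputed from the quantities shown above (assuming, as the formulas indicate, MSE = 20.2241 and Σc²/n equal to 60/3 and 12/3 for the linear and quadratic contrasts):

```python
import math

# t = L_hat / sqrt(MSE * sum(c^2)/n) for a contrast among cell means
mse = 20.2241
t_lin = 267.7900 / math.sqrt(mse * 60 / 3)    # linear trend contrast
t_quad = -72.8399 / math.sqrt(mse * 12 / 3)   # quadratic trend contrast
```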

EXERCISE 19 a) For each contrast, the coefficients sum to 0, so these are legitimate contrasts. For each pair of contrasts, the products of the corresponding coefficients sum to 0, so they are orthogonal. For example, L1 and L3 have dot-product 1*0 + 1*1 + 1*(-1) + (-1)*0 + (-1)*1 + (-1)*(-1) = 0. b) L4: the difference between Standard and the average of Multi and GasMiser is the same for 6-cylinder cars as it is for 4-cylinder cars. L5: the difference between Multi and GasMiser is the same for 6-cylinder cars as it is for 4-cylinder cars. c) The regression gives the same SSE (1.08433) as the two-way ANOVA. The regression gives t = 0.43 for the Multi vs GasMiser comparison (L3) where the hand calculation gave 0.44; this is probably rounding error. d) The regression gives t = 4.25. Doing it using the cell means gives L̂ = 3.96, t = 3.96 / √(1.08433 · (4/5)) = 4.25, so they do correspond.
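The contrast and orthogonality checks in part (a) can be verified mechanically; a minimal sketch using the example pair L1 and L3 given above:

```python
# A contrast has coefficients summing to zero; two contrasts are
# orthogonal when the dot product of their coefficient vectors is zero.
L1 = [1, 1, 1, -1, -1, -1]
L3 = [0, 1, -1, 0, 1, -1]

is_contrast = sum(L1) == 0 and sum(L3) == 0
dot = sum(a * b for a, b in zip(L1, L3))  # 1*0 + 1*1 + 1*(-1) + ... = 0
```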

Chapter 9 – Page 13


CHAPTER 10 EXERCISE 1

This is a randomized block design with sampling, where Replicate is the block. The proper denominator is the interaction. There is no significant evidence of a difference in mean oxygen consumption by treatment. Hence, this data does not support either of the hypotheses.

Source    DF   Type III SS   Mean Square   F Value   Pr > F
trt        2   2.76847407    1.38423703    6.80      0.0046
rep        1   1.57597920    1.57597920    7.74      0.0103
trt*rep    2   2.12804180    1.06402090    5.23      0.0130

Tests of Hypotheses Using the Type III MS for trt*rep as an Error Term

Source   DF   Type III SS   Mean Square   F Value   Pr > F
trt       2   2.76847407    1.38423703    1.30      0.4346

EXERCISE 2 Each replication is a block, and we have a 3x4 factorial (unreplicated) design within each block. The denominator for the test of SOLN, VAR, and SOLN*VAR uses the pooled SS for all the interactions with REP.

Source             DF   Type III SS   Mean Square   F Value   Pr > F
rep                 2    107.722222    53.861111     0.08     0.9234
soln *              2   92764         46382         68.86     <.0001
var *               3   9177.000000   3059.000000    4.54     0.0127
soln*var            6   1776.000000    296.000000    0.44     0.8445
Error: MS(Error)   22   14819          673.588384

The profile plot is consistent with the significant evidence for a strong effect of the solution amount. Variety #1 seems to have a higher mean weight, while variety #3 the lowest. The analysis confirms a significant main effect due to Variety. [Profile plot: mean weight (0–200) vs. solution amount (5–15), by variety.]

Chapter 10 - Page1


EXERCISE 3. a) It appears that Variety A may have slightly lower Yield, when averaged over Nitrogen levels. There may be an interaction, judging from the way that Variety B has higher observed yield under the low nitrogen level (marked with an L on the plot). [Profile plot: mean YIELD (9.2–11.0) vs. VARIETY (A, B, C), with H and L marking the nitrogen levels.]

b)

Source         DF   Type III SS   Mean Square   F Value   Pr > F
YEAR            2   29.83474444   14.91737222   9.65      0.0018
VAR             2    6.23078611    3.11539306   2.02      0.1657
NIT             1    0.00027222    0.00027222   0.00      0.9896
YEAR*VAR        4   33.53125556    8.38281389   5.42      0.0059
YEAR*NIT        2   22.92551111   11.46275556   7.42      0.0053
VAR*NIT         2   15.66710278    7.83355139   5.07      0.0197
REP             3    1.98078333    0.66026111   0.43      0.7363
REP*YEAR        6    2.03080000    0.33846667   0.22      0.9650
REP*VAR         6    1.01085833    0.16847639   0.11      0.9941
REP*NIT         3    0.68818333    0.22939444   0.15      0.9292
REP*YEAR*VAR   12    4.81083333    0.40090278   0.26      0.9887
REP*YEAR*NIT    6    3.08036667    0.51339444   0.33      0.9101
REP*VAR*NIT     6    3.17314167    0.52885694   0.34      0.9043

Main effect for VARIETY: F(2,6) = [6.2308/2] / [1.0109/6] = 18.49, p value = 0.0027. There is significant evidence of a main effect for variety. Main effect for NITROGEN: F(1,3) = [0.0003/1] / [0.6882/3] = 0.00, p value = 0.975. There is no significant main effect for Nitrogen. Interaction effect for VAR*NIT: F(2,6) = [15.667/2] / [3.173/6] = 14.81, p value = 0.0048. There is a significant interaction; that is, the effect of NIT is different in at least one variety. All these results are consistent with the profile plot. c) These results are specifically for random blocks (REPs) in these years, and not for years in general.
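The split-plot F ratios in part (b), each formed by dividing a mean square by the matching REP-interaction mean square, can be recomputed from the table values:

```python
# F = (effect SS / effect df) / (REP-interaction SS / its df)
f_var = (6.2308 / 2) / (1.0109 / 6)   # VARIETY tested against REP*VAR
f_nit = (0.0003 / 1) / (0.6882 / 3)   # NITROGEN tested against REP*NIT
f_int = (15.667 / 2) / (3.173 / 6)    # VAR*NIT tested against REP*VAR*NIT
```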

Chapter 10 - Page2


EXERCISE 4 In a classical split plot design, the experiment for the main plot (VARiety) is treated as a randomized block where block is the REPlication. Hence, the test for variety will use the REP*VAR interaction (sometimes called error(a)). The tests for TRT and VAR*TRT will use the pooled SS for the interactions with REP, excluding REP*VAR.

Dependent Variable: brate

Source *           DF   Type III SS   Mean Square   F Value   Pr > F
trt                 1   1.071019      1.071019      21.61     0.0002
trt*var             5   0.589569      0.117914       2.38     0.0799
rep(var)           18   2.938962      0.163276       3.29     0.0076
Error: MS(Error)   18   0.891963      0.049553

* This test assumes one or more other fixed effects are zero.

Source *              DF   Type III SS   Mean Square   F Value   Pr > F
var                    5   4.142785      0.828557      5.07      0.0045
Error: MS(rep(var))   18   2.938963      0.163276

Chapter 10 - Page3


The profile plot is useful for understanding the results. The varieties with the highest blooming rate appear to be varieties 4 and 5, particularly with TRT=2 (the higher planting rate). Tukey's HSD applied to the main effects for VAR, using the MS(REP(VAR)) interaction as error, showed that varieties 4 and 5 were significantly higher than 1 and 6, but could not confirm a difference with 2 and 3. [Profile plot: mean BRATE (2.4–4.0) vs. Variety (1–6).]

Dependent Variable: mrate

Source *           DF   Type III SS    Mean Square   F Value   Pr > F
Trt                 1      4.687500      4.687500    0.09      0.7691
trt*var             5    481.937500     96.387500    1.83      0.1583
rep(var)           18   1362.875000     75.715278    1.43      0.2256
Error: MS(Error)   18    949.875000     52.770833

* This test assumes one or more other fixed effects are zero.

Source *              DF   Type III SS    Mean Square   F Value   Pr > F
Var                    5    160.937500     32.187500   0.43      0.8252
Error: MS(rep(var))   18   1362.875000     75.715278

MRATE does not seem to differ by any of the factors.

To choose the best combination, it would seem sufficient to focus on the best BRATE.

Chapter 10 - Page4


EXERCISE 5 a) The profile plot uses a symbol to mark the number of the level for SALT, 1 = lowest and 4 = highest. The highest level of SALT (4) tends to have a lower mean percent of seeds emerging at every day. The effect of SALT is most noticeable early, at 5 days, where an increasing amount of salt is clearly associated with a decreasing percentage of seeds emerging. The effect fades over time (a possible interaction). By DAY 14, only the highest level of salt seems different. [Profile plot: mean emergence (10–90) vs. DAY (5–14), by SALT level 1–4.]

b) Salt is a between subjects factor, since each plot (REP, which is the subject) was entirely treated with the same level of Salt. DAY is a within subjects factor.

Source       DF   Type III SS   Mean Square   F Value   Pr > F
DAY           3   8129.395833   2709.798611   347.16    <.0001
SALT          3   4343.395833   1447.798611   185.48    <.0001
DAY*SALT      9   3451.520833    383.502315    49.13    <.0001
REP(SALT)     8   1666.666667    208.333333    26.69    <.0001
ERROR        24    187.3333        7.80556

Main effect for SALT: F(3,8) = [4343.396/3] / [1666.667/8] = 6.95, p value = 0.0128. There is significant evidence of a main effect for SALT. Interaction for Day*SALT: F(9,24) = 49.13, p value < 0.0001. There is significant evidence for a SALT*DAY interaction.

c) The hypothesis tests confirm the main effect for SALT, which is apparently of the form that the higher the SALT level, the less the percentage of emerging grass. However, as confirmed by the interaction, this effect varies by DAY, and seems to be most important at days 5 and 8.

Chapter 10 - Page5


EXERCISE 6. From the profile plot (averaging together data from the two panels) there seems to be a strong effect due to temperature. Meat stored at the higher temperature (level 4, 38 degrees) seems considerably worse than at the other temperatures. While all the meat tends to taste worse as time progresses, the effect is particularly noticeable among meat stored at the highest temperature. [Profile plot: mean rating (2–7) vs. TIME (1–13), by temperature level.]

For the formal analysis, I treated this as a two-way factorial design (TIME and TEMP as factors) embedded in a randomized block where BLOCKS were the panels of judges. Normally, it is important to include a panel or judge effect, as panels can differ in their preferences or scoring attributes. PANEL is a random effect. The interactions of PANEL with the other factors are pooled to estimate Error.

Source        DF   Type III SS   Mean Square   F Value   Pr > F
PANEL          1    0.77841000   0.77841000     5.53     0.0366
TEMP           3   20.54393000   6.84797667    48.65     <.0001
TIME           4   15.27168500   3.81792125    27.12     <.0001
TEMP*TIME     12    4.88609500   0.40717458     2.89     0.0390
PANEL*TEMP     3    0.23781000   0.07927000     0.56     0.6496
PANEL*TIME     4    0.39696500   0.09924125     0.70     0.6036
Error         12    1.68921500   0.14076792

Pooled estimate of error is (0.23781+0.39697+1.68922)/19 = 0.1223. Test for TIME: F(4,19) = 3.818/0.1223 = 31.2, p value < 0.0001. There is a strong main effect for time. Test for TEMP: F(3,19) = 6.848/0.1223 = 55.99, p value < 0.0001. There is a strong main effect for temperature. Test for TIME*TEMP: F(12,19) = 0.4072/0.1223 = 3.33, p value = 0.0095. There is evidence of an interaction.
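The pooled-error tests above can be recomputed from the table: the PANEL*TEMP, PANEL*TIME and residual SS are pooled over their combined 19 df, and each Type III mean square is divided by that pooled MS.

```python
# Pooled error: (SS_panel*temp + SS_panel*time + SS_error) / (3 + 4 + 12)
pooled_ms = (0.23781 + 0.39697 + 1.68922) / (3 + 4 + 12)

f_time = 3.81792125 / pooled_ms   # TIME main effect
f_temp = 6.84797667 / pooled_ms   # TEMP main effect
f_int = 0.40717458 / pooled_ms    # TIME*TEMP interaction
```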

All these results are consistent with the profile plot.

Chapter 10 - Page6


EXERCISE 7 a) If Rep is nested within Light, as in Example 10.7, and the Rep(Light) effect is random, we get an analysis as shown here. There are significant main effects due to Light and Leaf but no interaction.

b) If both Light and Leaf are treated as quantitative variables, we might try a quadratic in both variables with a crossproduct. The predicted efficiency is given by 1.781 + 0.013·LIGHT − 0.000·LIGHT² + 0.267·LEAF − 0.047·LEAF² + 0.0005·LIGHT*LEAF. This model has SSR = 2.463 with 5 df. c) By comparison, the model fit in part (a) has SSR = 5.431 with 26 df (it is a much more complicated model). The lack of fit test is F(21,48) = [(5.431 − 2.463)/21] / 0.02236 = 6.32, p value < 0.0001, showing that the quadratic model does not fit this data very well.
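The lack-of-fit F in part (c) compares the quadratic model (nested in the factorial model of part (a)) using the factorial model's error MS, and can be recomputed from the values above:

```python
# F = [(SSR_full - SSR_reduced) / (df_full - df_reduced)] / MSE_full
ssr_full, df_full = 5.431, 26     # factorial (part a) model
ssr_quad, df_quad = 2.463, 5      # quadratic (part b) model
mse_full = 0.02236                # error MS of the factorial model

f_lof = ((ssr_full - ssr_quad) / (df_full - df_quad)) / mse_full
```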

EXERCISE 8 The profile plot suggests that Additive 1 is overall the best, and Additive 3 the worst, but with such large variability among cars it isn't clear whether the effect is significant. [Profile plot: mean MPG (16–23) vs. ADDITIVE (1, 2, 3).]

Treating this as a randomized block with subsampling, the denominator of the F-test for a main effect for Additive would be the Car*Additive interaction. This F(2,4) = 3.39 has p value = 0.1379, which is not significant; that is, there is no significant evidence of a difference in the additives. This may seem surprising given the strength of the effect in the graph and in the ordinary (incorrect) two-way ANOVA. In essence, this analysis says that the observed difference

Chapter 10 - Page7


could be explained as a chance effect due to the particular characteristics of the three cars in the experiment. Note that more repetitions per car/additive combination will not increase the small number of degrees of freedom in the denominator of the F-test for Additive. Perhaps another car would be needed.

Results of Ordinary two-way ANOVA

Source         DF   Type III SS   Mean Square   F Value   Pr > F
car             2   65.15166667   32.57583333   48.57     <.0001
additive        2   31.26166667   15.63083333   23.30     <.0001
car*additive    4   18.46666667    4.61666667    6.88     0.0006
Error          27   18.1100000     0.6707407

Table of Expected Mean Squares

Source         Type III Expected Mean Square
car            Var(Error) + 4 Var(car*additive) + 12 Var(car)
additive       Var(Error) + 4 Var(car*additive) + Q(additive)
car*additive   Var(Error) + 4 Var(car*additive)

Tests based on table of Expected Mean Squares

Source     DF   Type III SS   Mean Square   F Value   Pr > F
car         2   65.151667     32.575833     7.06      0.0488
additive    2   31.261667     15.630833     3.39      0.1379
Error       4   18.466667      4.616667
Error: MS(car*additive)

Source             DF   Type III SS   Mean Square   F Value   Pr > F
car*additive        4   18.466667     4.616667      6.88      0.0006
Error: MS(Error)   27   18.110000     0.670741

Chapter 10 - Page8


EXERCISE 9 These analyses were run as a split-plot design with COWID nested within COLOR as a random effect.

Surface: The profile plot suggests that there are very strong treatment effects, with Fan and Mist doing the best job of keeping the cows at a lower surface temperature. There is not much of an effect due to color (used as the plotting symbol), nor is there a very strong interaction. The ANOVA is in agreement: there are strong main effects for Treatment, but no significant evidence of a main effect for color or a Treatment*Color interaction. [Profile plot: mean surface temperature (35–40) vs. Treatment (fan, mist, shade, sun), with B and W marking cow color.]

Source         DF   Type I SS     F Value   Pr > F
trt             3   73.54093750   16.61     <.0001
cowid(color)    2    1.7856        0.60     0.5550
trt*color       3    3.17593750    0.72     0.5523

Tests of Hypotheses Using the Anova MS for COWID(Color) as an Error Term

Source   DF   Anova SS     Mean Square   F Value   Pr > F
color     1   0.00281250   0.00281250    0.00      0.9603

Rectal: The profile plot for the rectal temperatures shows less clarity. There is possibly a tendency for cows with the Fan treatment to have lower temperatures. There may be a tendency for white cows to have lower temperature, though there is an exception in the shade group. The ANOVA does not find any significant effects, either for main effects or interactions. [Profile plot: mean rectal temperature (39.4–40.2) vs. Treatment (fan, mist, shade, sun), with B and W marking cow color.]

Source         DF   Type III SS   Mean Square   F Value   Pr > F
trt             3   1.25250000    0.41750000    1.79      0.1783
cowid(color)    2   0.57125000    0.28562500    1.23      0.3130
trt*color       3   0.52125000    0.17375000    0.75      0.5366

Tests of Hypotheses Using the Type III MS for cowid(color) as an Error Term

Source   DF   Type III SS   Mean Square   F Value   Pr > F
color     1   0.06125000    0.06125000    0.21      0.6888

Chapter 10 - Page9


EXERCISE 10 An analysis using Route as block (a random effect) and year as a qualitative variable showed no significant main effect for year. However, a linear contrast for year is significant (F = 6.46, p = 0.0143), showing a trend upward at a rate of about 11.70 per year.

Source             DF   Type III SS   Mean Square   F Value   Pr > F
route               2   4172531       2086265       25.20     <.0001
year               24   2693219        112217        1.36     0.1822
Error: MS(Error)   48   3973337         82778

Contrast   DF   Contrast SS   Mean Square   F Value   Pr > F
linyear     1   534713.7323   534713.7323   6.46      0.0143

Fortunately, the residuals within each route/year are not strongly correlated with those of the previous year. However, the residuals show positive skew with larger variability for routes A and B (construct a boxplot).

An analysis using ln(BIRD) as the dependent variable produced more nearly symmetric and equally distributed boxplots of residuals, but also more clearly correlated errors. It also shows strong evidence of a linear trend in ln(BIRD).

EXERCISE 11 a) Averaging within each lab / material combination and submitting the resulting means to a randomized block design where Labs are the blocks yields exactly the same F values for LAB (F(12,72) = 7.64) and MATERIAL (F(6,72) = 135.48). We can no longer test the LAB*Material interaction. b) As we average together the n values within each cell, the resulting mean has variance σ²/n + σ²int (where σ²int is the interaction variance component). So the leading two terms in the EMS for treatments have decreased by a factor of 1/n. However, the third term now has n = 1, so the third term will also be decreased by a factor of 1/n. Hence, all terms in the numerator and denominator of the F-test will decrease by the same factor, and the F-value will not change.
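The invariance argument in part (b) can be sketched symbolically. Here σ²int denotes the interaction variance component and θtrt the fixed treatment term (notation assumed, not from the text); dividing the numerator and denominator of the F ratio by n leaves the ratio unchanged:

```latex
F \;=\; \frac{\sigma^2 + n\sigma^2_{\text{int}} + \theta_{\text{trt}}}
             {\sigma^2 + n\sigma^2_{\text{int}}}
  \;=\; \frac{\sigma^2/n + \sigma^2_{\text{int}} + \theta_{\text{trt}}/n}
             {\sigma^2/n + \sigma^2_{\text{int}}}
```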

Chapter 10 - Page10


EXERCISE 12 This is a repeated measures design, with a two-way factorial experiment embedded within each subject. The effects of interest are SHOCK, NOISE and the SHOCK*NOISE interaction, all within-subjects factors. The denominator for the test of each effect is its interaction with SUBJECT. The three-way interaction is used to estimate error. Both Shock and Noise have significant main effects but the interaction is not significant. Focusing on the main effects, mean response at Noise=80 is significantly higher than at 40. To compare main effects for Shock, we use Tukey's HSD with error term set to the same Shock*Subject interaction used to construct the F-test. The mean response at Shock=0.75 was significantly higher than that at either Shock=1 or 0.5. The mean response at Shock=0.25 was significantly lower than at any other level.

Source        DF   Type III SS   Mean Square   F Value   Pr > F
noise          1    65.0250000    65.0250000   42.29     <.0001
shock          3   219.2750000    73.0916667   47.54     <.0001
subj           4   361.8500000    90.4625000   58.84     <.0001
shock*noise    3    12.6750000     4.2250000    2.75     0.0890
noise*subj     4     6.3500000     1.5875000    1.03     0.4300
shock*subj    12    39.3500000     3.2791667    2.13     0.1020
Error         12    18.4500000     1.5375000

Source *   DF   Type III SS   Mean Square   F Value   Pr > F
noise       1    65.025000     65.025000    40.96     0.0031
Error       4     6.350000      1.587500
Error: MS(noise*subj)
* This test assumes one or more other fixed effects are zero.

Source *   DF   Type III SS   Mean Square   F Value   Pr > F
shock       3   219.275000     73.091667    22.29     <.0001
Error      12    39.350000      3.279167
Error: MS(shock*subj)
* This test assumes one or more other fixed effects are zero.

Source   DF       Type III SS   Mean Square   F Value   Pr > F
subj      4       361.850000    90.462500     27.17     0.0004
Error     6.4322   21.413805     3.329167
Error: MS(shock*subj) + MS(noise*subj) - MS(Error)

Chapter 10 - Page11


Source             DF   Type III SS   Mean Square   F Value   Pr > F
shock*noise         3   12.675000     4.225000      2.75      0.0890
Error: MS(Error)   12   18.450000     1.537500

If you analyze this data as a two-way factorial embedded in a block (subjects as blocks, as in Section 10.4) the model will not usually include the subject*treatment interactions. These will be pooled with the three way interaction to form the estimate of error, yielding 28 df for SHOCK, NOISE and SHOCK*NOISE tests. This may be reasonable if all the Subject interactions appear weak, but in repeated measures designs, they frequently are not weak, as subjects vary in their reactions to the treatments. This is the primary difference between a repeated measures analysis with two within-subjects factors, and a factorial embedded in a block.

EXERCISE 13 The individual fish within each bucket are not independent of each other, because there is possibly an effect for each bucket’s individual characteristics. Including an effect of bucket in the analysis should reduce the dependence between the individual fish. This bucket effect should be a random effect nested within Treatment level. The resulting F-statistics for the TREAT effect are WEIGHT: F(4,15) = 16.54, p < 0.0001 LENGTH: F(4,15) = 8.31, p = 0.001 RELWT: F(4,15) = 27.91, p < 0.0001 That is, feeding level does have a significant effect on each of these size factors. Profile plots suggest that increasing TREAT leads to larger size, until about TREAT=1 or 2. Thereafter, further increases do not lead to larger size.

Note that the individual buckets are independent of each other, so we can analyze the total, or average, weight of the fish in the buckets using a one-way ANOVA. The resulting F statistics are exactly like those given above.

Chapter 10 - Page12


EXERCISE 14 a) Now the design is a factorial experiment (SALT*DAY) embedded in a randomized block (REP). The interactions with REP are pooled to estimate experimental error.

Source             DF   Type III SS   Mean Square   F Value   Pr > F
REP                 2    251.375000    125.687500    2.35     0.1124
SALT *              3   4343.395833   1447.798611   27.10     <.0001
DAY *               3   8129.395833   2709.798611   50.73     <.0001
DAY*SALT            9   3451.520833    383.502315    7.18     <.0001
Error: MS(Error)   30   1602.625000     53.420833

* This test assumes one or more other fixed effects are zero.

b) The difference between the two problems is that in the description of problem 5, we actually have 12 plots (three at each level of SALT) that are observed repeatedly on 4 different days. This makes it a repeated measures model with DAY as a within-subjects factor and SALT as a between subjects factor (a subject is a plot, identified as a combination of REP and SALT). In problem 14, however, there are actually three locations (identified by REP), assumed randomly selected. Rather than repeatedly observing the same plot multiple times, we select 16 separate plots. Some are observed at day 5, others at day 8, etc. We might do this if the actual observation process of measuring emergence damaged the plot in some way (e.g. by walking on it or digging in it), making it impossible to measure the same plot repeatedly.

Chapter 10 - Page13


CHAPTER 11 EXERCISE 1

Source   DF   Type III SS   Mean Square   F Value   Pr > F
wwt       1   394.0805861   394.0805861   34.24     <.0001
weaned    2   289.8186081   144.9093040   12.59     0.0004

There is significant evidence that weaning time is related to FWT, even after adjusting for WWT. Pairwise comparisons using Bonferroni's method to control the family-wise error rate show that the LATE group is significantly less than either the EARLY or MEDIUM group, after adjusting for weaning weight. Check for unequal slopes: F = [(195.6337 − 179.2549)/(17 − 15)] / 11.9503259 = 0.6853 (p value 0.519). From Table A.4, the critical value is F(2,15) = 3.68, so we do not reject Ho that the slopes are equal. However, piglets from within the same litter are probably not independent because of maternal and genetic effects. Piglets were not randomly assigned to weaning time (needed to keep all piglets of the same litter together).
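The unequal-slopes check is a nested-model F test; it can be recomputed from the two SSE values and the MSE given above:

```python
# F = [(SSE_equal_slopes - SSE_separate_slopes) / (df difference)] / MSE
sse_common, df_common = 195.6337, 17     # common-slope model
sse_sep, df_sep = 179.2549, 15           # separate-slopes model
mse = 11.9503259

f_stat = ((sse_common - sse_sep) / (df_common - df_sep)) / mse
```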

EXERCISE 2. a) Unequal variance t’(12.1)=0.94, p value = 0.3675. Pooled t test has similar conclusion, t(14)=0.94, p value = 0.3651. Though the Calcium plus exercise group had a slightly higher change in bone density, the difference was not significant. b) overall mean BMI = 25.9625 Among Calcium group, expected change = 3.8325 - 0.0956*25.9625 = 1.35 Among Ca+Ex group, expected change = 3.8325 + 0.4233 - 0.0956*25.9625 = 1.77 Difference is significant (t(13)=2.77, p value = 0.0158) c) Interaction formed by multiplying dummy for group times BMI did not significantly improve model (t (12)=-0.94, p value = 0.3666) d) ANCOVA is valid because treatment was assigned randomly. Source of difference is that BMI explained some of the variation in the response, reducing the size of the error variance.
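The covariate-adjusted group means in part (b) evaluate the fitted model at the overall mean BMI; recomputing from the coefficients given above:

```python
# Adjusted mean = intercept + group effect + slope * mean covariate
bmi_bar = 25.9625
calcium = 3.8325 - 0.0956 * bmi_bar            # Calcium-only group
ca_ex = 3.8325 + 0.4233 - 0.0956 * bmi_bar     # Calcium + exercise group
```

These reproduce the adjusted changes of 1.35 and 1.77.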

EXERCISE 3 a) People in Class 1 have expected value β0 + β3·PRE and those in Class 3 have expected value β0 + β2 + β3·PRE. The expected difference is β2, with confidence interval 4.058 ± 2.0066·√2.664 = (0.78, 7.33).

Chapter 11 – Page 1


b) People in Class 2 have expected value β0 + β1 + β3·PRE. The difference between a person in Class 3 and one in Class 2 (having the same PRE) is β2 − β1. Confidence interval:

(4.058 − (−0.957)) ± 2.0066·√(2.664 − 2(1.095) + 2.504) = 5.015 ± 3.463 = (1.55, 8.48)

c) A person in Class 3 with PRE = 6 has expected value β0 + β2 + β3·6 and a person in Class 1 with PRE = 10 has expected value β0 + β3·10. The difference is β2 − 4β3. The confidence interval is:

(4.058 − 4·0.773) ± 2.0066·√(2.664 − 2·4·0.026 + 16·0.029) = 0.966 ± 3.429 = (−2.46, 4.40)
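The three confidence intervals can be recomputed from the estimates, the t multiplier 2.0066, and the variance expressions above:

```python
import math

t = 2.0066
# a) CI for beta2
ci_a = (4.058 - t * math.sqrt(2.664), 4.058 + t * math.sqrt(2.664))
# b) CI for beta2 - beta1 (variance uses the covariance term)
half_b = t * math.sqrt(2.664 - 2 * 1.095 + 2.504)
ci_b = (5.015 - half_b, 5.015 + half_b)
# c) CI for beta2 - 4*beta3
est_c = 4.058 - 4 * 0.773
half_c = t * math.sqrt(2.664 - 2 * 4 * 0.026 + 16 * 0.029)
ci_c = (est_c - half_c, est_c + half_c)
```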

EXERCISE 4 a) Using an ANCOVA, trees on the south side have the fitted equation Circum = -17.084 + 2.701*Rings, while those on the north side have the fitted equation Circum = 40.366 + 2.701*Rings. The expected difference between two trees with the same number of rings, one on the north side and one on the south, is 57.45, which is significantly different from 0 (t = 8.13, p value = 0.0001). The overall mean number of rings was 27.714. Given this number of rings, the adjusted circumference on the north side is 115.2 and that on the south side is 57.8. Trees were not randomly assigned to sides of the mountain. While the number of rings tended to be higher on the south side, the difference was not significant. b) An interaction between Side and Rings was not significant (t = 1.68, p value = 0.1003). c) Growth is the growth rate per year, and is a direct measure of the relation of tree size to age. An independent samples t test (unequal variance version) can be used to compare growth rates on the two sides. The north side had substantially higher growth rates (t = 6.8 with 31 df, p value = 0.0001).

EXERCISE 5 a) Expected difference = E(width – length) = E(width) – E(length) = 12 – 12 = 0 b) Variance = 0.0025 – 2*(0.0026) + 0.0064 = 0.0037 and the standard deviation is 0.0608
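Part (b) of Exercise 5 applies the variance rule for a difference of correlated measurements, Var(W − L) = Var(W) − 2·Cov(W, L) + Var(L); recomputing with the stated values:

```python
import math

var_w, var_l, cov_wl = 0.0025, 0.0064, 0.0026
var_diff = var_w - 2 * cov_wl + var_l   # variance of (width - length)
sd_diff = math.sqrt(var_diff)           # standard deviation
```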

EXERCISE 6 a) The cell mean plot shows that for most risk groups, the mean hospital time is less in the Antiviral group than in the Placebo group. However, in 3 of the 5 older-age risk groups, there is very little difference. Risk Group digits are in the order of Diabetes, Hypertension, Imm. Suppression, Age.

Chapter 11 – Page 2


b) F(21,203) = 12.12, p value < .0001. At least one group has a different mean hospital stay. c) The antiviral has a significant main effect (t(215) = -2.75, p value = 0.0064), showing it tends to reduce hospital time on average by 2.79 days. However, there is a significant interaction with AGE with a regression coefficient of 2.48 days that lessens the impact of the antiviral among older patients. This is consistent with the cell mean plot. This is the only significant interaction. Diabetes, Immune Suppression and Age are all risk factors that tend to increase hospitalization time. d) The full model (part b) had SSE = 1794.79 and df = 203. The restricted model (part c) had SSE = 1907.525 with df = 215. F(12,203) = [(1907.525 − 1794.79) / (215 − 203)] / (1794.79/203) = 1.06, p value = 0.396. There is no significant evidence that the model in part (c) does not fit, so we can accept the model in part (c).
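The model-comparison F in part (d) can be recomputed from the two SSE values above:

```python
# F = [(SSE_restricted - SSE_full) / (df difference)] / (SSE_full / df_full)
sse_full, df_full = 1794.79, 203     # cell-means model (part b)
sse_red, df_red = 1907.525, 215      # main-effects model (part c)

f_stat = ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)
```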



EXERCISE 7

a)

Proc ANOVA: MSE = 2.4819

Source        DF   Anova SS      Mean Square   F Value   Pr > F
pave           2   216.7736862   108.3868431     43.67   <.0001
tread          2   203.6755235   101.8377618     41.03   <.0001
pave*tread     4    22.1545082     5.5386271      2.23   0.1086

Dummy variables, using PROC GLM with a CLASS statement for PAVE and TREAD:

Source        DF   Type III SS   Mean Square   F Value   Pr > F
pave           2   233.5838333   116.7919167     47.06   <.0001
tread          2   212.4633531   106.2316765     42.80   <.0001
pave*tread     4     6.6989814     1.6747453      0.67   0.6186

PROC ANOVA is NOT appropriate for unbalanced multi-way ANOVA; use PROC GLM.

b) Performing an ANCOVA with TREAD as the covariate:

Source   DF   Type III SS   Mean Square   F Value   Pr > F
pave      2   232.8180697   116.4090348     52.31   <.0001
tread     1   219.0624849   219.0624849     98.44   <.0001

This model had MSE 2.2254, which is even better than the MSE from the full two-way ANOVA model (MSE = 2.482). The Tread*Pave interaction was not significant and was dropped.

c) μ(y|x) = 26.19377022 + 28.65978927(friction) + 1.37431143(tread), with MSE = 2.474. This model does not fit as well as the model in (b). The relationship is most likely non-linear in Friction.

EXERCISE 8 a) The overall mean dosage (X) is 6.014. At that dosage, the fitted mean half_life for drug A is 1.85 and that for drug G is 2.04. The fitted difference is 0.19 hours greater for Gentamicin. However, this difference is so small that there is no significant evidence that the drugs differ in their expected half-life, after controlling for dosage (t = 0.29, p value = 0.7714). b) There is no significant evidence that the slopes differ (t = -0.17, p value = 0.8694.) However, note that the dosage (the covariate) differs greatly for the two groups (that is, dosage is not independent of drug type), so that it is not reasonable to compare half-life for the two drugs at the same dose.



EXERCISE 9

a)

Source            DF   Type III SS   Mean Square   F Value   Pr > F
medium             2   3137.391892   1568.695946     50.97   <.0001
minutes            3   1514.467662    504.822554     16.40   0.0005
medium*minutes     6    514.574392     85.762399      2.79   0.0812

There is no significant evidence of an interaction. There is significant evidence that the mean number of words differs by Medium and by number of Minutes. Using LSMEANS and Tukey's HSD to compare main effects, it appears that the mean number of words is higher at 10 and 20 minutes than at 5 and 15. Each of the media is significantly different from the others; written material seems to be the highest.

b) There seems to be a tendency for children in the longer time periods to drop out. This may produce a bias, if it is the weaker learners, who might be bored, who are falling asleep. There is also a tendency for students in the TV group to be less likely to drop out. Again, this may produce a bias.

EXERCISE 10 Note: all variables were transformed using natural logarithms, giving a log-linear model. A dummy variable was created for season, IS = 0 for cool, 1 for warm. Two interactions were formed by multiplying IS times ln(WIDTH) and IS times ln(HEIGHT). The full model, including independent variables ln(WIDTH), ln(HEIGHT), IS, IS*ln(WIDTH) and IS*ln(HEIGHT), had SSE = 23.2702 with 53 df. The restricted model using only ln(WIDTH) and ln(HEIGHT) had SSE = 31.33024 with df = 56. The F test for the hypothesis that the 3 parameters related to season are 0 is F(3,53) = [(31.33024 - 23.2702)/3] / (23.2702/53) = 6.12, which has p value = 0.0012. There is significant evidence that the relation between ln(weight) and ln(width) and ln(height) differs by season. Inspecting the fitted regression coefficients in the full model, it appears that the major difference is with respect to ln(height). In the cool season, the partial regression coefficient for ln(height) is +0.747; in the warm season, it is +0.747 - 0.538 = 0.209.
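The same full-versus-restricted F computation applies here; a scipy sketch (not part of the original solution) with the SSE values quoted above:

```python
# F test that the 3 season-related parameters are all 0 (Exercise 10).
from scipy.stats import f

sse_full, df_full = 23.2702, 53       # model with IS and its interactions
sse_restr, df_restr = 31.33024, 56    # model without any season terms

F = ((sse_restr - sse_full) / (df_restr - df_full)) / (sse_full / df_full)
p = f.sf(F, df_restr - df_full, df_full)   # p value, quoted as 0.0012 above
```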



EXERCISE 11 a) Gas price (in cents) = 112.727 + 2.258(oil price in dollars per barrel).

b) The plot of the residuals against the previous month's residuals shows a moderately strong positive association. The residuals do not appear to be independent. (Plot: Residual versus Previous residual.)

c) The Durbin-Watson statistic was 0.744, and there was significant evidence of a positive correlation in the residuals. This is consistent with the plot in (b).

d) Since the residuals are positively correlated, we would expect gas prices to be lower than predicted next month also.

EXERCISE 12 a) Here is the covariance matrix of the parameter estimates:

            Intercept   ISeason    LogHT     LogWT    LogHT*IS   LogWT*IS
Intercept     0.423     -0.423    -0.051    -0.155     0.051      0.155
ISeason      -0.423      0.647     0.051     0.155    -0.080     -0.211
LogHT        -0.051      0.051     0.041    -0.013    -0.041      0.013
LogWT        -0.155      0.155    -0.013     0.089     0.013     -0.089
LogHT*IS      0.051     -0.080    -0.041     0.013     0.061     -0.021
LogWT*IS      0.155     -0.211     0.013    -0.089    -0.021      0.120

b) Mean ln(height) = 2.185, mean ln(weight) = 2.287. The mean difference (warm season - cool season), using the average values for ln(height) and ln(weight), is iseason + islhd*2.185 + islwd*2.287. Point estimate = 0.175 - 0.538*2.185 + 0.098*2.287 = -0.776 (warm less than cool for given height, weight). In the calculation of the variance, use a' = (0 1 0 0 2.185 2.287). Variance = 0.042, std deviation = 0.203, t0.025(53) = 2.006. Confidence interval for the difference in ln(weight) = -0.776 ± (2.006)*0.203 = (-1.18, -0.37). This is a confidence interval for ln(wt) in warm - ln(wt) in cool.
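The variance of the estimated seasonal difference is the quadratic form a'Va. A numpy sketch (our reconstruction, using the rounded covariance entries quoted in part (a) and treating the matrix as exactly symmetric):

```python
# Quadratic form a'Va for the variance of the warm-cool difference
# (Exercise 12b).  V is the covariance matrix of the parameter
# estimates; a picks out iseason + 2.185*islhd + 2.287*islwd.
import numpy as np

V = np.array([
    [ 0.423, -0.423, -0.051, -0.155,  0.051,  0.155],
    [-0.423,  0.647,  0.051,  0.155, -0.080, -0.211],
    [-0.051,  0.051,  0.041, -0.013, -0.041,  0.013],
    [-0.155,  0.155, -0.013,  0.089,  0.013, -0.089],
    [ 0.051, -0.080, -0.041,  0.013,  0.061, -0.021],
    [ 0.155, -0.211,  0.013, -0.089, -0.021,  0.120],
])
a = np.array([0, 1, 0, 0, 2.185, 2.287])

var = a @ V @ a                                  # about 0.042
sd = np.sqrt(var)                                # about 0.203
point = 0.175 - 0.538 * 2.185 + 0.098 * 2.287    # about -0.776
ci = (point - 2.006 * sd, point + 2.006 * sd)    # about (-1.18, -0.37)
```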



EXERCISE 13 a) Among Gizzard Shad, a two-way ANOVA with dependent variable LENGTH and factors NET_TYPE and mesh SIZE showed no significant evidence of an interaction.

Source          DF   Type III SS   Mean Square   F Value   Pr > F
SIZE             1   913381.3159   913381.3159   3881.24   <.0001
NET_TYPE         1       85.5505       85.5505      0.36   0.5472
SIZE*NET_TYPE    1      461.1210      461.1210      1.96   0.1630
Error          220    51773.107       235.332

Dropping the interaction and fitting a model with only main effects showed significant evidence of an effect due to mesh SIZE (F(1,221) = 4968.49, p value < 0.0001) and due to NET_TYPE (F(1,221) = 4.42, p value = 0.0367). Mesh size seems to be the most important factor, with Mesh Size = 2 associated with much larger fish.

b) Among Other fish, a two-way ANOVA showed a significant interaction of NET_TYPE and Mesh SIZE (F(1,33) = 4.72, p value = 0.0371). Size had a strong main effect (F(1,33) = 17.92, p value = 0.0002). Examining the means in the 2x2 table, it appears that for Mesh SIZE = 1, fish were larger with NET_TYPE = Mult, but when Mesh SIZE = 2, fish were larger with NET_TYPE = Mono. A profile plot is useful in understanding the interaction; the plotting symbol '1' is for Mono and '2' is for Mult type nets.

EXERCISE 14 Dropping the effect of P = Planting Rate (and all its interactions) and replacing it with NO = actual number of plants, treated as a quantitative independent variable, yields the following tests:

Dependent Variable: TDM

Source   DF       Type III SS   Mean Square   F Value   Pr > F
REP       3        4.730618      1.576873       1.11    0.4661
Error     2.9966   4.246301      1.417055
  Error: 1.0007*MS(REP*WTR) - 0.0007*MS(Error)

Source   DF       Type III SS   Mean Square   F Value   Pr > F
WTR       1       45.102474     45.102474      31.83    0.0110
Error     3.0031   4.254776      1.416796
  Error: 0.9994*MS(REP*WTR) + 0.0006*MS(Error)

Source      DF   Type III SS   Mean Square   F Value   Pr > F
REP*WTR      3     4.250757      1.416919      1.17     0.3270
NRATE        2    87.780296     43.890148     36.18     <.0001
NRATE*WTR    2    50.612990     25.306495     20.86     <.0001
no           1    30.817525     30.817525     25.40     <.0001
Error       83   100.694635      1.213188
  Error: MS(Error)

Note that the MS(ERROR) is much larger here (1.213) than it was in Table 10.22 (0.899), suggesting that dropping the interactions of Planting Rate with NRATE and REP was not a good idea. That is, it may be that the lines are not parallel.

Repeating the analysis using NO*WTR, NO*NRATE and NO*WTR*NRATE reduces the MSE (though only to 1.08). Given the strong interactions of NO with NRATE, an ANCOVA is not appropriate.

Source   DF     Type III SS   Mean Square   F Value   Pr > F
WTR       1      8.687273      8.687273      8.04     0.0089
Error    25.1   27.111166      1.080138
  Error: 0.2721*MS(REP*WTR) + 0.7279*MS(Error)

Source         DF   Type III SS   Mean Square   F Value   Pr > F
REP*WTR         3     3.797345      1.265782     1.25     0.2966
NRATE           2     2.965979      1.482990     1.47     0.2369
NRATE*WTR       2    11.551206      5.775603     5.71     0.0048
no              1    35.342156     35.342156    34.97     <.0001
no*NRATE        2    16.910387      8.455193     8.37     0.0005
no*WTR          1     0.285493      0.285493     0.28     0.5966
no*NRATE*WTR    2     4.575578      2.287789     2.26     0.1108
Error          78    78.837742      1.010740
  Error: MS(Error)



EXERCISE 15 a)

Source   DF   Type III SS   Mean Square   F Value   Pr > F
zip       3     617174205     205724735     0.58    0.6297
size      1   30252281277   30252281277    85.55    <.0001
bed       1    2382102087    2382102087     6.74    0.0123
bath      1      65353263      65353263     0.18    0.6691

Using reference cell coding with zip = 4 as the baseline class, the fitted model was Price (in 1000's) = 39.54 + 63.86(size in 1000's) - 12.46(BED) - 2.95(BATH) - 10.097(ZIP1) - 5.825(ZIP2) - 4.143(ZIP3). Zip does not seem to make a difference, AFTER CONTROLLING for SIZE. But if you do a one-way ANOVA using only ZIP, ZIP might be significant if bigger houses fall in one of the zip codes.

b)

Source   DF   Type III SS   Mean Square   F Value   Pr > F
zip       3      13233407       4411136     0.02    0.9968
age       1     541740770     541740770     2.15    0.1503
size      1   13676600860   13676600860    54.40    <.0001
bed       1    1532827804    1532827804     6.10    0.0181
bath      1      31275485      31275485     0.12    0.7263
lot       1     587574347     587574347     2.34    0.1346
exter     2     549059509     274529755     1.09    0.3458
garage    1    2338586052    2338586052     9.30    0.0042
fp        1      43352447      43352447     0.17    0.6803

EXERCISE 16 a) The covariance matrix for Y = (output, demand) is

V = [ 49    0 ]
    [  0  100 ]

The expected difference is 500 - 500 = 0. The variance is 49 + 2*(-1)*0 + 100 = 149, so the standard deviation of the difference is 12.21.

b) Now the covariance is 0.7*sqrt(49*100) = 49, so the covariance matrix is

V = [ 49   49 ]
    [ 49  100 ]

The variance is 49 + 2*(-1)*49 + 100 = 51 and the standard deviation is 7.14. Since the standard deviation is smaller than in part (a), there is less chance that the factory will have a large discrepancy between its output and demand.
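Both standard deviations come from the same quadratic form c'Vc with c = (1, -1). A numpy sketch, not part of the original solution:

```python
# Variance of output - demand under the two covariance structures
# (Exercise 16).  sd = sqrt(c' V c) with c = (1, -1).
import numpy as np

c = np.array([1.0, -1.0])            # output minus demand

V_a = np.array([[49.0, 0.0], [0.0, 100.0]])     # independent case
sd_a = np.sqrt(c @ V_a @ c)                     # sqrt(149), about 12.21

V_b = np.array([[49.0, 49.0], [49.0, 100.0]])   # correlation 0.7 case
sd_b = np.sqrt(c @ V_b @ c)                     # sqrt(51), about 7.14
```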



EXERCISE 17 a) The plot shows no strong difference between the weight gains in the Girls (sex = 0) and in the Boys (sex = 1). This is consistent with the results of the independent samples t test (pooled t(22) = 0.75, p value = 0.4692). There is no significant evidence that the mean weight change differs for the boys and girls. (Plot: change versus sex.)

b) The ANCOVA shows that Girls are expected to gain 10.8 kg more than a Boy of the same initial weight. The difference is significant (t(21) = -2.69, p value = 0.0138). Notice that this is a different statement than in (a). Here we are comparing a Girl and Boy of the same initial weight. That would have to be a heavier than average girl or a lighter than average boy. (Plot: Change versus Weight Before, with plotting symbols B for boys and G for girls.)

c) Since the students are not randomly assigned a Sex independently of the initial weight, and the mean weights differ so greatly for the two groups, the ANCOVA is not appropriate.



EXERCISE 18 a) Since boys have SEX = 0, a' = (1 0 1.4 0). The point estimate is

a'b = (1 0 1.4 0)(-75.75, 20.73, 78.48, -12.45)' = -75.75 + 1.4(78.48) = 34.12

The estimated variance of this estimate is a'Va:

             [  126.56  -126.56   -82.59    82.59 ] [ 1  ]
(1 0 1.4 0)  [ -126.56   288.07    82.59  -187.14 ] [ 0  ]  = (1 0 1.4 0)(10.93, -10.93, -6.91, 6.91)' = 1.26
             [  -82.59    82.59    54.06   -54.06 ] [ 1.4]
             [   82.59  -187.14   -54.06   121.87 ] [ 0  ]

The standard error is sqrt(1.26) = 1.12. Approximating the t with 192 df as a standard normal, a 95% confidence interval would be 34.12 ± 1.96*1.12 = 31.92 to 36.32 kg.

b) The point estimate for an actual boy would also be 34.12, but the estimated variance of this estimate is 1.26 + 34.91 (using the MSE as an estimate for σ²). The confidence interval for an actual boy's weight would be 34.12 ± 1.96*6.01 = (22.34, 45.90).
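The point estimate and its variance can be checked with numpy. This sketch assumes the coefficient vector and covariance matrix are as quoted in the solution, with a' = (1, 0, 1.4, 0):

```python
# a'b and a'Va for Exercise 18a; beta and V are the values quoted
# in the solution (our reconstruction of the garbled display).
import numpy as np

beta = np.array([-75.75, 20.73, 78.48, -12.45])
V = np.array([
    [ 126.56, -126.56, -82.59,   82.59],
    [-126.56,  288.07,  82.59, -187.14],
    [ -82.59,   82.59,  54.06,  -54.06],
    [  82.59, -187.14, -54.06,  121.87],
])
a = np.array([1.0, 0.0, 1.4, 0.0])   # boy (SEX = 0), covariate value 1.4

point = a @ beta                     # about 34.12
var = a @ V @ a                      # about 1.26
se = np.sqrt(var)                    # about 1.12
ci = (point - 1.96 * se, point + 1.96 * se)   # about (31.92, 36.33)
```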

EXERCISE 19 a) From the fitted regression parameters, we estimate that a southern state (SOUTH=1) will have a homicide rate that is 0.257 higher than a non-southern state (SOUTH=0) that has the same values for the other independent variables. This difference is significant (p value = 0.01). b) Since the states were not randomly assigned to levels of SOUTH, it may be that the southern and nonsouthern states differ substantially with respect to the values of the other variables. Comparing a southern and non-southern state at some mid-level of the other variables may be a way of extrapolating outside the range of the data. c) Oddly, Divorce Rate (DIV) is more strongly associated with homicide rate, as it has a standardized regression coefficient with a much larger absolute value. Other variables, such as RD and YOUNG, are of about the same influence as SOUTH.



EXERCISE 20 a) F(3,162) = (166 - 4)(0.34)² / (1 - 0.34²) = 21.18. Yes, there is significant evidence that at least one of the independent variables is linearly related to the child's effort rating.

b) The t test statistics for the null hypothesis that the coefficient is 0 are: AGE t = 0.12/0.03 = 4; SEX t = -0.24/0.14 = -1.71; CONDITION t = -0.06/0.14 = -0.43, so AGE is significant.

c) We estimate that girls will rate effort, on average, 0.24 points lower than boys of the same age and condition. However, the difference between girls and boys of the same age and condition is not significantly different from 0 at α = 5%. For each additional year of age, children are expected to have a rating that increases by 0.12, compared to other children of the same sex and condition. This increase is significantly different from 0 at α = 5%.



CHAPTER 12 EXERCISE 1 a) The sample proportion is 524/737 = 0.71099. With confidence 99%, the probability of a success for this set of dogs is between 0.668 and 0.754. Calculation: 0.71099 ± (2.576)*(0.0166976) b) Ho: all 4 dogs have the same probability of success. X²= 3.2268 with 3 df. Reject Ho if X² greater than 7.815 from table of Chi-squared. Do not reject Ho. Or from SAS, note that the p value is 0.3580 which is greater than 0.05. There is no significant evidence that the dogs differ in their probability of success. EXERCISE 2 X2=8.46 with 1 df, p value = 0.0036. There is a significant difference in the crime patterns in the two cities. While both cities show more auto thefts than robberies, in City C the proportion of auto thefts is even higher than in City B.
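The 99% interval in Exercise 1(a) uses the usual normal approximation for a proportion. A short sketch (ours, not from the text):

```python
# 99% confidence interval for a binomial proportion (Chapter 12,
# Exercise 1a): p_hat +/- z * sqrt(p_hat*(1-p_hat)/n).
import math

successes, n = 524, 737
p_hat = successes / n                       # about 0.711
se = math.sqrt(p_hat * (1 - p_hat) / n)     # about 0.0167
z = 2.576                                   # z for 99% confidence
ci = (p_hat - z * se, p_hat + z * se)       # about (0.668, 0.754)
```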

EXERCISE 3

Ho: pool and nonpool apts have the same probability of being leased by single occupants.

X²= (22-20.7)²/20.7 + (23-24.3)²/24.3 + (24-25.3)²/25.3 + (31-29.7)²/29.7= .275, p value = 0.6001. Do not reject Ho. There is no significant difference in the proportion of single occupants in the pool and non-pool apartments.
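The chi-squared statistic in Exercise 3 sums (O - E)²/E over the four cells; a scipy sketch (not part of the original solution):

```python
# Chi-squared test of homogeneity from observed and expected counts
# (Chapter 12, Exercise 3).
from scipy.stats import chi2

observed = [22, 23, 24, 31]
expected = [20.7, 24.3, 25.3, 29.7]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # about 0.275
p = chi2.sf(x2, df=1)                                           # about 0.60
```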

EXERCISE 4 X 2 =156.8 with 3 df. There is significant evidence of a change in the distribution of the return times. Apparently many more surveys are being returned, and returned earlier.

EXERCISE 5 mean = 4.302, std dev. = 0.602. Using 7 categories of width 0.3 centered at the mean:

dfoot   ≤3.552   3.552-3.852   3.852-4.152   4.152-4.452   4.452-4.752   4.752-5.052   >5.052
N         8          7             7             15            9            13            5
E        6.81       7.74         11.15         12.59         11.15          7.74         6.81
χ²       0.21       0.07          1.54          0.46          0.41          3.57         0.48

χ² = 6.74 with 7 - 1 - 2 = 4 df, p value = 0.150; there is no significant evidence that the distribution is non-normal. CAUTION: YOUR VALUE OF CHI-SQUARED MAY DIFFER SOMEWHAT DEPENDING ON HOW YOU FORM YOUR CLASSES. Keep E values of at least 5.0.
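The expected counts come from the fitted normal. A scipy sketch of the whole computation, assuming the seven width-0.3 classes are symmetric about the mean (cuts at mean ± 0.15, ± 0.45, ± 0.75) with open-ended end classes:

```python
# Chi-squared goodness-of-fit test for normality (Chapter 12,
# Exercise 5): expected counts from the fitted normal distribution.
from scipy.stats import norm, chi2

mean, sd, n = 4.302, 0.602, 64
cuts = [mean + w for w in (-0.75, -0.45, -0.15, 0.15, 0.45, 0.75)]
observed = [8, 7, 7, 15, 9, 13, 5]

edges = [float("-inf")] + cuts + [float("inf")]
expected = [n * (norm.cdf(b, mean, sd) - norm.cdf(a, mean, sd))
            for a, b in zip(edges, edges[1:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # about 6.75
p = chi2.sf(x2, df=7 - 1 - 2)                                   # about 0.15
```

Two df are lost because the mean and standard deviation were estimated from the data.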

Chapter 12 – Page 1


EXERCISE 6 Fisher’s exact test had two-sided p value = 0.0022. There is significant evidence that those who took the prerequisite had a different probability of passing the course compared to those who did not take the prereq.

EXERCISE 7 Ho: the distribution of output quality has not changed. E1 = (200*.80) = 160, E2 = (200*.18) = 36, E3 = (200*.02) = 4. X² = 3.306, which is less than the critical value from the Chi-squared table with 2 df, 5.991 (p value 0.191). Do not reject Ho; there is no significant change in the output of the machine. We should repeat this analysis combining the last unacceptable class with the good class, since the expected value for this class is less than 5. Then X² = 0.281 with 1 df. Again, there is no significant change in output quality for the machine.

EXERCISE 8 X2 =79.35 with 2 df, p value < .0001, there is strong evidence of a difference in job distribution by gender. Examining the standardized residuals, women are underrepresented in accounting and overrepresented in secretarial posts, and the opposite pattern for men.

EXERCISE 9 X²= 7.38 with 4 df, p value = 0.1173. There is no significant evidence that the distribution of rating differs by city. SAS also reports the likelihood ratio statistic X²= 7.66 with p value of 0.1051, with a similar conclusion.

EXERCISE 10 Ho: each category has probability 1/3. Under Ho, each category has expected value 40. X 2 = 3.8 with 2 df. Reject Ho if X2 > 5.991. If using p values, p value = 0.15. There is no significant

evidence that any of the traits is more commonly valued than any other. EXERCISE 11 a) X²= (50-45.22)²/45.22 + (44-36.17)²/36.17 + (10-22.61)²/22.61 + …..+ (2-.87)²/.87= 26.254 which has 6 df (p value from SAS was 0.0002). Critical value from Table A.4 is X²(6)= 12.592, reject Ho. The school types differ significantly with respect to the distribution of the type of person the students feel most affected their life. b) problem: some cell frequencies are less than 5. Combine the Other group with the Politician group. Adjust cells gives X²=23.39, p value 0.0001, still reject Ho.



EXERCISE 12. a) X2 = 13.62 with 4 df, p value = 0.0086. There is significant evidence of a relationship between the traits the managers consider most important in a salesperson and the traits they consider most important in a sales manager. b) The contingency coefficient is 0.319. While there is significant evidence the variables are not independent, the actual strength of the dependence is not very great.

EXERCISE 13 a) Forming four groups (for the four combinations of Size and Net_type) and testing that the proportion of gizzard shad is the same in all groups, X²(3) = 63.29; the proportion is different in at least one group. Using the Chi-squared test (or Fisher's exact test) to make the six pairwise comparisons, using α = .05/6 = .0083 on each, we get that the MULT-2 group is the only one that is different. In the table, groups with the same letter were not significantly different.

Group       Mult-2    Mono-2    Mult-1    Mono-1
% G. Shad   37.9% a   86.1% b   91.7% b   94.6% b

b) The probability of exceeding 193 is significantly different for some group, X²(3) = 96.95. This indicates that the median length is different for at least one group. Using Fisher's exact test for pairwise comparisons at α = .0083, we see that nets with size 2 tend to catch longer fish than those with size 1.

Group             Mono-1     Mult-1     Mult-2     Mono-2
% exceeding 193   25.81% a   36.46% a   96.55% b   100.0% b

EXERCISE 14 Ho: the age distribution is the same for both those reporting and not reporting AI-driving. X² = 96.94 with 2 df, p value < 0.0001. There is significant evidence that the age distributions in the two groups do differ. Those reporting AI driving are under-represented in the 18-20 age group (where alcohol sales are illegal), and over-represented in the older age groups.

EXERCISE 15 Since the counts are small for the number developing fevers, use Fishers Exact Test, twotailed p value = 0.6827. This difference could easily arise by chance, there is no significant evidence that the groups differ in their probability of developing a fever.

EXERCISE 16 Ho: the distribution of ratings is the same in both communities. Combining rating categories 1 through 3 into a single category, X² = 6.84 with 2 df and p value = 0.0327. There is significant evidence of a difference in the distributions.


CHAPTER 13 EXERCISE 1 a) The plot shows that the ln(odds) of favoring the death penalty are much higher among whites. In both races, the ln(odds) increases quickly in the 1970's but the rate of increase slowed by the mid 1980's. The size of the gap between whites and blacks seems stable across the years. (Plot: Ln(ODDS) versus YEAR, with plotting symbols W for whites and B for blacks.)

b) We fit a model that replaced YEAR with YEARS = YEAR - 1972, to avoid very large numbers in the quadratic term. We coded RACE = 0 for whites and 1 for blacks. Ln(odds) = 0.4291 - 1.224Race + 0.0813YEARS - 0.00219YEARS², with -2lnL = 38,448.182. Yes, there is significant evidence that the quadratic term improves the model; it had Wald Chi-squared = 68.05 with p < 0.0001. The Race dummy variable had Chi-square = 2104.14 with p < 0.0001, showing significantly lower approval of the death penalty among Blacks.

c) Adding an interaction of RACE with YEARS and RACE with YEARS² gave -2lnL = 38,443.496. The likelihood ratio test has X² = 4.686 with 2 df, which is not significant at α = 5% (it has p value just below 0.10). Hence, there is no significant evidence of an interaction.

d) Yes, it is reasonable to refer to the gap as enduring. The first model showed extremely strong evidence for a gap. Since the interactions were not significant, the gap in the ln(odds) does not appear to be changing much with time. Note, however, that the plot of the probabilities might show a slight change in the size of the gap, but it would be small.
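The likelihood ratio test in part (c) is the difference of the two -2lnL values, referred to a chi-squared distribution. A scipy sketch (not part of the original solution):

```python
# Likelihood ratio test for the RACE x YEARS interactions
# (Chapter 13, Exercise 1c).
from scipy.stats import chi2

neg2lnL_reduced = 38448.182   # model without the interaction terms
neg2lnL_full = 38443.496      # model with the two interaction terms

x2 = neg2lnL_reduced - neg2lnL_full   # 4.686
p = chi2.sf(x2, df=2)                 # just below 0.10
```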

Chapter 13 –Page 1


EXERCISE 2 a) The Chi-squared test of homogeneity has X2 = 8.2736 with p value 0.0160. There is significant evidence that the types of neighborhood differ with respect to the probability a robbery will involve a gun. b) Using 2 dummy variables I2 = 1 if faith is medium, 0 otherwise I3 = 1 if faith is high, 0 otherwise The Chi-squared test based on the likelihood ratio has X2 = 8.3468 with p value 0.0154, which shows significant evidence of a relationship between the probability a robbery will involve a gun and the level of Faith in Police. The medium faith neighborhoods had a probability that differed significantly from the low faith neighborhoods (estimated OR = 0.381, p value = 0.0076). However, the high faith neighborhoods did not differ significantly, even though the estimated OR (0.456, p value = 0.0655) wasn’t much different from that in medium faith neighborhoods. This is possibly because the sample was smallest in the high faith neighborhoods.

EXERCISE 3 a) Your plot may vary depending on how you subdivided the categories. You may need to combine several categories in the lower ventricle groups. There seem to be increasing probabilities of an abnormal EEG as ventricle size increases. However, it may be that the ln(odds) are not linear in ventricle size but somewhat curvilinear. (Plot: ln(odds) of an abnormal EEG versus midpoint of ventricle category.)

b) Fitted model: Ln(odds) = -4.0478 + 0.0569V. For each additional unit of ventricle size, the ln(odds) of an abnormal EEG increases by 0.0569. This increase is significant (likelihood ratio Chi-square = 6.836 with p value = 0.0089; Wald Chi-Square p value = 0.0149).



EXERCISE 4 a) The first dummy variable was TIME (0 before intervention and 1 afterwards). Then there were 26 dummy variables for cities 2 through 27 (city 1 was the baseline city). The actual data set was rearranged to have 54 lines, two for each city (one with the Crashes and Years before intervention and one with the Crashes and Years after intervention). Ln(Years) was used as an offset variable. The Poisson regression had a fitted coefficient of -1.1248 for TIME with Chi-square=118.01 and p value < 0.0001. There is significant evidence that the crash rate was reduced significantly. With confidence 95%, the logarithm of the crash rate was reduced between -1.3277 and -0.9219. That is, the actual crash rate after intervention was between 26.5% and 39.77% of what it was before the intervention. b) However, the title makes it sound like this might not have been a complete tabulation of all sites or even a random sample. The authors may have picked the sites with the greatest crash reductions to give in their table, thus biasing the results.
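Back-transforming the Poisson regression coefficient and its confidence limits gives the rate-ratio statement in part (a). A short sketch (ours, not from the text):

```python
# Back-transform the fitted TIME coefficient from Poisson regression
# (Chapter 13, Exercise 4a): rate ratio = exp(coefficient).
import math

coef = -1.1248                 # fitted log rate ratio for TIME
ci_log = (-1.3277, -0.9219)    # 95% CI on the log scale

rate_ratio = math.exp(coef)                     # about 0.32
ci_ratio = tuple(math.exp(v) for v in ci_log)   # about (0.265, 0.398)
```

The interval (0.265, 0.398) is the "between 26.5% and 39.77% of the before rate" statement above.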



EXERCISE 5 a) The plot shows that accidents involving male drivers are much more likely to be alcohol related. There is some indication of a mild interaction, in that the gap between the odds for men and women is somewhat larger in the later age categories. (Plot: estimated ln(odds) versus AGE category, with plotting symbols M and F.)

b) The plot of the estimated probabilities still shows that crashes among men are more likely to be alcohol related than crashes among women. However, now the suggestion of an interaction is much stronger. (Plot: estimated probability (%) versus AGE category, with plotting symbols M and F.)

c) The dummy variable for Sex was coded as 0 for men and 1 for women. There were 3 age dummy variables: A2 was 1 for age 18-20 and 0 otherwise; A3 was 1 for age 21-24 and 0 otherwise; A4 was 1 for age 25+ and 0 otherwise. Fitted ln(odds) = -3.2067 - 1.2144Sex + 1.051A2 + 1.268A3 + 1.118A4. All the dummy variables were significant. Females have significantly lower ln(odds) of a crash being alcohol-related. All the older age groups have significantly higher ln(odds) of a crash being alcohol-related compared to the youngest age group.

d) Fitted ln(odds) = -3.234 - 1.0503Sex + 1.0246A2 + 1.303A3 + 1.1699A4 + 0.1561Sex*A2 - 0.2128Sex*A3 - 0.3104Sex*A4. The interaction affects the size of the gap between the ln(odds) of the men and women. In the youngest age group, the ln(odds) for women is 1.0503 lower than the ln(odds) for men. In the oldest age group, the ln(odds) for women is 1.0503 + 0.3104 = 1.36 lower than the ln(odds) for men. This is consistent with the plot in (a).

e) In the full model that includes the interactions, -2lnL = 84011.029. In the reduced model without the interactions, -2lnL = 84063.981. X²(3) = 84063.981 - 84011.029 = 52.952, p value < 0.001. There is significant evidence that at least one of the interactions is non-zero, that is, that the gap in the ln(odds) for men and women is different in at least one age group.

EXERCISE 6 a) It looks like the maximum is about 10.5, so the initial estimate is β1 = 10.5. At time 0, it looks like the curve would intercept the Y axis at about 4, so β2 = ln(4/10.5) = -0.965. It also looks like the curve would achieve 90% of the maximum sometime around hour 2, giving an initial estimate of β3 = -1.1. (Plot: Water versus TIME.)

b) Fitted value of Water = 10.3986*exp(-1.1276*exp(-1.3686*STIME)).

c) We need 0.75 = exp(-1.1276*exp(-1.3686*STIME)) -> ln(.75) = -1.1276*exp(-1.3686*STIME) -> ln(0.25513) = -1.3686*STIME -> STIME = 0.998 = time at which the peas absorb 75% of their maximum. This makes sense when we look at the plot: at time one hour, the peas have absorbed about 8 units of water, which is about 75% of the approximate maximum.
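The back-solving in part (c) can be checked numerically. A sketch using the fitted Gompertz parameters quoted in part (b):

```python
# Solve the fitted Gompertz curve for the time at which absorption
# reaches 75% of its asymptote (Chapter 13, Exercise 6c).
import math

b1, b2, b3 = 10.3986, -1.1276, -1.3686   # fitted parameters from (b)

def water(t):
    """Fitted Gompertz curve: Water = b1 * exp(b2 * exp(b3 * t))."""
    return b1 * math.exp(b2 * math.exp(b3 * t))

# 0.75 = exp(b2 * exp(b3*t))  ->  t = ln(ln(0.75)/b2) / b3
t75 = math.log(math.log(0.75) / b2) / b3   # about 0.998 hours
```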

EXERCISE 7 This uses Poisson regression where the dependent variable is the number of crashes and the independent variable of interest is a dummy variable for design (coded as 0 for design A and 1 for design B). ln(AWTL) is used as an offset variable. (Expected number of crashes = Rate*AWTL.) Fitted value ln(crash rate per unit of AWTL) is -0.8166 + 0.7784DVDesign. This corresponds to an increase in the ln(crash rate) of 0.7784 for Design B versus Design A. This difference is significant, Wald Chi-squared = 22.96 with p < 0.0001.





CHAPTER 14 EXERCISE 1 Assuming a two-tailed alternative hypothesis with α = 5%. a) T(+) = 9; T(-) = 66 - 9 = 57, so T = 9. From Table A.8, reject Ho if T ≤ 10. (SAS proc univariate gives p value = 0.032.) Reject Ho. There is significant evidence that the location differs from 12.5. Note, however, that the significant value is in the wrong direction: it appears that the average is LESS than what the manufacturer claimed! b) t = -2.178, p value = 0.0544. There is no significant evidence that the mean differs from 12.5. While it is difficult to judge normality in small samples, the boxplot for the data is somewhat right-skewed. A nonparametric test may be safer.

EXERCISE 2 Using the Wilcoxon signed rank test, eliminating the subject with a change of 0, T(+) = 38.5 and T(-) = 27.5, so T = 27.5. From Table A.8, reject Ho if T ≤ 10 (using α = 5%). Do not reject Ho. There is no significant evidence that the differences are not symmetrically distributed about 0. (No significant evidence of a change in weight.) Examining a boxplot of the data shows no particular reason a paired t test wouldn't work. Student's t = 0.73 with 11 df; again, there is no significant evidence the mean loss differs from 0.

EXERCISE 3

Ho: the groups have the same location. Using Table A.9 and α = 5%, we reject Ho if T ≤ 78.

Since T= 81, we do not reject Ho. There is no significant evidence that the scores for the two professors differ with respect to location. These observations do not show any serious skewness or outliers, so normality is not seriously in doubt. Exam scores have the potential for a small proportion of very low or very high values, so in general medians might be better than means.

EXERCISE 4 Mann-Whitney / Wilcoxon rank sum test. T(1) = 15, T(2) = 40. Using Table A.9 with n1 = 4 and n2 = 6, reject Ho if T ≤ 12 at α = 5%. SAS will provide an exact p value of 0.17. There is no significant evidence that the distributions differ for the two species. The pooled t test had t = -1.19 with p value = 0.27; no significant evidence that the means differ.

EXERCISE 5

The null hypothesis is that the differences are symmetrically distributed about 0.

a) Using Table A.8, reject Ho if T ≤ 3. T(+) = 2, T(-) = 34, so T = 2. There is significant evidence that the differences are not symmetrically distributed about 0. Since the test is insensitive to lack of symmetry, we interpret this as meaning that the typical value is not 0, that is, that the treatment teeth had significantly different (apparently LOWER) mineral content than the control teeth.

b) t = -2.281, p value = 0.0565. There is no significant evidence that the treatment differs from the control with respect to mean mineral content. This test is suspect since there are two teeth where the differences are much larger than for the other 6 teeth. This inflates the standard deviation.

EXERCISE 6 T(1) = 84, T(2) = 55.5, T(3) = 31.5. S² = 28.471. Using SAS, Kruskal-Wallis H = 6.673. Compared to a Chi-squared table with 2 df, the p value is 0.0356. Reject Ho; there is a significant difference in at least one teaching method. We declare two methods significantly different if their sample mean ranks differ by more than

2.1314 * sqrt( 28.4706 * (18 - 6.673)/15 * (1/ni + 1/nj) ) = 9.883 * sqrt(1/ni + 1/nj)

method      1        2        3
n           6        7        5
Mean rank   14.0 a   7.93 b   6.3 b

Teaching method 1 is significantly different from 2 or 3. Apparently, it has higher typical values.

EXERCISE 7 Counties are the treatment, because you are trying to find whether counties differ. Year is the blocking variable. Sums of ranks for each county: 7, 6.5, 10, 6.5. A = 89.5, B = 77.83, T*
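The Kruskal-Wallis pairwise comparisons in Exercise 6 can be checked numerically. A sketch using the quantities quoted there (t = 2.1314, S² = 28.4706, N = 18, H = 6.673, k = 3 groups):

```python
# Pairwise comparisons of mean ranks after a Kruskal-Wallis test
# (Chapter 14, Exercise 6), using the LSD-style threshold
# 9.883 * sqrt(1/ni + 1/nj) from the solution.
import math

t_crit, S2, N, H = 2.1314, 28.4706, 18, 6.673
n = {1: 6, 2: 7, 3: 5}
mean_rank = {1: 14.0, 2: 7.93, 3: 6.3}

base = t_crit * math.sqrt(S2 * (N - H) / 15)   # about 9.883

def significant(i, j):
    """True if the mean ranks of methods i and j differ significantly."""
    lsd = base * math.sqrt(1 / n[i] + 1 / n[j])
    return abs(mean_rank[i] - mean_rank[j]) > lsd
```

Method 1 differs from both 2 and 3, while 2 and 3 do not differ, matching the letter grouping in the table.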


CHAPTER 1: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS

These problems are designed to be done without access to a computer, but they may require a calculator.

1. The boxplots shown below summarize data for vocabulary scores from two samples of first-graders, some of whom attended Pre-K, and some of whom did not (No Pre-K).
a. Among Pre-K children, approximately what proportion have vocabulary scores greater than 41?
b. Among No Pre-K children, approximately what proportion have vocabulary scores between 24 and 28?
c. Which group apparently has the higher typical vocabulary score?
d. Which group shows the greater variability in their vocabulary scores?
e. Summarize the shape of the distribution for each group.

2. You jotted down some summary values for a data set of 15 observations of wave heights, in feet, at a monitoring site. Your notes state that Σy = 37 and Σy² = 173.
a. Give the mean and the standard deviation of the wave heights, in feet.
b. Give the mean and the standard deviation of the wave heights, in meters. (Hint: a meter = 3.3 feet.)
c. You also have written down that the median wave height, in feet, was 2. Is the distribution more likely negatively skewed, positively skewed, or symmetric?


3. Volunteers in a psychology experiment watch a video re-enactment of a crime. After a delay, they answer questions about the scene. The researcher records both the length of the delay (in hours) and the number of questions the volunteer answered correctly. A graphical display of the data is shown below. Write a sentence, in simple language, that summarizes the relationship between the two variables.

4. A school is concerned that the playground area may have contaminated soil left from a time when a creosote plant was nearby. The school cannot afford to test all the soil on the playground, so a statistician divides a map of the playground into 1000 equally sized rectangles. A random number generator is used to select 30 rectangles. A soil specimen is taken from each selected rectangle.
a. Identify the population, the sampling frame, and the sample.
b. Is this study observational or a designed experiment?
c. For each variable below, say whether its scale is nominal, ordinal, interval or ratio, and identify one type of graph that can be used to summarize the data.
Contamination level (none, trace, moderate, high)
Creosote concentration, in parts per billion
Usage (general play, organized sports, drainage structure, etc.)

5. Barometric pressures in tropical cyclones are very negatively (left) skewed. You wrote down 1005 and 985 as the mean and median, but forgot to label which was which. Label the values properly.


6. The two histograms below summarize total cholesterol levels for large samples of elderly men and women. Write a short paragraph comparing the two groups. Be sure to contrast the typical values, variability and shape of the distributions.

[Paired histograms, labeled F (females) and M (males): percent on the vertical axis, total cholesterol (chol) on the horizontal axis, with bins centered at 112.5 through 387.5.]

7. The frequency table below summarizes the results for mercury concentrations in a sample of Florida lakes. Display the relative frequencies using an appropriate graph.

Mercury Concentration    Number of cases
Trace (very low)         11
Low                      27
Borderline                8
Dangerous                 3


8. The data below shows electricity costs in cents per KWH for a sample of 14 utility companies. The data has already been sorted, reading across each row.

12.4  12.8  13.1  13.6  13.8
13.9  14.0  14.2  14.2  14.4
14.6  14.8  15.0  15.3

a. Calculate the first quartile, median, and third quartile.
b. Calculate the range and the interquartile range.
c. Identify the outliers, if there are any. Show your computations.
d. Construct the boxplot for this data, and comment on the shape of the distribution.

9. The boxplots show NOx emissions from your power plant (expressed as pounds per MWh of electricity generated), using two different settings for the air intakes. Compare the emissions, being sure to address the typical values, variability, shape and/or other special features. If your goal is to lower typical NOx emissions, which setting should you select? If your goal is to avoid any extremely high values, which setting should you select?

[Side-by-side boxplots of NOx (lb/MWh), horizontal axis from 2.00 to 8.00, for the HIGH and LOW intake settings.]

10. a. In a data set with a sample mean of 10 and a standard deviation of 2, you note that 78% of observations fall in the interval (6, 14). Is this distribution approximately bell-shaped? Why or why not?
b. In a data set with a sample mean of 10 and a standard deviation of 2, you wrote down the percent of the observations that fall in the interval (6, 14). Later, you find your writing is hard to read, but the number is either 63% or 83%. Which must it be, and why?

SOLUTIONS

1. a. 25%
b. 25%
c. The Pre-K group apparently has higher typical values.
d. The No Pre-K group shows slightly higher variability.
e. Scores in the Pre-K group are nearly symmetrically distributed, but in the No Pre-K group they are positively (or right) skewed.


2. a. Mean = 2.47, SD = 2.42.
b. This is a simple change of scale: mean = 2.47/3.3 = 0.75, SD = 2.42/3.3 = 0.73.
c. Since the mean is substantially greater than the median, the distribution is most likely right, or positively, skewed. Alternatively, one might see that the size of the SD compared to the mean, combined with the fact that there are no negative wave heights, implies a positive skew.

3. As the time delay increases, the number of questions answered correctly tends to decrease.

4. a. The population is all soil in the playground, or alternatively all 1000 rectangles of soil. The sampling frame is the list of the 1000 rectangles. The sample is the 30 rectangles selected for study.
b. Observational.
c. Contamination level is ordinal; use a bar chart or pie chart. Creosote concentration is ratio; use a histogram, boxplot or stem-and-leaf plot. Usage is nominal; use a bar chart or pie chart.

5. Since the data is very left skewed, the mean must be less than the median. Mean = 985, median = 1005.

6. Typical values for the females are slightly higher than those for the men. Females show slightly higher variability, though the difference in variability is not large. Both distributions show some right skew, but the skew is stronger in the males.

7. The bar chart is shown below. Pie charts are possible, but are harder to draw and harder to compare when there is more than one sample.

8. a. Median = (14.0+14.2)/2 = 14.1. Using the algorithm described in the text, Q1 = 13.35 and Q3 = 14.5. However, a variety of algorithms exist; on the TI-83/84 calculators, Q1 = 13.6 and Q3 = 14.6.
b. Range = 2.9. IQR = 1.15 using the algorithm for quartiles given in the book, but IQR = 1 using the TI-83/84.
c. Using the algorithm for quartiles given in the book, the lower fence = 11.625 and the upper fence = 16.225. There are no outliers.
d. Will depend slightly on the algorithm for outliers. Nearly symmetric, but some students will note a slight negative skew.
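The quartile and fence arithmetic in Solution 8 can be sketched in Python. Quartile algorithms vary; the version below, which averages the two values straddling positions n/4 and 3n/4, is an assumption chosen because it reproduces the book's values (Q1 = 13.35, Q3 = 14.5). It is not necessarily the textbook's exact algorithm.

```python
# Electricity costs from Problem 8, already sorted
data = [12.4, 12.8, 13.1, 13.6, 13.8, 13.9, 14.0,
        14.2, 14.2, 14.4, 14.6, 14.8, 15.0, 15.3]
n = len(data)   # 14

median = (data[n // 2 - 1] + data[n // 2]) / 2       # (14.0 + 14.2)/2 = 14.1
q1 = (data[n // 4 - 1] + data[n // 4]) / 2           # (13.1 + 13.6)/2 = 13.35
q3 = (data[3 * n // 4 - 1] + data[3 * n // 4]) / 2   # (14.4 + 14.6)/2 = 14.5

iqr = q3 - q1                    # 1.15
lower_fence = q1 - 1.5 * iqr     # 11.625
upper_fence = q3 + 1.5 * iqr     # 16.225
outliers = [x for x in data if x < lower_fence or x > upper_fence]  # none
```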

9. The typical NOx values are lower at the High air intake setting, as shown by the lower median. However, the NOx values are more variable at the high air intake setting, and somewhat right-skewed. At the low intake setting, the values are more symmetric and less variable. Hence, to lower the typical NOx values one should choose the high air intake setting, but to avoid extremely high values, one needs to use the low air intake setting.


10. a. This distribution cannot be approximately bell-shaped, because if it were, the percentage in this interval (mean ± 2 std. dev.) would be close to 95%.
b. It must be 83%, because this interval (mean ± 2 std. dev.) must include at least 75% of the observations, by Tchebysheff's Theorem.


CHAPTER 2: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS

These problems are designed to be done without access to a computer, but they may require a calculator.

1. a. You have collected data on debt ratios (debt divided by annual income) among households in your county. The distribution is very positively (right) skewed. The most appropriate probability distribution for modeling the individual observations in this data would be:
normal distribution / binomial distribution / χ² distribution / Poisson distribution
b. You have collected data on numbers of power outages each month at your home. The most appropriate probability distribution for modeling this data would be:
normal distribution / binomial distribution / χ² distribution / Poisson distribution
c. You have collected data on debt ratios (debt divided by annual income) among 500 randomly selected households in your county, and computed the resulting sample mean. The most appropriate probability distribution for modeling the sample mean would be:
normal distribution / binomial distribution / χ² distribution / Poisson distribution
d. Each month, you randomly survey a sample of 200 adults in your metropolitan area to ask them whether or not they are 'confident' in their financial security. The most appropriate probability distribution for modeling the number who answer Yes would be:
normal distribution / binomial distribution / χ² distribution / Poisson distribution

2. The temperature readings in a thermocouple fluctuate with a normal distribution that has a mean of 800 °C and a standard deviation of 4 °C.
a. What is the probability the temperature will be between 798 and 807 °C?
b. Give a value that separates the lowest 20% of the temperatures from the upper 80%. Draw the sketch that accompanies this question.


3. Even when a manufacturing line is working properly, each resistor it produces has a small (4%) chance of being defective. At the end of each shift, the supervisor randomly selects 20 independent resistors and tests them. If two or more of the test sample are defective, then the manufacturing line is halted for recalibration.
a. What is the probability the line will be halted, even though the probability of being defective is really its usual 4%?
b. Suppose the line has slipped out of adjustment, and the probability for each resistor of being defective is now 10%. What is the probability the supervisor will not detect the problem, so that the line will not be halted for recalibration?

4. Specifications for a monofilament line state that 95% of the specimens must have a diameter between 3.5 and 4.0 mm. The diameters follow a normal distribution with standard deviation σ, and you have adjusted the manufacturing machine so the mean µ = 3.75 mm. What must σ be? Begin by showing the appropriate sketch for this problem.

5. The Department of Environmental Protection demands that your power plant show a mean NOx emission less than 5 lb/MWh. Your plant’s daily NOx emission actually fluctuates randomly with a mean of 4.9 lb/MWh and a standard deviation of 0.5 lb/MWh. If the DEP randomly selects and tests emissions on 30 independent days, what is the probability that the sample mean will exceed the limit?

6. Your hospital tracks the monthly number of hospital-acquired infections. Past experience shows this number follows a Poisson distribution with a mean of 4 infections per month.
a. What is the probability that a randomly selected month will have no hospital-acquired infections?
b. Using the properties of the Poisson distribution, give the appropriate control limits for a control chart for the number of infections.

7. An emergency management district has collected data on the number of people per household that would need special assistance in the case of a prolonged power outage (X). The probability distribution is shown below.

x      0      1      2
P(x)   0.98   0.015  0.005

a. Calculate the expected value and the variance for X.
b. A community has 10,000 households. Identify the distribution for the number of households that have at least 1 person needing special assistance.


8. The sketch below shows a diagram where every component of the system is independent of the others. Components labeled ‘A’ are expensive, and work with probability 0.99. Components labeled ‘B’ are less expensive, and work with probability 0.95. Calculate the reliability of the system.

[Diagram: components B1 and B2 connected in parallel, with that pair in series with component A1.]

9. Two housemates volunteer for a clinical trial of a vaccine. Each is randomly and independently assigned to the vaccine with probability 2/3 and the placebo with probability 1/3.
a. What is the probability that both the housemates will receive the vaccine? That at least one will receive the vaccine?
b. If there are five households where two of the housemates volunteer, what is the probability that none of the households has both housemates receive the vaccine?

10. A scale at a shipping company rounds all package weights to the nearest tenth of a kilo; that is, the reported weight is within ± 0.05 kg of the true weight.
a. What is a reasonable probability distribution for modeling the rounding error (the difference between the true and reported weight)?
b. During the day, the shipping company weighs 1000 packages, writing down the reported weight of each individual package. At the end of the day, the container of all 1000 packages is weighed on a very large accurate scale. This bulk weight is compared to the weight obtained by adding up the written values for the individual 1000 packages. Give the mean and standard deviation for the discrepancy between the two values. Hint: the discrepancy equals the sum of the 1000 rounding errors.
c. What is the probability that the discrepancy is between -2 and +2 kg?


SOLUTIONS

1. a) χ² distribution (after scaling by a constant)
b) Poisson distribution
c) normal distribution
d) binomial distribution

2. a) Your method will vary a little depending on whether you use the calculator or the paper table. From the TI-84 calculator, normalcdf(798,807,800,4) = 0.6514. From the paper table, Z = (798−800)/4 = −0.5 for the lower limit and Z = (807−800)/4 = 1.75 for the upper limit, so the probability = 0.959941 − 0.308538 = 0.6514.
b) Your sketch should show 0.2 of the probability to the left of the dividing line. On the TI-84, X = invNorm(0.2,800,4) = 796.64. Or, using the table, Z = −0.84, so X = 800 − 0.84*4 = 796.64.

3. a) X = number of defectives in the sample is binomial with n = 20 and p = 0.04. Want P(X ≥ 2) = 1 − [P(X=0) + P(X=1)] = 0.1897.
b) X = number of defectives in the sample is binomial with n = 20 and p = 0.10. Want P(X ≤ 1) = 0.3918.

4. The sketch will have 0.95 of the probability between 3.5 and 4.0, and 0.025 of the probability in each of the tails. Hence the lower limit of 3.5 must match a Z-score of −1.96, yielding the equation (3.5 − 3.75)/σ = −1.96, so σ = 0.1276.

5. Want the probability that x̄ > 5.0. We know that the sample mean has µ = 4.9 and standard error σ_x̄ = 0.5/√30 = 0.0913. Since n is large, the sample mean is normally distributed. Using a calculator or the normal probability table, P(x̄ > 5) = P(Z > 1.095) = 0.1367. Minor variations in the answer will occur due to different ways people rounded or entered the z-table.

6. a. e^−4 = 0.0183.
b. The standard deviation would be √4 = 2. Using the 3-sigma rule, the upper control limit would be 4 + 3*2 = 10. The lower limit would be 4 − 3*2 = −2, but since the number can't be negative, the lower limit would effectively be 0.

7. a) Expected value = 0.025, variance = 0.03438.
b) Binomial distribution with n = 10000 and p = 0.02.

8. P([B1 or B2 works] and A1 works) = 0.9975 * 0.99 = 0.9875.
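The binomial tail probabilities in Solution 3 can be checked with math.comb; this snippet is a supplementary check, not part of the manual:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# a) line halted although p is still 4%: P(X >= 2) with n = 20
p_halt = 1 - binom_pmf(0, 20, 0.04) - binom_pmf(1, 20, 0.04)   # about 0.1897

# b) problem missed although p has slipped to 10%: P(X <= 1)
p_miss = binom_pmf(0, 20, 0.10) + binom_pmf(1, 20, 0.10)       # about 0.3917
```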


9. a. P(both receive vaccine) = (2/3)*(2/3) = 4/9.
P(at least 1 receives vaccine) = 1 − P(neither receives vaccine) = 1 − (1/3)*(1/3) = 8/9.
b. There are n = 5 households, and we are counting Y = number of households where both receive the vaccine, so p = 4/9. Using the binomial, P(Y = 0) = (5/9)^5 = 0.053.

10. a. Uniform on the interval from −0.05 to +0.05 kg. This distribution has mean 0.0 and variance (0.1)²/12 = 0.000833.
b. This is the sum of 1000 independent variables, each with the mean and variance given in part a. The mean for the total is 1000*0 = 0 and the variance for the total is 1000*0.000833 = 0.833. The sample mean (total divided by 1000) is nearly normally distributed with mean 0 and standard deviation √(0.000833/1000) = 0.000913.
c. P(−2 < discrepancy < 2) = P(−2/1000 < average discrepancy < 2/1000) = P(−2.19 < Z < 2.19) = 0.9715.
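The discrepancy calculation in Solution 10 can be verified numerically; here the total discrepancy (rather than the sample mean) is used, which is equivalent. This snippet is illustrative and not part of the manual:

```python
from math import sqrt
from statistics import NormalDist

# Sum of 1000 independent Uniform(-0.05, 0.05) rounding errors,
# approximated as normal by the Central Limit Theorem.
var_one = 0.1**2 / 12             # variance of a single rounding error
sd_total = sqrt(1000 * var_one)   # about 0.913 kg for the total discrepancy

# P(-2 < discrepancy < 2) under the normal approximation
total = NormalDist(0, sd_total)
p = total.cdf(2) - total.cdf(-2)  # about 0.9715
```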


CHAPTER 3: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS

These problems are designed to be done without access to a computer, but they may require a calculator.

1. a) A university administrator states 'there is no evidence, at α = 1%, that aliens have taken over the university'. You are interested in this conclusion, but prefer to use α = 5%.
(1) You would also say there is no significant evidence of an alien takeover.
(2) You would say there is significant evidence of an alien takeover.
(3) You do not know whether there is significant evidence or not, until you know the p value.
b) You are testing the null hypothesis 'mean reading scores are not changing in my school district' using a significance level of 5%. Which statement below corresponds to a Type II error?
(1) In truth, mean reading scores have not changed. Sample data yields a p value of 0.03.
(2) In truth, mean reading scores have not changed. Sample data yields a p value of 0.30.
(3) In truth, mean reading scores have changed. Sample data yields a p value of 0.03.
(4) In truth, mean reading scores have changed. Sample data yields a p value of 0.30.
c) For a hypothesis test at a fixed significance level of 5%, power will increase if
(1) α increases
(2) sample size decreases
(3) the standard error increases
(4) the discrepancy between the true parameter value and the hypothesized value increases
d) The margin of error will generally decrease if
(1) the confidence coefficient decreases
(2) σ increases
(3) sample size decreases
(4) the standard error increases

2. You are reading a research article that states 'there is significant evidence that the distribution of debt ratios is changing, z = 2.14'.
a) Calculate a p value for this test statistic, assuming a two-tailed alternative.
b) The authors were using α = 5%. Would you agree with their conclusion, if you use α = 1%?

3. From past experience, the cognition scores on a certain test are known to have a mean of 60 and σ = 12. You have tested a random sample of 40 subjects who have a history of depression and found a sample mean of 58.
a) Is there significant evidence that the mean score in this population differs from 60? Use α = 5% and assume that the standard deviation is still 12. In your conclusion, identify the relevant population.
b) Is it necessary to assume the data come from a normal distribution? Why or why not?


4. In a survey of household saving rates, a sample of 42 households randomly selected from County X showed a mean saving rate of 1.45 (as a percentage of gross income). Assuming that the population standard deviation is still at its historical value of 0.79, give a 90% confidence interval for the mean saving rate in this population.

5. A technician is trying to use data to show that the mean pH readings from an instrument are biased. The technician takes 8 independent readings of pH from a neutral test solution with known pH of 7.0, so Ho: µ = 7 and H1: µ ≠ 7. Not knowing any formal statistics, the technician intuitively understands that a sensible rule is 'decide the instrument is biased if the sample mean is less than 6.9 or greater than 7.1'. Suppose the standard deviation of individual readings from the instrument is known from technical specifications to be σ = 0.15. Calculate β if the instrument really is biased, with a true mean in this situation of 6.85.

6. You are studying sizes for single-family houses in Gainesville, FL. You believe that the standard deviation in sizes will be about 400 square feet. How large a sample size will you need, if you want to estimate the mean size in this population with a confidence interval of 90% and a margin of error of 50 square feet?

7. From past experience, the cognition scores on a certain test are known to have a mean of 60 and σ = 12. You wish to show that mean scores among people with a history of depression will exceed 60. Your plan is to randomly select a sample of 40 such people, and claim evidence for your hypothesis if the sample mean exceeds 62. What significance level are you using?

Problems 8 through 10 use distributions other than the normal in applying concepts of inference.

8. In the past, the success rate (proportion of students earning a C or better) in an elementary statistics course has been 0.65. The instructor increases the quantity of homework in the hopes of increasing the success rate. In the semester after the change, 22 of the 30 students initially enrolled in the course succeed in earning a C or better.
a) State the null and alternative hypotheses in terms of the probability of succeeding after the change in homework (p).
b) Calculate the p value for the observed data, using the binomial distribution.
c) Using a significance level of 5%, what conclusion would you reach?


9. A nutritionist tests 200 foods to see if they affect the risk of heart disease. In each hypothesis test of the null hypothesis "Food X has no effect on heart disease", he uses α = 5%. Though the nutritionist doesn't know it, none of the foods has any relation to heart disease, so the null hypothesis is true in every case. Assume all the foods (and tests) are independent.
a) What is the expected number of Type I errors, in which the nutritionist claims that there is evidence the food has an effect on heart disease (even though it truly does not)?
b) What is the probability that there is at least 1 Type I error in the list of 200 hypothesis tests?

10. In the past, a company's number of computer network outages in a month has followed a Poisson distribution with a mean of 1.5. The company has undertaken steps to reduce the number of network outages. Their assessment plan states 'we will have evidence our steps have been effective if there are no network outages in either of the two months following implementation'.
a) If the plan has not been effective, so that the mean is still 1.5, what is the probability that a single month will have 0 network outages?
b) What significance level is the company using, assuming months are independent?


SOLUTIONS

1. a) #3  b) #4  c) #4  d) #1

2. a) p value = P(Z < −2.14 or Z > 2.14) = 2*0.0162 = 0.0324.
b) No; using α = 1%, there is no significant evidence that the distribution of debt ratios is changing.

3. a) µ = mean cognition score among people with a history of depression.
Ho: µ = 60, H1: µ ≠ 60. Reject Ho if |z| > 1.96 or the p value is less than 0.05.
z = −1.054 and p value = 0.2918. There is no significant evidence that the mean cognition score among people with a history of depression differs from 60.
b) No, it is not necessary. Since the sample size is large (at least 30), the Central Limit Theorem implies that the sample mean will be approximately normally distributed unless the distribution of individual values is extremely skewed.

4. With confidence level 90%, the mean saving rate for households in County X is between 1.2495 and 1.6505.

5. The standard error is 0.15/√8 = 0.053. Assuming µ = 6.85, then β = P(6.9 ≤ x̄ ≤ 7.1) = P(0.943 ≤ Z ≤ 4.717) = 0.1728.

6. n = (1.645 * 400 / 50)² = 173.2. The sample size should be at least 174.

7. The standard error is 12/√40 = 1.897. Assuming µ = 60, then α = P(x̄ ≥ 62) = P(Z ≥ 1.054) = 0.1459.

8. a) Ho: p ≤ 0.65 versus H1: p > 0.65.
b) p value = P(X ≥ 22), assuming X has a binomial distribution with n = 30 and p = 0.65; p value = 0.2247.
c) There is no significant evidence that increasing the quantity of homework increased the success rate.

9. The number of Type I errors among the 200 hypothesis tests will follow a binomial distribution with n = 200 and p = 0.05.
a) µ = np = 10.
b) 1 − (0.95)^200 = 0.99996.

10. a) Using the Poisson distribution, the probability of no outage in a single month is e^−1.5 = 0.2231.
b) The significance level equals the probability of neither month having a network outage, assuming µ has not changed. If months are independent, this is 0.2231² = 0.0498.
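The β computation in Solution 5 can be checked with the standard library's NormalDist; this is a supplementary sketch, not part of the manual:

```python
from math import sqrt
from statistics import NormalDist

# Solution 5: P(6.9 <= xbar <= 7.1) when the true mean is 6.85.
# The sample mean of 8 readings has standard error sigma / sqrt(n).
se = 0.15 / sqrt(8)              # about 0.053
xbar_dist = NormalDist(6.85, se)
beta = xbar_dist.cdf(7.1) - xbar_dist.cdf(6.9)   # about 0.1728
```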


CHAPTER 4: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS

These problems are designed to be done without access to a computer, but they may require a calculator.

1. A. When the distribution of the individual observations is extremely skewed, inferences on the ______ may be more meaningful than inferences on the mean.
B. The degrees of freedom for the t test are based on the number of observations used in the calculation of the ______.
C. The larger the degrees of freedom for the t distribution, the more the critical values resemble those of ______.
D. To demonstrate that a high proportion of observations will be close to the mean, one can use the alternative hypothesis that the ______ is small.

2. You are studying educational level (measured as years of school successfully completed) among Florida prisoners. In a sample of 40 prisoners, you find a mean of 9.4 years and a standard deviation of 2.1 years. Give a 95% confidence interval for the mean educational level in this population, specifying the population in your sentence.

3. The U. S. Sentencing Commission reported that in 2003, the mean sentence nationally for persons convicted of auto theft was 60.8 months. You pull records for a random sample of 35 Florida convictions for auto theft during this same year, and find a mean of 61.9 months and a standard deviation of 4.8 months. Is there significant evidence that the mean sentence in Florida differed from the national mean, using α = 5%?

4. In a sample of 120 households of Jacksonville Beach, 52 are rated as 'poorly prepared' to evacuate on short notice. Give a 90% confidence interval for the proportion of all households in this community that are poorly prepared to evacuate.

5. In the past, the proportion of undergraduates passing a required skills exam on the first try has been 55%. You believe that the proportion has increased. In a sample of 60 recent first-time takers of the exam, 40 passed. Is there evidence to confirm your belief, at α = 5%?

6. A marketing firm is planning a study among elderly persons to estimate the proportion who do not have Internet access. They have no prior information on what this proportion might be. Recommend a sample size, if they wish their 90% confidence interval to have a margin of error of only 0.05.


7. A past study showed that the sentences for persons convicted of auto theft in Florida had a standard deviation of 6.5 months. The legislature was concerned that there was too great an inconsistency in the sentences, and approved guidelines for judges to use in sentencing. You pull records for a random sample of 36 Florida convictions since the institution of the guidelines and find a standard deviation of 4.8 months. Is there significant evidence that the standard deviation has decreased, using α = 5%?

(Problem 8 is longer because it requires two hypothesis tests. It can be shortened, or turned into two problems.)

8. In order to qualify for automatic submission of its claims to an insurance company, a medical billing company must pass an accuracy test. In the accuracy test, the insurance company randomly selects and audits a sample of 30 claims from the billing company. To pass, there must be
1) No significant evidence, at α = 10%, that the mean error differs from 0.
2) Significant evidence, at α = 1%, that the standard deviation in the errors is less than 10.
a) Explain why the first hypothesis test uses a high value of α.
b) In the sample of 30, the mean error was $2.12 and the standard deviation was $8.01. Does the company pass the test?

9. On the standard treatment, mice exposed to a toxin have a median survival time of 25 days. You have developed a new treatment, and want to show that it increases median survival time. In a random sample of 12 mice exposed to the toxin, 10 survived more than 25 days. At α = 5%, does this constitute evidence that the median survival time has increased?

10. A random sample of 10 insurance claims from a medical provider is audited for errors. The sample shows errors with a standard deviation of $8.00 per claim. Give a 95% confidence interval for the standard deviation in the population of all claims from this provider.

SOLUTIONS

1. A. median
B. standard deviation (or sums of squares, or variance)
C. the standard normal
D. variance (or standard deviation)

2. With confidence 95%, the mean years of education among Florida prisoners is between 8.728 and 10.072 years.


3. µ = mean sentence for auto theft in Florida.
Ho: µ = 60.8, H1: µ ≠ 60.8. t = 1.356, p value = 0.1841. There is no significant evidence that the mean sentence for auto theft in Florida differs from the national mean.

4. p̂ = 52/120. We can use the normal approximation because np̂ = 52 ≥ 5 and n(1 − p̂) = 68 ≥ 5. With confidence 90%, the proportion of Jax Beach residents who are poorly prepared to evacuate is between 0.3589 and 0.5077. (The correction due to Agresti and Coull will not make much difference here, since the sample size is reasonably large.)

5. p = current probability a student will pass on the first try.
Ho: p ≤ 0.55, H1: p > 0.55. We can use the normal approximation because np0 = 33 ≥ 5 and n(1 − p0) = 27 ≥ 5. z = 1.816, p value = 0.0346. There is significant evidence that the probability a student will pass on the first try has increased.

6. n = (1.645/0.05)² * 0.5 * 0.5 = 271.

7. Ho: σ² ≥ 42.25 versus H1: σ² < 42.25. SS = 35*(4.8²) = 806.4, X² = 806.4/42.25 = 19.09. Using the table of the Chi-squared distribution with 35 df, reject Ho if X² < 22.465. There is significant evidence that the standard deviation is now below 6.5.

8. a) The hypothesis that would mean no systematic error in billing practices is µ = 0. Normally, we would require the company to prove it has good billing practices, but there is no way to make '=' the alternative hypothesis. Instead, we have to refuse the company if there is even moderate evidence that it has poor billing practices.
b) #1: Ho: µ = 0, H1: µ ≠ 0. t = 1.45, p value = 0.1579. There is no significant evidence the mean differs from 0.
#2: Ho: σ² ≥ 100 versus H1: σ² < 100. SS = 1860.6429, X² = 18.61. With 29 df and α = 1%, reject Ho if X² < 14.256. Hence, there is no significant evidence that the standard deviation is less than 10. The company does not pass the test.

9. M = median survival under the new treatment.
Ho: M ≤ 25, H1: M > 25. p value = 0.019 = probability of 10 or more surviving 25+ days under the binomial distribution with n = 12 and p = 0.50. There is evidence that the median survival time is now higher than 25 days.
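The sign-test p value for the mouse survival problem (Solution 9) can be checked exactly with the binomial distribution; this snippet is illustrative, not part of the manual:

```python
from math import comb

# P(X >= 10) for X ~ Binomial(n = 12, p = 0.5): the chance that 10 or
# more of 12 mice survive past the old median if survival is a coin flip.
p_value = sum(comb(12, k) for k in range(10, 13)) / 2**12
# (66 + 12 + 1) / 4096, about 0.019
```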


10. SS = (10−1)*8² = 576. Some modern calculators have a function for the inverse Chi-squared distribution; otherwise, using Table A.3 with 9 df, we get χ²(0.975) = 2.700 and χ²(0.025) = 19.023. The confidence interval for the population variance, with 95% confidence, is from 576/19.023 to 576/2.700, or from 30.279 to 213.33. With confidence 95%, the population standard deviation for the errors is between 5.50 and 14.60.


CHAPTER 5: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS

These problems are designed to be done without access to a computer, but they may require a calculator.

1. For each scenario below, choose the most likely method of analysis and write the corresponding number in the blank.
#1 two-sample t test
#2 paired t test
#3 two-sample z test for proportions
#4 two-sample F test
#5 McNemar's test
#6 two-sample test for medians
A. Household incomes are often extremely right-skewed. Based on sample data from two neighborhoods, do typical household incomes differ for the two neighborhoods?
B. An investment counselor gives both the husband and the wife in each couple a questionnaire on risk tolerance (measured on a quantitative scale). Are husbands typically more risk tolerant than their wives?
C. Grade point averages are collected for random samples of engineering majors and business majors. Do typical GPAs differ for the two groups?
D. Pollution levels, categorized as either Low or Elevated, are measured at a sample of lakes in the Southeast, and also at a sample of lakes in the Northeast. Do the two regions differ in the probability a lake will have an elevated pollution level?
E. A sample of lakes in the Southeast have their pollution level measured twice, once in the Spring and once in the Summer. Pollution level is categorized as either Low or Elevated. Does the probability a lake will have an elevated pollution level vary by season?
F. Diabetics about to undergo surgery are randomly assigned to one of two insulin protocols. Is one of the protocols more effective at reducing variability in the post-operative blood glucose levels?

2. A researcher compares the sample means in two small samples, one of size 5 and the other of size 9. She uses the unequal-variance version of the t test, since the sample standard deviations seem very different.
a. What is the minimum possible degrees of freedom, and what is the maximum?
b. [This version of the question requires a scientific calculator with the t distribution.] Assume a two-tailed alternative hypothesis. If t' = 2.04, what are the largest and smallest possible p values that could be assigned to the test? Can you reach a conclusion?
c. [This version of the question requires a paper table of the t distribution or a scientific calculator with the inverse t distribution.] Assume a two-tailed alternative hypothesis. What would the critical regions be with the minimum and maximum degrees of freedom? If t' = 2.04, can you reach a conclusion?

3. Eight volunteers participate in an experiment where their reaction time (in milliseconds) is recorded in response to a flashed light. Each volunteer has one reaction time measured using a red light, and again using a blue light. The data are shown below. Give a 95% confidence interval for the difference in mean reaction times for the two light colors.

Subject   #1    #2    #3    #4    #5    #6    #7    #8
Red      168   204   125   193   231   221   185   200
Blue     192   206   144   191   252   239   198   214
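As an optional arithmetic check (not part of the original solution key), a short Python sketch using only the standard library reproduces the paired-difference summary; the critical value 2.365 = t(0.975, 7 df) is hardcoded from a t table:

```python
import statistics
from math import sqrt

red  = [168, 204, 125, 193, 231, 221, 185, 200]
blue = [192, 206, 144, 191, 252, 239, 198, 214]

# Differences taken within subjects as Blue - Red, matching the solution key
d = [b - r for b, r in zip(blue, red)]
dbar = statistics.mean(d)          # sample mean of the differences
sd = statistics.stdev(d)           # sample standard deviation
se = sd / sqrt(len(d))

t_crit = 2.365                     # t(0.975, 7 df) from a table
lo, hi = dbar - t_crit * se, dbar + t_crit * se
print(round(dbar, 3), round(sd, 3), round(lo, 2), round(hi, 2))
# -> 13.625 9.18 5.95 21.3
```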

4. A researcher interviews a sample of a city’s residents to determine their attitudes towards the public school system. Among the 87 respondents who had school-age children, 39 expressed confidence in the school system. Among the 45 who did not have school-age children, 23 expressed confidence. At α = 5%, is there evidence that those who do and those who do not have children differ in their level of confidence in the school system?
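The two-sample z test for proportions used in the solution to this problem can be checked numerically (a standard-library sketch, not part of the original key):

```python
from math import sqrt

# Confidence in the school system: with vs. without school-age children
x1, n1 = 39, 87                    # with children
x2, n2 = 23, 45                    # without children
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)     # pooled proportion under Ho: p1 = p2
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print(round(z, 3))                 # -> -0.686, well inside (-1.96, 1.96)
```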

Problems 5 through 7 are based on the following scenario. At the end of their 4th grade year, 50 schoolchildren are tested for their reading ability. The children are randomly divided into two groups. One group is given a focused reading list for the summer, the other a broader list. At the beginning of the next school year, the children are tested again. The data for this study are summarized below. Loss is taken as the score at the end of the 4th grade minus the score at the beginning of the 5th grade. The goal is to see whether one of the lists reduces the mean loss of reading ability that occurs over summer break.

                    Focused reading list             Broader reading list
             End 4th    Begin 5th    Loss      End 4th    Begin 5th    Loss
             grade      grade                  grade      grade
Mean          51.13      43.37       7.76       51.91      44.29       7.62
Std. Dev.      5.30       6.49       2.90        4.51       5.90       4.19
n             25         25         25          25         25         25

5. a. At the end of the 4th grade, did the groups differ significantly with respect to mean reading ability? Use α = 5%. b. At the end of the 4th grade, did the groups differ significantly with respect to the variability in their reading ability? Use α = 5%. 6. a. Did the children in the focused reading list group show a significant change in their mean reading scores during the summer? Use α = 5%. b. Did the children in the broader reading list group show a significant change in their mean reading scores during the summer? Use α = 5%. 7. a. Is there evidence, at α = 5%, that the groups differ with respect to the mean loss in reading ability over the summer? b. Is there evidence, at α = 5%, that the groups differ with respect to the variability in their loss of reading ability?

8. A company is testing two different monitors to see if they differ with respect to their ability to detect elevated carbon monoxide (CO) levels. The two monitors are mounted side-by-side, and then the pair are exposed to 50 different situations with elevated CO. The table below summarizes the results. Test the null hypothesis that the monitors have equal probabilities of detecting CO, using α = 5%.

                          Monitor A works    Monitor A does not work
Monitor B works                 28                    12
Monitor B does not work          4                     6
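McNemar’s test, used in the solution, depends only on the two discordant cells; the z statistic can be sketched as follows (standard library only, not part of the original key):

```python
from math import sqrt

# Discordant pairs from the 2x2 table of matched monitor results
b = 12                 # Monitor A fails, Monitor B works
c = 4                  # Monitor A works, Monitor B fails
z = (c - b) / sqrt(b + c)   # McNemar z statistic
print(z)               # -> -2.0
```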

9. Volunteers are randomly divided into two groups and given a cognition test. Participants in group A take the exam after a normal night’s sleep. Volunteers in group B take the exam after going 24 hours without sleep. The scores on the exam seem to be right-skewed, so the researchers decide to compare the medians in the groups. The overall median score was 42. In group A, 19 of the 25 participants scored above 42. In group B, 6 of the 25 participants scored above 42. At α = 1%, is there evidence that the medians differ? If so, which group seems to have the higher median? 10. You are trying to compare typical household incomes in two zip codes. The box plots of the independent samples, both of size 10, are shown below. Briefly describe two different methods of analysis that would be appropriate for this data.


SOLUTIONS
1. A. #6   B. #2   C. #1   D. #3   E. #5   F. #4

2. a. The minimum possible df is 5 − 1 = 4 and the maximum is 5 + 9 − 2 = 12.
b. If df = 4, then p value = 2(0.0555) = 0.111. If df = 12, then p value = 2(0.032) = 0.064. Yes: under all possible df, there is no significant evidence the population means differ.
c. If df = 4, reject Ho if |t| > 2.776. If df = 12, reject Ho if |t| > 2.179. Yes: under all possible df, there is no significant evidence the population means differ.
3. Paired samples. Taking the differences within subjects as Blue − Red, the sample mean is 13.625 and the sample standard deviation is 9.180. With confidence 95%, the difference in mean reaction times is between 5.95 and 21.30, where the positive values indicate that Blue has a longer mean reaction time than Red.
4. Two-sample z test for proportions. Ho: p1 = p2 versus H1: p1 ≠ p2. z = −0.686, p value = 0.4929. There is no significant evidence that the proportions feeling confident in the school system differ for those who do and do not have children.
5. a. Two-sample t test. Using the pooled t test, t = −0.56 with 48 df and p value = 0.5778. There is no significant difference in the mean reading ability of the two groups initially.
b. Two-sample F test. F = 1.381 with 24 and 24 df, p value = 0.4349. There is no significant difference in the variances of reading ability in the two groups initially.
6. Paired t tests of Ho: μD = 0 versus H1: μD ≠ 0.
a. Within the focused group: t = 13.38, p value < 0.0001; the mean reading loss in the focused group differed significantly from 0.
b. Within the broader reading group: t = 9.09, p value < 0.0001; the mean reading loss in the broader reading group differed significantly from 0.
7. a. Applying the two-sample t test to the losses, t = 0.137, p value = 0.8913. There is no significant evidence that the mean loss differs for the two types of reading lists.
b. F = 2.09; there is no significant evidence that the variances differ in the two groups.
8. McNemar’s test. Out of 16 discordant pairs, only 4 were B does not work/A works. z = −2.0, p value = 0.0455. There is significant evidence that the monitors differ in their effectiveness; Monitor B seems to be more reliable.
9. Compare the proportion of observations exceeding 42 for the two groups. z = 3.68, p value = 0.0002; there is significant evidence that the medians in the two groups differ. There is apparently a higher median cognition score among people who have a normal night’s sleep.
10. The small sample size and amount of skew make an independent-samples t test inappropriate, even with the unequal variance version. 1) You can transform the data using logarithms or square roots to reduce skew and make the dispersions more nearly equal, then apply an independent-samples t test, either version. 2) You can compare the medians directly by looking at the proportion in each group that exceeds the overall median.


CHAPTER 6: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator and specialized tables of distributions. 1. For each scenario below, select the best method for controlling the family-wise error rate at 5%. A. You are comparing the mean reaction time to four different colors of lights. Your only concern is to decide which colors have the shortest mean reaction time. B. You are comparing mean hardness for epoxy resins cured for one of five different curing times (1 hour, 1.25, 1.5, 1.75 or 2 hours). You have planned in advance that you will test for a linear and for a quadratic trend (two orthogonal contrasts), and that these are the only follow-up hypotheses that will be tested. C. You are comparing mean hardness for epoxy resins cured for one of five different curing times (1 hour, 1.25, 1.5, 1.75 or 2 hours). You will do all pairwise comparisons, but also wish to make any other contrasts which occur to you after inspecting the data. D. You are comparing the effectiveness of two blood pressure medications (A and B) to the current ‘gold standard’ treatment C. You are not particularly interested in the comparison between A and B; rather, you are focused on comparing each of these to drug C.

2. You are comparing mean hardness for epoxy resins cured under one of four different conditions: #1: catalyst A for 1 hour #2: catalyst A for 2 hours #3: catalyst B for 1 hour #4: catalyst B for 2 hours A. Give the contrast coefficients which correspond to the following statements. 1) The average of the means for catalyst A equals that for Catalyst B. 2) The difference between the means at 1 and 2 hours is the same for Catalyst A as it is for catalyst B. B. Show that the two contrasts for part (A) are orthogonal. C. Create a new contrast that is orthogonal to both those in part (A), and interpret it.

3. The ANOVA table below summarizes an experiment with 4 groups, each with 8 observations. Complete the table, and give the appropriate interpretation assuming α = 5%.

Source     df     SS      MS     F
Between           480
Within
Total            2160
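The table-completion arithmetic (worked out in the solutions) can be sketched in a few lines of Python (an optional check, not part of the original key):

```python
# Complete the one-way ANOVA table: k = 4 groups, n = 8 observations each
k, n = 4, 8
ss_between, ss_total = 480.0, 2160.0

ss_within = ss_total - ss_between            # 1680
df_between, df_within = k - 1, k * n - k     # 3 and 28
ms_between = ss_between / df_between         # 160
ms_within = ss_within / df_within            # 60
f = ms_between / ms_within
print(df_between, df_within, ms_between, ms_within, round(f, 2))
# -> 3 28 160.0 60.0 2.67
```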


Problems 4 through 6 are based on the following scenario and table of results. You have an experiment in which 48 patients with mild diabetes are randomly assigned to one of 4 different protocols for controlling blood glucose levels. Each patient submits to extensive monitoring for a one-week period. The dependent variable is the percentage of time that the patient’s blood glucose was high (TIMEHIGH). It is desirable that this value be low.

Source            DF    Sum of Squares    Mean Square    F Value
Model                     528.000000
Error                     554.776930
Corrected Total   47     1082.776930

Level of protocol     N     timehigh Mean
1                    12       27.5403924
2                    12       27.5403924
3                    12       23.5403924
4                    12       19.5403924

4. Is there significant evidence that at least one group has a different mean TIMEHIGH, assuming α = 5%? Cite the appropriate test statistic.

5. [Requires table for Studentized Range Statistic.] Protocols 1 and 3 are similar except that protocol 3 includes an exercise plan. Do these two group means differ significantly? Assume that all the pairwise comparisons are being conducted, not just this one, and that the family-wise significance level is to be kept at 10%.
6. After examining the data, the researcher decides to test the null hypothesis that the mean for Protocol 4 equals the average of the means for the other 3 groups: μ1 + μ2 + μ3 − 3μ4 = 0. What would the appropriate method be for deciding whether this test statistic is significant? Would it be significant at a family-wise error rate of 5%?
7. [Requires table for Analysis of Means.] The following table summarizes data for Sales per Week for random samples of supermarkets, in $100,000s. Plot the control lines using α = 10%, and the observed means. Identify any weeks for which the sales seemed anomalous.

Week    n    Mean Sales    Standard deviation
1      10       18.6             5.9
2      10       23.4             6.2
3      10       21.8             6.0
4      10       22.6             5.8
5      10       26.2             6.1


8. The following table summarizes data for Sales per Week for random samples of supermarkets, in $100,000s. The weeks correspond to weeks of an advertising campaign. In advance, it was decided that the only hypothesis to be tested is that there is a linear trend in the data. The coefficients for a linear trend in five groups are (−2, −1, 0, 1, 2). Test the null hypothesis of no linear trend using α = 5%. Hint: SSW = 1620.9.

Week    n    Mean Sales    Standard deviation
1      10       18.6             5.9
2      10       23.4             6.2
3      10       21.8             6.0
4      10       22.6             5.8
5      10       26.2             6.1
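The contrast arithmetic for this problem (computed in the solutions) can be checked with a short standard-library sketch (optional, not part of the original key):

```python
from math import sqrt

# Linear-trend contrast applied to the five weekly means
coef = [-2, -1, 0, 1, 2]
means = [18.6, 23.4, 21.8, 22.6, 26.2]
n = 10                        # observations per week
msw = 1620.9 / 45             # SSW from the hint, df = 5*(10-1) = 45

L = sum(c * m for c, m in zip(coef, means))          # contrast estimate
se = sqrt(msw * sum(c * c for c in coef) / n)        # its standard error
t = L / se
print(round(L, 1), round(t, 2))   # -> 14.4 2.4
```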

9. You are examining the effect of curing time on the mean hardness of epoxy resins. There are five levels of curing time (1, 1.25, 1.5, 1.75 and 2 hours). Four batches of resin are cured at each level of time, for a total of 20 observations. Prior to the experiment, it was hypothesized that a model based on two orthogonal contrasts (one a linear trend and the other quadratic) would suffice to produce a good fit to the data. The ANOVA table below shows the usual sums of squares and those for these two contrasts. a. Is there evidence for a linear trend? b. Is there evidence for a quadratic trend? c. How many additional orthogonal contrasts are possible? Is there evidence that any of these would improve the fit of the model?

Source       SS
Between      90
Linear       79
Quadratic     5
Within       30

10. [Requires table for the Analysis of Means.] A hospital administrator randomly selects records for 50 emergency room patients every month to find the proportion who had to wait more than 1 hour for assistance. For six consecutive months, the number waiting more than an hour were: 8, 12, 2, 19, 10, 13. Use an ANOM procedure with a significance level of 5% to see whether any of the months seems to be different.


SOLUTIONS
1. a. Tukey’s HSD b. Bonferroni’s for two hypotheses c. Scheffé’s d. Dunnett’s
2. a. A1: 1 1 −1 −1; A2: 1 −1 −1 1
b. 1(1) + 1(−1) + (−1)(−1) + (−1)(1) = 0, so these contrasts are orthogonal.
c. The average of the means at two hours equals the average of the means at one hour: (1 −1 1 −1). This is orthogonal to both A1 [1(1) + (−1)(1) + (1)(−1) + (−1)(−1) = 0] and A2 [1(1) + (−1)(−1) + 1(−1) + (−1)(1) = 0].
3.
Source     df     SS     MS     F
Between     3    480    160    2.67
Within     28   1680     60
Total      31   2160
The critical value with 3 and 28 df is about 2.95. Hence, there is no significant evidence that any of the groups have a different mean value.
4. F = 13.96 with 3 and 44 df. The critical value is about 2.81. There is significant evidence that at least one group has a different mean.
5. The critical value is 3.79·√((12.6086/2)(1/12 + 1/12)) = 3.88. The difference in the sample means for these two groups is 4, so Protocols 1 and 3 do differ significantly.
6. Use Scheffé’s method, since the contrast was decided after examining the data. The critical value is √(3(2.81)(1 + 1 + 1 + 9)(12.6086/12)) = 10.31. The calculated value for the contrast is L̂ = 20. There is significant evidence that the mean for Protocol 4 differs from the means for the other groups.
7. The control limits are 22.52 ± (2.37)(6.002)√(4/50) = 22.52 ± 4.02 = (18.50, 26.54). No weekly means plot outside the control limits.
8. L̂ = 14.4, t = 14.4/√((1620.9/45)(10/10)) = 2.40 with 45 degrees of freedom. (Note the sum of the squared coefficients is 10, and the sample size in each group is also 10.) At α = 5%, there is significant evidence of a linear trend.
9. a. MSW = 30/(20 − 5) = 2.0 and MSLinear = 79/1 = 79, so F = 39.5 with 1 and 15 df, p value < 0.0001; there is strong evidence for a linear trend.
b. MSQuadratic = 5/1 = 5, so F = 5/2 = 2.5 with 1 and 15 df, p value = 0.135; there is no strong evidence of a quadratic trend.
c. There are 5 − 1 = 4 total possible orthogonal contrasts and 2 have been used, so there are 2 remaining. The SS for these remaining contrasts is 90 − 79 − 5 = 6, so F = (6/2)/2 = 1.5 with 2 and 15 df, p value = 0.255; there is no evidence that either of these two contrasts would improve the fit of the model.


10. The overall proportion is 0.2133. The control limits are 0.2133 ± 2.62·√(0.2133(1 − 0.2133)(5/6)/50) = 0.2133 ± 0.1386 = (0.0747, 0.3519). The 3rd week (2/50 = 0.04) and the 4th week (19/50 = 0.38) are outside the control limits.
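The ANOM arithmetic in solution 10 can be verified with a short sketch (standard library only; the ANOM critical value h = 2.62 for α = 5% and k = 6 groups is taken from the solution, not computed):

```python
from math import sqrt

# ANOM for proportions: six monthly samples of 50 ER records each
counts = [8, 12, 2, 19, 10, 13]
n, k = 50, 6
p_bar = sum(counts) / (n * k)                  # overall proportion, 64/300
h = 2.62                                       # ANOM critical value from a table

margin = h * sqrt(p_bar * (1 - p_bar) / n) * sqrt((k - 1) / k)
lo, hi = p_bar - margin, p_bar + margin
flagged = [i + 1 for i, x in enumerate(counts) if not lo <= x / n <= hi]
print(round(lo, 3), round(hi, 4), flagged)     # -> 0.075 0.3519 [3, 4]
```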


CHAPTER 7: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator.

1. You are reading a research paper that describes a study of Y = mean class score on an arithmetic exam versus X = number of arithmetic homework problems per week assigned in the class. Since the data set is small, with only n = 10 classes, you perform the regression on a scientific calculator. The screen output states: Y = ax + b, a = 1.4, b = 2.1, r = 0.71. a. Interpret the coefficients, and describe the relationship in simple language. b. Is there significant evidence of a linear relationship, using α = 5%? c. Give a 95% confidence interval for the expected change in Y if one additional homework problem is assigned per week. (Hint: solve for the t statistic for the slope.)

2. You are reading a research paper that describes a regression of Y = fruit sugar content on the independent variable X = low temperature at harvest time in degrees Celsius. The data consisted of a sample of 16 observations. The authors provide the following summary information:

             Estimated regression parameters
Intercept          14.26
X                  -0.35
                MSE = 0.11

a. In simple language, describe the relationship between fruit sugar content and low temperature. b. Give a point estimate of the expected fruit sugar content when the low temperature is 10 °C. c. Assume that the values of X in the data had a sample mean of 12 °C and a sample standard deviation of 4 °C. Give a 95% confidence interval for the expected fruit sugar content when the low temperature is 10 °C.

3. In a sample of 25 observations, the sample correlation between systolic blood pressure and body mass index is 0.29. Is there significant evidence of a linear relationship between these two variables, using α = 5%?
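Problems 1 and 3 both reduce to the F statistic F = (n − 2)r²/(1 − r²) used in the solutions; an optional sketch (standard library only) checks both values:

```python
# F test for a sample correlation r from n observations (1 and n-2 df)
def f_from_r(r: float, n: int) -> float:
    return (n - 2) * r**2 / (1 - r**2)

print(round(f_from_r(0.71, 10), 2))   # problem 1: -> 8.13
print(round(f_from_r(0.29, 25), 2))   # problem 3: -> 2.11
```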


4. You are researching the relationship between survival time for liver transplant patients and other variables, with particular emphasis on the possible relation with age. In an article in a medical journal, you find the following statement: “The correlation between age and log(survival time) was −0.65 (p value = 0.094).” a. Describe the apparent relationship between age and log(survival time) in simple language. b. What is the most likely reason for stating the correlation in terms of the logarithms rather than the survival times? c. The presence of the p value implies a hypothesis was tested. What hypothesis was most likely tested, and what conclusion would you reach if you use α = 5%? d. How could a correlation coefficient with such a large absolute value have such a high p value?

5. A research article states that the fitted regression of Y = ln(asphalt strength) on X = ln(sand content) was

Ŷ = 3.42 − 0.49X,   MSE = 0.16

Assume this regression came from a sample of 15 observations where the sample mean of X was ln(20). What can you say about the typical value of asphalt strength when sand content is 20? Use a 95% confidence interval, and be specific with regard to the parameter for which you are giving an interval.

6. A utility company is attempting to predict summer daily demand for electricity using the two-day-ahead weather forecast for the day’s high. (Using the two-day-ahead forecast gives the utility company time to prepare.) A regression is done using records for 40 randomly selected summer days during the past two years. Demand is in megawatts. Part of the printout is shown below.

ANOVA
Source         SS        df     MS     F
Model        191.00
Error
Corr. Total  500.00

Parameter estimates           Std. error
Intercept      -12.320          2.4631
High_temp        0.350          0.0149

a. Fill in the blanks in the printout. b. Give a 95% confidence interval for the difference in expected electricity consumption for two days, one of which is 5 degrees hotter than the other. c. Consider two days with exactly the same predicted high temperature. Write the symbolic expression (in terms of β0, β1 and ε) for the difference in their actual electricity consumption. Knowing what we have discussed regarding independent random variables, use the information above to give a rough 95% confidence interval for the amount by which these two days could differ in their actual electricity consumption. [This is a stretch-your-brain problem; do as much as you can.]


7. You are modeling vehicle carbon emissions as a function of ethanol content of fuel. Each of the residual plots below is for some transform of the dependent and independent variables. For each, say what is the most obvious violation of the assumptions, and suggest the remedy (e.g. non-normal residuals, transform Y). If the model appears acceptable, say so.

A. [Residual plot A: residuals vs. Ethanol]

B. [Residual plot B: residuals vs. Ethanol]

C. [Residual plot C: residuals vs. Ethanol]

8. You have data on a random sample of 15 fifth-grade classes where Y = mean score on a standardized reading test and X = percent of students whose parents are not native speakers of English. Here is a portion of the regression printout:

Ŷ = 32.4 − 0.07X,   MSE = 5.09,   x̄ = 15.0,   sX = 5.0

Give a 95% confidence interval for the mean score in an individual class where 20 percent of the students have parents who are not native speakers of English.


SOLUTIONS
1. a. As the number of homework problems increases, the mean score tends to increase. For each extra homework problem per week, the class mean is expected to increase by 1.4 points. If 0 homework problems are assigned, the class mean is expected to be 2.1.
b. F = 8(0.71)²/(1 − 0.71²) = 8.13 with 1 and 8 df. The critical value at α = 5% is 5.32. There is significant evidence of a relationship.
c. t = √8.13 = 2.851 and t = 1.4/std. error, so std. error = 1.4/2.851 = 0.491. Confidence interval: 1.4 ± 2.306(0.491) = 1.4 ± 1.13 = (0.27, 2.53).
2. a. As the low temperature increases (that is, the weather is warmer), the fruit sugar content declines.
b. 14.26 − 0.35(10) = 10.76.
c. 10.76 ± 2.1448·√(0.11(1/16 + 4/(15·4²))) = 10.76 ± 0.20 = (10.56, 10.96).
3. F = 23(0.29²)/(1 − 0.29²) = 2.11 with 1 and 23 df. The critical value at α = 5% is 4.28. There is no significant evidence of a relationship between systolic blood pressure and body mass index.
4. a. As age increases, log(survival time) tends to decline.
b. The relationship of age and survival time is most likely nonlinear.
c. Ho: ρ = 0, where ρ is the population correlation between age and log(survival time). Since the p value is 0.094, we cannot reject Ho. There is no significant evidence of a linear relation between age and log(survival time).
d. It must have been a small sample, or F would have been large.
5. Remember to transform Sand Content = 20 to X = ln(20). The point estimate for mean ln(asphalt strength) is 3.42 − 0.49·ln(20) = 1.952. Since the value of X for which the prediction is desired is at the sample mean, we do not need the sums of squares for the X values. The confidence interval is 1.952 ± 2.1604·√(0.16(1/15)) = 1.952 ± 0.223 = (1.729, 2.175). Note that this is a confidence interval for the mean of ln(asphalt strength) when Sand Content is 20. To obtain a statement about typical asphalt strength, we must exponentiate. With confidence 95%, median asphalt strength is between 5.64 and 8.80 when sand content is 20.
6. a.
Source    SS       df     MS       F
Model    191.00     1    191.00   23.49
Error    309.00    38      8.132
Total    500.00    39

b. 5(0.350) ± 5(2.03)(0.0149) = (1.60, 1.90). With confidence 95%, the expected difference is between 1.6 and 1.9.
c. For Day 1, Y1 = β0 + β1X + ε1, and for Day 2, Y2 = β0 + β1X + ε2. The difference in their actual consumption is ε1 − ε2. Since the errors are independent with variance σ², the difference has mean 0 and variance 2σ². Hence, a rough confidence interval for the difference would be ±2√(2(8.132)) = ±8.07.

7. A. Nonconstant variance; transform the Y variable. B. Looks good. C. Nonlinearity, but no strong change in variance; transform the X variable.
8. Point estimate: 32.4 − 0.07(20) = 31.0. Confidence interval: 31.0 ± 2.1604·√(5.09(1 + 1/15 + 25/(14·5²))) = 31.0 ± 5.20 = (25.8, 36.2).
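The prediction-interval arithmetic in solution 8 can be verified with a standard-library sketch (optional check; the critical value 2.1604 = t(0.975, 13 df) is hardcoded from a table):

```python
from math import sqrt

# Prediction interval for one class's mean score at x0 = 20 percent
b0, b1 = 32.4, -0.07
mse, n, xbar, sx = 5.09, 15, 15.0, 5.0
x0 = 20
t_crit = 2.1604                          # t(0.975, 13 df)

yhat = b0 + b1 * x0                      # point estimate
sxx = (n - 1) * sx**2                    # sum of squares of X, 14 * 25 = 350
se_pred = sqrt(mse * (1 + 1/n + (x0 - xbar)**2 / sxx))
lo, hi = yhat - t_crit * se_pred, yhat + t_crit * se_pred
print(round(yhat, 1), round(lo, 1), round(hi, 1))   # -> 31.0 25.8 36.2
```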


CHAPTER 8: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator.

1. CIRCLE THE NUMBER WHICH CORRESPONDS TO THE CORRECT ANSWER A. You need to choose between several regression models for the same dependent variable. You would select the model with: #1. the largest MSE #2. the largest MSR #3 the smallest MSR B. You are in charge of forecasting natural gas prices for an energy company, a task for which you use a multiple regression. You must deliver your forecast for next week’s price, with confidence level 95%. You need: #1: a confidence interval for mean price given values of the independent variables #2: a prediction interval for an individual price given values of the independent variables C. You run a regression of Y on five different independent variables. While the F test yields significant evidence that at least one independent variable is linearly related to Y, all the t tests for the individual independent variables have very high p values. This is because: #1: the p values for the individual t tests have not been adjusted for the multiple comparison problem #2: the independent variables are most likely multicollinear D. When the random errors in a regression have non-constant variance, then #1: the regression parameter estimates will be biased #2: the estimated standard deviations will be incorrect E. A model with high R-squared may still show very wide prediction intervals for individuals at given values of the independent variables if #1: the original variation (TSS) in the Y variable is quite large #2: there are numerous independent variables in the regression

2. EACH OF THE STATISTICAL CONCLUSIONS BELOW HAS SOMETHING WRONG WITH IT. REWRITE THE CONCLUSION. Assume the test itself is correctly reported, it is the conclusion drawn from the test that is incorrect. There may be more than one possible correct re-statement. a. In a multiple regression of Memory on quantitative variables Age and Health, the independent variable Age was not significant (t = 1.42, p value = 0.166). Hence, Age has no significant relationship with Memory. b. In a multiple regression of Memory on quantitative variables Age, Health, and Age*Health, the interaction variable was significant (t = 2.56, p value = 0.009). Hence, Age has a significant relationship with Health. c. In a multiple regression of Memory on quantitative variables Age and Health, the F-test from the ANOVA was significant (F = 4.68, p value = 0.005). Hence, both Age and Health have a significant relationship with memory.


3. A researcher has collected data on log(Income) for 600 men in Jacksonville. Log(Income) is used as the dependent variable in a series of multiple regressions using independent variables X1 = Age in years, X2 = Years of Education, X3 = Race (0 = white / 1 = nonwhite). The full model has SSE(Int, X1, X2, X3, X1*X2, X1*X3, X2*X3) = 35.38. Various simpler models had
SSE(Int, X1, X2, X3) = 35.90
SSE(Int, X1, X3, X1*X3) = 36.09
SSE(Int, X2, X3, X2*X3) = 49.85
SSE(Int) = 66.10
Int is short for Intercept, that is, β0.
a. What is R-squared for the full model?
b. Test the null hypothesis that X2 has no association of any kind (either alone or through an interaction). Use α = 5%.
4. You carry out a regression of child’s Reading Score on the independent variables AGE (in years), MOM (mother’s years of formal education), and INCOME (household income in $1000s). Part of the regression printout is summarized below. There were 200 children in the sample.

Variable     Parameter Estimate    Standard Error
Intercept         -29.4                 6.32
AGE                 8.56                1.68
MOM                 1.24                0.35
INCOME              0.28                0.095

a. Give a 95% confidence interval for the increase in mean reading scores if INCOME increases by 10 ($10,000), if AGE and MOM’s education are held constant.
b. Previous research had indicated that mean reading scores increased by 10 points for each additional year of AGE, provided other independent variables are held constant. Does this data provide evidence to dispute that claim? Use α = 10%.

5. An urban planner is studying Y = per capita property tax base for various neighborhoods (in $1000s) as a function of X1 = average age of homes and X2 = average size of homes. Data are available for a sample of 120 neighborhoods, in which TSS = 17,136. Here is information on two models.
Model 1: y = β0 + β1X1 + β2X2 + β3X1X2 + ε, R² = 0.365
Model 2: y = β0 + β2X2 + ε, R² = 0.303
Does Model 1 fit significantly better than Model 2, assuming α = 5%? What does your result imply regarding the association with age of homes?


6. You are modeling the Hardness of polyester resins as a function of X1 = curing time. Several models are fit using polynomials in X1. Based on the SSE given below, what order polynomial would you recommend for use as a model? There were 20 observations in the data. TSS = 76 SSE from linear model = 42 SSE from quadratic model = 28 SSE from cubic model = 24 SSE from quartic model = 22

7. The effect of extra tutoring hours (X1) on math scores (Y) is being studied in high-risk high school students. We also want to control for each student’s hours per week outside class spent studying on their own (X2). Our primary emphasis is on studying the effect of X1. The regression printout is attached.
a. Using the graph on the next page, plot the predicted value for Y when X2 = 0 and again when X2 = 8. Note that some of the predicted values have already been computed for you:
when X1 = 0 and X2 = 0, then Ŷ = 25.4
when X1 = 3 and X2 = 0, then Ŷ = 31.0
when X1 = 0 and X2 = 8, then Ŷ = ?
when X1 = 3 and X2 = 8, then Ŷ = 63.6
b. Using your graph as a guide, explain in terms that a non-statistician can understand how extra tutoring hours (X1) affect expected math scores. Under what conditions is the extra tutoring most helpful?
c. Give a 95% confidence interval for the increase in mean math scores if tutoring hours are increased by 1, AND hours spent studying on their own (X2) is held at 0.


PRINTOUT FOR PROBLEM 7
Number of Observations Used    80

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3      6210.08235      2070.02745      18.42     <.0001
Error              76      8543.05152       112.40857
Corrected Total    79     14753

Root MSE          10.60229     R-Square    0.4209
Dependent Mean    41.36909     Adj R-Sq    0.3981
Coeff Var         25.62853

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1         25.39257             4.52073         5.62      <.0001           0
x1            1          1.85587             2.56473         0.72      0.4715           5.85170
x2            1          1.93391             0.87829         2.20      0.0307           3.07128
x1x2          1          0.71724             0.52000         1.38      0.1718           7.49525

[Blank graph for part (a): Y axis from 0 to 80, X1 = extra hours of tutoring from 0 to 3]


8. You are helping a friend with a statistical analysis of his data. After running a regression of Y on the variables X1, X2, and X3, your friend panics that none of the variables in the regression are significant so he will have nothing to discuss in his thesis. What reassurance can you offer your friend, and what is the likely cause of the problem? Briefly describe ONE strategy for reducing the ambiguity in the results.


9. In an agricultural experiment, the dependent variable YIELD = 10s of pounds of tomatoes per 1000 sq ft of plantings is modeled using FERTILZ = 10s of pounds of fertilizer per 1000 sq ft and SPRGRAIN = spring rainfall in centimeters. The attached regression printout shows the results of regressing YIELD on FERTILZ, SPRGRAIN, and the interaction SPRGFERT = SPRGRAIN*FERTILZ. The focus of our study is the effect of fertilizer.
a. Draw a plot of expected Yield versus Fertilz when Sprgrain = 10 cm, and also when Sprgrain = 30 cm. You may superimpose your plot on the scatterplot below. Values of Fertilz ranged from 2 to 7. To help you, some of the fitted values have already been computed:
When SprgRain = 10 and Fertilz = 2, estimated Yield = 561
When SprgRain = 30 and Fertilz = 2, estimated Yield = 683
When SprgRain = 10 and Fertilz = 7, estimated Yield = 723
When SprgRain = 30 and Fertilz = 7, estimated Yield = ?

[Scatterplot: Yield from 500 to 1100 vs. Fertilizer in 10s of pounds, from 2 to 7]

b. Using your plot, describe the effect of fertilizer. Is fertilizer more effective when spring rains are heavy or when they are light?
c. Is there significant evidence, at α = 5%, that at least one of the independent variables is related to Yield? Cite the appropriate test statistic and its p value.
d. Is there significant evidence, at α = 5%, that adding the interaction term to a model that has SprgRain and Fertilz will improve prediction of yields? Cite the appropriate test statistic and its p value.
e. Discuss the reasonableness of the regression assumptions, citing the available evidence.


PRINTOUT FOR PROBLEM 9
Number of Observations Used    93

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        805740           268580      3353.93    <.0001
Error              89      7127.05286        80.07925
Corrected Total    92        812867

Root MSE           8.94870     R-Square    0.9912
Dependent Mean   734.46929     Adj R-Sq    0.9909
Coeff Var          1.21839

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1        475.28935            12.30734        38.62      <.0001           0
fertilz       1         11.93061             2.68275         4.45      <.0001          17.13569
sprgrain      1          2.04230             0.61353         3.33      0.0013           9.82387
sprgfert      1          2.04754             0.13299        15.40      <.0001          26.56513

[Plot of residuals (-30 to 30) versus predicted yield (500 to 1100)]


10. The variable Y = change in A1C for a sample of pre-diabetic patients is regressed on four independent variables. The regression yielded the following diagnostic plots. For each of the three plots, briefly describe what it is telling you.

Figure A

Figure B

Figure C


SOLUTIONS
1. a. #2   b. #2   c. #2   d. #2   e. #1

2. a. There is no significant evidence that Age is related to Memory, provided Health is kept constant.
b. There is significant evidence that the relation of Age with Memory varies by value of Health. OR There is significant evidence that the relation of Health with Memory varies by value of Age.
c. There is significant evidence that at least one of Health or Age has a relationship with Memory.

3. a. R-squared = (66.1 - 35.38) / 66.1 = 0.465
b. F = [(36.09 - 35.38) / (596 - 593)] / [35.38 / 593] = 3.97 with 3 and 593 df. The critical value is 2.60. There is significant evidence that X2 has some type of association with ln(Income).

4. a. 10(0.28) ± 1.96 * 10(0.095) = (0.94, 4.66). With confidence 95%, if income increases by 10 units, then the expected increase in reading score is between 0.94 and 4.66 units.
b. Ho: β_age = 10. t = (8.56 - 10) / 1.68 = -0.857 with 196 df. There is no significant evidence that the claim is incorrect.
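The arithmetic in Solutions 3 and 4 can be checked with a short script (Python is the editors' choice here; the variable names are illustrative, not from the text):

```python
# Solution 3a: R-squared from the total and error sums of squares.
r_squared = (66.1 - 35.38) / 66.1          # about 0.465

# Solution 3b: partial F test for adding X2 (reduced SSE 36.09 on 596 df,
# full SSE 35.38 on 593 df).
f_stat = ((36.09 - 35.38) / (596 - 593)) / (35.38 / 593)   # about 3.97

# Solution 4a: 95% CI for the effect of a 10-unit income increase
# (slope estimate 0.28, standard error 0.095).
lo = 10 * 0.28 - 1.96 * 10 * 0.095
hi = 10 * 0.28 + 1.96 * 10 * 0.095         # interval (0.94, 4.66)
```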

5. For model 1, SSE = (1 - 0.365)*17136 = 10881.36 with 114 df. For model 2, SSE = (1 - 0.303)*17136 = 11943.792 with 116 df.
F = [(11943.792 - 10881.36) / 2] / [10881.36 / 114] = 5.57 with 2 and 114 df. There is significant evidence that average age has some type of association with per capita property tax base.

6. The MSE from the full quartic model is 22/(20-5) = 1.467. The sequential sums of squares, beginning with a model that only has an intercept, would be

Source      SS            F
Linear      76-42 = 34    23.18
Quadratic   42-28 = 14     9.55
Cubic       28-24 = 4      2.73
Quartic     24-22 = 2      1.36

The critical value with 1 and 15 df is 4.54. This suggests that a quadratic model would fit the data adequately.
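A sketch of the calculations in Solutions 5 and 6 (Python, editors' addition; figures are taken from the problem statements):

```python
# Solution 5: recover each model's SSE from its R-squared and the TSS,
# then form the model-comparison F statistic.
sse_full = (1 - 0.365) * 17136      # 10881.36 on 114 df
sse_red = (1 - 0.303) * 17136       # 11943.792 on 116 df
f5 = ((sse_red - sse_full) / 2) / (sse_full / 114)   # about 5.57

# Solution 6: sequential F tests from the quartic fit, MSE = 22/15.
mse = 22 / (20 - 5)
f_seq = [(a - b) / mse for a, b in [(76, 42), (42, 28), (28, 24), (24, 22)]]
# linear, quadratic, cubic, quartic terms in order
```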


7. a. When X1 = 0 and X2 = 8, then Ŷ = 40.86.
b. The dashed line shows the relation of Y with X1 when X2 (time outside class) is 0. The solid line is when X2 is 8. Extra tutoring has only a small impact on expected scores when the student does not spend any extra hours outside of class. However, if the student does spend extra hours outside of class, the tutoring is associated with a great increase in scores.

[Plot of y (20 to 70) versus x1 (0.0 to 3.0), with dashed and solid fitted lines.]

c. This is a confidence interval for β1: 1.8559 ± 1.9921(2.5647) = (-3.25, 6.97). When there is no extra time outside class, the tutoring does not have any significant effect.

8. There is strong evidence that at least one variable is significant, but because of the multicollinearity in the independent variables, it is hard to say which. It may be possible to re-express the variables, say by taking ratios of X2 and X3 to X1, and reduce the multicollinearity. If the X's are polynomial terms, centering X around its mean may help.

9. a. When Sprgrain = 30 and Fertilz = 7, predicted yield is 1050.
b. [Plot of expected yield (500 to 1100) versus fertilz (2 to 7), with one line for sprgrain = 10 and one for sprgrain = 30.] The graph shows that increasing Fertilizer is always associated with increasing levels of yield, but that the impact of increasing fertilizer is stronger when there is greater spring rain.
c. F = 3353.93, p value < 0.0001; there is extremely strong evidence that at least one of the independent variables is associated with yields.
d. t = 15.4, p value < 0.0001; yes, there is significant evidence that adding an interaction to the model that has Sprgrain and Fertilz will improve prediction.
e. The residual plot does not show any crescent shape, that is, no sign of nonlinearity, nor any flare, that is, no sign of nonconstant variance. There are no obvious outliers.
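The fitted values quoted in the problem, including the missing corner, can be reproduced from the printed parameter estimates (a Python sketch by the editors; the function name is ours):

```python
# Fitted interaction model from the Problem 9 printout:
# Yhat = b0 + b1*fertilz + b2*sprgrain + b3*(sprgrain*fertilz)
def fitted_yield(fertilz, sprgrain):
    return (475.28935 + 11.93061 * fertilz + 2.04230 * sprgrain
            + 2.04754 * sprgrain * fertilz)

corners = {(2, 10): fitted_yield(2, 10),   # about 561
           (2, 30): fitted_yield(2, 30),   # about 683
           (7, 10): fitted_yield(7, 10),   # about 723
           (7, 30): fitted_yield(7, 30)}   # about 1050, the missing value
```

Note how the interaction coefficient (2.04754) makes the fertilizer slope steeper at sprgrain = 30 than at sprgrain = 10, which is exactly what the plot in part b shows.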


10. The featureless blob-like nature of Figure A reassures us there is no evidence of curvilinearity or nonconstant variance. Plot B identifies two moderate outliers on the low side. There are some points on the high side that barely clear the bar as potential outliers. However, in a data set this large, we expect to have a few points with studentized residuals with magnitude at least 2. Plot C shows that all the points that are potential outliers have very low leverage, so they are unlikely to be affecting the estimated regression coefficients very much.


CHAPTER 9: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS
These problems are designed to be done without access to a computer, but they may require a calculator.

1. An investigator does three different two-way ANOVAs (summarized below in Tables 1 - 3) but the profile plots weren't labeled and got mixed up. Help the investigator by determining which profile plot goes with which table. Write the letter for the profile plot in the blank for the corresponding table. Factor A is on the horizontal axis; Factor C is the plotting symbol.

Table 1. Figure? ____
Source   F     p value
Model    6.2   <0.001
A        4.1   0.029
C        5.4   0.028
A*C      1.1   0.349

Table 2. Figure? ____
Source   F     p value
Model    1.6   0.198
A        0.8   0.460
C        1.1   0.305
A*C      0.7   0.506

Table 3. Figure? ____
Source   F     p value
Model    6.2   <0.001
A        5.1   0.014
C        1.3   0.265
A*C      0.7   0.506

[Figs. A, B, and C: profile plots of Mean Y (roughly 8 to 12) versus Factor A (1, 2, 3), with separate lines for the levels of Factor C.]


2. a. An analysis of the residuals from an ANOVA is performed in order to check the assumptions. The results from Levene's test show p value = 0.193. The researcher states that "there is evidence that the variances are equal". What comment do you have on this statement?
b. Two factors are analyzed for their effect on depression scores among students. The factors are GENDER (male/female) and CLASS STANDING (undergraduate/graduate). What line of the ANOVA table would be relevant to testing each of the following hypotheses?
#1. The difference between males and females is larger in undergraduates than in graduates.
#2. Females have higher scores than males, if you combine undergraduates and graduates.
c. In a two-way ANOVA, the factor A is quantitative. The investigator fits a model with a linear and quadratic term, then performs a lack of fit test. This test has p value = 0.026. How do you interpret this result?

3. A two-way ANOVA is carried out to study the effect of Training Duration (1 or 2) and Screen Format Style (A, B, or C) on Accuracy Rates for medical data entry. Sixty volunteers are randomly divided among the six training/screen format combinations, with 10 in each subgroup. The ANOVA table has been partially filled in.

Source             df   SS    MS   F
Training                121
ScreenF                  72
Training*ScreenF        134
Error                   864

a. Fill in the blanks in the table.
b. Write the appropriate conclusion for the main effect of Screen Format Style assuming α = 5%.

4. You have data on 'math anxiety' in girls and boys in grades 4, 6 and 8.
a. You believe that if you pool together girls and boys, mean math anxiety will be the same at all grades. You believe there is no ________________.
You believe that if you pool together all the grades, mean math anxiety will be higher in girls. You believe that there is ________________.
You believe that the difference between girls' and boys' math anxiety is greater in grade 8 than it is in grade 4. You believe that there is a ________________.
b. You had data on 12 children in each of the 6 grade/gender combinations (72 children in all). Complete the ANOVA table.

Source         SS      df   MS   F
model          220.5
grade           40.0
gender         100.0
grade*gender
error          594.0

Identify the results that would be significant at α = 5% by marking them with a *.


5. The ANOVA table below was produced by a two-way ANOVA where 5 different furnace temperatures and 4 different types of hardening agents were tried to assess their effects on Y = tensile strength of carbon fiber rods. For each combination of a furnace temperature and a hardening agent, n = 4 rods were produced and tested. This was an extremely costly experiment, but unfortunately the engineer in charge spilled coffee on the printout, partly erasing the ANOVA table.
a. Please help by filling in the results.

SOURCE           df   Sum of Squares   Mean Square   F
Temp                          1680.0
Hardener                      4896.0
Temp*hardener                 2700.0
Error (Within)               24000.0

b. The researchers need to recommend the best way of producing carbon fiber rods with greatest tensile strength. CHOOSE ONE:
   it suffices to apply Tukey's HSD to the main effect for TEMP
   it suffices to apply Tukey's HSD to the main effect for HARDENER
   you must apply Tukey's HSD to all combinations of TEMP and HARDENER

6. In a study comparing cholesterol levels by SEX (men, women), a second factor used was AGE (three groups). Assume that equal numbers of observations were used in each cell. The cell means for the 2x3 ANOVA are shown below:

          40 < Age < 50   50 < Age < 65   Age > 65    Overall
Men       y11 = 208       y12 = 212       y13 = 214   211.333
Women     y21 = 184       y22 = 209       y23 = 221   204.667
Overall   196.0           210.5           217.5       208

a. Draw a plot that would summarize the information in the cell means.
b. Write a paragraph summarizing the apparent effects of age and sex on cholesterol levels.
c. Give the estimates of the parameters in the factorial model (μ̂, α̂i, γ̂j, (αγ̂)ij).

7. In a study of Math Anxiety, you are comparing girls and boys at grades 4 and 8. Hence, there are two factors (Gender and Grade). Each factor has two levels. The means within each group are shown below.

        Grade 4   Grade 8
Boys    mean=34   mean=36
        n=12      n=12
Girls   mean=39   mean=44
        n=12      n=12

a. Create a profile plot of the cell means.
b. In advance, you specified that there were only three null hypotheses of interest.
#1. Girls and boys have the same mean Math Anxiety at grade 4. (t(44) = 1.519, p value = 0.136)
#2. Girls and boys have the same mean Math Anxiety at grade 8. (t(44) = ????????)
#3. The difference between girls and boys in grade 4 is the same as the difference between girls and boys at grade 8. (t(44) = -0.64, p value = 0.525)


The MSE from the ANOVA was 65. Two of the t test statistics have been calculated and the ordinary p values from the t table given, but the one for hypothesis #2 is illegible. Calculate the test statistic for hypothesis #2.
c. Assume you wanted to control the overall significance level for all three hypotheses at α = 10%. Which of the hypotheses would be significant? (Explain your reasoning briefly.)

[Note regarding problem 8: the question can be altered to ask for different contrasts.]
8. The managers of a power plant investigate the optimal setting of the air intake to obtain the highest efficiency. They believe the optimal setting may depend on whether the ambient air temperature is COOL or WARM. They investigate three different air intake settings (70, 75 and 80) using data from 6 days each, a total of 2*3*6 = 36 observations. The sample means are shown below. The MSE from the experiment was 32.

                   air intake 70   air intake 75   air intake 80
ambient air COOL   64              60              57
ambient air WARM   63              62              62

a. Calculate the test statistic for the null hypothesis that 'under COOL conditions, there is no nontrivial linear trend in efficiency as air intake changes.' Hint: the coefficients for a linear trend in three equally spaced groups are (-1, 0, 1).
b. Assume that the null hypothesis for part (a) was decided after examining the data. What critical value should be used to decide whether the evidence is significant, using α = 5%?


9. We are examining data on gasoline taxes (expressed in cents per gallon) in 3 different states. These taxes vary with locality within each state. The taxes are classified not just by state, but by whether the locality was Rural or Urban, forming a two-factor ANOVA. The results are summarized in the printout.
a. Using the graph, give the reader a non-technical discussion of the apparent differences in taxes. Use the results of the ANOVA to say which of the apparent differences can be confirmed as statistically significant, using α = 5%.
b. At the bottom of the printout, the means for each state are shown (after adjustment for unequal sample sizes among urban/rural localities). Also shown is a table with the p value for each pairwise comparison of the means by state, using a variation of Tukey's HSD designed for unequal sample sizes. This controls the family-wise error rate at 5%. Summarize the differences using a line plot with overbars, and comment on the differences between states. [Note to instructor: instead of a line plot with overbars, can substitute a table with lettering.]


PRINTOUT FOR PROBLEM 9

[Plot of tax (roughly 10 to 22 cents per gallon) by STATE (1, 2, 3), separately for Rural and Urban localities.]

Dependent Variable: tax
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5      232.4868421    46.4973684      7.14   0.0020
Error             13       84.6500000     6.5115385
Corrected Total   18      317.1368421

Source         DF   Type III SS   Mean Square   F Value   Pr > F
state           2   127.4936364    63.7468182      9.79   0.0025
rurlab          1   101.0471014   101.0471014     15.52   0.0017
state*rurlab    2    11.6400000     5.8200000      0.89   0.4328

Adjustment for Multiple Comparisons: Tukey-Kramer
state   tax LSMEAN   LSMEAN Number
1       12.6916667   1
2       18.4666667   2
3       13.1333333   3

Least Squares Means for effect state
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: tax
i/j        1        2        3
1                0.0037   0.9487
2       0.0037            0.0081
3       0.9487   0.0081


10. A researcher measures self-ESTEEM in adolescent girls and boys. The researcher classifies each subject by the MARITAL STATUS of their biological parents (Married / Other). The researcher is particularly interested in how Marital Status may impact ESTEEM. Give a full statistical discussion of the results.

Tests of Between-Subjects Effects
Dependent Variable: Esteem
Source             Type III SS   df   Mean Square         F    Sig.
Corrected Model     234.773(a)    3        78.258     9.126    .000
Intercept          2772.262      1      2772.262   323.269    .000
sex                 113.507      1       113.507    13.236    .001
mar_status             .858      1          .858      .100    .754
sex * mar_status    103.852      1       103.852    12.110    .001
Error               300.150     35         8.576
Total              3534.000     39
Corrected Total     534.923     38
a. R Squared = .439 (Adjusted R Squared = .391)

Estimated Marginal Means
[Profile plot of mean Esteem (roughly 6 to 12) for girls and boys; dashed line is Married, solid is Other.]

Tukey HSD (a,b)
                     Subset for alpha = .05
sex_mar      N        1         2
girl_othr    9     5.0000
girl_marr   10     8.6000    8.6000
boy_marr     8               8.7500
boy_othr    12              11.7500
Sig.               .051      .106
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 9.536.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.


SOLUTIONS
1. Table 1 goes with Figure C. Table 2 goes with Figure A. Table 3 goes with Figure B.
2. a. The null hypothesis is that the variances are equal; we never have significant evidence for Ho. The researcher should say there is no significant evidence that the variances differ.
b. #1: Interaction between GENDER and CLASS STANDING. #2: Main effect for GENDER.
c. There is significant evidence that the quadratic model is not sufficient to model the effect of Factor A.
3. a.

Source             df   SS    MS    F
Training            1   121   121   7.56
ScreenF             2    72    36   2.25
Training*ScreenF    2   134    67   4.19
Error              54   864    16

b. The critical value with 2 and 54 df is about 3.17. There is no significant evidence that mean accuracy differs by screen format style, when averaged over training.
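The fill-in arithmetic for Solution 3 can be sketched as follows (Python, editors' addition; the dictionary layout is illustrative):

```python
# Completing the partially filled ANOVA table: 60 subjects, 2 training
# levels, 3 screen formats, 10 per cell, so error df = 60 - 6 = 54.
ss = {"Training": 121, "ScreenF": 72, "Training*ScreenF": 134, "Error": 864}
df = {"Training": 1,   "ScreenF": 2,  "Training*ScreenF": 2,   "Error": 54}
ms = {k: ss[k] / df[k] for k in ss}                  # mean squares
f = {k: ms[k] / ms["Error"] for k in ss if k != "Error"}   # F ratios
```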

4. a. no main effect for grade; main effect for gender; interaction of grade and gender.
b.

Source         SS      df   MS       F
model          220.5    5   44.1      4.9 *
grade           40.0    2   20.0      2.22
gender         100.0    1  100.0     11.11 *
grade*gender    80.5    2   40.25     4.47 *
error          594.0   66    9.0

5. a.

SOURCE           df   Sum of Squares   Mean Square   F
Temp              4           1680.0           420   1.05
Hardener          3           4896.0          1632   4.08
Temp*hardener    12           2700.0           225   0.56
Error (Within)   60          24000.0           400

b. Since there are no significant interactions or main effects for TEMP, it suffices to analyze the main effects for HARDENER.
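The coffee-stained entries can be reconstructed mechanically (a Python sketch by the editors; names are illustrative):

```python
# Solution 5: 5 temperatures, 4 hardeners, n = 4 per cell, so the df are
# 4, 3, 4*3 = 12, and error df = 5*4*(4-1) = 60.
rows = {"Temp": (1680.0, 4), "Hardener": (4896.0, 3),
        "Temp*hardener": (2700.0, 12), "Error": (24000.0, 60)}
ms = {k: s / d for k, (s, d) in rows.items()}
f = {k: round(ms[k] / ms["Error"], 2) for k in rows if k != "Error"}
```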


6. a and b. [Profile plot of mean cholesterol (roughly 180 to 230) versus age group, one line per sex.] The plot shows that cholesterol in women tends to increase sharply with age, but in men there is very little increase. In younger age groups, women have a much lower cholesterol than men. In the oldest age group, women slightly exceed men.
c. μ̂ = 208.0
Sex (A): α̂1 = 211.333 - 208 = 3.333; α̂2 = 204.667 - 208 = -3.333
Age (C): γ̂1 = 196.0 - 208 = -12.0; γ̂2 = 210.5 - 208 = 2.5; γ̂3 = 217.5 - 208 = 9.5
Sex*Age (A*C):
(αγ̂)11 = 208 - (208 + 3.333 - 12) = 8.667
(αγ̂)12 = 212 - (208 + 3.333 + 2.5) = -1.833
(αγ̂)13 = 214 - (208 + 3.333 + 9.5) = -6.833
(αγ̂)21 = 184 - (208 - 3.333 - 12) = -8.667
(αγ̂)22 = 209 - (208 - 3.333 + 2.5) = 1.833
(αγ̂)23 = 221 - (208 - 3.333 + 9.5) = 6.833
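All of the factorial-model estimates follow from the cell means by the same recipe, which can be sketched as (Python, editors' addition):

```python
# Solution 6c: effects from the 2x3 table of cell means
# (rows = sex, columns = age group).
cells = [[208, 212, 214],   # men
         [184, 209, 221]]   # women
mu = sum(sum(r) for r in cells) / 6                       # grand mean 208
alpha = [sum(r) / 3 - mu for r in cells]                  # sex effects
gamma = [sum(cells[i][j] for i in range(2)) / 2 - mu
         for j in range(3)]                               # age effects
inter = [[cells[i][j] - (mu + alpha[i] + gamma[j])        # interactions
          for j in range(3)] for i in range(2)]
```

Note that each set of effects sums to zero, which is a useful check on the hand calculation.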

7. a. [Profile plot of the four cell means.]
b. t = (44 - 36) / sqrt(65(1/12 + 1/12)) = 2.43, p value = 0.019.
c. Using Bonferroni's method, a contrast would need to have p value < 0.10/3 = 0.033 to be significant. Only #2 is significant.
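The illegible t statistic in 7b is a standard two-cell comparison using the pooled MSE, sketched here in Python (editors' addition):

```python
import math

# Grade-8 girl/boy comparison: means 44 and 36, n = 12 per cell,
# MSE = 65 from the ANOVA (44 error df).
t = (44 - 36) / math.sqrt(65 * (1/12 + 1/12))   # about 2.43
```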


8. a. L̂ = (-1)64 + (0)60 + (1)57 = -7, t = -7 / sqrt(32(1/6 + 1/6)) = -2.14
b. Should use Scheffe's method to determine the critical value. The F table value with α = 5% and 5, 30 df is 2.53, so the critical value is sqrt(5(2.53)) = 3.56.
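The contrast and its Scheffe critical value can be sketched as follows (Python, editors' addition; variable names are illustrative):

```python
import math

# Solution 8a: linear-trend contrast under COOL conditions,
# coefficients (-1, 0, 1), n = 6 per cell, MSE = 32.
coef, means = [-1, 0, 1], [64, 60, 57]
L = sum(c * m for c, m in zip(coef, means))       # estimated contrast, -7
se = math.sqrt(32 * sum(c**2 / 6 for c in coef))  # standard error
t = L / se                                        # about -2.14

# Solution 8b: post-hoc (Scheffe) critical value, 6 groups, error df 30,
# using the tabled F(5, 30) value of 2.53.
crit = math.sqrt(5 * 2.53)                        # about 3.56
```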


9. Taxes in Urban areas appear to be higher on average than in Rural areas. This effect is seen in every state, though the difference is slightly larger in State 2. State 2 seems to have the highest taxes, for both Rural and Urban areas. The ANOVA confirms a significant main effect for Rurality (F(1,13) = 15.52, p value = 0.0017). There is also a significant main effect due to State (F(2,13) = 9.79, p value = 0.0025). However, there is no significant interaction (F(2,13) = 0.89, p value = 0.4328), so the apparently larger difference between rural and urban areas in State 2 could be sampling variability. [Line graph should show states 1 and 3 connected as not significantly different, but state 2 apart as being significantly different.] Tukey-Kramer shows that State 2 has significantly different (apparently higher) mean taxes than States 1 and 3. States 1 and 3 do not differ significantly.

10. The profile plot suggests a strong interaction. Self-Esteem is about equal for girls and boys when the parents are married. When the parents are not married, girls have a low self-esteem and boys have a high self-esteem. The ANOVA confirms a strong interaction (F(1,35)=12.110, p=0.001). This shows that the difference between Married and Other for girls is significantly different from that for boys. Since the interactions are so strong, the analysis was continued by treating each of the four groups as part of a one-way ANOVA. Tukey’s HSD was used to compare the individual groups. There is a significant difference between Girls/Other and Boys/Other, consistent with the plot. However, despite the suggestion of the plot, within Girls the Other and Married groups do not differ significantly. Further, within Boys, the Married and Other do not differ significantly.


CHAPTER 10: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator.

1. FOR EACH SCENARIO BELOW, IDENTIFY THE EXPERIMENTAL DESIGN.
a. You are comparing three versions of a reading exam to see whether they are of equal difficulty. One hundred children each take all three versions of the exam, in random order.
   completely randomized design / randomized block design / randomized block design with sampling
b. You are testing treatments designed to protect boat hulls from marine growth. There are two possible primers and three possible top coats, that is, 2x3 = 6 treatment combinations. You select 10 boat hulls, and randomly assign the 6 combinations to a different spot of each hull (6 observations per hull). You have no reason to suspect interactions between hull and treatment.
   split plot design / factorial design within a randomized block / repeated measures design with 2 within-subject factors
c. You are testing treatments designed to protect boat hulls from marine growth. There are two possible primers and three possible top coats, that is, 2x3 = 6 treatment combinations. However, primers can only be applied over large sections of the hull. Therefore, you divide each hull in half, and paint one half with primer A and the other with primer B. Then you randomly assign the three top coats to spots on each hull-half.
   split plot design / factorial design within a randomized block / repeated measures design with 2 within-subject factors
d. You are comparing four paints with regard to the way they fade in sunlight. Twenty panels of wood are randomly assigned to one paint each, so that there are 5 panels for each paint.
   randomized block design / completely randomized design / randomized block design with sampling
e. Twenty volunteers take a test for cognitive ability on three occasions. The occasions are arranged so that the test is given once in quiet conditions, once in noisy conditions, and once in very noisy conditions, in random order. Some of the volunteers are young, and some are old, and this may be an important influence on cognitive ability.
   repeated measures with 1 between-subject and 1 within-subject factor / factorial design within a randomized block / split plot design


2. To compare the calibration of three instruments designed to measure ozone, five different days are selected. On each day, the machines are set out side-by-side, in random order, and ozone measurements are recorded from each.
a. Identify the experimental design. Which, if any, of the factors are random factors?
b. The table below shows the sums of squares treating the data as if it came from an ordinary two-way ANOVA with Instrument and Day as fixed effects, but no interaction. At α = 5%, do the instruments differ in their mean ozone measurement? Show the construction of the test statistic.

Source                   Sums of Squares
Instrument                         9.730
Day                              399.600
Error (Instrument*Day)             7.600

3. In an experiment to test the effect of antibiotics, fifteen pigeons are first trained to recognize which symbol marks the correct cup containing food. The measure of their training is the percentage of pecks made to the correct cup (PCT_CCUP). The pigeons are then randomly assigned to one of three groups, and their initial value (Time 0) of PCT_CCUP is recorded. Then the pigeons are given an injection. Group 1 receives a saline injection, Group 2 receives antibiotic 'C', and Group 3 receives antibiotic 'P'. PCT_CCUP is measured 24 hours later, and again 48 hours later. The experiment is designed to test whether the antibiotics cause the pigeons to forget their training, and whether the effect of the antibiotic is different at 24 and 48 hours post-injection.
a. Identify the experimental design.
b. What purpose does the use of a saline injection serve, when the question concerns antibiotics?
c. The table below shows the relevant sums of squares. Fill in the degrees of freedom, and explain the degrees of freedom for the error term.

Source           DF   Type III SS
time                  1195.600000
inject                 149.733333
time*inject            134.666667
pigeon(inject)        2970.800000
Error            24     92.400000

d. Use the information in the table to test for a main effect of Injection.
e. Use the information in the table to test for an interaction of Time and Injection.
f. The table below shows the sample means for each combination of Time and Injection. Use this to create a profile plot, and to speculate on the nature of the interactions.

time   inject   pct_ccup LSMEAN
0      1        63.4000000
0      2        66.6000000
0      3        66.2000000
24     1        54.8000000
24     2        49.2000000
24     3        56.2000000
48     1        63.2000000
48     2        59.2000000
48     3        66.0000000


4. A hospital is experimenting with different wall-surface materials that might result in quieter rooms. On each of three randomly selected floors of the many floors at the hospital, 12 rooms are selected. These 12 rooms are randomly assigned to one of three wall-surface materials. Then a sound test is conducted in each room, for a total of 36 observations.
a. Identify the experimental design.
b. The sums of squares below were obtained by treating the data as an ordinary two-way ANOVA with Floor and Wall_Surface as the factors. Test the null hypothesis that the mean sound observation does not differ by Wall_surface, using α = 5%.

Source               DF   Type III SS   Mean Square
floor                 2   250.4751568   125.2375784
wall_surface          2   413.9184920   206.9592460
floor*wall_surface    4    64.2060927    16.0515232
Error                27   339.604993     12.577963

c. If the hospital only had three floors, how would that affect the analysis and its interpretation?

5. An engineering firm is testing combinations of flooring (two possible types) and wall-surfacing (two possible types) to find which will produce the quietest workspace. They randomly select five different office buildings. The four combinations of flooring and wall-surfacing are then randomly assigned to four offices within each building.
a. Identify the experimental design.
b. The sums of squares below were obtained by treating the data as a three-way ANOVA. Assuming no interaction between the block effect and the treatment effects, describe the impact of flooring type and wall-surfacing type on noise. Use α = 5% for each test.

Source                  df   SS            Mean Square
bldg                     4   25.41710336    6.35427584
flooring                 1    5.59352908    5.59352908
wall_surface             1   35.29569863   35.29569863
flooring*wall_surface    1    0.00019904    0.00019904
bldg*flooring            4   47.86657433   11.96664358
bldg*wall_surface        4   90.18758100   22.54689525
Error                    4   45.2720588    11.3180147


6. Twenty subjects are randomly selected, and each is asked to taste four different sodas (presented in random order). The subjects score the sodas on a continuous scale from 1 (poor) to 10 (excellent). The sums of squares for this randomized block design are presented below.

Source   DF   Type III SS   Mean Square
soda      3    72.6373976    24.2124659
subj     19   264.6497379    13.9289336
Error    57   141.1444222     2.4762179

a. Is there evidence that the sodas differ in their mean taste score? Use α = 5%.
b. This experiment could have been carried out by allowing each subject to taste only one soda, that is, as a completely randomized design analyzed by a one-way ANOVA. Calculate the relative efficiency of the randomized block design over the completely randomized design.

7. Twenty different volunteers rate two wines on a continuous scale from 1 (poor) to 10 (excellent), with each wine presented once at 55F and once at 65F. That is, each volunteer will have four scores. Subject effects can be quite strong, and we cannot assume that the interactions of subject with either temperature or wine are weak.
a. Identify the experimental design.
b. The table below shows the sums of squares treating this as a three-way ANOVA. Carry out the proper test for the null hypothesis of no main effect for wine, using α = 5%.

Source           DF   Type III SS   Mean Square
volunteer        19   168.3272456     8.8593287
wine              1     3.2362543     3.2362543
temp              1    55.1411527    55.1411527
wine*temp         1    13.4542434    13.4542434
volunteer*wine   19    96.4260179     5.0750536
volunteer*temp   19    60.5814116     3.1884953
Error            19    69.4276658     3.6540877

8. FOR EACH SCENARIO BELOW, IDENTIFY THE EXPERIMENTAL DESIGN.
a. Forty-five different third-graders are randomly selected, then each is assigned to one of three different math-skills workbooks. At the end of 6 weeks, the third-graders' math improvement is assessed.
   Completely randomized design / Randomized block design / Randomized block design with sampling / Two-way factorial / Randomized block with embedded factorial design
b. Ten volunteers are given tests of their ability to remember the details of an event under four different conditions: with and without a delay in questioning, and with same or different ethnic identity as the participants in the event. That is, there are two factors (Delay, Identity). Each volunteer is tested four times under all four conditions.
   Completely randomized design / Randomized block design / Randomized block design with sampling / Two-way factorial / Randomized block with embedded factorial design
c. Ten volunteers are given tests of their ability to remember the details of an event under two different conditions: with and without a delay in questioning. Each volunteer is tested three times under each of the two conditions, in random order, a total of six results per volunteer.
   Completely randomized design / Randomized block design / Randomized block design with sampling / Two-way factorial / Randomized block with embedded factorial design
d. Forty volunteers in a 'Phase 1' drug trial are randomly divided into four groups and assigned Vaccine A or B at either a high or low dose. At the end of one month, each volunteer's antibody levels are measured.
   Completely randomized design / Randomized block design / Randomized block design with sampling / Two-way factorial / Randomized block with embedded factorial design
e. Ten volunteers in a 'Phase 1' drug trial are given a vaccine. Each volunteer's antibody levels are measured at one-month, two-month, and six-month intervals after receiving the vaccine.
   Completely randomized design / Randomized block design / Randomized block design with sampling / Two-way factorial / Randomized block with embedded factorial design

9. FOR EACH PART BELOW, CHOOSE THE MOST APPROPRIATE ANSWER.
a. Blocking is used in experimental design when:
   The blocks are an effect whose size we particularly wish to measure
   The blocks differ greatly in their responses, but this is not part of our research focus
   The blocks represent a convenient way to collect data
b. In experimental designs, sampling error refers to:
   The variability in a treatment's effect in different blocks
   The mean square used in the denominator of the F tests
   The variability in responses within a block when units are given the same treatment
c. When the blocks within one treatment are different from the blocks within another treatment, we say we have a:
   Nested design
   Repeated measures design
   Randomized block design with sampling
d. Repeated-measures experiments often:
   have independent sampling errors
   require adjustment to their degrees of freedom based on a failure of sphericity
   have subject as a fixed effect
e. Split plot designs arise when:
   there are two treatments, but one cannot be varied as easily as the other
   there are two treatments which can be assigned independently of each other
   both blocks and treatments are random effects

10. In a randomized block design with sampling, the plan was to obtain two observations for each of the three treatments within each of the 8 blocks, for a total of 48 observations. Unfortunately, a few scattered observations are missing. Your statistical software prints the following calculation of expected mean squares and a table of the mean squares. The symbol 'Q' is shorthand for an expression involving squared terms of the effects in the linear model. You fit a model with fixed treatment effects and random block effects, but no interactions.
a. Construct a test of the null hypothesis of no treatment effects.
b. Estimate the variance of the block effects.


SOLUTIONS.

1. a. Randomized block design
b. factorial design within a randomized block
c. split plot design
d. completely randomized design
e. repeated measures with 1 between-subject and 1 within-subject factor

2. a. This is a randomized block design, with DAY as the block, which is a random effect.
b.

Source                   Sums of Squares   df   MS      F
Instrument                         9.730    2   4.865   5.12
Day                              399.600    4
Error (Instrument*Day)             7.600    8   0.95

There is significant evidence that the instruments differ.
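The F statistic for the randomized block test can be sketched as (Python, editors' addition):

```python
# Solution 2b: Instrument tested against the Instrument*Day interaction,
# which serves as error in a randomized block design.
ms_instrument = 9.730 / 2    # 2 df for 3 instruments
ms_error = 7.600 / 8         # (3-1)*(5-1) = 8 df
f = ms_instrument / ms_error # about 5.12, compared to F(2, 8)
```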

3. a. This is a repeated measures design. Pigeon is the subject, Injection is a between-subjects factor, and Time is a within-subjects factor.
b. The use of a control, or saline, group allows us to judge whether the effect is due to the antibiotic, as opposed to the trauma of being handled and receiving an injection.
c. The error is formed from the interaction of the within-subject factor (2 df) with subjects nested within the between-subject factor ((5-1)*3 = 12 df). Hence error has 2*12 = 24 df.

Source           DF   Type III SS
time             2    1195.600000
inject           2    149.733333
time*inject      4    134.666667
pigeon(inject)   12   2970.800000
Error            24   92.400000

d. This would use the mean square for pigeon(inject) in the denominator. F(2,12) = [149.733/2] / [2970.8/12] = 0.302. There is no significant evidence of a main effect for injection.
e. This would use the Error term as the denominator. F(4,24) = [134.6667/4] / [92.4/24] = 8.74. There is significant evidence of an interaction of Time and Injection.
f. Apparently, all the groups tend to show a decrease in PCT_CCUP at 24 hours post-injection, but tend to rebound by 48 hours. However, pigeons in group 2 (antibiotic 'C') show a greater effect at 24 hours.


[Figure: plot of mean PCT_CCUP (roughly 49 to 67) versus time (0 to 50 hours) for the three injection groups, labeled C, P, and S.]
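The F ratios in 3(d) and 3(e) can be verified from the sums of squares and degrees of freedom in the ANOVA table; a minimal Python sketch (the dictionary layout is mine, the numbers are from the table):

```python
# Quick check of the F ratios in 3(d) and 3(e) from the SS and df above.
ss = {"inject": 149.733333, "pigeon(inject)": 2970.8,
      "time*inject": 134.666667, "error": 92.4}
df = {"inject": 2, "pigeon(inject)": 12, "time*inject": 4, "error": 24}

def ms(term):
    return ss[term] / df[term]

f_inject = ms("inject") / ms("pigeon(inject)")      # denominator: pigeon(inject)
f_interaction = ms("time*inject") / ms("error")     # denominator: error
print(round(f_inject, 3), round(f_interaction, 2))  # 0.302 8.74
```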

4. a. This is a randomized block with sampling. Blocks are floors of the hospital.
b. The experimental error is given by the Floor*wall_surface interaction. F(2,4) = 125.238/16.052 = 7.80. The critical value from the F table at α = 5% is 6.94, so there is significant evidence that at least one of the wall_surface treatments differs in its mean sound reading.
c. Now floor would be a fixed effect and this would be an ordinary two-way ANOVA.

5. a. This is a factorial experiment in a randomized block. Block is office building.
b. The bldg*flooring and bldg*wall_surface interactions should be pooled with the error (from the three-way interaction) to estimate experimental error.
Denominator: [47.867 + 90.188 + 45.272]/12 = 15.28
Flooring: F(1,12) = 5.594/15.28 = 0.37, no significant effect of flooring
Wall_surface: F(1,12) = 35.296/15.28 = 2.31, no significant effect of wall_surface

6. a. F(3,57) = 9.78, there is significant evidence that the means for the sodas are not all equal.
b. RE = [19(13.93) + 60(2.4762)] / [79(2.4762)] = 2.11
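The relative efficiency in 6(b) can be checked numerically. The block and treatment counts (b = 20, t = 4) are inferred from the degrees of freedom 19, 60, and 79 in the text, so treat them as assumptions:

```python
# Check of the relative efficiency in 6(b). With b = 20 blocks and t = 4
# treatments (inferred from the df 19, 60, and 79 in the text),
# RE = [(b-1)*MS_blocks + b(t-1)*MS_error] / [(bt-1)*MS_error].
ms_blocks, ms_error = 13.93, 2.4762
re = (19 * ms_blocks + 60 * ms_error) / (79 * ms_error)
print(round(re, 2))  # 2.11
```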

7. a. Repeated measures with two within-subjects factors
b. Denominator will use the volunteer*wine interaction, F(1,19) = 0.64

8. a. Completely randomized design
b. Randomized block with embedded factorial design
c. Randomized block with sampling
d. Two-way factorial
e. Randomized block design


9. a. The blocks differ greatly in their responses, but this is not part of our research focus.
b. The variability in responses within a block when units are given the same treatment.
c. Nested design.
d. Require adjustment to their degrees of freedom based on a failure of sphericity.
e. There are two treatments, but one cannot be varied as easily as the other.

10. a. F(2,35) = 0.9394 / 0.7625 = 1.23, p value =
b. Equating the observed MS for Block to its expected value, 0.1732 = 0.763 + 5.601 var(Block), which requires var(Block) to be negative. Since this cannot be, we estimate var(Block) as 0.
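The truncation at zero in 10(b) is a method-of-moments calculation; a small Python sketch using the values from the text:

```python
# Method-of-moments estimate of var(Block) in 10(b), truncated at zero.
ms_block = 0.1732   # observed mean square for Block
sigma2 = 0.763      # estimate of the error variance
coef = 5.601        # coefficient of var(Block) in the expected mean square
raw = (ms_block - sigma2) / coef   # negative, which is impossible for a variance
var_block = max(raw, 0.0)
print(round(raw, 4), var_block)    # -0.1053 0.0
```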


CHAPTER 11: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS
These problems are designed to be done without access to a computer, but they may require a calculator.

1. a. You fit two models in a situation where there are two groups, represented by the dummy variable d (coded as 1 for group 1 and 0 for group 2), and a quantitative independent variable x. Explain the difference in the interpretation of the parameter β1 in the two models.
Model 1: y = β0 + β1x + ε
Model 2: y = β0 + β1x + β2d + β3xd + ε

b. You use factor effects coding to represent ethnicity (whites, blacks, Latinos, and others) using the following coding scheme.
         D1   D2   D3
Whites    1    0    0
Blacks    0    1    0
Latino    0    0    1
Other    -1   -1   -1
Let β1, β2, and β3 be the regression coefficients that go with these dummy variables. Give the expressions for each of the following comparisons.
#1 whites – Latinos
#2 blacks – others
#3 whites – average of all 4 groups

2. Consider a situation where one of the independent variables is Disease Severity (mild, moderate, severe, extreme). For each of the situations below, define a set of dummy variables so that t tests for the comparisons of interest would automatically be computed by a regression program. a. You wish to compare mild, moderate, and severe categories to the average of all categories. b. You wish to compare moderate to mild, severe to mild, and extreme to mild. c. You wish to compare moderate to mild, severe to moderate, and extreme to severe.

3. You randomly assign volunteers to one of three treatments for high blood pressure. The dependent variable is the improvement in blood pressure after 4 weeks of treatment, but a possible complication is the age of the volunteer. The investigator proposes an ANCOVA to compare the effect of the treatment after controlling for age. There are 50 volunteers in the data. You create two independent variables for treatment, and fit two regression models:
Model 1: Y = β0 + β1Age + β2I2 + β3I3 + β4I2*Age + β5I3*Age + ε, with SSE = 1584, and
Model 2: Y = β0 + β1Age + β2I2 + β3I3 + ε, with SSE = 1660.
Is ANCOVA a legitimate procedure in this situation? Discuss all the reasons for or against its use.

4. A financial company has two models that they use to predict inflation for the next quarter. Method 1 tends to over-estimate inflation, that is, its error has expected value 0.25 with a variance of 0.02. Method 2 tends to under-estimate inflation, that is, its error has expected value -0.25 with a variance of 0.03. The two estimates are positively correlated, with a correlation of 0.8. a. Give the covariance matrix for the errors from the two methods. b. The president of the company decides that the best thing to do is to average the results of the two methods together. Give the mean and standard deviation for the error from this new forecast.


5. You have regressed Y = Trust in Police on independent variables X = Income in $10000s, G = Gender (0 for men, 1 for women) and the interaction X*G. The estimates of the regression parameters, and the estimated covariance matrix for those parameters, are given below. Assume that the sample size is 400.
a. Is there significant evidence of a relationship between income and trust-in-police among men?
b. Is there significant evidence of a relationship between income and trust-in-police among women?

Variable    DF   Parameter Estimate
Intercept   1    3.36179
x           1    0.40356
g           1    0.14909
xg          1    -0.25865

Covariance of Estimates
Variable    Intercept   x         g         xg
Intercept   0.0816      -0.0100   -0.0816   0.0100
x           -0.0100     0.0014    0.0100    -0.0014
g           -0.0816     0.0100    0.1743    -0.0206
xg          0.0100      -0.0014   -0.0206   0.0028

6. You have regressed a dependent variable Y on two variables, X1 and X2. You have fit a model that does not have an intercept: Y = β1X1 + β2X2 + ε. The regression, based on 10 observations, had MSE = 12. The parameter estimates and their covariance matrix are summarized below.
a. Give the point estimate for a new observation that has X1 = 2 and X2 = 5.
b. Give a 95% confidence interval for a new observation with these values of X1 and X2.

Variable   DF   Parameter Estimate
X1         1    3.5
X2         1    -2.0

Covariance Matrix
Variable   X1    X2
X1         1.0   0.1
X2         0.1   0.5


7. The attached printout summarizes a regression of the dependent variable Y = power plant efficiency versus X1 = air intake setting and two indicator variables which represent ambient weather conditions (cool, warm, hot): IW = 1 if warm, 0 otherwise; IH = 1 if hot, 0 otherwise. The interactions of X1 with the indicator variables (X1W = X1*IW, X1H = X1*IH) are also included.
Efficiency = β0 + β1X1 + β2IW + β3IH + β4X1W + β5X1H + e
a. The table below is partly filled in with some predicted efficiencies, based on the attached printout. Complete the table, and use it to graph predicted efficiencies versus X1 on the interval 70 to 80.
Weather   X1   Efficiency
Cool      70   65.53
Cool      80   61.04
Warm      70   62.91
Warm      80
Hot       70
Hot       80   65.28
b. Write a simple, brief explanation of the apparent effect of air intake on efficiency.
c. Test the null hypothesis that X1 has no linear relationship with Efficiency when conditions are warm. Specify the degrees of freedom for the test statistic.

PRINTOUT FOR PROBLEM 7

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    84.24022         16.84804      193.40    <.0001
Error             84   7.31782          0.08712
Corrected Total   89   91.55804

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    96.99809             1.41011          68.79     <.0001
x1          1    -0.44952             0.01872          -24.01    <.0001
iw          1    -34.81757            1.98450          -17.54    <.0001
ih          1    -53.98339            1.98362          -27.21    <.0001
x1w         1    0.45998              0.02649          17.37     <.0001
x1h         1    0.72780              0.02638          27.59     <.0001

Covariance of Estimates
Variable    Intercept   x1         iw         ih         x1w        x1h
Intercept   1.98840     -0.02638   -1.98840   -1.98840   0.02638    0.02638
x1          -0.02638    0.00035    0.02638    0.02638    -0.00035   -0.00035
iw          -1.98840    0.02638    3.93825    1.98840    -0.05253   -0.02638
ih          -1.98840    0.02638    1.98840    3.93473    -0.02638   -0.05228
x1w         0.02638     -0.00035   -0.05253   -0.02638   0.00070    0.00035
x1h         0.02638     -0.00035   -0.02638   -0.05228   0.00035    0.00070

You will need 5 places beyond the decimal point to get sufficient accuracy when using the covariance matrix.


8. You have designed a 2x2 factorial experiment where the dependent variable is Y = child's vocabulary and the factors are A = Gender of Interviewer (same as child, different from child), and B = Distractions (toys present in room, toys not present). However, some of the children refuse to participate or otherwise give invalid data, so the result is an unbalanced dataset summarized below.
                               Gender same as child   Gender different from child
Distraction: toys present      n = 4, mean y = 24     n = 2, mean y = 20
Distraction: toys not present  n = 8, mean y = 32     n = 12, mean y = 22
a. Give the estimates of the overall mean, the two main effects, and the interaction effect (μ̂, α̂, β̂, and the αβ interaction).
b. Describe how dummy variables could be used to carry out the analysis using a regression procedure.

9. In each of the scenarios below, a weighted linear regression should be used. Identify the appropriate weighting value, wi . a. Each week, we randomly sample ni gasoline stations, where the number sampled can vary greatly depending on the available manpower. The dependent variable is the average price of regular gasoline at the sampled stations. b. Each week, we randomly sample ni gasoline stations, where the number sampled can vary greatly depending on the available manpower. The dependent variable is the total sales of all goods at the sampled stations and ni is one of the independent variables. c. Each week, we randomly sample 100 bank branches, and let Y be the total number of customers coming in to the inside service desks. We also record the week of the month (MW = 1st, 2nd, etc.). Not only is MW a possible independent variable, but we suspect that the standard deviation is twice as high in the 2nd and 4th weeks of the month as it is in the other weeks.


10. For many years, a company has been tracking its quarterly earnings per share versus what they had predicted. The difference is referred to as the ‘Deviation’. The plots below show a current quarter’s deviation versus the immediate past quarter’s deviation under several scenarios.

Part 1. Label each figure with its most likely value of the Durbin-Watson statistic, using the choices: #1 D-W much more than 2; #2 D-W about 2; #3 D-W much less than 2.

Part 2. Label each figure with its most likely use in predicting this quarter's deviation, knowing the past quarter's: #1 if the past quarter was high, this quarter will most likely be high also; #2 if the past quarter was high, this quarter will most likely be low; #3 if the past quarter was high, we can't say much about this quarter.

Figure A: Part 1 (D-W)        Part 2 (predict)
Figure B: Part 1 (D-W)        Part 2 (predict)
Figure C: Part 1 (D-W)        Part 2 (predict)


SOLUTIONS
1. a. In Model 1, β1 is the slope of Y vs X overall in the data set, with both groups combined. In Model 2, it is the slope of Y vs X just in group 2.
b. #1 whites − Latinos: β1 − β3
#2 blacks − others: β1 + 2β2 + β3
#3 whites − average of all 4 groups: β1

2. a.
       I1   I2   I3
Mild    1    0    0
Mod.    0    1    0
Sev.    0    0    1
Extr.  -1   -1   -1

b.
       I1   I2   I3
Mild    0    0    0
Mod.    1    0    0
Sev.    0    1    0
Extr.   0    0    1

c.
       I1   I2   I3
Mild    0    0    0
Mod.    1    0    0
Sev.    1    1    0
Extr.   1    1    1

3. Since the assignment of the volunteers was at random, the covariate Age and the factor (Treatment) will be independent, so that assumption is satisfied. To check the assumption of no interaction between Age and Treatment, we calculate the test statistic:

F(2, 44) = [(1660 − 1584) / (46 − 44)] / (1584 / 44) = 1.06,

which is not significant even at α = 10%. So the assumptions of an ANCOVA are reasonable.

4. a.
[ 0.02     0.0196 ]
[ 0.0196   0.03   ]

b. mean = 0.5(0.25) + 0.5(−0.25) = 0
variance = (0.5)²(0.02) + 2(0.5)(0.5)(0.0196) + (0.5)²(0.03) = 0.0223, and the standard deviation is 0.1493.
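The moments in 4(b) can be checked numerically, with the covariance rebuilt from the stated correlation of 0.8:

```python
import math

# Check of the averaged-forecast error moments in 4(b).
mean1, var1 = 0.25, 0.02    # Method 1 (over-estimates)
mean2, var2 = -0.25, 0.03   # Method 2 (under-estimates)
cov12 = 0.8 * math.sqrt(var1) * math.sqrt(var2)   # correlation 0.8

mean_avg = 0.5 * mean1 + 0.5 * mean2
var_avg = 0.25 * var1 + 2 * 0.25 * cov12 + 0.25 * var2
print(mean_avg, round(var_avg, 4), round(math.sqrt(var_avg), 4))  # 0.0 0.0223 0.1493
```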


5. a. Among men, the slope of Y versus Income is measured by βX. Ho: βX = 0.
t = 0.40356 / √0.0014 = 10.79 with 396 df. There is significant evidence of a relationship of Y with Income among men.
b. Among women, the slope of Y versus Income is measured by βX + βXG. Ho: βX + βXG = 0.
The point estimate is 0.40356 − 0.25865 = 0.14491. The estimated standard error is √(0.0014 + 2(−0.0014) + 0.0028) = √0.0014. The test statistic is t = 0.14491 / √0.0014 = 3.87 with 396 df. There is significant evidence of a relationship between Y and income among women.
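The women's-slope test in 5(b) can be checked from the covariance matrix entries; a small Python sketch:

```python
import math

# t statistic for the women's slope in 5(b), built from the printed
# covariance matrix entries for x and xg.
b_x, b_xg = 0.40356, -0.25865
var_x, var_xg, cov_x_xg = 0.0014, 0.0028, -0.0014

est = b_x + b_xg
se = math.sqrt(var_x + 2 * cov_x_xg + var_xg)
print(round(est, 5), round(est / se, 2))  # 0.14491 3.87
```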

6. a. 3.5(2) + (−2.0)(5) = −3
b. The variance of the estimated mean is 4(1.0) + 2(2)(5)(0.1) + 25(0.5) = 18.5. For a new observation, we add MSE = 12, giving 30.5. The confidence interval is −3 ± 2.306√30.5 = (−15.74, 9.74).

7. a. The completed entries are: Warm at X1 = 80: 63.02; Hot at X1 = 70: 62.49.
[Figure: graph of predicted efficiency (roughly 61 to 66) versus air intake X1 (70 to 80), with separate lines for cool, warm, and hot conditions.]
b. When ambient air temperature is cool, efficiency tends to decrease as air intake increases, so the best setting is a low air intake. However, as the weather warms, the relationship gradually shifts. For warm weather, there is almost no change in efficiency as air intake changes. For hot weather, there is an increasing relationship, with the highest efficiency at high air intake.
c. Ho: βx1 + βx1w = 0
point estimate = −0.44952 + 0.45998 = 0.01046
Using five decimal places in the covariance matrix, the estimated variance of the point estimate is 0.00035 + 2(−0.00035) + 0.00070 = 0.00035, and t = 0.01046 / √0.00035 = 0.56 with 84 df. There is no significant evidence of a relationship between efficiency and air intake when temperatures are warm.

8. a. μ̂ = (24 + 20 + 32 + 22)/4 = 24.5
effect of toys present: (24 + 20)/2 − 24.5 = −2.5
effect of same gender: (24 + 32)/2 − 24.5 = 3.5
interaction (same gender, toys present): 24 − (24.5 + (−2.5) + 3.5) = −1.5
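The predicted-efficiency table in 7(a) can be reproduced from the printed parameter estimates; a Python sketch (the dictionary keys and function name are my own labels):

```python
# Reproducing the predicted-efficiency table in 7(a) from the parameter
# estimates in the printout.
b = {"int": 96.99809, "x1": -0.44952, "iw": -34.81757, "ih": -53.98339,
     "x1w": 0.45998, "x1h": 0.72780}

def efficiency(x1, weather):
    iw = 1 if weather == "warm" else 0
    ih = 1 if weather == "hot" else 0
    return (b["int"] + b["x1"] * x1 + b["iw"] * iw + b["ih"] * ih
            + b["x1w"] * x1 * iw + b["x1h"] * x1 * ih)

for weather in ("cool", "warm", "hot"):
    for x1 in (70, 80):
        print(weather, x1, round(efficiency(x1, weather), 2))
```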

b. As a dummy variable for Gender, use G1 = (1 if same, -1 if different) As a dummy variable for Distraction, use D1 = (1 if present, -1 if not present) For interaction G1D1 = G1*D1. Run regression with G1, D1, G1D1 in model.
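The factor-effect estimates in 8(a) can be reproduced from the four cell means; a Python sketch (unweighted means of cell means, with my own labels for the cells):

```python
# Factor-effect estimates for 8(a) from the four cell means
# (unweighted means of cell means; the dictionary keys are my own labels).
cell = {("same", "present"): 24, ("diff", "present"): 20,
        ("same", "absent"): 32, ("diff", "absent"): 22}

mu = sum(cell.values()) / 4
toys_present = (cell[("same", "present")] + cell[("diff", "present")]) / 2 - mu
same_gender = (cell[("same", "present")] + cell[("same", "absent")]) / 2 - mu
interaction = cell[("same", "present")] - (mu + toys_present + same_gender)
print(mu, toys_present, same_gender, interaction)  # 24.5 -2.5 3.5 -1.5
```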


9. a. If each station has variance σ², the variance of the sample mean each week would be σ²/ni, hence wi should be proportional to ni.
b. If each station's sales has variance σ², then the variance of the total is niσ², hence wi should be proportional to 1/ni.
c. If the variance of each bank's customer count during weeks other than 2 and 4 is labeled σ², then the variance in weeks 2 and 4 is 4σ². Hence, the wi should be proportional to ¼ during weeks 2 and 4, and 1 during other weeks.

10. Part 1: Figure A #1, Figure B #3, Figure C #2
Part 2: Figure A #2, Figure B #1, Figure C #3


CHAPTER 12: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator.

1. a. In an analysis of the relationship between the traits valued by sales managers for employees they hire as salesmen, and for the employees they hire as office assistants, you find X² = 196, p value < 0.001, and Pearson contingency coefficient 0.84. Does this imply that the sales managers value the same traits in salesmen as they do in office assistants? Explain. b. In writing up the results described in part (a), you describe it as a test of homogeneity. The editor objects that this is really a test of independence. What is the difference, and which phrase is most appropriate here?

2. A teacher has given an exam question that is multiple-choice with three choices. The teacher suspects that the class has not learned this material at all, and is simply selecting a choice at random. From the data, can you disprove the teacher's suspicion, at α = 5%?
Choice                        A    B    C
number of students choosing   25   15   20

3. A large clinical study of a new drug for the treatment of cholesterol is attempting to compare the frequency of serious side effects. It is thought that the frequency might vary by age. The data on side effects is tabulated below, within age groups.
Age: 30 – 39   12 of 132 reported serious side effects
     40 – 49   14 of 124 reported serious side effects
     50 – 59   21 of 119 reported serious side effects
     60 – 69   28 of 121 reported serious side effects
a. What is the apparent pattern in this data?
b. A software package reports that X² = 11.81. Is there evidence of a difference in the probability of side effects by age group? Use α = 1%.


4. The state's department of environmental protection maintains lists of gasoline service stations in each county. Concerned that leaky underground tanks may be contaminating water supplies, the state inspects a large sample of tanks, classifying them as to whether or not they leak. The tanks are also classified by age category (0 – 5 years, 6 – 10 years, 11+ years). The data is summarized in the accompanying crosstab and statistical printouts. Comment on the apparent trend, if any, and use the formal hypothesis test to state a conclusion. Use α = 1%.

Age * Leak Crosstabulation
                                  does not leak   leaks     Total
0 - 5 years    Count                 117             4       121
               Expected Count        108.8          12.2     121.0
               % within Age           96.7%          3.3%    100.0%
               % within Leak          40.9%         12.5%     38.1%
6 - 10 years   Count                  88             7        95
               Expected Count         85.4           9.6      95.0
               % within Age           92.6%          7.4%    100.0%
               % within Leak          30.8%         21.9%     29.9%
11 + years     Count                  81            21       102
               Expected Count         91.7          10.3     102.0
               % within Age           79.4%         20.6%    100.0%
               % within Leak          28.3%         65.6%     32.1%
Total          Count                 286            32       318
               Expected Count        286.0          32.0     318.0
               % within Age           89.9%         10.1%    100.0%
               % within Leak         100.0%        100.0%    100.0%

Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             19.352a   2    .000
Likelihood Ratio               18.782    2    .000
Linear-by-Linear Association   17.756    1    .000
N of Valid Cases               318

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.56.


5. Refer to the information given in problem 4. In an effort to understand where the differences lie for the age groups, the researcher conducts three different pairwise comparisons of two groups at a time.
a. Two of the Chi-squared statistics are shown below. Give the third.
0-5 years versus 6-10 years: X² = 1.82, p value = 0.178
0-5 years versus 11+ years: X² = 16.61, p value < 0.0001
6-10 years versus 11+ years: ____
b. The table below summarizes the probability of a leak by age group. Use a lettering system to show which groups are different, controlling the family-wise significance level at 5%.
Age Category          0 – 5 years   6 – 10 years   11+ years
Probability of Leak   3.3%          7.4%           20.6%

6. In a sample of 68 new homes recently built by Company A, inspectors find 7 homes with substantial code violations. In a sample of 54 new homes recently built by Company B, inspectors find 9 homes with substantial code violations.
a. Is there evidence that the two companies differ in the probability of having a substantial code violation? Use α = 5%.
b. Suppose that Company B had 0 homes with substantial code violations. How would that affect your choice of methods for analyzing the data?

7. A psychologist has administered IQ tests to a sample of 200 adult prisoners. If this population has a distribution of IQs similar to that in the general population, the data should come from a normal distribution with a mean of 100 and a standard deviation of 16. The data is summarized below. For some categories, the expected number under the presumed distribution, and the contribution to the χ² statistic, have also been given.
a. Fill in the remaining blanks in the table.
b. Test the null hypothesis that the distribution of IQs in the adult prison population is similar to that in the general population, using α = 5%.
c. Comment on the apparent difference between the adult prison population and the general population.
                           IQ < 84   84 ≤ IQ < 100   100 ≤ IQ < 116   116 ≤ IQ
sample count               55        81              40               24
expected                   31.73     68.27
chi-squared contribution   17.07     2.37

8. In the past, students have evaluated an instructor on a scale of 40% Fair, 40% Good, and 20% Excellent. In the past year, the instructor has changed his style of delivery. A random sample of 50 independent student evaluations showed 10 Fair, 25 Good, and 15 Excellent. Is there evidence of a change in the distribution of the teaching evaluations? Use α = 5%.


9. In a trial of a new drug, 1 of the 100 people in the Placebo group reported flu-like symptoms in the first week. By contrast, 8 of the 100 people in the Drug group reported flu-like symptoms in the first week. The table below shows the printout for a standard set of test statistics comparing the proportions in the two groups. You wish to test the null hypothesis that there is no difference in the probability of experiencing flu-like symptoms for the two groups, using α = 5%. Cite the single most appropriate test and give the conclusion.

Statistics for Table of flu by drug
Statistic                     DF   Value    Prob
Chi-Square                    1    5.7010   0.0170
Likelihood Ratio Chi-Square   1    6.4543   0.0111
Contingency Coefficient            0.1665
Cramer's V                         0.1688

Fisher's Exact Test
Table Probability (P)   0.0158
Two-sided Pr <= P       0.0349

10. This table summarizes data from a survey of adults regarding their attitudes towards receiving a Covid-19 vaccination. While the sample size was too small for the Chi-squared statistic to be valid, a quick inspection of the data suggests that the Independent voters were less likely to answer 'Yes'. Each cell shows N (% within party), with the adjusted residual beneath it.

Want Vaccine? →   Yes        No         Undecided   Total
Democrat          4 (40%)    5 (50%)    1 (10%)     10
                  1.76       -0.39      -1.30
Republican        1 (10%)    5 (50%)    4 (40%)     10
                  -1.30      -0.39      1.76
Independent       0 (0%)     2 (100%)   0 (0%)      2
                  -0.80      ____       -0.80
Total             5          12         5           22

a. The standardized residual for the Independents who answered ‘No’ is missing. Calculate it. b. Is the reaction of the Independent voters really the strongest information in the data? If not, describe the strongest difference.


SOLUTIONS
1. a. No, it only means that there is a relationship. They could equally value the opposite traits in salesmen and in office assistants.
b. In a true test of homogeneity, you would sample separate naturally occurring populations, and see if the distribution of some characteristic varied by population. For example, you might sample sales managers and production managers to see if the distribution of traits they prefer in office assistants were different. In a test of independence, a single population is sampled, and each participant is measured on two variables. This is a test of independence.

2. Expected value for each choice is 20. The Chi-squared statistic is
(25 − 20)²/20 + (15 − 20)²/20 + (20 − 20)²/20 = 2.5
with 2 degrees of freedom. There is not significant evidence to disprove the teacher's suspicion.

3. a. Apparently, the probability of a serious side effect increases as age increases.
b. This has 3 degrees of freedom, and the critical value is 11.345. There is significant evidence that at least one group has a different probability of a serious side effect.
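The goodness-of-fit statistic in solution 2 is a one-line computation; a Python sketch:

```python
# Chi-squared goodness-of-fit statistic from solution 2.
observed = [25, 15, 20]
expected = [20, 20, 20]   # 60 students spread evenly over 3 choices
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(x2)  # 2.5
```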

4. The probability a tank will leak appears to be increasing, from 3.3% in the newest tanks to 20.6% in the oldest tanks. The proportion of tanks that are leaking is significantly different for at least one group: X² = 19.352 with 2 df and p value = 0.000. Note that the Chi-squared test is valid here, as all cells have expected counts of at least 5.

5. a. 6-10 versus 11+ has X² = (2.6552)² = 7.05, p value = 0.0079.
b. Using Bonferroni, we need the ordinary p value to be less than 0.0167 for a difference to be significant. The 11+ group is significantly different from the first two groups.
Age Category          0 – 5 years   6 – 10 years   11+ years
Probability of Leak   3.3% a        7.4% a         20.6% b

6. a. This data has expected number of homes with code violations = 8.91 for Company A and 7.08 for Company B, so the sample is sufficiently large for a Chi-squared test.
X² = (7 − 8.91)²/8.91 + (9 − 7.08)²/7.08 + (61 − 59.09)²/59.09 + (45 − 46.92)²/46.92 = 1.07
with 1 degree of freedom. There is no significant difference in the proportion of homes with substantial code violations.
b. Now 2 of the 4 cells have expected counts less than 5. Use Fisher's Exact Test.

7a. Using the symmetry of the normal distribution about the mean of 100:
                           IQ < 84   84 ≤ IQ < 100   100 ≤ IQ < 116   116 ≤ IQ
sample count               55        81              40               24
expected                   31.73     68.27           68.27            31.73
chi-squared contribution   17.07     2.37            11.71            1.88

b. Chi-squared statistic = 33.03 with 3 degrees of freedom. The critical value is 7.815. There is significant evidence that the distribution of IQs among adult prisoners differs from that in the general population.
c. Since the low IQ categories have more than expected, and the high IQ categories have fewer than expected, it appears that IQs are typically lower in the prison population.
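The expected counts in 7(a) can be checked against the N(100, 16) distribution using the standard normal CDF; a Python sketch built on math.erf:

```python
import math

# Expected counts for 7(a) under a N(100, 16) distribution, using the
# standard normal CDF built from math.erf.
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, mu, sd = 200, 100, 16
cuts = [(None, 84), (84, 100), (100, 116), (116, None)]
expected = []
for lo, hi in cuts:
    p_lo = 0.0 if lo is None else phi((lo - mu) / sd)
    p_hi = 1.0 if hi is None else phi((hi - mu) / sd)
    expected.append(round(n * (p_hi - p_lo), 2))
print(expected)  # [31.73, 68.27, 68.27, 31.73]
```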

8. The expected counts under the old distribution would be 20 Fair, 20 Good, and 10 Excellent. The Chi-squared statistic is 8.75 with 2 degrees of freedom. The critical value is 5.991. There is significant evidence of a change in the distribution of the teaching evaluations. They have apparently improved.

9. Two of the 4 cells have expected count less than 5 (4.5 expected with flu-like symptoms in each group). Use Fisher’s Exact test with a two-tailed p value, 0.0349. There is significant evidence that the groups differ in the probability a patient will experience flu-like symptoms.

10. a. E = (2)(12)/22 = 1.091
adjusted residual = (2 − 1.091) / √[1.091(1 − 2/22)(1 − 12/22)] = 1.35

b. No, the strongest difference is that Democrats are more likely to answer ‘Yes’, compared to the sample as a whole, and the Republicans are more likely to answer ‘No’.
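The adjusted residual in 10(a) follows directly from the cell's expected count; a Python sketch:

```python
import math

# Adjusted residual for the Independent / 'No' cell in 10(a).
n, row_total, col_total, observed = 22, 2, 12, 2
expected = row_total * col_total / n
adj = (observed - expected) / math.sqrt(
    expected * (1 - row_total / n) * (1 - col_total / n))
print(round(expected, 3), round(adj, 2))  # 1.091 1.35
```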


CHAPTER 13: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator.

1. You are investigating the influence catalyst amount has on the probability a resin patch will fail to adhere. When the catalyst amount is 1 drop, 10% of the patches fail to adhere. When the catalyst amount is 4 drops, 5% of the patches fail to adhere. Assume a logistic model is appropriate. Give reasonable estimates for the logistic regression parameters.

2. A researcher wishes to know how the probability of a mouse developing a tumor is related to the concentration of a toxic substance to which the mouse is exposed. There are 20 mice in this experiment, and they are exposed to X = concentrations ranging from 0 to 30 ppm. The dependent variable is Y = 0 if the mouse did not develop a tumor and Y = 1 if it did. The logistic regression modeled the event Y = 1. The logistic regression results were:
Parameter   Estimate   Wald Chi-Square   p value
Intercept   -65        25.3              <.0001
X           2.195      7.4               .0083
a. You are testing the null hypothesis that the concentration to which the mouse is exposed is not associated with the chances of developing a tumor. Cite the appropriate test statistic and its p-value, and write the appropriate conclusion as a sentence.
b. Give the ODDS of developing a tumor, if the concentration is 29.
c. Give the ODDS RATIO comparing the odds of developing a tumor in two mice, one of whom is exposed to concentration x and the other exposed to concentration x+2.
d. To help the reader understand how X affects the probability of developing a tumor, the researcher fills out a table giving the probability at some selected values of X. Help the researcher by filling in the remaining blanks.
X                    0   20   25       28       29   29.5   30
Fitted probability   0   0    .00004   .02820               .7006


3. For a large sample of children, you have classified their weight as CHILD_WEIGHT = 1 if the child is obese, and 0 otherwise. You are investigating variables which may influence CHILD_WEIGHT, including mother's weight (MOMWGT) and hours watching TV (TV). You want to test the null hypothesis that TV has no kind of effect on CHILD_WEIGHT (either a main effect or an interaction), provided you include MOMWGT in the model. You fit a number of logistic models. The value of -2*LogLikelihood from these models is shown below. Construct a single test statistic for this null hypothesis, and interpret the result using α = 5%.
Terms in model                         -2 Log Likelihood
Intercept                              325.166
Intercept, MOMWGT                      317.310
Intercept, TV                          324.871
Intercept, MOMWGT, TV                  316.523
Intercept, MOMWGT, TV, MOMWGT*TV       316.388

4. A marine biologist is studying the effect of dissolved oxygen levels on the probability of fish dying. (This has implications for setting water quality standards.) A number of fish are observed independently in tanks where the Dissolved Oxygen (DO) value varies between 3.0 and 6.0 mg/L. There are three situations (TYPE = 0 for freshwater, 1 for brackish water, and 2 for salt water). For each fish, we record Y = 0 if it died, 1 if it survived. In the logistic regressions, Type was represented by 2 dummy variables using reference coding with freshwater as the baseline. That is: T1 = 1 when Type = 1 and 0 otherwise; T2 = 1 when Type = 2 and 0 otherwise. Interactions are T1DO = T1*DO and T2DO = T2*DO. Results for several different models are attached. a. Based on the attached printout, construct a test statistic for the null hypothesis that Dissolved Oxygen (DO) has no kind of effect, either a main effect or in the form of an interaction. Give the appropriate conclusion. b. Based on the fitted parameter estimates for Model 2, calculate the predicted probability that a fish will die given that DO = 4.5 and Type = 0 (freshwater). Do the same calculation, assuming Type = 2 (saltwater). Based on the results for Model 2, is the difference significant? (Cite the test statistic and its p-value in your answer.) c. Express the null hypothesis of 'no difference' between brackish water and saltwater at a fixed value of DO as a linear combination of the regression coefficients in Model 2. Do not try to carry out the test.


PRINTOUT FOR PROBLEM 4

MODEL 1
Independent variables: Intercept, t1, t2, do, t1do, t2do
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         51.648           42.074
SC          53.312           52.056
-2 Log L    49.648           30.074

MODEL 2
Independent variables: Intercept, t1, t2, do
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         51.648           38.913
SC          53.312           45.567
-2 Log L    49.648           30.913

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    16.1161    5.5706           8.3698            0.0038
t1          1    -3.8231    1.5441           6.1304            0.0133
t2          1    -3.4854    1.5734           4.9072            0.0267
do          1    -3.1078    1.0513           8.7395            0.0031

MODEL 3
Independent variables: Intercept, t1, t2
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         51.648           53.007
SC          53.312           57.998
-2 Log L    49.648           47.007


5. You have data sent to you by a hospital that gives the number of people admitted with acute respiratory distress (ARD) each week for a sample of 50 recent summer weeks. You have also recorded average air quality for each of these weeks, on a quantitative scale from 10 (poor) to 1 (excellent). You are interested in how the expected number of admissions for ARD is related to air quality. The size of the hospital has stayed the same over the study period. The results of a Poisson regression are attached.
a. Describe the relationship between air quality and admissions. Is there significant evidence that a relationship exists?
b. Calculate the expected admissions for ARD when air quality is 8.

Criteria For Assessing Goodness Of Fit
Criterion            DF   Value     Value/DF
Deviance             48   37.0780   0.7725
Scaled Deviance      48   37.0780   0.7725
Pearson Chi-Square   48   33.8631   0.7055
Scaled Pearson X2    48   33.8631   0.7055
Log Likelihood            27.9419

Analysis Of Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   ChiSquare   Pr > ChiSq
Intercept   1    -0.0425    0.2546           -0.5415   0.4566             0.03        0.8675
x           1     0.1899    0.0372            0.1169   0.2628            26.03        <.0001

6. You have data sent to you by a hospital that gives the number of people admitted with acute respiratory distress (ARD) each week for a sample of 50 recent summer weeks. You have also recorded average air quality for each of these weeks, on a quantitative scale from 10 (poor) to 1 (excellent), and whether or not a heat emergency was declared during the week (0 = no, 1 = yes). The size of the hospital has stayed the same over the study period. The results of a Poisson regression are attached. a. Plot the estimated logarithm of the mean number of ARD admissions per week as a function of air quality, using separate lines for weeks with and without a heat emergency. b. Discuss the influence of the environmental variables on admissions for ARD.

Criteria For Assessing Goodness Of Fit
Criterion        DF      Value     Value/DF
Deviance         46     31.3512     0.6815
Log Likelihood         339.2511

Analysis Of Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   ChiSquare   Pr > ChiSq
Intercept    1   -0.2141        0.3251          -0.8512    0.4231            0.43       0.5102
x            1    0.2152        0.0482           0.1208    0.3097           19.94       <.0001
heat         1    0.2525        0.4024          -0.5362    1.0411            0.39       0.5304
xheat        1    0.1463        0.0575           0.0336    0.2591            6.47       0.0110


7. One popular sigmoidal-shaped function is the probit function:

g(x, β) = β1 * Φ((x - β2)/β3)

where Φ is the cumulative distribution function for the standard normal distribution. Recall that Φ increases from an asymptote of 0 at negative infinity to an asymptote of 1 at positive infinity. a. In this model, β1 is the maximum value, and β2 is the value at which the function attains 50% of its maximum. How do you interpret β3? b. Based on the plot below, what would reasonable initial estimates be for the parameters?

[Plot not reproduced: g(x) versus X, with X ranging from 2 to 22 and g ranging from 0 to about 6.]

8. For each scenario below, say whether logistic regression or Poisson regression is most likely to be appropriate. a. In a test of a new drug, a possible side effect is headaches. For each participant in the study, researchers record the dose of the new drug (a quantitative variable) and the number of headaches the participant experienced during the first month of treatment. b. In a test of a new drug, a possible side effect is headaches. For each participant in the study, researchers record the dose of the new drug (a quantitative variable) and whether or not the participant experienced any headaches during the first month of treatment. c. A firing process for ceramic disks can produce cracks. In an experiment to improve the production process, engineers fire the disks at a variety of different temperatures and then record whether or not each disk has a crack. d. A firing process for ceramic disks can produce tiny bubbles in the surface. In an experiment to improve the production process, engineers fire the disks at a variety of different temperatures and then record the number of bubbles on each disk.


9. For each scenario below, say whether a sigmoidal or unimodal nonlinear regression is most likely to be appropriate. Assume there is data from a very wide range of the independent variable. a. Day's amount of electricity produced by a solar panel regressed on day of the year. b. Day's amount of electricity produced by a solar panel regressed on percent cloud cover. c. Number of words a person is able to recall regressed on the time spent studying the word list. d. Percent of bacteria killed by a disinfecting solution regressed on % alcohol concentration in the solution.

10. You are modeling amount of water absorbed by dried legumes as a function of x = soaking time, using the nonlinear function g(x, β) = β1*x^2/(β3 + x^2). Previous work indicated that β1 is in the vicinity of 5. In your large data set, you fit two models. Model 1 forces β1 = 5; it has -2ln(L) = 121.2. Model 2 allows β1 to be estimated from the data; it has -2ln(L) = 119.4. Is there evidence, at α = 5%, that β1 differs from 5?

SOLUTIONS

1. At x = 1, the ln(odds) = -2.197, and at x = 4, the ln(odds) = -2.944, so an estimate of the slope is β̂1 = (-2.197 - (-2.944))/(1 - 4) = -0.249. An estimate of the intercept is β̂0 = -2.197 - (-0.249)(1) = -1.948.
2a. There is significant evidence that the concentration is associated with the chances of developing a tumor (Wald Chi-squared = 7.4, p value = 0.0083). b. ln(odds) = -1.345, so odds = exp(-1.345) = 0.2605. c. exp(2*2.195) = 80.6. d. At x = 29, the probability is 0.2067; at x = 29.5, the probability is 0.4384.
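As a quick arithmetic check of Solution 1 (not part of the original manual), the slope and intercept can be recovered from the two log-odds values with a couple of lines of Python:

```python
import math

# Two (x, ln-odds) points taken from Solution 1
x1, lo1 = 1, -2.197
x2, lo2 = 4, -2.944

b1 = (lo1 - lo2) / (x1 - x2)   # slope estimate, about -0.249
b0 = lo1 - b1 * x1             # intercept estimate, about -1.948
print(round(b1, 3), round(b0, 3))
```

This is just the two-point slope formula applied on the log-odds scale, which is linear in x for a logistic model.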

3. Chi-squared = 317.310 - 316.388 = 0.922 with 2 df. The critical value is 5.991. There is no significant evidence that TV hours have any kind of effect, provided that MOMWGT is in the model.

4. a. The full model is Model 1, with main effects for DO and TYPE (T1 and T2) and the interactions T1DO, T2DO. This model has -2LogL = 30.074. The reduced model drops both the main effect for DO and the interactions that involve DO. It is Model 3, which has -2LogL = 47.007. Chi-squared = 47.007 - 30.074 = 16.933 with 3 d.f. Using the chi-squared table, p < .005. There is significant evidence that DO has some kind of effect.

b. In freshwater, T1 = T2 = 0, and the predicted probability of dying is

π = exp(16.1161 - 3.1078*4.5) / (1 + exp(16.1161 - 3.1078*4.5)) = 0.8939

In saltwater, T1 = 0 and T2 = 1, and the predicted probability of dying is

π = exp(16.1161 - 3.4854 - 3.1078*4.5) / (1 + exp(16.1161 - 3.4854 - 3.1078*4.5)) = 0.2052

The difference between freshwater and saltwater at a fixed value of DO is given by the parameter that goes with T2. According to the printout for Model 2, this is significant (Chi-squared = 4.9072, p = .0267).
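The two predicted probabilities can be verified numerically. The sketch below (not part of the original manual) plugs the Model 2 estimates from the printout into the inverse-logit transform:

```python
import math

def inv_logit(eta):
    """Convert log-odds to a probability."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Model 2 estimates from the printout
b0, b_t1, b_t2, b_do = 16.1161, -3.8231, -3.4854, -3.1078

do = 4.5
p_fresh = inv_logit(b0 + b_do * do)         # freshwater: T1 = T2 = 0
p_salt = inv_logit(b0 + b_t2 + b_do * do)   # saltwater:  T1 = 0, T2 = 1
print(round(p_fresh, 4), round(p_salt, 4))
```

The two values agree with the hand calculation (0.8939 and 0.2052) to four decimal places.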


c. General model: ln(π/(1-π)) = β0 + β1*T1 + β2*T2 + β3*DO

Brackish water (T1 = 1, T2 = 0): ln(π/(1-π)) = β0 + β1 + β3*DO

Saltwater (T1 = 0, T2 = 1): ln(π/(1-π)) = β0 + β2 + β3*DO

To say there is no difference in the probability of dying for brackish and saltwater at the same value of DO, we need to test Ho: β1 - β2 = 0.

5. a. As x increases, that is, as air quality deteriorates, the expected admissions increase. There is significant evidence that a relationship exists (Chi-square = 26.03, p < 0.0001). b. When x = 8, ln(μ) = -0.0425 + 0.1899*8 = 1.4767, so μ = exp(1.4767) = 4.38.
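For part b, the expected count comes from exponentiating the linear predictor of the Poisson model. A minimal check in Python (not part of the original manual):

```python
import math

# Poisson regression estimates from the Problem 5 printout
b0, b1 = -0.0425, 0.1899

x = 8                      # air quality score
log_mu = b0 + b1 * x       # linear predictor on the log scale
mu = math.exp(log_mu)      # expected weekly ARD admissions
print(round(log_mu, 4), round(mu, 2))
```

This reproduces ln(μ) = 1.4767 and μ ≈ 4.38 admissions per week.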

6. a. See plot (not reproduced here): the estimated ln(mean admissions) plotted against air quality, with one line for weeks with a heat emergency (heat = 1) and one for weeks without (heat = 0). b. In weeks without a heat emergency, the log(expected admissions) grows slowly as air quality deteriorates. The relationship is significant (Chi-square = 19.94, p < 0.0001). When there is a heat emergency, the log(expected admissions) grows significantly faster (Chi-square = 6.47, p = 0.011) than when there is no emergency. During times of good air quality, there is not much difference in the log(expected admissions) whether or not there is a heat emergency. However, when air quality is very bad, the log(expected admissions) is much greater with the heat emergency than without.
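The two lines in the plot can be generated directly from the fitted coefficients in the Problem 6 printout. A short sketch (not part of the original manual):

```python
# Fitted Poisson model from the Problem 6 printout:
#   ln(mu) = -0.2141 + 0.2152*x + 0.2525*heat + 0.1463*x*heat
def log_mu(x, heat):
    return -0.2141 + 0.2152 * x + 0.2525 * heat + 0.1463 * x * heat

# Tabulate both lines over the air-quality range
for x in (1, 5, 10):
    print(x, round(log_mu(x, 0), 3), round(log_mu(x, 1), 3))
```

The heat = 1 line has intercept -0.2141 + 0.2525 and slope 0.2152 + 0.1463, so the gap between the two lines widens as air quality deteriorates, matching the discussion in part b.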

7a. β3 is a measure of the rate of change. The larger β3, the slower the function changes. b. It looks like the maximum is near 6, so β̂1 ≈ 6. The 50% value would be 3, and it appears that this is attained for an X of approximately 11, so β̂2 ≈ 11. To obtain an estimate of β3, work with some convenient percentage of the maximum other than 50%. For example, it appears that the function attains 25% of its maximum value (0.25*6 = 1.5) at x = 8. Since Φ(-0.67) = 0.25, set (8 - 11)/β̂3 = -0.67, which gives β̂3 ≈ 4.48.
8. a. Poisson, b. logistic, c. logistic, d. Poisson
9. a. unimodal, b. sigmoidal (decreasing), c. sigmoidal, d. sigmoidal


10. Ho: β1 = 5 versus H1: β1 ≠ 5. Likelihood Ratio Test: X² = 121.2 - 119.4 = 1.8 with 1 df, p value ≈ 0.18. There is no significant evidence that the value of β1 differs from 5.
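The p value for the likelihood ratio test can be computed without a chi-squared table: for 1 df, P(χ²₁ > x) = 2(1 - Φ(√x)). A stdlib-only check (not part of the original manual):

```python
import math
from statistics import NormalDist

# Likelihood ratio test for H0: beta1 = 5
x2 = 121.2 - 119.4                          # difference in -2 ln(L)
# For 1 df, the chi-squared tail equals the two-sided normal tail at sqrt(x2)
p = 2 * (1 - NormalDist().cdf(math.sqrt(x2)))
print(round(x2, 1), round(p, 3))
```

Since p ≈ 0.18 exceeds α = 0.05, the conclusion of no significant evidence stands.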


CHAPTER 14: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS These problems are designed to be done without access to a computer, but they may require a calculator. Problems 5 - 9 will require specialized tables. Note to Instructor: You can shorten the calculations required by the students in problems 5 - 9 by giving the students some of the intermediate sums of ranks. 1. For each of the following scenarios, identify one appropriate parametric and one appropriate nonparametric technique. a. Ten participants are asked to do a taste-testing of two wines. Each participant rates both of the wines on a quantitative scale. The participants are told that the first wine is Californian, and the second wine is French, but in fact, both wines are identical. We wish to know if the information influences the mean ratings given by participants. b. Children are interviewed and classified as 'gamers' versus 'non-gamers'. Each child is given a test of hand-eye coordination (reported on a quantitative scale). You want to compare typical coordination scores for gamers and non-gamers. c. Based on items on a questionnaire, you score high school seniors with regard to their 'concern for academic success'. This is a quantitative scale, where high values indicate great concern. You wish to know whether this value is associated with parents' total years of formal education (quantitative, in years). d. 21 college algebra students are sorted into order by their score on a placement exam, then trios of students with similar initial ability are created. Within each trio, one student is randomly chosen to be taught by Method A, one by Method B, and one by Method C. At the end of the semester, all students take the same exam, and the score on that is our measure of how much students learned. We want to know whether one method tends to give superior scores.

2. For each of the following scenarios, identify one appropriate parametric and one appropriate nonparametric technique. a. Debt ratios (a quantitative measure of debt compared to household income) are measured in a sample of households. For each household, we report both its Debt Ratio and the Educational Level of the head of household (as years of formal education). The alternative hypothesis of interest is that typical Debt Ratios decline as Educational Level increases. b. You are studying cholesterol values in men. 8 men participate in your study and you follow them longitudinally - that is, the men's cholesterol values are reported as they age - at years 50, 52, 54, 56, and 58. There are no missing values. You wish to detect whether typical values differ for at least one of the ages. c. The Department of Energy sponsors an experiment in which 12 homes of different age and construction type have additional attic insulation installed. We have a record of what each home's summer utility bill was pre-installation, and also for the summer post-installation. We wish to compare these bills to see whether there has been a change. d. 60 people who need to lose substantial weight are randomly assigned to one of 4 different diet regimes. After 6 weeks on the diet, their 'wellness' score, a measure of how well they feel, is assessed.


This is on a scale of 0 to 50, with high scores reflecting better contentment. You wish to detect whether typical values differ for at least one diet regime. 3. Match each of the following figures with one of the combinations of Pearson’s r (rp) and Spearman’s r (rs). #1 rp = -0.8, rs = -1

#2 rp = 0, rs = 0

#3 rp = 0.9, rs = 1

#4 rp = -0.7, rs = 0.2

[Figures A-D: four scatterplots, not reproduced here.]

4. The data below show summer and winter Nitrogen Concentrations on 10 lakes. Carry out the Wilcoxon signed rank test to compare summer and winter levels, using asymptotic methods to calculate a p value. Write the appropriate conclusion assuming α = 5%.

Lake      1     2     3     4     5     6     7     8     9    10
Winter   525   328   412   803   119   225   465   582   333   419
Summer   721   252   355  1242   155   180   415   308   512   500


5. The data below show Nitrogen Concentrations in 5 lakes in agricultural areas, and 6 lakes in a nearby natural area. Compare the location of the distributions for Nitrogen Concentrations, assuming the shapes of the distributions are the same. Use a nonparametric technique with α = 5%. Agricultural: 156 225 369 451 809 Natural: 89 155 290 331 401 600

6. Four different judges rank five different wines. Each judge ranks the wines from 1 = best to 5 = worst. We suspect that the judges cannot really tell the difference between the wines, and are simply assigning ranks at random. Do the data provide evidence to disprove this idea? Use α = 5%.

judge 1 judge 2 judge 3 judge 4

wine A 5 2 4 3

wine B 3 1 5 4

wine C 2 5 3 1

wine D 1 3 1 5

wine E 4 4 2 2

7. The data below show Secchi depths (a measure of water clarity, with higher values denoting clearer water) for lakes in three different land-use areas. Compare the distributions of the values using α = 5%. Agricultural: 3.5 4.1 5.2 5.6 6.2 Natural: 5.3 5.9 6.1 7.2 7.8 Residential: 3.6 4.5 5.1 5.8 6.0

8. In a very large study in your area during year 2000, the median debt ratio for households was 2.80. In a small recent sample in the same area, the observed debt ratios were: 1.52 1.86 1.92 2.02 2.25 2.62 2.83 2.91 3.06 3.11 3.18 3.25 3.30 3.44 3.58 3.62 3.85 3.93 4.04 4.44 4.56 5.01 5.62 5.99 6.80 a. Is there evidence that the current median differs from 2.8? Use α = 5%. If there is a significant difference, indicate its apparent direction. Hint: after subtracting 2.80 from all observations, T(+) = 253.5. b. Construct the boxplot for this data, and comment on why a nonparametric test would be the most suitable method of analysis.

9. You interview both the husband and wife for 6 different couples, asking each to separately fill out questionnaires gauging their 'financial risk aversion'. Risk aversion is a score that is higher when an individual is less willing to take financial risks. Is there evidence that wives tend to differ systematically from their husbands with regard to risk aversion, at α = 10%?

Husband    8   10    6   14    9   16
Wife      12    9    8   17   12   10


10. You have credit ratings for samples of full-time employed people who are 1) married, 2) single and never married, 3) single and divorced, 4) widows/widowers. Credit ratings are scores on a scale of 0 to 850. In your sample, the median score is 520, and you use this to divide people into 'High' (521 and up) and 'Low' (0 through 520) ratings. How would you interpret the attached SAS printout? Which, if any, marital status classes seem to be associated with better credit scores?

Table of credit_score by marital_status (cell counts, with column percents)

credit_score        1        2        3        4      Total
High               41       25       16       26       108
                 51.25%   50.00%   32.00%   72.22%
Low                39       25       34       10       108
                 48.75%   50.00%   68.00%   27.78%
Total              80       50       50       36       216

Statistics for Table of credit_score by marital_status
Statistic                       DF    Value     Prob
Chi-Square                       3   13.6411   0.0034
Likelihood Ratio Chi-Square      3   14.0437   0.0028
Mantel-Haenszel Chi-Square       1    0.5392   0.4628
Phi Coefficient                       0.2513
Contingency Coefficient               0.2437
Cramer's V                            0.2513
Sample Size = 216

SOLUTIONS
1. a) parametric: paired t test; nonparametric: Wilcoxon signed rank test b) parametric: independent samples t test; nonparametric: Mann-Whitney test (or Wilcoxon two sample test) c) parametric: Pearson correlation; nonparametric: Spearman correlation d) parametric: F test for randomized block design; nonparametric: Friedman's test
2. a) parametric: Pearson correlation; nonparametric: Spearman correlation b) parametric: F test for randomized block design; nonparametric: Friedman's test c) parametric: paired t test; nonparametric: Wilcoxon signed rank test d) parametric: F test for one-way ANOVA; nonparametric: Kruskal-Wallis test
3. Figure A: #4, Figure B: #1, Figure C: #2, Figure D: #3
4. Taking the differences as Summer - Winter, T(-) = 23 and T(+) = 32, so T = 23. μ = 10*11/4 = 27.5, σ = sqrt(10*11*21/24) = 9.81, z = (23 - 27.5)/9.81 = -0.46. Since this is a two-tailed test, p value = 2*(0.323) = 0.646. There is no significant evidence that the Summer-Winter differences are not symmetrically distributed about 0 (usually interpreted as no evidence of a change in the typical value of nitrogen concentration).

5. Ta = 34, na = 5, Tn = 32, nn = 6, T = 32. Using Table A.10, reject Ho if T < 18. There is no significant evidence of a difference in the location of the Nitrogen Concentration distributions in the two areas.


6. Friedman's test. b = 4, t = 5, A = 220, B = 182.5, T* = 0.2. Compare to the critical value from the F table with 4 and 12 df. There is no significant evidence that the wines differ in the distribution of their ratings. Either all the wines are equally good, or the judges do not have any agreement that one is better or worse than another.
7. T(ag) = 31, T(res) = 31, T(nat) = 58, H = 4.86; compare to the chi-squared table with 2 df. The critical value is 5.991. There is no significant evidence of a difference in the location of the distributions.
8. a. Using the sign test (from Chapter 4), 19 of the 25 observations exceed 2.80. Under the null hypothesis, the probability of exceeding 2.80 is 0.50, so z = (0.76 - 0.50)/sqrt(0.5*0.5/25) = 2.6. Or, better, use the signed rank test: T = T(-) = 325 - 253.5 = 71.5 (there is one pair of tied values). Using Table A.8, reject Ho if T < 90; there is significant evidence that the distribution is not symmetrically distributed around 2.80 (more usually interpreted as meaning that the median is not 2.80). b. The data are positively skewed with an outlier on the positive side, so the normality assumption is questionable.
9. Taking the differences as Wife - Husband, T(-) = 7 and T(+) = 14, so T = 7. Using Table A.8, we would need T < 2 to reject Ho, so there is no significant evidence of a systematic difference between husbands and wives.
10. The Chi-squared test is for the null hypothesis that the probability of being higher than the overall median is the same in all groups, that is, that the population median is the same in all groups. Since the p value is small, there is significant evidence that at least one group is different. From the column %, it seems that widows/widowers tend to have the highest median ratings and single+divorced have the lowest.
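The intermediate quantities A, B, and T* in Solution 6 can be recomputed directly from the rank table in Problem 6. A sketch (not part of the original manual), using the formula T* = (b-1)(B - bt(t+1)²/4)/(A - B) that matches the printed values:

```python
# Friedman-type statistic for Problem 6 (4 judges ranking 5 wines A..E)
ranks = {                      # judge -> ranks assigned to wines A, B, C, D, E
    "judge1": [5, 3, 2, 1, 4],
    "judge2": [2, 1, 5, 3, 4],
    "judge3": [4, 5, 3, 1, 2],
    "judge4": [3, 4, 1, 5, 2],
}
b = len(ranks)                 # number of blocks (judges) = 4
t = 5                          # number of treatments (wines)

A = sum(r * r for row in ranks.values() for r in row)              # 220
wine_sums = [sum(row[i] for row in ranks.values()) for i in range(t)]
B = sum(s * s for s in wine_sums) / b                              # 182.5
T_star = (b - 1) * (B - b * t * (t + 1) ** 2 / 4) / (A - B)        # 0.2
print(A, B, T_star)
```

The wine rank sums are 14, 13, 11, 10, 12, giving A = 220, B = 182.5, and T* = 0.2 as in the solution.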

