ANSWER: B
6. For interval level variables, which of the following properties does not apply? a. Choco Mocha is five units lower than Cocoa Mocha. b. Choco Mocha is tastier than Cocoa Mocha. c. Choco Mocha is twice as expensive as Cocoa Mocha. d. Choco Mocha is different from Cocoa Mocha. ANSWER: C 7. Which of the following properties is appropriate for ordinal, but not for nominal variables? a. Choco Mocha is $2 greater than Cocoa Mocha. b. Choco Mocha is different from Cocoa Mocha. c. Choco Mocha is tastier than Cocoa Mocha. d. Choco Mocha is twice as expensive as Cocoa Mocha. ANSWER: C 8. Lilo has sent four manuscripts to journals to be considered for review while Stitch has sent out two. Which scale of measurement is implied by the quantity of publications? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: D 9. A researcher creates a variable called “shampoo type” that categorizes shampoo into three types: a) herbal; b) chemically balanced; and c) non-additive. What type of measurement scale is “shampoo category?” a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: A
10. The number of bagels purchased by students at the student union each day is considered which measurement scale? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: D 11. Jennifer counts the number of students that purchase lunch from the student union on Monday. Which measurement scale is "the number of students that purchase lunch"? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: D 12. Wesley creates a variable that is called “truck size” that categorizes trucks into four types: a) compact; b) mid-sized; c) full-size; and d) monster. What type of measurement scale is “truck size?” a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: B 13. Luke is shopping for engagement rings and decides there are three types of diamonds: 1) tiny; 2) average; and 3) huge. What is the measurement scale for "type of diamond" based on Luke's classification? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: B
14. Marie is cleaning out her toy box and groups her toys into the following categories: a) stuffed animals; b) electronic toys and games; c) books; and d) toys to donate to charity. What is the measurement scale for Marie's toy categories? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: A 15. After Marie cleans out her toy box, she counts the number of toys that she plans to donate to charity. What is the measurement scale for this variable? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: D 16. A question on a survey completed by dancers asked which style of dance was their favorite. Options included the following: a) ballet; b) jazz; c) lyrical; and d) tap. What is the measurement scale of this variable? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: A 17. A pool company conducted a survey and asked respondents to indicate the ideal temperature in which to swim. Respondents were asked to respond in whole numbers using the Fahrenheit scale. What is the measurement scale of this variable? a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: C
18. Which one of the following would be considered a categorical variable? a. Age in months b. Favorite actress c. Number of students who graduate with honors d. Temperature (in Fahrenheit) at noon ANSWER: B 19. Which one of the following would be considered a discrete variable? a. Age in months b. Favorite actress c. Number of students who graduate with honors d. Temperature (in Fahrenheit) at noon ANSWER: C 20. Which one of the following would be considered a discrete variable? a. Blood pressure b. Favorite actress c. Number of children in a family d. Home ownership (yes or no) ANSWER: C 21. Which one of the following would be considered an interval variable? a. Age in months b. Favorite actress c. Number of students who graduate with honors d. Temperature (in Fahrenheit) at noon ANSWER: D 22. Which one of the following would be considered both a ratio and a continuous variable? a. Age in months b. Favorite actress c. Number of students who graduate with honors d. Temperature (in Fahrenheit) at noon ANSWER: A 23. Mark made an A on his midterm exam and Randall made a B. I assert that Mark’s midterm exam score was 10 points higher than Randall’s. Am I correct? a. Yes b. No ANSWER: B
24. Mark studied for the midterm exam for 6 hours. Randall studied for 2 hours. I assert that Mark studied three times longer for the midterm exam than Randall. Am I correct? a. Yes b. No ANSWER: A 25. Haiyan is an associate professor and Monifa is an assistant professor. I assert that Haiyan has five more years teaching experience than Monifa. Am I correct? a. Yes b. No ANSWER: B 26. JoAnn’s income is $50,000 while Betty’s income is $100,000. I assert that JoAnn’s income is one-half that of Betty’s income. Am I correct? a. Yes b. No ANSWER: A 27. A truck dealer promotes two new trucks, one is called the ‘light duty’ truck and the other is called the ‘heavy duty’ truck. I assert that the ‘light duty’ truck holds one-half the capacity of the ‘heavy duty’ truck. Am I correct? a. Yes b. No ANSWER: B 28. The standing high school record for the long jump is 6 feet, and the elementary school record is 4 feet. I assert that the elementary school record is two-thirds the length of the high school record. Am I correct? a. Yes b. No ANSWER: A
Short Answer
1. Rank the following values of the number of students per classroom who are enrolled in an afterschool activity, assigning rank 1 to the largest value: 3 0 5 12 20 8 2 15 9 11
ANSWER
Value    3    0    5   12   20    8    2   15    9   11
Rank     8   10    7    3    1    6    9    2    5    4
2. Rank the following values of the number of credit cards owned by college freshmen, assigning rank 1 to the largest value: 4 2 0 5 10 11 15 1 3 9
ANSWER
Value    4    2    0    5   10   11   15    1    3    9
Rank     6    8   10    5    3    2    1    9    7    4
3. Rank the following values of the number of hours of television watched per day, assigning rank 1 to the largest value: 4 0 8 13 9 3 2 7 5 1
ANSWER
Value    4    0    8   13    9    3    2    7    5    1
Rank     6   10    3    1    2    7    8    4    5    9
4. Rank the following values of the number of miles driven from home to work in one day, assigning rank 1 to the largest value: 9 52 20 36 22 44 18 28 16 26
ANSWER
Value    9   52   20   36   22   44   18   28   16   26
Rank    10    1    7    3    6    2    8    4    9    5
5. Rank the following values of the number of emails received in one day, assigning rank 1 to the largest value: 72 250 60 10 0 8 720 300 125 85
ANSWER
Value   72  250   60   10    0    8  720  300  125   85
Rank     6    3    7    8   10    9    1    2    4    5
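The ranking rule used in the five answers above (rank 1 goes to the largest value) can be checked with a short Python sketch. The function name rank_descending is hypothetical, and the sketch assumes there are no tied values, which holds for all five data sets in this section.

```python
def rank_descending(values):
    """Assign rank 1 to the largest value, rank 2 to the next largest, and so on.

    Assumes no ties; tied values would need an averaging rule not covered here.
    """
    ordered = sorted(values, reverse=True)
    # The rank of a value is its 1-based position in the descending order.
    return [ordered.index(v) + 1 for v in values]


students = [3, 0, 5, 12, 20, 8, 2, 15, 9, 11]
print(rank_descending(students))  # [8, 10, 7, 3, 1, 6, 9, 2, 5, 4]
```

The printed ranks match the answer to item 1; the other four items can be checked the same way.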
Multiple-Choice Chapter 2 (Data Representation) 1. For a distribution where the 25th percentile is 30, what is the percentile rank of 30? a. 0 b. .25 c. .30 d. 25 e. 30 ANSWER: D 2. For a distribution where the 75th percentile is 50, what is the percentile rank of 50? a. 0 b. .50 c. .75 d. 50 e. 75 ANSWER: E 3. For a distribution where the 40th percentile is 80, what is the percentile rank of 80? a. 0 b. .40 c. .80 d. 40 e. 80 ANSWER: D 4. For a distribution where the 90th percentile is 20, what is the percentile rank of 20? a. 0 b. .20 c. .90 d. 20 e. 90 ANSWER: E
5. Among the following, what is the preferred method for graphing data pertaining to preference of coffee flavor of a sample? a. Bar graph b. Cumulative frequency polygon c. Frequency polygon d. Histogram ANSWER: A 6. Among the following, what is the preferred method for graphing data pertaining to eye color of a sample? a. Bar graph b. Cumulative frequency polygon c. Frequency polygon d. Histogram ANSWER: A 7. Among the following, what is the preferred method for graphing data pertaining to religious affiliation of a sample? a. Bar graph b. Cumulative frequency polygon c. Frequency polygon d. Histogram ANSWER: A 8. Among the following, what is the preferred method for graphing data pertaining to favorite types of dance of a sample? a. Bar graph b. Cumulative frequency polygon c. Frequency polygon d. Histogram ANSWER: A
9. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution below the 25th percentile is 75%. b. The proportion of the distribution below the second quartile is 50%. c. The proportion of the distribution between the first quartile and 75th percentile is 75%. d. The proportion of the distribution between the second quartile and 50th percentile is 50%. e. The proportion of the distribution between the third quartile and 75th percentile is 25%. ANSWER: B 10. Given a variable that is interval or ratio in measurement scale, which of the following is correct? a. The proportion of the distribution between the 25th percentile and the 3rd quartile is 75%. b. The proportion of the distribution between the 50th percentile and the 2nd quartile is 50%. c. The proportion of the distribution below the 1st quartile is 75%. d. The proportion of the distribution between the 2nd quartile and the 75th percentile is 25%. ANSWER: D 11. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution below the 25th percentile is 25%. b. The proportion of the distribution below the second quartile is 25%. c. The proportion of the distribution between the first quartile and 50th percentile is 50%. d. The proportion of the distribution between the second quartile and 50th percentile is 25%. e. The proportion of the distribution between the third quartile is 25%. ANSWER: A
12. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution below the 50th percentile is 75%. b. The proportion of the distribution below the third quartile is 25%. c. The proportion of the distribution between the first quartile and 25th percentile is 50%. d. The proportion of the distribution between the first quartile and 50th percentile is 25%. e. The proportion of the distribution between the third quartile and 75th percentile is 25%. ANSWER: D 13. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution below the 25th percentile is 75%. b. The proportion of the distribution above the first quartile is 75%. c. The proportion of the distribution below the third quartile is 50%. d. The proportion of the distribution between the second quartile and 50th percentile is 50%. e. The proportion of the distribution between the first quartile and 75th percentile is 75% ANSWER: B 14. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution above the 25th percentile is 75%. b. The proportion of the distribution below the second quartile is 75%. c. The proportion of the distribution below the third quartile is 25%. d. The proportion of the distribution between the second quartile and 50th percentile is 25%. e. The proportion of the distribution between the first quartile and 50th percentile is 50%. ANSWER: A
15. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution above the 25th percentile is 25%. b. The proportion of the distribution below the second quartile is 25%. c. The proportion of the distribution below the third quartile is 75%. d. The proportion of the distribution between the second quartile and 75th percentile is 50%. e. The proportion of the distribution between the first quartile and 50th percentile is 75%. ANSWER: C 16. Which of the following statements is correct for a continuous variable? a. The proportion of the distribution below the 25th percentile is 25%. b. The proportion of the distribution below the 50th percentile is 75%. c. The proportion of the distribution above the third quartile is 75%. d. The proportion of the distribution between the 25th and 75th percentile is 75%. e. The proportion of the distribution between the first and third quartiles is 75%. ANSWER: A 17. In examining data collected over the past ten years, researchers at Disneyworld find that of 5,000 first-time guests: 2,250 visited during the summer months; 675 visited during the fall; 1,300 visited during the winter; and 775 visited during the spring. What is the relative frequency for guests who visited during the spring? a. .135 b. .155 c. .26 d. .45 ANSWER: B
18. In examining data collected over the past ten years, researchers at Disneyworld find that of 5,000 first-time guests: 2,250 visited during the summer months; 675 visited during the fall; 1,300 visited during the winter; and 775 visited during the spring. What is the relative frequency for guests who visited during the winter? a. .135 b. .155 c. .26 d. .45 ANSWER: C 19. In examining data collected over the past ten years, researchers at Sea World find that of 1,000 first-time guests: 452 visited during the summer months; 231 visited during the fall; 104 visited during the winter; and 213 visited during the spring. What is the relative frequency for guests who visited during the spring? a. .45 b. .23 c. .21 d. .10 ANSWER: C 20. For a dataset with seven values (25, 28, 32, 37, 45, 54, 63), the relative frequency for the value 25 is 18%, the relative frequency for the value 28 is 10%, the relative frequency for the value 32 is 7%, the relative frequency for the value 37 is 26%, the relative frequency for the value 45 is 11%, and the relative frequency for the value 54 is 16%. What is the cumulative relative frequency for the value 45? a. 11% b. 28% c. 61% d. 72% e. 100% ANSWER: D
21. For a dataset with seven values (25, 28, 32, 37, 45, 54, 63), the relative frequency for the value 25 is 18%, the relative frequency for the value 28 is 10%, the relative frequency for the value 32 is 7%, the relative frequency for the value 37 is 26%, the relative frequency for the value 45 is 11%, and the relative frequency for the value 54 is 16%. What is the cumulative relative frequency for the value 28? a. 11% b. 28% c. 61% d. 72% e. 100% ANSWER: B 22. For a dataset with five values (65, 72, 80, 88, 95), the relative frequency for the value 65 is 5%, the relative frequency for the value 72 is 15%, the relative frequency for the value 80 is 20%, the relative frequency for the value 88 is 30%, and the relative frequency for the value 95 is 30%. What is the cumulative relative frequency for the value 80? a. 15% b. 20% c. 40% d. 70% e. 100% ANSWER: C 23. For a dataset with five values (65, 72, 80, 88, 95), the relative frequency for the value 65 is 5%, the relative frequency for the value 72 is 15%, the relative frequency for the value 80 is 20%, the relative frequency for the value 88 is 30%, and the relative frequency for the value 95 is 30%. What is the cumulative relative frequency for the value 88? a. 30% b. 40% c. 60% d. 70% e. 100% ANSWER: D
24. One hundred guests during a university’s Parent Day are asked who it is they are visiting while they're on campus. Thirty indicate that they are visiting their son, forty indicate they are visiting their daughter, twenty indicate that they are visiting a grandchild, and ten indicate that they are visiting two children. What is the relative frequency for guests who are visiting their daughter? a. .10 b. .20 c. .30 d. .40 e. .50 ANSWER: D 25. The letter grades earned by students in a speech class included (with “A” being the best grade possible and “F” being the worst grade possible): A, B, C, D, and F. Letter grade of “A” had a relative frequency of 15%, grade of “B” had a relative frequency of 20%, grade of “C” had a relative frequency of 30%, letter grade of “D” had a relative frequency of 20%, and letter grade of “F” had a relative frequency of 15%. What is the cumulative relative frequency for a letter grade of “C”? a. 15% b. 35% c. 65% d. 85% ANSWER: C 26. The five hottest temperatures (in Fahrenheit) recorded at the Central State Zoo and their frequency of occurrence over the past five years were examined by the zoo's statistician. These temperatures included: 55, 72, 86, 94, and 96. The relative frequency for a temperature of 55 was 20%, for a temperature of 72 was 30%, for a temperature of 86 was 14%, for a temperature of 94 was 15%, and for a temperature of 96 was 21%. What is the cumulative relative frequency for the temperature of 86? a. 14% b. 20% c. 50% d. 64% ANSWER: D
27. Of 100 research participants, 10 are in group A, 22 in group B, 30 in group C, and 38 in group D. What is the relative frequency for participants in group D? a. .10 b. .22 c. .30 d. .38 ANSWER: D 28. A statistician employed by the Central State Zoo examines data from a random sample of 500 previous visitors. She finds that 322 guests who visited purchased annual passes, 102 purchased multi-day passes, and 76 purchased one-day tickets. What is the relative frequency for visitors who purchased one-day tickets? a. 1.00 b. .64 c. .20 d. .15 ANSWER: D
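The relative frequency and cumulative relative frequency items above all follow the same arithmetic: divide a category count by the total, and add relative frequencies from the lowest value upward. A minimal Python sketch, using the Disneyworld counts from item 18 and the five-value data set from items 22 and 23 (the function names are hypothetical):

```python
def relative_frequencies(counts):
    """Return each category's count divided by the total count."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def cumulative_percents(percents):
    """Running total of relative frequencies, listed from lowest to highest value."""
    running = 0
    result = []
    for p in percents:
        running += p
        result.append(running)
    return result


# Item 18: 5,000 first-time guests by season.
visits = {"summer": 2250, "fall": 675, "winter": 1300, "spring": 775}
rel = relative_frequencies(visits)
print(round(rel["winter"], 2))  # 0.26

# Items 22 and 23: values 65, 72, 80, 88, 95 with relative frequencies in percent.
scores = [65, 72, 80, 88, 95]
percents = [5, 15, 20, 30, 30]
cum = dict(zip(scores, cumulative_percents(percents)))
print(cum[80], cum[88])  # 40 70
```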
Short Answer Chapter 2 (Data Representation)
1. A sample distribution of variable X is as follows:
X    2    5    6    8   10   12   15   18   20
f    2    5    8    3    4    6    1    5    7
Calculate or draw each of the following for the sample distribution of X. Where possible, use SPSS to generate the data.
a. Frequency distribution
b. Cumulative relative frequency distribution
c. Histogram (ungrouped)
d. Frequency polygon
e. Q1
f. Q2
g. Q3
h. P10 and P90
i. Box-and-whisker plot
2. A sample distribution of variable X is as follows:
X   15   20   24   28   32   34   36   38   40
f    5    3    1    4    2    7    3    8    4
Calculate or draw each of the following for the sample distribution of X. Where possible, use SPSS to generate the data.
a. Frequency distribution
b. Cumulative relative frequency distribution
c. Histogram (ungrouped)
d. Frequency polygon
e. Q1
f. Q2
g. Q3
h. P10 and P90
i. Box-and-whisker plot
3. A sample distribution of variable X is as follows:
X   45   48   52   55   59   62   65   68   70
f    1    2    3    2    5    3    4    4    6
Calculate or draw each of the following for the sample distribution of X. Where possible, use SPSS to generate the data.
a. Frequency distribution
b. Cumulative relative frequency distribution
c. Histogram (ungrouped)
d. Frequency polygon
e. Q1
f. Q2
g. Q3
h. P10 and P90
i. Box-and-whisker plot
4. A sample distribution of variable X is as follows:
X   75   78   79   81   83   85   86   88   90
f    8    5    7    3    6    5   10    5    3
Calculate or draw each of the following for the sample distribution of X. Where possible, use SPSS to generate the data.
a. Frequency distribution
b. Cumulative relative frequency distribution
c. Histogram (ungrouped)
d. Frequency polygon
e. Q1
f. Q2
g. Q3
h. P10 and P90
i. Box-and-whisker plot
5. A sample distribution of variable X is as follows:
X  100  110  115  120  125  130  135  140  145
f    4    8    2    6   10    5    7    3    1
Calculate or draw each of the following for the sample distribution of X. Where possible, use SPSS to generate the data.
a. Frequency distribution
b. Cumulative relative frequency distribution
c. Histogram (ungrouped)
d. Frequency polygon
e. Q1
f. Q2
g. Q3
h. P10 and P90
i. Box-and-whisker plot
ANSWER #1
a. Frequency distribution
b. Cumulative relative frequency distribution (the Cumulative Percent column)

X        Frequency   Percent   Valid Percent   Cumulative Percent
2.00          2         4.9          4.9               4.9
5.00          5        12.2         12.2              17.1
6.00          8        19.5         19.5              36.6
8.00          3         7.3          7.3              43.9
10.00         4         9.8          9.8              53.7
12.00         6        14.6         14.6              68.3
15.00         1         2.4          2.4              70.7
18.00         5        12.2         12.2              82.9
20.00         7        17.1         17.1             100.0
Total        41       100.0        100.0

c. Histogram (ungrouped)
d. Frequency polygon
e. Q1, f. Q2, g. Q3

Statistics: X
N Valid             41
N Missing            0
Percentile 25        6.0000   (Q1)
Percentile 50       10.0000   (Q2)
Percentile 75       18.0000   (Q3)

h. P10 and P90

Statistics: X
N Valid             41
N Missing            0
Percentile 10        5.0000   (P10)
Percentile 90       20.0000   (P90)

i. Box-and-whisker plot
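The quartiles and percentiles reported for problem 1 can be reproduced by hand or with a short Python sketch. The sketch below assumes a weighted-average rule in which the percentile sits at position (n + 1)p in the sorted scores, with linear interpolation between neighboring scores. That rule is an assumption of this sketch rather than something stated in the output, although it does reproduce the values listed for problem 1. The function names are hypothetical.

```python
def expand(values, freqs):
    """Turn a frequency distribution (X, f) into a sorted list of raw scores."""
    scores = []
    for x, f in zip(values, freqs):
        scores.extend([x] * f)
    return sorted(scores)


def percentile(scores, pct):
    """Percentile at 1-based position (n + 1) * pct / 100, linearly interpolated.

    Assumes `scores` is already sorted in ascending order.
    """
    n = len(scores)
    position = (n + 1) * pct / 100
    if position <= 1:
        return float(scores[0])
    if position >= n:
        return float(scores[-1])
    lower = int(position)        # whole part of the position
    fraction = position - lower  # distance toward the next score
    below, above = scores[lower - 1], scores[lower]
    return below + fraction * (above - below)


# Problem 1: X values and frequencies (n = 41).
x_values = [2, 5, 6, 8, 10, 12, 15, 18, 20]
x_freqs = [2, 5, 8, 3, 4, 6, 1, 5, 7]
scores = expand(x_values, x_freqs)

for pct in (10, 25, 50, 75, 90):
    print(pct, percentile(scores, pct))
```

Other percentile definitions exist and can give slightly different values for the same data, so the rule in use should be stated when reporting results.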
ANSWER #2
a. Frequency distribution
b. Cumulative relative frequency distribution (the Cumulative Percent column)

X        Frequency   Percent   Valid Percent   Cumulative Percent
15.00         5        13.5         13.5              13.5
20.00         3         8.1          8.1              21.6
24.00         1         2.7          2.7              24.3
28.00         4        10.8         10.8              35.1
32.00         2         5.4          5.4              40.5
34.00         7        18.9         18.9              59.5
36.00         3         8.1          8.1              67.6
38.00         8        21.6         21.6              89.2
40.00         4        10.8         10.8             100.0
Total        37       100.0        100.0

c. Histogram (ungrouped)
d. Frequency polygon
e. Q1, f. Q2, g. Q3

Statistics: X
N Valid             37
N Missing            0
Percentile 25       26.0000   (Q1)
Percentile 50       34.0000   (Q2)
Percentile 75       38.0000   (Q3)

h. P10 and P90

Statistics: X
N Valid             37
N Missing            0
Percentile 10       15.0000   (P10)
Percentile 90       40.0000   (P90)

i. Box-and-whisker plot
ANSWER #3
a. Frequency distribution
b. Cumulative relative frequency distribution (the Cumulative Percent column)

X        Frequency   Percent   Valid Percent   Cumulative Percent
45.00         1         3.3          3.3               3.3
48.00         2         6.7          6.7              10.0
52.00         3        10.0         10.0              20.0
55.00         2         6.7          6.7              26.7
59.00         5        16.7         16.7              43.3
62.00         3        10.0         10.0              53.3
65.00         4        13.3         13.3              66.7
68.00         4        13.3         13.3              80.0
70.00         6        20.0         20.0             100.0
Total        30       100.0        100.0
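The Percent and Cumulative Percent columns of a frequency distribution follow directly from the X and f rows of the problem. A minimal Python sketch for problem 3 (n = 30), with a hypothetical function name and percentages rounded to one decimal place as in the table:

```python
def frequency_table(values, freqs):
    """Return rows of (X, frequency, percent, cumulative percent)."""
    n = sum(freqs)
    rows = []
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        percent = round(100 * f / n, 1)
        cumulative_percent = round(100 * cumulative / n, 1)
        rows.append((x, f, percent, cumulative_percent))
    return rows


# Problem 3: X values and their frequencies.
x_values = [45, 48, 52, 55, 59, 62, 65, 68, 70]
x_freqs = [1, 2, 3, 2, 5, 3, 4, 4, 6]

table = frequency_table(x_values, x_freqs)
for row in table:
    print(row)
```

The cumulative percent is computed from the running count rather than by summing rounded percents, which avoids accumulating rounding error.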
c. Histogram (ungrouped)
d. Frequency polygon
e. Q1, f. Q2, g. Q3

Statistics: X
N Valid             30
N Missing            0
Percentile 25       55.0000   (Q1)
Percentile 50       62.0000   (Q2)
Percentile 75       68.0000   (Q3)

h. P10 and P90

Statistics: X
N Valid             30
N Missing            0
Percentile 10       48.4000   (P10)
Percentile 90       70.0000   (P90)

i. Box-and-whisker plot
ANSWER #4
a. Frequency distribution
b. Cumulative relative frequency distribution (the Cumulative Percent column)

X        Frequency   Percent   Valid Percent   Cumulative Percent
75.00         8        15.4         15.4              15.4
78.00         5         9.6          9.6              25.0
79.00         7        13.5         13.5              38.5
81.00         3         5.8          5.8              44.2
83.00         6        11.5         11.5              55.8
85.00         5         9.6          9.6              65.4
86.00        10        19.2         19.2              84.6
88.00         5         9.6          9.6              94.2
90.00         3         5.8          5.8             100.0
Total        52       100.0        100.0

c. Histogram (ungrouped)
d. Frequency polygon
e. Q1, f. Q2, g. Q3

Statistics: X
N Valid             52
N Missing            0
Percentile 25       78.2500   (Q1)
Percentile 50       83.0000   (Q2)
Percentile 75       86.0000   (Q3)

h. P10 and P90

Statistics: X
N Valid             52
N Missing            0
Percentile 10       75.0000   (P10)
Percentile 90       88.0000   (P90)

i. Box-and-whisker plot
ANSWER #5
a. Frequency distribution
b. Cumulative relative frequency distribution (the Cumulative Percent column)

X        Frequency   Percent   Valid Percent   Cumulative Percent
100.00        4         8.7          8.7               8.7
110.00        8        17.4         17.4              26.1
115.00        2         4.3          4.3              30.4
120.00        6        13.0         13.0              43.5
125.00       10        21.7         21.7              65.2
130.00        5        10.9         10.9              76.1
135.00        7        15.2         15.2              91.3
140.00        3         6.5          6.5              97.8
145.00        1         2.2          2.2             100.0
Total        46       100.0        100.0

c. Histogram (ungrouped)
d. Frequency polygon
e. Q1, f. Q2, g. Q3

Statistics: X
N Valid             46
N Missing            0
Percentile 25      110.0000   (Q1)
Percentile 50      125.0000   (Q2)
Percentile 75      131.2500   (Q3)

h. P10 and P90

Statistics: X
N Valid             46
N Missing            0
Percentile 10      107.0000   (P10)
Percentile 90      136.5000   (P90)

i. Box-and-whisker plot
Multiple-Choice Chapter 3 (Univariate Population Parameters and Sample Statistics)
1. The mean can be computed on all types of measurement scales of variables. a. True b. False ANSWER: B
2. Recall the conceptual formula for calculating the variance. Squaring the deviations from the mean in the numerator means that a negative variance will never be a possibility. a. True b. False ANSWER: A
3. A statistician employed by the Central State Zoo collects data on the temperature (measured on the Fahrenheit scale) of the 100 highest attendance days at the zoo. The statistician computes the standard deviation on the “temperature.” Is this appropriate given the measurement scale of this variable? a. Yes b. No ANSWER: A
4. The Humane Society sampled 75 families who adopted pets and collected data on the type of pet adopted. Options included: a) dog; b) cat; c) rabbit; d) horse. The Humane Society computes the standard deviation on the “type of pet adopted.” Is this appropriate given the measurement scale of this variable? a. Yes b. No ANSWER: B
5. During 2007–2008, faculty and staff at My University received an 8% raise. In determining if there will be raises allocated at My University during the current year, the board of trustees asks that there be research conducted on how much variation there was in the percentage of raise allocated at similarly sized institutions across the U.S. The institutional research office contacts 100 similarly sized institutions and gathers data on the raise, measured in percentage, that they allocated during the past fiscal year. The institutional research office computes the standard deviation on the “percentage of raise allocated.” Is this appropriate given the measurement scale of this variable? a. Yes b. No ANSWER: A
6. The research director at Roll Tide school district is conducting research on the amount of money spent per child at each of the 50 schools in the district. For each school, she collects data on "how much money is spent per child" (measured in whole dollars). She computes the standard deviation for "how much money is spent per child." Is this appropriate given the measurement scale of this variable? a. Yes b. No ANSWER: A
7. A student services employee advises 50 undergraduate students. For each student, she records the "type of student" for which they classify: a) traditional or b) non-traditional. She computes the standard deviation for "type of student." Is this appropriate given the measurement scale of this variable? a. Yes b. No ANSWER: B
8. A speech language pathologist records the number of times that children stutter during a 10-minute conversation. In her sample of 15 children, the number of times stuttered ranges from 0 to 21. The speech language pathologist computes the standard deviation on the "number of times stuttered." Is this appropriate given the measurement scale of this variable? a. Yes b. No ANSWER: A © Taylor & Francis 2020
9. The UCF athletic office has asked you to generate statistics from data collected on attendance at home football games. They have provided you two scores: the highest attended game and the lowest attended game. Which of the following can you calculate based on the data you have? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: D
10. Monthly donations to 1,000 regional animal shelters around the country are tracked by the national animal association. You are provided with the value of the highest donation and the value of the lowest donation and are asked to generate a measure of variability on the data. Which of the following measures of variability can be generated given this data? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: D
11. Fifty percent of college faculty earn $50,000 or less during their first year teaching. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: B
12. Of data collected on 50 schools, you are provided the number of students enrolled at the school that has the largest enrollment and the school that has the smallest enrollment. Which of the following can you calculate based on the data you have? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: D
13. Of elementary school students, 50% watch four or more hours of television per day. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: B
14. An admissions counselor is conducting research on graduating college seniors. One of his survey items asks seniors to indicate what they will be doing during the month immediately following graduation. The categories include a) working full or part-time; b) preparing for graduate school; c) seeking employment. Which measure of central tendency is appropriate to calculate given the measurement scale of this variable? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: C
15. Fifty percent of all graduate students enroll in nine or fewer credit hours during the fall semester. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: B
16. The average number of credit hours enrolled by graduate students during the fall semester is nine. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: A
17. The most frequent number of credit hours enrolled by graduate students is nine. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: C
18. The lowest number of credit hours enrolled by graduate students is one. The highest number of credit hours enrolled is 15. The difference between these values is 14. Which measure of variability does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: D
Hahs-Vaughn, D. L. & Lomax, R. G. (2020). An introduction to statistical concepts (4th ed.). New York: Routledge/Taylor & Francis.
19. A research scientist at Disneyworld randomly samples 1,000 annual pass holders and collects data on the number of days they visited a Disney park using their annual pass during the past calendar year. Fifty percent of all visitors attended 21 or fewer days. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: B
20. A research scientist at Disneyworld randomly samples 1,000 annual pass holders and collects data on the number of days they visited a Disney park using their annual pass during the past calendar year. The average number of days attended was 14. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: A
21. A research scientist at Disneyworld randomly samples 1,000 annual pass holders and collects data on the number of days they visited a Disney park using their annual pass during the past calendar year. The most frequent number of days attended was 12. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: C
22. A research scientist at Disneyworld randomly samples 1,000 annual pass holders and collects data on the number of days they visited a Disney park using their annual pass during the past calendar year. The lowest number of days attended was 2 and the highest number of days attended was 45. Which measure of variability do these values reflect? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: D
23. A chef at the Grand Floridian has been asked to categorize the dishes that he makes into the following: a) appetizers; b) side dishes; c) entrees; d) desserts. Which measure of central tendency is appropriate to calculate given the measurement scale of this variable? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: C
24. A random sample of 500 graduating graduate students who attended My University (MU) are surveyed about their graduate student experiences at MU. The most frequent amount of financial aid in the form of student loans was $20,000. Which measure of central tendency does this statement represent? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: C
25. A random sample of 500 graduating graduate students who attended My University (MU) are surveyed about their graduate student experiences at MU. The lowest amount of financial aid in the form of student loans was $0.00 and the highest amount was $100,000. Which measure of variability do these values reflect? a. Mean b. Median c. Mode d. Range e. Standard deviation ANSWER: D
26. The mean is a function of which scores in the distribution? a. All but the two most extreme values b. All but the one more extreme score c. Every score d. Only the largest and smallest scores e. Only the middle two values f. The most frequently occurring values ANSWER: C
27. For which measurement scales is computing the mean appropriate? Select all that apply. a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: C and D
28. Compute the mean for the following values: 4, 8, 6, 1, 0, 9 a. 3.89 b. 4.67 c. 5.21 d. 6.00 ANSWER: B
Short Answer Chapter 3 (Univariate Population Parameters and Sample Statistics)
1. For the following data, and assuming an interval width of 1, compute the following:
a. Mode
b. Median
c. Mean
d. Exclusive and inclusive range
e. H spread
f. Variance and standard deviation
X: 25 21 16 18 19 21 21 22 17 19
2. For the following data, and assuming an interval width of 1, compute the following using SPSS:
a. Mode
b. Median
c. Mean
d. Exclusive range
e. Standard deviation
f. Variance
X: 15000 15000 15000 15000 15000 15000 15000 25000 25000 25000 25000 25000 42000 42000 45000 45000 50000 50000 65000 70000
3. Without doing any computations, which of the following distributions has the largest variance?
X   64   66   69   74   76   78   79   80
f    1    1    5    2    1    4    1    5

Y   50   60   75   80   85   90   95  100
f    2    2    2    2    6    2    2    2

Z   80   81   82   83   84   85   86   87
f    3    2    2    2    2    4    2    3
4. Without doing any computations, which of the following distributions has the largest variance?
X   30   36   40   41   48   50   52   60
f    3    2    3    2    2    3    2    3

Y   41   42   44   45   51   52   53   54
f    2    4    2    4    2    2    2    2

Z   45   46   48   51   53   55   59   60
f    3    1    1    3    3    3    3    3
5. Without doing any computations, which of the following distributions has the largest variance? X 100 111 115 120 122 125 130 135
f 1 4 1 1 4 1 1 7
Y 110 112 114 117 118 125 128 129
.
f 5 1 2 2 2 1 1 6
Z 99 120 125 127 130 132 135 175
f 1 4 4 4 2 3 1 1
-
ANSWER #1:
a. Mode. This is the most frequently occurring value in the dataset. Based on the frequency distribution, we see the mode is 21 as it has a frequency of 3.
X
Valid    Frequency   Percent   Valid Percent   Cumulative Percent
16.00    1           10.0      10.0            10.0
17.00    1           10.0      10.0            20.0
18.00    1           10.0      10.0            30.0
19.00    2           20.0      20.0            50.0
21.00    3           30.0      30.0            80.0
22.00    1           10.0      10.0            90.0
25.00    1           10.0      10.0            100.0
Total    10          100.0     100.0
b. Median.
$$\text{Median} = LRL + \left[\frac{50\%(n) - cf}{f}\right] w = 18.5 + \left[\frac{50\%(10) - 3}{2}\right](1) = 18.5 + \frac{2}{2}(1) = 19.5$$
LRL is the lower real limit of the interval containing the median (in this case 18.5, as the median is contained in the interval with the value of 19), 50% is the percentile desired, n is the sample size (in this case 10), cf is the cumulative frequency of all intervals less than but not including the interval containing the median (in this case, from the frequency distribution, cf = 1 + 1 + 1 = 3), f is the frequency of the interval containing the median (in this case 2), and w is the interval width (in this case 1). SPSS reports the median of the raw scores, which is the average of the two middle values (19 and 21), rather than the interpolated value from the formula. Generating the median using SPSS we find:
Statistics
X
N Valid      10
N Missing    0
Median       20.00
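The interpolation formula for the median can be checked with a short script. This is a minimal sketch, not SPSS's procedure; the function name `grouped_median` is hypothetical, and it assumes each distinct score is its own interval of width `w` centered on the score.

```python
from collections import Counter


def grouped_median(scores, width=1.0):
    """Interpolated median: LRL + ((50% * n - cf) / f) * w."""
    counts = Counter(scores)
    n = len(scores)
    target = 0.5 * n  # 50% of n
    cf = 0  # cumulative frequency below the current interval
    for value in sorted(counts):
        f = counts[value]
        if cf + f >= target:
            lrl = value - width / 2  # lower real limit of this interval
            return lrl + ((target - cf) / f) * width
        cf += f
    raise ValueError("scores must not be empty")


scores = [25, 21, 16, 18, 19, 21, 21, 22, 17, 19]
print(grouped_median(scores))  # 19.5
```

The loop stops at the interval for 19, where cf = 3 and f = 2, so the result is 18.5 + (2/2)(1).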
c. Mean. The sum of all the Xs is 199 and the sample size is 10. Thus the mean is 19.90 (as seen here).
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{199}{10} = 19.90$$
Generating the mean using SPSS we find:
Statistics
X
N Valid      10
N Missing    0
Mean         19.9000
d. Exclusive and inclusive range. The exclusive range is defined as the difference between the largest and smallest scores in a collection of scores. For notational purposes, the exclusive range (ER) is shown as ER = Xmax – Xmin. As seen previously in the frequency distribution, the largest and smallest values, respectively, are 25 and 16. Thus ER = 25 – 16 = 9. The inclusive range takes into account the interval width so that all scores are included in the range. The inclusive range is defined as the difference between the upper real limit of the interval containing the largest score and the lower real limit of the interval containing the smallest score in a collection of scores. For notational purposes, the inclusive range (IR) is shown as IR = URL of Xmax – LRL of Xmin. For this example, with an interval width of one, IR = 25.5 – 15.5 = 10.
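The two range definitions translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and the real limits are assumed to sit half an interval width above and below each score.

```python
def exclusive_range(scores):
    """ER = Xmax - Xmin."""
    return max(scores) - min(scores)


def inclusive_range(scores, width=1.0):
    """IR = URL of Xmax - LRL of Xmin, with real limits half a width away."""
    half = width / 2
    return (max(scores) + half) - (min(scores) - half)


scores = [25, 21, 16, 18, 19, 21, 21, 22, 17, 19]
print(exclusive_range(scores))  # 9
print(inclusive_range(scores))  # 10.0
```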
e. H spread. H spread is defined as Q3 – Q1, the simple difference between the third and first quartiles. Using SPSS and computing the quartiles with values at group midpoints, we find the first and third quartiles as follows and the resulting H spread: H = Q3 – Q1 = 21.50 – 18.00 = 3.50. [Had this been computed not using values at group midpoints, H = Q3 – Q1 = 21.25 – 17.75 = 3.50.]
Statistics
X
N Valid           10
N Missing         0
Percentiles 25    18.0000a
Percentiles 75    21.5000
a. Percentiles are calculated from grouped data.
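Quartile values depend on the interpolation rule, which is why two sets of quartiles are quoted for this item. As a hedged illustration, Python's `statistics.quantiles` with its "exclusive" method (positions at (n + 1)p, applied to the raw scores) gives 17.75 and 21.25; it does not reproduce SPSS's grouped-data percentiles.

```python
import statistics

scores = [25, 21, 16, 18, 19, 21, 21, 22, 17, 19]

# Cut points for four equal-probability groups: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
h_spread = q3 - q1

print(q1, q3, h_spread)  # 17.75 21.25 3.5
```

Either rule yields the same H spread of 3.50 for these data.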
f. Variance and standard deviation. The sample variance is computed as:
$$s^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$$
Based on this formula, we need to compute X², the sum of X², and the sum of X:
X      X²
25     625
21     441
16     256
18     324
19     361
21     441
21     441
22     484
17     289
19     361
199    4023    SUM
Plugging the values into the formula, we find the variance:
$$s^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)} = \frac{10(4023) - (199)^2}{10(10-1)} = \frac{40230 - 39601}{90} = 6.98889$$
And the standard deviation is:
$$s = +\sqrt{s^2} = \sqrt{6.98889} = 2.64365$$
Computing the variance and standard deviation using SPSS, we find:
Statistics
X
N Valid           10
N Missing         0
Std. Deviation    2.64365
Variance          6.989
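The computational formula for the sample variance can be reproduced in a few lines. This is a sketch for checking the hand calculation; the variable names are illustrative, and the comparison against `statistics.variance` is an added cross-check rather than part of the original solution.

```python
import math
import statistics

scores = [25, 21, 16, 18, 19, 21, 21, 22, 17, 19]

n = len(scores)
sum_x = sum(scores)                  # sum of X
sum_x2 = sum(x * x for x in scores)  # sum of X squared

# s^2 = [n * sum(X^2) - (sum X)^2] / [n(n - 1)]
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
sd = math.sqrt(variance)

print(sum_x, sum_x2)                     # 199 4023
print(round(variance, 5), round(sd, 5))  # 6.98889 2.64365

# Cross-check with the standard library's sample variance (n - 1 denominator).
assert math.isclose(variance, statistics.variance(scores))
```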
Hahs-Vaughn, D. L. & Lomax, R. G. (2020). An introduction to statistical concepts (4th ed.). New York: Routledge/Taylor & Francis.
ANSWER #2: For the following data, and assuming an interval width of 1, compute the following using SPSS:
a. Mode
b. Median
c. Mean
d. Exclusive range
e. Standard deviation
f. Variance
X: 15000, 15000, 15000, 15000, 15000, 15000, 15000, 25000, 25000, 25000, 25000, 25000, 42000, 42000, 45000, 45000, 50000, 50000, 65000, 70000
Statistics
X
N Valid           20
N Missing         0
Mean              31950.0000
Median            25000.0000
Mode              15000.00
Std. Deviation    17751.13043
Variance          315102631.579
Range             55000.00 (exclusive range)
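For readers without SPSS, the same statistics can be reproduced with Python's standard `statistics` module. This is an illustrative cross-check, not the procedure the exercise asks for; the variable names are assumptions.

```python
import statistics

salaries = (
    [15000] * 7 + [25000] * 5 + [42000] * 2 + [45000] * 2
    + [50000] * 2 + [65000, 70000]
)

mode = statistics.mode(salaries)
median = statistics.median(salaries)
mean = statistics.mean(salaries)
exclusive_range = max(salaries) - min(salaries)
sd = statistics.stdev(salaries)           # sample SD (n - 1 denominator)
variance = statistics.variance(salaries)  # sample variance

print(mode, median, mean, exclusive_range)
print(round(sd, 5), round(variance, 3))
```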
Multiple-Choice Chapter 4 (Normal Distribution and Standard Scores) 1. The left tail of a distribution of a continuous variable is pulled extremely to the left. Which of the following would you expect to find given the shape of the distribution? a. Positive skewness statistic b. Negative skewness statistic c. Positive kurtosis statistic d. Negative kurtosis statistic ANSWER: B
2. The right tail of a distribution of a continuous variable is pulled extremely to the right. Which of the following would you expect to find given the shape of the distribution? a. Positive skewness statistic b. Negative skewness statistic c. Positive kurtosis statistic d. Negative kurtosis statistic ANSWER: A
3. A distribution of a continuous variable is extremely peaked. Which of the following would you expect to find given the shape of the distribution? a. Positive skewness statistic b. Negative skewness statistic c. A positive kurtosis statistic d. A negative kurtosis statistic ANSWER: C
4. A distribution of a continuous variable is extremely flat. Which of the following would you expect to find given the shape of the distribution? a. Positive skewness statistic b. Negative skewness statistic c. A positive kurtosis statistic d. A negative kurtosis statistic ANSWER: D
5. Which of the following is indicative of a distribution that is positively skewed? a. A symmetric distribution b. A left tail that is substantially pulled to the left c. A right tail that is substantially pulled to the right d. A very flat distribution ANSWER: C
6. Which of the following would be found with a platykurtic distribution? a. A positive skewness statistic b. A negative skewness statistic c. A positive kurtosis statistic d. A negative kurtosis statistic ANSWER: D
7. Which of the following would be found with a leptokurtic distribution? a. A positive skewness statistic b. A negative skewness statistic c. A positive kurtosis statistic d. A negative kurtosis statistic ANSWER: C
8. Prices of new condominium sales range from $145,000 to $2,376,000. Most of the prices are bunched together in the low to mid $200,000s. What does this suggest in terms of the shape of the distribution? a. Negative kurtosis b. Negative skewness c. Positive kurtosis d. Positive skewness ANSWER: D
9. Weights (in pounds and ounces) of newborns range from 3 pounds 7 ounces to 8 pounds 14 ounces. Most of the weights of newborns bunch together between mid-seven and mid-eight pounds. What does this suggest in terms of the shape of the distribution? a. Negative kurtosis b. Negative skewness c. Positive kurtosis d. Positive skewness ANSWER: B
10. Which of the following is indicative of a distribution that has a skew value of +18.90? a. A distribution that when split down the middle is an exact mirror image of itself b. A left tail that is substantially pulled to the left c. A right tail that is substantially pulled to the right d. A very peaked distribution e. A very flat distribution ANSWER: C
11. Which of the following is indicative of a distribution that has a skew value of −14.76? a. A distribution that when split down the middle is an exact mirror image of itself b. A left tail that is substantially pulled to the left c. A right tail that is substantially pulled to the right d. A very peaked distribution e. A very flat distribution ANSWER: B
12. Which of the following is indicative of a distribution that has a kurtosis value of +7.25? a. Leptokurtic distribution b. Mesokurtic distribution c. Platykurtic distribution d. Positive skewness e. Negative skewness ANSWER: A
13. Which of the following is indicative of a distribution that has a kurtosis value of −11.56? a. Leptokurtic distribution b. Mesokurtic distribution c. Platykurtic distribution d. Positive skewness e. Negative skewness ANSWER: C
14. The population mean of a distribution that is approximately normal is 500 and the population standard deviation is 100. What is the mean of the standardized distribution for this variable? a. 0 b. 10 c. 100 d. 500 e. Cannot be determined from this information ANSWER: A
15. The population mean of a distribution that is approximately normal is 72 and the population standard deviation is 9. What is the mean of the standardized distribution for this variable? a. 0 b. 3 c. 9 d. 72 e. Cannot be determined from this information ANSWER: A
16. A distribution has a mean of $25,000 and a standard deviation of $5,000. Which of the following is correct based on applying the Empirical Rule? a. About 68% of the distribution is between $20,000 and $30,000. b. About 95% of the distribution is between $20,000 and $30,000. c. About 95% of the distribution is between $15,000 and $40,000. d. All of the values fall between $10,000 and $40,000. ANSWER: A
17. Apply the Empirical Rule to a distribution that has a mean of 275 and a standard deviation of 36. Which one of the following is correct? a. About 34% of the distribution is between 239 and 311. b. About 68% of the distribution is between 275 and 311. c. About 95% of the distribution is between 203 and 347. d. All of the distribution is between 167 and 383. ANSWER: C
18. Applying the Empirical Rule to a distribution with a mean of 80 and standard deviation of 5, which one of the following is correct? a. About 34% of the values are located between 80 and 90. b. About 95% of the values are between 70 and 90. c. About 75% of the values are between 75 and 85. d. All values fall between 65 and 95. ANSWER: B
19. Which one of the following is a correct statement based on a distribution with a mean of 750 and standard deviation of 25? a. About 2.5% of the scores in the distribution are greater than 675. b. About 50% of the scores in the distribution are less than 725. c. About 84% of the scores in the distribution are less than 775. d. About 95% of the scores in the distribution are less than 700. ANSWER: C
20. A distribution of 400 scores is approximately normally distributed. The mean of the distribution is 50 and the standard deviation is 10. What percentage of scores are less than a value of 30? a. About 2.5% b. About 16% c. About 34% d. About 50% ANSWER: A
21. A distribution is approximately normal with a population mean of 80 and population standard deviation of 9. Approximately what percentage of scores fall between 71 and 89? a. About 1/3 b. About 2/3 c. About 95% d. About 99% ANSWER: B
22. What is the percentile rank of the score 610 in a normal distribution that has a population mean of 500 and standard deviation of 100? a. Approximately 1 b. Approximately 87 c. Approximately 100 d. Approximately 110 ANSWER: B
23. What is the 80th percentile of a distribution that is approximately normally distributed with a population mean of 75 and variance of 16? a. Approximately 72 b. Approximately 78 c. Approximately 84 d. Approximately 95 ANSWER: B
24. A standardized score of 2.22 is interpreted as which one of the following? a. Approximately 2.25% of the distribution is right of the mean. b. The mean is 2.22. c. The score is approximately 2 and 1/4 standard deviation units to the right of the mean. d. The standard deviation is 2.22. ANSWER: C
25. After completion of a training program, scores on the satisfaction survey are standardized. The following reflect the standardized results for each component of the satisfaction survey: a) satisfaction with the physical environment, .86; b) satisfaction with the material presented, 2.70; c) satisfaction with the handouts, 1.37; and d) satisfaction with the facilitator, 3.25. Which aspect of satisfaction is farthest from the average? a. Satisfaction with the physical environment b. Satisfaction with the material presented c. Satisfaction with the handouts d. Satisfaction with the facilitator ANSWER: D
26. A new instructor receives the following standardized scores based on their first teaching evaluation: a) concern for students, −1.25; b) organization of the course, .35; c) facilitation of learning, .12; d) classroom materials, −2.10. In which area did the instructor score farthest from the average? a. Concern for students b. Organization of the course c. Facilitation of learning d. Classroom materials ANSWER: D
27. Which of the following represents the interpretation of a standardized score of −1.5? a. One and one-half percent (1.5%) of the scores are left of the mean. b. One and one-half percent (1.5%) of the scores are right of the mean. c. The score is 1.5 standard deviation units from the mean. d. The standard deviation of the variable is 1.5. ANSWER: C
28. An auditor for the national accrediting agency for the college provides the administration with standardized scores for the areas they were assessed. The standardized scores include: a) recruitment, -1.50; b) assessment, +1.25; c) faculty qualifications, +.75; and d) program offerings, -1.00. In which area did the college score closest to the average? a. Recruitment b. Assessment c. Faculty qualifications d. Program offerings ANSWER: C
Short Answer Chapter 4 (Normal Distribution and Standard Scores)
1. What is the proportion of the area below z = +.86 assuming a normal distribution and referring to the table for N(0,1)?
2. What is the proportion of the area below z = −.22 assuming a normal distribution and referring to the table for N(0,1)?
3. What is the proportion of the area below z = +1.10 assuming a normal distribution and referring to the table for N(0,1)?
4. What is the percentile rank of the score 95 in N(90,100)?
5. What is the 40th percentile of N(36,36)?
ANSWERS
1. .8051055
2. .4129356
3. .8643339
4. The percentile rank of the score 95 in N(90,100):
$$z = \frac{95 - 90}{10} = .500$$
Looking in Table A.1, when z is .500, the proportion of the distribution below that value is approximately .6914625. Thus, the percentile rank of the score 95 in this distribution is approximately 69%.
5. The 40th percentile of N(36,36). At the 60th percentile, z is equal to approximately .25 (see Table A.1), thus at the 40th percentile, z is −.25. Plugging the values into the formula (with a standard deviation of 6 since the variance is 36), we find:
$$z_i = \frac{X_i - \mu_X}{\sigma_X}$$
$$-.25 = \frac{X_i - 36}{6}$$
$$-1.50 = X_i - 36$$
And thus X, the 40th percentile, is 34.50.
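The table lookups for these five items can be cross-checked with `statistics.NormalDist` from the Python standard library. This is a sketch under one stated assumption: the N(mean, variance) notation is converted to a standard deviation, because `NormalDist` takes `sigma` rather than the variance.

```python
from statistics import NormalDist

unit = NormalDist()  # N(0, 1)

# Items 1-3: proportion of the area below each z value.
areas = {z: unit.cdf(z) for z in (0.86, -0.22, 1.10)}

# Item 4: N(90, 100) has variance 100, so sigma = 10.
rank_95 = NormalDist(mu=90, sigma=10).cdf(95)

# Item 5: N(36, 36) has variance 36, so sigma = 6.
p40 = NormalDist(mu=36, sigma=6).inv_cdf(0.40)

for z, area in areas.items():
    print(z, round(area, 7))
print(round(rank_95, 7))
print(round(p40, 2))
```

The exact 40th percentile comes out near 34.48 rather than 34.50 because the hand solution rounds z to −.25.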
Multiple-Choice Chapter 5 (Introduction to Probability and Sample Statistics) 1. A bowl contains 20 balls: 6 red, 2 orange, 5 yellow, 3 blue, and 4 green. The probability that a ball selected at random is orange is equal to which one of the following? a. 2/20 b. 3/20 c. 4/20 d. 5/20 e. Cannot be determined ANSWER: A 2. A bowl contains 15 balls: 5 red, 3 orange, 4 yellow, and 3 blue. The probability that a ball selected at random is yellow is equal to which one of the following? a. 3/15 b. 4/15 c. 5/15 d. Cannot be determined ANSWER: B 3. A basket contains 50 cubes: 20 purple, 15 pink, 10 white, and 5 yellow. The probability that a cube selected at random is purple is equal to which one of the following? a. 5/50 b. 10/50 c. 15/50 d. 20/50 e. Cannot be determined ANSWER: D
4. A basket contains 25 balls: 5 polka dotted; 9 plaid; 3 striped; 8 solids. The probability that a ball selected at random is polka dotted is equal to which one of the following? a. 3/25 b. 5/25 c. 8/25 d. 9/25 e. Cannot be determined ANSWER: B 5. A basket contains 40 balls: 10 polka dotted; 7 plaid; 12 striped; 11 solids. The probability that a ball selected at random is striped is equal to which one of the following? a. 7/40 b. 10/40 c. 11/40 d. 12/40 e. Cannot be determined ANSWER: D 6. A basket contains 12 cubes: 4 pink, 6 purple, and 2 yellow. The probability that a cube selected at random is purple is equal to which one of the following? a. 2/12 b. 4/12 c. 6/12 d. Cannot be determined ANSWER: C 7. A basket contains 58 rings: 10 diamond, 22 amethyst, and 26 aquamarine. The probability that a ring selected at random is amethyst is equal to which one of the following? a. .17 b. .38 c. .45 d. Cannot be determined ANSWER: B
8. A researcher collects salary information from the first 30 employees in the human resources roster. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: A 9. A researcher collects heart rate data on the first 100 newborn babies at the hospital. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: A 10. A researcher collects weight loss data from individuals who attend Weight Watchers on Monday. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: A 11. A researcher collects data on the number of minutes exercised from the first 50 members who walk into the gym. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: A
12. A researcher collects survey data from students who attend class on the day the survey is distributed. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: A 13. A researcher collects data from employers on their satisfaction with the college graduates that they have hired. Surveys were sent to employers whose names were provided on the exit surveys completed by the college graduates. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: A 14. A researcher collects data from every other employee in the human resources database. Which of the following sampling methods is implied by this scenario? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: D 15. A researcher wants their sampling method to be such that every 15th person through the door is selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: D
16. A researcher wants their sampling method to be such that every 25th visitor in the population is selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: D 17. A researcher wants their sampling method to be such that every 30th person in the population is selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: D 18. A researcher wants their sampling method to be such that every 5th teacher in the population is selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: D 19. A researcher is surveying university development officers regarding their satisfaction with alumni giving. The researcher wants their sampling method to be such that every development officer has an equal probability of being selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: C
20. A researcher is surveying speech-language pathologists (SLP) on their perceptions about communicative devices. The researcher wants their sampling method to be such that every SLP has an equal probability of being selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: C 21. A researcher is surveying guidance counselors on their perceptions about antibullying programs. The researcher wants their sampling method to be such that every counselor has an equal probability of being selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: C 22. A researcher is surveying parents on their perceptions of the school that their child attends. The researcher wants their sampling method to be such that every parent has an equal probability of being selected. Which of the following is the most appropriate sampling method? a. Convenient sampling b. Simple random sampling with replacement c. Simple random sampling without replacement d. Systematic sampling ANSWER: C 23. If a population distribution is highly negatively skewed, then the distribution of the sample means for samples of size 1000 will be which one of the following? a. Highly negatively skewed b. Highly positively skewed c. Approximately normally distributed d. Cannot be determined without further information ANSWER: C
24. If a population distribution is approximately normal, then the distribution of the sample means for samples of size 100 will be which one of the following? a. Highly negatively skewed b. Highly positively skewed c. Approximately normally distributed d. Cannot be determined without further information ANSWER: C 25. Which of the following confidence intervals provides the lowest level of confidence? a. 90% CI b. 95% CI c. 99% CI d. It depends on which statistical procedure is used. ANSWER: A 26. Which of the following confidence intervals provides the greatest level of confidence? a. 90% CI b. 95% CI c. 99% CI d. It depends on which statistical procedure is used. ANSWER: C 27. Which of the following is needed to be able to draw inferences about a parameter based on a sample estimate? a. Random sample b. Underlying theoretical distribution c. Outcome probability d. Sample population selection ANSWER: A 28. To draw inferences from a sample, to what is the sample statistic compared? a. Random sample b. Underlying distribution of estimates c. Outcome probability d. Sample population selection ANSWER: B
Short Answer Chapter 5 (Introduction to Probability and Sample Statistics) 1. If the standard error of the mean is 70 for n = 50, what must the sample size be to reduce the standard error to 50?
2. If the standard error of the mean is 10 for n = 100, what must the sample size be to reduce the standard error to 5?
3. If the standard error of the mean is 25 for n = 50, what must the sample size be to reduce the standard error to 20?
4. A random sample of size 30 had a mean of 80 and a standard deviation of 6. First calculate the standard error of the mean. Then calculate the 90% CI for the mean.
5. A random sample of size 50 had a mean of 5 and a standard deviation of 1. First calculate the standard error of the mean. Then calculate the 90% CI for the mean.
ANSWERS
1. If the standard error of the mean is 70 for n = 50, what must the sample size be to reduce the standard error to 50?
$$s_{\bar{X}} = \frac{s_X}{\sqrt{n}}$$
$$70 = \frac{s_X}{\sqrt{50}}$$
$$s_X = 70\sqrt{50} = 70(7.07) = 494.97$$
$$50 = \frac{494.97}{\sqrt{n}}$$
$$50\sqrt{n} = 494.97$$
$$(50\sqrt{n})^2 = (494.97)^2$$
$$2500n = 244995.30$$
$$n \approx 98$$
2. If the standard error of the mean is 10 for n = 100, what must the sample size be to reduce the standard error to 5?
$$s_{\bar{X}} = \frac{s_X}{\sqrt{n}}$$
$$10 = \frac{s_X}{\sqrt{100}}$$
$$s_X = 10\sqrt{100} = 10(10) = 100$$
$$5 = \frac{100}{\sqrt{n}}$$
$$5\sqrt{n} = 100$$
$$(5\sqrt{n})^2 = (100)^2$$
$$25n = 10000$$
$$n = 400$$
3. If the standard error of the mean is 25 for n = 50, what must the sample size be to reduce the standard error to 20?
$$s_{\bar{X}} = \frac{s_X}{\sqrt{n}}$$
$$25 = \frac{s_X}{\sqrt{50}}$$
$$s_X = 25\sqrt{50} = 25(7.07) = 176.78$$
$$20 = \frac{176.78}{\sqrt{n}}$$
$$20\sqrt{n} = 176.78$$
$$(20\sqrt{n})^2 = (176.78)^2$$
$$400n = 31251.168$$
$$n \approx 78$$
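All three standard error items follow the same two steps: recover the standard deviation from the current standard error and sample size, then solve for the new n. A minimal sketch of that calculation follows; the function name `required_n` is hypothetical, and the result is left unrounded so the rounding decision stays visible.

```python
import math


def required_n(se_now, n_now, se_target):
    """Sample size at which the standard error of the mean equals se_target."""
    s = se_now * math.sqrt(n_now)  # implied standard deviation
    return (s / se_target) ** 2    # from se_target = s / sqrt(n)


print(required_n(70, 50, 50))   # about 98
print(required_n(10, 100, 5))   # about 400
print(required_n(25, 50, 20))   # about 78.1
```

Algebraically this reduces to n_now × (se_now / se_target)², so no intermediate rounding of the square roots is needed.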
4. A random sample of size 30 had a mean of 80 and a standard deviation of 6. First calculate the standard error of the mean. Then calculate the 90% CI for the mean.
$$s_{\bar{X}} = \frac{s_X}{\sqrt{n}} = \frac{6}{\sqrt{30}} = 1.095$$
$$90\%\ CI = \bar{X} \pm 1.645\, s_{\bar{X}} = 80 \pm 1.645(1.095) = 80 \pm 1.80 = (78.20, 81.80)$$
5. A random sample of size 50 had a mean of 5 and a standard deviation of 1. First calculate the standard error of the mean. Then calculate the 90% CI for the mean.
$$s_{\bar{X}} = \frac{s_X}{\sqrt{n}} = \frac{1}{\sqrt{50}} = .1414$$
$$90\%\ CI = \bar{X} \pm 1.645\, s_{\bar{X}} = 5 \pm 1.645(.1414) = 5 \pm .233 = (4.767, 5.233)$$
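The standard error and 90% confidence interval for items 4 and 5 can be computed as follows. This is a sketch that assumes a z critical value (about 1.645 for 90%), as the worked solutions do; the function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(mean, sd, n, confidence=0.90):
    """Return (standard error, lower limit, upper limit) using a z critical value."""
    se = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    return se, mean - z * se, mean + z * se


se4, lo4, hi4 = z_interval(80, 6, 30)
se5, lo5, hi5 = z_interval(5, 1, 50)

print(round(se4, 3), round(lo4, 2), round(hi4, 2))
print(round(se5, 4), round(lo5, 3), round(hi5, 3))
```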
Multiple-Choice Chapter 6 (Introduction to Hypothesis Testing: Inferences About a Single Mean)
1. In hypothesis testing, the probability of rejecting H0 when H0 is false is denoted by a. α b. 1 – α c. β d. 1 – β ANSWER: D
2. When testing the hypothesis presented below, at a .05 level of significance with the t test, where is the rejection region? H0: μ = 100, H1: μ > 100 a. The upper tail b. The lower tail c. Both the upper and lower tails d. Cannot be determined ANSWER: A
3. When testing the hypothesis presented below, at a .05 level of significance with the t test, where is the rejection region? H0: μ = 100, H1: μ ≠ 100 a. The upper tail b. The lower tail c. Both the upper and lower tails d. Cannot be determined ANSWER: C
4. A research question asks, “Is the mean number of hours spent studying per week for college freshmen lower than 15?” Which one of the following is implied? a. Left-tailed test b. Right-tailed test c. Two-tailed test d. Cannot be determined based on this information ANSWER: A
5. A research question asks, “Is the mean age (in years) of first-year teachers different from 25?” Which one of the following is implied? a. Left-tailed test b. Right-tailed test c. Two-tailed test d. Cannot be determined based on this information ANSWER: C
6. A research question asks, “Is the mean commute (in miles) to college more than 15 miles?” Which one of the following is implied? a. Left-tailed test b. Right-tailed test c. Two-tailed test d. Cannot be determined based on this information ANSWER: B
7. A research question asks, “Is the mean purchase price of required textbooks for one semester higher than $750?” Which one of the following is implied? a. Left-tailed test b. Right-tailed test c. Two-tailed test d. Cannot be determined based on this information ANSWER: B
8. The probability of making a Type I error when the level of significance is .05 is which one of the following? a. 0 b. .05 c. .10 d. between .05 and .95 e. .90 f. .95 ANSWER: B
9. The probability of making a Type I error when the level of significance is .10 is which one of the following? a. 0 b. .05 c. .10 d. between .05 and .95 e. .90 f. .95 ANSWER: C
10. If the 90% CI includes the value for the parameter being estimated in H0, then which one of the following is a correct statement? a. H0 cannot be rejected at the .10 level b. H0 can be rejected at the .10 level c. H0 cannot be rejected at the .01 level d. H0 can be rejected at the .01 level ANSWER: A
11. If the 99% CI includes the value for the parameter being estimated in H0, then which one of the following is a correct statement? a. H0 cannot be rejected at the .10 level b. H0 can be rejected at the .10 level c. H0 cannot be rejected at the .01 level d. H0 can be rejected at the .01 level ANSWER: C
12.
If the 99% CI does not include the value for the parameter being estimated in H0, then which one of the following is a correct statement? a. H0 cannot be rejected at the .10 level b. H0 can be rejected at the .10 level c. H0 cannot be rejected at the .01 level d. H0 can be rejected at the .01 level ANSWER: D
13.
A one-sample t test is conducted at an alpha level of .05. The researcher finds a p value of .08 and concludes that the test is statistically significant. Is the researcher correct? a. Yes b. No ANSWER: B
14.
A research article reports an alpha level of .05 and a p value of .10. Based on this, the author concludes that the test is statistically significant. Is the researcher correct? a. Yes b. No ANSWER: B
15.
A research article reports an alpha level of .10 and a p value of .05. Based on this, the author concludes that the test is statistically significant. Is the researcher correct? a. Yes b. No ANSWER: A
16.
A research article reports an alpha level of .01 and a p value of .05. Based on this, the author concludes that the test is statistically significant. Is the researcher correct? a. Yes b. No ANSWER: B
17. A research article reports an alpha level of .05 and a p value of 1.20. Based on this, the author concludes that the test is NOT statistically significant. Is the researcher correct? a. Yes b. No ANSWER: A
18. A research article reports an alpha level of .01 and a p value of 2.0. Based on this, the author concludes that the test is NOT statistically significant. Is the researcher correct? a. Yes b. No ANSWER: A
19. Which one of the following is a correct interpretation of d = .19? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: A 20. Which one of the following is a correct interpretation of d = .51? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: B 21. Which one of the following is a correct interpretation of d = .99? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: C
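The labels keyed in items 19 through 21 follow Cohen's benchmarks of .2 (small), .5 (moderate), and .8 (large). The sketch below is one way to turn those benchmarks into cutoffs. Treating everything below .5 as small (so that d = .19 is labeled small, as the key does) is an assumption of this illustration, and the function name `label_cohens_d` is hypothetical.

```python
def label_cohens_d(d: float) -> str:
    """Label an effect size using Cohen's .2 / .5 / .8 benchmarks.

    Assumption: values below .5 are labeled small and values from .5 up
    to (but not including) .8 are labeled moderate. The sign is ignored.
    """
    size = abs(d)
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


for d in (0.19, 0.51, 0.99):
    print(d, label_cohens_d(d))
```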
22. Which one of the following is a correct interpretation of d = .90? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: C
23. Which one of the following is a correct interpretation of the significance level? a. Alpha level b. Confidence interval c. Effect size d. Observed probability e. Power ANSWER: A
24. Which one of the following is a correct interpretation of the p value? a. Alpha level b. Confidence interval c. Effect size d. Observed probability e. Power ANSWER: D
25. Which one of the following is a correct interpretation of 1 − β? a. Alpha level b. Confidence interval c. Effect size d. Observed probability e. Power ANSWER: E
26. Which of the following conditions must be met with the one sample t test? a. Expected frequencies greater than 5 b. Interval or ratio dependent variable c. Independent variable with two categories d. Sample size greater than 30 ANSWER: B
27. The assumption of independence for the one sample t test calls for having which of the following? a. Equal sample sizes b. Random assignment c. Random selection of the sample from the population d. Sample size greater than 30 ANSWER: C 28. A researcher conducts a one sample t test with a sample of 26 individuals. What are the degrees of freedom for this study? a. 24 b. 25 c. 26 d. Cannot be determined ANSWER: B
Short Answer Chapter 6 (Introduction to Hypothesis Testing: Inferences About a Single Mean)
1. Using this random sample of data, test the following hypothesis at the .05 level of significance using SPSS. Interpret the output including identifying the specific statistical procedure that has been used and reporting the extent to which the test is statistically significant. Include appropriate evidence (e.g., values from the output). DATA 21.00 16.00 12.00 8.00 9.00 4.00 7.00 2.00 9.00 11.00 15.00 10.00 7.00 14.00 10.00 17.00 8.00 3.00 2.00 6.00
$H_0: \mu = 10$; $H_1: \mu \neq 10$
2. Using this random sample of data, test the following hypothesis at the .05 level of significance using SPSS. Interpret the output including identifying the specific statistical procedure that has been used and reporting the extent to which the test is statistically significant. Include appropriate evidence (e.g., values from the output). DATA 45.00 50.00 62.00 55.00 36.00 22.00 51.00 38.00 47.00 40.00 58.00 59.00 52.00 68.00 65.00 45.00 30.00 35.00 47.00 54.00
$H_0: \mu = 50$; $H_1: \mu \neq 50$
3. Refer to the t table and provide a numerical value for the percentile rank of $t_{20} = 2.528$.
4. Refer to the t table and provide a numerical value for the percentile rank of $t_{40} = 2.021$.
5. Refer to the t table and provide a numerical value for the 99th percentile of the distribution of $t_1$.
ANSWERS
1. A one-sample t test was conducted. The results are not statistically significant, t = −.390, df = 19, p = .701.

One-Sample Statistics
      N     Mean      Std. Deviation   Std. Error Mean
X     20    9.5500    5.15522          1.15274

One-Sample Test (Test Value = 10)
      t       df    Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                        Lower       Upper
X     −.390   19    .701              −.45000           −2.8627     1.9627
2. A one-sample t test was conducted. The results are not statistically significant, t = −.767, df = 19, p = .453.

One-Sample Statistics
      N     Mean       Std. Deviation   Std. Error Mean
X     20    47.9500    11.95815         2.67392

One-Sample Test (Test Value = 50)
      t       df    Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                        Lower       Upper
X     −.767   19    .453              −2.05000          −7.6466     3.5466
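The descriptive statistics and t values in the two SPSS outputs above can be reproduced by hand. The following Python sketch uses only the standard library; the function name `one_sample_t` is hypothetical, and the p values are not computed because the standard library has no t distribution (the Sig. values would come from SPSS or a t table).

```python
import math
import statistics


def one_sample_t(data, test_value):
    """Return the pieces of a one-sample t test for the given data."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)               # standard error of the mean
    t = (mean - test_value) / se
    return {"n": n, "mean": mean, "sd": sd, "se": se, "t": t, "df": n - 1}


# Data sets from short-answer items 1 and 2
data_1 = [21, 16, 12, 8, 9, 4, 7, 2, 9, 11, 15, 10, 7, 14, 10, 17, 8, 3, 2, 6]
data_2 = [45, 50, 62, 55, 36, 22, 51, 38, 47, 40,
          58, 59, 52, 68, 65, 45, 30, 35, 47, 54]

for data, test_value in ((data_1, 10), (data_2, 50)):
    result = one_sample_t(data, test_value)
    print({key: round(value, 5) for key, value in result.items()})
```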
3. With df = 20, t = 2.528 is the one-tailed critical value at alpha = .01 (a percentile rank is the percentage of the distribution below a point, so only one tail applies); thus the percentile rank is the 99th (i.e., 1 − .01).
4. With df = 40, t = 2.021 is the one-tailed critical value at alpha = .025 (again, only one tail applies); thus the percentile rank is the 97.5th (i.e., 1 − .025).
5. With df = 1, the 99th percentile corresponds to a one-tailed alpha of .01; thus the 99th percentile is 31.821.
Multiple-Choice Chapter 7 (Inferences About the Difference Between Two Means)
1. A swim coach designs an experiment to determine if children who participate in group versus individual swim lessons have different swimming skills, on average. She randomly selects 20 children who have registered for swim lessons and randomly assigns them to group or individual swim lessons and collects data on the following: 1) type of swim lesson participated in (two categories: group or individual); and 2) swimming skills (measured on a scale from 0 to 200). Her research question reads: Is there a mean difference in swimming skills for children who participate in individual versus group swimming lessons? Which of the following represents the INDEPENDENT variable? a. 20 children b. Random assignment c. Swimming skills d. Type of swim lesson ANSWER: D
2. A swim coach designs an experiment to determine if children who participate in group versus individual swim lessons have different swimming skills, on average. She randomly selects 20 children who have registered for swim lessons and randomly assigns them to group or individual swim lessons and collects data on the following: 1) type of swim lesson participated in (two categories: group or individual); and 2) swimming skills (measured on a scale from 0 to 200). Her research question reads: Is there a mean difference in swimming skills for children who participate in individual versus group swimming lessons? Which of the following represents the DEPENDENT variable? a. 20 children b. Random assignment c. Swimming skills d. Type of swim lesson ANSWER: C
3. A professor is examining success of tenure-track faculty in attaining tenure at their postsecondary institution. He surveys tenure-track faculty who have gone through the tenure process in the past year and gathers data on two variables: 1) the number of publications in refereed journals (measured in whole numbers); and 2) whether or not they attained tenure (yes or no).
The research question is: Is there an average difference in the number of publications based on faculty who attain tenure as compared to faculty who do not attain tenure? Which of the following represents the INDEPENDENT variable? a. Attainment of tenure b. Faculty in tenure track programs c. Number of publications d. Postsecondary institution ANSWER: A
4. A professor is examining success of tenure-track faculty in attaining tenure at their postsecondary institution. He surveys tenure-track faculty who have gone through the tenure process in the past year and gathers data on two variables: 1) the number of publications in refereed journals (measured in whole numbers); and 2) whether or not they attained tenure (yes or no). The research question is: Is there an average difference in the number of publications based on faculty who attain tenure as compared to faculty who do not attain tenure? Which of the following represents the DEPENDENT variable? a. Attainment of tenure b. Faculty in tenure track programs c. Number of publications d. Postsecondary institution ANSWER: C
5. A professor is examining success of tenure-track faculty in attaining tenure at their postsecondary institution. He surveys tenure-track faculty who have gone through the tenure process in the past year and gathers data on two variables: 1) the number of publications in refereed journals (measured in whole numbers); and 2) whether or not they attained tenure (yes or no). The research question is: Is there an average difference in the number of publications based on faculty who attain tenure as compared to faculty who do not attain tenure? Which one of the following represents how the NULL hypothesis will be written? a. The number of publications differs on average for faculty who attain tenure as compared to faculty who do not attain tenure. b. The average number of publications for faculty who attain tenure is the same as the average number of publications of faculty who do not attain tenure. c. The average number of publications of faculty who attain tenure is less than or equal to the average number of publications of faculty who do not attain tenure. d. The average number of publications of faculty who attain tenure is greater than or equal to the average number of publications of faculty who do not attain tenure. ANSWER: B
6. A professor is examining success of tenure-track faculty in attaining tenure at their postsecondary institution. He surveys tenure-track faculty who have gone through the tenure process in the past year and gathers data on two variables: 1) the number of publications in refereed journals (measured in whole numbers); and 2) whether or not they attained tenure (yes or no). The research question is: Is there an average difference in the number of publications based on faculty who attain tenure as compared to faculty who do not attain tenure? Which one of the following represents how the ALTERNATIVE hypothesis will be written? a. The number of publications differs on average for faculty who attain tenure as compared to faculty who do not attain tenure. b. The average number of publications for faculty who attain tenure is the same as the average number of publications of faculty who do not attain tenure. c. The average number of publications of faculty who attain tenure is less than or equal to the average number of publications of faculty who do not attain tenure. d. The average number of publications of faculty who attain tenure is greater than or equal to the average number of publications of faculty who do not attain tenure. ANSWER: A
7. A researcher randomly assigns individuals to two groups, one in which calming instrumental music is played softly and another in which no music is played. The researcher collects time on task, measured in minutes. The research question is: Is there a greater average time on task for individuals who had music as compared to individuals who did not have music? Which one of the following represents how the NULL hypothesis will be written? a. Time on task differs on average for individuals with music as compared to without music. b. The average time on task for individuals with music is the same as the average time on task for individuals without music. c. The average time on task for individuals with music is less than or equal to the average time on task for individuals without music d. The average time on task for individuals with music is greater than or equal to the average time on task for individuals without music ANSWER: C
8. A researcher randomly assigns individuals to two groups, one in which calming instrumental music is played softly and another in which no music is played. The researcher collects time on task, measured in minutes. The research question is: Is there a greater average time on task for individuals who had music as compared to individuals who did not have music? Which one of the following represents how the ALTERNATIVE hypothesis will be written? a. Time on task differs on average for individuals with music as compared to without music. b. The average time on task for individuals with music is the same as the average time on task for individuals without music. c. The average time on task for individuals with music is less than the average time on task for individuals without music d. The average time on task for individuals with music is greater than the average time on task for individuals without music ANSWER: D 9. In which one of the following is evidence of normality suggested? a. Alpha level = .05; Levene's test p value = .80 b. Curvilinear Q-Q plot c. Histogram where each value on the X axis has the same frequency d. Shapiro Wilk's p value = .08; level of significance = .01 ANSWER: D 10. A recent article you read indicates that the p value for Levene's test for homogeneity of variances was .19. Which of the following is a correct interpretation based on this information? a. The assumption of the test of normality is met. b. The results of the independent t test are not statistically significant. c. The variances of the groups are not statistically significantly different. d. There is a small effect present. ANSWER: C
11. In examining the distribution of a ratio level measurement scale variable, Wesley finds that the skewness of the distribution is −2.96 and the kurtosis is +1.72. He also finds that the p value for the Shapiro-Wilk test for the variable is .03. Which of the following is suggested? a. There is evidence that the assumption of homogeneity of variance has been met. b. There is evidence that the assumption of homogeneity of variance has been violated. c. There is evidence that the assumption of normality has been met. d. There is evidence that the assumption of normality has been violated. ANSWER: D
12. You read the following in a published research study: 1) Shapiro-Wilk p value of .05; 2) alpha level of .001; 3) eta squared of .25; 4) Levene's test p value of .005. Which of the following statements is correct? a. The alpha level is rejected. b. The assumption of equal variances has been met. c. The assumption of homogeneity of variance has been violated. d. There is a large effect size. ANSWER: B
13. You read the following in a published research study: 1) test statistic value of 2.56; 2) alpha level of .10; 3) phi coefficient of .75; 4) Shapiro-Wilk p value of .05. Which of the following statements is correct? a. The alpha level is rejected. b. The assumption of equal variances has been met. c. The assumption of homogeneity of variance has been violated. d. There is a large effect size. ANSWER: D
14. In which one of the following is evidence of normality suggested? a. Quadratic distributional shape b. Shapiro-Wilk p value = .09; level of significance = .10 c. Skewness statistic = 1.65 and kurtosis statistic = 1.90 d. Test statistic value of 1.79; alpha level of .05 ANSWER: C
15. A researcher draws a random sample of 500 participants for a study with the intent to perform a dependent t test for a directional hypothesis. He reviews the normality indices and finds slight non-normality. What is the most appropriate recommendation for the researcher? a. Abort and find a different procedure. The t test is not robust to violations of normality. b. Proceed with the t test. The t test is robust to violations of normality. c. Suggest a different sample be selected and start over. ANSWER: C 16. In conducting an independent t test, a researcher finds a p value of .356 for Levene's test of homogeneity of variance. Has the homogeneity assumption been violated? a. Yes b. No ANSWER: B 17. A researcher conducts an independent t test. She computes eta squared and finds the value to be .16. What interpretation can be made from this? a. This is a small effect. Approximately 16% of the variation in the dependent variable can be accounted for by the independent variable. b. This is a small effect. Approximately 16% of the variation in the independent variable can be accounted for by the dependent variable. c. This is a moderate effect. Approximately 16% of the variation in the dependent variable can be accounted for by the independent variable. d. This is a moderate effect. Approximately 16% of the variation in the independent variable can be accounted for by the dependent variable. e. This is a large effect. Approximately 16% of the variation in the dependent variable can be accounted for by the independent variable. f. This is a large effect. Approximately 16% of the variation in the independent variable can be accounted for by the dependent variable. ANSWER: E
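The interpretation keyed in item 17 pairs a size label with a proportion-of-variance statement. A minimal sketch, assuming Cohen's conventional eta squared benchmarks of .01 (small), .06 (moderate), and .14 (large); the function name `describe_eta_squared` and the exact cutoff handling are hypothetical.

```python
def describe_eta_squared(eta_squared: float) -> str:
    """Label eta squared and state the share of variance explained.

    Assumption: values below .06 are small, values from .06 up to (but
    not including) .14 are moderate, and values of .14 or more are large.
    """
    if not 0.0 <= eta_squared <= 1.0:
        raise ValueError("eta squared is a proportion between 0 and 1")
    if eta_squared < 0.06:
        label = "small"
    elif eta_squared < 0.14:
        label = "moderate"
    else:
        label = "large"
    percent = round(eta_squared * 100)
    return (f"{label} effect: about {percent}% of the variation in the "
            "dependent variable is accounted for by the independent variable")


print(describe_eta_squared(0.16))
```

Note the direction of the statement: variation in the dependent variable is accounted for by the independent variable, not the reverse.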
18. Malani is examining the following research question: Is there a mean difference in the weight of garbage thrown away per week by homeowners as compared to renters? Use the “Garbage” dataset. "Weight" is a ratio level measurement scale variable. "Own_Rent" is a nominal level measurement scale variable. Conduct the appropriate inferential statistical procedure at an alpha level of .05. Do NOT exclude any cases when conducting the test. Which one of the following is correct based on reviewing the assumptions? a. The assumption of homogeneity of variances is met at alpha of .05. b. The assumption of homogeneity of variances is NOT met at alpha of .05. c. The assumption of linearity is met. d. The assumption of linearity is NOT met. ANSWER: A 19. Malani is examining the following research question: Is there a mean difference in the weight of garbage thrown away per week by homeowners as compared to renters? Use the "Garbage" dataset. "Weight" is a ratio level measurement scale variable. "Own_Rent" is a nominal level measurement scale variable. Conduct the appropriate inferential statistical procedure at an alpha level of .05. Do NOT exclude any cases when conducting the test. Which one of the following is correct based on reviewing the assumptions? a. Skewness and kurtosis statistics provide some evidence to suggest that homogeneity of variance is a reasonable assumption. b. Skewness and kurtosis statistics provide some evidence to suggest that homogeneity of variance is NOT a reasonable assumption. c. Skewness and kurtosis statistics provide some evidence to suggest that independence is a reasonable assumption. d. Skewness and kurtosis statistics provide some evidence to suggest that independence is NOT a reasonable assumption. e. Skewness and kurtosis statistics provide some evidence to suggest that normality is a reasonable assumption. f. Skewness and kurtosis statistics provide some evidence to suggest that normality is NOT a reasonable assumption. ANSWER: E
20. Malani is examining the following research question: Is there a mean difference in the weight of garbage thrown away per week by homeowners as compared to renters? Use the "Garbage" dataset. "Weight" is a ratio level measurement scale variable. "Own_Rent" is a nominal level measurement scale variable. Conduct the appropriate inferential statistical procedure at an alpha level of .05. Do NOT exclude any cases when conducting the test. Which one of the following is correct based on reviewing the results of the test? a. The probability the means are not equal is about 30%. b. There is a good probability that renters throw out more garbage than homeowners. c. There is about a 30% probability of finding a mean difference of about 16 or greater by chance if the true mean difference is zero. d. There is less than a 1% probability of finding a mean difference of about 16 or greater by chance if the true mean difference is zero. ANSWER: D 21. Malani is examining the following research question: Is there a mean difference in the weight of garbage thrown away per week by homeowners as compared to renters? Use the "Garbage" dataset. "Weight" is a ratio level measurement scale variable. "Own_Rent" is a nominal level measurement scale variable. Conduct the appropriate inferential statistical procedure at an alpha level of .05. Do NOT exclude any cases when conducting the test. Which one of the following is correct based on reviewing eta squared calculated from the data? a. Eta squared indicates that close to 24% of the variation in whether the person owns or rents their home can be attributed to the weight of garbage disposed. b. Eta squared indicates that close to 24% of the variation in the weight of garbage disposed can be attributed to whether the person owns or rents their home. c. Eta squared indicates that there are about 2.5 standard deviation units between homeowners and renters. d. Eta squared indicates that the effect size is generally a small effect. ANSWER: B
22. A researcher collects data on shyness for two groups: 1) high school students who have their own cell phone; and 2) high school students who do NOT have a cell phone. Shyness is measured by the Shyness Inventory (an interval scaled variable). For group one (students who have their own cell phone), the following values are measured from the Shyness Inventory: 89, 95, 72, 68, 91, 86. For group two (students who do NOT have their own cell phone), the following values are measured from the Shyness Inventory: 56, 78, 61, 43, 80, 50. Conduct the appropriate statistical procedure at alpha of .05 to determine if there is a difference, on average, of the shyness of students who have versus do not have a cell phone. Is the assumption of homogeneity of variances met? a. Yes b. No ANSWER: A 23. A researcher collects data on shyness for two groups: 1) high school students who have their own cell phone; and 2) high school students who do NOT have a cell phone. Shyness is measured by the Shyness Inventory (an interval scaled variable). For group one (students who have their own cell phone), the following values are measured from the Shyness Inventory: 89, 95, 72, 68, 91, 86. For group two (students who do NOT have their own cell phone), the following values are measured from the Shyness Inventory: 56, 78, 61, 43, 80, 50. Conduct the appropriate statistical procedure at alpha of .05 to determine if there is a difference, on average, of the shyness of students who have versus do not have a cell phone. What is the t test statistic value? a. .544 b. 2.571 c. +/-2.776 d. −2.927 ANSWER: D 24. A researcher collects data on shyness for two groups: 1) high school students who have their own cell phone; and 2) high school students who do NOT have a cell phone. Shyness is measured by the Shyness Inventory (an interval scaled variable). For group one (students who have their own cell phone), the following values are measured from the Shyness Inventory: 89, 95, 72, 68, 91, 86. 
For group two (students who do NOT have their own cell phone), the following values are measured from the Shyness Inventory: 56, 78, 61, 43, 80, 50. Conduct the appropriate statistical procedure at alpha of .05 to determine if there is a difference, on average, of the shyness of students who have versus do not have a cell phone. What is the p value? a. .015 b. .05 c. .478 d. .544 ANSWER: A
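The test statistic asked for in item 23 comes from the pooled-variance (equal variances assumed) independent t test. The Python sketch below shows the hand computation with the standard library; the function name `pooled_t` is hypothetical. The sign of t depends only on which group is entered first, and the magnitude should land close to the keyed value, with small differences possible from rounding.

```python
import math
import statistics


def pooled_t(group_1, group_2):
    """Independent t test with pooled variance; returns (t, df)."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)   # sample variances (n - 1)
    var2 = statistics.variance(group_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / se_diff
    return t, df


has_phone = [89, 95, 72, 68, 91, 86]
no_phone = [56, 78, 61, 43, 80, 50]

# Entering the no-phone group first yields a negative t, as in the key.
t, df = pooled_t(no_phone, has_phone)
print(f"t({df}) = {t:.3f}")
```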
25. A researcher collects information on students enrolled in a statistics course to determine if their level of anxiety changes from the time they begin the course to the time they end the course. Lower scores on the instrument reflect decreased anxiety toward statistics. The pretest scores for the students were: 88, 94, 91, 84, 80, 76, 96, 90, 83, 95. The posttest scores were: 56, 64, 72, 80, 42, 40, 51, 32, 61, 59. (The data are in respective order. For example, for student A the pretest score was 88 and posttest score was 56; for student B the pretest score was 94 and posttest score was 64, and so forth.) Conduct the appropriate statistical procedure at alpha of .05 to determine if there is a difference in statistics anxiety, on average, prior to and after completion of a statistics course. What is the t test statistic value? a. 2.201 b. +/−2.262 c. 6.840 d. 14.795 ANSWER: C
26. A researcher collects information on students enrolled in a statistics course to determine if their level of anxiety changes from the time they begin the course to the time they end the course. Lower scores on the instrument reflect decreased anxiety toward statistics. The pretest scores for the students were: 88, 94, 91, 84, 80, 76, 96, 90, 83, 95. The posttest scores were: 56, 64, 72, 80, 42, 40, 51, 32, 61, 59. (The data are in respective order. For example, for student A the pretest score was 88 and posttest score was 56; for student B the pretest score was 94 and posttest score was 64, and so forth.) Conduct the appropriate statistical procedure at alpha of .05 to determine if there is a difference in statistics anxiety, on average, prior to and after completion of a statistics course. Is there a statistically significant difference between the pretest and posttest means? a. Yes at p < .001 b. Yes at p = .01 c. Yes at p = .023 d. Yes at p = .05 ANSWER: A
27. A researcher collects information on students enrolled in a statistics course to determine if their level of anxiety changes from the time they begin the course to the time they end the course. Lower scores on the instrument reflect decreased anxiety toward statistics. The pretest scores for the students were: 88, 94, 91, 84, 80, 76, 96, 90, 83, 95. The posttest scores were: 56, 64, 72, 80, 42, 40, 51, 32, 61, 59. (The data are in respective order. For example, for student A the pretest score was 88 and posttest score was 56; for student B the pretest score was 94 and posttest score was 64, and so forth.) Conduct the appropriate statistical procedure at alpha of .05 to determine if there is a difference in statistics anxiety, on average, prior to and after completion of a statistics course. What is the confidence interval of the mean difference? a. 4.68 to 14.79 b. 21.42 to 42.58 c. 53.42 to 74.58 d. 55.70 to 87.70 ANSWER: B
28. A research article reports the following for an independent t test: α = .05, t(6) = −4.40, p < .005, d = −3.11. Which of the following reflects the interpretation for the hypothesis test? a. Fail to reject the null hypothesis b. Reject the null hypothesis c. There is a small effect d. There is a large effect ANSWER: B
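The dependent (paired samples) t test for the statistics anxiety data can be checked by hand. The Python sketch below computes the pretest minus posttest differences, the t statistic, and the 95% confidence interval of the mean difference. The function name `paired_t` is hypothetical, and the critical value 2.262 (two-tailed, alpha = .05, df = 9) is read from a t table rather than computed.

```python
import math
import statistics


def paired_t(first, second, t_critical):
    """Dependent t test on paired scores; returns (t, df, (lower, upper))."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # SD of the difference scores
    se_diff = sd_diff / math.sqrt(n)
    t = mean_diff / se_diff
    margin = t_critical * se_diff
    return t, n - 1, (mean_diff - margin, mean_diff + margin)


pretest = [88, 94, 91, 84, 80, 76, 96, 90, 83, 95]
posttest = [56, 64, 72, 80, 42, 40, 51, 32, 61, 59]

t, df, (lower, upper) = paired_t(pretest, posttest, t_critical=2.262)
print(f"t({df}) = {t:.3f}, 95% CI of the mean difference: "
      f"{lower:.2f} to {upper:.2f}")
```

The distractors in items 25 and 27 reuse intermediate quantities from this computation (the standard deviation and standard error of the differences and the critical value), which is why working through each step is useful.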
Short Answer Chapter 7 (Inferences About the Difference Between Two Means)
1. Interpret the following SPSS output in terms of the extent to which the assumption of normality has been met. Include appropriate evidence (e.g., statistical values from the output).

Descriptives: Number of years to complete master's degree, by Employment

Employment: Not employed
                         Statistic    Std. Error
Mean                     1.9000       .31447
5% Trimmed Mean          1.8333
Median                   2.0000
Variance                 .989
Std. Deviation           .99443
Minimum                  1.00
Maximum                  4.00
Range                    3.00
Interquartile Range      1.25
Skewness                 1.085        .687
Kurtosis                 .914         1.334

Employment: Employed at least part-time
                         Statistic    Std. Error
Mean                     4.6000       .61824
5% Trimmed Mean          4.5556
Median                   4.5000
Variance                 3.822
Std. Deviation           1.95505
Minimum                  2.00
Maximum                  8.00
Range                    6.00
Interquartile Range      3.25
Skewness                 .147         .687
Kurtosis                 −.703        1.334
Tests of Normality: Number of years to complete master's degree
                               Kolmogorov-Smirnov (a)         Shapiro-Wilk
Employment                     Statistic   df    Sig.         Statistic   df    Sig.
Not employed                   .260        10    .054         .829        10    .063
Employed at least part-time    .163        10    .200*        .940        10    .555
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
ANSWER Review of the Shapiro-Wilk test for normality (SW = .829, df = 10, p = .063) and the skewness (1.085) and kurtosis (.914) statistics suggests that normality of the number of years to complete a master’s degree was reasonable for the not employed group. The Q-Q plot suggested minor non-normality. Review of the Shapiro-Wilk test for normality (SW = .940, df = 10, p = .555) and the skewness (.147) and kurtosis (−.703) statistics suggests that normality was also reasonable for the group employed at least part-time. The Q-Q plot suggested normality was generally reasonable.
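The screening logic in this answer can be expressed as a small check. The sketch below assumes two rules of thumb that are not stated in the output itself: skewness and kurtosis within ±2.0 are treated as acceptable, and the Shapiro-Wilk test is evaluated at alpha = .05. The function name `normality_reasonable` is hypothetical, and a Q-Q plot should still be inspected alongside these statistics.

```python
def normality_reasonable(skewness: float, kurtosis: float,
                         shapiro_wilk_p: float,
                         alpha: float = 0.05, cutoff: float = 2.0) -> bool:
    """Screen for normality with shape statistics and Shapiro-Wilk.

    Assumptions: |skewness| and |kurtosis| below `cutoff` are acceptable,
    and a Shapiro-Wilk p value above `alpha` means normality is not rejected.
    """
    shape_ok = abs(skewness) < cutoff and abs(kurtosis) < cutoff
    test_ok = shapiro_wilk_p > alpha
    return shape_ok and test_ok


# Values read from the Descriptives and Tests of Normality output
print(normality_reasonable(1.085, 0.914, 0.063))    # Not employed
print(normality_reasonable(0.147, -0.703, 0.555))   # Employed at least part-time
```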
2. Review the following SPSS output. Interpret the output including identifying the specific statistical procedure that has been used and reporting the extent to which the test is statistically significant. Include appropriate evidence (e.g., values from the output).
t-test for Equality of Means
Years to complete master’s (Equal variances assumed)
t         df    Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% Confidence Interval of the Difference
                                                                            Lower        Upper
−3.893    18    .001              −2.7              .69362                  −4.15725     −1.24275
ANSWER An independent t test was conducted. The test is statistically significant, t = −3.893, df = 18, p = .001.
3. Review the following SPSS output. Interpret the output including identifying the specific statistical procedure that has been used and reporting the extent to which the test is statistically significant. Include appropriate evidence (e.g., test statistic value, degrees of freedom, and p value).

Paired Samples Test
Pair 1: pretest − posttest
Paired Differences
  Mean                                            −27.6
  Std. Deviation                                  11.98332
  Std. Error Mean                                 3.78946
  95% Confidence Interval of the Difference       Lower −36.17235, Upper −19.02765
t                                                 −7.283
df                                                9
Sig. (2-tailed)                                   .000
ANSWER A dependent t test (i.e., paired samples test) was conducted. The test is statistically significant, t = −7.283, df = 9, p < .001 (displayed by SPSS as .000).
4. A research report indicates that the mean birth weight of twins that participated in their study was 5.5 pounds (variance = .25; n = 15) and the mean birth weight of singleton babies was 7.7 pounds (variance = .40; n = 15). Calculate and interpret Cohen’s d effect size.
ANSWER
$$d = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p} = \frac{5.5 - 7.7}{.57} = -3.86$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1).25 + (15 - 1).40}{15 + 15 - 2}} = \sqrt{\frac{3.5 + 5.6}{28}} = .57$$
Cohen’s d is −3.86 which, interpreted according to Cohen’s rules of thumb, is a large effect and indicates a difference of nearly four standard deviation units between the mean birth weight of twins and singletons (with twins having the lower mean).
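The pooled-standard-deviation calculation for problem 4 can be sketched in a few lines. The function name and argument names below are illustrative, not from the text.

```python
import math

def cohens_d_pooled(mean1, var1, n1, mean2, var2, n2):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Twins (group 1) versus singletons (group 2), values from problem 4
d = cohens_d_pooled(5.5, 0.25, 15, 7.7, 0.40, 15)
print(round(d, 2))
```

The sign only reflects which group is listed first; the magnitude is what is compared against Cohen's rules of thumb.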
5. Calculate and interpret Cohen’s d using the following output.
Paired Samples Test
Pair 1: pretest − posttest
Paired Differences:
Mean = −27.6
Std. Deviation = 11.98332
Std. Error Mean = 3.78946
95% Confidence Interval of the Difference: Lower = −36.17235, Upper = −19.02765
t = −7.283
df = 9
Sig. (2-tailed) = .000
ANSWER: The effect size is computed to be the following:
$$\text{Cohen's } d = \frac{\bar{d}}{s_d} = \frac{-27.60}{11.98} = -2.30$$
In absolute value, this is interpreted as a difference of approximately two and one-third standard deviation units between the pretest and posttest means (with the posttest having the larger mean), a very large effect size according to Cohen’s subjective standard.
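As a check on problem 5, the paired-samples effect size is the mean of the difference scores divided by their standard deviation. The sketch below also recovers the t statistic from the same two numbers, assuming n = 10 pairs (implied by df = 9); variable names are illustrative.

```python
import math

# Values read from the Paired Samples Test output
mean_diff = -27.6      # mean of the pretest minus posttest differences
sd_diff = 11.98332     # standard deviation of the differences
n = 10                 # assumed from df = n - 1 = 9

# Cohen's d for dependent samples
d = mean_diff / sd_diff

# Cross-check: t equals the mean difference over its standard error
t_stat = mean_diff / (sd_diff / math.sqrt(n))

print(round(d, 2), round(t_stat, 3))
```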
Multiple-Choice Chapter 8 (Inferences About Proportions) 1. A restaurant owner wants to determine if the same proportion of individuals select Pepsi versus Coke. She keeps count of the number of people who order Pepsi and Coke on randomly selected days; thus information for only one nominally scaled variable (two levels: Pepsi or Coke) is collected. Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association ANSWER: A 2. Eden wants to determine if the same proportion of students become involved in Greek life as compared to students who do not become involved in Greek life. She randomly samples 500 college students on her campus and determines which are involved in Greek life. Thus information for only one nominally scaled variable (two levels: Greek or non-Greek) is collected. Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association ANSWER: A 3. Oscar wants to know whether a person's music preferences are related to the region of the country in which he or she resides. He randomly samples 10,000 adults nationwide and collects data on two variables: 1) music preference, a nominal variable with three levels (country, jazz, pop); and region, a nominal variable with four levels (north, south, east, west). Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association ANSWER: B
4. Malani wants to determine if the same proportion of professional chefs attend chef school in America as compared to attending chef school abroad. She randomly samples 250 professional chefs and determines where they attended chef school (America vs. abroad; nominal variable with two levels). Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association ANSWER: A 5. Matthew wants to know whether food preferences are related to the region of the country in which a person resides and more specifically if the same proportions of people select different types of food in the different regions. He randomly samples 800 adults nationwide and collects data on two variables: 1) food preference, a nominal variable with two levels (American, ethnic); and 2) region, a nominal variable with four levels (north, south, east, west). Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association ANSWER: B 6. Luke wants to know if the same proportion of scuba divers prefer east coast diving destinations as compared to west coast diving destinations when selecting where to vacation. He randomly samples 500 scuba divers and determines which coast they prefer (east vs. west coast; nominal measurement scale). Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association ANSWER: A 7. Dedra hypothesizes that the proportion of people selecting outdoor as compared to indoor exercise will differ based on the climate where they reside. She randomly samples 100 adults who subscribe to a national health magazine and collects data on two variables: 1) exercise preference, a nominal measurement scale with two categories (outdoor exercise or indoor exercise); and 2) climate, an ordinal variable with three categories (mild, moderate, severe).
Which statistical procedure is most appropriate to use to test the hypothesis? a. Chi square goodness-of-fit test b. Chi square test of association
ANSWER: B 8. How many degrees of freedom are there in a 2 × 3 contingency table when the chi-square test of association is used? a. 1 b. 2 c. 3 d. 6 ANSWER: B 9. How many degrees of freedom are there in a 4 × 5 contingency table when the chi-square test of association is used? a. 4 b. 5 c. 12 d. 20 ANSWER: C 10. How many degrees of freedom are there in a 3 × 6 contingency table when the chi-square test of association is used? a. 3 b. 6 c. 10 d. 18 ANSWER: C 11. How many degrees of freedom are there in a 2 × 5 contingency table when the chi-square test of association is used? a. 2 b. 4 c. 5 d. 10 ANSWER: B
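The degrees of freedom in questions 8 through 11 all follow from one rule: for the chi-square test of association, df = (rows − 1)(columns − 1). A minimal sketch (the function name is illustrative):

```python
def contingency_df(rows, cols):
    """Degrees of freedom for a chi-square test of association."""
    return (rows - 1) * (cols - 1)

# Table sizes from questions 8 through 11
for rows, cols in [(2, 3), (4, 5), (3, 6), (2, 5)]:
    print(f"{rows} x {cols}: df = {contingency_df(rows, cols)}")
```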
12. A contingency table contains 4 rows and 3 columns. How many cells are in this table? a. 3 b. 4 c. 6 d. 12 ANSWER: D 13. A contingency table contains 2 rows and 5 columns. How many cells are in this table? a. 2 b. 4 c. 5 d. 10 ANSWER: D 14. A contingency table contains 3 rows and 6 columns. How many cells are in this table? a. 3 b. 6 c. 10 d. 18 ANSWER: D 15. Which of the following is a correct interpretation of Cohen’s w = .12? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .10 e. Not statistically significant at .10 ANSWER: A
16. Which of the following is a correct interpretation of Cohen’s w = .09? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .10 e. Not statistically significant at .10 ANSWER: A 17. Which of the following is a correct interpretation of Cohen’s w = .29? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: B 18. Which of the following is a correct interpretation of Cohen’s w = .35? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: B 19. Which of the following is a correct interpretation of Cohen’s w = .65? a. Small effect b. Moderate effect c. Large effect d. Statistical significance at .05 e. Not statistically significant at .05 ANSWER: C
20. Which of the following standardized residuals is contributing to a statistically significant chi-square statistic, given an alpha of .05? a. −1.00 b. −.50 c. +1.70 d. +3.20 e. None of the above ANSWER: D 21. Which of the following standardized residuals is contributing to a statistically significant chi-square statistic, given an alpha of .05? a. −2.30 b. −.04 c. +1.05 d. +1.70 e. None of the above ANSWER: A 22. Which of the following standardized residuals is contributing to a statistically significant chi-square statistic, given an alpha of .05? a. −2.30 b. −.04 c. +1.05 d. +1.70 e. None of the above ANSWER: A 23. Which of the following standardized residuals is contributing to a statistically significant chi-square statistic, given an alpha of .05? a. −0.90 b. −.15 c. +0.95 d. +1.36 e. None of the above ANSWER: E
24. Which of the following standardized residuals is contributing to a statistically significant chi-square statistic, given an alpha of .01? a. −2.30 b. −.04 c. +1.05 d. +1.70 e. None of the above ANSWER: E 25. Which of the following standardized residuals is contributing to a statistically significant chi-square statistic, given an alpha of .01? a. −2.50 b. −1.50 c. +0.04 d. +2.75 e. None of the above ANSWER: D 26. What is the measurement scale for variables that are used when examining inferences about proportions? Select all that apply. a. Nominal b. Ordinal c. Interval d. Ratio ANSWER: A and B 27. What are the characteristics of the Chi square distribution? Select all that apply. a. Chi square values range from negative infinity to positive infinity. b. The chi square distribution is a family of distributions dependent on degrees of freedom associated with the number of categories in the data. c. The Chi square calculation yields a negative value. d. Zero is the maximum value for chi square. ANSWER: B
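The standardized-residual questions apply one rule: a residual is treated as contributing to a statistically significant chi-square when its absolute value exceeds the two-tailed critical z for the chosen alpha (about 1.96 at .05 and 2.58 at .01). The sketch below is one way to apply that rule with the Python standard library; the function name is illustrative.

```python
from statistics import NormalDist

def flagged_residuals(residuals, alpha):
    """Return the standardized residuals whose absolute value exceeds
    the two-tailed critical z for the given alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return [r for r in residuals if abs(r) > z_crit]

# Options from question 20 (alpha = .05) and question 24 (alpha = .01)
print(flagged_residuals([-1.00, -0.50, 1.70, 3.20], 0.05))
print(flagged_residuals([-2.30, -0.04, 1.05, 1.70], 0.01))
```

An empty list corresponds to "None of the above."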
28. What are the characteristics of the Chi square distribution? Select all that apply. a. Degrees of freedom for the chi square distribution is based on the sample size. b. The mean of the chi square distribution equals the degrees of freedom. c. The variance of the chi square distribution equals the product of the sample size and the degrees of freedom. d. Zero is the minimum value for chi square. ANSWER: B and D
Short Answer Chapter 8 (Inferences About Proportions) 1. For a random sample of 20 children, 18 attended preschool and 2 did not attend preschool. Test the following hypotheses at the .05 level of significance:
$$H_0: \pi = .75 \qquad H_1: \pi \neq .75$$
2. For a random sample of 25 students, 20 passed the exam and 5 did not pass the exam. Test the following hypotheses at the .01 level of significance:
$$H_0: \pi = .75 \qquad H_1: \pi \neq .75$$
3. A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of soda drinkers (three categories including Pepsi, Coca-Cola, other) differ from the proportions reported nationally. The chi-square test statistic is equal to 6.500. Determine the result of this test by looking up the critical value and making a statistical decision, using alpha = .05.
4. A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of employees (four categories including: part-time less than 10 hours, part-time 10–20 hours, part-time 21–30 hours, full-time) differ from the proportions reported nationally. The chi-square test statistic is equal to 10.250. Determine the result of this test by looking up the critical value and making a statistical decision, using alpha = .01.
5. Lilo wants to know if equal proportions of students in her class know how to surf as compared to do not know how to surf. She randomly samples 50 students in her class and asks them if they know how to surf (yes or no). Twenty students indicate "no," that they do not know how to surf while 30 students indicate "yes," they do know how to surf. Using SPSS, conduct the appropriate statistical procedure at alpha of .05. Interpret the output including identifying the specific statistical procedure that has been used and reporting the extent to which the test is statistically significant. Include appropriate evidence (e.g., values from the output).
ANSWERS 1. For a random sample of 20 children, 18 attended preschool and 2 did not attend preschool. Test the following hypotheses at the .05 level of significance:
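$$H_0: \pi = .75 \qquad H_1: \pi \neq .75$$
$$z = \frac{p - \pi_0}{\sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}} = \frac{.90 - .75}{\sqrt{\dfrac{.75 (1 - .75)}{20}}} = \frac{.15}{\sqrt{\dfrac{.75(.25)}{20}}} = \frac{.15}{.0968} = 1.55$$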
Using the z table, we find critical values of +/−1.96 (split alpha in ½ given a two-tailed test thus 1 − .025 or .975; find P(z) = .975 in the table; this corresponds to z of 1.96). Since z of 1.55 (test statistic value) is less than our critical value (+1.96), we fail to reject the null hypothesis.
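The single-proportion z test used in this answer can be sketched as follows. The function name is illustrative, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, pi0):
    """z statistic for testing a sample proportion against a hypothesized pi0."""
    p = successes / n
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / standard_error

# Problem 1: 18 of 20 children attended preschool, hypothesized proportion .75
z = one_proportion_z(18, 20, 0.75)

# Two-tailed critical value at alpha = .05
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

print(round(z, 2), round(z_crit, 2))
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```

The same function applies to problem 2 by calling `one_proportion_z(20, 25, 0.75)` and using alpha = .01 for the critical value.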
2. For a random sample of 25 students, 20 passed the exam and 5 did not pass the exam. Test the following hypotheses at the .01 level of significance:
$$H_0: \pi = .75 \qquad H_1: \pi \neq .75$$
$$z = \frac{p - \pi_0}{\sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}} = \frac{.80 - .75}{\sqrt{\dfrac{.75 (1 - .75)}{25}}} = \frac{.05}{\sqrt{\dfrac{.75(.25)}{25}}} = \frac{.05}{.0866} = .577$$
Using the z table, we find critical values of +/−2.57 (split alpha in ½ given a two-tailed test thus 1 − .005 or .995; find P(z) = .995 in the table; this corresponds to z of 2.57). Since z of .577 (test statistic value) is less than our critical value (+2.57), we fail to reject the null hypothesis.
3. A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of soda drinkers (three categories including Pepsi, Coca-Cola, other) differ from the proportions reported nationally. The chi-square test statistic is equal to 6.500. Determine the result of this test by looking up the critical value and making a statistical decision, using alpha = .05. This is a non-directional test since we are looking to see if the proportions differ nationally (thus no indication that there is directionality to the hypothesis). With three categories, our degrees of freedom equal 3 – 1 = 2. The alpha is .05. Using the chi-square distribution table, we find the associated critical value is 5.99146. Since the chi-square test statistic is 6.500 and thus greater than our critical value, we reject the null hypothesis.
4. A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of employees (four categories including: part-time less than 10 hours, part-time 10–20 hours, part-time 21–30 hours, full-time) differ from the proportions reported nationally. The chi-square test statistic is equal to 10.250. Determine the result of this test by looking up the critical value and making a statistical decision, using alpha = .01. This is a non-directional test since we are looking to see if the proportions differ nationally (thus no indication that there is directionality to the hypothesis). With four categories, our degrees of freedom equal 4 – 1 = 3. The alpha is .01. Using the chi-square distribution table, we find the associated critical value is 11.3449. Since the chi-square test statistic is 10.250 and thus less than our critical value, we fail to reject the null hypothesis.
5. Lilo wants to know if equal proportions of students in her class know how to surf as compared to do not know how to surf. She randomly samples 50 students in her class and asks them if they know how to surf (yes or no). Twenty students indicate "no," that they do not know how to surf while 30 students indicate "yes," they do know how to surf. Using SPSS, conduct the appropriate statistical procedure at alpha of .05. Interpret the output including identifying the specific statistical procedure that has been used and reporting the extent to which the test is statistically significant. Include appropriate evidence (e.g., values from the output).
A chi-square goodness-of-fit test was conducted to determine if equal proportions of students in Lilo’s class know how to surf as compared to do not know how to surf. The test was not statistically significant, χ² = 2.00, df = 1, p = .157.
Test Statistics (VAR00001)
Chi-Square = 2.000a
df = 1
Asymp. Sig. = .157
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 25.0.
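The SPSS result for problem 5 can be reproduced by hand. The sketch below computes the goodness-of-fit statistic from the observed and expected counts; for the p value it relies on the fact that, with df = 1 only, the upper-tail chi-square probability equals erfc(sqrt(x / 2)). That shortcut is an assumption of this sketch and does not generalize to other degrees of freedom.

```python
import math

# Observed counts: 20 "no" and 30 "yes"; equal proportions expected under H0
observed = [20, 30]
expected = [25, 25]

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail probability; this closed form is valid for df = 1 only
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(chi_sq, df, round(p_value, 3))
```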
Multiple-Choice Chapter 9 (Inferences About Variances) 1. Which of the following is an example of two dependent samples? a. Number of text messages sent to college freshmen and the number of emails sent to the same group of college freshmen b. Pretest scores of preschool children and posttest scores of the same group of children c. Science test scores of boys and science test scores of girls d. Weight of zoo animals in the San Diego Zoo and weight of zoo animals in the Central Park Zoo ANSWER: B 2. Which of the following is an example of two dependent samples? a. Birth weight of boys and birth weight of girls b. Number of hours worked of blue-collar workers and number of hours worked of white collar workers c. Salaries for first-year teachers and salaries for third-year teachers d. Weight of individuals before starting a weight loss program and weight of the same individuals after starting a weight loss program ANSWER: D 3. Which one of the following procedures is recommended when the distribution is platykurtic? a. Brown-Forsythe b. O’Brien ANSWER: B 4. Which one of the following procedures is recommended when the distribution is mesokurtic? a. Brown-Forsythe b. O’Brien ANSWER: B 5. Which one of the following procedures is recommended when the distribution is leptokurtic? a. Brown-Forsythe b. O’Brien ANSWER: A
6. A researcher computes the O’Brien procedure and finds an F test statistic of 2.22 and an F critical value of 2.01. Which of the following can be determined? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: A 7. A researcher computes the O’Brien procedure and finds an F test statistic of 3.81 and an F critical value of 2.61. Which of the following can be determined? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: A 8. A researcher computes the Brown-Forsythe procedure and finds an F test statistic of 4.75 and an F critical value of 3.32. Which of the following can be determined? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: A 9. A researcher computes the Brown-Forsythe procedure and finds an F test statistic of 2.35 and an F critical value of 3.78. Which of the following can be determined? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: B 10. A researcher conducted the Brown-Forsythe procedure and reports an alpha level of .05 and p value of .80. Which of the following is a correct statement based on this information? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: B
11. A researcher conducted the Brown-Forsythe procedure and reports an alpha level of .01 and p value of .005. Which of the following is a correct statement based on this information? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: A 12. A researcher conducted the Brown-Forsythe procedure and reports an alpha level of .01 and p value of .10. Which of the following is a correct statement based on this information? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: B 13. A researcher conducted the O’Brien procedure and reports an alpha level of .10 and p value of .05. Which of the following is a correct statement based on this information? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: A 14. A researcher conducted the O’Brien procedure and reports an alpha level of .05 and p value of .20. Which of the following is a correct statement based on this information? a. The variances of the groups are statistically different. b. The variances of the groups are NOT statistically different. c. There is not enough information to make a determination. ANSWER: B
15. A researcher wants to determine if the sample variance for the number of hours spent studying per week for a sample of college students is different than 15. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: C 16. A researcher samples high school seniors and wants to determine if the variance of college entrance exam scores for the seniors prior to completing a test-taking course is lower than the variance of college entrance exam scores for the same group of seniors after completing a test-taking course. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: A 17. A researcher samples cheerleaders and wants to determine if the variance of the number of cartwheels conducted during a football game prior to attending cheer camp is different than the variance of the number of cartwheels conducted during a football game after attending cheer camp. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: A
18. A researcher samples parents of charter school transfer students and wants to determine if the variance of their child’s satisfaction with school prior to transferring to the charter school is lower than the variance of their child’s satisfaction with school after transferring to the charter school. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: A 19. A researcher samples unemployed workers and wants to determine if the variance of depression scores for the unemployed prior to becoming unemployed is higher than the variance of depression scores for the unemployed after becoming unemployed. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: A 20. A researcher wants to determine if the variance of points scored by teams who have won a national championship differs from the variance of points scored by teams who have not won a national championship. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: B
21. A researcher wants to determine if the variance of the amount of entertainment costs for families who live in subsidized housing differs from the variance of the amount of entertainment costs for families who do not live in subsidized housing. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: B 22. A researcher wants to determine if the variance of price of cars bought by males differs from the variance of the price of cars bought by females. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: B 23. A researcher wants to determine if the variance of salary of assistant professors differs from the variance of salary of associate professors. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: B 24. A researcher wants to determine if the sample variance for college tuition of a sample of college freshmen is different than $10,000. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: C
25. A researcher wants to determine if the sample variance for reading test scores of a sample of elementary aged children is different than 75. Which one of the following procedures is appropriate to test this? a. Test for two dependent variances b. Test for two independent variances c. Test for a single variance d. None of the above ANSWER: C 26. In what is a researcher interested when conducting a test of inference for independent variances? a. Whether the population variance for one group is different than the population variance for one or more other independent groups b. Whether the population variance for one group is different than the population variance for one or more other dependent groups c. Whether the population mean difference for one group is different than a hypothesized value d. Whether the sample mean of group 1 differs from the sample mean of group 2 e. Whether the population standard deviation is within a ratio of 1:4 f. Whether two groups have the same population mean and variance ANSWER: A 27. Which of the following are examples of types of research questions that can be answered with a test of inference about independent variances? Select all that apply. a. Is the variation in the time to complete a task different for subjects in a treatment group as compared to subjects in a control group? b. Is variation in height different for children who drink the recommended number of glasses of water per day as compared to children who do not? c. Is there a relationship between the variation of BMI in adults and weight in middle school? d. Are there similar proportions of public institutions who declined in federal spending in 2000 as compared to 2010? ANSWER: A and B
28. Which of the following are examples of types of research questions that can be answered with a test of inference about independent variances? a. Is there a relationship between book price and tuition? b. Are there similar proportions of U.S. versus non-U.S. board graduates? c. Is the variation in graduation rate for public colleges different than the variation in graduation rate for private colleges? d. Does weight vary for athletes who train 6 hours per day as compared to athletes who train 8 hours per day? e. Is there a relationship between satisfaction with a job's salary and satisfaction with a job's level of responsibility? f. Is there a mean difference in engagement in the workplace based on whether an employee is full-time or part-time? ANSWER: C and D
Short Answer Chapter 9 (Inferences About Variances)
1. The following summary statistics are available for two dependent random samples: $s_1^2 = 36$, $s_2^2 = 16$, $n = 22$, $r_{12} = .70$. Test the following hypotheses at the .05 level of significance:
$$H_0: \sigma_1^2 - \sigma_2^2 = 0 \qquad H_1: \sigma_1^2 - \sigma_2^2 \neq 0$$
2. The following summary statistics are available for two dependent random samples: $s_1^2 = 100$, $s_2^2 = 225$, $n = 122$, $r_{12} = .50$. Test the following hypotheses at the .05 level of significance:
$$H_0: \sigma_1^2 - \sigma_2^2 = 0 \qquad H_1: \sigma_1^2 - \sigma_2^2 \neq 0$$
3. A random sample of 51 scores is collected with a sample mean of 75 and a sample variance of 10. Test the following hypotheses at the .05 level of significance:
$$H_0: \sigma^2 = 7 \qquad H_1: \sigma^2 \neq 7$$
4. A random sample of 9 scores is collected with a sample mean of 62 and a sample variance of 16. Test the following hypotheses at the .05 level of significance:
$$H_0: \sigma^2 = 15 \qquad H_1: \sigma^2 \neq 15$$
5. A random sample of 5 scores is collected with a sample mean of 8 and a sample variance of 9. Test the following hypotheses at the .05 level of significance:
$$H_0: \sigma^2 = 5 \qquad H_1: \sigma^2 \neq 5$$
ANSWERS
1. The following summary statistics are available for two dependent random samples: $s_1^2 = 36$, $s_2^2 = 16$, $n = 22$, $r_{12} = .70$. Test the following hypotheses at the .05 level of significance:
$$H_0: \sigma_1^2 - \sigma_2^2 = 0 \qquad H_1: \sigma_1^2 - \sigma_2^2 \neq 0$$
The degrees of freedom (v) are equal to n – 2 = 22 – 2 = 20. The standard deviations are, respectively, 6 and 4 (i.e., the square root of the variance terms provided).
$$t = \frac{s_1^2 - s_2^2}{2 s_1 s_2 \sqrt{\dfrac{1 - r_{12}^2}{v}}} = \frac{36 - 16}{2(6)(4)\sqrt{\dfrac{1 - (.70)^2}{20}}} = \frac{20}{7.665} = 2.609$$
From Appendix Table 2, and using an alpha level of .05 and degrees of freedom (v) of 20, we determine the critical values to be +/− 2.086. As the test statistic (2.609) exceeds the upper critical value, our decision is to reject the null hypothesis.
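Because the dependent-variances statistic has several nested pieces, it is easy to slip on the arithmetic. The sketch below evaluates the formula t = (s1² − s2²) / (2·s1·s2·√((1 − r12²)/v)) with v = n − 2 directly; the function name is illustrative.

```python
import math

def dependent_variances_t(var1, var2, r12, n):
    """t statistic and degrees of freedom for testing two dependent variances."""
    v = n - 2
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    denominator = 2 * s1 * s2 * math.sqrt((1 - r12 ** 2) / v)
    return (var1 - var2) / denominator, v

# Summary statistics from problems 1 and 2
t1, v1 = dependent_variances_t(36, 16, 0.70, 22)
t2, v2 = dependent_variances_t(100, 225, 0.50, 122)

print(round(t1, 3), v1)
print(round(t2, 3), v2)
```

Each result is then compared with the two-tailed critical t for its degrees of freedom.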
2. The following summary statistics are available for two dependent random samples: $s_1^2 = 100$, $s_2^2 = 225$, $n = 122$, $r_{12} = .50$. Test the following hypotheses at the .05 level of significance:
$H_0: \sigma_1^2 - \sigma_2^2 = 0$    $H_1: \sigma_1^2 - \sigma_2^2 \neq 0$
The degrees of freedom (ν) are equal to n – 2 = 122 – 2 = 120. The standard deviations are, respectively, 10 and 15 (i.e., the square roots of the variances provided).
$$t = \frac{s_1^2 - s_2^2}{2 s_1 s_2 \sqrt{\dfrac{1 - r_{12}^2}{\nu}}} = \frac{100 - 225}{2(10)(15)\sqrt{\dfrac{1 - (.50)^2}{120}}} = \frac{-125}{23.717} = -5.270$$
From Appendix Table 2, and using an alpha level of .05 and degrees of freedom (ν) of 120, we determine the critical values to be +/− 1.98. As the test statistic (−5.270) falls below the lower critical value (−1.98), our decision is to reject the null hypothesis.
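The hand arithmetic for the dependent-variances t test in items 1 and 2 can be checked by evaluating the formula directly from the summary statistics. The following is a minimal sketch; the function name `dependent_variances_t` is an illustrative choice and does not come from the textbook.

```python
import math

def dependent_variances_t(var1, var2, r12, n):
    """t statistic for H0: sigma1^2 - sigma2^2 = 0 with two dependent samples.

    var1, var2: sample variances; r12: correlation between the samples;
    n: number of paired observations. Returns (t, degrees of freedom).
    """
    df = n - 2
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    denominator = 2 * s1 * s2 * math.sqrt((1 - r12 ** 2) / df)
    return (var1 - var2) / denominator, df

# Summary statistics from items 1 and 2
for var1, var2, r12, n in [(36, 16, 0.70, 22), (100, 225, 0.50, 122)]:
    t, df = dependent_variances_t(var1, var2, r12, n)
    print(f"t({df}) = {t:.3f}")
```

The printed statistic is then compared with the tabled two-tailed critical value for the same degrees of freedom.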
3. A random sample of 51 scores is collected with a sample mean of 75 and a sample variance of 10. Test the following hypotheses at the .05 level of significance:
$H_0: \sigma^2 = 7$    $H_1: \sigma^2 \neq 7$
Degrees of freedom (ν) are equal to n − 1, thus 51 – 1 = 50. The hypothesized population variance is provided to us (7). Plugging these values into the formula, we find the test statistic value:
$$\chi^2 = \frac{\nu s^2}{\sigma_0^2} = \frac{50(10)}{7} = 71.429$$
From Appendix Table 3, and using an alpha level of .05 (i.e., tabled values listed under .975 and .025) and degrees of freedom (ν) of 50, we determine the critical values to be 32.3574 and 71.4202. As the test statistic (71.429) exceeds one of the critical values by falling into the upper-tail critical region (71.429 > 71.4202), our decision is to reject the null hypothesis.
4. A random sample of 9 scores is collected with a sample mean of 62 and a sample variance of 16. Test the following hypotheses at the .05 level of significance:
$H_0: \sigma^2 = 15$    $H_1: \sigma^2 \neq 15$
Degrees of freedom (ν) are equal to n − 1, thus 9 – 1 = 8. The hypothesized population variance is provided to us (15). Plugging these values into the formula, we find the test statistic value:
$$\chi^2 = \frac{\nu s^2}{\sigma_0^2} = \frac{8(16)}{15} = 8.53$$
From Appendix Table 3, and using an alpha level of .05 (i.e., tabled values listed under .975 and .025) and degrees of freedom (ν) of 8, we determine the critical values to be 2.17973 and 17.5345. As the test statistic (8.53) does not exceed either of the critical values, our decision is to fail to reject the null hypothesis.
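The chi-square test of a single variance used in items 3 and 4 follows the same three steps each time: compute ν = n − 1, compute the statistic, and compare it with the two tabled critical values. A minimal sketch follows; the function names are illustrative, and the critical values are typed in from the text rather than computed.

```python
def variance_chi_square(n, sample_var, hyp_var):
    """Chi-square statistic for H0: sigma^2 = hyp_var. Returns (chi2, df)."""
    df = n - 1
    return df * sample_var / hyp_var, df

def two_tailed_decision(stat, lower_crit, upper_crit):
    """Reject H0 when the statistic falls in either tail's critical region."""
    if stat < lower_crit or stat > upper_crit:
        return "reject H0"
    return "fail to reject H0"

# (n, s^2, sigma0^2, lower critical value, upper critical value) for items 3 and 4
items = [(51, 10, 7, 32.3574, 71.4202), (9, 16, 15, 2.17973, 17.5345)]
for n, s2, sigma0_sq, lower, upper in items:
    chi2, df = variance_chi_square(n, s2, sigma0_sq)
    print(f"chi2({df}) = {chi2:.3f}: {two_tailed_decision(chi2, lower, upper)}")
```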
5. A random sample of 5 scores is collected with a sample mean of 8 and a sample variance of 9. Test the following hypotheses at the .05 level of significance:
$H_0: \sigma^2 = 5$    $H_1: \sigma^2 \neq 5$
Degrees of freedom (ν) are equal to n − 1, thus 5 – 1 = 4. The hypothesized population variance is provided to us (5). Plugging these values into the formula, we find the test statistic value:
$$\chi^2 = \frac{\nu s^2}{\sigma_0^2} = \frac{4(9)}{5} = 7.20$$
From Appendix Table 3, and using an alpha level of .05 (i.e., tabled values listed under .975 and .025) and degrees of freedom (ν) of 4, we determine the critical values to be .484419 and 11.1433. As the test statistic (7.20) falls between the critical values, our decision is to fail to reject the null hypothesis.
Multiple-Choice Chapter 10 (Bivariate Measures of Association) 1. Which of the following correlation coefficients indicates the weakest relationship? a. –.77 b. –.25 c. +.10 d. +.50 ANSWER: C 2. Which of the following correlation coefficients indicates the strongest relationship? a. –.77 b. –.25 c. +.10 d. +.50 ANSWER: A 3. Which of the following correlation coefficients indicates the weakest relationship? a. –.60 b. –.42 c. +.55 d. +.80 ANSWER: B 4. Which of the following correlation coefficients indicates the strongest relationship? a. –.91 b. –.59 c. +.05 d. +.95 ANSWER: D 5. Which of the following correlation coefficients indicates the weakest relationship? a. –.70 b. –.62 c. +.50 d. +.65 ANSWER: C
6. If the number of children and household income are strongly positively correlated, then those with more income tend to have more children. a. True b. False ANSWER: A 7. If the number of children and number of pets are strongly positively correlated, then those with more pets tend to have fewer children. a. True b. False ANSWER: B 8. If kindergarten readiness and IQ are strongly negatively correlated, then those with lower IQ tend to be more prepared for kindergarten. a. True b. False ANSWER: A 9. If work ethic and salary are strongly negatively correlated, then those with lower work ethic tend to be paid more. a. True b. False ANSWER: A 10. For variables X and Y, given a correlation coefficient of +.80, which one of the following is a correct statement? a. As X decreases, Y decreases b. As X decreases, Y increases c. As X increases, Y decreases d. X causes Y ANSWER: A
11. For variables X and Y, given a correlation coefficient of -.80, which one of the following is a correct statement? a. As X decreases, Y decreases b. As X increases, Y increases c. As X increases, Y decreases d. X causes Y ANSWER: C 12. For variables X and Y, given a correlation coefficient of -.95, which one of the following is a correct statement? a. As X decreases, Y decreases b. As X increases, Y increases c. As X increases, Y decreases d. X causes Y ANSWER: C 13. For variables X and Y, given a correlation coefficient of +.75, which one of the following is a correct statement? a. As X decreases, Y decreases b. As X decreases, Y increases c. As X increases, Y decreases d. X causes Y ANSWER: A 14. Mark wants to know if there is a relationship between men's height and shirt size. He randomly samples 100 men and gathers data on two variables: 1) height in inches (ratio measurement scale); and 2) shirt size (small, medium, large). Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: C
15. A civil engineer wants to know if there is a relationship between the number of stoplights in a city and the number of wrecks during a six-week period. He randomly samples 100 cities across the U.S. and gathers data on: 1) the number of stoplights in the city (ratio measurement scale) and 2) number of wrecks in that city during the past six weeks (ratio measurement scale). Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: A 16. Jasmine collects data on a random sample of 100 rollercoasters around the country. She collects data on the following: 1) height of highest peak on the coaster (measured in feet; ratio measurement scale) and 2) length of the coaster (measured in feet; ratio measurement scale). She is interested in examining if there is a relationship between the highest peak and length of the rollercoaster. Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: A 17. Aladdin wants to know if there is a relationship between the type of entrance exam completed and the graduate degree program that the student enters. He collects data from a random sample of 500 incoming UCF graduate students on the following: 1) type of graduate entrance exam completed (GRE, GMAT, MAT, other); and 2) graduate degree program the student is entering (master's, Ed.S., Ed.D., Ph.D.). Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: C
18. Matthew is a speech writer for politicians. He wants to know if there is a relationship between the number of words in campaign speeches and the number of votes obtained. He randomly samples 50 politicians and gathers data on two variables: 1) the number of words used in their last known campaign speech (ratio level measurement scale); and 2) the number of votes received (ratio level measurement scale). Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: A 19. Aladdin wants to know if the same proportion of master's and bachelor's degree recipients accepts jobs either in or out-of-state. He randomly samples 250 individuals who graduated in spring 2008 with their master's or bachelor's and collects data on the following: 1) degree attained (two categories: master's or bachelor's) and 2) place of employment (two categories: in-state or out-of-state). Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: B 20. Wesley is interested in seeing if there is a relationship between voter registration status (registered voter or not) and whether or not the individual believes the incumbent will win. He collects data from 2,000 voters on the following two variables: a) voter registration status (two categories: registered or not registered); and b) belief of whether or not the incumbent will win (two categories: incumbent will win or incumbent will not win). Which statistical procedure is most appropriate to use to test the hypothesis? a. Pearson correlation coefficient b. Phi coefficient c. Spearman or Kendall tau's correlation coefficient ANSWER: B
21. The variance of X is 16, the variance of Y is 25, and the covariance between X and Y is 4. What is rXY? a. .10 b. .25 c. .50 d. .75 ANSWER: A 22. The variance of X is 36, the variance of Y is 36, and the covariance between X and Y is 49. What is rXY? a. .20 b. .36 c. .42 d. .49 ANSWER: A 23. The variance of X is 25, the variance of Y is 64, and the covariance between X and Y is 81. What is rXY? a. .23 b. .30 c. .40 d. .56 ANSWER: A 24. The variance of X is 81, the variance of Y is 100, and the covariance between X and Y is 64. What is rXY? a. .40 b. .56 c. .64 d. .89 ANSWER: D
25. The variance of X is 4, the variance of Y is 16, and the covariance between X and Y is 36. What is rXY? a. .24 b. .48 c. .60 d. .75 ANSWER: D 26. A researcher has three variables and is computing three correlation coefficient hypothesis tests using those variables. The researcher was originally testing at an alpha of .05. If the researcher makes a Bonferroni adjustment, what will the alpha level be? Round to three decimal places. a. .017 b. .03 c. .05 d. .15 ANSWER: A 27. A researcher has five variables and is computing ten correlation coefficient hypothesis tests using those variables. The researcher was originally testing at an alpha of .05. If the researcher makes a Bonferroni adjustment, what will the alpha level be? Round to three decimal places. a. .005 b. .01 c. .05 d. .10 ANSWER: A 28. The rank biserial is a variation of which of the following correlation coefficients? a. Cramer's phi b. Kendall's tau c. Pearson d. Phi e. Point biserial f. Spearman ANSWER: E
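The Bonferroni adjustment in items 26 and 27 divides the original alpha by the number of hypothesis tests, and the number of pairwise correlations among k variables is k(k − 1)/2. A minimal sketch, with illustrative function names:

```python
import math

def pairwise_tests(num_variables):
    """Number of distinct pairwise correlations among the variables."""
    return math.comb(num_variables, 2)

def bonferroni_alpha(alpha, num_tests):
    """Per-test alpha after a Bonferroni adjustment."""
    return alpha / num_tests

for k in (3, 5):
    c = pairwise_tests(k)
    print(f"{k} variables, {c} tests: adjusted alpha = {bonferroni_alpha(0.05, c):.3f}")
```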
Short Answer Chapter 10 (Bivariate Measures of Association) 1. You are given the following pairs of sample scores on X and Y. Determine the Pearson correlation coefficient.
X: 8 4 6 9 1
Y: 7 6 8 6 5
2. You are given the following pairs of sample scores on X and Y. Determine the Pearson correlation coefficient using SPSS.
X: 20 30 25 40 50
Y: 400 300 250 500 100
3. If rXY = .70 for a random sample of size 20, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).
4. If rXY = .50 for a random sample of size 10, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .01 level of significance).
5. If rXY = .40 for a random sample of size 122, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .01 level of significance).
ANSWERS
1. You are given the following pairs of sample scores on X and Y. Determine the Pearson correlation coefficient.
X: 8 4 6 9 1
Y: 7 6 8 6 5

       X     X²    Y     Y²    XY
       8     64    7     49    56
       4     16    6     36    24
       6     36    8     64    48
       9     81    6     36    54
       1      1    5     25     5
Σ =   28    198   32    210   187

The Pearson product-moment correlation coefficient is computed as follows.
$$r_{XY} = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}}$$
$$r_{XY} = \frac{5(187) - (28)(32)}{\sqrt{\left[5(198) - (28)^2\right]\left[5(210) - (32)^2\right]}} = \frac{935 - 896}{\sqrt{(990 - 784)(1050 - 1024)}} = \frac{39}{73.185} = .533$$
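The computational formula above can be applied to raw scores in a few lines. The following is a minimal sketch using only the standard library; `pearson_r` is an illustrative name, and the second call uses the data from short-answer item 2.

```python
import math

def pearson_r(x, y):
    """Pearson correlation via the computational (raw-score) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

print(round(pearson_r([8, 4, 6, 9, 1], [7, 6, 8, 6, 5]), 3))                # 0.533
print(round(pearson_r([20, 30, 25, 40, 50], [400, 300, 250, 500, 100]), 3))  # -0.397
```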
2. You are given the following pairs of sample scores on X and Y. Determine the Pearson correlation coefficient using SPSS.
X: 20 30 25 40 50
Y: 400 300 250 500 100
Using SPSS, we compute the Pearson correlation to be −.397.

Correlations
                             X        Y
X    Pearson Correlation     1        −.397
     Sig. (2-tailed)                  .508
     N                       5        5
Y    Pearson Correlation     −.397    1
     Sig. (2-tailed)         .508
     N                       5        5
3. If rXY = .70 for a random sample of size 20, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance). The test statistic formula is computed as:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
Plugging in values, we find:
$$t = .70\sqrt{\frac{20 - 2}{1 - .49}} = .70(5.94) = 4.16$$
With ν = n – 2 = 20 – 2 = 18 degrees of freedom, an alpha of .05, and a two-tailed test, we find the critical values to be –2.101 and +2.101. Since the test statistic value (4.16) is larger than our critical value, we reject the null hypothesis. The population Pearson is statistically significantly different from zero.
4. If rXY = .50 for a random sample of size 10, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .01 level of significance). The test statistic formula is computed as:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
Plugging in values, we find:
$$t = .50\sqrt{\frac{10 - 2}{1 - .25}} = .50(3.266) = 1.633$$
With ν = n – 2 = 10 – 2 = 8 degrees of freedom, an alpha of .01, and a two-tailed test, we find the critical values to be –3.355 and +3.355. Since the test statistic value (1.633) is smaller than our critical value, we fail to reject the null hypothesis. The population Pearson is not statistically significantly different from zero.
5. If rXY = .40 for a random sample of size 122, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .01 level of significance). The test statistic formula is computed as:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
Plugging in values, we find:
$$t = .40\sqrt{\frac{122 - 2}{1 - .16}} = .40(11.952) = 4.781$$
With ν = n – 2 = 122 – 2 = 120 degrees of freedom, an alpha of .01, and a two-tailed test, we find the critical values to be –2.617 and +2.617. Since the test statistic value (4.781) is larger than our critical value, we reject the null hypothesis. The population Pearson is statistically significantly different from zero.
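The t test for a Pearson correlation in items 3 through 5 can be checked by evaluating the formula directly; note that the denominator uses r squared, not r. A minimal sketch follows, with an illustrative function name and with the critical values typed in from the text rather than computed.

```python
import math

def pearson_t(r, n):
    """t statistic for H0: rho = 0. Returns (t, degrees of freedom)."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df

# (r, n, two-tailed critical value) for items 3, 4, and 5
cases = [(0.70, 20, 2.101), (0.50, 10, 3.355), (0.40, 122, 2.617)]
for r, n, critical in cases:
    t, df = pearson_t(r, n)
    decision = "reject H0" if abs(t) > critical else "fail to reject H0"
    print(f"r = {r:.2f}, t({df}) = {t:.3f}: {decision}")
```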
Chapter 11 (One-Factor ANOVA: Fixed-Effects Model) Test Bank I. Multiple Choice Items 1. For a one-factor ANOVA comparing five groups with n = 30 in each group, the F ratio has degrees of freedom equal to a. 4, 120 b. 4, 145 c. 5, 149 d. 5, 145 ANSWER: B (dfbetw = J − 1 = 5 − 1 = 4, dfwith = N − J = 5*30 − 5 = 145.) 2. Suppose n1 = 15, n2 = 13, n3 = 17, n4 = 17. For a one-factor ANOVA, the dfwith would be a. 3 b. 4 c. 58 d. 61 ANSWER: C (dfwith = N − J = 15 + 13 + 17 + 17 − 4 = 58.) 3. Suppose n1 = 15, n2 = 13, n3 = 17, n4 = 17. For a one-factor ANOVA, the dfbetw would be a. 3 b. 4 c. 58 d. 61 ANSWER: A (dfbetw = J − 1 = 4 − 1 = 3.) 4. Suppose n1 = 15, n2 = 13, n3 = 17, n4 = 17. For a one-factor ANOVA, the dftotal would be a. 3 b. 4 c. 58 d. 61 ANSWER: D (dftotal = N − 1 = 15 + 13 + 17 + 17 − 1 = 61. Or, dftotal = dfwith + dfbetw = 58 + 3 = 61.) 5. In a one-factor ANOVA, if H0 is rejected, we conclude that a. all population means are different from one another. b. at least one pair of population means is different. c. all population means are equal. d. at least one pair of population means is the same.
ANSWER: B (H0 states that all population means are equal. If H0 is rejected, it means at least one pair of population means is not equal.) 6. In ANOVA, the variability between group means is estimated by a. SSbetw b. SSwith c. MSbetw d. MSwith ANSWER: C (MSbetw = SSbetw/dfbetw. It measures the average deviation of each group mean from the overall mean, thus estimating variance between group means.) 7. In ANOVA, the average deviation of all the scores from their respective group means is estimated by a. SStotal b. SSwith c. MStotal d. MSwith ANSWER: D (MSwith = SSwith/dfwith. It measures the average deviation of each score from the respective group, thus estimating variance within group means.) 8. In ANOVA, which of the following is used to determine the appropriate F ratio? a. SS values b. MS values c. df values d. E(MS) values ANSWER: D (The E(MS) values are used to determine the appropriate F ratio, which represents systematic plus error variability divided by error variability.)
9. A researcher used ANOVA to examine the effects of paper colors on reading speed. She prepared reading materials printed on three different colors of paper: white paper, yellow paper, and blue paper. She then randomly assigned 20 readers to each type of paper. How many factors are involved in this experiment, and how many levels are in the factor(s)? a. 1 factor with 3 levels b. 1 factor with 20 levels c. 3 factors with 20 levels d. 3 factors with 60 levels ANSWER: A (The color of the paper is the factor. There are three levels in the factor: white, yellow, and blue.) 10. Which of the following is a necessary assumption of the ANOVA model? a. Observations come from populations with equal means. b. Observations come from populations with equal variances. c. Sample sizes are equal for each group. d. Dependent variables are measured on a ratio scale. ANSWER: B (The homoscedasticity assumption states that observations come from populations with equal variances.) 11. If you find an F ratio of 0.5 in a one-factor ANOVA (α = .05), you will a. reject the null hypothesis. b. fail to reject the null hypothesis. c. not be able to decide because the df and critical value are unknown. d. conclude that a mistake has been made. ANSWER: B (An F ratio of 0.5 indicates the between-group variation is small compared to within-group variation and will lead to failing to reject H0 regardless of the df and critical value.) 12. If you find an F ratio of 2 in a one-factor ANOVA (α = .05), you will a. reject the null hypothesis. b. fail to reject the null hypothesis. c. not be able to decide because the df and critical value are unknown. d. conclude that a mistake has been made. ANSWER: C (An F ratio of 2 indicates the between-group variation is somewhat larger than within-group variation, and the means are somewhat different, but whether the difference is significant is uncertain.)
13. If you find an F ratio of −2 in a one-factor ANOVA (α = .05), you will a. reject the null hypothesis. b. fail to reject the null hypothesis. c. not be able to decide because the df and critical value are unknown. d. conclude that a mistake has been made. ANSWER: D (F is the ratio of two mean squares, each of which is a sum of squares divided by positive degrees of freedom, so it will never have negative values.) 14. For J = 2 and α = .05, if the result of the one-factor fixed-effects ANOVA is nonsignificant, then the result of the independent t test using the same data set and same α level will be a. significant. b. nonsignificant. c. uncertain. d. dependent on the sample sizes. ANSWER: B (When J = 2, doing ANOVA is equivalent to doing an independent t test.) 15. When analyzing mean differences between two samples, doing an independent t test instead of an ANOVA a. decreases the probability of a Type I error. b. increases the probability of a Type I error. c. does not change the probability of a Type I error. d. cannot be determined from the information provided. ANSWER: C (When J = 2, doing ANOVA is equivalent to doing an independent t test, thus the probability of a Type I error is the same.) 16. When analyzing mean differences between three samples, doing all pairwise independent t tests instead of ANOVA using the same α level a. decreases the probability of a Type I error. b. increases the probability of a Type I error. c. does not change the probability of a Type I error. d. cannot be determined from the information provided. ANSWER: B (When J = 3, there are three different independent t tests to be done. The Type I error accumulates across these tests.)
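The accumulation of Type I error in item 16 can be illustrated numerically. The sketch below counts the pairwise comparisons among J groups and evaluates 1 − (1 − α)^c, which is the familywise error rate under the simplifying assumption that the c tests are independent. Pairwise t tests on the same groups are not fully independent, so the figure is an approximation, and the function names are illustrative.

```python
import math

def num_pairwise_comparisons(num_groups):
    """Number of distinct pairs among the groups: J(J - 1) / 2."""
    return math.comb(num_groups, 2)

def familywise_error(alpha, num_tests):
    """Familywise Type I error rate, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

c = num_pairwise_comparisons(3)
print(c, round(familywise_error(0.05, c), 3))
```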
17. Suppose that for a one-factor ANOVA with J = 3 and n = 10, the three sample means are all equal to 4.5. What is the value of MSbetw? a. 0 b. 45 c. 135 d. Cannot be determined from the information provided. ANSWER: A (When all means are equal, there is no variation between group means, so SSbetw = 0 and MSbetw = 0.) 18. Suppose that for a one-factor ANOVA with J = 3 and n = 10, two of the sample means are both equal to 4.5, but the other sample mean is 4.8. What conclusion will you make? a. Reject the null hypothesis. b. Fail to reject the null hypothesis. c. Uncertain because the df and critical value are unknown. d. Uncertain because the variances of the samples are unknown. ANSWER: D (The means are not all the same, but whether the differences between the means are significant depends on how much variability there is between and within the groups.) 19. The homoscedasticity assumption states that a. the populations are normally distributed. b. the samples are normally distributed. c. the variances of each population are equal. d. the variances of each sample are equal. ANSWER: C (The homoscedasticity assumption states that observations come from populations with equal variances.) 20. Which assumption can be examined by Q-Q plots? a. Independence b. Homogeneity of variance c. Normality d. All of the above ANSWER: C (Q-Q plots graph quantiles of the theoretical normal distribution against quantiles of the sample distribution; thus normality is assessed.)
21. A researcher wanted to compare the average home prices in three different school districts. For each district, he recorded the average annual prices over the past 10 years (n = 10). Then he used a one-factor ANOVA to examine if prices are different between districts. Based on the design, what assumption of ANOVA has most likely been violated? a. Independence b. Homogeneity of variance c. Normality d. Equilibrium ANSWER: A (Observations over time are likely to be dependent on one another; thus independence is likely violated.) 22. In ANOVA, which of the following statements is possible? a. SStotal < SSwith b. dftotal < dfwith c. Both a and b are possible. d. Neither a nor b is possible. ANSWER: D (df and SS always add up to the total, so SStotal will never be smaller than SSwith, and dftotal will never be smaller than dfwith.) 23. In ANOVA, which of the following statements occurs when H0 is true? a. E(MSbetw) = E(MSwith) b. E(MSbetw) > E(MSwith) c. E(F) > 1 d. None of the above ANSWER: A (When H0 is true, E(MSbetw) = E(MSwith), E(F) = 1; when H0 is false, E(MSbetw) > E(MSwith), E(F) > 1.) 24. Which of the following statements about the α level is true? a. As the α level decreases, the critical F value will decrease as well. b. As the α level decreases, the computed F value will increase. c. If the α level is smaller than the p value, then the null hypothesis is rejected. d. The α level describes the highest risk of Type I error we are willing to take. ANSWER: D (The α level inversely affects the magnitude of the critical value, but is not related to the computed F value. We fail to reject the null if α < p.)
25. Using the same data with SPSS, Jamie and John found out that the p value of an ANOVA F test is 0.004. They both rejected the null hypothesis. However, Jamie used α = 0.05, whereas John used α = 0.01. Who has a higher probability of actually making a Type I error? a. Jamie b. John c. Both have the same probability. d. Uncertain ANSWER: C (p value measures the observed probability of making a Type I error. Since Jamie and John both reject the null, both of them have a probability of 0.004 of making Type I error, i.e., falsely rejecting a null hypothesis that is true.) 26. The ability of ANOVA to compare variation between groups is referring specifically to which one of the following? a. Comparing variation between the categories of the independent variable b. Comparing variation between the categories of the dependent variable c. Comparing variation within cases within the same category of the independent variable d. Comparing the relationship between the dependent variable and independent variable ANSWER: A 27. The ability of ANOVA to compare variation within groups is referring specifically to which one of the following? a. Comparing variation between the categories of the independent variable b. Comparing variation between the categories of the dependent variable c. Comparing variation within cases within the same category of the independent variable d. Comparing the relationship between the dependent variable and independent variable ANSWER: C 28. Which one of the following is appropriate for a one-way ANOVA? a. One continuous dependent variable and one categorical independent variable with two or more groups b. One categorical dependent variable and one categorical independent variable with two or more groups c. Two or more continuous dependent variables and one categorical independent variable with two or more groups d. One continuous dependent variable and two or more categorical independent variables with two or more groups ANSWER: A
Chapter 11 (One-Factor ANOVA: Fixed-Effects Model) Test Bank Short Answer Items 1. A consumer testing lab wants to compare the mean life of AA batteries produced by different manufacturers. Five brands of batteries are selected, and for each brand, 20 batteries are randomly sampled. The lab then tests for the lifetime of each battery (in hours) and compares the average battery life of different brands using ANOVA. Complete the following one-factor ANOVA summary table using α = .05. Based on the results, do batteries of different brands have different lifetimes?

Source     SS      df    MS    F     Critical Value and Decision
Between
Within                   10
Total      1110

2. A reading specialist would like to know whether the page layout has any consistent effect on children's reading speed. He printed the same story in three types of page layout (one-column, two-column, and three-column) and then randomly assigned 15 children to each group. The time each child took to finish reading is recorded and compared using the one-factor ANOVA model. Complete the following ANOVA summary table using α = .05. Based on the results, does page layout have an effect on the speed of reading?

Source     SS      df    MS    F     Critical Value and Decision
Between                  9     1.8
Within
Total
3. A consumer group wanted to determine if there was a difference in prices for a specific type of toy depending on where the toy was purchased. In the local area there are three main retailers: W-Mart, Tag, and URToy. For each retailer, the consumer group randomly selected five stores located in different parts of the city and collected their listed prices of that specific type of toy (in dollars). Assume that all stores priced their merchandise independently.

W-Mart    Tag    URToy
23        24     30
22        27     30
25        28     26
24        25     29
23        27     29

Use SPSS to conduct a one-factor ANOVA to determine if the prices are different across different retailers, using α = .05. Test the assumptions, plot the group means, consider an effect size, interpret the results, and write an APA-style summary.
4. A stock analyst wanted to compare the long-term return of stocks from different industries. She randomly selected eight stocks in each of the three industries of interest (financial, energy, utilities) and compiled the 10-year rate of return for each stock (assume the return for one stock is not dependent on the return for any other stock). Below are the data that were collected.

Financial    Energy    Utilities
10.76        12.72     10.88
15.05        14.91      5.86
17.01         6.43     12.46
 5.07        11.19      9.90
19.50        18.79      3.95
 8.16        20.73      3.44
10.38         9.60      7.11
 6.76        17.40     15.70

Use SPSS to conduct a one-factor ANOVA to determine if the returns are equal across industries (α = .05). Test the assumptions, plot the group means, consider an effect size, interpret the results, and write an APA-style summary.
5. A researcher was interested in comparing rental rates in four different parts of the city. She randomly selected a block from each part of the city. For each block, she collected the rental rates of different neighboring apartments. She then used a one-factor ANOVA to analyze her data. The ANOVA table below summarizes the results she obtained.

Source     SS      df    MS     F      Critical Value and Decision
Between    600     3     200    0.5    .05F3,26 = 2.975
Within     2600    26    100           Fail to reject H0
Total      3200    30

a. There are two mistakes in the ANOVA table. Identify the mistakes and correct them.
b. Based on the research design, do you think any assumption of ANOVA may have been violated in this study? If so, what assumption is being violated? What might be the consequences of the violation?
Solutions:
1. There are 5 brands, so J = 5. Each brand has 20 batteries sampled, so n = 20. N = 5*20 = 100. dfbetw = J − 1 = 5 − 1 = 4, dfwith = N − J = 100 − 5 = 95, dftotal = N − 1 = 100 − 1 = 99. SSwith = MSwith*dfwith = 10*95 = 950. SSbetw = SStotal − SSwith = 1110 − 950 = 160. MSbetw = SSbetw/dfbetw = 160/4 = 40. F = MSbetw/MSwith = 40/10 = 4, critical value = .05F4,95 = 2.47. Because F > critical F value, we reject H0 and conclude that batteries of different brands have different lifetimes.
Source    SS     df   MS   F   Critical Value and Decision
Between   160    4    40   4   .05F4,95 = 2.47
Within    950    95   10       Reject H0
Total     1110   99
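The arithmetic in Solution 1 can be retraced with a short sketch. The function name and dictionary keys below are illustrative choices, not part of the text, and the inputs (J = 5, n = 20, MSwith = 10, SStotal = 1110) are read off the solution and its summary table.

```python
def complete_anova_table(J, n, ms_within, ss_total):
    """Fill in a one-factor ANOVA table from J, n, MSwith, and SStotal."""
    N = J * n                           # total sample size
    df_between = J - 1
    df_within = N - J
    df_total = N - 1
    ss_within = ms_within * df_within   # SSwith = MSwith * dfwith
    ss_between = ss_total - ss_within   # SSbetw = SStotal - SSwith
    ms_between = ss_between / df_between
    f_ratio = ms_between / ms_within
    return {
        "df_between": df_between, "df_within": df_within, "df_total": df_total,
        "ss_between": ss_between, "ss_within": ss_within,
        "ms_between": ms_between, "F": f_ratio,
    }

# Values taken from Solution 1 (battery brands).
table = complete_anova_table(J=5, n=20, ms_within=10, ss_total=1110)
print(table)
```

The critical value (.05F4,95 = 2.47) still has to come from an F table or statistical software; the sketch only reproduces the table entries.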
2. There are 3 different types of page layout, so J = 3. There were 15 children in each group, so n = 15. N = 3*15 = 45. dfbetw = J − 1 = 3 − 1 = 2, dfwith = N − J = 45 − 3 = 42, dftotal = N − 1 = 45 − 1 = 44. Because F = MSbetw/MSwith, MSwith = MSbetw/F = 9/1.8 = 5. SSbetw = MSbetw*dfbetw = 9*2 = 18, SSwith = MSwith*dfwith = 5*42 = 210, SStotal = SSbetw + SSwith = 18 + 210 = 228. Critical value .05F2,42 = 3.22. Because F = 1.8 < critical F value, we fail to reject H0 and conclude that page layout does not have a significant effect on the speed of reading.
Source    SS    df   MS   F     Critical Value and Decision
Between   18    2    9    1.8   .05F2,42 = 3.22
Within    210   42   5          Fail to reject H0
Total     228   44
3. Procedure:
1) Create a data set with two variables, Prices and Retailer. The data set should have 15 cases, each case representing one store.
2) Go to Analyze → General Linear Model → Univariate. Select Prices as the Dependent Variable and Retailer as the Fixed Factor. Go to Plot and select Retailer to the Horizontal Axis, then Add, to get a profile plot. Go to Save and check Unstandardized under Residuals to save model residuals. Go to Options. Check Estimates of effect size to get effect size estimates. Check Homogeneity tests to examine the assumption of homoscedasticity.
3) To examine the assumption of independence, go to Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define. Select RES_1 as the Y Axis, and Retailer as the X Axis. To examine the assumption of normality, go to Analyze → Descriptive Statistics → Explore. Select RES_1 to Dependent List. Go to Plots and check Normality plots with tests.
Selected SPSS Output:
Tests of Between-Subjects Effects
Dependent Variable: Prices
Source            Type III Sum of Squares   df   Mean Square   F        Sig.   Partial Eta Squared
Retailer          72.933                    2    36.467        16.328   .000   .731
Error             26.800                    12   2.233
Corrected Total   99.733                    14
Levene's Test of Equality of Error Variances
Dependent Variable: Prices
F      df1   df2   Sig.
.467   2     12    .638
[Figures: Profile Plot; Residual Plot by Group; Q-Q Plot of Residuals]
A one-way ANOVA was conducted to determine if the prices of a certain type of toy differed in three major retail stores. The Q-Q plot of residuals showed that the points clustered close to the diagonal line, suggesting that the assumption of normality was reasonable. According to Levene's test, the homogeneity of variance assumption was satisfied [F(2, 12) = .467, p = .638]. The scatterplot of residuals against the levels of the independent variable demonstrated a random display of points around 0, providing evidence that the assumption of independence was satisfied. From the ANOVA summary table, we see that the prices are significantly different across the three retailers (F = 16.328, df = 2, 12, p < .001); the effect size is rather large (η² = .731, suggesting about 73.1% of the variance in prices is accounted for by the differences in retailers). The profile plot suggested that the price is the lowest in W-Mart, higher in Tag, and the highest in URToy.
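The SPSS figures reported for Question 3 can be cross-checked by hand. The following is a minimal sketch in plain Python rather than the SPSS procedure itself; the function names are illustrative, and Levene's test is assumed here to be the mean-centered version (a one-way ANOVA on absolute deviations from each group mean), which reproduces the reported value for these data.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F, eta squared) for a list of groups."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return ss_between, ss_within, f_ratio, eta_squared


def levene_f(groups):
    """Mean-centered Levene statistic: ANOVA on absolute deviations (assumption)."""
    abs_dev = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    return one_way_anova(abs_dev)[2]


# Toy prices from Question 3, one list per retailer.
prices = {
    "W-Mart": [23, 22, 25, 24, 23],
    "Tag": [24, 27, 28, 25, 27],
    "URToy": [30, 30, 26, 29, 29],
}
groups = list(prices.values())
ss_b, ss_w, f_ratio, eta_sq = one_way_anova(groups)
levene = levene_f(groups)
print(f"SSbetw = {ss_b:.3f}, SSwith = {ss_w:.3f}, F = {f_ratio:.3f}")
print(f"eta squared = {eta_sq:.3f}, Levene F = {levene:.3f}")
```

The p values are left out because they require the F distribution, which the standard library does not provide.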
4. Procedure: Create a data set with two variables, Returns and Industry. The data set should have 24 cases, each representing one stock. Follow the procedures described in Question 3, using Returns as the Dependent Variable, and Industry as the Fixed Factor.
Selected SPSS Output:
Tests of Between-Subjects Effects
Dependent Variable: Return
Source            Type III Sum of Squares   df   Mean Square   F       Sig.   Partial Eta Squared
Industry          113.115                   2    56.557        2.471   .109   .190
Error             480.730                   21   22.892
Corrected Total   593.845                   23
Levene's Test of Equality of Error Variances
Dependent Variable: Return
F      df1   df2   Sig.
.156   2     21    .857
[Figures: Profile Plot; Residual Plot by Group; Q-Q Plot of Residuals]
A one-way ANOVA was conducted to determine if the returns of stocks differed across three major industries. The Q-Q plot of residuals showed that the points clustered close to the diagonal line, suggesting that the assumption of normality was reasonable. According to Levene's test, the homoscedasticity assumption was satisfied [F(2, 21) = .156, p = .857]. The scatterplot of residuals against the levels of the independent variable demonstrated a random display of points around 0, providing evidence that the assumption of independence was satisfied. From the ANOVA summary table, we see that the rates of return are not significantly different across the three industries (F = 2.471, df = 2, 21, p = .109). However, the effect size is adequate (η² = .190, suggesting about 19% of the variance in rate of return is accounted for by industry). The profile plot suggested that the rate of return is the lowest in Utilities, higher in Financial, and the highest in Energy.
5. a. dftotal = N − 1 = 29. F = MSbetw/MSwith = 2. b. The assumption of independence may have been violated. The rental rates of neighboring apartments in the same block are possibly related to one another. Violation of this assumption would lead to increased likelihood of a Type I and/or Type II error in the F statistic; it also influences standard errors of means and thus inferences about those means.
Chapter 12 (Multiple Comparison Procedures) Test Bank (25 multiple choice items and 5 short answer items)
I. Multiple Choice
1. How many possible pairwise contrasts are there if J = 5? a. 5 b. 10 c. 15 d. 20 ANSWER: B (When J = 5, there are 10 possible pairs of group means: 1 vs. 2, 1 vs. 3, 1 vs. 4, 1 vs. 5, 2 vs. 3, 2 vs. 4, 2 vs. 5, 3 vs. 4, 3 vs. 5, and 4 vs. 5.)
2. How many possible pairwise contrasts are there if J = 3? a. 3 b. 4 c. 5 d. 6 ANSWER: A
3. Which of the following linear combinations of population means is a legitimate contrast? a. − + b. ( + ) + c. ( + ) − d. − − ANSWER: C (For a legitimate contrast, the sum of the contrast coefficients must be equal to 0.)
4. If J = 3, which of the following sets of contrasts is orthogonal? a. − − b. + − c. − − ( + ) d. − − ( + ) ANSWER: D (For a, Σcjcj' = 2. For b, + is not a legitimate contrast. For c, Σcjcj' = 3/2. For d, both contrasts are legitimate and Σcjcj' = 0.)
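The counts in items 1 and 2 follow from J(J − 1)/2, and the pairs themselves can be listed directly. A small sketch, with a helper name chosen only for illustration:

```python
from itertools import combinations


def pairwise_contrasts(J):
    """List every pair of group labels (1..J) that defines a pairwise contrast."""
    return list(combinations(range(1, J + 1), 2))


pairs_j5 = pairwise_contrasts(5)
pairs_j3 = pairwise_contrasts(3)
print(len(pairs_j5), pairs_j5)
print(len(pairs_j3), pairs_j3)
```

The length of each list equals J(J − 1)/2, which is the quantity the two items ask for.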
5. When a one-factor fixed-effects ANOVA results in a nonsignificant F ratio for J = 4, one should follow up the ANOVA with which one of the following post-hoc procedures? a. Scheffé method b. Tukey HSD method c. Fisher LSD method d. None of the above ANSWER: D (When the omnibus ANOVA F is not significant, there is no need to conduct post-hoc MCPs.)
6. A one-factor fixed-effects ANOVA results in the following: F(1, 17) = .011, p = .918. Which post-hoc procedure should be used as a follow-up? a. Scheffé method b. Tukey HSD method c. Fisher LSD method d. None of the above ANSWER: D (When the omnibus ANOVA F is not significant, there is no need to conduct post-hoc MCPs.)
7. Applying the Dunn procedure, the per contrast alpha is calculated to be 0.01 for five contrasts. What is the nominal family-wise error rate? a. .002 b. .005 c. .01 d. .05 ANSWER: D (For the Dunn procedure, the per contrast alpha is calculated as the nominal alpha divided by the number of contrasts; so the nominal alpha is calculated as the per contrast alpha times the number of contrasts, or .01*5 = .05.)
8. Applying the Dunn procedure, what is the per contrast alpha if the nominal alpha is 0.05 and there are three contrasts? a. .02 b. .05 c. .10 d. .15 ANSWER: A (For the Dunn procedure, the per contrast alpha is calculated as the nominal alpha divided by the number of contrasts; thus .05/3 = .0167, or approximately .02.)
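The Dunn (Bonferroni) arithmetic behind items 7 and 8 is a single division or multiplication. The two helper names below are illustrative only.

```python
def per_contrast_alpha(nominal_alpha, n_contrasts):
    """Dunn (Bonferroni): split the nominal alpha evenly across the contrasts."""
    return nominal_alpha / n_contrasts


def nominal_alpha(alpha_per_contrast, n_contrasts):
    """Recover the nominal family-wise alpha from the per contrast alpha."""
    return alpha_per_contrast * n_contrasts


item7 = nominal_alpha(0.01, 5)        # item 7: five contrasts at .01 each
item8 = per_contrast_alpha(0.05, 3)   # item 8: .05 split across three contrasts
print(round(item7, 4), round(item8, 4))
```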
9. For any pairwise MCP, holding the number of groups and dfwith constant, as the α level increases, the critical value will a. increase. b. decrease. c. not change. d. increase or decrease depending on which MCP is used. ANSWER: B (As α increases, the critical value will decrease and it becomes easier to reject H0.)
10. Which of the following statements about post hoc tests is true? a. We should always do post hoc tests following an omnibus ANOVA F test. b. We should always do post hoc tests following a significant ANOVA F test. c. With post hoc comparisons, we can conduct only pairwise tests. d. We should always do post hoc tests following a significant ANOVA F test and when J > 2. ANSWER: D (We should do post hoc tests when the omnibus F test is significant, and J > 2. Both pairwise and complex contrasts can be tested with post hoc MCPs.)
11. Which of the following statements about trend analysis is true? a. The independent variable should not be on the nominal scale. b. If the factor has three levels, we can test the linear trend, the quadratic trend, and the cubic trend. c. The set of contrasts are not necessarily orthogonal. d. If there is a significant linear trend, there cannot be a significant quadratic trend. ANSWER: A (If J = 3, there are only two possible trends to be tested. The contrasts in trend analysis are always orthogonal. It is possible to have both significant linear and quadratic trends.)
12. For pairwise post hoc contrasts with unequal group variances, which of the following MCPs is most appropriate? a. Scheffé b. Kaiser-Bowden c. Tukey HSD d. Games-Howell ANSWER: D (See the flowchart of MCPs.)
13. A researcher used Fisher's LSD to test three contrasts: − − − ( + ). Evaluate this practice. a. This practice is problematic because Fisher's LSD is always too liberal. b. This practice is problematic because Fisher's LSD can be used only to test simple contrasts. c. This practice is problematic because the contrasts are not orthogonal. d. I do not see any problem with this practice. It is great! ANSWER: B (Fisher's LSD is designed to test pairwise contrasts. It has good control of Type I error when J = 3, but becomes too liberal when J > 3.) 14. Which one of the following procedures does not assume equal group variances? a. Scheffé b. Kaiser-Bowden c. Tukey HSD d. Tukey-Kramer ANSWER: B (Kaiser-Bowden can be used for complex contrasts when group variances are not equal. See Figure 12.2.) 15. Which of the following is not an assumption of Tukey HSD? a. equal sample sizes (n) b. equal group variances c. equal group means d. All of the above are assumptions for the test. ANSWER: C (Tukey HSD requires equal n and equal variances; we hope the means are different.)
16. Following a significant omnibus ANOVA F test, a Scheffé MCP is conducted. Which of the following results is never possible? a. Only one pairwise contrast is significant. b. Only one complex contrast is significant. c. None of the pairwise contrasts is significant. d. None of the linear contrasts is significant. ANSWER: D (Scheffé is the only MCP that is consistent with the omnibus F; if F is significant, at least one linear combination of means will be significant) 17. If 5 orthogonal contrasts can be tested, what is the number of levels of the independent variable? a. 1 b. 4 c. 5 d. 6 ANSWER: D (The number of orthogonal contrasts is one less than the number of levels of the independent variable. In this case J − 1 = 5, so J = 5 + 1 = 6) 18. Which of the following procedures can be used to test complex contrasts? a. Tukey HSD b. Dunnett's method c. Fisher's LSD d. Scheffé method ANSWER: D (The Tukey, Dunnett, and Fisher methods are all designed to test pairwise contrasts. Only the Scheffé method can be used to test complex contrasts.) 19. In a study of different instructional methods, if the researcher is interested only in comparing the scores of the class taught by the traditional method against each of the classes taught by the alternative methods, which procedure should she use? a. Dunnett method b. Dunn (Bonferroni) method c. Tukey HSD d. Trend analysis ANSWER: A (The Dunnett method is designed to test pairwise contrasts where a control group, in this case the class taught by the traditional method, is compared to each of the other groups.)
20. A researcher conducts a study of different instructional methods. If the researcher is interested in comparing the scores of the class taught by the traditional method against the average of the classes taught by all the alternative methods, which procedure should she use? a. Dunnett method b. Dunn (Bonferroni) method c. Tukey HSD d. Trend analysis ANSWER: B (It is a planned complex comparison.) 21. A researcher conducts a study of different instructional methods. The researcher wants to examine if student scores will increase as the instructional time increases. She randomly assigned 60 students to three classes. Each class spent 4 hours, 5 hours, and 6 hours per day, respectively, on instruction. Then she compared the average scores of the three classes. Which procedure is the most appropriate for this research question? a. Dunnett method b. Dunn (Bonferroni) method c. Tukey HSD d. Trend analysis ANSWER: D (The groups represent different quantitative levels of a ratio-scaled variable, i.e., instructional time.) 22. A researcher conducts a study of different instructional methods. The researcher finds that student scores increase significantly when instructional time increases, so she predicts that scores will be even higher if the instructional time is 12 hours per day. Evaluate the prediction. a. The prediction is problematic because 4, 5, 6, and 12 are not equally spaced. b. The prediction is problematic because the classes with longer instructional time may not have the same number of students. c. The prediction is problematic because we have no way of knowing what the trend is outside of the range of 4 to 6 hours of instructional time. d. The prediction is accurate because it is based on empirical evidence. ANSWER: C (The trend may or may not be the same outside of the range of levels investigated.)
23. After a significant F test, two different pairwise comparison procedures were used to detect the differences between 5 group means. Procedure A detected one significant contrast, whereas procedure B showed that all contrasts are significant. Which procedure is more powerful? a. procedure A b. procedure B c. equally powerful d. cannot be determined based on the information given ANSWER: B (With more powerful procedures, it is easier to find significant results.)
24. In post hoc pairwise comparisons, if procedure A yields wider confidence intervals than procedure B does, which procedure is more powerful? a. procedure A b. procedure B c. equally powerful d. cannot be determined based on the information given ANSWER: B (More powerful procedures tend to generate narrower confidence intervals.)
25. After a significant F test, a researcher used the Scheffé method to test all pairwise contrasts and found that all such contrasts are significant. He then repeated MCP with the Tukey method. What would he find? a. All pairwise contrasts are significant. b. Fewer contrasts are significant. c. None of the pairwise contrasts are significant. d. Anything is possible. ANSWER: A (When testing all pairwise contrasts, Scheffé is more conservative than Tukey. The contrasts found to be significant by Scheffé would also be significant by Tukey.)
26. A researcher finds a significant omnibus F test with J = 4. Which of the following statements is true? a. You should always do a priori comparisons before an ANOVA. b. You should always do post hoc tests after an ANOVA. c. You should always do post hoc tests to determine why H0 was rejected. d. You should always do post hoc tests to determine why H0 was not rejected. ANSWER: C (A priori comparisons are usually conducted without regard to omnibus ANOVA. Post hoc tests are conducted when H0 is rejected in the omnibus test and J > 2.)
27. In an experiment, Ȳ1 = 10, Ȳ2 = 20, and Ȳ3 = 40. Tukey HSD shows that − is a significant contrast. What will you find out about − and − if the same procedure and same α level are used? a. Only − is significant. b. Only − is significant. c. Both contrasts are significant. d. None of the contrasts are necessarily significant. ANSWER: C (Using the same procedure and α level, if Ȳ2 − Ȳ1 = 10 is significant, any pairwise contrasts with larger group differences will be significant.)
28. In an experiment, Ȳ1 = 10, Ȳ2 = 200, and Ȳ3 = 4000. Which pairwise contrast will necessarily be significant? a. − b. − c. − d. None of the contrasts will necessarily be significant. ANSWER: D (We do not know the values of the standard error, critical value, etc.)
Short Answer Chapter 12 (Multiple Comparison Procedures)
1. A one-factor fixed-effects ANOVA is performed on data for 5 groups of unequal sizes, and H0 is rejected at the .05 level of significance. Using the Scheffé procedure, test the contrast Ȳ.1 − (Ȳ.2 + Ȳ.3)/2 at the .05 level of significance given the following information: dfwith = 30, Ȳ.1 = 5, n1 = 6, Ȳ.2 = 7.5, n2 = 6, Ȳ.3 = 8.5, n3 = 6, and MSwith = 9.
ANSWER: For the contrast ψ = Ȳ.1 − (Ȳ.2 + Ȳ.3)/2, c1 = 1, c2 = −1/2, c3 = −1/2; n1 = n2 = n3 = 6; dfwith = 30; MSwith = 9; Ȳ.1 = 5, Ȳ.2 = 7.5, Ȳ.3 = 8.5; α = .05; J = 5. Therefore,
contrast: ψ = 5 − (7.5 + 8.5)/2 = −3;
standard error of contrast: sψ = √[MSwith (c1²/n1 + c2²/n2 + c3²/n3)] = √[9(1/6 + 1/24 + 1/24)] = 3/2;
t = ψ/sψ = −3/(3/2) = −2;
critical values: ±√[(J − 1)(αF(J − 1, dfwith))] = ±√[4(.05F4,30)] = ±3.28.
Because |t| = 2 < 3.28, we fail to reject the null hypothesis and conclude that the contrast is not significant at the .05 level of significance.
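The Scheffé computation above can be retraced numerically. This is a sketch under one stated assumption: the tabled value .05F4,30 = 2.69 is supplied by hand (it is the value implied by the reported critical value of 3.28), since the standard library has no F distribution. The function name is illustrative.

```python
import math


def scheffe_contrast(means, ns, coefs, ms_within, J, f_crit):
    """Return (contrast, standard error, t, Scheffe critical value)."""
    psi = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(ms_within * sum(c ** 2 / n for c, n in zip(coefs, ns)))
    t = psi / se
    critical = math.sqrt((J - 1) * f_crit)   # sqrt[(J - 1) * F(J - 1, dfwith)]
    return psi, se, t, critical


psi, se, t, critical = scheffe_contrast(
    means=[5, 7.5, 8.5],
    ns=[6, 6, 6],
    coefs=[1, -0.5, -0.5],
    ms_within=9,
    J=5,              # five groups in the design, although the contrast uses three
    f_crit=2.69,      # assumed table value for .05F4,30
)
reject = abs(t) > critical
print(f"psi = {psi:.2f}, se = {se:.2f}, t = {t:.2f}, critical = {critical:.2f}, reject = {reject}")
```

Note that J is the number of groups in the whole design, not the number of groups with nonzero coefficients, which is why the critical value uses J − 1 = 4.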
2. A one-factor fixed-effects ANOVA is performed on data from three groups of equal size (n = 15) and equal variances, and H0 is rejected at the .05 level. The following values were computed: MSwith = 60 and the sample means are Ȳ.1 = 25, Ȳ.2 = 20, and Ȳ.3 = 12. Use the Tukey HSD method to test all possible pairwise contrasts (α = .05).
ANSWER: J = 3, n1 = n2 = n3 = 15; dfwith = n1 + n2 + n3 − J = 42, MSwith = 60; α = .05; Ȳ.1 = 25, Ȳ.2 = 20, Ȳ.3 = 12. Because J = 3, there are three possible pairwise contrasts: 1 vs. 2, 1 vs. 3, 2 vs. 3. For each contrast, q = (Ȳ.j − Ȳ.j′)/sψ, where Ȳ.j and Ȳ.j′ are the two group means to be compared. Because the standard error is sψ = √(MSwith/n) = √(60/15) = 2,
• q1 = (Ȳ.1 − Ȳ.2)/sψ = (25 − 20)/2 = 2.5
• q2 = (Ȳ.1 − Ȳ.3)/sψ = (25 − 12)/2 = 6.5
• q3 = (Ȳ.2 − Ȳ.3)/sψ = (20 − 12)/2 = 4
Using Tukey HSD, the critical values are ±q(dfwith, J) = ±q(60, 3) = ±3.40. q1 < critical q, so μ1 − μ2 is not statistically significant at α = .05. q2 > critical q and q3 > critical q, so μ1 − μ3 and μ2 − μ3 are statistically significant at α = .05.
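The three q statistics can be reproduced with a short sketch. The critical value 3.40 is copied from the answer above rather than computed, and the function and variable names are illustrative.

```python
import math
from itertools import combinations


def tukey_q(means, n, ms_within):
    """Studentized range statistic q for every pair of group means (equal n)."""
    se = math.sqrt(ms_within / n)
    return {(i, j): (means[i] - means[j]) / se
            for i, j in combinations(sorted(means), 2)}


group_means = {1: 25, 2: 20, 3: 12}
q_stats = tukey_q(group_means, n=15, ms_within=60)
q_critical = 3.40   # taken from the answer text, not computed here
significant = {pair: abs(q) > q_critical for pair, q in q_stats.items()}
for pair, q in q_stats.items():
    print(pair, q, "significant" if significant[pair] else "not significant")
```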
3. Dr. Guinea Pigg would like to test if the amount of caffeine intake would affect how well laboratory rats performed on a specific task. He randomly assigned 40 rats to five groups (n = 8). Each rat walked the same maze after drinking caffeinated water. The amount of caffeine the rats took and the number of mistakes they made in walking the maze were recorded in the following table.
Number of mistakes, by dosage
10 mg   20 mg   30 mg   40 mg   50 mg
16      16      2       5       7
5       10      9       8       12
11      7       11      1       14
23      12      13      5       16
18      7       10      8       11
12      4       13      11      9
12      23      9       9       19
19      13      9       9       24
Using the data, conduct a trend analysis to test if there is any linear, quadratic, or cubic trend (α = .05).
ANSWER: Procedure: Create a data set with two variables, Dosage and Mistakes. The data set should have 40 cases, each representing one rat. Go to Analyze → General Linear Model → Univariate. Select Mistakes as the Dependent Variable and Dosage as the Fixed Factor. Go to Contrasts and select Polynomial in the pull-down menu under Change Contrasts. Click Change, then Continue. To plot the trend, go to Plots, select Dosage to Horizontal Axis. Click Add, then Continue.
Selected SPSS Output:
Results of Trend Analysis
Dependent Variable: Mistakes
Dosage Polynomial Contrast   Contrast Estimate   Hypothesized Value   Difference (Estimate − Hypothesized)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
Linear                       −1.739              0                    −1.739                                 1.738        .324   −5.268               1.790
Quadratic                    5.212               0                    5.212                                  1.738        .005   1.683                8.740
Cubic                        2.688               0                    2.688                                  1.738        .131   −.841                6.217
A trend analysis was conducted to determine if there was any significant linear, quadratic, or cubic trend in the number of mistakes made as the dosage of caffeine increases (α = .05). From the summary table, we see that the linear trend was not significant (p = .324), nor was the cubic trend significant (p = .131). However, the quadratic trend was statistically significant (p = .005). The profile plot suggested that the number of mistakes made decreased as the caffeine intake increased from 10 mg to 40 mg, yet increased drastically as the intake further increased to 50 mg.
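The contrast estimates and standard error in the trend analysis can be recomputed from the raw data. The sketch below assumes the standard polynomial coefficients for five equally spaced levels, rescaled to unit length; that scaling is an assumption about how the reported estimates were produced, and it does reproduce them for these data. Variable names are illustrative.

```python
import math

# Number of mistakes per rat, keyed by caffeine dosage in mg (Question 3 data).
mistakes = {
    10: [16, 5, 11, 23, 18, 12, 12, 19],
    20: [16, 10, 7, 12, 7, 4, 23, 13],
    30: [2, 9, 11, 13, 10, 13, 9, 9],
    40: [5, 8, 1, 5, 8, 11, 9, 9],
    50: [7, 12, 14, 16, 11, 9, 19, 24],
}
# Polynomial contrast coefficients for J = 5 equally spaced levels.
coefficients = {
    "linear": [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
    "cubic": [-1, 2, 0, -2, 1],
}

groups = list(mistakes.values())
n = len(groups[0])
means = [sum(g) / n for g in groups]
ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
ms_within = ss_within / (len(groups) * n - len(groups))
std_error = math.sqrt(ms_within / n)   # same for every unit-length contrast

estimates = {}
for name, c in coefficients.items():
    length = math.sqrt(sum(x * x for x in c))   # rescale to unit length
    estimates[name] = sum(x * m for x, m in zip(c, means)) / length

for name, est in estimates.items():
    print(f"{name}: estimate = {est:.3f}, t = {est / std_error:.3f}")
print(f"standard error = {std_error:.3f}")
```

Each t statistic would be referred to a t distribution with dfwith = 35 to obtain the significance values shown in the table.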
4. In the experiment described in Question 3, instead of a planned test, Dr. Pigg conducted an omnibus ANOVA F test at α = .05 and found there was a significant difference between the group means. Suppose the group variances are equal. Use the Tukey method to test all possible pairwise contrasts.
ANSWER: Procedure: Use the data set created for Question 3. Go to Analyze → General Linear Model → Univariate. Select Mistakes as the Dependent Variable and Dosage as the Fixed Factor. Go to Post Hoc and select Dosage for Post Hoc Tests for. Check Tukey under Equal Variances Assumed, then click Continue.
Selected SPSS Output:
Tests of Between-Subjects Effects
Dependent Variable: Mistakes
Source            Type III Sum of Squares   df   Mean Square   F       Sig.
Dosage            314.400                   4    78.600        3.252   .023
Error             846.000                   35   24.171
Corrected Total   1160.400                  39
Multiple Comparisons
Dependent Variable: Mistakes
Tukey HSD
(I) Dosage   (J) Dosage   Mean Difference (I−J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
1 10mg       2 20mg       3.00                    2.458        .740    −4.07                10.07
1 10mg       3 30mg       5.00                    2.458        .272    −2.07                12.07
1 10mg       4 40mg       7.50*                   2.458        .033    .43                  14.57
1 10mg       5 50mg       .50                     2.458        1.000   −6.57                7.57
2 20mg       1 10mg       −3.00                   2.458        .740    −10.07               4.07
2 20mg       3 30mg       2.00                    2.458        .925    −5.07                9.07
2 20mg       4 40mg       4.50                    2.458        .373    −2.57                11.57
2 20mg       5 50mg       −2.50                   2.458        .846    −9.57                4.57
3 30mg       1 10mg       −5.00                   2.458        .272    −12.07               2.07
3 30mg       2 20mg       −2.00                   2.458        .925    −9.07                5.07
3 30mg       4 40mg       2.50                    2.458        .846    −4.57                9.57
3 30mg       5 50mg       −4.50                   2.458        .373    −11.57               2.57
4 40mg       1 10mg       −7.50*                  2.458        .033    −14.57               −.43
4 40mg       2 20mg       −4.50                   2.458        .373    −11.57               2.57
4 40mg       3 30mg       −2.50                   2.458        .846    −9.57                4.57
4 40mg       5 50mg       −7.00                   2.458        .053    −14.07               .07
5 50mg       1 10mg       −.50                    2.458        1.000   −7.57                6.57
5 50mg       2 20mg       2.50                    2.458        .846    −4.57                9.57
5 50mg       3 30mg       4.50                    2.458        .373    −2.57                11.57
5 50mg       4 40mg       7.00                    2.458        .053    −.07                 14.07
After a significant F is found, Tukey HSD was used to test all pairwise contrasts (α = .05). There is only one significant pairwise comparison: Group 1 vs. Group 4 (p = .033). (Also notice the 95% confidence interval does not include 0.) That means the number of mistakes made by rats with 10 mg of caffeine intake is significantly different from that made by rats with 40 mg of caffeine intake.
5. Consider the situation where there are J = 3 groups of subjects. Answer the following questions: a. Construct a set of orthogonal contrasts and show that they are orthogonal. b. Construct a set of contrasts for Dunnett’s test (use group 1 as the reference group). Are these contrasts orthogonal? Why? c. Construct a set of contrasts for pairwise comparisons. Are these contrasts orthogonal? Why?
ANSWER:
a. One possible set of orthogonal contrasts: ψ1 = μ.1 − (μ.2 + μ.3)/2, ψ2 = μ.2 − μ.3
                             c1    c2     c3     Σ
ψ1 = μ.1 − (μ.2 + μ.3)/2     +1    −1/2   −1/2   = 0
ψ2 = μ.2 − μ.3               0     +1     −1     = 0
Σ(cj cj')                    0     −1/2   +1/2   = 0
Because Σcj = 0 for each contrast, all contrasts are legitimate. Because Σ(cj cj') = 0 for the pair of contrasts, the set of contrasts is orthogonal.
b. The set of contrasts for Dunnett's test: ψ1 = μ.1 − μ.2, ψ2 = μ.1 − μ.3
                    c1    c2    c3    Σ
ψ1 = μ.1 − μ.2      +1    −1    0     = 0
ψ2 = μ.1 − μ.3      +1    0     −1    = 0
Σ(cj cj')           +1    0     0     = 1
Because Σ(cj cj') = 1 for the pair of contrasts, the set of contrasts is not orthogonal.
c. The set of contrasts for pairwise comparisons: ψ1 = μ.1 − μ.2, ψ2 = μ.1 − μ.3, ψ3 = μ.2 − μ.3
                         c1    c2    c3    Σ
ψ1 = μ.1 − μ.2           +1    −1    0     = 0
ψ2 = μ.1 − μ.3           +1    0     −1    = 0
ψ3 = μ.2 − μ.3           0     +1    −1    = 0
Σ(cj cj'), ψ1 and ψ2     +1    0     0     = 1
Σ(cj cj'), ψ1 and ψ3     0     −1    0     = −1
Σ(cj cj'), ψ2 and ψ3     0     0     +1    = 1
Because Σ(cj cj') ≠ 0 for each pair of contrasts, the set of contrasts is not orthogonal.
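The legitimacy and orthogonality checks in this answer are mechanical, so they are easy to verify with a sketch. The helper names are illustrative, equal group sizes are assumed (so orthogonality reduces to a zero sum of coefficient cross products), and exact fractions are used to avoid rounding in the ±1/2 coefficients.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)


def is_legitimate(contrast):
    """A contrast is legitimate when its coefficients sum to zero."""
    return sum(contrast) == 0


def cross_products(contrasts):
    """Sum of coefficient cross products for every pair of contrasts."""
    return [sum(a * b for a, b in zip(c1, c2))
            for c1, c2 in combinations(contrasts, 2)]


def is_orthogonal(contrasts):
    """Orthogonal set: all contrasts legitimate and all cross-product sums zero."""
    return (all(is_legitimate(c) for c in contrasts)
            and all(p == 0 for p in cross_products(contrasts)))


contrast_sets = {
    "a (orthogonal set)": [[1, -HALF, -HALF], [0, 1, -1]],
    "b (Dunnett)": [[1, -1, 0], [1, 0, -1]],
    "c (pairwise)": [[1, -1, 0], [1, 0, -1], [0, 1, -1]],
}
for label, contrasts in contrast_sets.items():
    products = [int(p) for p in cross_products(contrasts)]
    print(label, products, is_orthogonal(contrasts))
```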
Chapter 13 (Factorial ANOVA: Fixed-Effects Model) Test Bank (25 multiple choice items)
I. Multiple Choice
1. For a three-factor fixed-effects ANOVA, how many F tests will there be? a. 3 b. 4 c. 6 d. 7 ANSWER: D (A three-factor fixed-effects ANOVA has 7 effects to be tested: 3 main effects; 3 two-way interactions; and 1 three-way interaction.)
2. For a design with four factors, how many three-way interactions will there be? a. 3 b. 4 c. 8 d. 16 ANSWER: B (For 4 factors, A, B, C, and D, there are 4 three-way interactions: ABC, ABD, ACD, BCD.)
3. In a two-factor ANOVA, factor A has three levels and factor B has two levels. If each cell has five observations, the FA ratio has degrees of freedom equal to? a. 2, 24 b. 3, 24 c. 2, 29 d. 3, 29 ANSWER: A (dfA = J − 1 = 3 − 1 = 2, dfwith = N − JK = 3*2*5 − 3*2 = 24, so dfs for FA are 2, 24.)
4. In a two-factor ANOVA, dfA = 2, dfB = 3, and each cell has five observations. What is dfAB? a. 2 b. 3 c. 6 d. 10 ANSWER: C (dfAB = dfA*dfB = 2*3 = 6)
5. In a two-factor ANOVA, dfA = 2, dfB = 3, and each cell has five observations. What is dfwith? a. 24 b. 36 c. 48 d. 60 ANSWER: C (dfA = 2, so J = 3; dfB = 3, so K = 4; dfwith = N − JK = 3*4*5 − 3*4 = 48.)
6. In a two-factor fixed-effects ANOVA, FA = 3, FB = 3.5, dfA = 3, and dfB = 3. If the null hypothesis for factor A is rejected at .05 level of significance, the null hypothesis for factor B will be a. rejected at .10 level of significance. b. rejected at .05 level of significance. c. both a and b. d. neither a nor b. ANSWER: C (dfA = dfB, so the critical value is the same for FA and FB. Therefore, if H0 for factor A is rejected at α = .05, H0 for factor B will also be rejected when α = .05, or when α > .05.)
7. In a three-factor fixed-effects ANOVA, the three-way interaction effect is found to be significant. Which of the following statements is always true? a. All of the two-way interaction effects and main effects are significant. b. None of the two-way interaction effects or main effects are significant. c. At least one of the two-way interaction effects or main effects is significant. d. None of the above. ANSWER: D (Knowing that the three-way interaction is significant does not give us information on either main effects or two-way interactions.)
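The counting rules used in the first few Chapter 13 items (number of effects, number of interactions of a given order, and degrees of freedom in a balanced two-factor design) can be sketched as follows. The function names are illustrative, and equal cell sizes are assumed.

```python
from math import comb


def two_factor_df(J, K, n):
    """Degrees of freedom for a balanced J x K design with n observations per cell."""
    N = J * K * n
    return {
        "A": J - 1,
        "B": K - 1,
        "AB": (J - 1) * (K - 1),
        "within": N - J * K,
        "total": N - 1,
    }


def n_effects(n_factors):
    """Main effects plus all interactions: every nonempty subset of factors."""
    return 2 ** n_factors - 1


def n_interactions(n_factors, order):
    """Number of interactions involving exactly `order` factors."""
    return comb(n_factors, order)


df_3x2 = two_factor_df(J=3, K=2, n=5)   # three levels of A, two of B
df_3x4 = two_factor_df(J=3, K=4, n=5)   # dfA = 2 and dfB = 3 imply J = 3, K = 4
print(df_3x2)
print(df_3x4)
print(n_effects(3), n_interactions(4, 3))
```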
8. A researcher used ANOVA to examine the effects of different types of diet on the weight gains of sheep. Specifically, he wants to see if the effect of diet type is different for sheep of different ages. Ten sheep were assigned to each of the four diet groups. Within each diet group, half of the sheep were younger than one year old, and the other half were older than one year old. How many factors are in this experiment, and how many levels do these factors have? a. 1 factor; 4 levels b. 1 factor; 8 levels c. 2 factors; one with 4 levels, one with 2 levels. d. 2 factors; one with 10 levels, one with 5 levels. ANSWER: C (There are two factors: diet type and age. Diet type has four levels, i.e., four types of diet. Age has two levels, i.e., < 1 year old and > 1 year old.) 9. A three-factor fixed-effects design always has which one of the following? a. Three independent variables b. Three levels in each factor c. Three dependent variables d. A significant three-way interaction ANSWER: A (A three-factor design indicates three independent variables, nothing else.) 10. In a two-factor fixed-effects ANOVA, the interaction effect is definitely not present when a. the effect of one factor is different across levels of the other factor. b. column effects are not consistent across rows. c. main effects and residual error do not account for all of the variation in the dependent variable. d. main effects and residual error have accounted for all of the variation in the dependent variable. ANSWER: D (When the variation in the dependent variable is completely accounted for by main effects and residual error, there is no interaction effect.)
11. In a 2*2 factorial design, the cell means are given as follows: cell 11 = 10, cell 12 = 20, cell 21 = 10, and cell 22 = 0. Assume that the within-cell variation is small. Which one of the following conclusions seems most probable? a. All effects are significant. b. Only the two main effects are significant. c. Only the interaction is significant. d. The interaction and one of the main effects are significant. ANSWER: D (A plot of the cell means shows nonparallel lines, so the interaction is significant. The row means differ (15 vs. 5), so one main effect is significant; the column means are equal (10 vs. 10), so the other is not.)
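The reasoning in item 11 can be checked numerically. The sketch below assumes the first subscript indexes rows and the second indexes columns, and that all cells have equal n; the function name is hypothetical.

```python
def two_by_two_effects(m11, m12, m21, m22):
    """Marginal means and interaction contrast for a 2*2 table of cell means."""
    row_means = ((m11 + m12) / 2, (m21 + m22) / 2)
    col_means = ((m11 + m21) / 2, (m12 + m22) / 2)
    # Difference of differences: zero when the profile lines are parallel.
    interaction_contrast = (m11 - m12) - (m21 - m22)
    return row_means, col_means, interaction_contrast


rows, cols, contrast = two_by_two_effects(10, 20, 10, 0)
print(rows)      # (15.0, 5.0): row means differ
print(cols)      # (10.0, 10.0): column means are equal
print(contrast)  # -20: nonzero, so the lines are not parallel
```

Whether a nonzero difference is statistically significant still depends on the within-cell variation, which the item states is small.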
12. Which of the following situations would result in the greatest generalizability of the main effect for factor B across the levels of factor A? a. For factor A, p = .06; for factor B, p = .05; and interaction AB, p = .01. b. For factor A, p = .06; for factor B, p = .10; and interaction AB, p = .05. c. For factor A, p = .20; for factor B, p = .05; and interaction AB, p = .10. d. For factor A, p = .20; for factor B, p = .10; and interaction AB, p = .05. ANSWER: C (A nonsignificant interaction results in the greatest generalizability of the main effect for B.)
Questions 13 through 15 are based on the following plots of cell means. Assume that the within-cell variation is very small.
[Figure: four plots of cell means, labeled (1) through (4); images not reproduced.]
13. Which plot indicates that both main effects are significant but the interaction effect is nonsignificant? a. plot (1) b. plot (2) c. plot (3) d. plot (4) ANSWER: A (The two lines are parallel; the mean of B2 is higher than that of B1 across levels of A; the mean of A2 is higher than that of A1 across levels of B.)
14. Which plot indicates that neither of the main effects are significant but the interaction effect is significant? a. plot (1) b. plot (2) c. plot (3) d. plot (4) ANSWER: C (The two lines are obviously not parallel; A1 and A2 have the same aggregated mean; B1 and B2 have the same aggregated mean.)
15. Which plot(s) indicate(s) significant interaction effects? a. plots (1) and (2) b. plots (2) and (3) c. plots (1), (2), and (3) d. plots (1) and (4) ANSWER: B (Nonparallel lines suggest that an interaction may exist.) 16. Using the same data set (J = 3, K = 2, n = 10), Jane conducted two one-factor ANOVAs, whereas Joe conducted a two-factor ANOVA. Which of the following statements is not true? a. Jane and Joe would get the same dfA. b. Jane and Joe would get the same dfB. c. Jane and Joe would get the same dfwith. d. Jane and Joe would get the same dftotal. ANSWER: C (Using the same data set, no matter how many factors are included in the model, dfA = J − 1, dfB = K − 1, and dftotal = N − 1. dfwith is smaller in the two-factor model because more degrees of freedom are taken away from dftotal.) 17. The results of a two-factor ANOVA (J = 3, K = 2) show that both main effects are significant, but the interaction is not significant. We need to a. conduct MCPs to examine the two main effects and the interaction effect. b. conduct MCPs to examine the interaction effect only. c. conduct MCPs to examine the two main effects. d. conduct MCPs to examine one main effect only. ANSWER: D (We need only to conduct MCPs to examine significant effect(s) with more than one degree of freedom. Here only factor A, with three levels, qualifies.) 18. In a three-factor fixed-effects ANOVA, FA = 3, dfA = 2, dfB = 3, dfC = 2, and dfwith = 60. The null hypothesis for factor A can be rejected a. at the .01 level. b. at the .05 level, but not at the .01 level. c. at the .10 level, but not at the .05 level. d. None of the above. ANSWER: C (.01F2, 60 = 4.98, .05F2, 60 = 3.15, .10F2, 60 = 2.39. FA = 3 is smaller than .05F2, 60 and larger than .10F2, 60, so H0 for factor A can only be rejected at the .10 level.)
19. Adding a second factor to a one-factor design may a. increase dftotal. b. increase dfwith. c. decrease SStotal. d. decrease MSwith. ANSWER: D (Adding a second factor will not change dftotal or SStotal. dfwith will decrease. MSwith may decrease.)
Questions 20 through 22 are based on the following ANOVA summary table (fixed effects):
Source    df     MS    F
A         2      15    3.0
B         3      10    2.0
AB        6      3     0.6
Within    120    5
20. For which source of variation is the null hypothesis rejected at the .10 level of significance? a. A b. B c. AB d. All of the above ANSWER: A (.10F2, 120 = 2.35, .10F3, 120 = 2.13, .10F6, 120 = 1.82. Only FA is larger than its critical value.) 21. How many cells are there in the design? a. 6 b. 8 c. 9 d. None of the above ANSWER: D (dfA = 2, dfB = 3, so J = 3 and K = 4. There are 12 cells.)
22. The total sample size for the design is which one of the following? a. 131 b. 132 c. 134 d. None of the above ANSWER: B (N = dftotal + 1 = dfA + dfB + dfAB + dfwith + 1 = 132.)
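Items 21 and 22 both recover the design from the df column of the summary table. A minimal sketch of that back-calculation follows, assuming a balanced, fully crossed two-factor design; the function name `design_from_df` is hypothetical.

```python
def design_from_df(df_a, df_b, df_ab, df_within):
    """Recover the number of cells, total N, and per-cell n from ANOVA df values."""
    if df_ab != df_a * df_b:
        raise ValueError("dfAB must equal dfA * dfB in a crossed two-factor design")
    J, K = df_a + 1, df_b + 1                      # levels of A and B
    cells = J * K
    N = df_a + df_b + df_ab + df_within + 1        # N = dftotal + 1
    return {"J": J, "K": K, "cells": cells, "N": N, "n_per_cell": N / cells}


# Summary table for Questions 20 through 22: df = 2, 3, 6, 120.
print(design_from_df(2, 3, 6, 120))
```

For the table above this gives 12 cells and N = 132, matching the keyed answers to items 21 and 22.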
Questions 23 through 25 are based on the following ANOVA summary table (fixed effects):
Source    df    MS      F
A         5     18.0    6.0
B         1     13.5    4.5
AB        5     15.0    5.0
Within    60    3.0
23. For which source of variation is the null hypothesis rejected at the .05 level of significance? a. A b. B c. AB d. All of the above ANSWER: D (.05F5, 60 = 2.37, .05F1, 60 = 4. All F values are larger than their respective critical values.) 24. How many cells are there in the design? a. 6 b. 10 c. 12 d. None of the above ANSWER: C (dfA = 5, dfB = 1, so J = 6 and K = 2. There are 12 cells.) 25. The total sample size for the design is which one of the following? a. 72 b. 74 c. 77 d. None of the above ANSWER: A (N = dftotal + 1 = dfA + dfB + dfAB + dfwith + 1 = 72.)
26. Which of the following would be appropriate for a factorial ANOVA? a. One categorical dependent variable and one categorical independent variable b. One categorical dependent variable and one continuous independent variable c. One continuous dependent variable and one categorical independent variable d. One continuous dependent variable and two categorical independent variables ANSWER: D 27. A researcher is interested in examining the extent to which there is a mean difference in lower-class undergraduate students' attitude toward instruction (interval) based on class modality (face-to-face, hybrid, online) and class standing (freshman or sophomore). Would conducting a factorial ANOVA be appropriate for this study? a. Yes b. No ANSWER: A 28. In a two-factor fixed-effects ANOVA with factors A and B, each of which has four categories, which one of the following occurs? a. Alternating categories of the factors are not included. b. Categories 1 and 2 of factor A are crossed with categories 3 and 4 of factor B only. c. Every combination of factors A and B is included in the design of the study. d. The first one-half of units in each category in factor A are crossed with the last one-half of units in each category in factor B; other units and categories are excluded. ANSWER: C
Chapter 13 (Factorial ANOVA: Fixed-Effects Model) Test Bank (5 short answer items)
II. Short Answer
1. Complete the following ANOVA summary table for a two-factor fixed-effects ANOVA, where there are three levels of factor A (teaching method) and two levels of factor B (time of class). Each cell includes six students and α = .05.
Source    SS     df     MS     F      Critical Value    Decision
A         ___    ___    6.5    ___    ___               ___
B         ___    ___    ___    ___    ___               ___
AB        ___    ___    5.2    ___    ___               ___
Within    39     ___    ___
Total     65     ___
ANSWER There are 3 levels of factor A, so J = 3. There are 2 levels of factor B, so K = 2. Each cell has 6 students, so n = 6. N = 3*2*6 = 36. dfA = J − 1 = 3 − 1 = 2, dfB = K − 1 = 2 − 1 = 1, dfAB = (J − 1)(K − 1) = 2*1 = 2, dfwith = N − JK = 36 − 3*2 = 30, dftotal = N − 1 = 36 − 1 = 35. SSA = dfA*MSA = 2*6.5 = 13, SSAB = dfAB*MSAB = 2*5.2 = 10.4, SSB = SStotal − SSA − SSAB − SSwith = 65 − 13 − 10.4 − 39 = 2.6. MSB = SSB/dfB = 2.6/1 = 2.6; MSwith = SSwith/dfwith = 39/30 = 1.3. FA = MSA/MSwith = 6.5/1.3 = 5; critical value = .05F2,30 = 3.32 < FA, reject H0. FB=MSB/MSwith = 2.6/1.3 = 2; critical value = .05F1,30 = 4.17 > FB, fail to reject H0. FAB =MSAB/MSwith =5.2/1.3 = 4; critical value = .05F2,30 = 3.32 < FAB, reject H0.
Source    SS      df    MS     F    Critical Value     Decision
A         13.0    2     6.5    5    .05F2,30 = 3.32    Reject H0
B         2.6     1     2.6    2    .05F1,30 = 4.17    Fail to reject H0
AB        10.4    2     5.2    4    .05F2,30 = 3.32    Reject H0
Within    39.0    30    1.3
Total     65.0    35
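The arithmetic in the answer to item 1 can be reproduced with a short script. This is a sketch under the problem's stated givens (MSA, MSAB, SSwith, SStotal, and a balanced design); the function name and argument names are hypothetical, and critical values are still read from an F table rather than computed here.

```python
def complete_two_factor_table(J, K, n, ms_a, ms_ab, ss_within, ss_total):
    """Fill in a two-factor ANOVA summary table from MSA, MSAB, SSwith, and SStotal."""
    N = J * K * n
    df = {"A": J - 1, "B": K - 1, "AB": (J - 1) * (K - 1),
          "Within": N - J * K, "Total": N - 1}
    ss = {"A": df["A"] * ms_a, "AB": df["AB"] * ms_ab,
          "Within": ss_within, "Total": ss_total}
    # SSB is whatever is left after the other components are removed from SStotal.
    ss["B"] = ss_total - ss["A"] - ss["AB"] - ss_within
    ms = {"A": ms_a, "B": ss["B"] / df["B"], "AB": ms_ab,
          "Within": ss_within / df["Within"]}
    f = {src: ms[src] / ms["Within"] for src in ("A", "B", "AB")}
    return df, ss, ms, f


df, ss, ms, f = complete_two_factor_table(3, 2, 6, 6.5, 5.2, 39.0, 65.0)
for src in ("A", "B", "AB"):
    print(src, round(ss[src], 1), df[src], round(ms[src], 1), round(f[src], 1))
```

Rounding is applied only when printing, because subtracting decimal values in floating point can leave tiny residues.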
2. Complete the following ANOVA summary table for a two-factor fixed-effects ANOVA, where there are four levels of factor A (grade level) and three levels of factor B (textbook). Each cell includes 11 students and α = .05.
Source    SS     df     MS     F      Critical Value    Decision
A         ___    ___    5      ___    ___               ___
B         42     ___    ___    ___    ___               ___
AB        ___    ___    25     ___    ___               ___
Within    240    ___    ___
Total     ___    ___
ANSWER A has 4 levels, so J = 4. B has 3 levels, so K = 3. Each cell has 11 students, so n = 11. N = 4*3*11 = 132. dfA = J − 1 = 4 − 1 = 3, dfB = K − 1 = 3 − 1 = 2, dfAB = (J − 1)(K − 1) = 3*2 = 6, dfwith = N - JK = 132 − 4*3 = 120, dftotal = N − 1 = 132 − 1 = 131. SSA = dfA*MSA = 3*5 = 15, SSAB = dfAB*MSAB = 6*25 = 150, SStotal = SSA + SSB + SSAB + SSwith = 15 + 42 + 150 + 240 = 447. MSB = SSB/dfB = 42/2 = 21; MSwith = SSwith/dfwith = 240/120 = 2. FA =MSA/MSwith = 5/2 = 2.5; critical value = .05F3,120 = 2.68 > FA, fail to reject H0. FB =MSB/MSwith = 21/2 = 10.5; critical value = .05F2,120 = 3.07 < FB, reject H0. FAB =MSAB/MSwith = 25/2 = 12.5; critical value = .05F6,120 = 2.18 < FAB, reject H0.
Source    SS     df     MS    F       Critical Value      Decision
A         15     3      5     2.5     .05F3,120 = 2.68    Fail to reject H0
B         42     2      21    10.5    .05F2,120 = 3.07    Reject H0
AB        150    6      25    12.5    .05F6,120 = 2.18    Reject H0
Within    240    120    2
Total     447    131
3. Complete the following ANOVA summary table for a two-factor fixed-effects ANOVA, where there are two levels of factor A (type of counseling) and four levels of factor B (frequency of counseling). Each cell includes six people and α = .01.
Source    SS     df     MS     F      Critical Value    Decision
A         60     ___    ___    6      ___               ___
B         ___    ___    ___    ___    ___               ___
AB        60     ___    ___    ___    ___               ___
Within    ___    ___    ___
Total     640    ___
ANSWER A has 2 levels, so J = 2. B has 4 levels, so K = 4. Each cell has 6 people, so n = 6. N = 2*4*6 = 48. dfA = J − 1 = 2 − 1 = 1, dfB = K − 1 = 4 − 1 = 3, dfAB = (J − 1)(K − 1) = 1*3 = 3, dfwith = N − JK = 48 − 2*4 = 40, dftotal = N − 1 = 48 − 1 = 47. MSA = SSA/dfA = 60/1 = 60, MSAB = SSAB/dfAB = 60/3 = 20, MSwith = MSA/FA = 60/6 = 10; SSwith = MSwith*dfwith = 10*40 = 400, SSB = SStotal − SSA − SSAB − SSwith = 640 − 60 − 60 − 400 = 120; MSB = SSB/dfB = 120/3 = 40. FA = MSA/MSwith = 60/10 = 6; critical value = .01F1,40 = 7.31 > FA, fail to reject H0. FB = MSB/MSwith = 40/10 = 4; critical value = .01F3,40 = 4.31 > FB, fail to reject H0. FAB = MSAB/MSwith = 20/10 = 2; critical value = .01F3,40 = 4.31 > FAB, fail to reject H0.
Source    SS     df    MS    F    Critical Value     Decision
A         60     1     60    6    .01F1,40 = 7.31    Fail to reject H0
B         120    3     40    4    .01F3,40 = 4.31    Fail to reject H0
AB        60     3     20    2    .01F3,40 = 4.31    Fail to reject H0
Within    400    40    10
Total     640    47
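Item 3 differs from the earlier items in that MSwith must be recovered from a given F ratio (MSwith = MSA/FA) before anything else can be filled in. The sketch below illustrates that back-solving step under the item's givens; the function name `back_solve_table` is hypothetical.

```python
def back_solve_table(J, K, n, ss_a, f_a, ss_ab, ss_total):
    """Complete a two-factor ANOVA table when SSA, FA, SSAB, and SStotal are given."""
    N = J * K * n
    df_a, df_b = J - 1, K - 1
    df_ab = df_a * df_b
    df_within = N - J * K
    ms_a = ss_a / df_a
    ms_within = ms_a / f_a            # because FA = MSA / MSwith
    ss_within = ms_within * df_within
    ss_b = ss_total - ss_a - ss_ab - ss_within
    ms_b = ss_b / df_b
    ms_ab = ss_ab / df_ab
    return {
        "MSwith": ms_within, "SSwith": ss_within,
        "SSB": ss_b, "MSB": ms_b, "MSAB": ms_ab,
        "FB": ms_b / ms_within, "FAB": ms_ab / ms_within,
    }


print(back_solve_table(J=2, K=4, n=6, ss_a=60, f_a=6, ss_ab=60, ss_total=640))
```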
4. A researcher wanted to examine the effects of type of diet (factor A) and age (factor B) on the weight gains of sheep. Diet type has four levels (i.e., four types of diet) and age has two levels (younger than one year old and older than one year old). Five sheep were assigned to each of the eight cells. The following are the scores (weight gains) from the individual cells: A1B1: 11.8, 11.7, 11.1, 10.7, 10.4 A1B2: 11.1, 9.8, 9.5, 9.2, 8.8 A2B1: 10.2, 10.0, 8.7, 8.1, 7.3 A2B2: 8.6, 8.2, 7.7, 7.4, 5.6 A3B1: 12.0, 10.8, 10.5, 10.2, 10.2 A3B2: 10.7, 9.8, 9.7, 9.5, 8.9 A4B1: 9.8, 9.6, 9.4, 9.1, 7.9 A4B2: 8.0, 7.4, 7.4, 6.7, 5.8 Use SPSS to conduct a two-factor fixed-effects ANOVA to determine if there are any effects due to diet type, age, or the interaction (α = 0.05). Conduct Tukey HSD post hoc comparisons, if necessary. (Presume all assumptions of ANOVA are satisfied.) ANSWER Procedure: Create a data set with three variables: DietType (factor A), Age (factor B), and WeightGain (dependent variable). The data set should have 40 cases. Go to Analyze → General Linear Model → Univariate. Select WeightGain as the Dependent Variable. Select DietType and Age as the Fixed Factors. Go to Options. Check Descriptive statistics, Estimates of effect size, and Observed power. Click Continue, then click OK. Based on the ANOVA results, we only need to conduct post hoc tests on DietType, which is a significant main effect and has more than two levels. Go to Analyze → General Linear Model → Univariate. Go to Post Hoc. Select DietType to Post Hoc Tests for, and check Tukey. Click Continue, then click OK. Selected SPSS Output:
Tests of Between-Subjects Effects
Dependent Variable: WeightGain
Source             Type III Sum of Squares    df    Mean Square    F         Sig.    Partial Eta Squared    Observed Power
DietType           47.493                     3     15.831         20.218    .000    .655                   1.000
Age                22.052                     1     22.052         28.164    .000    .468                   .999
DietType*Age       1.527                      3     .509           .650      .589    .057                   .171
Error              25.056                     32    .783
Corrected Total    96.128                     39
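The sums of squares and F ratios in the SPSS output above can be cross-checked outside SPSS. The following is a sketch in plain Python, assuming a balanced design (in which case the Type III sums of squares coincide with the usual marginal-mean formulas); it is not the SPSS procedure itself, and the function and variable names are hypothetical.

```python
from statistics import mean

# Weight gains from item 4, keyed by (diet type, age group).
cells = {
    ("A1", "B1"): [11.8, 11.7, 11.1, 10.7, 10.4],
    ("A1", "B2"): [11.1, 9.8, 9.5, 9.2, 8.8],
    ("A2", "B1"): [10.2, 10.0, 8.7, 8.1, 7.3],
    ("A2", "B2"): [8.6, 8.2, 7.7, 7.4, 5.6],
    ("A3", "B1"): [12.0, 10.8, 10.5, 10.2, 10.2],
    ("A3", "B2"): [10.7, 9.8, 9.7, 9.5, 8.9],
    ("A4", "B1"): [9.8, 9.6, 9.4, 9.1, 7.9],
    ("A4", "B2"): [8.0, 7.4, 7.4, 6.7, 5.8],
}


def two_way_anova(cells):
    """Balanced two-factor fixed-effects ANOVA computed from raw cell scores."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # per-cell n (balanced design assumed)
    J, K = len(a_levels), len(b_levels)
    grand = mean(y for ys in cells.values() for y in ys)
    a_means = [mean(y for b in b_levels for y in cells[(a, b)]) for a in a_levels]
    b_means = [mean(y for a in a_levels for y in cells[(a, b)]) for b in b_levels]
    ss_a = n * K * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * J * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((mean(ys) - grand) ** 2 for ys in cells.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)
    df = {"A": J - 1, "B": K - 1, "AB": (J - 1) * (K - 1), "Error": J * K * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_error}
    ms_error = ss_error / df["Error"]
    f = {src: (ss[src] / df[src]) / ms_error for src in ("A", "B", "AB")}
    return ss, df, f


ss, df, f = two_way_anova(cells)
for src in ("A", "B", "AB"):
    print(src, round(ss[src], 3), df[src], round(f[src], 3))
print("Error", round(ss["Error"], 3), df["Error"])
```

Here A corresponds to DietType and B to Age. The p-values, effect sizes, and observed power in the SPSS table are not computed by this sketch.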
5. A small-scale taste test is conducted to find out how the levels of moisture (factor A) and sweetness (factor B) of cakes are related to people's preference for the cake. Participants' gender (factor C) is also considered. Each factor consists of two levels. Thirty-two participants are assigned to eight cells (i.e., four per cell), one for each of the factor combinations. The following are the scores (rating of the cakes by the participants) from the individual cells: A1B1C1: 64, 61, 65, 70 A1B1C2: 63, 60, 64, 68 A1B2C1: 73, 76, 72, 71 A1B2C2: 72, 75, 79, 77 A2B1C1: 72, 78, 81, 75 A2B1C2: 74, 73, 75, 69 A2B2C1: 98, 97, 93, 89 A2B2C2: 86, 93, 88, 94 Use SPSS to conduct a three-factor fixed-effects ANOVA (α = 0.01). If there is (are) any significant interaction(s), graph and interpret the interaction(s). ANSWER Procedure: Create a data set with four variables: Moisture (factor A), Sweetness (factor B), Gender (factor C), and Score (dependent variable). The data set should have 32 cases. Go to Analyze → General Linear Model → Univariate. Select Score as the Dependent Variable. Select Moisture, Sweetness, and Gender as the Fixed Factors. Go to Options. Check Descriptive statistics, Estimates of effect size, and Observed power. Change the Significance level to 0.01. Click Continue, then click OK. Based on the ANOVA results, we need to graph the Moisture*Sweetness two-way interaction. Go to Analyze → General Linear Model → Univariate. Go to Plot. Select Moisture to Horizontal Axis, and Sweetness to Separate Lines, then Add. Click Continue, then click OK.
Selected SPSS Output:
Tests of Between-Subjects Effects
Dependent Variable: Score
Source                       Type III Sum of Squares    df    Mean Square    F          Sig.    Partial Eta Squared    Observed Power
Moisture                     1582.031                   1     1582.031       137.195    .000    .851                   1.000
Sweetness                    1526.281                   1     1526.281       132.360    .000    .847                   1.000
Gender                       19.531                     1     19.531         1.694      .205    .066                   .087
Moisture*Sweetness           116.281                    1     116.281        10.084     .004    .296                   .648
Moisture*Gender              42.781                     1     42.781         3.710      .066    .134                   .217
Sweetness*Gender             7.031                      1     7.031          .610       .443    .025                   .033
Moisture*Sweetness*Gender    9.031                      1     9.031          .783       .385    .032                   .040
Error                        276.750                    24    11.531
Corrected Total              3579.719                   31
Profile Plots
[Figure: profile plot of the Moisture*Sweetness interaction; image not reproduced.]
The ANOVA summary table showed that the main effect of Moisture (F = 137.20, df = 1, 24, p < .001), the main effect of Sweetness (F = 132.36, df = 1, 24, p < .001), and the two-way interaction of Moisture*Sweetness (F = 10.08, df = 1, 24, p = .004) were significant at α = .01. The main effect of Gender and the other interactions were not significant. Effect sizes were large for all the significant effects (partial η2 is .851 for Moisture, .847 for Sweetness, and .296 for Moisture*Sweetness). Observed power was maximal for the main effects of Moisture and Sweetness, but not satisfactory for the Moisture*Sweetness interaction (< 0.8). The significant interaction, Moisture*Sweetness, was graphed. The plot shows that at each level of sweetness, the score is higher for moister cakes. However, the score difference is larger when the cakes are sweeter.
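The interpretation of the Moisture*Sweetness interaction rests on the four cell means obtained by pooling over Gender. Since the profile plot is not reproduced here, the sketch below computes those means directly from the raw scores in item 5. It assumes that A1/A2 and B1/B2 denote the lower and higher levels of moisture and sweetness, which the item does not state explicitly; the function name is hypothetical.

```python
from statistics import mean

# Ratings from item 5, keyed by (moisture, sweetness, gender).
scores = {
    ("A1", "B1", "C1"): [64, 61, 65, 70], ("A1", "B1", "C2"): [63, 60, 64, 68],
    ("A1", "B2", "C1"): [73, 76, 72, 71], ("A1", "B2", "C2"): [72, 75, 79, 77],
    ("A2", "B1", "C1"): [72, 78, 81, 75], ("A2", "B1", "C2"): [74, 73, 75, 69],
    ("A2", "B2", "C1"): [98, 97, 93, 89], ("A2", "B2", "C2"): [86, 93, 88, 94],
}


def moisture_by_sweetness_means(scores):
    """Cell means for Moisture x Sweetness, pooling over Gender."""
    pooled = {}
    for (a, b, _gender), ys in scores.items():
        pooled.setdefault((a, b), []).extend(ys)
    return {cell: mean(ys) for cell, ys in pooled.items()}


m = moisture_by_sweetness_means(scores)
# Effect of moisture (A2 minus A1) at each level of sweetness.
diff_b1 = m[("A2", "B1")] - m[("A1", "B1")]
diff_b2 = m[("A2", "B2")] - m[("A1", "B2")]
print(m)
print(diff_b1, diff_b2)  # 10.25 17.875
```

The moisture difference is positive at both sweetness levels and larger at B2, which is the pattern described in the interpretation above.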
Chapter 14 (One-Factor Fixed-Effects ANCOVA with Single Covariate) Test Bank
Multiple Choice
Questions 1 through 3 are based on the following scenario: A researcher wanted to examine whether people remembered information better with auditory cues or with visual cues. Ten participants were randomly assigned to each of the two groups: one group heard a list of words from headphones, and the other group saw the same list of words from a screen. The participants were then asked to write down as many words as they could remember. The dependent variable was the number of words remembered. 1. For which of the following situations is it appropriate to use ANCOVA? a. In each group, half of the participants had a time limit to write the words down. Whether a time limit was imposed was also included in the model. b. The participants' socioeconomic status (low, middle, or high) was expected to have an effect on their performance and was also included in the model. c. Participants took a vocabulary test before the experiment and the scores of the test were included in the model. d. Participants took a vocabulary test after the experiment and the scores of the test were included in the model. ANSWER: C (The covariate should be measured on interval or ratio scale. Also, it is better that covariates are measured prior to the administration of the treatment so that the dependence of the covariate on the treatment can be minimized.) 2. Suppose the researcher used IQ as the covariate. He then found that the mean IQ is slightly different across two groups, and people with higher IQ tended to remember more words than people with lower IQ. It seems likely that a. the assumption of independence was violated. b. the assumption of independence of covariate and treatment was violated. c. the assumption of homogeneity of variance was violated. d. the assumption of homogeneity of slopes was violated. e. there is no indication of assumption violation. ANSWER: E (The value of covariate can be different between groups, and a correlation between the covariate and dependent variable is desirable.)
3. Suppose out of budget concerns, the researcher included only five participants in each group, tested each participant twice, and used each time as separate observations. It seems likely that a. the assumption of independence was violated. b. the assumption of independence of covariate and treatment was violated. c. the assumption of homogeneity of variance was violated. d. the assumption of homogeneity of slopes was violated. e. there is no indication of assumption violation. ANSWER: A (The observations collected from the same subject are not independent.)
Questions 4 through 8 are based on the following scenario: A researcher wanted to examine if soil type had any effects on the heights of daylilies. Ten bulbs of daylilies were planted in each of the three different types of soil. The thickness of the soil (X) was also measured for each pot. After three months, the heights of the plants (Y) were measured. Below are the group means of the study.
Soil Type    n     Heights of Daylilies (Y)    Thickness of Soil (X)
1            10    𝑌̅1. = 20                    𝑋̅1. = 21
2            10    𝑌̅2. = 25                    𝑋̅2. = 18
3            10    𝑌̅3. = 30                    𝑋̅3. = 15
4. If ANCOVA is used, which group will have the highest adjusted mean height? a. Group 1 (𝑌̅1.′) b. Group 2 (𝑌̅2.′) c. Group 3 (𝑌̅3.′) d. All groups would have the same adjusted mean. e. Any of the above situations is possible (not enough information provided). ANSWER: E (Without knowing bw, any adjustment of means is possible.)
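To see why item 4 cannot be answered without bw, the adjusted means can be computed for a few slopes. The sketch below uses the standard adjustment (group Y mean minus bw times the deviation of the group X mean from the overall X mean) and assumes equal group sizes, so the overall X mean is the simple average of the group means. The three bw values are hypothetical and chosen only for illustration; the function name is also hypothetical.

```python
def adjusted_means(y_means, x_means, b_w):
    """ANCOVA adjusted means, assuming equal n so the grand X mean is a simple average."""
    grand_x = sum(x_means) / len(x_means)
    return [y - b_w * (x - grand_x) for y, x in zip(y_means, x_means)]


y_means = [20, 25, 30]   # mean heights for soil types 1 to 3
x_means = [21, 18, 15]   # mean soil thickness for soil types 1 to 3

print(adjusted_means(y_means, x_means, 1.0))    # [17.0, 25.0, 33.0]: group 3 highest
print(adjusted_means(y_means, x_means, -5.0))   # [35.0, 25.0, 15.0]: group 1 highest
print(adjusted_means(y_means, x_means, -5 / 3)) # all approximately 25
```

A positive slope favors the group with the smallest covariate mean, a strongly negative slope reverses the ordering, and one particular slope makes all adjusted means equal.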
5. If there is a substantial, positive correlation between the thickness of soil (X) and the height of daylilies (Y), which group will have the highest adjusted mean height? a. Group 1 (𝑌̅1.′) b. Group 2 (𝑌̅2.′) c. Group 3 (𝑌̅3.′) d. All groups would have the same adjusted mean. e. Any of the above situations is possible. ANSWER: C (When bw > 0, the group with the smallest 𝑋̅ will have a higher adjusted mean, and the group with the largest 𝑋̅ will have a lower adjusted mean.) 6. If there is no correlation between X and Y, the MSwith for ANCOVA as compared to that for ANOVA will be a. less. b. the same. c. greater. d. unpredictably different. ANSWER: C (If X and Y are not correlated, SSwith will be the same, but with the loss of one dfwith, MSwith will become greater.) 7. If there is a substantial negative correlation between X and Y, the error variation for ANCOVA as compared to that for ANOVA will be a. less. b. the same. c. greater. d. unpredictably different. ANSWER: A (If the correlation is substantial, then error variance will be reduced in ANCOVA regardless of its sign.) 8. If the correlation between X and Y is 0.5 for soil type 1, 0.1 for soil type 2, and −0.5 for soil type 3, it seems likely that a. the assumption of normality is violated. b. the assumption of linearity is violated. c. the assumption of homogeneity of variance is violated. d. the assumption of homogeneity of slopes is violated. e. there is no indication of assumption violation. ANSWER: D (The regression slopes will be different for different soil types.)
9. Matt has generated an ANCOVA. In testing the assumptions, he reviews the scatterplot of the covariate (X) and the dependent variable (Y) for each group and overall. For which assumption(s) is Matt likely reviewing evidence? a. Homogeneity of regression slopes b. Homogeneity of variance c. Linearity d. All of the above ANSWER: D (Scatterplot of Y against X can be used to examine the assumptions of homogeneous regression slopes, homogeneous variances, and linearity.) 10. In ANCOVA, suppose Y is the dependent variable, X is the covariate, and the factor has two levels. Also, all assumptions of ANCOVA are met. Which of the following situations is the most desirable? a. rXY = 0.5; 𝑋̅1. =10, 𝑋̅2. =12, 𝑋̅=11. b. rXY = 0.1; 𝑋̅1. =10, 𝑋̅2. =12, 𝑋̅=11. c. rXY = -0.1; 𝑋̅1. =10, 𝑋̅2. =20, 𝑋̅=15. d. rXY = -0.5; 𝑋̅1. =10, 𝑋̅2. =20, 𝑋̅=15. ANSWER: A (Ideally the covariate should be highly correlated with the dependent variable to reduce error variance, and the groups should be approximately equivalent in all other traits except for group membership.)
11. In a one-factor ANCOVA, suppose the factor has two levels (groups). Scatterplots of the dependent variable (Y) and the covariate (X) are generated (where group 1 is indicated by "•" and group 2 by "o"). Which of the following graphs shows the most desirable situation?
[Figure: four scatterplots of Y against X, labeled a through d; images not reproduced.]
ANSWER: B (Ideally X and Y should be correlated with each other, and the slope should be similar for each group.) 12. Which of the following conditions about the dependent variable (Y) and the covariate (X) is required for ANCOVA? a. Both X and Y are measured on interval or ratio scale. b. Both X and Y are measured in the same unit. c. Both X and Y are independent of the independent variable. d. Both X and Y are measured without error. ANSWER: A (Both X and Y should be interval or ratio-scaled variables, but they don't need to be on the same scale. Only X should be independent of the factor.)
13. An experiment was conducted to compare four different weight loss programs. Twenty subjects participated in each program, and their weights were recorded before and after the program. If both the pretest weight and age are used as covariates, what are the degrees of freedom for the error term? a. 74 b. 75 c. 76 d. 79 ANSWER: A (N = 4*20 = 80; J = 4; 2 covariates; dfwith = N − J − 2 = 74.) 14. A study was conducted to compare five new textbooks. Thirty teachers were asked to rate the textbooks they used in the previous semester, and then they were randomly assigned to one of five new textbook groups (n = 6). At the end of the semester, teachers' ratings of the new textbooks (Y) were collected. What may be an appropriate covariate for this study? a. Students' achievement at the end of the semester. b. Students' rating of the new textbook at the end of the semester. c. Teachers' rating of the old textbook. d. Teachers' ethnicity. ANSWER: C (Students' achievement and student ratings of the textbook may be dependent on which textbook they were assigned to. Ethnicity is on a nominal scale.) 15. In the study described in Question 14, suppose an appropriate covariate has been chosen. What are the degrees of freedom for the error term? a. 4 b. 24 c. 25 d. 29 ANSWER: B (N = 5*6 = 30; J = 5; 1 covariate; dfwith = N − J − 1 = 24.)
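Items 13 and 15 both use the same rule: each covariate costs one additional error degree of freedom beyond the N − J of a one-factor ANOVA. A minimal sketch, with a hypothetical function name:

```python
def ancova_error_df(N, J, n_covariates=1):
    """Error (within) degrees of freedom for a one-factor ANCOVA."""
    return N - J - n_covariates


print(ancova_error_df(N=4 * 20, J=4, n_covariates=2))  # item 13: 74
print(ancova_error_df(N=5 * 6, J=5, n_covariates=1))   # item 15: 24
```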
16. If the dependent variable, math score (Y), is about the same for persons with higher scores in writing (X) and for those with lower scores in writing, one would expect that a. adding the covariate will not substantially reduce error variation. b. the adjusted mean for each group will be the same. c. ANCOVA will be more powerful than ANOVA. d. the covariate and factor will be independent of each other. ANSWER: A (When the correlation between X and Y is very weak, adding the covariate will not substantially reduce error variance, and ANCOVA will be less powerful than ANOVA.) 17. In which of the following situations will the adjusted means always equal the unadjusted means in ANCOVA? a. All groups have the same unadjusted mean. b. All groups have the same mean on the covariate. c. The dependent variable and covariate are highly correlated. d. Regression of the dependent variable on the covariate with bw = 0.6. ANSWER: B (If all groups have the same mean on X, $\bar{X}_{1.} = \bar{X}_{2.} = \cdots = \bar{X}_{J.} = \bar{X}_{..}$. Because $\bar{Y}'_{j.} = \bar{Y}_{j.} - b_w(\bar{X}_{j.} - \bar{X}_{..})$, adjusted means will be the same as unadjusted means.) 18. In reviewing the assumptions, Susan runs an ANOVA using the covariate X as the dependent variable and the factor A as the independent variable. For this test, which of the following situations is the most desirable? a. p = 0.01 b. p = 0.05 c. p = 0.1 d. p = 0.5 ANSWER: D (Larger p indicates that the values of X are not significantly different across groups, lending evidence to the assumption of independence of X and the factor.)
19. Susan runs an ANOVA using the covariate X as the dependent variable and the factor A as the independent variable. Now using Y as the dependent variable, Susan runs another ANOVA to evaluate if there is any interaction between the covariate X and the factor A. For 𝐹𝐴×𝑋 , which of the following situations is the most desirable? a. p = 0.01 b. p = 0.05 c. p = 0.1 d. p = 0.5 ANSWER: D (Larger p indicates that the interaction between X and the factor is not significant, lending evidence to the assumption of homogeneous slopes.) 20. In which of the following situations is the assumption of linearity violated? a. Trend analysis shows that there exists a significant quadratic trend. b. There is no association between the independent variable and the covariate. c. The distribution of the covariate is seriously skewed in all groups. d. The scatterplot of the dependent variable and the covariate is U shaped for all groups. ANSWER: D (The assumption of linearity states that the relation of X and Y is linear for all groups. U-shaped patterns in scatterplots indicate nonlinearity.) 21. If both ANOVA and ANCOVA are conducted on the same data set, which part of the summary table will always have the same value? a. SSbetw b. SSwith c. SStotal d. MSbetw e. MSwith ANSWER: C (SStotal will always have the same value as long as the same data set are used.)
22. Using the same data set, which part of the ANCOVA summary table will always have a smaller value as compared to that for ANOVA? a. dfbetw b. MSbetw c. dfwith d. MSwith e. dftotal ANSWER: C (dfbetw and dftotal will not change. dfwith will be reduced due to the inclusion of covariate(s) in ANCOVA.) 23. In which of the following situations is ANOVA more powerful than ANCOVA? a. The adjusted means are the same as unadjusted means. b. The unadjusted means all have the same value. c. The correlation between the covariate and the dependent variable is strong. d. The correlation between the covariate and the dependent variable is weak. ANSWER: D (When the correlation between X and Y is weak, adding the covariate will not substantially reduce error variance, and ANOVA will be more powerful than ANCOVA.) 24. In a randomized experiment (true experiment), if the covariate contains considerable measurement error, the use of ANCOVA would likely result in a. too much adjustment of means. b. less of the covariate effect being removed from the dependent variable. c. an F test that is too liberal. d. biased estimates of the treatment effects. ANSWER: B (Considerable measurement error can result in under-adjustment of means, underestimation of covariate effect, and less powerful F tests. However, the treatment effects will not be biased in a true experiment.) 25. If the regression slopes of the dependent variable (Y) on the covariate (X) are substantially different across the groups, one would expect that a. the adjusted means will be biased. b. Y is independent of X. c. the factor is not independent of X. d. there is a modest effect with equal n's in quasi-experiment. ANSWER: A (Heterogeneous slopes could seriously bias the adjusted means. The effect is modest with equal n's in a quasi-experiment.)
26. The covariate in ANCOVA serves as which one of the following? a. Dependent variable b. Design control c. Experimental control d. Statistical control ANSWER: D 27. In ANCOVA, which of the following is a source of variation not controlled for when designing the experiment but that the researcher believes to affect the outcome? a. Covariate b. Dependent variable c. Independent variable d. None of the above ANSWER: A 28. The covariate in ANCOVA is also referred to as which one of the following? a. Adjustment variable b. Concomitant variable c. Dependent variable d. Independent variable ANSWER: B
Solutions: Multiple choice 1. c (The covariate should be measured on interval or ratio scale. Also, it is better that covariates are measured prior to the administration of the treatment so that the dependence of the covariate on the treatment can be minimized.) 2. e (The value of covariate can be different between groups, and a correlation between the covariate and dependent variable is desirable.) 3. a (The observations collected from the same subject are not independent.) 4. e (Without knowing bw, any adjustment of means is possible.) 5. c (When bw > 0, the group with the smallest 𝑋̅ will have a higher adjusted mean, and the group with the largest 𝑋̅ will have a lower adjusted mean.) 6. c (If X and Y are not correlated, SSwith will be the same, but with the loss of one dfwith, MSwith will become greater.) 7. a (If the correlation is substantial, then error variance will be reduced in ANCOVA regardless of its sign.) 8. d (The regression slopes will be different for different soil types.) 9. d (Scatterplot of Y against X can be used to examine the assumptions of homogeneous regression slopes, homogeneous variances, and linearity.) 10. a (Ideally the covariate should be highly correlated with the dependent variable to reduce error variance, and the groups should be approximately equivalent in all other traits except for group membership.) 11. b (Ideally X and Y should be correlated with each other, and the slope should be similar for each group.) 12. a (Both X and Y should be interval or ratio-scaled variables, but they don't need to be on the same scale. Only X should be independent of the factor.) 13. a (N = 4*20 = 80; J = 4; 2 covariates; dfwith = N − J − 2 = 74.)
14. c (Students' achievement and student ratings of the textbook may be dependent on which textbook they were assigned to. Ethnicity is on a nominal scale.) 15. b (N = 5*6 = 30; J = 5; 1 covariate; dfwith = N − J − 1 = 24.) 16. a (When the correlation between X and Y is very weak, adding the covariate will not substantially reduce error variance, and ANCOVA will be less powerful than ANOVA.) 17. b (If all groups have the same mean on X, 𝑋̅.1 = 𝑋̅.2 = ⋯ = 𝑋̅.𝐽 = 𝑋̅.. . Because 𝑌̅.𝑗′ = 𝑌̅.𝑗 − 𝑏𝑤 (𝑋̅.𝑗 − 𝑋̅.. ), adjusted means will be the same as unadjusted means.) 18. d (Larger p indicates that the values of X are not significantly different across groups, lending evidence to the assumption of independence of X and the factor.) 19. d (Larger p indicates that the interaction between X and the factor is not significant, lending evidence to the assumption of homogeneous slopes.) 20. d (The assumption of linearity states that the relation of X and Y is linear for all groups. U-shaped patterns in scatterplots indicate nonlinearity.) 21. c (SStotal will always have the same value as long as the same data set is used.) 22. c (dfbetw and dftotal will not change. dfwith will be reduced due to the inclusion of covariate(s) in ANCOVA.) 23. d (When the correlation between X and Y is weak, adding the covariate will not substantially reduce error variance, and ANOVA will be more powerful than ANCOVA.) 24. b (Considerable measurement error can result in under-adjustment of means, underestimation of covariate effect, and less powerful F tests. However, the treatment effects will not be biased in a true experiment.) 25. a (Heterogeneous slopes could seriously bias the adjusted means. The effect is modest with equal n's in a quasi-experiment.)
Chapter 14 (One Factor Fixed-Effects ANCOVA with Single Covariate) Test Bank
Short Answer 1. A researcher wanted to examine whether soil type had any effect on the heights of daylilies (Y). The thickness of the soil in the pot (X) was used as the covariate. Given the data that follow, where there are three types of soil (n = 10 in each group), (a) calculate the adjusted mean values assuming that bw = 0.5, and (b) determine what effects the adjustment had on the results.
Soil Type    Heights of Daylilies (Y)    Thickness of Soil (X)
1            𝑌̅.1 = 20                    𝑋̅.1 = 21
2            𝑌̅.2 = 25                    𝑋̅.2 = 18
3            𝑌̅.3 = 30                    𝑋̅.3 = 15
ANSWER: (a) Adjusted mean: 𝑌̅.𝑗′ = 𝑌̅.𝑗 − 𝑏𝑤 (𝑋̅.𝑗 − 𝑋̅.. ). We know that 𝑌̅.1 = 20, 𝑌̅.2 = 25, 𝑌̅.3 = 30; bw = 0.5; 𝑋̅.1 = 21, 𝑋̅.2 = 18, 𝑋̅.3 = 15. Because it's a balanced design, 𝑋̅.. = (21 + 18 + 15)/3 = 18. We can calculate 𝑌̅.1′ = 𝑌̅.1 − 𝑏𝑤 (𝑋̅.1 − 𝑋̅.. ) = 20 − 0.5*(21 − 18) = 18.5; 𝑌̅.2′ = 𝑌̅.2 − 𝑏𝑤 (𝑋̅.2 − 𝑋̅.. ) = 25 − 0.5*(18 − 18) = 25; 𝑌̅.3′ = 𝑌̅.3 − 𝑏𝑤 (𝑋̅.3 − 𝑋̅.. ) = 30 − 0.5*(15 − 18) = 31.5. (b) The adjustment moved the mean for Group 1 down by 1.5 units (from 20 to 18.5), and moved the mean for Group 3 up by 1.5 units (from 30 to 31.5). It did not change the mean for Group 2. Therefore, after the adjustment the difference between group means is enlarged. The effects of soil type will become larger and possibly more significant.
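The adjusted-mean arithmetic in this answer can be checked with a short script. This is only a sketch: the variable and function names (`group_means`, `adjusted_mean`) are illustrative and do not come from the text, and it assumes the balanced design stated in the problem, so the grand covariate mean is the simple average of the group means.

```python
# Group means from the daylily problem: (unadjusted Y mean, covariate X mean).
group_means = {1: (20.0, 21.0), 2: (25.0, 18.0), 3: (30.0, 15.0)}
b_w = 0.5  # pooled within-groups slope given in the problem

# Balanced design (n = 10 per group), so the grand covariate mean is the
# simple average of the three group covariate means.
grand_x = sum(x for _, x in group_means.values()) / len(group_means)


def adjusted_mean(y_bar, x_bar, slope, x_grand):
    """Adjusted mean: Y' = Y - b_w * (X_group - X_grand)."""
    return y_bar - slope * (x_bar - x_grand)


adjusted = {
    group: adjusted_mean(y_bar, x_bar, b_w, grand_x)
    for group, (y_bar, x_bar) in group_means.items()
}

for group, value in adjusted.items():
    print(f"Soil type {group}: unadjusted = {group_means[group][0]}, adjusted = {value}")
```

Because bw is positive, the group with the largest covariate mean is adjusted downward and the group with the smallest covariate mean is adjusted upward, which is why the spread between groups widens here.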
2. A market researcher wanted to know whether different package designs for the same yogurt product would affect consumers' purchase intention. There were four different versions of package designs. Each package was viewed by eight participants, who then rated their likelihood to purchase the product. A one-factor ANCOVA was used to analyze the data where the covariate was the participants' liking of the yogurt product in general. Complete the following ANCOVA summary table (α = .05):
Source              SS      df    MS     F     Critical Value    Decision
Between adjusted    10.5    __    __     __    __                __
Within adjusted     __      __    __
Covariate           __      __    5.6    __    __                __
Total               35      __
ANSWER The factor (yogurt package designs) has 4 levels, so J = 4. Each level has 8 participants, so n = 8, N = 4*8 = 32. There is one covariate (liking of yogurt). dfbetw(adj) = J − 1 = 4 − 1 = 3, dfwith(adj) = N − J − 1 = 32 − 4 − 1 = 27, dfcov = 1, dftotal = N − 1 = 32 − 1 = 31. SScov = dfcov*MScov = 1*5.6 = 5.6, SSwith(adj) = SStotal − SSbetw(adj) − SScov = 35 − 10.5 − 5.6 = 18.9. MSbetw(adj) = SSbetw(adj)/dfbetw(adj) = 10.5/3 = 3.5. MSwith(adj) = SSwith(adj)/dfwith(adj) = 18.9/27 = 0.7. Fbetw(adj) = MSbetw(adj)/MSwith(adj) = 3.5/0.7 = 5; critical value = .05F3,27 = 2.96 < Fbetw(adj), reject H0. Fcov = MScov/MSwith(adj) = 5.6/0.7 = 8; critical value = .05F1,27 = 4.21 < Fcov, reject H0.
Source              SS      df    MS     F    Critical Value     Decision
Between adjusted    10.5    3     3.5    5    .05F3,27 = 2.96    Reject H0
Within adjusted     18.9    27    0.7
Covariate           5.6     1     5.6    8    .05F1,27 = 4.21    Reject H0
Total               35.0    31
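The table completion above follows a fixed chain of arithmetic, which the sketch below reproduces. The variable names are illustrative, and the critical values are simply the tabled values quoted in the answer (2.96 and 4.21), not computed from an F distribution.

```python
# Known quantities from the yogurt package problem.
J, n, n_cov = 4, 8, 1            # groups, participants per group, covariates
N = J * n
ss_betw, ss_total, ms_cov = 10.5, 35.0, 5.6

# Degrees of freedom for a one-factor ANCOVA with one covariate.
df_betw = J - 1
df_with = N - J - n_cov
df_cov = n_cov
df_total = N - 1

# Fill in the missing sums of squares and mean squares.
ss_cov = ms_cov * df_cov
ss_with = ss_total - ss_betw - ss_cov
ms_betw = ss_betw / df_betw
ms_with = ss_with / df_with

# Both F ratios use the adjusted within-groups mean square as the error term.
f_betw = ms_betw / ms_with
f_cov = ms_cov / ms_with

# Tabled critical values at alpha = .05, as quoted in the answer.
critical = {"between": 2.96, "covariate": 4.21}
decisions = {
    "between": "reject H0" if f_betw > critical["between"] else "fail to reject H0",
    "covariate": "reject H0" if f_cov > critical["covariate"] else "fail to reject H0",
}

print(f"df: between={df_betw}, within={df_with}, covariate={df_cov}, total={df_total}")
print(f"SS within={ss_with:.1f}, MS between={ms_betw:.1f}, MS within={ms_with:.1f}")
print(f"F between={f_betw:.2f} ({decisions['between']}), F covariate={f_cov:.2f} ({decisions['covariate']})")
```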
3. Mike wanted to examine whether people remembered information better with auditory cues or with visual cues. He recruited 14 participants and randomly assigned seven participants to each of the two groups: one group heard a list of words from headphones, and the other group saw the same list of words from the screen. The participants were then asked to write down as many words as they could remember. The dependent variable was the number of words correctly remembered (Y), and the covariate was the participants' score on a memory test (X) administered before the experiment. Using the data below, conduct an ANOVA on Y and an ANCOVA on Y using X as a covariate, and compare the results (α = .05). Determine the unadjusted and adjusted means.
Auditory Cue (Headphone)    Visual Cue (Screen)
X     Y                     X     Y
80    10                    75    13
75    9                     90    20
95    16                    65    8
60    4                     70    9
70    11                    80    15
65    8                     85    13
90    15                    75    12
ANSWER Procedure: Create a data set with three variables: Memory (covariate), Words (dependent variable), and Group (factor with 2 levels). The data set should have 14 cases. To conduct ANOVA: Analyze → General Linear Model → Univariate. • Select Words as the Dependent Variable. Select Group as the Fixed Factor. • Go to Options. Select Group into Display Means for. Check Descriptive statistics, Estimates of effect size, and Observed power. Click Continue. To conduct ANCOVA: Analyze → General Linear Model → Univariate. • Select Words as the Dependent Variable. Select Group as the Fixed Factor. Select Memory as the Covariate. • Click on Model. Under Sum of squares, select Type I from the dropdown menu. Click Continue. • Go to Options. Select Group into Display Means for. Check Descriptive statistics, Estimates of effect size, and Observed power. Click Continue.
Selected SPSS Output:
I. ANOVA Results:
Tests of Between-Subjects Effects
Dependent Variable: Words
Source             Type III Sum of Squares    df    Mean Square    F        Sig.    Partial Eta Squared    Observed Power
Group              20.643                     1     20.643         1.260    .284    .095                   .179
Error              196.571                    12    16.381
Corrected Total    217.214                    13
Unadjusted Means
Dependent Variable: Words
Group         Mean      Std. Error    95% CI Lower Bound    95% CI Upper Bound
1 auditory    10.429    1.530         7.096                 13.762
2 visual      12.857    1.530         9.524                 16.190
II. ANCOVA Results:
Tests of Between-Subjects Effects
Dependent Variable: Words
Source             Type I Sum of Squares    df    Mean Square    F         Sig.    Partial Eta Squared    Observed Power
Memory             167.127                  1     167.127        55.098    .000    .834                   1.000
Group              16.722                   1     16.722         5.513     .039    .334                   .572
Error              33.366                   11    3.033
Corrected Total    217.214                  13
Adjusted Means
Dependent Variable: Words
Group         Mean       Std. Error    95% CI Lower Bound    95% CI Upper Bound
1 auditory    10.549a    .658          9.100                 11.999
2 visual      12.736a    .658          11.287                14.186
A one-factor fixed-effects ANOVA was first conducted. As shown in the table of unadjusted means, participants who received visual cues remembered more words (12.86) than those who received auditory cues (10.43). However, the ANOVA table showed that the effect of type of cue on the number of words remembered was nonsignificant (F = 1.26, df = 1, 12, p = .28). The effect size was medium (partial η2 = .095), but the observed power was not satisfactory (.179). A one-factor ANCOVA was also conducted on the same data. As shown in the table of adjusted means, after adjusting for group differences in memory, the adjusted mean was 10.55 for people who received auditory cues and 12.74 for people who received visual cues. The one-factor ANCOVA table showed that the type of cue had a significant effect on the number of words remembered after adjusting for memory (F = 5.51, df = 1, 11, p = .039), with a large effect size (partial η2 = .334). The slope of Memory (i.e., the covariate) was also significantly different from zero (F = 55.10, df = 1, 11, p < .001), with large effect size and maximal power, suggesting that memory is related to the number of words remembered. After Memory was included as covariate in the model, the mean for Group 1 was adjusted to be slightly higher than the raw mean, whereas the mean for Group 2 was adjusted to be slightly lower than the raw mean. The adjustment therefore narrowed the difference between the group means slightly (from 2.43 to 2.19), but by substantially reducing error variance (the error mean square fell from 16.381 to 3.033), the use of the covariate increased power for the F test.
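The SPSS results for Question 3 can be cross-checked by hand-style computation. The sketch below is an illustration using only the Python standard library, not the procedure used in the text: it builds the sums of squares and cross-products directly, uses sequential (Type I) sums of squares with the covariate entered first, and uses helper names (`cross_dev`, `adjusted`) that are made up for this example.

```python
# Data from Question 3: (memory score X, words recalled Y) for each participant.
data = {
    "auditory": [(80, 10), (75, 9), (95, 16), (60, 4), (70, 11), (65, 8), (90, 15)],
    "visual": [(75, 13), (90, 20), (65, 8), (70, 9), (80, 15), (85, 13), (75, 12)],
}


def mean(values):
    return sum(values) / len(values)


def cross_dev(pairs):
    """Return (SS_x, SP_xy, SS_y) computed about the means of the given pairs."""
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    ss_x = sum((x - mx) ** 2 for x, _ in pairs)
    sp_xy = sum((x - mx) * (y - my) for x, y in pairs)
    ss_y = sum((y - my) ** 2 for _, y in pairs)
    return ss_x, sp_xy, ss_y


all_pairs = [pair for pairs in data.values() for pair in pairs]
N, J = len(all_pairs), len(data)

# Total and pooled within-groups sums of squares and cross-products.
tot_x, tot_xy, tot_y = cross_dev(all_pairs)
with_x, with_xy, with_y = (sum(t) for t in zip(*(cross_dev(p) for p in data.values())))

# One-factor ANOVA on Y.
ss_group_anova = tot_y - with_y
f_anova = (ss_group_anova / (J - 1)) / (with_y / (N - J))

# ANCOVA with Type I (sequential) sums of squares: covariate first, then group.
ss_memory = tot_xy ** 2 / tot_x
ss_error = with_y - with_xy ** 2 / with_x
ss_group_adj = tot_y - ss_memory - ss_error
ms_error = ss_error / (N - J - 1)
f_memory = ss_memory / ms_error
f_group = (ss_group_adj / (J - 1)) / ms_error

# Adjusted means use the pooled within-groups slope b_w.
b_w = with_xy / with_x
grand_x = mean([x for x, _ in all_pairs])
adjusted = {
    group: mean([y for _, y in pairs]) - b_w * (mean([x for x, _ in pairs]) - grand_x)
    for group, pairs in data.items()
}

print(f"ANOVA:  SS group = {ss_group_anova:.3f}, F = {f_anova:.3f}")
print(f"ANCOVA: SS memory = {ss_memory:.3f}, SS group = {ss_group_adj:.3f}, SS error = {ss_error:.3f}")
print(f"ANCOVA: F memory = {f_memory:.3f}, F group = {f_group:.3f}")
print({group: round(value, 3) for group, value in adjusted.items()})
```

The p values, effect sizes, and power estimates in the SPSS output are not reproduced here, since they would need an F distribution routine beyond the standard library.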
4. Barbara wants to know whether students will learn most effectively with soft music as background sound, as opposed to loud music or no music at all. In the following table are three independent random samples (different background sounds) of paired values on the covariate (X; pretest score) and the dependent variable (Y; posttest score). Conduct an ANOVA on Y, an ANCOVA on Y using X as a covariate, and compare the results (α = .05). Determine the unadjusted and adjusted means.
Soft Music    Loud Music    No Music
X     Y       X     Y       X     Y
75    89      50    57      64    72
42    49      43    52      70    79
34    52      56    62      32    48
61    70      48    65      65    87
45    66      59    64      67    80
70    83      27    35      42    53
75    90      65    71      71    85
58    73      54    68      74    82
ANSWER Procedure: Create a data set with three variables: Pretest (covariate), Posttest (dependent variable), and Music (factor with 3 levels). The data set should have 24 cases. Other steps are similar to those in Question 3. Use Posttest as the Dependent Variable, Music as the Fixed Factor, and Pretest as the Covariate.
Selected SPSS Output:
I. ANOVA Results:
Tests of Between-Subjects Effects
Dependent Variable: Posttest
Source             Type III Sum of Squares    df    Mean Square    F        Sig.    Partial Eta Squared    Observed Power
Music              931.000                    2     465.500        2.352    .120    .183                   .422
Error              4157.000                   21    197.952
Corrected Total    5088.000                   23
Unadjusted Means
Dependent Variable: Posttest
Music           Mean      Std. Error    95% CI Lower Bound    95% CI Upper Bound
1 soft music    71.500    4.974         61.155                81.845
2 loud music    59.250    4.974         48.905                69.595
3 no music      73.250    4.974         62.905                83.595
II. ANCOVA Results:
Tests of Between-Subjects Effects
Dependent Variable: Posttest
Source             Type I Sum of Squares    df    Mean Square    F          Sig.    Partial Eta Squared    Observed Power
Pretest            4553.148                 1     4553.148       221.060    .000    .917                   1.000
Music              122.915                  2     61.457         2.984      .073    .230                   .515
Error              411.937                  20    20.597
Corrected Total    5088.000                 23
Adjusted Means
Dependent Variable: Posttest
Music           Mean       Std. Error    95% CI Lower Bound    95% CI Upper Bound
1 soft music    70.214a    1.607         66.861                73.567
2 loud music    64.745a    1.655         61.291                68.198
3 no music      69.041a    1.635         65.631                72.451
A one-factor fixed-effects ANOVA was first conducted. As shown in the table of unadjusted means, students who studied without any background music had the highest posttest mean score (73.25), followed by those who listened to soft music (71.5), and those who listened to loud music (59.25). The ANOVA table showed that background music did not have any significant effect on posttest scores (F = 2.352, df = 2, 21, p = .12). Effect size was large (partial η2 = .183), but the observed power was not adequate (.422). A one-factor ANCOVA was also conducted on the same data. As shown in the table of adjusted means, after adjusting for group differences in pretest scores, the mean was highest for students who listened to soft music (70.21), followed by students who studied without music (69.04), and students who listened to loud music (64.75). The one-factor ANCOVA table showed that the effect of background music on posttest score after adjusting for pretest score was nonsignificant at α = .05 (F = 2.98, df = 2, 20, p = .073), with a large effect size yet inadequate power (partial η2 = .23, power = .515). The slope of Pretest (i.e., the covariate) was significantly different from zero (F = 221.06, df = 1, 20, p < .001), suggesting that pretest score is closely related to the posttest scores. After pretest score was included as covariate in the model, the means for Group 1 and Group 3 were adjusted to be lower than their raw means, whereas the mean for Group 2 was adjusted to be higher than its raw mean. The use of the covariate resulted in a reduction of error variance and increased the power for the F test.
5. Dr. Green conducted an ANCOVA to determine whether hearing-impaired children who were taught sign language early on would develop better language skills compared to those who did not learn sign language. The data set contains two independent random samples (children who learned sign language and those who did not) of paired values on the covariate (X; child's hearing) and the dependent variable (Y; language skills measured when the child was three years old). Dr. Green also examined the data to see if the assumptions of ANCOVA were met. The following table and figures are the selected output from Dr. Green's analysis (α = .05):
Tests of Between-Subjects Effects
Dependent Variable: Language
Source             Type III Sum of Squares    df    Mean Square    F        Sig.
Sign               22681.088                  1     22681.088      5.963    .020
Hearing            3084.607                   1     3084.607       .811     .374
Sign*Hearing       23130.912                  1     23130.912      6.082    .019
Error              136920.190                 36    3803.339
Corrected Total    162488.375                 39
a. What assumption is being evaluated here? b. Was this assumption satisfied? If not, what effect might it have on the results of ANCOVA? ANSWER a. The assumption of homogeneous slopes is being evaluated. b. The interaction between signing (factor) and hearing (covariate) is significant (F = 6.082, p = .019). Moreover, the scatterplots of Y vs. X suggest that the regression slopes differ between the two groups (i.e., children who learned sign language and those who did not). Therefore, the assumption of homogeneous slopes seems to be violated. With heterogeneous slopes, the size of the group differences in Y (i.e., children's language skills) will depend on the value of X (i.e., children's hearing). For example, learning to sign may be helpful for children with worse hearing, but not for children with better hearing. In this case, a straightforward interpretation of the ANCOVA would be misleading. The use of bw can yield biased adjusted means and can affect the F test. Considering the study is probably not a randomized experiment, the violation of this assumption may have a modest effect on the results of the F test.
Chapter 15 (Random- and Mixed-Effects Analysis of Variance Models) Test Bank I. Multiple choice 1. The denominator of the F ratio used to test the main effect of factor A in a two-factor ANOVA is MSAB in which one of the following? a. Fixed-effects model b. Mixed-effects model where A is the fixed factor c. Mixed-effects model where A is the random factor d. Split-plot model where A is the between-subjects factor e. Split-plot model where A is the within-subjects factor ANSWER: B (In the fixed-effects model, MSwith is the denominator. In the mixed-effects model, MSAB is the denominator when A is the fixed factor, and MSwith is the denominator when A is the random factor. In the split-plot model, MSs is the denominator when A is the between-subjects factor, while MSAs is the denominator when A is the within-subjects factor.) 2. Candace conducts a study to examine the effect of a professional development program on teachers' content knowledge. Half of the teachers are randomly assigned to participate in the program (the other half are controls). All teachers are then tested on their content knowledge each month over the course of the academic year. Which ANOVA model is most appropriate for analysis of these data? a. One-factor random-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. One-factor repeated measures design e. Two-factor split-plot design ANSWER: E (Month is the within-subjects factor, and the status of participation in the program is the between-subjects factor.)
3. Gilbert wants to make generalizations about the class attendance rate in his district. He randomly samples four schools and collects data on the attendance rate of 50 classes (one class represents one observation). Which ANOVA model is most appropriate for analysis of these data? a. One-factor random-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. One-factor repeated measures design e. Two-factor split-plot design ANSWER: A (Schools were randomly selected from the population; thus the one-factor random-effects model is appropriate.) 4. In the same study as described in Question 3, Gilbert randomly selected 25 classes from the four sampled schools. All selected classes participated in a program that aimed to increase attendance rate (the remaining classes were controls). Gilbert then collected data on the change of attendance rate to see whether the program made a difference. Which ANOVA model is most appropriate for analysis of these data? a. One-factor random-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. One-factor repeated measures design e. Two-factor split-plot design ANSWER: C (School is the random factor, and the status of participation in the program is the fixed factor.) 5. Ravi conducts a study where randomly selected patients of a clinic are measured on their blood pressure each month over the course of one year. Which ANOVA model is most appropriate for analysis of these data? a. One-factor random-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. One-factor repeated measures design e. Two-factor split-plot design ANSWER: D (Patients are measured on the same outcome, blood pressure, at multiple points in time; thus the one-factor repeated measures design is appropriate.)
6. In the same study as described in Question 5, suppose Ravi randomly assigned half of the selected patients to follow a specific diet plan (the other half are controls). Patient blood pressure is recorded every month over the course of one year. Ravi wanted to examine whether the diet plan made any difference in blood pressure. Which ANOVA model is most appropriate for analysis of these data? a. One-factor random-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. One-factor repeated measures design e. Two-factor split-plot design ANSWER: E (Month is the within-subjects factor, and the status of participation in the diet plan is the between-subjects factor.) 7. In a taste test, the participants are asked to taste each of the three types of pies (the order is counterbalanced), and then rate how much they like each flavor. To compare the average ratings for each type of pie, which ANOVA model is most appropriate? a. One-factor random-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. One-factor repeated measures design e. Two-factor split-plot design ANSWER: D (Participants give ratings to each of the three types of pies, thus type of pie is a repeated factor.) 8. If a given set of data were analyzed with both a two-factor fixed-effects model and a two-factor random-effects model, which F ratio will definitely stay unchanged? a. FA b. FB c. FAB d. All of the above e. None of the above ANSWER: C (For the fixed-effects model, FA = MSA/MSwith and FB = MSB/MSwith. For the random-effects model, FA = MSA/MSAB and FB = MSB/MSAB. For both fixed-effects and random-effects models, FAB = MSAB/MSwith.)
9. Suppose two researchers perform a one-factor ANOVA on the same data, but researcher A assumes a fixed-effects model, whereas researcher B assumes a random-effects model. Which of the following statements is false? a. Two researchers will have the same null hypothesis. b. Two researchers will obtain the same F ratio. c. Two researchers will obtain the same p value. d. Two researchers will obtain the same critical F value if they use the same significance level. ANSWER: A (The null hypothesis in the fixed-effects model is μ.1 = μ.2 = . . . = μ.J. The null hypothesis in the random-effects model is σ²a = 0.) 10. Which of the following statements regarding a comparison of the fixed-effects and random-effects models is false? a. One can make generalizations about all levels of the factor with random-effects models, but not with fixed-effects models. b. One can exactly replicate studies using fixed-effects models, but studies using random-effects models cannot be exactly replicated. c. One can conduct MCP with fixed-effects models, but not with random-effects models. d. Both fixed-effects models and random-effects models can be used in a repeated design. ANSWER: B (Random-effects models can also be replicated, though replications will not necessarily consist of the same levels.) 11. When testing the interaction effect of a two-factor design, which of the following models does not use MSwith as the denominator for that F ratio? a. Two-factor fixed-effects model b. Two-factor random-effects model c. Two-factor mixed-effects model d. Two-factor split-plot design e. All models use MSwith as the denominator for FAB ANSWER: D (In a two-factor split-plot design, MSBS is used as the denominator for FAB.)
12. How many hypotheses are tested in a one-factor repeated measures ANOVA? a. 1 b. 2 c. 3 d. 4 e. It depends. ANSWER: A (In a one-factor repeated measures design, we can only test the main effect of factor A. The null hypothesis is H0: μ.1 = μ.2 = . . . = μ.J.) 13. How many hypotheses are tested in a two-factor split-plot ANOVA? a. 1 b. 2 c. 3 d. 4 e. It depends. ANSWER: C (In a two-factor split-plot design, there are three F ratios in the ANOVA table (two main effects and an interaction), so there are three hypotheses to be tested.) 14. In a two-factor mixed-effects ANOVA, A (fixed factor) has 3 levels, and B (random factor) has 5 levels. Each cell has 5 observations. What is dfwith? a. 15 b. 60 c. 74 d. 75 ANSWER: B (J = 3, K = 5, n = 5. N = nJK = 75. dfwith = N − JK = 75 − 15 = 60. This is the same as in the fixed-effects model.) 15. In a two-factor ANOVA, A (fixed factor) has 3 levels, and B (random factor) has 5 levels. Each cell has 5 observations. The FA ratio has degrees of freedom equal to what? a. 2, 8 b. 2, 15 c. 2, 60 d. 2, 74 e. None of the above ANSWER: A (Because FA is the ratio of MSA and MSAB, degrees of freedom for the F ratio is (J − 1), (J − 1)(K − 1), which turns out to be 2, 8.)
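The degrees-of-freedom formulas used in Questions 14 and 15 can be collected in a small helper. This is a sketch only; the function name `mixed_effects_df` and its return layout are hypothetical, and it covers just the two-factor mixed-effects case with A fixed and B random.

```python
def mixed_effects_df(J, K, n):
    """Degrees of freedom for a two-factor mixed-effects ANOVA.

    A is the fixed factor (J levels), B is the random factor (K levels),
    and each cell holds n observations.
    """
    N = J * K * n
    df_a = J - 1
    df_ab = (J - 1) * (K - 1)
    df_with = N - J * K
    # F_A = MS_A / MS_AB, so its denominator df comes from the interaction.
    return {"df_with": df_with, "F_A": (df_a, df_ab)}


result = mixed_effects_df(J=3, K=5, n=5)
print(result)
```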
16. In a one-factor repeated measures ANOVA, A (fixed factor) has 4 levels, and each cell has 10 observations. The FA ratio has degrees of freedom equal to what? a. 4, 9 b. 4, 27 c. 3, 27 d. 3, 36 ANSWER: C (J = 4, n = 10. Because FA is the ratio of MSA and MSSA, degrees of freedom for the F ratio is (J − 1), (J − 1)(n − 1), which turns out to be 3, 27.) 17. In a two-factor split-plot ANOVA, A (between-subjects factor) has 3 levels, B (within-subjects factor) has 4 levels, and each cell has 5 observations. The FA ratio has degrees of freedom equal to what? a. 2, 4 b. 2, 6 c. 2, 12 d. 2, 36 ANSWER: C (J = 3, K = 4, n = 5. Because FA is the ratio of MSA and MSS, degrees of freedom for the F ratio is (J − 1), J(n − 1), which turns out to be 2, 12.) 18. In the table below, each cell represents one combination of factor levels. The actual experiment assigns participants only to the combinations represented by the shaded cells (the rows are randomly selected). This experiment is an example of which of the following designs?
[Table: rows A1, A2, A3, A4, A5 by columns B1, B2, B3; cell shading not recoverable]
a. A two-factor fixed-effects design b. A two-factor random-effects design c. A two-factor mixed-effects design, where A is the fixed factor d. A two-factor mixed-effects design, where B is the fixed factor e. A nested design ANSWER: D (Levels of A are randomly selected and cell effects of a given column do not necessarily sum to 0, so A is a random factor. All levels of B are selected and cell effects of a given row always sum to 0, so B is a fixed factor.)
19. In a one-factor random-effects model, suppose we find that the effect of factor A is significant at α = .05. a. We should conduct MCP on A. b. We should conduct MCP only when A has more than two levels. c. We conclude that not all the μ.j are equal at significance level .05. d. We are sure that the effect of A will also be significant if we have applied a fixed-effects model to the data. ANSWER: D (MCPs are not appropriate for random effects. The hypotheses of random effects deal with the variation, not the means.) 20. MCP can be used to further examine a. the significant interaction effect in a two-factor random-effects model. b. the significant interaction effect in a two-factor mixed-effects model. c. the significant main effect of a between-subjects fixed factor in a two-factor split-plot model. d. the significant main effect of a within-subjects random factor in a two-factor split-plot model. ANSWER: C (MCP is appropriate only when the factors involved have fixed levels.)
21. In a two-factor mixed-effects ANOVA (A is the fixed-effects factor and B is the random-effects factor), which of the following statements is not always true? a. E(MSA) ≥ E(MSAB) b. E(MSB) ≥ E(MSAB) c. E(MSA) ≥ E(MSwith) d. E(MSB) ≥ E(MSwith) e. E(MSAB) ≥ E(MSwith) ANSWER: B (When A is the fixed-effects factor and B is the random-effects factor, E(MSA) = σ²ε + nσ²αβ + Knσ²α, E(MSB) = σ²ε + Jnσ²β, E(MSAB) = σ²ε + nσ²αβ, E(MSwith) = σ²ε. Therefore E(MSB) is not necessarily larger than E(MSAB).)
22. In which of the following models can the population variance of the residual errors (σ²ε) be estimated? a. Two-factor fixed-effects model b. One-factor repeated measures design c. Two-factor split-plot design d. All of the above ANSWER: A (In the two-factor fixed-effects model, E(MSwith) = σ²ε, so the term MSwith can be used to estimate the population variance of residual errors. In repeated measures models, however, there is no term estimating σ²ε alone.) 23. Which model(s) generally requires the assumption of sphericity? a. One-factor repeated design b. Two-factor split-plot design c. Three-factor design with two between-subjects factors and one within-subjects factor d. All of the above e. None of the above ANSWER: D (Models that involve a within-subjects factor require the assumption of sphericity.) 24. In a one-factor repeated measures model, scores are collected at three different time points (J = 3). The assumption of sphericity basically states that a. the variance of scores collected at each time point is the same. b. the variance of the difference scores between the first time point and the second time point is the same as that between the second time point and the third time point. c. the variance of the difference scores is the same for each of the three pairs of time points. d. both the variance and covariance of the difference scores are the same for each of the three pairs of time points. ANSWER: C (See definition of sphericity.)
25. If the assumption of sphericity (i.e., compound symmetry) is violated, what should the researcher do? a. Because the effects of assumption violation will be minimal, we should use the results of the usual F test anyway. b. If H0 is rejected using the usual F test, we should use alternative F tests to verify the results. c. If H0 is not rejected using the usual F test, we should use alternative F tests to verify the results. d. Because the usual F test is not valid, we cannot analyze the data using this ANOVA model. ANSWER: B (When the assumption of sphericity is violated, the usual F test tends to be too liberal. Therefore when the usual F test rejects H0, we need to verify the results using alternative F tests.) 26. A researcher interested in making generalizations about the entire population of categories of the independent variable, not just the levels that were sampled, should pursue what type of ANOVA model? a. Covariate model b. Fixed effect c. Random effect d. Repeated measure ANSWER: C 27. Which of the following is not a characteristic of a two-factor mixed-effects ANOVA? a. Two factors (or independent variables) each with two or more levels b. The levels for one of the factors are randomly sampled from the population of levels (i.e., the random-effects factor) and all of the levels of interest for the second factor are included in the design (i.e., the fixed-effects factor) c. Subjects are randomly selected and assigned to one category of one independent variable only d. The dependent variable is measured at least at the interval level ANSWER: C (Subjects are randomly selected and assigned to one combination of the levels of the two factors.)
28. Which of the following are variations of a two-factor mixed-effects ANOVA? a. One where factor A is fixed and factor B is fixed, and the other where factor A is random and factor B is random. b. One where factor A is fixed and factor B is random, and the other where factor A is fixed and factor B is fixed. c. One where factor A is fixed and factor B is random, and the other where factor A is random and factor B is fixed. d. One where factor A is random and factor B is random, and the other where factor A is fixed and factor B is fixed. ANSWER: C
Chapter 15 (Random- and Mixed-Effects Analysis of Variance Models) Test Bank II. Short answer 1. Complete the following ANOVA summary table for a two-factor model, where there are two levels of factor A (fixed program effect) and five levels of factor B (random school effect). Each cell of the design includes seven students (α = .05).
Source    SS     df    MS    F     Critical Value    Decision
A         100    __    __    __    __                __
B         210    __    __    __    __                __
AB        __     __    20    __    __                __
Within    __     __    __
Total     690    __
ANSWER: There are 2 levels of the fixed factor A, so J = 2. There are 5 levels of the random factor B, so K = 5. Each cell has 7 students, so n = 7. N = 2*5*7 = 70. dfA = J − 1 = 2 − 1 = 1, dfB = K − 1 = 5 − 1 = 4, dfAB = (J − 1)(K − 1) = 1*4 = 4, dfwith = N − JK = 70 – 2*5 = 60, dftotal = N − 1 = 70 − 1 = 69. SSAB = MSAB*dfAB = 20*4 = 80; SSwith = SStotal − SSA − SSB − SSAB = 690 − 100 − 210 − 80 = 300. MSA = SSA/dfA = 100/1 = 100; MSB = SSB/dfB = 210/4 = 52.5; MSwith = SSwith/dfwith = 300/60 = 5. FA = MSA/MSAB = 100/20 = 5; critical value for A = .05F1,4 = 7.71 > FA, fail to reject H0. FB = MSB/MSwith = 52.5/5 = 10.5; critical value for B = .05F4,60 = 2.53 < FB, reject H0. FAB = MSAB/MSwith = 20/5 = 4; critical value for AB = .05F4,60 = 2.53 < FAB, reject H0.
Source    SS     df    MS      F       Critical Value      Decision
A         100    1     100     5       .05F1,4 = 7.71      fail to reject H0
B         210    4     52.5    10.5    .05F4,60 = 2.53     reject H0
AB        80     4     20      4       .05F4,60 = 2.53     reject H0
Within    300    60    5
Total     690    69
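The arithmetic in the answer above can be retraced with a short script. This is only a sketch of the hand calculation, assuming (as the problem states) that A is fixed and B is random; the function name `complete_mixed_table` is made up for illustration and the critical values still have to be read from an F table.

```python
# Sketch: fill in the two-factor mixed-effects summary table from Problem 1.
# Assumption (from the problem): factor A is fixed, factor B is random.

def complete_mixed_table(ss_a, ss_b, ms_ab, ss_total, J, K, n):
    """Return df, SS, MS, and F dictionaries for a balanced J x K design."""
    N = J * K * n
    df = {"A": J - 1, "B": K - 1, "AB": (J - 1) * (K - 1),
          "Within": N - J * K, "Total": N - 1}
    ss = {"A": ss_a, "B": ss_b, "AB": ms_ab * df["AB"], "Total": ss_total}
    ss["Within"] = ss_total - ss_a - ss_b - ss["AB"]
    ms = {src: ss[src] / df[src] for src in ("A", "B", "AB", "Within")}
    # Error terms in this mixed model: the fixed effect A is tested against
    # the interaction; B and AB are tested against the within-cells term.
    f = {"A": ms["A"] / ms["AB"],
         "B": ms["B"] / ms["Within"],
         "AB": ms["AB"] / ms["Within"]}
    return df, ss, ms, f

df, ss, ms, f = complete_mixed_table(100, 210, 20, 690, J=2, K=5, n=7)
for src in ("A", "B", "AB"):
    print(src, ss[src], df[src], ms[src], f[src])
```

Each F is then compared with the tabled critical value for its own pair of degrees of freedom, as in the answer table.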
2. Complete the following ANOVA summary table for a two-factor model, where there are three levels of factor A (random dosage effect) and four levels of factor B (random physician effect). Each cell of the design includes six patients (α = .05).
Source    SS     df    MS    F     Critical Value    Decision
A         25     __    __    __    __                __
B         __     __    15    __    __                __
AB        __     __    __    __    __                __
Within    __     __    1
Total     160    __
ANSWER: There are 3 levels of the random factor A, so J = 3. There are 4 levels of the random factor B, so K = 4. Each cell has 6 patients, so n = 6. N = 3*4*6 = 72. dfA = J − 1 = 3 − 1 = 2, dfB = K − 1 = 4 − 1 = 3, dfAB = (J − 1)(K − 1) = 2*3 = 6, dfwith = N − JK = 72 − 3*4 = 60, dftotal = N − 1 = 72 − 1 = 71. SSB = MSB*dfB = 15*3 = 45; SSwith = MSwith*dfwith = 1*60 = 60; SSAB = SStotal − SSA − SSB − SSwith = 160 − 25 − 45 − 60 = 30. MSA = SSA/dfA = 25/2 = 12.5; MSAB = SSAB/dfAB = 30/6 = 5. Because both factors are random, each main effect is tested against the interaction. FA = MSA/MSAB = 12.5/5 = 2.5; critical value for A = .05F2,6 = 5.14 > FA, fail to reject H0. FB = MSB/MSAB = 15/5 = 3; critical value for B = .05F3,6 = 4.76 > FB, fail to reject H0. FAB = MSAB/MSwith = 5/1 = 5; critical value for AB = .05F6,60 = 2.25 < FAB, reject H0.
Source    SS     df    MS      F      Critical Value      Decision
A         25     2     12.5    2.5    .05F2,6 = 5.14      fail to reject H0
B         45     3     15      3      .05F3,6 = 4.76      fail to reject H0
AB        30     6     5       5      .05F6,60 = 2.25     reject H0
Within    60     60    1
Total     160    71
3. To examine development in the pedagogical content knowledge (PCK) of new teachers, 12 novice teachers are measured on their PCK at the beginning of their teaching career, at the end of the first semester, and at the end of the first school year. The scale that measures PCK ranges from 0 to 45, with higher scores reflecting greater levels of PCK. The data are shown below. Conduct a one-factor repeated measures ANOVA to determine the mean differences across time, using α = .05. Use the Bonferroni method to detect exactly where the differences are among the time points (if they are different).
Subject   1    2    3    4    5    6    7    8    9    10   11   12
Time 1    16   17   11   15   21   14   12   17   11   8    14   12
Time 2    17   23   21   23   32   22   19   21   16   14   24   19
Time 3    28   32   30   35   40   31   31   25   24   20   28   25
ANSWER: Procedure: Create a data set with three variables: Time 1, Time 2, and Time 3. The data set should have 12 cases, representing 12 teachers.
1. Go to Analyze → General Linear Model → Repeated Measures.
2. For Within-Subject Factor Name, enter "Time." For Number of Levels, enter 3 (i.e., 3 time points). Click Add. For Measure Name, enter "PCK." Click Add.
3. Click Define. Move Time1 through Time3 in the left box to Within-Subjects Variables (Time).
4. Click Options. Move "Time" to Display Means for. Select Compare main effects. Select Bonferroni under Confidence interval adjustment. Select Descriptive statistics, Estimates of effect size, and Observed power. Click Continue.
5. To get profile plots, click Plots. Move "Time" to the box under Horizontal Axis. Click Add. Click Continue.
6. Click OK.
Selected SPSS Output:

Mauchly's Test of Sphericity (Measure: PCK)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilon (Greenhouse-Geisser)   Epsilon (Huynh-Feldt)   Epsilon (Lower-bound)
Time                     .825          1.925                2    .382   .851                           .991                    .500

Tests of Within-Subjects Effects (Measure: PCK)
Source         Correction           Type III Sum of Squares   df       Mean Square   F         Sig.   Partial Eta Squared   Observed Power (a)
Time           Sphericity Assumed   1368.167                  2        684.083       139.566   .000   .927                  1.000
Time           Greenhouse-Geisser   1368.167                  1.702    803.864       139.566   .000   .927                  1.000
Time           Huynh-Feldt          1368.167                  1.981    690.476       139.566   .000   .927                  1.000
Time           Lower-bound          1368.167                  1.000    1368.167      139.566   .000   .927                  1.000
Error (Time)   Sphericity Assumed   107.833                   22       4.902
Error (Time)   Greenhouse-Geisser   107.833                   18.722   5.760
Error (Time)   Huynh-Feldt          107.833                   21.796   4.947
Error (Time)   Lower-bound          107.833                   11.000   9.803
a. Computed using alpha = .05

Pairwise Comparisons: Bonferroni Method (Measure: PCK)
(I) Time   (J) Time   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1          2          −6.917*                 .811         .000   −9.205               −4.628
1          3          −15.083*                1.076        .000   −18.119              −12.048
2          1          6.917*                  .811         .000   4.628                9.205
2          3          −8.167*                 .796         .000   −10.412              −5.922
3          1          15.083*                 1.076        .000   12.048               18.119
3          2          8.167*                  .796         .000   5.922                10.412
The results of the test of sphericity give no evidence against the assumption of sphericity (Mauchly's W = .825, χ² = 1.925, df = 2, p = .382). Based on the results of the one-factor repeated measures ANOVA, the effect of the within-subjects factor, Time, is significant at the .05 level (F2,22 = 139.566, p < .001) with large effect size (partial η² = .927) and maximum observed power, implying that the novice teachers' average level of PCK has changed significantly across the three time points. Results of MCP, using the Bonferroni method, reveal significant differences among all pairs of time points. Specifically, the level of PCK increased significantly from Time 1 (i.e., the beginning of the school year) to Time 2 (i.e., the end of the first semester), from Time 2 to Time 3 (i.e., the end of the school year), and from Time 1 to Time 3. The results therefore suggest that the novice teachers' pedagogical content knowledge increases substantially in their first two semesters of teaching.
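The sphericity-assumed row of the within-subjects table can be checked by hand from the raw data. The following is a minimal sketch in plain Python, not the SPSS procedure itself; the function name `rm_anova` is an illustrative choice, and the script does not compute Mauchly's test, the epsilon corrections, or the Bonferroni comparisons.

```python
# Sketch: one-factor repeated measures ANOVA (sphericity assumed) for the PCK data.
time1 = [16, 17, 11, 15, 21, 14, 12, 17, 11, 8, 14, 12]
time2 = [17, 23, 21, 23, 32, 22, 19, 21, 16, 14, 24, 19]
time3 = [28, 32, 30, 35, 40, 31, 31, 25, 24, 20, 28, 25]

def rm_anova(columns):
    """Partition the sums of squares; columns holds one score list per time point."""
    J = len(columns)          # number of time points
    n = len(columns[0])       # number of subjects
    scores = [x for col in columns for x in col]
    grand = sum(scores) / (n * J)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_time = n * sum((sum(col) / n - grand) ** 2 for col in columns)
    subject_means = [sum(col[i] for col in columns) / J for i in range(n)]
    ss_subjects = J * sum((m - grand) ** 2 for m in subject_means)
    # What remains after removing time and subject effects is the error term.
    ss_error = ss_total - ss_time - ss_subjects
    df_time, df_error = J - 1, (J - 1) * (n - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return ss_time, ss_error, df_time, df_error, f

ss_time, ss_error, df_time, df_error, f = rm_anova([time1, time2, time3])
print(round(ss_time, 3), round(ss_error, 3), df_time, df_error, round(f, 3))
```

The printed sums of squares, degrees of freedom, and F should agree with the Sphericity Assumed rows for Time and Error (Time) in the output above.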
4. Using the same data as in Problem 3, conduct a two-factor split-plot ANOVA, where the first six teachers participate in a professional development program throughout their first year of teaching, and the last six teachers do not participate in the program (α = .05). Does the program seem to have an effect on the change in teachers' pedagogical content knowledge? ANSWER: Procedure: Add another variable to the data set: Program (the between-subjects factor). The first six cases have value 1 for the new variable, and the last six cases have value 0. Follow the procedures as described in Question 3. In step 3, also add "Program" to Between-Subjects Factor(s). In step 4, also select "Homogeneity tests." In step 5, move "Time" to the box under Horizontal Axis, and then move "Program" to Separate Lines. Click Add.
Selected SPSS Output:

Mauchly's Test of Sphericity (Measure: PCK)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilon (Greenhouse-Geisser)   Epsilon (Huynh-Feldt)   Epsilon (Lower-bound)
Time                     .869          1.259                2    .533   .885                           1.000                   .500

Tests of Within-Subjects Effects (Measure: PCK)
Source         Correction           Type III Sum of Squares   df       Mean Square   F         Sig.   Partial Eta Squared   Observed Power (a)
Time           Sphericity Assumed   1368.167                  2        684.083       163.961   .000   .943                  1.000
Time           Greenhouse-Geisser   1368.167                  1.769    773.394       163.961   .000   .943                  1.000
Time           Huynh-Feldt          1368.167                  2.000    684.083       163.961   .000   .943                  1.000
Time           Lower-bound          1368.167                  1.000    1368.167      163.961   .000   .943                  1.000
Time*Program   Sphericity Assumed   24.389                    2        12.194        2.923     .077   .226                  .506
Time*Program   Greenhouse-Geisser   24.389                    1.769    13.786        2.923     .085   .226                  .471
Time*Program   Huynh-Feldt          24.389                    2.000    12.194        2.923     .077   .226                  .506
Time*Program   Lower-bound          24.389                    1.000    24.389        2.923     .118   .226                  .340
Error (Time)   Sphericity Assumed   83.444                    20       4.172
Error (Time)   Greenhouse-Geisser   83.444                    17.690   4.717
Error (Time)   Huynh-Feldt          83.444                    20.000   4.172
Error (Time)   Lower-bound          83.444                    10.000   8.344
a. Computed using alpha = .05

Tests of Between-Subjects Effects (Measure: PCK; Transformed Variable: Average)
Source      Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Observed Power (a)
Intercept   16384.000                 1    16384.000     451.488   .000   .978                  1.000
Program     215.111                   1    215.111       5.928     .035   .372                  .594
Error       362.889                   10   36.289
a. Computed using alpha = .05

Pairwise Comparisons: Bonferroni Method (Measure: PCK)
(I) Time   (J) Time   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1          2          −6.917*                 .841         .000   −9.330               −4.504
1          3          −15.083*                .952         .000   −17.817              −12.350
2          1          6.917*                  .841         .000   4.504                9.330
2          3          −8.167*                 .687         .000   −10.139              −6.194
3          1          15.083*                 .952         .000   12.350               17.817
3          2          8.167*                  .687         .000   6.194                10.139
The results of the test of sphericity give no evidence against the assumption of sphericity (Mauchly's W = .869, χ² = 1.259, df = 2, p = .533). Based on the results of the two-factor split-plot ANOVA, the interaction between Time and Program is not significant at α = .05 (F2,20 = 2.923, p = .077), though the effect size is large (partial η² = .226) and the power is not adequate (.506). The patterns of change in PCK do not significantly differ between the teachers who participated in the professional development program and those who did not. The main effect of the within-subjects factor, Time, is significant at the .05 level (F2,20 = 163.961, p < .001), with large effect size (partial η² = .943) and maximal power. This implies that teachers' level of PCK has changed significantly across time. MCP using the Bonferroni method reveals significant differences among all pairs of time points, and shows that PCK increases over time. The main effect of the between-subjects factor, Program, is also significant (F1,10 = 5.928, p = .035) with large effect size (partial η² = .372), though the power is not satisfactory (.594). This indicates that there is a significant difference in the average level of PCK between teachers who participated in the professional development program and those who did not.
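The three F ratios reported for the split-plot analysis can be reproduced from the raw scores. The sketch below assumes a balanced design and sphericity; `split_plot_anova` and the dictionary keys are illustrative names, and the coding of Program (1 for the first six teachers, 0 for the rest) follows the procedure described in the answer.

```python
# Sketch: sums-of-squares partition for the two-factor split-plot design.
time1 = [16, 17, 11, 15, 21, 14, 12, 17, 11, 8, 14, 12]
time2 = [17, 23, 21, 23, 32, 22, 19, 21, 16, 14, 24, 19]
time3 = [28, 32, 30, 35, 40, 31, 31, 25, 24, 20, 28, 25]
program = [1] * 6 + [0] * 6   # first six teachers took part in the program

def split_plot_anova(columns, groups):
    J, n = len(columns), len(groups)
    scores = [x for col in columns for x in col]
    grand = sum(scores) / (n * J)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_time = n * sum((sum(col) / n - grand) ** 2 for col in columns)
    ss_subjects = J * sum(
        (sum(col[i] for col in columns) / J - grand) ** 2 for i in range(n))
    levels = sorted(set(groups))
    ss_group = ss_cells = 0.0
    for g in levels:
        idx = [i for i in range(n) if groups[i] == g]
        group_mean = sum(col[i] for col in columns for i in idx) / (len(idx) * J)
        ss_group += len(idx) * J * (group_mean - grand) ** 2
        for col in columns:
            cell_mean = sum(col[i] for i in idx) / len(idx)
            ss_cells += len(idx) * (cell_mean - grand) ** 2
    ss_inter = ss_cells - ss_time - ss_group
    ss_err_between = ss_subjects - ss_group                    # subjects within groups
    ss_err_within = ss_total - ss_subjects - ss_time - ss_inter
    a = len(levels)
    ms_err_between = ss_err_between / (n - a)
    ms_err_within = ss_err_within / ((n - a) * (J - 1))
    return {
        "F_program": (ss_group / (a - 1)) / ms_err_between,
        "F_time": (ss_time / (J - 1)) / ms_err_within,
        "F_interaction": (ss_inter / ((a - 1) * (J - 1))) / ms_err_within,
        "SS_error_time": ss_err_within,
    }

result = split_plot_anova([time1, time2, time3], program)
for name, value in result.items():
    print(name, round(value, 3))
```

Note that the between-subjects effect uses subjects within groups as its error term, while Time and Time*Program share the within-subjects error term, which is why the two tables in the output report different error rows.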
5. Dr. Bellus wants to study how people's perception of attractiveness is affected by the gender of the subjects, and the facial expression of the object. Each participant views six headshots (in random order) of the same person, who demonstrates different facial expressions (joy, surprise, fear, anger, sadness, and disgust) in each picture. The participants are then asked to rate the attractiveness level of each picture. The rating is the dependent variable. Below is the selected output of a two-factor split-plot ANOVA.

Test of Between-Subjects Effects
Source                        Type III SS   df    MS         F
A (between-subjects factor)   226.81        1     226.81     3.04
S (subject)                   2084.61       28    74.45

Test of Within-Subjects Effects
Source                        Type III SS   df    MS         F
B (within-subjects factor)    18802.53      5     3760.506   38.07
A*B                           324.22        5     64.84      0.66
B*S                           13829.70      140   98.78

a. Identify the between-subjects factor and within-subjects factor. b. Are the effects significant (α = .05)? What do these results tell us about people's perception of attractiveness? c. Is MCP necessary? If so, on which effect(s) should we conduct MCP?
ANSWER: a. The between-subjects factor is gender (the subject is either male or female). The within-subjects factor is type of facial expression (each subject views and rates all six pictures). b. For the AB interaction, .05F5,140 = 2.28 > .66, so we fail to reject the null and conclude that there is no significant interaction. In other words, the effects of facial expression, if any, are similar for males and females. For the between-subject effects, .05F1,28 = 4.20 > 3.04, so we fail to reject the null and conclude that there is no significant gender effect. In other words, males and females do not give significantly different ratings. For the within-subject effects, .05F5,140 = 2.28 < 38.07, so we reject the null and conclude that the type of facial expression has a significant effect. In other words, the perception of attractiveness is substantially affected by the facial expression of the person in the picture. c. MCP should be conducted on the fixed-effect factor B, i.e., type of facial expression, because its effect turns out to be significant, and there are more than two levels.
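As a check on part b, the F ratios can be recomputed from the sums of squares and degrees of freedom in the output. This is a sketch only: the helper `f_ratio` is a made-up name, and the critical values are simply copied from the answer rather than computed.

```python
# Sketch: recompute the split-plot F ratios in Problem 5 from SS and df.

def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = MS_effect / MS_error."""
    return (ss_effect / df_effect) / (ss_error / df_error)

# A is tested against S (subjects within A); B and A*B are tested against B*S.
f_values = {
    "A": f_ratio(226.81, 1, 2084.61, 28),
    "B": f_ratio(18802.53, 5, 13829.70, 140),
    "A*B": f_ratio(324.22, 5, 13829.70, 140),
}
critical = {"A": 4.20, "B": 2.28, "A*B": 2.28}   # .05 critical values quoted in the answer

decisions = {effect: ("reject H0" if f_values[effect] > critical[effect]
                      else "fail to reject H0")
             for effect in f_values}
for effect in f_values:
    print(effect, round(f_values[effect], 2), decisions[effect])
```

Only the within-subjects factor B exceeds its critical value, which matches the conclusion that MCP is needed for facial expression alone.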
Chapter 16 (Hierarchical and Randomized Block Analysis of Variance Models) Test Bank Multiple choice 1. To study the effects of a new mathematics textbook, 30 students are randomly selected. Based on the scores of their achievement test, students are grouped into three groups (limited, 0–15; proficient, 16–30; advanced, 31–45). Half of the students in each group are then randomly assigned to use the new textbook, while the rest of the students continue to use the old textbook. Which of the following methods of blocking is employed here? a. Predefined value blocking b. Predefined range blocking c. Sampled value blocking d. Sampled range blocking e. Post hoc blocking method ANSWER: B (The blocks are formed based on predefined ranges of an interval blocking variable, i.e., the achievement test scores.)
2. In the scenario described in Question 1, instead of using the raw scores, the researcher used the percentile rank in the achievement test and grouped the students into three new groups: the top third, the middle third, and the bottom third. Then half of the students in each group were randomly assigned to use the new textbook. Which of the following methods of blocking is employed here? a. Predefined value blocking b. Predefined range blocking c. Sampled value blocking d. Sampled range blocking e. Post hoc blocking method ANSWER: A (The blocks are formed based on predefined values of an ordinal blocking variable, i.e., the percentile ranks.)
3. Ravi was studying the effect of a new medicine on subjects' blood pressure. He randomly sampled 60 laboratory rats and ranked them based on their current blood pressure. Those ranked from 1 to 3 (i.e., the three rats with the highest blood pressure) formed group 1, those ranked from 4 to 6 formed group 2, etc. Ravi then randomly selected five groups (i.e., 15 rats). Within each group, rats were randomly assigned to three groups: new medicine, old medicine, and no medicine. Which of the following methods of blocking is employed here? a. Predefined value blocking b. Predefined range blocking c. Sampled value blocking d. Sampled range blocking e. Post hoc blocking method ANSWER: D (The subjects are ranked on a ratio blocking variable, i.e., the blood pressure, and a random sample of blocks are drawn. Then subjects within the blocks are assigned to treatment.) 4. Azita wants to evaluate the effects of two new types of word processors (Macrosoft word and Toggle doc) on employees' working efficiency. Twelve employees were randomly selected and grouped into four blocks based on the nature of the work (proofreading, copyediting, technical writing, data analysis). Within each block, each employee was randomly assigned to use one of the three word processors: Macrosoft word, Toggle doc, and the old word processor. Their working efficiency as measured by the number of files processed per day was compared. Which of the following statements is false? a. Both the blocking factor and the treatment factor are fixed. b. The blocking factor and the treatment factor are crossed. c. It is expected that the blocking factor (nature of work) is related with the number of files processed per day. d. We can conduct MCP on the interaction between nature of work and type of word processor if it is found to be significant in ANOVA. ANSWER: D (The interaction cannot be tested since there is only one subject in each cell.)
5. In a split-plot design, suppose A is the between-subjects factor, B is the within-subjects factor, and S is the subject. Which of the following statements is true? a. Subject is nested within factor A. b. Subject is nested within factor B. c. Factor A is nested within subject. d. Factor A is nested within factor B. ANSWER: A (In a split-plot design, each subject is only assigned to one level of the between-subjects factor, but is tested on all levels of the within-subjects factor. Therefore, at each level of factor A, unique subjects appear.) 6. To study the effects of diet plans on blood pressure, the researcher assigned 10 subjects to diet plan A, and 10 participants to diet plan B. Within each group, five of the subjects are male and five participants are female. Which of the following statements is true? a. Gender is nested within diet plans. b. Diet plan is nested within gender. c. Gender and diet plan are crossed. d. Diet plan is nested within subjects. ANSWER: C (Within each type of diet plan, both male and female participants are observed; within each gender, both types of diet plans are observed.) 7. If a design is denoted by A(BC), it means that a. factor A is nested within factor B and factor C. b. both factor B and factor C are nested within factor A. c. factor A is the blocking factor and factors B and C are the treatment factors. d. factor A is the treatment factor and factors B and C are the blocking factors. ANSWER: A (This is the notation for factor A nested within factors B and C.) 8. If a design is denoted by A(BC), and there are four observations in each cell, which of the following effects can be studied? a. The A×B interaction b. The A×C interaction c. The B×C interaction d. The A×B×C interaction e. All of the above ANSWER: C (Factor A is nested within factors B and C. Factors B and C are crossed. The interaction can only be studied when the factors are completely crossed.)
9. To determine if statistical software A is uniformly superior to software B for the population of statistical consultants, from which random samples are taken to conduct a study, one needs a. a nested design with a fixed-effects model. b. a crossed design with a fixed-effects model. c. a nested design with a random-effects model. d. a crossed design with a random-effects model. e. a nested design with a mixed-effects model. f. a crossed design with a mixed-effects model. ANSWER: F (While the type of statistical software is fixed, consultant is random, and the design is mixed. To determine uniform superiority, a crossed design should be used.) 10. For the same course in statistics, two instructors are from the mathematics department, and two instructors are from the statistics department. Each instructor taught one session of the course (one student may be enrolled in only one session). The researcher is interested in comparing student evaluation of instructors from different departments. This is an example of which type of design? a. Completely crossed design b. Repeated measures design c. Hierarchical design d. Randomized block design ANSWER: C (Instructors are nested within department.) 11. In the scenario as described in Question 10, suppose the instructors from the same department are randomly assigned to use either the lecture-based method or the computer-based method. The researcher is primarily interested in the effects of different instructional methods on student outcomes. Which of the following statements is correct? a. Instructional method is used as a blocking factor. b. Department and instructional method are crossed. c. Instructor is nested within department but crossed with method. d. Instructor is nested within method but crossed with department. ANSWER: B (Instructors are nested within both department and method. Department and method are crossed, and department is used as the blocking factor.)
12. How many hypotheses are tested in a design denoted by A(B)? a. 1 b. 2 c. 3 d. 4 e. It depends. ANSWER: B (A(B) denotes a two-factor hierarchical design, where factor A is nested within factor B. We can test the main effect of B, and the effect of A(B).) 13. In which of the following designs can we test the A×B interaction? a. A and B are crossed, and n = 1. b. A and B are crossed, and n = 2. c. A is nested within B, and n = 1. d. A is nested within B, and n = 2 ANSWER: B (To test the interaction, the two factors must be crossed, and there must be more than one observation per cell.) Questions 14–17 are based on the following figure.
[Figure: a grid with columns A1 through A6 and rows B1 through B3, in which some cells are shaded.]
Each cell represents one combination of factor levels. The actual experiment assigns participants only to the combinations represented by the shaded cells. It is unknown whether the levels of the factors are fixed or randomly selected. 14. Which of the following best describes the design of the experiment? a. A×B b. B×A c. A(B) d. B(A) ANSWER: C (Factor A is nested within factor B, because at each level of B, unique levels of A appear.)
15. A is a random-effects factor and B is a fixed-effects factor. Each cell has 5 observations. The FB ratio has degrees of freedom equal to what? a. 2, 3 b. 2, 10 c. 2, 24 d. 3, 72 ANSWER: A (Because FB is the ratio of MSB and MSA(B), degrees of freedom for the F ratio are (K − 1), K(J(k) − 1). Since K = 3, J(k) = 2, FB has df equal to 2, 3.) 16. Both A and B are fixed-effects factors. Each cell has 5 observations. The FB ratio has degrees of freedom equal to what? a. 2, 3 b. 2, 10 c. 2, 24 d. 3, 72 ANSWER: C (Because FB is the ratio of MSB and MSwith, degrees of freedom for the F ratio are (K − 1), KJ(k)(n − 1). Since K = 3, J(k) = 2, and n = 5, FB has df equal to 2, 24.) 17. Both A and B are random-effects factors. Each cell has 5 observations. The FB ratio has degrees of freedom equal to what? a. 2, 3 b. 2, 10 c. 2, 24 d. 3, 72 ANSWER: A (Because FB is the ratio of MSB and MSA(B), degrees of freedom for the F ratio are (K − 1), K(J(k) − 1), which is equal to 2, 3.) 18. If the correlation between the concomitant variable and dependent variable is −.35, and the concomitant variable is measured on an interval scale, which of the following designs is recommended? a. One-factor ANOVA b. ANCOVA c. Randomized block ANOVA d. Either ANCOVA or randomized block ANOVA ANSWER: C (When .2 < |r| < .4, it is recommended that the concomitant variable be used as a blocking factor.)
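The degrees-of-freedom rule used in Questions 15–17 can be written out as a small function. This is a sketch under the assumptions stated in those answers: A is nested within B, the design is balanced, and the error term for B is MSA(B) whenever the nested factor A is random and MSwith otherwise. The function name `fb_df` and its parameter names are illustrative.

```python
# Sketch: degrees of freedom for the F test of B in a balanced A(B) design.

def fb_df(K, J_per_b, n, a_is_random):
    """K levels of B, J_per_b levels of A nested in each level of B, n per cell."""
    df_b = K - 1
    df_a_within_b = K * (J_per_b - 1)      # df for A(B)
    df_within = K * J_per_b * (n - 1)      # df for the within-cells term
    # With A random, B is tested against A(B); with A fixed, against within.
    return (df_b, df_a_within_b if a_is_random else df_within)

print(fb_df(K=3, J_per_b=2, n=5, a_is_random=True))    # Questions 15 and 17
print(fb_df(K=3, J_per_b=2, n=5, a_is_random=False))   # Question 16
```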
19. If the correlation between the concomitant variable and dependent variable is .05, and the concomitant variable is measured on an ordinal scale, which of the following designs is recommended? a. One-factor ANOVA b. ANCOVA c. Randomized block ANOVA d. Either ANCOVA or randomized block ANOVA ANSWER: A (When r < .2, it is recommended that the concomitant variable be ignored, because the addition of the concomitant variable will not reduce much residual variation.) 20. If the correlation between the concomitant variable and dependent variable is .5, and the concomitant variable is measured on an ordinal scale, which of the following designs is recommended? a. One-factor ANOVA b. ANCOVA c. Randomized block ANOVA d. Either ANCOVA or randomized block ANOVA ANSWER: C (When .4 < r < .6, use concomitant variable either as a blocking factor or as a covariate (about equal). However, since the concomitant variable is ordinal, it should be used only as a blocking factor. ) 21. Suppose two researchers performed ANOVA on the same data. Researcher A used a one-factor model, whereas researcher B added a blocking factor in addition to the treatment factor to the model. Which researcher will get the smaller residual variation? a. Researcher A b. Researcher B c. Both researchers will get the same residual variation. d. It depends. ANSWER: D (Whether the block design can effectively reduce residual variation depends on the correlation between the blocking factor and the dependent variable.)
22. Which of the following blocking methods is more precise in estimating the treatment effects for an ordinal blocking variable? a. Predefined value blocking b. Predefined range blocking c. Sampled value blocking d. Sampled range blocking e. Post hoc blocking method ANSWER: A (For an ordinal blocking variable, the predefined value method is more precise than the sampled value method, and the post hoc method is the least precise of all the blocking methods.) 23. Which of the following variables may be used as a treatment factor? a. Teaching method. b. Years of teaching experience. c. Pretest score. d. Gender. ANSWER: A (One should be able to assign individuals to different levels of treatment factors. Years of teaching experience, pretest scores, and gender cannot be assigned, so they must be used as either a blocking factor or a covariate.) 24. Compared to the two-factor fully crossed ANOVA, the two-factor hierarchical ANOVA a. has the same model assumptions. b. has the same linear model. c. has the same ANOVA table. d. has the same hypotheses to be tested. ANSWER: A (The two-factor hierarchical ANOVA does not include the interaction term, and one factor is nested within another factor.) 25. Which assumption is required for the randomized block model, but not for the two-factor fully crossed ANOVA? a. Independence b. Homogeneity of variance c. Normality d. Sphericity ANSWER: D (The necessary and sufficient condition for the validity of the F test of A in a randomized block design is sphericity, which assumes that the variance of the difference scores for each pair of factor levels is the same.)
26. True or false? In a two-factor randomized block ANOVA, the blocking factor is often considered a nuisance variable for which the researcher wants to control through the design of the study. a. True b. False ANSWER: A 27. Randomized block designs are also known as which of the following? Select all that apply. a. Adjusted ANCOVA b. Matching designs c. Mixed effect models d. Treatment by block designs ANSWER: B and D 28. Which one of the following is the nonparametric equivalent to the two-factor randomized block ANOVA model? a. Chi-square test of equivalence b. Friedman c. Kruskal-Wallis d. Mann-Whitney ANSWER: B
Chapter 16 (Hierarchical and Randomized Block Analysis of Variance Models) Test Bank Short answer 1. An experiment was conducted to compare the effects of four types of classroom layout on children's learning outcomes. Score on a pretest is used to form the blocks (low, middle, high). The mean scores on the dependent variable, children's score on a posttest quiz, are listed here for each cell.
                        Type of classroom layout
Pretest score block     1     2     3     4
low                     65    80    70    75
middle                  75    85    80    85
high                    90    95    85    100
Use these cell means to graph the interaction between type of classroom layout and pretest score block. a. Is there an interaction between type of classroom layout and pretest score? b. What kind of recommendation would you make to teachers? ANSWER: The profile plot of cell means is shown as follows.
[Figure: profile plot of cell means, with type of classroom layout on the horizontal axis and one line for each pretest score block.]
a. No, there does not appear to be an interaction between type of classroom layout and pretest score. The three lines (representing three levels on pretest) in the plot are almost parallel, indicating that the effects of classroom layout are similar for students with different achievement levels on the pretest. b. Results suggest that the effects of classroom layout are generalizable across students with different levels of achievement on the pretest. Therefore, teachers should choose the type of classroom layout in which the students had the highest learning outcome. However, the plots also suggest that the effects of the four types of classroom layout may not differ significantly from each other.
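One rough way to put a number on "almost parallel" is to compare each cell mean with what a purely additive model (row effect plus column effect) would predict. The sketch below does this for the classroom layout cell means; it is a descriptive check only, not a significance test, and the function name `interaction_residuals` is an illustrative choice.

```python
# Sketch: deviations of cell means from an additive (no-interaction) pattern.
cell_means = {
    "low":    [65, 80, 70, 75],
    "middle": [75, 85, 80, 85],
    "high":   [90, 95, 85, 100],
}

def interaction_residuals(cells):
    """Return cell mean - row mean - column mean + grand mean for every cell."""
    rows = list(cells)
    n_cols = len(cells[rows[0]])
    grand = sum(sum(cells[r]) for r in rows) / (len(rows) * n_cols)
    row_mean = {r: sum(cells[r]) / n_cols for r in rows}
    col_mean = [sum(cells[r][c] for r in rows) / len(rows) for c in range(n_cols)]
    return {r: [cells[r][c] - row_mean[r] - col_mean[c] + grand
                for c in range(n_cols)]
            for r in rows}

residuals = interaction_residuals(cell_means)
largest = max(abs(x) for row in residuals.values() for x in row)
for block, row in residuals.items():
    print(block, [round(x, 2) for x in row])
print("largest absolute deviation:", round(largest, 2))
```

Perfectly parallel profiles would give deviations of zero in every cell. Here the deviations are small compared with the 35-point spread of the cell means, which is consistent with the answer, although cell means alone cannot establish whether an interaction is statistically significant.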
2. An experiment was conducted to compare the effect of three types of daily schedules on the employees' work performance. The schedules all contained the same number of tasks, but the tasks were listed in different orders in each schedule. The day of the week was used as the blocking variable. The mean scores on the dependent variable, the percentage of tasks completed, are listed here for each cell.
                     Weekday
Type of schedule     Monday   Tuesday   Wednesday   Thursday   Friday
1                    70       89        74          85         75
2                    74       94        81          90         80
3                    80       98        85          96         84
Use these cell means to graph the interaction between type of schedule and the day of the week. a. Is there an interaction between type of schedule and the day of the week? b. What kind of recommendation would you make to the manager? ANSWER The profile plot of cell means is shown as follows.
a. No, there does not appear to be an interaction between type of schedule and the day of the week. The three lines (representing three types of schedules) in the plot are almost parallel, indicating that the effects of type of schedule are similar for each weekday. b. Results suggest that the effects of type of schedule are generalizable across all weekdays. Therefore, the manager should implement schedule type 3, which consistently yields the best work performance throughout the week.
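The "almost parallel" judgment can also be checked numerically. The following is a minimal sketch, assuming the cell means are laid out as schedule rows by weekday columns; the names `cell_means` and `profile_gaps` are illustrative only. If the gap between two schedule profiles stays roughly constant across days, the lines are close to parallel and there is little sign of an interaction.

```python
# Cell means (percentage of tasks completed), assumed layout:
# keys are schedule types, lists run Monday through Friday.
cell_means = {
    1: [70, 89, 74, 85, 75],
    2: [74, 94, 81, 90, 80],
    3: [80, 98, 85, 96, 84],
}

def profile_gaps(means, lower, upper):
    """Day-by-day difference between two schedule profiles."""
    return [u - l for l, u in zip(means[lower], means[upper])]

gaps_2_vs_1 = profile_gaps(cell_means, 1, 2)
gaps_3_vs_2 = profile_gaps(cell_means, 2, 3)

# Schedule with the highest cell mean on each weekday.
best_each_day = [
    max(cell_means, key=lambda s: cell_means[s][day]) for day in range(5)
]

print("Schedule 2 minus 1:", gaps_2_vs_1)
print("Schedule 3 minus 2:", gaps_3_vs_2)
print("Best schedule per day:", best_each_day)
```

The gaps vary by only a few points from day to day, and the same schedule is on top every day, which is what the profile plot shows graphically.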
3. A market researcher wanted to test three exotic flavors of cake (dragon fruit, root beer, and yam) on customers of different age groups (21–30, 31–40, 41–50, 51–60). Thus, age is a blocking variable. The dependent measure was the customer's rating on how much he/she liked the cake. There were six subjects in each cell. Complete the ANOVA summary table below, assuming a fixed-effects model where α = .05.

Source             SS     df   MS   F    Critical Value   Decision
Flavor (A)         300    __   __   __   __               __
Age (B)            __     __   50   __   __               __
Interaction (AB)   __     __   __   __   __               __
Within             __     __   25
Total              2400   __

ANSWER There are 3 levels of the fixed factor A (flavors of cake), so J = 3. There are 4 levels of the fixed factor B (age), so K = 4. Each cell has 6 subjects, so n = 6. N = 6*3*4 = 72.
dfA = J − 1 = 3 − 1 = 2; dfB = K − 1 = 4 − 1 = 3; dfAB = (J − 1)(K − 1) = 2*3 = 6; dfwith = N − JK = 72 − 3*4 = 60; dftotal = N − 1 = 72 − 1 = 71.
SSB = MSB*dfB = 50*3 = 150; SSwith = MSwith*dfwith = 25*60 = 1500; SSAB = SStotal − SSA − SSB − SSwith = 2400 − 300 − 150 − 1500 = 450.
MSA = SSA/dfA = 300/2 = 150; MSAB = SSAB/dfAB = 450/6 = 75.
FA = MSA/MSwith = 150/25 = 6; critical value for A = .05F(2, 60) = 3.15 < FA, reject H0.
FB = MSB/MSwith = 50/25 = 2; critical value for B = .05F(3, 60) = 2.76 > FB, fail to reject H0.
FAB = MSAB/MSwith = 75/25 = 3; critical value for AB = .05F(6, 60) = 2.25 < FAB, reject H0.

Source             SS     df   MS    F   Critical Value        Decision
Flavor (A)         300    2    150   6   .05F(2, 60) = 3.15    reject H0
Age (B)            150    3    50    2   .05F(3, 60) = 2.76    fail to reject H0
Interaction (AB)   450    6    75    3   .05F(6, 60) = 2.25    reject H0
Within             1500   60   25
Total              2400   71

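The table completion for Problem 3 can be retraced step by step. This is a minimal sketch using only the values given in the problem; the critical values are copied from the answer rather than computed, and all variable names are illustrative.

```python
# Design constants from the problem statement.
J, K, n = 3, 4, 6            # flavors, age groups, subjects per cell
N = J * K * n                # total sample size

# Entries supplied in the incomplete table.
ss_a, ss_total = 300, 2400
ms_b, ms_within = 50, 25

# Degrees of freedom for a two-factor fixed-effects design.
df = {
    "A": J - 1,
    "B": K - 1,
    "AB": (J - 1) * (K - 1),
    "within": N - J * K,
    "total": N - 1,
}

# Recover the missing sums of squares and mean squares.
ss_b = ms_b * df["B"]
ss_within = ms_within * df["within"]
ss_ab = ss_total - ss_a - ss_b - ss_within
ms_a = ss_a / df["A"]
ms_ab = ss_ab / df["AB"]

# F ratios all use MS within as the denominator in the fixed-effects model.
f_ratio = {"A": ms_a / ms_within, "B": ms_b / ms_within, "AB": ms_ab / ms_within}

# Critical values at alpha = .05, taken from the answer above (not computed here).
critical = {"A": 3.15, "B": 2.76, "AB": 2.25}
decision = {
    effect: "reject H0" if f_ratio[effect] > critical[effect] else "fail to reject H0"
    for effect in f_ratio
}

for effect in f_ratio:
    print(effect, f_ratio[effect], decision[effect])
```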
4. Azita wants to evaluate the effects of three new types of word processors (1: MacrosoftWork; 2: ToogleDoc; 3: WordTerrific) on the employees' working efficiency. Twelve employees were randomly selected and grouped into four blocks based on the amount of their daily tasks (higher values indicate heavier workload). Within each block, each employee was randomly assigned to use one of the three word processors. Their working efficiency as measured by the number of files processed per day is listed in the table. Conduct a two-factor randomized block ANOVA (α = .05) and Bonferroni MCPs using SPSS to determine the results of the study.

Subject   Word Processor   Workload   Number of files processed
1         1                1          8
2         1                2          13
3         1                3          18
4         1                4          26
5         2                1          3
6         2                2          10
7         2                3          13
8         2                4          20
9         3                1          5
10        3                2          12
11        3                3          14
12        3                4          20

ANSWER Procedure: Create a data set with three variables: Processor (the treatment factor), Workload (the blocking factor), and Files (the dependent variable). The data set should have 12 cases, representing 12 employees. Because there is only one observation per cell (n = 1), we will not include the interaction effect for "Processor*Workload" in the model. 1) Go to Analyze → General Linear Model → Univariate. 2) Select Files as the Dependent Variable. Select Processor and Workload as the Fixed Factors. 3) Click Model. Select Custom under Specify Model. Click the Build Term(s) toggle menu and select Main effects. Select Processor and Workload from the Factors & Covariates list on the left and move them to the Model box on the right. Click Continue. 4) Click Options. Move Processor and Workload to Display Means for. Select Compare main effects. Select Bonferroni under Confidence interval adjustment. Select Descriptive statistics, Estimates of effect size, and Observed power. Click Continue. 5) To get profile plots, click Plots. Move Workload to the box under Horizontal Axis and Processor to Separate Lines. Click Add. Click Continue. Click OK.
Selected SPSS Output:
Tests of Between-Subjects Effects
Dependent Variable: Files

Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Observed Power
Processor         48.500                    2    24.250        21.293    .002   .877                  .994
Workload          433.667                   3    144.556       126.927   .000   .984                  1.000
Error             6.833                     6    1.139
Corrected Total   489.000                   11

Bonferroni Pairwise Comparisons: Processor
Dependent Variable: Files

(I) Processor     (J) Processor     Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1 MacrosoftWork   2 ToogleDoc       4.750*                  .755         .001   2.904                6.596
1 MacrosoftWork   3 WordTerrific    3.500*                  .755         .004   1.654                5.346
2 ToogleDoc       1 MacrosoftWork   −4.750*                 .755         .001   −6.596               −2.904
2 ToogleDoc       3 WordTerrific    −1.250                  .755         .149   −3.096               .596
3 WordTerrific    1 MacrosoftWork   −3.500*                 .755         .004   −5.346               −1.654
3 WordTerrific    2 ToogleDoc       1.250                   .755         .149   −.596                3.096

Bonferroni Pairwise Comparisons: Workload
Dependent Variable: Files

(I) Workload   (J) Workload   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1              2              −6.333*                 .871         .000   −8.465               −4.201
1              3              −9.667*                 .871         .000   −11.799              −7.535
1              4              −16.667*                .871         .000   −18.799              −14.535
2              1              6.333*                  .871         .000   4.201                8.465
2              3              −3.333*                 .871         .009   −5.465               −1.201
2              4              −10.333*                .871         .000   −12.465              −8.201
3              1              9.667*                  .871         .000   7.535                11.799
3              2              3.333*                  .871         .009   1.201                5.465
3              4              −7.000*                 .871         .000   −9.132               −4.868
4              1              16.667*                 .871         .000   14.535               18.799
4              2              10.333*                 .871         .000   8.201                12.465
4              3              7.000*                  .871         .000   4.868                9.132

A two-factor randomized-block ANOVA was conducted to compare three different word processors. The level of workload was used as the blocking factor. The ANOVA summary table showed that the main effect of Processor (F = 21.293, df = 2, 6, p = .002) and that of Workload (F = 126.927, df = 3, 6, p < .001) were both significant at the .05 level. The treatment effect has a large effect size (partial η2 = .877) and adequate power (.994). The Bonferroni MCP showed that there were significant mean differences for MacrosoftWork vs. ToogleDoc (p = .001), and for MacrosoftWork vs. WordTerrific (p = .004). The mean difference between WordTerrific and ToogleDoc, however, was not significant (p = .149). Employees who used MacrosoftWork processed more files than the employees using other word processors. There were also significant mean differences for all pairs of levels of blocks (all p < .05). Employees with a heavier workload processed more files than those with a lighter workload. The results are also demonstrated by the profile plot. As shown in the plot, employees who used MacrosoftWork processed more files than those who used WordTerrific and ToogleDoc. This trend is consistent across all levels of workload.
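The sums of squares and F ratios in the SPSS table can be reproduced by hand from the twelve observations. The sketch below is an illustration in plain Python rather than the SPSS procedure itself; it fits the additive (no interaction) randomized block model, and the names `files`, `proc_means`, and `block_means` are illustrative.

```python
# Files processed, keyed by (processor, workload block); one observation per cell.
files = {
    (1, 1): 8, (1, 2): 13, (1, 3): 18, (1, 4): 26,
    (2, 1): 3, (2, 2): 10, (2, 3): 13, (2, 4): 20,
    (3, 1): 5, (3, 2): 12, (3, 3): 14, (3, 4): 20,
}
processors = sorted({p for p, _ in files})
blocks = sorted({b for _, b in files})

def mean(values):
    values = list(values)
    return sum(values) / len(values)

grand = mean(files.values())
proc_means = {p: mean(files[p, b] for b in blocks) for p in processors}
block_means = {b: mean(files[p, b] for p in processors) for b in blocks}

# Main-effect sums of squares; the error term is what remains
# after removing both main effects from the total.
ss_processor = len(blocks) * sum((m - grand) ** 2 for m in proc_means.values())
ss_workload = len(processors) * sum((m - grand) ** 2 for m in block_means.values())
ss_total = sum((y - grand) ** 2 for y in files.values())
ss_error = ss_total - ss_processor - ss_workload

df_processor = len(processors) - 1
df_workload = len(blocks) - 1
df_error = df_processor * df_workload

ms_error = ss_error / df_error
f_processor = (ss_processor / df_processor) / ms_error
f_workload = (ss_workload / df_workload) / ms_error

print(round(ss_processor, 3), round(ss_workload, 3), round(ss_error, 3))
print(round(f_processor, 3), round(f_workload, 3))
```

With one observation per cell, the error term plays the role of the block-by-treatment residual, which is why its degrees of freedom equal (J − 1)(K − 1) = 6.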
5. Dr. Numerus wanted to examine two different methods to teach number theory. He randomly selected six instructors from the department and assigned three of them to use method 1 and three of them to use method 2. There were 20 students in each instructor's class. At the end of the semester, Dr. Numerus collected the test scores from all six classes and ran a two-factor hierarchical ANOVA. Below is the selected output generated by SPSS.

Test of Between-Subjects Effects

Source   Type III SS   df    MS       F
A        764.54        1     764.54   64.23993
B(A)     944.22        4     236.06   19.83435
Within   1356.75       114   11.90
Total    3065.51       119

a. Identify factor A and factor B. b. For each factor, identify whether it is a fixed-effects or random-effects factor. c. Is the SPSS ANOVA summary table correct? If not, correct the erroneous part(s). d. What conclusions can you draw from the correct test results? (Use α = .05.)
ANSWER a. Factor A is teaching method. Factor B is instructor (nested within method). b. Factor A is a fixed-effects factor. Factor B is a random-effects factor. This is a mixed-effects design. c. The F ratio for A is erroneous. In the mixed design where A is fixed and B is random, FA should use MSB(A) as the denominator. Therefore, the correct value is FA = MSA/MSB(A) = 764.54/236.06 = 3.24. d. The critical value for A = .05F(1, 4) = 7.71. Because FA = 3.24 < 7.71, we fail to reject the null hypothesis that all levels of A have the same mean. The critical value for B = .05F(4, 114) = 2.45. Because FB(A) = 19.83 > 2.45, the null hypothesis that levels of B (nested within A) have the same mean is rejected. We conclude that while the effect of instructors is significant, the effect of teaching method is nonsignificant. In other words, the variation in test scores is mostly due to differences in instructors, instead of different teaching methods.
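The correction in part (c) comes down to choosing the right denominator. A minimal sketch, using the mean squares from the SPSS table and the critical values quoted in the answer (variable names are illustrative):

```python
# Mean squares from the SPSS table.
ms_a = 764.54            # teaching method (fixed)
ms_b_within_a = 236.06   # instructor nested within method (random)
ms_within = 11.90        # students within instructors

# SPSS tested A against MS within, which is what produced the reported 64.24.
f_a_reported = ms_a / ms_within

# With B random and nested in A, A is tested against MS B(A).
f_a_corrected = ms_a / ms_b_within_a

# The nested factor is still tested against MS within.
f_b = ms_b_within_a / ms_within

# Critical values at alpha = .05, as quoted in the answer (not computed here).
crit_a, crit_b = 7.71, 2.45

print(round(f_a_reported, 2), round(f_a_corrected, 2), round(f_b, 2))
print("A significant:", f_a_corrected > crit_a)
print("B(A) significant:", f_b > crit_b)
```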
Chapter 17 (Simple Linear Regression) Test Bank Multiple choice 1. The regression line for predicting college GPA from SAT scores is found to be Y' = 0.0016X + 0.6. Karen's SAT score is 1800, and Mary's SAT score is 1600. What is the predicted difference in their college GPA? a. Karen's predicted GPA is 0.32 higher than Mary's predicted GPA. b. Karen's predicted GPA is 0.92 higher than Mary's predicted GPA. c. Karen's predicted GPA is 0.32 lower than Mary's predicted GPA. d. Karen's predicted GPA is 0.32 lower than Mary's predicted GPA. e. Karen and Mary have the same predicted GPA. ANSWER: A (Karen's predicted GPA = 0.0016(1800) + 0.6 = 3.48. Mary's predicted GPA = 0.0016(1600) + 0.6 = 3.16. Therefore, the difference in the predicted GPA between Karen and Mary is 3.48 − 3.16 = 0.32.) 2. Complete this sentence by selecting one of the following statements: "In simple linear regression, if the slope is found to be −0.002, . . ." a. the value of Y is equal to −.002 when X is 0. b. the value of Y is equal to −.002 when X is 1. c. the value of Y will decrease by 0.002 units when X increases by 1 unit. d. there is a negative, but very weak relationship between X and Y. ANSWER: C (The slope represents the change in Y when X increases by one unit.) 3. Sarah collected the data on heights and weights from 100 graduate students. Based on the data, she built a simple linear regression model to predict weight (in lbs) from height (in inches). The regression line is found to be Y' = 4X − 136. Which of the following statements is the correct interpretation of the equation? a. When weight increases by 1 lb, height is expected to increase by 4 inches. b. When weight decreases by 4 lbs, height is expected to increase by 1 inch. c. When height increases by 1 inch, weight is expected to increase by 4 lbs. d. When height increases by 4 inches, weight is expected to decrease by 1 lb. ANSWER: C (Height is the independent variable X, and weight is the dependent variable Y.
The slope equals 4, which means that weight is expected to increase by 4 lbs as height increases by 1 inch.)
4. In the scenario as described in Question 3, Sarah now used the regression line she obtained to predict the weight of her three-year-old niece, who is 34 inches tall. The predicted weight for her niece, however, turned out to be 0 lb. What is the problem with Sarah's prediction? a. Sarah must have incorrectly estimated the intercept of the regression line. b. The regression line was correct, but a computational error was made in the prediction. c. Sarah should have included the error term (ei) in the regression model when she made the prediction. d. Sarah was extrapolating beyond the sample predictor data. e. There is no problem with Sarah's prediction. This is how statistics work. ANSWER: D (The model is based on the data collected from young adults. Therefore, it is not appropriate to use the model to predict the weight of a child, whose height is outside of the range of the X values used in developing the model.) 5. It is known that μX = 1.5, σX² = 25, μY = 10, and σY² = 0. A simple linear regression model was estimated. Which of the following is the variance of the predicted values of Y? a. 0 b. 15 c. 25 d. 250 e. It depends on the correlation between X and Y. ANSWER: A (When σY² = 0, βYX = ρXY σY/σX = 0. When the slope is equal to 0, all predicted values will be the same, so the variance of Y' will be 0.) 6. It is known that μX = 10, σX² = 16, μY = 52, σY² = 8, ρXY = 0. A simple linear regression model is estimated. Which of the following statements is true? a. X and Y are completely unrelated. b. The slope of the regression model will be zero. c. The intercept of the regression model will be zero. d. The prediction of Y based on the linear model will be very accurate. ANSWER: B (When ρXY = 0, βYX = ρXY σY/σX = 0. Therefore the slope will be zero, and the model will not have good prediction power. On the other hand, even if ρXY = 0, X and Y may still be related in a nonlinear way.)
7. It is known that rXY = 0.5, sX² = 1, sY² = 1. A simple linear regression model is estimated. The regression line will have a slope of which one of the following? a. 0 b. 0.5 c. 0.5 or −0.5 d. 1 or −1 e. It depends on the values of 𝑋̅ and 𝑌̅. ANSWER: B (When rXY = 0.5 and sX² = sY² = 1, the standard deviations are sX = sY = 1, so bYX = rXY(sY/sX) = 0.5. The slope always takes the sign of rXY.) 8. In a study of the relation between hours watching TV per day (X) and scores on the final exam (Y), the equation of regression line is found to be Y' = −7X + 100. Suppose Jamie watches TV two hours per day, and he scored a 91 on the exam. What is the residual score for Jamie? a. −7 b. −5 c. 0 d. 5 e. 100 ANSWER: D (For Jamie, Xi = 2. The predicted score is Yi' = −7(2) + 100 = 86. The residual score is ei = Yi − Yi' = 91 − 86 = 5.) 9. Doug wanted to use simple linear regression to study the relation between the time to complete a marathon (in hours) (Y) and the fluid intake (in ml) during the race (X). Based on the same data set, he estimated two models. Model 1: X1 = total amount of fluid intake; Y = .00028X1 + 3.97; R₁² = .014. Model 2: X2 = amount of fluid intake per hour; Y = −.0052X2 + 7.84; R₂² = .65. Suppose for both models, all assumptions for linear regression are satisfied. Compare the two models. a. Doug should use model 1, because the correlation between X1 and Y is stronger than that between X2 and Y (bYX1 > bYX2). b. Doug should use model 2, because it gives a more accurate prediction of Y (the finishing time) than model 1 (R₁² < R₂²). c. Doug should use model 1, because model 2 will give unreasonable (negative) predicted values of Y when X is large. d. Both models are problematic, because the units of X are different from the unit of Y. ANSWER: B (Because R₂² > R₁², the correlation between X2 and Y is stronger than that between X1 and Y. Since the assumptions of both models are satisfied, the model that has better predictive power should be used.)
10. The standardized regression slope (b*YX) a. may never be negative. b. may never be greater than +1.00. c. is always equal to 0. d. None of the above. ANSWER: B (The standardized regression slope b*YX = rXY. Therefore, it is always within the range of −1 to 1.)
11. If two individuals have the same observed score on the dependent variable Y, their residual scores will be which one of the following? a. Always be equal. b. Be equal only when they have the same score on X. c. Be equal only when the slope equals zero. d. Both b and c. ANSWER: D (ei = Yi - Yi'. When individuals have the same score on X, they will also have the same predicted value Y'. When the slope is 0, all individuals will have the same Y' regardless of their scores on X. When two individuals have the same Yi and Yi', they will also have the same ei.) 12. In simple linear regression, if rXY = .3, the proportion of variation in Y that is not predictable from X is which one of the following? a. 0.09 b. 0.3 c. 0.7 d. 0.91 e. It depends on the slope. ANSWER: D (The proportion of variation in Y that is predictable from X = R2 = (.3)2 = .09. The proportion that is not predictable = 1 − .09 = .91.)
13. Bob and Brian both used simple linear regression to predict the consumption of ice cream (ml/person) (Y) based on temperature (°F) (X). However, they used two different data sets to estimate the model: Bob's sample includes only children younger than 12 (rXY = 0.6), while Brian's sample includes only adult consumers (rXY = 0.4). Which of the following statements is always true? a. Bob and Brian will get different estimates of intercept. b. Bob and Brian will get different estimates of slope. c. Bob and Brian will get different prediction equations. d. Bob and Brian will get different R2 for their models. ANSWER: D (In simple linear regression, R2 = rXY2. Because Bob and Brian have different rXY, they will always get different R2 for their models.)
14. In the scenario described in Question 13, suppose Bob and Brian have both converted their data to z score scale and estimated regression models using the standardized scores. Which of the following statements is false? a. Bob and Brian will get different estimates of intercept. b. Bob and Brian will get different estimates of slope. c. Bob and Brian will get different prediction equations. d. Bob and Brian will get different R2 for their model. ANSWER: A (When the scores are standardized, b*YX = rXY and a*YX = 0. Because Bob and Brian have different rXY, they will always get different b*YX and R2, and therefore different prediction equations, but the intercepts will be zero for both models.)
15. The assumptions of the simple linear regression model do not include which one of the following? a. The errors are normally distributed. b. The errors have constant variance across different values of X. c. The errors are independent of each other. d. The errors have mean of 0 and variance of 1. ANSWER: D (See the list of assumptions for the simple linear regression model.)
16. In which of the following situations is it most appropriate to use the simple linear regression model? [Options a through d are scatterplots of Y versus X that are not reproduced here.]
ANSWER: D (We want to see a relatively strong linear relation between X and Y.) 17. If the slope of the estimated regression line is positive, the correlation between X and Y a. must be positive. b. must be negative. c. may be zero. d. depends on the mean and variance of X and Y. ANSWER: A (In simple linear regression, bYX and rXY always have the same sign.)
18. If rXY = 1, which of the following statements is true? a. All points will fall on the regression line. b. The regression line will have an intercept of 0. c. The regression line will have a slope of 1. d. The prediction of future observations will be 100% accurate. ANSWER: A (When rXY =1, X and Y have a perfect linear relationship, and all data points fall on the regression line. However future observation may not fall on the line.) 19. Derek is studying the relation between the selling price of a house (in dollars) (Y) and the age of the house (in years) (X). It is shown that rXY = −0.2, 𝑋̅ = 40, and 𝑌̅ = 460,000. If Derek's own house was constructed 50 years ago, then the predicted selling price of his house based on simple linear regression would be a. more than 460,000 dollars. b. less than 460,000 dollars. c. exactly 460,000 dollars. d. impossible to be determined based on the information given. ANSWER: B (If the variables are negatively correlated, then the slope would be negative and a high score on the X would predict a low score on Y.) 20. In simple linear regression, the unstandardized regression line will always pass a. at least one data point. b. at least two data points. c. the point (𝑋̅, 𝑌̅). d. the point (0,0). ANSWER: C (The regression equation can be expressed as 𝑌̅ = bYX𝑋̅ + aYX, so (𝑋̅, 𝑌̅) will always be on the regression line.) 21. Which assumption(s) involved in simple linear regression can be assessed by examining the residual plot (ei vs. Xi)? a. Independence b. Homogeneity c. Linearity d. All of the above ANSWER: D (The assumptions of independence, homogeneity of variance, and linearity can all be assessed using residual plots.)
22. If the homogeneity assumption is violated, the possible consequences include a. biased estimates of regression coefficients. b. deflated standard error of estimates. c. larger number of Type I errors. d. nonnormal conditional distribution of Y. ANSWER: D (The regression coefficients remain unbiased, but the estimates of standard error will be larger. With larger standard errors, it is more difficult to reject H0, therefore resulting in a larger number of Type II errors.) 23. In simple linear regression, the assumption of normality states that a. the observed scores on Y are normally distributed. b. the conditional distributions of Y are normal in shape. c. the observed scores on X are normally distributed. d. the distributions of regression coefficients are normal in shape. e. there are no outliers. ANSWER: B (See definition of normality.) 24. Dr. Guinea was studying the relation between the amount of caffeine intake and people's performance on a difficult task. He found out that as the amount of caffeine intake increases, the time to finish the task first decreases, and then increases. If he used the data to fit a linear regression model, which assumption would likely be violated? a. Independence b. Homogeneity c. Linearity d. Normality e. Fixed X ANSWER: C (The relation between caffeine intake and the time to finish the task is curvilinear.) 25. In a simple linear regression, if SSres = 150 and SStotal = 200, what is the proportion of variation in Y that is predictable from X? a. 25% b. 42.86% c. 56.25% d. 75% ANSWER: A (SSres is the residual, or unexplained, sum of squares, so R2 = 1 − SSres/SStotal = 1 − 150/200 = 0.25. Thus, 25% of variation in Y is predictable from X.)
26. Which one of the following reflects variables appropriate for a simple linear regression model? a. One categorical dependent variable and one continuous independent variable b. One continuous dependent variable and one continuous or categorical independent variable c. One continuous dependent variable and two or more continuous independent variables d. Two or more continuous dependent variables and one continuous or categorical independent variable ANSWER: B 27. Which of the following is that part of the dependent variable that is not predicted by the independent variable? a. Covariate b. Intercept c. Residual d. Slope ANSWER: C 28. The sample intercept is which of the following? Select all that apply. a. The point where the regression line crosses the Y axis b. The predicted change in Y for a one-unit change in X c. The unstandardized regression coefficient d. The value of the dependent variable when the independent variable is zero ANSWER: A and D
Chapter 17 (Simple Linear Regression) Test Bank Short answer 1. You are given the following pairs of scores on X (Pretest score) and Y (Posttest score).

X    Y
65   74
82   87
70   82
46   53
55   69
75   81

a. Find the linear regression model for predicting Y from X. b. Use the prediction model obtained to predict the value of Y for a new person who scored 80 on the pretest.
ANSWER: a. Intercept a = 15.919, slope b = .892. The regression model is Yi = .892Xi + 15.919 + ei. The prediction equation is Y′i = .892Xi + 15.919. b. When X = 80, Y′ = .892X + 15.919 = .892(80) + 15.919 = 87.279.
Procedure: Create a data set with two variables: Pretest (X), Posttest (Y). The data set should have 6 cases. 1) Go to Analyze → Regression → Linear. 2) Select Posttest to the Dependent list. Select Pretest to the Independent(s) list. 3) Click Statistics. Select Confidence interval under Regression Coefficients. Click Continue. 4) Click OK.
Selected SPSS Output:
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .964   .930       .912                3.627

Coefficients (a)

Model 1      B (Unstandardized)   Std. Error   Beta (Standardized)   t       Sig.   95.0% CI for B Lower Bound   95.0% CI for B Upper Bound
(Constant)   15.919               8.173                              1.948   .123   −6.771                       38.610
Pretest      .892                 .123         .964                  7.268   .002   .551                         1.233

a. Dependent Variable: Posttest
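The slope and intercept for Problem 1 can be verified without SPSS using the least squares formulas b = Sxy/Sxx and a = Ȳ − bX̄. The sketch below is illustrative; the function name `fit_line` is not from the text.

```python
pretest = [65, 82, 70, 46, 55, 75]
posttest = [74, 87, 82, 53, 69, 81]

def fit_line(x, y):
    """Least squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(pretest, posttest)
predicted_at_80 = intercept + slope * 80

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"predicted posttest at pretest 80 = {predicted_at_80:.3f}")
```

The prediction from unrounded coefficients differs slightly in the second decimal from the hand calculation above, which uses the coefficients rounded to three places.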
2. You are given the following pairs of scores on X (Percentage of students whose families are below poverty line) and Y (Percentage of students at or above proficiency level) for nine schools.

X      Y
92.3   18.9
0.9    87.3
25.1   48.0
67.1   45.4
24.7   80.2
90.7   13.7
44.0   26.9
65.5   58.6
40.6   46.3

a. Find the linear regression model for predicting Y from X. b. Use the prediction model obtained to predict the value of Y for a school that has 50% of students whose families are below the poverty line.
ANSWER: a. Intercept a = 15.919, slope b = .892. The regression model is Yi = .892Xi + 15.919 + ei. The prediction equation is Y′i = .892Xi + 15.919. b. When X = 80, Y′ = .892X + 15.919 = .892(80) + 15.919 = 87.279.
Procedure: Create a data set with two variables: Pretest (X), Posttest (Y). The data set should have 6 cases. 1. Go to Analyze → Regression → Linear. 2. Select Posttest to the Dependent list. Select Pretest to the Independent(s) list. 3. Click Statistics. Select Confidence interval under Regression Coefficients. Click Continue. 4. Click OK.
Selected SPSS Output:
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .964   .930       .912                3.627

Coefficients (a)

Model 1      B (Unstandardized)   Std. Error   Beta (Standardized)   t       Sig.   95.0% CI for B Lower Bound   95.0% CI for B Upper Bound
(Constant)   15.919               8.173                              1.948   .123   −6.771                       38.610
Pretest      .892                 .123         .964                  7.268   .002   .551                         1.233

a. Dependent Variable: Posttest
3. The prediction equation for predicting Y (the amount of ice cream in pints consumed per person) from X (temperature in Fahrenheit) is Y′ = 0.003X + 0.2. What is the observed mean for Y if 𝑋̅ = 70 and sX² = 25? ANSWER: The regression line always passes through the point (𝑋̅, 𝑌̅), so the mean of Y is the predicted value at X = 𝑋̅. Because the prediction equation is Y′ = 0.003X + 0.2, when X = 70, 𝑌̅ = 0.003(70) + 0.2 = 0.41.
4. You are given the following pairs of scores on X (height in inches) and Y (weight in lbs).
X: 66 69 72 74 72 67 66 71 70 68 72 69 73 68 68 69 69 66 62 62 64 68 63 64 62
Y: 140 155 195 160 155 145 135 170 130 170 190 145 155 150 130 145 150 120 131 120 102 110 116 125 110
Perform the following computations using α = .05. a. The regression equation of Y predicted by X. b. Test of the significance of X as a predictor. c. Plot Y versus X. d. Compute the residuals. e. Plot residuals versus X.
ANSWER: a. Intercept a = −199.139, slope b = 5.037. The prediction equation is Y′i = 5.037Xi − 199.139. b. Height is a good predictor of weight, F(1, 23) = 29.262, p < .001. Additionally, the unstandardized slope (5.037) and standardized slope (.748) are statistically significantly different from 0 (t = 5.409, df = 23, p < .001); with every one-inch increase in height, weight is expected to increase by 5.037 lbs. c. Plot of Y versus X.
(d) Residuals ei = Yi − Y′i.

Xi   Yi    ei
66   140   6.705
69   155   6.594
72   195   31.484
74   160   −13.590
72   155   −8.516
67   145   6.668
66   135   1.705
71   170   11.520
70   130   −23.443
68   170   26.631
72   190   26.484
69   145   −3.406
73   155   −13.553
68   150   6.631
68   130   −13.369
69   145   −3.406
69   150   1.594
66   120   −13.295
62   131   17.852
62   120   6.852
64   102   −21.221
68   110   −33.369
63   116   −2.184
64   125   1.779
62   110   −3.148

(e) Plot of residuals versus X.
Procedure: Create a data set with two variables: Height (X), Weight (Y). The data set should have 25 cases. 1) Go to Analyze → Regression → Linear. 2) Select Weight to the Dependent list. Select Height to the Independent(s) list. 3) Click Statistics. Select Confidence interval under Regression Coefficients. Click Continue. 4) To save residuals for the residual plot, click Save. Select Unstandardized under Residuals. Click Continue. Click OK. 5) To plot Y versus X, go to Graphs → Legacy Dialogs → Scatter/Dot. Select Simple Scatter. Click Define. Select Weight to Y axis, and Height to X axis. Click OK. 6) To obtain the residual plot, go to Graphs → Legacy Dialogs → Scatter/Dot. Select Simple Scatter. Click Define. Select RES_1 to Y axis, and Height to X axis. Click OK. .
Selected SPSS Output:
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .748   .560       .541                16.196

ANOVA (b)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   7676.012         1    7676.012      29.262   .000 (a)
Residual     6033.348         23   262.319
Total        13709.360        24

a. Predictors: (Constant), height
b. Dependent Variable: weight

Coefficients (a)

Model 1      B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.   95.0% CI for B Lower Bound   95.0% CI for B Upper Bound
(Constant)   −199.139             63.176                             −3.152   .004   −329.830                     −68.449
height       5.037                .931         .748                  5.409    .000   3.111                        6.963

a. Dependent Variable: weight
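Parts (a), (b), and (d) of Problem 4 can be cross-checked outside SPSS. The sketch below fits the line by least squares, computes the residuals ei = Yi − Y′i, and forms the overall F ratio from the regression and residual sums of squares. It is an illustration in plain Python; variable names are not from the text.

```python
heights = [66, 69, 72, 74, 72, 67, 66, 71, 70, 68, 72, 69, 73,
           68, 68, 69, 69, 66, 62, 62, 64, 68, 63, 64, 62]
weights = [140, 155, 195, 160, 155, 145, 135, 170, 130, 170, 190, 145, 155,
           150, 130, 145, 150, 120, 131, 120, 102, 110, 116, 125, 110]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Least squares estimates.
sxx = sum((x - mean_x) ** 2 for x in heights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals: observed minus predicted weight.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]

# Partition of the total sum of squares and the overall F test.
ss_total = sum((y - mean_y) ** 2 for y in weights)
ss_residual = sum(e ** 2 for e in residuals)
ss_regression = ss_total - ss_residual
f_stat = (ss_regression / 1) / (ss_residual / (n - 2))
r_squared = ss_regression / ss_total

print(f"Y' = {slope:.3f}X + ({intercept:.3f})")
print(f"F(1, {n - 2}) = {f_stat:.3f}, R squared = {r_squared:.3f}")
print([round(e, 3) for e in residuals[:5]])
```

Plotting `residuals` against `heights` gives the residual plot requested in part (e).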
5. Dr. Watt is studying the relation between the percentage of a population who has a bachelor's degree (X) and the average income (Y) in 108 cities. After fitting a simple linear regression model, he decides to assess whether the assumptions of the model are reasonably satisfied. Below is one of the plots he uses to assess the assumptions.
a. What assumption(s) is Dr. Watt trying to assess using this plot? b. Based on the plot, is there any indication of assumption violations? If so, which assumption(s) has (have) been violated? c. What are the possible consequences of the assumption violation(s)? d. Suggest at least one solution to fix the problem. ANSWER: a. Using the residual plot, the assumptions of independence, homogeneity, and linearity can be assessed. b. The assumption of homogeneity is clearly violated. As X increases in value, the variation of residuals increases. c. If the homogeneity assumption is violated, the regression coefficients remain unbiased, but estimates of the standard errors are larger, affecting the validity of the significance tests. With larger standard errors, it is more difficult to reject H0, therefore resulting in a larger number of Type II errors. In addition, nonconstant variances may also result in the conditional distributions being nonnormal in shape. d. Solutions to deal with nonconstant variances include: transform the data to stabilize variance; use weighted least squares; use a form of robust estimation; etc.
Chapter 18 (Multiple Linear Regression) Test Bank Multiple choice 1. Variable 1 is to be predicted from a combination of variable 2 and one of variables 3, 4, 5, and 6. The correlations of importance are as follows: r13 = .3; r23 = .9 r14 = .4; r24 = .2 r15 = .6; r25 = .8 r16 = .7; r26 = .1 Which of the following multiple correlation coefficients will have the smallest value? a. r1.23 b. r1.24 c. r1.25 d. r1.26 ANSWER: A (Variable 3 has the smallest correlation with variable 1 and the largest with variable 2.) 2. Carol is studying the correlation of college GPA (X1) and the number of hours spent on watching TV (X2). As intelligence is expected to affect GPA, she would like to remove the influence of IQ (X3) from GPA scores when computing the correlation. This is an example of which one of the following? a. Bivariate correlation b. Partial correlation c. Semipartial correlation d. Regression correlation ANSWER: C (The correlation between GPA (X1) and the time spent on watching TV (X2) where IQ (X3) is removed from X1 only is an example of semipartial correlation.) 3. The correlation of GPA (X1) and the number of hours spent on watching TV (X2) where the influence of IQ (X3) is removed from GPA can be denoted only by a. r2.13 b. r12.3 c. r13.2 d. r1(2.3) e. r2(1.3) ANSWER: E (r2(1.3) denotes the correlation of X1 and X2 with the effect of X3 removed from X1 only.) © Taylor & Francis 2020
4. Suppose for the three variables, GPA (X1), time spent on watching TV (X2), and IQ (X3), the bivariate correlation coefficients are computed as follows: r12 = −0.4; r13 = 0.6; r23 = 0. If we remove the influence of IQ from GPA, the correlation of GPA and the time spent on watching TV will be a. stronger than the bivariate correlation r12. b. weaker than the bivariate correlation r12. c. the same as the bivariate correlation r12. d. uncertain. ANSWER: A (X3 is correlated with X1 but not with X2, so after the effect of X3 is removed from X1, X2 now accounts for a larger proportion of the remaining variation in X1, and the correlation between X1 and X2 becomes stronger. It can also be verified by computing r2(1.3) = −0.5.) 5. David is studying the correlation of temperature (X1) and the consumption of ice cream (X2). As he expects that the price of ice cream (X3) is correlated with both temperature and the consumption of ice cream, he would like to remove the effects of price from both X1 and X2 when computing the correlation. This is an example of which one of the following? a. Bivariate correlation b. Partial correlation c. Semipartial correlation d. Regression correlation ANSWER: B (The correlation between temperature (X1) and consumption (X2), where price (X3) is controlled for both X1 and X2, is an example of partial correlation.) 6. The correlation between temperature (X1) and the consumption of ice cream (X2) controlling for price of ice cream (X3) can be denoted by a. r2.13. b. r12.3. c. r13.2. d. r1(2.3). e. r2(1.3). ANSWER: B (r12.3 denotes the correlation of X1 and X2 controlling for the effect of X3.)
7. David states that the correlation of the price of ice cream and the consumption of ice cream is −0.3 when temperature is held constant. By saying "held constant," David is implying that a. he conducted an experiment in a laboratory where the temperature can be artificially controlled. b. the correlation of price and consumption of ice cream is −0.3 when the effects of temperature are removed. c. the conclusion is reached using data collected at the same temperature. d. the correlation of price and consumption of ice cream is stronger when the temperature stays the same than when the temperature varies. ANSWER: B (When temperature is "held constant," its effects on other variables are removed, and the correlations of other variables are computed as if the temperature were constant.) 8. For three variables, X1, X2, and X3, the bivariate correlation coefficients are: r12 = 0.7; r13 = 0; r23 = 0. The correlation of X1 and X2 controlling for X3 will be a. stronger than the bivariate correlation r12. b. weaker than the bivariate correlation r12. c. the same as the bivariate correlation r12. d. uncertain. ANSWER: C (If the variable being partialed out is uncorrelated with each of the other two variables, then the partialing process will logically not have any effect.)
9. The regression line for predicting selling price of houses (in $1000) (Y) from size of the house (in 1000 square feet) (X1) and number of bathrooms (X2) is found to be Y = −41.8 + 64.8X1 + 19.2X2 + ei. Which of the following statements is a correct interpretation of the equation? a. Compared to a house with only one bathroom, a house with two bathrooms will be $19,200 higher in terms of selling price. b. For two houses of the same size, the house with one additional bathroom is expected to be $19,200 higher in its selling price. c. Larger houses are expected to have higher selling prices than smaller houses, regardless of the number of bathrooms. d. To increase the selling price of his house, the home owner should build as many bathrooms as he/she can in the house. ANSWER: B (Selling price is expected to be higher for larger houses, holding the number of bathrooms constant. Also, the relation described in the model may not apply to houses with unusual size or unusual number of bathrooms.) .
10. In the scenario described in Question 9, if the residual (ei) is −3.5 for a particular house, it means that a. the regression function overestimates the selling price by $3500. b. the regression function underestimates the selling price by $3500. c. the selling price is estimated to be 3.5 standard deviations below the average price. d. the regression function is not valid because the prediction is erroneous. ANSWER: A (ei = Yi − Y′i. When ei < 0, the predicted score Y′i is larger than the observed score Yi.) 11. The multiple regression model for predicting Y from X1, X2, and X3 is Yi = b1X1i + b2X2i + b3X3i + a + ei. If the bivariate correlation of Y and X1 is positive (rY1 > 0), the partial slope for X1 (b1) will be a. a positive value. b. a negative value. c. zero. d. uncertain. ANSWER: D (The partial slope may be larger than, the same as, or smaller than 0, depending on the partial correlation of Y and X1 controlling for X2 and X3.) 12. In multiple regression, if the null hypothesis, H0: β1 = β2 = β3 = β4 = 0, is rejected, it means that a. there is no linear relationship between Y and any of the independent variables. b. there is a linear relationship between Y and all four independent variables. c. there is a linear relationship between Y and at least one independent variable. d. none of the individual regression coefficients (bk) are significantly different from 0. e. all of the individual regression coefficients (bk) are significantly different from each other. ANSWER: C (Rejection of the null means that one or more of the independent variables has a significant linear relationship with Y. In other words, one or more of the individual regression coefficients will be significantly different from 0.)
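The interpretations in Questions 9 and 10 can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical house of 2,000 square feet; the function and variable names are illustrative and are not part of the test bank.

```python
def predicted_price(size_k_sqft, bathrooms):
    """Predicted selling price (in $1000s) from the Question 9 equation."""
    return -41.8 + 64.8 * size_k_sqft + 19.2 * bathrooms


# Two hypothetical houses of the same size that differ by one bathroom.
price_two_bath = predicted_price(2.0, 2)
price_three_bath = predicted_price(2.0, 3)

# The difference equals the partial slope for bathrooms (Question 9, option b).
bathroom_effect = price_three_bath - price_two_bath

# Question 10: e = observed - predicted, so observed = predicted + e.
residual = -3.5
observed_price = price_two_bath + residual
overestimated = price_two_bath > observed_price

print(round(bathroom_effect, 1), overestimated)
```

Holding size fixed, the one-bathroom difference reproduces the partial slope, and a negative residual means the prediction sits above the observed price.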
13. In a multiple regression model, Y is predicted from X1, X2, and X3. Both R2 and Radj2 are computed. If X3 is removed from the model, how will R2 change? a. R2 will increase. b. R2 will decrease. c. R2 will not change. d. Uncertain. ANSWER: D (R2 will either decrease or stay the same.)
14. In a multiple regression model, Y is predicted from X1, X2, and X3. Both R2 and Radj2 are computed. If X3 is removed from the model, how will Radj2 change? a. Radj2 will increase. b. Radj2 will decrease. c. Radj2 will not change. d. Uncertain. ANSWER: D (Radj2 may increase, decrease, or stay the same.) 15. For the regression model, Yi = b1X1i + b2X2i + a + ei, consider the following two situations: Situation 1: rY1 = −0.5 rY2 = 0.8 r12 = 0.1 Situation 2: rY1 = −0.5 rY2 = 0.8 r12 = 0.3 In which of the two situations will R2 be larger? a. Situation 1. b. Situation 2. c. R2 will be the same in both situations. d. Uncertain. ANSWER: A (R2 is higher when the predictors are uncorrelated.) 16. For the regression model, Yi = b1X1i + b2X2i + a + ei, consider the following two situations: Situation 1: rY1 = −0.5 rY2 = 0.8 r12 = 0.1 Situation 2: rY1 = 0.2 rY2 = 0.8 r12 = 0.1 In which of the two situations will R2 be larger? a. Situation 1. b. Situation 2. c. R2 will be the same in both situations. d. Uncertain. ANSWER: A (R2 is higher when there is a high correlation of the predictors with the dependent variable.)
17. In a multiple regression, the F test for the overall model is highly significant, but none of the t values for individual predictors are significant. What is the most likely cause for this situation? a. Heterogeneous variances b. Nonindependence of residuals c. Nonnormality of residuals d. Nonlinearity relation between Y and the predictors e. Collinearity of the independent variables ANSWER: E (Multicollinearity can result in nonsignificant t values for individual predictors while the overall model is significant.) 18. All of the following are possible effects of multicollinearity except a. the standard error of the regression coefficients may be larger than expected. b. the signs of the regression coefficients may be opposite of what is expected. c. regression coefficients can be quite unstable across samples. d. R2 may be significant, yet none of the predictors are significant. e. the VIF is 0 for all predictors. ANSWER: E (Multicollinearity will result in large VIFs.) 19. Which of the following statements about R2 and Radj2 is correct? a. R2 is always larger than Radj2. b. R2 will always increase as more predictors are added to the model. c. Radj2 adjusts for the number of independent variables and sample size. d. If an additional independent variable were entered in the model, an increase in R2 indicates the new variable is adding value to the model. e. R2 and Radj2 will never be negative. ANSWER: C (When an additional predictor is added to the model, R2 will either increase or stay the same. If Radj2 increases, the new variable is adding value to the model. R2 will never be negative, but Radj2 may be negative when the model fits the data very poorly.)
20. Carol is building a multiple regression model to predict college GPA from a set of predictors. There is a theory suggesting that college GPA can be predicted by students' involvement in college when controlling for their prior achievement. Therefore, Carol first entered into the model a set of variables that measure prior achievement (e.g., high school GPA, SAT), and then added a set of variables that measure collegial involvement. Which one of the following procedures is used? a. Backward elimination b. Forward selection c. Stepwise selection d. Hierarchical regression ANSWER: D (The order of entry of predictors is determined by the researcher based on theoretical considerations.) 21. The scatterplot of X and Y is shown as follows.
Based on the plot, which model is the most appropriate to use? a. Yi = b1Xi + a + ei. b. Yi = b1Xi + b2Xi2 + a + ei. c. Yi = b1Xi2 + a + ei. d. Yi = b1Xi + b2Xi2 + b3Xi3 + a + ei. ANSWER: B (There is a curvilinear relation between X and Y, so a quadratic model should be applied.)
22. Which of the following situations will result in the best prediction in multiple regression analysis? a. rY1 = 0.1 rY2 = 0.4 r12 = 0.1 b. rY1 = 0.1 rY2 = 0.4 r12 = 0.8 c. rY1 = 0.6 rY2 = 0.4 r12 = 0.1 d. rY1 = 0.6 rY2 = 0.4 r12 = 0.8 ANSWER: C (Best prediction will result when there is a high correlation of the predictors with the dependent variable and low correlations among the predictors.) 23. An instructor wanted to know if the scores on pop quizzes are good predictors of the scores on the final exam. He used the following regression model, Yi = b1X1i + b2X2i + b3X3i + a + ei, where Y is the score on the final exam, X1 is the score on the first quiz, X2 is the score on the second quiz, and X3 is the average score of the two quizzes. Evaluate this model. a. The assumption of independence is violated. b. The assumption of linearity is violated. c. The assumption of noncollinearity is violated. d. There is no indication of assumption violation based on the information given. ANSWER: C (Because X3 is the average of X1 and X2, it has a perfect linear relation with X1/2 + X2/2, so perfect collinearity exists.) 24. An interaction between X1 and X2 is present in which of the following situations? a. The relationship between GPA (Y) and the time spent on watching TV (X1) is similar for students with different SAT scores (X2). b. The relationship between GPA (Y) and the time spent on watching TV (X1) is different for students with higher SAT scores versus those with lower SAT scores (X2). c. There is a strong correlation between the time spent on watching TV (X1) and SAT scores (X2). d. There is a strong correlation between GPA (Y) and the time spent on watching TV (X1) controlling for SAT scores (X2). ANSWER: B (When an interaction between X1 and X2 exists, the relationship between Y and X1 depends on the level of X2.)
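The comparison in Question 22 can be made concrete with the standard two-predictor formula R2 = (rY1² + rY2² − 2·rY1·rY2·r12) / (1 − r12²). The sketch below is illustrative only; the function and dictionary names are assumptions, not part of the test bank.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R-squared for a two-predictor model computed from bivariate correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# (rY1, rY2, r12) for each option in Question 22.
options = {
    "a": (0.1, 0.4, 0.1),
    "b": (0.1, 0.4, 0.8),
    "c": (0.6, 0.4, 0.1),
    "d": (0.6, 0.4, 0.8),
}

r_squared = {label: r_squared_two_predictors(*rs) for label, rs in options.items()}
best_option = max(r_squared, key=r_squared.get)

for label, value in r_squared.items():
    print(label, round(value, 3))
print("best:", best_option)
```

Option c combines the strongest predictor-outcome correlations with the weakest predictor intercorrelation, which is why it yields the largest R2.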
25. Karen wants to use a categorical variable, levels of education, to predict annual income. There are six categories in the levels of education: High school graduate, Some college, Associate's degree, Bachelor's degree, Master's degree, and Doctorate or Professional degree. How many categories need to be dummy coded and included in the regression model as predictors? a. 4 b. 5 c. 6 d. 7 ANSWER: B (The number of dummy coded variables equals the number of categories minus one.) 26. Which one of the following reflects variables appropriate for a multiple linear regression model? a. One categorical dependent variable and one continuous independent variable b. One continuous dependent variable and one continuous or categorical independent variable c. One continuous dependent variable and two or more continuous independent variables d. Two or more continuous dependent variables and one continuous or categorical independent variable ANSWER: C 27. In a multiple linear regression with three independent variables, X1, X2, and X3, which one of the following reflects an example of a semipartial correlation? a. The correlation between X1 and X2 and X3 where both X2 and X3 are removed from X1 and X2 b. The correlation between X1 and X2 where X3 is held constant c. The correlation between X2 and X3 where X1 is partialed out d. The correlation between X1 and X2 where X3 is removed from X2 only ANSWER: D
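The "categories minus one" rule in Question 25 can be illustrated with a small dummy-coding sketch. It assumes, purely for illustration, that "High school graduate" is chosen as the reference category; the helper name is hypothetical.

```python
LEVELS = [
    "High school graduate",
    "Some college",
    "Associate's degree",
    "Bachelor's degree",
    "Master's degree",
    "Doctorate or Professional degree",
]


def dummy_code(value, levels=LEVELS, reference=LEVELS[0]):
    """Return one 0/1 indicator per non-reference category."""
    return [1 if value == level else 0 for level in levels if level != reference]


# Six categories produce five dummy variables.
print(len(dummy_code("Some college")))
# The reference category is coded as all zeros.
print(dummy_code("High school graduate"))
print(dummy_code("Master's degree"))
```

A sixth indicator would be a perfect linear function of the other five plus the intercept, which is why only five are entered.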
28. Partial correlations allow for which one of the following in multiple linear regression? a. Design control b. Experiential control c. Experimental control d. Statistical control ANSWER: D
Chapter 18 (Multiple Linear Regression) Test Bank Short answer 1. You are given the following data, where X1 (Pretest score) and X2 (Hours spent in the program) are used to predict Y (Posttest score):
Y     X1    X2
65    60    7.5
82    62    9.0
94    75    8.5
80    78    7.0
87    65    10.0
66    60    8.0
Determine the following values: intercept, b1, b2, SSres, SSreg, F, sres2, s(b1), s(b2), t1, t2. ANSWER: Intercept = −70.662, b1 = 1.235, b2 = 8.077. SSreg = 619.504, SSres = 44.496, F(2,3) = 20.884 (p = .017, reject at .05), s2res = 14.832. s(b1) = .227, s(b2) = 1.660, t1 = 5.437 (p = .012, reject at .05), t2 = 4.866 (p = .017, reject at .05). Procedure: Create a data set with three variables: Posttest (Y), Pretest (X1), and Hour (X2). The data set should have six cases. 1) Go to Analyze → Regression → Linear. 2) Move Posttest to the Dependent list. Move Pretest and Hour to the Independent(s) list. 3) Click OK.
Selected SPSS Output:
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .966a   .933       .888                3.851
a. Predictors: (Constant), Hour, Pretest
ANOVAb
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   619.504          2    309.752       20.884   .017a
Residual     44.496           3    14.832
Total        664.000          5
a. Predictors: (Constant), Hour, Pretest b. Dependent Variable: Posttest
Coefficientsa
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B          Std. Error         Beta                        t        Sig.
(Constant)   −70.662    23.229                                         −3.042   .056
Pretest      1.235      .227               .846                        5.437    .012
Hour         8.077      1.660              .757                        4.866    .017
a. Dependent Variable: Posttest
Results: The results of the multiple linear regression suggest that a significant proportion of the total variation in posttest scores was effectively predicted by pretest scores and hours spent in the program, F(2,3) = 20.884, p = .017. For Pretest, the unstandardized partial slope (1.235) and standardized partial slope (.846) are statistically significantly different from 0 (t = 5.437, df = 3, p = .012); with every one-point increase in pretest, posttest score is expected to increase by 1.235 when controlling for Hour. For Hour, the unstandardized partial slope (8.077) and standardized partial slope (.757) are statistically significantly different from 0 (t = 4.866, df = 3, p = .017); with every additional hour spent in the program, posttest score is expected to increase by 8.077 when controlling for Pretest scores. Thus, Pretest and Hour were shown to be statistically significant predictors of Posttest, both individually and collectively. Multiple R2 indicates that 93.3% of the variation in Posttest was predicted by Pretest and Hour. This suggests a large effect size. The intercept was −70.662, which is not statistically significantly different from 0 at the .05 level (t = −3.042, df = 3, p = .056).
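For readers without SPSS, the quantities requested in this problem can be recomputed with the two-predictor least squares formulas. This is a minimal pure-Python sketch, not the procedure the test bank prescribes; all variable names are illustrative, and p values are omitted because they require a t or F distribution function.

```python
# Data from the problem: posttest (y), pretest (x1), hours in program (x2).
y = [65, 82, 94, 80, 87, 66]
x1 = [60, 62, 75, 78, 65, 60]
x2 = [7.5, 9.0, 8.5, 7.0, 10.0, 8.0]
n, m = len(y), 2  # sample size and number of predictors


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y, ss_total = cross(x1, y), cross(x2, y), cross(y, y)
det = s11 * s22 - s12**2

# Partial slopes and intercept.
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
intercept = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Sums of squares, residual variance, and the overall F ratio.
ss_reg = b1 * s1y + b2 * s2y
ss_res = ss_total - ss_reg
ms_res = ss_res / (n - m - 1)
f_ratio = (ss_reg / m) / ms_res

# Standard errors and t ratios for the partial slopes.
se_b1 = (ms_res * s22 / det) ** 0.5
se_b2 = (ms_res * s11 / det) ** 0.5
t1, t2 = b1 / se_b1, b2 / se_b2

print(round(intercept, 3), round(b1, 3), round(b2, 3))
print(round(ss_reg, 3), round(ss_res, 3), round(f_ratio, 3))
print(round(se_b1, 3), round(se_b2, 3), round(t1, 3), round(t2, 3))
```

The rounded results should agree with the SPSS output above to about three decimal places.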
2. Complete the missing information for this regression model (df = 25).
Y′ =                   60     + 16X1   + 0.4X2   − 70X3
Standard errors        (30)     (4)      (0.05)    (10)
t ratios               (2)      ( )      ( )       ( )
Significant at .05?             ( )      ( )       ( )
ANSWER: t1 = b1/s(b1) = 16/4 = 4; t2 = b2/s(b2) = .4/.05 = 8; t3 = b3/s(b3) = −70/10 = −7. The critical t value is ±α/2tdf = ±.025t25 = ±2.06. |t1|, |t2|, |t3| > critical t, so X1, X2, and X3 are all significant predictors of Y.
Y′ =                   60     + 16X1   + 0.4X2   − 70X3
Standard errors        (30)     (4)      (0.05)    (10)
t ratios               (2)      (4)      (8)       (−7)
Significant at .05?             (yes)    (yes)     (yes)
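The t ratios above are simply each coefficient divided by its standard error. A short sketch, assuming the tabled two-tailed critical value of 2.060 for df = 25 (names are illustrative):

```python
# (coefficient, standard error) for each predictor in the model.
coefficients = {"X1": (16.0, 4.0), "X2": (0.4, 0.05), "X3": (-70.0, 10.0)}

# Two-tailed critical t at alpha = .05 with df = 25, taken from a t table.
CRITICAL_T = 2.060

t_ratios = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > CRITICAL_T for name, t in t_ratios.items()}

for name in coefficients:
    print(name, round(t_ratios[name], 2), "yes" if significant[name] else "no")
```

The sign of a t ratio follows the sign of its coefficient; significance depends only on its absolute value.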
3. Calculate the partial correlation r12.3 and the part correlation r1(2.3) from the following bivariate correlations: r12 = .3, r13 = −.5, r23 = −.8. ANSWER: Because r12 = .3, r13 = −.5, and r23 = −.8,

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{.3 - (-.5)(-.8)}{\sqrt{(.75)(.36)}} = -.1925$$

$$r_{1(2.3)} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{23}^2}} = \frac{.3 - (-.5)(-.8)}{\sqrt{1 - .8^2}} = \frac{-.1}{\sqrt{.36}} = -.1667$$
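The two formulas can be checked with a few lines of Python. This is only a sketch; the function names are illustrative, and `part_corr` removes X3 from X2 only, matching the r1(2.3) notation used in this problem.

```python
import math


def partial_corr(r12, r13, r23):
    """Partial correlation r12.3: X3 removed from both X1 and X2."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))


def part_corr(r12, r13, r23):
    """Part (semipartial) correlation r1(2.3): X3 removed from X2 only."""
    return (r12 - r13 * r23) / math.sqrt(1 - r23**2)


r12_3 = partial_corr(0.3, -0.5, -0.8)
r1_23 = part_corr(0.3, -0.5, -0.8)
print(round(r12_3, 4), round(r1_23, 4))
```

Both coefficients share the numerator r12 − r13·r23; they differ only in how much variance is removed in the denominator.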
4. A researcher would like to predict GPA from a set of three predictor variables for a sample of 34 college students. Multiple linear regression analysis was utilized. Complete the following summary table (α = .05) for the test of significance of the overall regression model:
Source       SS     df    MS     F     Critical Value   Decision
Regression                6.5
Residual
Total        66
ANSWER: There are three independent variables, so m = 3. There are 34 students, so n = 34. dfreg = m = 3, dfres = n − m − 1 = 34 − 3 − 1 = 30, dftotal = n − 1 = 34 − 1 = 33. SSreg = MSreg*dfreg = 6.5*3 = 19.5, SSres = SStotal − SSreg = 66 − 19.5 = 46.5. MSres = SSres/dfres = 46.5/30 = 1.55. F = MSreg/MSres = 6.5/1.55 = 4.19; critical value = .05F3,30 = 2.92 < F, so reject H0.
Source       SS     df    MS     F      Critical Value   Decision
Regression   19.5   3     6.5    4.19   2.92             Reject H0
Residual     46.5   30    1.55
Total        66     33
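The completed table follows mechanically from the two given entries. A small sketch of the arithmetic, assuming the tabled critical value .05F3,30 = 2.92 (variable names are illustrative):

```python
n, m = 34, 3                 # students and predictors
ms_reg, ss_total = 6.5, 66.0  # the two entries given in the table

# Degrees of freedom.
df_reg, df_res, df_total = m, n - m - 1, n - 1

# Sums of squares and mean squares.
ss_reg = ms_reg * df_reg
ss_res = ss_total - ss_reg
ms_res = ss_res / df_res

# F ratio compared with the tabled critical value for (3, 30) df at alpha = .05.
f_ratio = ms_reg / ms_res
CRITICAL_F = 2.92
reject_h0 = f_ratio > CRITICAL_F

print(df_reg, df_res, df_total)
print(ss_reg, ss_res, round(ms_res, 2), round(f_ratio, 2), reject_h0)
```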
5. You are given the following data, where X1 (attendance rate) and X2 (average SAT score) are to be used to predict Y (average score in graduation test). Each case represents one school.
Y 78.4 81.3 81.3 82.5 77.8 84.5 88.2 88.7 72.5 85.4 82.9 81.4
X1 93.4 94.6 95.4 91.1 91.6 94.2 94.5 93.4 92.1 94.9 94.3 94.7
X2 1010 1020 1024 1136 952 1042 1106 1004 880 1124 1124 996
Determine the following values: intercept, b1, b2, SSres, SSreg, F, sres2, s(b1), s(b2), t1, t2.
ANSWER: Intercept = −36.536, b1 = .869, b2 = .036. SSreg = 120.957, SSres = 103.365, F(2,9) = 5.266 (p = .031, reject at .05), s2res = 11.485. s(b1) = .764, s(b2) = .014, t1 = 1.138 (p = .285, fail to reject at .05), t2 = 2.608 (p = .028, reject at .05). Procedure: Create a data set with three variables: Test (Y), Attend (X1), and SAT (X2). The data set should have 12 cases. 1) Go to Analyze → Regression → Linear. 2) Move Test to the Dependent list. Move Attend and SAT to the Independent(s) list. 3) Click OK.
Selected SPSS Output:
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .734a   .539       .437                3.3890
a. Predictors: (Constant), SAT, Attend
ANOVAb
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   120.957          2    60.479        5.266   .031a
Residual     103.365          9    11.485
Total        224.323          11
a. Predictors: (Constant), SAT, Attend b. Dependent Variable: Test
Coefficientsa
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B          Std. Error         Beta                        t       Sig.
(Constant)   −36.536    69.023                                         −.529   .609
Attend       .869       .764               .268                        1.138   .285
SAT          .036       .014               .614                        2.608   .028
a. Dependent Variable: Test
Results: The results of the multiple linear regression suggest that a significant proportion of the total variation in graduation test scores was effectively predicted by attendance rate and SAT scores, F(2,9) = 5.266, p = .031. For Attend, the unstandardized partial slope (.869) and standardized partial slope (.268) are not statistically significantly different from 0 (t = 1.138, df = 9, p = .285). For SAT, the unstandardized partial slope (.036) and standardized partial slope (.614) are statistically significantly different from 0 (t = 2.608, df = 9, p = .028); with every point increase in SAT score, the graduation test score is expected to increase by .036 controlling for attendance rate. Multiple R2 indicates that 53.9% of the variation in graduation test scores was predicted by attendance rate and SAT score. This suggests a large effect size. The intercept was −36.536, which is not statistically significantly different from 0 at .05 level (t = −.529, df = 9, p = .609).
Chapter 19 (Logistic Regression) Test Bank Multiple choice 1. In which of the following situations can binary logistic regression be used? a. Kate wants to examine if gender (0 = male, 1 = female) can predict the aggression level of teenagers. She uses the number of self-reported aggressive behaviors in the previous month as the measure of aggression level. b. Amy wants to use aggression level to predict whether a student will finish high school (0 = no, 1 = yes). The aggression level is measured by the occurrence of self-reported aggressive behaviors in the previous month. c. Pete wants to predict the starting salary of teachers based on their highest level of education (Associate, Bachelor, Master, PhD, or professional). d. Mike wants to use parents' level of education to predict the highest level of education achieved by their children. He codes the level of education into six categories: some school, high school graduate, some college, two-year college, four-year college, and postgraduate. ANSWER: B (The dependent variable in binary logistic regression must be dichotomous, i.e., a variable that has two categories.) 2. Which one of the following can be used as an appropriate dependent variable for binary logistic regression? a. Dichotomous variable b. Multinomial variable c. Continuous variable d. None of the above e. All of the above ANSWER: A (The outcome for binary logistic regression must have only two categories.) 3. Which one of the following can be used as an appropriate independent variable for binary logistic regression? a. Dichotomous variable b. Multinomial variable c. Continuous variable d. None of the above e. All of the above ANSWER: E (When properly measured and coded, both categorical variables and continuous variables can be used as independent variables in logistic regression.)
4. Which one of the following statements is true about OLS regression and logistic regression? a. OLS regression uses continuous variables as independent variables, whereas logistic regression uses categorical variables as independent variables. b. OLS regression assumes that the conditional distribution of the dependent variable is normal, whereas logistic regression makes no such assumption. c. In OLS regression the predicted value (Y′) is continuous, whereas in logistic regression the predicted probability is either 0 or 1. d. In OLS regression there are always residuals in the model, whereas in logistic regression there are no residuals or errors. ANSWER: B (Logistic regression does not assume either the dependent variable or the log odds of the dependent variable to be normal. The predicted probability in logistic regression is a value between 0 and 1. The model also includes error.) 5. Which of the following would be appropriate outcomes to examine with binary logistic regression? a. Types of college degree (associate's degree, bachelor's degree) b. Levels of education (high school, college, postgraduate) c. Marital status (single, common-law married, married, divorced, separated, widowed) d. Grades in a test (A, B, C, D, F) e. All of the above ANSWER: A (The outcome variable must contain only two categories.) 6. Based on a logistic regression model, the odds of Sandy passing a test is 4. Based on the odds, what is the probability that Sandy will pass the test? a. 0.8 b. 0.4 c. 0.25 d. 0.2 e. Cannot be determined. ANSWER: A (Odds of passing = probability of passing/probability of failing. The two probabilities add up to 1. Therefore, the probability of passing = 4/5 = 0.8.)
7. Based on a logistic regression model, Cindy did some calculation and predicted that the probability of her passing a test is 0.5. What are the odds that Cindy will pass the test? a. 1/2 b. 1 c. 2 d. 5 e. Cannot be determined. ANSWER: B (When the probability of passing = 0.5, the probability of failing = 1 − 0.5 = 0.5, so the odds of passing is 1.) 8. Which of the following statements about the relationship between Odds(Y = 1) and Logit(Y) is false? a. When the odds increase, the logit of Y increases. b. When the odds are larger than 1, the logit of Y has positive value. c. When the odds are smaller than 1, the logit of Y has negative value. d. When the odds are 1, the logit of Y may be either positive or negative. ANSWER: D (When the odds are 1, the logit of Y is zero.) 9. In the logistic regression model, which of the following is assumed to have a linear relationship with the independent variables? a. The dichotomous variable Y b. Odds(Y = 1) c. Logit(Y), i.e., the log odds d. Error term ANSWER: C (The dependent variable in the linear model is the log odds of the dichotomous variable Y, i.e., logit(Y).)
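The conversions among probability, odds, and logit used in Questions 6 through 8 can be sketched as follows. The helper names are illustrative; the logit is taken as the natural log of the odds.

```python
import math


def odds_to_probability(odds):
    """p = odds / (1 + odds)."""
    return odds / (1 + odds)


def probability_to_odds(p):
    """odds = p / (1 - p)."""
    return p / (1 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(probability_to_odds(p))


sandy_probability = odds_to_probability(4)   # Question 6
cindy_odds = probability_to_odds(0.5)        # Question 7
logit_at_even_odds = logit(0.5)              # Question 8, option d

print(sandy_probability, cindy_odds, logit_at_even_odds)
```

Odds above 1 give a positive logit and odds below 1 give a negative logit, so the logit is zero only when the odds equal 1.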
Questions 10–12 are based on the following scenario. A study was conducted to investigate variables associated with dropping out of high school. The following logistic regression model was obtained: Logit(Yi) = 3.5 − 1.3X1 + 2.3X2. Y: 1 = dropped out of high school; 0 = did not drop out of high school; X1: cumulative high school GPA obtained; X2: 1 = retained in at least one grade; 0 = never retained in any grade. 10. What is being predicted in this model? a. The mean difference in cumulative GPA between students who dropped out of high school and those who finished high school. b. The percentage of students who will drop out before graduating high school. c. The odds that a student will drop out of high school. d. The odds that a student had been retained in at least one grade if he dropped out of high school. ANSWER: C (Logistic regression predicts the odds that a unit of analysis belongs to one of two groups defined by the dependent variable. In this case, it predicts the odds that a student belongs to the group "drop out of high school.") 11. Based on logistic regression, if a student has been retained in at least one grade, the chance that he/she will drop out of high school a. increases. b. decreases. c. stays the same. d. is uncertain. ANSWER: A (The regression coefficient for X2 is positive, indicating that if a student has been retained in at least one grade (X2 = 1), the log odds of dropping out of high school will increase.) 12. If Mindy has a high school GPA of 3, and has never repeated a grade, which of the following predictions can be derived from the model? a. Mindy has more than 50% probability of dropping out of high school. b. Mindy has less than 50% probability of dropping out of high school. c. Mindy has exactly 50% probability of dropping out of high school. d. Mindy will drop out of high school. e. Mindy will not drop out of high school. ANSWER: B (For Mindy, X1 = 3, X2 = 0, logit(Y) = 3.5 − 1.3(3) = −0.4 < 0. That means the odds of Mindy dropping out of high school are smaller than 1, so Mindy has less than 50% probability of dropping out.)
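Mindy's prediction in Question 12 can be traced from logit to odds to probability. A brief sketch using the model from the scenario (function names are illustrative):

```python
import math


def dropout_logit(gpa, retained):
    """Logit(Y) = 3.5 - 1.3*X1 + 2.3*X2 from the scenario."""
    return 3.5 - 1.3 * gpa + 2.3 * retained


def logistic(z):
    """Convert a logit to a probability."""
    return 1 / (1 + math.exp(-z))


# Mindy: GPA of 3, never retained.
mindy_logit = dropout_logit(3, 0)
mindy_odds = math.exp(mindy_logit)
mindy_probability = logistic(mindy_logit)

# Question 11: retention raises the logit for the same GPA.
retention_raises_logit = dropout_logit(3, 1) > dropout_logit(3, 0)

print(round(mindy_logit, 2), round(mindy_odds, 3), round(mindy_probability, 3))
```

A logit of about −0.4 corresponds to odds of roughly 0.67 and a probability of roughly .40, consistent with answer b.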
13. In logistic regression, which of the following statements about probability, odds, and log odds is true? a. Probability, odds, and log odds can all be computed from the regression line. b. Probability, odds, and log odds all have the same range of possible values. c. Probability, odds, and log odds are all assumed to have a linear relationship with the independent variables. d. Probability, odds, and log odds are all measured on the same scale. ANSWER: A (Probability ranges from 0 to 1, odds can be zero or any positive value, and log odds ranges from negative infinity to positive infinity. Only log odds is assumed to have linear relationships with the predictors in the model. ) 14. Aaron is studying smoking behavior and has coded "smoker" as "1" and "nonsmoker" as "0." Which of the following is a correct interpretation if the odds ratio is equal to 1? a. The probability of being a smoker is equal to that of being a nonsmoker. b. The probability of being a smoker is substantially greater than that of being a nonsmoker. c. The probability of being a smoker is substantially smaller than that of being a nonsmoker. d. Cannot be determined from the information provided. ANSWER: A (When the odds ratio is equal to 1, it is equally likely to fall into either of the two categories or groups.) 15. For a logistic regression model, if −2LL = 0, it indicates that a. the model has perfect fit but the model cannot be estimated. b. the model is a poor fit to the data. c. the value of the log likelihood function is 0. d. a computational error has been made. ANSWER: A (−2LL ranges from 0 to positive infinity. When −2LL = 0, the value of the log likelihood function is 1 and the dependent variable is perfectly predicted. That will result in an inability to estimate the model.)
16. Herbert is studying the risk factors associated with heart diseases. He identified three risk factors (age, sex, and cholesterol level), and built two different models. Model 1: Logit(Yi) = −7 + 2.5X1 − X2. −2LL = 3 Model 2: Logit(Yi) = −8.5 + 1.5X3 −2LL = 8 Y: 1 = diagnosed with major heart disease; 0 = no major heart disease; X1: age in years (above 40); X2: sex, where 0 is male and 1 is female; X3: cholesterol level (in mmol/L) Herbert conducted a log likelihood difference test (p = .025) and concluded that Model 1 fits the data significantly better than Model 2. Evaluate his analysis. a. The analysis is not valid. The two models cannot be compared using the log likelihood difference test because they are not nested models. b. The interpretation of the test results is erroneous. The fact that the null hypothesis is rejected indicates that Model 2 fits the data better than Model 1. c. The interpretation of the test results is erroneous. The fact that null hypothesis is rejected indicates that Model 1 and Model 2 do not differ substantially in fit. d. There is nothing wrong with the analysis. ANSWER: A (The likelihood difference test assumes that one model is nested within another, i.e., all elements that are included in the simpler model must also be included in the more complex model.) 17. If the logistic regression model is a good fit to the data, which of the following test(s) will likely have significant results? a. Log likelihood difference test b. Hosmer-Lemeshow goodness-of-fit test c. Both a and b d. Neither a nor b ANSWER: A (In the Hosmer-Lemeshow goodness-of-fit test, a statistically significant result indicates poor model fit.)
18. In logistic regression, the multiple R2 pseudo-variance explained values a. measure the proportion of variance explained in the dichotomous outcome variable. b. measure the effect size for the model. c. will increase as more variables are added to the model. d. are generally small for models with good fit. ANSWER: B (R2 in logistic regression is a measure of effect size, but it does not measure the proportion of variance explained in the dichotomous outcome. For models with good fit, pseudo R2 will be reasonably large.) 19. Aaron is studying smoking behavior and has coded "smoker" as "1" and "nonsmoker" as "0." The predictor is the number of family members who smoke. Which of the following is a correct interpretation of an odds ratio of +2? a. For every additional family member who smokes, the odds of being a smoker increase by 100%. b. For every additional family member who smokes, the odds of being a smoker decrease by 100%. c. For every one-unit increase in being a smoker, the odds of having a family member who smoke increase by 100%. d. For every one-unit increase in being a smoker, the odds of having a family member who smoke decrease by 100%. ANSWER: A (The number of family members who smoke is the independent variable. The positive sign of the odds ratio indicates that the odds of being a smoker increase as the independent variable increases in value.)
In the smoking study, Aaron has obtained the following classification table. Answer Questions 20–22 based on the information provided in the table.
Classification Table

                        Predicted Smoke
Observed                .00     1.00    Percentage Correct
Smoke .00               60      15      80.0
Smoke 1.00              13      12      48.0
Overall Percentage                      72.0
20. What is the false positive rate? a. 20% b. 28% c. 48% d. 52% ANSWER: A (Of the 75 nonsmokers (observed value = 0), 15 were incorrectly classified as smokers (predicted value = 1). Thus, the false positive rate is 15/75, or 20%.) 21. What is the false negative rate? a. 20% b. 28% c. 48% d. 52% ANSWER: D (Of the 25 smokers (observed value = 1), 13 were incorrectly classified as nonsmokers (predicted value = 0). Thus, the false negative rate is 13/25, or 52%.)
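The rates in Questions 20 and 21 follow directly from the four cell counts of the classification table. The short Python sketch below recomputes them; the function name and dictionary keys are illustrative choices, not part of the SPSS output.

```python
# Cell counts taken from the classification table (observed x predicted).
TN, FP = 60, 15   # observed nonsmokers: predicted 0, predicted 1
FN, TP = 13, 12   # observed smokers:    predicted 0, predicted 1


def classification_rates(tn, fp, fn, tp):
    """Return common rates from a 2x2 classification table."""
    return {
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "false_negative_rate": fn / (tp + fn),
        "overall_accuracy": (tn + tp) / (tn + fp + fn + tp),
    }


rates = classification_rates(TN, FP, FN, TP)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

The specificity, sensitivity, and overall accuracy printed here correspond to the "Percentage Correct" column of the table.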
22. If a person is predicted to be a smoker, we would expect that a. the person has a good chance of actually being a smoker, because the model has an adequate overall predictive accuracy. b. the person has a good chance of actually being a smoker, because the model has high specificity. c. the person may or may not be a smoker, because the model has low sensitivity. d. the person may or may not be a smoker, because the false positive rate is high. ANSWER: C (The model has a false negative rate of 52% and inadequate sensitivity. In fact, among the 27 people who were predicted to be smokers, only 12 of them are actually smokers.) 23. In the likelihood ratio test of the overall regression model, if the null hypothesis, H0: β1 = β2 = ... = βm = 0, is rejected, it means that a. the baseline model fits the data well. b. the baseline model has better fit than the regression model. c. the baseline model fits the data as well as the regression model. d. the regression model has better fit than the baseline model. ANSWER: D (In the likelihood ratio test, a significant result indicates that the more complex model has better fit than the baseline model.) 24. Which of the following is NOT a statistic that can be used to evaluate individual regression coefficients for logistic regression models? a. Cox and Snell R squared b. χ² in the change in log likelihood test c. Wald statistic d. BIC ANSWER: A (Cox and Snell R squared is a measure of overall model fit. The log likelihood test can be used to test both the overall model and the individual coefficients.)
25. A logistic regression model is estimated to be logit(Yi) = −40 + 5X1, where X1 is a continuous variable. Which assumption does not need to be examined? a. Noncollinearity b. Independence c. Linearity d. All of the above assumptions need to be examined. ANSWER: A (The assumption of noncollinearity needs to be examined only when there is more than one predictor variable.) 26. In logistic regression, the assumption of linearity does not need to be examined in which of the following situations? a. When the dependent variable is categorical. b. When all the independent variables are categorical. c. When all the independent variables are continuous. d. When some of the independent variables are categorical, and some of them are continuous. e. The assumption of linearity always needs to be examined. ANSWER: B (When all the independent variables are categorical, we do not need to examine the assumption of linearity.) 27. The odds ratio is computed by which of the following? a. b^e b. b^(−1/2) c. e^(b_k) d. e^(rb) ANSWER: C 28. Which one of the following can occur when the number of variables equals, or nearly equals, the number of cases in the data? a. Extremely small regression coefficients and standard errors. b. The dependent variable is perfectly predicted. c. The maximum likelihood estimator reduces to zero. d. The outcome is constant for one or more categories of a nominal independent variable. ANSWER: C
Chapter 19 Test Bank Short answer
1. Complete the missing information for this table (Y is a dichotomous variable).

P(Y = 1)       0.10   0.25   0.40   0.20   0.90   0.75   0.60
P(Y = 0)
Odds(Y = 1)
ANSWER: P(Y = 0) = 1 − P(Y = 1). Odds(Y = 1) = P(Y = 1)/P(Y = 0). The complete table can be obtained as follows.

P(Y = 1)       0.10   0.25   0.40   0.20   0.90   0.75   0.60
P(Y = 0)       0.90   0.75   0.60   0.80   0.10   0.25   0.40
Odds(Y = 1)    0.11   0.33   0.67   0.25   9.00   3.00   1.50
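The two formulas in the answer can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `odds` is an assumption made for the example.

```python
def odds(p):
    """Odds of the event given its probability p, i.e. p / (1 - p)."""
    return p / (1 - p)


probabilities = [0.10, 0.25, 0.40, 0.20, 0.90, 0.75, 0.60]

# Rebuild the three rows of the table: P(Y = 1), P(Y = 0), Odds(Y = 1).
table = [(p, 1 - p, odds(p)) for p in probabilities]

for p1, p0, o in table:
    print(f"P(Y=1) = {p1:.2f}   P(Y=0) = {p0:.2f}   Odds(Y=1) = {o:.2f}")
```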
2. You are given the following data, where X1 (high school cumulative GPA) and X2 (having repeated a grade; 0 = never repeated any grade and 1 = has repeated at least one grade; use 0 as the reference category) are used to predict Y (dropping out of high school, "1," vs. graduating high school, "0"). (α = .05)

X1   2.50   2.60   2.75   1.33   3.00   3.42   2.70   2.33   1.75   2.80
X2   1      0      0      1      1      0      1      1      0      0
Y    0      0      0      1      0      0      1      1      1      0

Determine the following values based on simultaneous entry of the independent variables: −2LL, constant, b1, b2, se(b1), se(b2), odds ratios, Wald1, Wald2.
ANSWER: −2LL = 5.048; b1(GPA) = −6.617, b2(Repeat) = 3.204, bconstant = 14.123; se(b1(GPA)) = 4.308, se(b2(Repeat)) = 3.387; odds ratio1(GPA) = .001, odds ratio2(Repeat) = 24.631; Wald1(GPA) = 2.359, Wald2(Repeat) = .895.
Procedure: Create a data set with three variables: GPA (X1), Repeat (X2), and Dropout (Y). The data set should have 10 cases.
Step 1: Go to Analyze → Regression → Binary Logistic.
Step 2: Select Dropout to the Dependent box. Select GPA and Repeat to the Covariate(s) list.
Step 3: Click Categorical. Move Repeat into the Categorical Covariates box. Select Reference Category: First. Click Change. Click Continue.
Step 4: Click Save. Check Probabilities and Group membership under Predicted Values. Check Standardized under Residuals. Check Cook's, Leverage values, and DfBeta(s) under Influence. Click Continue.
Step 5: Click Options. Check Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, and CI for exp(B) under Statistics and Plots. Click Continue. Click OK.
Selected SPSS output:

Model Summary
Step   −2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      5.048               .569                   .769

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      4.288        8    .830

Variables in the Equation
                                                                      95% C.I. for EXP(B)
                      B        S.E.    Wald    df   Sig.   Exp(B)        Lower   Upper
Step 1a   GPA         −6.617   4.308   2.359   1    .125   .001          .000    6.209
          Repeat(1)   3.204    3.387   .895    1    .344   24.631        .032    18817.875
          Constant    14.123   9.936   2.020   1    .155   1359604.313
a. Variable(s) entered on step 1: GPA, Repeat.
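The Wald and Exp(B) columns can be reproduced from the B and S.E. columns alone, since Wald = (B / S.E.)² and the odds ratio is e^B. The Python sketch below checks the reported values; it does not refit the model, and the dictionary and function names are illustrative.

```python
import math

# (B, S.E.) pairs copied from the Variables in the Equation table.
estimates = {
    "GPA": (-6.617, 4.308),
    "Repeat(1)": (3.204, 3.387),
    "Constant": (14.123, 9.936),
}


def wald(b, se):
    """Wald chi-square statistic for a single coefficient (df = 1)."""
    return (b / se) ** 2


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per one-unit increase."""
    return math.exp(b)


summary = {name: (wald(b, se), odds_ratio(b)) for name, (b, se) in estimates.items()}

for name, (w, oratio) in summary.items():
    print(f"{name:10s} Wald = {w:.3f}   Exp(B) = {oratio:.3f}")
```

Because the coefficients in the table are rounded to three decimals, the recomputed Exp(B) for the constant differs slightly from the value SPSS prints.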
3. Complete the missing information for Table 1, using 0.50 as the cut value. Then complete the classification table (Table 2). Compute sensitivity, specificity, false positive rate, and false negative rate.

Table 1.
Observed group membership     1     1     0     1     0     1     1     0     0     1
Predicted probability         0.88  0.72  0.62  0.49  0.34  0.40  0.60  0.21  0.05  0.57
Predicted group membership

Table 2.
              Predicted
Observed      .00     1.00
.00
1.00

ANSWER: Assuming 0.5 is the cut value, cases with predicted probabilities at .5 or above are predicted as 1 and cases with predicted probabilities below .5 are predicted as 0. There are four cases with observed value 1 and predicted value 1. There are three cases with observed value 0 and predicted value 0. There is one case with observed value 0 yet predicted value 1 (false positive). There are two cases with observed value 1 yet predicted value 0 (false negative).
Sensitivity = 4/(2+4) = 0.67 = 67%
Specificity = 3/(3+1) = 0.75 = 75%
False positive rate = 1/(3+1) = 0.25 = 25%
False negative rate = 2/(2+4) = 0.33 = 33%
Table 1.
Observed group membership     1     1     0     1     0     1     1     0     0     1
Predicted probability         0.88  0.72  0.62  0.49  0.34  0.40  0.60  0.21  0.05  0.57
Predicted group membership    1     1     1     0     0     0     1     0     0     1

Table 2.
              Predicted
Observed      .00     1.00
.00           3       1
1.00          2       4
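The completed tables can be rebuilt programmatically by applying the cut value to each predicted probability and then tallying the four cells. A minimal Python sketch follows; the variable names are illustrative.

```python
CUT = 0.50  # cut value stated in the question

observed = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
probability = [0.88, 0.72, 0.62, 0.49, 0.34, 0.40, 0.60, 0.21, 0.05, 0.57]

# Table 1: predicted group membership (1 if probability >= cut, else 0).
predicted = [1 if p >= CUT else 0 for p in probability]

# Table 2: tally each (observed, predicted) combination.
tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
false_positive_rate = fp / (tn + fp)
false_negative_rate = fn / (tp + fn)

print(predicted)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"false positive rate = {false_positive_rate:.2f}, "
      f"false negative rate = {false_negative_rate:.2f}")
```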
4. You are given the following data, where X1 (sex; male = 0, female = 1; use 0 as the reference category) and X2 (having at least one immediate family member who smokes; yes = 1, no = 0; use 0 as the reference category) are used to predict Y (being a smoker = 1 vs. being a nonsmoker = 0). (α = .05)

X1   0   0   0   0   1   1   1   1   1   1
X2   0   0   1   1   0   0   0   0   1   1
Y    1   0   1   1   0   0   1   0   1   0

Determine the following values based on simultaneous entry of independent variables: −2LL, constant, b1, b2, se(b1), se(b2), odds ratios, Wald1, Wald2.
ANSWER: −2LL = 10.688; b1(Sex) = 1.792, b2(Family) = 1.792, bconstant = −1.386; se(b1(Sex)) = 1.571, se(b2(Family)) = 1.571; odds ratio1(Sex) = 6.000, odds ratio2(Family) = 6.000; Wald1(Sex) = 1.301, Wald2(Family) = 1.301.
Procedure: Create a data set with three variables: Sex (X1), Family (X2), and Smoke (Y). The data set should have 10 cases. Follow the steps described in Question 2. Use Smoke as the dependent variable, and Sex and Family as the covariates. In step 3, move both Sex and Family into the Categorical Covariates box. Select Reference Category: First. Click Change.
Selected SPSS Output:

Model Summary
Step   −2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      10.688              .272                   .363

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      .451         2    .798

Variables in the Equation
                                                                  95% C.I. for EXP(B)
                      B        S.E.    Wald    df   Sig.   Exp(B)   Lower   Upper
Step 1a   Sex(1)      1.792    1.571   1.301   1    .254   6.000    .276    130.426
          Family(1)   1.792    1.571   1.301   1    .254   6.000    .276    130.426
          Constant    −1.386   1.160   1.428   1    .232   .250
a. Variable(s) entered on step 1: Sex, Family.
5. Professor Pruefung wanted to examine if performance in quizzes can predict whether a student will pass or fail the final exam. The independent variables are scores in two pop quizzes (Quiz1, Quiz2), and the dependent variable is a dichotomous variable (pass = 1 vs. fail = 0). Below is part of the output of the analysis.
a. Professor Pruefung assumed that the better a student performed in the quizzes (a higher score indicates better performance), the higher the odds that he/she will pass the final exam. If that is the case, what are the expected signs for b1 and b2? Do the results confirm the expectation?
b. Based on the tables, is there any indication of assumption violation? If so, which assumption(s) has (have) been violated?
c. What are the possible consequences of the assumption violation?

Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step     24.055       2    .000
         Block    24.055       2    .000
         Model    24.055       2    .000

Model Summary
Step   −2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      22.998              .452                   .653

Variables in the Equation
                     B         S.E.    Wald    df   Sig.   Exp(B)
Step 1   Quiz1       1.557     1.064   2.140   1    .143   4.745
         Quiz2       −.535     1.023   .273    1    .601   .586
         Constant    −21.721   8.990   5.838   1    .016   .000

Coefficientsa
                   Unstandardized Coefficients   Standardized Coefficients                      Collinearity Statistics
Model              B         Std. Error          Beta                        t        Sig.    Tolerance   VIF
1   (Constant)     −1.417    .633                                            −2.237   .031
    Quiz1          .184      .098                1.196                       1.877    .068    .036        27.532
    Quiz2          −.089     .105                −.540                       −.847    .402    .036        27.532
a. Dependent Variable: Pass
ANSWER: a. The signs of b1 and b2 should be positive. However, the sign of b2 is negative. b. The assumption of noncollinearity has been violated. The overall model is statistically significant and has a good fit (−2LL = 22.998, Cox and Snell R² = .452, Nagelkerke R² = .653), but neither of the individual regression coefficients is significant. This suggests that the two independent variables may be highly correlated. Further examination of model assumptions indicates that there is a serious collinearity problem in the model (tolerance = 0.036, VIF = 27.532). c. Collinearity results in inflated standard errors of the regression coefficients, making it more difficult to achieve statistical significance. It also leads to instability of the regression coefficients across samples: the estimates will bounce around quite a bit in magnitude and occasionally even change sign (perhaps opposite of expectation, as observed in this example). Another result is that whereas the overall regression is significant, none of the individual predictors are significant (as observed in this example). Violation will also restrict the utility and generalizability of the estimated regression model.
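To see how severe the collinearity is, the reported VIF can be converted back into a tolerance and, because there are only two predictors, into the implied correlation between Quiz1 and Quiz2. The sketch below relies on the standard relations VIF = 1 / tolerance and tolerance = 1 − R², where R² comes from regressing one predictor on the other; the correlation is an implied value, not one reported in the output, and its sign cannot be recovered from the VIF.

```python
import math

vif = 27.532  # reported for both Quiz1 and Quiz2

# Tolerance is the reciprocal of the VIF.
tolerance = 1 / vif

# R-squared from regressing one quiz score on the other.
r_squared = 1 - tolerance

# With exactly two predictors, that R-squared is the squared correlation
# between them, so its square root gives the magnitude of the correlation.
implied_abs_correlation = math.sqrt(r_squared)

print(f"tolerance = {tolerance:.3f}")
print(f"R-squared between predictors = {r_squared:.3f}")
print(f"|r(Quiz1, Quiz2)| = {implied_abs_correlation:.3f}")
```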
Chapter 20 (Mediation and Moderation) Test Bank Multiple choice
1. A researcher is examining the relationship between science and mathematics performance. The researcher believes that there may be an indirect effect of science on mathematics through literacy. Which of the following types of models would you recommend the researcher examine? a. Mediation b. Moderation c. Neither d. Both ANSWER: A 2. A researcher is examining the relationship between weight and blood pressure. The researcher believes that there may be an indirect effect of weight on blood pressure through stress. Which of the following types of models would you recommend the researcher examine? a. Mediation b. Moderation c. Neither d. Both ANSWER: A 3. A researcher is examining the relationship between science and mathematics performance. The researcher believes that science may interact with literacy. Which of the following types of models would you recommend the researcher examine? a. Mediation b. Moderation c. Neither d. Both ANSWER: B 4. A researcher is examining the relationship between time on task and academic performance. The researcher believes that time on task may interact with age. Which of the following types of models would you recommend the researcher examine? a. Mediation b. Moderation c. Neither d. Both ANSWER: B
5. A researcher has conducted a moderated multiple regression analysis and finds f² of .03. Based on Cohen's guidelines, what is the interpretation of the effect? a. Small b. Moderate c. Large d. None of the above ANSWER: A 6. A researcher has conducted a moderated multiple regression analysis and finds f² of .16. Based on Cohen's guidelines, what is the interpretation of the effect? a. Small b. Moderate c. Large d. None of the above ANSWER: B 7. Which of the following is NOT an assumption for moderated multiple regression? a. Expected frequencies b. Homoscedasticity c. Linearity d. Normality ANSWER: A 8. What is one of the challenges with the Johnson-Neyman approach for probing an interaction? a. Interactions cannot be probed with this approach. b. Regions of significance cannot be determined with this approach. c. This approach can be used only with continuous moderating variables. d. Violations of homogeneity of regression cannot be handled with this approach. ANSWER: C 9. In a mediation model, a positive sign for the direct effect indicates which of the following? a. One unit higher on X is estimated to be higher on Y. b. One unit higher on X is estimated to be lower on Y. c. One unit higher on X is estimated to be the same as Y. d. Impossible to determine with this information. ANSWER: A
10. In a mediation model, a negative sign for the direct effect indicates which of the following? a. One unit higher on X is estimated to be higher on Y. b. One unit higher on X is estimated to be lower on Y. c. One unit higher on X is estimated to be the same as Y. d. Impossible to determine with this information. ANSWER: B
11. Which of the following represents an indirect effect in a simple mediation model? a. The dependent variable influences the mediating variable, which then influences the independent variable. b. The independent variable influences the mediating variable, which then influences the dependent variable. c. The independent variable influences the dependent variable, which then influences the mediating variable. d. The mediating variable simultaneously influences the independent and dependent variable. ANSWER: B 12. Which of the following is an interpretation of the direct effect of the independent variable on the dependent variable? a. How much two cases that differ by one unit on the dependent variable will differ on the independent variable. b. How much two cases that differ by one unit on the independent variable will differ on the mediator. c. How much two cases that differ by one unit on the independent variable will differ on the dependent variable. d. Two cases that differ by one unit on the mediator but are equal on the independent variable will differ by b units on the dependent variable. ANSWER: C 13. True or false? In a moderation model where X is the independent variable and W is the moderating variable, the inclusion of the interaction term (XW) tests whether the effect of X on the dependent variable is conditional on W. a. True b. False ANSWER: A
14. Which one of the following is not a way to examine an interaction? a. Graphical methods b. Johnson-Neyman technique c. Pick-a-point approach d. All of the above e. None of the above ANSWER: D 15. Which of the following will be evident in a full mediation model? a. The relationship between the independent and dependent variable will completely disappear. b. The relationship between the independent and dependent variable becomes stronger. ANSWER: A 16. Which of the following may impact power in moderated multiple regression? a. Measurement error b. Sample size c. The variance of the predictor is smaller in the population. d. All of the above e. None of the above ANSWER: D 17. The underlying framework for mediation is to understand which one of the following: a. How the mediator relates to the moderator. b. The way in which the independent variable relates to the dependent variable. c. Which independent variable is related to the dependent variable. d. Which dependent variable produces statistically significant results. ANSWER: B 18. The direct effect in a simple mediation model is denoted by which one of the following? a. a b. b c. c′ d. e ANSWER: C
19. In a moderated regression model, the interaction term reflects which one of the following? a. The dependent variable controlling for the moderating variable b. The product of the dependent variable and independent variable c. The product of the independent variable and moderating variable d. The sum of the regression coefficients ANSWER: C 20. The conditional effect of the independent variable on the dependent variable when the moderating variable is zero is interpreted in which one of the following ways? a. The difference in X for two cases that differ by one on W but differ by zero on 𝑌𝑖′ b. The difference in W for two cases that differ by one on X but differ by zero on 𝑌𝑖′ c. The difference in 𝑌𝑖′ for two cases that differ by one on W but differ by zero on X d. The difference in 𝑌𝑖′ for two cases that differ by one on X but differ by zero on W ANSWER: D 21. Testing that the effect of the independent variable (X) on the dependent variable (Y) is dependent on the moderating variable (W) is conducted by including which term in the moderation model? a. X b. XY c. W d. WY e. XW ANSWER: E 22. Which one of the following is needed to determine whether the relationship between the independent and dependent variables systematically varies as a function of the moderator? a. Graph of the interaction b. Probe for an interaction c. Test for an interaction d. Visualization of the XY relationship ANSWER: C
23. Which one of the following is NOT an effect size that is recommended to avoid in reporting mediation results? a. Kappa squared b. Partially standardized total effect c. Ratio of the indirect effect to the direct effect d. The proportion of variance in the dependent variable that is explained by the indirect effect ANSWER: B 24. The assumptions for moderation are the same as which of the following procedures? a. Binary logistic regression b. Multinomial logistic regression c. Multiple linear regression d. Simple linear regression ANSWER: C 25. The assumptions for mediation are the same as which of the following procedures? a. Binary logistic regression b. Multinomial logistic regression c. Multiple linear regression d. Simple linear regression ANSWER: C 26. Moderation occurs in which of the following situations? a. The effect of the dependent variable on the moderating variable can be predicted by the independent variable. b. The effect of the independent variable on the dependent variable can be predicted by the moderating variable. c. The interaction of the independent and dependent variables can be predicted by the moderating variable. d. The moderating variable can be predicted by the effect of the independent variable on the dependent variable. ANSWER: B 27. Which one of the following represents moderation? a. Exponentiation of the regression coefficient b. Interactions in factorial ANOVA c. Random effects with a completely crossed design d. Repeated measures within persons ANSWER: B
28. Which one of the following reflects partial mediation? a. Some relationship between X and Y remains, and that relationship becomes stronger. b. Some relationship between X and Y remains, but that relationship is smaller in magnitude. c. The relationship between X and Y completely disappears. d. The relationship between X and Y exponentiates. ANSWER: B
Chapter 20 (Mediation and Moderation) Test Bank Short answer
1. A researcher conducts a simple mediated regression to determine the extent to which reading performance mediates the relationship between time spent on homework and science performance. The researcher finds the following results. Indicate what type of mediation it is (full or partial) and interpret the coefficients for X and M.

OUTCOME VARIABLE: SCIENCE

Model Summary
     R       R-sq      MSE        F(HC4)     df1      df2         p
     .6468   .4184     78.5197    309.8067   2.0000   1207.0000   .0000

Model
             coeff     se(HC4)   t         p       LLCI      ULCI
constant     19.9463   2.9626    6.7327    .0000   14.1339   25.7588
HOMEWORK     .0008     .0248     .0311     .9752   -.0479    .0494
READING      .3849     .0155     24.8780   .0000   .3545     .4152

Standardized coefficients
             coeff
HOMEWORK     .0008
READING      .6468
ANSWER: Reading fully mediates the relationship between homework and science performance. With reading in the model, the coefficient for homework is not statistically significant (p = .9752), whereas the coefficient for reading is statistically significant (p < .001). The coefficient for homework (X) tells us that two students who differ by one unit on homework but are equal on reading performance are estimated to differ by .0008 units on science. The coefficient for reading (M) tells us that two students who are equal on homework (X) but differ by one unit on reading are estimated to differ by .3849 units on science.
2. A researcher conducts a simple mediated regression to determine the extent to which reading performance mediates the relationship between time spent on homework and science performance. The researcher finds the following results. Interpret the partially standardized total effect and partially standardized direct effect.
************** TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y **************

Total effect of X on Y
     Effect   se(HC4)   t       p       LLCI     ULCI    c_ps    c_cs
     .0212    .0304     .6971   .4859   -.0385   .0809   .0018   .0225

Direct effect of X on Y
     Effect   se(HC4)   t       p       LLCI     ULCI    c'_ps   c'_cs
     .0008    .0248     .0311   .9752   -.0479   .0494   .0001   .0008
ANSWER: The partially standardized total effect suggests that two cases that differ by one unit on homework (X) will differ by about .0018 standard deviations on science (Y) as a result of the combined direct and indirect effects by which homework affects science. The partially standardized direct effect suggests that, independent of the mediating effect of reading, a student who is one unit higher on homework will be about .0001 standard deviations different on science.
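The output above lists the total and direct effects but not the indirect effect. In a simple mediation model estimated by ordinary least squares, the total effect decomposes as c = c′ + ab, so the indirect effect can be recovered by subtraction. The Python sketch below does that arithmetic; the b path (.3849) is taken from the reading coefficient in Question 1, and the a path it derives is an implied value under that assumption, not a number reported in the output.

```python
total_effect = 0.0212    # c, total effect of homework on science
direct_effect = 0.0008   # c', direct effect of homework on science
b_path = 0.3849          # reading -> science coefficient from Question 1

# Indirect effect through reading: ab = c - c'.
indirect_effect = total_effect - direct_effect

# Implied homework -> reading path, assuming ab = a * b holds exactly.
implied_a_path = indirect_effect / b_path

print(f"indirect effect (ab) = {indirect_effect:.4f}")
print(f"implied a path       = {implied_a_path:.4f}")
```

The implied a path is only approximate because every input is rounded to four decimals.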
3. A researcher computes a moderated regression model and finds the following. Describe what the interaction term represents (e.g., which terms are interacting), and interpret the interaction.

**************************************************************************
Model  : 1
    Y  : MATH
    X  : SCIENCE
    W  : READING

Sample Size:  1210

**************************************************************************
OUTCOME VARIABLE: MATH

Model Summary
     R       R-sq      MSE         F(HC4)     df1      df2         p
     .7050   .4971     135.7397    323.8703   3.0000   1206.0000   .0000

Model
             coeff     se(HC4)   t        p       LLCI       ULCI
constant     31.1829   31.4492   .9915    .3216   -30.5183   92.8840
SCIENCE      .9186     .3579     2.5668   .0104   .2165      1.6207
READING      .2803     .1730     1.6205   .1054   -.0591     .6197
Int_1        -.0010    .0019     -.5006   .6168   -.0048     .0028

Product terms key:
 Int_1  :  SCIENCE  x  READING

Test(s) of highest order unconditional interaction(s):
       R2-chng   F(HC4)   df1      df2         p
X*W    .0003     .2506    1.0000   1206.0000   .6168
----------
    Focal predict: SCIENCE  (X)
          Mod var: READING  (W)
ANSWER: The interaction represents the product of science and reading and indicates how the effect of science (X) on the dependent variable, math, changes as reading (W, the moderator) changes by one unit. More specifically, for each one-unit increase in reading, the estimated effect of a one-unit increase in science on math decreases by .0010. Note that the interaction is not statistically significant (p = .6168), so there is no evidence that the effect of science on math depends on reading.
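The meaning of the interaction coefficient is easiest to see by writing the effect of science on math as a function of reading: effect(W) = b_science + b_interaction × W. The Python sketch below evaluates that expression with the coefficients from the output; the reading scores plugged in (40, 50, 60) are hypothetical values chosen only for illustration.

```python
B_SCIENCE = 0.9186       # coefficient for SCIENCE (effect of X when W = 0)
B_INTERACTION = -0.0010  # coefficient for Int_1 (SCIENCE x READING)


def conditional_effect(reading):
    """Estimated effect of a one-unit increase in science at a given reading score."""
    return B_SCIENCE + B_INTERACTION * reading


# Hypothetical reading scores, used only to illustrate the calculation.
for reading in (40, 50, 60):
    print(f"reading = {reading}: effect of science = {conditional_effect(reading):.4f}")
```

Each additional point of reading shifts the estimated science slope by −.0010, which is exactly what the Int_1 coefficient reports.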
4. A researcher conducts a moderated regression with W as the moderator. Interpret the conditional effects presented here.
Conditional effects of the focal predictor at values of the moderator(s):

     W        Effect   se(HC4)   t         p       LLCI    ULCI
     1.5000   .5236    .0342     15.3307   .0000   .4566   .5906
     2.0000   .4373    .0269     16.2711   .0000   .3846   .4901
     2.5000   .3511    .0381     9.2266    .0000   .2765   .4258
ANSWER: The first row represents the effect of the independent variable on the dependent variable conditioned on the moderator, W being low (16th percentile, or one standard deviation below the mean). This is an effect of .5236. The second row represents the effect of X on Y conditioned on W being moderate (50th percentile). This is an effect of .4373. The third row represents the effect of the independent variable on the dependent variable conditioned on the moderator, W being high (84th percentile, or one standard deviation above the mean). This is an effect of .3511.
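Because each conditional effect is a linear function of the moderator, effect(W) = b1 + b3 × W, any two rows of the table are enough to recover the implied slope b3 (the interaction coefficient) and intercept b1. The Python sketch below does this with the first two rows and checks the result against the third; b1 and b3 are implied values, since the underlying coefficient table is not shown in the question.

```python
# (W, conditional effect) pairs copied from the output.
rows = [(1.5, 0.5236), (2.0, 0.4373), (2.5, 0.3511)]

(w_low, effect_low), (w_mid, effect_mid), (w_high, effect_high) = rows

# Slope of the conditional effect with respect to W: implied interaction term.
b3 = (effect_mid - effect_low) / (w_mid - w_low)

# Effect of X when W = 0: implied coefficient for the focal predictor.
b1 = effect_low - b3 * w_low

# Use the recovered line to predict the third row as a consistency check.
predicted_high = b1 + b3 * w_high

print(f"implied b3 = {b3:.4f}, implied b1 = {b1:.4f}")
print(f"predicted effect at W = {w_high}: {predicted_high:.4f} (reported {effect_high})")
```

The small gap between the predicted and reported third-row effect reflects rounding in the printed output.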
5. A researcher conducts a moderated regression with W as the moderator. Interpret the results from probing the interaction using the Johnson-Neyman technique.

Moderator value(s) defining Johnson-Neyman significance region(s):
     Value    % below   % above
     3.5824   99.6694   .3306

Conditional effect of focal predictor at values of the moderator:
     W        Effect   se(HC4)   t         p       LLCI     ULCI
     1.0000   .6098    .0527     11.5659   .0000   .5064    .7132
     1.1440   .5850    .0469     12.4851   .0000   .4930    .6769
     1.2880   .5601    .0413     13.5548   .0000   .4791    .6412
     1.4320   .5353    .0363     14.7500   .0000   .4641    .6065
     1.5760   .5105    .0320     15.9559   .0000   .4477    .5732
     1.7200   .4856    .0288     16.8876   .0000   .4292    .5421
     1.8640   .4608    .0270     17.0854   .0000   .4079    .5137
     2.0080   .4360    .0269     16.1929   .0000   .3831    .4888
     2.1520   .4111    .0286     14.3632   .0000   .3550    .4673
     2.2960   .3863    .0318     12.1502   .0000   .3239    .4487
     2.4400   .3615    .0360     10.0278   .0000   .2907    .4322
     2.5840   .3366    .0410     8.2011    .0000   .2561    .4172
     2.7280   .3118    .0466     6.6974    .0000   .2205    .4031
     2.8720   .2870    .0524     5.4752    .0000   .1841    .3898
     3.0160   .2621    .0585     4.4800    .0000   .1473    .3769
     3.1600   .2373    .0648     3.6628    .0003   .1102    .3644
     3.3040   .2125    .0712     2.9845    .0029   .0728    .3521
     3.4480   .1876    .0777     2.4151    .0159   .0352    .3400
     3.5824   .1644    .0838     1.9619    .0500   .0000    .3289
     3.5920   .1628    .0843     1.9320    .0536   -.0025   .3281
     3.7360   .1380    .0909     1.5178    .1293   -.0404   .3163
     3.8800   .1131    .0976     1.1593    .2465   -.0783   .3045
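ANSWER: The relationship between the independent and dependent variable is statistically significant when the moderating variable is less than 3.5824, a region that contains 99.67% of the cases. At values of the moderator above 3.5824, the confidence interval for the conditional effect includes zero (p > .05), so the relationship is not statistically significant.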
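The region of significance can be read off the table by checking, row by row, whether the 95% confidence interval for the conditional effect excludes zero. The Python sketch below applies that check to the rows on either side of the Johnson-Neyman boundary; only a subset of the table is included, and the boundary row itself (LLCI = .0000) is left out because it sits exactly on the threshold.

```python
# (W, LLCI, ULCI) for rows around the Johnson-Neyman value of 3.5824.
rows = [
    (3.3040, 0.0728, 0.3521),
    (3.4480, 0.0352, 0.3400),
    (3.5920, -0.0025, 0.3281),
    (3.7360, -0.0404, 0.3163),
]


def excludes_zero(lower, upper):
    """True when the confidence interval lies entirely above or below zero."""
    return lower > 0 or upper < 0


significant_w = [w for w, lower, upper in rows if excludes_zero(lower, upper)]
nonsignificant_w = [w for w, lower, upper in rows if not excludes_zero(lower, upper)]

print("significant at W =", significant_w)
print("not significant at W =", nonsignificant_w)
```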