An Adventure in Statistics: The Reality Enigma
Chapter 1: Why you need science
1. The research process involves a series of steps to systematically reach conclusions about a phenomenon of interest. We observe that there are differences in the salaries of men and women for doing the same job. What is the population of interest that we would need to collect information on to carry out our research? a) All individuals aged 18 and over in a population b) Male workers earning more than the average salary c) All individuals who are at the legal age of working d) All individuals who have worked at least 1 hour in the last week Ans: D Explanation: we are interested in comparing the salaries of men and women who participate in the labour market.
2. Which statement best describes a theory? a) An approach to explain a phenomenon in the real world b) A set of principles that describe individuals’ behaviours c) A set of principles specific to the observation or situation of interest d) A general principle that applies to all populations Ans: A Explanation: a theory is a principle or set of principles that we use to explain an observed situation or event (i.e., a phenomenon), which includes but is not limited to individuals’ behaviours.
3. For the following statements, indicate which ones are non-scientific: a) Friendship bonds are unbreakable b) Dogs are better than cats c) Films are better enjoyed with popcorn Ans: B a) Scientific: we can measure the length of any friendship bond, which might or might not last forever b) Non-scientific: individuals’ preferences for dogs or cats can be measured; however, we cannot have an objective measure of which animal is best. c) Scientific: we can measure the level of satisfaction of the same people watching a film with and without popcorn and find out whether there are differences in their level of enjoyment.
4. When do we need to draw a random sample? a) When we have information from all entities of interest b) When the population is large enough to be representative c) When the population is too big to collect data for all the entities of interest d) When we need information about entities that are representative of the population
Ans: C Explanation: for large populations it is not feasible to collect data for all entities (e.g., individuals), so we need to draw a random sample that is representative of the population. 5. We draw 5 samples of 30 young adults aged between 18 and 25 to conduct some research on the number of concerts young adults go to within a year. We find the following averages: 6, 14, 7, 10 and 9. What can we conclude from the given information? a) The sampling variation would get larger with an additional 5 samples b) Most samples overestimate the true parameter of the population c) The sampling error is largest for the sample with an average statistic of 6 concerts a year d) The best estimate we have of the population parameter is the average of these 5 sample statistics Ans: D Explanation: Because we do not know the true mean of the population from which we have drawn our samples, the best estimate is the mean of our sample means, here (6 + 14 + 7 + 10 + 9) / 5 = 9.2. 6. The assertion that a variable measured in natural conditions causes an outcome to vary is possible under: a) A longitudinal research design b) A cross-sectional research design c) An experimental research design d) An exploratory research design Ans: A Explanation: we could establish a relationship between two variables that have not been manipulated by the researcher (natural conditions). Also, provided we use a longitudinal research design, we could establish a time sequence in which the cause occurs before the effect. However, without the possibility of using an experimental design, we would need to rule out any confounding variables as potentially being the cause.
7. When the results of an experiment can be applied to real-world conditions, that experiment is said to have: a) Criterion validity b) Ecological validity c) Content validity d) Factorial validity Ans: B Explanation: For a research study to possess ecological validity, the methods, materials and setting of the study must approximate the real-life situation that is under investigation 8. Imagine that we find an association between people drowning in a lake and consumption of ice cream. What is the correct conclusion we can draw from this association?
a) The higher intake of ice cream positively affects the chances of drowning b) A tertium quid variable ‘month of the year’ is influencing both variables c) We can conclude a cause-effect between ice-cream consumption and drowning d) The association between intake of ice cream and drowning in a lake is weak Ans: B Explanation: The summer season triggers more bathers into lakes, which increases the risk of drowning, while the heat in summer induces more people to consume ice-creams. 9. A variable manipulated by a researcher is known as: a) An independent variable b) A dependent variable c) A confounding variable d) A discrete variable Ans: A Explanation: An independent variable (or predictor variable) is a variable that is thought to be the cause of some effect. This term is usually used in experimental research to denote a variable that the experimenter has manipulated. 10. The purpose of a control condition is to: a) Show up relationships between predictor variables. b) Control for participant characteristics. c) Rule out a tertium quid. d) Allow inferences about a cause. Ans: D Explanation: A properly constructed control condition provides you with a reference point to determine what change (if any) occurred when a variable was modified. 11. In a study interested in comparing the effects of owning or not a cat on single adults’ emotional well-being, what statement about the variation is correct? a) The systematic variation will be bigger than unsystematic variation b) The systematic and unsystematic will be about the same c) The unsystematic variation will be greater than systematic variation d) In this kind of study design the variation is not an issue Ans: C Explanation: In independent designs in which we contrast two groups, with or without a cat, we have narrower scope to control for noise, thus we have more variability that is uncontrolled. 12. 
If we want to tackle potential problems of variation in a study using a within-subject design, an efficient strategy would be to: a) Assign participants of an experiment to random conditions
b) Assign participants with similar characteristics to a different order of tasks c) Set at least three different conditions d) Set a randomized order to complete tasks Ans: D Explanation: in repeated-measures designs, counterbalancing is used to eliminate sources of systematic variation; a Latin square design assigns individuals different orders of the conditions in an experiment. The most efficient approach is to randomly assign an order of conditions to each participant.
13. Controlling for compositional characteristics in a randomized experiment is important because: a) It allows for measuring the amount of systematic and unsystematic variation b) We can determine the effect of the independent variable c) We can be certain that differences within entities are due to systematic variation d) It allows for avoiding problems with ethical issues Ans: B Explanation: If we control for the differences in, for instance, sex, age, social class, artistic skills, we reduce the noise from unsystematic variation, and, thus, we can be more certain about the effect of the manipulation of the cause (i.e., the manipulation of the independent variable) on the outcome. 14. In general, as the sample size (N) increases: a) The confidence interval gets wider. b) The confidence interval gets narrower. c) The confidence interval is unaffected. d) The confidence interval becomes less accurate. Ans: B Explanation: with a larger sample size, the sample more closely resembles the population of interest, which allows for a more precise estimate of the range of values within which the true population value is likely to fall.
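The effect of sample size in question 14 can be illustrated with a minimal sketch. It assumes a normal approximation with z = 1.96 for a 95% interval and a hypothetical sample standard deviation of 10; neither value comes from the question itself.

```python
import math


def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)


# Hypothetical standard deviation, chosen only for illustration.
sample_sd = 10.0

# The half-width shrinks as N grows: quadrupling N halves the interval.
half_widths = {n: ci_half_width(sample_sd, n) for n in (25, 100, 400)}
for n, hw in half_widths.items():
    print(f"N = {n:>3}: mean ± {hw:.2f}")
```

Because the half-width is proportional to 1/√N, each fourfold increase in the sample size halves the width of the interval.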
15. Confidence intervals: a) Can be used instead of conventional statistics based on point estimates. b) Are not frequently used in research articles because they can mislead the reader. c) Are constructed using subjective evaluations of confidence. d) None of these options are correct. Ans: A Explanation: the point estimate sits in the middle of the confidence interval. 16. An interval estimate represents: a) The size of the effect of the computed statistic b) A range of values containing the population value
c) A range of values indicating the sampling variability d) A range of values within which the population value is expected to fall Ans: D Explanation: interval estimates set the limits within which the true value of the population is likely to be found. Chapter 2: Reporting research, variables and measurement 1. What is a scientific journal? a) A piece of scientific research that has not yet been published. b) A collection of articles written by scientists that have been peer-reviewed. c) A notebook kept by scientists containing important details of all their own experimental research for future reference. d) A collection of articles written by scientists that have not yet been reviewed by other scientists in the field.
Ans: B Explanation: Scientific journals contain articles that have been peer-reviewed, in an attempt to ensure that articles meet the journal’s standards of quality and scientific validity.
2. Which of the following statements is correct about working out the result of an equation? a) We should resolve multiplications after divisions b) The last order in the calculations is always the subtraction c) The order of multiplication or division depends on their position in the equation d) The exponent takes precedence over the parenthesis
Ans: C Explanation: multiplications and divisions are resolved in the order in which they are encountered, going from left to right. 3. Short questions: a) What does $y_6$ represent? b) What does $x_2^2$ represent? c) What does $x$ represent?
Ans: a) The outcome for person 6 b) The score of entity 2 (e.g., a person), raised to the power of 2 c) A predictor variable 4. In what circumstances is it advisable to use reverse phrasing in a questionnaire? a) When constructing nominal scales b) For a construct with several ordinally scored items
c) To obtain balanced scores in summing up different items d) With constructs that are likely to yield high scores
Ans: B Explanation: in questionnaires with several questions related to a construct, adding reverse phrasing is a method to avoid response bias. 5. Read the following list of variables and decide which ones can be directly observed and which are constructs: a) Religion b) Happiness c) Size of the industry sector d) Confidence
Ans: a) Variable b) Construct c) Variable d) Construct 6. Which level of measurement applies to the following examples of the disposable income of households?
Code   Description
1      <10,000
2      10,001–20,000
3      20,001–30,000
4      30,001–40,000
5      >40,001
a) Interval b) Ordinal c) Nominal d) Ratio
Ans: B Explanation: the categories of income have an order (i.e., an explicit hierarchy) ranging from less than 10,000 to over 40,001 7. A variable with a true value of 0 is considered: a) A variable that provides a valid scale of measurement b) A more reliable variable of the thing measured c) A variable that can take the absence of the thing measured d) An interval scale variable
Ans: C Explanation: a ratio variable has a meaningful 0 point that indicates the absence of the property being measured. 8. The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called: a) Variance b) The ‘fit’ of the model c) Reliability d) Measurement error
Ans: D Explanation: This is because, it’s one thing to measure variables, but it’s another thing to measure them accurately. 9. Of the following variables, which ones would you need to operationalize? a) Emotional wellbeing b) Hours worked per week c) Academic achievement d) Water consumption
Ans: A Explanation: only emotional wellbeing is a construct that is not directly measurable. We can approximate it with a series of questions.
10. What scale of measurement is a variable with hierarchical categories? a) Ordinal b) Nominal c) Interval d) Binary
Ans: A Explanation: an ordinal variable has ordered categories, but it does not require that the intervals between categories are equal, as is the case for interval or ratio scales.
11. Imagine we are interested in doing research on the importance of family in the life of people (10 being very important and 1 being not important at all). We interview 20 people asking them to rate attitudes towards family. The average we obtain is 8.5 in the importance of family in people’s life. What can we conclude about this variable? a) The average can lead to a wrong conclusion about people’s importance of family b) It is a discrete variable treated as a continuous one c) The difference between categories is not equal
d) Someone scoring 4.25 attributes half as much importance to family as the average
Ans: B Explanation: self-reported data measured on scales, often from 1 to 10, are treated as continuous. The interval between categories is not of equal magnitude. 12. In a study about the deficiency of Vitamin D in young children and its relationship with growth, we obtained data for 100 children over 5 years. What is expected to happen to the measurement error? a) It would increase over time b) It would decrease over time c) It would stay the same d) It would be larger for measuring vitamin D than growth
Ans: C Explanation: taking measures of Vitamin D and growth should not be affected by who takes the measurement or where it is taken. Thus, we would expect no changes in the measurement error over time.
13. In a study on the mental health status of mothers of children younger than 10, the variable wellbeing is measured using a 12-item questionnaire. Some mothers are interviewed in January and others in July. How can we ensure that the variable is measured accurately? a) We should make sure to take measures of all mothers at these two time points b) We should use different methods to record answers on mental health at each time point c) We should include women with no children d) We should form a control group by age of the mother at each time point
Ans: A Explanation: time of the year can affect moods; by interviewing mothers at these two time points we would be able to decrease measurement error as we control for factors other than mental health. 14. Imagine we are comparing subjective appraisal of happiness between two countries with distinct cultural and religious values. What type of validity should we be concerned about? a) Face validity b) Criterion validity c) Predictive validity d) Content validity
Ans: B Explanation: criterion validity is concerned with the degree to which interpretations and descriptions based on test scores or other measures match behaviour. In two culturally distinct settings, we would need to make sure interpretations of happiness match behaviour in a similar fashion. 15. A failure in a test to account for content validity means:
a) The test does not accurately capture an analogous dimension b) The test is not related to some outcome in the real world c) The test fails to account for all relevant dimensions of the construct d) The test is a mismatch between observed behaviours and reported attitudes
Ans: C Explanation: content validity is whether the items in a test or instrument include all relevant and adequate dimensions that make up the domain. 16. Reliability of a measure is expected to be higher if: a) The measure is tested under different environments b) The measure is recorded several times across different populations c) The measure is recorded with the same instrument d) The measure randomly uses different items each time
Ans: C Explanation: using the same tools or instruments to measure a test would lead to higher chances of obtaining similar results. 17. Imagine we have conducted a study with a group of individuals aged between 15 and 17 on their preferences of three types of music (rock, jazz and kid’s music) 10 years ago and now. Some differences are observed in the scores of the type of music they like. What can we conclude about this change? a) The instrument to measure their musical preferences is not reliable b) The changes in the scores fall within the expected c) The validity of the answers 10 years ago has changed over time d) We don’t have enough information to assess the reliability of the instruments
Ans: B Explanation: it is expected that preferences of music at the age of 5 and 15 experience a large transformation.
Chapter 3: Summarizing data 1. Using the information provided in the table about the self-rated general health of a sample of adults, answer the following questions: a) What is the frequency of having ‘fair’ health? b) What is the relative frequency of ‘very good’ health? c) What is the value of N?
Ans: Short answers a) 5,436 b) 35.2% c) 37,700 2. What can we conclude about the left-right political ideology given the table below?
Left-right scale   Frequency   Percent   Cumulative Percent
0–Left                    56       2.5                  2.5
One                       34       1.5                  4.1
Two                      118       5.4                  9.4
Three                    209       9.5                 18.9
Four                     211       9.6                 28.5
Five                     769      34.9                 63.4
Six                      201       9.1                 72.5
Seven                    222      10.1                 82.6
Eight                    108       4.9                 87.5
Nine                      21       1.0                 88.4
10–Right                  47       2.1                 90.6
Refusal                   15       0.7                 91.2
Don’t know               193       8.8                100.0
Total                   2204     100.0
a) The largest percentage of people are right-wing b) Less than a quarter situate themselves between totally left and four points on the left-right scale c) There are as many people who situate themselves on the right of the scale as on the left d) More than a third situate themselves neither left nor right
Ans: D Explanation: The percentage at five on the scale is 34.9%, which is a little over a third.
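The Percent and Cumulative Percent columns of the left-right table can be rebuilt from the raw frequencies. The following is a minimal sketch using only the frequencies listed in the table; the variable names are illustrative.

```python
from itertools import accumulate

labels = ["0-Left", "One", "Two", "Three", "Four", "Five", "Six", "Seven",
          "Eight", "Nine", "10-Right", "Refusal", "Don't know"]
freqs = [56, 34, 118, 209, 211, 769, 201, 222, 108, 21, 47, 15, 193]

total = sum(freqs)                              # N, the sum of all frequencies
percents = [100 * f / total for f in freqs]     # relative frequency as a percentage
cumulative = list(accumulate(percents))         # running total of the percentages

for label, f, p, c in zip(labels, freqs, percents, cumulative):
    print(f"{label:<11}{f:>5}{p:>7.1f}{c:>7.1f}")
```

The row for ‘Five’ gives 769 / 2204, which is about 34.9%, the figure used in the explanation above.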
3. Which one of the following statements is accurate about frequency tables?
a) They can only be used for ordinal or interval variables b) They exclude cells with 0 counts c) The sum of relative frequencies sums up to 1 d) They cannot include ranges of values
Ans: C Explanation: a relative frequency is the number of cases in a category divided by the total number of cases. The sum of all these proportions will always be 1.
4. In a table showing a frequency distribution we find that $\sum f = 25$. What can we conclude about it?
a) The average number of participants is 25 b) The total sum of the scores is equal to 25 c) The rows of the variable contain information for 25 entities d) The sum of relative frequencies is 25
Ans: C Explanation: it is the total sum of the frequencies in each of the rows of the frequency table.
5. Using the following table of frequencies, work through the questions below. Table: Life satisfaction by sex of respondent
                          Male    Female    Total
Completely dissatisfied    533       884     1417
Mostly dissatisfied       1146      1573     2719
Somewhat dissatisfied     2176      2924     5100
Neither Sat nor Dissat    1373      1740     3113
Somewhat satisfied        2622      3155     5777
Mostly satisfied          6806      8080    14886
Completely satisfied      2106      2551     4657
Total                    16762     20907    37669
a) What is the relative frequency of males ‘mostly satisfied’ with their life? b) Calculate the cumulative percentage of both males and females up to somewhat dissatisfied. c) What is the percentage of females who are completely dissatisfied?
Ans: Short answers: a) 0.41 b) 24.5% c) 4.2%
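The three short answers to question 5 can be reproduced from the life-satisfaction table. This is a minimal sketch; the list order follows the table rows from ‘Completely dissatisfied’ to ‘Completely satisfied’, and the variable names are illustrative.

```python
# Frequencies by row, in the order given in the table.
male = [533, 1146, 2176, 1373, 2622, 6806, 2106]
female = [884, 1573, 2924, 1740, 3155, 8080, 2551]

n_male, n_female = sum(male), sum(female)
n_total = n_male + n_female

# a) Relative frequency of males who are 'mostly satisfied' (row index 5).
rel_freq_male_mostly = male[5] / n_male

# b) Cumulative percentage of all respondents up to 'somewhat dissatisfied'
#    (the first three rows, males and females combined).
cum_pct_dissatisfied = 100 * (sum(male[:3]) + sum(female[:3])) / n_total

# c) Percentage of females who are 'completely dissatisfied' (row index 0).
pct_female_completely = 100 * female[0] / n_female

print(round(rel_freq_male_mostly, 2),
      round(cum_pct_dissatisfied, 1),
      round(pct_female_completely, 1))
```

Note that (a) and (c) use the column totals as denominators, whereas (b) uses the grand total because it combines both sexes.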
6. Which of the following statements is correct about class interval widths? a) The first class interval should start at 0 b) Class intervals must have equal width c) Class intervals include decimal points d) The lowest score might not be the first score in the data
Ans: D Explanation: the first interval needs to include the lowest score, although the first digit of the interval can be a lower number
7. Given the frequency distribution of the self-assessment of happiness, answer the following questions:
                     Frequency   Per cent
Extremely unhappy           29       0.78
1                           28       0.75
2                           55       1.48
3                          106       2.85
4                          130       3.49
5                          405      10.89
6                          331       8.90
7                          663      17.82
8                          997      26.80
9                          591      15.89
Extremely happy            385      10.35
Total                     3720     100.00
a) What is the class width of the frequency table? b) What is the raw class interval width if we want 3 categories? c) Given we have recoded the variable into three categories (‘not happy’, ‘about neutral’ and ‘happy’), what would be the cumulative percentage of the group ‘not happy’?
Ans: Short answers: a) The difference between the upper and lower boundaries or limits, which equals 11 b) 11 / 3 = 3.67 c) The first 4 categories have a cumulative percentage equal to 5.86%
8. What do idealized frequency distributions not show? a) The distribution of class intervals of a grouped variable b) The probability of a score occurring given a density function c) The deviation or not from a bell-shaped distribution d) The description of the pattern of a variable
Ans: A Explanation: Once we have grouped values into class intervals, we can no longer see the distribution of values within each class interval.
9. A frequency distribution in which low scores are most frequent (i.e., bars on the graph are highest on the left-hand side) is said to be: a) Leptokurtic b) Platykurtic c) Positively skewed d) Negatively skewed
Ans: C Explanation: In a positively skewed distribution the frequent scores are clustered at the lower end and the tail points towards the higher or more positive scores.
10. A frequency distribution in which high scores are most frequent (i.e., bars on the graph are highest on the right-hand side) is said to be: a) Negatively skewed b) Positively skewed c) Leptokurtic d) Platykurtic
Ans: A Explanation: In a negatively skewed distribution the frequent scores are clustered at the higher end and the tail points towards the lower or more negative scores. 11. Looking at the table below, which of the following statements is the most accurate? (Hint: The further the values of skewness and kurtosis are from zero, the more likely it is that the data are not normally distributed.)
a) For the number of hours spent practising, the data are fairly positively skewed. b) For the level of musical skill, the data are heavily negatively skewed.
c) For the number of hours spent practising, there is an issue with kurtosis. d) For the number of hours spent practising, there is not an issue with kurtosis.
Ans: D Explanation: The value for the kurtosis is .098, a very small value close to 0.
12. When we have a distribution of a variable that is negatively skewed, we know that: a) The mean is greater than the median. b) The mean is lower than the median. c) The mode is the same as the median d) The mode is greater than the median
Ans: B Explanation: in a negatively skewed distribution the frequent scores are clustered at the higher end, and the long tail of low scores pulls the mean below the median.
13. A kurtosis value of –2.89 indicates: (Hint: Positive values of kurtosis indicate too many scores in the tails of the distribution and that the distribution is too peaked, whereas negative values indicate too few scores in the tails and that the distribution is quite flat). a) There is a mistake in your calculation. b) A pointy and heavy-tailed distribution c) A flat and heavy-tailed distribution d) A flat and light-tailed distribution
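Ans: D Explanation: As the hint states, a negative kurtosis value indicates a flat distribution with too few scores in the tails. The further the value is from zero, the more likely it is that the data are not normally distributed.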
14. What does the graph below indicate about the normality of our data?
a) The histogram reveals that the data deviate substantially from normal.
b) The histogram reveals that the data are more or less normal. c) The histogram reveals that the data have multivariate normality. d) We cannot infer anything about the normality of our data from this graph.
Ans: B Explanation: The graph is fairly symmetrical, with the characteristic bell-shaped curve. 15. Is it possible to calculate the skewness of a set of numerical scores? a) Yes. b) Only if you have used an independent-measures design. c) No. d) Only if you have a large sample size.
Ans: A Explanation: For a set of numerical scores we can calculate the mean, median and standard deviation, and, therefore, it is possible to calculate the skewness of the distribution. 16. Which of the following statements is accurate about the graph below?
a) The kurtosis is large b) It is negatively skewed c) It is positively skewed d) It is platykurtic
Ans: C Explanation: A long tail of the extreme values indicates a positively skewed distribution Chapter 4: Fitting models (central tendency) 1. What is a residual? a) The amount of excess probability of error above a pre-set alpha level in hypothesis testing. b) The degree of error that we are prepared to accept in making a conclusion. c) It is a mistake in creating data, either through poor specification, measurement, coding or analysis.
d) The difference between observed data and a model of those data.
Ans: D Explanation: It is the difference between the value predicted by the model and the data point that we have observed.
2. Compared with a large sample, what can we expect that would happen to the error if we have a small sample? a) It would increase b) It would decrease c) Stay about the same d) Sample size and error are not related
Ans: A Explanation: With small samples, the chance of underestimating or overestimating the parameter of interest will increase, and as a result, the error will be higher.
3. What are variables? a) Variables estimate the centre of the distribution. b) Variables are estimated from the data and are (usually) constants believed to represent some fundamental truth about the relations in the model. c) Variables are measured constructs that vary across entities in the sample. d) Variables estimate the relationship between two parameters.
Ans: C 4. What are parameters? a) Parameters are measured constructs that vary across entities in the sample. b) A parameter tells us about how well the mean represents the sample data. c) Parameters are estimated from the data and are (usually) constants believed to represent some fundamental truth about the relations between variables in the model. d) All of the options describe parameters.
Ans: C 5. We are given a statistical model to predict how often people change their bath towels: $\text{bathtowel}_i = b_0 + b_1 X_i + \text{error}_i$. Answer the following questions: a) How many parameters are used to estimate the outcome? b) What is the meaning of $X_i$? c) What is the meaning of the subscript 0 in $b_0$?
Ans: Short answers:
a) 2 b) The score of the predictor variable of the ith entity c) The estimate of the outcome in a model with zero other predictor variables 6. In which of the following class intervals does the median lie?
a) Up to 5 b) 11–15 c) 6–10 d) 21–25
Ans: B Explanation: The cumulative percentage for the 11–15 group is over 50%.
7. Which measure of central tendency can be used with any level of measurement? a) Arithmetic mean b) Median c) Interquartile range d) Mode
Ans: D Explanation: The mode is the most frequent score in a distribution.
8. Using the following information, work through the questions below. Below is a frequency distribution from www.amazon.co.uk of a CD called Some Loud Thunder by an artist called ‘Clap Your Hands Say Yeah’ (34 customer reviews):
5 star: (26)
4 star: (5)
3 star: (2)
2 star: (0)
1 star: (1)
Work out the following using the above data. a) Calculate the mean of these data. (Give your answer to two decimal places.) b) What is the mode of these data? c) What is the median of these data? d) What would be our estimate of the standard deviation in the population? (Give your answer to two decimal places.)
Ans: Short answers: a) 4.62 b) 5 c) 5 d) 0.85
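The four quantities asked for in question 8 can be computed directly from the star counts listed in the question, which sum to 34 ratings. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Star rating -> number of reviews, as listed in the question.
star_counts = {5: 26, 4: 5, 3: 2, 2: 0, 1: 1}

# Expand the frequency table into one rating per review.
ratings = [star for star, count in star_counts.items() for _ in range(count)]

mean_rating = statistics.mean(ratings)      # sum of ratings / number of ratings
mode_rating = statistics.mode(ratings)      # most frequent rating
median_rating = statistics.median(ratings)  # middle value of the ordered ratings
sd_estimate = statistics.stdev(ratings)     # divides by N - 1 (population estimate)
variance_estimate = statistics.variance(ratings)

print(round(mean_rating, 2), mode_rating, median_rating, round(sd_estimate, 2))
```

`statistics.stdev` divides the sum of squared deviations by N − 1, which is the estimate of the population standard deviation that part (d) asks for; its square, the variance, is about 0.728 for these data.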
9. We have the following scores (0–100) for the performance of the ‘Phantom of the Opera’ at La Scala in Milan: 43, 40, 67, 63, 68, 54, 60, 69, 51, 40. Which of the following statements is correct about the effect of adding a score of 88 to the distribution of scores? a) The mean will be a poorer fit of the data b) The model will underestimate the scores c) Only the scores above the mean will be affected d) The model will be a better fit for the data
Ans: A Explanation: Outliers have a strong effect on the mean, and as a consequence, it tampers with the fit of the model. 10. Participants rated their mood score out of 20 before and after listening to Reign in Blood by the thrash metal band Slayer.
Before Listening to Slayer   After Listening to Slayer
                         5                          14
                         8                           5
                         9                          17
                         4                          18
                         3                           8
                        15                          19
                        12                          14
                         6                          16
What are the degrees of freedom for this study? a) 8 b) 7 c) 16 d) 15
Ans: B Explanation: When the same participants have been used, the degrees of freedom are the sample size minus 1.
11. A ______ is a numerical characteristic of a sample and a ______ is a numerical characteristic of a population. a) distribution, variable b) variable, distribution c) parameter, statistic d) statistic, parameter
Ans: D 12. I have a positively skewed distribution. What statistic is the most appropriate measure of central tendency: a) The mode b) The median c) The mean of the mean, median and mode d) The mean divided by the standard deviation
Ans: B Explanation: In a positively skewed distribution large positive values will greatly affect the mean. Therefore, we need to rely on the median, which is not distorted by extreme scores. 13. Complete the following sentence: A small standard deviation (relative to the value of the mean itself) (Hint: The standard deviation is a measure of the dispersion or spread of data around the mean.) a) Indicates that the data points are distant from the mean. b) Indicates that the mean is a poor fit of the data. c) Indicates that data points are close to the mean (i.e., the mean is a good fit of the data). d) Indicates that you should analyze your data with a non-parametric test.
Ans: C Explanation: The standard deviation indicates how spread out the scores are from the mean; a small standard deviation means the scores are close to the mean.
14. A class of 10 students score the following scores on an exam: 24, 27, 30, 25, 32, 28, 28, 23, 30, 28. The interquartile range is equal to: a) 9 b) 5.5 c) 7 d) 5
Ans: D Explanation: The Interquartile range is the difference between the upper quartile and lower quartile. 15. Complete the following sentence: A large standard deviation (relative to the value of the mean itself) (Hint: The standard deviation is a measure of the dispersion or spread of data around the mean) a) Indicates that the data points are distant from the mean (i.e., the mean is a poor fit of the data). b) Indicates that the data points are close to the mean. c) Indicates that the mean is a good fit for the data. d) Indicates that you should analyze your data with a parametric test.
Ans: A Explanation: The standard deviation is an indicator of how well the model parameter such as the mean fits the data. A good fit for the data will yield a small standard deviation.
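The interquartile range in question 14 can be worked through with a short sketch. It assumes the ‘median of each half’ method for finding quartiles; other quartile conventions (including those offered by `statistics.quantiles`) can give slightly different values for small samples.

```python
import statistics

scores = [24, 27, 30, 25, 32, 28, 28, 23, 30, 28]

ordered = sorted(scores)
half = len(ordered) // 2

# Split the ordered scores into a lower and an upper half.
# With an odd number of scores the middle score is left out of both halves.
lower_half = ordered[:half]
upper_half = ordered[-half:]

lower_quartile = statistics.median(lower_half)
upper_quartile = statistics.median(upper_half)
iqr = upper_quartile - lower_quartile

print(lower_quartile, upper_quartile, iqr)
```

For these ten scores the lower half is 23, 24, 25, 27, 28 and the upper half is 28, 28, 30, 30, 32, so the quartiles are 25 and 30 and the interquartile range is 5.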
16. Suppose we have a dataset in which the mean is equal to 10 and the standard deviation is equal to 3. We now add to each value 8 units to create a new dataset. Which of the following statements about the mean and standard deviation in the new dataset are true: a) The mean = 10 and the standard deviation is 3. b) The mean = 12 and the standard deviation is 11. c) The mean = 18 and the standard deviation is 3. d) The mean = 18 and the standard deviation is 11.
Ans: C Explanation: Adding the same constant to every value in a distribution changes the mean by that constant, but it does not affect the standard deviation.
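The result in question 16 can be checked numerically. The sketch below uses a small hypothetical dataset chosen so that its mean is 10 and its sample standard deviation is 3; the question itself does not give the raw values.

```python
import statistics

# Hypothetical data with mean 10 and sample standard deviation 3.
original = [7, 10, 13]

# Add 8 units to every value, as described in the question.
shifted = [value + 8 for value in original]

original_mean = statistics.mean(original)
shifted_mean = statistics.mean(shifted)
original_sd = statistics.stdev(original)
shifted_sd = statistics.stdev(shifted)

print(original_mean, original_sd)
print(shifted_mean, shifted_sd)
```

Every deviation from the mean is unchanged by the shift, because the mean moves by the same 8 units as each score, so the standard deviation stays at 3 while the mean becomes 18.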
17. A random sample of five observations, for two variables X and Y, has been collected. The values for the data are: X: 4, 5, 6, 8, 12 Y: 8, 7, 5, 4, 1 What is the sample standard deviation of random variable X (to two decimal places)? a) 3.16 b) 2.55 c) 10.00 d) 1.41
Ans: A
å (x - x ) n
i =1
2
i
N -1 (4 - 7) 2 + (5 - 7) 2 + (6 - 7) 2 + (8 - 7) 2 + (12 - 7) 2 9 + 4 + 1 + 1 + 25 = = = 5 -1 4
40 = 4
10 = 3.16
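As a quick check of the hand calculation, the same value can be obtained with Python's standard `statistics` module. `stdev` divides the sum of squared deviations by N – 1 (the sample formula), whereas `pstdev` divides by N.

```python
from statistics import stdev, pstdev

x = [4, 5, 6, 8, 12]

sample_sd = stdev(x)        # sum of squares divided by N - 1
population_sd = pstdev(x)   # sum of squares divided by N

print(round(sample_sd, 2))
print(round(population_sd, 2))
```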
Note: had we divided by N instead of N – 1, the standard deviation would be 2.83.
Chapter 5: Presenting Data
1. Which form of graphical presentation is the most appropriate for showing how many hours a week retired people watched TV last month? a) Pie chart b) Scatterplot c) Bar chart d) Boxplot Ans: D Explanation: For continuous variables such as the number of hours of TV watched per week, a boxplot is the most appropriate graphical representation of the distribution of scores.
2. In presenting a table, which of the following is NOT a necessary component? a) Sample size b) Missing cases c) Raw number of cases for each cell d) Source of the data Ans: C Explanation: The percentages and total count should be displayed, but there is no need for the raw number of cases for each cell.
3. In the following graph we have the distribution of religious convictions of a sample of members in a country. What should we do to make it an academic graph?
a) We should include the percentage in each category b) We should remove the patterns in the bars c) We should include confidence intervals around the mean d) We should change the scale in the y-axis Ans: B Explanation: It contains unnecessary pattern shading and can lead to biased judgments of the data being presented. 4. A scatterplot shows: a) Scores on one variable are plotted against scores on a second variable. b) The frequency with which values appear in the data. c) The average value of groups of data. d) The proportion of data falling into different categories. Ans: A Scores on one variable are plotted against scores on a second variable. 5. What is the graph below known as?
a) Stacked bar chart b) Stacked histogram c) Frequency polygon d) Population pyramid Ans: D Explanation: Population pyramid graphs show the distribution (traditionally of ages across a population) divided by a criterion. In the current example, the graph shows the distribution of the number of cigarettes smoked, split by type of treatment.
6. Imagine we took a group of smokers, recorded how many cigarettes they smoked each day and then split them randomly into one of two 6-week interventions: ‘hypnosis’ or ‘nicotine patch’. After the 6 weeks, we again recorded how many cigarettes they smoked each day and subtracted this number from the number of cigarettes they each smoked pre-intervention, to produce an intervention success score for each participant. Out of the following options, which would be the best method of displaying the results? a) A simple boxplot with the variable ‘intervention method’ on the x-axis and ‘intervention success’ on the y-axis. b) A simple bar chart with the variable ‘intervention method’ on the y-axis and ‘intervention success’ on the x-axis. c) A simple boxplot with the variable ‘intervention method’ on the y-axis and ‘intervention success’ on the x-axis. d) A clustered boxplot with ‘intervention success’ on the y-axis and ‘intervention method’ on the x-axis Ans: A
Explanation: This option is used when you want to plot a boxplot of a single variable (in this case success), but you want different boxplots produced for different categories in the data (for these success data we could produce separate boxplots for our two intervention groups). 7. Looking at the graph below, which of the following statements are correct? (Hint: Look at the bars – are they in the same direction?)
a) On average, the nicotine intervention was less successful in those who wanted to quit smoking than in those who did not want to quit, whereas the hypnosis intervention was more successful in those who did not want to quit smoking than in those who did b) Overall, the nicotine intervention was equally successful at reducing the number of cigarettes smoked per day compared to the hypnosis intervention. c) On average, for those who wanted to quit smoking, the nicotine patches reduced the number of cigarettes smoked per day, whereas hypnosis actually increased the number of cigarettes smoked per day. d) On average, for those who wanted to quit smoking, the nicotine patches increased the number of cigarettes smoked per day, whereas hypnosis actually reduced the number of cigarettes smoked per day. Ans: C Explanation: For those who wanted to quit smoking (green box), the nicotine intervention has a positive value (above 0) indicating a reduction in the number of cigarettes smoked. 8. Which of the following statements best describes the graph below?
a) The graph shows that for those who used nicotine patches there is a fairly normal distribution, whereas those who used hypnosis show a skewed distribution, where a very small proportion of people (relative to those using nicotine) smoke more than two cigarettes per day. b) The graph looks fairly symmetrical. This indicates that both groups had a similar spread of scores before the intervention. c) The graph shows that for those who used hypnosis there is a fairly normal distribution, whereas those who used nicotine patches show a skewed distribution, where a very large proportion of people (relative to those using nicotine) smoke less than four cigarettes per day. d) The graph looks fairly unsymmetrical, indicating that the two groups are from different populations. Ans: B Explanation: A population pyramid can be a very good way to visualize differences in distributions in different groups. 9. Approximately what is the mean success score for those who wanted to quit in the hypnosis group?
a) 1.00 b) –1.00 c) 0.00 d) The graph does not display the mean. Ans: D Explanation: Boxplots show the median, but not the mean. 10. What can we say about the graph below?
a) There is a negative relationship between the number of cigarettes smoked per day before the intervention and the number of cigarettes smoked after the intervention b) The participants who smoked the most cigarettes per day before the intervention, smoked the fewest cigarettes per day after the intervention. c) There is a positive relationship between the number of cigarettes smoked per day before the intervention and the number of cigarettes smoked after the intervention. d) There is no relationship between the two variables. Ans: C Explanation: The scatterplot indicates that the more cigarettes people smoked before the intervention, the more cigarettes they smoked after the intervention, which makes sense. 11. A study was done to investigate the effect of ‘motivation to quit’ on the success rate of a new intervention developed to reduce the number of cigarettes smoked per day in a group of smokers. Looking at the graph below, what can we say about the relationship between motivation to quit and the success rate of the intervention?
a) The medians were the same in people who wanted to quit smoking and those that didn’t. b) We can’t say anything about the success of the intervention because the graph does not take into account the number of cigarettes smoked per day pre-intervention. c) Whether a person wanted to quit smoking had no effect on the success of the smoking intervention. d) There was the same number of people who wanted to quit smoking as those who didn’t. Ans: B Explanation: The mean number of cigarettes per day post-intervention is about the same for those who want and do not want to quit. 12. In IBM SPSS, the following graph is known as a:
a) Summary point plot b) Simple scatterplot c) Scatterplot matrix d) Grouped scatterplot Ans: D Explanation: Grouped scatterplots plot scores on Y against scores on X, with the points distinguished by a third, grouping variable such as gender.
13. We took a sample of children who had been learning to play a musical instrument for five years. We measured the number of hours they spent practising each week and assessed their musical skill by how many of eight increasingly difficult exams they had passed. We also asked them whether their parents forced them to practise or not (were their parents pushy?). What does the following graph show?
a) The more time spent practising, the more musically skilled the children were and this relationship was stronger for those who had pushy parents compared to those who did not. b) Practice causes better exam performance. c) Children with pushy parents always passed more grade exams than those without. d) The more time spent practising, the more musically skilled the children were, and this relationship was stronger for children who did not have pushy parents than for those who did. Ans: A Explanation: The slope of the regression line of children with pushy parents (green) is steeper than the blue line. 14. The graph below shows the mean success rate of cutting down on smoking (positive score = success) in people who wanted to quit and people who did not want to quit. Which of the following statements is the truest?
a) On average, people who wanted to quit were 25 times more successful than those who did not. b) On average, success was six times higher in people who wanted to quit than in those who did not. c) The effect in the population is likely to be the same for those who did and did not want to quit. d) The average success was significantly higher in people who wanted to quit. Ans: C Explanation: This is correct because the confidence intervals almost entirely overlap.
Chapter 6: z-scores
1. Which of the following statements is correct about standardizing scores? a) It allows for comparisons between different distributions b) It changes the shape of the distribution c) A score of 1 indicates that the individual has a score at the mean of the group d) z-scores only take positive values Ans: A Explanation: A z-score indicates the position of a score in a distribution relative to the mean, measured in standard deviation units. Scores from different distributions become comparable once we have transformed them into z-scores.
2. In relative terms, if your score is above the mean, you would do comparatively better in a distribution with: a) A large standard deviation b) A sample with at least 100 individuals c) A small standard deviation
d) An absence of extreme values Ans: C Explanation: In a distribution with a small standard deviation, a score above the mean lies more standard deviations above it, so it is relatively better compared with the other scores.
3. Which of the following statements is correct about the effect of dividing by the standard deviation when transforming scores into z-scores? a) The mean changes to 0 b) The width of the distribution becomes narrower c) The mean score changes its position d) The scores change their position relative to the mean Ans: D Explanation: Dividing by the standard deviation allows any distribution of scores to be expressed in standard scores, that is, the distance of each score from the mean measured in standard deviation units.
4. The scores on a test of musical abilities have a mean of 26 and a standard deviation of 4. What is the z-score for Martha’s score of 18? a) -2 b) 2 c) 11 d) -1.41 Ans: A Explanation: The z-score is calculated by subtracting the mean from the score (18 – 26 = –8) and dividing the result by the standard deviation (–8 / 4 = –2).
5. In a similar test of musical abilities using other measurement instruments, Kate obtains a score of 7; the mean is 16 and the standard deviation is 4. What can we conclude about the standard scores of Kate and Martha? a) Martha has worse musical abilities than Kate b) Kate has closer to average musical abilities compared to Martha c) There is not enough evidence to compare Kate’s and Martha’s scores d) Kate has slightly worse musical abilities compared to Martha Ans: D Explanation: Kate’s score is 2.25 standard deviations below the mean (z = –2.25), a little further below the mean than Martha’s (z = –2). So, we conclude that Kate has slightly worse musical abilities than Martha.
6. Given Kate’s standardized score, what would her raw score be if the distribution had the characteristics of Martha’s test?
a) 35 b) 17 c) 23 d) 14 Ans: B Explanation: X = zs + µ = –2.25 × 4 + 26 = –9 + 26 = 17
7. What is the range of raw scores that about 68% of the population have for a distribution with a mean of 50 and standard deviation of 3? a) 41 and 59 b) 45 and 55 c) 48 and 52 d) 47 and 53 Ans: D Explanation: About 68% of the scores fall between –1 and +1 standard deviations from the mean. For z = –1: X = zs + µ = –1 × 3 + 50 = –3 + 50 = 47. For z = 1: X = zs + µ = 1 × 3 + 50 = 3 + 50 = 53.
8. Suppose 15,467 people rated how much they liked my textbook on a scale of 1 (it is rubbish) to 10 (I love it). The distribution of scores had a skew of 1.23 (SE = 0.65). The mean rating was 4.78. What is the z-score for the skew of these data? (Hint: in large samples, the skew is less important. Also, think about the value for the mean if we are interested in the value of a skewed distribution) a) 6.1 b) 5.46 c) 1.89 d) 4.13 Ans: C Explanation: The skew is compared against 0 (the skew expected in a normal distribution), not against the mean rating of the sample; the value (X) is the skewness itself.
$$ z_{\text{skewness}} = \frac{S - 0}{SE_{\text{skewness}}} = \frac{1.23 - 0}{0.65} = 1.89 $$
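The z-score conversions used in questions 4 to 8 of this chapter can be sketched in a few lines of Python. The helper names `z_score` and `raw_score` are hypothetical, introduced only for illustration.

```python
def z_score(x, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Convert a z-score back to a raw score: X = z*s + mean."""
    return z * sd + mean

martha_z = z_score(18, mean=26, sd=4)                # question 4
kate_z = z_score(7, mean=16, sd=4)                   # question 5
kate_on_martha = raw_score(kate_z, mean=26, sd=4)    # question 6
lower = raw_score(-1, mean=50, sd=3)                 # question 7, lower bound
upper = raw_score(1, mean=50, sd=3)                  # question 7, upper bound
skew_z = z_score(1.23, mean=0, sd=0.65)              # question 8: skew divided by its SE

print(martha_z, kate_z, kate_on_martha, lower, upper, round(skew_z, 2))
```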
Chapter 7: Probability
1. What is considered an event? a) The frequency with which a trial can happen b) The probability of an outcome associated with an experiment c) One of the possible outcomes of the sample space d) The set of all possible outcomes in an experiment Ans: C Explanation: An event is one (or more) of the possible outcomes in the sample space. Its probability is a separate quantity assigned to it, and the set of all possible outcomes is the sample space itself.
2. What statement is correct about empirical probability? a) It is based on a finite number of trials observed b) It is based on an equally likely number of outcomes occurring c) It is based on a theoretical assumption of possible outcomes d) The probability of an outcome cannot be 0.5 Ans: A Explanation: The probability of an event occurring is calculated as the ratio of the number of times we have observed a particular outcome to the total number of observed outcomes.
3. In a standard pack of 52 playing cards (excluding the jokers), there are 4 suits, each of which has the following cards: Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King. What is the probability of NOT drawing a Jack, Queen, King, or Ace from any suit as the first card? a) 4/52 b) 9/13 c) 16/52 d) 10/13 Ans: B Explanation: The number of outcomes that we are interested in is 9, that is, the nine cards in a suit other than Jack, Queen, King, or Ace. The sample space is the total number of possible outcomes, which is 13 cards regardless of the suit you draw. The probability is therefore 9/13 (equivalently, 36/52).
4. Suppose we have a continuous random variable and we produce a probability distribution for this variable. What do we use to find out about the probability of particular events? a) The height of the ‘line’ b) The steepness of the ‘line’ c) The area under the ‘line’ d) The length of the ‘line’ Ans: C Explanation: Rather than showing the frequencies of scores, a density curve or probability distribution indicates how probable scores are. For a continuous variable, the probability of a range of scores is the area under the curve over that range.
The following information will be needed to answer questions 5 to 8. Imagine you are interested in the proportion of people in poverty living in cities in Europe. You know that the city % living in poverty is normally distributed with a mean of 15% and a standard deviation of 5%.
5. 
The probability that a city will have more than 22% of its population living in poverty (give your answer to 2 decimal places) is: a) 0.92 b) 0.06 c) 0.10 d) 0.08 Ans: D Explanation: We first calculate the z-score:
$$ z = \frac{22 - 15}{5} = \frac{7}{5} = 1.4 $$
Then, looking at the table of z-scores, we look for 1.4; the smaller proportion for this z-score is 0.08.
6. The probability a city will have less than 3% of its population living in poverty (give your answer to 2 decimal places) is: a) 0.02 b) 0.008 c) 0.78 d) 0.22 Ans: B Explanation: We first calculate the z-score:
$$ z = \frac{3 - 15}{5} = \frac{-12}{5} = -2.4 $$
Then, looking at the table of z-scores, we look for 2.4 (the distribution is symmetrical); the smaller proportion, which here is the proportion below z = –2.4, is 0.008.
7. The European Union decide to pursue a poor relief initiative in certain cities. Money is tight so they target the 1% of cities with the highest proportions living in poverty. What is the % living in poverty above which a city might expect to receive funding? a) 25.52 b) 27.23 c) 25.00 d) 26.65 Ans: D Explanation: As we are interested in the poorest 1% of cities, we look in the table for the z-score whose smaller proportion is 0.01 of the area under the density curve. The z-score value is 2.33. We then calculate the raw score by multiplying the z-score by the standard deviation and adding the mean: 2.33 × 5 + 15 = 26.65.
8. The probability that a city will have more than 12% and less than 17% of its population living in poverty is? a) 0.85 b) 0.39 c) 0.38 d) 0.31 Ans: B Explanation: We first find the z-scores for each of the raw values; –0.6 and 0.4 are the z-scores for 12% and 17%, respectively. The proportion can be calculated as follows: 1 – the smaller proportion for z = 0.6 – the smaller proportion for z = 0.4 = 1 – 0.27 – 0.34 = 0.39. (This uses table values rounded to two decimal places; with unrounded values the result is approximately 0.381.)
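The table lookups in questions 5 to 8 can be cross-checked with `statistics.NormalDist` from Python's standard library. This is a sketch rather than the method the questions intend (they assume a printed z-table), so the exact values can differ slightly from answers built on rounded table entries, for example in questions 7 and 8.

```python
from statistics import NormalDist

poverty = NormalDist(mu=15, sigma=5)   # city % living in poverty

p_above_22 = 1 - poverty.cdf(22)                  # question 5
p_below_3 = poverty.cdf(3)                        # question 6
top_1_percent_cutoff = poverty.inv_cdf(0.99)      # question 7
p_12_to_17 = poverty.cdf(17) - poverty.cdf(12)    # question 8

print(round(p_above_22, 2))
print(round(p_below_3, 3))
print(round(top_1_percent_cutoff, 2))
print(round(p_12_to_17, 3))
```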
9. Given the following statement about a joint probability: P(A∩B) = P(A)*P(B). Which of the following statements about events A and B is true? a) Events A and B are dependent b) Events A and B are independent c) Events A and B are very unlikely to occur d) Events A and B are very likely to occur Ans: B Explanation: Events are independent if one event occurring does not have an effect on the probability of the other event.
10. Imagine that we found that the probability that a rugby player in a league was involved in doping was 0.6. If a random sample of three rugby players were taken, what is the probability that all three were not involved in doping? a) 0.064 b) 0.216 c) 0.65 d) 1 Ans: A Explanation: P(not doping) = 1 – 0.6 = 0.4. P(all three rugby players not doping) = 0.4 × 0.4 × 0.4 = 0.064
11. The following cross-tabulation shows the relationship between political affiliation and opinion on whether unemployed people deserve a basic income.
Joint probability
Do you think unemployed people deserve a Basic Income?
Political affiliation | Yes | No | Total
Conservatives | 47 | 201 | 248
Liberals | 80 | 62 | 142
Radicals | A | 110 | C
Total | B | 373 | 696
What values should fill the cells A, B and C? a) A: 196, B: 306, C: 323 b) A: 33, B: 103, C: 149 c) A: 127, B: 160, C: 323 d) A: 196, B: 323, C: 306 Ans: D Explanation: Work out C first by subtracting the other row totals from the grand total (696 – 248 – 142 = 306). A can then be found by subtracting 110 from the row total for Radicals (306 – 110 = 196). B is the sum of the rows in the column ‘Yes’ (47 + 80 + 196 = 323).
12. Given P (Conservatives U Radicals), which of the following results is correct?
a) 0.75 b) 0.60 c) 0.1 d) 0.25 Ans: A Explanation: It is the sum of the probabilities P(Conservatives) and P(Radicals) = 0.15 + 0.60 = 0.75
13. The joint probability of being a Conservative and answering ‘no’ to the question of whether unemployed people deserve a Basic Income: a) 0.54 b) 0.29 c) 1 d) 0.71 Ans: B Explanation: P(Conservative and ‘no’) = 201/696 = 0.29
14. The conditional probability of being a Radical supporter given an answer of ‘no’ to the question of whether unemployed people deserve a Basic Income (Hint: use Bayes’ theorem). a) 0.17 b) 0.44 c) 0.29 d) 0.54 Ans: C Explanation: With A = Radical and B = not in favour, P(A|B) = P(Radical | Not in favour):
$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(\text{Not in favour} \mid \text{Radical})\,P(\text{Radical})}{P(\text{Not in favour})} = \frac{0.36 \times 0.44}{0.54} = 0.29 $$
Here P(Not in favour | Radical) = 110/306, P(Radical) = 306/696 and P(Not in favour) = 373/696; the result equals 110/373 read directly from the table.
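The cross-tabulation arithmetic in questions 11, 13 and 14 can be sketched in Python. Question 14 is read here as the probability of being a Radical given a ‘no’ answer, an assumption chosen because it matches the keyed answer of 0.29. The variable names are illustrative only.

```python
# Observed counts from the cross-tabulation (rows: affiliation; columns: yes, no).
grand_total = 696
counts = {
    "Conservatives": {"yes": 47, "no": 201},
    "Liberals": {"yes": 80, "no": 62},
    "Radicals": {"yes": None, "no": 110},   # cell A is unknown at first
}

# C: Radicals row total = grand total minus the other two row totals.
c = grand_total - sum(counts["Conservatives"].values()) - sum(counts["Liberals"].values())
# A: Radicals answering 'yes'.
a = c - counts["Radicals"]["no"]
counts["Radicals"]["yes"] = a
# B: column total for 'yes'.
b = sum(row["yes"] for row in counts.values())
total_no = sum(row["no"] for row in counts.values())

# Question 13: joint probability of being a Conservative and answering 'no'.
p_cons_and_no = counts["Conservatives"]["no"] / grand_total

# Question 14 via Bayes' theorem: P(Radical | no) = P(no | Radical) * P(Radical) / P(no).
p_no_given_radical = counts["Radicals"]["no"] / c
p_radical = c / grand_total
p_no = total_no / grand_total
p_radical_given_no = p_no_given_radical * p_radical / p_no

print(a, b, c)
print(round(p_cons_and_no, 2), round(p_radical_given_no, 2))
```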
Chapter 8: Inferential statistics: Going beyond the data
1. Which of the following sentences is an accurate description of the standard error? a) The standard deviation squared. b) It is the same as the standard deviation. c) It is the standard deviation of the sampling distribution of a statistic. d) It is the observed difference between sample means minus the expected difference between population means (if the null hypothesis is true). Ans: C Explanation: It is the standard deviation of the sampling distribution of a sample statistic such as the mean.
2. Two samples of data are collected and the sample means are calculated. If the samples come from the same population, then: a) Their means should be roughly equal.
b) The difference between the samples we have collected is likely to be larger than we would expect based on the standard error. c) Their means should differ significantly. d) The experiment is unreliable. Ans: A Explanation: Although it is possible for their means to differ by chance alone, we would expect large differences between sample means to occur very infrequently.
3. What is the standard error? a) The standard error is the standard deviation of sample means. b) All of the options describe the standard error. c) The standard error is a measure of how representative a sample parameter is likely to be of the population parameter. d) The standard error is computed from known sample statistics, and it provides an unbiased estimate of the standard deviation of the statistic. Ans: B
4. Which of the following is true about a 95% confidence interval of the mean: a) 95 out of 100 sample means will fall within the limits of the confidence interval. b) 95 out of 100 confidence intervals will contain the population mean. c) 95% of the population means will fall within the limits of the confidence interval. d) There is a 0.05 probability that the population mean falls within the limits of the confidence interval. Ans: B Explanation: This is because if we’d collected 100 samples, calculated the mean and then calculated a confidence interval for that mean, then for 95 of these samples the confidence intervals we constructed would contain the true value of the mean in the population.
5. A 95% confidence interval is: a) The range of values of the statistic that we can be 95% confident contains a significant effect on the population. b) The range of values of the statistic which we can be 95% certain does not contain the true population effect. c) The range of values of the statistic which probably contains the true value of the statistic in the population. d) The range of values of the statistic which we can be 5% confident contains a significant effect on the population. 
Ans: C Explanation: A confidence interval is a range of values that is likely to include the true population value.
6. The 99% confidence interval usually is: a) Narrower than the 95% confidence interval. b) Wider than the 95% confidence interval. c) The same as the 95% confidence interval. d) A less precise estimate of the effect in the population than the 95% confidence interval. Ans: B Explanation: As the confidence level increases, the interval has to become wider in order to be more likely to contain the true population parameter.
7. Which of the following statements is true? a) Confidence intervals tell us about the range of possible values of a statistic within the sample. b) Confidence intervals are known as point estimates. c) Confidence intervals are not biased by non-normally distributed data. d) If the confidence interval for the difference between two means does not include zero then the difference between the means is statistically significant. Ans: D
8. Of what is the standard error a measure? a) The ‘flatness’ of the distribution of sample scores. b) The variability in scores in the sample. c) The variability of scores in the population. d) The variability of sample estimates of a parameter. Ans: D Explanation: The standard error tells us how far sample means tend to lie from the population mean.
9. Two samples of data are collected and the sample means are calculated. If the samples come from the same population, then: a) Their means should be roughly equal. b) The difference between the samples we have collected is likely to be larger than we would expect based on the standard error. c) Their means should differ significantly. d) The experiment is unreliable. Ans: A Explanation: Although it is possible for their means to differ by chance alone, we would expect large differences between sample means to occur very infrequently.
10. Imagine we have data from two samples on the salaries of employees in two supermarket companies. The first sample consists of 40 employees with a mean salary of 17,500 with a standard deviation of 1,500. 
The second sample has data for 35 employees with a mean monthly salary of
16,800 and a standard deviation of 1,300. What is the confidence interval at 90% for the difference in monthly salaries? (Hint: for a confidence interval at 90%, the z-scores would fall between -1.64 and +1.64 standard deviations from the mean). a) (-16, 28) b) (185.4, 1050.1) c) (169.7, 1230.2) d) (-101.3, 215) Ans: C Explanation:
$$ CI = (17500 - 16800) \pm 1.64\sqrt{\frac{1500^2}{40} + \frac{1300^2}{35}} $$
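The interval in question 10 can be checked numerically. This minimal sketch assumes the usual standard error for the difference between two independent sample means, the square root of s1²/n1 + s2²/n2, and uses the z value of 1.64 given in the hint. The result agrees with option c to within rounding.

```python
from math import sqrt

mean_1, sd_1, n_1 = 17500, 1500, 40   # first supermarket sample
mean_2, sd_2, n_2 = 16800, 1300, 35   # second supermarket sample
z_90 = 1.64                           # z value given in the question hint

difference = mean_1 - mean_2
se_difference = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)   # standard error of the difference

lower = difference - z_90 * se_difference
upper = difference + z_90 * se_difference

print(round(lower, 1), round(upper, 1))
```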
The upper limit is 1230.2. The lower limit is 169.7.
11. A small standard error of differences tells us what? (Hint: The standard error of differences is a measure of the unsystematic variation within the data.) a) That the differences between scores are normally distributed. b) That most pairs of samples from a population will have very similar means. c) That sample means can deviate quite a lot from the population mean and so differences between pairs of samples can be quite large by chance alone. d) That the differences between scores are not normally distributed. Ans: B Explanation: The difference between sample means should normally be very small.
12. Why is the standard error important? a) It is unaffected by outliers. b) It gives you a measure of how well your sample parameter represents the population value. c) It tells us the precise value of the variance within the population. d) It is unaffected by the distribution of scores. Ans: B Explanation: As we do not have the true population value, knowing that the sample means are close to each other allows us to be confident that even for an extreme sample value, this value will not be far from the population value.
13. Which of the following statements about the t distribution is correct? a) It is skewed. b) In small samples it is narrower than the normal distribution c) It follows an exponential curve. d) As the degrees of freedom increase, the distribution becomes closer to normal. Ans: D
Explanation: A t-distribution is defined by its degrees of freedom, which here is the sample size minus 1. t-distributions are used for small samples; as the sample size increases, the distribution comes to resemble a normal distribution.
Chapter 9: Robust estimation
1. What is true about extreme scores? a) They affect the point estimates only b) They affect the point estimates and population variance estimates c) They affect the standard error and confidence intervals, but not the mean d) They affect the mean and standard error, but not the confidence intervals Ans: B Explanation: Both the point estimates and the standard error, and, subsequently, confidence intervals, are particularly affected by extreme scores.
2. A robust estimate means that: a) We have calculated the estimate after transforming the data b) We are assuming that the distribution follows a normal distribution c) We estimate the variance parameters from a non-normal distribution d) We first transform the data and then we apply bootstrapping techniques Ans: C Explanation: Bootstrapping is a technique to reliably estimate the standard error and confidence intervals without the assumption that the sampling distribution is normal.
3. Which of the following transformations is most useful for correcting skewed data? a) Tangent transformation b) Arcsine transformation c) Cosine transformation d) Log transformation Ans: D Explanation: This is because taking the logarithm of a set of numbers squashes the right tail of the distribution. As such it’s a good way to reduce positive skew.
4. To get a sample of a certain size, scores are taken one-by-one from the observed data and each time replaced. The parameter of interest (e.g., the mean or b in regression) is computed within the sample. This process is repeated numerous times. The resulting parameter estimates are used to compute a confidence interval. The process I am describing is: a) Bootstrapping b) Significance testing c) Sampling d) The standard error Ans: A
Explanation: Bootstrapping uses sampling with replacement to construct a new sample as big as the original one.
5. What effect does trimming data with extreme scores by 20% have? a) The median shifts to the left b) The median increases c) The mean shifts to the right d) The mean might shift either right or left Ans: D Explanation: Trimming data reduces the number of extreme scores on both tails of the distribution. Depending on the skew (positive or negative) of the data, the mean will be shifted to the left or right of the distribution.
6. If we use winsorizing instead of trimming, what happens to our data? a) The range of values becomes smaller b) The sample size is reduced by 10% c) The sample size decreases according to M-estimators d) The range of values becomes larger Ans: A Explanation: Replacing the scores in the tails with the nearest remaining lowest or highest score narrows the range of values; that is, the highest and lowest scores become less extreme.
7. In the following sampling distribution, determine what scores are 3 standard deviations away from the mean to determine which values should be winsorized.
-3234, 45, 59, 90, 108, -560, 48, 67, 80, 1321, 4, 49, 84, 86, 1432, 27, 56, 159, 87, 1450, 32, 59, 81, 95, 4187, 4999
a) -3234 and 4999 b) 4999 c) -3234, 4187 and 4999 d) -3234 Ans: B Feedback: You need to compute the mean and the standard deviation. We already know that the cut-off z-scores are ±3. Thus, the upper limit is X = (3 × 1479.6) + 436.4 = 4875 and the lower limit is X = (–3 × 1479.6) + 436.4 = –4002. Only 4999 lies beyond these limits.
8. What is the main characteristic of a bootstrap sample? a) Not all the scores are equally likely to be selected
b) The number of scores selected is 10% smaller than the sample size c) Some scores might appear several times d) Scores in the original distribution can only be selected once Ans: C Explanation: Bootstrapping constructs a new sample using sampling with replacement. Although it is possible that no score is selected more than once, it is very unlikely.
Chapter 10: Hypothesis testing
1. ‘Children can learn a second language differently before the age of 7 than after.’ Is this statement: a) A non-scientific statement b) A one-tailed hypothesis c) A null hypothesis d) A two-tailed hypothesis Ans: D Explanation: This is because it is non-directional: it states that an effect will occur, but it doesn’t state the direction of the effect (i.e., whether learning will be faster or slower before the age of 7).
2. A two-tailed probability is .03. What is the one-tailed probability if the effect were in the specified direction? (Hint: A one-tailed hypothesis specifies the direction of the effect, while a two-tailed hypothesis specifies only that there will be an effect; it does not specify the direction of the effect.) a) 0.015 b) 1.15 c) 0.6 d) 0.9 Ans: A Explanation: 0.015. Solution = .03/2
3. A Type II error occurs when: (Hint: This would occur when we obtain a small test statistic (perhaps because there is a lot of natural variation between our samples).) a) The data we have typed into SPSS is different from the data collected. b) We conclude that there is not an effect in the population when in fact there is. c) We conclude that the test statistic is significant when in fact it is not. d) We conclude that there is an effect in the population when in fact there is not. Ans: B Explanation: A Type II error would occur when we obtain a small test statistic (perhaps because there is a lot of natural variation between our samples).
4. 
What is the null hypothesis for the following question: Is there a relationship between heart rate and the number of cups of coffee drunk within the last 4 hours? a) There will be a significant relationship between the number of cups of coffee drunk within the last 4 hours and heart rate.
b) There will be no relationship between heart rate and the number of cups of coffee drunk within the last 4 hours. c) People who drink more cups of coffee will have significantly lower heart rates. d) People who drink more coffee will have significantly higher heart rates. Ans: B Explanation: The null hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent.
5. Participants rated their mood score out of 20 before and after listening to Reign in Blood by the thrash metal band Slayer. What is the null hypothesis of this study?
Before listening to Slayer | After listening to Slayer
5 | 14
8 | 5
9 | 17
4 | 18
3 | 8
15 | 19
12 | 14
6 | 16
a) Listening to Slayer is no better than listening to no music at improving mood score. b) Listening to Slayer decreases mood score. c) Listening to Slayer does not affect mood. d) Listening to Slayer increases mood score. Ans: C Explanation: The null hypothesis is the prediction that there is no effect of listening to Slayer on the mood of individuals. 6. Read the following statements about statistical significance and select the right one: a) The probability of rejecting an alternative hypothesis when it is true. b) The probability of rejecting an alternative hypothesis when it is true. c) The probability of rejecting a null hypothesis when it is false. d) The probability of rejecting the null hypothesis when it is true. Ans: D Explanation: Statistical significance is the probability of committing a Type I error, which is when we believe there is an effect in the population when, in fact, there is not. 7. Under a null hypothesis, a sample value yields a p-value of .015. Which of the following statements is true? a) This finding is statistically significant at the .01 level of significance. b) This finding is statistically significant at the .05 level of significance.
c) This finding is statistically significant at the .001 level of significance. d) This finding is not statistically significant. Ans: B Explanation: a p-value of 0.015 is larger than 1% but smaller than 5%. Therefore, the association is statistically significant at the 5% level of significance. 8. Under the null hypothesis, what is the expected value for the number of women who have a Master’s degree in the following sample of 100 respondents (n = 100)?

| | Master’s degree | No Master’s degree | Total |
| --- | --- | --- | --- |
| Women | 1/5 × 60 = 12 | 4/5 × 60 = 48 | 60 |
| Men | 1/5 × 40 = 8 | 4/5 × 40 = 32 | 40 |
| Total | 20 | 80 | 100 |

a) 32 b) 8 c) 12 d) 60 Ans: C Explanation: Under the null hypothesis of no association between gender and holding a Master’s degree, the expected count is (row total × column total)/n = (60 × 20)/100 = 12. 9. What is the alternative hypothesis for the following question: Does eating salmon make your skin glow? a) People who eat salmon will have a more glowing complexion compared to those who don’t. b) People who eat salmon will have a similar complexion to those who do not. c) Eating salmon does not predict the glow of the skin. d) There will be no difference in the appearance of the skin of people who eat salmon compared to those who don’t. Ans: A Explanation: The hypothesis or prediction from your theory would normally be that an effect will be present. This is known as the alternative hypothesis. 10. Classify each of the following statements about statistical power as either true or false (Hint: The power of a test is the probability that a given test will find an effect assuming that one exists in the population.) a) Power is the ability of a test to detect an effect given that an effect of a certain size exists in a population. b) We can use power to determine how large a sample is required to detect an effect of a certain size. c) Power is linked to the probability of making a Type II error. d) The power of a test is the probability that a given test is reliable and valid. Ans: Short answers:
a) true b) true c) true d) false 11. Classify each of the following statements about statistical power as either true or false. The power of a statistical test depends on (Hint: The power of a test is the probability that a given test will find an effect assuming that one exists in the population.) a) How big the effect actually is. b) How strict we are about deciding that an effect is significant. c) The sample size. d) Whether the test is a one- or two-tailed test. Ans: Short answers: a) true b) true c) true d) true 12. Which of the following p values based on a hypothesis test would be significant at less than the 1% level? a) 0.001 b) 0.01 c) 0.005 d) 0.1 Ans: A Explanation: 0.001 tells us that the chances are less than 1 in a thousand. 13. Which of the following best describes the relationship between sample size and significance testing? (Hint: Remember that test statistics are basically a signal-to-noise ratio, so given that large samples have less ‘noise’ they make it easier to find the ‘signal’.) a) Large effects tend to be significant only in large samples. b) In large samples even small effects can be deemed ‘significant’. c) Large effects tend to be significant only in small samples. d) In small samples only small effects will be deemed ‘significant’. Ans: B Explanation: the larger the sample, the more information we have (less noise), so we have greater power to detect differences.
The following information about hypothesis testing will be required to answer question 14. Imagine you conduct a study on IQ tests of university students. You collect a random sample of 150 students. The average IQ score in your sample is 103.8 (μuni) and the sample standard deviation is 20. You know that the average IQ in the population of interest is 100. You are interested in performing a hypothesis test to see if the average IQ of university students is higher than that of the general population.
14. Which is the appropriate hypothesis test? a) H0: μuni ≠ 100, HA: μuni = 100 b) H0: μuni ≠ 100, HA: μuni ≤ 100 c) H0: μuni = 100, HA: μuni ≠ 100 d) H0: μuni ≤ 100, HA: μuni > 100 Ans: D Explanation: The alternative hypothesis is that university students have an average IQ larger than 100, the average in the general population, whereas the null hypothesis is that they do not. 15. A 95% confidence interval for the difference between two population means is found to be (−0.08, 0.15). Which of the following statements is true? a) The two populations cannot have the same means. b) We can be 95% confident that the true difference between the population means falls between −0.08 and 0.15. c) The probability is 0.05 that the true difference between the population means is between −0.08 and 0.15. d) The probability is 0.95 that a significant difference between the population means lies between −0.08 and 0.15. Ans: B Explanation: Here, we are estimating the difference between two population means, so the 95% confidence interval gives the range of values within which we can be 95% confident that the true difference lies. Because the interval contains zero, we cannot rule out that the two means are the same.
Chapter 11: Modern approaches to theory testing 1. If we calculated an effect size and found it was r = .21, which expression would best describe the size of the effect? (Hint: The value of r can lie between 0 (no effect) and 1 (a perfect effect).) a) medium to large b) large c) small d) small to medium Ans: D Explanation: While this is correct, it is worth remembering that these ‘canned’ effect sizes are no substitute for evaluating an effect size within the context of the research domain in which it is being used. 2. A researcher asked 933 people which type of programme they prefer to watch on television. The results are below. What are the odds of being a man if you prefer to watch sport?
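
| | News | Documentaries | Soaps | Sport | Total |
| --- | --- | --- | --- | --- | --- |
| Women | 108 | 123 | 187 | 62 | 480 |
| Men | 130 | 123 | 68 | 132 | 453 |
| Total | 238 | 246 | 255 | 194 | 933 |

a) 1.6 b) 1.47 c) 2.13 d) 3.4 Ans: C Solution:

$$\text{Odds}_{\text{man if prefers sport}} = \frac{\text{Number of men who prefer sport}}{\text{Number of women who prefer sport}} = \frac{132}{62} = 2.13$$
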
3. A recent story in the media has claimed that women who eat breakfast every day are more likely to have boy babies than girl babies. Imagine we wanted to investigate this effect in women from two different age groups (18–30 and 31–43 years). Use the fabricated data in the table below to calculate: a. The odds of a 28-year-old woman having a boy if she eats breakfast. b. The odds of a 40-year-old woman having a boy if she eats breakfast.

| Age group of mother (years) | Gender of baby | Did the mother eat breakfast or not? | Count |
| --- | --- | --- | --- |
| 18–30 | Boy | No | 8 |
| 18–30 | Boy | Yes | 15 |
| 18–30 | Girl | No | 17 |
| 18–30 | Girl | Yes | 9 |
| 31–43 | Boy | No | 10 |
| 31–43 | Boy | Yes | 14 |
| 31–43 | Girl | No | 16 |
| 31–43 | Girl | Yes | 4 |

Ans: Part 1: odds of it being a boy if you eat breakfast at 28 years old The correct answer is d) 1.67 Part 2: odds of it being a boy if you eat breakfast at 40 years old The correct answer is a) 3.5 Solutions: For the age group 18–30:

$$\text{Odds}_{\text{boy if eats breakfast}} = \frac{\text{Number of women who ate breakfast and had a boy}}{\text{Number of women who ate breakfast and had a girl}} = \frac{15}{9} = 1.67$$

For the age group 31–43:

$$\text{Odds}_{\text{boy if eats breakfast}} = \frac{\text{Number of women who ate breakfast and had a boy}}{\text{Number of women who ate breakfast and had a girl}} = \frac{14}{4} = 3.5$$

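The odds calculation for question 3 can be sketched in a few lines of Python. This is only an illustration: the helper name `odds` and the dictionary layout are assumptions made for the example, and the counts are the breakfast-eating mothers from the table for question 3.

```python
def odds(event_count, non_event_count):
    """Odds = number of cases with the event / number of cases without it."""
    return event_count / non_event_count


# Counts of mothers who ate breakfast, split by age group and gender of baby
# (hypothetical layout; the numbers come from the table for question 3).
breakfast_counts = {
    "18-30": {"boy": 15, "girl": 9},
    "31-43": {"boy": 14, "girl": 4},
}

# Odds of having a boy, given that the mother ate breakfast, per age group.
odds_by_age = {
    age: odds(counts["boy"], counts["girl"])
    for age, counts in breakfast_counts.items()
}

for age, value in odds_by_age.items():
    print(f"Age {age}: odds of a boy if the mother eats breakfast = {value:.2f}")
```

Odds compare the number of cases with the outcome to the number without it, so they are not probabilities and can exceed 1.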
4. How much greater is the shared variance between two variables if the Pearson correlation coefficient between them is –.4 than if it is .2? a) Four times as great b) Two times as great c) Half as much d) A quarter as much Ans: A 5. Which correlation coefficient would you use to look at the correlation between gender and time spent on the phone talking to your mother? a) The biserial correlation coefficient, rb b) Kendall’s correlation coefficient, τ c) The point-biserial correlation coefficient, rpb d) Pearson’s correlation coefficient, r Ans: C Explanation: The point-biserial correlation coefficient, rpb, is used when one variable is a discrete dichotomy (e.g., gender) 6. A Pearson’s correlation coefficient of –.5 would be represented by a scatterplot in which: a) The data cloud looks like a circle and the regression line is flat. b) The regression line slopes upwards. c) Half of the data points sit perfectly on the line. d) There is a moderately good fit between the regression line and the individual data points on the scatterplot. Ans: D Explanation: There is a moderately good fit between the regression line and the individual data points on the scatterplot. 7. Which of the following statements about Pearson’s correlation coefficient is not true? a) It cannot be used with binary variables (those taking on a value of 0 or 1). b) It can be used as an effect size measure c) It varies between –1 and +1.
d) It can be used on ranked data. Ans: A Explanation: Pearson’s correlation coefficient can be used with binary variables (or categorical variables). 8. If you have a curvilinear relationship, then: (Hint: The two most important sources of bias in this context are probably linearity and normality.) a) Transforming the data won’t help. b) You can use Pearson’s correlation; you just need to remember that a curve indicates that the variables are not linearly related. c) It is not appropriate to use Pearson’s correlation because it assumes a linear relationship between variables. d) Pearson’s correlation can be used in the same way as it is for linear relationships. Ans: C Explanation: Remember that we’re fitting a linear model to the data, so if the relationship between variables is not linear then this model is invalid, and you might need to transform the data. 9. If a correlation coefficient has an associated probability value of .02 then: a) There is only a 2% chance that we would get a correlation coefficient this big (or bigger) if the null hypothesis were true. b) The results are important. c) We should accept the null hypothesis. d) The hypothesis has been proven. Ans: A Explanation: A probability value of .02 indicates that the correlation coefficient is significant. 10. Which of the following could not be a correlation coefficient: a) 0 b) .27 c) –.27 d) 2.7 Ans: D Explanation: A correlation coefficient can only range from –1 to +1. 11. Looking at the table below, which variables were the most strongly correlated?

| | | Work ethic | Annual income | IQ |
| --- | --- | --- | --- | --- |
| Work ethic | Pearson’s correlation | 1.000 | .72 | .66 |
| | Sig. (2-tail) | . | .001 | .000 |
| | N | 550 | 550 | 550 |
| Annual income | Pearson’s correlation | .72 | 1.000 | .47 |
| | Sig. (2-tail) | .000 | . | .03 |
| | N | 550 | 550 | 550 |
| IQ | Pearson’s correlation | .66 | .47 | 1.000 |
| | Sig. (2-tail) | .000 | .03 | . |
| | N | 550 | 550 | 550 |

(Hint: Larger values of Pearson’s correlation (ignoring the sign) indicate stronger correlations between variables.) a) Work ethic and annual income b) Work ethic and IQ c) Annual income and IQ d) None of the correlations are significant Ans: A Explanation: The value of Pearson’s r was .72, which was the largest in the table. 12. The coefficient of determination: a) Is the square root of the correlation coefficient. b) Is the square root of the variance. c) Is a measure of the amount of variability in one variable that is shared by the other. d) Indicates whether the correlation coefficient is significant. Ans: C Explanation: it is the proportion of the variance of the dependent variable that is predicted by the predictor variable. 13. A Pearson’s correlation of –.71 was found between the number of hours spent at work and energy levels in a sample of 300 participants. Which of the following conclusions can be drawn from this finding? a) The estimate of the correlation will be imprecise. b) Spending more time at work caused participants to have less energy. c) Amount of time spent at work accounted for 71% of the variance in energy levels. d) There was a strong negative relationship between the number of hours spent at work and energy levels. Ans: D Explanation: the correlation coefficient is negative and the effect is large (–.71). 14. Assume a researcher found that the correlation between a test she had developed and exam performance was .5 in a study of 25 students. She had previously been informed that correlations
under .30 are considered unacceptable. The 95% confidence interval was [0.131, 0.747]. Can you be confident that the true correlation is at least 0.30? a) No, you cannot, because the lower boundary of the confidence interval is .131, which is less than .30, and so the true correlation could be less than .30. b) Yes, you can, because the correlation coefficient is .5 (which is above .30) and falls within the boundaries of the confidence interval. c) No, you cannot because the sample size was too small. d) Yes, you can, because the upper boundary of the confidence interval is above .30 we can be 95% confident that the true correlation will be above .30 Ans: A Explanation: as the lower bound of the confidence interval falls below the threshold of .30, we cannot be confident that the true correlation is at least .30. 15. Imagine we took five people who each had one sibling and asked them how close they live to their sibling (km) (M = 12.40, SD = 16.257) and then asked them how many hours, on average, they spend with their sibling per month (M = 34.00, SD = 39.648). The covariance between the variables was –335.00. a. Calculate the Pearson correlation coefficient for the relationship between average time spent with sibling (hours) and distance from sibling (km). Write your answer to 2 decimal places. b. The researcher was asked to measure time in minutes rather than hours and distance in miles rather than kilometres. Calculate the Pearson correlation coefficient for the relationship between average time spent with sibling and distance from a sibling in this case. Write your answer to 2 decimal places (Hint: The correlation coefficient is a standardized measure). Ans: Part a: The correct answer is –0.52 Part b: The correct answer is –0.52 Solutions: Part a: The correlation is

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{-335.00}{16.257 \times 39.65} = -.52$$

Part b: Because the correlation coefficient is standardized, changing the units of measurement does not change its value, so r is again –.52.
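Both parts of question 15 can be checked with a short Python sketch. The function name `pearson_from_cov` and the constant names are assumptions for illustration. Converting kilometres to miles and hours to minutes multiplies the covariance and the standard deviations by the same factors, which is why r is unchanged.

```python
def pearson_from_cov(cov_xy, sd_x, sd_y):
    """Pearson's r as the covariance divided by the product of the SDs."""
    return cov_xy / (sd_x * sd_y)


# Values given in question 15 (distance in km, time in hours).
cov_km_hours = -335.00
sd_km = 16.257
sd_hours = 39.648

r_original = pearson_from_cov(cov_km_hours, sd_km, sd_hours)

# Part b: change the units. Rescaling a variable by a constant multiplies
# its SD by that constant, and the covariance by the product of both constants.
KM_TO_MILES = 0.621371   # conversion factor, miles per kilometre
HOURS_TO_MINUTES = 60

cov_rescaled = cov_km_hours * KM_TO_MILES * HOURS_TO_MINUTES
r_rescaled = pearson_from_cov(
    cov_rescaled, sd_km * KM_TO_MILES, sd_hours * HOURS_TO_MINUTES
)

print(f"r (km, hours)      = {r_original:.2f}")
print(f"r (miles, minutes) = {r_rescaled:.2f}")
```

The scaling constants cancel between numerator and denominator, so the two printed values agree.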
Chapter 12: Assumptions 1. Which of the following is not an assumption of the general linear model? a) Normally distributed residuals b) Linearity c) Additivity d) Dependence Ans: D
Explanation: Independence is an assumption of parametric tests, not dependence. 2. In which of the following situations is the assumption of normality least important? a) If you have a small sample. b) If you want to construct confidence intervals around the parameter estimates of your model. c) If you want only to estimate the parameters of your model. d) If you want to compute significance tests relating to the parameter estimates of your model. Ans: C Explanation: in order to reliably estimate confidence intervals and test significance, we need to rely on the assumption that the sampling distribution follows a normal distribution. 3. The assumption of homogeneity of variance is met when: a) The variances in different groups are significantly different. b) The variances in different groups are approximately equal. c) The variance across groups is proportional to the means of those groups. d) The variance is the same as the interquartile range. Ans: B Explanation: This is because to make sure our estimates of the parameters that define our model and significance tests are accurate, we have to assume homoscedasticity (also known as homogeneity of variance). 4. What does the assumption of independence mean? a) This assumption means that none of your independent variables is correlated. b) This assumption means that the errors in your model are not related to each other. c) This assumption means that you must use an independent design rather than a repeated-measures design. d) This assumption means that the residuals in your model are not independent. Ans: B Explanation: An easier way to think about this is in terms of scores on the outcome variable being independent, which means that the behaviour of one participant does not influence the behaviour of another. 5. When is it not necessary to use Levene’s test? a) When you have equal group sizes. b) When you have unequal group sizes. c) When you have a small sample. d) When you are conducting a two-tailed test.
Ans: A Explanation: If you have equal group sizes, the assumption of homogeneity of variance is pretty much irrelevant and can be ignored.
6. Levene’s test can be used to measure: a) Whether scores are normally distributed. b) Whether scores are independent. c) Whether group means are equal. d) Whether group variances differ. Ans: D Explanation: Levene’s test is a test of homogeneity of variances, that is whether the variance of a variable calculated in two or more groups is equal. 7. Imagine you conduct a t-test using IBM SPSS and the output reveals that Levene’s test for equality of variance is significant. What should you do? (Hint: Levene’s test tests the assumption that variances in different groups are approximately equal.) a) Interpret the figures in the row labelled ‘equal variances assumed’. b) Conduct a Kruskal–Wallis test instead. c) Interpret the figures in the row labelled ‘equal variances not assumed’. d) Collect more data. Ans: C Explanation: a statistically significant Levene’s test indicates we reject the null hypothesis that the variances are equal, and we accept the alternative hypothesis. 8. Looking at the table below, which of the following statements is correct?
a) Levene’s test was significant, F(1, 118) = 0.93, p = .007, indicating that the assumption of homogeneity of variance had been met. b) Levene’s test was non-significant, F(1, 118) = 0.01, p = .93, indicating that the assumption of homogeneity of variance had been met. c) Levene’s test was non-significant, F(1, 118) = 0.01, p = .93, indicating that the assumption of homogeneity of variance had been violated. d) Levene’s test was significant, F(1, 118) = 0.01, p = .93, indicating that the assumption of homogeneity of variance had been violated. Ans: B Explanation: Levene’s test is a test of equality of variances. As the p-value is larger than .05, we have no evidence to reject the null hypothesis, so we assume variances are equal.
9. A researcher investigating ‘Pygmalion in the classroom’ measured teachers’ perceptions of male and female students’ mathematical abilities. She collected teacher ratings of 97 male and female students. Based on the output, what can you say about the data?
a) Mean perceptions of male and female students were significantly different. b) Mean and median ratings were similar for males and females c) The variances of ratings were significantly different for males and females. d) Homogeneity of variance can be assumed. Ans: C Explanation: The significant result (p < .05) means that homogeneity of variance cannot be assumed. 10. A psychologist was interested in predicting how depressed people are from the amount of news they watch. Based on the output, do you think the psychologist will end up with a model that can be generalized beyond the sample?
a) No, because the errors lack linearity. b) No, because the errors show heteroscedasticity.
c) Yes, because errors are normally distributed. d) Yes, because errors are independent. Ans: B
Explanation: The funnel shape of the residuals indicates heteroscedasticity. 11. The following graph shows:
a) Heteroscedasticity b) Heteroscedasticity and non-linearity c) Non-linearity d) Regression assumptions that have been met Ans: D Explanation: This is correct because the residuals are spread out evenly and do not curve or funnel out. 12. What is multicollinearity? (Hint: It is an assumption of the linear model. For this assumption to be met, we want there to be no multicollinearity in our data set.) a) When predictor variables are independent b) When predictor variables are correlated with variables not in the regression model c) When predictor variables correlate very highly with each other d) When predictor variables have a linear relationship with the outcome variable Ans: C 13. Which of the following is not a reason why multicollinearity is a problem in regression? a) It leads to unstable regression coefficients. b) It creates heteroscedasticity in the data. c) It limits the size of R. d) It makes it difficult to assess the importance of individual predictors.
Ans: B Explanation: Statement b) is the one that is not true. Heteroscedasticity is when the variances are very unequal; violating this assumption invalidates our confidence intervals and significance tests. Multicollinearity, however, is when predictor variables correlate too highly. 14. Which of these statements is not true? a) Tolerance values above 0.2 may indicate multicollinearity in the data. b) The tolerance is 1 divided by the VIF (variance inflation factor). c) Multicollinearity in the data is shown by a VIF (variance inflation factor) greater than 10. d) If the average variance inflation factor is greater than 1 then the regression model might be biased. Ans: A Explanation: Statement a) is not true. Tolerance below 0.1 indicates a serious problem and tolerance below 0.2 indicates a potential problem.
Chapter 13: Relationships 1. Which of the following statements about the t-statistic in regression is not true? a) The t-statistic is equal to the regression coefficient divided by its standard deviation. b) The t-statistic can be used to see whether a predictor variable makes a statistically significant contribution to the regression model. c) The t-statistic provides some idea of how well a predictor predicts the outcome variable d) The t-statistic tests whether the regression coefficient, b, is equal to 0. Ans: A Explanation: Statement a) is not true: the t-statistic is the regression coefficient divided by its standard error. 2. With 2 × 2 contingency tables (i.e., two categorical variables both with two categories) no expected values should be below ______. a) 10 b) 0.8 c) 1 d) 5 Ans: D 3. A researcher asked 933 people which type of programme they prefer to watch on television. The results are below.

| | News | Documentaries | Soaps | Sport | Total |
| --- | --- | --- | --- | --- | --- |
| Women | 108 | 123 | 187 | 62 | 480 |
| Men | 130 | 123 | 68 | 132 | 453 |
| Total | 238 | 246 | 255 | 194 | 933 |

A chi-square test produced the SPSS output below. What can we conclude from this output?

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
| --- | --- | --- | --- |
| Pearson Chi-Square | 82.112a | 3 | .000 |
| Likelihood Ratio | 84.840 | 3 | .000 |
| Linear-by-Linear Association | .105 | 1 | .746 |
| N of Valid Cases | 933 | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 94.19.

a) Men and women watch similar types of programmes. b) The profile of programmes watched was significantly different between men and women. c) Significantly more soap operas were watched. d) Women watched significantly more programmes than men. Ans: B Explanation: The p-value is less than .05 for the row labelled Pearson Chi-Square and underneath the table, SPSS tells us that no expected frequencies are less than 5. 4. Are directional hypotheses possible with chi-square? a) Yes, but only when your sample is greater than 200. b) Yes, but only when there are 12 or more degrees of freedom. c) Yes, but only when you have a 2 × 2 design. d) Directional hypotheses are never possible with the chi-squared test. Ans: C Explanation: Directional alternative hypotheses using the chi-square test are only valid for 2 × 2 contingency tables – any larger and the chi-square test is testing compound hypotheses. 5. Which of the following is not a correct statement about the chi-square test? a) A sample size of at least 20 b) A null hypothesis of independence between the variables c) The variables must have discrete categories d) There is a limit on how many cells can have a low expected value Ans: A Explanation: Although it is preferable to have a sample size bigger than 20, it is not a mandatory condition of the chi-square test. 6. Imagine a researcher who wanted to investigate whether there was a significant correlation between IQ and annual income, but she had reason to believe that work ethic would influence both of these variables. What should she do? a) Conduct a semi-partial correlation to look at the relationship between IQ and work ethic while partialling out the effect of annual income.
b) Conduct a partial correlation to look at the relationship between work ethic and annual income while partialling out the effect of IQ. c) Conduct a semi-partial correlation to look at the relationship between IQ and annual income while partialling out the effect of work ethic. d) Conduct a partial correlation to look at the relationship between IQ and annual income while partialling out the effect of work ethic. Ans: D Explanation: Partial correlation partials out the effect that the third variable has on both variables in the correlation. 7. The table below contains scores from six people on two different scales that measure attitudes towards reality TV shows.

| Attitudes towards Watching Reality TV Scale | The General Reality TV Scale |
| --- | --- |
| 3 | 2 |
| 7 | 5 |
| 4 | 3 |
| 1 | 1 |
| 8 | 7 |
| 6 | 7 |

Using the scores above, the two scales are likely to: (Hint: If two variables are related, then changes in one variable should be met with similar changes in the other variable.) a) Have identical means. b) Correlate positively. c) Be uncorrelated. d) Correlate negatively. Ans: B Explanation: High scores on one scale tend to produce high scores on the other, and low scores on one also correspond with low scores on the other. 8. Men and women were asked which type of animal they thought made the best pets. Data are in the table below.

| | Reptiles | Mammals | Birds |
| --- | --- | --- | --- |
| Men | 24 | 35 | 20 |
| Women | 15 | 47 | 12 |

If the expected frequencies rule for chi-square had been violated by the data, which categories could be combined together in a meaningful way to increase the expected frequencies? a) It depends on the research hypothesis. b) Reptiles, mammals and neither c) Men and women
d) Reptiles and mammals Ans: A Explanation: All of the answers are correct in a sense, but it depends on the question that you want to answer. If you want to know whether men and women like different types of pets it doesn’t make sense to combine males and females, but it might be acceptable to combine reptiles and birds so you could see gender effects on mammals vs. other animals. 9. Which of the hypotheses below would be suited for testing with a one-variable chi-square test? a) People who choose the number 7 as their ‘lucky’ number are significantly more superstitious than people who choose the number 13 as their ‘lucky’ number. b) Choice of ‘lucky’ number is directly related to measures of superstition. c) It was hypothesized that more people would choose the number 7 as their ‘lucky’ number than any other number. d) It was hypothesized that fewer people would choose the number 7 as their ‘lucky’ number than any other number. Ans: C Explanation: a one-tailed hypothesis indicates the direction of the expected association. 10. The interpretation of the odds ratio, Exp(B), can be generalized to the population if: a) The confidence interval of Exp(B) does cross 0. b) The confidence interval of Exp(B) does cross 1. c) The confidence interval of Exp(B) does not cross 0. d) The confidence interval of Exp(B) does not cross 1. Ans: D Explanation: If our confidence interval does not cross 1 then we can be confident that the direction of the relationship we have observed is true in the population. 11. How are the degrees of freedom calculated for a chi-square test? (Hint: r is the number of rows and c is the number of columns.) a) (r − 1)/(c − 1) b) (r + 1) − (c + 1) c) (r − 1)(c − 1) d) (r × 2) − (c × 2) Ans: C 12. What does Fisher’s exact probability show? (Hint: Fisher came up with a method for computing the exact probability of the chi-square statistic that is accurate when sample sizes are small).
a) It is the amount of variation that one variable can explain in the other variable. b) It tests whether the assumption of independence has been met. c) It tests whether the assumption of expected frequencies has been met.
d) It is the probability of obtaining a chi-square value at least as big as the one observed if the null hypothesis were true. Ans: D 13. The table below contains scores from six people on two different scales that measure attitudes towards reality TV shows.

| Attitudes towards Watching Reality TV Scale | General Attitudes towards Reality TV Scale |
| --- | --- |
| 3 | 2 |
| 7 | 5 |
| 4 | 3 |
| 1 | 1 |
| 8 | 7 |
| 6 | 7 |

Based on intuition rather than computation, which of the following is the value of the coefficient of determination between the two scales? a) .85 b) –.85 c) 85 d) .085 Ans: A Explanation: The coefficient of determination is r squared, and there is a very strong positive relationship between the scales. 14. Imagine we took five people who each had one sibling and asked them to rate on a 10-point scale how close they felt to their sibling (M = 6.00, SD = 3.08) and then asked them how many hours, on average, they spend with their sibling per month (M = 34.00, SD = 39.65). The covariance between the variables was 27.5. Calculate the Pearson correlation coefficient for the relationship between average time spent with sibling and perceived closeness and write your answer to 2 decimal places. Solution:

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{27.5}{3.08 \times 39.65} = .225$$

15. Research suggests that anxiety runs in families. Imagine we measured anxiety on a scale from 0 to 60 (0 = I’ve never felt anxious/nervous about anything in my life, 60 = I feel anxious/nervous every minute of every day) in a sample of five mothers and their children. The data are displayed in the table below.

| | Mother anxiety score | Child anxiety score |
| --- | --- | --- |
| | 60 | 55 |
| | 45 | 32 |
| | 54 | 36 |
| | 23 | 15 |
| | 34 | 17 |
| Mean | 43.2 | 31.0 |
| S | 14.96 | 16.23 |

a) Calculate the covariance of these data. Write your answer to 2 decimal places. b) Calculate Pearson’s correlation coefficient for these data. Write your answer to 2 decimal places. Ans: Part a: The correct answer is 227.75 Part b: The correct answer is .94 Solutions:

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

$$\begin{aligned}
\mathrm{cov}(x, y) &= \frac{(16.8)(24) + (1.8)(1) + (10.8)(5) + (-20.2)(-16) + (-9.2)(-14)}{N - 1} \\
&= \frac{403.2 + 1.8 + 54 + 323.2 + 128.8}{4} \\
&= \frac{911}{4} \\
&= 227.75
\end{aligned}$$

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{227.75}{14.96 \times 16.23} = .938$$

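The hand calculation for question 15 can be cross-checked with a small Python sketch that uses only the standard library. The helper name `sample_covariance` is an assumption for illustration. The scores are the five mother and child pairs from the table, and the N − 1 denominator matches the solution.

```python
from statistics import mean, stdev

# Anxiety scores for the five mother-child pairs in question 15.
mother = [60, 45, 54, 23, 34]
child = [55, 32, 36, 15, 17]


def sample_covariance(x, y):
    """Sum of cross-product deviations divided by N - 1."""
    x_bar, y_bar = mean(x), mean(y)
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (len(x) - 1)


cov_xy = sample_covariance(mother, child)

# Pearson's r standardizes the covariance by the two sample SDs.
r = cov_xy / (stdev(mother) * stdev(child))

print(f"covariance  = {cov_xy:.2f}")
print(f"Pearson's r = {r:.2f}")
```

`statistics.stdev` also divides by N − 1, so the standard deviations are consistent with the covariance used here.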
16. A researcher asked 933 people which type of programme they prefer to watch on television. The results are below. What is the expected frequency under the null hypothesis for men who liked to watch sport? (Hint: When we have categorical predictors but a continuous outcome (e.g., ANOVA) the model we use is group means, but we can’t work with means when we have a categorical outcome variable so we work with frequencies instead)

| | News | Documentaries | Soaps | Sport | Total |
| --- | --- | --- | --- | --- | --- |
| Women | 108 | 123 | 187 | 62 | 480 |
| Men | 130 | 123 | 68 | 132 | 453 |
| Total | 238 | 246 | 255 | 194 | 933 |

Solution: n is the total number of observations (in this case 933). We can calculate the expected frequency for any cell within our table from its row and column totals (row total and column total are abbreviated to RT and CT, respectively):

$$\text{Model}_{\text{Men, Sport}} = \frac{RT_{\text{Men}} \times CT_{\text{Sport}}}{n} = \frac{453 \times 194}{933} = 94.19$$

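The same rule (row total × column total / n) gives the expected frequency for every cell, not only Men/Sport. A minimal Python sketch, assuming a nested-dictionary layout chosen purely for illustration, with the observed counts from the table in question 16:

```python
# Observed counts from the programme-preference table (question 16).
observed = {
    "Women": {"News": 108, "Documentaries": 123, "Soaps": 187, "Sport": 62},
    "Men": {"News": 130, "Documentaries": 123, "Soaps": 68, "Sport": 132},
}

# Row totals (RT), column totals (CT) and the grand total n.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
n = sum(row_totals.values())

# Expected frequency under the null hypothesis: RT x CT / n for each cell.
expected = {
    row: {col: row_totals[row] * col_totals[col] / n for col in col_totals}
    for row in observed
}

for row, cells in expected.items():
    for col, value in cells.items():
        print(f"{row:5s} {col:13s} expected = {value:7.2f}")
```

The smallest expected count in this table is the Men/Sport cell, which is the value a chi-square routine would check against the rule that no expected frequency falls below 5.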
17. A recent story in the media has claimed that women who eat breakfast every day are more likely to have boy babies than girl babies. Using the fabricated data in the table below calculate: (Hint: You will need to use the table of critical values of the chi-square distribution in the Appendix.)
a. The Pearson’s chi-square test statistic.
b. The degrees of freedom of the chi-square statistic.
c. Is the chi-square value significant at the .05 level?
d. Is the chi-square statistic significant at the .01 level?

                  Did they eat breakfast?
Gender of baby    Yes    No    Total
Boy               25     19    44
Girl              24     32    56
Total             49     51    100
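Ans: Part a: χ² = 1.92 Part b: df = 1 Part c: No Part d: No Solutions:

$$\text{Model}_{ij} = E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{n}$$

$$\text{Model}_{\text{Boy, Yes}} = \frac{RT_{\text{Boy}} \times CT_{\text{Yes}}}{n} = \frac{44 \times 49}{100} = 21.56$$

$$\text{Model}_{\text{Boy, No}} = \frac{RT_{\text{Boy}} \times CT_{\text{No}}}{n} = \frac{44 \times 51}{100} = 22.44$$

$$\text{Model}_{\text{Girl, Yes}} = \frac{RT_{\text{Girl}} \times CT_{\text{Yes}}}{n} = \frac{56 \times 49}{100} = 27.44$$

$$\text{Model}_{\text{Girl, No}} = \frac{RT_{\text{Girl}} \times CT_{\text{No}}}{n} = \frac{56 \times 51}{100} = 28.56$$

$$\chi^2 = \frac{(25 - 21.56)^2}{21.56} + \frac{(19 - 22.44)^2}{22.44} + \frac{(24 - 27.44)^2}{27.44} + \frac{(32 - 28.56)^2}{28.56}$$

$$= \frac{(3.44)^2}{21.56} + \frac{(-3.44)^2}{22.44} + \frac{(-3.44)^2}{27.44} + \frac{(3.44)^2}{28.56} = 0.55 + 0.53 + 0.43 + 0.41 = 1.92$$

This statistic can then be checked against a distribution with known properties called the chi-square distribution. All we need to know is the degrees of freedom, and these are calculated as (r − 1)(c − 1), where r is the number of rows and c is the number of columns. In this case we get df = (2 − 1)(2 − 1) = 1. With df = 1 the critical values are 3.84 at the .05 level and 6.63 at the .01 level; 1.92 is smaller than both, so the statistic is not significant at either level.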
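As a cross-check, the statistic can be recomputed directly from the observed counts. This is a minimal sketch rather than part of the original solution; the function name `pearson_chi_square` is hypothetical, and the two critical values are the usual tabled values for df = 1. With these counts the four cell contributions come to roughly 0.55, 0.53, 0.43 and 0.41.

```python
# Observed counts: rows are gender of baby, columns are breakfast (yes, no)
observed = [
    [25, 19],  # Boy
    [24, 32],  # Girl
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


chi_square, df = pearson_chi_square(observed)

# Tabled critical values of chi-square for df = 1 (assumed from a standard table)
CRITICAL_05 = 3.84
CRITICAL_01 = 6.63

print(round(chi_square, 2), df)
print("significant at .05:", chi_square > CRITICAL_05)
print("significant at .01:", chi_square > CRITICAL_01)
```

Comparing against tabled critical values mirrors the hint in the question; an exact p-value would need a chi-square distribution function from a statistics library.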
Chapter 14: The general linear model
1. What is the overall effect of an independent variable on a dependent variable known as? a) The indirect effect b) The main effect c) The direct effect d) The interaction effect Ans: B Explanation: The main effect is the overall effect of an independent variable on a dependent variable.
Questions 2 and 3 use the following information. A random sample from a population has been taken and the following five observations on variables X and Y were recorded. We have that the standard deviation of x is 6.32 and of y is 5.48. Also, Pearson’s correlation is -6.8.

x     y
4     8
5     7
6     5
8     4
12    1

2. What is the regression (ordinary least squares) estimate of the slope of a regression of Y (dependent variable) on X? a) -1.2 b) -0.93 c) -1.13 d) 10.95 Ans: B Explanation:

$$\hat{b}_1 = \frac{r\,s_y}{s_x s_x} = \frac{-6.8 \times 5.48}{6.32 \times 6.32} = \frac{-37.26}{40} = -0.93$$
3. What is the OLS estimate of the intercept of regression of Y (dependent variable) on X? a) -10.95 b) -12.67 c) -0.85 d) 11.52 Ans: D Explanation: The intercept is the value of Y when X = 0. Using the formula: mean of Y − b × mean of X = 5 − (−0.93)(7) = 11.52
4. Which of the following statements about the F-ratio is true? a) The F-ratio is the ratio of variance explained by the model to the error in the model.
b) The F-ratio is the ratio of variance explained by the model to the total variance in the outcome variable. c) The F-ratio is the ratio of error variance to the total variance. d) The F-ratio is the proportion of variance explained by the regression model. Ans: A Explanation: The F-test, like other test statistics, tells us about the improvement due to fitting the model relative to how much error remains in the model.
5. Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and subsequent burnout. She recruited 75 lecturers and administered several questionnaires that measured: Burnout (high score = burnt out), Perceived Control (high score = low perceived control), Coping Ability (high score = low ability to cope with stress), Stress from Teaching (high score = teaching creates a lot of stress for the person), Stress from Research (high score = research creates a lot of stress for the person), and Stress from Providing Pastoral Care (high score = providing pastoral care creates a lot of stress for the person). The outcome of interest was burnout, and Cooper’s (1988) model of stress indicates that perceived control and coping style are important predictors of this variable. The remaining predictors were measured to see the unique contribution of different aspects of a lecturer’s work to their burnout. How much variance in burnout does the final model explain for the sample?

Model Summary(d)
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .868(a)    .753        .746                 6.71701
2        .884(b)    .782        .772                 6.36233
3        .896(c)    .803        .792                 6.08191                       1.461

a. Predictors: (Constant), Coping Ability, Perceived Control
b. Predictors: (Constant), Coping Ability, Perceived Control, Stress from Teaching
c. Predictors: (Constant), Coping Ability, Perceived Control, Stress from Teaching, Stress from providing pastoral Care
d. Dependent Variable: Burnout

a) 89.6% b) 8.3% c) 79.2% d) 80.3% Ans: D Explanation: In a model with several predictors, look at the R Square column for the final model (Model 3): R Square = .803, so the model explains 80.3% of the variance in burnout in the sample. The Adjusted R Square value (.792) is the version that is adjusted for the number of predictors in the model.
6. The student welfare office was interested in trying to enhance students’ exam performance by investigating the effects of various interventions. They took five groups of students before their
statistics exams and gave them one of five interventions: (1) a control group just sat in a room contemplating the task ahead; (2) the second group had a yoga class to relax them; (3) the third group were told they would get monetary rewards contingent upon the grade they received in the exam; (4) the fourth group were given beta-blockers to calm their nerves; and (5) the fifth group were encouraged to sit around winding each other up about how much revision they had/hadn’t done (a bit like what usually happens). The final percentage obtained in the exam was the dependent variable. Using the critical values for F, how would you report the result in the table below?

            SS        df    MS       F
Model       1213.6    4     303.4    12.43**
Residual    707.9     29    24.4
Total       1921.5    33
a) Type of intervention had a significant effect on levels of exam performance, F(4, 33) = 12.43, p < .01. b) Type of intervention did not have a significant effect on levels of exam performance, F(4, 33) = 12.43, p > .01. c) Type of intervention had a significant effect on levels of exam performance, F(4, 29) = 12.43, p < .01. d) Type of intervention did not have a significant effect on levels of exam performance, F(4, 29) = 12.43, p > .05. Ans: C 7. A recent study investigated whether vodka is less likely to give you a hangover than wine. Twenty participants on a night out were asked to drink only vodka for the whole evening then rate how they felt the next day out of 10 (0 = I feel fantastic, 10 = I can’t move my head in case it explodes). The following month, they were asked to do the same again, only this time they were asked to drink only white wine. The t-score was 2.56. Which of the sentences below is correct? (Hint: You will need to look up the p-value in the table of critical values of the t-distribution in the Appendix.) a) It is not significant with a two-tailed test. b) It is not significant with a one-tailed test. c) It is significant at the 1% level with a two-tailed test. d) It is significant at the 5% level but not the 1% level with a two-tailed test. Ans: D Explanation: Our value of t was t(19) = 2.56, which is larger than the .05 critical value of 2.09 and smaller than the .01 critical value of 2.86. 8. The t-statistic: a) When significant, indicates an important finding. b) Is accurate only when using large samples. c) Is the ratio of the systematic variation to the unsystematic variation. d) Is the standard deviation of the sampling distribution of a statistic.
Ans: C Explanation: Like other test statistics, the t-statistic is the improvement due to the model relative to the error that remains in the model.
9. What does the variance sum law state? a) That the variance of a difference between two independent variables is smaller than the sum of their variances. b) That the variance of a difference between two independent variables is equal to the sum of their variances. c) That the variance of a difference between two independent variables is larger than the sum of their variances. d) That the sum of the variances of two independent variables is larger than the sum of their individual variances. Ans: B Explanation: This statement means that the variance of the sampling distribution is equal to the sum of the variances of the two populations from which the samples were taken.
10. Which of the following statements is NOT correct about the adjusted R²: a) An estimate of the improvement of our model adjusted by the number of predictors b) The percentage of variation in the response explained by our model adjusted by the number of predictors c) A measure of goodness of fit to explain the dependent variable d) A way to compare models with different numbers of variables Ans: A Explanation: The adjusted R square is a measure of how much variance is explained by the predictors included in the model.
11. A psychologist was interested in whether the amount of news people watch predicts how depressed they are. Using the output table below, how much variance (as a percentage) in depression is shared by exposure to the news?
Ans: The correct answer is 5.0% Solution: Look at the ‘R Square’ column and multiply the value by 100. 12. Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and subsequent
burnout. She recruited 75 lecturers and administered several questionnaires that measured: Burnout (high score = burnt out), Perceived Control (high score = low perceived control), Coping Ability (high score = low ability to cope with stress), Stress from Teaching (high score = teaching creates a lot of stress for the person), Stress from Research (high score = research creates a lot of stress for the person), and Stress from Providing Pastoral Care (high score = providing pastoral care creates a lot of stress for the person). The outcome of interest was burnout, and Cooper’s (1988) model of stress indicates that perceived control and coping style are important predictors of this variable. The remaining predictors were measured to see the unique contribution of different aspects of a lecturer’s work to their burnout. What analysis has been carried out?

Variables Entered/Removed(b)
Model    Variables Entered                       Variables Removed    Method
1        Coping Ability, Perceived Control(a)    .                    Enter
2        Stress from Teaching                    .                    Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
3        Stress from providing pastoral Care     .                    Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. All requested variables entered.
b. Dependent Variable: Burnout
a) Hierarchical multiple regression b) Multilevel model c) Reliability analysis d) Factor analysis Ans: A Explanation: In a model with several predictors, a hierarchical approach first includes the predictors known to influence the outcome, followed by other predictors that may or may not have an effect.
13. Participants rated their mood score out of 20 before and after listening to Reign in Blood by the thrash metal band Slayer.

Before Listening to Slayer    After Listening to Slayer
5                             14
8                             5
9                             17
4                             18
3                             8
15                            19
12                            14
6                             16
Calculate the standard error of the difference between the means. (Report your answer to 3 decimal places.) Ans: The correct answer is 1.865 Solution:

$$\text{standard error of the difference between the means} = \frac{s_D}{\sqrt{N}}$$

$$s_D = \sqrt{\frac{\sum (x_i - \bar{x}_D)^2}{N - 1}}$$

where the $x_i$ are the difference scores (before – after) and $\bar{x}_D$ is their mean.

$$s_D \text{ (before – after)} = 5.276$$

$$\frac{5.276}{\sqrt{8}} = 1.865$$

14. A consumer researcher was interested in what factors influence people's fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy-prone, 4 = very fantasy-prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt). How much variance (as a percentage) in fear is shared by fantasy proneness?
Ans: The correct answer is 5.3% Solution: Look in the box labelled R Square in the table: .156 – .103 = .053, and .053 × 100 = 5.3%.
15. Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and subsequent burnout. She recruited 75 lecturers and administered several questionnaires that measured: Burnout (high score = burnt out), Perceived Control (high score = low perceived control), Coping Ability (high score = low ability to cope with stress), Stress from Teaching (high score = teaching creates a lot of stress for the person), Stress from Research (high score = research creates a lot of stress for the person), and Stress from Providing Pastoral Care (high score = providing pastoral care creates a lot of stress for the person). The outcome of interest was burnout, and Cooper’s (1988) model of stress indicates that perceived control and coping style are important predictors of this variable. The
remaining predictors were measured to see the unique contribution of different aspects of a lecturer’s work to their burnout. Which of the predictor variables does not predict burnout?

Coefficients(a)
                                              Unstandardized         Standardized
                                              Coefficients           Coefficients                       Collinearity Statistics
Model                                         B         Std. Error   Beta            t         Sig.     Tolerance    VIF
1   (Constant)                                .548      1.558                        .352      .726
    Perceived Control                         .619      .088         .496            7.002     .000     .683         1.465
    Coping Ability                            .372      .054         .485            6.851     .000     .683         1.465
2   (Constant)                                10.119    3.476                        2.911     .005
    Perceived Control                         .633      .084         .507            7.548     .000     .681         1.469
    Coping Ability                            .516      .070         .673            7.383     .000     .370         2.703
    Stress from Teaching                      -.240     .079         -.257           -3.042    .003     .430         2.326
3   (Constant)                                6.166     3.615                        1.706     .093
    Perceived Control                         .675      .082         .541            8.271     .000     .658         1.520
    Coping Ability                            .507      .067         .663            7.595     .000     .369         2.708
    Stress from Teaching                      -.360     .087         -.387           -4.143    .000     .322         3.104
    Stress from providing pastoral Care       .182      .065         .193            2.775     .007     .578         1.730

a. Dependent Variable: Burnout

Excluded Variables(d)
                                                                                 Partial        Collinearity Statistics
Model                                         Beta In     t         Sig.     Correlation    Tolerance    VIF      Minimum Tolerance
1   Stress from Teaching                      -.257(a)    -3.042    .003     -.340          .430         2.326    .370
    Stress from Research                      .092(a)     1.523     .132     .178           .933         1.072    .645
    Stress from providing pastoral Care       .049(a)     .731      .467     .086           .771         1.297    .541
2   Stress from Research                      .106(b)     1.865     .066     .218           .927         1.078    .351
    Stress from providing pastoral Care       .193(b)     2.775     .007     .315           .578         1.730    .322
3   Stress from Research                      .098(c)     1.808     .075     .213           .925         1.081    .322

a. Predictors in the Model: (Constant), Coping Ability, Perceived Control
b. Predictors in the Model: (Constant), Coping Ability, Perceived Control, Stress from Teaching
c. Predictors in the Model: (Constant), Coping Ability, Perceived Control, Stress from Teaching, Stress from providing pastoral Care
d. Dependent Variable: Burnout
a) Stress from providing pastoral care b) Stress from teaching c) Stress from research d) Perceived control Ans: C Explanation: Look in the ‘Excluded Variables’ box: the variable stress from research is not statistically significant in any of the three models.
Chapter 15: Comparing two means
1. What does the independent t-test assume? a) The sampling distribution is normally distributed. b) The data are normally distributed c) There are no differences between the mean scores. d) The means of two sets of scores are correlated. Ans: A
Explanation: t-tests are generally quite robust, so we talk about only needing approximately normally distributed data.
2. When conducting an independent t-test, what is the dependent variable? a) The experimental conditions b) The scores c) One of the independent variables becomes the dependent variable in the analysis. d) The term ‘dependent variable’ does not apply to the t-test. Ans: B Explanation: In an independent t-test we compare the scores of two unrelated groups.
3. Other things being equal, compared to the paired-samples (or dependent) t-test, the independent t-test: a) Has less power to find an effect. b) Has more power to find an effect. c) Has the same amount of power, the data are just collected differently. d) Is less robust. Ans: A Explanation: When the same participants are used across conditions the unsystematic variance (often called the error variance) is reduced dramatically, making it easier to detect any systematic variance.
4. A paired-samples t-test is used to test for: a) Differences between means of groups containing different entities when the sampling distribution is normally distributed, and the data have equal variances and are at least interval. b) Differences between means of groups containing different entities when the sampling distribution is not normally distributed. c) Differences between means of groups containing the same entities when the sampling distribution is not normally distributed and the data do not have unequal variances. d) Differences between means of groups containing the same entities when the sampling distribution is normally distributed, and the data have equal variances and are at least interval. Ans: D
5. Which of the following statements about a paired-samples t-test is incorrect? a) There ought to be less unsystematic variance compared to the independent t-test. b) The same participants take part in both experimental conditions.
c) The error variance tends to stay the same or increase d) Other things being equal, you do not need as many participants as you would for an independent samples design. Ans: C
Explanation: When the same participants are used across conditions the unsystematic variance (often called the error variance) is reduced dramatically, making it easier to detect any systematic variance.
6. If we violate the assumption of independence of errors in a paired-samples t-test, how does it affect our estimates? a) It provides a wrong estimate of the probability of the difference of scores b) It provides a more accurate estimate of the confidence intervals c) It provides a wrong estimate of the confidence intervals and significance tests d) It does not affect the estimate of the standard error, but it does for confidence intervals Ans: C Explanation: If the assumption of independence of errors is violated, then the estimate of the standard error is wrong, which consequently affects the estimates of the confidence intervals and significance tests, as these are based on the standard error.
7. The degrees of freedom for the paired-samples t-test is: a) N − 1 b) √N − 1 c) N d) N − 2 Ans: A Explanation: When the same participants have been used, the degrees of freedom are the sample size minus 1.
8. When entering data for a repeated-measures design in SPSS: a) Each row of the data editor should represent a level of a variable, while each column represents data from one entity. b) Each row of the data editor should represent data from one entity, while each column represents a coding variable. c) Each row of the data editor should represent data from one entity, while each column represents a level of a variable. d) It doesn’t matter how you enter repeated-measures data into SPSS. Ans: C Explanation: Separate columns should represent each level of a repeated-measures variable.
9. When an experimental manipulation is carried out on the same entities, the within-participant variance will be made up of: a) The effect of the manipulation and individual differences in performance. b) Unsystematic variance only. c) Individual differences in performance only.
d) The effect of the manipulation only.
Ans: A
Chapter 16: Comparing several means
1. The table below contains the length of time (minutes) for which different groups of students were able to stay awake to revise statistics after consuming 500 ml of one of three different types of stimulants. What is the variation in scores from groups A to B to C known as?

A      B     C
20     15    40
15     12    33
120    7     50
57     18    135
a) The between-groups variance b) The within-groups variance c) The grand variance d) Homogeneity of variance Ans: A
2. When the between-groups variance is a lot larger than the within-groups variance, the F-value is ______ and the likelihood of such a result occurring because of sampling error is ______. a) small; low b) large; high c) large; low d) small; high Ans: C Explanation: If the differences between group means are large enough, then the resulting model will be a better fit of the data than the grand mean.
3. Subsequent to obtaining a significant result from an exploratory one-way independent ANOVA, a researcher decided to conduct three t-tests to investigate where the differences between groups lie. Which of the following statements is correct? a) The researcher should accept as statistically significant tests with a probability value of less than 0.016 to avoid making a Type I error. b) This is the correct method to use. The researcher did not make any predictions about which groups will differ before running the experiment, therefore contrasts and post hoc tests cannot be used. c) The researcher should have conducted orthogonal contrasts instead of t-tests to avoid making a Type I error. d) None of these options are correct. Ans: A Explanation: Conducting multiple t-tests increases the familywise error rate, so if you are going to do this, it is important to divide the accepted probability level (.05) by the number of t-tests you conduct (in this case 3). Alternatively, you could run post hoc tests, which control the familywise error rate.
4. When conducting a one-way independent ANOVA with three levels on the independent variable, an F-ratio that is large enough to be statistically significant tells us: a) That all of the differences between means are statistically significant. b) That there is a significant three-way interaction. c) That the model fitted to the data accounts for less variation than extraneous factors, but it doesn’t tell us where the differences between groups lie. d) That one or more of the differences between means is statistically significant but not where the differences between groups lie. Ans: D Explanation: It is necessary after conducting an ANOVA to carry out further analysis to find out which groups differ.
5. After an ANOVA you need more analysis to find out which groups differ. When you did not generate specific hypotheses before the experiment, use: (Hint: We need a way to contrast the different groups without inflating the Type I error rate.) a) Post hoc tests b) Bootstrapping c) Planned contrasts d) t-tests Ans: A Explanation: Post hoc tests compare every group (as if conducting several t-tests) but use a stricter acceptance criterion such that the familywise error rate does not rise above .05.
6. Imagine we conduct a one-way independent ANOVA with four levels on our independent variable and obtain a significant result. Given that we had equal sample sizes, we did not make any predictions about which groups would differ before the experiment and we want guaranteed control over the Type I error rate, which would be the best test to investigate which groups differ? (Hint: Post hoc tests do not require specific a priori predictions about which groups will differ.) a) Helmert b) Hochberg’s GT2 c) Bonferroni d) Orthogonal contrasts Ans: C Explanation: If you want guaranteed control over the Type I error rate then use Bonferroni.
7.
The table below shows hypothetical data from an experiment with three conditions.

Condition A    Condition B    Condition C
12             10             9
15             15             13
20             25             31
32             30             27
54             46             50
For these data, sphericity will hold when: (Hint: Sphericity refers to the equality of variances of the differences between treatment levels.) a. variance(A−B) ≠ variance(B−C) ≠ variance(A−C) b. variance(A+B) ≈ variance(B+C) ≈ variance(A+C) c. variance(A) ≈ variance(B) ≈ variance(C) d. variance(A−B) ≈ variance(B−C) ≈ variance(A−C) Ans: D Explanation: Sphericity is met when the variances of the differences between treatment levels are roughly equal.
8. Imagine I had a mean square of 824.16, with 2 degrees of freedom, and a residual of 21.66. What would the F-ratio be? Ans: The correct answer is F = 38.05 Solution: 824.16/21.66 = 38.05
9. The results of a one-way repeated-measures ANOVA with four levels on the independent variable revealed a significance value for Mauchly’s test of p = 0.048. What does this mean? a) The assumption of sphericity has been met. b) The assumption of sphericity has been violated. c) That Tukey’s test should be used. d) This value can be ignored because sphericity is not an issue in a one-way repeated-measures ANOVA design. Ans: B Explanation: The significance value (.048) is less than the critical value of .05, which means that the assumption of sphericity has been violated.
10. Imagine I ran a one-way repeated-measures ANOVA with five levels on the independent variable. The results revealed a Greenhouse–Geisser estimate, ε̂ = .977, and the Huynh–Feldt estimate, ε̃ = .999. What do these values tell us about the assumption of sphericity? (Hint: When the data are perfectly spherical, these estimates will be 1.) a) We do not need to check the assumption of sphericity for a one-way repeated-measures ANOVA. b) We cannot tell anything about the assumption of sphericity from these values. c) The assumption of sphericity is likely to have been violated.
d) The assumption of sphericity is likely to have been met. Ans: D Explanation: For the Greenhouse–Geisser estimate, the lowest possible value is 1/ (k-1) which with five conditions will be 1/ (5-1)= .25. The Greenhouse–Geisser estimate is closer to the upper limit of 1 than to the lower limit of .25, so we do not have many deviations from sphericity at all. 11. Imagine we were interested in the effect of supporters singing on the number of goals scored by soccer teams. We took 10 groups of supporters of 10 different soccer teams and asked them to attend three home games, one at which they were instructed to sing in support of their team (e.g., ‘Come on, you Reds!’), one at which they were instructed to sing negative songs towards the opposition (e.g., ‘You’re getting sacked in the morning!’) and one at which they were instructed to sit quietly. The order of chanting was counterbalanced across groups. Which of the following could be used to analyse these data? (Hint: All participants took part in all experimental conditions.) a) Loglinear analysis b) One-way repeated-measures ANOVA c) Three-way repeated measures ANOVA d) One-way between-groups ANOVA Ans: B Explanation: There was one independent variable (singing) with three levels (positive, negative, and none), and all participants took part in all three conditions. 12. Imagine we were interested in the effect of supporters singing on the number of goals scored by soccer teams. We took 10 groups of supporters of 10 different soccer teams and asked them to attend three home games, one at which they were instructed to sing in support of their team (e.g., ‘Come on, you Reds!’), one at which they were instructed to sing negative songs towards the opposition (e.g., ‘You’re getting sacked in the morning!’) and one at which they were instructed to sit quietly. The order of chanting was counterbalanced across groups. Which of the following sentences regarding the output from Mauchly’s test of sphericity below is correct?
a) Mauchly’s test indicated that the assumption of sphericity had been met, χ²(2) = 2.50, p = .29. b) Mauchly’s test indicated that the assumption of sphericity had been met, χ²(2) = 2.50, p < .05. c) Mauchly’s test indicated that the assumption of sphericity had been violated, χ²(2) = .73, p = .29. d) Mauchly’s test indicated that the assumption of sphericity had not been met, χ²(2) = 2.50, p > .05.
Ans: A Explanation: The significance value of Mauchly’s W is .287, which is larger than .05, indicating that the assumption of sphericity had been met. 13. Which of the following statements is false? When you have data that violate the assumption of sphericity: a) The Greenhouse–Geisser or Huynh–Feldt correction should be applied. b) The degrees of freedom are adjusted for any F-ratios affected by the violation using estimates of sphericity. c) The means are adjusted for any groups that are affected by the violation using estimates of sphericity. d) You can use multivariate test statistics (MANOVA) instead. Ans: C Explanation: This statement is false, therefore it is correct. When sphericity is violated, the means stay the same, it is the degrees of freedom and the resulting F-ratio that change. 14. Imagine we were interested in the effect of supporters singing on the number of goals scored by soccer teams. We took 10 groups of supporters of 10 different soccer teams and asked them to attend three home games, one at which they were instructed to sing in support of their team (e.g., ‘Come on, you Reds!’), one at which they were instructed to sing negative songs towards the opposition (e.g., ‘You’re getting sacked in the morning!’) and one at which they were instructed to sit quietly. The order of chanting was counterbalanced across groups. An ANOVA with simple contrasts using the last category as a reference was conducted. Looking at the output tables below, what does the first contrast (Level 1 vs. Level 3) compare?
a) We cannot tell which groups the levels represent from this output. b) Negative singing vs. positive singing c) No singing vs. negative singing d) Positive singing vs. no singing Ans: D Explanation: If you look at the Within-Subjects Factors box, you can see that level 1 = positive singing, level 2 = negative singing and level 3 = no singing. 15. When conducting a repeated-measures ANOVA which of the following assumptions is not relevant? (Hint: Repeated-measures ANOVA is an extension of the linear model and so all of the sources of potential bias (and counteractive measures) apply.) a) Homogeneity of variance b) They are all relevant c) Sphericity d) Independent residuals Ans: A Explanation: The assumption of homogeneity of variance (the assumption that the variances between groups are roughly equal) is relevant for between-groups ANOVA and not for repeated-measures ANOVA.
16. Based on the ANOVA table below calculate the value of F. (Hint: Calculate the value of MSR first.)

ANOVA
SPAIDIF
                  Sum of Squares    df    Mean Square    F
Between Groups    1582.858          2     791.429        ?????
Within Groups     2142.488          45    ?????
Total             3725.347          47

Ans: The correct answer is 16.62 Solution:

$$MS_R = \frac{SS_R}{df_R} = \frac{2142.488}{45} = 47.6108$$

$$F = \frac{MS_M}{MS_R} = \frac{791.429}{47.6108} = 16.623$$
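The two steps in this solution (mean squares first, then their ratio) can be sketched as a small helper. This is only an illustration under the assumption that the sums of squares and degrees of freedom are read straight from the ANOVA table; the function name `f_ratio` is hypothetical.

```python
def f_ratio(ss_model, df_model, ss_residual, df_residual):
    """Return (MS_M, MS_R, F) from sums of squares and degrees of freedom."""
    ms_model = ss_model / df_model          # mean square for the model
    ms_residual = ss_residual / df_residual  # mean square for the residual
    return ms_model, ms_residual, ms_model / ms_residual


# Values taken from the Between Groups and Within Groups rows of the table
ms_m, ms_r, f_value = f_ratio(1582.858, 2, 2142.488, 45)

print(round(ms_m, 3))     # between-groups mean square
print(round(ms_r, 4))     # within-groups mean square
print(round(f_value, 2))  # F-ratio
```

The same helper applies to any one-way ANOVA table where the sums of squares and degrees of freedom are given but a mean square or F is missing.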
17. Based on the ANOVA table below calculate the value of F. Report your answer to 3 decimal places.

ANOVA
Grade on next assignment – grade on previous assignment (%)
                  Sum of Squares    df    Mean Square    F    Sig.
Between Groups    1810.038          2     905.019             .103
Within Groups     9466.100          26    364.081
Total             11276.138         28

Ans: The correct answer is 2.486 Solution:

$$F = \frac{MS_M}{MS_R} = \frac{905.019}{364.081} = 2.48576$$
18. A researcher conducted an experiment to look at the effect of different coaching strategies on player performance. He took 12 teams over three seasons, and each season he subjected them to one of three different managerial ‘team talk’ strategies counterbalanced across groups. The three strategies were: ‘supportive’ (the manager used words of support and encouragement such as ‘Come on, team, you can do it’), ‘tough love’ (the manager used a more aggressive approach such as ‘Get out there and do better, you useless bunch of losers’), or ‘neutral’ (the manager just discussed tactic strategies with the team). The outcome was the average number of wins over the season out of a total of 38. To analyse his data, the researcher decided to conduct a one-way repeated-measures ANOVA by hand. He calculated the model mean squares as 275.861, the residual mean squares as 7.104 and the residual degrees of freedom as 22. What was the F-ratio? Round your answer to 2 decimal places. Ans: The correct answer is F = 38.83 Solution:
$$F = \frac{MS_M}{MS_R} = \frac{275.861}{7.104} = 38.83$$
19. A researcher conducted an experiment to look at the effect of different relaxation strategies on sleep quality. She took 15 people over four months, and each month she subjected them to one of four different relaxation techniques 1 hour before the participant went to bed (massage, reading, sleepy tea and nothing) counterbalanced across groups. The outcome was the average number of hours of sleep over the course of the month. To analyse her data, the researcher decided to conduct a one-way repeated-measures ANOVA. The Greenhouse–Geisser estimate of sphericity was .587, the original residual degrees of freedom were 42. a. Use the estimate of sphericity to correct the model degrees of freedom. (Round your answer to 2 decimal places.) b. Use the estimate of sphericity to correct the residual degrees of freedom. (Round your answer to 2 decimal places.) Ans: Part a) The correct answer is dfM = 1.76 Part b) The correct answer is dfR = 24.65 Solution: dfM = number of experimental conditions − 1 = 4 − 1 = 3. Corrected dfM = 3 × .587 = 1.76. Corrected dfR = 42 × .587 = 24.65.
Chapter 17: Factorial designs
1. What type of ANOVA is used when there are two independent variables each with more than two levels, and with different participants taking part in each condition? a) Factorial b) One-way independent c) Mixed d) One-way between subjects Ans: A
2. Two-way repeated-measures ANOVA compares: a) Several means when there are two independent variables, and the same entities have been used in some of the conditions. b) Two means when there are more than two independent variables, and the same entities have been used in all conditions. c) Several means when there are two independent variables, and the same entities have been used in all conditions. d) Several means when there are more than two independent variables, and some have been manipulated using the same entities and others have used different entities.
Ans: C

3. Two-way ANOVA is basically the same as one-way ANOVA, except that:
a) We calculate the model sum of squares by looking at the difference between each group mean and the overall mean.
b) The residual sum of squares represents individual differences in performance.
c) The model sum of squares is partitioned into two parts.
d) The model sum of squares is partitioned into three parts.
Ans: D
Explanation: The model sum of squares is partitioned into the effect of each of the independent variables and the effect of how these variables interact.

4. How many effects will there be from a two-way repeated-measures ANOVA?
a) 4
b) 2
c) 3
d) It will vary depending on how many levels of your independent variable there are.
Ans: C
Explanation: There will be the main effect of each variable and the interaction between the two.

5. A recent story in the media has claimed that women who eat breakfast every day are more likely to have boy babies than girl babies. Imagine you conducted a study to investigate this in women from two different age groups (18–30 and 31–43 years). Looking at the output tables below, which of the following sentences best describes the results?
a) The model is a poor fit of the data.
b) There was a significant two-way interaction between eating breakfast and the age group of the mother.
c) Women who ate breakfast were significantly more likely to give birth to baby boys than girls.
d) Whether or not a woman eats breakfast significantly affects the gender of her baby at any age.
Ans: D
Explanation: To interpret this interaction, we could perform a chi-square test on breakfast and gender.

6. An experiment was conducted to see how people with eating disorders differ in their need to exert control in different domains. Participants were classified as not having an eating disorder (control), as having anorexia nervosa (anorexic), or as having bulimia nervosa (bulimic). Each participant underwent an experiment that indicated how much they felt the need to exert control in three domains: eating, friendships and the physical world (this final category was a controlled domain in which the need to have control over things like gravity or the weather was assessed). So all participants gave three responses in the form of a mean reaction time; a low reaction time meant that the person did feel the need to exert control in that domain. The variables have been labelled as a group (control, anorexic, or bulimic) and domain (food, friends, or physical laws). Of the following options, which analysis should be conducted?
a) Two-way repeated-measures ANOVA
b) Three-way independent ANOVA
c) Two-way mixed ANOVA
d) Analysis of covariance
Ans: C
Explanation: Group is a between-subjects variable and domain is a within-subjects variable.

7. A study was conducted to look at whether caffeine improves productivity at work in different conditions. There were two independent variables. The first independent variable was email, which had two levels: ‘email access’ and ‘no email access’. The second independent variable was caffeine, which also had two levels: ‘caffeinated drink’ and ‘decaffeinated drink’.
Different participants took part in each condition. Productivity was recorded at the end of the day on a scale of 0 (I may as well have stayed in bed) to 20 (wow! I got enough work done today to last all year). Looking at the group means in the table below, which of the interpretations below is correct?

            Decaffeinated Drink   Caffeinated Drink
No Email    12.08                 19.83
Email       11.98                 5.49

a) A simple effects analysis is likely to show an effect of email on productivity at both levels of caffeine.
b) A simple effects analysis is likely to show an effect of caffeine on productivity at both levels of email.
c) A simple effects analysis is likely to show an effect of email on productivity for decaffeinated drinks but not caffeinated ones.
d) A simple effects analysis is likely to show an effect of caffeine on productivity only for ‘no email’.
Ans: B
Explanation: The effect of caffeine for no email is 19.83 – 12.08 = 7.75, and for email it is 5.49 – 11.98 = –6.49; neither of these differences is 0.

8. A study was conducted to look at whether caffeine improves productivity at work in different conditions. There were two independent variables. The first independent variable was email, which had two levels: ‘email access’ and ‘no email access’. The second independent variable was caffeine, which also had two levels: ‘caffeinated drink’ and ‘decaffeinated drink’. Different participants took part in each condition. Productivity was recorded at the end of the day on a scale of 0 (I may as well have stayed in bed) to 20 (wow! I got enough work done today to last all year). Looking at the group means in the table below, which of the following statements best describes the data?

            Decaffeinated drink   Caffeinated drink
No email    12.08                 19.83
Email       11.98                 5.49

a) A significant interaction effect is likely to be present between caffeine consumption and email access.
b) The effect of caffeine is about the same regardless of whether the person had email access.
c) The effect of email is relatively unaffected by whether the drink was caffeinated.
d) There is likely to be a significant main effect of caffeine.
Ans: A
Explanation: Yes, this is correct: for decaffeinated drinks, there is little difference between email and no email, but for caffeinated drinks there is.

9. An experiment was done to look at whether there is an effect of the number of hours spent practising a musical instrument and gender on the level of musical ability. A sample of 30 (15 men and 15 women) participants who had never learnt to play a musical instrument before were recruited. Participants were randomly allocated to one of three groups that varied in the number of hours they would spend practising every day for 1 year (0 hours, 1 hour, 2 hours). Men and women were divided equally across groups. All participants had a one-hour lesson each week over the course of the year, after which their level of musical skill was measured on a 10-point scale ranging from 0 (you can’t play for toffee) to 10 (‘Are you Mozart reincarnated?’). An ANOVA was conducted on the data from the experiment. Looking at the output below, which of the following sentences is correct?
a) There was a significant main effect of gender. Looking at the group means, we can see that women achieved a significantly higher level of musical skill (M = 5.53) than males (M = 4.87).
b) There was a significant main effect of practice. This means that men and women significantly differed in the amount of practice that they did.
c) There was no significant main effect of gender. This means that overall when we ignore the number of hours spent practising, the gender of the participant did not have a significant effect on the level of musical skill.
d) There was no significant practice × gender interaction effect, indicating that if we ignore the gender of the participant, the number of hours spent practising did not significantly affect the level of musical skill.
Ans: C
Explanation: If you look at the means in the table labelled Gender you can see that although women scored slightly higher (M = 5.53) than men (M = 4.87), the means were very similar.

10. An experiment was conducted to see how people with eating disorders differ in their need to exert control in different domains. Participants were classified as not having an eating disorder (control), as having anorexia nervosa (anorexic), or as having bulimia nervosa (bulimic). Each participant underwent an experiment that indicated how much they felt the need to exert control in three domains: eating, friendships and the physical world (this final category was a controlled domain in which the need to have control over things like gravity or the weather was assessed). So all participants gave three responses in the form of a mean reaction time; a low reaction time meant that the person did feel the need to exert control in that domain. The variables have been labelled as a group (control, anorexic or bulimic) and domain (food, friends, or physical laws). Looking at the output below, what can we conclude about the main effect of the domain variable?
a) There was a significant effect of domain, F(2, 54) = 8.02, p < .01, on the degree to which people felt the need to exert control.
b) There was not a significant effect of the domain, F(2, 54) = 8.02, p < .05, on the degree to which people felt the need to exert control.
c) People with eating disorders need to exert more control over different domains of their life compared to controls, F(1.55, 41.89) = 8.02, p < .01.
d) There was a significant effect of the domain, F(1.55, 41.89) = 8.02, p = .001, on the degree to which people felt the need to exert control.
Ans: D
Explanation: The significant result for Mauchly’s test indicates that the assumption of sphericity has not been met. However, both the Greenhouse–Geisser and Huynh–Feldt corrections were significant.

11. An experiment was done to look at whether there is an effect of the number of hours spent practising a musical instrument and gender on the level of musical ability. A sample of 30 (15 men and 15 women) participants who had never learnt to play a musical instrument before were recruited.
Participants were randomly allocated to one of three groups that varied in the number of hours they would spend practising every day for 1 year (0 hours, 1 hour, 2 hours). Men and women were divided equally across groups. All participants had a one-hour lesson each week over the course of the year, after which their level of musical skill was measured on a 10-point scale ranging from 0 (you can’t play for toffee) to 10 (‘Are you Mozart reincarnated?’). Which of the following tests could we use to analyse these data?
a) Two-way independent ANOVA
b) Two-way repeated-measures ANOVA
c) Three-way ANOVA
d) t-test
Ans: A
Explanation: We have two independent variables, ‘gender’ and ‘number of hours spent practising’, and different participants took part in each condition.

12. Field and Lawson (2003) reported the effects of giving children aged 7–9 years positive, negative or no information about novel animals (Australian marsupials). This variable was called ‘Infotype’. The gender of the child was also examined. The outcome was the time taken for the children to put their hand in a box in which they believed either the positive, negative or no information animal was housed (positive values = longer than average approach times, negative values = shorter than average approach times). Some simple contrasts were performed on the data. Based on the SPSS output given, which of the following statements is true? (Levels of Infotype were entered in the following order: negative information, positive information, no information.) (Hint: The order that the levels of the independent variable are entered corresponds to the level of the contrast in the output: negative information = level 1, positive information = level 2, no information = level 3.)

Tests of Within-Subjects Effects
Measure: MEASURE_1

Source              	                    Type III Sum of Squares   df       Mean Square   F       Sig.
INFOTYPE            Sphericity Assumed    9.177                     2        4.588         7.283   .001
                    Greenhouse-Geisser    9.177                     1.940    4.730         7.283   .001
                    Huynh-Feldt           9.177                     2.000    4.588         7.283   .001
                    Lower-bound           9.177                     1.000    9.177         7.283   .010
INFOTYPE * GENDER   Sphericity Assumed    .599                      2        .299          .475    .623
                    Greenhouse-Geisser    .599                      1.940    .309          .475    .618
                    Huynh-Feldt           .599                      2.000    .299          .475    .623
                    Lower-bound           .599                      1.000    .599          .475    .495
Error(INFOTYPE)     Sphericity Assumed    51.664                    82       .630
                    Greenhouse-Geisser    51.664                    79.544   .650
                    Huynh-Feldt           51.664                    82.000   .630
                    Lower-bound           51.664                    41.000   1.260

Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average

Source      Type III Sum of Squares   df   Mean Square   F      Sig.
Intercept   2.034E-02                 1    2.034E-02     .049   .826
GENDER      1.822E-03                 1    1.822E-03     .004   .948
Error       17.109                    41   .417

Tests of Within-Subjects Contrasts
Measure: MEASURE_1

Source              INFOTYPE              Type III Sum of Squares   df   Mean Square   F       Sig.
INFOTYPE            Level 1 vs. Level 3   11.090                    1    11.090        8.762   .005
                    Level 2 vs. Level 3   .447                      1    .447          .420    .521
INFOTYPE * GENDER   Level 1 vs. Level 3   1.177                     1    1.177         .930    .341
                    Level 2 vs. Level 3   .174                      1    .174          .163    .688
Error(INFOTYPE)     Level 1 vs. Level 3   51.896                    41   1.266
                    Level 2 vs. Level 3   43.689                    41   1.066

a) Approach times for the box containing the positive animal were significantly shorter than for the box containing the control (no information) animal.
b) Approach times for the box containing the negative animal were significantly longer than for the box containing the control (no information) animal.
c) Approach times for the box containing the negative animal were not significantly different from those for the box containing the positive information animal.
d) The profile of results was different for boys and girls.
Ans: B
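
To see where the F values in the contrasts table of question 12 come from, the Python sketch below recomputes each one as the effect mean square divided by the matching error mean square. The sums of squares and degrees of freedom are copied from the 'Tests of Within-Subjects Contrasts' table; the function and variable names are illustrative and not part of the SPSS output.

```python
def f_from_sums_of_squares(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# (sum of squares, df) for the error term of each contrast, from the table.
error_terms = {
    "Level 1 vs. Level 3": (51.896, 41),
    "Level 2 vs. Level 3": (43.689, 41),
}

# (sum of squares, df) for each effect and contrast, from the table.
effects = {
    ("INFOTYPE", "Level 1 vs. Level 3"): (11.090, 1),
    ("INFOTYPE", "Level 2 vs. Level 3"): (0.447, 1),
    ("INFOTYPE * GENDER", "Level 1 vs. Level 3"): (1.177, 1),
    ("INFOTYPE * GENDER", "Level 2 vs. Level 3"): (0.174, 1),
}

f_values = {}
for (source, contrast), (ss, df) in effects.items():
    ss_error, df_error = error_terms[contrast]
    f = f_from_sums_of_squares(ss, df, ss_error, df_error)
    f_values[(source, contrast)] = f
    print(f"{source:<18} {contrast}: F({df}, {df_error}) = {f:.3f}")
```

The recomputed values should agree with the F column of the table up to rounding of the reported sums of squares. Only the INFOTYPE contrast of Level 1 (negative information) against Level 3 (no information) gives a large F, about 8.76, which is the contrast that answer B describes.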