Research Methodology and Statistical Reasoning
Introductory Notes
Johar
Contents

1 Importance of Statistics
  1.1 Introduction to Research
  1.2 Population and Sample
  1.3 Research Problem and Hypothesis
  1.4 Variables

2 Research Methods and Design
  2.1 The Experimental Method
  2.2 Between Subjects or Independent Groups Design
  2.3 Repeated Measures or Within Subjects Design
  2.4 Complex/Factorial Designs
  2.5 Non-Experimental Design
  2.6 Quasi-Experimental or Natural Groups Design
  2.7 Data Analyses of Observational and Descriptive Data
  2.8 Case Study
  2.9 Survey Research
  2.10 Choosing the Right Research Method

3 The Normal Curve and its Importance in Choosing a Statistic
  3.1 The Normal Curve and its Properties
  3.2 Skewness, Kurtosis and Tests of Normality

4 NOIR
  4.1 Nominal Scale
  4.2 Ordinal Scale
  4.3 Interval Scale
  4.4 Ratio Scale
  4.5 Concluding Remarks

5 Descriptive Statistics
  5.1 Measures of Central Tendency: Mean
  5.2 Measures of Central Tendency: Median, Mode
  5.3 Measures of Variability: Range
  5.4 Measures of Variability: Quartile Deviation
  5.5 Measures of Variability: Variance
  5.6 Measures of Variability: Standard Deviation

6 Inferential Statistics

7 Interpreting a Statistic
  7.1 Factors related to Statistics
  7.2 Effect Size and Practical Significance

8 Common Errors and Biases
  8.1 Sources of Bias
  8.2 Errors in Methodology

9 Software
  9.1 SPSS
  9.2 R
  9.3 PSPP
1 Importance of Statistics

1.1 Introduction to Research

This course focuses on the basic components of research and aims to improve one's understanding of how to carry out research.
1.2 Population and Sample

Consider a study where participants have to learn two lists of words: words denoting emotion and words having a non-emotional or neutral meaning. You want to study whether participants remember one category of words better than the other.
• Population: This is all the data you are interested in; a population can be large or small, as long as it covers everything you want to draw conclusions about. In the above example, the population can be defined as all persons between the ages of 16 and 40.
• Sample: This is the subset of the population you actually study; a sample must be adequate in size and representative of the population it is drawn from. Using the example above, your sample might be 300 individuals who have volunteered to be a part of your study.
• The characteristics of a population are called parameters, e.g., the standard deviation of the population.
• The characteristics of a sample are called statistics. Inferential statistics are calculated from the sample in order to estimate population parameters.
• Sampling can be done in two ways: probability sampling and non-probability sampling.
• In probability sampling, every unit of the population has an equal probability (or chance) of being selected into the sample. Examples are random sampling, stratified sampling, and cluster sampling (a short sketch follows this list).
• In non-probability sampling, the units of the population do not all have an equal probability (or chance) of being selected into the sample. Examples are quota sampling, purposive sampling, and convenience sampling.
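To make the sampling distinction concrete, here is a minimal Python sketch (not part of the original notes). The population size, the age-band split, and the sample sizes are illustrative assumptions.

```python
import random

random.seed(42)  # for reproducibility

# Hypothetical sampling frame: IDs of everyone aged 16-40 who could take part.
population = list(range(1, 10001))   # 10,000 potential participants (assumed)

# Simple random sampling: every unit has an equal chance of selection.
simple_random_sample = random.sample(population, k=300)

# Stratified sampling: split the frame into strata (here, two assumed age bands)
# and sample from each, so both bands are represented in the final sample.
younger = population[:6000]          # assumed stratum, e.g. ages 16-27
older = population[6000:]            # assumed stratum, e.g. ages 28-40
stratified_sample = (random.sample(younger, k=180) +
                     random.sample(older, k=120))

print(len(simple_random_sample), len(stratified_sample))  # 300 300
```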
1.3 Research Problem and Hypothesis

• One of the first steps in research is identifying a research problem and a hypothesis.
• Hypothesis: This is a tentative explanation of a problem, expressed as a prediction of some outcome. For instance, it is hypothesized that words with an emotional connotation will be recalled better than neutral words.
• Research Problem: This is framed as a question or statement that speculates about the relationship between two variables. Unlike a hypothesis, a research problem is exploratory in nature and does not make a prediction about the variables.
1.4 Variables

• Variable: In research, a variable is a construct or characteristic that takes on different values.
• Independent Variable (IV): The variable that the researcher manipulates or selects to determine its effect on the Dependent Variable (DV). In the above example, the IV is the category of the words: emotional or neutral.
• In its simplest form, the IV has at least two levels: one where some form of treatment is present, and another where the same treatment is absent. In our example, the IV has two levels: emotional and neutral.
• Dependent Variable (DV): This is the measure of behaviour used to observe the effects of the IV. Its outcome depends on the independent variable. Research aims to determine whether the levels of the IV cause any difference in the DV. The DV in our example is participants' performance on a recall test.
• Sometimes, the DV can be explained by some variable other than the IV. This other variable is called a confounding variable. A possible confound in our example is a participant who has already undergone a similar experiment in the past, or who is a psychology student and is therefore acquainted with the experimental hypothesis.
• When such a variable is present, it is difficult to tell whether the change in the DV is brought about by the IV or by the confounding variable.
• The effects of these confounding variables can be managed by introducing controls in the experiment.
• One such control is the technique of counterbalancing, which helps correct for order effects. Counterbalancing involves presenting the conditions in all possible orders so as to neutralize order effects. In our example, we would present half the participants with the emotional list first followed by the neutral list, whereas the other half would get the neutral list followed by the emotional list (see the sketch below).
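A minimal sketch of how counterbalanced presentation orders could be generated and assigned. The participant IDs and condition labels are assumptions made for illustration.

```python
from itertools import permutations

# The two conditions (levels of the IV) from the running example.
conditions = ["emotional", "neutral"]

# All possible presentation orders: with two lists there are just two.
orders = list(permutations(conditions))
# [('emotional', 'neutral'), ('neutral', 'emotional')]

# Assign participants to orders in rotation so each order is used equally often.
participants = [f"P{i:03d}" for i in range(1, 301)]   # hypothetical IDs
assignment = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

print(assignment["P001"])  # ('emotional', 'neutral')
print(assignment["P002"])  # ('neutral', 'emotional')
```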
2 Research Methods and Design

2.1 The Experimental Method

• An experiment involves the manipulation of one or more IVs in order to observe their effects on one or more DVs. An experiment allows us to:
– Test a hypothesis: By allowing us to exercise controls, experiments can attempt to eliminate extraneous factors and test a hypothesized relationship between the IV and the DV.
– Make causal inferences between the IV and DV: Experiments also allow us to state with a high degree of confidence that changes in the IV cause changes in the DV.
• Three conditions are necessary in order to establish a causal link between the IV and the DV:
– Covariation: There is an observed relationship between the IV and the DV.
– Time-order relationship: The change in the DV is observed after the IV is manipulated, implying that the changes are contingent on the manipulation of the IV.
– Elimination of plausible alternative causes: Accomplished by the use of controls and counterbalancing.
2.2 Between Subjects or Independent Groups Design

• Main characteristic: Each participant undergoes only one level of the IV.
• Limitation: Individual differences among participants in the different groups might confound the results.
• For this reason, the groups are matched, i.e., the researcher ensures that the groups are similar to each other on important characteristics that might confound the results.
• For instance, if socio-economic status is a possible confounding factor, the researcher can classify participants into low-, middle-, and high-income groups on the basis of reported characteristics. Each experimental group is then made up of respondents from each of these levels, so the groups are comparable before they undergo the different conditions of the IV.
• These matched participants are then randomly assigned to one of the conditions or levels of the IV. The rationale is that random assignment will neutralize or balance out the individual differences between participants, so that the groups are similar in all characteristics except the condition or level of the IV they undergo.
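A possible sketch of random assignment within matched groups, assuming three income strata of 20 hypothetical participants each and the two word-list conditions from the running example.

```python
import random

random.seed(1)

# Hypothetical participants grouped by the matching variable (socio-economic status).
matched_groups = {
    "low":    [f"L{i}" for i in range(1, 21)],
    "middle": [f"M{i}" for i in range(1, 21)],
    "high":   [f"H{i}" for i in range(1, 21)],
}
conditions = ["emotional_list", "neutral_list"]

# Within each matched group, shuffle and deal participants alternately to the
# conditions, so every condition contains a similar mix of SES levels.
assignment = {c: [] for c in conditions}
for members in matched_groups.values():
    shuffled = random.sample(members, k=len(members))
    for i, person in enumerate(shuffled):
        assignment[conditions[i % len(conditions)]].append(person)

print({c: len(people) for c, people in assignment.items()})
# {'emotional_list': 30, 'neutral_list': 30}
```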
2.3 Repeated Measures or Within Subjects Design
• Main characteristic: Each participant undergoes all levels of the IV.
• When is it used: This design is suitable when the sample is small or specialized, or to study changes in behaviour over time (e.g., learning research). Compared to the Between Subjects Design, individual differences do not have a confounding effect on the DV in this design.
• Limitations: Due to repeated testing in this design, participants might show practice effects. Their performance can either get better from one condition to the next, or worse across conditions due to fatigue effects.
• Practice effects can be controlled by counterbalancing the order of presentation of the conditions.
2.4 Complex/Factorial Designs

• Often, research involves more than one IV, in which case a complex or factorial design is used.
• In this design, each level of IV1 is paired with each level of IV2, so that we can observe each IV independently (main effect) as well as in interaction with the other IVs (interaction effect).
• For example: We hypothesize that the audio-visual method of teaching leads to better performance on a class test than the blackboard method in low-ability students. Here, we have 2 IVs: method of teaching (manipulated IV1): audio-visual or blackboard; and ability of the student (selected IV2): low, middle, or high.
• An example of a main effect would be higher test scores under the audio-visual method than the blackboard method across all three groups of students. An interaction effect would be a specific improvement in performance when the low-ability group is taught using the audio-visual method.
• The number of conditions in a complex design is obtained by multiplying the number of levels of all IVs involved. Taking the above example, since IV1 has two levels and IV2 has three levels, this design is represented as a 2 × 3 factorial design (see the sketch after this list).
• Factorial designs can be of three types: completely between subjects factorial designs, completely within subjects factorial designs, and mixed factorial designs.
• In a completely between subjects factorial design, different sub-groups are randomly assigned to undergo the different combinations of levels of the independent variables. For example, in a 2 × 2 factorial design, there will be 4 subgroups, one for each combination of levels. Thus, both IVs are between-groups IVs. For instance, IV1 is gender (male, female), and IV2 is location (urban, rural).
• In a completely within subjects factorial design, the same group of participants goes through all the levels of the IVs. For example, in a 2 × 2 factorial design where IV1 is stimulus (word, image) and IV2 is valence (positive, negative), all participants will be presented with positive words, positive images, negative words, and negative images.
• In a mixed factorial design, one IV is a between subjects variable and the other IV is a within subjects variable. For example, in a 2 × 2 factorial design where IV1 is gender (male, female) and IV2 is stimulus (word, image), both male and female participants will be presented with images and words. Thus, participants belong to different levels of one independent variable and participate in all the levels of the other variable. It is a combination of a completely between subjects design and a completely within subjects design.
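The "multiply the levels" rule can be checked with a few lines of Python; the level names below simply mirror the teaching-method example and are not prescribed by these notes.

```python
from itertools import product

# Levels of the two IVs from the teaching-method example.
teaching_method = ["audio-visual", "blackboard"]   # IV1: 2 levels
student_ability = ["low", "middle", "high"]        # IV2: 3 levels

# Each condition of a factorial design pairs one level of IV1 with one level of IV2.
conditions = list(product(teaching_method, student_ability))

print(len(conditions))   # 6 conditions -> a 2 x 3 factorial design
for condition in conditions:
    print(condition)
```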
2.5 Non-Experimental Design

It is often not practical to observe behaviour in the strictly controlled conditions that experimental research demands. A quasi-experimental design is one that applies an experimental interpretation to results that do not meet all the requirements of a true experiment. It lacks the degree of control found in true experiments: it controls some but not all of the confounding variables, which poses a threat to its internal validity.
2.6 Quasi-Experimental or Natural Groups Design
• Main characteristic: This design uses existing individual differences among participants. Contrary to a Between-Subjects Design, where the aim is to neutralize these differences across conditions, a Natural Groups Design is built around one or a few such pre-existing differences.
• For example, research on brain lesions would require a select sample of participants who already have brain lesions. Similarly, to study the effects of divorce, one would require a group of divorced persons.
2.7 Data Analyses of Observational and Descriptive Data

• Data analyses of observational and descriptive data can involve the following:
– Data Reduction Procedures: These procedures help abstract and summarize the data. An example of such a procedure is coding, in which the data are broken down into smaller units and classified as per pre-decided categories.
– Content Analysis: Such an analysis can be done with archival records, written communications, film and other media, and online material like tweets or blog entries. Once the source of the content analysis is decided, the next step involves sample selection from the source (for instance, 10 random blog entries made by a celebrity for a month, up to 6 months). Next, the content will be coded in order to further interpret emerging patterns.
– Descriptive Statistics can also be computed for such data on the basis of frequency counts, timing, and ratings.
– When data are independently observed, rated, or analyzed by two or more judges/raters, inter-rater agreement can be calculated on the basis of the percentage of agreement between the raters.
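A minimal sketch of percentage inter-rater agreement; the two raters' codes are invented for illustration.

```python
# Hypothetical codes assigned by two raters to the same ten observations.
rater_a = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg"]
rater_b = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neg", "pos", "neg"]

# Percentage agreement: proportion of observations given the same code by both raters.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)

print(f"{percent_agreement:.1f}% agreement")   # 80.0% agreement
```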
2.8 Case Study

• Case studies are commonly used in fields such as clinical psychology, neurology, anthropology, and criminal psychology. They describe and analyze one individual. Case studies lack the level of control used in experimental research.
• Scales may be used in such research to measure participants' judgements or their relative standing on a personality trait.
2.9 Survey Research

• Survey data can be obtained through personal interviews, telephone interviews, or internet surveys.
• Most surveys use questionnaires as the means to measure the variables of study. Questionnaires mainly provide a measure of the demographic details of the participants along with their attitudes and preferences.
• The accuracy of a questionnaire depends on the care taken in framing its contents and on the specificity of its language, which helps elicit unambiguous responses from participants.
2.10 Choosing the Right Research Method

The choice of method depends on several considerations:
• Existing literature: reviewing the methodology used in similar studies in the past.
• Goal of the study: an observational study is more likely to require a qualitative methodology than a purely experimental one.
• Nature of the data collected: the level of measurement of the data determines the analyses that can be performed.
3 The Normal Curve and its Importance in Choosing a Statistic

3.1 The Normal Curve and its Properties

• An important aspect of the description of a variable is the shape of its frequency distribution, which describes how often values from different ranges of the variable occur. Data are normally distributed when the distribution is symmetric and bell-shaped, with the mean, median, and mode equal to one another.
• A normally distributed curve is also called the bell curve because of its shape; 50% of the values lie below the mean and 50% lie above it.
• The exact shape of the normal distribution is defined by a function with two parameters: the mean and the standard deviation. The standard deviation is a measure of dispersion that shows how spread out the data are.
• A characteristic property of the normal distribution is that approximately 68% of observations fall within ±1 standard deviation of the mean, approximately 95% fall within ±2 standard deviations, and approximately 99.7% fall within ±3 standard deviations. A standard normal distribution has mean = 0 and standard deviation = 1.
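These coverage figures can be verified numerically. The sketch below is illustrative and assumes SciPy is available; it uses the standard normal distribution (mean 0, SD 1).

```python
from scipy import stats

# Standard normal distribution: mean = 0, standard deviation = 1.
z = stats.norm(loc=0, scale=1)

# Probability mass within +/- 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    coverage = z.cdf(k) - z.cdf(-k)
    print(f"within +/-{k} SD: {coverage:.3%}")
# within +/-1 SD: 68.269%
# within +/-2 SD: 95.450%
# within +/-3 SD: 99.730%
```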
3.2 Skewness, Kurtosis and Tests of Normality

• Significant skewness or kurtosis clearly indicates that data are not normal. Skewness is a measure of the lack of symmetry. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
• The skewness and kurtosis of the normal distribution are 0 and 3 respectively. If skewness is markedly different from 0, the data are asymmetrically distributed; if kurtosis is markedly different from 3, the data are more peaked or flatter than a normal distribution.
• The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set. In statistics, normality tests are used to determine whether a data set follows a normal distribution; in descriptive statistics, a goodness-of-fit test is used.
• Parametric tests assume a normal distribution, whereas non-parametric tests do not require the data to be normally distributed. A normal distribution is also assumed when calculating many confidence intervals.
• For inferential statistics, commonly used tests of normality include the Kolmogorov-Smirnov test (N > 50) and the Shapiro-Wilk W test (N < 50).
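A sketch of how these checks might be run in Python with SciPy; the simulated scores are an assumption, and, strictly speaking, estimating the normal parameters from the sample calls for a corrected Kolmogorov-Smirnov (Lilliefors) test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=40)   # simulated scores (assumed)

print("skewness:", stats.skew(data))                     # ~0 for normal data
print("kurtosis:", stats.kurtosis(data, fisher=False))   # ~3 for normal data

# Shapiro-Wilk W test, commonly recommended for smaller samples (N < 50).
w_stat, p_shapiro = stats.shapiro(data)
print("Shapiro-Wilk p =", p_shapiro)   # p > .05: no evidence against normality

# Kolmogorov-Smirnov test against a normal distribution with the sample's mean and SD.
d_stat, p_ks = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
print("K-S p =", p_ks)
```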
4 NOIR
A factor that determines the amount of information that can be provided by a variable is its scale of measurement. There are four basic levels of measurement, namely: nominal, ordinal, interval, and ratio.
4.1 Nominal Scale
• The scale involves categorizing an event into one of a number of discrete categories. This scale of measurement is used when behaviours and events are classified into mutually exclusive categories. The variable may also be referred to as a categorical variable.
• One observation cannot fall under more than one category. A very common way of summarizing nominal data is by reporting the frequency, in the form of a proportion or percentage, in each of the categories.
• Examples of variables that would typically fall under this category are gender and marital status. The attributes of a variable are only named, and the measure of central tendency that can be used is the mode.
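A small sketch of summarizing a nominal variable by frequencies, proportions, and the mode; the marital-status values below are invented for illustration.

```python
from collections import Counter

# Hypothetical nominal variable: marital status of 12 respondents.
marital_status = ["single", "married", "single", "divorced", "married", "single",
                  "married", "single", "widowed", "single", "single", "married"]

counts = Counter(marital_status)
total = len(marital_status)

# Frequency table with proportions for each category.
for category, n in counts.most_common():
    print(f"{category:8s} {n:2d}  ({n / total:.0%})")

print("mode:", counts.most_common(1)[0][0])   # 'single', the most frequent category
```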
4.2 Ordinal Scale

• The second level of measurement is called the ordinal scale. This involves ranking the events that need to be measured.
• When using an ordinal scale, the central tendency of a group of items can be described by the group's mode or median.
• The intervals between values cannot be meaningfully interpreted on an ordinal scale.
• Other statistical procedures suitable for an ordinal scale are percentiles and rank-order correlations, but not the mean. Here the attributes of the variable can only be ordered.
• Typical ordinal variables are rankings of scores, brands, or products. These rankings only give the direction of difference between values, not the degree of difference.
4.3 Interval Scale

• The third level of measurement is the interval scale, which involves specifying how far apart two events are on a given dimension. Interval scales specify the relative size or degree of difference between the items measured.
• In an interval measurement, the distances between attributes have meaning, but an interval scale does not have an absolute, meaningful zero point.
• The intervals between adjacent scale values are equal. Product-moment correlations can be computed for variables measured on such scales.
4.4 Ratio Scale

• The fourth level of measurement is called a ratio scale. A ratio scale has all the properties of an interval scale, but it also has an absolute zero point.
• In terms of arithmetic operations, an absolute zero point makes ratios of scale values meaningful.
• Examples of variables commonly measured on a ratio scale are weight, length, and reaction time, where zero is absolute and indicates the complete absence of the quantity. (Temperature in Celsius or Fahrenheit, by contrast, is only an interval variable, because its zero point is not absolute.)
• The scale that contains the richest information about an object is ratio scaling. The ratio scale also contains all of the information of the previous three levels.
4.5 Concluding Remarks
The four levels of measurement are very important for analyzing the results of a research study. At the lower levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a lower one (nominal or ordinal).
5 Descriptive Statistics

Descriptive statistics allow us to i) describe and ii) summarize raw data. They help make sense of vast quantities of raw data by revealing emerging patterns and trends. These statistics do not allow us to i) draw conclusions about the population beyond the available data, or ii) test hypotheses. Descriptive statistics provide the summary values on the basis of which inferential procedures can be conducted.
5.1 Measures of Central Tendency: Mean

These measures represent a single value that describes the data by identifying the central position within the data.

Arithmetic Mean
• The most frequently used measure of central tendency, the mean, is the sum of all measurements divided by the number of observations in the data set.
• The mean includes every value in the data set in its calculation. This makes it a more sensitive measure than the median or mode, as even small changes in the data set are reflected in it.
• However, the mean is susceptible to the presence of outliers in the data set. Outliers can artificially increase or decrease the value of the mean. In such cases, the mean can be used after treating the outliers with an appropriate method; otherwise the median is more representative.
• Another situation where the mean may not be the best measure is when the distribution is significantly skewed. In a normal distribution the mean, median, and mode are the same. Skewness, however, drags the mean away from the central location, making it unrepresentative of the data. In such situations, the median is a better measure as it is less affected by skewness.
5.2 Measures of Central Tendency: Median, Mode

Median
• The median is the middle value or score in an ordered data set.
• Unlike the mean, it is less affected by outliers or skewness. As a rule of thumb, when the data set is skewed, the median is more representative of the central tendency than the mean.

Mode
• The mode is the most frequent value or score in a data set. It represents the most popular alternative chosen by the sample, and is often used with categorical data.
• However, the mode need not be unique. When two or more values or options share the same highest frequency, the data set has multiple modes, and it is difficult to decide which mode is most representative of the data.
• The mode is also not the most representative measure when the most common score is far away from the rest of the data.
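A short numeric illustration of how a single outlier pulls the mean but barely moves the median; the scores are invented and the exercise only uses the Python standard library.

```python
import statistics

scores = [52, 55, 58, 60, 61, 63, 65, 67, 70]

print(statistics.mean(scores))     # 61.22...
print(statistics.median(scores))   # 61

# The mode is meaningful when values repeat, e.g. for categorical or discrete data.
print(statistics.mode([1, 2, 2, 3, 2]))   # 2

# One extreme outlier drags the mean upward but barely moves the median.
with_outlier = scores + [480]
print(statistics.mean(with_outlier))     # 103.1
print(statistics.median(with_outlier))   # 62.0
```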
5.3 Measures of Variability: Range

These measures describe the spread of the data set. For instance, if the mean score of a data set of 150 students is 60 out of 100, not all students will have scored around this number. A measure of variability tells us how the scores are spread across the 150 students. Measures of central tendency are usually reported along with measures of variability to give a holistic picture of the data set. If the spread of the data is large, the mean does not adequately represent the data, because a large spread means that there are large differences between individual scores. From a research perspective, a small spread is more desirable, as it implies smaller variability in our sample.

Range
• The range is the difference between the highest and lowest score.
5.4 Measures of Variability: Quartile Deviation

• Quartiles describe the spread of the data by dividing the ordered data set into quarters.
• Quartiles are less affected by skewness and outliers, and are often reported with the median in such cases.
• The second quartile is equivalent to the median.
• Quartiles are often reported as the interquartile range, which is the difference between the first and the third quartile. The quartile deviation (or semi-interquartile range) is half the interquartile range.
• However, the quartiles do not take into account every score in the data set. Thus, measures like the variance are more representative of the spread.
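A sketch computing the range, quartiles, interquartile range, and quartile deviation for an invented set of scores, assuming NumPy is available.

```python
import numpy as np

scores = np.array([35, 42, 48, 50, 53, 55, 58, 60, 64, 67, 71, 78])

data_range = scores.max() - scores.min()           # range: highest minus lowest

q1, q2, q3 = np.percentile(scores, [25, 50, 75])   # quartiles; q2 is the median
iqr = q3 - q1                                      # interquartile range
quartile_deviation = iqr / 2                       # semi-interquartile range

print("range:", data_range)
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr, " quartile deviation:", quartile_deviation)
```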
5.5 Measures of Variability: Variance

• Variance measures the deviation of the scores from the mean: the deviations are squared (to remove negative values) and then averaged.
• If the data are mostly centred around the mean, the variance will be small. A large variance implies more spread in the data.
• Disadvantages: Because the variance squares the deviations from the mean, it can give undue weight to outliers. The variance is also expressed in squared units rather than the original unit of measurement, which makes it difficult to relate the variance back to the data set.
5.6 Measures of Variability: Standard Deviation

• The SD describes how much the scores in a data set deviate from the mean. It is calculated as the square root of the variance and symbolized by σ for the population SD and s for the sample SD. Like the mean, the SD is used with continuous data and is appropriate only when the distribution is not skewed and does not contain outliers.
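A sketch contrasting the population and sample formulas for variance and standard deviation (σ² and σ vs. s² and s) on an invented data set, assuming NumPy is available.

```python
import numpy as np

scores = np.array([52, 55, 58, 60, 61, 63, 65, 67, 70])

# Population formulas (divide by N): sigma squared and sigma.
pop_variance = scores.var(ddof=0)
pop_sd = scores.std(ddof=0)

# Sample formulas (divide by N - 1): s squared and s, used when estimating a population.
sample_variance = scores.var(ddof=1)
sample_sd = scores.std(ddof=1)

print(pop_variance, pop_sd)        # variance is in squared units, SD in original units
print(sample_variance, sample_sd)  # slightly larger, because of the N - 1 divisor
```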
6 Inferential Statistics

• Inferential statistics are used to draw conclusions and test hypotheses. When using inferential statistics, we analyze the sample in order to draw conclusions about relationships, causal links, or predictions in the population we are interested in.
• Thus, an important precondition for using inferential statistics is having a representative sample, which is achieved by using the right sampling technique. However, sampling naturally involves sampling error, and a sample cannot be expected to represent the population perfectly.
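As one concrete, illustrative example of an inferential test, the sketch below runs an independent-samples t-test on invented recall scores for the emotional and neutral word lists; it is a possible analysis under these assumptions, not the one prescribed by these notes.

```python
from scipy import stats

# Hypothetical recall scores (words remembered out of 20) for two independent groups.
emotional_group = [14, 16, 15, 17, 13, 18, 16, 15, 14, 17]
neutral_group   = [12, 13, 11, 14, 12, 15, 13, 12, 11, 14]

# Independent-samples t-test: do the group means differ more than chance would allow?
t_stat, p_value = stats.ttest_ind(emotional_group, neutral_group)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (conventionally < .05) would lead us to reject the null hypothesis
# of equal population means.
```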
7 Interpreting a Statistic
After choosing and applying an appropriate statistical procedure, the next step is understanding what the statistic is telling us about our data. Correct interpretation of statistics takes them from being just numbers and gives them meaning and relevance in research.
7.1 Factors related to Statistics

• Magnitude or size of the relationship: For instance, if we find that audio-visual methods of teaching are more effective in low-performing students as compared to high-performing ones, the value of this relationship (as given by the chosen statistic) is its magnitude or size. When this is large, it means that we can predict our dependent variable based on the independent variable (at least among members of our sample).
• Reliability or significance of the relationship: This tells us how representative our results are of the entire population. Statistics are based on the rationale of generalizing to the population from a representative sample. The p-value provides this significance.
7.2 Effect Size and Practical Significance

• The effect size describes the magnitude of the effect of the IV on the DV.
• An important feature of the effect size is that it is not affected by the sample size. Thus, on finding a statistically significant relationship, checking the effect size indicates whether the result is an artifact of a large sample size or reflects a genuine relationship.
• The effect size is often expressed in terms of Cohen's d. In the research literature, effect sizes of .20, .50 and .80 represent small, medium and large effects of the IV on the DV.
• Effect size is also expressed in terms of 'eta-squared', which tells us how much of the variation in the DV can be attributed to the IV.
• The effect size is a measure of whether the results have practical significance beyond being statistically significant.
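A sketch of Cohen's d computed from the pooled standard deviation; the helper function and the recall scores are assumptions introduced for illustration.

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

# Hypothetical recall scores for the two word lists.
emotional_group = [14, 16, 15, 17, 13, 18, 16, 15, 14, 17]
neutral_group   = [12, 13, 11, 14, 12, 15, 13, 12, 11, 14]

d = cohens_d(emotional_group, neutral_group)
print(round(d, 2))   # interpret against the .20 / .50 / .80 benchmarks mentioned above
```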
8 Common Errors and Biases

8.1 Sources of Bias

These refer to conditions that limit the external validity of findings based on statistics.
• Lack of Representative Sampling: Ideally, in a representative sample, each individual in the population has an equal chance of being selected. However, this may not always be the case in practice.
• Assumptions of Normality: Valid interpretations can be made as long as your data fulfil the basic assumptions of a particular statistic. One common assumption is that of normality, which is often disregarded. Even if the statistic is a robust one, it is necessary to check whether the data are normally distributed.
• Assumption of Independence of Observations: Most statistical tests assume that each participant has individually undergone the IV, without suggestion or influence from other participants or the researcher. However, when the sample is a concentrated one, such as a class of second-year psychology students, it is difficult to maintain independence of observations.
8.2 Errors in Methodology

• Statistical Power: The power of a statistical test depends on four factors: sample size, effect size, type I error rate (alpha value), and variability in the sample. These factors can be used to calculate the power of the statistic being used. A low power level signals that the methodology and analyses should be reconsidered.
• Multiple Comparisons: Errors can occur when one has several variables to compare, and they are intensified if these comparisons are done in a haphazard manner. If we calculate correlations between every pair of several variables, there is a probability that some of them will appear significant purely because of the number of correlations computed (a small illustration follows this list).
• Measurement Error: This refers to error arising from the measurement tools employed in research, such as rating scales, questionnaires, and psychometric tests. To reduce error due to faulty construction of these tools, checking their reliability and validity is essential.
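A small illustration of why uncorrected multiple comparisons are risky, using the familywise error probability 1 − (1 − α)^m; the number of variables is an assumption, and Bonferroni is shown as one simple (conservative) remedy rather than the only option.

```python
# With k variables there are k * (k - 1) / 2 pairwise correlations. Even if every
# null hypothesis is true, the chance of at least one spurious "significant" result
# grows quickly with the number of tests.
alpha = 0.05
k = 10                              # hypothetical number of variables
m = k * (k - 1) // 2                # 45 pairwise correlations

p_any_false_positive = 1 - (1 - alpha) ** m
print(f"{m} tests -> P(at least one false positive) = {p_any_false_positive:.2f}")  # ~0.90

# Bonferroni correction: test each correlation at alpha / m instead of alpha.
print("Bonferroni-adjusted alpha:", alpha / m)
```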
9 Software

9.1 SPSS

SPSS (Statistical Package for the Social Sciences) is a software package used for statistical analysis. The software is also used in health sciences and marketing. The statistics included in this software are descriptive statistics (cross-tabulation, frequencies), bivariate statistics (means, t-test, ANOVA, correlation and non-parametric tests), linear regression, factor analysis, and cluster analysis. The package also provides graphical representation of data.
9.2 R

Unlike SPSS, R is open-source software that supports similar statistical analyses. R is itself a programming language, so analyses are specified by writing code rather than through menus; RStudio is a widely used integrated development environment for working with R.
9.3 PSPP

Another available package is PSPP. It provides statistical procedures similar to those of the previous two, and was originally developed as a free, open-source replacement for SPSS.