BML246
Research Skills Session 3:
Research Design and Data Collection 1: Understanding Your Data
Tutors: Dr Andy Clegg and Dr Jorge Gutic
Learning Outcomes
Aims:
• To discuss and contextualise the key elements of the research design and data collection process
• To discuss and consider the differences between different types of data
• To demonstrate how different types of data influence the type of analysis that can be executed
• To map out different types of advanced statistical analysis and demonstrate how the choice of statistical analysis is influenced by the type of data
Step 1: Decide on Your Research Topic
Crafting Research
What?
• What puzzles/intrigues me?
• What do I want to know more about/understand better?
• What are my key research questions?
Why?
• Why will this be of enough interest to others to be published as a thesis, book, paper, or guide to practitioners or policymakers?
• Can the research be justified as a ‘contribution to knowledge’?
How – Conceptually?
• What models, concepts and theories can I draw on/develop to answer my research questions?
• How can these be brought together into a basic conceptual framework to guide my investigation?
How – Practically?
• What investigative styles and techniques shall I use to apply my conceptual framework (both to gather material and analyse it)?
• How shall I gain access to information sources/data?
Step 2: Research Aims & Objectives
Step 3a: Linking Data Types to Analysis
Planning the Journey
Assessment Criteria
• Evidence of clear research aims and objectives informed by background research
• Clear extrapolation of answers and analysis based on the appropriate use of either qualitative or quantitative approaches (e.g. SPSS)
Data & Data Sources
Principal Forms of Data:
• Quantitative: the observations or responses are expressed numerically
• Qualitative: use of comments, case studies or observations, with responses often subsequently assigned to categories
Ideally aim for a balance of both quantitative and qualitative methodologies/data (mixed methods)
Data Types
NOIR
Data Types (NOIR): Nominal
• List of categories to which objects can be attributed; objects can be counted but not measured numerically (classed as qualitative data)
• No assumption is made about their order, only that objects in different categories are “different”
• Classed as non-parametric data
Examples:
• Which supermarket do you normally shop at? Asda (1), Sainsburys (2), Tesco (3), Morrisons (4)
• Gender: Male (1), Female (2)
Data Types (NOIR): Ordinal
• List of categories, but this time ordered or ranked; differences are in relative magnitude (greater than; less than)
• No assumption is made about the ‘distance’ between categories
• Classed as non-parametric data
Examples:
• How would you rate the quality of service provided by your mobile phone company? 5 (Excellent) to 1 (Very poor)
• University student populations ranked by size
Data Types (NOIR): Interval
• Observations are made on a scale comprising equal intervals, but the zero value is arbitrary
• Classed as parametric data
Examples:
• Fahrenheit or Celsius scale to measure temperature. Differences make sense but ratios do not (20° is not twice as hot as 10°!)
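The point about ratios on an interval scale can be checked with a little arithmetic. The sketch below is illustrative only (the helper function names are assumptions, not part of the module materials): the same two temperatures give a different "ratio" in each unit, while the difference between them stays meaningful. Kelvin is included because it has a true zero and therefore behaves as a ratio scale.

```python
# Interval scales have an arbitrary zero, so the "ratio" of two readings
# depends on the unit chosen. Helper names here are illustrative only.

def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin (a scale with a true zero)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0

# The same two temperatures give three different ratios.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# The difference, by contrast, is well defined on an interval scale.
difference_c = warm_c - cool_c

print(f"Celsius ratio:    {ratio_celsius:.3f}")
print(f"Fahrenheit ratio: {ratio_fahrenheit:.3f}")
print(f"Kelvin ratio:     {ratio_kelvin:.3f}")
print(f"Difference:       {difference_c:.1f} degrees C")
```

Only the Kelvin figure can be read as "how many times hotter", which is why ratios are reserved for data with a true zero point.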
Data Types (NOIR): Ratio
• Observations are made on a scale comprising equal intervals with a true zero point
• Classed as parametric data
Examples:
• Age, height, weight, response time, grade
Data Types – Further Definitions
QUANTITATIVE DATA
• Discrete (integer), e.g. no. of people in the class
• Continuous, e.g. distance travelled: 12.654 miles
Data Types – Further Definitions
NOIR
• NOMINAL and ORDINAL: NON-PARAMETRIC
• INTERVAL and RATIO: PARAMETRIC
Linking Data Types to Analysis
Nature of the Question
Type of Data: NOMINAL, ORDINAL, INTERVAL, RATIO
Type of Analysis: DESCRIPTIVE, INFERENTIAL
Linking Data Types to Analysis
General Purpose: Description (only)
• Specific Purpose: Summarise Data
• Type of Question/Hypothesis: Descriptive
• General Type of Statistic: Descriptive Statistics (e.g. mean, percentage, range)
General Purpose: Explore Relationship Between Variables
• Specific Purpose: Compare Groups
• Type of Question/Hypothesis: Difference
• General Type of Statistic: e.g. t-test, Mann Whitney
• Specific Purpose: Find Strength of Association, Relate Variables
• Type of Question/Hypothesis: Associational
• General Type of Statistic: e.g. correlation
[Source: Morgan, G. et al (2011), IBM SPSS for Introductory Statistics, Routledge, London, p. 6]
Linking Data Types to Analysis
Question: What grade do you expect to get?
Type of Data = NOMINAL
Type of Analysis: Tabular
What grade do you expect to get for the module?
Grade      2011/2012   %        2010/2011   %
Grade A    6           8.5%     3           6%
Grade B    17          23.9%    16          32%
Grade C    37          52.1%    19          38%
Grade D    11          15.5%    11          22%
Grade E    0           0%       1           2%
Total      71          100%     50          100%
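As a rough illustration of how such a frequency table is built, the sketch below turns the counts from the table above into percentages of each year's total. The variable and function names are assumptions chosen for this example; in the module itself the same output would normally come from SPSS.

```python
# Counts of expected grades, copied from the table above.
counts_2011_12 = {"A": 6, "B": 17, "C": 37, "D": 11, "E": 0}
counts_2010_11 = {"A": 3, "B": 16, "C": 19, "D": 11, "E": 1}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each category count as a percentage of the column total."""
    total = sum(counts.values())
    return {grade: round(100 * n / total, 1) for grade, n in counts.items()}


pct_2011_12 = percentages(counts_2011_12)
pct_2010_11 = percentages(counts_2010_11)

for grade in counts_2011_12:
    print(
        f"Grade {grade}: "
        f"{counts_2011_12[grade]:>3} ({pct_2011_12[grade]:>4}%)  "
        f"{counts_2010_11[grade]:>3} ({pct_2010_11[grade]:>4}%)"
    )
```

Counting and percentaging are the only operations used here, which is all that nominal categories support.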
Linking Data Types to Analysis
Type of Analysis: Graphical
[Pie chart: Expected Grade for BML224. A (70%+): 6%; B (60-69%): 32%; C (50-59%): 38%; D (40-49%): 22%; F (<40%): 2%]
Linking Data Types to Analysis
Type of Analysis: Graphical
[Bar chart: Expected Grade for BML224. Axes: Expected Grade (A to F) against Percentage (%), 0% to 40%]
Linking Data Types to Analysis
Question: How confident do you feel about starting this module?
Type of Data = ORDINAL
Type of Analysis: Tabular
How confident are you about starting this module?
Response                    2011/2012   %        2010/2011   %
7 - Very Confident          0           0.0%     0           0.0%
6 - Quite Confident         4           5.6%     1           2.0%
5 - Confident               16          22.5%    10          20.0%
4 - Uncertain               28          39.4%    26          52.0%
3 - Anxious                 14          19.7%    7           14.0%
2 - Quite Anxious           4           5.6%     4           8.0%
1 - Very Anxious            5           7.0%     2           4.0%
Uncertain to very anxious   51          72%      39          78%
Sample (n)                  71                   50
Linking Data Types to Analysis
Type of Analysis: Tabular
How confident are you about starting this module?
Average confidence level 2010/2011: 3.82
Average confidence level 2011/2012: 3.82
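The average confidence levels quoted above can be reproduced from the frequency counts by weighting each score (1 to 7) by the number of students who chose it. The sketch below does this; the function names are assumptions for illustration. It also reports the median, which is often preferred for ordinal scores because it relies only on rank order.

```python
import statistics

# Frequency of each confidence score (7 = Very Confident ... 1 = Very Anxious),
# copied from the confidence table.
scores_2011_12 = {7: 0, 6: 4, 5: 16, 4: 28, 3: 14, 2: 4, 1: 5}
scores_2010_11 = {7: 0, 6: 1, 5: 10, 4: 26, 3: 7, 2: 4, 1: 2}


def mean_score(freq: dict[int, int]) -> float:
    """Weighted mean: sum of (score x count) divided by the sample size."""
    sample_size = sum(freq.values())
    return sum(score * count for score, count in freq.items()) / sample_size


def median_score(freq: dict[int, int]) -> float:
    """Median of the individual responses, rebuilt from the counts."""
    responses = [score for score, count in freq.items() for _ in range(count)]
    return statistics.median(responses)


for label, freq in (("2011/2012", scores_2011_12), ("2010/2011", scores_2010_11)):
    print(f"{label}: mean = {mean_score(freq):.2f}, median = {median_score(freq)}")
```

Both cohorts work out at a mean of 3.82, just below the "4 - Uncertain" category, and the median response in each year is 4.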
Linking Data Types to Analysis
Type of Analysis: Graphical
[Pie chart: Student Confidence Levels 2011. Quite Confident: 2%; Confident: 20%; Uncertain: 52%; Anxious: 14%; Quite Anxious: 8%; Very Anxious: 4%]
Linking Data Types to Analysis
Question: Attitudes to Statistics
Type of Data = ORDINAL
Type of Analysis: Tabular
Attitudes Towards Statistics
Statement                                      Strongly Agree [5]   Agree [4]   No Opinion [3]   Disagree [2]   Strongly Disagree [1]
This is my first ever statistics class         32%                  37%         4%               14%            13%
I am worried about this module                 17%                  26%         18%              32%            7%
If I could avoid taking this module I would    14%                  32%         25%              20%            9%
I've never enjoyed maths                       10%                  24%         24%              30%            13%
Passing is my main goal for this module        32%                  38%         17%              10%            1%
I do not see the relevance of this module      7%                   9%          24%              39%            21%
Linking Data Types to Analysis
Type of Analysis: Graphical
[Stacked bar chart: Student Attitudes to Statistics. Axes: Statement against Percentage (0% to 100%); series: Strongly Agree, Agree, No Opinion, Disagree, Strongly Disagree]
Linking Data Types to Analysis
Type of Analysis: Graphical
[Bar chart: Attitudes to Statistics. Axes: Response against Mean Rank (0 to 5)]
Plotting the mean score (rank) for each response
Ranking Scale: 1 = Strongly Agree, 2 = Agree, 3 = No Opinion, 4 = Disagree, 5 = Strongly Disagree
• I do not see the relevance of this module: 4
• Passing is my main goal for this module: 2.5
• I've never enjoyed maths: 3.4
• If I could avoid taking this module I would: 3.2
• I am worried about this module: 3
• This is my first ever statistics class: 2.3
Linking Data Types to Analysis
Question: Business Turnover in 2010
Type of Data = RATIO
Type of Analysis: Analytical
Descriptive Statistics – Turnover 2010
• Mean: £41,311.40
• Median: £44,640.00
• Mode: £44,760.00
• Standard Deviation: £9,191.03
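The four descriptive statistics listed above can be computed for any ratio variable. The sketch below uses Python's standard `statistics` module on a small, entirely hypothetical set of turnover figures (the real survey data are not reproduced in these slides), so the printed values will not match the slide.

```python
import statistics

# HYPOTHETICAL turnover figures in pounds, invented purely for illustration.
turnover_2010 = [28_500, 36_200, 44_760, 44_760, 47_900, 52_300]

mean_turnover = statistics.mean(turnover_2010)      # arithmetic average
median_turnover = statistics.median(turnover_2010)  # middle value when sorted
mode_turnover = statistics.mode(turnover_2010)      # most frequent value
sd_turnover = statistics.stdev(turnover_2010)       # sample standard deviation

print(f"Mean:               £{mean_turnover:,.2f}")
print(f"Median:             £{median_turnover:,.2f}")
print(f"Mode:               £{mode_turnover:,.2f}")
print(f"Standard deviation: £{sd_turnover:,.2f}")
```

All four measures rely on equal intervals between values, which is why they suit interval and ratio data rather than nominal categories.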
Linking Data Types to Analysis
Type of Analysis: Graphical
Distribution of the Data
[Figure: box plot]
[Figure: distribution plot]
Linking Data Types to Analysis
Type of Analysis: Tabular
Size of Business in 2010 by Category of Turnover
Turnover 2010          No. of Businesses   Percentage
£0 to £9,999           0                   0%
£10,000 to £19,999     0                   0%
£20,000 to £29,999     48                  16%
£30,000 to £39,999     73                  24%
£40,000 to £49,999     118                 39%
£50,000 to £59,999     59                  20%
£60,000 and over       2                   1%
Total                  300                 100%
Linking Data Types to Analysis
Type of Analysis: RECODING
• RATIO to NOMINAL: possible. Exact turnover figures can be grouped into categories, as in the table Size of Business in 2010 by Category of Turnover.
• NOMINAL to RATIO: not possible. Once only the category is recorded, the exact figures cannot be recovered.
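Recoding a ratio variable into categories amounts to looking up which band each value falls into and then counting the cases per band. The sketch below shows one way to do this; the band boundaries follow the turnover table (with upper limits assumed to end in 999 and the top band assumed to be open-ended), the function name is an assumption, and the sample turnover figures are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Lower edge of every band after the first; assumed from the turnover table.
BAND_EDGES = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000]
BAND_LABELS = [
    "£0 to £9,999",
    "£10,000 to £19,999",
    "£20,000 to £29,999",
    "£30,000 to £39,999",
    "£40,000 to £49,999",
    "£50,000 to £59,999",
    "£60,000 and over",
]


def turnover_band(turnover: float) -> str:
    """Recode an exact turnover figure (ratio) into its category label."""
    return BAND_LABELS[bisect_right(BAND_EDGES, turnover)]


# HYPOTHETICAL turnover figures, invented purely for illustration.
sample_turnover = [28_500, 36_200, 44_640, 44_760, 52_300, 61_000]

band_counts = Counter(turnover_band(value) for value in sample_turnover)
for label in BAND_LABELS:
    print(f"{label:<20} {band_counts[label]}")
```

The recoding only runs one way: `turnover_band` discards the exact figure, so nothing in `band_counts` allows the original values to be rebuilt.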
Linking Data Types to Analysis
Type of Analysis: Graphical
[Pie chart: Size of Business by Turnover Category. £20,000 to £29,999: 16%; £30,000 to £39,999: 24%; £40,000 to £49,999: 39%; £50,000 to £59,999: 20%; £60,000 and over: 1%]
Linking Data Types to Analysis
Question: % Change in Turnover 2008-2010
Type of Data = INTERVAL
Type of Analysis: the same process of analysis as for RATIO data would apply
Types of Data Summary
TYPE OF MEASUREMENT
QUALITATIVE
• NOMINAL: In nominal measurement the variables consist of named categories. The categories have no mathematical properties.
QUANTITATIVE VARIABLES
• ORDINAL: The scores indicate only rank order in terms of size. It is not correct to calculate means on the scores.
• INTERVAL: The steps between the scores are equal in size, though there is no proper zero point. The scores can be added, means calculated, etc.
• RATIO: This is the same as ‘interval’ measurement but the scale has a proper zero point. Ratios can be calculated as a consequence.
[Source: Howitt, D. and Cramer, D. (2011), Introduction to Research Methods in Psychology, Pearson, London]
Research Design and Data Collection:
Linking Data Types to Statistical Analysis
Crosstabulations
Definition
• A crosstabulation is a joint frequency distribution of cases based on two or more categorical variables
• Displaying a distribution of cases by their values on two or more variables is known as contingency table analysis
Crosstabulations
• Example: Length of Ownership by Response to Recession
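A crosstabulation can be sketched as a count of cases for every combination of two categorical variables. In the example below the category labels and the cases themselves are hypothetical (the slides name the two variables but not their values), and `crosstab` is an illustrative helper rather than a library function.

```python
from collections import Counter

# HYPOTHETICAL survey cases: (length of ownership, response to recession).
cases = [
    ("Under 5 years", "Cut costs"),
    ("Under 5 years", "Cut costs"),
    ("Under 5 years", "No change"),
    ("5 years or more", "Cut costs"),
    ("5 years or more", "No change"),
    ("5 years or more", "No change"),
    ("5 years or more", "No change"),
]


def crosstab(pairs):
    """Count how many cases fall into each (row, column) combination."""
    return Counter(pairs)


table = crosstab(cases)
row_labels = sorted({row for row, _ in cases})
col_labels = sorted({col for _, col in cases})

# Print the joint frequency distribution as a contingency table.
print(f"{'':<18}" + "".join(f"{col:>12}" for col in col_labels))
for row in row_labels:
    print(f"{row:<18}" + "".join(f"{table[(row, col)]:>12}" for col in col_labels))
```

Each cell is simply a frequency, so the technique needs nothing more than categorical (nominal or ordinal) data.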
Statistical Tests
• Used to make deductions/inferences about a particular data set or relationships (differences/associations) between different data sets
• Example: random sample of 50 households in two rural villages in West Sussex
  Village A: mean income £17,650
  Village B: mean income £22,220
• A test can be used to determine if there is a ‘real difference’ or whether the difference occurred ‘purely by chance’
Statistical Tests
• Parametric Tests: data conforms to a normal distribution and is interval or ratio in nature
• Non-Parametric Tests: data does not conform to a normal distribution; use ordinal data
Parametric Tests
• Independence of observations (except where the data is paired)
• Random sampling
• Interval scale measurement for the dependent variable
• A minimum sample size of 30 per group is recommended
• Equal variances of the populations from which the data is drawn
• Hypotheses are usually made about the mean of the population
Non-Parametric Tests
• Independence of randomly selected observations, except when paired
• Few assumptions concerning the distribution of the population
• Ordinal or nominal scale of measurement
• Ranks or frequencies of data are the focus of tests
• A minimum sample size of 30 per group is recommended
• Hypotheses are posed regarding ranks, medians or frequencies
• Sample size requirements are less stringent than for parametric tests
Research Design and Data Collection:
Student T-Test
• Scenario: As part of the bidding process to Tourism South East for future tourism funding, local tourism officers have to demonstrate if there is a difference in profit levels between businesses in the Arun and Chichester Districts
Student T-Test: Data Requirements
Grouping Variable [Independent Variable]
• A variable that stands alone and isn't changed by the other variables you are trying to measure
• Nominal (Categorical) [2 Levels]
Test Variable [Dependent Variable]
• A variable that depends on other factors
• Ratio or Interval (Continuous)
Student T-Test: Data Requirements
Applied to the scenario (difference in profit between businesses in the Arun and Chichester Districts):
• Test Variable: Profit, Ratio (continuous)
• Grouping Variable: Area Code, Nominal (categorical), 2 Levels [Chichester District 1 / Arun District 2]
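To make the roles of the two variables concrete, the sketch below computes an independent-samples Student t statistic by hand: the grouping variable splits the cases into two lists, and the test variable supplies the values being compared. The profit figures are hypothetical, the function name is an assumption, and the pooled-variance formula used here assumes equal population variances (one of the parametric assumptions listed earlier). It returns only the t statistic and degrees of freedom; a statistics package such as SPSS, or SciPy's `scipy.stats.ttest_ind`, would also report the significance level.

```python
import math
import statistics

# HYPOTHETICAL annual profit (in £000s) for businesses in each district.
profit_arun = [10, 20, 30, 40, 50]
profit_chichester = [30, 40, 50, 60, 70]


def student_t(group_a, group_b):
    """Independent-samples t statistic using the pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal population variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


t_value, degrees_of_freedom = student_t(profit_arun, profit_chichester)
print(f"t = {t_value:.2f} with {degrees_of_freedom} degrees of freedom")
```

The calculation uses means and variances throughout, which is why the test variable must be interval or ratio data.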
Student T-Test: Data Requirements
• Identify potential variables for use in a Student T-Test from the dataset guide
Choosing the Right
One Categorical and One Continuous
Mann Whitney
• Scenario: Tourism South East are developing a new e-tourism strategy and they want to establish if there is any difference between e-strategy motives (e-commerce adopters and non-adopters) and business attitudes to the web-based customer relationship management systems TSE offer
Mann Whitney: Data Requirements
Grouping Variable [Independent Variable]
• Nominal (Categorical) [2 Levels]
Test Variable [Dependent Variable]
• Ordinal
Mann Whitney: Data Requirements
Applied to the scenario (e-commerce adopters and non-adopters, and business attitudes to the web-based customer relationship management systems TSE offer):
• Test Variable: TSECMS, Ordinal
• Grouping Variable: E-Strategy, Nominal (categorical), 2 Levels [E-Commerce Adopters 1 / E-Commerce Non-Adopters 2]
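The Mann Whitney test works on ranks rather than raw scores, which is what makes it suitable for an ordinal test variable. The sketch below ranks the pooled responses (tied scores share the average rank) and derives the U statistic for two groups. The attitude scores are hypothetical and the function names are assumptions; a package such as SPSS, or SciPy's `scipy.stats.mannwhitneyu`, would also report the significance level.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the Mann Whitney U statistic (the smaller of U1 and U2)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# HYPOTHETICAL attitude scores on a 1-5 ordinal scale for each group.
adopters = [4, 5, 3, 4, 5]
non_adopters = [2, 3, 1, 2, 3]

u_statistic = mann_whitney_u(adopters, non_adopters)
print(f"U = {u_statistic}")
```

A small U means the two groups' ranks barely overlap; here the adopters' scores sit almost entirely above the non-adopters' scores.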
Mann Whitney: Data Requirements
• Identify potential variables for use in a Mann Whitney Test from the dataset guide