Research Design & Data Collection: 2
BML224: Data Analysis for Research
Research Design and Data Collection: Previously
Types of Data - Summary
NOIR:
• NOMINAL, ORDINAL: NON-PARAMETRIC
• INTERVAL, RATIO: PARAMETRIC
Quantitative Research Design
• Nature of the Question
• Type of Data: NOMINAL, ORDINAL, INTERVAL, RATIO
• Type of Analysis: DESCRIPTIVE, INFERENTIAL
Quantitative Research Design

General Purpose: Explore Relationship Between Variables
  Specific Purpose: Compare Groups
    Type of Question/Hypothesis: Difference
    General Type of Statistic: (e.g. t-test, Mann Whitney)
  Specific Purpose: Find Strengths of Association, Relate Variables
    Type of Question/Hypothesis: Associational
    General Type of Statistic: (e.g. correlation)
General Purpose: Description (only)
  Specific Purpose: Summarise Data
    Type of Question/Hypothesis: Descriptive
    General Type of Statistic: Descriptive Statistics (e.g. mean, percentage, range)
[Source: Morgan, G. et al. (2011), IBM SPSS for Introductory Statistics, Routledge, London, p. 6]
Basic Descriptive Statistics
Quantitative Research Design: Type of Analysis: Tabular, Graphical

Expected Grade for BML224
What grade do you expect to get for the module?

           2010/2011            2011/2012
           Count    %           Count    %
Grade A    6        8.5%        3        6%
Grade B    17       23.9%       16       32%
Grade C    37       52.1%       19       38%
Grade D    11       15.5%       11       22%
Grade E    0        0%          1        2%
Total      71       100%        50       100%

[Figures: charts of "Expected Grade for BML224"; axes: Expected Grade, Percentage (%). Chart labels: A (70%+): 6%; B (60-69%): 32%; C (50-59%): 38%; D (40-49%): 22%; F (<40%): 2%]
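As a rough illustration of how the percentage columns in a tabular summary are derived from the counts, the sketch below recomputes them for the expected-grade responses. The function name `percentage_table` and the dictionary layout are assumptions made for this example; only the counts come from the slide.

```python
def percentage_table(counts):
    """Convert category counts to percentages of the total, rounded to 1 d.p."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Counts taken from the expected-grade table (keys are the grade letters).
expected_2010_11 = {"A": 6, "B": 17, "C": 37, "D": 11, "E": 0}
expected_2011_12 = {"A": 3, "B": 16, "C": 19, "D": 11, "E": 1}

for label, counts in [("2010/2011", expected_2010_11), ("2011/2012", expected_2011_12)]:
    print(label, percentage_table(counts))
```

Rounding to one decimal place is a presentation choice; the percentages in each column should still sum to roughly 100%.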
Basic Descriptives
Quantitative Research Design: Type of Analysis: Tabular, Graphical

Student Confidence Levels 2011
How confident are you about starting this module?

                             2010/2011            2011/2012
                             Count    %           Count    %
7: Very Confident            0        0.0%        0        0.0%
6: Quite Confident           4        5.6%        1        2.0%
5: Confident                 16       22.5%       10       20.0%
4: Uncertain                 28       39.4%       26       52.0%
3: Anxious                   14       19.7%       7        14.0%
2: Quite Anxious             4        5.6%        4        8.0%
1: Very Anxious              5        7.0%        2        4.0%
Uncertain to very anxious    51       72%         39       78%
Sample (n)                   71                   50

[Figure: chart of "Student Confidence Levels 2011". Chart labels: Uncertain: 52%; Confident: 20%; Anxious: 14%; Quite Anxious: 8%; Very Anxious: 4%; Quite Confident: 2%]
Quantitative Research Design: Type of Analysis: Tabular, Graphical

Student Attitudes to Statistics
Attitudes Towards Statistics

Statement                                      Strongly    Agree   No Opinion   Disagree   Strongly
                                               Agree [5]   [4]     [3]          [2]        Disagree [1]
This is my first ever statistics class         32%         37%     4%           14%        13%
I am worried about this module                 17%         26%     18%          32%        7%
If I could avoid taking this module I would    14%         32%     25%          20%        9%
I've never enjoyed maths                       10%         24%     24%          30%        13%
Passing is my main goal for this module        32%         38%     17%          10%        1%
I do not see the relevance of this module      7%          9%      24%          39%        21%

[Figure: chart of "Student Attitudes to Statistics"; axes: Statement, Percentage (0% to 100%); legend: Strongly Agree, Agree, No Opinion, Disagree, Strongly Disagree]
Basic Descriptives
Quantitative Research Design: Type of Analysis: Analytical, Graphical

Descriptive Statistics: Turnover 2010
Mean                  £41,311.40
Median                £44,640.00
Mode                  £44,760.00
Standard Deviation    £9,191.03

[Figures: Distribution of the Data; Box plot]
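The analytical descriptives listed for turnover (mean, median, mode, standard deviation) can be sketched with Python's standard `statistics` module. The turnover figures below are hypothetical placeholders, not values from the business survey, and `describe` is a helper name invented for this example.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical turnover figures in GBP (illustrative only).
turnover_2010 = [40000, 44000, 44000, 50000]

for name, value in describe(turnover_2010).items():
    print(f"{name}: {value:,.2f}")
```

A mean noticeably below the median, as in the Turnover 2010 slide, is one hint that the distribution may be skewed, which is why the histogram and box plot are worth checking alongside the numbers.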
Research Design and Data Collection 2: Learning Outcomes
Aims:
• To map out different types of advanced statistical analysis and demonstrate how the choice of statistical analysis is influenced by the type of data
• To map and log opportunities for advanced statistical analysis and statistical tests against the different variables within the dataset guide
Research Design and Data Collection:
Exploratory Data Analysis: Crosstabulations
Crosstabulations
Definition
• A crosstabulation is a joint frequency distribution of cases based on two or more categorical variables
• Displaying a distribution of cases by their values on two or more variables is known as contingency table analysis
Crosstabulations
Examples: Area by Response to Recession
• Analysis by Row
• Analysis by Column
Examples: Size by Response to Recession
• Analysis by Row
• Analysis by Column
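To make the difference between analysis by row and analysis by column concrete, here is a minimal sketch of a crosstabulation built with the standard library. The area and response values are hypothetical (the category labels "Proactive" and "Reactive" are assumptions for illustration), and the helper names are invented for this example.

```python
from collections import Counter


def crosstab(rows, cols):
    """Joint frequency table as {row_value: {col_value: count}}."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def row_percentages(table):
    """Each cell as a percentage of its row total (analysis by row)."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100 * n / total for c, n in cells.items()}
    return result


def column_percentages(table):
    """Each cell as a percentage of its column total (analysis by column)."""
    col_totals = Counter()
    for cells in table.values():
        col_totals.update(cells)
    return {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
            for r, cells in table.items()}


# Hypothetical survey responses (illustrative only).
area = ["Arun", "Arun", "Arun", "Chichester",
        "Chichester", "Chichester", "Chichester", "Arun"]
response = ["Proactive", "Reactive", "Reactive", "Proactive",
            "Proactive", "Reactive", "Proactive", "Reactive"]

table = crosstab(area, response)
print(table)
print(row_percentages(table))
print(column_percentages(table))
```

Row percentages answer "within each area, how did businesses respond?", while column percentages answer "within each response, where are the businesses located?".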
Crosstabulations: Variables for Analysis
Identify potential variables that could form the basis of four separate crosstabulations
Research Design and Data Collection:
Planning the Journey! Basic to Advanced Statistical Analysis
The Role of Statistical Tests in Advanced Analysis
• Used to make deductions/inferences about a particular data set or relationships (differences/associations) between different data sets
• Example: random sample of 50 households in two rural villages in West Sussex:
  Village A: mean income £17,650
  Village B: mean income £22,220
• A test can be used to determine if there is a 'real difference' or whether the difference occurred 'purely by chance'
Statistical Tests: Parametric Tests
Parametric Tests: data conforms to a normal distribution and is interval or ratio in nature
• Independence of observations (except where the data is paired)
• Random sampling
• Interval scale measurement for the dependent variable
• A minimum sample size of 30 per group is recommended
• Equal variances of the populations from which the data is drawn
• Hypotheses are usually made about the mean of the population
Statistical Tests: Non-Parametric Tests
Non-Parametric Tests: data does not conform to a normal distribution; use ordinal data
• Independence of randomly selected observations, except when paired
• Few assumptions concerning the distribution of the population
• Ordinal or nominal scale of measurement
• Ranks or frequencies of data are the focus of tests
• A minimum sample size of 30 per group is recommended
• Hypotheses are posed regarding ranks, medians or frequencies
• Sample size requirements are less stringent than for parametric tests
Basic to Advanced Statistical Analysis
Scenario 1
As part of a review of tourism competitiveness along the South Coast, local tourism officers have been asked to look at respective profit levels between businesses in the Arun and Chichester Districts, drawing on the results of the business survey.
Where would you begin your analysis?
Start with your descriptive analysis
This analysis suggests that there is a difference in profit levels between Chichester District and Arun District
What would be a suitable test?
Choosing the Right Test
One Categorical and One Continuous

Student T-Test: Data Requirements
Grouping Variable [Independent Variable]
• A variable that stands alone and isn't changed by the other variables you are trying to measure
• Nominal (Categorical) [2 Levels]
Test Variable [Dependent Variable]
• A variable that depends on other factors
• Ratio or Interval (Continuous)
Basic to Advanced Statistical Analysis
Scenario 1: variables
• Profit: Test Variable, Ratio (continuous)
• Area Code: Grouping Variable, Nominal (categorical), 2 Levels [Chichester District 1 / Arun District 2]

Student T-Test: Data Requirements
Identify potential variables for use in a Student T-Test from the dataset guide
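As a hedged sketch of what the Student T-Test computes for a two-level grouping variable and a continuous test variable, the function below calculates the pooled-variance t statistic and its degrees of freedom. The profit figures are hypothetical, `student_t` is a name invented for this example, and no p-value is calculated here; in practice a statistics package such as SPSS would report it.

```python
import math
import statistics


def student_t(group_a, group_b):
    """Independent-samples Student t statistic (pooled variance) and df."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df


# Hypothetical profit levels in GBP thousands (illustrative only).
chichester_profit = [42.0, 38.5, 51.0, 47.2, 44.3]
arun_profit = [35.1, 40.2, 33.8, 37.5, 36.0]

t_stat, dof = student_t(chichester_profit, arun_profit)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The pooled variance reflects the equal-variances assumption listed for parametric tests; if that assumption is doubtful, a different variant of the test would be needed.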
Basic to Advanced Statistical Analysis
Scenario 2
Tourism South East are developing a new e-tourism strategy and they want to establish if there is any difference between e-strategy motives (e-commerce adopters and non-adopters) and business attitudes to the value of the internet in 2010.
Where would you start?
Start with your descriptive analysis
This analysis suggests that there is a difference in attitudes towards the internet between e-strategy adopters and non-adopters
What would be a suitable test?
Choosing the Right Test
One Categorical and One Continuous

Mann Whitney: Data Requirements
Grouping Variable [Independent Variable]
• Nominal (Categorical) [2 Levels]
Test Variable [Dependent Variable]
• Ordinal
Basic to Advanced Statistical Analysis
Scenario 2: variables
• WEBQUAL10: Test Variable, Ordinal (continuous)
• E-Strategy: Grouping Variable, Nominal (categorical), 2 Levels [E-Commerce Adopters 1 / E-Commerce Non-Adopters 2]

Mann Whitney: Data Requirements
Identify potential variables for use in a Mann Whitney Test from the dataset guide
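Because the Mann Whitney test works on ranks rather than raw values, a small sketch may help show what is being compared. The code below pools two groups, ranks the pooled values (ties share the average rank) and computes the U statistic. The attitude scores are hypothetical, the function names are invented for this example, and the p-value is deliberately left to a statistics package.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Mann Whitney U statistic (the smaller of the two group U values)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical ordinal attitude scores (illustrative only).
adopters = [5, 4, 4, 5, 3]
non_adopters = [3, 2, 4, 2, 3]

print("U =", mann_whitney_u(adopters, non_adopters))
```

A U value close to zero indicates that one group's scores sit almost entirely above the other's in the pooled ranking.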
Basic to Advanced Statistical Analysis
Scenario 3
Between 2008 and 2010, Tourism South East ran a series of courses in conjunction with the Green Tourism Business Scheme to help GTBS members progress to the next stage of accreditation (e.g. bronze to silver; silver to gold). As part of the monitoring process, Tourism South East want to establish if these courses have had an impact on GTBS scores.
Where would you start your analysis?
Start with your descriptive analysis
This analysis suggests that there is a difference in GTBS scores between 2008 and 2010
What would be a suitable test?
Choosing the Right Test
Two continuous variables: the same measure administered twice

Basic to Advanced Statistical Analysis
Scenario 3: variables
• Paired Values: GTBS08 / GTBS10

Related or Paired Samples T-Test
Appropriate Related/Paired Variables (Ratio or Interval)
Identify potential variables for use in a Related or Paired Samples T-Test from the dataset guide
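For paired values such as a score recorded for the same businesses in two years, the paired samples t statistic is based on the per-case differences. The sketch below assumes hypothetical scores and an invented function name `paired_t`; it returns the t statistic and degrees of freedom only, leaving the p-value to a statistics package.

```python
import math
import statistics


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom from matched lists."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1


# Hypothetical scores for the same four businesses (illustrative only).
scores_2008 = [10, 12, 14, 16]
scores_2010 = [12, 13, 17, 18]

t_stat, dof = paired_t(scores_2008, scores_2010)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The two lists must be in the same case order, since each difference pairs one business's earlier score with its later score.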
Basic to Advanced Statistical Analysis
Scenario 4
Between 2008 and 2010, Tourism South East ran a series of e-commerce workshops across the South East region promoting e-commerce. As part of the monitoring process, Tourism South East want to establish if these workshops have had an impact on business attitudes to the value of the internet.
Where would you start your analysis?
Start with your descriptive analysis
This analysis suggests that there is a difference in attitudes towards the Internet between 2008 and 2010
What would be a suitable test?
Choosing the Right Test
Two continuous variables: the same measure administered twice

Basic to Advanced Statistical Analysis
Scenario 4: variables
• Paired Values: Webqual08 / Webqual10

Wilcoxon
Appropriate Related/Paired Variables (Ordinal)
Identify potential variables for use in a Wilcoxon Test from the dataset guide
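The Wilcoxon test is the rank-based counterpart for paired ordinal values. As a rough sketch, the code below drops zero differences, ranks the absolute differences (ties share the average rank) and sums the ranks of positive and negative differences separately. The scores are hypothetical and the function names are invented for this example; a statistics package would supply the p-value.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W, n): the smaller signed-rank sum and the number of non-zero differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), len(diffs)


# Hypothetical ordinal attitude scores for the same businesses (illustrative only).
attitude_2008 = [3, 4, 2, 5, 3]
attitude_2010 = [4, 4, 4, 4, 5]

print(wilcoxon_signed_rank(attitude_2008, attitude_2010))
```

Discarding zero differences is one common convention; other treatments of ties and zeros exist, so results may differ slightly between packages.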
Basic to Advanced Statistical Analysis
Scenario 5
A review of research literature conducted by the University of Chichester indicates that the length of business ownership influences business response to recession, and the longer the length of business ownership, the more proactive businesses are in terms of their overall business strategy and their response to recession. In this instance, the University would like to establish if there is a significant difference between the length of business ownership and the business response to recession.
Where would you start your analysis?
Start with your descriptive analysis
This analysis suggests that there is a difference between length of business ownership and response to the recession
What would be a suitable test?
Choosing the Right Test
Two categorical

Chi-Squared
• A test to examine difference between data that is grouped into independent and mutually exclusive groups
• Data must be in the form of contingency tables, showing frequency of observations in different categories (h) for one or more samples (k)
• Chi-Squared Test uses nominal data
Chi-Squared
Scenario 5: variables
• Nominal Variables: Lengthcat v Response

Chi-Squared: Nominal Variables
Identify potential variables for use in a Chi-Squared Test from the dataset guide
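A contingency table of observed frequencies is all the Chi-Squared statistic needs. The sketch below compares each observed count with the count expected if the two variables were unrelated (row total × column total ÷ grand total). The table values are hypothetical, `chi_squared` is a name invented for this example, and the p-value lookup is left to a statistics package.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a contingency table.

    `observed` is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2 x 2 table: rows = length-of-ownership category,
# columns = response to recession (illustrative only).
observed = [[10, 20],
            [20, 10]]

chi2, dof = chi_squared(observed)
print(f"chi-squared = {chi2:.3f}, df = {dof}")
```

The larger the gap between observed and expected counts, the larger the statistic, and the less plausible it is that the two variables are unrelated.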
Basic to Advanced Statistical Analysis
Scenario 6
Tourism South East is in the process of developing a new Sustainable Tourism Strategy for the region, as part of which they are investigating factors influencing the uptake of local goods and services. Tourism South East would like to establish if there is any relationship/association between GTBS score and the use of local goods and services.
Where would you start your analysis?
Start with your descriptive analysis
The scatterplot shows evidence of a linear relationship between Green10 and GTBS10
What would be a suitable test?
Choosing the Right Test
Two separate continuous

Types of Correlation
• When variables are parametric in nature (e.g. ratio/interval data), the most common measure of correlation is Pearson's Product Moment Correlation Coefficient
• Where data is ordinal or not normally distributed, or when other assumptions of the Pearson correlation coefficient are violated, we use the Spearman Rank Correlation Coefficient
Correlation
Correlation is a means to measure the degree of association between two variables, that is, the extent to which changes in values of one variable are matched by changes in another variable
• Positive Correlation: measures the extent to which higher values of one variable are matched with higher values of the other
• Negative Correlation: measures the extent to which higher values of one variable are matched with lower values of the other
Correlation
Scenario 6: variables
• Ratio/Ratio Variables: GTBS10 v Green10

Correlation
• Pearson's Product Moment Correlation Coefficient (Ratio/Interval)
• Spearman Correlation Coefficient (Ordinal)
Identify potential variables for use in Correlation Tests from the dataset guide
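The two correlation coefficients can be contrasted in a few lines. In this sketch, Pearson's coefficient is computed from the raw values, and Spearman's coefficient is obtained by applying the same formula to the ranks of the values. The data are hypothetical and the function names are invented for this example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired scores (illustrative only).
score_x = [1, 2, 3, 4, 5]
score_y = [1, 4, 9, 16, 25]

print(f"Pearson r = {pearson_r(score_x, score_y):.3f}")
print(f"Spearman rho = {spearman_rho(score_x, score_y):.3f}")
```

In this example the relationship is perfectly monotonic but not perfectly linear, so Spearman's coefficient is 1 while Pearson's is slightly below 1. Values near +1 indicate positive correlation and values near -1 indicate negative correlation.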
Learning Outcomes
At the end of this session you should be able to:
• Map out different types of advanced statistical analysis and demonstrate how the choice of statistical analysis is influenced by the type of data
• Map and log opportunities for advanced statistical analysis and statistical tests against the different variables within the dataset guide
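One way to log how the type of data drives the choice of test is a simple lookup table. The sketch below summarises the six scenarios in this session; the design labels and data-type labels used as keys, and the function name `suggest_test`, are assumptions made for this example rather than terms from the dataset guide.

```python
# Mapping of (design, data type) to the test used in the session's scenarios.
# The key labels are illustrative shorthand, not official terminology.
TEST_CHOICES = {
    ("two independent groups", "ratio/interval"): "Student T-Test",
    ("two independent groups", "ordinal"): "Mann Whitney",
    ("paired values", "ratio/interval"): "Related or Paired Samples T-Test",
    ("paired values", "ordinal"): "Wilcoxon",
    ("two categorical", "nominal"): "Chi-Squared",
    ("two continuous", "ratio/interval"): "Pearson's Product Moment Correlation Coefficient",
    ("two continuous", "ordinal"): "Spearman Rank Correlation Coefficient",
}


def suggest_test(design, data_type):
    """Look up the test matching a design and data type, or raise ValueError."""
    try:
        return TEST_CHOICES[(design, data_type)]
    except KeyError:
        raise ValueError(f"No test logged for {design!r} with {data_type!r} data") from None


print(suggest_test("two independent groups", "ratio/interval"))
print(suggest_test("paired values", "ordinal"))
```

The table is only a starting point: the assumptions listed for parametric and non-parametric tests still need checking against the actual data before a test is chosen.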
Self-Directed Activity
To do:
• Please complete self-directed Activity 6: Scenario Quiz
• Please complete self-directed Activity 7 using the Dataset Guide Analysis Template