Research Design & Data Collection: 2
BML224: Data Analysis for Research
Research Design and Data Collection:
Previously
Types of Data - Summary
NOIR
NOMINAL, ORDINAL: NON-PARAMETRIC
INTERVAL, RATIO: PARAMETRIC
Quantitative Research Design
Nature of the Question
Type of Data: NOMINAL, ORDINAL, INTERVAL, RATIO
Type of Analysis: DESCRIPTIVE, INFERENTIAL
Quantitative Research Design
General Purpose: Description (only)
  Specific Purpose: Summarise Data
  Type of Question/Hypothesis: Descriptive
  General Type of Statistic: Descriptive Statistics (e.g. mean, percentage, range)
General Purpose: Explore Relationship Between Variables
  Specific Purpose: Compare Groups
  Type of Question/Hypothesis: Difference
  General Type of Statistic: Inferential statistics (e.g. t-test, Mann Whitney)
  Specific Purpose: Find Strength of Association, Relate Variables
  Type of Question/Hypothesis: Associational
  General Type of Statistic: Inferential statistics (e.g. correlation)
[Source: Morgan, G. et al (2011), IBM SPSS for Introductory Statistics, Routledge, London, p. 6]
Research Design and Data Collection 2: Learning Outcomes
Aims:
• To map out different types of advanced statistical analysis and demonstrate how the choice of statistical analysis is influenced by the type of data
• To map and log opportunities for advanced statistical analysis and statistical tests against the different variables within the dataset guide
Research Design and Data Collection:
Exploratory Data Analysis: Crosstabulations
Crosstabulations
Definition: A crosstabulation is a joint frequency distribution of cases based on two or more categorical variables
Displaying a distribution of cases by their values on two or more variables is known as contingency table analysis
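The definition above can be sketched in a few lines of Python. This is only an illustration using the standard library: the records, the category labels and the helper name `crosstab` are all hypothetical and are not taken from the module dataset.

```python
from collections import Counter

# Hypothetical cases: (length of ownership, response to recession).
# These labels and values are invented for illustration only.
records = [
    ("Under 5 years", "Reactive"),
    ("Under 5 years", "Reactive"),
    ("Under 5 years", "Proactive"),
    ("5 years or more", "Proactive"),
    ("5 years or more", "Proactive"),
    ("5 years or more", "Reactive"),
    ("5 years or more", "Proactive"),
]


def crosstab(pairs):
    """Return joint frequencies as {row_label: {column_label: count}}."""
    counts = Counter(pairs)  # counts each (row, column) combination
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


table = crosstab(records)
for row_label, cells in table.items():
    print(row_label, cells)
```

Each cell of the resulting table is the number of cases that share one value on the first variable and one value on the second.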
Crosstabulations Examples: Length of Ownership by Response to Recession
Analysis by Row
Analysis by Column
Crosstabulations Examples: Size by Response to Recession
Analysis by Row
Analysis by Column
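Analysis by row and analysis by column read the same counts against different totals. The sketch below shows the difference under stated assumptions: the counts and labels are hypothetical, and the function names are illustrative rather than part of any package.

```python
# Hypothetical crosstabulation counts: rows are business size,
# columns are response to recession. Values are invented for illustration.
table = {
    "Small": {"Proactive": 10, "Reactive": 30},
    "Large": {"Proactive": 30, "Reactive": 30},
}


def row_percentages(tab):
    """Each cell as a percentage of its row total (analysis by row)."""
    result = {}
    for row_label, cells in tab.items():
        row_total = sum(cells.values())
        result[row_label] = {c: 100.0 * n / row_total for c, n in cells.items()}
    return result


def column_percentages(tab):
    """Each cell as a percentage of its column total (analysis by column)."""
    col_totals = {}
    for cells in tab.values():
        for col_label, n in cells.items():
            col_totals[col_label] = col_totals.get(col_label, 0) + n
    return {
        row_label: {c: 100.0 * n / col_totals[c] for c, n in cells.items()}
        for row_label, cells in tab.items()
    }


print(row_percentages(table))     # share of each size group giving each response
print(column_percentages(table))  # share of each response coming from each size group
```

Row percentages answer "of the small businesses, what share were proactive?", while column percentages answer "of the proactive businesses, what share were small?".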
Crosstabulations Examples: E-Strategy by Value of the Internet
Combining nominal (E-Strategy) with ordinal (Webvalue)
Crosstabulations: Variables for Analysis
Identify potential variables that could form the basis of four separate crosstabulations
Research Design and Data Collection:
Linking Data Types to Advanced Statistical Analysis
Statistical Tests
Used to make deductions/inferences about a particular data set or relationships (differences/associations) between different data sets
Example: a random sample of 50 households in two rural villages in West Sussex:
Village A: mean income £17,650
Village B: mean income £22,220
A test can be used to determine if there is a ‘real difference’ or whether the difference occurred ‘purely by chance’
Statistical Tests
Parametric Tests: data conforms to a normal distribution and is interval or ratio in nature
Non-Parametric Tests: data does not conform to a normal distribution; used with ordinal data
Statistical Tests: Parametric Tests
• Independence of observations (except where the data is paired)
• Random sampling
• Interval scale measurement for the dependent variable
• A minimum sample size of 30 per group is recommended
• Equal variances of the populations from which the data is drawn
• Hypotheses are usually made about the mean of the population
Statistical Tests: Non-Parametric Tests
• Independence of randomly selected observations (except when paired)
• Few assumptions concerning the distribution of the population
• Ordinal or nominal scale of measurement
• Ranks or frequencies of data are the focus of tests
• Hypotheses are posed regarding ranks, medians or frequencies
• Sample size requirements are less stringent than for parametric tests (where a minimum of 30 per group is recommended)
Research Design and Data Collection:
Testing for Difference
Choosing the Right Test
One Categorical and One Continuous
Research Design and Data Collection:
Student T-Test
Statistical Tests
Scenario: As part of the bidding process to Tourism South East for future tourism funding, local tourism officers have to demonstrate if there is a difference in profit levels between businesses in the Arun and Chichester Districts
Student T-Test - Data Requirements
Grouping Variable [Independent Variable]: a variable that stands alone and isn't changed by the other variables you are trying to measure. Nominal (Categorical) [2 Levels]
Test Variable [Dependent Variable]: a variable that depends on other factors. Ratio or Interval (Continuous)
For the Arun and Chichester scenario:
Test Variable: Profit, Ratio (continuous)
Grouping Variable: Area Code, Nominal (categorical), 2 Levels [Chichester District 1 / Arun District 2]
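As a rough sketch of what the Student T-Test computes from a grouping variable and a test variable, the Python below calculates the pooled-variance t statistic and its degrees of freedom. It assumes equal population variances, as listed in the parametric assumptions. The profit figures and the function name `student_t` are hypothetical, and the groups are far smaller than the recommended 30 per group, purely to keep the example short.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical profit figures (in £000s), split by the grouping variable.
# These numbers are invented and do not come from the module dataset.
chichester = [42.0, 55.0, 48.0, 61.0, 50.0]  # Area Code 1
arun = [38.0, 47.0, 41.0, 52.0, 44.0]        # Area Code 2

t_stat, df = student_t(chichester, arun)
print(f"t = {t_stat:.3f}, df = {df}")
```

The t statistic is then compared against the t distribution with the given degrees of freedom to obtain a significance level; a statistics package such as SPSS reports this directly.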
Identify potential variables for use in a Student T-Test from the dataset guide
Choosing the Right Test
One Categorical and One Continuous
Research Design and Data Collection:
Mann Whitney
Statistical Tests
Scenario: Tourism South East are developing a new e-tourism strategy and they want to establish whether businesses with different e-strategy motives (e-commerce adopters and non-adopters) differ in their attitudes to the web-based customer relationship management systems TSE offer
Mann Whitney - Data Requirements
Grouping Variable [Independent Variable]: Nominal (Categorical) [2 Levels]
Test Variable [Dependent Variable]: Ordinal
For the e-tourism strategy scenario:
Test Variable: TSECMS, Ordinal
Grouping Variable: E-Strategy, Nominal (categorical), 2 Levels [E-Commerce Adopters 1 / E-Commerce Non-Adopters 2]
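Because the Mann Whitney test works on ranks rather than raw values, it can be sketched as follows. This is a minimal illustration, not the SPSS procedure: the attitude scores are hypothetical ratings on an assumed 1 to 5 scale, and the helper names are invented for this example. Tied values receive the average of the ranks they span.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Mann Whitney U statistic (the smaller of the two U values)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical TSECMS attitude ratings (1 = low value, 5 = high value).
adopters = [4, 5, 3, 4, 5]      # E-Strategy = 1
non_adopters = [2, 3, 1, 3, 2]  # E-Strategy = 2

print("U =", mann_whitney_u(adopters, non_adopters))
```

A small U indicates that the ranks of one group sit mostly above or below those of the other; the significance level is then read from tables or reported by the statistics package.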
Identify potential variables for use in a Mann Whitney Test from the dataset guide
Choosing the Right Test
Two continuous variables (the same measure administered twice)
Research Design and Data Collection:
Related or Paired Samples T-Test
Related or Paired Samples T-Test
The Paired Samples T-Test is undertaken when the samples are related or paired (often with the same participants in each sample)
Parametric Test (ratio or interval data)
Scenario: Between 2008 and 2010, Tourism South East ran a series of courses in conjunction with the Green Tourism Business Scheme to help GTBS members progress to the next stage of accreditation (e.g. bronze to silver; silver to gold). As part of the monitoring process, Tourism South East want to establish if these courses have had an impact on GTBS scores
Paired Values: GTBS08/GTBS10
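The paired test works on the difference within each pair rather than on the two samples separately. The sketch below illustrates that idea with hypothetical GTBS scores for the same businesses in 2008 and 2010; the scores and the function name `paired_t` are assumptions made for this example.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per participant: the test is run on these differences
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical GTBS scores for the same six businesses (invented values).
gtbs08 = [52.0, 60.0, 47.0, 71.0, 65.0, 58.0]
gtbs10 = [58.0, 63.0, 55.0, 70.0, 72.0, 61.0]

t_stat, df = paired_t(gtbs08, gtbs10)
print(f"t = {t_stat:.3f}, df = {df}")
```

A positive t here means scores were higher on average in 2010 than in 2008; whether that change is significant depends on comparing t with the t distribution for the given degrees of freedom.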
Related or Paired Samples T-Test: Appropriate Related/Paired Variables (Ratio or Interval)
Identify potential variables for use in a Related or Paired Samples T-Test from the dataset guide
Research Design and Data Collection:
Wilcoxon
Wilcoxon
The Wilcoxon Test is undertaken when the samples are related or paired (often with the same participants in each sample)
Non-Parametric Test (ordinal data)
Scenario: Between 2008 and 2010, Tourism South East ran a series of e-commerce workshops across the South East region supporting the implementation of their new CMS system. As part of the monitoring process, Tourism South East want to establish if these workshops have had an impact on business attitudes to the value of their CMS systems.
Paired Values: TSECMS08/TSECMS10
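The Wilcoxon test ranks the size of each paired difference and then compares the rank totals for increases and decreases. The following is a minimal sketch under stated assumptions: the ratings are hypothetical values on an assumed 1 to 5 scale, pairs with no change are dropped (a common convention), and the helper names are invented for this example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W, n): the smaller signed-rank sum and the number of non-zero pairs."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Drop pairs with no change, then rank the absolute differences
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), len(diffs)


# Hypothetical attitude ratings for the same businesses (invented values).
tsecms08 = [3, 2, 4, 1, 3]
tsecms10 = [4, 4, 4, 3, 2]

w_stat, n_pairs = wilcoxon_signed_rank(tsecms08, tsecms10)
print("W =", w_stat, "from", n_pairs, "non-zero pairs")
```

A small W means that almost all of the change runs in one direction.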
Wilcoxon: Appropriate Related/Paired Variables (Ordinal)
Identify potential variables for use in a Wilcoxon Test from the dataset guide
Choosing the Right Test
Two categorical
Research Design and Data Collection:
Chi-Squared
Chi-Squared
A test to examine differences between data that is grouped into independent and mutually exclusive groups
Data must be in the form of contingency tables, showing the frequency of observations in different categories (h) for one or more samples (k)
The Chi-Squared Test uses nominal data
Chi-Squared
Scenario: A review of research literature conducted by the University of Chichester indicates that the length of business ownership influences business response to recession, and the longer the length of business ownership, the more proactive businesses are in terms of their overall business strategy and their response to recession.
In this instance, the University would like to establish if business response to recession differs significantly according to the length of business ownership.
Nominal Variables: Lengthcat v Response
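The Chi-Squared statistic compares the observed frequencies in a contingency table with the frequencies expected if the two variables were unrelated. The sketch below shows the calculation; the counts are hypothetical, the function name is illustrative, and it assumes every row and column contains at least one observation so that no expected frequency is zero.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a contingency table.

    `observed` is a list of rows, each a list of cell frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Frequency expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical Lengthcat (rows) by Response (columns) counts, invented for illustration.
observed = [
    [10, 20],  # shorter ownership: proactive, reactive
    [20, 10],  # longer ownership: proactive, reactive
]

chi2, df = chi_squared(observed)
print(f"chi-squared = {chi2:.3f}, df = {df}")
```

The larger the statistic relative to its degrees of freedom, the less plausible it is that the pattern in the table arose purely by chance.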
Crosstabulations Examples: Length of Ownership by Response to Recession
Chi-Squared: Nominal Variables
Identify potential variables for use in a Chi-Squared Test from the dataset guide
Research Design and Data Collection:
Tests for Association
Choosing the Right Test
Two separate continuous
Research Design and Data Collection:
Correlation
Correlation
Correlation is a means to measure the degree of association between two variables, that is, the extent to which changes in values of one variable are matched by changes in another variable
• Positive Correlation: measures the extent to which higher values of one variable are matched with higher values of the other
• Negative Correlation: measures the extent to which higher values of one variable are matched with lower values of the other
Types of Correlation
When variables are parametric in nature (e.g. ratio/interval data), the most common measure of correlation is Pearson’s Product Moment Correlation Coefficient
Where data is ordinal, is not normally distributed, or violates other assumptions of the Pearson correlation coefficient, we use the Spearman Rank Correlation Coefficient
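The two coefficients are closely related: Spearman's coefficient can be computed as Pearson's coefficient applied to the ranks of the data. The sketch below illustrates both under stated assumptions. The scores are hypothetical, the function names are invented for this example, and each variable is assumed to contain at least two different values.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical GTBS scores and local goods/services scores (invented values).
gtbs = [45.0, 52.0, 60.0, 68.0, 75.0]
local_use = [20.0, 35.0, 30.0, 50.0, 55.0]

print(f"Pearson r = {pearson_r(gtbs, local_use):.3f}")
print(f"Spearman rho = {spearman_rho(gtbs, local_use):.3f}")
```

Both coefficients run from -1 (perfect negative correlation) through 0 (no association) to +1 (perfect positive correlation).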
Correlation
Scenario: Tourism South East is in the process of developing a new Sustainable Tourism Strategy for the region, as part of which they are investigating factors influencing the uptake of local goods and services. Tourism South East would like to establish if there is any relationship/association between GTBS score and the use of local goods and services.
Ratio/Interval Variables: GTBS10 v Green10
Correlation
Pearson’s Product Moment Correlation Coefficient (Ratio/Interval)
Spearman Correlation Coefficient (Ordinal)
Identify potential variables for use in Correlation Tests from the dataset guide
Learning Outcomes
At the end of this session you should be able to:
• Map out different types of advanced statistical analysis and demonstrate how the choice of statistical analysis is influenced by the type of data
• Map and log opportunities for advanced statistical analysis and statistical tests against the different variables within the dataset guide
Self-Directed Activity
To do: Please complete self-directed Activity 7 using the Dataset Guide Analysis Template