Section 2

Descriptive Statistics

Learning Outcomes

At the end of this session, you should be able to:

Produce descriptive statistics including the mean, median and mode

Understand the features of measures of central tendency

Apply appropriate descriptive statistics to different data types

Import data into SPSS and use SPSS to produce descriptive statistics and cross-tabulations

Use SPSS to graphically describe data through the use of frequency histograms, stem and leaf plots and box plots



Descriptive DescriptiveStatistics Statistics

Geographical 2 Data Analysis for Techniques Research

2.0 Introduction

The first part of the data analysis process is the production of basic descriptive statistics, such as the mean, median, mode, standard deviation, standard error, and basic frequency and contingency tables. The analysis of the descriptive statistics can then be used to ascertain the nature of the data, especially in relation to its distribution, and what types of statistical tests can be used to analyse the data further.

2.1 Measures of Central Tendency

Averages, or measures of central tendency, give a simple summary of the characteristics of the data being described. How the data is described depends upon its quality. The three measures used are the mean, median and mode (see Table 2.1).

Table 2.1: Measures of Central Tendency

Name     Data Type           Description               Example
Mean     Ratio or interval   Total/Number of samples   ‘The mean July maximum in Bognor is 21°C’
Median   Ordinal             Middle in rank order      ‘Half of the customers travel more than 6km to Tescos’
Mode     Nominal             Most common category      ‘Most visitors are from London’

2.2 Arithmetic Mean

This is the figure that most people would produce if they were asked to give the average of a set of figures. The mean is the most commonly used of all averages and is calculated by adding together all the values in a series and dividing the total by the number of items in the series. The computational formula is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The symbols may be explained as follows:

$\bar{x}$ (pronounced ‘x-bar’) denotes the arithmetic mean of a sample;

$\Sigma$ (pronounced ‘sigma’) means ‘the sum of’;

© Dr Andrew Clegg

$x_i$ means all values of x, where $x_1, x_2, x_3, \ldots, x_n$ represent the values of each observation in a data set. Thus i assumes, in turn, the values 1, 2, 3 and so on;

n is the total number of observations in the data set.

Therefore, for the following data series:

8, 2, 4, 7, 3, 4, 1, 2, 2, 1

the arithmetic mean is calculated as:

$$\bar{x} = \frac{8 + 2 + 4 + \cdots + 2 + 1}{10} = \frac{34}{10} = 3.4$$

2.2.1 Features of the Mean

When using the mean, you should consider the following points:

The mean is easy to understand and calculate and is the most commonly used of all averages;

It makes use of every value in the distribution, leading to a mathematical exactness which is useful for further mathematical processing;

It can be determined if only the total value of the items and the number of items are known, without knowing individual values;

It can be distorted by extreme values in the distribution;

For a discrete distribution, the mean may be an ‘impossible’ figure e.g. 17.5 cigarettes per day when all values in the distribution are whole numbers.
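As a cross-check on the hand calculation of the mean, the short Python sketch below repeats it for the same data series. Python is not otherwise used in this handbook, so treat this as an optional illustration; the variable names are assumptions chosen for readability.

```python
from statistics import mean

# Data series from the worked example in Section 2.2
series = [8, 2, 4, 7, 3, 4, 1, 2, 2, 1]

# The mean is the total of the values divided by the number of values
total = sum(series)
n = len(series)
x_bar = total / n

print(total, n, x_bar)  # 34 10 3.4

# The standard library gives the same answer
library_mean = mean(series)
print(library_mean)
```

Note that only the total (34) and the number of items (10) are needed, which illustrates the point that the mean can be found without knowing the individual values.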

2.3 The Median

There are, however, certain occasions when it is either not possible or not practical to use the arithmetic mean, particularly if the values of some of the extreme items are difficult to determine or if it is possible only to arrange the items in order without assigning numerical values to them. In such cases the representative or average figure may be taken as the middle item when the series is arranged in ascending or descending order. The statistical term for this middle item in a set of data is the median. The median is a positional average: the value of the middle item of a series. For example, the median of the series 1,2,2,4,7,7,10 is the value 4 since it is the middle item. For a series with an even number of items (e.g. 1,2,3,4), there is no middle item and yet a median may still be required. In this case the median is conventionally taken as the arithmetic mean of the two central items, here (2 + 3)/2 = 2.5. Therefore, to re-emphasise:

WORKED EXAMPLE

Example 1: A series with an uneven number of items

The data series in rank order is:

1   2   2   4   7   7   10

The median is the middle item which in this case is 4.

Example 2: A series with an even number of items

The data series in rank order is:

1   2   2   4   7   7   10   11

The median is the arithmetic mean of the two central items:

$$\frac{4 + 7}{2} = 5.5$$
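The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the module software; the function name `median_by_rank` is a hypothetical helper introduced only for this illustration.

```python
from statistics import median


def median_by_rank(values):
    """Rank the values, then take the middle item (or the mean of the two middle items)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of items: a single middle item exists
        return ranked[middle]
    # Even number of items: average the two central items
    return (ranked[middle - 1] + ranked[middle]) / 2


odd_series = [1, 2, 2, 4, 7, 7, 10]
even_series = [1, 2, 2, 4, 7, 7, 10, 11]

print(median_by_rank(odd_series))   # 4
print(median_by_rank(even_series))  # 5.5
```

The standard library function `statistics.median` follows the same convention, so it can be used to check answers to the activities later in this section.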

2.3.1 The Median of a Grouped Distribution

Strictly speaking, it should be impossible to find the median of a grouped distribution as detailed information is lost when data is gathered into classes. However, as with the arithmetic mean, several assumptions are made and an answer is produced. There is also a convention to say which is the median item in a grouped frequency distribution with either an odd or an even number of items. If a frequency distribution contains a total of n items then the median item will be:

a) the $\left(\frac{n+1}{2}\right)$th item if n is odd

b) the $\left(\frac{n}{2}\right)$th item if n is even

For a distribution of 401 items the median will thus be the $\left(\frac{401+1}{2}\right)$th = 201st item.

For a distribution of 400 items the median will be the $\left(\frac{400}{2}\right)$th = 200th item.

To find the median within a grouped data set it is first necessary to construct a table showing the cumulative frequencies. The data on the following pages show the annual rainfall in Kano, a popular tourist destination in Nigeria (Table 2.2). Table 2.3 has been produced by dividing the annual rainfall totals into ranked categories (400-499mm etc.) and then counting the number of years that fall into each of these categories. These frequencies are then added up to produce the cumulative frequency, which can be expressed as a percentage for easier interpretation (see Figure 2.1).

Table 2.2: Rainfall for Kano, Nigeria from 1907 to 1974

Year  Rainfall (mm)    Year  Rainfall (mm)    Year  Rainfall (mm)    Year  Rainfall (mm)
1907   930             1924   820             1941   740             1958  1070
1908   970             1925  1100             1942  1110             1959  1010
1909   650             1926   540             1943   810             1960   830
1910   890             1927   780             1944   840             1961  1020
1911  1230             1928   850             1945   620             1962   760
1912   850             1929   900             1946   790             1963   780
1913   750             1930   700             1947   480             1964  1140
1914   950             1931   770             1948   990             1965   700
1915   680             1932   890             1949  1060             1966   750
1916  1010             1933   830             1950   800             1967   900
1917   740             1934  1000             1951   700             1968   780
1918   480             1935  1180             1952   580             1969   970
1919   690             1936  1010             1953   920             1970   960
1920   820             1937   850             1954   810             1971   710
1921   990             1938   830             1955  1040             1972   660
1922   860             1939   940             1956   710             1973   410
1923  1040             1940   980             1957  1110             1974   560

Table 2.3: Cumulative Rainfall for Kano, Nigeria

Annual Rainfall (mm)   Frequency   Cumulative Frequency   Cumulative % Frequency
400-499                 3           3                       4.4
500-599                 3           6                       8.8
600-699                 5          11                      16.2
700-799                15          26                      38.2
800-899                15          41                      60.3
900-999                12          53                      77.9
1000-1099               9          62                      91.2
1100-1199               5          67                      98.5
1200-1299               1          68                     100
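The step from the raw rainfall totals to the cumulative frequency table can be sketched in Python as follows. This is an illustrative sketch only: it assumes 100mm classes starting at 400mm, as in Table 2.3, and the variable names are hypothetical.

```python
# Annual rainfall (mm) for Kano, 1907 to 1974, in year order (Table 2.2)
rainfall = [
    930, 970, 650, 890, 1230, 850, 750, 950, 680, 1010, 740, 480, 690, 820, 990, 860, 1040,
    820, 1100, 540, 780, 850, 900, 700, 770, 890, 830, 1000, 1180, 1010, 850, 830, 940, 980,
    740, 1110, 810, 840, 620, 790, 480, 990, 1060, 800, 700, 580, 920, 810, 1040, 710, 1110,
    1070, 1010, 830, 1020, 760, 780, 1140, 700, 750, 900, 780, 970, 960, 710, 660, 410, 560,
]

class_width = 100  # assumed class width, matching the 400-499, 500-599, ... categories

# Count how many years fall into each class, keyed by the lower class limit
counts = {}
for value in rainfall:
    lower = value // class_width * class_width
    counts[lower] = counts.get(lower, 0) + 1

# Accumulate the frequencies and express the running total as a percentage
table = []
running_total = 0
for lower in sorted(counts):
    running_total += counts[lower]
    cumulative_percent = round(100 * running_total / len(rainfall), 1)
    table.append((lower, counts[lower], running_total, cumulative_percent))

for lower, frequency, cumulative, percent in table:
    print(f"{lower}-{lower + class_width - 1}  {frequency:2d}  {cumulative:2d}  {percent:5.1f}")
```

Each printed row gives the class, its frequency, the cumulative frequency and the cumulative percentage, in the same order as the columns of Table 2.3.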


Figure 2.1: Cumulative Frequency Curve for Kano, Nigeria

By reading off at 50% on the y axis (Cumulative % Frequency) to the line, and then down to the x axis the median is calculated at about 850mm. The median is, in fact, what is quite often meant by ‘the average’ in everyday conversation, in that half of the years tend to have more rainfall than this, and half less.
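Reading the median off the curve can be approximated numerically by linear interpolation within the class that contains the median item. The sketch below is one common way of doing this and is not necessarily the method intended by the handbook; it assumes lower class boundaries of 400, 500 and so on, a class width of 100mm, and values spread evenly within each class. The function name `grouped_median` is hypothetical.

```python
# (lower class boundary, frequency) pairs taken from Table 2.3
classes = [
    (400, 3), (500, 3), (600, 5), (700, 15), (800, 15),
    (900, 12), (1000, 9), (1100, 5), (1200, 1),
]
class_width = 100  # assumed equal class width in mm


def grouped_median(classes, width):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(frequency for _, frequency in classes)
    target = n / 2  # position of the median item (the 34th of 68 here)
    cumulative = 0
    for lower, frequency in classes:
        if cumulative + frequency >= target:
            # Move through the median class in proportion to how far
            # the target position lies beyond the classes below it
            return lower + (target - cumulative) / frequency * width
        cumulative += frequency
    raise ValueError("no classes supplied")


estimate = grouped_median(classes, class_width)
print(round(estimate, 1))  # 853.3
```

The interpolated value of roughly 853mm agrees with the figure of about 850mm read from the cumulative frequency curve.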

2.3.2 Features of the Median

When using the median, you should consider the following points:

Half the items in the series will have a value greater than or equal to the median and half less than or equal to the median. It is therefore a measure of rank or position;

It is easy to understand;

It is unaffected by the presence of extreme items in the distribution;

If found directly (from ungrouped data) it will be the same as an actual item in the distribution;

It may be found when the values of all the items are not known, provided that values of middle items and the total number of items are known;

Ranking the items can be tedious;

The median cannot be used for further mathematical processing;

It may not be representative if there are few items.

2.4 The Mode

In an ungrouped, discrete distribution the mode is the value which occurs most often; that is, the value with the highest frequency. The mode of the series 1,2,2,3,4 is the value of 2. Unlike the mean and the median, it is not necessarily unique. For example the series 1,2,2,3,4,4 has two modes: 2 and 4. In a continuous frequency distribution it is possible that no two values will be the same. In this sort of situation the mode is defined as the point where there is the greatest clustering of values, or maximum frequency density.

2.4.1 Mode for Grouped Data

To find the mode within a grouped data set it is first necessary to construct a histogram showing the frequency distribution (see Figure 2.2). Having constructed the graph, first identify the modal class (the class with the greatest frequency or frequency density). To calculate the actual value of the mode, draw a line from the top right-hand corner of the modal rectangle to the point where the top of the adjacent rectangle on the left meets it. Now draw a similar line from the top left-hand corner to the point where the adjacent rectangle on the right meets it. Now draw a perpendicular from the point at which these lines cross to the horizontal axis. This point gives the value of the mode.

Figure 2.2: The Calculation of the Mode from a Frequency Histogram
[Histogram: x axis Age (10 to 100), y axis Frequency (0 to 70); annotated Mode = 34]

While this technique will give the specific value of the mode, it is often more useful and meaningful to simply indicate the boundaries of the modal class. In other words, rather than attempting to calculate an accurate value for the mode, which may not be entirely accurate or representative, it would be more meaningful to say that more people, for example, fell within the 30 and under 40 age group than any other group described by Figure 2.2.
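For classes of equal width, the crossing-lines construction on the histogram is equivalent to a simple proportion: the mode lies a fraction d1/(d1 + d2) of the way through the modal class, where d1 and d2 are the amounts by which the modal frequency exceeds the frequencies of the classes to its left and right. The Python sketch below illustrates this. The class frequencies behind Figure 2.2 are not given in the text, so the values used here (40, 60 and 30) are hypothetical and were chosen only so that the result matches the annotated mode of 34; the function name is likewise an assumption.

```python
def grouped_mode(lower, width, f_modal, f_before, f_after):
    """Estimate the mode of grouped data with equal class widths.

    lower    -- lower boundary of the modal class
    width    -- class width
    f_modal  -- frequency of the modal class
    f_before -- frequency of the class immediately to the left
    f_after  -- frequency of the class immediately to the right
    """
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    if d1 + d2 == 0:
        # All three classes are equally frequent: fall back to the class midpoint
        return lower + width / 2
    return lower + d1 / (d1 + d2) * width


# Hypothetical frequencies for the 20-30, 30-40 and 40-50 age classes
mode_estimate = grouped_mode(lower=30, width=10, f_modal=60, f_before=40, f_after=30)
print(mode_estimate)  # 34.0
```

When the neighbouring classes are equally frequent the estimate falls at the midpoint of the modal class; the more one neighbour dominates, the further the estimate is pulled towards it.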

2.4.2 Features of the Mode

When using the mode, you should consider the following points:

For discrete data it is an actual single value;

For continuous data it is the point of highest frequency density;

It is easy to understand;

Extreme items do not affect its value;

It can be estimated from incomplete data;

It cannot be used for further mathematical processing;

It may not be unique or clearly defined;

It requires arrangement of the data which may be time consuming.

Activity 1: For practice work out the mean, median and mode for the following sets of scores relating to the number of bedspaces in serviced accommodation in Torquay. Set 2:

Set 1: 4 16 16 20 32 10

9 10 20 15 14 27

16 15 8 8 6 15

Mean =

Mean =

Median =

Median =

Mode =

Mode =

14 12 10 2 14 30

2.5 Comparison of the Mean, Median and Mode

The mean, median and mode are the three most important statistical measures of location and central tendency. Here are some guidelines to help you decide which value should be used in a particular case:

To determine what would result from an equal distribution use the mean (e.g. to determine the per capita consumption of jelly babies);

If position or ranking is involved use the median which gives the half-way value (e.g. a student interested in whether his exam mark places him in the upper or lower half of the class will need to compare his mark with the median mark);

Where the most typical value is required use the mode (e.g. a shoe manufacturer may want to know the average shoe size for ladies. For production planning it will be the mode that he requires as it will tell him the most common shoe size).

2.5.1 Which Measure Should You Use?

The type of measure that you use will depend on the data that you are using, but ultimately whatever measure you choose should provide a good indication of the typical score in your sample. The mean is the most frequently used measure of central tendency, because it is calculated from the actual scores themselves, not from ranks, as is the case with the median, and not from frequency of occurrence, as in the case of the mode. However, as mentioned earlier, because the mean uses all the scores in the calculation it is sensitive to extreme values. Look at the following set of scores:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

The mean of this set of data is 5.5 (the same as the median). If we were to change one of the scores to make it more extreme, we would get the following:

1, 2, 3, 4, 5, 6, 7, 8, 9, 20

The mean is now 6.5, although the median is still 5.5. If we were to make the final score even more extreme we would get the following:

1, 2, 3, 4, 5, 6, 7, 8, 9, 100

The mean is now 14.5, which as you can see is not really representative of this set of scores. As we have only changed the highest score, the median remains 5.5. In this case, the median becomes a better measure of central tendency.

Therefore, when deciding which measure to use it is always useful to check the data for extreme values. Where extreme scores are present, use the median, as this simply gives you the score in the middle of the other scores when they are put into ascending order. This insensitivity to extreme values makes the median a useful alternative to the mean.
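The effect of an extreme value on the mean and the median can be checked with the short Python sketch below, which repeats the three sets of scores used in this section. It is an optional illustration; the variable names are assumptions.

```python
from statistics import mean, median

# The first nine scores are the same in all three sets
common_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

results = {}
for highest in (10, 20, 100):
    scores = common_scores + [highest]
    # Only the highest score changes, so only the mean responds
    results[highest] = (mean(scores), median(scores))
    print(highest, results[highest])
```

In each case the median stays at 5.5 while the mean moves from 5.5 to 6.5 and then 14.5, which is why the median is preferred when extreme scores are present.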

The mode can be used with any type of data, as it relates to the most frequently occurring score and does not require any calculation. The mean and median cannot be used with certain types of data. For example, if you were discussing occupation or attraction classifications it would be meaningless to rank these in order of magnitude. Again, when using the mode it is important that it provides a good indication of the typical score. Consider the following two sets of data:

A] 1,2,2,2,2,2,2,2,3,4,5,6,7,8

B] 1,2,2,3,4,5,6,7,8,9,10,11,12

In set A there are more 2s than any other number and the mode would provide a suitable measure of central tendency. However, in set B, although the mode is again 2, it is not such a good indicator as its frequency of occurrence is only just greater than that of all the other scores.

Activity 2: Which measure of central tendency would be most suitable for each of the following sets of data:

a] 1, 23, 25, 26, 27, 23, 29, 30 ........................................

b] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5 ........................................

c] 1, 1, 2, 3, 4, 1, 2, 6, 5, 8, 3, 4, 5, 6, 7 ........................................

d] 1, 101, 104, 106, 111, 108, 109, 200 ........................................

2.6 The Population Mean

The measures of central tendency outlined above are useful for giving an indication of the typical score in a sample. However, what if you wanted to get an indication of the typical score in a population? In theory, one could calculate the population mean (a parameter) in a similar way to the calculation of a sample mean: obtain scores from everyone in the population, sum them and divide by the number in the population. However, in practice this is rarely possible. We therefore have to estimate the population parameters from the sample statistics. One way of estimating the population mean is to calculate the means for a number of samples and then calculate the mean of these sample means. It has been found that this gives a close approximation of the population mean.

So why does the mean of the sample means approximate the population mean? Imagine randomly selecting a sample of people and measuring their IQ. It has been found that the population mean for IQ is 100. It could be that, by chance, you have selected mainly geniuses and that the mean IQ of the sample is 150. This is clearly above the population mean of 100. You might select another sample that happens to have a mean IQ of 75, again not near the population mean. It is clear that the sample mean need not be a close approximation of the population mean. However, if we calculate the mean of these two sample means, we get a much closer approximation to the population mean: (75 + 150)/2 = 112.5

The mean of the sample means (112.5) is a closer approximation of the population mean (100) than either of the individual sample means (75 and 150). If several samples of the same size are taken from a population, some will have a mean higher than the population mean and some will have a lower mean. If all the sample means were plotted as a frequency histogram the graph would look similar to Figure 2.3.

Figure 2.3: Distribution of Sample Means Selected from a Population with a Mean of 100 (population mean and mean of sample means are both 100)

If we calculated the mean of all these sample means it would be equal to 100, which is also equal to the population mean. This tendency of sample means to cluster symmetrically around the population mean is described in statistics by the Central Limit Theorem, which states that the distribution of sample means approaches a normal distribution centred on the population mean as the sample size increases. Knowing that the mean of the sample means gives a good approximation of the population mean is important as it helps us to generalise from our samples to our population. This will be considered in more detail when we look at dispersion.
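The behaviour of sample means can be explored with a small simulation. The Python sketch below is illustrative only: the population is artificial (10,000 simulated scores with a mean of about 100 and a standard deviation of 15), and the sample size of 30 and the 1,000 repeated samples are arbitrary assumptions rather than values taken from this handbook.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so that the run is repeatable

# An artificial population of IQ-like scores (assumed mean 100, standard deviation 15)
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = mean(population)

# Draw many samples of the same size and record the mean of each one
sample_size = 30
sample_means = [mean(random.sample(population, sample_size)) for _ in range(1_000)]
mean_of_sample_means = mean(sample_means)

print("Population mean:       ", round(population_mean, 2))
print("Mean of sample means:  ", round(mean_of_sample_means, 2))
print("Lowest sample mean:    ", round(min(sample_means), 2))
print("Highest sample mean:   ", round(max(sample_means), 2))
```

Individual sample means fall both above and below the population mean, but their average lies very close to it, which is the pattern described for Figure 2.3.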

2.7 Skew and the Relationship of the Mean, Median and Mode

Skew is the term that is used to describe the shape of the data as depicted by its frequency distribution or frequency curve. Under a symmetrical distribution curve, of which the ‘Normal Distribution’ is the best-known example (this will be covered in more detail when we look at measures of dispersion), the data builds up slowly from the left to a central peak or modal point and then declines to the right. In this situation, the mean, median and the mode all coincide (see Figure 2.4). A positive skew is when the peak lies to the left and a negative skew when it lies to the right. The further the peak lies from the centre of the horizontal axis, the more the distribution is said to be skewed.

Figure 2.4: Symmetrical, Positively and Negatively Skewed Data Distributions

Where the distribution is positively skewed, the mean and median will be pulled to the right of the mode, and where it is negatively skewed, the mean and median are pulled to the left. Consequently, in a positively skewed distribution, the mean will have the greatest value, the mode the lowest value and the median will fall between the two. Conversely, in a negatively skewed distribution, the mode will have the highest value and the mean will have a lower value than the median and the mode.
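The ordering of the three measures under skew can be illustrated with two small data sets. The values below are hypothetical and were invented for this sketch: the first set has a long tail of high values (positive skew) and the second is its mirror image (negative skew).

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are low, a few are high
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 8, 15]

# Mirror image of the same data, giving a negative skew
negative_skew = [16 - value for value in positive_skew]

summary = {}
for label, data in (("positive", positive_skew), ("negative", negative_skew)):
    summary[label] = (mode(data), median(data), mean(data))
    print(label, "mode:", summary[label][0], "median:", summary[label][1], "mean:", summary[label][2])
```

For the positively skewed set the mode (2) is lowest, the median (3) lies in the middle and the mean (4.5) is highest; for the negatively skewed set the order is reversed (mode 14, median 13, mean 11.5).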

2.8 Using SPSS to Calculate Descriptive Statistics

Having considered the basic calculation of the mean, median and mode by hand (and hopefully not too painfully!), the aim of this next section is to show you how to produce basic descriptive statistics using SPSS. You can also produce descriptive statistics in Access, and this will be demonstrated later in the module. We first need to consider the basic elements of the SPSS operating system.

2.8.1 An Introduction to SPSS

SPSS (PASW Statistics) is a powerful statistical tool that can be used to perform a wide range of statistical techniques. When analysing data in SPSS it is often convenient to transfer the data you wish to analyse from an Excel spreadsheet. The following section will highlight how to import an Excel spreadsheet, and provide a basic introduction to the SPSS environment, before explaining in more detail how to produce descriptive statistics.

To import an Excel spreadsheet, first open SPSS. SPSS asks you what you would like to do. Move the mouse over Open an Existing Data Source and press the left mouse button. Either choose the required files or select More Files and click OK.

The Open File dialog box appears. Move the mouse over the drive containing the file you want to open and then press the left mouse button. The file Dataset is located in the BML224 home page on Moodle.

SPSS must be told to look for an Excel file. Therefore in the Files of Type box make sure that Excel is selected [Move the mouse over the box and press the left mouse button. A sub menu of different file types appears. Move the mouse over Excel and press the left mouse button]. Now select the Dataset file and click Open. The Opening File Options dialog box appears. In the Excel spreadsheet you are going to import, the first row in the spreadsheet contains the field names of the variables you want to examine. To assist your data analysis, you need to ensure that SPSS recognises this.

Move the mouse over the Read Variable Names option and press the left mouse button (the check box becomes ticked). Move the mouse over OK and press the left mouse button.

SPSS now automatically imports the fields in the Excel spreadsheet and the data is displayed in the Data Editor window.

You now need to save this file to your own homespace on the network. Move the mouse over File and press the left mouse button. Move the mouse over Save As and press the left mouse button again. The Save As dialog box appears. Save the file as DATASET.SAV. Note that .SAV is the file extension for data tables in SPSS. If you need to reload this file at any point, in the Open File dialog box select the DATASET.SAV file.

Before using SPSS to perform basic frequency counts and descriptive statistics on the results of the Interview data you first need to understand the nature of the data. For example, some variables are based on numeric coding schemes (nominal, categorical data types) and others on specific data values (interval or ratio data types). For those questions based on numeric coding schemes, certain descriptive statistics are not appropriate, although in this case SPSS can be used to perform basic frequency counts. Details of the variables in the Dataset file are included in the Dataset guide which has been given to you as part of the module resources. Please read through this guide carefully and become familiar with the different types of data, as this will be central to your successful completion of this module.

2.8.2 Using the Variable View

In SPSS, we can use the variable view to check the integrity of the data and to apply additional information to the coding schemes to aid our analysis of the data. In the bottom of the SPSS window, click on the Variable View tab. The Variable View window is displayed. This window provides specific information relating to the variables that we have imported in the Dataset file. A number of key areas need to be checked at this point. First, check the Type column. In order for SPSS to conduct statistical analysis on the variables in the Dataset file all the variables here should be listed as Numeric.

In this instance the Greenrank06 variable is listed as a String. This needs to be changed to Numeric. To do this move the mouse over String and press the left mouse button. The cell is highlighted and a button appears.

Click the button and the Variable Type dialog box appears.

Select Numeric and click OK.

Check the other variables to ensure that they are set as numeric. We can also use the Variable View to check the Measurement type of the variables. In this instance the measurement type should look like this. Refer back to your introductory notes to check on different data types. If the measurement type is not correct for a specific variable, move the mouse over the measurement cell in question and press the left mouse button. The cell is highlighted and a button appears.

Click on the button and a sub menu appears, offering three options: Scale, Ordinal and Nominal. Move the mouse over the required data type and press the left mouse button. The new data type will be presented. Note that ratio and interval data (e.g. age/investment) are classified as Scale. In the Variable View we can also assign more specific value labels to each of the variables. For example, if we take Area as an example of the basic coding scheme in place here, Chichester District = 1 and Arun District = 2. Any subsequent analysis that we perform will use this base coding scheme in any output. In order to make the SPSS output more self-explanatory we can assign additional value labels so that any output actually refers to Chichester District and Arun District. In the Variable View move the mouse over Values for the Area variable and press the left mouse button. The cell is highlighted and a button appears. Click the button.

The Value Labels dialog box appears.

In the Value: box type 1. In the Value Label: box type Chichester District. Click Add.

In the Value: box type 2. In the Value Label: box type Arun District. Click Add.

Click OK.

The changes you have made are reflected in the Variable View.

Repeat this process to add Value Labels to the remaining variables (where appropriate!). Return to the Data View and SAVE the file. We can now experiment with producing descriptive statistics.

By using the Value Labels button in the Data View window you can switch the display between the numeric coding and the full text labelling. Click the button to toggle between the different options.

Numeric Coding

Text Label

2.8.3 Working with SPSS Output

Before we start producing descriptive statistics, it is worth mentioning that SPSS output can be cut and pasted into a Word document (or equivalent package). The process is very simple. In the output window, select the item you want to cut and paste, in this case a histogram. When the item is selected a black border will appear. Copy the item (Edit>Copy or right mouse click>Copy).

Open Word and paste the selection into your document.

To print specific elements of the output, first select the element you wish to print. When the item is selected a black border will appear. Select Print from the File menu. The Print dialog box opens. Make sure that Selection is highlighted and click OK.

The required element is printed. Please use this method to print and annotate output that will be created during the module.



Please use the cut and paste process highlighted here to complete your log book that we will use throughout this module.

Additional guidance notes on the different features of SPSS are available in the appendices of this handbook. When using SPSS to analyse data, you should not be directly cutting and pasting SPSS output into your work. Output tables should ideally be recreated in Word, and data should be transferred into Excel to create appropriate graphs.

2.8.4 Producing Descriptive Statistics

As mentioned earlier, before using SPSS to perform basic frequency counts and descriptive statistics on the results of the survey data you first need to understand the nature of the data (refer back to Section 1.6). In this case, we will start by exploring the categorical/nominal variable: OCC (occupation). Remember that for this variable it would not be appropriate to apply the mean, median or standard deviation. To perform a basic frequency count, first decide on the variable you wish to examine. In this case we shall examine OCC.

To do so, first move the mouse over Analyse and press the left mouse button. Move the mouse over Descriptive Statistics and then over Frequencies and press the left mouse button again.

The Frequencies dialog box appears.

Move the mouse over the variable you want to examine (in this case Occ) and press the left mouse button. Move the mouse over the central arrow and press the left mouse button again. Alternatively, select the variable you want to examine and quickly double click the left mouse button. The selected variable moves across into the Variable(s) box. Note this procedure can be repeated for multiple variables. Click OK.

The results of the frequency count are displayed in the output window. Notice that the frequency table lists the occupations by name because you entered the Value Labels earlier. This helps to make the table more self-explanatory. Any statistics you generate in SPSS will also be displayed in this output window. This is very useful as it means all your calculations are stored in one file that you can save and open at a later date. Save the output file to your own homespace on the network. Save the file as DS-OUTPUT1. Repeat this procedure to perform frequency counts to complete Tables 1 and 2 overleaf. Your additional frequency counts will appear in the output window. Save the output regularly. Record your results overleaf or alternatively print out and fully annotate your SPSS output and file it in your work folder. The information presented in the frequency table could now be copied or cut and pasted into Excel, where you could create an Excel chart to show the distribution of the data.
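To make clear what a frequency count of a coded nominal variable involves, the Python sketch below counts each code, attaches its value label and reports percentages. It is not SPSS syntax and is not part of the module software. The codes, labels and responses are hypothetical and are not taken from the Dataset file.

```python
from collections import Counter

# Hypothetical value labels for an occupation variable (NOT the real Dataset coding)
value_labels = {1: "Manager", 2: "Clerical", 3: "Manual"}

# Hypothetical coded responses, one code per respondent
occ = [1, 2, 2, 3, 1, 2, 3, 3, 2, 2]

counts = Counter(occ)
total = len(occ)

# Build the rows of the frequency table: label, frequency, percent, cumulative percent
frequency_table = []
cumulative_percent = 0.0
for code in sorted(counts):
    percent = 100 * counts[code] / total
    cumulative_percent += percent
    frequency_table.append((value_labels[code], counts[code], percent, cumulative_percent))

for label, frequency, percent, cumulative in frequency_table:
    print(f"{label:<10}{frequency:>4}{percent:>8.1f}{cumulative:>8.1f}")

# For nominal data the mode is the only appropriate measure of central tendency
modal_code = counts.most_common(1)[0][0]
print("Modal category:", value_labels[modal_code])
```

The mean and median are deliberately not calculated here, because the numeric codes are only labels for categories.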

An online simulation of how to create basic frequency statistics is available on the BML224 home page. Please use this simulation to familiarise yourself with the basic procedures outlined here.

Activity 3:

Table 1: The Distribution of Accommodation by Size

Size          Frequency    Percentage
Small
Medium
Large

Table 2: The Distribution of Accommodation by Price

Price         Frequency    Percentage
Up to £30
£31 to £50
£51 to £70
£71 to £90
£91+

Having completed Tables 1 and 2, now have a go at completing Table 3. It is exactly the same process but you will need to perform a frequency count for each separate question in the table (the relevant variable name is given in the brackets).

Activity 4:

Table 3: Business Responses to Tourism Issues

You will have noticed that the frequency count produced relates to the entire sample of 300 businesses, with no differentiation based on specific cases such as location. By selecting specific cases we can use SPSS to produce more detailed frequency counts. In the following example we will produce a frequency count showing the distribution of different occupation types by area.

Return to the Data View window in SPSS. Move the mouse over Data and press the left mouse button.

Move the mouse over Split File and press the left mouse button.

The Split File dialog box opens.

Select Compare groups.

Then select Area and move it into the Groups Based on box. Then click OK.

Run the frequency count for Occ again and the frequency table is displayed in the output window. As you can see, the frequency table now gives a breakdown of occupation type by area (the value labels entered earlier identify the Chichester and Arun Districts). Let us repeat this frequency count, but this time using Town Code instead of Area.
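
The effect of Split File with Compare groups can be sketched in Python as one frequency count per group. This is an illustrative sketch under stated assumptions: the six (area, occupation) pairs are invented, and `split_frequencies` is a hypothetical helper, not an SPSS function.

```python
from collections import Counter, defaultdict

# Hypothetical (area, occupation) pairs, invented for illustration only.
records = [
    ("Chichester District", "Managers and Administrators"),
    ("Chichester District", "Professional Occupations"),
    ("Chichester District", "Managers and Administrators"),
    ("Arun District", "Plant Operatives"),
    ("Arun District", "Managers and Administrators"),
    ("Arun District", "Plant Operatives"),
]


def split_frequencies(pairs):
    """Count categories separately within each group."""
    by_group = defaultdict(Counter)
    for group, category in pairs:
        by_group[group][category] += 1
    return by_group


by_area = split_frequencies(records)
for area, counts in by_area.items():
    total = sum(counts.values())
    print(area)
    for category, n in counts.most_common():
        # Percentages are within the group, not within the whole sample.
        print(f"  {category:<30}{n:>3}{100 * n / total:>7.1f}%")
```

Swapping the first element of each pair for a town code instead of an area gives the breakdown by town in the same way.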

Return to the Data View window in SPSS. Move the mouse over Data and press the left mouse button.

Move the mouse over Split File and press the left mouse button.

The Split File dialog box opens. Deselect Area and then select Town Code and move it into the Groups Based on box.

Click OK.

Run the frequency count again and the frequency table is displayed in the output window. As you can see, the frequency table now gives a breakdown of occupation type by Town Code (the value labels entered earlier identify the actual towns).

Activity 5:

Using the Split File option please complete the following tables.

Table 4: Size of Accommodation by Area

                       Size of Accommodation
Area                   Small            Medium           Large            Total
                       [No. of Ests]    [No. of Ests]    [No. of Ests]
Chichester District
% Distribution
Arun District
% Distribution

Table 5: Size of Accommodation by Town

                       Size of Accommodation
Town                   Small            Medium           Large            Total
                       [No. of Ests]    [No. of Ests]    [No. of Ests]
Chichester
% Distribution
Midhurst
% Distribution
Arundel
% Distribution
Bognor Regis
% Distribution

Table 6: Business Response to Employment Opportunities by Area

Table 7: Business Response to Employment Opportunities by Town

Activity 6: Self-Directed

Cut and paste the results from Table 5 in your SPSS output into Excel. Edit the layout of the results accordingly and produce the following graph. The graph should be presented on A4 in landscape format. Please copy the format of this chart exactly.

[Chart: The Size Structure of Accommodation in the Chichester and Arun Districts. Bar chart with Town (Chichester, Midhurst, Arundel, Bognor Regis) on the vertical axis, Percentage (0% to 100%) on the horizontal axis and a legend of Small, Medium and Large.]

Please print off the chart and have it checked by the module tutor. File the chart in your work folder.

Before we do any additional analysis it is important to remember to reset the Split File dialog box, so that any subsequent analysis is based on the entire sample. Return to the Data View window in SPSS. Move the mouse over Data and press the left mouse button.

Move the mouse over Split File and press the left mouse button.

The Split File dialog box opens.

Select Analyze all cases, do not create groups and then click OK.

Failure to reset the Split File dialog box can result in inaccurate statistics being created.

There are a number of ways in which you can produce descriptive statistics for interval or ratio variables in SPSS.

Method 1: First decide on the variable you wish to examine. In this case we shall examine the turnover of businesses in 2008 (Turnover08).

To do so, first move the mouse over Analyse and press the left mouse button.

Move the mouse over Descriptive Statistics and then over Frequencies and press the left mouse button again. The Frequencies dialog box appears.

Move the mouse over the variable you want to examine (in this case Turnover08) and press the left mouse button. Move the mouse over the central arrow and press the left mouse button again. Alternatively, select the variable you want to examine and quickly double click the left mouse button. The selected variable moves across into the Variable(s) box. Note this procedure can be repeated for multiple variables. Move the mouse over Statistics and press the left mouse button.

The Frequencies: Statistics dialog box appears. This dialog box gives you the opportunity to select a wide range of descriptive statistics. Select the options you want to include by moving the mouse over the blank square and pressing the left mouse button so a tick appears. When you have completed your selection move the mouse over Continue and press the left mouse button.

This will take you back to the Frequencies dialog box. Move the mouse over OK and press the left mouse button. SPSS automatically calculates the necessary statistics and displays the results in the Output window. This method produces not only the basic descriptive statistics for the variable but also a frequency table (which can be deleted).

Note that SPSS also allows you to select measures of dispersion. These will be discussed in more detail in the next session.
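
As a cross-check on the kind of summary the Statistics options produce, the same measures can be sketched with Python's standard `statistics` module. The sketch below uses the nine scores listed in the note to Figure 2.5 in this section rather than the Turnover08 variable, and the dictionary keys are illustrative names only.

```python
import statistics

# The nine scores listed in the note to Figure 2.5 of this section.
scores = [2, 12, 12, 19, 19, 20, 20, 20, 25]

summary = {
    "n": len(scores),
    "mean": round(statistics.mean(scores), 2),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    # Sample standard deviation (n - 1 in the denominator).
    "std_dev": round(statistics.stdev(scores), 2),
}

for name, value in summary.items():
    print(f"{name:<8}{value}")
```

Run as written, the sketch reports a mean of 16.56, a median of 19 and a mode of 20 for these nine scores.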

Descriptive statistics can also be produced by selecting Descriptives instead of Frequencies in the Descriptive Statistics submenu. Follow the same procedure as in the previous example, but in this case click Options to specify the descriptive statistics you want SPSS to produce.

Select the options you want to include by moving the mouse over the blank square and pressing the left mouse button so a tick appears.

When you have completed your selection move the mouse over Continue and press the left mouse button. This will take you back to the Descriptives dialog box. Move the mouse over OK and press the left mouse button. SPSS automatically calculates the necessary statistics and displays the results in the Output window.

You will have noticed that the descriptive statistics produced for Turnover08 relate to the entire sample of 300 businesses. By using the Split File option again we can look in more detail at the characteristics of turnover in relation to specific cases such as size of business or location. For example, we can use the Split File option to look at the average turnover in the Chichester and Arun Districts. As before, open the Split File dialog box and select Compare groups. Select Area to go in the Groups Based on box.

Now produce descriptive statistics for Turnover08 again (using either the descriptives or frequencies option). In the following example I have created descriptive statistics using the frequencies option and you can see in the output that descriptive statistics have now been produced for both the Chichester and Arun Districts.

Method 2: The second (and slightly faster) method is to use the Explore function. In this example we will again examine the turnover of businesses in 2008 (Turnover08). To do so, first move the mouse over Analyse and press the left mouse button.

Move the mouse over Descriptive Statistics and then over Explore and press the left mouse button again. The Explore dialog box appears.

Move the mouse over the variable you want to examine (in this case Turnover08) and press the left mouse button. Move the mouse over the Dependent List arrow and press the left mouse button again.

Turnover08 appears in the Dependent List.

Make sure that Statistics is selected in the dialog box. We will come back to plots later. Click OK. Descriptive statistics for Turnover08 are produced in the output window.

As with the previous method of producing descriptive statistics, the values given in the output relate to the entire sample. By adding variables to the Factor List in the Explore dialog box, we can differentiate by specific cases. Return to the Explore dialog box.

Select Area from the variable list and click the Factor List arrow. Area will appear in the Factor List window. This will give us separate descriptive statistics for the Arun and Chichester Districts. Remember in the previous method, we used the Split File option to group around specific cases. Click OK. Descriptive statistics for business turnover in the Arun and Chichester Districts are produced in the output window.
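
The idea behind the Factor List, one set of descriptive statistics per level of a grouping variable, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the turnover figures are invented for illustration, and `describe_by_factor` is a hypothetical helper name.

```python
import statistics
from collections import defaultdict

# Hypothetical (area, turnover) pairs; the figures are invented for illustration.
cases = [
    ("Chichester", 120), ("Chichester", 95), ("Chichester", 150),
    ("Arun", 80), ("Arun", 110), ("Arun", 70), ("Arun", 100),
]


def describe_by_factor(pairs):
    """Group the dependent values by factor level, then summarise each group."""
    groups = defaultdict(list)
    for level, value in pairs:
        groups[level].append(value)
    return {
        level: {
            "n": len(values),
            "mean": round(statistics.mean(values), 2),
            "median": statistics.median(values),
        }
        for level, values in groups.items()
    }


by_area = describe_by_factor(cases)
for level, stats in by_area.items():
    print(level, stats)
```

Each group is summarised from its own cases only, which is the same result the Split File route gives for a single grouping variable.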

Let me illustrate with another example. Return to the Explore dialog box. Remove Area from the Factor List and replace it with E-Strategy. Click OK.

Descriptive statistics for business turnover for E-Commerce Adopters and E-Commerce Non-Adopters are produced in the output window.

Activity 7:

Using either method, attempt to complete the following tables.

Table 8: Descriptive Statistics for Turnover08 by Town

Table 9: Descriptive Statistics for GTBS Score in 2008 [GTBS08] by Size of Business

                    GTBS08
Size of Business    Mean    Median    Mode    Standard Deviation    Range
Small
Medium
Large

Table 10: Descriptive Statistics for Invest by GStrategy

2.9

Graphically Describing Data

As mentioned earlier, when using statistics it is important to understand the data that you are using. One of the best ways of doing this is through exploratory data analysis, investigating your data using graphical techniques. The next section considers three main techniques: frequency histograms, stem and leaf plots and box plots.

2.9.1

Frequency Histograms

In the above section you used SPSS to perform basic frequency counts. The frequency histogram is a useful way of representing a frequency count graphically, and it allows us to inspect the data for any extreme values (see Figure 2.5). Extreme values, and possible errors made in inputting the data, are often easier to spot once you have graphed the data.

The frequency histogram is also useful for discovering other important characteristics of your data. For example, you can easily read off the value of the mode by looking for the tallest column in the chart. In addition, the histogram gives you useful information about how the values are distributed. However, when interpreting the distribution of the data, be aware that the interpretation of your histogram depends upon the particular intervals that the bars represent.

The way that the data is distributed will become important when we look at the normal distribution and dispersion in the next session. The distribution and character of the data is also an important consideration in the use of inferential statistics, which will be examined later in this module.

Figure 2.5: Frequency Histogram showing the Mean, Median and Mode

[Figure: frequency histogram with the mode, the median and the mean (16.56) marked. The mean is not normally shown on histograms.]

[Note: The frequency histogram is based on the following data: 2, 12, 12, 19, 19, 20, 20, 20, 25]
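
The point that a histogram's shape depends on the chosen intervals can be illustrated with a short Python sketch using the nine scores from the note above. The helper `bin_counts` and the interval widths of 10 and 5 are illustrative assumptions, not settings taken from SPSS.

```python
from collections import Counter

# The nine scores listed in the note to Figure 2.5.
scores = [2, 12, 12, 19, 19, 20, 20, 20, 25]


def bin_counts(values, width):
    """Count how many values fall in each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


for width in (10, 5):
    print(f"Interval width {width}")
    for start, n in bin_counts(scores, width).items():
        # One asterisk per score; the longest row is the tallest bar.
        print(f"  {start:>2}-{start + width - 1:<2} | {'*' * n}")

# With one bar per distinct value, the tallest bar marks the mode.
mode_value, mode_count = Counter(scores).most_common(1)[0]
print(f"Mode: {mode_value} (frequency {mode_count})")
```

With intervals of 10 the bars for 10-19 and 20-29 are equally tall, whereas intervals of 5 give a single tallest bar at 20-24, so the same data can suggest different shapes.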

2.9.2

Stem and Leaf Plots

Stem and leaf plots are similar to frequency histograms in that they allow you to see how the scores are distributed. They also retain the values of the individual observations. A basic example of a stem and leaf plot is shown below:

Stem and Leaf Plot [a] [Data set = 2, 12, 12, 19, 19, 19, 20, 20, 20, 25]

Stem (Tens)    Leaf (Units)
0              2           (the score of 2)
1              22999
2              0005        (the final leaf, 5, is the score of 25)

A stem and leaf plot based on a larger data set is illustrated below.

Stem and Leaf Plot [b] [Data set = 1, 1, 2, 2, 2, 5, 5, 5, 12, 12, 12, 12, 14, 14, 14, 14, 15, 15, 15, 15, 18, 18, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 28, 28, 28, 28, 28, 28, 28, 28, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 42, 42, 42, 43, 43, 44]

Stem    Leaf
0       11222555
1       22224444555588
2       44444555555588888888
3       2233334444455555
4       222334

You can see the similarities between histograms and stem and leaf plots if you turn the stem and leaf plot on its side. When you do this you get a good representation of the distribution of the data. In Stem and Leaf Plot [a] the first line contains the scores 0 to 9, the next line 10 to 19 and the last line 20 to 29. Therefore in this case the stem indicates the tens and the leaf the units. You can see that the score of 2 is represented as 0 in the tens column (the stem) and 2 in the units column (the leaf), while 25 is represented as a stem of 2 and a leaf of 5. The same pattern applies to Stem and Leaf Plot [b], which highlights that this approach is useful for presenting lots of data.

However, there are times when the system of blocking in tens is not very informative. For example, look at the following stem and leaf plot.

Stem and Leaf Plot [c]

Stem    Leaf
0       0000022222222333333333555555555555555777777777777799999999
1       000000033333888
2       3
6       4

This stem and leaf plot is not really that informative, and only indicates that most of the values are below 20. An alternative system is to block the scores in groups of 5 (0-4, 5-9, 10-14, 15-19 etc.).

Stem and Leaf Plot [d]

Block    Stem    Leaf
0-4      0.      0000022222222333333333
5-9      0*      555555555555555777777777777799999999
10-14    1.      000000033333
15-19    1*      888
20-24    2.      3
60-64    6.      4

This stem and leaf plot provides a much better indication of the distribution of scores. You can see that we use a full stop (.) following the stem to signify the first half of each block of ten scores (e.g. 0-4) and an asterisk (*) to signify the second half of each block of ten scores (e.g. 5-9).
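
Both blocking systems can be sketched in a few lines of Python. This is an illustrative sketch for non-negative whole numbers only; `stem_and_leaf` and its `split` parameter are hypothetical names, and the output layout is simplified compared with SPSS.

```python
from collections import defaultdict


def stem_and_leaf(values, split=False):
    """Build stem-and-leaf rows for non-negative whole numbers.

    With split=False each stem covers a block of ten (0-9, 10-19, ...).
    With split=True each block is halved: 'N.' holds leaves 0-4 and
    'N*' holds leaves 5-9, following the notation in plot [d].
    """
    rows = defaultdict(str)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        if split:
            label = f"{stem}." if leaf < 5 else f"{stem}*"
        else:
            label = str(stem)
        rows[label] += str(leaf)
    return dict(rows)


# Data set from Stem and Leaf Plot [a].
plot_a = [2, 12, 12, 19, 19, 19, 20, 20, 20, 25]

for label, leaves in stem_and_leaf(plot_a).items():
    print(f"{label:>3} | {leaves}")

print()
for label, leaves in stem_and_leaf(plot_a, split=True).items():
    print(f"{label:>3} | {leaves}")
```

The first loop reproduces the three rows of plot [a]; the second shows the same data blocked in groups of 5. Stems with no scores are simply omitted, as in plot [c].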

2.9.3

Box Plots

Extreme scores are sometimes difficult to spot in a large data set. In this instance an alternative graphical technique is the box plot or whisker plot, which gives a clear indication of the distribution of extreme scores and, like the stem and leaf plots and histograms discussed above, tells us how the scores are distributed. An example of a box plot is given in Figure 2.6:

Figure 2.6: An Example of a Box Plot

[Figure: box plot of the nine scores (N = 9) on a vertical axis running from 0 to 40. Labels identify the box, the hinges, the whiskers, the adjacent values and the thick line that represents the median.]

Although SPSS will automatically create box plots, the following section outlines how to create them so that you understand how to interpret them.

Step 1:

The box plot in Figure 2.6 is based on the following data: 2, 20, 20, 12, 12, 19, 19, 25, 20. The first step is to rank the scores and find the median:

2, 12, 12, 19, 19, 20, 20, 20, 25

Median score = 19 [position 5]

Step 2:

The next step is to calculate the hinges. These are the scores that cut off the top and bottom 25% of the data (the upper and lower quartiles): thus 50% of the scores fall within the hinges. The hinges form the outer boundaries of the box. The position of each hinge is calculated by adding 1 to the median position and then dividing by 2. In this instance the median was in position 5, therefore: (5+1)/2 = 3

Step 3:

The upper and lower hinges are therefore the third score from the top and the third score from the bottom of the ranked list, which in this current example are 20 and 12 respectively.

Step 4:

From these scores we can work out the h-spread, which is the range of the scores between the two hinges. The score on the upper hinge is 20 and the score on the lower hinge is 12, therefore the h-spread is 8 (20 minus 12).

Step 5:

We define extreme values as those that fall one-and-a-half times the h-spread outside the upper and lower hinges. The points one-and-a-half times the h-spread outside the upper and lower hinges are called inner fences. One-and-a-half times the h-spread in this case is 12, that is 1.5*8: therefore any score that falls below 0 (lower hinge, 12, minus 12) or above 32 (upper hinge, 20, plus 12) is classed as an extreme score.

Step 6:

The scores that lie within the inner fences and are closest to them are called adjacent scores. In our example, these scores are 2 and 25, as 2 is the closest score to 0 (the lower inner fence) and 25 is the closest to 32 (the upper inner fence). These are illustrated by the cross-bars on each of the whiskers.

Any extreme scores (those that fall outside the upper and lower inner fences) are shown on the box plot. You can see from Figure 2.6 that the h-spread is indicated by the extent of the box (12 to 20) and that there are no extreme scores. The lines coming out from the edges of the box are called whiskers, and these represent the range of scores that fall outside the hinges but within the limits defined by the inner fences. Any scores that fall outside the inner fences are classed as extreme scores (also called outliers). As shown in Figure 2.6, there are no scores outside the inner fences, which are 0 and 32. The inner fences are not necessarily shown on the plot. The lowest and highest scores that fall within the inner fences (adjacent scores 2 and 25) are indicated on the plot by the cross-lines on each of the whiskers.

If we were to add a score of 33 to the data set illustrated in Figure 2.6, a revised box plot would indicate the presence of an extreme score (see Figure 2.7). As shown in Figure 2.7, the point is labelled 10, indicating that the tenth score in our data set is an extreme score (in this case, 33). This value falls outside the upper inner fence of 32. In this situation it is worth examining the data set to ensure that the extreme value has not been caused by an error in the data entry process.
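
The six steps can be collected into a short Python sketch. It follows the hand method described in this section rather than whatever SPSS does internally, and it makes one assumption that the text does not state: when the median falls at a half position (an even number of scores), the half is dropped before the hinge position is calculated. The function names are hypothetical.

```python
import math


def score_at(ranked, position):
    """Score at a 1-based rank; a half position averages the two neighbours."""
    low = ranked[math.floor(position) - 1]
    high = ranked[math.ceil(position) - 1]
    return (low + high) / 2


def box_plot_summary(values):
    """Median, hinges, h-spread, inner fences, adjacent and extreme scores."""
    ranked = sorted(values)
    n = len(ranked)

    # Step 1: the median sits at position (n + 1) / 2.
    median_pos = (n + 1) / 2
    median = score_at(ranked, median_pos)

    # Steps 2-3: hinge position = (median position + 1) / 2, counted in from
    # each end. Assumption: a half median position is rounded down first.
    hinge_pos = (math.floor(median_pos) + 1) / 2
    lower_hinge = score_at(ranked, hinge_pos)
    upper_hinge = score_at(ranked, n + 1 - hinge_pos)

    # Step 4: the h-spread is the distance between the hinges.
    h_spread = upper_hinge - lower_hinge

    # Step 5: the inner fences lie 1.5 h-spreads beyond each hinge.
    lower_fence = lower_hinge - 1.5 * h_spread
    upper_fence = upper_hinge + 1.5 * h_spread

    # Step 6: adjacent scores are the outermost scores inside the fences;
    # anything outside the fences is an extreme score.
    inside = [v for v in ranked if lower_fence <= v <= upper_fence]
    extremes = [v for v in ranked if v < lower_fence or v > upper_fence]

    return {
        "median": median,
        "hinges": (lower_hinge, upper_hinge),
        "h_spread": h_spread,
        "inner_fences": (lower_fence, upper_fence),
        "adjacent": (inside[0], inside[-1]),
        "extremes": extremes,
    }


scores = [2, 20, 20, 12, 12, 19, 19, 25, 20]
print(box_plot_summary(scores))
print(box_plot_summary(scores + [33])["extremes"])
```

For the nine scores the sketch returns hinges of 12 and 20, an h-spread of 8, inner fences of 0 and 32 and adjacent scores of 2 and 25, matching the worked example. Adding 33 leaves the fences unchanged and yields one extreme score.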

Figure 2.7: Revised Box Plot Indicating an Extreme Score

[Figure: box plot of the ten scores (N = 10) on a vertical axis running from 0 to 40. Case 10 is marked as an extreme score above the upper whisker.]

2.10

Graphically Describing Data in SPSS

Creating histograms, stem and leaf plots and box plots in SPSS is very straightforward. In the following example, we will generate graphical output relating to the Turnover08 variable in the dataset. Move the mouse over Analyse and press the left mouse button.

Move the mouse over Descriptive Statistics and then over Explore and press the left mouse button again.

The Explore dialog box appears.

Move the mouse over the variable you want to examine (in this case Turnover08) and press the left mouse button. Move the mouse over the central arrow and press the left mouse button again. Alternatively, select the variable you want to examine and quickly double click the left mouse button.

The selected variable moves across into the Dependent List. Move the mouse over Plots and press the left mouse button.

The Explore Plots dialog box appears. Select Stem and Leaf and Histogram. Click Continue.

You are returned to the Explore dialog box. Click OK.

SPSS generates a histogram, stem and leaf plot and box plot in the output window.

As before, the graphical output produced refers to the entire sample of 300 businesses. Using the Factor List option in the Explore dialog box allows us to examine specific groups of cases in more detail. For example, the following output has been produced by selecting Area in the Factor List box. This is an extremely useful way of visually examining the distribution of your data, which we will come back to when we look at dispersion and statistical testing.

Activity 8:

I would like you now to have a go at producing graphical output for a specific variable. Choose an appropriate variable (which must be ratio or interval in nature) and produce output for the entire sample, and then use the Factor List option in the Explore dialog box to investigate specific cases. Record your observations by cutting and pasting the output into your log book.

2.11

Creating Cross-tabulations in SPSS

Another useful way of examining the relationship between variables is through the use of cross-tabulations. In the following example we will create a number of cross-tabulations using data from the Dataset file. To create a cross-tabulation in SPSS, move the mouse over Analyse and press the left mouse button. Move the mouse over Descriptive Statistics and then Crosstabs.

The Crosstabs dialog box appears. You need to think about the structure of your crosstab and decide which variable you want as the row and which variable you want as the column. Your crosstab should take the form of a contingency table.

Move the mouse over the variable you want to assign to Rows, in this case Area, and press the left mouse button. Move the mouse over the central arrow and press the left mouse button again. Area appears in the Row(s) box:

Move the mouse over the variable you want to assign to Columns, in this case Occ (Occupation), and press the left mouse button. Move the mouse over the central arrow and press the left mouse button again. Occ appears in the Column(s) box:

Click OK.

SPSS produces the crosstab in the output window.

The crosstab presented here is based on the absolute values of the data. We can repeat the process to include Row and Column percentages. This is often a good idea, as it provides a more representative overview if you have different sample sizes. In this instance we will add percentages to the rows.
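
A crosstab of absolute values is simply a count of cases for every combination of row and column category. The Python sketch below illustrates this; the seven (area, occupation) pairs and the shortened occupation labels are invented for the example, and `crosstab` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (area, occupation) pairs, invented for illustration only.
records = [
    ("Chichester", "Managers"), ("Chichester", "Professional"),
    ("Chichester", "Managers"), ("Arun", "Plant Operatives"),
    ("Arun", "Managers"), ("Arun", "Plant Operatives"),
    ("Chichester", "Plant Operatives"),
]


def crosstab(pairs):
    """Return row labels, column labels and a Counter of (row, column) counts."""
    cells = Counter(pairs)
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    return rows, cols, cells


rows, cols, cells = crosstab(records)

# Print the contingency table with a row total at the end of each line.
print(f"{'':<12}" + "".join(f"{col:>18}" for col in cols) + f"{'Total':>8}")
for row in rows:
    counts = [cells[(row, col)] for col in cols]
    print(f"{row:<12}" + "".join(f"{n:>18}" for n in counts) + f"{sum(counts):>8}")
```

Combinations that never occur are reported as 0, which corresponds to an empty cell in the contingency table.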

Having selected the Row and Column variables move the mouse over Cells and press the left mouse button.

The Crosstabs: Cell Display dialog box appears.

Select Row in the Percentages window and then click Continue. This will return you to the Crosstabs dialog box. Click OK.

A second crosstab is produced in the output window - this time row percentages have been included. In this example the crosstab is showing the distribution of occupation categories within a specific District. For example in the Chichester District, 48.6% of businesses are run by previous managers and administrators, compared to 25.7% who were in professional occupations. Reference to the percentage distribution rather than the absolute values provides a more representative discussion, as it takes into account relative sample sizes. Repeat the process removing row percentages and adding column percentages.

When producing crosstabs it is important that you correctly assign row and column percentages as this can influence the accuracy of how you discuss the results. A simple rule of thumb is that row percentages should always total 100 when read across the row, and column percentages will always total 100 when read down the column. In the above example where we have used the column total we are looking at the distribution of specific occupation categories across the two Districts. For example, 70.2% of managers and administrators are within the Chichester District compared to 29.8% in the Arun District. In contrast, 63.6% of plant operatives are within the Arun District compared to 36.4% in the Chichester District.
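
The difference between row and column percentages, and the rule of thumb that each should total 100 in its own direction, can be checked with a small Python sketch. The eight cases below are invented for illustration and do not reproduce the percentages quoted above.

```python
from collections import Counter

# Hypothetical (area, occupation) pairs, invented for illustration only.
records = [
    ("Chichester", "Managers"), ("Chichester", "Managers"),
    ("Chichester", "Managers"), ("Chichester", "Plant Operatives"),
    ("Chichester", "Plant Operatives"), ("Arun", "Managers"),
    ("Arun", "Plant Operatives"), ("Arun", "Plant Operatives"),
]

cells = Counter(records)
row_totals = Counter(row for row, _ in records)
col_totals = Counter(col for _, col in records)

# Row percentages: each cell as a share of its row total (rows sum to 100).
row_pct = {(r, c): 100 * n / row_totals[r] for (r, c), n in cells.items()}

# Column percentages: each cell as a share of its column total (columns sum to 100).
col_pct = {(r, c): 100 * n / col_totals[c] for (r, c), n in cells.items()}

print(f"{'Area':<12}{'Occupation':<18}{'N':>3}{'Row %':>8}{'Col %':>8}")
for (r, c), n in sorted(cells.items()):
    print(f"{r:<12}{c:<18}{n:>3}{row_pct[(r, c)]:>8.1f}{col_pct[(r, c)]:>8.1f}")
```

In this invented example 60% of the Chichester businesses are run by managers (row percentage), while 75% of all managers are in Chichester (column percentage): the same cell supports two different statements depending on which total is used.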

Activity 9:

Now attempt to complete the following tables. Please give consideration to whether you should be using row or column totals (the clue is in the table). Refer to your Dataset guide.

Table 11: Town Against G-Strategy

Table 12: E-Strategy Against Occupation

                           Occupation
E-Strategy                 Managers and      Professional    Clerical and    Sales         Plant         Total
                           Administrators    Occupations     Secretarial     Operations    Operatives
E-Commerce Adopters
% Distribution
E-Commerce Non-Adopters
% Distribution

Table 13: Perceived Value of the Internet Against E-Commerce and Marketing Course Attendance

Table 14: Town Against the Size of Business

                    Town
Size of Business    Chichester    Midhurst    Arundel    Bognor Regis
Small
% Distribution
Medium
% Distribution
Large
% Distribution
Total

Activity 10:

Using the Dataset file, create 3 additional crosstabs using appropriate variables. Record your results by cutting and pasting your output into your log book. Check your crosstabs with your module tutor to ensure that they are correct.

Please review the online simulations to ensure that you are familiar with the basic approaches of producing descriptive statistics in SPSS.

We can make crosstabs even more specific by using the Layer command. In the following example the initial crosstab is GStrategy v Occupation, but we are going to use the Layer command to examine any differences between GStrategy, Occupation and Area. In effect, the Layer command allows us to use Area as an additional filter. Select the variables to use as the basis of your crosstab. Here we are using GStrategy (row) and Occupation (column). Select Area and put it in the Layer box.

Click OK.

In the output window you will notice that a crosstab showing GStrategy v Occupation has been provided, but the results have also now been split by area, showing relative distributions in both the Arun and Chichester Districts.
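
A layered crosstab can be thought of as one row-by-column table per value of the layer variable. The Python sketch below illustrates the idea; the six cases, the "Yes"/"No" strategy categories and the shortened occupation labels are all invented for the example, and `layered_crosstab` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Hypothetical (area, strategy, occupation) triples, invented for illustration.
records = [
    ("Chichester", "Yes", "Managers"),
    ("Chichester", "No", "Professional"),
    ("Chichester", "Yes", "Managers"),
    ("Arun", "Yes", "Professional"),
    ("Arun", "No", "Managers"),
    ("Arun", "No", "Managers"),
]


def layered_crosstab(triples):
    """Build one table of (row, column) counts for each layer value."""
    layers = defaultdict(Counter)
    for layer, row, col in triples:
        layers[layer][(row, col)] += 1
    return layers


tables = layered_crosstab(records)
for layer in sorted(tables):
    print(layer)
    for (row, col), n in sorted(tables[layer].items()):
        print(f"  {row:<5}{col:<14}{n:>3}")
```

Each case contributes to exactly one table, so the counts across all layers add back up to the full sample.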
