SPSS for Learners

Page 1

SPSS Session I Preparing your data for statistical analysis

ASK WEEK February 2010 Christine Gregory


Table of Contents Section 1: Introduction to SPSS .......................................................................................................... 3 1.1 Opening SPSS 15.0 ................................................................................................................... 3 Importing data from Excel ........................................................................................................... 4 1.2 SPSS Data Editor ...................................................................................................................... 6 Variable View: Defining Variables............................................................................................. 6 Data view ..................................................................................................................................... 7 1.3 Saving your SPSS files ............................................................................................................. 8 Section 2: Describing and Presenting Data Using SPSS .................................................................... 9 2.1 Transforming variables ............................................................................................................. 9 Recoding into different variables ................................................................................................. 9 Computing new variables........................................................................................................... 11 2.2 Describing specific cases or groups separately ....................................................................... 12 Selecting cases ........................................................................................................................... 12 Split File ..................................................................................................................................... 14 2.3 Describing and presenting data graphically ............................................................................ 15 Getting familiar with the Chart Builder ..................................................................................... 15 Histograms ................................................................................................................................. 16 Boxplots ..................................................................................................................................... 19 Bar Charts .................................................................................................................................. 21 Line ............................................................................................................................................ 24 Pie............................................................................................................................................... 26 Scatter/Dot ................................................................................................................................. 28 How do I edit my chart? ............................................................................................................. 30 2.4 Describing and presenting data using descriptive statistics .................................................... 30 Frequencies ................................................................................................................................ 31 Descriptives (employ.sav: Age and Gross Annual Income). ..................................................... 32 Crosstabs .................................................................................................................................... 34 Explore ....................................................................................................................................... 35 2.5 Normality Tests: Can I use parametric tests on my data? ....................................................... 38 Four BIG assumptions for parametric tests................................................................................ 38 Testing for Normality: How can I know if my data is normally distributed? ........................... 39 2.6 Correcting Problems ............................................................................................................... 44 References .......................................................................................................................................... 44


Section 1: Introduction to SPSS 1.1 Opening SPSS 15.0 To avoid problems, always open SPSS first, then open or type in your data (Figures 1.1 and 1.2) instead of trying to open up SPSS by double-clicking an SPSS data file or output file.

Generally, you will only choose one of these options: Type in Data

OR Open an existing data sourse (can be SPSS data or from another database such as Excel)

Figure 1.1. The first window that opens when you start SPSS.

If the data editor window is already open, you can open data from the File menu.

Figure 1.2. You can also open files from the SPSS Data Editor window.


After you have either opened an existing data file or you save a new file, SPSS automatically opens an output file (Figure 1.3). This is where SPSS keeps a record of everything you do (called a log); it is also where all analysis are stored and can be viewed. If you have an existing output file you would like to use, then close the new output file created by SPSS and open your output via the File menu (see Figure 1.2).

Everything I do will be seen in outline form in this left window.

I saved a data file called test.sav and it was recorded in the log here.

Figure 1.3. Output file automatically created by SPSS.

Importing data from Excel Say some data has been stored in an Excel file named employ.xls, as shown in Figure 1.3 (the file extension must be .xls as SPSS will NOT read an Excel 2007 file, .xlsx or .xlsm). In employ.xls the column headings are the variable names. Note that variable names MUST NOT contain any spaces or special characters (except _ ) and MUST begin with a letter (whether imported or typed directly into SPSS). Using either method above (Figure 1.1 or 1.2), choose the option to open an existing data source. In the dialog box (Figure 1.5) make sure that you select “All Files (*.)� from the Files of Type dropdown menu, then find your Excel file and click Open.

Figure 1.4. An example of an excel data file (employ.xls) ready to be imported into SPSS.


Figure 1.5. Locate your excel file (.xls ONLY). From the next dialog box (Figure 1.6), check Read variable names from the first row of data and identify the Worksheet which contains your data. If the range which SPSS shows (e.g. [A1:AC71] in Figure 1.6) is different from the range of your data, then specify the correct Range in the space provided; otherwise leave it blank. Click OK.

Figure 1.6. Specify where your data is (worksheet and range) in your excel file. The data should appear in the SPSS Data Editor, with the variable names as the column headings. Figure 1.7 shows the Data View (which should look similar to the Excel spreadsheet) and the Variable View after importing. We will look at the Data View and Variable View in the Section 1.2.


Figure 1.7. The excel file employ.xls has been imported into SPSS.

1.2 SPSS Data Editor As we saw in Figure 1.7, SPSS has two editor windows: the Data View, from which to view/modify your individual data entries for each variable and the Variable View, from which to view and/or define each variable.

Variable View: Defining Variables From the Variable View you can create new variables, rename existing variables, define (or redefine) variables and specify how you want your variables displayed in the Data View (Figure 1.8 expands all the options available to you). Name

Enter the name of your variable – REMBMER: no spaces, no special characters (except _) and it must begin with a letter.

Type

Specify the data type of your variable.

Width

The maximum width of each data entry (field width).

Decimals

The number of decimal places shown in the Data View.

Label

A description of the variable (IMPORTANT so you remember what the variable is). This label will appear in your output instead of the variable name.

Values

Create labels for categorical variables. In the dialog box enter the data value which you have entered in the Data View, enter a label, then click Add. When finished, click OK. These labels will appear in your output instead of the numerical values (e.g. if 1 = Female and 2 = Male (as in Figure 1.9), then Female and Male will be shown in your tables and charts instead of 1 and 2.


Missing

Specify if you have missing values. You can either choose to specify up to three discrete numbers to represent missing values, or you can specify a range. You might want to specify different missing values depending upon why it is missing (such as non response or the participant selected N/A). When SPSS conducts any analysis, it will NOT include these missing values. You can then assign labels to your missing values so you remember what they represent. Notice in Figure 1.9, missing values for the variable commit have been defined as 0. You must enter the missing values into the Data View (e.g. the 0’s were entered manually for commit), SPSS will NOT do this for you.

Columns

The width of the column in Data View.

Align Measure

Align your data Left, Right or Center in the Data View. Specify the measurement type of the variable: Scale, Ordinal or Nominal.

Figure 1.8. Defining variables.

Data view In the Data View, variable names are shown as the column headings. From the toolbar (just under the menu bar), SPSS gives you the option to either view the numerical values for your data, or the value labels (which you defined in the Variable View). Figure 1.9 shows the Data View before and after selecting Value Labels from the toolbar. The remaining toolbar options allow you to insert columns (new variables), insert new rows (new participants), find specific data points, etc‌ We will have a look at the menu bar options as we go along.


Menu Bar Toolbar

Figure 1.9. Showing Value Labels in Data View.

1.3 Saving your SPSS files SPSS has two file types: .sav and .spo. Save all data files as .sav and all output files (see Figure 1.3) as .spo (Figure 1.10).

Save as type: .sav for data files .spo for output files

Figure 1.10. Save your SPSS data files with the extension .sav (on your H drive).


Section 2: Describing and Presenting Data Using SPSS In this section, we will look at different ways of describing and presenting data: (1) by transforming existing variables into new variables; (2) by selecting specific cases for analysis; (3) using charts and graphs; and (4) using descriptive statistics.

2.1 Transforming variables SPSS allows you to Transform your variables (Figure 2.1) in several different ways. We will look at two of these options: Recode into Different Variables and Compute Variable.

Recoding into different variables From the Transform menu, select Recode into Different Variables (Figure 2.1). This option allows you to create a new variable using values from an existing variable. As an example, from the variable age we will create a new variable called AgeGroup in order to group employees by age, thus, we will be creating a nominal variable from a scale variable.

Figure 2.1. Recode an existing variable into a new variables. First select age from the list of variables on the left and click to move it into the middle box. In the frame called Output Variable, enter the name of the new variable (AgeGroup) and assign it a label (Age Group of Employees), then click Change. Your new variable (AgeGroup) should now be moved into the middle box with the old variable (age) as shown in Figure 2.2. Now we are ready to define values for the AgeGroup, so click on Old and New Values. The dialog box shown in Figure 2.3 will appear.


Figure 2.2. Assign a name to the new variable (AgeGroup), label it (Age Group of Employees) and select Change.

2

1

3

Figure 2.3. Assign values to the new variable (or reassign values to an existing variable). In this dialog box (Figure 2.3) we will convert our scale variable (age) into a nominal (categorical) variable (AgeGroup) by creating age groups. First, select Range, LOWEST through value (marked 1). In the space provided enter 25, select Value from the New Value frame and enter 1, then click Add. You’ve just created the first age group and defined it as 25 years or younger. The next age group will be a range (ages 26 through 25). Select Range (marked 2) and enter 26 in the first space and 35 in the space below. Select Value and enter 2, then click Add. Repeat that step to create groups 3 (ages 36 through 45) and 4 (ages 46 through 55). Finally, select Range, value through HIGHEST (marked 3) and enter 56 in the space provided (this group will be defined as age 56 or older). Select Value and enter 5, then click Add. Once you have all five age groups defined and visible in the box labelled Old -> New, click Continue (note that in Figure 2.3 the age group 5 has not been added to the list yet). You will be sent back to the first dialog box (Figure 2.2); click OK. From the Variable View we can define AgeGroup (Figure 2.6). SPSS automatically defines the new variable as scale, so make sure to change the Measure if your new variable is Nominal or Ordinal and to assign Value Labels if appropriate.


Computing new variables From the Transform menu (Figure 2.4) select Compute Variable. The dialog box shown in Figure 2.5 allows you to compute a new variable from an existing variable based on a user-defined formula.

Figure 2.4. Compute a new Variable. First, in the space labelled Target Variable specify the name of the new variable you wish to compute. Define how the new variable will be computed in the space labelled Numeric Expression by using any of the functions selected from Function group and Functions and Special Variables. A textbox below the number pad gives a brief description of the function you’ve selected.

Figure 2.5. Compute a new variable called ‘MeanSatisScore’ as the mean of all four satisfaction scores, for each of N participants. Thus, MeanSatisScore will have N rows.


As an example, we will compute a new variable to be the mean of all four satisfaction score variables. In other words, for each participant, we will compute the average of their four satisfaction scores. First, let’s name the new variable MeanSatisScore. From the Function group list select Statistical, then select Mean from Functions and Special Variables and click . The function MEAN(?,?) should now be in the Numeric Expression box. This function has two ?’s separated by commas because it requires AT LEAST two variables, and each variable must be separated by a comma. We want four variables. Remove ?,? from the argument of MEAN and select satis1 from the list of variables on the left, then click . Repeat this last step for each of satis2, satis3 and satis4, separating each variable by a comma. Now click OK. From Variable View we can define the variable MeanSatisScore which we just computed (make sure to label it).

Figure 2.6. New variables AgeGroup and MeanSatisScore.

2.2 Describing specific cases or groups separately Selecting cases What if you just want to look at a specific category of a nominal or ordinal variable or a specific range of a scale variable? SPSS allows you to select certain cases using Select Cases from the Data menu (Figure 2.7).

Figure 2.7. Select Cases (or rows) of the data to work with.


From the Select Cases dialog box you can specify how you want to select cases: using an If criteria, randomly, based on a range or using a filter variable. We will look at an example using an If criteria. The options Random sample of cases and Based on time or case range are self explanatory. The option Use filter variable requires a 0-1 variable (0 = False and 1 = True), thus, all the 1’s will be selected. In our example, we will select all the cases in which ethnicgp = 2, that is, we will select all the Asian participants (Figure 2.8). Choose If condition is satisfied from the Select frame and click If. In the dialog box select Ethnic Group from the variable list on the left and click . After ethnicgp type =2, then click Continue. You will be sent back to the first dialog box, click OK. From the Data View you should see all cases in which the participant was NOT Asian crossed out (Figure 2.9). Thus, all Asian participants have been selected.

Figure 2.8. Select row if ethnicgp = 2.

Figure 2.9. All rows in which ethnicgp = 2 (Asian) have been selected (i.e. all other rows are crossed out).


Notice in Figure 2.8, in the frame labelled Output, you can specify what you want to do with the selected cases. We left the first option (Filter out unselected cases) selected, but we could have chosen to either copy the selected cases to a new dataset or even delete the unselected cases. After you are finished analysing the selected cases, you can return your dataset to its original state (i.e. all cases selected) by clicking the Reset button, then OK, in the Select Cases dialog box.

Figure 2.10. Deselect all Asians by going back to ‘Select Cases’ and pressing ‘Reset’.

Split File If you want to analyse your data by groups (say by Gender) without deselecting any cases (i.e. you want to analyse both Male and Female and not just one or the other), choose the Split File option from the Data menu (Figure 2.7).

Figure 2.11. Split file by Gender. In the Split File dialog box select Organize output by groups. Choose the grouping variable from the list on the left and click . In the example shown in Figure 2.11, Gender has been selected as the grouping variable. Also, the option Sort the file by grouping variable has been chosen, which


means that SPSS will do just that, i.e. group all the Females together (all the 1’s), followed by all the Males (all the 2’s) in the Data View. If you DON’T want SPSS to sort your data by the grouping variable then choose File is already sorted, then click OK.

2.3 Describing and presenting data graphically In this section we will look at (most of) the ways in which SPSS allows you to describe and present data graphically using the Chart Builder. Before you get started with graphs and charts, consider these helpful tips on presenting data from Tuft (2001) (cited in [1], p.88): Charts and graphs should… 1. Show/reveal the data. In particular, help make sense of large data sets. 2. Get the reader thinking critically about your data. 3. Not misrepresent the data. 4. Display the maximum information with minimum ink. In other words, keep it simple and straight forward. Don’t get too crazy with colours, patterns and designs. You want to reader to be able to focus on your data, with minimal distractions.

Getting familiar with the Chart Builder From the Graphs menu select Chart Builder (Figure 2.12). In the Chart Builder window (Figure 2.13), just to the right of the list of variables is the window in which you build your chart by selecting from the available options. In this tutorial, we will be building charts using the chart Gallery. From the Gallery, from the available charts, we will be looking at those you are most likely to use: Histogram, Boxplot, Bar, Line, Pie and Scatter/Dot. For each type of chart, we will look at an example; the data file (.sav) and variables used will be specified in ( ). We will finish off by looking at how we can edit the charts we create.

Figure 2.12. Chart Builder.


The “Element Properties� window differs depending upon the chart selected. We will see it in more detail as we go along.

Figure 2.13. Getting familiar with the chart builder.

Histograms SPSS offers four types of Histograms (Figure 2.14). We will be looking at three: the Simple Histogram, Stacked Histogram and the Population Pyramid. Simple Histogram

Stacked Histogram

Frequency Polygon

Population Pyramid

Figure 2.14. Histogram Chart Gallery.


Simple Histogram (employ.sav: Gross Annual Income). x-axis: Continuous variable (Gross Annual Income). y-axis: Select from statistic drop-down menu in “Element Properties� (Figure 2.16); for this example, choose Histogram. Other options: Bar Style: Here, Bar has been selected. Display normal curve: Check this if you want to display the normal curve with your histogram. Set Parameters: Either let SPSS create bins for your histogram automatically, or define them yourself. * After changing Element Properties make sure to click Apply.

Figure 2.15. Simple histogram before (Gallery view) and after (SPSS output).

Figure 2.16. Element Properties window for a Simple Histogram.


Stacked Histogram (employ.sav: Gross Annual Income by Ethnic Group). x-axis: Continuous variable (Gross Annual Income). y-axis: Select from statistic drop-down menu in “Element Properties� (Figure 2.17); for this example, choose Histogram. Set color: Categorical variable (Ethnic Group). Other options: Same as for the Simple Histogram.

Figure 2.17. Stacked histogram before (Gallery view) and after (SPSS output). Population Pyramid (employ.sav: Gross Annual Income split by Gender). Distribution Variable: Continuous variable (Gross Annual Income) Split Variable: Chose a dichotomous variable (fancy term for categorical variable with ONLY 2 categories) (Gender). Other Options: Display normal curve. Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.18. Population pyramid before (Gallery view) and after (SPSS output).


Boxplots SPSS offers three types of Boxplots (Figure 2.19). We will be looking at all three. Simple Boxplot

1-D Boxplot

Clustered Boxplot Figure 2.19. Boxplot Chart Gallery. Simple Boxplot (employ.sav: Gross Annual Income by Ethnic Group) x-axis: nominal or ordinal variable, but preferably one with only a handful of categories as a separate boxplot will be created for each category along this axis (Ethnic Group). y-axis: Continuous variable (Gross Annual Income). This would be a variable you would choose for the x-axis in a histogram. Other Options: Show only categories present in the data: Choose this option from the Small/Empty Categories frame in the Element properties window by selecting X-Axis1 (Box1) from the Edit Properties of box. In this example, the variable ‘Ethnic Group’ had no participants classified as ‘other’, therefore I chose to omit ‘other’ from my chart (Figure 2.20). Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.20. Simple Boxplot before (Gallery view) and after (SPSS output).


Clustered (employ.sav: Gross Annual Income by Ethnic Group, clustered by Gender) x-axis: Categorical variable (Ethnic Group). y-axis: Continuous variable (Gross Annual Income). This would be a variable you would choose for the x-axis in a histogram. Cluster: Categorical variable (Gender). Other Options: Show only categories present in the data: In this example this option has been selected as before. Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.21. Clustered Boxplot before (Gallery view) and after (SPSS output). 1-D Boxplot (employ.sav: Age) A 1-D Boxplot is essentially an alternative to a histogram. x-axis: Continuous variable (Age). This would be a variable you would choose for the x-axis in a histogram.

Figure 2.22. 1-D Boxplot before (Gallery view) and after (SPSS output).


Bar Charts SPSS offers eight types of Bar charts (Figure 2.22). We will be looking at four of them: Simple Bar, Simple Error Bar, Clustered Bar and Stacked Bar. Clustered Bar

Stacked Bar

Simple Bar

Simple Error Bar

Figure 2.23. Bar Chart Gallery. Simple Bar A simple bar chart can be used to describe and present data in which the variables are either independent or related. We will see the difference between independent and related variables as we go along. Independent Variables (chol.sav: cholesterol level grouped by smoker). In this case, there will be one grouping variable on the x-axis. In this example, cholesterol level (grouped by smoker) is independent because the same participant cannot be both a smoker and a non-smoker. x-axis: Independent, categorical variable (smoker). y-axis: Dependent variable (cholesterol level). Select the statistic to be plotted for this variable from the Statistic drop-down menu in the Element Properties window. Be mindful of the statistic you select for your dependent variable – use common sense. That is, don’t try to chart the mean of a dichotomous variable like gender. Other Options: Display Error Bars: If you have chosen a measure of central tendency as the Statistic for the variable along the y-axis, then you can choose to display error bars at a specified confidence level for that Statistic. In this example, the Mean was selected; the error bars represent the 95% confidence level for the mean. Other Element Properties: View the Element Properties window to see all of your options.


Figure 2.24. Simple Bar (Independent) before (Gallery view) and after (SPSS output). Related Variables (Hiccups.sav: Drinking, Gargling, Breath and Sugar (data file adapted from [1])). In this case, there will be multiple variables along the x-axis, where the same group of participants was used to collect the data for each variable. This will hopefully become more clear as we do an example. In our hypothetical example, the same group of participants tried all four methods to try and get rid of their hiccups (drinking water from a cup backwards, gargling saltwater, holding their breath and eating a spoonful of sugar). After each method, the number of hiccups per minute were recorded. x-axis: Leave this axis alone, it will take care of itself. y-axis: Select multiple, related, variables (Drinking, Gargling, Breath and Sugar). A box will pop up that shows SUMMARY along the y-axis and INDEX along the x-axis, just click OK. Other Options: Statistic: Select the summary statistic from the Statistic drop down menu in the Element Properties window (for this example Mean was selected). Display Error Bars: The same as described for independent variables. Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.25. Simple Bar (Related) before (Gallery view) and after (SPSS output).


Simple Error Bar An alternative representation of a Simple Bar chart. Thus, follow the same steps as above. Clustered Bar Just as with the simple bar, a clustered bar chart can be used to describe and present data in which the variables are either independent or related. Again, we will see the difference between independent and related variables as we go along. Independent Variables (chol.sav: cholesterol level grouped by smoker, clustered by Gender). In this case, there will be a grouping variable and a clustering variable along the x-axis, each with two or more categories. As before, cholesterol level is independent (grouped by smoker and clustered by Gender) because the same participant cannot be both a smoker and a non-smoker, nor male and female. x-axis: Independent, Categorical variable (smoker). y-axis: Dependent variable (cholesterol level), select statistic to be plotted from the Statistics drop-down menu in element properties. As mentioned before with simple bar charts, be mindful of the statistic you select for your dependent variable – use common sense. Cluster: Categorical variable (Gender). Other Options: Statistic: Select the summary statistic from the Statistic drop down menu in the Element Properties window (for this example Mean was selected). Display Error Bars: The same as described for simple bar charts. Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.26. Clustered Bar (Independent) before (Gallery view) and after (SPSS output). Related Variables (Hiccups.sav: Drinking, Gargling, Breath and Sugar, clustered by Gender). In this case, there will be multiple variables along the x-axis, where the same group of participants were used in collecting data for each variable. These variables can be clustered into two or more groups by a categorical variable. This will hopefully become more clear as we do an example. x-axis: Leave alone, it will take care of itself.


y-axis: Select multiple, related, variables (Drinking, Gargling, Breath and Sugar). A box will pop up that shows SUMMARY along the y-axis and INDEX along the x-axis, just click OK. Cluster: Categorical variable (Gender). Other Options: Statistic: Select the summary statistic from the Statistic drop down menu in the Element Properties window (for this example Mean was selected). Display Error Bars: The same as described for simple bar charts. Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.27. Clustered Bar (Related) before (Gallery view) and after (SPSS output). Stacked Bar An alternative representation of a Clustered Bar chart. Thus, follow the same steps as above.

Line SPSS offers two types of Bar charts (Figure 2.23). We will be looking at both Simple Line and Multiple Line. Simple Line

Multiple Line

Figure 2.28. Line Chart Gallery.


Line charts are an alternative representation of bar charts, thus, they can be used to describe and present data in which the variables are either independent or related. As the process is almost identical, it will not be explained in detail. Line charts are preferred to bar charts when the grouping variable along the x-axis has a larger number of categories. The precise value of “larger number” is up to you. Looking variables you want to plot, what do you think represents the data better, a bar chart or a line chart? If you don’t know, do both and consider using the one which communicates the data most clearly for the reader. Simple Line An alternative representation to a simple bar chart for both independent (Figure 2.29) and related (Figure 2.30) variables. The example used for independent variables is not best as the grouping variable along the x-axis only has two categories. I’ve simply used it for consistency as it was the example given for creating the simple bar chart.

Figure 2.29. Line chart (Independent) before (Gallery view) and after (SPSS output).

Figure 2.30. Line chart (Related) before (Gallery view) and after (SPSS output). Multiple Line An alternative representation to a clustered bar chart for both independent (Figure 2.31) and related (Figure 2.32) variables. The only difference in constructing a multiple line chart is that instead of


“Cluster”, “Set Color” is where you define your categorical variable, which will be represented by different coloured lines in your chart (instead of different coloured bars). As mentioned for the simple line chart, the example used for independent variables is probably not best as the grouping variable along the x-axis only has two categories. I’ve simply used it for consistency as it was the example given for creating the clustered bar chart.

Figure 2.31. Multiple line chart (Independent) before (Gallery view) and after (SPSS output).

Figure 2.32. Multiple line chart (Related) before (Gallery view) and after (SPSS output).

Pie (employ.sav: Ethnic Group, Gross Annual Income). SPSS offers one Pie chart option (Figure 2.33). We will examine two situations in which it may be appropriate to use a pie chart. Situation 1: You want to examine a categorical variable by looking at the proportion (or percentage) of each category within that variable out of the whole sample. For example, out of the entire sample, what percentage of participants were from each ethnic group. Situation 2: You want to examine a categorical variable by looking at the proportion (or percentage) of each category within that variable with respect to a scale variable, out of the whole sample. For example, if you want to examine the gross annual income of participants by ethnic group.


Slice by: Categorical variable (Ethnic Group or Gender, Situations 1 and 2 respectively). Angle Variable: If Situation 1: Choose a statistic from the Statistic drop-down menu in the Element Properties window. If Situation 2: Select a scale variable (Gross Annual Income). Other Options: Other Element Properties: View the Element Properties window to see all of your options. Pie Chart

Figure 2.33. Pie Chart Gallery.

Figure 2.34. Pie chart (Situation 1) before (Gallery view) and after (SPSS output). Beware not to misinterpret the pie chart for Situation 1 (Figure 2.34). It DOES NOT say that White/European participants had a higher gross annual income. It says, if you add up the income of every participant, then White/European participants made up the largest percentage of the total gross annual income of the sample. This makes sense seeing as White/European participants made up the largest ethnic group of this sample.


Figure 2.35. Pie chart (Situation 2) before (Gallery view) and after (SPSS output).

Scatter/Dot SPSS offers eight Scatter/Dot chart options (Figure 2.36). We will look at three of these options: Simple Scatter, Grouped Scatter and Matrix Scatter (you can choose to display a regression line with your plot, for all types of scatter plot, by choosing Fit Line from Element Properties. We will look at scatter plots again in Session II when we discuss correlation coefficients and regression. Simple Scatter

Grouped Scatter

Matrix Scatter Figure 2.36. Scatter/Dot Chart Gallery. Simple scatter plot (Exam Anxiety.sav: Revision Time and Exam Performance) x-axis: Independent (predictor) and scale variable (Revision Time). y-axis: Dependent (outcome) and scale variable (Exam Performance). Other Options: Other Element Properties: View the Element Properties window to see all of your options.


Figure 2.36. Simple Scatter before (Gallery view) and after (SPSS output). Grouped scatter plot (Exam Anxiety.sav: Revision Time and Exam Performance, grouped by Gender). x-axis: Independent (predictor) and scale variable (Revision Time). y-axis: Dependent (outcome) and scale variable (Exam Performance). Set color: Categorical variable by which to split the data (Gender). For example, if I choose gender, two scatter plots will be plotted on the same graph. One will compare exam performance based on revision time for females and the other will compare exam performance based on revision time for males. Other Options: Other Element Properties: View the Element Properties window to see all of your options.

Figure 2.38. Grouped Scatter before (Gallery view) and after (SPSS output). Matrix scatter plot (Exam Anxiety.sav: Revision Time, Exam Performance and Anxiety). Scatter plot Matrix: Drag and drop each scale variable you want to compare (e.g. Revision Time, Exam Performance and Anxiety).


Figure 2.39 Matrix Scatter before (Gallery view) and after (SPSS output). A note on interpretation: consider the relationship between Anxiety and Exam Performance. Is it possible that we could get a more accurate relationship between the two variables if we had a way of measuring different types of anxiety. Someone could be anxious just because they get anxious for ALL tests, regardless of how well they think they will do. This type of anxiety may increase adrenaline and help them do well on the test. Another participant could be anxious because they don’t feel prepared causing feelings of fear and dread, which may negatively affect them on the test. So maybe, in addition to anxiety, we could measure how prepared the participant feels they are for the exam.

How do I edit my chart? Double click your chart in the output window. This will take you to the chart editor. Once in the chart editor, you can access various options in any of the following ways: 1. The toolbar or menu bar from the Chart Editor. 2. The properties window, accessed by double clicking on your chart. 3. Right Click on your chart (you can access most of these options via the toolbar or menu bar). From any of the above (chart editor, properties window or Right Click menu) you can do all sorts: modify either axis, change or insert titles/footnotes, change colours/patterns, insert data labels, insert regression lines, add distribution lines to histograms, etc‌

2.4 Describing and presenting data using descriptive statistics From the menu bar, select Analyze, then Descriptive Statistics. In this section, we will examine how data can be described and presented using the first four options: Frequencies, Descrpitives, Explore and Crosstabs.


Figure 2.40. The Descriptive Statistics menu accessed via the Analyze menu.

Frequencies (employ.sav: Ethnic Group and Job Satisfaction Scale 1) From the list on the left (Figure 2.41), select the variable(s) of interest, then click the list on the right (Ethnic Group and Job Satisfaction Scale 1).

to move to

Figure 2.41. Frequencies via the Descriptive Statistics menu. Display frequency tables: Check this option to output a frequency table for each variable (see Figure 2.42). These are useful for analysing nominal and ordinal data (Gender, Ethnic Group), however, they are not recommended for scale data, particularly continuous scale data.


Statistics: Useful for analysing scale, nominal and ordinal data. However, be mindful of the statistics you choose for nominal and ordinal data. In this example, I’ve chosen not to generate any statistics for the categorical variables Ethnic Group and Job Satisfaction Scale 1. Charts: While generating histograms with the normal curve is helpful for continuous scale data, I don’t recommend generating charts (bar, pie or histogram) for your data in this manner as your options are limited. However, sometimes it is helpful to produce histograms for several variables at once, just to get an idea of how the data is distributed, instead of producing individual histograms one-by-one through the chart editor. In this example, I’ve chosen not to create any charts for the categorical variables Ethnic Group and Job Satisfaction Scale 1. Format: Decide how you want your variables presented in your frequency table. You also have the option to suppress large tables. So say you want to produce frequency tables of all your variables (both scale and categorical) in one go, then you can choose to suppress the large frequency tables for your scale variables. Figure 2.41 shows the default formatting options.

Figure 2.42. Frequency tables for Ethnic Group and Job Satisfaction Scale 1. Output based on specifications shown in Figure 2.40.

Descriptives (employ.sav: Age and Gross Annual Income). From the list on the left (Figure 2.43), select the variable(s) of interest, then click the list on the right (Age and Gross Annual Income).

to move to


Figure 2.43. Descriptives via the Descriptive Statistics menu. Options: Chose the desired statistics and specify the order in which to display your variables in the output. Save standardized values as variables: This gives you the option to create a new variable containing Z-scores for each variable you are calculating descriptive statistics for. As I have it checked, two new variables called Zincome and Zage were created (see Figure 2.44) which consist of the z-scores for the variables Gross Annual Income and Age, respectively.

Figure 2.44. Z-scores for Gross Annual Income and Age saved as new variables.

Figure 2.45. Example of Descriptive Statistics output based on specifications shown in Figure 2.43.


How is Descriptives different from Frequencies? 1. No chart options. 2. It does not offer the following statistics: Median, Mode or Percentile Values. 3. No frequency tables (obviously).

Crosstabs (employ.sav: Gender and Ethnic Group) Similar to a frequency table, a cross tabulation table is useful for analysing nominal and ordinal data, but instead of just looking at the number of male and female participants or the number of participants from each ethnic group, a cross tabulation table will produce the number of male and female participants within each ethnic group. Just as with a frequency table, a cross tabulation table is probably not very helpful for presenting scale data. From the list on the left (Figure 2.46), select the variable(s) of interest, then click the list on the right (Gender and Ethnic Group).

Figure 2.46. Crosstabs via the Descriptive Statistics menu.

to move to


Statistics: A mix of hypothesis tests and correlation statistics. SPSS helps us out by identifying which statistics should be used on certain types of data. Seeing as two independent variables (Gender and Ethnic group) are being analysed, any of these statistics may be selected. Cells: Select the information you want to see in EACH cell of the cross tabulation table. Counts: In Figure 2.46, only Observed (in the frame labelled Counts) has been selected, however, you may want to output Expected counts as well, particularly if you are conducting a hypothesis test, such as Chi-squared, or looking at the correlations for nominal or ordinal data. Percentages: The following percentages will be displayed in EACH cell of the table. Row: Each cells value is expressed as a percentage of the total number of observations in each row. Column: Each cells value is expressed as a percentage of the total number of observations in each column. Total: Each cells value is expressed as a percentage of the total number of observations in the sample. Format: Choose how you want your data outputted. Exact: You will only be concerned with this if you have selected certain options from the Statistics dialog box.

Figure 2.47. Example of Crosstabs output based on the specifications shown in Figure 2.46.

Explore (ExamAnxiety.sav: Exam Performance and Anxiety by gender). Explore is essentially a combination of Frequencies and Descriptives. From the list on the left, select variables to add to the Dependent or Factor list. Multiple variables can be added to each list, however, it will not produce any cross tabulations. It will just produce frequencies, descriptive statistics and any other optional output selected, by pairing each variable in the dependent list with each item in the factor list separately.


Figure 2.48. Explore via the Descriptive Statistics menu. Dependent List: Dependent scale variable(s). Factor List: Predictor variable(s), nominal or ordinal. Display: Choose whether to display only statistics, only plots or both. Statistics: Select the statistics to be outputted. Figure 2.49 shows what will be generated by checking the Descriptives option. Plots Boxplots, Factor levels together: This will produce a separate boxplot for each dependent variable. For example, if I chose Exam Performance and Anxiety, with gender as the factor, two boxplots will be created, as shown in Figure 2.50. Boxplots, Dependent levels together: This will produce a separate boxplot for each predictor variable. Keeping with the same example as above, if Exam Performance and Anxiety are the Dependent variables and Gender is the Factor (the predictor), one boxplot will be created, as shown in Figure 2.51. Normality Plots with tests, and the Levene test: We will discuss these options in the next section.


Figure 2.49. Example of Explore output based on the specifications shown in Figure 2.48 (boxplots shown in Figure 2.50).

Figure 2.50. Explore output: Boxplots, factor levels together.

Figure 2.51. Explore output: Boxplots, dependent levels together.


2.5 Normality Tests: Can I use parametric tests on my data? Four BIG assumptions for parametric tests ([1]) Here we are actually referring to the distribution of the population from which we took the sample. As we cannot possibly collect data from an entire population, we collect only a sample. Then, we use the central limit theorem, which says that if the distribution of our sample is normal, then the population from which it came is also normal (there’s a bit more to the central limit theorem, this is a just a simplification of it, please see [1] for a more technical definition). The data is normally distributed: Depending on the test, the “data” will be referring to different things, such as the actual sample or model errors ([1]). Homogeneity of variance: Depending on what is being examined, this assumptions states that either the variability of each variable should be about the same or the variability between categories within one variable should be about the same. Another way to say it is that the difference between the variances of any two variables is approximately zero or the difference between the variances of the categories within one variable is approximately zero. See [1] for more on the theory behind this assumption. SPSS provides Levene’s test for testing this assumption on scale data (we will look consider categorical data in Session II): Null Hypothesis: There is no difference between the variances. Alternate Hypothesis: There is a difference between the variances (assumption violated). This test can be found in Explore (see Figure 2.48). From the Plots option button, check Normality Plots with tests and select Untransformed. This means that it will run the test on the ‘raw’ data. If the ‘raw’ data violate this assumption, you can choose to transform the data, and hopefully a transformation of the data will satisfy this assumption of homogeneity. If you want to run the test on a transformation of your data, choose Transformed and select which type of transformation you would like to carry out. Figure 2.52 shows the results of Levene’s test for Exam Performance grouped by Gender (Exam Anxiety.sav); the Lavene Statistic is reported as an F statistic with df1 and df2, given by F(df1, df2). The null hypothesis states that there is NO difference between the variance in Exam performance between males and females. The results show F(1, 101) = .160 and a significance level (p-value) of .690, thus p>.05 and we ACCEPT the null hypothesis that there is no difference between the variances. If we look at the variances of exam performance for males and females (Figure 2.48) we see that they are very similar (26.318 and 25.811, respectively) and the test results confirm that they are indeed similar, i.e. their difference (26.318 – 25.811 = 0.507) is not significantly different.


Figure 2.52. Explore output: Levene’s Test for the Homogeneity of Variance. The data is at least interval. See [1] for a good discussion on data types. Independence: What does this mean? Well, it depends on the data. If the variables being tested are: …NOT RELATED, for example, if different participants were used for different experimental conditions, then the variables should be independent across each column of data. …ARE RELATED, for example, if the same participants were used for two or more experimental conditions, then different participants’ measurements should be independent (that is, independence between rows).

Testing for Normality: How can I know if my data is normally distributed? There are several ways in which you can evaluate whether data is normally distributed. I’ve divided them into three categories: Visual evaluation, Numerical evaluation and Statistical evaluation.

Visually: Plots and charts Histograms: Create a histogram with the normal distribution curve (see Section 2.1). P-P plots (probability-probability plot): Along one axis is plotted the cumulative probability of the variable being tested, along the other axis is plotted the cumulative probability of the distribution to which we are making a comparison (as we are testing normality, we would want to plot the cumulative probability of a normal distribution). P-P plots can be created in SPSS by choosing P-P plots from the Descriptive Statistics menu (Figure 2.40). SPSS will also output a Detrended Normal P-P plot, which is just the difference between the observed and expected values for each point on the P-P plot. Figure 2.53 shows both plots for Exam Performance.


Figure 2.53. Example output for P-P plot and Detrended P-P plot for Exam Performance. Q-Q plots (quantile-quantile plot): Along one axis observed values are plotted, along the other axis expected values are plotted (as we are testing normality, we would want to plot the expected values based on a normal distribution). Q-Q plots can be created in one of two ways in SPSS, depending on what you want to test: Testing the normality of one variable (i.e. all observations for one variable): Select Q-Q plots from the Descriptive Statistics menu under Analyze (Figure 2.40). Testing the normality between groups within one variable: In Explore, click on the Plots button and check Normality plots with tests (Figure 2.48). As with the P-P plots, SPSS will output a Q-Q plot as well as a Detrended Q-Q plot (again, the difference between the observed and expected values for each point in the Q-Q plot.

Figure 2.54. Example output for P-P plot and Detrended P-P plot for Exam Performance.


Numerically: I’ve named this category Numerical, even though technically statistics will be used, because no statistical tests will be employed, only numerical values will be compared. In a numerical evaluation, statistical characteristics of the sample distribution can be compared against those of a normal distribution, e.g. skewness and kurtosis (obtained by either Frequencies or Descriptives from Descriptive Statistics, see Figure 2.40).

Statistically: Test what you find numerically and see visually to determine if your conclusions are statistically significant or not. Kolmogorov-Smirnov (K-S) Test: Although we are only interested in testing normality here, the K-S test can be used to test the distribution of the sample against distributions other than the normal distribution. In addition, it can be used on small samples. Null Hypothesis: the distribution of the sample is NOT different from a normal distribution (accept if p > .05, i.e. not significant). Alternate Hypothesis: the distribution of the sample IS different from a normal distribution (reject null in favour of alternate if p < .05, i.e. is significant). The K-S test can be run in one of two ways, depending on what you want to test: Testing the normality of one variable (i.e. all observations for one variable): Use the 1Sample K-S under Nonparametric Tests in the Analyze menu (Figure __). Move the variable of interest into the Test Variable List. The example in Figures __ and __ show the results of the K-S test for Exam Performance (Exam Anxiety.sav) Options: You can choose to output statistics and specify how missing values are handled. Exact: Leave this as Asymptotic only for now. Interpreting and reporting results: The output of the K-S test for Exam Performance (Exam Anxiety.sav) is shown in Figure __. The distribution of Exam Performance is NOT statistically significant as D(103) = 1.365 and p < .05. Thus, there is strong evidence to suggest that Exam Performance is not normally distributed. Testing the normality between groups within one variable: In Explore, click on the Plots button and check Normality plots with tests and choose the Untransformed option (Figure 2.48). Using the example from Figure 2.48, we can test the normality of the distribution of Exam Performance for male and female participants separately. Interpreting and reporting K-S test results: The output of the K-S test for Exam Performance grouped by Gender is shown in Figure __. The distribution of Exam Performance for males is statistically significant as D(52) = .136 and p < .05 (p = .018), thus there is strong evidence to suggest that the distribution of Exam Performance for males is not normal. Likewise, the distribution of Exam Performance for females is statistically significant as D(51) = .132 and p < .05 (p = .028), there is strong evidence to suggest that the distribution of Exam Performance for females is also not normal. Note that 52 and 51 are the degrees of freedom the male and female samples, respectively.


Figure 2.55. The 1-Sample K-S accessed viva the Nonparametric Tests and Analyze menus.

Figure 2.56. The 1-Sample K-S test accessed via the Nonparametric Tests menu.


Figure 2.57. Example of 1-Sample K-S output based on specifications shown in Figure 2.__.

Figure 2.58. Example K-S test output via Normality tests and plots option in Explore. Shapiro-Wilk (S-W) Test: It can be argued that the S-W test is superior to the K-S test in detecting deviations of a samples distribution from normality ([2], p.99). Null Hypothesis: the distribution of the sample is NOT different from a normal distribution (accept if p>.05, i.e. not significant) Alternate Hypothesis: the distribution of the sample IS different from a normal distribution (reject null in favour of alternate if p<.05, i.e. is significant). Interpreting and reporting S-W test results: The output of the S-W test for Exam Performance grouped by Gender is shown in Figure 2.58. The distribution of Exam Performance for males is statistically significant as the test statistic is .942 and p < .05 (p = .014), thus there is strong evidence to suggest that the distribution of Exam Performance for males is not normal. However, the distribution of Exam Performance for females is not statistically significant as the test statistic is .958 and p > .05 (p = .069), thus there is strong evidence to suggest that the distribution of Exam Performance for females is normal. A note of caution when using either test: According to [1], both tend to produce significant results (i.e. show that the sample distribution is not normal) for a large sample when they only slightly deviate from normal. Thus, the researcher needs to judge, based on visual observations and by comparing skewness and kurtosis, whether the distribution is in fact non-normal or whether the test has been a bit too sensitive to deviations.


2.6 Correcting Problems What do I do about: …outliers? There is not a procedure in SPSS to do this for you. Field (2009) describes several options available to you – basically, it’s a judgement call on your part as the researcher. …non-normal data? Try a transformation. Data can be transformed by recoding the variables into new variables, as described in Section 2.1. For more information on which transformation you should try see [1] (p.155). Just remember, if one variable is transformed, the same transformation must be applied to every other variable to which comparisons are being made and tests being run. …data that violates the assumption of homogeneity of variance? As with non-normal data, try transformations. When testing this assumption, there is another option for transforming the data: it can be transformed during the execution of Levene’s test (see Section 2.4).

References [1] Field, Andy. Discovering Statistics Using SPSS, 3rd Edition. SAGE Publications Ltd: London, 2009. [2] Field, Andy. Discovering Statistics Using SPSS 3rd Edition Addition Material. 2009. http://www.uk.sagepub.com/field3e/additionalwebmaterial.htm. Accessed: 01 February 2010.


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.