How to code categorical variables


Table of Contents

How to Code Categorical Variables
Why do categorical variables need to be coded?
1.1 Choosing the right code
    Nominal data
    Ordinal data
1.2 Multiple response questions from a questionnaire
1.3 Define your coded values in SPSS
Additional Resources

Created by ASK (2012)



How to Code Categorical Variables

If you are unsure how to identify the variables in your dataset, or which of them are categorical, it’s best to review these topics through the guides on Blackboard first.

Why do categorical variables need to be coded?

Most quantitative analysis software cannot analyse words or text, only numbers. Categorical data are therefore assigned numerical codes for analysis.

1.1 Choosing the right code

Nominal data:

Because nominal data does not have a meaningful order or rank, technically you can use any numbers, in any order that you want. However, it’s good practice to start numbering from either 0 or 1 and to use consecutive numbers (e.g., 1, 2, 3, 4, 5) rather than non-consecutive numbers (e.g., 1, 4, 9, 12, 23).

EXAMPLE. “What is your main mode of transport when you commute to university?” (circle one)

Public transport
Cycle
Car
Walk
Motorcycle

I will code these possible responses from 1 to 5. It does not matter which order I number them because the data is nominal. Then I’ll replace the responses with the numerical code in my dataset (see Figure 1).

1 = Public transport
2 = Cycle
3 = Car
4 = Walk
5 = Motorcycle

Figure 1. Nominal data before and after coding.
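The replacement step for the transport question can be sketched in a few lines of Python. This is only an illustration, not part of the guide's SPSS workflow: the function name `code_responses` and the five raw answers are made up for the example, while the codebook uses the codes listed above.

```python
# Codebook for the transport question: response text -> numerical code.
# The numbers are arbitrary labels because the data is nominal.
TRANSPORT_CODES = {
    "Public transport": 1,
    "Cycle": 2,
    "Car": 3,
    "Walk": 4,
    "Motorcycle": 5,
}


def code_responses(responses, codebook):
    """Replace each text response with its numerical code.

    Raises ValueError for any response missing from the codebook,
    so typos are caught instead of being silently dropped.
    """
    coded = []
    for response in responses:
        if response not in codebook:
            raise ValueError(f"No code defined for response: {response!r}")
        coded.append(codebook[response])
    return coded


# Hypothetical raw answers from five participants (made up for illustration).
raw_answers = ["Car", "Walk", "Public transport", "Cycle", "Car"]
coded_answers = code_responses(raw_answers, TRANSPORT_CODES)
print(coded_answers)  # [3, 4, 1, 2, 3]
```

Keeping the codebook in one place means the same codes are applied to every participant, and an unexpected response stops the script rather than producing a wrong code.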



Ordinal data:

Because ordinal data does have a meaningful order or rank, you should:

- number your data in the correct order
- use consecutive numbers (e.g., 1, 2, 3, 4, 5) rather than non-consecutive numbers (e.g., 1, 4, 9, 12, 23).

EXAMPLE. “Most days my commute causes me to feel stressed” (circle one)

Strongly Disagree
Disagree
Don’t know
Agree
Strongly Agree

I will code these possible responses from 1 to 5. Remember, it does matter which order I number them because the data is ordinal. Then I’ll replace the responses with the numerical code in my dataset (see Figure 2). Note that I could have coded the responses from 5 (SD) to 1 (SA) instead of 1 (SD) to 5 (SA); it’s your choice. Just make sure the direction you choose is consistent throughout your data.

1 = Strongly disagree (SD)
2 = Disagree (D)
3 = Don’t know (DK)
4 = Agree (A)
5 = Strongly agree (SA)

Figure 2. Ordinal data before and after coding.
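As a rough Python sketch of the ordinal case, the codebook below keeps the categories in rank order, and a small helper shows the alternative direction (5 for SD down to 1 for SA). The helper name `reverse_code` and the three raw answers are assumptions made for illustration, and the response labels are written with a plain apostrophe.

```python
# Codebook for the agreement scale, numbered in rank order (1 = SD ... 5 = SA).
AGREEMENT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Don't know": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def reverse_code(code, lowest=1, highest=5):
    """Flip the direction of a consecutive scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return lowest + highest - code


# Hypothetical raw answers from three participants (made up for illustration).
raw_answers = ["Agree", "Strongly disagree", "Don't know"]

coded = [AGREEMENT_CODES[answer] for answer in raw_answers]
reversed_coded = [reverse_code(code) for code in coded]

print(coded)           # [4, 1, 3]
print(reversed_coded)  # [2, 5, 3]
```

Either direction preserves the ranking. The point is to pick one and apply it to every response, which is easier when the direction lives in a single codebook or helper.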

1.2 Multiple response questions from a questionnaire

EXAMPLE. “Which of these sports would you go to see at the Olympics if you had tickets?” (tick all that apply)

Athletics
Swimming
Boxing
Football
Gymnastics



Because you cannot have multiple responses as one data entry, each of these response options becomes a new variable. Then, the data for each participant is either YES (they did tick it) or NO (they didn’t tick it). Code these as we did for nominal data (e.g., 0 = NO, 1 = YES). See Figure 3 for an example. From the data we can read that:

- Participant 1 ticked boxing, football and gymnastics
- Participant 2 ticked athletics, boxing and football
- Participant 3 did not tick any
- Etc…

*I chose to use 0 (NO) and 1 (YES), but you can use 1 (NO) and 2 (YES) if you prefer.

Figure 3. Coding multiple response questions (0 = NO, 1 = YES).
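The one-variable-per-option idea can be sketched in Python as follows. The three participants are the ones described above. The function name `code_multiple_response` is hypothetical, and the sketch assumes the 0 = NO, 1 = YES scheme.

```python
# One variable per response option, in questionnaire order.
SPORTS = ["Athletics", "Swimming", "Boxing", "Football", "Gymnastics"]


def code_multiple_response(ticked, options):
    """Return one 0/1 variable per option: 1 = YES (ticked), 0 = NO (not ticked)."""
    ticked_set = set(ticked)
    return {option: 1 if option in ticked_set else 0 for option in options}


# The boxes ticked by participants 1 to 3, as described in the text.
participants = {
    1: ["Boxing", "Football", "Gymnastics"],
    2: ["Athletics", "Boxing", "Football"],
    3: [],
}

coded_rows = {
    pid: code_multiple_response(ticked, SPORTS)
    for pid, ticked in participants.items()
}

# Print one row per participant, one column per sport.
for pid, row in coded_rows.items():
    print(pid, [row[sport] for sport in SPORTS])
# 1 [0, 0, 1, 1, 1]
# 2 [1, 0, 1, 1, 0]
# 3 [0, 0, 0, 0, 0]
```

A participant who ticked nothing still gets a full row of zeros, so every participant has a value for every variable.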

1.3 Define your coded values in SPSS

If you are using SPSS, then you will need to create value labels for the categorical variables you have coded. What’s the point of doing this?

- The value labels will appear instead of the codes on all output (this will save you loads of time!)
- You can choose to view codes or the value labels (the words/text) in the data view
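To illustrate what value labels do, here is a minimal Python sketch (not SPSS syntax) that stores the data as codes but reports a frequency count using the labels. The function name `labelled_frequencies` and the coded answers are assumptions for the example. In SPSS itself you would define value labels for the variable rather than write code like this.

```python
from collections import Counter

# Value labels for the transport question: numerical code -> label text.
TRANSPORT_LABELS = {
    1: "Public transport",
    2: "Cycle",
    3: "Car",
    4: "Walk",
    5: "Motorcycle",
}


def labelled_frequencies(coded_values, labels):
    """Count each code, then report the counts under the label text."""
    counts = Counter(coded_values)
    return {labels[code]: counts[code] for code in sorted(counts)}


# Hypothetical coded answers from five participants (made up for illustration).
coded_answers = [3, 4, 1, 2, 3]
print(labelled_frequencies(coded_answers, TRANSPORT_LABELS))
# {'Public transport': 1, 'Cycle': 1, 'Car': 2, 'Walk': 1}
```

The data stays numeric for analysis, while the output is readable without looking up what each code means.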



Additional Resources

In the Getting Started folder under the SPSS resources section, you may be interested in the following:

1. How to identify variables in your dataset
2. Levels of measurement (nominal, ordinal and scale variables)
3. How to create value labels for categorical variables

* If you are unsure about which variables are categorical, have a look at the Levels of Measurement guide mentioned above.


