Statistics for Cambridge IGCSE™
COURSEBOOK
Dean Chalmers with Digital access
Introduction
Welcome to Cambridge IGCSE™ Statistics.
This Coursebook provides you with complete coverage of the Cambridge IGCSE™ Statistics syllabus (0479) for examination from 2027. The eleven chapters are structured in a logical order, building on prior skills, to cover the complete syllabus content over the recommended 2-year period, although you can also use this resource for a 1-year intensive study course. Each chapter covers all the learning objectives for its syllabus topic and includes a 'Getting started' activity on prerequisite knowledge, an introductory narrative, relevant worked examples, and exercise sets.
As you work through the book, you will find clear explanations of the topics and concepts you are studying, with numerous worked examples to help ensure that you understand each step taken in the mathematical processes you are learning. Further, to assist you in making sense of your learning, each chapter is clearly divided into sections relating to specific topics. The exercises in each section will give you plenty of chances to practise the skills that you have learned, with frequent occasions for you to pause and assess your strengths, as you reflect on your learning strategies.
Four practice exercises, containing referenced past paper questions, are included at timely intervals throughout the book. These will help to deepen your understanding of concepts and commit information to your long-term memory. You will also find a glossary of key words used throughout the book, to ensure complete confidence in the mathematical language used in the syllabus.
Full numerical answers can be found at the back of the Coursebook.
Finally, we wish you all the best with your studies and hope that you enjoy using this book.
How to use this book
Throughout this book, you will notice lots of different features that will help your learning. These are explained below.
LEARNING INTENTIONS
These set the scene for each chapter, help with navigation through the Coursebook and indicate the important concepts in each topic.
GETTING STARTED
This contains questions and activities on the subject knowledge you will need before starting a chapter.
KEY WORDS
Key vocabulary appears in boxes throughout each chapter and is highlighted in the text when it is first introduced. You will also find definitions of these words in the Glossary at the back of this Coursebook.
TIP
The information in this feature will help you complete the exercises and give you support in areas that you might find difficult.
INVESTIGATION/DISCUSSION
These boxes contain questions and activities that will allow you to extend your learning by investigating a problem, or by discussing it with classmates.
Exercises
Appearing throughout the text, exercises give you a chance to check that you have understood the topic you have just read about and practise the mathematical skills you have learned. You can find the answers to these at the back of the Coursebook.
REFLECTION
These activities ask you to think about the approach that you take to your work, and how you might improve this in the future.
WORKED EXAMPLE
These boxes show you the step-by-step process to work through an example question or problem, giving you the skills to work through questions yourself.
SELF/PEER ASSESSMENT
At the end of some exercises you will find opportunities to help you assess your own work, or that of your classmates, and consider how you can improve the way you learn.
SELF-EVALUATION CHECKLIST
This is a summary of key points you should know so that you can check your progress after studying each topic.
COMMAND WORDS
Command words that appear in the syllabus and might be used in assessment are highlighted in the practice questions when they are first introduced.
Chapter 1 Data and its Collection
BY THE END OF THIS CHAPTER YOU WILL BE ABLE TO:
•classify data and variables according to their type
•know the terms associated with different types of sample and understand the sampling methods used to obtain them
•understand that some sampling methods can be biased
•use a random number table to select a simple random, systematic or stratified sample
•use open and closed questions in a survey
GETTING STARTED
What do you already know?
• When a quantity, N, is divided in the ratio a : b, the two shares are $\frac{a}{a+b} \times N$ and $\frac{b}{a+b} \times N$.
• When a number is rounded, there are lower and upper limits for the original number.
Check your skills
1 Find the greater share when:
a 2.7 kg is divided in the ratio 1 : 2
b 63.21 cm is divided in the ratio 4 : 3
c $420 is divided in the ratio 3 : 11
2 An influenza virus has so far infected exactly 8567 people.
a Write down the number that will appear in a newspaper article, if the journalist approximates the number of infected people to the nearest hundred.
b One week later, the journalist writes a second article in which she reports that, to the nearest ten, 9320 people have now been infected by the influenza virus.
Find the least and greatest possible number of infections that could have occurred in the week between the first and second articles.
Data is something we use all the time, but what is it?
Data are pieces of information, and a set of data is a collection of facts and statistics that are gathered together and analysed to help answer questions. Data are often collected to investigate people’s opinions and behaviours, or to discover the characteristics and qualities of objects. This might be to help national and local governments make sensible decisions about spending on public services, or for quality-control testing in manufacturing industries.
Imagine you want to answer one of these questions:
• What do the people living in your town think about a plan to build a new road nearby?
• Which internet service provider gives you the best value for money?
• How much did your friends and family enjoy their meal at a restaurant?
To answer these sorts of questions you need to collect some data. This can be done by carrying out a survey. Before you carry out a survey, you will need to decide what data to collect and how you will collect it. For example, you can:
• use interviews and questionnaires
• make observations and measurements
• collect responses online by asking customers to rate services and products.
When you look at the results of a survey that has been completed by another person or by an organisation, you should think about whether the people who carried out the survey are prejudiced or if they might be trying to influence you in a certain way.
KEY WORDS
raw data quantitative qualitative census population sample representative bias sampling error sampling frame questionnaire open question closed question discrete continuous
You should ask yourself:
• What method was used to collect the data?
• How was the sample chosen for the survey?
• What was the size of the sample and was it a representative sample?
1.1 Types of Data and Variable
Any time that you investigate a statistical question, you will need to collect information or data about a variable. The initial, unprocessed information you collect is called raw data.
For example, if you want to collect data on the trees in a park, some of the variables that you could investigate are ‘tree height’, ‘tree age’, ‘leaf colour’ or ‘hardness of wood’. Variables and data can be defined as quantitative or qualitative.
• Quantitative data are numerical values of a variable that can be arranged in order of size
• Qualitative data are qualities or characteristics of a variable that are described by words or symbols.
The table shows some examples of quantitative variables that produce quantitative data, and qualitative variables that produce qualitative data.
Quantitative
• the numbers of students in the classes at a school
• the heights of students in a group
• the numbers of items purchased by customers at a shop
Qualitative
• the styles of shoes that people wear
• the types of animals kept at a zoo
• students’ opinions about their school uniform
Quantitative variables and their data are defined as either discrete or continuous.
As a general rule:
• discrete quantitative data are values that are counted and can be given exactly
• continuous quantitative data are measurements that cannot be given exactly but can be given to a certain degree of accuracy.
Data for a discrete variable can take only certain specific values, so the number of possible values within a range is countable. Values can appear at regular or irregular intervals, and there is always a gap between one data value and the next, as shown in the diagram.
[Diagram: a discrete variable can take only certain values within a range.]
Quantitative data can be ranked by arranging the values in ascending order (from smallest to largest) or in descending order (from largest to smallest).
WORKED EXAMPLE 1.1
Each of the following variables has a countable number of possible values. In the range from 1 to 10 inclusive, state the number of possible values and list them in each case.
a The numbers of passengers travelling on buses through a town.
b The numbers of sides of a polygon.
c The values, in rupees, of the coins used in India today.
d Half the number that can be rolled with an ordinary dice.
Answers
a There are ten possible values: 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. (0 is not in the range from 1 to 10.)
b There are eight possible values: 3, 4, 5, 6, 7, 8, 9 and 10.
c There are four possible values: 1, 2, 5 and 10.
d There are five possible values: 1, $1\frac{1}{2}$, 2, $2\frac{1}{2}$ and 3.
($\frac{1}{2}$ is not in the range from 1 to 10.)
Data for a continuous variable can take any value within a given range or ranges, so the number of possible values is uncountable, as shown in the diagram.
TIP
The range from 1 to 10 inclusive means that 1 and 10 are included. The range from 1 to 10 exclusive means that 1 and 10 are not included.
Values in a set of continuous data can be found by making measurements or taking readings from a machine or device. Although these measurements can be made accurately, they can almost never be made exactly. This means that values are often approximated to a certain degree of accuracy (to the nearest whole number, nearest ten, 1 decimal place, 3 significant figures, and so on). Each rounded value therefore represents a range of possible values.
WORKED EXAMPLE 1.2
Each of the following variables has an uncountable number of possible values. Using inequalities, find the range of the possible values in each case.
a Tree heights, h, that are given as 3 metres, to the nearest metre.
b Playing times of movies, t, that are given as 90 minutes, to the nearest 10 minutes.
c Masses of objects, m, that are given as 7.3 kg, to 1 decimal place.
Answers
a $2.5 \le h < 3.5$ m
This includes all heights from 2.5 to under 3.5 metres.
b $85 \le t < 95$ min
This includes all playing times from 85 to under 95 min.
c $7.25 \le m < 7.35$ kg
This includes all masses from 7.25 to under 7.35 kg.
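The bounds in Worked example 1.2 can be checked with a short Python sketch. It assumes that a value rounded to the nearest unit lies within half a unit either side of the rounded value, with the lower bound included and the upper bound excluded. The function name `rounding_bounds` is an illustrative choice, not part of the syllabus.

```python
from fractions import Fraction

def rounding_bounds(rounded_value, unit):
    """Return (lower, upper) where lower <= x < upper rounds to rounded_value.

    Both arguments are passed as strings (for example "7.3" and "0.1")
    so that decimals are stored exactly rather than as binary floats.
    """
    value = Fraction(rounded_value)
    half_unit = Fraction(unit) / 2
    return value - half_unit, value + half_unit

# The three parts of Worked example 1.2: (symbol, rounded value, rounding unit)
examples = [("h", "3", "1"), ("t", "90", "10"), ("m", "7.3", "0.1")]
for symbol, value, unit in examples:
    lower, upper = rounding_bounds(value, unit)
    print(f"{float(lower)} <= {symbol} < {float(upper)}")
```

Exact fractions are used here so that a bound such as 7.25 is not affected by floating-point rounding.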
Items that can take only one value are not variables; they are constants.
For example, the number of letters in the word JUPITER is not a variable: it is a constant with a value of 7.
However, the number of letters in each word of the phrase THE PLANET JUPITER is a discrete quantitative variable that can take a value of 3, 6 or 7.
1.2 Surveys
The purpose of a survey is to collect information or raw data. Different types of information can be collected, including numerical quantities, opinions, qualities, or characteristics.
Primary data are facts and information that you collect yourself, whereas secondary data are facts and information that have been collected by somebody else.
A census is a type of survey that collects information about a whole population, and a sample is a survey that focuses on just part of a population.
Census
In a census, information is collected from an entire population. This could be:
• all the people living in a particular country
• all the children attending a particular school
• all the bolts produced at a factory
• all the vehicles registered in a certain district.
Conducting a census of an entire country is a very large and expensive task. It is usually carried out only by governments, and typically only once every five or ten years.
Sample Surveys
For most companies and organisations, a census is not necessary, and they prefer to carry out a sample survey because it is cheaper and less time-consuming.
The sample chosen to take part in a survey should be representative, so that it gives an accurate picture of the numerical quantities, opinions, qualities, or characteristics, of the population from which it is taken. This can be difficult to achieve, but it is important to try in order to avoid bias. However, there will almost always be a difference between the statistical characteristics of the sample and the statistical characteristics of the population. This difference is referred to as the sampling error.
Once the purpose of a survey has been stated and the method of selecting the sample has been decided, the target population and the sampling units must be defined clearly. For example, if the target population is all the people that subscribe to Superfast Internet Services, then the sampling unit is a person that subscribes to Superfast Internet Services.
If the sampling units are individually named or numbered to form a list, then the list is referred to as the sampling frame.
WORKED EXAMPLE 1.3
a Approximately 90% of a sample of people living in a town approve of a new shopping centre that is being built. Explain what is meant by the statement, ‘The sample is representative of the population.’
b A survey made just before an election indicated that party Z would receive 40% of the votes, but at the election party Z actually received only 20% of the votes. Suggest two possible reasons for this sampling error.
Answers
a The statement means that approximately 90% of the town’s population approves of the new shopping centre as well.
b The sampling error could have happened because mistakes were made when collecting the data, or because the sample was not representative of the population.
Questionnaires
In order to conduct an effective survey, you need to decide what type of data is required and what questions you need to ask to obtain the data. Both quantitative and qualitative data can be collected in a survey by using a questionnaire.
A questionnaire may consist of open questions and/or closed questions and it should be relevant to the purpose of the survey, simple to understand and easy to complete.
Open questions allow the person being interviewed to respond in any way they like. These types of questions may contain phrases such as:
• What do you think about …?
• What is your opinion of …?
Responses to open questions can be difficult to collect together and analyse because the responses are not restricted in any way. The responses may need to be interpreted, and this means that they can also be misinterpreted.
Closed questions allow the person being interviewed to respond in a limited number of ways. Answers are restricted and some respondents may feel that none of the available answers are suitable.
Closed questions may contain phrases such as:
• Answer ‘Yes’ or ‘No’ to the following …
• Tick the box that …
• Do you always, sometimes or never …
• Rate the following on a scale from 1 to 5 …
• How many stars out of five would you give to …
Numerical scores can be assigned to worded responses to make compiling and analysis simpler.
The responses ‘Yes’, ‘Don’t know’ and ‘No’ could be assigned scores of +1, 0 and −1, respectively.
WORKED EXAMPLE 1.4
On a questionnaire, a sample of 50 people was asked, ‘Is your local hospital providing a good service?’
The three possible answers to this question are ‘Yes’, ‘Don’t know’ and ‘No’, which are assigned scores of +1, 0 and −1, respectively.
a Calculate the total score for this question if:
i the same number of people answered ‘Yes’ as answered ‘No’
ii 27 people answered ‘Yes’, 14 people answered ‘Don’t know’ and the rest answered ‘No’.
b If the total score for this question is −10, what general conclusions can you make about the people’s opinion of the service provided by the local hospital?
Answers
a i Equal numbers of +1 scores and −1 scores give a total of 0.
ii 27 × 1 + 14 × 0 + 9 × (−1) = 18
b The total is less than 0, so 10 more people answered ‘No’ than answered ‘Yes’. Overall, these 50 people do not think that the local hospital is providing a good service.
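The scoring in Worked example 1.4 can be sketched in Python. The scores of +1, 0 and −1 are the ones suggested in the text. The names `SCORES` and `total_score` are illustrative assumptions.

```python
# Scores assigned to each worded response, as suggested in the text
SCORES = {"Yes": 1, "Don't know": 0, "No": -1}

def total_score(counts):
    """Add up score x number of people for every response."""
    return sum(SCORES[answer] * number for answer, number in counts.items())

# Part a ii: 50 people, 27 'Yes', 14 'Don't know' and the rest 'No'
counts = {"Yes": 27, "Don't know": 14, "No": 50 - 27 - 14}
print(total_score(counts))  # 18
```

With this scoring, a positive total means more people answered ‘Yes’ than ‘No’, and a negative total means the opposite.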
WORKED EXAMPLE 1.5
A questionnaire is being designed to investigate what people think about a new chocolate bar and to work out what is the best price to charge.
a Design a closed question to collect quantitative data about a fair selling price for the chocolate bar.
b Given that the manufacturer plans to sell the chocolate bar for $1.50, design an open question to collect qualitative data for what people think about this price. Explain why the data collected will be qualitative.
Answers
a What do you think is a fair selling price for the chocolate bar? Tick one box only.
[Tick-box options not reproduced.]
b In at most three words, what is your opinion of the suggested selling price of $1.50?
The question asks for an opinion in words (it is not numerical data) so the data collected will be qualitative.
Exercise 1A
1 Four variables, W, X, Y and Z, are defined as follows:
W: the numbers of lions seen on drives in a game park
X: all integers from 5 to 9, inclusive
Y: all numbers from 1 to 4, inclusive
Z: all possible shoe sizes.
State whether each variable is discrete or continuous.
2 Write down the letter of the descriptions below that are not discrete quantitative variables.
Justify your answer in each case.
A: the types of food eaten by Mr. Chand’s cat
B: the numbers of broken eggs in boxes of twelve eggs
C: the number of new cars purchased in Karachi last month
D: the heights of the buildings in a city.
3 A questionnaire is being planned for a survey on the sales potential of a fashionable sports shoe.
a Describe the most appropriate population from which a sample should be selected to complete the questionnaire.
b Suggest what the sports shoe manufacturer could do to help the sample members complete the questionnaire knowledgeably and honestly.
c Design two closed questions suitable for use in the questionnaire. One should produce a set of qualitative data and the other should produce a set of quantitative data.
4 A questionnaire contains the following question: ‘Write down one word that you think best describes the parking facilities in the town centre.’
a Explain why this is referred to as an open question.
b Give a reason why this question will produce a set of qualitative data.
c Design a closed question to obtain qualitative data on people’s descriptions of the parking facilities in the town centre.
d Suggest a way in which your closed question could be used to produce a set of quantitative data.
5 a The following information is to be collected from the adults attending a conference.
i blood type
ii head circumference
iii gender
iv number of children
Use appropriate statistical language to describe fully each of the types of data that will be collected.
b The heights of the adults are also recorded. Explain how this can be done so that a set of qualitative data, rather than a set of quantitative data, is collected.
6 The distance between two cities is given as approximately 300 km.
Use inequalities to show the possible distance, d, if this approximation is correct to the nearest:
a hundred kilometres
b ten kilometres
c kilometre.
7 The people completing a questionnaire are asked to provide three ways they can be contacted:
• a physical address
• an email address
• a mobile phone number
For each of these, state whether the data collected are quantitative or qualitative, giving a reason in each case.
8 In a 200-metre race, the continuous variables average speed and time taken can be used to decide the positions of the runners. Copy and complete the following to describe the winner of the race:
The winner has the ____ average speed and the ____ time taken.
9 A head teacher wants to conduct a survey on parents’ opinions on the changes she plans to make to the school timetable. To do this, she considers the following survey methods:
A: Sending a questionnaire by post to all pupils’ parents
B: Placing a questionnaire on the school’s website and inviting pupils’ parents to respond
a Suggest one reason why the head teacher might prefer to use:
i method A instead of method B
ii method B instead of method A
b Give the reason why the head teacher’s survey is actually a census.
10 a Find or describe the number of possible values from 1 to 10, inclusive, that each of the following variables can take:
A: the possible scores when the number rolled with an ordinary dice is doubled
B: the lengths of time that people take to brush their teeth
C: the possible number of interior angles of a regular polygon
D: the possible size of the exterior angles of a regular polygon
E: the numbers of factors of the integers from 8 to 15, inclusive.
b Which two variables in part a can take non-integer values?
c For each of the variables named in your answer to part b, state whether it is discrete or continuous.
11 Rafiq, a town planner, is going to a shopping centre to collect data for a survey in order to help determine its success.
Three of the questions on Rafiq’s questionnaire are shown.
Question 1 How many times have you visited this shopping centre during the past month?
Question 2 What is the name of your favourite shop in this shopping centre?
Question 3 How long does it take you to travel from your home to the shopping centre?
Tick one box only
☐ Less than 15 minutes
☐ 15 up to 30 minutes
☐ 30 up to 60 minutes
☐ 1 hour or more
a From which one of these questions will Rafiq obtain qualitative data?
b For each of the other two questions, use statistical language to fully describe the type of data that Rafiq will collect.
SELF-ASSESSMENT
Check your answers to Exercise 1A.
How well do you understand what you have learnt so far in this chapter?
What are you good at and what needs more work?
What can you do to improve your understanding?
TIP
If a regular polygon has n sides, then it has exterior angles of $\frac{360^\circ}{n}$ and interior angles of $180^\circ - \frac{360^\circ}{n}$.
DISCUSSION
You want people to give honest answers when they complete a questionnaire. If they are not honest, then the data you collect is of no use. People might not be honest because the question is:
• about a sensitive topic, such as health or income
• not written in a tactful way, so giving an honest answer might cause embarrassment.
Discuss these problems with a partner and think of some examples.
Write a question that you think people would not answer honestly, and then challenge your partner to rewrite it with question(s) that people would answer honestly.
What alternatives are there to face-to-face verbal interviews that might encourage people to give more honest answers?
1.3 Types of Sample
There are several ways in which you can choose a sample to represent a population. As far as possible, all members of the population should have the same chance of being selected for the sample. If this can be achieved then we say that the sampling method is fair, or unbiased.
Before a survey can take place, you must decide what size and type of sample you require, and how you will select the sample.
Simple Random Samples
A simple random sample is obtained by selecting at random, so that each member of the population is equally likely to be selected.
Some methods of obtaining a random sample of people are:
• writing names on pieces of paper, and randomly selecting the required number of pieces of paper
• using random sampling numbers from tables or a calculator to select from a numbered list.
By using the SHIFT and RAN (or similar) keys, a calculator can generate random numbers with up to three decimal places, such as 0.345, 0.87, 0.009 and 0.112. Here, you need to write 0.87 as 0.870 so that it has three digits after the decimal point. Then the leading zero and the decimal point of each number are ignored, to give the three-digit numbers 345, 870, 009 and 112.
If you need random two-digit numbers, you can regroup them as 34, 58, 70, 00, 91 and 12.
If you need random numbers from 00 to 09, you can place a zero in front of each of the twelve digits and use 03, 04, 05, 08, 07, 00, 00, 00, 09, 01, 01 and 02.
Any random number that is too large or too small for a population is simply ignored. Repeated random numbers are also ignored because each person or item can only be selected once for the sample.
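The regrouping of calculator digits described above can be sketched in Python. The calculator readings are entered as text so that a value such as 0.87 can be padded to three decimal places. The function names `digits_from_calculator` and `regroup` are illustrative assumptions.

```python
def digits_from_calculator(readings):
    """Join the three decimal digits of each reading into one digit string."""
    parts = []
    for reading in readings:
        decimals = reading.split(".")[1]      # drop the leading zero and the point
        parts.append(decimals.ljust(3, "0"))  # pad 0.87 to 0.870
    return "".join(parts)

def regroup(digits, width):
    """Split a digit string into consecutive groups of the given width."""
    return [digits[i:i + width] for i in range(0, len(digits) - width + 1, width)]

digits = digits_from_calculator(["0.345", "0.87", "0.009", "0.112"])
print(regroup(digits, 3))                 # three-digit numbers
print(regroup(digits, 2))                 # two-digit numbers
print(["0" + digit for digit in digits])  # numbers from 00 to 09
```

Any incomplete group left over at the end of the digit string is dropped by `regroup`.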
WORKED EXAMPLE 1.6
The random two-digit numbers in the following table are to be used to select various random samples from a list of 60 names numbered from 00 to 59.
The names on the list from 00 to 29 are males, and the names on the list from 30 to 59 are females.
73 15 34 65 15 41 55 02 48 42
49 20 66 35 13 57 84 13 57 03
a Use the first row to select a simple random sample of four and describe the sample that is selected.
b Use the second row to select a random sample of 10% of the population and describe the sample that is selected.
Answers
a In the first row, 73 and 65 are discarded because they are too large, and the second 15 is ignored because nobody can be selected more than once for the sample.
The random numbers that are used from the first row are 15, 34, 41 and 55.
The sample consists of one male (15) and three females (34, 41 and 55).
b The required sample consists of 10% × 60 = 6 people
In the second row, 66 and 84 are discarded because they are too large, and the second 13 and 57 are ignored because nobody can be selected more than once for the sample.
The random numbers that are used from the second row are 49, 20, 35, 13, 57 and 03.
The sample consists of three males (20, 13 and 03) and three females (49, 35 and 57).
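The selection rules used in Worked example 1.6 (discard numbers that are too large, ignore repeats, stop when the sample is complete) can be sketched in Python. The function name `select_sample` is an illustrative assumption, and the rows are the two rows of random numbers from the example.

```python
def select_sample(random_numbers, population_size, sample_size):
    """Pick distinct numbers below population_size, in the order they appear."""
    chosen = []
    for number in random_numbers:
        if number >= population_size:  # too large for the list: discard
            continue
        if number in chosen:           # repeat: nobody is selected twice
            continue
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

row_1 = [73, 15, 34, 65, 15, 41, 55, 2, 48, 42]
row_2 = [49, 20, 66, 35, 13, 57, 84, 13, 57, 3]

print(select_sample(row_1, 60, 4))  # [15, 34, 41, 55]
print(select_sample(row_2, 60, 6))  # [49, 20, 35, 13, 57, 3]
```

If a row runs out before enough numbers have been chosen, this sketch returns a shorter list, so more random numbers would then be needed.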
Systematic Samples
A systematic sample is obtained by selecting at regular intervals from a numbered list. The list is divided into equal-sized groups, and one item from each group is selected for the sample.
To take a systematic sample of S items from a population of size P:
• list the items and divide them into S groups with N items in each group, where $N = \frac{P}{S}$
• randomly select one item from the first group and then select every Nth item after that on the list.
Suppose that a systematic sample of four is required from a numbered list of 200 names.
Start by dividing the list into four groups with 50 names in each group.
Randomly select one name from the first group of 50, and then select every 50th name after that.
So, for example, if the number 32 is selected from the first group, then the sample will consist of the names with numbers 32, 82, 132 and 182.
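The systematic method above can be sketched in Python. This sketch assumes that the population size is an exact multiple of the sample size and that the starting number has already been chosen at random from the first group. The function name `systematic_sample` is an illustrative assumption.

```python
def systematic_sample(population_size, sample_size, start):
    """Return the list numbers selected, beginning at start in the first group."""
    if population_size % sample_size != 0:
        raise ValueError("population size must be a multiple of sample size")
    interval = population_size // sample_size  # N = P / S items in each group
    return [start + k * interval for k in range(sample_size)]

print(systematic_sample(200, 4, 32))  # [32, 82, 132, 182]
```

Only the starting number is random here. Once it is known, every other member of the sample is fixed by the interval.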
WORKED EXAMPLE 1.7
The numbered list gives the names, in random order, of the 30 students in a class.
a Describe a method that can be used to select a systematic sample of size 5 from this population of 30.
b List and describe the composition of each of the possible samples that could be selected.
Answers
a Divide the population into five groups with $\frac{30}{5} = 6$ students in each group (and check that 5 × 6 = 30).
The numbers of the students in the five groups are 00–05, 06–11, 12–17, 18–23 and 24–29.
Next, randomly select one member from the first group of six students.
One way to do this is to roll an ordinary, fair dice and subtract 1 from the number rolled. Another way is to use a random number from a calculator. If the calculator gives the random number 0.927 you can take this as being 09, 02 and 07, so select student number 02 from the first group (there is no student 09 in the first group) and then every 6th name after this on the list will also be selected, that is, numbers 02, 08, 14, 20 and 26.
b The possible samples given by this method are shown in the table.
Students selected for the sample
Sample A: 00 Dev, 06 David, 12 Brian, 18 Muzn, 24 Robert
Sample B: 01 Lucy, 07 Shuyi, 13 Ketan, 19 Paula, 25 Maeena
Sample C: 02 Angela, 08 Simon, 14 James, 20 Rahmin, 26 Sally
Sample D: 03 Nor, 09 Joy, 15 Ruby, 21 Daisy, 27 Faida
Sample E: 04 Mary, 10 Harriet, 16 Dolly, 22 Tasmia, 28 Hilda
Sample F: 05 Rina, 11 Umar, 17 Sara, 23 Jane, 29 Johan
Sample A and sample E in Worked example 1.7 are the most unrepresentative of the population in terms of gender. This may or may not be of concern: if the purpose of the survey is to find out about female students’ opinions on school lunches, then sample E would be the only one of these six samples that could be used.
Stratified Random Samples
Populations often consist of distinct groups (or strata). Human populations are often stratified by:
• gender
• employment status
• age group
• income group.
A stratified random sample aims to represent each group (or stratum) in the population fairly. Sample members are selected at random from each group, and the size of the groups in the sample should be proportional to the size of the groups in the population.
WORKED EXAMPLE 1.8
A class of 30 students consists of 18 females and 12 males.
A random sample of size 5, stratified by gender, is to be selected.
a Find the correct composition of the sample.
b Show that all members of the population are equally likely to be selected if a random sample of size 5, stratified by gender, is selected.
Answers
a The ratio of females to males in the population is 18 : 12 = 3 : 2, so the ratio of females to males in the sample must also be 3 : 2.
The number of females required is $\frac{3}{5} \times 5 = 3$, and they will be selected at random from the 18 females.
The number of males required is $\frac{2}{5} \times 5 = 2$, and they will be selected at random from the 12 males.
b All members of the population are equally likely to be selected because:
5 out of 30 students are selected, which is $\frac{5}{30} = \frac{1}{6}$ of the students.
3 out of 18 females are selected, which is $\frac{3}{18} = \frac{1}{6}$ of the females.
2 out of 12 males are selected, which is $\frac{2}{12} = \frac{1}{6}$ of the males.
Quota Samples
Quota sampling is similar to stratified sampling and is often used in market research, but instead of obtaining a sample by selecting randomly from each group, the person who is carrying out the survey (the interviewer or data collector) is responsible for choosing the people for the sample. The interviewer may be asked to interview a certain number of males and a certain number of females. They could also be asked to interview twice as many people under 40 years of age as over 40 years of age, but they can choose who to include in the sample from the given quota of numbers and types of people.
One advantage of quota sampling is that a numbered list of names (a sampling frame) is not required.
One disadvantage of quota sampling is that the data collector may have a preference for interviewing certain types of people and ignoring other types. For example, they may choose not to interview smokers, people wearing dirty clothes, women who are chewing gum or men with beards.
Bias in Sampling
If a sample is not representative of the population from which it is selected, then the sampling method used may be biased.
Unrepresentative samples are a major cause of sampling error and should be avoided wherever possible. There should be no bias towards any member or group in the population.
Random and systematic sampling may accidentally produce unrepresentative samples, as both could select all males, for example.
Stratified sampling is fairer, but it is not possible to take account of each and every group in a population.
TIP
The answer to Worked Example 1.8 part b can also be written using probabilities, which you will learn more about in Chapter 2: P(selecting a particular student) = P(selecting a particular female) = P(selecting a particular male) = $\frac{1}{6}$.
We are working towards endorsement of this title for the Cambridge Pathway.
A sample of people must be selected carefully, as bias towards certain types of people is quite common. Quota sampling may be biased because of the data collector’s prejudices.
WORKED EXAMPLE 1.9
A committee of 12 is to be selected from a group of 100 people who are classified by age group as follows.
Young    Middle-aged    Old
28       38             34
Which of the three age groups is likely to be over-represented when a sample, stratified by age group, is selected?
Answer
The calculations to find the numbers in each age group for the sample are:
Young: 28/100 × 12 = 3.36
Middle-aged: 38/100 × 12 = 4.56
Old: 34/100 × 12 = 4.08
We cannot select these numbers of people because they are not integers. The calculated numbers must be rounded to the nearest whole number, and they must add up to 12.
So, we will select 3 young, 5 middle-aged and 4 old people.
The middle-aged group will be over-represented because this value has been rounded up, but the other two values have been rounded down.
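The arithmetic in Worked Example 1.9 can be checked with a short Python sketch. The function name stratified_allocation is our own, and the sketch assumes that rounding each value to the nearest whole number gives the correct total, which is true here. In general, the rounded values may need adjusting so that they add up to the sample size.

```python
def stratified_allocation(group_sizes, sample_size):
    """Return the exact and the rounded number to sample from each group."""
    total = sum(group_sizes.values())
    exact = {g: n * sample_size / total for g, n in group_sizes.items()}
    rounded = {g: round(x) for g, x in exact.items()}
    return exact, rounded

# Group sizes from Worked Example 1.9 (100 people, committee of 12).
groups = {"Young": 28, "Middle-aged": 38, "Old": 34}
exact, rounded = stratified_allocation(groups, 12)

for g in groups:
    print(g, round(exact[g], 2), rounded[g])

# Confirm that the rounded values add up to the sample size.
print(sum(rounded.values()))
```

Comparing the exact and rounded values shows which group gains from the rounding: only the middle-aged value moves up.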
WORKED EXAMPLE 1.10
a Nadia wants to collect data from the students at her school so that she can assess their attitude to reading. Suggest one location in the school where the data she collects might be unrepresentative of the population.
b A doctor’s list of recent patients is used to obtain a sample of 20 from the people living in a village. Explain why this sample is not likely to be representative of the population.
Answers
a The data will be unrepresentative of the population if she collects it in the library.
This is because most of the people she interviews will have a very positive attitude to reading.
b The sample is not likely to be representative because healthy people have little or no chance of being selected, whereas people with health issues have a much greater chance of being selected.
Exercise 1B
1 A sample of 11 people is chosen from 431 people who work outdoors and 754 people who work indoors.
Calculate the number of people that are selected from each group, if the sample is stratified by place of work.
2 Omar wants to investigate the musical preferences of the students in Stages 10 and 11 of his school using a stratified sample of 30. There are 368 students in the population and there are 205 students in Stage 10.
Find the number of students in Stage 10 and the number of students in Stage 11 that should be in the sample if it is to be representative in terms of school stage.
3 The two-digit random numbers in the table are used to select a simple random sample of five from a list of 40 names that are numbered from 00 to 39.
34 41 55 67 04 70 82 26 92 23
12 09 38 27 09 27 12 09 38 12
57 61 24 50 94 17 02 31 24 36
a Explain why the first row of numbers cannot be used to select the sample.
b Explain why the second row of numbers cannot be used to select the sample.
c Use the third row to select the required sample and list the numbers of those selected.
4 Various samples are to be selected from a list of the names of 25 boys and 35 girls. The boys are numbered from 00 to 24, and the girls are numbered from 25 to 59.
The two-digit random numbers in the table are used to select the samples.
07 63 77 20 07 68 94 37 63 22
Starting at the left and moving along a row, use:
a the first row to select a simple random sample of four children
b the second row to select a simple random sample of two boys and three girls
c the third row to select a systematic sample of six children.
5 A population consists of 80 cars. There are 47 electric cars, numbered from 00 to 46 and there are 33 petrol cars numbered from 47 to 79.
The random two-digit numbers in the table are used to select samples from the population.
13 36 81 45 06 27 22 58 34 55 71 20 97 45 77
78 62 18 51 03 99 31 57 29 46 16 92 10 23 56
04 17 82 90 48 37 23 38 48 17 21 89 19 77 84
a The first row is used to select a simple random sample of five cars.
i Write down the numbers in the first row that cannot be used to select members for the sample.
ii Starting from the left, write down the numbers of the five cars that will be selected for the sample.
iii Explain in what way the sample obtained in part a ii is not representative.
b The first car for a systematic sample of five is to be chosen by randomly selecting an appropriate number from the second row.
i Which numbers in the second row could be used to select the first car for the sample?
ii Find the number of electric cars and the number of petrol cars in the systematic samples using each of the numbers in your answer to part b i.
c Explain how the third row of random numbers could be used to select a random sample of five, stratified by type of car, indicating which five numbers would be selected.
6 The three-digit random numbers in the table are used to select various samples from a population of hospital patients. The sampling frame consists of 640 patients numbered 000 to 639.
0.718 0.406 0.089 0.222 0.711 0.405 0.003 0.904 0.637 0.253
0.336 0.582 0.818 0.040 0.317 0.429 0.945 0.605 0.026 0.333
a Using the first row of the table, write down the numbers of the first and last patient selected in a simple random sample of six.
b The second row of the table is used to select a systematic sample of size 16. Write down the numbers of the 1st, 10th and 16th names to be selected for this sample.
7 A farmer keeps 44 sheep and 63 goats. An animal health inspector asks the farmer to randomly select 4, 5 or 6 of these animals, stratified by type of animal, to be examined.
For which of these three sample sizes will the sample be the most representative of the population?
8 At a college, 12% of the students write with their left hand and 88% write with their right hand.
Calculate the smallest sample size that will include 2 people who write with their left hand, if the sample is to be stratified according to which hand the students write with.
9 A travel agency published 60 holiday brochures this year: 30 were for beach holidays, 20 were for cruises and 10 were for weekend city breaks.
A survey is to be carried out on a sample of six of these brochures.
a Express the sample size as a percentage of the population size.
b If the purpose of the survey is not related to the different types of holidays, how will this affect the choice of which sampling method is used?
c Two-digit numbers have been allocated to the brochures:
• 00–29 for beach holidays
• 30–49 for cruises
• 50–59 for weekend city breaks
The two-digit random numbers in the table are used to select various samples.
20 51 13 28 92 51 07 18 42 47
14 11 73 80 27 05 43 66 39 97
56 53 07 33 26 68 12 09 84 47
i Starting at the beginning of the first row of the table, and moving along the row, select a simple random sample of six.
ii State, giving reasons for your answers, which two of the first six numbers in the first row you have ignored.
d A systematic sample is to be selected.
i Write down the smallest and the largest possible two-digit number of the first brochure to be selected.
ii Write down the numbers of the six selected brochures, if the systematic sample is to be selected by starting at the beginning of the second row of the table and moving along the row.
e A sample stratified by type of holiday is to be selected.
i Find the number of each type of brochure that should be selected.
ii Starting at the beginning of the third row, and moving along the row, select a sample stratified by type of holiday. Use every number if the type of holiday to which it relates has not yet been fully sampled.
10 The names of 36 students are listed and numbered from 00 to 35.
a The 3-digit random numbers in the table below are used to select a simple random sample of five students.
0.182 0.904 0.889 0.335 0.060
i Write down the names of the five students selected. Explain briefly how these were obtained.
ii In what way could this sample be considered to be unrepresentative?
b A systematic sample is to be selected from the list of 36 names.
Write down the names of the students selected if the systematic sample contains:
i four students, where Tefo is the first name to be selected
ii three students, where Siti is the third name to be selected
iii six students, where Muda is the fourth name to be selected.
11 At a university, there are equal numbers of undergraduate and postgraduate students. To obtain a sample of 100 students, a researcher decides to toss a coin.
• If the coin shows heads, she will randomly select 100 undergraduate students.
• If the coin shows tails, she will randomly select 100 postgraduate students.
Is this method of sampling biased? Explain your reasoning.
12 At a school there are 720 students, and 392 of the students are boys. A sample of five students is to be selected to attend a meeting with parents.
a i Find the number of boys and the number of girls that should be selected if the sample is to be representative in terms of gender.
ii State the name of this type of sample.
b Each of the 720 students has a four-figure identification number. The girls have consecutive odd identification numbers starting at 2001. The boys have consecutive even identification numbers starting at 2002. Write down the largest identification number issued to:
i a girl
ii a boy.
c A calculator is used to generate six random numbers: 0.143, 0.673, 0.772, 0.084, 0.219 and 0.5.
i Explain how these random numbers can be used to select students for the sample.
ii State which one of these six random numbers cannot be used to select a member of the sample and explain why.
iii Write down the identification numbers of the five students that will be selected for the sample.
13 There are 200 students in a university dining hall.
A sample of 5% of these students is to be selected to participate in a survey on political opinions.
a Briefly describe a biased method of selecting the sample in the dining hall, stating why it would be biased.
b Briefly describe a method of selecting a representative sample of the students.
REFLECTION
How do you remember the difference between different sampling methods?
Write some short notes to help someone who finds this difficult.
How do you remember the difference between the different types of data and variables?
Write some short notes to help yourself remember how to do this.
SELF-EVALUATION CHECKLIST
After studying this chapter, think about how confident you are with the different topics. This will help you to see any gaps in your knowledge and help you to learn more effectively.
I am able to:
• classify data and variables according to their type.
• know the difference between a census and a sample survey.
• distinguish between the different types of sample.
• understand the methods used to select a representative sample and know that some sampling methods can be biased.
• select a sample using a random number table.
• use open and closed questions in a survey.