MATHEMATICS Learner’s Study and Revision Guide for Grade 12
DATA HANDLING Part 1
Revision Notes, Exercises and Solution Hints by
Roseinnes Phahle
Examination Questions by the Department of Basic Education
Preparation for the Mathematics examination brought to you by Kagiso Trust
Contents
Unit 19 Part 1
Revision notes contain everything you need to know
Exercise 19 lets you practise everything you need to know
Solution with hints and much enrichment to Exercise 19 – Ungrouped data
Solution with hints and much enrichment to Exercise 19 – Grouped data
Examination questions with solution hints and answers
Unit 19 Part 2
More questions from past examination papers
Answers – the answers have much to teach you about statistical analysis
How to use this revision and study guide
1. Study the revision notes given at the beginning. The notes are interactive in that in some parts you are required to make a response based on your prior learning of the topic from your teacher in class or from a textbook. Furthermore, the notes cover all the Mathematics from Grade 10 to Grade 12.
2. “Warm-up” exercises follow the notes. Some exercises carry solution HINTS in the answer section. Do not read the answer or hints until you have tried to work out a question and are having difficulty.
3. The notes and exercises are followed by questions from past examination papers.
4. The examination questions are followed by blank spaces or boxes inside a table. Do the working out of the question inside these spaces or boxes.
5. Alongside the blank boxes are HINTS in case you have difficulty solving a part of the question. Do not read the hints until you have tried to work out the question and are having difficulty.
6. What follows next in Part 2 are more questions taken from past examination papers.
7. Answers to the extra past examination questions appear at the end. Some answers carry notes to enrich your knowledge.
8. Finally, don’t be a loner. Work through this guide in a team with your classmates.
Data Handling – Part 1
REVISION UNIT 19: DATA HANDLING – PART 1
The following is what you should understand:
1. Raw data is hard to work with. Arranging the data in a stem and leaf diagram makes it much easier to work with because:
a) the data can be seen in ascending order;
b) the median and the quartiles Q1 and Q3 can therefore easily be obtained; and
c) a pattern in the data, if any, can easily be discerned, for example symmetry or skewness.
To draw a stem and leaf diagram, especially when you have a relatively large data set, first construct a “rough” stem and leaf diagram. Record the stems in the right order, but record the leaves in the order in which you read them, column by column or row by row. As an example:
66  32  39  31  88  18
74  37  86  34  90  42
24  53  75  55  25  32
18  37  58  86  54  94
10  14  73  33  52  72
87  48  82  58  21  51
65  43  85  23  14  38
34  57  71  74  28  99
27  92
Here are the “rough” stem and leaf diagram, obtained by reading the data values downwards column by column starting with the first column, and the correct diagram in which the leaves are arranged in ascending order:
“Rough” stem and leaf diagram
1 | 8 0 4 4 8
2 | 4 7 3 5 1 8
3 | 4 2 7 7 9 1 4 3 2 8
4 | 8 3 2
5 | 3 7 8 5 8 4 2 1
6 | 6 5
7 | 4 5 3 1 4 2
8 | 7 6 2 5 6 8
9 | 2 0 4 9
Correct diagram with leaves in ascending order
1 | 0 4 4 8 8
2 | 1 3 4 5 7 8
3 | 1 2 2 3 4 4 7 7 8 9
4 | 2 3 8
5 | 1 2 3 4 5 7 8 8
6 | 5 6
7 | 1 2 3 4 4 5
8 | 2 5 6 6 7 8
9 | 0 2 4 9
Arranging this relatively large amount of data in a stem and leaf diagram makes it easy to read the five number summary from it (see item 4 below).
2. You must be able to calculate the mean, median and mode, which are known as the measures of central tendency. In the case of grouped data you will need to draw an ogive from which to determine all three quartiles: Q1, the median Q2 and Q3.
3. Measures of spread are the range, interquartile range and standard deviation.
4. The five number summary is needed to sketch a box-and-whisker plot. The five numbers are: the minimum value; Q1; the median Q2; Q3; and the maximum value.
5. You must be able to interpret a box-and-whisker plot:
a) tell from the plot if the data is positively skewed;
b) tell from the plot if the data is negatively skewed.
6. Understand the following about the spread of the data:
a) 25% of the data lies between the minimum value and Q1;
b) 25% of the data lies between Q1 and Q2;
c) 25% of the data lies between Q2 and Q3;
d) 25% of the data lies between Q3 and the maximum value.
7. You must be able to use the STAT function on an electronic calculator to calculate the mean, variance and standard deviation, as well as to determine the minimum and maximum data values. (If you do not know how, see the answer to Exercise 19.)
8. You must be able to interpret the statistical summaries of two data sets, for example:
                      Data set A    Data set B
Mean                  17,5          17,5
Standard deviation    5,6           11,8
Data set A has the smaller standard deviation. This means that data set A is more consistent than data set B: more of its values are clustered around its mean.
9. Regarding cumulative frequency graphs (ogives):
a) You must know the points to be plotted: plot the upper limit of each interval against the cumulative frequency of that interval, and then, from the upper limit of the first interval, extend the curve to touch the horizontal axis at the lower limit of that interval.
b) You must know how to read Q1, Q2 and Q3 from the graph: mark 25%, 50% and 75% of the data along the vertical axis; the corresponding values on the horizontal axis will give you Q1, Q2 and Q3.
c) You must know how to deduce the box-and-whisker plot from the ogive: simply drop vertical lines from the points on the horizontal axis that you have marked as Q1, Q2 and Q3.
[Diagram: ogive with the box-and-whisker plot drawn below it; the horizontal axis x shows nine classes numbered 1 to 9, the vertical axis y shows cumulative frequency]
We have assumed above that there are 60 data values. So 25% of the values will be marked by 15 on the vertical axis, 50% by 30 and 75% by 45. Along the horizontal axis are shown 9 classes (also called intervals or groups), the first one being 0 ≤ x < 10, the second one 10 ≤ x < 20 and the last one being 80 ≤ x ≤ 90.
10. You must understand that histograms give an indication of whether the data is symmetrical or not, skewed or not, uni-modal or otherwise, but that the individual data values are lost, so that all measures of central tendency and spread become estimates.
11. You must know how to group data into classes or intervals.
12. You must know how to work out the mid-value of each class when data is grouped, so as to use it as the representative value when estimating the mean and standard deviation of grouped data.
13. Learn to interpret data that is symmetrically or nearly symmetrically distributed (normal data) in terms of the 68%-95%-99,7% rule or z-scores, for example finding what percentage of scores lie within one, two or three standard deviations of the mean, that is, within the intervals $\bar{x} \pm \sigma_n$, $\bar{x} \pm 2\sigma_n$ or $\bar{x} \pm 3\sigma_n$ respectively.
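If you have access to a computer, the two-step method of item 1 (a “rough” diagram first, then the leaves put in ascending order) can be tried out with a short program. The following is only a sketch in Python, which is not part of the syllabus; the function name `stem_and_leaf` is made up for this illustration, and it assumes every data value has two digits.

```python
from collections import defaultdict

# The 50 data values of item 1, listed in the order in which they are read
# (down the first column, then down the second column, and so on).
data = [
    66, 74, 24, 18, 10, 87, 65, 34, 27,
    32, 37, 53, 37, 14, 48, 43, 57, 92,
    39, 86, 75, 58, 73, 82, 85, 71,
    31, 34, 55, 86, 33, 58, 23, 74,
    88, 90, 25, 54, 52, 21, 14, 28,
    18, 42, 32, 94, 72, 51, 38, 99,
]


def stem_and_leaf(values, ordered):
    """Return {stem: [leaves]} for two-digit values (illustrative helper)."""
    rows = defaultdict(list)
    for value in values:
        # The tens digit is the stem and the units digit is the leaf.
        rows[value // 10].append(value % 10)
    if ordered:
        for leaves in rows.values():
            leaves.sort()
    # Put the stems themselves in ascending order.
    return dict(sorted(rows.items()))


rough = stem_and_leaf(data, ordered=False)    # leaves in reading order
correct = stem_and_leaf(data, ordered=True)   # leaves in ascending order

for stem, leaves in correct.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Doing the diagram by hand remains the skill you need in the examination; the program is only a way of checking your leaves.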
EXERCISE 19 (This exercise will help you revise everything you need to know in Data Handling).
In the table below write out the formulae for mean, standard deviation and the positions of the quartiles when the data is arranged in ascending order: Mean
Standard Deviation
Quartiles Position of lower quartile Q1 is
Position of median (middle quartile) Q2 is
Position of upper quartile Q3 is
The 36 numbers below were obtained by using the random number generation facility on an electronic calculator [RanInt#(10,49) generates integers between 10 and 49]:
16 20 18 24 20 23
31 33 47 41 16 49
28 27 12 34 24 48
17 39 23 47 13 20
46 31 39 43 15 20
24 38 10 17 44 20
UNGROUPED DATA SET
1. Arrange the data in a stem and leaf diagram.
2. Calculate the five number summary. (HINT: use your stem and leaf diagram to do this.)
3. Draw a box-and-whisker plot.
4. Interpret the data.
5. Use the STAT function on your calculator to determine the mean of the data (see page 193).
6. Use the STAT function on your calculator to determine the standard deviation of the data.
7. Calculate the percentage of data that lies within one standard deviation of the mean.
GROUPED DATA SET
1. Arrange the data into 4 classes between 10 and 50 inclusive. Complete the following frequency distribution table and hence, using appropriate formulae, calculate the approximate mean and approximate standard deviation. Why are they approximate?
Class    x    f    xf    $(x - \bar{x})$    $(x - \bar{x})^2$    $(x - \bar{x})^2 f$    Cumulative f

TOTAL
2. Which is the modal class?
3. Draw a histogram.
4. Draw a cumulative frequency curve (ogive).
5. On the curve mark the median, Q1 and Q3.
6. Use the above markings to draw the box-and-whisker plot just below your ogive. (HINT: You can quickly draw the box-and-whisker diagram just below the ogive by simply extending the readings of the quartiles you make along the horizontal axis.)
7. Calculate the approximate mean of the grouped data using the STAT facility on your electronic calculator (see page 195). (HINT: You must use the mid-point of each interval to represent the interval and multiply it by the frequency of the interval.)
8. Write down the approximate standard deviation of the grouped data.
SOLUTION TO EXERCISE 19
Ungrouped data
1. “Rough” stem and leaf diagram:
1 | 6 8 6 2 7 3 5 0 7
2 | 0 4 0 3 8 7 4 3 0 0 4 0
3 | 1 3 4 9 1 9 8
4 | 7 1 9 8 7 6 3 4
Corrected stem and leaf diagram:
1 | 0 2 3 5 6 6 7 7 8
2 | 0 0 0 0 0 3 3 4 4 4 7 8
3 | 1 1 3 4 8 9 9
4 | 1 3 4 6 7 7 8 9
2. Five number summary:
Minimum = 10
Q1 = 19
Median Q2 = 24
Q3 = 39
Maximum = 49
Here is a reminder on how the positions of the quartiles are worked out:
The position of the 1st quartile is given by $\frac{1}{4}(n + 1) = \frac{1}{4}(36 + 1) = 9,25$, so that it lies between the 9th and 10th positions when the data is arranged in ascending order (as in the stem and leaf diagram, which you can use for this purpose). The 9th position is occupied by data value 18 and the 10th by data value 20.
So $Q_1 = \frac{1}{2}(18 + 20) = 19$.
The positions of the median and the 3rd quartile are given by $\frac{1}{2}(n + 1)$ and $\frac{3}{4}(n + 1)$ respectively, where n is 36 in this example.
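The position rule above can be checked with a few lines of Python. This is a sketch only: the helper names `value_at` and `five_number_summary` are made up for this illustration, and a position that falls between two data values is taken as the mean of those two values, exactly as was done for Q1 above.

```python
# The 36 values of Exercise 19.
data = [
    16, 20, 18, 24, 20, 23,
    31, 33, 47, 41, 16, 49,
    28, 27, 12, 34, 24, 48,
    17, 39, 23, 47, 13, 20,
    46, 31, 39, 43, 15, 20,
    24, 38, 10, 17, 44, 20,
]


def value_at(sorted_values, position):
    """Value at a 1-based position in ascending data.

    A fractional position (for example 9,25) is taken as the mean of the
    two neighbouring values. Assumes at least 3 data values.
    """
    lower = int(position)  # whole-number part of the position
    if position == lower:
        return sorted_values[lower - 1]
    return (sorted_values[lower - 1] + sorted_values[lower]) / 2


def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at(ordered, (n + 1) / 4)      # position of lower quartile
    q2 = value_at(ordered, (n + 1) / 2)      # position of median
    q3 = value_at(ordered, 3 * (n + 1) / 4)  # position of upper quartile
    return ordered[0], q1, q2, q3, ordered[-1]


summary = five_number_summary(data)
print(summary)
```

Other textbooks and software packages use slightly different conventions for a fractional position, so a statistics program may give quartiles that differ a little from the ones obtained here.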
3. Box-and-whisker plot
[Diagram: box-and-whisker plot of the data, drawn above a horizontal axis marked from 0 to 75 in steps of 5]
4. Interpretation of the data
No skewness and no symmetry, but a small variation among the number of data values in each class: 9 data values in the 1st class, 12 in the 2nd, 7 in the 3rd and 8 in the 4th. The modal class, that of the data values in the 20’s, deviates somewhat from the more or less equal numbers in the other classes.
The small variation is due to the fact that the data values in this example are numbers between 10 and 49 randomly generated on an electronic calculator (using the RanInt key on the CASIO fx-82ES PLUS). As random numbers they each have an equal chance (probability) of featuring, and so the number of them in each of the classes of 10’s, 20’s, 30’s and 40’s will also be more or less equal. (The distribution of data whose values have an equal chance of coming up is said to be rectangular: its histogram will be rectangular or nearly rectangular in shape. Another example is the numbers on a fair die, each with a probability of $\frac{1}{6}$ of showing up when the die is tossed.)
5. and 6. Electronic calculation of mean and standard deviation
In the examination you will find it much quicker to use the STAT function on your electronic calculator to determine the mean and standard deviation. These are the steps on the CASIO fx-82ES PLUS, listed in the order in which you key them into your calculator:
MODE
2: STAT
1: 1-VAR
Using the data of this exercise, you enter it as follows:
16 = 20 = 18 = and so on till 44 = 20 =
AC (you must press this key at the end of the input of all the data).
To see the mean: SHIFT 1, then 4: VAR, then 2: $\bar{x}$, then =. The display shows 28,25.
To see the standard deviation: SHIFT 1, then 4: VAR, then 3: $\sigma_x$, then =. The display shows 11,76.
Answer: $\bar{x} = 28,25$ and $\sigma_x = 11,76$
Do you know the difference between $\sigma_x$ and $s_x$? $\sigma_x$ is the population standard deviation. $s_x$ is the sample standard deviation (which is an approximation to the population standard deviation). In this exercise, is the data a sample or a population?
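The difference between the population and the sample standard deviation can also be seen on a computer. The sketch below uses Python's standard `statistics` module, in which `pstdev` divides by n (the population formula, the calculator's $\sigma_x$) and `stdev` divides by n − 1 (the sample formula, the calculator's $s_x$). The variable names are chosen for this illustration only.

```python
import statistics

# The 36 values of Exercise 19.
data = [
    16, 20, 18, 24, 20, 23,
    31, 33, 47, 41, 16, 49,
    28, 27, 12, 34, 24, 48,
    17, 39, 23, 47, 13, 20,
    46, 31, 39, 43, 15, 20,
    24, 38, 10, 17, 44, 20,
]

mean = statistics.mean(data)       # sum of the values divided by n
sigma_x = statistics.pstdev(data)  # population standard deviation (divide by n)
s_x = statistics.stdev(data)       # sample standard deviation (divide by n - 1)

print("mean    =", round(mean, 2))
print("sigma_x =", round(sigma_x, 2))
print("s_x     =", round(s_x, 2))
```

The sample value is always a little larger than the population value for the same data, because the same total is divided by n − 1 instead of n.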
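7. Percentage of data within one standard deviation of the mean
You must evaluate the interval $\bar{x} \pm \sigma_x$, which is
$(\bar{x} - \sigma_x ; \bar{x} + \sigma_x) = (28,25 - 11,76 ; 28,25 + 11,76) = (16,49 ; 40,01)$
So you must look at your data values and see how many of them lie between 16,49 and 40,01. This is easy to work out from the stem and leaf diagram: 3 of the values in the 10’s (17, 17 and 18; the two 16’s lie just outside the interval), all 12 values in the 20’s and all 7 values in the 30’s.
Answer: 22 out of 36, so the percentage is 61,1%.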
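The counting in this question can be checked with a short Python sketch. It assumes the population standard deviation is the one intended (as in the calculator steps above); the variable names are for illustration only.

```python
import statistics

# The 36 values of Exercise 19.
data = [
    16, 20, 18, 24, 20, 23,
    31, 33, 47, 41, 16, 49,
    28, 27, 12, 34, 24, 48,
    17, 39, 23, 47, 13, 20,
    46, 31, 39, 43, 15, 20,
    24, 38, 10, 17, 44, 20,
]

mean = statistics.mean(data)
sigma_x = statistics.pstdev(data)  # population standard deviation

# End points of the interval (mean - sigma ; mean + sigma).
lower, upper = mean - sigma_x, mean + sigma_x

# Keep only the values that lie inside the interval.
inside = [x for x in data if lower < x < upper]
percentage = 100 * len(inside) / len(data)

print(round(lower, 2), round(upper, 2))
print(len(inside), "out of", len(data), "=", round(percentage, 1), "%")
```

Compare the percentage with the 68% of item 13 in the revision notes. Would you expect randomly generated numbers to follow that rule?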
SOLUTION TO EXERCISE 19
Grouped data
1. Frequency distribution table
Class      x      f     xf      $(x - \bar{x})$   $(x - \bar{x})^2$   $(x - \bar{x})^2 f$   Cumulative f
10 - 19    14,5   9     130,5   -13,89            192,9321            1736,3889             9
20 - 29    24,5   12    294     -3,89             15,1321             181,5852              21
30 - 39    34,5   7     241,5   6,11              37,3321             261,3247              28
40 - 49    44,5   8     356     16,11             259,5321            2076,2568             36
TOTAL             36    1022                                          4255,5556
Answers:
$\bar{x} = \frac{\sum xf}{n} = \frac{1022}{36} = 28,39$
$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n}} = \sqrt{\frac{4255,5556}{36}} = 10,87$
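The columns of the frequency distribution table can be reproduced with a short Python sketch, given here only as a way of checking your own table. It takes the mid-values and frequencies of the four classes as given and applies the two formulae for grouped data; the variable names are for illustration only.

```python
import math

# Mid-value x and frequency f of each class (10-19, 20-29, 30-39, 40-49).
midpoints = [14.5, 24.5, 34.5, 44.5]
frequencies = [9, 12, 7, 8]

n = sum(frequencies)  # total number of data values

# Approximate mean: sum of (mid-value x frequency) divided by n.
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
mean = sum_xf / n

# Approximate standard deviation: square root of the
# sum of (x - mean)^2 x f divided by n.
sum_sq = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))
sigma_x = math.sqrt(sum_sq / n)

print("n =", n, " sum of xf =", sum_xf)
print("mean    =", round(mean, 2))
print("sigma_x =", round(sigma_x, 2))
```

The program keeps all the decimal places of the mean, whereas a table worked by hand uses the mean rounded to two decimal places, so the totals of the squared-deviation column may differ slightly in the last decimal places.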
NOTE: This table entails a lot of work and much time is consumed in constructing it. Do you see how much quicker and easier it would be to use the STAT function on your electronic calculator, and why you must ensure you can use this function correctly? But sometimes the examination question leaves you with no choice but to construct the table.
2. Modal class
Answer: The modal class is 20 – 29 or the 20’s.
3. Histogram
[Diagram: histogram of the grouped data; the vertical axis shows the frequency from 0 to 12 in steps of 2]
4., 5. and 6. Cumulative frequency curve (ogive) and box plot
[Diagram: ogive of the grouped data with the box-and-whisker plot drawn below it]
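The points to plot for the ogive, and rough readings of the quartiles, can be worked out with the Python sketch below. It rests on two assumptions that are not in the solution above: the classes are taken as 10 ≤ x < 20, 20 ≤ x < 30, 30 ≤ x < 40 and 40 ≤ x ≤ 50, and the plotted points are joined by straight lines rather than a smooth curve, so the readings are estimates only. The helper name `read_off` is made up for this illustration.

```python
from itertools import accumulate

# Assumed class limits: 10 <= x < 20, 20 <= x < 30, 30 <= x < 40, 40 <= x <= 50.
lower_limit_first_class = 10
upper_limits = [20, 30, 40, 50]
frequencies = [9, 12, 7, 8]

# Running totals of the frequencies give the cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Points of the ogive: the curve starts on the horizontal axis at the lower
# limit of the first class, then passes through (upper limit, cumulative f).
points = [(lower_limit_first_class, 0)] + list(zip(upper_limits, cumulative))


def read_off(points, target):
    """Horizontal value at cumulative frequency `target`, assuming the
    plotted points are joined by straight lines."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target lies outside the range of the ogive")


n = cumulative[-1]
q1 = read_off(points, n / 4)      # 25% of the data
q2 = read_off(points, n / 2)      # 50% of the data (median)
q3 = read_off(points, 3 * n / 4)  # 75% of the data

print(points)
print(q1, q2, round(q3, 1))
```

Readings taken from grouped data will not agree exactly with the quartiles found from the stem and leaf diagram, because the individual data values are lost once the data is grouped.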
7. and 8. Mean and standard deviation using the STAT function on an electronic calculator
When the data comes with frequencies of the data values or of classes of data values, you must first set your calculator in FREQUENCY mode. This you do by keying in the following steps:
SHIFT SET UP
Scroll down and select 3: STAT
1: ON
Now follow the steps shown in Questions 5 and 6 of Ungrouped Data.
NOTE: There are two columns to fill, one for the data values under X and the other for FREQ. Fill in the data values under X and then scroll over to the next column to fill in the frequencies.
PAPER 2 QUESTION 8
DoE/ADDITIONAL EXEMPLAR 2008
PAPER 2 QUESTION 8
DoE/ADDITIONAL EXEMPLAR 2008
Number    Hints and answers (work out the solutions in the boxes provided)
8.1 These questions carry very few marks, just one mark each. That is because the nature of the data given here makes it easy to observe the median and the lower and upper quartiles without any real calculation.
Use the following formulae to work out the positions:
Q1 is in the $\frac{1}{4}(n + 1)$th position.
The median or Q2 is in the $\frac{1}{2}(n + 1)$th position.
Q3 is in the $\frac{3}{4}(n + 1)$th position.
Answers: Median =
8.2 Q1 =          Q3 =
8.3 Draw the box and whisker diagram on the diagram below. DIAGRAM SHEET 2
8.4 Statistical comment depends on you looking at summary measures such as the quartiles and at diagrams of the data to see what story they tell. Look at the box plot and say if the data is positively or negatively skewed, or symmetric or nearly symmetric. What does your observation imply in view of the division of the data by the quartiles? What are the whiskers on either side like, long or short? And what does that imply? Are there any outliers in the data? If so, which countries are outliers?
Comment (write down your comments):
PAPER 2 QUESTION 9
DoE/ADDITIONAL EXEMPLAR 2008
PAPER 2 QUESTION 10
DoE/ADDITIONAL EXEMPLAR 2008
PAPER 2 QUESTION 9
DoE/ADDITIONAL EXEMPLAR 2008
Number    Hints and your solutions
9.1 Draw the scatter diagram on the axes shown below. DIAGRAM SHEET 2
9.2 Look at your diagram and say whether you see a linear, quadratic or exponential trend. Fit the line or curve of best fit according to what you have observed. Answer: (You write it down)
9.3 Use your diagram to make the estimate. Answer: CPI for January 2008 is estimated at (answer is close to 9%). Do you agree?
PAPER 2 QUESTION 10
DoE/ADDITIONAL EXEMPLAR 2008
Number    Hints and your solutions
10.1 There are two ways of calculating the standard deviation. Either you draw a table or you use the statistical keys on your calculator. Drawing a table will take up too much time, which is not worth the 3 marks for the question. So you are advised to use the statistical keys on your calculator. In that case there is no working to be shown. Answer: $\sigma_n \approx 1,69$ (1,68518 …)
10.2 Bear in mind that the standard deviation is a measure of the spread of data about its mean. What story does the data tell you? Is there much or little variation in the daily temperatures? Is this confirmed by the standard deviation? What is the range? Does the range tell you anything about the variation in the data? Answer: (Write down your comment)
PAPER 2 QUESTION 11
DoE/ADDITIONAL EXEMPLAR 2008
PAPER 2 QUESTION 11
DoE/ADDITIONAL EXEMPLAR 2008
Number    Hints and your solutions
11.1 Complete the cumulative frequency column: DIAGRAM SHEET 3
11.2 Remember that it is the upper limit of each interval that you must plot against the cumulative frequency of that interval. Then extend your curve from the upper limit of the first interval down to its lower limit on the horizontal axis. DIAGRAM SHEET 3
11.3 Read the answer off the ogive. Answer: You should answer the question asked not simply as “92 learners” but by stating what this number is. Your answer should read something like this: The number of learners who spent about R50 or less on airtime is 92.
11.4 To calculate the mean of grouped data you must use the mid-value of each interval as a representative of the interval. This way the mean will be an approximation of the true mean. Thus the table given to you below includes a column which you must complete with the mid-values, and then another column in which you multiply the mid-value by the frequency of the interval (or group).
Answer: Mean = $\frac{7420}{160} \approx$ R46,38
PAPER 2 QUESTION 9
DoE/NOVEMBER 2008
PAPER 2 QUESTION 10
DoE/NOVEMBER 2008
PAPER 2 QUESTION 9
DoE/NOVEMBER 2008
Number    Hints and answers
9.1 No hint is required. Do the calculation. Answer: 22 minutes
9.2 You are asked in the question to use the formula on the information sheet. So to answer this question you must not use the statistical keys on your calculator. Can you identify the formula? Write the formula down in the box opposite.
Formula: $\sigma_n =$
It means you must set up a table, as shown in the box opposite, with columns for $x$, $(x - \bar{x})$ and $(x - \bar{x})^2$. You must now fill in the table and do the necessary calculations.
$n =$        $\sum x =$        $\bar{x} =$        $\sum (x - \bar{x})^2 =$        $\sigma_n =$
Answer: $\sigma_n = 3,95$
9.3 You must evaluate the interval $\bar{x} \pm \sigma_n$. Then see how many runners fall within this interval. Answer: 6 runners
PAPER 2 QUESTION 10
DoE/NOVEMBER 2008
Number    Hints and your solutions
10.1 Work out the frequency of each interval on the horizontal axis by reading the top of each corresponding bar of the histogram against the vertical axis. DIAGRAM SHEET 3
10.2 See the hint to Question 11.2 of Additional Exemplar 2008 Paper 2. DIAGRAM SHEET 3
10.3 How does the median divide the data? What is its position for the given data? Mark its position on the vertical axis and read the value corresponding to it on the horizontal axis. Answer: An acceptable answer lies between R84 and R90.
10.4 It is the position of the upper quartile you must find here. What is it? Read its corresponding value on the horizontal axis. This will give you the lower limit of the upper 25% interval. Answer: R96 ≤ sales ≤ R120
PAPER 2 QUESTION 11
DoE/NOVEMBER 2008
PAPER 2 QUESTION 12
DoE/NOVEMBER 2008
PAPER 2 QUESTION 11
DoE/NOVEMBER 2008
Number    Hints and your solutions
11.1 DIAGRAM SHEET 4
11.2 The curve is either linear, quadratic, increasing exponential or decreasing exponential, or you may describe it in your own words. Which is it? Answer:
11.3 Find 5,5 seconds on the horizontal axis and read off its corresponding value on the vertical axis. Answer: Approximately 90 m.
PAPER 2 QUESTION 12
DoE/NOVEMBER 2008
Number    Hints and your solutions
12.1 This question carries two marks. You must therefore make two statements, each of which will carry one mark. There are more than two statements that can validly be made. The statements must be relevant and valid in terms of the features of the two box plots. Answer: (Take a good look at the plots and make two statements describing common features of the plots)
12.2 Find the upper quartile of class B. Find also the lower quartile of class B. The interquartile range is the difference between the two. Show your working in this box. Answer: 18
12.3 This question carries three marks. Obviously one mark will go to saying whether Mr Jack is right or wrong. The remaining two marks will go to the reasons you give to support your answer, which means that you must give two reasons, each of which will carry one mark. There are at least 4 answers you can give. Answer: