
1.1 – Statistical Analysis

1.1.1 - State that error bars are a graphical representation of the variability of data

Error bars are a graphical representation of the variability of data. They can show either the range of the data or the standard deviation on a graph. When an error bar shows the standard deviation, it extends the same distance above and below the plotted value.

When we collect data, there will always be variation, because biological systems are subject to both a genetic program and environmental variation. Time is often too limited to repeat an experiment enough times to confirm an accurate result. Instead, when we display data, we show the degree of variability in the readings. On a graph, we do this with error bars. Each error bar represents either the range of readings obtained for that value or their standard deviation.
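As a rough sketch of what an error bar encodes, the two options described above (range or standard deviation) can be computed for one set of repeated readings. The function names and the readings are hypothetical, and the standard deviation here divides by n, which is an assumption about the convention in use.

```python
import statistics

def sd_error_bar(readings):
    """Return (lower, centre, upper) for a mean +/- 1 SD error bar."""
    centre = statistics.mean(readings)
    sd = statistics.pstdev(readings)  # assumption: divide by n, not n - 1
    return centre - sd, centre, centre + sd

def range_error_bar(readings):
    """Return (lower, centre, upper) for an error bar showing the full range."""
    return min(readings), statistics.mean(readings), max(readings)

readings = [8, 10, 12, 10]  # hypothetical repeated readings for one plotted value
print(sd_error_bar(readings))
print(range_error_bar(readings))
```

The standard deviation bar is symmetrical about the mean by construction, whereas a range bar need not be.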

1.1.2 - Calculate the mean and standard deviation of a set of values

Mean

The mean is a measure of the central tendency of the data. If the distribution is skewed, the mean may not in fact be the middle value. The median or mode may be more appropriate.

www.ibscrewed.org


The mean is an average of data points. To calculate a mean value, all the values are summed, and the total divided by the number of values.

\[ \bar{x} = \frac{\sum x}{n} \]

where
$\bar{x}$ = arithmetic mean
$\sum x$ = sum of all measurements
$n$ = the total number of measurements
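A minimal sketch of the mean calculation described above, using hypothetical leaf-length measurements. The function name is illustrative only.

```python
def arithmetic_mean(values):
    """Sum all measurements and divide by the number of measurements."""
    if not values:
        raise ValueError("need at least one measurement")
    return sum(values) / len(values)

leaf_lengths_mm = [42, 45, 39, 48, 46]  # hypothetical measurements
print(arithmetic_mean(leaf_lengths_mm))  # 220 / 5 = 44.0
```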

Standard Deviation

The standard deviation is a measure of how the individual observations of a data set are dispersed or spread around the mean. It is determined by a mathematical formula which is programmed into your calculator. A small standard deviation indicates that the data is clustered closely around the mean value, whilst a large standard deviation indicates a wider spread around the mean.

To calculate this by hand:

1. Find the mean: $\bar{x}$
2. Measure the deviation of each reading from the mean: $x - \bar{x}$
3. Square the deviations: $(x - \bar{x})^2$
4. Add all the squared deviations: $\sum (x - \bar{x})^2$
5. Divide by the number of samples: $\frac{\sum (x - \bar{x})^2}{n}$
6. Take the square root of the result to obtain the standard deviation: $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
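The hand calculation can be sketched in code, one line per step. This follows the notes in dividing by n. The final square root is what the standard definition requires to turn the averaged squared deviations into a standard deviation. The function name and data are hypothetical.

```python
import math

def standard_deviation(values):
    """Standard deviation by the hand method (dividing by n)."""
    n = len(values)
    mean = sum(values) / n                    # find the mean
    deviations = [x - mean for x in values]   # deviation of each reading
    squared = [d ** 2 for d in deviations]    # square the deviations
    total = sum(squared)                      # add the squared deviations
    variance = total / n                      # divide by the number of samples
    return math.sqrt(variance)                # square root gives the SD

readings = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical readings, mean = 5
print(standard_deviation(readings))  # 32 / 8 = 4, square root = 2.0
```

Many calculators and libraries also offer a sample standard deviation that divides by n - 1 instead of n, so a calculator result may differ slightly from this hand method.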

Once obtained, the value may be applied to the normal distribution curve. Note that 68% of data occurs within ±1 SD, and more than 95% of the data occurs within ±2 SDs.



To calculate using the Graphics Calculator (TI-Nspire):

Mean:

Then simply enter the numbers, separated by commas, and press Enter.

Standard Deviation:

Then enter the numbers, separated by commas, and press Enter.

Mean in Experimental situations



Mean, Median and Mode

Mode is the most frequent value in a set of values.
Median is the middle value when a set of values is arranged in ascending order.
Mean is the sum of all the values divided by the number of values.
Range is the spread of the data, measured by the difference between the highest and lowest values. It is especially affected by any outliers.

A normal distribution is one in which the values are grouped symmetrically around a central value. As a result, the mean, median and mode are all the same. In a skewed distribution, the values reduce in frequency faster on one side, causing a difference between the mean, median and mode.
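A short sketch of these four measures using Python's standard `statistics` module. The data set is hypothetical and deliberately includes one outlier (22) to show how the mean and range are pulled by it while the median and mode are not.

```python
import statistics

def summarise(values):
    """Return the mode, median, mean and range of a list of values."""
    return {
        "mode": statistics.mode(values),      # most frequent value
        "median": statistics.median(values),  # middle value when sorted
        "mean": statistics.mean(values),      # sum divided by count
        "range": max(values) - min(values),   # highest minus lowest
    }

values = [3, 5, 5, 6, 7, 8, 22]  # hypothetical, skewed by the outlier 22
print(summarise(values))
```

Here the mean (8) sits above the median (6) and the mode (5), which is the pattern expected for data skewed towards high values.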



1.1.3 - State that the term standard deviation is used to summarise the spread of values around the mean, and that 68% of the values fall within one standard deviation of the mean

Standard deviation is a measure of how spread out the data values are from the mean. It is assumed that there is a normal distribution of values around the mean and that the data is not skewed to either end. 68% of data occurs within ±1 SD, and more than 95% of the data occurs within ±2 SDs. If the bell curve were flatter, the deviations would be larger.
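The 68% and 95% figures can be checked against an ideal normal distribution. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available. The function name is hypothetical.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k SDs of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(round(fraction_within(1), 4))  # about 0.68
print(round(fraction_within(2), 4))  # a little over 0.95
```

Real biological data only approximate these fractions, and only when the values are roughly normally distributed.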

We use standard deviation to compare the means and spread of data between two or more samples.



The standard deviation tells us how tightly the data points are clustered around the mean. When the data points are clustered together, the standard deviation is small; when they are spread apart, the standard deviation is large. Calculating the standard deviation of a data set is easily done on your calculator. This is useful because it tells you how many extremes there are in the data. If there are many extremes, the standard deviation would be large; with few extremes the standard deviation will be small.

1.1.4 - Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples

A sample with a small standard deviation suggests narrow variation, whereas a sample with a larger standard deviation suggests wider variation. A flatter bell curve will therefore have a larger standard deviation. We make inferences about a whole population based on just a sample of that population. For example, two data sets may have the same mean, but one may have a larger range. That set will usually have a larger standard deviation, which leads us to question what is causing the wide spread of data. It may also lead us to question the design of the experiment and whether all the variables had been accounted for. Comparing the standard deviations of data sets is a useful tool for IAs.
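To illustrate, here are two hypothetical samples with the same mean but very different spreads. The data and the helper name are made up for this sketch, and the standard deviation divides by n.

```python
import statistics

def describe(sample):
    """Return (mean, standard deviation) for a sample, dividing by n."""
    return statistics.mean(sample), statistics.pstdev(sample)

sample_a = [18, 19, 20, 21, 22]  # hypothetical: tightly clustered
sample_b = [10, 15, 20, 25, 30]  # hypothetical: widely spread

mean_a, sd_a = describe(sample_a)
mean_b, sd_b = describe(sample_b)
print(mean_a, round(sd_a, 2))
print(mean_b, round(sd_b, 2))
```

Both means are 20, so the mean alone cannot distinguish the samples. The larger standard deviation of the second sample is what flags its wider variation.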

1.1.5 - Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables

Once the means and standard deviations have been calculated, the next question is whether the difference between the two samples is significant. This is framed as two hypotheses:

Null Hypothesis: There is no significant difference between the two samples; any difference is caused by chance selection of data.
Alternative Hypothesis: There is a significant difference between the two samples.

The T-Test

The t-test provides a way of measuring the overlap between two sets of data. A large value of t indicates little overlap and makes it highly likely that there is a significant difference between the two data sets.

Performing the t-test



1. Establish the null hypothesis.
2. Check that the data is normally distributed.
3. Calculate the value of t using a calculator or spreadsheet formulae:

\[ t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}} \]

   where
   $\bar{x}_a$ = mean of set a
   $\bar{x}_b$ = mean of set b
   $s_a^2$ = standard deviation of set a, squared
   $s_b^2$ = standard deviation of set b, squared
   $n_a$ = number of data points in set a
   $n_b$ = number of data points in set b

4. Determine the degrees of freedom. This is found using the formula:

   df = total number of values in both samples − 2

5. Use a table of critical values. For Biology, we use the column for a significance level (p) of 0.05. Find the value at the degrees of freedom calculated above.
6. If the calculated t value is greater than the critical value, then the probability that the difference is due to chance is less than 0.05. This means that we can reject the null hypothesis and conclude that there is a significant difference between the two samples. These results indicate that there is an underlying reason for the difference in the means.
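The procedure above can be sketched as follows. Several things here are assumptions rather than part of the notes: the function names, the example data, the use of the sample variance (dividing by n - 1, as `statistics.variance` does), and the comparison of the absolute value of t with the critical value, since the sign of t only reflects which mean is larger. The critical values are an excerpt for p = 0.05.

```python
import math
import statistics

# Excerpt of critical t values at p = 0.05, keyed by degrees of freedom
CRITICAL_T_P05 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
                  6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23}

def t_statistic(a, b):
    """t = difference of means / sqrt(s_a^2 / n_a + s_b^2 / n_b)."""
    var_a = statistics.variance(a)  # assumption: sample variance (n - 1)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def degrees_of_freedom(a, b):
    """Total number of values in both samples minus two."""
    return len(a) + len(b) - 2

def is_significant(a, b):
    """True if |t| exceeds the critical value at p = 0.05."""
    critical = CRITICAL_T_P05[degrees_of_freedom(a, b)]  # KeyError outside excerpt
    return abs(t_statistic(a, b)) > critical

set_a = [10, 12, 11, 13, 14]  # hypothetical measurements
set_b = [15, 17, 16, 18, 19]  # hypothetical measurements
print(t_statistic(set_a, set_b), degrees_of_freedom(set_a, set_b))
print(is_significant(set_a, set_b))
```

For these made-up data, t is -5.0 with 8 degrees of freedom. Its magnitude exceeds the critical value of 2.31, so the null hypothesis would be rejected.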



In Excel:

Degrees of freedom    Critical value of t at p = 0.05
1                     12.71
2                     4.30
3                     3.18
4                     2.78
5                     2.57
6                     2.45
7                     2.36
8                     2.31
9                     2.26
10                    2.23
12                    2.18
14                    2.15
16                    2.12
18                    2.10
20                    2.09
22                    2.08
24                    2.06
26                    2.06
28                    2.05
30                    2.04
40                    2.02
60                    2.00
120                   1.98
∞                     1.96

When making your conclusion, you should do the following:

1. State the null hypothesis and the alternative hypothesis.
2. Set the critical p level as 0.05.
3. Write the decision rule for rejecting the null hypothesis:
   If p > 0.05, there is no significant difference between the two sets and the null hypothesis is accepted.
   If p < 0.05, there is a significant difference between the two sets and the null hypothesis is rejected.
4. Write a summary statement based on the decision.



We use the t-test to determine whether or not the difference between two sets of data is a significant difference. The t-test compares two sets of data to tell us the probability (p) that chance alone could produce the difference. When comparing two groups of data, we use the mean, standard deviation and sample size to calculate the value of t. We then use the table of t values. First, look in the left hand column headed 'Degrees of Freedom,' then across to the t value. The degrees of freedom are the sum of the sample sizes of each group minus two.

1.1.6 - Explain that the existence of a correlation does not establish that there is a causal relationship between two values

A correlation is a mutual relation between two or more things, or an interdependence of variable quantities. The belief that two things which occur together must therefore be connected is a common mistake. The fact that two events regularly occur together may lead you to believe that A causes B (or vice versa). However, this is not always true. A third, common factor may cause them both; such a relationship is known as a spurious correlation.

For example, some infants develop the symptoms of autism shortly after the normal time when the MMR inoculation is administered. Many parents blamed the vaccination for their child's condition. As a result, the practice of having the triple injection became unpopular. It was some time before studies could convince these parents that the two events were not causally linked.

Having applied statistical tests that indicate a correlation, we cannot then assert that one event is the cause of the other. However, the fact that a correlation does not prove the cause does not mean that there cannot be a causal relationship. Statistical confidence in the possibility of a causal link is therefore a springboard to further investigation, but not proof of a relationship.



We make many observations of the world around us all the time. For example, we may notice that when the soil around a plant is dry, the plant wilts. This is only an observation. We must do an experiment to see if watering the plant prevents it from wilting. Observing that wilting occurs when the soil is dry is a simple correlation, but the experiment gives us the evidence that a lack of water is the cause of wilting. Experiments provide a test which can show cause. Observations without an experiment can only show a correlation.

When using a mathematical correlation test, the value of r signifies the strength and direction of the correlation. The value of r can vary from +1 (completely positive correlation) through 0 (no correlation) to -1 (completely negative correlation).
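The notes do not say which correlation test produces r. The sketch below assumes Pearson's correlation coefficient, which is a common choice, and uses hypothetical paired readings. The function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values (an assumed choice of r)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from each mean
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

soil_moisture = [1, 2, 3, 4]   # hypothetical readings
leaf_turgor = [2, 4, 6, 8]     # rises with moisture: r = +1
wilting_score = [8, 6, 4, 2]   # falls with moisture: r = -1
print(pearson_r(soil_moisture, leaf_turgor))
print(pearson_r(soil_moisture, wilting_score))
```

Even a value of r at +1 or -1 describes only how closely the two variables move together. It says nothing about whether one causes the other.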


