How To Calculate Your Own Sample Size by Azmi Mohd Tamil

Calculate Your Own Sample Size 2008

CALCULATE YOUR OWN SAMPLE SIZE by Dr Azmi Mohd Tamil, Department of Community Health, UKM Medical Centre (UKMMC). Secretariat of Medical Research & Industry, UKM Medical Centre (UKMMC).


Table of Contents

1. The Magic Number

2. What is Power?

3. Effect Size & Power

4. Tools for calculation

5. Installation of Tools

6. Study Design & Sample Size

7. Sample Size for Prevalence Study

8. Sample Size for Cross-Sectional Study

9. Sample Size for Case-Control Study

10. Sample Size for Cohort

11. Sample Size for Clinical Trial

12. Sample Size for Continuous Outcome

13. Sample Size for Sensitivity & Specificity

14. Conclusion

References


1. The Magic Number

A common question posed to me and to members of my department by postgraduate students is “How many subjects do I need for my study?” What is the magic number? Being neither magicians nor omniscient, we can’t answer that until the students give us some answers about their own studies.

In medical research, the sample size has to be “just large enough”. If the sample size is too small, doing the study is a waste of time since no conclusive results are likely to be obtained.

What happens if the sample size is too small? To demonstrate, below is the result from a clinical trial on 30 patients, comparing the ability of two modes of treatment to control pain.

Type of treatment * Pain (2 hrs post-op) Crosstabulation

                                          Pain (2 hrs post-op)
Type of treatment                         No pain   In pain   Total
Pethidine  Count                          8         7         15
           % within Type of treatment     53.3%     46.7%     100.0%
Cocktail   Count                          4         11        15
           % within Type of treatment     26.7%     73.3%     100.0%
Total      Count                          12        18        30
           % within Type of treatment     40.0%     60.0%     100.0%

Chi-square = 2.222, p = 0.136

In spite of the large difference in pain control (53% vs 27%), the difference is not statistically significant (p > 0.05).

By entering the result from that study into a computer application, Power & Sample Size Program, we can calculate the power of that study. The power is only 32%.

The result of the study is not significant because the power of the study is far below 80%. Using the same software, I calculated the required sample size. The result indicated that we need 50 patients in each group, giving a total of 100 patients.

To demonstrate that an adequate sample size can lead to a significant result, I increased the sample size of the earlier study threefold, from 30 patients to 90 patients.
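The power and sample size figures quoted above can be roughly cross-checked with a short Python sketch. It assumes the textbook normal approximation for comparing two independent proportions, which is not necessarily the method used by the Power & Sample Size Program, so it is expected to land near, not exactly on, the quoted 32% and 50 per group. The function names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    (normal approximation, equal group sizes; an assumed method)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return STD_NORMAL.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


def n_per_group_for_power(p1, p2, power=0.80, alpha=0.05):
    """Approximate sample size per group for the requested power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Pain-free rates from the 30-patient trial: 8/15 vs 4/15
power = power_two_proportions(8 / 15, 4 / 15, n_per_group=15)
n_needed = n_per_group_for_power(8 / 15, 4 / 15)
print(f"approximate power: {power:.2f}, patients per group for 80% power: {n_needed}")
```

Under these assumptions the power comes out at roughly 0.3 and the required size at roughly 50 per group, in line with the values reported by the program.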

Type of treatment * Pain (2 hrs post-op) Crosstabulation

                                          Pain (2 hrs post-op)
Type of treatment                         No pain   In pain   Total
Pethidine  Count                          24        21        45
           % within Type of treatment     53.3%     46.7%     100.0%
Cocktail   Count                          12        33        45
           % within Type of treatment     26.7%     73.3%     100.0%
Total      Count                          36        54        90
           % within Type of treatment     40.0%     60.0%     100.0%

Chi-square = 6.667, p = 0.01
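Both chi-square results can be reproduced by hand. The sketch below assumes the uncorrected Pearson chi-square for a 2 x 2 table, and that the Cocktail group counts are 4 and 11 out of 15 (inferred from the reported 26.7% and 73.3%), tripled to 12 and 33 for the 90-patient version. The function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square and p-value (1 degree of freedom)
    for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 df
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: Pethidine, Cocktail; columns: no pain, in pain
small = chi_square_2x2(8, 7, 4, 11)     # 30 patients
large = chi_square_2x2(24, 21, 12, 33)  # same rates, 90 patients

print(f"n=30: chi-square={small[0]:.3f}, p={small[1]:.3f}")
print(f"n=90: chi-square={large[0]:.3f}, p={large[1]:.3f}")
```

The proportions are identical in both tables; only the number of patients changes, and that alone moves the p-value across the 0.05 threshold.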

Instead of 30 patients, we now have 90. The difference between the rates (53.3% vs 26.7%) is the same as before, but the result is now significant with the larger sample size. Therefore the sample size has to be “just large enough”.

By knowing the sample size required, we can also estimate the cost of the study. If a larger sample is required, the cost will be higher. We can also estimate the length of the study. For example, if the sample needed is 120 patients and only 40 patients are available each year, the student will need at least 3 years to collect the data.

By knowing the above, the student can decide whether the study is feasible, that is, whether he/she can complete it within the time allocated (e.g. 1 year) and the budget available (e.g. RM2500).

2. What is Power?

Power is the probability of finding an effect given that one truly exists. Power is denoted by 1-β. By taking the power of the study into consideration, we can be reasonably sure that if a difference of benefit exists between the treatment and control groups, it will be found in the trial.

The table above illustrates what we want, which is either a true negative or a true positive.

• True negative is when the result of the study indicates that there is no difference of effect between the two treatment modes and it is truly so.
• True positive is when the result of the study indicates that there is a difference of effect between the two treatment modes and it is truly so.

A type I error (false positive) occurs when in reality there is no difference of effect between the two treatment modes but the study that we conducted detected a difference of effect. It is also known as α error. It may happen due to choosing an inappropriate level of significance.

A type II error (false negative) occurs when in reality there is a difference of effect between the two treatment modes but the study that we conducted cannot detect the difference of effect. It is also known as β error. It may happen due to inadequate sample size.

Any increase in sample size increases the power of the test. This is because as the sample size increases, the standard error of the mean decreases, thus reducing the overlap between the sampling distributions under the null and alternative hypotheses.

The greater the power of a study, the more sure we can be of the study results, but greater power requires a larger sample size. It is common to require a power of between 80% and 90%. Usually we choose:

• Power of 80% for detecting a difference of effect (e.g. Drug A is better than Drug B)
• Power of 90% for proving equal effect (equivalence studies; e.g. Drug A is as good as Drug B)

3. Effect Size & Power

Sample size calculation is based on the quantity known as the effect size. The smaller the effect size is, the larger the required sample size. For example, if the treatment group improved 10 times better than the control group, the sample size required is only 242. But if the treated group improved only 5 times better than the control group, the sample size required is 664.

There are different types of Effect Size:

• Standardised Mean Difference: comparing the mean between either treatment groups or naturally occurring groups.
• Odds-Ratio: comparing the rate between either treatment groups or naturally occurring groups.
• Correlation Coefficient: measuring the association between continuous variables.
• Proportion: comparing measures of central tendency such as comparing HIV/AIDS prevalence rates and comparing proportion of homeless persons found to be alcohol abusers.
• Standardised Gain Score: any gain or change between two measurement points on the same variable before and after intervention, such as reading speed before and after a reading improvement class.

Formula for Effect Size:

$$d = \frac{\mu_0 - \mu_1}{\sigma}$$

where σ is the standard deviation of the population of dependent (outcome) measure scores. The Effect Size (d) is interpreted as:

• 0.2: small effect size
• 0.5: medium effect size
• 0.8: large effect size

So in summary, power is governed by:

• Effect size (larger effect size, more power)
• Number of subjects (more subjects, more power)
• Choice of α (0.01 needs a larger sample, 0.05 needs a smaller sample)
• Sources of variability (e.g. sampling method)
• Study design (e.g. case-control vs cohort vs clinical trial)
• Choice of statistical test (e.g. chi2 or t-test)

So we need to have the answers to the above in order to calculate the sample size.

4. Tools for Calculating Power & Sample Size

To help you calculate your own sample size, I will cover 3 commonly used tools. They are:

1. A book by Lwanga SK, Lemeshow S., entitled “Sample Size Determination in Health Studies: A Practical Manual”, published by WHO in 1991.
2. A free computer programme known as “Power and Sample Size Program” or PS2 from Vanderbilt University. It is available in the enclosed CD or for download from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
3. A free computer programme known as “StatCalc” (it is part of EpiInfo6) from the Centers for Disease Control and Prevention (CDC). It is available in the enclosed CD or for download from http://www.cdc.gov/epiinfo/Epi6/EI6dnjp.htm

[Figure: PS2 input screen with annotations: any blue label leads to help; Delta means the expected difference in scores; a further field takes the expected standard deviation of the differences.]

The book by Lwanga & Lemeshow is useful in calculating sample size for prevalence studies.

PS2 can be used to calculate sample size for:

• Independent Case-Control Studies: Chi-square test, Fisher's exact test.
• Matched Case-Control Studies: McNemar's Test.
• Cohort Studies With Dichotomous Outcomes: Chi-square test, McNemar's test.
• Continuous Response Measures in Two Groups: Paired and independent t tests.
• Linear Regression
• Survival Studies

StatCalc can be used to calculate sample size for:

• Cross-Sectional: Prevalence Studies
• Cross-Sectional: Categorical Risk Factor & Outcome
• Cohort Studies With Dichotomous Outcomes: Categorical Risk Factor & Outcome
• Unmatched Case-Control Studies: Categorical Risk Factor & Outcome

Besides the free tools covered here, there are also commercial applications available for calculating sample size. An example is nQuery Advisor version 6.0.

If downloading and installing the applications seems too much of a hassle, you can make use of free online calculators from:

• http://www.changbioscience.com/stat/ssize.html
• http://calculators.stat.ucla.edu/

5. Installation of Tools

Power & Sample Size Program

1. Double-click on pssetup.exe in the enclosed CD (under CD:\Applications\PS2\pssetup.exe), or download it from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize and double-click on it.
2. Click OK, then click on SETUP and then click on OK again. You’ll see the screen below.
3. Upon seeing the above screen, click on the button with the computer icon.
4. Click CONTINUE and then OK. Success!

You can access the program PS2 through the menu, i.e. click START => All Programs => PS - Power and Sample Size Calculation => PS - Power and Sample Size Calculation.

EpiInfo/StatCalc

The fastest way to get StatCalc running on your computer is to copy STATCALC.EXE from the enclosed CD (CD:\Applications\StatCalc\STATCALC.EXE) into a folder named EPI6 on your C: drive. Then create a shortcut for STATCALC.EXE on your Desktop: right-click on the Desktop and select NEW => SHORTCUT. Type C:\EPI6\STATCALC.EXE in the location box. Click NEXT, NEXT and then FINISH. StatCalc can now be accessed through the shortcut that you have just created.

If there is no enclosed CD, you can download STATCALC.EXE from the following links. After downloading, repeat the earlier steps to create a shortcut for the application.

• http://www.guidoluechters.de/References/Programs/STATCALC.EXE
• http://www.pico.at/site/index.php?menuid=159&downloadid=10&reporeid=0

You can also get STATCALC.EXE by downloading and installing Epi Info 6.04d from http://www.cdc.gov/epiinfo/Epi6/EI6dnjp.htm

6. Study Design & Sample Size

The first step in calculating your own sample size is to determine your study design and the outcome being measured. Students have been introduced to the various study designs commonly used in medical research. The following examples should help you to remember what you’ve learnt before:

• Cross-sectional
  • Study on the prevalence of obesity in HUKM
  • Comparing the rate of diabetes mellitus between Indians and non-Indians
• Case-control - Comparing the rate of diabetes mellitus between those with cataract and those without cataract.
• Cohort - Measuring the incidence and relative risk of diabetes mellitus between normal and overweight adults.
• Clinical trial - Comparing the effectiveness of Fluoxetine against Sertraline for treating depression.
• Diagnostic study - Measuring the sensitivity and specificity of a new serological test against the gold standard.

You also have to determine the outcome being measured, whether it is continuous or categorical. If the outcome being measured is whether the patient is in pain or not (Yes or No), then it is categorical. If the outcome being measured is the blood pressure in mm Hg after being treated, then the outcome is continuous.

Each design and outcome requires a different approach for sample size calculation. Upon determining them, please refer to the Table of Contents to get to the correct chapter.

7. Sample Size for Prevalence Study

In a cross-sectional study, you could either be measuring the prevalence of a disease/risk factor or trying to determine the association between a categorical risk factor (e.g. ethnicity) and a categorical outcome (e.g. diabetic or not). This chapter only covers the sample size for prevalence studies. Please refer to Chapter 8 if you want to calculate the sample size for determining the association between a categorical risk factor and a categorical outcome.

First you have to do a literature review to estimate the prevalence being studied. Then decide the absolute precision required (usually between 3% and 5%). Then you can calculate the sample size required.

For example, let's say that we want to measure the prevalence of obesity in HUKM. We assume that the prevalence (P) is 20%, based on literature review. We decide on a precision of 5%, so that if the calculated prevalence of the study is 20%, then the true value of the prevalence lies between 15% and 25%. The confidence level (1 - α) is set at 95%.

Calculate Manually (Kish L. 1965)

$$n = \frac{Z_{1-\alpha/2}^{2}\,P(1-P)}{D^{2}}$$

where

• $Z_{1-\alpha/2} = Z_{0.975} = 1.96$ (for a CI of 95%, Z = 1.96; normal distribution table)
• P = 20% = 0.2 in this example
• D = 5% = 0.05 in this example
• n = 1.96² x (0.2 x (1 - 0.2) / 0.05²) = 245.9

So the sample size required is 246.
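The manual calculation above is easy to script. The following is a minimal Python sketch of the same formula; the function name and its default z value of 1.96 are illustrative choices, not part of any of the tools described in this booklet.

```python
from math import ceil


def prevalence_sample_size(p, d, z=1.96):
    """Sample size for estimating a prevalence p with absolute precision d.

    z is the standard normal value for the chosen confidence level
    (1.96 for 95%)."""
    return z ** 2 * p * (1 - p) / d ** 2


# Obesity example: expected prevalence 20%, precision 5%, 95% confidence
n = prevalence_sample_size(p=0.20, d=0.05)
print(f"n = {n:.1f}, rounded up to {ceil(n)}")
```

Rounding up rather than to the nearest integer keeps the precision at least as good as requested.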


Refer To Table

By referring to the table in S.K. Lwanga, S. Lemeshow 1991, Sample Size Determination in Health Studies pg 25, with a Prevalence (P) of 20% and a precision of 0.05, the table indicates that the sample size required is 246.

Calculate Using StatCalc

Start the application by double-clicking on the shortcut. Select “Sample Size & Power”, then select “Population Survey”. Fill in the boxes with the following values:

• P = 20%
• worst acceptable result is either 15% or 25% since D = 5%

Press F4 to calculate.

So at the 95% confidence level, the sample size required is 246, which is the same value as from the manual calculation and the table. The value is the same since all methods use the same formula to calculate the required sample size.

Conclusion

So instead of worrying about where to get the table or tediously calculating the sample size manually, just use StatCalc since the answers will be the same.

8. Sample Size for Cross-Sectional Study

In a cross-sectional study, you could either be measuring the prevalence of a disease/risk factor or trying to determine the association between a categorical risk factor (e.g. ethnicity) and a categorical outcome (e.g. diabetic or not). This chapter only covers the sample size for determining the association between a categorical risk factor and a categorical outcome. Please refer to Chapter 7 if you want to calculate the sample size for prevalence studies.

First you have to do a literature review to identify the rate of disease among those with the risk factor and the rate of disease among those without the risk factor. You also have to identify the proportion of those with the risk factor in the population being studied.

For example, you want to prove that Indians are at a higher risk of having diabetes mellitus compared to other races in your country using a cross-sectional study. From literature review:

• Proportion of sample from unexposed (non-Indians) population = 85%
• Proportion of sample from exposed (Indians) population = 15%
• P1 = true proportion of DM in unexposed (non-Indians) population = 8%
• P2 = true proportion of DM in exposed (Indians) population = 14%

Calculate Manually

Calculate using these formulas (Fleiss JL. 1981. pp. 44-45), where:

• m = n1 = size of sample from population 1
• n2 = rm = size of sample from population 2
• P1 = proportion of disease in population 1
• P2 = proportion of disease in population 2
• α = “significance” = 0.05
• β = chance of not detecting a difference = 0.2
• 1-β = power = 0.8
• r = n2/n1 = ratio of cases to controls
• P = (P1 + rP2)/(r + 1)
• Q = 1 - P

From table A.2 in Fleiss:

• If 1-α is 0.95 then $c_{\alpha/2}$ is 1.960
• If 1-β is 0.80 then $c_{1-\beta}$ is -0.842
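The formulas themselves are not reproduced in the text here. As a sketch only, the Python below assumes they are the continuity-corrected two-proportion formulas usually attributed to Fleiss (1981), written with the symbols defined above: first an uncorrected size m', then the corrected size m. The function name is hypothetical.

```python
from math import ceil, sqrt


def fleiss_sample_size(p1, p2, r=1.0, c_alpha=1.960, c_beta=-0.842):
    """Sizes (n1, n2) for comparing two proportions, with n2 = r * n1.

    Assumed formulas (not shown in the text):
      m' = [c_alpha*sqrt((r+1)PQ) - c_beta*sqrt(r*P1*Q1 + P2*Q2)]^2
           / (r * (P2 - P1)^2)
      m  = m'/4 * [1 + sqrt(1 + 2(r+1) / (m' * r * |P2 - P1|))]^2
    """
    p_bar = (p1 + r * p2) / (r + 1)
    q_bar = 1 - p_bar
    diff = abs(p2 - p1)
    m_prime = (c_alpha * sqrt((r + 1) * p_bar * q_bar)
               - c_beta * sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    m_prime /= r * diff ** 2
    m = m_prime / 4 * (1 + sqrt(1 + 2 * (r + 1) / (m_prime * r * diff))) ** 2
    return m, r * m


# Cross-sectional example: population 1 = non-Indians (85%), population 2 = Indians (15%)
n1, n2 = fleiss_sample_size(p1=0.08, p2=0.14, r=15 / 85)
print(f"non-Indians: {ceil(n1)}, Indians: {ceil(n2)}")
```

With r = 1 the same function can be applied to the case-control, cohort and clinical trial examples in Chapters 9 to 11.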

The sample size required is 1421 non-Indians & 251 Indians.

Calculate Using StatCalc

Start the application by double-clicking on the shortcut. Select “Sample Size & Power”, then select “Cohort or cross-sectional”.

Fill in the boxes with these values:

• CI = 95%
• Power = 80%
• Ratio = 85:15
• P1 = 0.08
• P2 = 0.14

Press F4 to calculate.

So at the 95% confidence level and power of 80%, the sample size required is 251 Indians and 1,422 non-Indians, giving a total of 1673. This value is similar to the value from the manual calculation since both methods use the same formula to calculate the required sample size.

Calculate Using PS2

Start the application by clicking on START => All Programs => PS - Power and Sample Size Calculation => PS - Power and Sample Size Calculation. Since both the risk factor and the outcome are dichotomous qualitative variables, click on the “Dichotomous” tab. Fill in the boxes accordingly:

• α = 0.05 (for 95% CI)
• Power = 0.8 (for power = 80%)
• P0 = 0.08 (Prevalence of DM amongst non-Indians)
• P1 = 0.14 (Prevalence of DM amongst Indians)
• m = ratio of control to case patients = 85/15 = 5.6667

Once done, click on “Calculate”.

Based on the above result, the sample size for the exposed group is 231. So for the unexposed, it is m x n1 or 85/15 x 231 = 1309. This gives a total of 1540, which is slightly less than the results from manual calculation and StatCalc.

By clicking on the “Graph” button, we can generate the chart on the right. It clearly indicates that if we want to increase the power to 90%, the sample size for Indians would be 320. So the higher the power, the larger the required sample size.

Why PS2 ≠ StatCalc?

PS2 uses Schlesselman’s method for independent case and control groups for studies that will be analysed using an uncorrected chi-square test. For independent studies that will be analysed using Yates' correction or Fisher's exact test, PS2 uses Casagrande et al’s method. PS2 only uses the generalisation of Casagrande's method proposed by Fleiss for unequal case & control sample sizes. On the other hand, StatCalc uses the Fleiss method for unequal case & control sample sizes for all its calculations. Even then the answers differ, as illustrated in the previous example: 1673 vs 1540, a difference of 133 patients.

Conclusion

So instead of calculating the sample size manually, just use either StatCalc or PS2 to calculate your sample size.

9. Sample Size for Case-Control Study

In a case-control study, you start by recruiting the cases and the controls. Then you compare the rate of exposure/risk factor between the case and control groups.

For example, you want to prove that cataract patients (cases) have a higher rate of diabetes mellitus (risk factor) compared to patients with normal vision (controls). From literature review, the rate of diabetes mellitus among the cataract patients is 50% and among those with normal vision is 8%. We decide to have a ratio of one control for each case, i.e. 1:1.

• Proportion of sample from controls (Normal) population = 50%
• Proportion of sample from cases (Cataract) population = 50%
• P1 = true proportion of DM in controls (Normal) population = 8%
• P2 = true proportion of DM in cases (Cataract) population = 50%

Calculate Manually

Calculate using these formulas (Fleiss JL. 1981. pp. 44-45), where:

• m = n1 = size of sample from population 1
• n2 = rm = size of sample from population 2
• P1 = proportion of exposure in population 1
• P2 = proportion of exposure in population 2
• α = “significance” = 0.05
• β = chance of not detecting a difference = 0.2
• 1-β = power = 0.8
• r = n2/n1 = ratio of cases to controls
• P = (P1 + rP2)/(r + 1)
• Q = 1 - P

From table A.2 in Fleiss:

• If 1-α is 0.95 then $c_{\alpha/2}$ is 1.960
• If 1-β is 0.80 then $c_{1-\beta}$ is -0.842
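The two constants read from the table are percentiles of the standard normal distribution, so they can be recovered without the book. The sketch below uses Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

alpha = 0.05
beta = 0.20

# Two-sided significance: upper alpha/2 point of the standard normal
c_alpha_half = std_normal.inv_cdf(1 - alpha / 2)

# The tabulated value for power 1 - beta is the lower beta point (negative)
c_one_minus_beta = std_normal.inv_cdf(beta)

print(round(c_alpha_half, 3), round(c_one_minus_beta, 3))
```

Changing alpha or beta here gives the constants for other significance levels or powers, for example a power of 90%.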

Calculate Using StatCalc

Start the application by double-clicking on the shortcut. Select “Sample Size & Power”, then select “Unmatched case-control”.

Fill in the boxes with these values:

• CI = 95%
• Power = 80%
• Ratio = 1:1
• P1 = 0.08
• P2 = 0.5

Press F4 to calculate.

So at the 95% confidence level and power of 80%, the sample size required is 22 cataract patients and 22 normal vision patients, giving a total of 44. This value is similar to the value from the manual calculation since both methods use the same formula to calculate the required sample size.

Calculate Using PS2

Start the application by clicking on START => All Programs => PS - Power and Sample Size Calculation => PS - Power and Sample Size Calculation. Since both the risk factor and the outcome are dichotomous qualitative variables, click on the “Dichotomous” tab. Fill in the boxes accordingly:

• What do you want to know? Select “Sample size”.
• Matched or Independent? Select “Independent”.
• Case control? Select “Retrospective” since this is a case-control study.
• How is the alternative hypothesis expressed? Select “Two proportions”.
• Uncorrected chi2 or Fisher’s exact test? Select “Uncorrected chi2 test”.
• α = 0.05 (for 95% CI)
• Power = 0.8 (for power = 80%)
• P0 = 0.08 (Prevalence of DM amongst normal vision patients)
• P1 = 0.5 (Prevalence of DM amongst cataract patients)
• m = ratio of control to case patients = 1/1 = 1

Once done, click on “Calculate”. Based on the result, the sample size for the cataract patients is 17. So for the normal vision patients, it is m x n1 or 1 x 17 = 17. This gives a total of 34, which is slightly less than the results from manual calculation and StatCalc. The explanation for the difference in results can be read in Chapter 8.

Conclusion

So instead of calculating the sample size manually, just use either StatCalc or PS2 to calculate your sample size.

10. Sample Size for Cohort

In a cohort study, you identify those who are currently disease-free. From this group, you’ll identify those with and without the exposure or risk factor. Then this cohort is followed up for a predetermined period of time to identify those who will develop the disease and those who won’t.

For example, you want to prove that overweight adults have a higher risk of diabetes mellitus compared to normal weight adults. From literature review, the rate of diabetes mellitus among overweight adults is 32% and among normal weight adults is 7%. We decide to have a ratio of one to one for each adult at risk.

• Ratio of unexposed vs exposed: 1:1
• Proportion of sample from no-risk (Normal) population = 50%
• Proportion of sample from at-risk (Overweight) population = 50%
• P1 = true proportion of DM in no-risk (Normal) population = 7%
• P2 = true proportion of DM in at-risk (Overweight) population = 32%

Calculate Manually

Calculate using these formulas (Fleiss JL. 1981. pp. 44-45).

The sample size required is 46 overweight adults & 46 normal weight adults.

Calculate Using StatCalc

Start the application by double-clicking on the shortcut. Select “Sample Size & Power”, then select “Cohort or cross-sectional”. Fill in the boxes with these values:

• CI = 95%
• Power = 80%
• Ratio = 1:1
• P1 = 0.07
• P2 = 0.32

Press F4 to calculate.

So at the 95% confidence level and power of 80%, the sample size required is 46 overweight adults & 46 normal weight adults, giving a total of 92 adults. This value is similar to the value from the manual calculation since both methods use the same formula to calculate the required sample size.

Calculate Using PS2

Start the application by clicking on START => All Programs => PS - Power and Sample Size Calculation => PS - Power and Sample Size Calculation. Since both the risk factor and the outcome are dichotomous qualitative variables, click on the “Dichotomous” tab. Fill in the boxes accordingly:

• What do you want to know? Select “Sample size”.
• Matched or Independent? Select “Independent”.
• Case control? Select “Prospective” since this is not a case-control study.
• How is the alternative hypothesis expressed? Select “Two proportions”.
• Uncorrected chi2 or Fisher’s exact test? Select “Uncorrected chi2 test”.
• α = 0.05 (for 95% CI)
• Power = 0.8 (for power = 80%)
• P0 = 0.07 (Prevalence of DM amongst normal weight adults)
• P1 = 0.32 (Prevalence of DM amongst overweight adults)
• m = ratio of control to adults at risk = 1/1 = 1

Once done, click on “Calculate”. Based on the result, the sample size for overweight adults is 38. So for the normal weight adults, it is m x n1 or 1 x 38 = 38. This gives a total of 76, which is slightly less than the results from manual calculation and StatCalc. The explanation for the difference in results can be read in Chapter 8.

Conclusion

So instead of calculating the sample size manually, just use either StatCalc or PS2 to calculate your sample size.

11. Sample Size for Clinical Trial

A clinical trial is a planned experiment in humans designed to assess the efficacy of a treatment or intervention. It allows comparison of outcomes in a group of patients treated with a new therapy and those in a comparable group of patients given a control therapy.

For example, comparing the effectiveness of Fluoxetine against Sertraline for treating depression. From literature review, 75% of patients on Fluoxetine improved and 70% of patients on Sertraline improved.

• Ratio of control vs treatment group: 1:1
• Proportion of sample from control (Fluoxetine) population = 50%
• Proportion of sample from treatment (Sertraline) population = 50%
• P1 = true proportion of improvement in control (Fluoxetine) population = 75%
• P2 = true proportion of improvement in treatment (Sertraline) population = 70%

The fastest way to calculate the sample size is to refer to a table. One such table is published in an article entitled “Clinical Trials in Cancer Research” in Environmental Health Perspectives Vol. 32, pp. 31-48, 1979 by Edmund A. Gehan. It is available for download from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1637924

Since the cure rate of 75% is not available in the table, we deduct 75% from 100%, giving us 25%; 0.25 is available in the table. The difference of cure rate is 0.05. For this table: upper figure: α=0.05, power equals 0.8; middle figure: α=0.05, power equals 0.9; lower figure: α=0.01, power equals 0.95.

So for a sample size at the 95% confidence level and power of 80%, we need to refer to the upper figure. Therefore the required sample size per group is 1250. For two groups, you’ll need 2500 patients. Let us compare with the results from other methods of calculation.

Calculate Manually

Calculate using these formulas (Fleiss JL. 1981. pp. 44-45).

Calculate Using StatCalc

Start the application by double-clicking on the shortcut. Select “Sample Size & Power”, then select “Cohort or cross-sectional”.

Fill in the boxes with these values:

• CI = 95%
• Power = 80%
• Ratio = 1:1
• P1 = 0.75
• P2 = 0.70

Press F4 to calculate.

So at the 95% confidence level and power of 80%, the sample size required is 1290 patients for Fluoxetine treatment and 1290 patients for Sertraline treatment, giving a total of 2580 patients. This value is similar to the value from the manual calculation since both methods use the same formula to calculate the required sample size.

Calculate Using PS2

Start the application by clicking on START => All Programs => PS - Power and Sample Size Calculation => PS - Power and Sample Size Calculation. Since both the risk factor and the outcome are dichotomous qualitative variables, click on the “Dichotomous” tab. Fill in the boxes accordingly:

• What do you want to know? Select “Sample size”.
• Matched or Independent? Select “Independent”.
• Case control? Select “Prospective” since this is not a case-control study.
• How is the alternative hypothesis expressed? Select “Two proportions”.
• Uncorrected chi2 or Fisher’s exact test? Select “Uncorrected chi2 test”.
• α = 0.05 (for 95% CI)
• Power = 0.8 (for power = 80%)
• P0 = 0.75 (Cure rate for Fluoxetine)
• P1 = 0.70 (Cure rate for Sertraline)
• m = ratio of control to treatment group = 1/1 = 1

Once done, click on “Calculate”. Based on the result, the sample size for the treatment group is 1251. So for the control group, it is m x n1 or 1 x 1251 = 1251. This gives a total of 2502, which is quite similar to the table, manual calculation and StatCalc.

Conclusion

• From the table: 1250 from each group = 2500.
• From PS2: 1251 from each group = 2502.
• From StatCalc: 1290 from each group = 2580.
• From manual calculation: 1291 from each group = 2582.

So the sample size from the table is very similar to PS2’s result. The results from the other methods also do not differ much.

12. Sample Size for Continuous Outcome In the earlier chapters, we calculated the sample size for categorical outcomes. But in medical research, sometimes the outcome being measured is continuous. For example, measuring the drop of blood pressure after being treated with anti-hypertensive. So we also need to know how to calculate the sample size for such studies. The following calculations are meant for comparison between two independent groups such as in a clinical trial. To make life easier, there are various tables and normograms available for us to calculate sample size for continuous data. One such article is entitled “An introduction to power and sample size estimation” published in Emergency Medical Journal 2003;20;453-458 by Jones SR, Carley S & Harrison M. It is available for download from http://emj.bmjjournals.com/cgi/content/full/20/5/453 . Refer to Normogram or Table Before we can make use of such tables or normograms, we need to specify the following; zStandard deviation of the variable (s.d) zClinically relevant difference (δ) zThe significant level (α) – 0.05 zThe power (1 - β ) – 80% The standardised difference is calculated as; δ , s.d For example, if the difference between the means = 10 mmHg & population standard deviation = 20 mm Hg, then the standardised difference is 10 /20 = 0.5. Dr Azmi Mohd Tamil


The standardised difference that we calculated earlier is 0.5. We want a sample size with a power of 80%. So on the nomogram we draw a straight line from 0.5 on the standardised difference scale to 0.80 on the power scale. We read off the value for N on the line corresponding to α = 0.05, which gives a total sample size of 128, so we require 64 subjects for each group. We can also refer to a table, as illustrated on the next page. Both the nomogram and the table come from the article by Jones SR et al (EMJ, 2003) cited earlier.


Since the standardised difference (Sdiff) is 0.5 and the power that we want is 80%, the table gives a sample size of 64 for each group. So for 2 groups, the total sample size would be 128.

Calculate Using PS2

Start the application by clicking on START => All Programs => PS-Power and Sample Size Calculation => PS-Power and Sample Size Calculation. Since the outcome is a continuous quantitative variable, click on the “t-test” tab. Fill in the boxes accordingly:

• What do you want to know? Select “Sample size”.
• Paired or Independent? Select “Independent”.
• α = 0.05 (for 95% CI)
• Power = 0.8 (for power = 80%)
• δ = 10 (difference of the means)
• σ = 20 (within group standard deviation)
• m = ratio of control to treatment group = 1/1 = 1

Once done, click on “Calculate”.

Based on the result, the sample size for the treatment group is 64. For the control group, it is m x n1 = 1 x 64 = 64. This gives a total of 128, which is the same as the result from the nomogram and the table.
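As a rough sanity check on this result, we can ask the reverse question: with 64 subjects in each group, how much power does the study have? The sketch below uses a simple normal approximation, not the t-distribution calculation that PS2 performs, so its answer is slightly optimistic. The function name `approx_power` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two independent means
    with equal group sizes, using the normal approximation.
    The negligible contribution of the opposite tail is ignored."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    std_diff = delta / sigma                        # standardised difference
    shift = std_diff * sqrt(n_per_group / 2)        # expected test statistic
    return NormalDist().cdf(shift - z_alpha)


# Inputs from the example: difference of 10 mmHg, s.d. of 20 mmHg, 64 per group.
power_64 = approx_power(delta=10, sigma=20, n_per_group=64)
print(round(power_64, 2))
```

The approximation gives a power of about 0.81, just above the 80% that was requested, which agrees with 64 per group being the smallest adequate sample size.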


Calculate Manually

Calculate using this formula (Snedecor GW, Cochran WG. 1989):

$$n = 1 + 2C\left(\frac{s}{d}\right)^2$$

where s = standard deviation, d = the difference to be detected, and C = constant (refer to table below); if α = 0.05 and 1-β = 0.8, then C = 7.85.

For our example, d = 10 mmHg and s = 20 mmHg, so

$$n = 1 + 2 \times 7.85 \times \left(\frac{20}{10}\right)^2 = 63.8 \approx 64$$

This matches the result from the nomogram, table and PS2!

Conclusion
• From the nomogram: 64 from each group = 128.
• From the table: 64 from each group = 128.
• From PS2: 64 from each group = 128.
• From manual calculation: 64 from each group = 128.

So the sample size is the same, irrespective of the method used. Therefore please choose the easiest possible method for you.
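The manual calculation is also easy to script. The sketch below follows the formula n = 1 + 2C(s/d)², and derives the constant from the normal distribution as C = (z for α/2 + z for power)², which is an assumption about where the tabulated value of 7.85 comes from rather than something stated in this booklet. The function name `n_continuous` is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_continuous(d, s, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two independent means,
    using n = 1 + 2C(s/d)^2. Returns the unrounded n and the constant C."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    c = (z_alpha + z_beta) ** 2                    # assumed origin of C
    return 1 + 2 * c * (s / d) ** 2, c


# Inputs from the example: d = 10 mmHg, s = 20 mmHg.
n_raw, c = n_continuous(d=10, s=20)
n_per_group = ceil(n_raw)                          # round up to whole subjects
print(round(c, 2), n_per_group, 2 * n_per_group)
```

With the example inputs this gives C of about 7.85 and 64 subjects per group (128 in total), in line with the other methods. Changing `alpha` or `power` recomputes C without needing a lookup table.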


13. Sample Size for Sensitivity & Specificity

Conduct a literature review to find out the sensitivity and specificity of the diagnostic test being studied. Once we know the sensitivity and specificity of the test, we can calculate the required sample size. We can calculate the sample size required based on the sensitivity using the following formula:

$$n = \frac{z^2 \times SN \times (1 - SN)}{W^2 \times P}$$

For example, we have a diagnostic test with a sensitivity of 95% being tested on a population with a disease prevalence of 30%. The level of accuracy (W) required is 5% and the confidence interval is 95%.
• SN = 95%
• z = 1.96 for CI of 95%
• P = Prevalence of disease amongst test population = 30%
• W = 0.05

$$n = \frac{1.96^2 \times 0.95 \times 0.05}{0.05^2 \times 0.30} = 243.3 \approx 243$$


We can also calculate the sample size required based on the specificity using the following formula:

$$n = \frac{z^2 \times SP \times (1 - SP)}{W^2 \times (1 - P)}$$

For example, the same diagnostic test has a specificity of 80% and is being tested on the same population with a disease prevalence of 30%. The level of accuracy (W) required is 5% and the confidence interval is 95%.
• SP = 80%
• z = 1.96 for CI of 95%
• P = Prevalence of disease amongst test population = 30%
• W = 0.05

$$n = \frac{1.96^2 \times 0.80 \times 0.20}{0.05^2 \times 0.70} = 351.2 \approx 351$$

Conclusion

In the above example, a diagnostic kit with a sensitivity of 95% requires a sample size of 243. The same kit, with a specificity of 80%, requires a sample size of 351. If the study is interested in both sensitivity and specificity, then we take the higher number (in this example, 351).
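Both calculations can be reproduced with a short script. The sketch below assumes the commonly used formulas n = z² × SN × (1 - SN) / (W² × P) for sensitivity and n = z² × SP × (1 - SP) / (W² × (1 - P)) for specificity, which reproduce the figures of 243 and 351 quoted above. The function names are illustrative only, and results are rounded to the nearest whole number to match the text (rounding up would be slightly more conservative).

```python
from statistics import NormalDist


def n_for_sensitivity(sn, prevalence, w, confidence=0.95):
    """Total subjects needed to estimate sensitivity to within +/- w."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% CI
    return z ** 2 * sn * (1 - sn) / (w ** 2 * prevalence)


def n_for_specificity(sp, prevalence, w, confidence=0.95):
    """Total subjects needed to estimate specificity to within +/- w."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * sp * (1 - sp) / (w ** 2 * (1 - prevalence))


# Inputs from the example: SN = 95%, SP = 80%, prevalence = 30%, W = 0.05.
n_sn = n_for_sensitivity(sn=0.95, prevalence=0.30, w=0.05)
n_sp = n_for_specificity(sp=0.80, prevalence=0.30, w=0.05)

# When both measures matter, take the larger of the two requirements.
n_required = max(round(n_sn), round(n_sp))
print(round(n_sn), round(n_sp), n_required)
```

Dividing by the prevalence (or by 1 minus the prevalence) converts the number of diseased (or non-diseased) subjects needed into the total number of subjects that must be recruited.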


14. Conclusion

You can calculate your own sample size. Tools are easily available and most of them are free. Please determine your study design and then choose the appropriate method to calculate the sample size.


References

• Casagrande JT, Pike MC, Smith PG. An Improved Approximate Formula for Calculating Sample Sizes for Comparing Two Binomial Distributions. Biometrics, 1978; 34:483-486.
• Dupont WD, Plummer WD. PS power and sample size program available for free on the Internet. Controlled Clinical Trials, 1997; 18:274.
• Fleiss JL. Statistical methods for rates and proportions. New York: John Wiley and Sons, 1981.
• Gehan EA. Clinical Trials in Cancer Research. Environmental Health Perspectives, 1979; 32:31-48. Download from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1637924
• Jones SR, Carley S, Harrison M. An introduction to power and sample size estimation. Emergency Medicine Journal, 2003; 20:453-458. Download from http://emj.bmjjournals.com/cgi/content/full/20/5/453
• Kish L. Survey sampling. New York: John Wiley & Sons, 1965.
• Snedecor GW, Cochran WG. Statistical Methods. 8th Ed. Ames: Iowa State Press, 1989.
• Schlesselman JJ. Case-control Studies: Design, Conduct, Analysis. New York: Oxford University Press, 1982:144-152.


Published by Secretariat of Medical Research & Industry, UKM Medical Centre (UKMMC), Yaacob Latif Road, Bandar Tun Razak, Cheras, 56000 Kuala Lumpur, Federal Territory, MALAYSIA. Tel: +603-9145 5048 Fax: +603-9172 5339 http://www.hukm.ukm.my/mrs/
