CREDIT 3
REVIEW 2013 BATCH
SINGLE CHOICE 20’ TERMS SIMPLE QUESTION COMPLICATED QUESTION
BHAGATH M S
RAJEEV BISWAS
SECTION A :: MCQ (Multiple Choice)
1. There are 100 men, with heights of mean 171.06 cm and standard deviation 4.95 cm, and weights of mean 61.54 kg and standard deviation 5.02 kg. Which of the following statements is true?
(A) The variation of the weight is higher than that of the height.
(B) The variation of the height is higher than that of the weight.
(C) The variation of the height is the same as that of the weight.
(D) The variation of the height and that of the weight are not comparable because of different units.
Answer: (A) The variation of the weight is higher than that of the height. (The units differ, so compare coefficients of variation: 5.02/61.54 ≈ 8.2% for weight versus 4.95/171.06 ≈ 2.9% for height.)
2. Which of the following statements is correct?
(A) The range is a measure of central tendency.
(B) The median is a measure of dispersion.
(C) For a symmetric distribution, the mean is equal to the median.
(D) For a skewed distribution, the variance is a negative number.
Answer: (C) For a symmetric distribution, the mean is equal to the median.
3. In an experiment to determine if antibiotics increase the final dressed weight of cattle, the following were measured on each animal in the study: sex, weight, grade of meat, where grade is recorded as A, B, or C. The scales of measurement of these variables are:
(a) Nominal, Numerical, Ordinal
(b) Nominal, Nominal, Ordinal
(c) Ordinal, Numerical, Nominal
(d) Ordinal, Categorical, Nominal
Answer: (a) Nominal, Numerical, Ordinal
4. For 100 men, the mean height is 171.06 cm with standard deviation 4.95 cm, and the mean weight is 61.54 kg with standard deviation 5.02 kg. Which statement is right?
(A) The variation of weight is higher than that of height.
(B) The variation of height is higher than that of weight.
(C) The variation of height is the same as that of weight.
(D) The variation of height and weight cannot be compared, because their units are different.
Answer: (A) The variation of weight is higher than that of height.
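The comparison behind questions 1 and 4 can be checked with a short sketch. It assumes the coefficient of variation (SD divided by mean, as a percentage) is the intended measure; the function name is illustrative only.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: SD / mean * 100."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return sd / mean * 100.0

# Values taken from questions 1 and 4 above.
cv_height = coefficient_of_variation(4.95, 171.06)
cv_weight = coefficient_of_variation(5.02, 61.54)

print(f"CV height = {cv_height:.2f}%")
print(f"CV weight = {cv_weight:.2f}%")
print("weight varies more" if cv_weight > cv_height else "height varies more")
```

Because the CV is unit-free, it allows heights in cm and weights in kg to be compared directly.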
5. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative.
(A) Type of car the student owns is quantitative.
(B) Number of credit hours taken during that semester is quantitative.
(C) The time the student waited in line at the bookstore to pay for his/her textbooks is categorical.
(D) Home state of the student is quantitative.
Answer: (B) Number of credit hours taken during that semester is quantitative.
6. Which of the following statements is INCORRECT about the sampling distribution of the sample mean?
(a) The standard error of the sample mean will decrease as the sample size increases.
(b) The standard error of the sample mean is a measure of the variability of the sample mean among repeated samples.
(c) The sample mean is unbiased for the true (unknown) population mean.
(d) The sampling distribution shows how the sample was distributed around the sample mean.
Answer: (d) The sampling distribution shows how the sample was distributed around the sample mean.
7. Which of the following statements is NOT true?
(a) In a normal distribution, the mean and the median are equal.
(b) Normal distributions are bell-shaped.
(c) The area of the standard normal distribution between -1.96 and 1 is 0.95.
(d) The normal distribution with mean μ = 0 and variance σ² = 1 is called the standard normal distribution.
Answer: (c) The area of the standard normal distribution between -1.96 and 1 is 0.95.
8. The average time it takes for a person to experience pain relief from aspirin is 25 minutes. A new ingredient is added to help speed up relief. Let μ denote the average time to obtain pain relief with the new product. An experiment is conducted to verify if the new product is better. What are the null and alternative hypotheses?
(a) H0 : μ = 25 vs HA : μ ≠ 25
(b) H0 : μ = 25 vs HA : μ < 25
(c) H0 : μ < 25 vs HA : μ = 25
(d) H0 : μ < 25 vs HA : μ > 25
Answer: (b) H0 : μ = 25 vs HA : μ < 25
9. If the correlation between body weight and annual income were high and positive, we could conclude that:
(a) High incomes cause people to eat more food.
(b) Low incomes cause people to eat less food.
(c) High income people tend to spend a greater proportion of their income on food than low income people, on average.
(d) High income people tend to be heavier than low income people, on average.
Answer: (d) High income people tend to be heavier than low income people, on average.
10. In a test of H0 : μ = 100 against HA : μ ≠ 100, a sample of size 10 produces a sample mean of 103 and a p-value of 0.08. Thus, at the 0.05 level of significance:
(a) there is sufficient evidence to conclude that μ ≠ 100.
(b) there is sufficient evidence to conclude that μ = 100.
(c) there is insufficient evidence to conclude that μ = 100.
(d) there is insufficient evidence to conclude that μ ≠ 100.
Answer: (d) there is insufficient evidence to conclude that μ ≠ 100.
Section B :: TERMS
POPULATION
• The entire group, or the measurements of the entire group, that researchers are interested in.
SAMPLE
• A portion or subset of the population that is used for making inferences about the population.
SAMPLE SIZE
• Sample size, n: the number of individuals included in a sample.
PARAMETER
• A number that describes a population characteristic.
• Example :: average gross income of all people in China in 2002.
PROBABILITY
• The numerical measure of the likelihood, or degree of predictability, that an event A will occur, written P(A).
• A formal way to measure the chance of uncertain events; 0 ≤ P ≤ 1.
STATISTIC
• A number that describes a sample characteristic.
• Example :: average gross income of people from a sample of 3 provinces in 2002.
CI
• Confidence interval
• A range [or an interval] of values used to estimate the true value of the population parameter.
• The level of confidence 1 – α is the probability that the interval estimate contains the population parameter, usually 90% (α = 10%), 95% (α = 5%), 99% (α = 1%).
RI
• Reference interval
• A range of values within which the majority of measurements from normal subjects will lie.
• Majority: 90%, 95%, 99% etc.
SAMPLING ERROR
• Difference between statistic and parameter
• CAUSATION: individual variation + sampling
• REPRESENTATION:
  o Difference between statistic and parameter
  o Differs from one sample to another.
• For each statistic, sampling error has its own distribution.
MEDIAN
• If the data are arranged in increasing or decreasing order, the median is the middle value, which divides the set into equal halves.
• When n is odd, $M = X_{(n+1)/2}$
• When n is even, $M = \frac{1}{2}\left[X_{n/2} + X_{n/2+1}\right]$
CV
• Coefficient of variation: describes the variation by expressing the standard deviation as a proportion or percentage of the mean.
• $CV = \frac{S}{\bar{x}} \times 100\%$
• Requires a non-zero mean; used to compare variation between different distributions.
SD
• Standard deviation
• It is the square root of the variance ($S^2$).
• $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}}$
SE
• Standard error
• The measure of sampling error is named standard error.
• For the sampling error of means, SE is calculated by using $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, estimated from the sample as $s_{\bar{x}} = \frac{S}{\sqrt{n}}$
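As a rough illustration of the SD and SE definitions above, the sketch below applies them to the six values used in Section C, question 1. The variable names are illustrative, and the hand-written formula is compared against Python's `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

data = [15, 21, 24, 16, 13, 18]
n = len(data)
mean = sum(data) / n

# SD: square root of the sum of squared deviations divided by n - 1.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# SE of the mean: SD divided by the square root of n.
se = sd / math.sqrt(n)

print(f"mean = {mean:.2f}, SD = {sd:.3f}, SE = {se:.3f}")
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(data)))
```

SD describes the scatter of the individual values, while SE describes how far the sample mean is expected to be from the population mean.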
SIGNIFICANCE LEVEL
• α = 0.05
• If H0 is true, the sample mean should be close to 132.
• If H1 is true, the sample mean is expected to be significantly different from (greater or lower than) 132.
• Significantly different means that the result of the experiment would be a rare result if H0 were true.
• What is a rare result, or a result rare enough to make us suspicious of the null hypothesis?
• We define 0.05 or 0.01 as the probability of a rare result. That is the significance level.
RATIO (Ratio Comparison Index)
• Ratio is defined as one quantity relative to another: $\left(\frac{c}{d}\right) \times k$
• c, d: the frequency or relative frequency of occurrence of some events or terms, such as the person-doctor ratio, the person-hospital bed ratio.
• The k used in a ratio is mostly 1 or 100.
PROPORTION
• Composing index: the relative frequency of every component in a whole group, taking account of a special factor such as race, sex, or age group: $\left(\frac{f_i}{\sum f_i}\right) \times 100\%$
• For example, sex proportion, race proportion, age proportion.
RANDOMIZATION
• The process of assigning participants to groups is called randomization.
• It is a method used to prevent bias in research. Treatment assignments are generated by a computer, and each participant has an equal chance of being assigned to one of two or more groups, such as the control group and the treatment group:
• The control group is made up of the people who get the most widely accepted treatment (standard treatment) for their cancer.
• The investigational group is made up of the people who get the new treatment being tested.
RANK SUM
• Wilcoxon rank sum test
• Rank sum test for comparing the locations of two populations
• Mann-Whitney test
• Review: the t-test for comparing 2 population means requires normality and homogeneity of variance.
CORRELATION COEFFICIENT
• The coefficient of correlation "r" measures the degree of association between the values of two related variables given in the data set.
• It takes values from +1 to –1.
• $r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \, \sum (y_i-\bar{y})^2}} = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right]}}$
REGRESSION COEFFICIENT
• Correlation analysis tells us how close the relationship between 2 variables is.
• Regression analysis tells us about the relationship between 2 variables: how one changes with the other, so that one can be used to predict the other.
• Equation :: $\hat{y} = a + bx$
  a is the intercept, the value of y when x = 0;
  b is the slope, or regression coefficient: the average number of units that y changes when x changes by 1 unit.
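The correlation and regression definitions above can be sketched as follows. The five (x, y) pairs are hypothetical, chosen only to keep the arithmetic small. The least-squares estimates b = Sxy/Sxx and a = ȳ − b·x̄ are the standard ones and are an assumption here, since the notes give the regression equation but not the estimators.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Corrected sums of squares and cross-products (computational form of r).
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

r = sxy / math.sqrt(sxx * syy)          # correlation coefficient
b = sxy / sxx                           # slope (regression coefficient)
a = sum(y) / n - b * sum(x) / n         # intercept

print(f"r = {r:.4f}, y_hat = {a:.2f} + {b:.2f} x")
```

Note that r and b always share the same sign, because both have Sxy in the numerator.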
Section C :: Simple questions
1. WHAT ARE THE SAMPLE MEAN (TO 2 DECIMAL PLACES) AND MEDIAN OF THE FOLLOWING DATA? 15, 21, 24, 16, 13, 18
• Mean = $\frac{15+21+24+16+13+18}{6} = \frac{107}{6} = 17.83$
• Median: sort the data first: 13, 15, 16, 18, 21, 24. As n is even, the median is the sum of the middle 2 numbers divided by 2: $\frac{16+18}{2} = 17$
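A quick check of question 1 using Python's standard library; `statistics.median` sorts the data internally before taking the middle value(s), which is the step that is easy to forget when working by hand.

```python
import statistics

data = [15, 21, 24, 16, 13, 18]

sample_mean = statistics.mean(data)      # 107 / 6
sample_median = statistics.median(data)  # average of the two middle sorted values

print(f"sorted data = {sorted(data)}")
print(f"mean = {sample_mean:.2f}, median = {sample_median}")
```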
2. WHAT IS THE DIFFERENCE BETWEEN SD AND SE?
STANDARD DEVIATION (SD) is the square root of the variance ($S^2$).
• Large variance or SD means:
  o More variable, wider range
  o Lower degree of representativeness of the mean
• Small variance or SD means:
  o Less variable, narrower range
  o Higher degree of representativeness of the mean
• SD measures how widely scattered the measurements are.
STANDARD ERROR (SE) is the measure of sampling error.
• SE measures the magnitude of sampling error.
• The larger the SE, the larger the sampling error.
• SE measures how good our estimate of the mean is.
• A larger SE means we can be less confident in using the sample mean as an estimate of the population mean.
3. WHAT IS THE DIFFERENCE BETWEEN CI AND RI?
REFERENCE INTERVAL
• A range of values within which the majority of measurements from "normal" subjects will lie.
• Majority: 90%, 95%, 99% etc.
CONFIDENCE INTERVAL
• A range [or an interval] of values used to estimate the true value of the population parameter.
• The level of confidence 1 – α is the probability that the interval estimate contains the population parameter, usually 90% (α = 10%), 95% (α = 5%), 99% (α = 1%).
4. PLEASE GIVE 2 EXAMPLES FOR EACH OF THE 3 TYPES OF DATA (OR VARIABLE). (quiz 1 ques.)
QUANTITATIVE: IQ, HEIGHT, WEIGHT, AGE
QUALITATIVE: GENDER, COLOR OF SKIN
ORDINAL DATA: DEGREE OF ACHE; MEDICAL EXAMINATION WITH RANKS; DEGREE OF SATISFACTION
5. THE MEANS AND SDS OF HEIGHT AND WEIGHT FOLLOW. HOW TO COMPARE THE VARIATION OF HEIGHT AND WEIGHT OF A GROUP OF BOYS OF AGE 12?
weight: mean = 40 kg, SD = 16 kg
height: mean = 140 cm, SD = 28 cm
The units differ, so compare the coefficients of variation (CV = SD / mean):
CV1 = 16/40 = 0.4 (weight)
CV2 = 28/140 = 0.2 (height)
So weight has more dispersion than height.
6. HOW TO CONTROL THE SAMPLING ERROR?
1. Increase the sample size.
2. Decrease the variation.
7. WHAT ARE THE DIFFERENCES AMONG CARDINAL DATA, ORDINAL DATA? (EXPLAIN WITH EXAMPLES)
CARDINAL DATA *****************************
NOMINAL DATA
The categories are not ordered but simply have names. Example :: Blood group (A, B, AB, O) and marital status (married, single, widowed etc.)
ORDINAL DATA
The categories are ordered in some way. Example :: Disease staging system (advanced, moderate, mild, none) and degree of pain (severe, moderate, mild, none)
8. WHAT IS THE DIFFERENCE BETWEEN A PARAMETRIC TEST AND A NONPARAMETRIC TEST?
PARAMETRIC TEST | NON-PARAMETRIC TEST
Mean and standard deviation | Median and inter-quartile range
Pearson's correlation coefficient | Spearman's or Kendall's correlation coefficient
One-sample t-test | Sign test
Two-sample t-test | Wilcoxon rank sum test
Independent t-test | Mann-Whitney U or Wilcoxon rank sum test
Analysis of variance | Kruskal-Wallis test
Repeated measures analysis of variance | Friedman's ANOVA test
Quiz 1 Statistical Description
1. SUMMARY STATISTICS FOR TWO SAMPLES OF DATA FOLLOW:
Sample 1: mean = 19, variance = 10
Sample 2: mean = 10, variance = 19
WHICH SAMPLE HAS THE LARGER SPREAD OF OBSERVATIONS? ( )
a) Sample 2
b) Sample 1
c) They have the same spread
d) There is not enough information to answer the question.
Answer: a) Sample 2
2. CONSIDER THE FOLLOWING ORDERED SET OF DATA (n=16),
44 49 50 51 | 53 57 58 62 | 66 66 68 71 | 75 77 80 85
WHAT IS THE INTERQUARTILE RANGE (IQR)?
Q1 = (51 + 53)/2 = 52 and Q3 = (71 + 75)/2 = 73, so IQR = 73 - 52 = 21
3. WHAT ARE THE SAMPLE MEAN (TO 2 DECIMAL PLACES) AND MEDIAN OF THE FOLLOWING DATA? 15, 21, 24, 16, 13, 18, 25, 26, 27
SAMPLE MEAN = 185/9 = 20.56; MEDIAN = 21
4. PLEASE GIVE 2 EXAMPLES FOR EACH OF THE 3 TYPES OF DATA (OR VARIABLE).
QUANTITATIVE: IQ, HEIGHT, WEIGHT, AGE
QUALITATIVE: GENDER, COLOR OF SKIN
ORDINAL DATA: DEGREE OF ACHE; MEDICAL EXAMINATION WITH RANKS; DEGREE OF SATISFACTION
5. WHICH MEASURE OF CENTRAL TENDENCY IS MORE SENSITIVE TO SKEWNESS?
A) MEDIAN
B) MEAN
C) THEY ARE ALL ABOUT THE SAME
D) NEITHER
Answer: B) MEAN
6. WHAT ARE THE LABELS FOR THE X-AXIS AND Y-AXIS OF A HISTOGRAM? WHAT IS PRESENTED IN A HISTOGRAM?
X-AXIS: NAME OF VARIABLE; Y-AXIS: FREQUENCY / RELATIVE FREQUENCY
7. IN A CLINICAL TRIAL ON A DRUG FOR LUNG CANCER, WHAT IS THE POPULATION AND WHAT ARE THE POTENTIAL SAMPLES?
POPULATION: THE MEASUREMENTS FROM ALL PATIENTS WITH LUNG CANCER / OR ALL PATIENTS WITH LUNG CANCER; SAMPLES: SUBSETS OF PATIENTS ENROLLED FROM AVAILABLE HOSPITALS / COMMUNITIES.
8. WHEN A DISTRIBUTION IS SKEWED TO THE RIGHT, WHAT IS THE RELATIONSHIP BETWEEN MEAN AND MEDIAN?
THE MEAN IS LARGER THAN THE MEDIAN
Quiz 2 Probability model & parameter estimation
1. QUESTION ON PROBABILITY MODEL
THE GESTATION TIME (THE TIME ELAPSED BETWEEN CONCEPTION AND BIRTH) FOR PREGNANCIES WITHOUT PROBLEMS IN HUMANS IS APPROXIMATELY NORMALLY DISTRIBUTED WITH A MEAN OF 266 DAYS AND A STANDARD DEVIATION OF 16 DAYS. (Z0.02 = -2) WHAT PERCENTAGE OF PREGNANCIES LAST BETWEEN 234 AND 298 DAYS?
Answer: 96%
Z1 = (234 - 266)/16 = -2
Z2 = (298 - 266)/16 = 2
P(Z < -2) = 0.02 and P(Z >= 2) = 0.02
P(-2 < Z < 2) = 1 - 0.02 - 0.02 = 96%
So P(234 < X < 298) = 96%
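The standardization step can be sketched in code. The helper below computes the standard normal CDF from `math.erf`, so it gives about 95.45% rather than the 96% obtained with the rounded table value Z0.02 = -2 supplied in the question; the function name is illustrative.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd = 266, 16
z1 = (234 - mean) / sd   # -2.0
z2 = (298 - mean) / sd   #  2.0

p = standard_normal_cdf(z2) - standard_normal_cdf(z1)
print(f"z1 = {z1}, z2 = {z2}, P(234 < X < 298) = {p:.4f}")
```

The small gap between 95.45% and 96% comes only from rounding the tail probability to 0.02.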
2. QUESTION ON CALCULATION OF REFERENCE INTERVAL
Hb (HEMOGLOBIN) WAS MEASURED FOR 360 NORMAL MALES. THE MEAN CALCULATED FROM THE SAMPLE IS 13.45 g/100ml AND THE STANDARD DEVIATION IS 0.71 g/100ml; Hb IS ASSUMED TO BE NORMALLY DISTRIBUTED. ESTIMATE THE 95% REFERENCE INTERVAL FOR NORMAL MALE ADULTS.
THE DATA ON Hb OF NORMAL MALES ARE NORMALLY DISTRIBUTED AND WE HAVE SAMPLE MEAN 13.45 AND SD 0.71, SO THE 95% REFERENCE INTERVAL IS CALCULATED AS FOLLOWS:
CL = 13.45 - 1.96 X 0.71 = 12.06 (g/100ml)
CU = 13.45 + 1.96 X 0.71 = 14.84 (g/100ml)
THE 95% REFERENCE INTERVAL (OR RANGE) OF HEMOGLOBIN FOR NORMAL MALES IS (12.06, 14.84) g/100ml
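A minimal sketch of the reference interval arithmetic, assuming normally distributed measurements as the question states. The function name is illustrative; note that the SD itself is used, not the standard error.

```python
def reference_interval(mean, sd, z=1.96):
    """Two-sided reference interval: mean +/- z * SD (uses SD, not SE)."""
    return mean - z * sd, mean + z * sd

lower, upper = reference_interval(13.45, 0.71)
print(f"95% reference interval: ({lower:.2f}, {upper:.2f}) g/100ml")
```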
3. QUESTION ON CALCULATION OF CONFIDENCE INTERVAL
THE DATA ON HEIGHTS (CM) OF 36 BOYS OF AGE 3 FOLLOW.
99 104 102 87 105 101 105 101 104
106 95 103 100 90 114 107 108 98
103 101 94 107 105 105 101 94 109
111 103 104 98 87 96 117 98 102
IT IS KNOWN THAT S = 6.69 AND THE SAMPLE MEAN IS 101.77. ESTIMATE THE 95% CONFIDENCE INTERVAL FOR THE POPULATION MEAN HEIGHT OF BOYS OF AGE 3 USING Z = 1.96.
The height data are assumed to be normally distributed, so the 95% CI is calculated as follows:
95% = 1 - α, so α = 0.05. We have Z0.025 = 1.96, sample mean 101.77, SD 6.69, n = 36, so
CL = 101.77 - 1.96 X 6.69/√36 = 99.5846 (cm)
CU = 101.77 + 1.96 X 6.69/√36 = 103.9554 (cm)
The 95% confidence interval of the average height of boys of age 3 is (99.5846, 103.9554) cm.
Interpretation: One is 95% confident that the mean height of boys of age 3 is between 99.58 cm and 103.96 cm.
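The confidence interval calculation can be sketched the same way, using the summary values given in the question (mean 101.77, S = 6.69, n = 36) and the normal quantile 1.96. The function name is illustrative.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """CI for a population mean: mean +/- z * SD / sqrt(n) (uses the SE)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

lower, upper = mean_confidence_interval(101.77, 6.69, 36)
print(f"95% CI for the mean height: ({lower:.4f}, {upper:.4f}) cm")
```

Unlike a reference interval, this interval narrows as n grows, because it is built on the standard error rather than the SD.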
Section D :: Complicated question
Quiz 3 T TEST
1. SUPPOSE INVESTIGATORS WISH TO DETECT WHETHER THE SMOKING STATUS OF THE MOTHER INFLUENCES LOW BIRTHWEIGHT OF BABIES. THE POPULATION MEAN BIRTH WEIGHT IS 105 OZ. A SAMPLE OF 25 BABIES WAS MEASURED, WITH MEAN = 95 OZ AND SD = 10 OZ. ASSUME BIRTH WEIGHTS ARE NORMALLY DISTRIBUTED.
a) FOR THIS STUDY, IS A ONE-SIDED OR TWO-SIDED TEST APPROPRIATE?
b) GIVE THE BASIC STEPS OF THE HYPOTHESIS TEST AT α = 0.05 AND MAKE A CONCLUSION.
A) ONE-SIDED TEST
B) HYPOTHESES
H0 : μ = μ0, the smoking status of the mother does not influence the birthweight of babies.
H1 : μ < μ0, the smoking status of the mother lowers the birthweight of babies.
SIGNIFICANCE LEVEL
0.05
T VALUE
$|t| = \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} = \frac{|95 - 105|}{10/\sqrt{25}} = 5$, υ = 25 - 1 = 24
T CRITICAL VALUE
t24,0.05 = 1.711, t24,0.025 = 2.064
CONCLUSION
|t| = 5 > t24,0.05 = 1.711, P < 0.05, so reject H0 and accept H1. For a one-sided test, we take 1.711 as the critical value.
We conclude that smoking may influence the birthweight of babies / smoking mothers may have more low birthweight babies than non-smoking mothers / the mean birthweight of babies with smoking mothers is significantly lower than that of babies with non-smoking mothers.
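The one-sample t statistic above can be reproduced with a few lines. The critical value 1.711 is the tabled t24,0.05 quoted in the notes, not computed here, and the function name is illustrative.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t = (sample mean - mu0) / (sd / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

t = one_sample_t(95, 105, 10, 25)
df = 25 - 1
t_critical = 1.711           # one-sided, alpha = 0.05, df = 24 (from the notes)

# Lower-tailed test (H1: mu < mu0): reject H0 when t falls below -t_critical.
reject_h0 = t < -t_critical
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```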
Quiz 4 CHI SQUARE-TEST
1. IS LYING ABOUT CREDENTIALS BY JOB APPLICANTS CHANGING? TO SEE IF THERE IS A CHANGE IN THE PROPORTION OF APPLICANTS WHO LIED ABOUT HAVING A DEGREE, WE CAN COMPARE TWO PERIODS (SIX MONTHS APART). HERE ARE THE DATA (M&M 8.29):
PERIOD | LIED | NOT LIED | TOTAL
1 | 15 | 69 | 84
2 | 21 | 85 | 106
TOTAL | 36 | 154 | 190
HYPOTHESES
H0: π1 = π2, the proportions of … are equal;
H1: π1 ≠ π2, the proportions of … are not equal.
SIGNIFICANCE LEVEL
0.05
ACTUAL FREQUENCIES
A11 = 15, A12 = 69, A21 = 21, A22 = 85
THEORETICAL FREQUENCIES
T11 = 15.9, T12 = 68.1, T21 = 20.1, T22 = 85.9
CHI-SQUARE VALUE
$\chi^2 = \sum \frac{(A-T)^2}{T} = \frac{(15-15.9)^2}{15.9} + \frac{(69-68.1)^2}{68.1} + \frac{(21-20.1)^2}{20.1} + \frac{(85-85.9)^2}{85.9} = 0.117$ (computed with unrounded theoretical frequencies)
CRITICAL VALUE
$\chi^2_{1,0.05} = 3.84$
CONCLUSION
$\chi^2$ = 0.117 < 3.84, P > 0.05, so do not reject H0; the proportions of applicants who lied about having a degree in the 2 periods are not significantly different.
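The chi-square calculation for this table can be sketched as below: theoretical frequencies come from row total × column total / n, and the statistic sums (A − T)²/T over all cells. The function name is illustrative, and the critical value 3.84 is the tabled value quoted in the notes.

```python
def chi_square(observed):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, actual in enumerate(row):
            theoretical = row_totals[i] * col_totals[j] / n
            chi2 += (actual - theoretical) ** 2 / theoretical
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: period 1, period 2. Columns: lied, not lied.
chi2, df = chi_square([[15, 69], [21, 85]])
print(f"chi-square = {chi2:.3f}, df = {df}, exceeds 3.84: {chi2 > 3.84}")
```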
2. A GROUP OF INDIVIDUALS ARE ASKED A QUESTION ON PUBLIC AFFAIRS WHICH THEY ARE TO ANSWER YES OR NO. AFTER A PROPAGANDA LECTURE THEY ARE ASKED THE SAME QUESTION AGAIN. ANSWER THE QUESTION OF WHETHER THE PROPAGANDA LECTURE WAS EFFECTIVE IN CHANGING THE PROPORTION SAYING YES. HERE ARE THE DATA (DIXON & MASSEY):
BEFORE \ AFTER | YES | NO
YES | 30 | 9
NO | 15 | 51
FORMULATE APPROPRIATE NULL AND ALTERNATIVE HYPOTHESES THAT CAN BE ADDRESSED WITH THESE DATA, CARRY OUT THE SIGNIFICANCE TEST (α = 0.05) AND SUMMARIZE THE RESULTS IN TERMS OF THE PROBLEM.
HYPOTHESES
H0: π1 = π2, the proportions of … are equal;
H1: π1 ≠ π2, the proportions of … are not equal.
SIGNIFICANCE LEVEL
0.05
CHI-SQUARE VALUE
The data are paired, so only the discordant cells b = 9 and c = 15 are used. The test statistic is $\chi^2 = \frac{(b-c)^2}{b+c}$, or, with continuity correction, $\chi^2 = \frac{(|b-c|-1)^2}{b+c}$. Since b + c = 24 < 40, we use the corrected form:
$\chi^2 = \frac{(|b-c|-1)^2}{b+c} = \frac{(6-1)^2}{24} = 1.042$
CRITICAL VALUE
$\chi^2_{1,0.05} = 3.84$
CONCLUSION
$\chi^2 < \chi^2_{1,0.05} = 3.84$, i.e. 1.042 < 3.84, so P > 0.05: the propaganda lecture was not effective in changing the proportion saying yes / the proportion saying yes after the propaganda lecture was not different from that before.
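The paired chi-square calculation above can be sketched as follows. The function name and the `threshold` parameter are illustrative; the rule "use the continuity correction when b + c < 40" is taken from the notes.

```python
def paired_chi_square(b, c, threshold=40):
    """Chi-square for paired yes/no data, based on the discordant counts b and c.

    Applies the continuity correction when b + c is below the threshold.
    """
    if b + c < threshold:
        return (abs(b - c) - 1) ** 2 / (b + c)
    return (b - c) ** 2 / (b + c)

# Discordant pairs from the table: 9 changed one way, 15 the other.
chi2 = paired_chi_square(9, 15)
print(f"chi-square = {chi2:.3f}, exceeds 3.84: {chi2 > 3.84}")
```

The concordant cells (30 and 51) do not enter the statistic, since they carry no information about a change in opinion.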
Quiz 5 RANK SUM TEST
1. WHAT ARE THE DIFFERENCES AMONG CARDINAL DATA, ORDINAL DATA?
2. WHAT IS THE DIFFERENCE BETWEEN A PARAMETRIC TEST AND A NONPARAMETRIC TEST?
3. THE ACTUAL CHANGE SCORES ON THE ELECTRORETINOGRAM (ERG), A MEASURE OF ELECTRICAL ACTIVITY IN THE RETINA, ARE PRESENTED FOR EACH PATIENT IN TABLE 9.2.
TableE4.1 ERG CHANGE SCORES FOLLOWING SURGERY FOR RP (Berson et al., 1996)
NO. OF PATIENT | SCORE (a) | SIGNED RANK
1 | −0.238 | −8
2 | −0.085 | −3
3 | −0.215 | −6
4 | −0.227 | −7
5 | −0.037 | −1
6 | +0.090 | +4
7 | −0.736 | −10
8 | −0.365 | −9
9 | −0.179 | −5
10 | −0.048 | −2
(a) The change score = ln(ERG amplitude) at follow-up − ln(ERG amplitude) at baseline. A negative score indicates decline.
EVALUATE THE SIGNIFICANCE OF THE RESULTS WITHOUT ASSUMING THE CHANGE SCORES ARE NORMALLY DISTRIBUTED. WHAT DO THE RESULTS MEAN?
HYPOTHESES
H0 : M1 = 0, the population median of the change scores is 0;
H1 : M1 ≠ 0, the population median of the change scores is not 0.
SIGNIFICANCE LEVEL
0.05
TEST STATISTICS
Rank the absolute differences and attach the original plus or minus signs to get the signed ranks, then calculate T+ and T-.
T+ = 4, T- = 51
CRITICAL VALUE
T0.05 = 8 - 47
CONCLUSION
T does not fall within the critical interval, so P < 0.05; reject H0 and accept H1: the change between … and … is significant.
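The signed-rank sums T+ and T- above can be reproduced with a short sketch. It ranks the absolute change scores, uses mid-ranks for ties (none occur in this table), and drops zero differences; these conventions and the function name are assumptions for illustration. The critical interval 8 to 47 is the tabled value from the notes.

```python
def signed_rank_sums(diffs):
    """Return (T_plus, T_minus) for the Wilcoxon signed-rank test."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute values.
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        mid_rank = (i + j) / 2 + 1   # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return t_plus, t_minus

scores = [-0.238, -0.085, -0.215, -0.227, -0.037,
          0.090, -0.736, -0.365, -0.179, -0.048]
t_plus, t_minus = signed_rank_sums(scores)
t = min(t_plus, t_minus)
outside_interval = not (8 <= t <= 47)   # critical interval from the notes
print(f"T+ = {t_plus}, T- = {t_minus}, outside 8-47: {outside_interval}")
```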
4. SUPPOSE WE WANT TO COMPARE THE LENGTH OF HOSPITAL STAY FOR PATIENTS WITH THE SAME DIAGNOSIS AT TWO DIFFERENT HOSPITALS. THE RESULTS ARE SHOWN IN THE FOLLOWING TABLE.
TableE4.2 Comparison of length of stay in 2 hospitals
FIRST HOSPITAL | 21 10 32 60 8 44 29 5 13
RANK |
SECOND HOSPITAL | 27 10 68 87 76 125 60 35 73 96 44 238
RANK |
HYPOTHESES
H0 : M1 = M2, the median lengths of hospital stay for patients from the 2 hospitals are equal;
H1 : M1 ≠ M2, the median lengths of hospital stay for patients from the 2 hospitals are not equal.
SIGNIFICANCE LEVEL
0.05
TEST STATISTICS
Rank the lengths of hospital stay in the pooled sample, then calculate the rank sums T1 and T2. Take the rank sum of the smaller sample as T.
n1 = 9, T1 =
n2 = 12, T2 =
CRITICAL VALUE
T0.05 = 71 - 127
CONCLUSION
1. T TEST FOR TWO INDEPENDENT GROUPS
ASSUMPTIONS
1. The two samples are independent.
2. The two sample sizes are small, i.e. n1 ≤ 30 and n2 ≤ 30.
3. Both samples are simple random samples.
TEST STATISTIC FOR TWO MEANS
➢ Independent small samples, variances not equal:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
➢ Degrees of freedom: with unequal variances the simple value n1 + n2 – 2 does not apply; the Welch-Satterthwaite approximation is used instead: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
➢ Independent small samples, variances equal:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
➢ Degrees of freedom = n1 + n2 – 2
EXAMPLE
➢ Do males and females significantly differ on their level of math anxiety?
IV : Gender (2 groups: males and females)
DV : Level of math anxiety
➢ Do older people exercise significantly less frequently than younger people?
IV : Age (2 groups: older people and younger people)
DV : Frequency of getting exercise
STEP 1 :: State the Hypothesis
H0 : The null hypothesis states that the two samples come from the same population. In other words, there is no statistically significant difference between the two groups on the dependent variable.
Symbols ::
Non-directional : H0 : µ1 = µ2
Directional : H0 : µ1 ≥ µ2 or H0 : µ1 ≤ µ2
HA : The alternative hypothesis states that the two samples come from different populations. In other words, there is a statistically significant difference between the two groups on the dependent variable.
Symbols ::
Non-directional : H1 : µ1 ≠ µ2
Directional : H1 : µ1 > µ2 or H1 : µ1 < µ2
STEP 2 :: Set a Criterion for Rejecting H0
➢ Compute degrees of freedom
➢ Set alpha level: .001, .01, .05, or .10 etc.
➢ Identify critical value(s)
- Directional or non-directional
- To determine your CV(s) you need to know:
  o df: if the df are not in the table, use the next lowest number to be conservative
  o directionality of the test
  o alpha level
STEP 3 :: Collect data and Calculate t statistic
STEP 4 :: Compare test statistic to criterion
STEP 5 :: MAKE DECISION
Fail to reject the null hypothesis and conclude that there is no statistically significant difference between the two groups on the dependent variable, t = ___, p > α
or
Reject the null hypothesis and conclude that there is a statistically significant difference between the two groups on the dependent variable, t = ___, p < α
2. A study was carried out to investigate the effectiveness of a treatment. 50 subjects participated in the study, with 25 being randomly assigned to the "treatment group" and the other 25 to the "control (or placebo) group". Some results are listed as follows. Please answer these questions:
a. What are the hypotheses for this question?
b. If the t value of the significance test is 7.139, what are the df and the p value? (The critical value is t0.05/2,40 = 2.021 or t0.05/2,50 = 2.009.)
c. What is the statistical conclusion?
d. What is the medical conclusion?
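The equal-variance two-sample t statistic from item 1 can be sketched as below. The summary statistics passed to the function are hypothetical, chosen only to show the arithmetic. The last lines apply the df rule to question 2b, using the tabled critical value for df = 40 (the next lowest tabled df, as STEP 2 suggests).

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Hypothetical summary statistics, for illustration only.
t, df = pooled_t(mean1=10.0, sd1=2.0, n1=25, mean2=8.0, sd2=2.0, n2=25)
print(f"t = {t:.3f}, df = {df}")

# Question 2b: two groups of 25 give df = 25 + 25 - 2 = 48.
df_q2 = 25 + 25 - 2
significant_q2 = abs(7.139) > 2.021   # conservative tabled value for df = 40
print(f"question 2: df = {df_q2}, |t| exceeds critical value: {significant_q2}")
```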
3. CHI SQUARE TEST
CONTINGENCY TABLE
An r x c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.
TEST STATISTIC FOR A CHI SQUARE TEST :
$\chi^2 = \sum \frac{(A - T)^2}{T}$
where A are the actual frequencies and T are the theoretical frequencies.
Degrees of freedom: ν = (r - 1)(c - 1)
The theoretical frequency for a cell in row r and column c of a table is
$T_{rc} = \frac{n_r n_c}{n} = \frac{(\text{sum of row } r) \times (\text{sum of column } c)}{\text{sample size}}$
SPECIFIC FORMULA OF CHI SQUARE TEST FOR FOUR-FOLD TABLE :
 | + | − | TOTAL
I | a | b | a+b
II | c | d | c+d
TOTAL | a+c | b+d | n
$\chi^2 = \frac{(ad - bc)^2 \, n}{(a+b)(c+d)(a+c)(b+d)}$
STEPS ::
1. Hypotheses: H0 : π1 = π2, HA : π1 ≠ π2
2. Significance level: α = 0.05
3. $\chi^2$ statistic: $\chi^2 = \sum \frac{(A - T)^2}{T}$
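As a check that the four-fold shortcut agrees with the general sum over cells, the sketch below computes both for the Quiz 4 table (a = 15, b = 69, c = 21, d = 85). The function names are illustrative.

```python
import math

def fourfold_chi_square(a, b, c, d):
    """Shortcut: (ad - bc)^2 * n / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def general_chi_square(a, b, c, d):
    """General form: sum of (A - T)^2 / T with T = row total * column total / n."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    return sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for obs, i, j in cells)

shortcut = fourfold_chi_square(15, 69, 21, 85)
general = general_chi_square(15, 69, 21, 85)
print(f"shortcut = {shortcut:.4f}, general = {general:.4f}")
print("agree:", math.isclose(shortcut, general))
```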
4. RANK SUM TEST
Part I: WILCOXON RANK SUM TEST
• Rank sum test for comparing the locations of two populations
• Mann-Whitney test
• Review: the t-test for comparing 2 population means requires normality and homogeneity of variance.
EXAMPLE 1: Table 9.1 Survival Times of Cats & Rabbits without oxygen
STEP I: TEST HYPOTHESIS AND SIG. LEVEL
H0: M1 = M2, the population locations of survival time of cats and rabbits are equal;
H1: M1 ≠ M2, the population locations of survival time of cats and rabbits are not equal;
α = 0.05
STEP II: STATISTIC
ASSIGN RANKS
• Pool the n1 + n2 observations to form a single sample.
• Rank all observations of the pooled sample from smallest to largest, in columns 2 and 4.
• Mid-ranks are used for tied values.
Calculate the rank sums for the two samples respectively, denoted by T1 and T2. Take the Ti of the sample with the smaller n as T. n1 = 8 < n2 = 12, so T = T1 = 127.5. Check: T1 + T2 = N(N+1)/2 = 210.
STEP III: DETERMINE P VALUE, CONCLUSION
From the table in appendix E, with n1 = 8 and n2 - n1 = 4, the critical interval of Tα is 58-110. Since T = 127.5 lies outside this interval, P ≤ α. Given α = 0.05, P < 0.05; H0 is rejected, and we conclude that the survival times of cats and rabbits in an environment without oxygen might be different. Cats will survive for a longer time without oxygen.
BASIC LOGIC
N = n1 + n2. Given N, the total rank sum is fixed and can be calculated. If H0 is true, the total rank sum should be divided between the 2 groups in proportion to n_i:
$\frac{N(N+1)}{2} = \frac{n_1(N+1)}{2} + \frac{n_2(N+1)}{2}$
NORMAL APPROXIMATION (n1 > 10 or n2 - n1 > 10)
$Z = \frac{|T - n_1(N+1)/2| - 0.5}{\sqrt{n_1 n_2 (N+1)/12}}$, where $N = n_1 + n_2$
Correction for ties: $Z_c = Z/\sqrt{c}$, with $c = 1 - \sum (t_j^3 - t_j)/(N^3 - N)$, where $t_j$ is the number of observations in the j-th group of tied values.
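The normal approximation and tie correction above can be sketched as follows. The numbers n1 = 8, n2 = 12, T = 127.5 are taken from Example 1 purely to show the arithmetic; the notes use the table for a sample of this size, and the tie group sizes are not given there, so the corrected call uses a hypothetical single pair of tied values.

```python
import math

def rank_sum_z(t, n1, n2, tie_sizes=()):
    """Normal approximation for the rank sum T of the sample of size n1.

    tie_sizes lists the number of observations in each group of tied values;
    with no ties the correction factor c equals 1.
    """
    n = n1 + n2
    z = (abs(t - n1 * (n + 1) / 2) - 0.5) / math.sqrt(n1 * n2 * (n + 1) / 12)
    c = 1 - sum(tj ** 3 - tj for tj in tie_sizes) / (n ** 3 - n)
    return z / math.sqrt(c)

z = rank_sum_z(127.5, 8, 12)                     # no tie correction
z_c = rank_sum_z(127.5, 8, 12, tie_sizes=[2])    # hypothetical: one tied pair
print(f"Z = {z:.4f}, Zc = {z_c:.4f}")
```

Because c is at most 1, the tie correction can only increase the magnitude of Z.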