Laboratory Tests of Sampling Techniques

BY NORMAN C. MEIER AND CLETUS J. BURKE

Considerable controversy has taken place between those who favor quota sampling and those who advocate area sampling in opinion polling. This article does not attempt to give a final solution to the controversy, but points out certain limitations to both methods, and suggests that the most reliable sampling procedure may well vary according to the type of problem under investigation.

Dr. Meier is Director of the Bureau of Audience Research and Associate Professor of Psychology, University of Iowa. He is also technical consultant to the Iowa Poll, and has served as special consultant in audience measurement and in legal cases involving surveys. Mr. Burke is Research Assistant in the Department of Psychology at the University of Iowa.

Each of the two general types of sampling techniques used in large-scale surveys, briefly referred to as quota methods and area methods, has adherents who claim superiority for it over the other. The quota designation refers to methods which feature assignments to interviewers of types of respondents, specified as to age, sex, income group or any other stratification of the general population that may correlate with the objective of the survey. The returns should be proportional to the existence of these strata in the universe sampled. The operation of such a check function gives to the methods the designation: quota-control, representative, stratified sampling. Within the assignment the interviewer has considerable freedom in choosing the actual respondents.

Area methods require that interviews be made only in specified, circumscribed areas and that within these areas every pertinent individual, or a strictly random subsampling of such individuals,
be interviewed. The areas are selected by drawing, ideally from a listing of all areas into which the universe is divided, by chance methods, usually from a table of random numbers.

The authors are particularly interested in the application of sampling methods to the determination of public opinion. All public opinion survey groups use quota methods except one, the Washington Post Poll, which is using area methods on an experimental basis. The Bureau of the Census and the Bureau of Agricultural Economics are the chief users of area methods. The area samplers initiated the present controversy.1

1 Hearings before the Committee to Investigate Campaign Expenditures, House of Representatives, 78th Congress, Part 12 (Anderson Committee, Report of Technical Sub-Committee), Washington, D.C.: U.S. Government Printing Office (1945); and Hansen, Morris H. and Hauser, Philip M., "Area Sampling - Some Principles of Sample Design," Public Opinion Quarterly, 9, No. 2 (1945).

They state that their method provides (1) freedom from bias
in the long range, statistical sense, (2) an equal chance of representation for every pertinent individual in the universe to be sampled, and (3) an estimate based on probability theory of the size of the sampling error to be expected with a given size of sample from a given number of areas. They contend that, because of the "fortuitous" selection permitted the interviewer, the above conditions are not met by quota methods, conceding, however, that the quota methods are useful where a high degree of precision is not required.

The quota samplers contend that, since the number of categories which they control is fairly large, the chance of bias is small. They anticipate the ways in which bias can occur and attempt, in a series of detailed instructions for their interviewers, to eliminate long range bias. They point to the relatively large cost of using the area method as a point in their favor. They admit that no exact solution for the statistical reliability of quota polls has been achieved but contend that, because of the sampling within categories, their results are at worst as variable as those obtained in simple random sampling so that the confidence limits based on simple random sampling will tend to be conservative. They point to past successes in predicting such things as election outcomes as evidence that their methods are sound.

This paper will not enter further into the points of controversy or the statistical arguments for the use of either method because of space limitations. Such contention has, thus far, done little to bring about agreement. The best attack lies in the actual comparison of the results obtained for check variables whose distribution in the universe is
known, by simultaneous application of the two methods in the field.2 Another possibility is afforded by constructing samples in the laboratory, using the interview records from a completed survey as the statistical universe. There are certain limitations to the conclusions which can be reached from such studies; yet it is believed by the authors that information of value can be obtained. The results of one such study are reported here.

Procedure
A house-to-house survey in Iowa City, in connection with housing, yielded a fairly large coverage of the city, providing data on income distribution, owner-renter status, and occupation. Taking the actual coverage as the known universe, the percentages of residents in each income bracket, owner-renter category, and occupational group were obtained. The entire collection of records was then arranged by street and number, so that samples could be designated on a map of the city and the pre-contacted respondents thus be selected by location. Extracting these records from the file provided the opportunity to ascertain the characteristics of each sample so drawn and to make statistical comparisons of the deviations of sample results from the universe figures. Thus, the relative validity of the two methods in this restricted context can be ascertained.

2 A study of this nature, by Charles F. Haner, under direction of the first author, has been under way for two years and will soon be completed and published.

This particular type of laboratory test has these features:

1. It partially eliminates the human element. There are no interviewers.
Hence there can be no charges of poor interviewing skill or errors in judgment of selecting respondents. On the other hand, this eliminates the chance for bias to enter through the interview itself, a chance which is present in public opinion polling. Consequently, the reader should bear in mind that in this respect the laboratory situation differs from the situation in actual opinion polling.

2. The characteristics of the universe are known. There can be no question of adequacy or inadequacy of historical data, such as outdated Census figures in population, occupation, and other characteristics. The data are known exactly as of the time when the samples are drawn.

3. Inasmuch as individual attitudes and opinions constitute public opinion which does not itself exist in any permanent measurable universe, validity of public opinion measurement must be approached through correlated factors. Socioeconomic status and occupation, generally believed to be so correlated, are selected for attention in this study. In this connection it should be noted that one of the basic controls on the quota samples is the location within the community as based on general income level contours.

PUBLIC OPINION QUARTERLY, WINTER 1947-48

A preliminary study, made in 1946 by Stanley Skiff3 and the first author, simply compared the closeness with which the characteristics of each sample drawn approximated the known characteristics of the universe. Five samples by each method were drawn, of sizes of 15, 20, 25, 50, and 100 "respondents" each. Twenty-five points of comparison were possible. Of these, quota samples were closer to the known characteristics of the universe in 16; area samples were
better in four; and in five instances there was little or no difference.

This study was then projected. The area samples were redrawn, using an adaptation of area sampling by Watson4 which meets the standards set up by the area samplers but is somewhat simpler in operation than the methods in general use. The community was divided into 370 areas. An area factor of 5/370 was constant through all the samples. On the basis of an estimated 4,500 households, the individual selection factor for the sample size 15 was 0.25; for the size 20, it was 0.32; for the size 25, it was 0.41; and for the size 50, it was 0.82. The selection of individuals within selected areas was made by reference to a table of random numbers.

The quota samples were designated from a map on which rent and general income level contours had been indicated by a real estate man, as is expected in quota sampling. Four practices, all in general use, were followed in designating respondent interview points: every household along a given street, every third household, any five households per block, and every household along intersecting streets.

In the preliminary study area samples derived by the Iowa State College Statistical Laboratory were used; in the expanded study the area samples were worked out by the authors in accordance with the Watson methodology.

3 Skiff, Stanley C., "Validity of Laboratory Constructed Samples by Area and Quota Methods." Unpublished M.A. Thesis, State University of Iowa, August, 1946.

4 Watson, Alfred N., Respondent Pre-Selection. A statistical method of reducing interviewer bias in market surveys. Philadelphia: Curtis Publishing Co., 1946.
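The selection factors quoted above appear to follow from dividing the desired sample size by the number of households expected in the drawn areas. The sketch below reproduces that arithmetic. The relation and the function name `selection_factor` are inferred from the quoted figures and are assumptions; they are not stated by Watson or by the authors.

```python
def selection_factor(sample_size, households=4500, areas_total=370, areas_drawn=5):
    """Fraction of households to interview inside the drawn areas.

    Assumed relation (inferred from the quoted figures, not from the source):
        sample_size = households * (areas_drawn / areas_total) * factor
    """
    # Households expected to fall inside the drawn areas (about 60.8 here).
    expected_in_areas = households * areas_drawn / areas_total
    return sample_size / expected_in_areas


if __name__ == "__main__":
    for n in (15, 20, 25, 50):
        print(n, round(selection_factor(n), 2))
```

Under this reading the computed factors agree with the quoted ones to within rounding; for the size 20 the sketch gives 0.33 against the quoted 0.32.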
Statistical Treatment
The following hypotheses were tested:

a. Are the four methods of quota sampling significantly different?
b. Is there bias in the quota samples?
c. Are the quota samples significantly different from the area samples?

A few points of qualification on the statistical procedures used are in order. The universe was not very large, but its finite size was not taken into account. Since all samples dealt with were quite small, the error made in using the formulas for an infinite universe was negligible.

The basic formulas used in computing the statistics were those which apply to the variation of a percentage or proportion in samples of a given size when simple random samples are drawn. In effect, this means that simple random samples were used as a touchstone and that any comparisons between the area and quota methods are relative to what would happen in simple random samples of the same size. Although it is not exact, this procedure was thought to be preferable to other possible procedures. It would have been possible to use more accurate formulas for computing the sampling variation in the area samples,5 but the assumptions from which these formulas were derived are not satisfied in the case of the quota samples. It was decided, therefore, that the ordinary formulas for the simple random samples would provide as good a common basis for comparison of the two methods as any other. This means that any comparisons made between either of the two methods and simple random sampling can be regarded as exact.
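To make the "infinite universe" simplification concrete, the sketch below computes the simple random sampling standard error of a percentage and the finite population correction that was ignored. The universe size of 4,500 is an assumption (the estimated household count is used as a stand-in, since the number of records in the universe is not given), and 52.1 is the income percentage reported in the Results. The function names are illustrative.

```python
import math


def se_percentage(pop_pct, n):
    """Standard error of a sample percentage under simple random sampling
    from an infinite universe: sqrt(P * (100 - P) / N)."""
    return math.sqrt(pop_pct * (100.0 - pop_pct) / n)


def finite_population_correction(n, universe_size):
    """Multiplier for the standard error when sampling without replacement
    from a universe of the given size."""
    return math.sqrt((universe_size - n) / (universe_size - 1))


if __name__ == "__main__":
    UNIVERSE_SIZE = 4500  # assumption: stand-in for the unstated universe size
    for n in (15, 20, 25, 50):
        se = se_percentage(52.1, n)
        fpc = finite_population_correction(n, UNIVERSE_SIZE)
        print(n, round(se, 2), round(fpc, 4))
```

With a universe of this order and samples of 50 or fewer, the correction stays within about one per cent of unity, which is consistent with the statement that the error from ignoring it was negligible.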
A graphical interpretation of the kind of test used is given for one particular sample in Figure 1. It will be seen that on this particular sample the quota method is more precise than the area, which is less precise than simple random sampling. Direct comparisons between the two methods, however, are not exact in any statistical sense but must be regarded as approximate tests. In terms of the actual tests used, the values of chi-square which occur below may be regarded as exact tests of either type of sampling against simple random sampling, whereas the values of F are to be thought of as approximate tests of the relative efficiency of the two methods. In the cases where binomial tests based on the values of F are used, these tests are exact.

5 Haner, Charles F., "The Adaptability of Area Sampling for Public Opinion Measurement." Ph.D. Thesis, State University of Iowa, Iowa City, August 1947, pp. 25-27.

There is another sense in which the tests used here differ from the tests in ordinary use: advantage is taken of the known population figure. To estimate the variance of sample percentages the population percentage is subtracted from each member of a group of sample percentages to obtain residuals. The expected value of the squares of such residuals is estimated and the estimate is taken as proportional to chi-square. In a procedure of this sort, any long range bias will show up as an increase in the obtained value of chi-square. If, however, one method should happen to have a small bias and a small sample-to-sample variation and the other method has no bias but a large sample-to-sample variation, the biased method may appear better than the unbiased
FIGURE 1. THE CURVES ON THIS FIGURE REPRESENT ESTIMATIONS, BASED ON THE SAMPLE RESULTS, OF THE DISTRIBUTION OF PERCENTAGES THAT WOULD BE EXPECTED IN LARGE NUMBERS OF COMPARABLE SAMPLES. THE CIRCLES AND TRIANGLES REPRESENT THE OBTAINED PERCENTAGES IN THE SAMPLES.
one. This reflects the fact that, on the average, the individual samples from the biased method would provide closer approximations to the actual population figures than would individual samples from the unbiased method; in other words, the expected square deviation would be smaller for the biased method than for the unbiased method.

In every case, sub-groups of the check variables were lumped into two groups in such fashion as to make the population figures close to 50 per cent for each of the groups.

Results
a. Are the four methods different?

Income Factor. Inasmuch as the C and D per cents combined are close to 50 per cent, these were used to construct a table with the four methods as columns and the four sample sizes as rows. The population value of 52.1 was subtracted from each sample percentage and these values multiplied by N, yielding a table of comparable estimates of the same variance, and making it possible to apply analysis of variance to the table. Since the population mean is known and need not be assumed, the procedure differs in this respect from that normally used. With estimates of the population variance from within groups on 3 degrees of freedom, and estimates from the sum of the squares of the means on four degrees of freedom, the F value was found to be 4.09. Since this is not large enough to be significant, it was concluded that the four types of quota samples may be considered equivalent and for subsequent computations they were lumped.

Home Ownership. Following the same procedure the value of F was
found to be 4.68, which again is far from significant. Hence these values were pooled for the four methods in order to test further hypotheses.
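The tests in this paper rest on deviations of sample percentages from a known population percentage. A minimal sketch of how such deviations can be put on a common scale and summed into a chi-square statistic is given below. The sample percentages are invented for illustration only, and scaling each squared deviation by the simple random sampling variance P(100 - P)/N is one reading of the procedure, not the authors' own worksheet.

```python
def chi_square_known_pct(sample_pcts, n, pop_pct):
    """Sum of squared deviations from a KNOWN population percentage,
    each scaled by the simple random sampling variance P(100 - P)/N.

    Because the population value is known rather than estimated from the
    samples, each sample contributes one degree of freedom.
    """
    variance = pop_pct * (100.0 - pop_pct) / n
    return sum((p - pop_pct) ** 2 / variance for p in sample_pcts)


if __name__ == "__main__":
    # Hypothetical percentages from four samples of 25 respondents each.
    made_up_samples = [48.0, 60.0, 56.0, 44.0]
    chi2 = chi_square_known_pct(made_up_samples, n=25, pop_pct=52.1)
    print(round(chi2, 2), "on", len(made_up_samples), "degrees of freedom")
```

Any consistent bias pushes every deviation in the same direction and so inflates the statistic, which is why a single chi-square of this kind reflects both bias and sample-to-sample variation.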
b. Is the quota sample biased?

Income Factor. Lumping the values for C and D income and per cent of home ownership, and knowing the population percentages, use was made of the fact that if P is the population per cent, the percentages in samples of size N are normally distributed with mean equal to P and variance equal to

$$\frac{P(100 - P)}{N}$$

The percentage of C's and D's in the lumped sample was 52.8 per cent. Applying the critical ratio formula, the value of 0.28 was obtained. This small value would be exceeded by chance 78 per cent of the time. Hence the hypothesis that quota sampling, in this context at least, is biased cannot be maintained.

Home Ownership. Applying the same procedure to the home ownership percentages (P = 55.4; combined samples 63.0 per cent) a C.R. of 3.18 was found. Since this value would be exceeded by chance only 14 times in 10,000, it is indicated that there has been an unwitting selection of a high house ownership ratio, evidence of a definite bias.

Area Samples. Although by definition the area method of selection makes bias theoretically impossible under ideal conditions of selection, the same tests were applied to the area lumped samples. For income, the C.R. was 0.48. A value this large would occur by chance 63 per cent of the time.
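The critical ratio used in this section, and the chance of exceeding it, can be sketched as follows. The chance is taken as the two-sided tail of the standard normal distribution, which reproduces the quoted percentages. The function names are illustrative, and the size of the lumped sample is not stated in the text, so only the conversion from critical ratio to probability is checked against the reported values.

```python
import math


def critical_ratio(sample_pct, pop_pct, n):
    """Deviation of a sample percentage from the population percentage,
    in units of the standard error sqrt(P * (100 - P) / N)."""
    return (sample_pct - pop_pct) / math.sqrt(pop_pct * (100.0 - pop_pct) / n)


def two_sided_p(cr):
    """Chance that a standard normal deviate exceeds |cr| in absolute value."""
    return math.erfc(abs(cr) / math.sqrt(2.0))


if __name__ == "__main__":
    for cr in (0.28, 3.18, 0.48):
        print(cr, round(two_sided_p(cr), 4))
```

The three critical ratios above convert to roughly 78 per cent, 15 in 10,000, and 63 per cent, in line with the figures quoted in the text.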
For home ownership, the C.R. was 0.94. A value this large would occur by chance 35 per cent of the time.

c. Are quota samples different from area samples?

To test the relative efficiency of the two types of sampling, for each sample size the deviation of the obtained percentages from the population value was computed. The C's and D's were grouped as before, as were also the home-ownership values. From the observed deviations the variance of such deviations was estimated. Then the F ratio of the variances was obtained.

For Income:

1. N = 15: Quota $\chi^2$ = 3.75; Area $\chi^2$ = 5.47; F = 1.46: Not significant at the 10 per cent level.6
2. N = 20: Quota $\chi^2$ = 5.55; Area $\chi^2$ = 6.15; F = 1.11: Not significant at the 10 per cent level.
3. N = 25: Quota $\chi^2$ = 3.76; Area $\chi^2$ = 7.25; F = 1.93: Not significant at the 10 per cent level.
4. N = 50: Quota $\chi^2$ = 2.73; Area $\chi^2$ = 6.20; F = 2.27: Not significant at the 10 per cent level.

Summing the values of $\chi^2$:

Composite $\chi^2$ (area) = 25.07 (df = 16): Significant at the 7 per cent or 8 per cent level.
Composite $\chi^2$ (quota) = 15.79 (df = 16)
F = 1.59 (df = 16, 16): Not significant at 10 per cent level, but quota is better.

Binomial test: If neither method is definitely better than the other, the probability that one will give a lower value of $\chi^2$ is 1/2. The probability that one will be better than the other 4 times out of 4 is $(1/2)^4$ or 1/16. Thus it may be asserted that at the 6.25 per cent level the quota is better.

6 The reason for using the 10 per cent level in place of the 5 per cent level given in the table is that the hypothesis is double-ended, whereas the table has been calculated for single-ended hypotheses. See, for example, Snedecor, G., Statistical Methods (4th Ed.), Iowa: The Collegiate Press, 1946, 249.

For Home Ownership:

N = 15: Area $\chi^2$ = 4.62 (df = 4); Quota $\chi^2$ = 11.5; F = 2.49 (df = 4, 4) (Area better)
N = 20: Area $\chi^2$ = 11.83; Quota $\chi^2$ = 8.21; F = 1.44 (Quota better)
N = 25: Area $\chi^2$ = 2.36; Quota $\chi^2$ = 8.40; F = 3.56 (Area better)
N = 50: Area $\chi^2$ = 6.40; Quota $\chi^2$ = 6.22; F = 1.03 (df = 4, 4) (Quota better)

None of the above F's are significant.

The combined $\chi^2$ are:

Area $\chi^2$ = 25.21 (df = 16): Significant at the 7 per cent level (about).
Quota $\chi^2$ = 34.33 (df = 16): Significant at a level considerably higher than 1 per cent.
F = 1.36 (df = 16, 16): Not significant at 10 per cent level. (Area better)

The binomial test favors neither method.

For Occupation. By grouping occupations into four classes: professional and managerial; clerical and sales; service; and trades; and then applying analysis of variance, it was found:
Area $\chi^2$ = 5.94 (df = 4)
Quota $\chi^2$ = 3.80 (df = 4): Neither is significant.
F = 1.57 (df = 4, 4): Favors quota method but far from significant. No binomial test is possible.

On the assumption that the correlations between the various check variables are not large, a composite chi-square for all variables on each method can be taken as an over-all comparison of each method with simple random sampling, by summing all of the chi-squares and all of the degrees of freedom. This gives:

          $\chi^2$    df    Level of Significance
Area      46.22       36    5%
Quota     53.92       36    2.8%

If these values of chi-square arose from accidents of random sampling, the effect of correlations between the variables would be to decrease the number of degrees of freedom and, hence, to enhance the significance of the above values.
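The composite comparison can be retraced with a short sketch that sums the component chi-squares and degrees of freedom printed in the Results and converts each total to an upper-tail probability. The closed-form series used here holds only for even degrees of freedom, which covers every case in this paper; the function and variable names are illustrative. As printed, the three area components sum to 56.22 rather than the tabulated 46.22, so one of the printed figures may be in error; the sketch makes no attempt to decide which.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of chi-square for EVEN df, using the closed form
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# (chi-square, df) for each check variable, as printed in the Results.
AREA = {"income": (25.07, 16), "home ownership": (25.21, 16), "occupation": (5.94, 4)}
QUOTA = {"income": (15.79, 16), "home ownership": (34.33, 16), "occupation": (3.80, 4)}


def composite(parts):
    """Sum the chi-squares and the degrees of freedom across check variables."""
    chi = round(sum(c for c, _ in parts.values()), 2)
    df = sum(d for _, d in parts.values())
    return chi, df


if __name__ == "__main__":
    for name, parts in (("area", AREA), ("quota", QUOTA)):
        chi, df = composite(parts)
        print(name, chi, df, round(chi2_upper_tail(chi, df), 3))
    # Sign-type binomial check: one method better 4 times out of 4 by chance.
    print("4 out of 4:", 0.5 ** 4)
```

Summing in this way is only as sound as the stated assumption that the check variables are not strongly correlated.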
Conclusions
The conclusions must be rather tentative in so far as none of the experimental results were clear-cut. None of the differences, for example, were significant at the 5 per cent level.
In this context:
(1) The quota method comes out better on the point for point comparison.
(2) The results on home-ownership indicate the possibility of unwitting bias in the quota samples.
(3) On income alone, the quota method is better than simple random sampling.
(4) There is some evidence that neither method is as efficient as simple random sampling for the composite results on all three variables. (This statement is true for the area samples by definition, of course.)

We should not extrapolate these results directly to public opinion polling. Both methods operate under severe limitations in this study. The area method is based on a very small number of areas for each sample. The quota method is severely penalized by the lack of opportunity for the kind of classification within the general income level contours which is open to the interviewer in the field.

This study shows that, in all probability, the differences between the results obtained by the two methods are not so great that a clear-cut superiority for one or the other can be easily demonstrated. Some evidence has been obtained; this should be considered along with evidence from future experiments and controlled field studies.