AREA SAMPLING: SOME PRINCIPLES OF SAMPLE DESIGN
BY MORRIS H. HANSEN AND PHILIP M. HAUSER
Wherever polling men gather, they are likely to debate the quota vs. the area sampling methods. Wherever non-technicians interested in the polls gather, they are likely to want to know the meaning of the two esoteric-sounding terms. This article contributes both to the explanation and to the debate. Messrs. Hansen and Hauser describe the principles on which area sampling is based and indicate the types of situations in which they believe the one method is to be preferred over the other. Morris Hansen and Philip Hauser, both distinguished statisticians, write with the authority of leaders in the development of the area sampling method. Mr. Hansen is now Statistical Assistant to the Director of the U.S. Bureau of the Census; Mr. Hauser, Assistant Director of the U.S. Bureau of the Census.
CONSIDERABLE ATTENTION has been devoted in the past months by survey and public opinion poll organizations to problems of sampling design. In part, this is attributable to the developments and innovations in sampling techniques and procedures in the statistical work of the Federal government, particularly in the Bureau of the Census and in the Bureau of Agricultural Economics, and, in part, to the investigation of the 1944 election poll of the Institute of Public Opinion by the House Committee to Investigate Campaign Expenditures.1 This attention, however, undoubtedly reflects the continuing interest of survey organizations in the improvement of techniques in all phases of their activities, including sampling. In what follows, an attempt is made to indicate the types of situations in which one method of sampling is to be preferred over another and to describe the principles on which area sampling is based. In this discussion of sampling methods, we shall consider only the discrepancies between the results obtainable from a complete enumeration of the population under consideration and the estimates made from a sample. Errors of interviewing and other survey errors that are present in a complete enumeration as much as in a sample enumeration may be either more or less important than sampling errors. We shall confine our remarks here to errors arising because only a sample is covered instead of a complete census of a finite population.
1 Hearings before the Committee to Investigate Campaign Expenditures, House of Representatives, 78th Congress, 2nd Session, on H. Res. 551, Part 12, U.S. Government Printing Office, Washington, 1945.
PUBLIC OPINION QUARTERLY, SUMMER 1945
The science of sampling design involves: (1) looking at the resources available, the restrictions within which one must work, the mathematical and statistical tools available, and the accumulated knowledge of certain characteristics of the populations to be sampled; and (2) putting these together to arrive at the optimum design for the purpose at hand. Ordinarily, there are many alternative sample designs, and an understanding of these alternatives and an analysis of their efficiency are necessary if a wise choice is to be made.
THE CRITERIA FOR SAMPLE DESIGN
The over-all criterion that should be applied in choosing a sampling design is to design the sample so that it will yield the desired information with the reliability required at a minimum cost; or, conversely, so that at a fixed cost it will yield estimates of the statistics desired with the maximum reliability possible. Restrictions and limitations other than cost may also have to be imposed upon the design. In wartime these have to do with the number of interviewers with cars and with gasoline rationing, as well as with the ordinary restrictions on time, personnel, etc. A second criterion that a sample design should meet (at least if one is to make important decisions on the basis of the sample results) is that the reliability of the sample results should be susceptible of measurement. Methods of sample selection and estimation are available for which the risk of errors in the sample estimates can be measured and controlled. If such methods are used, then as the size of the sample is increased, the expected discrepancies between the estimated value from the sample and the true value (i.e., the value that would be obtained from a complete census) will decrease. With such methods one can know the risk that the error due to sampling will exceed a specified amount, and this risk can be made as small as desired by taking a sample of adequate size. An essential feature of such sampling methods is that each element of the population being sampled (housewives, voters, or whoever is being interviewed) has a chance of being included in the sample and, moreover, that this chance or probability is known. Knowing the probability of inclusion of the various elements of the population makes it possible to apply appropriate weights to the sample results so as to yield "consistent" or "unbiased" estimates, or other estimates for which the risk of error can be measured and controlled.2
On the assumption that the above criteria should govern the selection of a sample design, we are ready to consider the relative merits of the "quota" method, which is commonly used in opinion and market surveys, and the "area sampling" method. The quota method in its essence involves: (1) the choice of selected characteristics of the population to be sampled, which are used as "controls"; (2) the determination of the proportion of the population possessing the characteristics selected as "controls"; and (3) the fixing of quotas for enumerators, who select respondents so that the population interviewed contains the proportion of each class as determined in (2). These specifications cannot provide sample estimates for which the risk of error can be measured, because they do not provide for the selection of persons in a way that permits knowing the probabilities of selection. Errors in the setting of quotas may introduce unknown differences in the probabilities of selection of persons for inclusion in the sample. Moreover, because of the latitude permitted the enumerator in the selection of respondents, it is less probable, for example, that the sample will include a housewife without children who works away from home than a woman with children who does her own housework, even though each may be in the same "control" group. Because the probabilities of inclusion in the sample of various classes of elements are unknown, the estimates of sampling error frequently made for quota sample results, supposedly based on sampling theory, are usually erroneous. A fuller treatment of the difficulties and limitations of the widely known and used quota method is given elsewhere.3
AREA SAMPLING
Area sampling eliminates dependence on the assignment of quotas that may be more or less seriously in error, and does not permit interviewer discretion in the choice of the individuals to be included in the sample. With appropriate methods of designating areas for coverage in the sample, the probabilities of inclusion of the various elements of the population are known, and consequently the reliability of results from the sample can be measured and controlled. Area sampling, of course, is not the only method that produces such results, but it is frequently an effective one.
To illustrate how and why area sampling works, suppose we are interested in sampling for certain characteristics of the population of a city. For example, we may want to know the total number of persons in certain broad occupational groups, and the number within each of these occupational groups who hold a particular opinion, read a specified magazine, or are in a certain income class. To estimate the total number of persons having these characteristics, we might first prepare an up-to-date list of every person or, at considerably less expense, of every address or household in the area to be surveyed, and then select a sample from this listing. By taking a random sample from such a listing of individuals or of households (interviewing all persons within the selected households if households are sampled), we could, with an adequate size of sample, obtain an excellent cross-section of the people in the city for any problem. This procedure would lead to highly reliable sample results, but frequently it is not practical, the principal reason being that preparing the listing would cost too much. Moreover, even where a complete pre-listing is already available, it may be too costly to interview the widely scattered sample that would be obtained by sampling individuals (or households) at random from such a listing.
2 Here the words "consistent" and "unbiased" are used technically, and have a mathematical definition. See J. Neyman, "Lectures and Conferences on Mathematical Statistics," Graduate School of the U.S. Department of Agriculture, Washington 25, D.C., 1938, p. 131; and R. A. Fisher, "Statistical Methods for Research Workers," 6th edition, p. 12, Oliver and Boyd, Edinburgh, 1936.
3 Philip M. Hauser and Morris H. Hansen, "On Sampling in Market Surveys," The Journal of Marketing, July 1944; and Alfred N. Watson, "Measuring the New Market," Printers' Ink, June 2, 1944, vol. 207, No. 9, pp. 17-20.
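The point made under the criteria for sample design, that known probabilities of inclusion allow appropriate weights and hence unbiased estimates, can be checked exactly on a toy listing of households. The sketch below is a minimal illustration under stated assumptions: the six households, their values, and their selection probabilities are hypothetical; each household is taken to enter the sample independently with its own known probability (a simplification chosen so the expectation can be computed by enumerating every outcome); and weighting by the reciprocal of the probability is the standard way of using such knowledge, not a procedure spelled out in this article. The names VALUES, PROBS, and expected_estimate are illustrative.

```python
import math
from itertools import product

# Hypothetical listing of six households (illustrative values, not from the article):
# number of persons in each household who hold a particular opinion.
VALUES = [3, 0, 2, 1, 4, 0]
# Hypothetical known probability with which each household is selected.
PROBS = [0.5, 0.2, 0.5, 0.2, 0.5, 0.2]


def expected_estimate(values, probs, weights):
    """Exact expectation of sum(weight * value) over the selected households,
    where household i is selected independently with probability probs[i].
    All 2**N selection patterns are enumerated, so no simulation is involved."""
    expectation = 0.0
    for pattern in product((False, True), repeat=len(values)):
        p_pattern = math.prod(p if chosen else 1 - p for chosen, p in zip(pattern, probs))
        estimate = sum(w * y for chosen, w, y in zip(pattern, weights, values) if chosen)
        expectation += p_pattern * estimate
    return expectation


true_total = sum(VALUES)

# Weight each selected household by the reciprocal of its known probability.
weighted = expected_estimate(VALUES, PROBS, [1 / p for p in PROBS])

# Ignore the differences in probability and weight everyone by the average rate.
avg_rate = sum(PROBS) / len(PROBS)
unweighted = expected_estimate(VALUES, PROBS, [1 / avg_rate] * len(VALUES))

print(f"true total {true_total}, weighted {weighted:.2f}, unweighted {unweighted:.2f}")
```

The weighted estimator averages exactly to the true total of 10, while weighting everyone alike, as if the differences in probability were unknown, averages about 13.43, because the households with more persons holding the opinion are also the ones more likely to be selected.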
One way of reducing cost relative to sampling individuals from a pre-listing is to use an area sampling method, in which the individuals interviewed are clustered into a selected set of sample areas. In "area sampling" the entire area in which the population to be covered is located is subdivided into smaller areas, and each individual in the population is associated with one and only one such small area, for example, the particular small area in which he resides. Neither the names nor the numbers of persons residing in the areas need be known in advance. A sample of these small areas is drawn, and all or a subsample of the population residing in the selected areas is covered in the survey.
A simple illustration will show that if a complete list of areas is available, a random sample of areas is selected, and the population of these sample areas is completely enumerated, then the chance (or probability) of being included is the same for each individual in the population. Moreover, on the average, the population surveyed within such a sample will reveal precisely the characteristics of the entire population from which the sample was drawn. A sample can be made as reliable a cross-section as desired, for any characteristics whatever, merely by increasing its size. Thus, if the population is changing in character, a random cross-section of small areas will reveal those shifts. Suppose, for illustration, that we wish to draw a sample out of a universe of five blocks, and that everyone living in the selected blocks will be interviewed. We shall assume certain values for each block for the total number of votes for a specified candidate, as shown below, although the illustration works in the same way whatever values are assumed for the total vote or for any other characteristic.
Block No.    Votes for a Specified Candidate
1            ...
2            ...
3            ...
4            ...
5            ...
Total        19
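The statement that each individual has the same chance of inclusion, whatever the size of his block, can be verified by listing every equally likely sample of blocks. In this sketch the five-block frame, its deliberately unequal block sizes, and the person labels are hypothetical; BLOCKS and inclusion_probability are illustrative names.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical frame of five blocks; every person belongs to exactly one block,
# and block sizes are deliberately unequal (illustrative, not from the article).
BLOCKS = {
    1: ["p1", "p2", "p3"],
    2: ["p4"],
    3: ["p5", "p6", "p7", "p8", "p9"],
    4: ["p10", "p11"],
    5: ["p12", "p13", "p14", "p15"],
}
m = 2            # number of blocks drawn
M = len(BLOCKS)  # number of blocks in the frame

# Every unordered set of m blocks is an equally likely sample.
samples = list(combinations(BLOCKS, m))


def inclusion_probability(person):
    """Fraction of the equally likely samples in which `person` is enumerated."""
    hits = sum(1 for sample in samples if any(person in BLOCKS[b] for b in sample))
    return Fraction(hits, len(samples))


probabilities = {
    person: inclusion_probability(person)
    for members in BLOCKS.values()
    for person in members
}
print(len(samples), set(probabilities.values()))
```

Every one of the fifteen persons turns up in 4 of the 10 possible samples, so each has probability 2/5, which is m/M, regardless of how many people share the block.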
Each of the possible samples that can be obtained in drawing at random a sample of two blocks is listed below, together with the estimated total number of votes for the specified candidate from each sample. In unrestricted random sampling of blocks, each of these possible samples has the same probability of being selected. The estimated totals shown are obtained by computing the average number of votes per block in the sample and multiplying this sample average by the known total number of blocks in the population. The results for each possible sample are as follows:
Sample consisting    Votes for the specified candidate    Estimated total votes for
of blocks            enumerated in the sample             the specified candidate
1 and 2              10                                   25.0
1 and 3              6                                    15.0
1 and 4              ...                                  ...
1 and 5              ...                                  ...
2 and 3              ...                                  ...
2 and 4              ...                                  ...
2 and 5              ...                                  ...
3 and 4              ...                                  ...
3 and 5              ...                                  ...
4 and 5              ...                                  ...
Total                                                     190.00
Average estimate                                          19.00
Standard deviation of sample estimates                    6.25
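The arithmetic behind the table and footnote 4 can be reproduced by enumerating all ten samples of two blocks. Because this sketch uses hypothetical block values rather than the article's, its figures differ from those in the table, but the two properties the text relies on should hold for any values: the estimates average to the true total, and their standard deviation equals the footnote formula. The function standard_error_of_total is an illustrative name for that formula.

```python
import math
import statistics
from itertools import combinations

# Hypothetical votes for the candidate in each of five blocks (not the article's values).
votes = [2, 5, 1, 8, 4]
M = len(votes)  # blocks in the population
m = 2           # blocks in each sample
true_total = sum(votes)

# Estimate for each possible sample: average votes per sampled block times M.
estimates = [M * sum(sample) / m for sample in combinations(votes, m)]

average_estimate = statistics.fmean(estimates)
# Every sample is equally likely, so the spread is taken over all of them (divisor = 10).
sd_of_estimates = statistics.pstdev(estimates)


def standard_error_of_total(sigma, M, m):
    """Footnote 4 formula; sigma is the between-block standard deviation (divisor M)."""
    return M * sigma / math.sqrt(m) * math.sqrt((M - m) / (M - 1))


sd_formula = standard_error_of_total(statistics.pstdev(votes), M, m)
print(f"true total {true_total}, average estimate {average_estimate:.2f}")
print(f"sd of estimates {sd_of_estimates:.4f}, formula {sd_formula:.4f}")
```

For these made-up values the ten estimates average exactly 20, the true total, and both the direct calculation and the formula give a standard deviation of 7.5. Calling standard_error_of_total with a large M (say 10,000 blocks) illustrates the point made in the text: the error falls roughly as 1/sqrt(m) even when the sample is a tiny fraction of all blocks.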
Notice, first, that each block appears in four out of the ten possible samples of two that can be drawn. Therefore, the probability that an individual living in Block 1 will be included in the sample is .4, and the same is true of an individual living in any one of the other blocks, even though the number of persons in each block may be different. Note, also, that on the average the estimates from the samples of votes for the specified candidate agree exactly with the actual number of votes for the specified candidate in this population. Furthermore, the standard deviation of all possible estimates from the sample is equal to 6.25, which is exactly what is given by the formula for the average error of a sample of two blocks, based on statistical theory.4 Of course, in any real problem the number of blocks will be considerably larger, more efficient methods of estimation may be available, and the sampling variance formula may be considerably more complicated. However, the above example will suffice to illustrate that with area sampling the probabilities of an individual being drawn into the sample can be fixed in advance of the actual enumeration, and that when this is so, and appropriate estimating procedures are used, it is possible to measure the average or standard error of the sample estimate. The formula for the standard error shows that as the size of the sample increases, the standard deviation of the sample estimate decreases. This fact takes on more meaning for more realistic populations, where the number of blocks in the population is very large. Under such circumstances, the average error of a sample may be made very small by drawing a fairly large number of blocks, even though the sample contains only a very small proportion of the blocks in the population. We have simplified the case here for the sake of illustration, but the principles are just as applicable to more complicated cases.
4 The standard error of the estimate from the sample, $\sigma_{X'}$, is equal to
$$\sigma_{X'} = \frac{M\sigma}{\sqrt{m}}\sqrt{\frac{M-m}{M-1}},$$
where $\sigma$ is the standard deviation between the 5 original blocks of the characteristic being estimated, $M$ is the total number of blocks in the population, and $m$ is the number of blocks included in the sample.
ALTERNATIVE AREA SAMPLING DESIGNS
It is to be emphasized that many modifications of the area method may be introduced that make effective use of available information concerning the areas being sampled. A very important variation in design is the introduction of subsampling, in which two or more levels of sampling are used. For example, a national population sample may involve the selection of a sample of fairly large areas such as cities or counties, and then of a sample of smaller areas within each; or a sample for a city may involve the selection of a sample of blocks, and the subsampling of addresses or dwelling units from the selected blocks. However, if the subsampling approach is to conform with the criteria of good sampling outlined earlier in this paper, purposive or judgment methods of selecting the units to be included in the sample are excluded.5
SAMPLING EFFICIENCY
In evaluating the alternative designs that are possible, many statistical or mathematical tools are available for guiding one to the selection of an efficient method.6 To illustrate, suppose one is considering taking a sample of blocks in a city, then preparing a listing of all of the people in the sampled blocks and interviewing every fourth individual on the list. The reliability of the final estimate from such a sample will depend both upon the number of blocks in the sample and upon the average number of interviews within each block. It is fairly obvious that if all persons were interviewed within, say, twenty selected blocks, the sample result might be highly erratic, depending on the particular twenty blocks in the sample. However, if one-tenth of the persons were interviewed in each of 200 blocks, a much better cross-section of the city would be obtained, and a more reliable sample estimate could be made with the same number of interviews. But if each of 200 blocks must be completely listed before the persons for interview are selected, and if the sample is scattered over 200 instead of twenty blocks, the cost of the survey is increased both by the cost of pre-listing and by more travel. Statistical theory is available to aid in resolving this conflict between cost and sampling reliability, and to guide one to an efficient design for a given cost. The efficiency of area sampling can be increased through the effective use of good maps and of the available data for small areas.7 For reasonably large-scale survey operations, it may pay to invest in maps that make possible the clear delineation of very small clusters of households and thereby eliminate or reduce the amount of pre-listing necessary. However, if detailed maps for defining very small areas are not available and it is necessary to pre-list whole blocks or moderately large selected rural areas, the cost of pre-listing need not be particularly significant where surveys are to be taken repetitively, since the cost of designating a sample of areas and of listing the dwelling units within them can be spread over a considerable number of surveys.
5 For an illustration of the application of an area subsampling design to obtain a national sample, see "The Labor Force Bulletin," No. 5, Bureau of the Census, Washington, November 1944.
6 Morris H. Hansen and William N. Hurwitz, "A New Sample of the Population," Bureau of the Census, Washington, Sept. 1944; also, "On the Theory of Sampling from Finite Populations," Annals of Mathematical Statistics, vol. XIV (1943), pp. 333-362; and "Relative Efficiencies of Various Sampling Units in Population Inquiries," Journal American Statistical Association, vol. 37 (1942), pp. 89-94. J. Neyman, "On the Two Different Aspects of the Representative Method: the Method of Stratified Sampling and the Method of Purposive Selection," Journal Royal Statistical Society, New Series, vol. 97 (1934), pp. 558-606; also, "Contribution to the Theory of Sampling Human Populations," Journal American Statistical Association, vol. 33 (1938), pp. 101-116. W. G. Cochran, "The Use of Analysis of Variance in Enumeration by Sampling," Journal American Statistical Association, vol. 34 (1939), pp. 492-510; also, "Sampling Theory when the Sampling Units are of Unequal Sizes," Journal American Statistical Association, vol. 37 (1942), pp. 199-212. P. C. Mahalanobis, "A Sample Survey of the Acreage under Jute in Bengal," Sankhyā, vol. 4 (1940), pp. 511-530.
7 For discussion of available data for small areas, and of detailed maps, see Morris H. Hansen and W. Edwards Deming, "On Some Census Aids to Sampling," Journal American Statistical Association, vol. 38 (1943), pp. 353-357; and Morris H. Hansen, "Census to Sample Population Growth," Domestic Commerce, vol. 32, No. 11 (1944), p. 6.
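The conflict between cost and reliability described in the sampling efficiency discussion can be made concrete with a simple planning model. This is a sketch under assumptions not taken from the article: it uses the familiar textbook approximation in which interviewing nbar persons per block inflates the variance of a mean by 1 + (nbar - 1) * rho, with rho the intraclass correlation within blocks, together with a linear cost of a fixed amount per block (listing and travel) plus a fixed amount per interview. All numbers, and names such as variance_of_mean and optimal_nbar, are hypothetical.

```python
import math

# Hypothetical planning inputs (not from the article): variance between persons,
# intraclass correlation within blocks, cost per sample block (listing plus travel),
# cost per interview, and total budget.
S2 = 1.0
RHO = 0.05
C_BLOCK = 20.0
C_INTERVIEW = 2.0
BUDGET = 4000.0


def variance_of_mean(m, nbar):
    """Approximate variance of the sample mean with nbar interviews in each of
    m blocks (finite-population corrections ignored)."""
    return S2 / (m * nbar) * (1 + (nbar - 1) * RHO)


def cost(m, nbar):
    """Linear cost model: a fixed cost per block plus a cost per interview."""
    return m * (C_BLOCK + C_INTERVIEW * nbar)


def variance_at_budget(nbar):
    """Variance when the whole budget is spent with nbar interviews per block."""
    m = BUDGET / (C_BLOCK + C_INTERVIEW * nbar)
    return variance_of_mean(m, nbar)


def optimal_nbar():
    """Interviews per block minimizing variance at a fixed budget under this model."""
    return math.sqrt(C_BLOCK / C_INTERVIEW * (1 - RHO) / RHO)


# The text's comparison: 2,000 interviews as 20 blocks x 100 or 200 blocks x 10.
ratio = variance_of_mean(20, 100) / variance_of_mean(200, 10)
print(f"variance ratio {ratio:.2f}; costs {cost(20, 100):.0f} vs {cost(200, 10):.0f}")

best = min(range(1, 201), key=variance_at_budget)
print(f"best whole number per block: {best} (continuous optimum {optimal_nbar():.1f})")
```

With these inputs, spreading 2,000 interviews over 200 blocks gives roughly one-quarter of the variance of concentrating them in twenty blocks, at nearly twice the cost (8,000 against 4,400); for a fixed budget of 4,000 the variance is smallest at about 14 interviews per block. Different costs or a different rho move the optimum.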
In some Census operations in which pre-listing is used and subsamples are drawn from these listings for repetitive surveys, the cost of pre-listing over a year's time actually amounts to less than ten percent of the total survey cost.
AREA SAMPLING COSTS
It has sometimes been stated that area sampling methods are practicable for the government, with its mass surveys and extensive resources, but are not adaptable to private research organizations on a limited budget. It is clear from the above that at least the cost of actually selecting the sample need not be a highly significant factor in the total cost of such surveys, and that if this method is more costly than other, less rigorous methods, it is primarily because of the cost of interviewing within the designated sample households rather than the cost of locating the households in which interviews are to be made. Actually, the interviewing costs may be considerably affected by the necessity for call-backs or other steps taken to ensure that the pre-designated person or household is interviewed; and procedures are available, yielding unbiased sample results, for making call-backs on only a sample of those not at home on the first visit.8 It is ordinarily true, however, that in practice no reasonable number of call-backs can ensure an interview with all persons designated for interview, and that a small bias may necessarily remain in the estimate because of the remaining non-interviews. An important distinction between area sampling methods and quota sampling methods lies in the treatment of non-interviews. The quota method, by ignoring the problem, has all of the biases without pointing up the magnitude of this source of error. The area method points up this source of error and makes it possible to correct for it (through calling back to interview all or a sample of the original non-interviews) if the proportion of non-interviews is high. The maximum error attributable to this factor can be kept very small, and the maximum bounds of error coming from this cause can be measured.
CHOICE OF METHODS
There should be little question that the efficiency of a sample design should be evaluated in terms of reliability of results obtained per dollar of cost, rather than in terms of the number of interviews obtained per dollar. Through the use of the principles of sampling described or referred to above, one is aided in selecting a sampling method that produces results of maximum reliability per dollar expended. However, these principles lead only to choices between alternative designs that conform with the criteria of good sampling we have assumed. Therefore, they provide a guide only in choosing between those designs for which it is possible to measure the expected sampling error and the sources contributing to it, and thus, through proper adjustment in design, to minimize the sampling error per dollar expended. Since statistical theory is not available for measuring the reliability of sample results obtained by the quota method, this method is automatically excluded from consideration if the criteria of sample design outlined above are to be followed. Thus, although the facts could not be established, it could happen that a quota sampling method would in a particular situation yield more reliable results per dollar than the optimum method chosen through the application of the criteria and sampling theory we have considered. How, then, is one to know which to use? A possible answer to this question is the following. If it is important that results of specified reliability be obtained, and if a fairly heavy loss is involved if the wrong action or decision is taken as a consequence of having depended on results that actually turn out to have larger errors than are considered tolerable, then quota sampling cannot safely be employed, and area sampling or some other method for which the risk of error can be controlled should be used.
8 "Working Plan for Annual Census of Lumber Produced in 1943," published by the Forest Service of the Department of Agriculture in the fall of 1943. William N. Hurwitz extended the theory of double sampling to cover this problem.
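Footnote 8 refers to call-backs on only a sample of those not at home at the first visit. The following is a minimal sketch of the usual two-phase (double sampling) form of such an estimator; it is not the Forest Service working plan itself. It assumes, hypothetically, that every household chosen for a call-back is eventually interviewed; the household values, the call-back count k, and the function estimate are illustrative. Enumerating every possible call-back subsample shows that the estimator averages to the mean of the whole designated sample, whereas the first-call respondents alone do not.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical first-call results for eight designated households (1 = holds the
# opinion of interest). Values for the not-at-homes are known only if called back.
responded = [1, 0, 1, 1, 0]
not_at_home = [1, 1, 0]
k = 2  # hypothetical number of not-at-homes selected at random for call-backs

n1, n2 = len(responded), len(not_at_home)
n = n1 + n2


def estimate(callback_values):
    """Combine the first-call mean and the call-back subsample mean, weighting
    each by its share (n1/n or n2/n) of the designated sample."""
    first_call_mean = Fraction(sum(responded), n1)
    callback_mean = Fraction(sum(callback_values), len(callback_values))
    return Fraction(n1, n) * first_call_mean + Fraction(n2, n) * callback_mean


# Every call-back subsample of size k is equally likely.
possible = [estimate(sub) for sub in combinations(not_at_home, k)]
expected = sum(possible) / len(possible)

full_sample_mean = Fraction(sum(responded) + sum(not_at_home), n)
respondents_only = Fraction(sum(responded), n1)
print(expected, full_sample_mean, respondents_only)
```

Here the estimator's expectation is 5/8, matching the full designated sample, while the first-call respondents alone give 3/5.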
On the other hand, if conditions are such that only fairly rough estimates are required from the sample, and important decisions do not hinge on the result, then only a small sample is required, and the price to be paid for using a sample whose accuracy can be measured may not be justified. Under these conditions the biases of the quota method (or of the area method used without call-backs, or of other low-cost methods) may be considerably less important than the errors resulting from the small size of the sample, and such methods may therefore produce results of sufficient reliability more economically than would more rigorous alternatives. It would, of course, be wasteful to pay for assurance of greater reliability in the results than is necessary. However, it appears reasonable to believe that in most instances in which a fairly precise estimate is desired, and for which, therefore, a fairly large sample is used, the possible biases of quota sampling may be sufficiently serious to make that method considerably less efficient, in terms of reliability of results per dollar, than the appropriate area sampling methods. We believe that the question of what criteria should be applied in determining the most appropriate sampling method for a specific purpose deserves more extensive attention and consideration than it has yet received.
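The trade-off in the last two paragraphs can be expressed through mean squared error, that is, sampling variance plus the square of any bias: a fixed bias does not shrink as the sample grows, while the variance does. The sketch compares a measurable, unbiased design with a cheaper design standing in for the quota method or an area method without call-backs. It assumes hypothetical costs, design effects, and, crucially, a known bias of 0.02 for the cheaper design; in practice, as the article stresses, that bias cannot be measured from the sample, which is the heart of the difficulty. Names such as mse and break_even_budget are illustrative.

```python
import math

P = 0.5                 # hypothetical proportion being estimated
UNIT_VAR = P * (1 - P)  # variance contributed by a single interview

# Hypothetical designs; the bias of the cheaper design is an assumed figure,
# since in practice its size cannot be measured from the sample itself.
MEASURABLE = {"cost_per_interview": 3.0, "design_effect": 1.5, "bias": 0.0}
CHEAP = {"cost_per_interview": 1.5, "design_effect": 1.0, "bias": 0.02}


def mse(design, budget):
    """Mean squared error = sampling variance + bias**2 at a given budget."""
    n = budget / design["cost_per_interview"]
    return UNIT_VAR * design["design_effect"] / n + design["bias"] ** 2


def break_even_budget(unbiased, biased):
    """Budget at which both designs have equal mean squared error
    (each MSE has the form k / budget + bias**2)."""
    k_u = UNIT_VAR * unbiased["design_effect"] * unbiased["cost_per_interview"]
    k_b = UNIT_VAR * biased["design_effect"] * biased["cost_per_interview"]
    return (k_u - k_b) / (biased["bias"] ** 2 - unbiased["bias"] ** 2)


for budget in (300, 30_000):
    print(budget, math.sqrt(mse(MEASURABLE, budget)), math.sqrt(mse(CHEAP, budget)))
print("break-even budget:", break_even_budget(MEASURABLE, CHEAP))
```

With these figures the cheaper, biased design has the smaller error at a budget of 300, the measurable design has the smaller error at 30,000, and the two break even at a budget of 1,875.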