REPORT ON THIRD MEETING OF MEMBERS 29th October, 1951.
PRESENTATION OF
PHILIPS AND HOLLERITH PRIZES
on
"A Classification of the Sampling Process"
Before Professor Kendall spoke to the very well attended Third Meeting of Members, the Chairman of Council, Dr. C. Oswald George, presented cheques of £25 each to the winners of the Philips and Hollerith Prizes. The Philips Prize, given by Philips Electric, Ltd., for the best set of papers in the Final Examination of the Association in June, was awarded to Mr. J. L. Stewart. The Hollerith Prize, awarded by the British Tabulating Machine Co., Ltd., for the best paper in the Final Examination on the "Organization and Equipment of a Statistical Office," was won by Mr. B. W. Andrews.

After the presentation of the prizes, Professor Kendall gave an informal talk on "A Classification of the Sampling Process." Although there was an audience of about seventy people to hear him, the subject, and his treatment of it, was so interesting that it was felt that some report should be made available to members who were unable to attend the meeting. It should, however, be borne in mind that the following précis is based on notes taken during Professor Kendall's discourse, and that any omissions are the sole responsibility of the reporter.

Professor Kendall said that he proposed to discuss a conspectus of the sampling processes. He had set them out in the form of a genealogical tree. (For the convenience of readers the chart prepared and discussed by Professor Kendall is printed overleaf.) This had not been done, to his knowledge, in any of the standard textbooks.

He started with a division of the population. The population could be existent or hypothetical. The distinction was not simply a matter of interest. It was a necessary one all through the sampling processes. It underlay the nature of the inferences to be drawn from the samples when obtained. Most populations dealt with in practice were existent in the sense that they could be enumerated, and were therefore necessarily finite. Others were equally necessarily hypothetical as in the cases of throwing a die or tossing a penny. One had to imagine some sort of population in these cases.
CLASSIFICATION OF THE SAMPLING PROCESS

Population
    Existent
    Hypothetical

Sampling
    Random
        Simple Random
        Random with Probability Proportional to Size
        Multi-stage Sampling
        Sampling on Successive Occasions
        Constrained
            Stratified
            Balanced Designs, Latin Squares, etc.
    Non-random
        Purposive
        Balanced
        Quasi-random (sub-classifications as for Random)
        Systematic (sub-classifications as for Random)
        Quota
There were two main types of processes: random and non-random. The difference was not one of nomenclature but was derived from the concept of probability. In all forms of random sampling the chance of selection of any one unit in the population, or group within the population, was constant throughout the sampling process. There was an important theoretical exception when selection took place without replacement in a finite population. It was also necessary to take all reasonable precautions to exclude bias. Subject to these qualifications one had to take the word "random" more or less for granted. Processes which did not ensure randomness were often spoken of as restricted sampling. He did not feel that this was good practice. The important distinction was between random and non-random sampling.

Professor Kendall enumerated the following divisions of the random sampling processes: simple random; random with probability proportional to size; multi-stage sampling; sampling on successive occasions; and two forms of constrained sampling, namely stratified sampling and balanced designs, Latin squares, etc.

The first division treated was that of simple random sampling, i.e. sampling which was unrestricted in the sense that any member of the population could be chosen at any sampling and in which the probability of choice of any unit was the same at any stage in the sampling process. When there was no replacement in a finite population the probabilities did change, but the problem was a well-known one and the effects unimportant if the population was large in relation to the sample. Choice was in any case unrestricted.

He then dealt with random sampling with probability proportional to size. In this case, instead of individuals having the same chances, they had different probabilities of being selected. There were no theoretical or conceptual difficulties, and not very many practical difficulties.
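As a minimal sketch of selection with probability proportional to size, the following Python fragment draws units with replacement, which is the simpler case. The unit names and sizes are hypothetical figures invented for illustration, as are the function names; nothing here is taken from Professor Kendall's talk.

```python
import random

# Hypothetical units and sizes, invented purely for illustration.
unit_sizes = {"A": 50_000, "B": 30_000, "C": 20_000}


def selection_probabilities(sizes):
    """Return each unit's chance of selection, proportional to its size."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


def draw_with_replacement(sizes, n, seed=0):
    """Draw n units with replacement, each with probability proportional to size."""
    rng = random.Random(seed)
    names = list(sizes)
    weights = [sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


probs = selection_probabilities(unit_sizes)
sample = draw_with_replacement(unit_sizes, n=5)
```

Because the draws are made with replacement, every draw uses the same probabilities; the sketch deliberately does not attempt the without-replacement case.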
But there were very considerable mathematical difficulties in cases of sampling without replacement from a finite population. In such circumstances it was very difficult to find such expressions as the variance. He knew of no formula which would give it where a population was, in fact, finite; and that was one of the important cases. An example arose when sampling an area and it was desired to sample towns in such a way as to give each town a probability of being chosen proportional to its population. A finite sampling effect came in, and one of the outstanding problems of the present time was to try to find some simple mathematical way to ascertain sampling errors in such a case. The problem raised no practical difficulties, and possibly the mathematical difficulties might be overcome.

As a further elaboration of the random sampling process there was multi-stage sampling. Instead of opening the whole population to examination, it was divided into units and then sub-samples were taken from the units. Very few practical difficulties arose from the process, but it raised a certain number of the same kind of mathematical difficulties as before. It could also raise the general problem of securing unbiased systems in the first place.

Sampling on successive occasions was of frequent occurrence in practice, particularly in the case of farms and human beings. The sampling scheme was set up of primary units of a particular kind. Having instituted the machinery for taking a sample, one wanted to go on and sample them on a successive occasion. A correlation effect quite obviously came in.

Still within the random sampling process there were techniques involving constraint. Examples were stratified sampling and the more special cases of balanced designs, Latin squares, etc. It was important to realize that stratified sampling was more than a kind of special process of random sampling. In fact, it was a case of randomly sampling several populations. In this sense the process was constrained, but sampling was unrestrictedly random within each population. The same was true of balanced designs, Latin squares, etc. There the constraints on the selection of the populations to be sampled were even stricter, but no constraint was imposed, as it were, on the randomness. The inferences to be drawn were still in terms of ordinary unrestricted random sampling. The restrictions were imposed only on the regions within which the inferences were made. He felt it was important to get that point clearly in mind.

Professor Kendall then passed on to what he described as the much more interesting field of non-random sampling. He said that we knew a good deal about the processes already described. Now we came to the ones about which we did not know much.
There was a great deal of argument as to whether they were the sort of sampling device with which we should or should not have anything to do.

In the first place, there was a process that might be called a balanced sample. This was to be clearly distinguished from the balanced design. The balanced design dealt with the population. The balanced sample was an attempt to bring the sample into agreement, in some respects, with certain predetermined values. Professor Kendall said that a discussion of this process was to be found in Dr. Yates' book and only, he thought, there. That discussion brought out the best that could be said about the method. It probably did not take one very far wrong, but it was quite clearly running one into serious danger to select numbers of a sample in respect of certain attributes in the hope that the sample would then be representative in terms of other values, if there was any possibility of correlation between them. He felt that the method should be regarded with a good deal of suspicion.

Turning to quasi-random and the systematic sampling process, Professor Kendall made some general remarks regarding both of them. He said that if, for example, one obtained a list of one hundred names by taking, say, every hundredth house in a street directory, one could not regard it as a random sample in the strict sense. There were, in fact, only one hundred different possible samples that could be taken under such a process. The question was whether one could apply the ordinary theory of random sampling to such a case. It was possible to invent a hypothetical situation as overlying the system, and that was, of course, quite legitimate. This could be done by supposing that there had been a random shuffling of the population beforehand and that, if, for example, the inquiry was in respect of the prevalence of blue eyes, their possessors had not, in that shuffle, been directed to particular places in the list. If they had been so directed the inquiry would go wrong.

A particular problem arose if one had to sample again from the same list. To obtain the same theoretical justification one would have to have a second hypothetical shuffling of the population, otherwise the sample might break down. It was the sort of thing one had to do when using any list or register. No one had yet properly investigated the circumstances under which one could rely upon systematic sampling, and he thought that it would repay a certain amount of examination. Systematic sampling he thought best confined as a description to the situation where one took a sample at specified intervals in some systematic way.
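The point that taking every hundredth entry admits only one hundred possible samples can be checked with a short Python sketch. The directory of 10,000 numbered houses, the function name, and the seed are all assumptions made for illustration; the shuffle at the end merely imitates the notional "random shuffling" of the list and is not a procedure described in the talk.

```python
import random


def systematic_sample(population, interval, start):
    """Take every `interval`-th unit, beginning at index `start`."""
    if not 0 <= start < interval:
        raise ValueError("start must lie in the range 0 to interval - 1")
    return population[start::interval]


# Hypothetical directory of 10,000 houses, numbered for illustration only.
directory = list(range(10_000))
interval = 100

# Only the starting point can vary, so there are exactly `interval` samples.
all_samples = [systematic_sample(directory, interval, s) for s in range(interval)]

# Imitate the notional shuffle: reorder the list, then sample systematically.
rng = random.Random(0)
shuffled = directory[:]
rng.shuffle(shuffled)
sample = systematic_sample(shuffled, interval, start=rng.randrange(interval))
```

By contrast, the number of distinct simple random samples of 100 houses from the same list is `math.comb(10_000, 100)`, which is vastly larger than one hundred.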
Quasi-random sampling would cover the situation where one could imagine a primitive shuffling, but where one did not necessarily take any systematic selection; for example, where one took a sample of the attendance of a football match by taking the first fifty spectators to come out of the gate. In both these particular kinds of sampling process one could have the same kind of proliferation of sub-processes as had already been listed under random sampling.

Having dismissed purposive sampling as unworthy of discussion and fortunately moribund, Professor Kendall concluded his discussion of the classification of the sampling processes by outlining the case for and against quota sampling. He said that it was certainly non-random in the way he had been using the term, in that one did not choose a set of numbers at random. On the contrary, one set out with the accepted aim of trying to make the sample representative of the population in certain respects. For example, given a sample of forty, one set out to include twenty men and twenty women; so many in each age group; so many from the middle, lower, and upper classes, etc. Then one hoped that, having made one's sample representative in these respects, it would be representative in other respects, and, in particular, in the respect under scrutiny. In a good many of the quota sampling inquiries, there was more than one of these second respects, and one might be asking in a number of them questions in regard to thirty or forty respects. It was possible to be very far wrong.

There had been a lot of hard things said about quota sampling by the theoretical statisticians, but there were people who employed the process to a large extent and it was generally agreed that the inquiries gave results which did make sense. Public opinion polls, in particular, got close to the truth. The process was so much cheaper than any form of random sampling and it seemed that one could get what one wanted for the purposes of administrative decision sufficiently accurately. It had been one of the objectives of his research division to bridge the gap between the users of random and non-random processes. Nevertheless, all one could say about quota sampling was that it possessed this power to work in certain conditions and to work reasonably well. The trouble was that nobody knew why, or what those conditions were, nor did there appear to be any particular reason why it did not work on a particular occasion. His own personal opinion was that it was better to try to bring down the cost of random sampling than to spend time trying to justify quota sampling.

Reviewing the chart of the Classification of the Sampling Processes, Professor Kendall said that the major theoretical question was that thrown up by quota sampling, which was dealt with many years ago by Professor Bowley in his report to the International Statistical Institute.
Suppose one were to have a sample from a large population and that one had two attributes, x and y, and that by some means the sample was made representative of the population in respect of the attribute x. The general problem was what one could say about the representativeness of the sample in respect of the attribute y. The trouble was that one did not know if there was a bias, so that the sample might not be a random sample at all. On the other hand, there was the feeling, strongly held by the quota samplers, that the more factors in which one made the sample representative of the population, the more likely it was that the remaining factors would also be representative. One could not, he felt, dismiss that entirely. It was, from one point of view, only an aspect of general inferential reasoning. The greater the number of times certain things happened, the more certainly one could say what was going to happen on the next occasion. The greater the number of characteristics in which one thing was similar to another, the greater the certainty with which one could expect the two things to be similar in other respects. As it was, there was a feeling that in quota sampling one was getting out the larger part of the possible error that one would have in random sampling. No one had found a method of expressing these things on a formal, mathematical, and theoretical basis.

Dr. George and Mr. Morrell proposed and seconded a vote of thanks to Professor Kendall, which was passed with acclamation. Mr. Cauter, Mr. Wilkins, Mr. Reece, Mr. Swann, and Mr. Prys Williams took part in the discussion, to which Professor Kendall replied.