Validity of Responses to Survey Questions

BY HUGH J. PARRY AND HELEN M. CROSSLEY

This article is designed as one of a series which will discuss certain aspects of validity in surveys. The first article, which appears below, examines two current concepts of validity (as predictive accuracy, and as a matter of interpretation), reviews the literature on the subject, and presents some of the results of a specially designed survey in Denver which showed that the validity of even simple "factual" responses may often be open to question. Subsequent articles will discuss the effect of the interviewer on the validity of survey results and the variations in validity according to respondent characteristics and other variables. Hugh J. Parry was formerly Acting Director of the Opinion Research Center of the University of Denver, and is at present Director of Publications for the Anti-Defamation League. Helen M. Crossley, formerly Senior Analyst at the Opinion Research Center, is now with the Attitude Research Branch of the Armed Forces Information and Education Division.

Perhaps no word has been more vaguely or loosely used in all the social sciences than "validity." To some it is a matter of gradation-a continuum, so to speak-ranging from an imaginary absolute of perfection down to an equally imaginary absolute of nonvalidity. To others, more naive, it is an either-or dichotomy, chiefly useful as a weapon to hurl against personal or ideological opponents. Yet validity is basic to all research, and the concept clearly must be made more specific.

A. CONCEPTS OF VALIDITY

Validity as Predictive Accuracy

In defining the essential meaning of "validity," two main schools of thought can be distinguished. The more common definition is given in terms of predictive accuracy. Social psychologists, educators, and others concerned with psychological testing are familiar with the concept of validity as the ability of a test to predict performance; the criterion in the case of an entire test is some outside measure such as school success, while item validity measures the predictive accuracy of individual test items against the criterion of the full test score. Hundreds of new tests have been devised and validated, based sometimes on dangerously small groups of college students, sometimes on larger heterogeneous populations. Many discussions of test validation, both definitional and methodological, can be found in the literature and will not be reviewed here. Nearly all of them, to a greater or lesser extent, are based on the concept of validity as the ability to predict performance, although some writers have begun to point out that the performance criteria themselves



may be subject to various types of invalidity.1

Research workers in the broader field of public opinion and market research and other social scientists interested in the definition of attitudes and opinions and their manifestations have applied the concept of validity as predictive accuracy more broadly to mean prediction of behavior. In this sense, attitude surveys are considered valid if they can predict with reasonable certainty how various groups or individuals will behave at the polls, in a grocery store, or in some other future behavioral situation. However, since Link and Freiberg2 made their historic, categorical statement linking validity to behavior, their concept has been questioned more and more frequently and severely. Dollard3 pointed out in a recent article that the conditions under which opinions can be expected to predict behavior may vary greatly according to such factors as the state of mind and verbal ability of respondents, the conditions of the test situation, and the intrusion of outside factors between the time of the test and the actual behavioral situation. A study by Pace at Syracuse compared the answers to nine "opinion scales" with results from seven "activity scales" and led to the conclusion:4 "Many attitude tests are descriptive but not predictive, and their meaning and interpretation is limited by this fact. Definitions of attitude as a tendency to act may need to be reconsidered; acceptance of the definition implies that behavior is the criterion of validity."

Validity as a Matter of Interpretation

In recent years many analysts and users of social research have come to realize that the use of "validity" to mean predictive accuracy is not a fair or complete test of the accuracy or usefulness of survey results. Opinion may be closely related to behavior, but it is not the same thing and it may therefore have separate validity of its own. As Connelly has so aptly said: "Answers to every question asked uniformly of an adequate sample by capable interviewers have validity;"5 the trouble starts with interpretation of responses apart from their own stimuli. Moreover, validity as predictive accuracy applies only to attitude and opinion studies, and calls for another definition for the validity of so-called "factual" questions. Definition and measurement of validity, obviously, also involves problems of semantics, since both the researcher and the user of survey results must mean the same thing when they talk about opinion, factual information, and validity. Such necessary inclusiveness of the validity concept was pointed out in 1946 by McNemar.6

1 Cf. Jenkins, J. G., "Validity for What?", Journal of Consulting Psychology, Vol. 10 (1946).
2 Link, Henry C., and A. D. Freiberg, "The Problem of Validity vs. Reliability in Public Opinion Polls," Public Opinion Quarterly, Vol. 6, No. 1 (1942), p. 98.
3 Dollard, John, "Under What Conditions Do Opinions Predict Behavior?", Public Opinion Quarterly, Vol. 12, No. 4 (1948), p. 623.
4 Pace, C. Robert, "Opinion and Action: A Study in Validity of Attitude Measurement," American Psychologist, Vol. 4 (1949), p. 242.
5 Connelly, Gordon M., "Now Let's Look at the Real Problem: Validity," Public Opinion Quarterly, Vol. 9, No. 1 (1945), p. 53.
6 McNemar, Quinn, "Opinion-Attitude Methodology," Psychological Bulletin, Vol. 43 (1946), p. 315.


The conflicting definitions of validity were clearly brought out at the Central City Conference on Public Opinion Research in 1946, at which some participants held out for validity in terms of prediction, others for consistency criteria, and still others in terms of the interpretations made of survey data.7 Since that time growing numbers of social scientists have begun to define validity in terms not of prediction but of interpretation. This meaning of validity is an extension of the classical use of the term to describe something that measures "what it is supposed to measure," limited by a careful consideration of what the instrument can logically be expected to measure.

The significance of the differing concepts of validity can be clearly illustrated by the case of the 1948 polls. In terms of forecasting the behavior of the electorate at the voting booths, the 1948 polls were far from valid; yet in terms of measuring relative sentiment toward Dewey and Truman at the time they were taken, their validity may have been high. The report of the special committee of the Social Science Research Council even went so far as to report that: "There is a possibility that the shift [last-minute swing to Truman] could have been large enough to make Gallup's and Crossley's last pre-election surveys not too far off the mark, as of two weeks before the election."8 They may have measured opinion of respondents, two or three weeks beforehand, as to what they thought they would do at election time; but because of the last-minute shift, turnout complications, and many other less spectacular factors, they could not validly forecast who would actually


vote or how. No more convincing argument for using "validity" in terms of interpretation of data rather than prediction of behavior can be given.

Validity is the most important single concept facing either the casual or the specialized user of survey results. In view of this fact, it is discouraging that so little attention has been paid to it to date. Even Cantril's admirable text, Gauging Public Opinion, does not tackle the problem of validity, except in a brief technical study of interviewer ratings. Had social scientists paid more attention to this crucial matter, the election predictions of 1948 might have been less widely accepted as Gospel by laymen, and the causes of their failure might have been more widely understood. At any rate, there would now be less evidence of the misconception, still prevalent, that the polls' failure in 1948 was due to some particular error in sampling, interviewing technique, or statistical allocation of certain groups, rather than to the far more basic error of trying to predict behavior where they could not validly be expected to do more than measure pre-election opinion and intention.

B. MEASURES OF VALIDITY

Aggregate and Individual Validity

Whether validity is considered as predictive accuracy or as interpretation,

7 Proceedings of the Central City Conference on Public Opinion Research, Panel 5: "Validity in Public Opinion Surveys"-Panel Members: H. H. Remmers, E. Palmer Hoyt, Wilfrid Sanders, Herbert Hyman. National Opinion Research Center, Denver, Colorado, 1946.
8 The Pre-Election Polls of 1948-Report to the Committee on Analysis of Pre-Election Polls and Forecasts. Social Science Research Council, Bulletin 60, 1949, p. 313.



however, some way must still be found to measure it. Often, of course, no check is possible for the major findings which the survey was designed to uncover; validity must be established for related questions and independent characteristics. Except in test validation, the usual method has been by means of comparisons of aggregate results from the survey in question against actual or percentage figures from an outside source, such as election results or census figures. The concept of aggregate validation, both of sample designs and of survey results, is a familiar one in market research,9 as well as in the field of election forecasting and social research in general. On many types of surveys, given sufficient aggregate checks, results can often be assumed to have over-all validity. Yet there is always a danger that satisfactory aggregate comparisons may conceal dangerous compensating errors. Thus, the most reliable means of establishing the validity of survey results is the comparison of aggregate results with outside data accompanied by an independent check on the worth of the individual responses.

Validation of individual reports is extremely difficult to carry out, because of the anonymity of most respondents and the difficulty of verifying answers even when respondents are identified. Nevertheless, there have been several attempts to make such checks on a small or large scale. Some have been based on the predictive concept of validity, others on the more limited one of truthfulness. Some of the more significant of these studies are outlined briefly below as illustrations of the difficulties involved and the results that can be achieved.

One of the earliest studies was the well-known experiment of LaPiere,10 who between 1930 and 1932 traveled extensively with a young Chinese couple, and then obtained questionnaires from many of the hotels, auto camps, tourist homes, and eating establishments they had visited; over 90 per cent of the proprietors in each group said they would not accept Chinese as guests. His early findings did much to show that the best test of validity of measured attitudes may be something other than behavior in a hypothetical or real situation.

Commercial Research

In the commercial field, studies of individual validity, as opposed to aggregate validations, have sometimes taken the form of "pantry inventories" to see whether what is actually on the shelves agrees with housewives' reports. A similar type of study was reported in 1938 by Jenkins and Corbin,11 who checked daily sales slips for 70 regular customers of a local grocery store in Ithaca, New York. The check covered 13 frequently purchased articles, and resulted in a range of 62 to 100 per cent of respondents naming as most recent purchase the brand actually shown on the store's sales slip.

9 Cf. Committee on Marketing Research Techniques, "Design, Size, and Validation of Sample for Market Research," Journal of Marketing, Vol. 10 (1946).
10 LaPiere, Richard T., "Attitudes vs. Actions," Social Forces, Vol. 13 (1934).
11 Jenkins, John G., and Horace H. Corbin, Jr., "Dependability of Psychological Brand Barometers-II: The Problem of Validity," Journal of Applied Psychology, Vol. 22 (1938).


The authors found that indices of validity did not exhibit uniformity from product to product, and concluded that while reliability of last-purchase questions (as measured through re-interviews) could safely be assumed, the validity of such questions should be determined individually for each product to be studied.

The Magazine Audience Group, which was sponsored originally by Life through the Continuing Study of Magazine Audiences and later expanded into a general advisory body on magazine research for many publishers, was from its beginnings in mid-1938 especially concerned with the problem of validity. In order to eliminate invalid answers from reports of magazine readership, the committee developed a system called "Confusion Control," based on the technique used by Professor Darrell B. Lucas12 in measuring the impact of advertisements. The basic technique involved the use of advance magazines not yet published, which respondents could not possibly have seen, in order to find out the amount of false identification. At first the correction was applied to readership figures on an aggregate basis only. But beginning with Report No. 4 in 1941, a method was devised to evaluate individual replies according to the number of pages identified. The amount of confusion (false identification, either deliberate or mistaken) found was generally low, well below 10 per cent.
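The article does not give the committee's actual correction formula, but the logic behind such a confusion adjustment can be sketched in a few lines: the rate of false identification measured with an unpublished issue is used to discount the readership claimed for a real issue. The function and figures below are illustrative only.

```python
def corrected_readership(claimed_rate, confusion_rate):
    """Discount claimed readership of a real issue by the false-identification
    rate observed for an 'advance' issue no respondent could have seen.
    Both arguments are proportions of all respondents."""
    return max(claimed_rate - confusion_rate, 0.0)

# Hypothetical figures: 34% claim to have read the issue, while 6% also
# "recognize" the planted, unpublished issue.
print(corrected_readership(0.34, 0.06))  # 0.28
```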


A small study done for the Magazine Audience Group by Crossley Incorporated13 in early 1941 was set up to check on the accuracy of education reports received from respondents on regular surveys. While done on a limited scale in a few small cities only, this experiment is particularly significant in the study of validity in view of the apparently common upward educational bias of even the most carefully designed quota or area samples. Crossley's study checked each respondent's answers on the amount of education received against three different sources: later reports from other members of the family, interviews with neighbors, and actual school records where available. As expected, results showed exaggeration of reports on the part of respondents, although the exaggeration was more evident in reports of graduation from grade school, high school, or college than it was in actual attendance at the different types of schools. On the basis of this study, Crossley concluded that simple questions regarding the number of years the respondent attended school were likely to have low validity and should not be relied upon.

Government Research

The Federal Government has occasionally made various studies which bear on validity. In a brief but revealing article in 1944 Hyman14 cited three surveys done for the Office of War Information which showed distortion of the truth by from 4 to 42 per cent of respondents. From these results Hyman concluded that, at least on questions concerning behavior having a prestige character, poll results should be used with the greatest caution. One of the most significant of his findings was the fact that invalidity may exist in varying

12 Lucas, Darrell Blaine, "Rigid Techniques for Measuring the Impression Values of Specific Magazine Advertisements," Journal of Applied Psychology, Vol. 24 (1940).
13 Results of this study were never published, and the authors are indebted to Archibald M. Crossley for permission to cite them here.
14 Hyman, Herbert, "Do They Tell the Truth?", Public Opinion Quarterly, Vol. 8, No. 4 (1944), p. 557.



amounts in different population groups.

Some work related to validity was done by the armed forces during World War II, notably the methodological studies by the Bureau of Naval Personnel and the experiments in prediction made by the Research Branch of the War Department's Information and Education Division. The American Soldier,15 the impressive, recently published report of War Department research, contains a few references to the validity of individual attitudes as established by future behavior. These studies, however, are all concerned with the predictive concept of validity; there seems to have been little concern with the more vital matter of validity as representation of truth.

The most comprehensive government work on validity is that now being set up by the Bureau of the Census to be applied to the 1950 Census of Population. The Bureau has a Response Research Unit whose task it is to find out the kind and amount of error involved in reports obtained by enumerators. Various techniques are being used, including re-interviews and special statistical analyses. When the reports from this source are available, they should provide a wealth of hitherto unknown facts about the nature of the validity of census-type information.

In both governmental and non-governmental research validity problems are almost unbounded. Tests for validity are still limited by the accessibility of check data, but they range widely. In the past year the writers have had occasion to devise measures of validity of surveys on subjects ranging from anti-Semitism and election behavior to reports from hunters and fishermen in California regarding the amount of

game they bagged or fish they caught. In the latter case, as in many other types of surveys, the issues involved were both memory and honesty-that is, could anglers and hunters give us reasonably correct answers on the matter, and would they if they could? Researchers can and must use sufficient ingenuity to apply a great variety of validity or quasi-validity checks to every study design of the future.

Medical and Related Research

Medicine is generally considered as belonging to the field of the relatively exact or physical sciences, one with which social scientists have usually had little contact. But a study done in Michigan indicates that the methods of social research may soon be applied more widely in the medical field.16 The objective of the study was the validation of a new method to determine the need for medical attention among farm families. The basic technique used was a list of symptoms which should receive medical attention, information on which was obtained by regular interviewing methods from an informant (usually the housewife) for each member of her family. The information was then validated by means of actual physical examinations of the members

15 The American Soldier, Princeton University Press, 1949. Vol. 1: Adjustment During Army Life, by Samuel A. Stouffer, Edward A. Suchman, Leland C. DeVinney, Shirley A. Star, Robin M. Williams, Jr.; Vol. 2: Combat and Its Aftermath, by Samuel A. Stouffer, Arthur A. Lumsdaine, Marion Harper Lumsdaine, Robin M. Williams, Jr., M. Brewster Smith, Irving L. Janis, Shirley A. Star, Leonard S. Cottrell, Jr.
16 Hoffer, Charles R., "Medical Needs of the Rural Population in Michigan," Rural Sociology, Vol. 12 (1947).


of about one-sixth of the families. Complete agreement between the questionnaire reports and the physician's examinations was found in 8 out of 10 cases, and indicated that the determination of the medical needs of a population by asking individuals to list their symptoms was quite feasible.

Kinsey17 has given a great deal of attention to the problem of validity. Perhaps the most comprehensive of his techniques to establish validity is the comparison of reports from 231 pairs of spouses. For most of his items Kinsey found that between 80 and 99 per cent of this group of subjects gave replies that were later verified independently by their marriage partners. In addition to this type of check, the Kinsey investigators obtained a small number of re-takes to test the constancy of memory. They also noted such things as internal consistency of the case histories, reports from the skilled interviewers on falsification and cover-up, constancy of patterns in members of different segments of the population, checks by sexual partners other than spouses, comparisons between interviewers of results for similar groups, hundred per cent samples, and comparisons of reports from older and younger generations.

Kinsey found that accuracy varies considerably with different individuals. The validity of individual histories also varies with particular items and for different segments of the population. Incidence data were found to be more accurate than frequency data, and averages of social statistics such as age, education, events concerned with marriage, etc., check closely with averages obtained by direct observations. In spite of the author's warning that


the results presented in the remainder of the book are only fair approximations of fact, the careful reader will be inclined to accept the findings as having been obtained in a most scientific manner and as having a more than satisfactory degree of validity, so far as individual reports are concerned. The one point at which the Kinsey Report is vulnerable to criticism is the one at which many other studies stop-aggregate validation. In the absence of a scientifically selected sample (a requirement which might be quite impossible for such a survey to meet on a full-scale basis), the Kinsey results and background data should be validated against all possible criteria in a regular probability sample of perhaps one or two selected areas. In this way, results which are now reasonably valid for individuals and special groups could be applied to larger, more general segments of the population.18

Political Research

It is only recently that election pollers have begun to recognize the need for validating individual answers. Election results were considered the acid test, and a poll which came close to the aggregate official results of an election had indeed performed a difficult task. Since elections are secret, the problem of how to validate respondent reports is almost insurmountable, the limit

17 Kinsey, Alfred C., Wardell B. Pomeroy, and Clyde E. Martin, Sexual Behavior in the Human Male. Philadelphia: W. B. Saunders Company, 1948.
18 Parry, Hugh J., "Some Contributions of the Kinsey Report to Opinion and Attitude Research," unpublished paper presented before the American Association for the Advancement of Science, New York City, December 30, 1949.



usually being a check against precinct records after the election to see whether each respondent voted or not, with no way of telling for whom he voted. Re-interviews with respondents after election, as were made in the 1940 Erie County survey,19 serve somewhat the same purpose, with the added advantage of including the report of the candidate voted for-but since they are still verbal reports from the same subjects and not checks against outside data, they are as much a reliability measure as a validity measure, and may be subject to the same kinds of inaccuracy on voting reports as the original pre-election questions.

In December 1942 the American Institute of Public Opinion20 made a small but significant study in Ewing Township, near Trenton, New Jersey, in which 271 out of the 739 registered voters in the Seventh Precinct were interviewed and asked whether they had voted in the election a month before; their answers were then checked against precinct records. Correct answers were given by 93 per cent of the respondents. Incorrect replies included 5 per cent who said they had voted but actually had not, and 2 per cent who said they had not, but actually did. Similar results indicating high validity in some post-election studies in 1948 are not confirmed by the extensive check made in Denver six months after the election, as will be demonstrated in the following section.

The 1948 election gave rise to several post-election checks by polling agencies. Among them was the intensive panel study carried out during the campaign period in Elmira, New York, which will yield much information when it is fully analyzed. A preliminary report21

states that the respondents' post-election reports of voting corresponded with official records in 98 per cent of the cases. These respondents, however, as members of a panel, were interviewed several times in the course of the campaign, and, because of their generally cooperative attitude, could be expected to give more truthful answers than respondents on other types of surveys.

A resurvey of 317 respondents was made by the Washington Public Opinion Laboratory in the State of Washington during the first week of December 1948. High agreement with official records was reported in this study: of the 299 respondents who reported having voted on November 2, 287 were actually found to have done so.22 This situation may be rather unusual, however, in that the respondents had been interviewed before and may have been more inclined for this reason to give correct replies.

Re-interviews were also made in 1948 by the Survey Research Center of the University of Michigan on a national sample.22a No check was made against official records for these respondents, however, since this study, like most others, was intended not as a validity check but to throw light on the

19 Lazarsfeld, Paul F., Bernard Berelson, and Hazel Gaudet, The People's Choice. Second Edition, New York: Columbia University Press, 1948.
20 The authors are indebted to William S. Gillam and the AIPO for permission to present here the results of this hitherto unpublished study.
21 Dinerman, Helen, "1948 Votes in the Making-a Preview," Public Opinion Quarterly, Vol. 12, No. 4 (1948), p. 585.
22 SSRC, op. cit., pp. 368-369.
22a Ibid., pp. 373-379.


problems of voting intention and turnout. In both surveys it is interesting to note that the percentage reported having voted is higher than in the population at large.

Another check was carried out in New Jersey following the 1948 election by Carroll S. Moore, Jr., of the Trenton Times Poll.22b He did not re-interview his respondents, but checked their voting through precinct records. He found that 95 per cent of those who had intended to vote actually did so; but that 12 per cent of respondents who said they were registered and eligible to vote were in fact not registered at all.

From the findings of the various studies reported above, and from others not included because of space considerations, it can be seen that the validity of individual replies can never be taken for granted, even when aggregate validity is very high. On the other hand, as is shown in studies such as the Kinsey Report and the post-election checks of voting behavior, the fact that individual replies have a great deal of validity does not automatically insure that the over-all results will therefore be valid. Before survey results can be relied on, they must be subjected to both kinds of tests-do the aggregate results check against important known data? and if so, are the individual reports sufficiently truthful? Either kind of check without the other may be misleading.
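A small numerical illustration, with invented figures, of how canceling errors can leave an aggregate check looking satisfactory while a sizable share of individual reports is wrong:

```python
# Suppose 1,000 respondents, of whom 600 actually voted.
actual_voters = 600
exaggerators = 60       # said they voted, but did not
under_reporters = 60    # said they did not vote, but did

reported_voters = actual_voters + exaggerators - under_reporters
invalid_reports = exaggerators + under_reporters

print(reported_voters)          # 600 -- the aggregate figure is untouched
print(invalid_reports / 1000)   # 0.12 -- yet 12 per cent of reports are invalid
```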

C. DENVER VALIDITY STUDY

Plan of Study

In order to make a systematic attack on the previously discussed problems of validity, a detailed study was planned and carried out in 1949 at the University of Denver's Opinion Research Center, of which Don Cahalan was Director. It was made possible through generous grants-in-aid from the Rockefeller Foundation, the National Opinion Research Center (through funds allocated from the Interviewer-Effect project sponsored by the Social Science Research Council), and the University of Denver, and was also assisted by a contribution from Elmo Roper. In its inception and planning the study benefited immeasurably from the advice and assistance of a formidable number of social scientists, whose aid is gratefully acknowledged.23

22b The authors are indebted to Mr. Moore for making available the results of this unpublished study.
23 In addition to ORC staff members, generous assistance was given by Herbert Hyman of NORC, whose earlier research inspired much of this study; by Clyde Hart and Paul Sheatsley of NORC; and by Frederick Stephan of the Committee on Measurement of Opinion, Attitudes, and Consumer Wants. Others who have made contributions include: Fitzhugh L. Carmichael, Bureau of Business and Social Research, University of Denver; Archibald M. Crossley, Crossley Inc.; Lawrence E. Dameron, Department of Psychology, University of Denver; W. Edwards Deming, Bureau of the Budget; Leland DeVinney, Rockefeller Foundation; George Gallup, American Institute of Public Opinion; Donald Glad, Department of Psychology, University of Denver; Charles Y. Glock, Bureau of Applied Social Research, Columbia University; Morris Hansen, Bureau of the Census; Paul Lazarsfeld, Bureau of Applied Social Research, Columbia University; Dean Manheimer, Bureau of Applied Social Research, Columbia University; William McPhee, Research Services, Inc., Denver; Lawrence W. Miller, Department of Psychology, University of Denver; Robert and Ann Neel, Department of Psychology, University of Denver; Elmo Roper; Samuel A. Stouffer, Department of Social Relations, Harvard University; Coleman Woodbury, Urban Redevelopment



The study was designed to explore three areas: a substantive area of the determinants and concomitants of community satisfaction, and the methodological implications of interviewer effect and of validity. The substantive area will not be covered here, except as it overlaps the methodological areas. This article will limit itself to a report of the design and over-all findings of the validity portion of the survey.

Items of Investigation

The subjects chosen for the check on validity of response were generally of a sort common to survey questionnaires. Wording of the questions was based on forms commonly used by other opinion research organizations. To some degree, the subjects used for investigation were supplied by the logic of necessity; that is, we had to limit our choices to items which were significant and which also could be checked against official records. The subjects finally selected for checking were:

(1) Respondent's registration and voting in the six city-wide Denver elections held between 1944 and 1948. Official precinct lists of voters are in the public domain, so each respondent's reported voting history could be checked against them. In the case of the primary election in 1948, we could also check on party affiliation.

(2) Personal contribution during the fall 1948 Community Chest drive.

(3) Possession of a valid Denver Public Library card in respondent's name.

(4) Possession of a valid Colorado driver's license.

(5) Ownership of an automobile by respondent or spouse, and make and year of car.

(6) Respondent's age. This was checked three ways-against voting registration records, against driver's license reports, and finally, for internal consistency, against another question on the ballot.

(7) Ownership or rental of respondent's place of residence.

(8) Telephone in respondent's home.

It can be seen that the items chosen evoke varying amounts of prestige, and varying degrees of potential distortion as caused by social pressure, ease of verification, memory factors, and the like. Perhaps the items of greatest practical interest and importance are those dealing with elections, since past performance (or the respondent's version of past performance) has often been used, deliberately or unconsciously, in an attempt to predict behavior in the future. Other items used are common ones, either in opinion and attitude research or in the more specialized field of market research. Cross-analyses are frequently made on the basis of responses to these items, and conclusions are drawn from the attitudes or past behavior of these groups; it is therefore important to know to what extent such breakdowns are based on valid information.


Study, Chicago. The authors are also indebted to Hadley Cantril and Elizabeth Deyo of the Office of Public Opinion Research, Princeton University, for making available machine equipment for the final analyses.


The results presented here, of course, will not apply automatically to any survey done on any population, although many of the findings are of importance to research in general. Their significance and application must be studied in the light of the conditions under which they were obtained. For this reason it is necessary to present here a brief but basic outline of the sample used in the Denver survey and the techniques employed to obtain the information.

The Sample

Most fortunately, while the study was still in the planning stage, a new edition of the City Directory of Denver residents was issued. A series of informal checks on the Directory information indicated that it was sufficiently accurate and up-to-date to take the place of a costly enumeration on our part and that it could be used as the universe for this study. While there may have been some small distortions in representativeness in the Directory, they would not materially affect our results, since our purpose was to obtain a random list of individuals for the validity and interviewer effect tests rather than to make any numerical estimates. Using a probability method of systematic selection, 1,349 names were taken from the Directory (discarding, of course, such unusable listings as business places, out-of-town addresses, and duplications of names). These 1,349 names were distributed to the 45 interviewers in assignments of 30 (one assignment was 29 names). Interviewers were allowed to make no substitutions, and were required to make at least four calls to reach their respondents.
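The article gives only the outline of the selection procedure; a minimal sketch of systematic selection with a random start is shown below, assuming a hypothetical directory list, screening rule, and sampling interval, none of which are given in the original study.

```python
import random

def systematic_sample(listings, n_wanted, is_usable):
    """Take every k-th usable listing after a random start within the first interval."""
    usable = [entry for entry in listings if is_usable(entry)]
    k = len(usable) // n_wanted          # sampling interval
    start = random.randrange(k)          # random start
    return usable[start::k][:n_wanted]

# Illustrative only: the directory contents and screening rule are invented.
directory = [f"Resident {i}, Denver" for i in range(40_000)]
sample = systematic_sample(directory, 1_349, lambda entry: entry.startswith("Resident"))
print(len(sample))  # 1349
```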


A total of 920 usable interviews was finally obtained. By using the Directory as a sampling universe, we have available, as a by-product, considerable data on the characteristics of respondents who were not reached; analysis of these data will be made available later.

Interviewer Selection and Training

The field work was begun on April 19, 1949, and continued through May. The 45 interviewers used came from two groups: experienced professional interviewers on the staffs of national and local research organizations, and graduate and undergraduate students in opinion research and social science at the University of Denver. Each interviewer was given intensive personal training in two or more special sessions, and was assigned to a special supervisor for the duration of the field work. The result was that the interviewing staff, when it went into the field, was presumably somewhat above average in its ability and training. This point is stressed only to show that little of the invalidity of response found could have been caused by an unduly amateur or inefficient staff of interviewers. Further evidence of the quality of the field work was given by a post-interviewing check by the office staff, both for respondents interviewed (certain items were checked against Directory information and by telephone) and for those not reached (using Post Office records and other methods). Eight ballots were discarded as invalid, chiefly because of mistaken identity, a constant problem with name-and-address samples. Thus, it can be assumed on this survey, in contrast to others where such rigid control and checking of field work are not feasible, that a



minimum of the invalidity uncovered is due to dishonesty or incompetence on the part of interviewers.

Another aspect of the sample design assured that differential validity among various groups could not be due to certain interviewers' interviewing more of certain types of people. The city was divided into five sectors, as equivalent as possible with respect to several factors, and within each sector respondents were stratified by sex and geographical location and assigned at random to the nine interviewers. Thus, careful control was exercised to see that each interviewer's assignment was as nearly like every other assignment as possible. Furthermore, interviewers were allocated to the various sectors of the city so as to equalize as far as possible the effects of such factors as interviewer's sex, experience, education, age, and social introversion-extraversion. The importance of this technique will be brought out in an article in preparation dealing with the relationship of the interviewer to the validity of survey results.

The interviewers, it should be added, were given no indication of the real purpose of the study nor were they told that there would be a check on the respondents (although, to improve efficiency, they were, as usual, told that there might be checks on their own work). As far as they were concerned, it was a normal survey covering community satisfaction. Later checks indicated that none of the interviewers became aware of the justifiable trick being played on them.
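A rough sketch of the sector-by-sector assignment described above, assuming a simplified stratification by sex only; the actual design also stratified by geographical location and balanced interviewer characteristics across sectors in ways not detailed here.

```python
import random
from collections import defaultdict

def assign_within_sector(respondents, interviewers, stratum_of):
    """Deal the respondents of one sector to its interviewers at random,
    stratum by stratum, so that every assignment has a similar composition."""
    strata = defaultdict(list)
    for person in respondents:
        strata[stratum_of(person)].append(person)
    assignments = defaultdict(list)
    for members in strata.values():
        random.shuffle(members)
        for i, person in enumerate(members):
            assignments[interviewers[i % len(interviewers)]].append(person)
    return assignments

# Hypothetical sector: 270 respondents dealt to 9 interviewers, 30 apiece.
respondents = [{"name": f"R{i}", "sex": "M" if i % 2 else "F"} for i in range(270)]
plan = assign_within_sector(respondents,
                            [f"Interviewer {j}" for j in range(1, 10)],
                            lambda person: person["sex"])
print(sorted(len(v) for v in plan.values()))  # nine assignments of 30 each
```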

Checking Validity Information

To ascertain the validity of information obtained by the interviewers, a

long and tedious, name-by-name, response-by-response check was carried out. Each respondent's answers to the questions cited earlier were compared with records of the City and County of Denver, the Denver Community Chest, the Denver Public Library, and the Mountain States Telephone Company. All checking, except on Community Chest records, was done by Center personnel. In this investigation the Center received the whole-hearted and efficient cooperation of all agencies concerned.

It is not necessary to go into the mechanical details of the validity check. Such factors as marriage and consequent change of name by female respondents between 1943 and the present, changes of address during the period, and the like all contributed to our difficulties. The official records of the City and County of Denver appear to have been in a state of much higher order and accuracy than many researchers have found in other areas, but even so occasional difficulties crept in. While it was possible to solve the great majority of problems by rechecking, digging, leg-work, and phone-work, it must be realized that some error in the base criteria was inevitable and is reflected in the measures of validity obtained.
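The response categories used in the tables that follow imply a simple classification rule for each checked item. The sketch below assumes hypothetical field names and ignores the name-matching and record-error problems just described; it is an illustration of the logic, not the Center's actual procedure.

```python
def classify_report(reported, on_record):
    """Compare one yes/no report (voted, holds a library card, etc.)
    with the corresponding official record."""
    if reported is None:
        return "confused"          # don't remember / no answer
    if reported and not on_record:
        return "exaggerated"       # claimed, but the record shows otherwise
    if not reported and on_record:
        return "under-reported"    # denied, but the record shows otherwise
    return "correct"

# Hypothetical respondent checked on two elections.
reports = {"voted_1948": True, "voted_1946": False}
records = {"voted_1948": True, "voted_1946": True}
for item, answer in reports.items():
    print(item, classify_report(answer, records[item]))
# voted_1948 correct
# voted_1946 under-reported
```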

The Results

The level of invalidity on the various items or combinations of items checked ran from nearly zero up to almost half of the responses received. As the following tables show, invalidity often follows social pressures. More respondents exaggerated their participation in elections than under-reported it. The same tendency is evident in the reports


of possession of library cards and driver's licenses. Only the over-all totals are presented here; later articles will explore the variations in validity by respondent characteristics, conditions of the interview, and other factors.

Elections. Since the largest amount of invalidity in the data concerns reports on the various Denver elections, we shall discuss them first. Results are shown in Table 1.24


24 Question 14: "Here are some questions about registration and voting in Denver. Have you been registered to vote in Denver at any time since 1943?" Question 14A (IF "NO" OR "DON'T KNOW"): "Have you voted in any election in Denver since 1943, either in person or by mailing an absentee ballot back to Denver?" Question 15 (UNLESS "NO" TO 14 OR 14A): "We know a lot of people aren't able to vote in every election. Do you remember for certain whether or not you voted in any of these elections: First . . ." (ELECTIONS READ OFF, ONE AT A TIME).

TABLE 1
VALIDITY OF REGISTRATION AND VOTING REPORTS
100% = 920 Cases

A. Whether registered or voted in Denver since 1943: Correct reports:

Not registered since 1943

Voted or registered since 1943

Exaggerated registration or voting

Under-reported registration or voting

Confused (Don't remember, No answer)

B. Voting reports on combination of six elections:
Correct in all statements
Exaggerated (voted in fewer than reported)
Under-reported (voted in more than reported)
Confused (voted in same number but different elections, or Don't remember or No answer to one or more elections)

C. Voting reports on six specific elections:

(1) November 1948 Presidential election: Correct reports:

Did not vote

Voted

Exaggerated (said voted, but did not)

Under-reported (said did not vote, but did)

Confused (Don't remember, No answer)

*Less than 0.5 per cent.


TABLE 1 (continued)
100% = 920 Cases

(2) September 1948 primary election:
Correct reports:
Did not vote
Voted in Republican Primary
Voted in Democratic Primary
Exaggerated (said voted, but did not)
Under-reported (said did not vote, but did)
Confused (Don't remember, No answer, wrong answer on party)

(3) November 1947 city charter election: Correct reports:

Did not vote

Voted

Exaggerated (said voted, but did not)

Under-reported (said did not vote, but did)

Confused (Don't remember, No answer)

(4) May 1947 Mayoralty election: Correct reports:

Did not vote

Voted

Exaggerated (said voted, but did not)

Under-reported (said did not vote, but did)

Confused (Don't remember, No answer)

(5) November 1946 Congressional election: Correct reports:

Did not vote

Voted

Exaggerated (said voted, but did not)

Under-reported (said did not vote, but did)

Confused (Don't remember, No answer)

(6) November 1944 Presidential election: Correct reports:

Did not vote

Voted



Exaggerated (said voted, but did not)
Under-reported (said did not vote, but did)
Confused (Don't remember, No answer)

The cumulative amount of invalidity for the six elections is somewhat startling. While four-fifths of the respondents gave valid answers as to their registration during the period, only a third gave entirely correct answers to questions regarding all six elections. And "correct" in this check is only in terms of whether or not the respondent actually voted; if a check could be made on the truthfulness of the reports given on candidates voted for, an even larger number of errors might be uncovered. On the questions regarding specific elections the amount of invalidity varied from a seventh to a fourth of all responses. Clearly, on the basis of these results, any of these questions would have little value as a means for checking the representativeness of a sample, for drawing assumptions on the basis of voting groups, and particularly for using this reported past voting behavior as a means of indicating future voting behavior.25

The 1948 Presidential election was both nearest in time and highest in importance to respondents in general. Thus the level of invalidity here was somewhat lower than on the other elections. However, to some extent, the lower level of invalidity is artifactual: where invalidity is basically in the direction of exaggeration and where a

higher proportion vote than in most elections, there is simply a smaller group of persons who are likely to give incorrect responses.
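A back-of-the-envelope calculation helps show why per-election validity in the range cited above still leaves only about a third of respondents correct on all six items. It assumes, purely for illustration, that errors were independent from election to election, which the data do not claim.

```python
# Per-election validity ran from roughly 75 to 86 per cent
# (invalidity from about a fourth down to about a seventh of responses).
low, high = 0.75, 0.86

# Under the simplifying assumption of independent errors, the share of
# respondents correct on all six elections would fall between:
print(round(low ** 6, 2), round(high ** 6, 2))   # 0.18 0.4
```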

Community Chest Contribution. Table 2 shows that the query concerning personal contributions to the 1948 Community Chest drive produced a relatively large amount of invalidity.26 It can be seen that about a third of the respondents said that they did not contribute to the Chest; in these cases no further check was made, on the pragmatic but probably reliable assumption that few if any respondents would deny contributions they had made.

25 "In five of the six elections, using the respondent's unverified statement to classify him as a 'voter' or 'non-voter' would result in misclassifying from 22 to 30 per cent of the respondents." Don Cahalan, "Validity of Behavior Reports in Opinion Surveys," unpublished paper read before the American Statistical Association, New York City, December 30, 1949. Nevertheless, in terms of aggregate validation, the uncorrected sample results showed that Truman had received 53 per cent of the major party vote in Denver in 1948; he actually received 54 per cent! Evidently, even on the most recent election, a set of canceling biases was in operation. 26 Question 25: "Did you yourself happen to contribute or pledge any money to the Community Chest during its campaign last fall?"


TABLE 2

VALIDITY OF REPORTS ON COMMUNITY CHEST CONTRIBUTIONS
100% = 920 Cases

Reported not giving (statements assumed to be correct, but not checked against records)
Reported giving, and did give
Reported giving, and might have given through uncheckable source
Reported giving, but did not give
Don't remember, No answer

About a fourth correctly said they had given, either at work or at home. Slightly over a third said they had given but were not listed as donors in the Community Chest files. About a tenth of the responses could not be classified as valid or invalid-though the presumption is toward invalidity, since the Chest records were in very good shape and, except for certain collective donations, included a very complete list of donors. Thus it is evident that about four out of every ten responses here were invalid. Undoubtedly social pressures and a belief that the responses would not be checked were the major factors behind the high level of invalidity. It should be noted, however, that the

question created considerable ambiguity. Despite the stress on "you yourself," some respondents tended to answer in terms of pledges by other members of the family. Whatever the reason for invalidity, it can safely be said that this sort of question, whether it concerns the Community Chest or some other charitable organization, is not very helpful for survey use. Moreover, the issue tested here was only the fact of giving; if it had been necessary to find out the amounts of contributions made, even more invalidity could have been expected.

Library Card. As Table 3 shows, there was a slight tendency for respondents to claim possession of a currently valid

TABLE 3
VALIDITY OF REPORTS ON LIBRARY CARDS
100% = 920 Cases

Correct reports:
Do not have card
Have card
Exaggerated (reported having card, none on file)
Under-reported (reported no card, one on file)
Don't remember, No answer



TABLE 4
VALIDITY OF REPORTS ON DRIVER'S LICENSE AND AUTOMOBILE OWNERSHIP
100% = 920 Cases

A. Possession of Driver's License:
Correct reports:
Do not have license 44%
Have license 44
Exaggerated (reported license, but none on file) 10
Under-reported (reported no license, one on file) 2
Don't know, No answer (most had licenses on file) *

B. Possession of Automobile, Year and Make:
Reported no car owned (statements assumed to be correct, but not checked against records)
Correct on ownership, make, and year
Correct on ownership and make, incorrect on year
Correct on ownership and year, incorrect on make
Correct on ownership, incorrect on make and year
Incorrect on ownership
No answer (more than half of these actually had cars registered)

*Less than 0.5 per cent.

library card, when no card was actually on file.27 About a tenth of the responses were invalid in this respect, and a negligible proportion were invalid in the direction of under-statement-probably infrequent users unaware that their cards remain in force for three years.

Driver's License and Car Ownership. Again, as can be seen in Table 4, about a tenth of the respondents claimed possession of a driver's license when actually they did not have one. Less invalidity was found in the items concerning ownership, by respondent or spouse, of an automobile, and the make and year of such car.28

While the number of correct answers on such questions is gratifying in comparison to the answers on other types of questions, an error of as little as 3 per cent in the proportion of families own-

27 Question 21: "Do you have a library card for the Denver Public Library in your own name?"
28 Question 22: "Do you have a Colorado

driver's license that is still good?" Question 23: "Do you happen to own an automobile at the present time? (IF "YES") Is it registered in your name alone, or in your (wife's) (husband's) name?" Question 23A (IF "YES" TO 23): "Does the car have Colorado plates or plates from some other state?" Question 23B (IF "YES" TO 23): "What year and make of car is it?"



TABLE 5

A. Consistency check by year of birth: 100%=886 Cases Age and year of birth consistent within a year Reported age more than year younger than age by year of birth Reported age more than year older than age by year of birth

I3. Check by driver's license records: 1oo%=411 Cases Reported age within one year of age on license record Reported age more than year younger than age on record Reported age more than year older than age on record

C. Check by election registration records (Men only):29 1oo%=zg7 Cases Reported age within one year of age on registration record Reported age more than year younger than age on record Reported age more than year older than age on record

ing cars might be quite serious for some purposes, such as estimating the tire needs of the country, since it is over and above any error that might be expected from sampling. If it were proved that the entire 3 per cent were actually incorrect respondent reports, and not omissions in the official files or various other types of error, a survey with direct need for valid figures on car ownership would have to examine this problem carefully. In the Denver study, a tendency was also noticed to report the car owned as newer than it actually was-a fact which might also require attention in a specialized survey.

Respondent's Age. Results on the various age checks showed a generally satisfactory level of validity, as indicated in Table 5.30 The correlation with age as reported on the traditional age question was highest for the information on year of birth obtained from the other end of 29 1n former years women in Colorado were not required to give their exact ages when registering to vote, only to swear that they were over 21. Consequently the registration check on women's ages is omitted here because the information is not sufficiently precise. 30Question 10: "May I ask your age?" Question 35: "In what year were you born?"



TABLE 6
VALIDITY OF REPORTS ON HOME OWNERSHIP AND TELEPHONE

A. Home Ownership:
100% = 919 Cases
Correct reports: 96%
Home owned 53%
Home rented 43
Probably exaggerated ownership (place owned by someone of a different name) 3
Probably under-reported ownership (place owned by someone of same family name) 1

B. Telephone:
100% = 918 Cases
Correct reports:
Telephone
No telephone
Exaggerated (reported telephone, but none in family name at that address)
Under-reported (reported no telephone, but one in family name at that address)

the ballot. This result was to be expected, since the check was essentially more one of reliability than validity. The device was suggested by the Bureau of Applied Social Research of Columbia University, and would be more useful on successive panel studies than on a single questionnaire as here. For those respondents who did not have drivers' licenses or were not registered to vote in Denver, it was, of course, not possible to check the age information against such records. The fact that those who were so checked, however, appeared more accurate when compared with drivers' license records than with registration records may mean one of several things-that the registration records are less accurate than the license records, that some re-

spondents are motivated to give less valid reports to registration officers, or that the people for whom the various checks were possible differ in their tendencies to give invalid answers to the official reporters and to interviewers. These differences emphasize the point that it is not enough merely to know that invalidity exists and the extent of it; information is also needed on its sources and on means to distinguish which of several answers is the valid one and which the invalid.
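A minimal sketch of the internal-consistency device mentioned above, assuming a hypothetical survey year and the one-year tolerance used in Table 5; it is not the Center's actual checking routine.

```python
def age_consistent(reported_age, birth_year, survey_year=1949):
    """Does the reported age agree, within a year, with the age implied by
    the year of birth given at the other end of the ballot?"""
    implied_age = survey_year - birth_year
    return abs(reported_age - implied_age) <= 1

print(age_consistent(34, 1914))  # True  (implied age 35, within a year)
print(age_consistent(30, 1914))  # False (understated by about five years)
```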

Home Ownership and Telephone.31 These factors, which are commonly

31 Question 30: "Do you or your family rent, or own, the place where you live?" Question 33: "Is there a telephone in your home in your family's name?"



used for breakdown and checking purposes in many types of surveys, were found to have a high degree of validity when checked against city property records and telephone company listings. Results of this check are given in Table 6.

CONCLUSIONS

The Denver study disclosed amounts of invalidity ranging from a twentieth to nearly a half of the responses received on various types of factual questions. While other situations or areas may show more or less validity depending on circumstances, the survey results demonstrate clearly the wide range of invalidity to be found in the answers to a number of factual items of types often used in survey research. They further underline the need for caution in accepting so-called "factual information" at face value; even census-type data must be considered suspect. Because of the special controls exercised in the design of the survey and the careful training and supervision of interviewers, it is believed that the invalidity found here represents close to a

minimum, and that national surveys which cannot be so rigidly controlled should expect to encounter even more on many types of items. Except on certain more or less innocuous items, the range of invalidity is sufficient to cause worry, and indicates a great need for further research on the truthfulness of respondents' statements of fact.

Nevertheless, the reader should not infer from these findings that research in the social sciences is relatively hopeless. He need not feel that truth is unascertainable by pragmatic methods of experimental science, and that he had better turn to Yoga or Neo-Thomism. For invalidity, in the final analysis, is not inevitable. It has causes which can be found in the questionnaire, in the respondent, in the interviewer, and above all in the interpretation of data. It varies by subject and among subgroups. Yet it can be measured and analyzed. Once this is done, it is subject to certain pragmatic checks and controls. Succeeding articles in this series will attempt to examine these phases of the problem, and will suggest certain practical remedies.

