WANTED: RULES FOR WORDING STRUCTURED QUESTIONNAIRES

BY ELISABETH NOELLE-NEUMANN

The influence of question wording and questionnaire construction is probably underestimated in present-day market and opinion research. In this article, the close dependency of survey results on details of questionnaire wording is demonstrated by the outcome of a large number of split-ballot field experiments.

Elisabeth Noelle-Neumann is founder (1947) and director of the Institut für Demoskopie Allensbach, the first German institute for public opinion research. She is also Professor of Journalism and director of the Institute for Public Communication at the University of Mainz.
THE PURPOSE of this paper is twofold: First, it sets out to attack the assumption that the structured questionnaire is a sturdy measuring instrument, and attempts to demonstrate that the results are highly dependent on details of questionnaire construction and wording. Second, a number of typical questionnaire designs that are liable to affect results are critically dealt with, and certain rules are deduced.

The essential condition in an experiment designed for the purpose of testing questionnaire effects is that only one explanation must be possible for significant differences in the results of various survey segments, and care must be taken that all other explanations are made impossible. To satisfy this condition, the split-ballot procedure was used for the experiments. Surveys were randomly divided into two or more segments running simultaneously under identical conditions. The only variation was the form of the questionnaire used in each segment. Sometimes these differed in one single word; sometimes in larger groups of questions; sometimes even in whole elements of design, e.g. in the position of questions, the length of a series of questions, etc. Methodologically, these are field experiments, i.e. experiments carried out in natural surroundings, as opposed to laboratory experiments. The subjects react without knowing that they are taking part in an experiment. The design of a split-ballot corresponds, in every respect, to the conditions maintained in a controlled experiment, as first developed in natural science.1 The only possible explanation of the observed differences is the questionnaire variation, whose causal significance can therefore be inferred.

1 The split-ballot as an instrument suitable for measuring effects of question wording was probably first described by Donald Rugg and Hadley Cantril, "The Wording of Questions," in H. Cantril, ed., Gauging Public Opinion, Princeton, Princeton University Press, 1944. Stanley L. Payne, in The Art of Asking Questions, Princeton, Princeton University Press, 1951, expressly states that the split-ballot is a controlled experiment. More detailed descriptions of the controlled field experiment in public opinion research are to be found in Donald T. Campbell, "Factors Relevant to the Validity of Experiments in Social Settings," Psychological Bulletin, Vol. 54, 1957, pp. 297-312; E. Noelle-Neumann, "On the Methodological Progress in Survey Research," Allensbacher Schrift No. 7, Allensbach and Bonn, 1962; Umfragen in der Massengesellschaft, Reinbek bei Hamburg, 1963; 4th ed. 1968; "Die Rolle des Experiments in der Publizistikwissenschaft," Publizistik, 1965, No. 3; Winfried Schulz, Kausalität und Experiment in den Sozialwissenschaften, Mainz, Verlag Hase & Koehler, 1969.
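The random division into simultaneously running segments described above can be sketched in a few lines of code. The following Python fragment is purely illustrative (the function name and sample size are mine, not part of the original study): it shows why a significant difference between the segments can only be attributed to the questionnaire variation.

```python
import random

def split_ballot(respondent_ids, n_segments=2, seed=42):
    """Randomly divide one sample into segments for a split-ballot.

    Every segment receives a different questionnaire form; because the
    assignment is random and the surveys run simultaneously under identical
    conditions, a significant difference between segments can only be
    explained by the questionnaire variation itself.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    ids = list(respondent_ids)
    rng.shuffle(ids)
    return [ids[i::n_segments] for i in range(n_segments)]

# Divide a sample of 2,000 respondents into two segments of 1,000 each.
segment_a, segment_b = split_ballot(range(2000))
print(len(segment_a), len(segment_b))  # 1000 1000
```

Each respondent falls into exactly one segment, and the segments are of equal size, mirroring the 1,000-interview halves used in most of the experiments reported below.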
WHEN INFORMATION IS ASKED FOR, PRECODES INFLUENCE RESULTS
Is it possible to measure the amount of particular knowledge, the dissemination of particular information, if the questionnaire contains the information? In the most common form of this, the information is written next to the question, as one of the possible replies, e.g. "Do you know who is the president of the United States?" Possible replies: "Nixon," "other names," "don't know." This method saves a lot of coding and speeds up punching. However, I have never heard of a split-ballot experiment where a questionnaire with precoded information did not reflect a higher level of information than a parallel survey with a questionnaire not containing the correct reply.2

For example: a representative cross-section of the German population was shown a quality symbol for textiles, and asked: "Have you ever seen this sign before?"3 In case of "yes," the follow-up question was: "Could you tell me what this sign is called?" Questionnaires for group A (1,000 interviews) contained the question with the correct, or almost correct, answers precoded. The questionnaire for group B (1,000 interviews) contained an open-ended question without precoded answers, merely with a dotted line on which to enter the respondent's reply. Of both groups A and B, 76 per cent had seen the symbol. Its correct name, however, was stated by 39 per cent of group A, and by 31 per cent of group B. Difference in the level of information: 8 per cent (level of significance, 0.1%).

The term "Esperanto" was correctly defined by 35 per cent of the respondents if the questionnaire did not contain the correct answer.

2 It seems quite significant that the arguments with which experts recommend the use of open-ended questions do not include that of obtaining realistic figures when measuring information. Cf. B. Frisbie and S. Sudman, "The Use of Computers in Coding Free Responses," Public Opinion Quarterly, Vol. 32, 1968, p. 221.

3 Allensbach Archives, IfD Survey No. 2047 (1968). All other references to sources, if not otherwise stated, relate to the Allensbach Archives and to representative cross-sections of the population (usually 16 years and over) of West Germany and West Berlin. The numbers of interviews have been rounded off.
However, if it did contain the precoded answer, correct replies increased to 52 per cent. Difference in the level of information: 17 per cent (level of significance, 0.1%). It is impossible to estimate in advance how large these differences are going to be.4

The same situation will arise if the correct precoded answer is not printed next to the question but, instead, is given in one of the subsequent questions, e.g. "The U.S. president's name is Nixon. Now, I should like to ask a few questions about him . . ." It is particularly irritating that even subsequent information in the questionnaire influences replies to questions asked previously, because frequently the researcher is confronted with the task of measuring information of varying quality, such as spontaneous knowledge without memory aid and knowledge with slight, medium, or strong aid. The obvious pattern to be adopted for this type of exploration would seem to be to ask for the information required, step by step, giving a stronger aid at each successive step. Finally, the circle of persons who are informed in the broadest sense will be determined. The following example illustrates the bias to which this method can lead.5

The survey dealt with a label for pickled cucumbers, tubed mustard, etc. One version of the questionnaire (2,000 interviews) was designed to measure the respondent's familiarity with the trademark in six steps: at each step a label bearing slightly more information was shown to the respondent. The first label showed only one insignificant detail of the trademark. (Question: "This you will always find on a certain branded article. Of course, some of it is missing, but could you nevertheless guess to which brand, to which branded article, this red field belongs?") Then, at each step, the label shown was a little nearer to complete, until, at the final stage, it was complete with the brand's name.
Another form of questionnaire was used in six simultaneously conducted surveys, some with 500 and others with 1,000 interviews. Here, each of the six more or less complete labels was tested independently. ("Could you guess to which brand . . . this belongs?") Table 1 shows the percentage of respondents who classified the labels correctly, differentiated according to the way the answers were elicited: either in one interview, presenting the six label designs successively (Version A), or in six polls, each carried out with a different sample, and by presenting the labels singly to each of the six groups (Version B). How is it possible that, in a personal, structured interview, replies given early in the interview can be influenced by questions, information, illustrations, etc. that appear later in the interview?
4 IfD Survey No. 1072 (1962).
5 IfD Surveys Nos. 1044, 1045, 1047 (1960).
TABLE 1
FAMILIARITY WITH, AND CORRECT CLASSIFICATION OF, A BRAND LABEL IN TWO DIFFERENT QUESTIONNAIRE FORMS
(Columns: Label 1-6; Version A; Version B. a Less than one per cent. b Level of significance, 0.1%. The individual percentage entries were lost in reproduction.)
One would tend to blame the interviewers. Indeed, this nuisance probably belongs to the sort of difficulty we call "influence of the experimenter's attitude and expectations." Anyway, rather than try to alter the interviewers' conduct and attitude, it might be more expedient to take these causes of distortion into consideration, and to avoid them by conducting several concurrent surveys instead of a single one, if information of varying quality is to be measured.
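The significance levels quoted in this section can be reproduced with a standard two-proportion z-test on the split-ballot results. The sketch below is an illustration of that method, not part of the original analysis; the function name is mine, and the figures are the Esperanto example (35 per cent correct without the precoded answer, 52 per cent with it, roughly 1,000 interviews per segment).

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error
    z = (p2 - p1) / se
    # two-sided p-value from the normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Esperanto example: 35% correct without the precoded answer,
# 52% correct with it, about 1,000 interviews in each segment.
z, p = two_proportion_z(0.35, 1000, 0.52, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")  # p is far below 0.001, i.e. the 0.1% level
```

The 17-point difference yields a z-value near 7.7, so the reported 0.1 per cent significance level is comfortably met.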
The next problem is questionnaire monotony. There appears to be a general belief that avoidance of monotony in questionnaires is merely a matter of being kind to respondents; of unnecessary kindheartedness, to be more precise. It is surprising indeed how many researchers expect that the degree of monotony of an interview will not influence its results. It will influence the results because, after all, respondents are not machines but human beings, motivated in their attitudes and behavior by their likes and dislikes.

As an example I quote an experiment6 which was carried out with newspapers, a type of publication which is known to be comparatively resistant to questionnaire influence and to respondents' poor memory, as far as readership figures are concerned. This is so because daily papers are read extremely regularly. With, for example, monthly periodicals, the situation is different.7

6 IfD Surveys Nos. 2022, 2023, 2024 (1966/67).

7 In this respect, Belson, Schyberger, and the Allensbach Media Analysis come to identical results. Cf. William A. Belson, Studies in Readership, London, Business Publications Ltd., 1962; Bo Walterson Schyberger, Methods of Readership Research, Lund, 1964; Allensbacher Werbeträger-Analyse 1964, pp. 146 ff.; E. Noelle-Neumann, "Zeitschriftenleser 1964," in Die Anzeige, No. 17, 1964.
In one type of questionnaire (3,000 interviews), a series of questions was repeated five times. Each time, the respondent who had claimed to have read a particular paper within the past three months was asked when he had last read this paper. In a concurrent survey (3,000 interviews), the questionnaire was constructed in such a manner that the respondent could become active, thus avoiding monotony. At the start, title cards were presented to determine which of 16 national newspapers the respondent had read within the past three months. Then, four strips of paper bearing the following descriptions were put on the table: "read it yesterday," "day before yesterday," "within the last 7 days," "some time before then." The respondent answered the question as to when he had last read the various papers by placing the title cards on the appropriate strips.

Readership figures obtained by the sort of interview which stimulates interest are 10 to 15 per cent higher (relative to the total readership), depending on the type of newspaper. Even a subsequent question, identical in both forms of questionnaire, dealing with the reading of local newspapers produced 5 per cent more readers per day for local newspapers when it was included in the more animated questionnaire.

FOLLOW-UP QUESTIONS REDUCE REPLIES TO FIRST QUESTION AND INTERFERE WITH TREND MEASUREMENT
These findings lead to the next problem, which is closely connected. Many monotonous questionnaires are characterized by a large number of follow-up questions relating, for instance, to every periodical read by the respondent. It is a fact that in structured interviews, follow-up questions after certain replies will almost invariably reduce the number of these replies. The following examples will show the order of magnitude of this tendency to avoid follow-up questions.

The following question was put to a cross-section of the general population (2,000 interviews):8 "What is your opinion: has West German foreign policy been successful in recent years, or do you think that the position of Germany has deteriorated?" In Questionnaire A (1,000 interviews), no follow-up question was asked. In Questionnaire B (1,000 interviews) respondents who answered "deteriorated" were asked a follow-up question: "What have you mainly in mind?" The results are shown in Table 2.

The same question was asked nearly two years later, this time with a follow-up question not only when the respondent replied "deteriorated," but also if he said "successful" (A, no follow-up, and B, follow-up, 1,000 interviews each).9 Again, follow-up questions led to the same symptom: response categories where an additional explanation is required go down, at the same time significantly increasing the "no opinion" category.10

8 IfD Survey No. 1017 (1958).

9 IfD Survey No. 1038 (1959).
TABLE 2

Opinion on German Situation    Version A    Version B
Successful                        21%          19%
Unchanged                         34           32
Deteriorated                      21           16 a
No opinion                        24           33 a
a Level of significance, 0.1%.
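Whether the response shifts in Table 2 could plausibly arise by chance can be checked with a chi-square test of homogeneity across the two response distributions. The following sketch is illustrative only (the counts are reconstructed from the published percentages at 1,000 interviews per version; the function name is mine):

```python
# Observed counts per response category, reconstructed from Table 2
# (percentages of 1,000 interviews per questionnaire version).
version_a = [210, 340, 210, 240]  # successful, unchanged, deteriorated, no opinion
version_b = [190, 320, 160, 330]

def chi_square_homogeneity(row_a, row_b):
    """Chi-square statistic for a 2 x k table of observed counts."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        exp_a = col * total_a / grand  # expected count under homogeneity
        exp_b = col * total_b / grand
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2

chi2 = chi_square_homogeneity(version_a, version_b)
# With 3 degrees of freedom, the 0.1% critical value is about 16.27.
print(f"chi-square = {chi2:.1f}")  # exceeds 16.27: significant at the 0.1% level
```

The statistic works out to roughly 22.6, above the 0.1 per cent critical value, consistent with the significance level marked in the table.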
Whether it is the interviewer or the respondent who is avoiding the follow-up questions, we cannot tell. The same effect has also been observed in market research.11 The question, "Do you smoke cigarettes?" if followed by a brief inquiry as to the preferred brand, is answered positively by 35 per cent of a representative cross-section. If the question is followed by five further questions, four of which require the sorting of a set of cards with 35 brands listed, the percentage of cigarette smokers drops to 30 per cent (level of significance, 0.1%).

Not all follow-up questions result in avoidance effects; there are exceptions. However, these occur too infrequently to make it possible for us to establish rules or principles as yet.12

The investigator should pay particular attention to these avoidance effects when he is concerned with trend measurement. Even if the question is identically worded in each survey, no reliable information on trends can be obtained if at some stage the series of questions is made longer (or shorter). For example,13 when a new brand of tropical fruit was introduced on the market, brand familiarity was regularly measured, sometimes biannually and sometimes annually. One day the client wished to have the questionnaire extended: the colored symbols on the trademark were also to be tested. This was done in a split-ballot with the results shown in Table 3. Questionnaire A (1,000 interviews) was unchanged from earlier administrations; Questionnaire B (1,000 interviews) added one question to respondents who knew any brands.

11 IfD Surveys Nos. 1035, 1037 (1959).

12 One of the rare examples is the question: "What is your opinion: do you or don't you agree with the introduction of compulsory military service?" The percentage of "Don't agree" rose from 43 per cent to 47 per cent if "Don't agree" was followed by the question: "Would you say that you are against compulsory service on principle, or would you be in favor of it if Germany were reunited?" (IfD Survey No. 1098, 1956).

13 IfD Survey No. 2047 (1968).

TABLE 3
FAMILIARITY WITH BRANDS OF TROPICAL FRUIT

Reply                              Version A    Version B
"Don't know any such brands"          36%         43% a
a Level of significance, 1%.

This example also demonstrates what steps should be taken when a client wants to measure trends but, at the same time, requires additional information. A split-ballot is then the only solution. One half of the interviews are used for trend measurement; the other half obtain the additional information and, simultaneously, gauge the extent of the avoidance effect.

SURVEY RESULTS DEPEND ON HOW WELL A QUESTION DEFINES THE OBJECT OF INQUIRY
The examples given so far would appear to indicate that it is chiefly the interviewer who is responsible for distortions. Consequently, one might be inclined to call for more conscientious interviewers in order to persist in the opinion that the questionnaire is a sturdy tool. Therefore, I shall now give some instances which demonstrate that distortion can also sometimes, perhaps mainly, originate with the respondent.

A number of procedures attempt to solve the difficult problem of making the questions asked on a structured questionnaire quite clear to respondents of different educational levels, of varying degrees of intelligence, of different age groups, of different socioeconomic strata, and to respondents living in different geographical regions. One such technique defines the subject of the question by asking questions about similar subjects in advance, thereby making it obvious that since the interviewer has already asked about A, his question about B cannot relate to A. (This method is better than the well-known classroom approach where a raised forefinger is employed: "Now we mean only B, and by no means A!" The classroom approach places the respondent in a passive role, and one cannot tell whether he is absorbing the facts or, indeed, whether he is even listening. The method of eliminating subjects by asking about them leads to active participation, and consequently to a better learning process. Nevertheless, there are times when one cannot altogether dispense with the classroom method.)

The following example illustrates the technique of active elimination, and also indicates to what extent survey results depend on careful definitions.14 The task was to find out how many women wear wigs. In Questionnaire A (about 550 interviews), the women were asked whether they owned or wore wigs made of genuine hair. Questionnaire B (about 550 interviews) asked whether the respondents owned or wore wigs, and also whether they owned or wore hairpieces made of genuine hair. In this manner there was a true count of wigs as distinct from hairpieces, which might otherwise have been included in the count of wigs. The results are shown in Table 4. The less carefully designed Questionnaire A grossly distorted the result, overstating the true figure by a factor of eight.
TABLE 4
WOMEN OWNING WIGS

Ownership of Wigs and Hairpieces    Version A    Version B
Wigs                                    8%           1%
Hairpieces                              -            15

POSITION OF QUESTIONS IN CONTEXT ANCHORS ASSOCIATIONS, WHICH IN TURN INFLUENCE REPLIES
One further remark on the importance of position: it is generally known that the order in which questions are asked can have an irritating effect on the results. The contextual effects, establishing cognitive structures or attitudes of the respondents, play an uncomfortably important role. This has been demonstrated for items in a Guttman scale, and on other occasions.15 An example from an image study will furnish additional evidence.

The image of three basic foodstuffs had to be explored: potatoes, noodles, and rice.16 Of a number of image components which were presented to the respondent on a set of cards, I need only mention one: the image component "German." One form of questionnaire asked about the image of potatoes first, and then about rice; 30 per cent of the respondents felt that potatoes were "German." But when first questioned about the image of rice, and after that about potatoes, potatoes were thought "German" by 48 per cent (level of significance, 0.1%).

14 IfD Survey No. 2045 (1968).

15 Donald P. Hayes, "Item Order and Guttman Scales," American Journal of Sociology, July 1964, pp. 51-88. Evidence for effects of questionnaire context is also given by Hadley Cantril, op. cit., footnote 1; Jeannette Sayre, "A Comparison of Three Indices of Attitude toward Radio Advertising," Journal of Applied Psychology, Vol. 23, 1939; American Marketing Association, The Technique of Marketing Research, New York, McGraw-Hill, 1937; H. H. Hyman and P. B. Sheatsley, "The Current Status of American Public Opinion," in John C. Payne, ed., The Teaching of Contemporary Affairs, Twenty-first Yearbook of the National Council for the Social Studies, Washington, National Council for the Social Studies, 1959.

16 IfD Survey No. 2043 (1968).
Results were similar in the case of noodles: when the noodle image was asked for before asking for the rice image, 9 per cent stated noodles were "German"; when asked for after the rice image, the figure rose to 24 per cent. This effect was neutralized by alternating the sequence of questions, resulting in an average of 39 per cent "German" for potatoes, and of 16 per cent "German" for noodles.

However, what purpose is served by this procedure of alternating sequences and striking averages? Suppose the survey had included other subjects as well, for instance, the image of bread or cornflakes; completely different averages would have resulted. Perhaps, by alternating sequences of questions and then working out averages, we conceal the very fact which it is our duty to expose clearly: the unstable images which change according to the associations formed. Given the results of experiments on attitude change, and considering the importance of personality characteristics in this connection, one would expect different groups of respondents to react in different ways to these questionnaire influences, and this has indeed been confirmed by research undertaken hitherto.

The effect of context can also be proved with polls on up-to-date political issues, where opinions are more guided by rational considerations. As an example I quote a question about the United States. In one poll (Version A, 1,000 interviews) this was asked before an identical question concerning Soviet Russia, and in another poll (Version B, 1,000 interviews) just afterwards.17 The question was, "Now I should like to ask your opinion on the United States of America. Have you formed a more favorable or a less favorable opinion on the USA during the last one or two years?"18 Results are shown in Table 5.

17 A study conducted by ORC Caravan Surveys, Inc., seems to arrive at different results with regard to questionnaire influence (R. Cohen, "A Test on Position Effects in Caravan Surveys," Princeton, 1964). Actually, the results need not contrast: the ORC study mainly examines position effects of a battery of questions. A number of studies on this subject lead to apparently identical results: that it does not really matter whether a battery of questions is asked early in the interview, or later. See Norman M. Bradburn and William M. Mason, "The Effect of Question Order on Responses," Journal of Marketing Research, November 1964, pp. 57-61, also referring to other published studies which empirically tested effects of question order. Also see Allensbach Archives, IfD Survey No. 2044, September 1968 (respondents' opinion of both the German Chancellor and the German Foreign Minister remained unaltered, even when the relevant question was switched from number 19 in version A of the questionnaire to number 69 in version B).

18 IfD Survey No. 2046 (1968).

VALIDATION OF THE PRINCIPLE THAT, AS A RULE, ALTERNATIVES MUST BE EXPRESSLY STATED

Finally, I come to the problem of complete and incomplete alternatives in the wording of questions. The effects are frequently so
TABLE 5

Opinion                    Version A    Version B
More favorable                5%           5%
Less favorable               40           33
Unchanged                    42           52
Undecided, no opinion        13           10
Level of significance, 0.1%. Level of significance, 5%.
staggering that it is apparent that much research needs to be done to establish the psychological and cognitive reasons for this. As an example, I quote a question put to nonworking housewives who were asked whether they would like to go out to work.19 One questionnaire form (300 interviews) did not offer an alternative, and read: "Would you like to have a job, if this were possible?" The other version (200 interviews) read as follows: "Would you prefer to have a job, or do you prefer to do just your housework?" The results are shown in Table 6.

The conclusion arrived at is bound to be this: in a structured questionnaire, full alternatives must be offered, and exceptions can only be permitted for absolutely sound reasons. Exceptions are inevitable when simple facts have to be determined, and they are necessary if a response handicap, such as a taboo, has to be overcome. When a particular opinion is considered risky, but the alternative is generally approved and accepted as correct, there is no need to expressly mention the alternative, since conformity encourages the respondent to give the "proper" reply anyway. Likewise, it is better not to mention the alternative if information is wanted: ". . . or have you never heard of it?"
TABLE 6
NONWORKING HOUSEWIVES' PREFERENCES FOR JOBS

                                              Version A             Version B
Preference                             (Without Stated Alternative) (With Alternative)
Prefer to have a job                              17%                   10%
Like to work part-time                            38                    14
Not like to have a job, prefer to do
  just their housework                            19                    68
Undecided                                         26                     8

19 IfD Survey No. 2029 (1967).
Mentioning this alternative would have a discouraging effect. Split-ballot experiments have shown that a large proportion of respondents claim to be informed, and proof of their knowledge is subsequently given by correct explanations, if they are just asked: "Do you know . . .?"20 Because the problem of questions with and without worded response alternatives has been treated in detail elsewhere,21 it will not be considered further here. It goes without saying that a joint effort, on a broad scale, must be made soon to work out a catalog of empirically tested rules for the design of structured questionnaires.
20 E. Noelle-Neumann and W. Schramm, Umfragen in der Rechtspraxis, Weinheim, Verlag Chemie, 1961, p. 93; IfD Survey No. 1025 (1958).

21 E. Noelle-Neumann, "On the Methodological Progress in Survey Research." For example, that paper quoted the question: "Do you think that all workers of a factory should be trade union members?" The answer "It is up to the individual to decide whether he wants to join the union; one cannot force all workers of a factory to join" increased from 20 per cent to 70 per cent as soon as this alternative was expressly presented (IfD Survey No. 082, 1955). An interesting example, leading to the more intricate problems of question wording, is a split-ballot with two forms of questions, one with a clearly worded alternative, the other just hinting at the alternative: "Are you, or are you not, in favor of Christian workers forming a Christian trade union?" and "Are you in favor of the Christian workers forming a Christian trade union, or would you say that all workers should be in one union?" The percentage of respondents who were opposed to the formation of a Christian trade union rose from 41 per cent with the first version to 65 per cent with the second (IfD Survey No. 088, 1955). By analogy to experiments reported by Hyman (when asking multiple-choice questions, one relevant response alternative was missing), it can be assumed that interviewer bias also increases if alternatives are not expressly presented (Herbert H. Hyman, Interviewing in Social Research, Chicago, The University of Chicago Press, 1954, pp. 217 f.).