A New Training Method for Opinion Interviewers*

By LESTER GUEST

This article describes an experiment designed to test the relative effectiveness of training interviewers by simple discussions of the schedule to be used, by this "customary" training plus the use of practice interviews, and by the "customary" training plus some experience in coding. The author found that error was reduced more when coding was included in the training program than it was when practice interviews were included.

Lester Guest is a Professor in the Department of Psychology at the Pennsylvania State College.
A CONSIDERABLE amount of the data utilized by social scientists in their attempts to analyze social behavior is collected by interviewers. The answers obtained to the questions asked must be completely and accurately recorded in order that subsequent analyses lead to unbiased conclusions. It has been some time since Rice demonstrated that characteristics of interviewers are related to distortion.1 Numerous studies since have multiplied evidence of distortion, sought explanatory concepts, and tested ways to reduce it.2 Ways of attempting to reduce error have been improvement of questionnaire format and design, adequate question formulation, and better selection and training of interviewers. It stands to reason that, other things being equal, improved training of interviewers should reduce the errors, either biased or unbiased.
* This study was supported in part by grants from the American Philosophical Society, the Britt Foundation, and the Council on Research of The Pennsylvania State University.
1 Rice, S., "Contagious Bias in the Interview," American Journal of Sociology, Vol. 35 (1929), pp. 420-423.
2 Cf. Blankenship, A., "The Effect of the Interviewer upon the Response in a Public Opinion Poll," Journal of Consulting Psychology, Vol. 4 (1940), pp. 134-136; Katz, D., "Do Interviewers Bias Poll Results?", Public Opinion Quarterly, Vol. 6 (1942), pp. 248-268; Robinson, D. and S. Rhode, "Two Experiments with an Anti-Semitism Poll," Journal of Abnormal and Social Psychology, Vol. 41 (1946), pp. 136-144; Shapiro, S. and J. Eberhart, "Interviewer Differences in an Intensive Interview Survey," International Journal of Opinion and Attitude Research, Vol. 1 (1947), pp. 1-17; Crespi, L., "The Interview Effect in Polling," Public Opinion Quarterly, Vol. 12 (1948), pp. 99-111; Stember, H. and H. Hyman, "How Interviewer Effects Operate Through Question Form," International Journal of Opinion and Attitude Research, Vol. 3 (1949-50), pp. 493-512; Stember, H. and H. Hyman, "Interviewer Effects in the Classification of Responses," Public Opinion Quarterly, Vol. 13 (1949), pp. 669-682; Fisher, H., "Interviewer Bias in the Recording Operation," International Journal of Opinion and Attitude Research, Vol. 4 (1950), pp. 391-411; Smith, H. and H. Hyman, "The Biasing Effect of Interviewer Expectations on Survey Results," Public Opinion Quarterly, Vol. 14 (1950-51), pp. 492-506; and many others.
Parenthetically, it is fortunate for research in the social sciences that the net error from surveys is considerably less than the gross error.3 We may not always be favored in this manner.

The amount of time devoted to interviewer training varies considerably from research group to research group, as does the kind of training experience provided.4 In spite of recent improvements in training technique, a further reduction in interviewer error seems in order. It would seem that the avenue promising the greatest reduction of error would be the provision for training experiences not yet utilized to any degree. Although relatively little research has been reported regarding methods of training opinion interviewers, there is a considerable body of opinion about how to do the job best. Some things that have been done include role playing, practice interviewing with supervisors, practice interviewing in the field (mostly under direct observation) followed by critical discussion, demonstration interviews, and the use of films.5 In most cases, no attempt has been made to evaluate experimentally the results of differential training.

It has been the author's observation that interviewers who have later coded and tabulated questionnaires, some of which they themselves used in conducting interviews, are sharply critical of their own and others' completed schedules. They become painfully aware of the various types of errors that can be and are made, and the difficulties occasioned by them. The author's hypothesis is that "training by coding" would reduce interviewer error. Informal discussions with others concerned with interviewing problems support this hypothesis. Among others, Parten suggests coding training in order to improve the quality of returned schedules.6 The problem in this study was to investigate whether coding training would lead to fewer errors of all kinds in interviews done by opinion interviewers.

As Parten points out, there are dangers in training by coding. Interviewers may learn what is "wanted," and therefore either purposely cheat or unconsciously bias results in order to return schedules of high "quality."

3 National Opinion Research Center, "Isolation, Measurement, and Control of Interviewer Effect," Report No. 49, Aug. 1953 (to be published).
4 Ibid.
5 Cannell, C. and R. Kahn, "The Collection of Data by Interviewing," Ch. 8 in Research Methods in the Behavioral Sciences, edited by L. Festinger and D. Katz, Dryden Press, N. Y., 1953, pp. 374-378; Sheatsley, P., "The Art of Interviewing and a Guide to Interviewer Selection and Training," Ch. 13 in Research Methods in Social Relations, edited by M. Jahoda, M. Deutsch, and S. Cook, Dryden Press, N. Y., 1953, pp. 489-492; Reed, V., K. Parker, and H. Vitriol, "Selection, Training, and Supervision of Field Interviewers in Marketing Research," Journal of Marketing, Vol. 12 (1948), pp. 365-378.
6 Parten, M., Surveys, Polls, and Samples, Harper, N. Y., 1950, pp. 335-336.
Furthermore, there is the practical problem of having widespread groups of interviewers code responses for each and every questionnaire before they use them in the field. An alternative course of action would be to provide a general coding experience which would minimize or eliminate these problems. Studies of transfer of training have clearly shown that there is high likelihood of successful transfer if there are identical elements common to two situations and these are perceived by the trainee.7 With this in mind, a series of interviews was "created" to be used as a training device, and the results of such training compared with the results of other methods of training in an experimental situation. In addition to the experimental training program, a series of tests was administered to interviewers in order to test their validity as predictors of error. The results of the latter will be considered in a later paper.

PROCEDURE
The experimental design employed was that of a control group and two experimental groups. The subjects were the interviewers, and they were divided into three groups for differential training. The interviewers were recruited from college students and townspeople. Three sources contributed interviewers: some came as a result of class announcements of the need, some responded to an advertisement in the local newspaper, and some were sent by the university employment office. In order to equate experience, no applicant was selected who had any previous interviewing experience. Although 60 interviewers participated in the final study, a number of others were trained as possible substitutes in the event that some of the persons selected could not complete the assignment. It is not suggested that these interviewers are typical of the bulk of opinion interviewers,8 but for the purposes of this study, it is not necessary that they be a representative sample of interviewers in general. All interviewers were paid at reasonable rates for training time as well as for the interviewing.

The interviewers were matched on a group basis as follows. Each interviewer's personal time schedule was inspected to determine which of three successive days he would be available for interviewing. Those who were available only one or two days of a week were assigned first, and the remainder distributed in each group so that, as much as possible, average age, education, and sex were equal across the groups.

7 Cronbach, L., Educational Psychology, Harcourt, Brace, N. Y., 1954, Ch. 9.
8 Sheatsley, P., "An Analysis of Interviewer Characteristics and Their Relationship to Performance," International Journal of Opinion and Attitude Research, Vol. 4 (1950-51), pp. 473-498; Vol. 5 (1951-52), pp. 79-94 and 191-220.
Unless the day that each interviewer was available to do interviewing was related to interviewing skill, no serious bias in group assignment should be found. Although various tests were administered to all interviewers and these results might have served as matching data, they were not available in time to make group assignments. Table 1 presents the results of the matching, as well as the results of the Wonderlic Personnel Test, which were determined after the matching had been completed.

TABLE 1

                                           Group I   Group II   Group III
Mean age                                    21.00      22.25      21.55
Mean highest college semester completed      4.85       4.95       4.75
Number of men                               12.00      11.00      12.00
Number of women                              8.00       9.00       8.00
Mean Wonderlic raw score                    30.45      28.35      31.55
Except for the Wonderlic score, the groups are fairly well matched. Only the difference between the means for Groups II and III on the Wonderlic is statistically significant (5 percent level).

Since it would not be practical for each interviewer group to interview the same respondents, three relatively comparable respondent groups had to be selected. Again a rough matching procedure was employed. Since there were 20 interviewers in each group, and each interviewer was to be assigned two city blocks from which to obtain 10 interviews, a total of 120 blocks was selected on an area sampling basis.9 After these blocks were located on a map of Altoona, Pennsylvania, sets of three blocks in the same neighborhood were grouped on a judgmental basis, and one of the blocks was assigned randomly to each interviewer group. It was assumed that the 40 blocks assigned to each interviewer group would be comparable. In order to partially check this assumption, the 1950 Census of Housing, Altoona Block Statistics, was consulted to determine average monthly rental value and average sale value of owner-occupied single-family dwellings. Averages for the three groups were determined and slight adjustments made to make the groups as equal as possible. (The actual respondents were all "women of the house" in dwelling units selected at random within the selected blocks.) The results of this matching of interviewing blocks are shown in Table 2, along with other data about the respondents or their households collected from the respondents as part of the study. Inspection of the results in Table 2 shows undeniable differences, but most of them are small and random. Therefore, it is felt that the respondent groups are as well matched as feasible under reasonable circumstances.

9 Watson, A., "Respondent Preselection," in Marketing Research Practice, edited by D. Hobart, Ronald, N. Y., 1950, pp. 343-355.

TABLE 2
COMPARABILITY OF RESPONDENT GROUPS

                                          Group I   Group II   Group III
Average monthly rental (Census)           $ 29.33    $ 30.85    $ 30.34
Average sale value (Census)               5699.00    5719.00    5777.00
Number of interviews obtained                 199        200        199
Estimated Socio-Economic Status
  A                                            1%         3%         1%
  B                                            20         18         32
  C                                            68         61         54
  D                                            11         18         13
Percentage voting in 1952
  Presidential Election                       59%        62%        57%
Percentage having a television set            60%        57%        61%
Number of people living in household
  1-3 people                                  60%        55%        55%
  4-6 people                                   35         41         39
  7 or more people                              5          4          6
Education of the respondent
  0-6 grades
  7-12 grades
  Some college
  Post-graduate
  Special (non-academic)
  Don't know (can't remember)
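In modern notation, the block-assignment step described above can be sketched as a short program: blocks are grouped into neighborhood triplets, and one block from each triplet is allotted at random to each interviewer group. This is only an illustrative reconstruction; the triplet identifiers, function name, and seed are hypothetical, not from the original study.

    import random

    def assign_blocks(triplets, seed=None):
        """Randomly assign one block from each neighborhood triplet
        to each of the three interviewer groups (I, II, III)."""
        rng = random.Random(seed)
        groups = {"I": [], "II": [], "III": []}
        for triplet in triplets:
            blocks = list(triplet)
            rng.shuffle(blocks)  # random order within the triplet
            for name, block in zip(("I", "II", "III"), blocks):
                groups[name].append(block)
        return groups

    # Forty hypothetical triplets of block identifiers would yield the
    # 40 blocks per group used in the study; two triplets shown here.
    example = assign_blocks([("B1", "B2", "B3"), ("B4", "B5", "B6")], seed=1)

With 120 blocks grouped into 40 triplets, each group receives 40 blocks, or two per interviewer, as the design requires.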
The materials used were a series of tests given to all interviewers (to be discussed in a later paper), a "final" questionnaire (used in the field with actual respondents), a "training" questionnaire (artificially constructed with contrived answers from hypothetical respondents), and a code book for the "training" questionnaire.

The question content for the final questionnaire was of no great import for a study of training effects, provided it consisted of different types of questions so that a variety of error types could be made. The actual content was determined by a graduate student whose contributions of time and money made it possible to complete this study without exceeding the formal grants.10 The content was primarily concerned with women's information about current affairs and their degree of community participation as related to television ownership, and the appropriate questionnaire was designed by the student. The types of questions used included free response, filter followed by free response, simple alternative, multiple choice, and check list.

10 Guest, P., "Television as a Variable in Citizenship Activities of Women," Ph.D. dissertation, The Pennsylvania State University, 1954.
Prior to the construction of the final questionnaire, the author designed a training questionnaire utilizing a variety of question forms and concerned with a number of content areas. The questions on this questionnaire were culled from questions found in the Public Opinion Quarterly and the International Journal of Opinion and Attitude Research and formerly used by opinion agencies, as well as some unpublished questions that had been used by agencies. After these items had been molded into a questionnaire, the author invented a series of plausible answers for 12 "respondents." However, "errors" were made in a variety of ways: simple omissions, poorly handled free response questions, failure to clarify ambiguous answers, failure to probe for satisfactory specificity, and the like. The 12 questionnaires were ostensibly completed by two interviewers, each doing six interviews. Although mistakes were made by both interviewers, an attempt was made to have one interviewer do a poor job, and one do a rather good job. Many kinds of errors were illustrated, and the difficulties such errors caused in satisfactory coding were made obvious. A code book for these training questionnaires, with instructions in its use, was constructed. The training questionnaires were pre-tested for codability by having three persons code them. On the basis of the pre-tests, revisions were made, and final codes constructed.11 No attempt was made to correct the coding done during the training session, although the efficiency of using coding as a training device would likely be enhanced if each interviewer's practice coding errors were discussed with him prior to actual field work.

Although the question of the relative effectiveness of "coding training" vs. "customary training" was of major interest in this study, it would be important to make more than this comparison alone, since more time spent in any purposeful activity (even more customary training) might result in better performance. In order roughly to equate time with different kinds of training experience, as well as to compare coding training with customary training (different amounts of time), three interviewer groups were formed, each consisting of 20 interviewers. One group (Group I) received "customary training" only (4 hours of discussion of interviewing in general, the sampling procedures to be used in the study, and the handling of each question on the final questionnaire). A second group (Group II) received customary training plus completing three practice interviews using the final questionnaire, along with about two and one-half hours of group discussion of the results. Group II is the "practice training" group, and had about 4 hours more training than the customary training group. The third group (Group III) received customary training plus approximately four hours of the coding training just discussed (12 interviews to code).

11 This "coding training" packet is available upon request.
All three groups received the customary training together. Table 3 presents the design of the experiment.

TABLE 3

                                                          Group
                                                       I    II    III
TUESDAY, APRIL 21
  2 hours of tests, 1 hour general training            X    X     X
TUESDAY, APRIL 28
  3 hours of general training                          X    X     X
WEDNESDAY, APRIL 29
  Practice interviewing (1½ to 2 hours)                     X
THURSDAY, APRIL 30
  Group discussion of practice interviewing
    results (2½ hours)                                      X
  Coding training (about 4 hours)                                 X
TUESDAY, MAY 5
  Interviewing on final questionnaire, Group I         X
WEDNESDAY, MAY 6
  Interviewing on final questionnaire, Group II             X
THURSDAY, MAY 7
  Interviewing on final questionnaire, Group III                  X
It will be noticed that the entire program was completed in 17 days. This was done in order to keep interest high, to reduce the chances of intercommunication between groups, and to keep the training program close to the actual interviewing. Ideally, the whole period should have been even shorter, with all interviews being completed on the same day, and by three groups of interviewers who were isolated from each other to eliminate any possibility of intercommunication. The interests of practicality ruled out such a procedure. Although the members of each group knew that they were doing different things during the training period, they were not told what the differences were, and were asked to cooperate by not discussing their activities with members of other groups until the study was completed. It would be too much to expect that no information was passed or received by anyone, but informal checks after the field work was completed indicated that it was at a minimum.

Using the above design, comparisons of Groups II and III with Group I allow an estimate of the effect of more time spent in training of the types used, whereas a comparison of Group II with Group III allows an estimate of the effect of two types of training with time held relatively constant.

The criterion for quality of performance, as always, posed a problem. It has been demonstrated that returned schedules do not adequately reflect the actual interview.12

12 Guest, L., "A Study of Interviewer Competence," International Journal of Opinion and Attitude Research, Vol. 1 (1947), pp. 1-17; and Stewart, N. and S. Flowerman, "An Investigation of Two Different Methods for Evaluation of Interviewer Job Performance," American Psychologist, Vol. 5 (1950), p. 314 (Abstract).
However, it was manifestly impossible to install tape recorders in all 600 homes in which interviews were to take place, as would be necessary in order to reconstruct the actual interview perfectly. Therefore, the more often used criteria of success of public opinion interviewers were used in this study: frequency of omissions, inadequate specificity of answers, poor handling of free response answers, obvious clerical errors, and obvious errors in following sampling procedures. If anything, error counts made on these bases should be at a minimum, since only the most obvious errors could be counted.

The scoring of errors was all done by the experimenter, and was somewhat strict. Any deviation from the instructions was scored as error, even though an answer might be determined from another source on the questionnaire.13 If a space was left blank without a record of "don't know," it was counted as an error, although in one list of men to identify, a blank probably meant the respondent didn't know. (Some interviewers were charged with many omissions in this question.) Omission of the interviewer's name or the date of the interview was counted as error, in spite of the fact that in this study both were readily determined. This strictness resulted in a high average error per interviewer and per interview. Although many of the errors committed were minor in nature, the author feels that these so-called trivial errors are symptomatic of proneness to error, and at the least are annoying to deal with.

It was impossible to make a complete check of sample dwelling units to determine whether sampling rules were strictly followed. However, whenever it was apparent that a sampling error had been made, it was counted. This would include omission of any part of the complete address or an incorrect address, as well as not following the pre-arranged route correctly. (Note that an omission of sampling information is counted as sampling error.)

The sampling plan was designed so that each interviewer would obtain at least 10 interviews. In two cases, interviewers returned with only nine, and several times interviewers returned with more than ten. For the study of training, only the first ten interviews completed were used. In the two cases where only nine interviews were returned, the number of errors that would have been made in ten was estimated by adding the average error per interview for the nine to the actual errors for the nine interviews. Statistics were computed with 19 degrees of freedom for each interviewer group.

In order to check the reliability of the error count, each person's interviews were counted twice independently by the experimenter.

13 In order that the results of the content study not be adversely affected, a follow-up was conducted to complete interviews where serious omissions were found.
The Pearson r for the two counts for total number of errors per interviewer was .99, .99, and .98 for Groups I, II, and III, respectively. The second error scoring was used in subsequent analyses. In the process of counting errors, the errors were grouped into 20 types. If reliability by type of error had been computed, it would have been considerably lower than for total errors.
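As an illustration of the scoring rules above, the following sketch counts blank answers and missing identifying information as errors, and pro-rates a nine-interview assignment to ten. The data layout and field names are hypothetical assumptions; the original scoring was, of course, done by hand.

    def count_errors(schedule):
        """Strict error count for one completed schedule.
        A blank answer without a recorded "don't know" is an error,
        as is a missing interviewer name or interview date."""
        errors = 0
        for answer in schedule["answers"]:
            if answer.strip() == "":  # blank, with no "don't know" recorded
                errors += 1
        for field in ("interviewer_name", "date"):
            if not schedule.get(field):
                errors += 1
        return errors

    def prorate_to_ten(total_errors, n_interviews=9):
        """Estimate errors for a 10-interview assignment when only nine
        were returned: add the average error per interview to the total."""
        return total_errors + total_errors / n_interviews  # = total * 10/9

    # e.g. an interviewer with 27 errors in nine interviews is charged
    # 27 + 27/9 = 30 errors for comparison purposes.

This pro-rating rule simply scales the nine-interview total up by one average interview, matching the estimate described in the text.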
RESULTS

The major hypothesis of this study was that frequency of error would decrease from the customary training group, to the practice training group, to the coding training group, in that order. Table 4 presents the results of the analysis of differences between means and standard deviations. The coefficients of variation (V) are also given.

TABLE 4
SIGNIFICANCE OF DIFFERENCES IN INTERVIEWER ERROR AS RELATED TO METHOD OF TRAINING

                   Mean errors       Sigma errors
                 per interviewer   per interviewer       V
Group I               35.60             24.67          69.30
Group II              28.00             17.71          63.25
Differences            7.60              6.96           6.05
t value                1.09              1.41

Group I               35.60             24.67          69.30
Group III             21.40             13.14          61.40
Differences           14.20             11.53           7.90
t value                2.22*             2.55**

Group II              28.00             17.71          63.25
Group III             21.40             13.14          61.40
Differences            6.60              4.57           1.85
t value                1.19              1.28

* Difference significant at 5% level, one-tail test.
** Difference significant at 1% level, one-tail test.
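The t values and coefficients of variation in Table 4 can be reproduced approximately from the reported means and sigmas. Below is a minimal sketch assuming n = 20 interviewers per group, the sigmas as group standard deviations, and the simple two-sample formula for independent means; the article does not state its exact computing formula, so small discrepancies are expected.

    import math

    def coefficient_of_variation(mean, sigma):
        """V = 100 * sigma / mean."""
        return 100.0 * sigma / mean

    def t_independent_means(m1, s1, m2, s2, n=20):
        """Two-sample t for independent group means with equal n,
        using the group sigmas as standard deviation estimates."""
        se = math.sqrt((s1 ** 2 + s2 ** 2) / n)
        return (m1 - m2) / se

    print(coefficient_of_variation(35.60, 24.67))            # ~69.3, as in Table 4
    print(t_independent_means(35.60, 24.67, 21.40, 13.14))   # ~2.27 vs. 2.22 reported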
The differences are all in the predicted direction, but only the differences between Group I and Group III are significant by commonly accepted standards. There seems little doubt that customary training plus coding training is superior to customary training alone, both in reducing average error and in reducing variability. Although there is some possibility that the other differences may have arisen by chance, the results suggest that extra training devoted to practice training or coding training reduces average error and variability of error, and that coding training reduces average error and variability of error more than practice training.

Two cautions should be observed.
First, as previously mentioned, the groups were not matched on the basis of general academic aptitude, although at a later time scores for the Wonderlic Personnel Test became available. Table 5 indicates that there were differences between the groups in this respect.

TABLE 5
WONDERLIC TEST RESULTS FOR THE GROUPS

              Mean Raw    Sigma Raw    Percentile
                Score       Score        Range
Group I         30.45        5.52        32-98
Group II        28.35        4.70        32-98
Group III       31.55        5.03        67-99
One of these differences is significant at the 5 percent level, that between the means of Groups II and III. If intelligence and errors are negatively correlated, as has been found in previous studies,14 the significance of the difference between these groups in terms of errors would be magnified. However, the Pearson r's between total number of errors and Wonderlic scores for each group were as follows: Group I, .05; Group II, -.41; and Group III, -.15. Combining all groups, the relationship was -.14. Only the -.41 is statistically significant.15 Although some negative relationship probably does exist, it seems small enough in this instance to suggest that the direction of the differences between the groups would not be changed as a result of the differences in intelligence between the groups.

Another factor possibly favoring one group over another might be that the interviews were done on different days: customary training first, then practice training, and then coding training. Insofar as any information was passed, the groups doing the later interviews might be favored. There is little reason to suspect that this would be a major factor in the results, or that the day of the week on which the interviews were completed would be related to proneness to error. Each day during which interviews were done was about the same as far as weather was concerned.

Since allocating errors into categories of kinds of errors was difficult to do reliably, little will be presented in this regard. However, by combining some of the categories, and leaving out others which are considered less serious, a rough idea of areas of error can be given.

14 Guest, L., op. cit.; Guest, L. and R. Nuckols, "A Laboratory Experiment in Recording in Public Opinion Interviewing," International Journal of Opinion and Attitude Research, Vol. 4 (1950), pp. 336-352; Keyes, D., "A Study of Interviewer Effect and Interviewer Competence," M.A. thesis, Univ. of Denver, 1949 (from "Isolation, Measurement, and Control of Interviewer Effect," op. cit.); Dvorak, B., F. Fox, and C. Meigh, "Tests for Field Survey Interviewers," Journal of Marketing, Vol. 16 (1952), pp. 301-306.
15 There is no guarantee that each interviewer within a group was exposed to the same potentiality for error, but on the other hand, there is no reason to suppose that those of greater intelligence were confronted with the harder situations. If this were true, then the degree of negative correlation would be masked.
Originally, the types of errors were classified into 20 categories. In Table 6, the types of errors counted have been limited to 9 categories, grouped under 4 headings. These four headings include only the more serious errors.

TABLE 6

                                           Group I   Group II   Group III
Omissions, except sampling information
  (2 categories)                              58%       47%
Not clear or specific responses
  (3 categories)                              20        25
Sampling errors (3 categories)                18        27         16
Poor handling of free response
  (1 category)                                 4         1          3
                                             100%      100%       100%
N =                                          383       313        307
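The significance statements about Table 6 that follow compare percentages based on differing numbers of errors. A two-proportion test of the general kind presumably involved can be sketched as follows; the exact formula the author used is not stated, so this is an assumption.

    import math

    def two_proportion_z(p1, n1, p2, n2):
        """Normal-approximation test for the difference between two
        independent proportions, using the pooled estimate."""
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    # Omissions, Group I (58% of 383 errors) vs. Group II (47% of 313):
    z = two_proportion_z(0.58, 383, 0.47, 313)
    print(z)  # ~2.9, beyond the 1 per cent point (2.58), as reported below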
The three groups maintain the same relative position in terms of total number of serious errors as they did for total errors, Group I having the most and Group III the least. However, in terms of the percentage of errors by type committed, there are significant differences between the groups. In the omission category, the difference is significant at the 1 per cent level between Groups I and II, and at the 5 per cent level between Groups II and III. In the sampling error category, the differences between Groups I and II and between Groups II and III are both significant at the 1 per cent level. A possible explanation for the differences in the omission category is that Group II was trained on the final questionnaire and thus was more conscious of the exact places where omissions would be likely. As for the differences in percentage of sampling error, Group II did not get emphasis on this type of error in the practice interviews; however, neither did the other groups get such emphasis. Therefore, the reasons for these differences are not easily explained.

Another comparison that was made was of the number of "don't knows" obtained in each group as a whole. There is only slight reason to suppose that extra amounts of training of the kind given would lead interviewers to obtain differentially more "don't knows." All the interviews for each group were examined for the total number of don't knows obtained as a percentage of the total number of don't knows possible. The number of don't knows possible in each group is a function of the number of filter questions answered in ways requiring follow-up questions. The results of this analysis are given in Table 7. None of the differences between the percentages are significant, and it is worthwhile to note the small percentage of such responses in all the groups.
It has been pointed out that situational elements may be related to degree of error in interviewing. Thus, it might be that the last half of an assignment would have more errors than the first half, as a result of fatigue or anticipation of finishing, or just because it might be harder to do a good job after daylight hours. On the other hand, it is conceivable that these factors might be more than negated by a practice effect and a development of confidence as more and more interviews are completed. Some previous research has indicated that where the situations are similar, there is a high degree of consistency of error, and where dissimilar, there is less consistency.16

TABLE 7
ANALYSIS OF DON'T KNOWS

                              Group I   Group II   Group III
Number possible                 7150       7158       7141
Number obtained                  307        305        323
Percentage Don't Knows          4.29%      4.26%      4.52%
If fatigue is operative, it should lead to more errors in the last half of an assignment. Each interviewer's errors were analyzed for the first five interviews vs. the last five interviews. In addition, each interviewer's errors for odd-numbered vs. even-numbered interviews were determined. These data are presented in Table 8.

TABLE 8
ERRORS FOR FIRST HALF-SECOND HALF OF ASSIGNMENT AND FOR ODD-EVEN NUMBERED INTERVIEWS

                   1st half            2nd half
                Mean     Sigma      Mean     Sigma       r
Group I         17.80    15.18      17.80    12.30      .84
Group II        12.85     9.60      15.15     9.18      .73
Group III       10.85     6.21      10.55     7.47      .84

                    Odd                Even
                Mean     Sigma      Mean     Sigma       r
Group I         15.50    16.56      20.10    16.04      .85
Group II        12.75     8.84      15.25     9.36      .89
Group III       10.30     6.95      11.10     6.74      .84
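The split-half comparison behind Table 8 can be sketched briefly: each interviewer's ten interviews are split into first/last five and into odd/even, and the Pearson r between the two half-totals across interviewers measures consistency of error. The error counts below are hypothetical, not the study's data.

    from statistics import correlation  # Python 3.10+

    def split_half(errors_per_interview):
        """(first-five total, last-five total) for a 10-interview assignment."""
        return sum(errors_per_interview[:5]), sum(errors_per_interview[5:])

    def odd_even(errors_per_interview):
        """(odd-numbered total, even-numbered total), counting from 1."""
        return sum(errors_per_interview[0::2]), sum(errors_per_interview[1::2])

    # Hypothetical error counts for three interviewers, ten interviews each:
    assignments = [
        [5, 3, 4, 6, 2, 4, 3, 5, 2, 4],
        [1, 2, 0, 3, 1, 2, 1, 0, 2, 1],
        [7, 6, 8, 5, 9, 6, 7, 8, 5, 6],
    ]
    firsts, seconds = zip(*(split_half(a) for a in assignments))
    print(correlation(firsts, seconds))  # consistency across halves
    odds, evens = zip(*(odd_even(a) for a in assignments))
    print(correlation(odds, evens))      # consistency across odd-even splits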
None of the differences in this table are statistically significant. Apparently, either there is no fatigue effect operating, or else the practice effect cancels out such an effect. The comparison between the odd and even interviews shows as much variation as the first vs. last half comparison. Evidently, the amount of error is more a function of situational factors within each interview than of temporal order. However, the high correlations found in both kinds of comparisons indicate a high degree of consistency of error behavior throughout the whole interviewing assignment.

16 Reported in "Isolation, Measurement, and Control of Interviewer Effect," op. cit.
This is only partly in accord with the previous research, which indicated that uniform behavior on the part of respondents would lead to high consistency of error within the interviewer, and thus that transient factors would play less part in introducing error. In this study, one would not expect highly uniform behavior from interviewee to interviewee, and yet there is a high relationship in commission of error from first half to last half of an assignment, and also between odd- and even-numbered interviews. Of course, the previous research was based on the relationship between errors within individual interviews.

CONCLUSIONS
Generally speaking, the samples of interviewers were not large enough, nor the differences between the groups large enough, to demonstrate significant differences in commission of errors in public opinion interviewing as a function of training of interviewers. However, all differences were in the predicted direction; that is, fewer errors and less variability in the coding training group than in the other two groups, and fewer errors and less variability in the practice training group than in the customary training group. Ordinarily, coding training could be completed more easily than practice training, even by mail if necessary. Furthermore, coding training is more general than practice training with the actual interview to be conducted, and does not require the "final" questionnaire to be available before training can start. Of course, practice training with a general questionnaire was not investigated in this study. Obviously, there are several other methods of training that might prove feasible, but these were not explored in this study. There is a possibility, as yet not explored, that the results obtained from the coding experience might prove to be a valid predictor for the selection of interviewers.

An analysis of the number of don't know answers obtained by the interviewers in the three groups showed no significant differences. There were significant differences found in the types of errors made among the three groups, but this may be a function of specific lacks of experience within a training method, rather than of the training method itself. Finally, there was no evidence to support a hypothesis that there is an increase in error toward the end of an assignment resulting from fatigue or goal expectation.