Missing Data in Sociological Research


Am Soc (2012) 43:448–468 DOI 10.1007/s12108-012-9161-6

Missing Data in Sociological Research: An Overview of Recent Trends and an Illustration for Controversial Questions, Active Nonrespondents and Targeted Samples

Jeremy R. Porter & Elaine Howard Ecklund

Published online: 23 June 2012. © Springer Science+Business Media, LLC 2012

Abstract In an age of telemarketers, spam emails, and pop-up advertisements, sociologists are finding it increasingly difficult to achieve high response rates for their surveys. Compounding these issues, the current political and social climate has decreased many survey respondents' likelihood of responding to controversial questions, which are often at the heart of much research in the discipline. Here we discuss the implications for survey research in sociology using: a content analysis of the prevalence of missing data and survey research methods in the most cited articles in top sociology journals; a case study highlighting the extraction of meaningful information from the mechanisms driving the non-random missing data patterns in the Religion Among Academic Scientists dataset; and qualitative responses from non-responders in this same case. These implications are likely to grow in importance given the ubiquitous nature of survey research, missing data, and privacy concerns in sociological research.

Keywords Missing data · Non-response · Survey · Sociology · Religion · Family

This research was supported by a grant from the John Templeton Foundation (grant #11299; Elaine Howard Ecklund, PI). The authors also wish to thank Kelsey Pedersen, who provided invaluable help with manuscript editing and formatting.

J. R. Porter (*), City University of New York, Brooklyn College and Graduate Center, 218 Whitehead Hall, Brooklyn College, 2900 Bedford Ave, Brooklyn, NY 11220, USA. e-mail: jporter@brooklyn.cuny.edu

E. H. Ecklund, Department of Sociology, Rice University, Houston, TX, USA

Introduction

Since early references to the survey of population samples as "a new research technique" by Gerhard Lenski in 1961, the landscape of survey research has changed dramatically (Lenski 1961). In many ways the widespread development and implementation of the sample survey method has been instrumental to the accumulation of knowledge across the social sciences, and specifically sociology (Wright and Marsden 2010). In more recent years, however, a number of obstacles have arisen in the implementation of the survey as a reliable method for the collection of social data. Many of these obstacles have been overcome by the introduction of more complex sampling and analytic methods, but many remain a concern to those involved in the collection of data via the sample survey. Here we highlight a few of these issues and their direct relevance to the most influential research conducted in the field of sociology in the recent past.

The inundation of Americans with requests for information and opinions has reached an all-time high. As an increasingly common response, many potential survey participants are actively withholding information or refusing to respond at all. Compounding this issue, inquiries about opinions on politically or socially sensitive topics are currently met with extremely high levels of suspicion. It is extraordinarily difficult, then, for legitimate researchers to achieve high response rates for their surveys. Random-sample surveys of the general population now routinely report response rates of only 30 % (Pew Research Center for the People and the Press 2004). And even when a high response rate is achieved, subjects often do not answer all of the questions on the survey, leading to the intractable problem of missing data.

In relation to the latter, the current political and social climate, coupled with more advanced methods for respondent identification and decreased privacy, makes many uneasy about divulging personal information and, in particular, their feelings about controversial issues. Rose and Fraser (2008) point out the ubiquitous nature of missing data on surveys in this current age of concern for privacy as well as political or social sensitivity, and in particular the impact of these on survey response. Their research brings to light the importance of the issue at hand: "missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings" (p. 71). Unfortunately, in order to accurately estimate relationships among indicators with high levels of missing data, very good prior knowledge about the patterns of what Allison (2002) calls "missingness" is needed. Furthermore, estimation methods, driven by variations in data replacement and imputation procedures, are at risk of producing statistics that are highly sensitive to the handling of the missing data.

Within the last decade, attention given to the issue of missing data in the research process has included much methodological and theoretical discussion about the implications of low response rates and item non-response, and how to overcome them (Abraham et al. 2006; Alosh 2009; DeSouza et al. 2009; Garcia et al. 2010; Gelman et al. 2005; Groves 2006; Groves et al. 2006; Huang et al. 2005; Khoshgoftaar et al. 2007; Martinussen et al. 2008; Montiel-Overall 2006; Olson 2006; Paik 2004; Porter et al. 2009; Rose and Fraser 2008; Southern et al. 2008; Satten and Carroll 2000; Verbeke and Molenberghs 2000, 2010). As response rates continue to drop, however, and missing data leave greater lacunae, researchers have been forced to handle less-than-ideal data situations. Common fixes for large amounts of missing data include the well-known statistical procedures of data weighting, imputation, and other corrective schemes.
The results of such imputation methods may be detrimental to the ability of researchers to confidently present the results of their studies (Allison 2002; Dawid 1984; Gelman et al. 1996; Rubin 1984). For instance, Alosh (2009) highlights the implications of variations in results based on differential imputation schemes. Garcia and colleagues (2010) point to potential selection biases associated with different imputation schemes, while Huang and colleagues (2005) tackle the issue of relying on proxy information for Bayesian methods of data replacement. In each of the above examples, the researchers point out considerable variations across imputation procedures and highlight an important point: the types of methods we use to account for missing data have a direct impact on findings. This is especially important when policy implications are nested within research and its findings.
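The sensitivity these authors describe is easy to demonstrate. The sketch below is our own minimal illustration, not a procedure drawn from any of the studies cited above: it simulates an income variable whose missingness depends on the (unobserved) value itself and compares two common fixes.

# Minimal sketch (our illustration, not a procedure from the studies
# cited above): the choice of missing-data fix changes the statistics.
# Income is simulated so that high earners withhold it (nonignorable).
import numpy as np

rng = np.random.default_rng(0)
income = rng.lognormal(mean=10.5, sigma=0.6, size=5_000)

# Probability of refusal rises with the unobserved income itself: MNAR.
p_miss = 1 / (1 + np.exp(-2 * (np.log(income) - 10.8)))
observed = np.where(rng.random(income.size) < p_miss, np.nan, income)

deleted_mean, deleted_sd = np.nanmean(observed), np.nanstd(observed)
filled = np.where(np.isnan(observed), deleted_mean, observed)

print(f"true:            mean {income.mean():9,.0f}  sd {income.std():9,.0f}")
print(f"listwise delete: mean {deleted_mean:9,.0f}  sd {deleted_sd:9,.0f}")
print(f"mean imputation: mean {filled.mean():9,.0f}  sd {filled.std():9,.0f}")
# Deletion biases the mean downward; mean imputation reproduces that
# biased mean while artificially shrinking the variance.

Neither estimate recovers the true mean, and the two fixes disagree on the variance, which is exactly the method-dependence of findings that the passage above warns about.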
Yet, we continue to lack a clear understanding of the kinds of questions to which specific populations of people are more or less likely to respond (Babbie 2007; Maxim 1999; Allison 2002; Griliches 1986). In light of potential result-quality issues and decreasing rates of survey response, a systematic interpretation of respondents' refusal to answer an entire survey or select questions is vitally needed. The present case contributes to our understanding of this phenomenon by documenting the relationship between differential patterns of item non-response, controversial survey items, and the insightful information that may be obtained from specific patterns of data missingness. Furthermore, as part of her study, Ecklund was in direct communication with non-respondents, garnering their reasons for not responding to a survey of scientists' attitudes towards religion (discussed in greater detail below).

Here—in the first data collection of its kind—we use the Religion among Academic Scientists (RAAS) survey to examine the types of survey questions the targeted group did not answer and some of the reasons they did not answer them. The controversial questions in this study relate to religiosity among scientists, a group of individuals often caught in the middle of the larger societal debate about whether religion and science are in conflict (Evans and Evans 2008). Our findings are generated from analyses of missing data from what we call "active nonresponders": both those who filled out part of the survey but abstained from part and those who did not respond to the survey but wrote to tell us why. In both instances, we have some advantages in our interpretation of the data (or lack of data, in this case). First, controversial issues were the primary interest of the RAAS, so non-response had already been anticipated and plans made for its interpretation. Second, potential RAAS respondents directly communicated with this study's PI, who is a fellow member of the targeted group. This familiarity allows for an interesting case in which direct discourse about the respondents' reasons for non-response can be linked to their refusal to participate in the survey or parts of it. How often does a solicitor get to knock on the closed door again to ask, "Why not?"

According to our findings, target group matters. Our respondents—scientists from top U.S. universities—did not have the same missing data patterns (in regards to specific questions) as researchers have projected from surveys of the general population. Question topic also matters. Scientists were more likely to present missing data on the more controversial questions related to religion, especially when asked to compare their religiosity to that of the general population. We link their missing data patterns to family formation, religious socialization, and present religiosity. The specific findings may be further explained by the historical context in which the survey was completed. More broadly, our findings show that traditional statistics do not always help us understand the reasons behind missing data and low survey-response rates. Lastly, select populations may display unique missing data patterns that need to be understood as survey researchers attempt to develop more rigorous methodology.

Item Non-Response: Patterns of Implications

When attempting to understand potential patterns associated with missing data in the course of analyzing collected survey data, the most basic form of the data can be dichotomized into two categories: unobserved (missing) and observed (non-missing) (Gelman et al. 2005). In Bayesian notation, the complete dataset is then made up of two components, as expressed in the following equation:

$y_{com} = (y_{obs}, y_{mis})$

Here the complete dataset (y_com) is composed of the observed cases (y_obs) and missing cases (y_mis) for any given variable. This simple notation of the makeup of a complete dataset allows the two components to be visualized in relation to one another. As the observed data (y_obs) increase as a proportion of the dataset, and all else being equal in terms of analytic techniques, the findings are assumed to be more reliable than in instances where the proportion of missing data (y_mis) associated with any one variable is high. Furthermore, one should expect this zero-sum relationship to be dynamic: the higher the proportion missing (and thus the lower the proportion observed), the less reliable the coefficient estimates associated with that set of responses.

Beyond the magnitude of missing data, introduced above, the type of missing data is also very important to understand (Rubin 1976; Little and Rubin 1987; Little 1995; Allison 2002; Alosh 2009; among others). Most important here is the level of randomness associated with the missing data. If missingness is independent of both the observed and the unobserved data, the data are denoted as missing completely at random (MCAR). If missingness is independent of the unobserved data, conditional on the observed data, it is considered missing at random (MAR). If missingness depends on the unobserved values themselves, however, even after conditioning on the observed data, it is considered missing not at random (MNAR). Missingness is ignorable if the patterns are MCAR or MAR, meaning the item non-responses are not identifiably associated with other characteristics of the survey sample. If item non-response is not identifiably independent of other sample characteristics (i.e., MNAR), it is nonignorable (Verbeke and Molenberghs 2000, 2010; Gelman et al. 2005; Alosh 2009). Data identified as MNAR mean that there is some underlying pattern to the data missingness that is associated with a trend in sociodemographics, attitudes, or some other categorizing indicator of the sample.
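The three mechanisms just defined are easy to see in a small simulation. The sketch below is our own minimal illustration (nothing here comes from the RAAS data; all variable names and parameters are invented): it draws one outcome, deletes values under each mechanism, and shows that only MCAR leaves the observed mean unbiased.

# Minimal simulation of the three missingness mechanisms defined above
# (illustrative only; the variables and parameters are hypothetical).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
age = rng.normal(50, 12, n)                    # fully observed covariate
y = 0.05 * age + rng.normal(0, 1, n)           # outcome of interest

# MCAR: missingness is unrelated to anything.
mcar = rng.random(n) < 0.3

# MAR: missingness depends only on the observed covariate (age).
mar = rng.random(n) < 1 / (1 + np.exp(-(age - 50) / 5))

# MNAR: missingness depends on the unobserved value of y itself.
mnar = rng.random(n) < 1 / (1 + np.exp(-2 * (y - y.mean())))

for label, mask in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    y_obs = y[~mask]
    print(f"{label}: {mask.mean():.0%} missing, "
          f"observed mean {y_obs.mean():+.3f} vs true {y.mean():+.3f}")
# Under MCAR the observed mean is unbiased; under MAR it is biased but
# correctable by conditioning on age; under MNAR the bias cannot be
# removed without a model of the missingness process itself.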


As a direct result, survey data with extremely low item-response rates and nonrandom patterns of missing data are less reliable as indicators of the research area in question and may ultimately fail to provide any useful guide for the implementation of social policy. While recent reports from the Pew Research Center indicate the potential for representativeness in spite of high levels of survey non-response (Pew Research Center for the People and the Press 2004), certain types of missing data patterns unquestionably introduce bias into sample statistics (Allison 2002). Thus, the recent trend toward unreliable and biased social science data stemming from survey non-response is particularly troubling in an era when reliable data are vitally needed to continue addressing the most pressing social problems of our day.

In some of the more popular texts on social research methods, scholars mention a number of ways to deal with the existence and magnitude of missing data (Babbie 2007; Maxim 1999; Neuman 2003). These texts uniformly agree that in-depth analyses of missing data could yield potential insights into their meaning and interpretation. Furthermore, the past decade has seen further advancements in the handling and understanding of missing data across a wide diversity of disciplines. This interdisciplinary focus brings to light the importance of recent trends in data-collection quality and research reliability, regardless of discipline. For instance, our review of recent literature on the subject found contributions from clinical trials (Alosh 2009; DeSouza et al. 2009; Garcia et al. 2010; Huang et al. 2005; Southern et al. 2008), social work (Rose and Fraser 2008), library sciences (Montiel-Overall 2006), statistics and data mining (Khoshgoftaar et al. 2007; Verbeke and Molenberghs 2010), public opinion (Abraham et al. 2006; Groves 2006; Groves et al. 2006; Olson 2006), sociology (Allison 2002), demography (Porter et al. 2009), and public health (Paik 2004). Of course this list is not exhaustive, but it does give an indication of the very recent and widespread interest in the subject of missing data across many seemingly unrelated academic disciplines. Regardless of disciplinary context, in all cases the primary interest of the research has been to better understand and deal with patterns of non-ignorable missing data. Yet, very little research exists among sociologists concerning potential issues with missing data and the problems that may arise as a consequence. For the most complete analysis of missing data issues within the field of sociology, see Allison (2002). A discussion of some of the more basic issues associated with missing data follows.

Nonignorable Data

More often than not, data are not missing at random; underlying their patterns are more systematic reasons for non-response (Allison 2002; Griliches 1986). Allison (2002) points out that if the data are not missing at random, we say that the missing data mechanism is nonignorable. He goes on to say that "unfortunately, for effective estimation with nonignorable missing data, very good prior knowledge about the nature of the missing data process usually is needed, because the data contain no information about what models would be appropriate, and the results typically will be very sensitive to the choice of model" (p. 5). As mentioned above, we find ourselves at an advantage, as we have confidential and direct information concerning both the survey respondents and the topic of interest (religion and science).

While the religion and science debate is much more controversial among certain groups in our society, some kinds of information are consistently likely to yield high levels of missing data. Researchers generally find that unanswered questions, resulting in nonignorable missing-data patterns, include questions about recent events (Babbie 2007), intrusive items (such as income, sexual behaviors, and criminal acts) (Maxim 1999), lost or restricted data (Neuman 2003), and other questions that respondents simply don't know how to answer. Although researchers have some information about what kinds of questions respondents are less likely to answer, we do not have more fine-tuned data on whether these unanswered-question patterns are similar across different types of groups. For example, would a survey of an elite population generate the same unanswered-question patterns as a survey of the general population?

The Reality of Ignored Data

We focus our attention in this section on missing data and social science research, more specifically highly visible research in the discipline of sociology. Our focus is grounded in recent literature highlighting the ubiquity (Rose and Fraser 2008) and high frequency (Montiel-Overall 2006) with which missing data exhibit themselves in social science research. Given how often researchers encounter issues involving item non-response, considerable attention has been given to post-hoc corrections for the estimation of social relationships involving data that are MNAR (non-ignorable). Rarely, however, do researchers on the ground discuss or develop methods for collecting information on the reasons that individuals decide not to participate in surveys or not to answer certain questions. Yet we know that both survey response and item response are at or approaching all-time lows, and the ability to understand the implications is extremely important.

As an example of the ubiquitous nature of survey research in sociology (and, by relation, the associated issues of missing data inherent in survey research), we searched the most-cited articles in the American Sociological Review (ASR), the American Journal of Sociology (AJS), and Social Forces—often regarded as the most prestigious general sociology journals in the field according to empirical observation and/or reputational standing. Using A.W. Harzing's "Publish or Perish" publication impact software [1], we found that of the 105 most-cited articles during the past 10 years, 79 % (n=83) used primary or secondary data collected via survey/questionnaire techniques [2]. The results of this analysis are presented in Fig. 1. The surveys that provided the data for examination in these articles included the well-known large-scale Panel Study of Income Dynamics (PSID); the General Social Survey (GSS); the National Longitudinal Survey of Youth (NLSY); the National Health and Nutritional Examination Survey (NHANES); the Current Population Survey (CPS); the National Longitudinal Study of Adolescent Health (AddHealth); the Health and Retirement Survey (HRS); the World Values Survey (WVS); the Union of International Associations, National Organizations Survey (NOS); the Survey of Crime, Community, and Health (CCH); and the Project on Human Development in Chicago Neighborhoods (PHDCN), among other less well-known surveys.

[1] Harzing, A.W. 2011. Publish or Perish, version #3, available at www.harzing.com/pop.htm

[2] In order to ensure equal coverage across the three journals, the top 35 articles since 2000 were identified and downloaded for further examination of data collection techniques and handling of any noted missing data. We further standardized the selection process by total cites per year so as not to give more weight to articles published earlier in the decade based on total cites. We do understand that there is still a lag in regards to publication and citation timing, but believe that we have collected a representative sample of the most visible sociology publications over the past decade.

[Fig. 1 Percentage reporting the usage of survey or questionnaire methods in the collection of data used in the identified articles (ASR, AJS, Social Forces), N=105]

As the figure is organized, the results in the left-hand side of the chart apply to all 105 of the most-cited articles across the three journals (35 each from ASR, AJS, and Social Forces). Here one can see the strong reliance of these articles on survey/questionnaire methods for the collection of the primary/secondary data sources used in their analyses. In the right-hand panel, the use of survey data by the most-cited articles over the past decade is broken down across each of the three journals. Noticeable differences by journal emerge: articles in Social Forces were the most likely to be based on survey data (92 %), followed by ASR (86 %) and AJS (60 %). The variations across journals are not necessarily important to our analysis here, except to highlight the differences that exist in the types of research projects published in specific journals and the resulting implications of item non-response in surveys. We also understand, however, that there are documented issues associated with all types of research methodologies; our intention is not to highlight potential problems with published articles in these top sociology journals, but instead to shine light on the importance of understanding issues associated with item non-response, with special attention to controversial questions.

Again, the fact that missing data exist is nothing new to even the most novice of researchers; however, the ability to make informed decisions given nonignorable patterns of missing data continues to escape even some of the most advanced. Of the 79 % of articles in our sample that made use of survey data, item non-responses to controversial questions such as household income were, almost without exception, supplemented with imputed values in order to avoid a substantial loss of the sample. Of the surveys listed above, the PSID codebook reports issues with missing values on questions about the family, income, marital/fertility issues, and open-ended occupation/industry items. And the income question for the 2006 wave of the GSS reported over 15 % missing data, even though it was measured via the less-intrusive categorical question, thought to overcome well-documented problems with respondents reporting their exact income.

As further documentation of the methods used in dealing with issues of missing data, Table 1 presents a tabular breakdown.

Table 1 Identified techniques for handling missing data issues among the most influential sociology articles (N=105)

Technique for handling missing data [a]                                    %      n
Did not use survey data                                                  21.0     22
Imputed missing values                                                    6.7      7
Acknowledged issues with missing data as a study limitation               7.6      8
Excluded variables with high levels of missingness                        1.0      1
Tested for generalizability by comparing results to other research        1.9      2
Provides links to materials for secondary data sources, mentions
"screening" for missing data, or does not explicitly acknowledge
any missing data                                                         60.9     64
Total Number of Articles                                                100.0    105

[a] Categories are based on self-reported methods for handling missing data that appear within the actual published article

From the table, one can see that 21 % of the studies in this sample did not use a survey- or questionnaire-based data collection technique. For those that did, nearly 20 % of all articles in the sample explicitly acknowledged an impact that missing data had on the design of their analysis (6.7 % imputed values and 1 % excluded cases) or on the presentation of their results (7.6 % acknowledged issues associated with the estimation of their results from missing data and 1.9 % tested for comparable relationships across other intact variables). Ultimately, nearly 61 % did not acknowledge any action taken in their analysis to account for missing data, though many did provide links to primary source materials associated with the secondary sources, which often indicated a correction, weighting, or imputation for missing data. Others did not acknowledge missing data or simply mentioned that they screened for missing data. These last groups are interesting; perhaps they did not have issues with missing data, or perhaps they simply did not account for the patterns of missingness that are appropriate with survey/questionnaire data. For those who did not mention missing data or simply relied on a linkage to materials from a secondary source, it is possible that nonignorable patterns present underlying correlates that could help guide future data collection procedures and even the resulting policy implications of the study at hand.

While the authors of each of the ASR, AJS, and Social Forces articles handled their patterns of missing data and non-response in a manner deemed appropriate by reviewers and editors alike (as evidenced by their ability to pass peer review in the discipline's top outlets), the fact that missing data was a notable issue in nearly a quarter (~22 %) of the articles that used a survey/questionnaire collection method indicates the magnitude of potential bias introduced by this concern. Perhaps an even larger issue is the reliance on secondary source materials (and their corrections, imputations, etc.) with no mention of missing data issues, which characterizes the other 77 % (64 out of 83) of studies using data collected via survey methods. This highlights our main concern: that our discipline has such a high reliance on such methods, while so little is understood about the unique circumstances driving these patterns in each case, begs for the continued development of such an understanding. In fact, it is well known that social scientists take a great number of precautions in collecting data; and yet missing data still exist, and they are very often systematic.

Most important, what are the underlying mechanisms of item non-response, and is it possible to gain substantial insights by examining missing-data patterns on survey questions? We expect that the answer is "yes," and that it is unique to each data collection effort given the social and political climate, the sensitivity of the questions being asked and of the study topic, geographic location, and numerous other issues that must be taken into account when designing data collection tools and analyzing and presenting their results. This is most true when a significant percentage of the missing data falls on dependent and primary independent measures, especially when the core findings of a research project are based extensively on such indicators. The special case provided by the RAAS allows for a unique look into the reasons for complete survey non-response. Often these reasons related directly to the controversial nature of the issue at hand and its timely relationship to historical circumstance. Such analyses could potentially yield important insights, allowing researchers to understand both their topic and their respondents to a greater degree than is possible with existing methods.

Purpose and Expectations

The RAAS is a study of how elite scientists in the United States understand issues related to spirituality and religiosity. It provides an opportunity to understand the correlates of missing data on particular questions among a specific population (elite scientists), given the sensitive and controversial issue of religion and how it is negotiated in academia. Furthermore, given their appointments at the most prestigious universities in the United States, this is a sample of scientists who have dedicated much of their lives to their work. As one might expect, asking this population questions about religion and their feelings on the subject is likely to be viewed by many as intrusive or controversial. Some of the more intrusive questions were aimed at understanding scientists' religious outlook in comparison to other Americans, or their beliefs about God or a god, the Bible, and religion. (Question wording for these specific questions is discussed further below.) These questions yielded nonignorable patterns of item non-response, though the survey achieved a high response rate overall (75 %). Interestingly, of the 25 % who did not participate in the survey, roughly 26 % directly corresponded with the study's PI, providing insightful qualitative information regarding reasons for not completing the survey.

The RAAS is appropriate, then, as a case for examining nonignorable missing data as well as the reasons individuals did not respond to a survey on a controversial topic (Ecklund 2010). Further, the survey was of a targeted sample, meaning that we can potentially generalize from this case to other surveys of scientists or other elite populations, and even inform future studies of similar populations on the development and dissemination of their data collection instruments.



From this point forward, we provide an example analysis in which we examine a series of controversial survey questions in the RAAS in hopes of better understanding the non-random patterns of missing data within the unique context of that specific survey. Again, we believe that each data collection effort is undertaken within a unique context that must be understood in order to fully interpret the responses (and lack of responses) from any one instrument at any given point in time. This analysis is undertaken with that in mind, in hopes of uncovering the unique relationships associated with the identified, and systematic, patterns of missing data. In order to do so, we set forth a series of assumptions and expectations that we will examine and that will guide our research.

First, we assume that controversial questions (in this case, questions about personal religious views) may be more likely to garner missing data and that these patterns will be statistically nonignorable. Second, we expect a series of findings in relation to specific questions and patterns of non-response in the RAAS. We expect that elite scientists will have different patterns of missing data on population-specific, more intrusive measures (i.e., race, age, gender, and nativity) (Ecklund 2010) when compared to what other survey researchers have found among the general population. We also expect that level of prestige will be an important predictor of missing data on controversial questions about religion. Given previous findings about differences in scientists' religiosity according to rank and number of articles published (Ecklund 2010), we expect that scientists at the assistant professor level and those who have published fewer articles (controlling for differences in article-publishing conventions among disciplines) will have more missing data on particularly controversial questions than will scientists at the associate and full professor levels. Our expectation is based on the reasoning that those who have less prestige will feel less free to answer controversial questions related to religion. Because of recent media attention to the irreligiosity of natural scientists, we also expect that natural scientists will be more likely to have missing data on controversial questions about religiosity than will social scientists. Based on previous research about religious socialization (Ecklund and Scheitle 2007; Ecklund 2010), we expect that scientists who are religious will be more likely to answer questions about religion, because they face less fear of being perceived as different from the general population. Finally, based on previous research indicating that having children is a positive predictor of religiosity among scientists (Ecklund and Scheitle 2007), we expect that scientists who have children will be more likely to be religious and therefore more likely to answer questions about religion on the survey.

An Example: Controversial Questions in the Religion Among Academic Scientists Study

The RAAS study began during May 2005, when 2,198 faculty members in the disciplines of physics, chemistry, biology, sociology, economics, political science, and psychology were randomly selected from the universities in the sample. Although faculty were randomly selected, oversampling occurred in the smaller fields and undersampling in the larger ones. For example, a little more than 62 % of all sociologists in the sampling frame were selected, while only 29 % of physicists and biologists were selected, reflecting the greater numerical presence of physicists and biologists at these universities when compared to sociologists. In analyses where discipline is not controlled for, data weights were used to correct for the over- and undersampling. (The data-weighting scheme is available upon request; a sketch of the general logic appears below.)
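The general logic of such design weights can be sketched as inverse selection probabilities. In the sketch below, the 62 % (sociology) and 29 % (physics and biology) rates come from the text; the rates for the remaining disciplines are placeholders, and the actual RAAS weighting scheme may differ.

# Sketch of design weights as inverse selection probabilities.
# The 0.62 (sociology) and 0.29 (physics, biology) rates come from the
# text; the remaining rates are hypothetical placeholders, and the
# actual RAAS weighting scheme may differ.
selection_rate = {
    "sociology": 0.62,
    "physics": 0.29,
    "biology": 0.29,
    "chemistry": 0.35,          # hypothetical
    "economics": 0.50,          # hypothetical
    "political science": 0.55,  # hypothetical
    "psychology": 0.45,         # hypothetical
}

# A respondent's weight is 1 / P(selection), so oversampled fields
# count for less and undersampled fields count for more.
weights = {d: 1.0 / p for d, p in selection_rate.items()}

for discipline, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{discipline:>18}: weight {w:.2f}")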



Scientists included in the study were randomly selected from seven natural and social science disciplines at universities that appear on the University of Florida's annual report of the "Top American Research Universities" [3]. The University of Florida ranked elite institutions according to nine different measures: total research funding, federal research funding, endowment assets, annual giving, number of national academy members, faculty awards, doctorates granted, postdoctoral appointees, and median SAT scores for undergraduates. Universities were ranked and selected according to the number of times they appeared in the top 25 for each of these nine indicators.

[3] After the RAAS study began, the "Top American Research Universities" project moved to Arizona State University. See http://mup.asu.edu/, accessed April 17, 2009.

Initially, the study's PI wrote a personalized letter to each potential participant in the study that contained a $15.00 cash pre-incentive, theirs to keep regardless of their decision to participate in the survey. Each letter included a unique identification code with which to log onto a Web site and complete the survey. After five reminder emails, the research firm commissioned to field the survey—Schulman, Ronca, and Bucuvalas, Inc. (SRBI)—called respondents up to a total of 20 times (or until they responded), requesting participation over the phone or on the Web. Six and a half percent of the respondents completed the survey over the phone, and 93.5 % completed the Web-based survey.

Shedding Qualitative Light on Reasons for Non-Response

Many of the scientists who chose not to participate wrote to tell us why. Overall, 131 personal emails or letters from those who did not wish to participate (out of 552 total nonrespondents) were received. Reasons for not participating were systematically coded in an attempt to uncover patterns. In total, the scientists provided 13 discrete reasons for not participating in the survey. Dominant reasons included lack of time, problems with the incentive, traveling or being away during the survey, and simply not wishing to participate. We also conducted demographic analyses of the nonrespondent scientists and found no substantial differences along basic demographic indicators (such as gender, age, discipline, and race) between those who responded and those who did not.

The results presented in Table 2, which categorizes the specific reasons communicated for not completing the survey, are still beneficial in allowing us to understand some of the reasons for survey non-participation. It is likely that those who did not communicate with the PI fall somewhere in these categories, but it is also likely that they constitute a sub-population of non-respondents in their own right. These "non-active nonresponders" likely did not respond for their own reasons, which we cannot ascertain.

Table 2 Reasons identified by "active nonresponders" for not participating in the RAAS survey

Reason for not completing survey                       %       n
Too controversial                                      1.5      2
No time for survey                                    12.2     16
Have issue with the incentive                         10.7     14
Policy of not participating in surveys                 2.3      3
Survey not open-ended enough                           2.3      3
Do not wish to participate                            21.4     28
Confidentiality issues                                 0.8      1
Traveling, away, received survey after deadline       16.0     21
Computer or technical problem                          6.1      8
Retired, sabbatical, or ill                            3.8      5
Does not consider self appropriate for study           0.8      1
No reason                                             26.0     34
Total "Active Nonresponders"                         100.0    131

Data Source: All data were collected from direct correspondence between survey respondents and the project PI

Some respondents wrote to explain that they did not participate in the study specifically because it was on what they perceived to be an extremely controversial topic. One biologist explained: "You are naïve if you think that you could prevent Homeland Security or other major governmental agency from obtaining this confidential information. Sorry but here is your money back." The biologist eventually ended up participating in the study. Still, his response might help explain why certain questions on the survey were much less likely to be answered than others. Even though complete confidentiality was assured and human subjects' protection extensively discussed, some scientists were fearful that their answers to controversial questions might be used against them.

The pre-incentive also drew mixed reactions. For example, one psychologist said, "As soon as I opened that up, I thought, 'Oh my God. I've got the bills now. I have to do it' [laughs]. . . . It was just brilliant." [4] Other scientists called the study "harassment" or even "coercion." For example, a well-known sociologist wrote the PI an email saying, "It is obnoxious to send money (cash!) to create the obligation to respond." [5] It is important to note that the study received full human subjects' approval at the PI's university.

[4] Psyc 17, conducted January 3, 2006.

[5] This individual did not participate in the survey.

Reasons for not responding are likely to vary by individual circumstance. In the case of the RAAS, about 26 % of those who did not respond (25 % of the total sample) to the survey communicated directly with the PI and provided reasons for not responding. As economists and political scientists have already discovered, however, the pre-incentive works. There was an overall response rate of 75 %, or 1,646 respondents, ranging from a 68 % rate for psychologists to a 78 % rate for biologists. This is an extremely high response rate for a survey of faculty. Even the highly successful Carnegie Commission study of faculty resulted in only a 59.8 % rate [6]. Understanding who did not complete these particular surveys is likely to help us place our findings in the context of a specific type of respondent, without overgeneralizing to all scientists. However, we also understand that any meta-analysis of missing data is likely to be constrained at some point by a simple lack of identification.

[6] See Ladd and Lipset, "The Politics of Academic Natural Scientists and Engineers."
Identifying Quantitative Patterns in Mechanisms of Non-Response

Complementing our qualitative understanding of non-participants in the RAAS survey, we also examine missing-data patterns quantitatively. As expected, the questions with the highest levels of nonignorable missing data concerned religion, the topic of the study. The survey asked some questions about religious identity, belief, and practice that were replicated from national surveys of the general population (such as the GSS), and other questions on spiritual practices, ethics, and the intersection of religion and science in the respondent's discipline, some of which were replicated from other national surveys and some of which were developed uniquely for this survey [7]. There was also a series of inquiries about academic rank, publications, and demographic information.

[7] The 1998 GSS had 2,832 respondents, although only half of the sample was asked the expanded set of religion and spirituality questions. The 2004 GSS had 2,812 respondents. Where possible, we used data from the GSS 2006 for the comparisons of scientists with the general population. See Davis et al. (2007).

In order to better understand the mechanisms driving the nonignorable patterns of missing data, we selected the questions that had the highest non-response rates and analyzed them with a series of logistic regression equations, estimating the likelihood of a respondent refusing to answer a given question based on a selection of identified correlates. The analysis is informative in its ability to uncover mechanisms that may inform data replacement methods, or simply the presentation of findings from previous and future research using the RAAS dataset. The general equation applied to the logistic regression analysis is as follows:

$\ln\!\left(\frac{\hat{p}}{1 - \hat{p}}\right) = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_i X_i$

where the logged odds that the question is identified as missing (coded 1) versus not missing (coded 0) are modeled as a constant (B_0) plus a series of independent-variable-specific slope coefficients (B_i X_i).
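A sketch of what this estimation looks like in practice appears below. It codes the dependent variable as 1 when the item is missing and 0 otherwise, then fits the logistic equation above. The file name and column names are hypothetical stand-ins, not actual RAAS variables, and we assume the covariates themselves are nearly fully observed, as the text later notes for the religiosity indicators.

# Sketch of the missingness regression described above (hypothetical
# file and column names, not the actual RAAS variables).
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("raas_sample.csv")            # hypothetical file

# Item non-response rates: the raw material for a table like Table 3.
print(df.isna().mean().sort_values(ascending=False).head())

covs = ["us_born", "female", "age", "income", "attends_services"]
complete = df.dropna(subset=covs)              # rows with observed covariates

# 1 = respondent skipped the comparative religious-views item.
y = complete["religious_views_7pt"].isna().astype(int)
X = sm.add_constant(complete[covs])

fit = sm.Logit(y, X).fit(disp=False)
print(np.exp(fit.params))                      # odds ratios, as in Table 4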

Table 3 shows the questions on the survey with the greatest proportion of missing data in the entire dataset.

Table 3 Selected controversial survey items and non-response rates

Survey questions and response categories                              % Missing [a]

Which of the following comes closest to your views about truth
in religion?                                                               10.1 %
  (1) There is little truth to any religion
  (2) There are basic truths in religion
  (3) There is the most truth in only one religion

Which of the following statements comes closest to expressing what
you believe about God?                                                     10.1 %
  (1) I do not believe in God
  (2) There is no way to find out if there is a God
  (3) I believe in a higher power, but it is not God
  (4) I believe in God sometimes
  (5) I have some doubts but I believe in God
  (6) I have no doubt about God's existence

Which of these statements comes closest to describing your feelings
about the Bible?                                                           16.1 %
  (1) The Bible is an ancient book of fables, written by men
  (2) The Bible is inspired by the word of God, but not the actual word
  (3) The Bible is the actual word of God and should be taken literally

Compared to most Americans, where would you place your RELIGIOUS
views on a seven-point scale?                                              34.3 %
  (1) Extremely Liberal → → → (7) Extremely Conservative

[a] These percentages are relative to much lower levels of missing data on more widely known intrusive measures, including income (6 %), marital status (<1 %), and number of children at home (2.5 %)

From the table, we find evidence supporting our expectation concerning controversial questions and item non-response: the religiosity-specific questions are missing at a higher rate than other measures more widely recognized as intrusive, namely income (6 % missing), family formation (<1 % missing), and number of children (2.5 % missing). The question that asks where these elite scientists would place their religious views on a seven-point conservative/liberal scale, when compared to other Americans, has a missing-data rate of over 34 %. And over 16 % of those who responded to the survey were not willing to answer a question describing their feelings about the Bible. Questions about views on truth in religion and belief about God also garnered more than 10 % non-response. These preliminary analyses reveal that the missing data from this survey are not missing at random. They are "nonignorable," in Allison's (2002) terms, meaning that
more complicated analysis is warranted. What is driving these patterns, and can we better understand these non-responses given an analysis of correlates associated with these missing patterns? That is our goal here. To examine these specific patterns in regards to our expectations that variations exist across demographic indicators, professional standing, religiosity and familial characteristics, we present results of a logistic regression analysis in Table 4. These four sub-groups were included in the analysis as they are often found to be linked to patterns of missing data in the general population (citizenship, gender, age, race, and family formation are used as explanatory variables) (Allison 2002; Griliches 1986). As mentioned, we also included professional characteristics that may be important to this population specifically, such as whether or not a faculty member is in the natural sciences (social sciences as comparison), prestige as indicated by number of published articles, and rank. Next, we included religiosity variables that may make a difference in missing-data rates, including present levels of religiosity (attendance at services and religious affiliation) and religious socialization (importance of religion in childhood). Finally, we included measures of family formation such as marital status and the number of children in the respondent’s family, given the results of previous analyses linking it to religiosity using this dataset (Ecklund 2010).
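As a reading aid for the estimates discussed below (our gloss, not a formula from the original): an odds ratio maps to a percentage change in the odds of a missing response via

$\%\Delta(\text{odds}) = (\mathrm{OR} - 1) \times 100,$

so an odds ratio of 0.85 corresponds to roughly 15 % lower odds of skipping an item, and one of 1.95 to roughly 95 % higher odds.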


Table 4 Odds ratios predicting missing data on selected controversial survey items (standard errors)

The table reports reduced and full models [a] for each of four outcomes: belief about religion, belief about God, belief about the Bible, and comparative religious views. Explanatory variables (rows): U.S. born; female; age; income; white; natural science; number of published articles; assistant professor (ref. full); associate professor (ref. full); attendance at services; importance of religion in childhood; affiliated with a religious organization; married; number of kids; model χ².

[The individual odds-ratio cells are not reliably recoverable from the extracted page layout and are omitted here; see the published table.]

[a] Reduced models represent the isolated examination of the demographic, professional, religiosity, or family formation indicators, apart from all other variables. In contrast, the full model represents the controlled effect of each indicator with all other variables in the model. ***p<0.001, **p<0.01, *p<0.05, † p<0.10

We find very interesting patterns in the systematic missingness across the four questions concerning religious views. Of specific interest, some of the demographic indicators traditionally thought to be highly correlated with missing-data patterns are not among the strongest predictors in this analysis of elite scientists. Instead, we find that they do not have a large influence on the likelihood of missing data, with a couple of notable exceptions. Again, all odds ratios should be interpreted as the likelihood of the data point being missing (i.e., a ratio above one means more likely to be missing).

The only significant effect of nativity is associated with the initial question concerning the respondents' general religious views. Here we find that those native to the United States were 12 to 15 % less likely to have avoided answering this question in both the reduced and full models. Similar, yet insignificant, results were found concerning nativity and a respondent's beliefs about the Bible. Interestingly, concerning their beliefs about God and their views compared to other Americans, U.S. natives were directionally less likely to respond. We found that women (when compared to men) were significantly more likely, by 95 % and 76 % respectively, to have avoided questions on beliefs about God and the Bible. While again not significant, women were less likely to have missing data on the questions addressing their beliefs about religion and their comparative stance relative to the rest of Americans. Following the trend, older respondents were less likely to avoid the question concerning their beliefs about religion but more likely to avoid questions on their beliefs about God, the Bible, and their comparative views; only for beliefs about God is the effect significant. The effect of age, however, is ultimately explained away when professional, religiosity, and family formation indicators are introduced. Scientists with higher levels of income are significantly less likely to answer questions concerning their beliefs about God, the Bible, and their comparative views. They are more likely to answer questions concerning their beliefs about religion (although these results are not statistically significant). Finally, scientists who classified themselves as racially white were less likely to have missing data on beliefs about God and the Bible, and more likely to have avoided the questions concerning their religious beliefs and their religious views compared to those of other Americans.

We find almost no evidence in support of our expectation that an increase in prestige leads to a higher probability of answering questions concerning religiosity. As the number of articles a respondent has published increases, the respondent is actually more likely to be missing data on all questions of interest (although these results are not statistically significant). Professors at the associate rank are over two times more likely than full professors to have avoided questions concerning their beliefs about religion but 67 % less likely to have avoided questions concerning their belief in God (both results are significant). Finally, we see that natural scientists (when compared to social scientists) are not significantly more or less likely to avoid answering any of the questions of interest here, in contrast with our expectation.
When we turn to the influence of religious factors, we find that all of the religiosity measures significantly decrease the likelihood of missing data when scientists were asked to compare their religious views to those of other Americans. This provides strong support for our expectation that respondents who reported a higher level of religiosity would in fact be more likely to answer the controversial questions concerning religiosity. In particular, the results show that as religiosity increases, scientists are significantly more likely to answer questions about their religious views, their views on God, their views on the Bible, and their comparisons of their beliefs to those of other Americans. Only the measure comparing respondents to other Americans, however, holds in the full model.

Perhaps not surprisingly, this is the most influential set of predictors underlying the systematic patterns of nonignorable missing data. Not coincidentally, these religious indicators directly relate to the patterns of missing data among the religiously centered questions that are the endogenous variables in this analysis. This is extremely important to our ability to understand variations in missing-data patterns among these four questions. Not only are the top four questions, in terms of their quantity of missing data, all religion-centered questions, but at least one of them is strongly predicted by a set of alternative religiosity indicators that have minimal levels of missing data. What does this mean for our understanding of these patterns and for future analyses? It means that those who regularly attend services, those for whom religion played an important role in childhood, and those who are currently affiliated with a religion are less likely to avoid these questions. Thus, the majority of missing data on these questions relates directly to individuals who are not religious or who are only weakly attached to religion. The question itself, then, appears to be somewhat non-exhaustive, as it asks only for a response on a seven-point scale about religious views. It is quite likely that much of the non-response in this case reflects the inability of respondents to place themselves on the scale at all, given that the highest levels of missing data were associated with those who exhibited very low levels of religiosity on the questions they did answer.

The final section of determinants in the logistic regression model concerns the potential relationship of family formation and development, through marriage and having children, to the likelihood of viewing questions about religion as controversial and thus to the likelihood of answering them. We find support for this line of reasoning, as all statistically significant coefficients indicate a lower likelihood of missing data for both married respondents and those with children. Being married is related to a lower likelihood of avoiding questions concerning the respondent's religious views, and having children is significantly related to a lower likelihood of avoiding questions on God and on the comparison of one's religious stance with those of other Americans.

Discussion and Conclusion

Uncovering the mechanisms underlying patterns of missing data is as important in assessing the reliability and validity of a data collection tool and its items as the more popularly employed traditional statistical tests (alpha reliability tests, exploratory data reduction techniques, etc.). Very little attention has been given to this important issue, however. We currently have sophisticated methods for handling missing data, some of which have inherent in them the regression-based probability analyses we have undertaken here (i.e., multiple imputation and other stochastic replacement methods).
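For readers unfamiliar with the latter, the following is a generic sketch of multiple imputation of the stochastic kind mentioned here; it is not the method of any study cited in this article, and the file and column names are hypothetical.

# Generic sketch of multiple imputation (chained equations), not the
# method of any study cited here; file and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("survey.csv")                 # hypothetical file
cols = ["income", "age", "attends_services"]

# Draw several completed datasets and pool an estimate across them,
# so the uncertainty of the imputation itself is not hidden.
estimates = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = pd.DataFrame(imputer.fit_transform(df[cols]), columns=cols)
    estimates.append(completed["income"].mean())

print(f"pooled mean income: {np.mean(estimates):,.0f} "
      f"(across-imputation sd {np.std(estimates):,.2f})")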



None of these methods, however, allows an understanding of the mechanisms driving the missing-data patterns. Filling this lacuna may provide insight for the development and continued advancement of "professional standards" in our own discipline of sociology as well as more broadly. For instance, the inherent issues we detected with the question asking respondents to compare their religious views are likely the reason it had the highest rate of missing data in the entire dataset. If another survey utilized this question, it is likely that the addition of a category for "no religious views" would alleviate some of the missing data. It is still likely that this would remain one of the most avoided questions, as evidenced by the fact that the next three highest levels of missing data were all on religiosity questions, but future analyses can now be informed by the results of this introspective examination of the RAAS data.

Further, we found significant differences among the questions that generated the most missing data, and these patterns of missing data did not fall along the lines predicted by what researchers have found in the general population. Questions about income and personal family issues, which scholars find often experience missing data in the general population (Allison 2002), did not generate the same missing-data patterns among natural and social scientists who teach and do research at elite universities. Instead, questions most likely to be controversial to this population were the most likely to generate missing data. In at least one case a structural mechanism for the level of missingness was uncovered; overall, these included questions where respondents were asked to compare their religious views to those of other Americans, describe their feelings about the Bible, state their views on truth in religion, and quantify their beliefs about God.

Analysis also revealed that these missing-data patterns for sample-specific controversial questions were not ignorable, meaning that such questions had a high percentage of missing data that could be directly linked, through statistically significant associations, to indicators of demographic makeup, professional rank, religiosity, and family formation. Further, we found that gender, race, and the presence of children in one's household had an impact on the likelihood of missing data on the controversial religion questions mentioned above. Women were more likely than men to have missing data when asked how their views on religion compared to those of other Americans, as were scientists who were racially white. In terms of religiosity measures, scientists who were the least religious were the most likely to have missing data on questions about belief in God and about the comparison of their religious views to those of other Americans.

Some of these specific results about missing data among scientists may be explained by factors of marginalization. Our elite universities nearly all employ fewer women (especially in the natural sciences but even in the social sciences) than men. Women often feel marginalized in this situation, which may make them less likely to answer controversial questions about religion on a survey for fear of how the results might be used to further marginalize them. If this marginalization is happening, we might expect the same pattern of response from nonwhite groups, but there are too few black and Latino individuals in the sample to make a meaningful comparison.
Asian Americans are overrepresented among the sciences relative to other minority groups, but many are first-generation immigrants or international scholars who may not be subject to the same minority-marginalization dynamics of American culture as other nonwhite racial minority groups. The same reasoning about marginalization might also apply to those who are not religious. Researchers find that atheists and the nonreligious are somewhat marginalized in the general population (Edgell et al. 2006), which means that scientists who are not religious may be especially unlikely to answer questions about religion for fear that the larger public might use such results against them. (Recall the quote from the biologist who was afraid his answers would be leaked to Homeland Security.)

Interestingly, the specific questions deemed controversial to this group were avoided at different rates, depending on both the question of interest and the determinant at hand. For instance, based solely on the directionality of the obtained coefficients, respondents were often likely to answer one or two questions about religion in general but avoid those relating specifically to God and the Bible (a brief sketch of how such item-level checks might be run follows this discussion). General religion questions, in other words, often seemed to be interpreted differently from those concerning God and the Bible, perhaps because the latter connote adherence to Christian religious traditions in particular.

In this case we find further evidence that potentially correctable issues in survey design exist, and that addressing them would likely yield a more complete and reliable set of data in future rounds of data collection. Most obviously, some of our survey questions do not allow respondents to provide an appropriate answer because their response categories are not exhaustive. In such cases the missing data are associated not with respondent avoidance but with an error in item construction. We can still draw some inferences from the patterns of missing data: it is likely, for instance, that those who did not answer the comparison question do not feel they have “religious views” and therefore cannot compare themselves in this respect. Whatever the case, there were also a number of instances in which answers to these questions were correlated with differences in nativity, gender, age, income, race, field, rank, religiosity, and marital status. Many of these associations were statistically insignificant, but they are interesting nonetheless.

The particular historical context of this survey may also be important. It took place in 2005, when there was much publicity surrounding controversial cases over intelligent design theory and charges that university professors were particularly biased in their teaching (Schrecker 2006; Ecklund 2010). Natural and social scientists in high-profile positions at that time may have been particularly reticent to openly compare their views about religion with those of the general population. Individual emails to the study PI expressed such reservations, along with concern that the survey data be kept completely confidential and that respondents not be identified by name.

Of general relevance to survey researchers, these results show the value of secondary analysis for understanding the underlying reasons for missing data in a particular dataset, which the PI has done in other publications generated from this study (Ecklund et al. 2008). Where possible, it may also be valuable for the researcher to have extensive contact with the research subjects to try to discern the possible reasons for their missing data.
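As a concrete illustration of the item-level checks referenced above, the short Python sketch below, offered only as a hedged outline, computes per-question skip rates and then regresses a missingness indicator on observed respondent traits. Everything here is hypothetical: the file survey.csv, the q_-prefixed item names, and the columns religion_compare, gender, discipline, and religiosity are illustrative stand-ins, not the actual RAAS variables.

# Minimal sketch: probing whether item nonresponse is ignorable.
# All names below are hypothetical stand-ins, not RAAS variables.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical survey extract

# Per-item skip rates: which questions are avoided most often?
skip_rates = df.filter(like="q_").isna().mean().sort_values(ascending=False)
print(skip_rates.head())

# Indicator: 1 if the religion-comparison item was skipped, 0 otherwise
df["miss_compare"] = df["religion_compare"].isna().astype(int)

# Significant demographic or religiosity coefficients would mean the
# missingness is related to observed traits and cannot be treated as
# missing completely at random, echoing the associations reported here.
model = smf.logit(
    "miss_compare ~ C(gender) + C(discipline) + religiosity", data=df
).fit()
print(model.summary())

Comparing the sign of each fitted coefficient across items would then mirror the directionality comparisons discussed above.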
Further, these results show that missing-data patterns, even fairly well-established ones having to do with demographics, may differ radically across survey populations. More and more survey researchers are trying to avoid low response rates by targeting specific (rather than representative) populations of respondents.
The kinds of conversations we have generated in this article about survey response rates, and particularly about missing data among targeted populations asked controversial questions, may hold value for survey-methods classes as well as for researchers in the field.

In sum, the ability to uncover issues associated with response incentives, survey item construction, reliability, and validity (among other issues inherent in survey data collection) warrants a more careful examination of the actual structure of the data behind the associations and relationships presented in our most coveted publication outlets. Given that nearly four out of every five of the articles obtained in this analysis made their contributions via data collected through survey or questionnaire methods, a more careful understanding of these mechanisms is needed in order to continue to improve our ability to make reliable and valid inferences from the data we collect.

Furthermore, it is important to move beyond simply employing a series of generic replacement methods in the process of imputation. Researchers should instead tie these replacement and correction procedures to the specific dataset at hand, since every dataset is collected within a specific social, historical, and political context; the targeted sample, geographic coverage, and data-collection method also shape the ultimate “structure” of any given dataset. We must strive to understand these conditions and take them into account when analyzing and presenting our results, because such results, especially those in publications with an impact as high as those examined here, often inform policy and future research agendas for years to come. Finally, the information extracted from active nonresponders and missing-data patterns should itself be incorporated into the future design of such studies.
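As a hedged sketch of what tying imputation to a dataset’s own context might look like in practice, the code below contrasts a generic unconditional mean fill with a model-based imputer that conditions each variable on the others in the sample. The column names and the file survey.csv are illustrative assumptions, and the variables are assumed to be numerically coded.

# Minimal sketch: context-aware imputation versus a generic fill.
# Column names are illustrative and assumed numerically coded.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("survey.csv")  # hypothetical survey extract
cols = ["age", "rank", "religiosity", "belief_god"]  # illustrative

# Generic fix: unconditional column means ignore who skipped and why
mean_filled = df[cols].fillna(df[cols].mean())

# Context-aware fix: each variable is modeled from the others, so an
# imputed belief score reflects the rank, age, and religiosity patterns
# of this particular sample rather than a one-size-fits-all constant
imputer = IterativeImputer(max_iter=10, random_state=0)
model_filled = pd.DataFrame(imputer.fit_transform(df[cols]), columns=cols)

In practice one would draw several such completed datasets (multiple imputation) and pool estimates across them, adding auxiliary variables suggested by the survey’s target population and collection context.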

References

Abraham, K., Maitland, A., & Bianchi, S. (2006). Nonresponse in the American Time Use Survey: Who is missing from the data and how much does it matter? Public Opinion Quarterly, 70(5), 676–703.
Allison, P. (2002). Missing data (Sage Quantitative Applications in the Social Sciences series). Sage.
Alosh, M. (2009). The impact of missing data in a generalized integer-valued autoregression model for count data. Journal of Biopharmaceutical Statistics, 19, 1039–1054.
Babbie, E. (2007). The practice of social research (11th ed.). Thomson Wadsworth.
Davis, J. A., Smith, T. W., & Marsden, P. V. (2007). General Social Surveys, 1972–2006 [cumulative file]. ICPSR Study No. 4697.
Dawid, A. P. (1984). Statistical theory: The prequential approach (with discussion). Journal of the Royal Statistical Society, Series A, 147, 278–292.
DeSouza, C. M., Legedza, A. T. R., & Sankoh, A. J. (2009). An overview of practical approaches for handling missing data in clinical trials. Journal of Biopharmaceutical Statistics, 19, 1055–1073.
Ecklund, E. H. (2010). Science vs. religion: What scientists really think. New York: Oxford University Press.
Ecklund, E. H., & Scheitle, C. (2007). Religion among academic scientists: Distinctions, disciplines, and demographics. Social Problems, 54(2), 289–307.
Ecklund, E. H., Park, J. Z., & Veliz, P. T. (2008). Secularization and religious change among elite scientists: A cross-cohort comparison. Social Forces, 86(4), 1805–1840.
Edgell, P., Gerteis, J., & Hartmann, D. (2006). Atheists as “other”: Moral boundaries and cultural membership in American society. American Sociological Review, 71, 211–234.
Evans, J. H., & Evans, M. S. (2008). Religion and science: Beyond the epistemological conflict narrative. Annual Review of Sociology, 34, 87–105.
Garcia, R. I., Ibrahim, J. G., & Zhu, H. (2010). Variable selection in the Cox regression model with covariates missing at random. Biometrics, 66, 97–104.
Gelman, A., Meng, X. L., & Stern, H. S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6, 733–807.
Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., & Meulders, M. (2005). Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics, 61, 74–85.
Griliches, Z. (1986). Comment on Behrman and Taubman. Journal of Labor Economics, 4(3), S146–S150.
Groves, R. (2006). Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70(5), 646–675.
Groves, R., Couper, M., Presser, S., Singer, E., Tourangeau, R., Piani, G., & Nelson, L. (2006). Experiments in producing nonresponse bias. Public Opinion Quarterly, 70(5), 646–675.
Haung, R., Liang, Y., & Carriere, K. C. (2005). The role of proxy information in missing data analysis. Statistical Methods in Medical Research, 14, 457–471.
Khoshgoftaar, T. M., Van Hulse, J., Seiffert, C., & Zhao, L. (2007). The multiple imputation quantitative noise corrector. Intelligent Data Analysis, 11, 245–263.
Lenski, G. (1961). The religious factor. Garden City: Doubleday.
Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112–1121.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Martinussen, T., Nord-Larsen, T., & Johannsen, V. K. (2008). Estimating forest cover in the presence of missing observations. Scandinavian Journal of Forest Research, 23, 266–271.
Maxim, P. (1999). Quantitative research methods in the social sciences. Oxford University Press.
Montiel-Overall, P. (2006). Implications of missing data in survey research. Canadian Journal of Information and Library Science, 30(3/4), 241–269.
Neuman, L. (2003). Social research methods: Qualitative and quantitative approaches (5th ed.). Allyn and Bacon.
Olson, K. (2006). Survey participation, nonresponse bias, measurement error bias, and total bias. Public Opinion Quarterly, 70(5), 737–758.
Paik, M. C. (2004). Nonignorable missingness in matched case–control data analysis. Biometrics, 60, 306–314.
PEW Research Center for the People and Press. (2004). Polls face growing resistance, but still representative. Survey reports, April 20.
Porter, J. R., Cossman, R., & James, W. L. (2009). Research note: Imputing large group averages for missing data, using rural–urban continuum codes for density-driven industry sectors. Journal of Population Research, 26, 273–278.
Rose, R. A., & Fraser, M. W. (2008). A simplified framework for using multiple imputation in social work research. Social Work Research, 32(3), 171–178.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1984). Multiple imputation for nonresponse in surveys. New York: Wiley.
Satten, G. A., & Carroll, R. J. (2000). Conditional and unconditional categorical regression models with missing covariates. Biometrics, 56, 384–388.
Schrecker, E. (2006, February 10). Worse than McCarthy. The Chronicle of Higher Education, p. B20.
Southern, D. A., Norris, C. M., Quan, H., Shrive, F. M., Galbraith, P. D., Humphries, K., Gao, M., Knudtson, M. L., & Ghali, W. A. (2008). An administrative data merging solution for dealing with missing data in a clinical registry: Adaptation from ICD-9 to ICD-10. BMC Medical Research Methodology, 8(1), 1–9.
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer.
Verbeke, G., & Molenberghs, G. (2010). Arbitrariness of models for augmented and coarse data, with emphasis on incomplete data and random effects models. Statistical Modeling, 10(4), 391–419.
Wright, J. D., & Marsden, P. V. (2010). In P. V. Marsden & J. D. Wright (Eds.), Handbook of survey research (2nd ed.). Bingley: Emerald.

