Zeitschrift für Psychologie, Issue 1, 2019


Michael Bošnjak Timo Gnambs (Editors)

Zeitschrift für Psychologie
Founded in 1890
Volume 227 / Number 1 / 2019
Editor-in-Chief: Edgar Erdfelder
Associate Editors: Michael Bošnjak, Benjamin E. Hilbig, Bernd Leplow, Steffi Pohl, Christiane Spiel, Elsbeth Stern

Hotspots in Psychology 2019






Library of Congress Cataloging in Publication information is available via the Library of Congress Marc Database under the LC Control Number 20189302867

© 2019 Hogrefe Publishing

Hogrefe Publishing
Incorporated and registered in the Commonwealth of Massachusetts, USA, and in Göttingen, Lower Saxony, Germany

No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the publisher.

Cover image © istockphoto.com/Rawpixel
Stock photo. Posed by models.

Printed and bound in Germany

ISBN 978-0-88937-555-0

The Zeitschrift für Psychologie, founded by Hermann Ebbinghaus and Arthur König in 1890, is the oldest psychology journal in Europe and the second oldest in the world. Since 2007, it appears in English and is devoted to publishing topical issues that provide convenient state-of-the-art compilations of research in psychology, each covering an area of current interest. The Zeitschrift für Psychologie is available as a journal in print and online by annual subscription and the different topical compendia are also available as individual titles by ISBN.

Editor-in-Chief

Edgar Erdfelder, University of Mannheim, Psychology III, Schloss, Ehrenhof-Ost, 68131 Mannheim, Germany, Tel. +49 621 181-2146, Fax +49 621 181-3997, erdfelder@psychologie.uni-mannheim.de

Associate Editors

Michael Bošnjak, Trier, Germany Benjamin E. Hilbig, Landau, Germany

Bernd Leplow, Halle, Germany Steffi Pohl, Berlin, Germany Christiane Spiel, Vienna, Austria

Elsbeth Stern, Zurich, Switzerland

Editorial Board

Michael Ashton, St. Catharine’s, Canada Martyn Barrett, Guildford, UK Daniel M. Bernstein, Surrey, Canada Jason A. Chen, Williamsburg, VA, USA Mike W.-L. Cheung, Singapore Xenia Chryssochoou, Athens, Greece Reinout E. de Vries, Amsterdam, The Netherlands Timo Gnambs, Bamberg, Germany Vered Halamish, Ramat Gan, Israel

Alfons O. Hamm, Greifswald, Germany Moritz Heene, Munich, Germany Suzanne Jak, Amsterdam, The Netherlands Nadine Kasten, Trier, Germany Jennifer E. Lansford, Durham, NC, USA Kibeom Lee, Calgary, Canada Tania Lincoln, Hamburg, Germany Marko Lüftenegger, Vienna, Austria Alexandra Martin, Wuppertal, Germany

Anne C. Petersen, Ann Arbor, MI, USA Frank Renkewitz, Erfurt, Germany Anna Sagana, Maastricht, The Netherlands Melanie Sauerland, Maastricht, The Netherlands Monika Undorf, Mannheim, Germany Omer van den Bergh, Leuven, Belgium Suman Verma, Chandigarh, India

Publisher

Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany, Tel. +49 551 999 50 0, Fax +49 551 999 50 425, publishing@hogrefe.com North America: Hogrefe Publishing, 7 Bulfinch Place, 2nd floor, Boston, MA 02114, USA Tel. +1 (866) 823 4726, Fax +1 (617) 354 6875, customerservice@hogrefe-publishing.com

Production

Christina Sarembe, Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany, Tel. +49 551 999 50 424, Fax +49 551 999 50 425, production@hogrefe.com

Subscriptions

Hogrefe Publishing, Herbert-Quandt-Str. 4, 37081 Göttingen, Germany, Tel. +49 551 999 50 900, Fax +49 551 999 50 998

Advertising/Inserts

Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany, Tel. +49 551 999 50 423, Fax +49 551 999 50 425, marketing@hogrefe.com

ISSN

ISSN-L 2151-2604, ISSN-Print 2190-8370, ISSN-Online 2151-2604

Copyright Information

© 2019 Hogrefe Publishing. This journal as well as the individual contributions and illustrations contained within it are protected under international copyright law. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without prior written permission from the publisher. All rights, including translation rights, reserved.

Publication

Published in 4 topical issues per annual volume.

Subscription Prices

Calendar year subscriptions only. Rates for 2018: Institutions US $372.00 / € 292.00; Individuals US $195.00 / €139.00 (all plus US $16.00 / €12.00 shipping & handling; € 6.00 in Germany). Single issue US $49.00 / € 34.95 (plus shipping & handling).

Payment

Payment may be made by check, international money order, or credit card, to Hogrefe Publishing, Merkelstr. 3, 37085 Göttingen, Germany. US and Canadian subscriptions can also be ordered from Hogrefe Publishing, 7 Bulfinch Place, 2nd floor, Boston, MA 02114, USA

Electronic Full Text

The full text of Zeitschrift für Psychologie is available online at www.econtent.hogrefe.com and in PsycARTICLES™.

Abstracting Services

Abstracted/indexed in Current Contents/Social and Behavioral Sciences (CC/S&BS), Social Sciences Citation Index (SSCI), Research Alert, PsycINFO, PASCAL, PsycLit, IBZ, IBR, ERIH, and PSYNDEX. Impact Factor (2016): 1.830

Zeitschrift für Psychologie (2019), 227(1)



Contents

Editorial

Hotspots in Psychology – 2019 Edition
Michael Bošnjak and Timo Gnambs (p. 1)

Review Articles

The Mechanisms of Social Norms’ Influence on Consumer Decision Making: A Meta-Analysis
Vladimir Melnyk, Erica van Herpen, Suzanne Jak, and Hans C. M. van Trijp (p. 4)

How Does Ethical Leadership Impact Employee Organizational Citizenship Behavior? A Meta-Analytic Review Based on Two-Stage Meta-Analytic Structural Equation Modeling (TSSEM)
Yucheng Zhang, Long Zhang, Guangjian Liu, Jiali Duan, Shan Xu, and Mike W.-L. Cheung (p. 18)

Impaired Interparental Relationships in Families of Children With Attention-Deficit/Hyperactivity Disorder (ADHD): A Meta-Analysis
Lena Weyers, Martina Zemp, and Georg W. Alpers (p. 31)

Intra-Individual Value Change in Adulthood: A Systematic Literature Review of Longitudinal Studies Assessing Schwartz’s Value Orientations
Carolin Schuster, Lisa Pinkowski, and Daniel Fischer (p. 42)

Scientific Misconduct in Psychology: A Systematic Review of Prevalence Estimates and New Empirical Data
Johannes Stricker and Armin Günther (p. 53)

Which Data to Meta-Analyze, and How? A Specification-Curve and Multiverse-Analysis Approach to Meta-Analysis
Martin Voracek, Michael Kossmeier, and Ulrich S. Tran (p. 64)

Visual Inference for the Funnel Plot in Meta-Analysis
Michael Kossmeier, Ulrich S. Tran, and Martin Voracek (p. 83)

Call for Papers

The Psychology of Forensic Evidence: A Topical Issue of the Zeitschrift für Psychologie
Guest Editors: Anna Sagana and Melanie Sauerland (p. 90)



Editorial

Hotspots in Psychology – 2019 Edition

Michael Bošnjak1,2 and Timo Gnambs3,4

1 ZPID – Leibniz Institute for Psychology Information, Trier, Germany
2 University of Trier, Germany
3 LifBi – Leibniz Institute for Educational Trajectories, Bamberg, Germany
4 Johannes Kepler University Linz, Austria

Zeitschrift für Psychologie (2019), 227(1), 1–3
https://doi.org/10.1027/2151-2604/a000350

This editorial gives a brief introduction to the articles in the third “Hotspots in Psychology” issue of the Zeitschrift für Psychologie. The format is devoted to systematic reviews and meta-analyses in research-active (i.e., hotspot) fields that have generated a considerable number of primary studies. The common denominator is the research synthesis nature of the articles included, not a specific psychological topic or theme that all articles have to address. Moreover, methodological advances in research synthesis methods relevant for any subfield of psychology are being addressed. Comprehensive supplementary material to the articles can be found in PsychArchives (https://www.psycharchives.org/).

Similar to the first two hotspots issues (Bosnjak & Erdfelder, 2018; Erdfelder & Bosnjak, 2016), the call for papers for this third one sought to attract contributions related to at least one of the following four topics: (1) Systematic reviews and meta-analyses on topics currently being debated in any subfield of psychology; (2) Systematic reviews and meta-analyses contributing to the recent discussion about replicability, transparency, and research integrity in psychology; (3) Meta-analytic replications and extensions of previously published syntheses, for example, by applying more recent approaches and/or by including more recent primary studies, and (4) Methodological advances in research synthesis methods relevant for any subfield of psychology. The papers that were accepted for publication addressed three out of these four topics and are briefly introduced below.

Systematic Reviews and Meta-Analyses on Topics Currently Being Debated

Appeals to social norms frequently try to influence the behavior of consumers, either by appealing to descriptive norms (what most others do), or by addressing injunctive norms (what others approve of), or by trying to keep both aspects salient. However, the influence of these two norms on different consumption-related variables has been unclear so far. For the first contribution to this issue, Melnyk, van Herpen, Jak, and van Trijp (2019) meta-analyzed almost 300 studies examining the effects of descriptive and injunctive norms on the consumer decision-making processes. They found that descriptive norms directly influence behavior, while injunctive norms are most strongly related to intentions. Moreover, other characteristics of social norms, such as their specificity and the sources they derive from, exerted a moderating effect. Overall, this article contributes to illuminating the operation of social norms in consumer behavior and has immediate relevance for creating persuasive messages in an advertising context.

Leadership behavior has been a hotly debated topic in organizational psychology for decades. Although extensive research has scrutinized leader–member interactions in the workplace, the mechanisms of how leaders may influence their subordinates still represent an unresolved question. Zhang and colleagues (2019) tackle this issue from a meta-analytic perspective and explore how ethical leadership shapes organizational citizenship behavior (OCB). Notably, the authors go beyond the analyses of mere main effects to also explore mediational pathways between ethical leadership and employee behavior from a justice perspective. Adopting a multi-dimensional view of OCB, the authors find that interactional justice explains the influence of ethical leadership on employees’ organizational OCB but not on interpersonal OCB. These findings suggest that interpersonal justice is foremost important for workers’ OCB toward the organization. By combining standard meta-analytic techniques with recent advances in structural equation modeling, the paper is a prime example of how meta-analyses can contribute to theory testing and refinement.

The third contribution examines one of the most common childhood mental disorders, that is, attention-deficit/hyperactivity disorder (ADHD). While most available research on ADHD has focused on genetic and neurobiological factors, a growing number of studies also emphasize the quality of family relationships: The child’s symptomatology might strain parents’ relationship with the child and, in turn, a poor parental relationship might contribute to ADHD symptoms in the child. Because the available literature on this issue seemed to be rather inconsistent, with some studies supporting this conjecture and others not, Weyers, Zemp, and Alpers (2019) present a meta-analytic summary of the available findings and compare the relationship quality between parents of a child with ADHD and parents with healthy children. The meta-analysis shows that parents of affected children reported a substantially poorer relationship quality as compared to parents of healthy children. In light of these findings, the authors suggest that information on the family quality, particularly parental relationships, should be systematically incorporated into psychotherapeutic interventions for children with ADHD.

Personal values are presumed to guide people in their lives as overarching principles of judgments, decision making, and actual behavior. Focusing on Shalom Schwartz’s circumplex value model, the contribution by Schuster, Pinkowski, and Fischer (2019) represents the first systematic literature review on the empirical evidence about stability and change of values in adulthood. The findings indicate moderate to high stability of values, even in light of potentially life-changing transitions. Moreover, the experimental evidence considered suggests that values can actually be changed with the aid of interventions. Overall, this research underlines the stability of personal values over time, coupled with the first systematically synthesized evidence on measures and approaches to changing them.

Systematic Reviews Contributing to Recent Discussions about Replicability, Transparency, and Research Integrity in Psychology

Stricker and Günther (2019) add to the ongoing discussion about research integrity in psychology. In their systematic review of survey studies on questionable research practices (QRP), the authors found that the use of QRP was admitted by 6–33% of researchers in psychology; self-admission rates for outright data falsification ranged between 0.6% and 2.3%. This highlights the need for more open research practices that allow for independent replications of research findings. Additionally, the authors provide empirical data on retractions in psychology. They found that about 0.82 per 10,000 journal articles were retracted due to scientific misconduct. This was similar in most psychological subfields, highlighting that the need for further measures against scientific misconduct is not limited to specific fields of psychology.

In some areas, meta-analytic findings on the same issue do not converge or even substantially contradict each other. The paper entitled “Which data to meta-analyze, and how? A specification-curve and multiverse-analysis approach to meta-analysis” by Voracek, Kossmeier, and Tran (2019) explores the role of researchers’ analytic flexibility when analyzing meta-analytic data using specification-curve analysis and multiverse analysis. Using one specific thematic example for which their approach produced almost 1,600 meta-analytic variations, this contribution emphasizes the need for pre-analysis planning and independent replication, an especially relevant consideration in the current replication debate.
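To make the idea of a specification curve concrete, the following minimal Python sketch enumerates combinations of "which data" choices (publication status, study design) and "how" choices (fixed-effect versus random-effects pooling) and computes one pooled estimate per combination. It is an illustration only and not the procedure of Voracek, Kossmeier, and Tran (2019): the five studies, the factor names, and the rule of skipping subsets with fewer than two studies are all assumptions made for this example, and the random-effects model uses the DerSimonian-Laird estimator as one common choice.

```python
from itertools import product

# Hypothetical primary studies: (effect size, standard error, published, design).
STUDIES = [
    (0.30, 0.10, True, "experimental"),
    (0.12, 0.08, True, "correlational"),
    (0.45, 0.20, False, "experimental"),
    (0.05, 0.15, False, "correlational"),
    (0.22, 0.12, True, "experimental"),
]


def pool(studies, tau2=0.0):
    """Inverse-variance weighted mean effect; tau2 adds between-study variance."""
    weights = [1.0 / (se ** 2 + tau2) for _, se, _, _ in studies]
    return sum(w * s[0] for w, s in zip(weights, studies)) / sum(weights)


def dersimonian_laird_tau2(studies):
    """DerSimonian-Laird estimate of the between-study variance (truncated at 0)."""
    weights = [1.0 / se ** 2 for _, se, _, _ in studies]
    fixed = pool(studies)
    q = sum(w * (s[0] - fixed) ** 2 for w, s in zip(weights, studies))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - (len(studies) - 1)) / c)


def specification_curve(studies):
    """Return (estimate, specification) pairs sorted by estimate."""
    which_status = {"all": lambda s: True, "published": lambda s: s[2]}
    which_design = {
        "all": lambda s: True,
        "experimental": lambda s: s[3] == "experimental",
        "correlational": lambda s: s[3] == "correlational",
    }
    how_model = ("fixed", "random")
    curve = []
    for status, design, model in product(which_status, which_design, how_model):
        subset = [s for s in studies
                  if which_status[status](s) and which_design[design](s)]
        if len(subset) < 2:  # assumption: skip subsets too small to pool
            continue
        tau2 = dersimonian_laird_tau2(subset) if model == "random" else 0.0
        curve.append((pool(subset, tau2), (status, design, model)))
    return sorted(curve)


for estimate, spec in specification_curve(STUDIES):
    print(f"{estimate:.3f}  {spec}")
```

Plotting the sorted estimates against their rank gives the curve itself; the spread of the estimates shows how much the summary effect depends on defensible analytic choices.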

Methodological Advances in Research Synthesis Methods

The funnel plot is widely used in meta-analyses to assess the potential degree and direction of publication bias. However, the quality of inferences drawn from funnel plots is found to be limited in many cases. To improve this situation, Kossmeier, Tran, and Voracek (2019) present an approach within which the funnel plot of the actually observed data is presented in a lineup with null funnel plots showing data simulated under the null hypothesis. If the real-data funnel plot is correctly identified, the null hypothesis is formally rejected and conclusions based on visual inspection of the real-data funnel plot might be warranted. The authors suggest routinely conducting visual funnel plot inference, as it appears to be a convenient way of increasing the validity of conclusions based on funnel plots. Moreover, the authors have developed functions embedded into the R package metaviz, enabling researchers to apply the procedures described in this article using their own meta-analytic datasets.
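The lineup logic can be sketched in a few lines of Python. This is a hedged illustration, not the metaviz implementation: the effect sizes and standard errors are invented, and the null model used here (each effect drawn from a normal distribution centered on the fixed-effect summary estimate, with the study's own standard error) is one plausible assumption about what "simulated under the null hypothesis" means. The sketch only builds the data for each panel; drawing the funnel plots is left out.

```python
import random

# Hypothetical observed effect sizes and their standard errors.
EFFECTS = [0.41, 0.35, 0.28, 0.52, 0.10, 0.63, 0.22, 0.47]
STD_ERRORS = [0.05, 0.08, 0.10, 0.15, 0.06, 0.22, 0.09, 0.18]


def fixed_effect_summary(effects, std_errors):
    """Inverse-variance weighted mean of the observed effects."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def null_panel(std_errors, summary, rng):
    """Simulate one null data set: effects scatter around the summary by chance only."""
    return [rng.gauss(summary, se) for se in std_errors]


def make_lineup(effects, std_errors, n_plots=20, seed=0):
    """Hide the real data among n_plots - 1 simulated null data sets."""
    rng = random.Random(seed)
    summary = fixed_effect_summary(effects, std_errors)
    panels = [null_panel(std_errors, summary, rng) for _ in range(n_plots - 1)]
    true_position = rng.randrange(n_plots)
    panels.insert(true_position, list(effects))
    return panels, true_position


def chance_p_value(n_plots):
    """Probability that one viewer picks the real panel by guessing alone."""
    return 1.0 / n_plots


panels, true_position = make_lineup(EFFECTS, STD_ERRORS, n_plots=20, seed=1)
print(f"Real data is in panel {true_position + 1} of {len(panels)}")
print(f"Chance of a correct guess: {chance_p_value(len(panels)):.2f}")
```

With 20 panels, a single viewer who singles out the real plot has done something that would happen by guessing with probability 1/20, which is why a correct identification can be read as a rejection of the null hypothesis at the conventional .05 level.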

Concluding Remarks

Overall, we very much hope that the contributions to this third “hotspots” issue will stimulate further research and contribute to scientific discussions. We would like to point readers to the comprehensive sets of supplementary material facilitating reproduction and replication that can be found in PsychArchives (https://www.psycharchives.org/), a newly established repository for psychological research.


References

Bosnjak, M., & Erdfelder, E. (2018). Hotspots in psychology – 2018 edition. Zeitschrift für Psychologie, 226, 1–2. https://doi.org/10.1027/2151-2604/a000323

Erdfelder, E., & Bosnjak, M. (2016). Hotspots in psychology: A new format for special issues of the Zeitschrift für Psychologie. Zeitschrift für Psychologie, 224, 141–144. https://doi.org/10.1027/2151-2604/a000249

Kossmeier, M., Tran, U. S., & Voracek, M. (2019). Visual inference for the funnel plot in meta-analysis. Zeitschrift für Psychologie, 227, 83–89. https://doi.org/10.1027/2151-2604/a000358

Melnyk, V., van Herpen, E., Jak, S., & van Trijp, H. C. M. (2019). The mechanisms of social norms’ influence on consumer decision making: A meta-analysis. Zeitschrift für Psychologie, 227, 4–17. https://doi.org/10.1027/2151-2604/a000352

Schuster, C., Pinkowski, L., & Fischer, D. (2019). Intra-individual value change in adulthood: A systematic literature review of longitudinal studies assessing Schwartz’s value orientations. Zeitschrift für Psychologie, 227, 42–52. https://doi.org/10.1027/2151-2604/a000355

Stricker, J., & Günther, A. (2019). Scientific misconduct in psychology: A systematic review of prevalence estimates and new empirical data. Zeitschrift für Psychologie, 227, 53–63. https://doi.org/10.1027/2151-2604/a000356

Voracek, M., Kossmeier, M., & Tran, U. S. (2019). Which data to meta-analyze, and how? A specification-curve and multiverse-analysis approach to meta-analysis. Zeitschrift für Psychologie, 227, 64–82. https://doi.org/10.1027/2151-2604/a000357

Weyers, L., Zemp, M., & Alpers, G. W. (2019). Impaired interparental relationships in families of children with attention-deficit/hyperactivity disorder (ADHD): A meta-analysis. Zeitschrift für Psychologie, 227, 31–41. https://doi.org/10.1027/2151-2604/a000354

Zhang, Y., Zhang, L., Liu, G., Duan, J., Xu, S., & Cheung, M. W.-L. (2019). How does ethical leadership impact employee organizational citizenship behavior? A meta-analytic review based on two-stage meta-analytic structural equation modeling (TSSEM). Zeitschrift für Psychologie, 227, 18–30. https://doi.org/10.1027/2151-2604/a000353

Published online March 29, 2019

Michael Bošnjak FB I – Psychologie Universität Trier 54286 Trier Germany mb@leibniz-psychology.org

Timo Gnambs Leibniz Institute for Educational Trajectories Wilhelmsplatz 3 96047 Bamberg Germany timo.gnambs@lifbi.de



Review Article

The Mechanisms of Social Norms’ Influence on Consumer Decision Making
A Meta-Analysis

Vladimir Melnyk1, Erica van Herpen2, Suzanne Jak3, and Hans C. M. van Trijp2

1 Department of Business Administration, Carlos III University, Getafe-Madrid, Spain
2 Marketing and Consumer Behavior Group, Wageningen University, Wageningen, The Netherlands
3 Child Development and Education – Methodology and Statistics, University of Amsterdam, The Netherlands

Abstract: In the past decades, marketing practitioners have embraced social norms as a powerful instrument for influencing consumers’ behavior. An important distinction has been made between descriptive norms (what most others do) and injunctive norms (what others approve of), and this meta-analysis across 297 studies examines the effects of these types of social norms on consumer decision-making processes. We argue that descriptive norms directly influence behavior, and consequently that their effect on behavior should be stronger than that of injunctive norms. Injunctive norms, by contrast, should be more strongly related to intentions than descriptive norms. Results of the meta-analysis support these predictions, and furthermore provide new insights into the moderating effects of aspects of the norm (specificity of the norm, norm source) and of the target person (gender, age).

Keywords: social norm, injunctive norm, descriptive norm, meta-analysis, decision making

Zeitschrift für Psychologie (2019), 227(1), 4–17
https://doi.org/10.1027/2151-2604/a000352

Consumers often take the expectations and behavior of others into account when they decide what is appropriate to do (Cialdini, Reno, & Kallgren, 1990). These expectations and behavior of others establish social norms, and influence a wide array of decisions, including whether to engage in “grasscycling” and composting (White & Simpson, 2013), whether to buy a hybrid car (Ozaki & Sevastyanova, 2011), and how many cookies to eat (Pliner & Mann, 2004). Policy makers and marketers thus benefit from a good understanding of the effectiveness of social norms in influencing consumer behavior (White & Simpson, 2013).

Despite a large body of research on social norms, empirical findings about their effect on behavioral intentions and behavior are far from consistent. For example, Sheeran, Abraham, and Orbell (1999), in their meta-analysis of the willingness to use condoms (121 studies out of which 21 include social norms), find that subjective norms are weak predictors of intentions (r = .26), whereas Rivis and Sheeran (2003), in their meta-analysis of the theory of planned behavior (21 studies), find a more substantial correlation (r = .44). The current meta-analysis takes a broader perspective across consumer behavior and methodological approaches, to investigate the influence of social norms and identify moderators of social norm effects. Based on a vast dataset (297 studies), it provides insights into the quantified effects of social norms on behavioral intentions and behavior, as well as potential moderators of these effects, for both descriptive and injunctive norms.

Descriptive and injunctive social norms are distinct types of norms (Cialdini et al., 1990), with descriptive norms related to what other people do themselves and injunctive norms to what other people think one should do. Both types of norms can influence behavior. Whereas the effects of descriptive norms are presumed to occur through a “rather nonconscious, peripheral route of information processing” (Göckeritz et al., 2010, p. 514), injunctive norms are assumed to have a more conscious effect. In the current meta-analysis, we will examine the effectiveness of both types of norms, and specifically whether descriptive norms, due to their heuristic processing, have a stronger influence on behavior than injunctive norms. Additionally, we will investigate moderating effects of aspects of the norm (how specific is the norm, who is the source of the norm) and the target person (gender, age).

In terms of contribution, the current meta-analysis extends prior research in several directions. First, it extends beyond previous meta-analyses that only incorporated studies using a specific theoretical framework, such as the theory of planned behavior (Albarracín, Johnson, Fishbein, & Muellerleile, 2001; Sheppard, Hartwick, & Warshaw, 1988), by including studies using various theoretical approaches in the dataset. It also extends beyond previous meta-analyses focusing on a specific field of interest, such as sustainable behavior (Poškus, 2016). Second, we focus on moderators of the effect of social norms, which, with the single exception of the study by Manning (2009), have not received substantive attention in previous research. Compared to Manning (2009), we extend the systematic investigation of conceptual moderators beyond those only relating to the type of behavior involved, to include aspects of the norms themselves and of the target of the norm.

Finally, we make use of state-of-the-art meta-analysis techniques using meta-analytic structural equation modeling (MASEM) (Cheung, 2014a). Rather than pooling separate effect sizes representing bivariate relations, as most meta-analysis techniques do, MASEM facilitates the meta-analysis of complete models by combining techniques of meta-analysis and structural equation modeling. In doing so, MASEM takes the dependencies between the effect sizes into account, which results in more precise parameter estimates compared to performing several univariate meta-analyses (Cheung & Chan, 2005). Moreover, this enables the separate evaluation of direct, indirect, and total effects, showing how much of the effect of social norms on behavior is mediated by, in our case, behavioral intentions. Another advantage of MASEM over bivariate analyses is that MASEM allows the evaluation of the unique contributions of social norms in predicting behavior.
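The distinction between direct, indirect, and total effects can be illustrated with a small Python sketch. It is deliberately much simpler than MASEM: it takes a single correlation matrix as given (MASEM would first pool matrices across studies while accounting for their dependencies) and it uses only one predictor, one mediator, and one outcome. The three correlations below are hypothetical values chosen for illustration and are not results from this meta-analysis.

```python
def decompose_effects(r_xm, r_xy, r_my):
    """Standardized path decomposition for X -> M -> Y with a direct X -> Y path.

    r_xm: correlation between predictor X (e.g., a social norm) and mediator M
          (e.g., behavioral intention)
    r_xy: correlation between predictor X and outcome Y (e.g., behavior)
    r_my: correlation between mediator M and outcome Y
    """
    a = r_xm                                  # path X -> M
    denom = 1.0 - r_xm ** 2
    b = (r_my - r_xm * r_xy) / denom          # path M -> Y, controlling for X
    direct = (r_xy - r_xm * r_my) / denom     # path X -> Y, controlling for M
    indirect = a * b                          # effect transmitted through M
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


# Hypothetical correlations, for illustration only.
effects = decompose_effects(r_xm=0.44, r_xy=0.30, r_my=0.50)
for name, value in effects.items():
    print(f"{name:>8}: {value:.3f}")
```

In this just-identified model the total effect always reproduces the observed correlation between predictor and outcome, so the decomposition shows how that correlation splits into a part carried by intentions and a part that bypasses them.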

Social Norms

Social norms are “rules and standards that are understood by members of a group, and that guide and/or constrain social behavior without the force of laws” (Cialdini & Trost, 1998, p. 152). These rules and standards include the expectations of valued others and standards that develop from observations of others’ behavior. Social norms are thus informal, socially shared, and relatively stable guides of behavior (Melnyk, van Herpen, Fischer, & van Trijp, 2011). Their informal, nonobligatory character implies the presence of social reinforcements, such as approval or disapproval, and distinguishes social norms from laws. Additionally, social norms are shared within a group, which differentiates them from personal norms based on a consumer’s own internalized values, and ensures that they are generally stable over time (Jones, 2006).

Several prominent theories, such as the theory of reasoned action (Fishbein & Ajzen, 1975) and the theory of planned behavior (Ajzen, 1991), include social norms (termed “subjective norms”) next to attitudes and perceived behavioral control as predictors of behavioral intention. Initially, social norms often appeared as the weakest predictor in such models (Armitage & Connor, 2001). In recent decades, however, social norms have resurged as an important research topic, primarily due to an adjustment in the conceptualization of social norms themselves (Jacobson, Mortensen, & Cialdini, 2011; Staunton, Louis, Smith, Terry, & McDonald, 2014). Specifically, the focus theory of normative conduct (Cialdini et al., 1990) emphasized the need to differentiate between descriptive norms (what most others do) and injunctive norms (what others approve of). In the theory of planned behavior and related theories, the subjective norm concept only incorporates injunctive elements, and “reflects the expectations and wants of significant others about engaging in a specific behavior” (Staunton et al., 2014, p. 319). Subsequent studies have added descriptive norms to these models. Although there are exceptions, that is, studies in which descriptive norms do not show significant effects over and above other constructs (e.g., Poškus, 2018), insights from a meta-analysis show that, in general, descriptive norms increase the predictive power of the theory of planned behavior (Rivis & Sheeran, 2003). Descriptive norms have been shown to be an effective instrument for changing people’s behavior (Goldstein, Cialdini, & Griskevicius, 2008).

Several prior studies have indicated that another improvement can be made by including direct effects of social norms on behavior (e.g., Christian & Armitage, 2002; Okun, Karoly, & Lutz, 2002), and this is confirmed in the meta-analysis of Manning (2009). Our examination will thus include both indirect effects of injunctive and descriptive norms on behavior, via behavioral intentions, and direct effects. Attitudes and perceived behavioral control will be taken up as well, to control for their effects.

Injunctive and Descriptive Norms Previous research has emphasized the importance of distinguishing between injunctive and descriptive norms as a key feature to understanding the influence of social norms (Jacobson et al., 2011; Lapinski & Rimal, 2005). Injunctive norms prescribe behavior, and refer to what people should do in a given situation. A request to follow a dress code is an example of an injunctive social norm. Descriptive norms describe the typical behavior of others, which provides “social proof” of what is likely to be effective behavior and sets behavioral standards from which people may not want to deviate (Schultz, Nolan, Cialdini, Goldstein, & Griskevicius, 2007). For example, information about the number of others who refrain from smoking constitutes a descriptive norm. Zeitschrift für Psychologie (2019), 227(1), 4–17


6

V. Melnyk et al., The Influence of social norms on consumer decision making

Injunctive and descriptive norms are inherently different, and evidence is mounting that these two types of norms operate through different intervening psychological processes (Göckeritz et al., 2010; Jacobson et al., 2011; Melnyk et al., 2011; Melnyk, van Herpen, Fischer, & van Trijp, 2013; White & Simpson, 2013). This is especially relevant because descriptive and injunctive norms affect behavioral intentions and behavior through two distinct processes that operate “independent of each other,” as evidenced from Rimal and Real’s (2005, p. 410) study. Specifically, descriptive norms as a source of social proof can influence behavior directly without (much) conscious processing (Göckeritz et al., 2010). Several scholars explain this tendency of people to follow others by an evolutionary approach. In particular, for a social animal being closer to its herd increases survival (Alcock 2005; Griskevicius et al., 2009). This tendency of people to instinctively copy and mimic the behavior of others has evolutionary benefits and is an adaptive strategy for learning (Griskevicius, Cantú, & van Vugt, 2012).Thus, often, consumers follow the behavior of others automatically and unwittingly (Aarts & Dijksterhuis, 2003; Cialdini & Goldstein, 2004; Nolan, Schultz, Cialdini, Goldstein, & Griskevicius, 2008). This is in line with the suggestion that descriptive norms have a significant direct effect on behavior and that the relation between descriptive norms and behavior is stronger than the relation between injunctive norms and behavior (Manning, 2009). The heuristic processing of descriptive norms is corroborated by recent evidence that the effect of descriptive norms increases under conditions of depletion (Jacobson et al., 2011; Kredentser, Fabrigar, Smith, & Fulton, 2012). This implies that we expect a strong direct influence of descriptive norms on behavior, whereas the indirect effect of descriptive norms through behavioral intentions should be weak. 
Injunctive norms, on the other hand, have been suggested to influence behavior more indirectly, through motivation to comply with social sanctions, triggering higher levels of cognitive elaboration compared to descriptive norms (Manning, 2009). Injunctive norms generally lead to more conflict over decisions to conform or not to conform, with depletion decreasing conformity to these norms (Jacobson et al., 2011). In their paper, Jacobson and colleagues (2011) find that injunctive norms evoke feelings of social obligation, as well as thoughts and experiences of competing goals and decision-making conflict. This implies that injunctive norms refer to more elaborate decision processes, making a direct effect on behavior less likely. We thus expect to see a stronger indirect effect of injunctive norms on behavior through behavioral intentions, and only a weak direct effect on behavior. In line with prior results, which found the relation between descriptive norms and behavior stronger than the relation between injunctive norms and behavior (Manning, 2009; Thøgersen, 2008), we also expect that direct effects of injunctive norms on behavior are weaker than those of descriptive norms.

In addition to these effects of injunctive and descriptive norms on behavior, we expect that their association with attitudes is not of equal strength. Prior research has indeed shown that descriptive norms can be forgotten over time or with situational changes (Reno, Cialdini, & Kallgren, 1993), implying that descriptive norms may not be readily internalized. Thus, the association between descriptive norms and attitudes may be relatively less strong. In contrast, injunctive norms, because these tend to focus consumers on what others approve or disapprove of in their social group (Reno et al., 1993), may activate attitudes and feelings associated with being a group member (Terry, Hogg, & McKimmie, 2000). We thus expect that injunctive norms will have a stronger association with attitudes than descriptive norms. Figure 1 presents the model for social norm influence that will be tested. In addition to investigating this model, the current study aims to examine two types of potential moderators that can influence the effects of social norms: moderators involving aspects of the norm and moderators involving the target person who is influenced by the norm.

Figure 1. Conceptual model. Solid lines indicate the theory of planned behavior (TPB) effects. Dashed lines indicate the moderating effects. Dotted lines indicate control variables.

Norm Aspects: Specificity and Source

Specificity

Social norms, by their very nature as rules and standards that guide and constrain social behavior, need to be clearly defined in order to be effective. Concretely specified norms define what is appropriate and inappropriate for specific individuals in specific situations, whereas abstract norms allow for a wider range of behavioral options, and may allow consumers to violate a norm without fear of punishment (Shaffer, 1983). Thus, consumers are generally more strongly persuaded by detailed and specific descriptions of expected behavior than by more abstract descriptions, possibly because they can more easily process the information and imagine themselves performing the behavior (Gollwitzer & Brandstatter, 1997; O’Keefe, 1997). We would thus expect that more concretely specified social norms have a stronger influence on behavioral intentions and behavior than less concretely specified norms. This should hold for specifications of the expected behavior and the situation in which this behavior is appropriate (Feldman, 1984), as well as for potential sanctions when failing to comply, or of potential rewards when complying with the social norm. The specification of such concrete consequences provides consumers with arguments to follow the norm (Jones, 2006). Thus, concrete specifications of (a) expected behavior, (b) sanctions, and (c) rewards are expected to increase the influence of social norms.

Source of the Norm

Norms are, first and foremost, social phenomena. Who communicates a norm (the source) can determine the extent of its influence: the norms of more relevant groups should be more influential (Terry, Hogg, & White, 2000). More relevant groups can consist of psychologically close others, who usually share similar values, opinions, and attitudes (Stangor, 2004). Consumers should be more likely to follow social norms that come from people that they are close to, such as their mother or father, partner, or intimate friends, than social norms that come from more distant or abstract sources (e.g., “most people”). The thought of specific persons that consumers are close to may activate information about the relationship with them and about expected relational outcomes (e.g., disappointment, praise), and this can make it more difficult to disobey a norm.
Moreover, building on the evolutionary perspective, instinctive copying and mimicking of behavior is more likely when consumers are exposed to the behavior directly and often (Griskevicius et al., 2012). Hence, influencing attempts are generally more successful when these originate from a source that consumers perceive as similar to themselves (O’Keefe, 2002). In contrast, more distant or abstract groups of other people may have less control and influence. We thus predict that norms from persons that are psychologically close to the consumer (e.g., partner, friends) will have a stronger influence on intentions and behavior than norms from sources that are more distant (e.g., authority figures) and norms from abstract sources (e.g., people in general).

The Target: Gender and Age

Most studies in the dataset allow for the coding of age and gender, and thus enable the exploration of their potential effects. For both, it is not obvious a priori what their effect would be. Only a few studies have examined gender differences related to the influence of social norms. In the context of sexual behavior, Fisher (2009) concludes that “it appears premature to draw definite conclusions” with respect to the responsiveness to social norms for males versus females (p. 571). With respect to age, older people may generally be less susceptible to social influence as they have gained more independence with age, but they also may be more sensitive to social influence when they experience uncertainty (Pasupathi, 1999).

Method

Identification of the Sample

To identify relevant publications about social norms in the time period up to November 2013, references were retrieved from the electronic databases Web of Science, PsycINFO, Online Contents National, and Google Scholar, and by checking other meta-analyses that included a (general) effect of social norms. We also checked the websites of the National Social Norms Resource Center, the Social Science Research Network, and The Higher Education Center for Alcohol and Other Drug Abuse and Violence Prevention for relevant studies, and posted a request for working papers and unpublished manuscripts on the electronic list server ELMAR. Finally, all cross-references from relevant papers were examined for inclusion. Figure 2 presents the PRISMA flow diagram of the literature retrieval and inclusion process (Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009). The current meta-analysis focuses on behaviors that are in the domain of consumer behavior in the context of material objects, services, or consumption, while excluding interpersonal relations and judgments (e.g., norms on how to behave toward other people), because these activate different neurological processes than judgments about material objects or services (Langner, Schmidt, & Fischer, 2015; Yoon, Gutchess, Feinberg, & Polk, 2006). The database included studies that (1) contain the necessary information to obtain the bivariate statistical relationship between social norm (either injunctive, descriptive, or both) and attitude, behavioral intention, and/or behavior, (2) do not lump descriptive and injunctive norms together as one construct, and (3) measure effects at the individual level. All behaviors that involve the purchase, consumption, use, or disposal of products and services are included, for instance the decision to join a gym club, the decision to start smoking, donations, dieting, class enrolment, use of contraceptives, and littering.
Excluded studies are those where the autonomy of decision making is impaired, in particular, where participants are sick and may depend on others in their decisions regarding, for example, medical treatment (Meyers, 2004), where participants make decisions as part of their job and may be influenced by company policies, and where participants are addicted because this makes their decision-making ability questionable (Leshner, 1997). Finally, studies of illegal behaviors were excluded, because legal sanctions may overshadow or change the influence of social norms. The final sample consisted of 220 papers, comprising 297 studies. The total sum of all samples equaled 110,303 individual respondents with study sample size ranging from 25 to 3,859 (M = 371). The database and research materials are available as supplemental appendices (Melnyk, van Herpen, Jak, & Trijp, 2018).

Figure 2. PRISMA flow diagram of the literature retrieval and inclusion process. Note. ¹ Keyword string: (((subjective OR injunctive OR descriptive) AND (norm OR norms OR pressure)) OR (“social influence” OR “social norm$” OR “group pressure” OR “peer pressure” OR “group influence” OR “behavioral belief$” OR “normative belief$” OR (“social support” AND (“physical activity” OR exercise$)))) NOT (depression OR suicide OR death OR violence OR aggression OR schizophren* OR “surgical” OR “surgery” OR injury OR prejud* OR stigma* OR nursing OR nurse$ OR crimin* OR religion OR pain OR game$ OR lie OR cheat$ OR patient$). ² Included are behaviors that involve the purchase, consumption, use, or disposal of products and services. Excluded are studies regarding interpersonal relations, studies in which autonomy of decision making is impaired, and studies of illegal behaviors. ³ Studies should contain empirical data that allow for obtaining the bivariate statistical relationship between descriptive and/or injunctive norms on the one hand and attitude, behavioral intention, and/or behavior on the other hand.

Computation of Effect Sizes

The required effect sizes for meta-analytic structural equation modeling are correlation coefficients. Most papers reported the Pearson correlation coefficients. For studies that did not report correlations, we converted t-ratios, F-ratios, and χ²-statistics to correlation coefficients following the formulas provided by Lipsey and Wilson (2000). We excluded studies that only reported regression results, because it is not appropriate to mix zero-order and regression coefficients, or partial correlations (which could be calculated from the regression coefficients) in one meta-analysis (Aloe, 2014). In papers with multiple studies, each study was included separately.
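The conversions mentioned above can be sketched as follows. This is a minimal illustration using the standard textbook conversions for a t-ratio, an F-ratio with one numerator degree of freedom, and a one-degree-of-freedom χ² statistic; the function names are hypothetical, and the source does not state which exact formulas from Lipsey and Wilson (2000) were applied in each case.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Correlation from a t-ratio with df degrees of freedom (sign follows t)."""
    return t / math.sqrt(t * t + df)


def r_from_f(f: float, df_error: int, positive: bool = True) -> float:
    """Correlation from an F-ratio with 1 numerator df.

    F carries no sign, so the direction of the effect must be supplied.
    """
    r = math.sqrt(f / (f + df_error))
    return r if positive else -r


def r_from_chi2(chi2: float, n: int, positive: bool = True) -> float:
    """Correlation (phi) from a 1-df chi-square statistic based on n cases."""
    r = math.sqrt(chi2 / n)
    return r if positive else -r
```

Because F and χ² are unsigned, the direction of each effect has to be taken from the reported means or cell frequencies before the converted value is entered as a correlation.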

Coding of the Studies

Interrater Agreement

The majority of the sample (82% of the data entries) was coded by two independent judges. Interrater agreement was extremely good (the percentage of agreement for each of the constructs varied between 95% and 100%), and disagreements were resolved through discussion. Given this high interrater agreement, the remaining 18% of the data entries were coded by the main initial coder only.

Type of Norm

Type of norm was coded as injunctive when the norm contained a suggestion or expectation of what ought to be done (e.g., “you should...” or “my friends want me to...”, often referred to as normative beliefs) and as descriptive when the norm reflected what others do, or what they would do (e.g., “I think my friends drink more than 5 bottles of beer per week”, often referred to as behavioral beliefs).

Norm Aspects: Specificity

The behavior was coded as specified when the act, situation, and/or time of its performance was specified (e.g., “eat 2 pieces of fruit per day”, “exercise at least 3 times a week”), and otherwise as unspecified (e.g., “eat healthy food”, “take regular physical activity”). Sanctions were coded as specified when negative consequences of not following the norm were provided (e.g., “my friends think I should use a condom during sexual intercourse, because it prevents disease acquisition”), and otherwise coded as unspecified. Similarly, rewards were coded as specified when positive consequences of following the norm (e.g., “my mother thinks I should eat fruit every day, because it is healthy for me”) were provided, and otherwise coded as unspecified.

Norm Aspects: Source

The source of social norms was coded as (a) close when only close sources were mentioned (e.g., family members, partner, close friends), (b) authority figure when more distant sources with authority were mentioned (e.g., doctor, priest, official representatives), or (c) abstract when sources were general others or not mentioned (e.g., others, people from my environment, people important to me).

Target: Gender and Age

Gender was coded as the percentage of males in the sample and categorized into three groups (0–20% males, labeled as “mostly female”; 20–80% males, labeled as “mixed”; and 80–100% males, labeled as “mostly male”). Age was coded as the mean age of the participants and also categorized into three groups (up to 21 years, 21–50 years, and over 50 years). Table 1 provides further details on the independent variables. It shows that by far most of the studies included injunctive norms (95% of our sample), whereas fewer studies included descriptive norms (26% of our sample) due to their later introduction in the literature. Still, the sample of studies including descriptive norms (78 studies) is large compared to other meta-analyses; as a comparison, the meta-analysis of Manning (2009) included 21 studies with descriptive norms.

Table 1. Description of the database

Variable                      Number of studies    % of studies
Norm type¹
  Descriptive                        78                26.3
  Injunctive                        282                94.9
Norm aspects
  Specificity¹
    Behavior specified              135                45.5
    Sanctions specified              29                 9.8
    Rewards specified                46                15.5
  Norm source
    Abstract                         81                27.3
    Authority figure                 22                 7.4
    Close                           194                65.3
Target
  Gender²
    Mostly female                    42                14.1
    Mixed                           189                63.6
    Mostly male                      22                 7.4
  Age²
    Up to 21 years                  106                35.7
    21–50 years                      98                33.0
    Over 50 years                    13                 4.4

Note. ¹ Percentages do not add up to 100% because the categories are not mutually exclusive (i.e., studies could contain both descriptive and injunctive norms, or specify behavior in multiple ways). ² Percentages do not add up to 100% due to missing information.

Statistical Analysis

To preserve as much information as possible, we included separate effect sizes for subsamples (e.g., different age or gender groups) when this information was available. A small number of studies reported separate correlations for different items of the same construct (e.g., multiple items for injunctive norm measures) from the same sample,

which are not independent measurements. As three-level MASEM is not available yet, these were averaged (i.e., we averaged across z-transformed correlation coefficients and included the back-transformed correlation coefficients in the analysis, cf. Silver & Dunlap, 1987). We used the two-stage approach (Cheung & Chan, 2005) to fit the hypothesized model to the data. In the first stage, correlation matrices are combined to form a pooled correlation matrix. In the second stage, a structural model is fitted to this pooled correlation matrix. In Stage 1, the random effects approach as implemented in the R package metaSEM (version 1.1.1; Cheung, 2014b) in R (version 3.4.4; R Core Team, 2018) was used to pool the correlation coefficients. Random effects models account for heterogeneity across studies. The degree of heterogeneity is evaluated using the I² of the correlation coefficients (Higgins & Thompson, 2002). The I² can be interpreted as the percentage of total variance that is due to between-study variability as opposed to sampling variability. In Stage 2, the hypothesized structural model (see Figure 1) is fitted on the pooled correlation matrix from Stage 1. Initially, we evaluated the model for the total sample of studies. Next, subgroup analysis on different groups of studies based on the study-level moderators examined the effects of these moderators. Whether the direct effects in the model differed significantly across subgroups of studies was tested by constraining all effects to be equal across subgroups, and using likelihood ratio tests with the unconstrained model. Subsequently, for those cases where the constrained and unconstrained overall models differed significantly, we tested the equality of the effects of injunctive norms and descriptive norms separately. Throughout, the significance of parameter estimates was evaluated with 95% likelihood-based confidence intervals (Neale & Miller, 1997). If the 95% confidence interval around a parameter estimate did not include zero, the parameter estimate was considered significant at a 5% level.

Publication bias is a serious concern in all meta-analyses, and refers to the problem that studies that find large and significant effects may be overrepresented in the academic literature (Rothstein, Sutton, & Borenstein, 2006). Publication bias may be detected by observing a dependency between effect size and sample size across studies. When publication bias is present, only the small sample studies that found favorable (large) effects will be published (Egger, Smith, Schneider, & Minder, 1997). This dependency can be tested by regressing the effect on the standard error. Publication bias will result in a significant positive relation between the standard error and effect size. In our meta-analysis, the relation between effect size and standard error was negative for all correlation coefficients, as can be seen in the supplemental Appendix B (“Plots of effect size and standard error”), which is not expected in the presence of serious publication bias (Melnyk et al., 2018).
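The direction of the relation between effect size and standard error can be checked with a simple regression slope. The sketch below is illustrative only: it assumes Fisher z-transformed correlations with standard error 1/sqrt(n - 3), which the source does not specify, and the function name and the toy data are hypothetical. It returns the slope only and does not perform a significance test.

```python
import math


def slope_effect_on_se(rs, ns):
    """OLS slope of Fisher-z effect sizes regressed on their standard errors.

    rs: correlations per study; ns: sample sizes per study (each n > 3).
    A positive slope means smaller studies (larger SE) report larger effects.
    """
    zs = [math.atanh(r) for r in rs]            # Fisher z transformation
    ses = [1.0 / math.sqrt(n - 3) for n in ns]  # SE of z (assumed metric)
    mean_se = sum(ses) / len(ses)
    mean_z = sum(zs) / len(zs)
    sxy = sum((s - mean_se) * (z - mean_z) for s, z in zip(ses, zs))
    sxx = sum((s - mean_se) ** 2 for s in ses)
    return sxy / sxx


# Hypothetical toy data: small studies with large effects (bias-like pattern)
biased = slope_effect_on_se([0.50, 0.40, 0.30, 0.20], [30, 60, 120, 480])
# Hypothetical toy data: the reverse pattern, as reported for this meta-analysis
reverse = slope_effect_on_se([0.20, 0.30, 0.40, 0.50], [30, 60, 120, 480])
```

With these toy inputs, `biased` is positive and `reverse` is negative; the negative direction is the one the text describes as inconsistent with serious publication bias.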

Results

Stage 1 Analyses (Pooling Correlation Matrices)

Table 2 presents the pooled correlation matrix from Stage 1 analysis. All correlations were positive, as expected. The correlations between norms and attitudes were substantial (r = .37 for injunctive norms and r = .31 for descriptive norms). Importantly, and in line with our expectations, 95% confidence intervals indicated that the correlation between injunctive norms and attitudes (CI between .34 and .39) was higher than the correlation between descriptive norms and attitudes (CI between .27 and .35), and constraining these two correlations to be equal led to a significantly worse model fit (χ²(1) = 6.21, p < .05). Thus, as anticipated, injunctive norms generally were related more strongly to attitudes than descriptive norms. Furthermore, results showed positive correlations between social norms and behavior (r = .22 for injunctive norms and r = .35 for descriptive norms). All correlation coefficients had significant study-level variance, with I² ranging between .85 and .97 (see Table 2). This implies that a substantial percentage of the total variance was due to between-study variability, and that it is thus appropriate to consider moderators.
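To make the interpretation of I² concrete, the following sketch computes a Q-based I² for a set of correlations. It is an assumption-laden illustration: it works on Fisher z-transformed correlations with weights n - 3 and uses the Q-based definition of Higgins and Thompson (2002), which need not match the exact variant computed by metaSEM. The function name and the example inputs are hypothetical.

```python
import math


def i_squared(rs, ns):
    """Q-based I^2: share of total variance attributed to between-study variability."""
    zs = [math.atanh(r) for r in rs]   # Fisher z per study
    ws = [n - 3 for n in ns]           # inverse sampling variance of z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))  # Cochran's Q
    df = len(rs) - 1
    if q <= df:                        # no excess variability beyond sampling error
        return 0.0
    return (q - df) / q
```

Identical correlations across studies yield a value of 0, whereas widely diverging correlations from large samples push the value toward 1, the range in which the coefficients in Table 2 fall.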

Stage 2 Analyses (Fitting the Full Model)

Table 3 presents results for the full model, which was able to explain 42% of the variance in behavioral intention, and 30% of the variance in behavior. As expected, descriptive norms had a substantial and positive total effect on behavior (β = .25). The total effect of descriptive norms was even larger than that of attitudes (β = .19). Moreover, whereas the effect of attitudes was completely mediated by behavioral intentions, the effect of descriptive norms was mostly direct (β = .17) and only partly mediated by behavioral intentions (β = .08). Constraining this indirect effect to be equal to the direct effect led to a significantly worse model fit (χ²(1) = 4.74, p < .05). This supports recent insights that descriptive norms mainly act as heuristics, directly affecting behavior without much deliberation (Göckeritz et al., 2010; Jacobson et al., 2011). MASEM results showed that the indirect effect of injunctive norms on behavior was statistically significant, but very small (β = .05), while the direct effect was not significantly different from zero. This finding illustrates that although injunctive norms were positively correlated with behavior (r = .22), this does not necessarily imply that the effect of injunctive norms on behavior is substantial once the effects of other variables have been taken into account (cf. Nigbur, Lyons, & Uzzell, 2010; Thøgersen, 2008).
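The relation between the direct, indirect, and total effects reported here can be illustrated with a short calculation. The sketch assumes the usual path-analytic decomposition for a single mediator (indirect effect = path to intention × path from intention to behavior; total = direct + indirect). The coefficients are the rounded values from Table 3, and the function name is hypothetical.

```python
def decompose(path_to_intention, intention_to_behavior, direct_on_behavior):
    """Return (indirect, total) effect on behavior for one predictor."""
    indirect = path_to_intention * intention_to_behavior
    total = direct_on_behavior + indirect
    return indirect, total


INTENTION_TO_BEHAVIOR = 0.38  # direct effect of intention on behavior (Table 3)

# Descriptive norms: effect on intention .21, direct effect on behavior .17
dn_indirect, dn_total = decompose(0.21, INTENTION_TO_BEHAVIOR, 0.17)
# Attitude: effect on intention .35, direct effect on behavior .06
att_indirect, att_total = decompose(0.35, INTENTION_TO_BEHAVIOR, 0.06)

print(round(dn_indirect, 2), round(dn_total, 2))    # 0.08 0.25
print(round(att_indirect, 2), round(att_total, 2))  # 0.13 0.19
```

The rounded results match the indirect and total effects reported for descriptive norms and attitudes; small discrepancies for other rows can arise because the published coefficients are themselves rounded.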

Moderators

For each of the moderator variables, the data were split into subgroups based on the values of the moderator. Studies with a missing value on the moderator were not included in the respective analysis of this moderator. Table 4 shows the results of testing the equality of direct effects across subgroups of studies. First, we set all direct effects equal across subgroups, leading to a significant increase in chi-square if not all effects are in fact equal (see column with “Constrained model”). Next, the equality constraints were released on the direct effects from injunctive norms (see column “Model Free IN”) and on the direct effects of descriptive norms (see column “Model Free DN”). If releasing the constraints led to a significant decrease in chi-square compared with the constrained model, the direct effects are considered to be different across subgroups. For the significant moderator variables, Table 5 provides the direct, indirect, and total effects with their confidence

Table 2. Pooled correlations (r), 95% confidence intervals (CI), proportion of study-level variance (I²), and number of studies that included the respective correlation coefficient (k)

                                            IN            DN            A             PBC           I
Injunctive norm (IN)
Descriptive norm (DN)               r       .38
                                    [CI]    [.33; .43]
                                    I²      .93
                                    k       (55)
Attitude (A)                        r       .37           .31
                                    [CI]    [.34; .39]    [.27; .35]
                                    I²      .91           .85
                                    k       (166)         (40)
Perceived behavioral control (PBC)  r       .25           .23           .35
                                    [CI]    [.22; .28]    [.16; .30]    [.32; .38]
                                    I²      .92           .94           .95
                                    k       (146)         (35)          (143)
Intention (I)                       r       .39           .42           .54           .43
                                    [CI]    [.37; .42]    [.37; .46]    [.52; .56]    [.39; .46]
                                    I²      .92           .89           .94           .97
                                    k       (179)         (45)          (165)         (165)
Behavior (B)                        r       .22           .35           .34           .31           .51
                                    [CI]    [.20; .25]    [.32; .39]    [.31; .37]    [.27; .36]    [.47; .55]
                                    I²      .86           .90           .88           .95           .95
                                    k       (128)         (46)          (92)          (89)          (100)

Table 3. Indirect, direct, and total effects on behavioral intention and behavior

                               Direct effect      Indirect effect    Direct effect        Total effect
                               on intention       on behavior        on behavior          on behavior
Injunctive norms               .13 [.09; .17]     .05 [.03; .07]     −.04 [−.08; .00]     .01 [−.03; .05]
Descriptive norms              .21 [.15; .26]     .08 [.06; .11]     .17 [.10; .23]       .25 [.19; .30]
Attitude                       .35 [.31; .39]     .13 [.11; .16]     .06 [.00; .11]       .19 [.14; .24]
Perceived behavioral control   .22 [.18; .27]     .08 [.06; .11]     .10 [.04; .16]       .18 [.13; .25]
Intention                                                            .38 [.31; .45]       .38 [.31; .45]

Note. 95% confidence intervals provided in square brackets.

intervals in the different subgroups. We will now discuss each of these in turn.

Norm Aspects: Specificity

With respect to specificity, there is a significant moderating effect of the specification of sanctions, but no significant moderating effect of the specification of rewards or behavior. In other words, whether rewards are specified in social norms did not affect their influence, whereas sanctions mattered. As expected, specifying sanctions increased the direct effects of social norms on behavior, both for injunctive and descriptive norms (see Table 5 for details).

Norm Aspects: Source

We expected that norms from close sources would have a stronger influence than norms from authority figures and from abstract sources. This is indeed what results showed for the direct and total effects of social norms on behavior, but not for the indirect effects through behavioral intentions. Norms from close others had a less negative (in the

Table 4. χ², degrees of freedom (df), and p-values of models with all effects equal versus models with the effects of norms free across groups

                           Constrained model           Model Free IN               Model Free DN
Moderator                  χ²        df     p          Δχ²       Δdf    p          Δχ²       Δdf    p
Norm aspects
  Behavior specified       14.14     9      .12
  Sanctions specified      47.77     9      < .01      26.79     2      < .01      32.30     2      < .01
  Rewards specified        10.30     9      .33
  Source                   46.55     18     < .01      21.50     4      < .01      31.06     4      < .01
Target
  Gender (1 vs. 2 + 3)     2.37      9      .98
  Age                      161.20    18     < .01      72.73     4      < .01      118.18    4      < .01

Note. DN = descriptive norm; IN = injunctive norm. We also tested the effects of the moderators by comparing the fit of the unconstrained model with the fit of models where the effects of, respectively, injunctive or descriptive norms were constrained to be equal across subgroups. With this step-up procedure we found significant moderation only for age. The cause of the difference between the top-down and step-up procedures is probably the difference in statistical power and Type 1 error of these tests. To guard against Type 1 errors we employed conservative alpha levels of .01, and only tested differences on IN and DN if the chi-square of the overall constrained model was significant.
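The p-values attached to the chi-square differences in Table 4 follow from the upper tail of the chi-square distribution. The sketch below is a standard-library illustration of that step only, using the closed-form series for integer degrees of freedom; the function name is hypothetical, and the model fitting itself was carried out by the authors in metaSEM.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


ALPHA = 0.01  # conservative alpha level stated in the table note

p_behavior = chi2_sf(14.14, 9)    # constrained model, behavior specified
p_sanctions = chi2_sf(47.77, 9)   # constrained model, sanctions specified
print(round(p_behavior, 2), p_sanctions < ALPHA)
```

Under the stated alpha level of .01, only moderators whose constrained-model p-value falls below the threshold proceed to the separate tests for injunctive and descriptive norms.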

Table 5. Indirect, direct, and total effects of injunctive and descriptive norms on behavior in the separate groups for significant moderators

                      Injunctive norm                       Descriptive norm
Group                 Indirect    Direct      Total         Indirect    Direct      Total
                      effect      effect      effect        effect      effect      effect
Norm aspects
  Sanctions
    Specified         .09a        .14a        .23a          .12a        .51a        .63a
    Not specified     .05a        −.06b       −.01b         .08a        .17b        .24b
  Source
    Abstract          .04a        −.13a       −.09a         .07a        .00a        .07a
    Authority         .04a        −.01ab      .03ab         .04a        .12ab       .16ab
    Close             .06a        .00b        .06b          .09a        .18b        .26b
Age
    Up to 21          .05a        .03a        .08a          .11a        .17a        .28a
    21–50             .06a        .00ab       .06a          .12a        .06ab       .18a
    Over 50           .01b        −.08b       −.07b         .05b        −.10b       −.04b

Note. Coefficients with different subscripts across groups within a moderator and within a norm are significantly different from each other based on nonoverlapping confidence intervals. Coefficients in italics are not significantly different from zero.

case of injunctive norms) or a more positive (in the case of descriptive norms) effect on behavior than norms from abstract sources. The effect of norms from authority figures was in-between and was not significantly different from either.

Target: Gender and Age

There was no a priori expectation for the moderating effect of gender, and although the overall results indicate significant differences between gender groups (see Table 4), the more detailed investigation in Table 5 revealed overlapping confidence intervals in most situations and no systematic effects of gender. With respect to age, both indirect and direct effects of social norms were less positive (or even negative) for people in the higher age group (over 50), especially when compared to people in the youngest age group (up to 21). The direct effects of both injunctive and descriptive norms on behavior were negative for the older age group, whereas this was not the case for the other two age groups. It thus appeared that people in the older age group were less likely to follow a social norm, whereas especially people in the youngest age group were more prone to conform to a norm. Given that many of the studies in our sample (46%) contained students/pupils of younger age categories, we did a follow-up test to compare studies with student samples to studies with non-student samples directly. Here, we found no significant differences for the effects of injunctive norms, but the total effect of descriptive norms on behavior was significantly different. This total effect was higher for student samples (β = .29, CI [.23; .35]) than for non-student samples (β = .14, CI [.08; .21]).

Discussion

Overall Effectiveness of Descriptive and Injunctive Norms

This meta-analysis examined the effectiveness of descriptive and injunctive norms, as well as the moderating effects related to aspects of the norm and the target person. In line with our expectations, we have found that descriptive norms have a larger (total) effect on behavior than injunctive norms. In fact, when controlling for the effects of descriptive norms, attitudes, and perceived behavioral control, the indirect effect of injunctive norms on behavior is very small (β = .05) and the direct effect is nonsignificant, despite a significant positive bivariate correlation between injunctive norms and behavior (r = .22). Using MASEM allows us to estimate effects while controlling for other variables, and the insight that there is no unique contribution of injunctive norms in explaining behavior would not have been revealed with other, more traditional, meta-analysis methods. Our results moreover show that descriptive norms affect behavior primarily directly, whereas the effect of injunctive norms relies on the indirect effect through intentions. Furthermore, injunctive norms have a stronger relation with attitudes than descriptive norms. Overall, the results provide substantial support for the proposition that, controlled for attitudes and perceived behavioral control, descriptive norms have a stronger total effect on behavior than injunctive norms. This generalizes prior work showing that descriptive norms are more effective in changing behavior (Nolan et al., 2008) and implies that descriptive norms generally activate a requested behavior much more strongly than injunctive norms do. The small effect of injunctive norms on behavior is somewhat surprising, given that injunctive norms (under the name of subjective norms) are an essential part of influential theoretical models, such as the theory of planned behavior. Yet, this is not completely unexpected, as previous studies have shown that descriptive norms are an important addition to these models, especially due to their direct relation with behavior (Rivis & Sheeran, 2003). The finding is also in line with results of the meta-analysis of Manning (2009), where the total effect of injunctive norms is nonsignificant for various subsamples (e.g., when behavior is socially approved, as is the case in most of our underlying studies).

Moderators

The effects of social norms are qualified by several moderators, which we classify into aspects of the norm and target person. With respect to norm aspects, the effectiveness of norms depends on the specification of sanctions, but not of rewards and behavior. This suggests that perhaps the behavior itself and rewards from performing the advocated behavior become automatically salient with social norms, and explicitly specifying these rewards and behaviors then has little incremental effect. This is in line with the evolutionary perspective that following social norms is inherently rewarding (Griskevicius et al., 2009, 2012). Explicitly specifying sanctions, in contrast, increases the direct effect of both injunctive and descriptive norms. This finding is in line with the negativity effect that has been shown across a broad range of psychological phenomena (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001): negatively valenced events have a greater impact on people than positively valenced events. People respond to sanctions by changing their behavior, presumably in an effort to avoid the sanction. There is no significant change in the indirect effect of the social norms, implying that the specification of sanctions does not significantly affect people’s intentions. Another important finding for norm aspects is that the source of the norm matters. Previous research suggests that identification with the reference group should increase the influence of social norms on people (Lapinski & Rimal, 2005). Consistent with this suggestion and with social identity theory (Tajfel & Turner, 1986), consumers are more likely to perform an advocated behavior when the source of the social norm is close to them rather than when the source is more distant or abstract. An important individual difference variable that affects the influence of social norms is age. Older people (over 50) are less susceptible to social norms than younger people (up to 21). Direct effects of social norms on behavior are negative for the older age group, implying that these older people are less likely to follow a social norm. Younger people, in contrast, are more likely to conform to social norms.
When it comes to gender, no consistent pattern of differences in the effectiveness of social norms between males and females is present. This could also be due to differential mechanisms of those effects. For example, it is possible that whereas male motivation to comply with a social norm is driven primarily by their desire to be part of the group, female motivation is driven by their desire to maintain individual relationships (Melnyk, Van Osselaer, & Bijmolt, 2009).

Implications

The results of this research have important implications both for researchers investigating social norms and their effectiveness and for practitioners using social norms. With respect to the academic contribution, first, our key finding is that consumers respond differently to injunctive versus descriptive norms. Thus, although recently the distinction between injunctive and descriptive norms has been criticized, because consumers do not make the distinction
themselves, and mix up these two types of norms (Eriksson, Strimling, & Coultas, 2015), the current meta-analysis shows that their effects are distinctly different. Injunctive and descriptive norms are correlated (r = .38 across the papers in our meta-analysis), but have different effects on behavioral intentions and behavior. Specifically, injunctive norms lead to effects on behavioral intentions but not always on behavior, whereas descriptive norms generally influence both intentions and behavior. Understanding these differences provides important insights for research on social norms, by highlighting that an investigation of the influence of social norms which examines only intentions or only behavior does not provide a complete picture of the effect of social norms. To truly understand the influence of social norms, both intentions and behavior need to be examined. Moreover, although descriptive norms may be more effective in changing behavior, injunctive norms have a stronger correlation with behavioral intentions. Thus, injunctive norms may be more appropriate for activating and perhaps also changing people’s intentions, and for focusing them on the social group they are part of and on the norms that apply therein.

Second, we have uncovered important moderators of the effectiveness of social norms. Specifically, the results of the meta-analyses reveal an intriguing asymmetry in the response to specifications: the effectiveness of norms depends on the specification of sanctions, but not of rewards. This important finding contributes to diverse streams of academic literature, in particular those dealing with instrumental conditioning (Schacter, Gilbert, & Wegner, 2011), from marketing and management to clinical psychology.

Finally, with regard to target person-related moderators, the results of the meta-analyses shed light on both (1) significant and (2) nonsignificant moderators of the effectiveness of social norms.
With respect to the first group, our results suggest that older people are less susceptible to social norms than younger people, which can influence the decision on whether to employ social norms in marketing campaigns, depending on target group. With respect to the second group, the findings of our meta-analyses contribute to the stream of research on gender differences (e.g., Barone & Roy, 2010) by revealing that gender does not have a consistent pattern of differences in the context of responses to social norms. With respect to the managerial implications, the results of this meta-analysis show that descriptive norms are generally more effective than injunctive norms in affecting behavior. Hence, the first implication is that we would advise marketing managers who want to promote a product or service to focus on descriptive norms. This makes descriptive norms an especially convenient and effective instrument in situations when consumers make immediate


decisions. For example, the effect of Amazon’s “people who bought this book, bought these other books” recommendations can be enhanced by incorporating descriptive norms (e.g., “most people who bought this book, bought these other books”). Similarly, product packaging suggesting descriptive norms (e.g., with the labels “best seller” or using product ratings) is likely to increase the chance that a product is chosen. However, it is important to realize that descriptive norms may also backfire, for instance when most people do not (yet) perform the desired behavior (e.g., most people do not consume enough fruit and vegetables according to dietary guidelines) or when the target group consists of older consumers (as the direct effects of social norms on behavior for this target group are negative). In those situations, descriptive social norms should be avoided. Furthermore, the fact that descriptive norms depend on the behavior of others and are internalized to a lower extent implies that the requested behavior may just as easily vanish when some people do not comply. Therefore, marketers should communicate descriptive norms frequently. For example, Glider, Midyett, Mills-Novoa, Johannessen, and Collins (2001) report changes in perceptions and behavior when messages are communicated at least once a week. Moreover, marketers should make sure to implement the campaign over a sufficient duration to effect change. For example, Clapp, Lange, Russell, Shillington, and Voas (2003) report a failure of a six-week campaign to reduce alcohol use, whereas Glider et al. (2001) report successful results of a similar campaign, which was run for 3 years.

Policy makers and marketers who want to employ injunctive norms should keep in mind that specifying sanctions, or referring to or directly targeting people who are close to the consumers whom these campaigns want to influence, can strengthen the effects of injunctive norms.
Finally, our finding that young people, compared to older people, are especially influenced by social norms opens the avenue of combining social norms and social media marketing (e.g., Facebook) as a very effective and cost-efficient marketing tool.

Limitations and Future Research

A meta-analysis can only examine the influence of variables that have been frequently reported in prior studies. There are social norm aspects that would be intriguing to examine, but that could not be included in the present meta-analysis, because these were either rarely reported in enough detail or hardly varied in the prior studies that we identified. These constitute possible directions for future research. Group size, that is, the number of others who provide the social norm, is one of these. Do social norms have
a weaker or stronger influence on consumer behavior when these norms are shared among more individuals? The effect of group size is not obvious, because larger groups may entail an increase in pressure from multiple persons (perhaps enhancing the influence of especially descriptive norms), whereas smaller groups may be more cohesive and have a more stringent social control of (especially injunctive) norms. Another potentially interesting aspect to consider is the dominant motive that consumers have when they encounter a social norm. For instance, research has shown that descriptive norms are more effective when consumers are motivated by fear and less effective when consumers are motivated by romantic desire (Griskevicius et al., 2009). These and other aspects remain promising directions for future research.

The current meta-analysis drew upon a large dataset of prior studies and included studies across all stages of consumption behavior including initial purchase decisions (e.g., decisions to start smoking), regular consumption (e.g., amount of fruit in a diet), and post-consumption (e.g., decisions to recycle), focusing on the context of material objects, services, and consumption behaviors. Interpersonal relations and judgments were excluded from the current meta-analysis. Future research may want to examine whether results of the current meta-analysis differ from those obtained in studies concerning interpersonal relations (e.g., negotiations with sales personnel, decisions to join virtual communities; Dholakia, Bagozzi, & Klein Pearo, 2004), and can do so by adding studies to the available database.

With regard to statistical analysis, it is important to stress that although we analyze one-directional effects in the structural model, this does not imply that the effects could not be reversed. Similar to standard regression models, our hypothesized model is saturated, and will fit the data perfectly by definition.
Justification of the model can therefore only be based on theory, which is quite strong for the applied model (Ajzen, 1991).
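
The point about saturation can be made concrete with a minimal sketch. It assumes a simplified three-variable path model (norm, intention, behavior) instead of the full model analyzed in this article, and the function names and correlation values are hypothetical, chosen only for illustration. With three observed correlations and three free paths, the model has zero degrees of freedom, so the model-implied correlations reproduce the observed ones exactly and fit cannot be used to evaluate the model.

```python
# Simplified three-variable path model (an illustrative assumption, not the
# full model from the meta-analysis): norm -> intention -> behavior, plus a
# direct norm -> behavior path. All variables are standardized.

def saturated_paths(r_ni, r_nb, r_ib):
    """Return the standardized paths (a, b, c) implied by three correlations."""
    a = r_ni                              # norm -> intention
    denom = 1.0 - r_ni ** 2
    b = (r_ib - r_nb * r_ni) / denom      # intention -> behavior, controlling for norm
    c = (r_nb - r_ib * r_ni) / denom      # norm -> behavior, controlling for intention
    return a, b, c


def implied_correlations(a, b, c):
    """Correlations implied by the paths, using path-tracing rules."""
    return {"r_ni": a, "r_nb": c + a * b, "r_ib": b + a * c}


# Hypothetical correlations, not taken from the meta-analysis.
observed = {"r_ni": 0.40, "r_nb": 0.25, "r_ib": 0.50}

a, b, c = saturated_paths(**observed)
implied = implied_correlations(a, b, c)

indirect_effect = a * b                   # effect of the norm through intention
total_effect = c + indirect_effect        # direct plus indirect effect
free_parameters = 3                       # the paths a, b, and c
degrees_of_freedom = len(observed) - free_parameters
max_residual = max(abs(implied[k] - observed[k]) for k in observed)

print(f"direct={c:.3f} indirect={indirect_effect:.3f} total={total_effect:.3f}")
print(f"df={degrees_of_freedom} max residual={max_residual:.2e}")
```

Reversing the direction of any path in this sketch would yield another saturated model that reproduces the same correlations, which is why the direction of effects has to be argued from theory.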

Conclusion

Descriptive norms are generally more effective in influencing behavior than injunctive norms. Descriptive norms affect behavior primarily directly, whereas injunctive norms rely on an indirect effect through intentions and are more closely linked to people’s attitudes and intentions. These general effects are moderated by aspects of the norm and of the target person. Specifically, norms that come from close others and that specify sanctions are especially influential, and relatively young people are more likely to conform to social norms than older people.


References

Aarts, H., & Dijksterhuis, A. (2003). The silence of the library: Environment, situational norm, and social behavior. Journal of Personality and Social Psychology, 84, 18–28. https://doi.org/10.1037/0022-3514.84.1.18
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Albarracín, D., Johnson, B. T., Fishbein, M., & Muellerleile, P. A. (2001). Theories of reasoned action and planned behavior as models of condom use: A meta-analysis. Psychological Bulletin, 127, 142–161. https://doi.org/10.1037/0033-2909.127.1.142
Alcock, J. (2005). Animal behavior: An evolutionary approach (8th ed.). Sunderland, MA: Sinauer.
Aloe, A. M. (2014). An empirical investigation of partial effect sizes in meta-analysis of correlational data. Journal of General Psychology, 141, 47–64. https://doi.org/10.1080/00221309.2013.853021
Armitage, C. J., & Conner, M. (2001). Efficacy of the theory of planned behaviour: A meta-analytic review. British Journal of Social Psychology, 40, 471–499. https://doi.org/10.1348/014466601164939
Barone, M. J., & Roy, T. (2010). Does exclusivity always pay off? Exclusive price promotions and consumer response. Journal of Marketing, 74, 121–132. https://doi.org/10.1509/jmkg.74.2.121
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., & Vohs, K. D. (2001). Bad is stronger than good. Review of General Psychology, 5, 323–370. https://doi.org/10.1037/1089-2680.5.4.323
Cheung, M. W.-L. (2014a). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods, 46, 29–40. https://doi.org/10.3758/s13428-013-0361-y
Cheung, M. W.-L. (2014b). metaSEM: An R package for meta-analysis using structural equation modeling. Frontiers in Psychology, 5, 1521. https://doi.org/10.3389/fpsyg.2014.01521
Cheung, M. W.-L., & Chan, W. (2005). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10, 40–64. https://doi.org/10.1037/1082-989X.10.1.40
Christian, J., & Armitage, C. J. (2002). Attitudes and intentions of homeless people towards service provision in South Wales. British Journal of Social Psychology, 41, 219–231. https://doi.org/10.1348/014466602760060101
Cialdini, R. B., & Goldstein, N. (2004). Social influence: Compliance and conformity. Annual Review of Psychology, 55, 591–621. https://doi.org/10.1146/annurev.psych.55.090902.142015
Cialdini, R. B., Reno, R. R., & Kallgren, C. A. (1990). A focus theory of normative conduct: Recycling the concept of norms to reduce littering in public places. Journal of Personality and Social Psychology, 58, 1015–1026. https://doi.org/10.1037/0022-3514.58.6.1015
Cialdini, R. B., & Trost, M. R. (1998). Social influence: Social norms, conformity, and compliance. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (pp. 151–192). Boston, MA: McGraw-Hill.
Clapp, J. D., Lange, J. E., Russell, C., Shillington, A., & Voas, R. B. (2003). A failed norms social marketing campaign. Journal of Studies on Alcohol, 64, 409–414. https://doi.org/10.15288/jsa.2003.64.409
Dholakia, U. M., Bagozzi, R. P., & Klein Pearo, L. (2004). A social influence model of consumer participation in network- and small-group based virtual communities. International Journal of Research in Marketing, 21, 241–263. https://doi.org/10.1016/j.ijresmar.2003.12.004


Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634. https://doi.org/10.1136/bmj.315.7109.629
Eriksson, K., Strimling, P., & Coultas, J. C. (2015). Bidirectional associations between descriptive and injunctive norms. Organizational Behavior and Human Decision Processes, 127, 59–69. https://doi.org/10.1016/j.obhdp.2014.09.011
Feldman, D. C. (1984). The development and enforcement of group norms. Academy of Management Review, 9, 47–53. https://doi.org/10.2307/258231
Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention and behavior: An introduction to theory and research. Reading, MA: Addison-Wesley.
Fisher, T. D. (2009). The impact of socially conveyed norms on the reporting of sexual behavior and attitudes by men and women. Journal of Experimental Social Psychology, 45, 567–572. https://doi.org/10.1016/j.jesp.2009.02.007
Glider, P., Midyett, S. J., Mills-Novoa, B., Johannessen, K., & Collins, C. (2001). Challenging the collegiate rite of passage: A campus-wide social marketing media campaign to reduce binge drinking. Journal of Drug Education, 31, 207–220. https://doi.org/10.2190/U466-EPFG-Q76D-YHTQ
Göckeritz, S., Schultz, P. W., Rendón, T., Cialdini, R. B., Goldstein, N. J., & Griskevicius, V. (2010). Descriptive normative beliefs and conservation behavior: The moderating roles of personal involvement and injunctive normative beliefs. European Journal of Social Psychology, 40, 514–523. https://doi.org/10.1002/ejsp.643
Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research, 35, 472–482. https://doi.org/10.1086/586910
Gollwitzer, P. M., & Brandstätter, V. (1997). Implementation intentions and effective goal pursuit. Journal of Personality and Social Psychology, 73, 186–199. https://doi.org/10.1037/0022-3514.73.1.186
Griskevicius, V., Goldstein, N. J., Mortensen, C. R., Sundie, J. M., Cialdini, R. B., & Kenrick, D. T. (2009). Fear and loving in Las Vegas: Evolution, emotion, and persuasion. Journal of Marketing Research, 46, 384–395. https://doi.org/10.1509/jmkr.46.3.384
Griskevicius, V., Cantú, S. M., & van Vugt, M. (2012). The evolutionary bases for sustainable behavior: Implications for marketing, policy, and social entrepreneurship. Journal of Public Policy & Marketing, 31, 115–128. https://doi.org/10.1509/jppm.11.040
Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21, 1539–1558. https://doi.org/10.1002/sim.1186
Jacobson, R. P., Mortensen, C. R., & Cialdini, R. B. (2011). Bodies obliged and unbound: Differentiated response tendencies for injunctive and descriptive social norms. Journal of Personality and Social Psychology, 100, 433–448. https://doi.org/10.1037/a0021470
Jones, T. (2006). “We always have a beer after the meeting”: How norms, customs, conventions, and the like explain behavior. Philosophy of the Social Sciences, 36, 251–275. https://doi.org/10.1177/0048393106289791
Kredentser, M. S., Fabrigar, L. R., Smith, S. M., & Fulton, K. (2012). Following what people think we should do versus what people actually do: Elaboration as a moderator of the impact of descriptive and injunctive norms. Social Psychological and Personality Science, 3, 341–347. https://doi.org/10.1177/1948550611420481
Langner, T., Schmidt, J., & Fischer, A. (2015). Is it really love? A comparative investigation of the emotional nature of brand and interpersonal love. Psychology & Marketing, 32, 624–634. https://doi.org/10.1002/mar.20805


Lapinski, M. K., & Rimal, R. N. (2005). An explication of social norms. Communication Theory, 15, 127–147. https://doi.org/10.1111/j.1468-2885.2005.tb00329.x
Leshner, A. I. (1997). Addiction is a brain disease, and it matters. Science, 278, 45–47. https://doi.org/10.1126/science.278.5335.45
Lipsey, M. W., & Wilson, D. B. (2000). Practical meta-analysis. London, UK: Sage.
Manning, M. (2009). The effects of subjective norms on behaviour in the theory of planned behaviour: A meta-analysis. British Journal of Social Psychology, 48, 649–705. https://doi.org/10.1348/014466608X393136
Mazar, N., & Zhong, C.-B. (2010). Do green products make us better people? Psychological Science, 21, 494–498. https://doi.org/10.1177/0956797610363538
Melnyk, V., van Herpen, E., Fischer, A. R. H., & van Trijp, H. C. M. (2011). To think or not to think: The effect of cognitive deliberation on the influence of injunctive versus descriptive social norms. Psychology & Marketing, 28, 709–729. https://doi.org/10.1002/mar.20408
Melnyk, V., van Herpen, E., Fischer, A. R. H., & van Trijp, H. C. M. (2013). Regulatory fit effects for injunctive versus descriptive social norms: Evidence from the promotion of sustainable products. Marketing Letters, 24, 191–203. https://doi.org/10.1007/s11002-013-9234-5
Melnyk, V., van Herpen, E., Jak, S., & van Trijp, H. C. M. (2018). The mechanism of the social norms’ influence on consumer decision making: A meta-analysis. [Supplementary material]. Trier, Germany: ZPID. http://dx.doi.org/10.23668/psycharchives.921
Melnyk, V., Van Osselaer, S. M., & Bijmolt, T. H. (2009). Are women more loyal customers than men? Gender differences in loyalty to firms and individual service providers. Journal of Marketing, 73, 82–96. https://doi.org/10.1509/jmkg.73.4.82
Meyers, C. (2004). Cruel choices: Autonomy and critical care decision-making. Bioethics, 18, 104–119. https://doi.org/10.1111/j.1467-8519.2004.00384.x
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D., The PRISMA Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Open Medicine, 3, 123–130. https://doi.org/10.1371/journal.pmed.1000097
Neale, M. C., & Miller, M. B. (1997). The use of likelihood-based confidence intervals in genetic models. Behavior Genetics, 27, 113–120. https://doi.org/10.1023/A:1025681223921
Neidert, G. P. M., & Linder, D. E. (1990). Avoiding social traps: Some conditions that maintain adherence to restricted consumption. Social Behaviour, 5, 261–284.
Nigbur, D., Lyons, E., & Uzzell, D. (2010). Attitudes, norms, identity, and environmental behavior: Using an expanded theory of planned behaviour to predict participation in a kerbside recycling programme. British Journal of Social Psychology, 49, 259–284. https://doi.org/10.1348/014466609X449395
Nolan, J. M., Schultz, P. W., Cialdini, R. B., Goldstein, N. J., & Griskevicius, V. (2008). Normative social influence is underdetected. Personality and Social Psychology Bulletin, 34, 913–923. https://doi.org/10.1177/0146167208316691
O’Keefe, D. J. (1997). Standpoint explicitness and persuasive effect: A meta-analytic review of the effects of varying conclusion articulation in persuasive messages. Argumentation and Advocacy, 34, 1–12. https://doi.org/10.1080/00028533.1997.11978023
O’Keefe, D. J. (2002). Persuasion: Theory and research (2nd ed.). London, UK: Sage.
Okun, M. A., Karoly, P., & Lutz, R. (2002). Clarifying the contribution of subjective norm to predicting leisure-time exercise. American Journal of Health Behavior, 26, 296–305. https://doi.org/10.5993/AJHB.26.4.6


Ozaki, R., & Sevastyanova, K. (2011). Going hybrid: An analysis of consumer purchase motivations. Energy Policy, 39, 2217–2227. https://doi.org/10.1016/j.enpol.2010.04.024
Pasupathi, M. (1999). Age differences in response to conformity pressure for emotional and nonemotional material. Psychology and Aging, 14, 170–174. https://doi.org/10.1037/0882-7974.14.1.170
Pliner, P., & Mann, N. (2004). Influence of social norms and palatability on amount consumed and food choice. Appetite, 42, 227–237. https://doi.org/10.1016/j.appet.2003.12.001
Poškus, M. S. (2016). Using social norms to encourage sustainable behaviour: A meta-analysis. Psichologija, 53, 44–58. https://doi.org/10.15388/Psichol.2016.53.10031
Poškus, M. S. (2018). Investigating pro-environmental behaviors of Lithuanian university students. Current Psychology, 37, 225–233. https://doi.org/10.1007/s12144-016-9506-3
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Reno, R. R., Cialdini, R. B., & Kallgren, C. A. (1993). The transsituational influence of social norms. Journal of Personality and Social Psychology, 64, 104–112. https://doi.org/10.1037/0022-3514.64.1.104
Rimal, R. N., & Real, K. (2005). How behaviors are influenced by perceived norms: A test of the theory of normative social behavior. Communication Research, 32, 389–414. https://doi.org/10.1177/0093650205275385
Rivis, A., & Sheeran, P. (2003). Descriptive norms as an additional predictor in the theory of planned behaviour: A meta-analysis. Current Psychology, 22, 218–233. https://doi.org/10.1007/s12144-003-1018-2
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (Eds.). (2006). Publication bias in meta-analysis: Prevention, assessment and adjustments. Chichester, UK: Wiley.
Schacter, D. L., Gilbert, D. T., & Wegner, D. M. (2011). The role of reinforcement and punishment. In Psychology (2nd ed., pp. 278–288). New York, NY: Worth.
Schultz, P. W., Nolan, J. M., Cialdini, R. B., Goldstein, N. J., & Griskevicius, V. (2007). The constructive, destructive and reconstructive power of social norms. Psychological Science, 18, 429–434. https://doi.org/10.1111/j.1467-9280.2007.01917.x
Shaffer, L. S. (1983). Toward Pepitone’s vision of a normative social psychology: What is a social norm? Journal of Mind and Behavior, 4, 275–293.
Sheeran, P., Abraham, C., & Orbell, S. (1999). Psychosocial correlates of heterosexual condom use: A meta-analysis. Psychological Bulletin, 125, 90–132. https://doi.org/10.1037/0033-2909.125.1.90
Sheeran, P., & Taylor, S. (1999). Predicting intentions to use condoms: A meta-analysis and comparison of the theories of reasoned action and planned behavior. Journal of Applied Social Psychology, 29, 1624–1675. https://doi.org/10.1111/j.1559-1816.1999.tb02045.x
Sheppard, B. H., Hartwick, J., & Warshaw, P. R. (1988). The theory of reasoned action: A meta-analysis of past research with recommendations for modifications and future research. Journal of Consumer Research, 15, 325–343. https://doi.org/10.1086/209170
Silver, N. C., & Dunlap, W. P. (1987). Averaging correlation coefficients: Should Fisher’s z transformation be used? Journal of Applied Psychology, 72, 146–148. https://doi.org/10.1037/0021-9010.72.1.146


Stangor, C. (2004). Social groups in action and interaction. New York, NY: Psychology Press.
Staunton, M., Louis, W. R., Smith, J. R., Terry, D. J., & McDonald, R. I. (2014). How negative descriptive norms for healthy eating undermine the effects of positive injunctive norms. Journal of Applied Social Psychology, 44, 319–330. https://doi.org/10.1111/jasp.12223
Tajfel, H., & Turner, J. C. (1986). The social identity theory of intergroup behavior. In W. G. Austin & S. Worchel (Eds.), Psychology of intergroup relations (pp. 7–24). Chicago, IL: Nelson-Hall.
Terry, D. J., Hogg, M. A., & McKimmie, B. M. (2000). Attitude-behaviour relations: The role of in-group norms and mode of behavioural decision-making. British Journal of Social Psychology, 39, 337–361. https://doi.org/10.1348/014466600164534
Terry, D. J., Hogg, M. A., & White, K. M. (2000). Attitude-behavior relations: Social identity and group membership. In D. J. Terry & M. A. Hogg (Eds.), Attitudes, behavior, and social context: The role of norms and group membership (pp. 67–94). London, UK: Erlbaum.
Thøgersen, J. (2008). Social norms and cooperation in real-life social dilemmas. Journal of Economic Psychology, 29, 458–472. https://doi.org/10.1016/j.joep.2007.12.004
Wechsler, H., Nelson, T. E., Lee, J. E., Seibring, M., Lewis, C., & Keeling, R. P. (2003). Perception and reality: A national evaluation of social norms marketing interventions to reduce college students’ heavy alcohol use. Journal of Studies on Alcohol, 64, 484–494. https://doi.org/10.15288/jsa.2003.64.484
White, K., & Simpson, B. (2013). When do (and don’t) normative appeals influence sustainable consumer behaviors? Journal of Marketing, 77, 78–95. https://doi.org/10.1509/jm.11.0278
Yoon, C., Gutchess, A. H., Feinberg, F., & Polk, T. A. (2006). A functional magnetic resonance imaging study of neural dissociations between brand and person judgments. Journal of Consumer Research, 33, 31–40. https://doi.org/10.1086/504132

History
Received February 28, 2018
Revision received October 24, 2018
Accepted October 28, 2018
Published online March 29, 2019

Acknowledgments
The authors thank Rik Pieters, Harald van Heerde, Marijn de Bruin, Marcel Kornelis, and Arnout Fischer for their helpful comments and feedback.

Authorship
The first two authors contributed equally.

Funding
Suzanne Jak was supported by grant NWO-VENI-451-16-001 from the Netherlands Organization for Scientific Research (NWO).

Vladimir Melnyk
Department of Business Administration
Carlos III University of Madrid
Calle Madrid 126
28903 Getafe-Madrid
Spain
vladimir.melnyk@uc3m.es



Review Article

How Does Ethical Leadership Impact Employee Organizational Citizenship Behavior?
A Meta-Analytic Review Based on Two-Stage Meta-Analytic Structural Equation Modeling (TSSEM)

Yucheng Zhang1, Long Zhang2, Guangjian Liu3, Jiali Duan4, Shan Xu5, and Mike W.-L. Cheung6

1 School of Public Administration, Southwestern University of Finance and Economics, Chengdu, China
2 Business School, Hunan University, Changsha, China
3 Department of Organization and Human Resources, School of Business, Renmin University of China, Beijing, China
4 School of Management, University of New South Wales Business School, Sydney, Australia
5 School of Business Administration, Southwestern University of Finance and Economics, Chengdu, China
6 Department of Psychology, National University of Singapore, Singapore

Abstract: This study investigates the mechanism linking ethical leadership and employees’ organizational citizenship behavior (OCB) from a justice perspective. We propose that interactional justice serves as a conduit that induces employees’ OCB in response to leaders’ ethical behaviors. The explanatory powers of interactional justice on two forms of OCB – organizational OCB (OCBO) and interpersonal OCB (OCBI) – are compared. Based on data from 34,028 participants in 100 empirical studies, we applied two-stage meta-analytic structural equation modeling (TSSEM) and found that interactional justice explains the influence of ethical leadership on employees’ OCBO but not on OCBI. Theoretical and managerial implications of our findings are discussed.

Keywords: meta-analysis, ethical leadership, interactional justice, organizational citizenship behavior, TSSEM

Zeitschrift für Psychologie (2019), 227(1), 18–30
https://doi.org/10.1027/2151-2604/a000353

Extensive research has shown that ethical leadership plays a key role in shaping employee behaviors (Loi, Lam, & Chan, 2012; Neubert, Carlson, Kacmar, Roberts, & Chonko, 2009; Zhang, 2012). Accordingly, increasing attention has been paid to the mechanisms that underlie the influences of ethical leadership (Kalshoven, Den Hartog, & de Hoogh, 2013; Newman, Kiazad, Miao, & Cooper, 2014). While existing literature attempts to explain the effects of ethical leadership from several theoretical perspectives, such as cognitive and affective trust (Lu, 2014), and organizational politics (Kacmar, Andrews, Harris, & Tepper, 2013), there is a critical omission in exploring its mediational pathways from the perspective of justice. The core premise of the justice perspective is the perception of fairness (Folger, 2001; Folger & Cropanzano, 2001), which is captured by the construct of organizational justice (Colquitt, Conlon, Wesson, Porter, & Ng, 2001). Organizational justice consists of three dimensions: distributive justice, procedural justice, and interactional justice. Unlike distributive and

procedural justice, which are organization-focused, interactional justice depicts the fairness of the interpersonal treatment experienced by focal employees (Cohen-Charash & Spector, 2001; Colquitt et al., 2001). Bies and Moag define interactional justice as “concerns about the fairness of interpersonal communication” (1986, p. 44), incorporating communicational factors regarding how organizational systems and policies are enacted and implemented (Karriker & Williams, 2009). As the main connections between an organization and its employees, leaders directly influence employees’ feelings about their social exchange and interpersonal relationship with the organization. When leaders behave ethically in implementing organizational practices or policies, employees are more likely to perceive their interpersonal relationships with the leaders and with the organization as positive and favorable (i.e., high-level interactional justice) (Wang, Lu, & Liu, 2017). Such positive perceptions from employees will further impact their work-related attitudes and behaviors. Therefore, it is
important to examine the mediating role of interactional justice between ethical leadership and subordinates’ behaviors. In this paper, we focus on employees’ organizational citizenship behavior (OCB) as the dependent variable because it is an important domain in organizational behavior research, grounded in the belief that OCBs are critical drivers of organizational performance and competitive advantage. Empirical evidence supporting this argument is well documented in the literature (see the review by Podsakoff, Blume, Whiting, & Podsakoff, 2009), and a large volume of studies have investigated the antecedents of employees’ OCB (e.g., Ehrhart, 2004; Organ, Podsakoff, & MacKenzie, 2005; Zeinabadi, 2010). With the development of the OCB literature, researchers have increasingly recognized that OCB is multifocal, comprising organizational OCB (OCBO) and interpersonal OCB (OCBI), which calls for further research in a more rigorous manner (e.g., Karriker & Williams, 2009; Lu, 2014; Newman et al., 2014; Yang, Ding, & Lo, 2016), and have advocated different mechanisms leading to OCBO and OCBI (e.g., Ferris, Lian, Brown, & Morrison, 2015; Lee & Allen, 2002).

Thus, the main purposes of this study are threefold. First, it investigates the relationship between ethical leadership and employees’ OCBs. Second, it examines the mediating role of interactional justice in the ethical leadership–OCB linkage. Finally, the explanatory powers of interactional justice on OCBO and OCBI are compared (Shin, 2012). The research question of this study is: From the perspective of interactional justice, which form of organizational citizenship behavior (OCB), OCBI or OCBO, is more predictable regarding how ethical leadership influences OCB through interactional justice? To address this research question, we conduct a meta-analytic review using two-stage meta-analytic structural equations modeling (TSSEM).
First, this review investigates interactional justice as a mechanism through which ethical leadership induces employees’ OCBs. Since leaders are those who implement organizational practices and policies, their behaviors (ethical or unethical) will directly influence how employees perceive the social exchange relations as well as the communication processes. Specifically, we argue that leaders’ ethical behavior can improve employees’ OCBs by increasing the employees’ perceptions of interactional justice. Second, we attempt to compare the predicting powers of interactional justice in the above relationships when the two forms of OCBs – OCBO and OCBI – are under consideration. Previous studies on OCBs have demonstrated that OCBs that directly benefit the organization itself (OCBO) and OCBs that benefit the interpersonal interactions (OCBI) are distinctive and related to different

© 2019 Hogrefe Publishing


predictors (e.g., Lehmann-Willenbrock, Grohmann, & Kauffeld, 2012; Skarlicki & Latham, 1996). It is thus reasonable to expect that interactional justice may have different roles in the relationships between ethical leadership and OCBO and OCBI, respectively. This study provides us with a better understanding of how OCBs with different targets are generated from ethical leadership.

Theory and Hypotheses

Overview of the Current Study

In the current study, we first explain the relationship between ethical leadership and two forms of OCBs from the organizational justice perspective and propose our theoretical model. Second, we analyze effect sizes of primary studies retrieved from multiple databases using TSSEM. Specifically, we apply TSSEM to test our hypothesized model and research question, which is more rigorous than traditional meta-analytic structural equation modeling (MASEM) (Viswesvaran & Ones, 1995). Based on the standard procedure of TSSEM proposed by Cheung (2015a, 2015b), a correlation matrix is pooled in the first stage, which is used to fit structural equation models in the second stage. The fixed-effects model is applied if the population effect sizes are homogeneous. Otherwise, a random-effects model is applied so that the findings can be generalized to other contexts. In the second stage, the vector of pooled correlations and its asymptotic sampling covariance matrix, both estimated in the first stage, are used to fit the SEM. The R package metaSEM is applied to analyze the data (Cheung, 2015a, 2015b). The TSSEM method offers several advantages. TSSEM enables us to test our theoretical model and get more accurate estimates by simultaneously examining multiple predictors and outcomes (Clarke, 2005; Viswesvaran & Ones, 1995). TSSEM helps us to compare two different mediation pathways by comparing models with and without the constraint on the regression paths (Cheung & Chan, 2005; Mackey, Frieder, Brees, & Martinko, 2017). TSSEM can also be used to test several theoretically possible models by comparing these models empirically (Rosenthal & DiMatteo, 2001). Third, the theoretical and practical implications are discussed. Figure 1 presents the proposed model of this study.
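The choice between fixed- and random-effects pooling described above can be illustrated with a minimal univariate sketch in Python. It is not the multivariate first-stage estimator implemented in metaSEM; it assumes raw correlations, an approximate sampling variance of (1 − r²)²/(n − 1), and the DerSimonian-Laird estimate of between-study variance. The function name and the three study-level inputs are hypothetical and are not data from this review.

```python
import math


def pool_correlations(rs, ns):
    """Pool correlations with fixed- and random-effects weights (univariate sketch)."""
    # Approximate sampling variance of each raw correlation (assumption of this sketch).
    vs = [(1 - r ** 2) ** 2 / (n - 1) for r, n in zip(rs, ns)]
    w = [1 / v for v in vs]
    fixed = sum(wi * r for wi, r in zip(w, rs)) / sum(w)

    # Cochran's Q tests whether the effect sizes are homogeneous.
    q = sum(wi * (r - fixed) ** 2 for wi, r in zip(w, rs))
    df = len(rs) - 1

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each sampling variance.
    w_re = [1 / (v + tau2) for v in vs]
    random_est = sum(wi * r for wi, r in zip(w_re, rs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I^2: share of total dispersion attributed to real differences in effect sizes.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"fixed": fixed, "random": random_est, "se": se,
            "q": q, "df": df, "tau2": tau2, "i2": i2}


# Hypothetical study-level inputs, not data from this review.
result = pool_correlations(rs=[0.30, 0.55, 0.70], ns=[120, 200, 150])
print(result)
```

When Q clearly exceeds its degrees of freedom, as in this hypothetical input, tau² is positive and the random-effects estimate gives the smaller studies relatively more weight than the fixed-effects estimate does.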

Ethical Leadership and Employees’ OCB

Defined as “the demonstration of normatively appropriate conduct through personal actions and interpersonal relationships, and the promotion of such conduct to subordinates

Zeitschrift für Psychologie (2019), 227(1), 18–30



Figure 1. Hypothesized model of employees’ OCBs (variables shown: Ethical Leadership, Interactional Justice, OCBO, OCBI).

through two-way communication, reinforcement, and decision-making” (Brown, Treviño, & Harrison, 2005, p. 120), ethical leadership is posited to benefit a variety of individual and organizational outcomes (Ng & Feldman, 2015), with its potential to induce subordinates’ desirable attitudes and behaviors (Brown et al., 2005). An important research stream concerns the causal linkage between ethical leadership and employees’ OCBs (e.g., Avey, Palanski, & Walumbwa, 2011; Kacmar, Bachrach, Harris, & Zivnuska, 2011; Lu, 2014; Mayer, Kuenzi, Greenbaum, Bardes, & Salvador, 2009; Ruiz-Palomino, Ruiz-Amaya, & Knörr, 2011; Shin, 2012). OCBs are employees’ discretionary behaviors that benefit the organization (OCBO) and other organizational members (OCBI). The definition of ethical leadership provided by Brown et al. (2005) consists of two aspects of behavior – “demonstration” and “promotion” (Brown & Treviño, 2006). Accordingly, two theoretical lenses are primarily used to explain the positive impacts of ethical leadership – social exchange theory (Blau, 1964) and social learning theory (Brown et al., 2005), with social exchange theory gaining more attention in recent decades (Newman et al., 2014). Social exchange theory is in accordance with the “demonstration” aspect of ethical leadership, while social learning theory explains the effects of “promotion” of ethical behaviors by leaders. According to social exchange rationales, when employees are treated with normative appropriateness and respect by ethical leaders (i.e., leaders demonstrate favorable behaviors), they are more willing to reciprocate such favorable treatment by exhibiting OCBs toward the organization, co-workers, and their leaders. Social learning theory, on the other hand, argues that ethical leadership serves as a role model for employees (Brown et al., 2005). It affects the collective ethical norms and employees’ ethical cognitions, which guide employees’ behaviors (Resick, Hargis, Shao, & Dust, 2013).
In this sense, employees learn from leaders’ ethical behaviors and engage in OCBs. Thus, we hypothesize:

Hypothesis 1: Ethical leadership is positively related to employees’ OCBs, including OCBI and OCBO.

Ethical Leadership and Interactional Justice

Although both social learning and social exchange perspectives explain the importance of ethical leadership for employees’ work-related behaviors, social learning theory is more frequently used for explaining “the mitigating effects of ethical leadership on subordinate deviant or unethical behaviors” (Newman et al., 2014, p. 115). Since the focus of our study is interactional justice and employee OCBs, and the justice literature predominantly relies on social exchange theory to frame the effects of interactional justice (Moorman & Byrne, 2005), our arguments are mainly grounded in social exchange theory. A premise underlying the social exchange relationship between ethical leadership and employees’ OCBs is that employees need to perceive behaviors from leaders as well-intentioned or virtuous so that they will reciprocate with their favorable behaviors, such as OCBs (Blau, 1964). A pertinent notion that reflects employees’ perception of the way they are treated in the workplace is interactional justice, which is an important component of organizational justice. Organizational justice refers to employees’ perception of fairness in the workplace (Lind, Greenberg, & Cropanzano, 2001). It consists of three dimensions – distributive justice, procedural justice, and interactional justice. Distributive and procedural justice are employees’ perceptions of fairness regarding an organization’s distribution results and distribution processes, respectively. They primarily focus on the organization as a system (Greenberg & Cropanzano, 1993; Loi, Yang, & Diefendorff, 2009). Different from these two structural forms of organizational justice, interactional justice emphasizes employees’ feelings of how they are treated by their leaders. It is directly associated with the social interactions and communications between employees and their leaders (Cropanzano, Prehar, & Chen, 2002).
According to the agent-system model of justice (Bies & Moag, 1986; Tekleab, Takeuchi, & Taylor, 2005), interactional justice is more interpersonally oriented and is more easily influenced by leaders’ behavior than the other two forms of justice. Research has shown that ethical leadership closely impacts interactional justice (Bedi, Alpaslan, & Green, 2016; Bies & Moag, 1986; Ng & Feldman, 2015). Prior studies examined the mediating role of employee interactional justice between ethical leadership and employee outcomes, such as loyalty to supervisors (Wang et al., 2017) and trust (Arslantas & Dursun, 2008). Leaders are the conduits between the organization and the employees. Even with the same organizational practices and policies, different leaders could give rise to different responses from employees. When ethical leaders treat their employees with respect and explain decision-making processes well,



employees are inclined to perceive their social exchange relations with leaders and organizations as favorable, affectively pleasant, and fair (Roch & Shanock, 2006; Wang et al., 2017). Therefore, we hypothesize:

Hypothesis 2: Ethical leadership is positively related to employee interactional justice.

The Mediating Role of Interactional Justice

Research that explains the connections between ethical leadership and employees’ behaviors has flourished in recent decades (e.g., Lu, 2014; Resick et al., 2013; Sharif & Scandura, 2014; Walumbwa, Hartnell, & Misati, 2017). Social exchange theory argues that reciprocating behaviors emanate from “feelings of personal obligations, gratitude, and trust” (Blau, 1964, p. 94). When leaders behave ethically, employees are more likely to perceive those behaviors as leaders’ efforts in maintaining and investing in the social relationships, which then motivates the employees to reciprocate with OCBs. Employees’ perceptions, hence, are an important link that connects ethical leadership to employee OCBs (Yang et al., 2016). As mentioned above, interactional justice is employees’ perceptions regarding the treatment they receive from their leaders and the organization. Interactional justice captures how employees feel about the interpersonal relations within an organization, and hence influences the employees’ willingness to reinforce close relationships. Justice regarding the organizational distribution system (i.e., distributive and procedural justice) is understood as the assessment of the organization’s formal systems and processes (Cropanzano et al., 2002) and hence is more closely related to employees’ in-role behaviors (Kim & Mauborgne, 1996). Interactional justice, by contrast, is more affect-related and cannot be easily regulated in an employment contract. When employees perceive a higher level of interactional justice, they tend to respond by exhibiting extra-role behaviors which are not formally required in their job descriptions. Therefore, we endorse the mediating role of interactional justice between ethical leadership and employees’ OCBs. For instance, Wang et al. (2017) suggest that interactional justice mediates the relationship between ethical leadership and employees’ loyalty to supervisors.
This also echoes Kacmar and colleagues’ suggestion that “unethical leaders may affect outcomes by reducing interactional justice” (2013, p. 42). Organizational justice research confirms the importance of interactional justice in the workplace, and its positive effects on desirable employee attitudes and behaviors, including OCBs (e.g., Alotaibi, 2001; Carpenter, Berry,


& Houston, 2014; Moorman, 1991; Organ & Ryan, 1995; Tepper & Taylor, 2003; Wong, Ngo, & Wong, 2006). Grounded in social exchange theory, the organizational justice literature argues that employees are more willing to perform OCB when they receive fair treatment from their organization and its agents, the leaders (Cropanzano & Mitchell, 2005). As Moorman states, “OCB appears to be a reasonable and likely way in which an employee can exchange the social rewards brought on by perceptions of fairness” (1991, p. 846). Since leaders represent both themselves and the organization, employees who perceive a high level of interactional justice from the leaders are inclined to behave beneficially to both the leader and the organization through OCBI and OCBO, respectively. In contrast, employees tend to engage in undesirable behaviors, such as withdrawal of effort, in response to unfair treatment. Empirical evidence has provided support for these relationships. For example, Rupp and Cropanzano (2002) found a positive relationship between interactional justice and OCBO. Similarly, Wong et al. (2006) found that organizational justice (including interactional justice) led to employee OCBs through its impact on employees’ trust in the organization and supervisor. Therefore, we hypothesize:

Hypothesis 3a: Interactional justice mediates the relationship between ethical leadership and employee OCBO.

Hypothesis 3b: Interactional justice mediates the relationship between ethical leadership and employee OCBI.

The Explanatory Power of Interactional Justice

The construct of OCBs is an “umbrella” concept which encompasses multiple aspects that deserve a more nuanced investigation (Özbek, Yoldash, & Tang, 2016). Researchers have provided different ways to conceptualize different forms of OCBs (e.g., Moorman, 1991; Van Dyne, Graham, & Dienesch, 1994). Based on different targets, OCBs can be categorized into OCBO, citizenship behaviors benefitting organizations, and OCBI, citizenship behaviors benefitting individuals (Shin, 2012). While prior studies generally support the implications of ethical leadership for OCBs, little is known about the details of such a relationship when OCBO and OCBI are considered separately. Organizational justice provides a valuable theoretical foundation to investigate in-depth differences, if any, between OCBO and OCBI. As discussed earlier, interactional justice is proposed to be associated with both OCBO and OCBI. However, its predictive power may vary depending on the targets of OCBs.



Comparing the explanatory powers of organizational justice on different outcomes has a long tradition in the literature. Empirical studies based on different cultural contexts revealed that organizational justice’s prediction of employees’ OCBs has cultural uniqueness (Farh, Earley, & Lin, 1997; Moorman, 1991). Others compared the effects of organizational justice on different forms of OCBs. For instance, Özbek et al. (2016) examined three forms of OCBs (i.e., loyalty, participation, and obedience) and found that interactional justice produces loyalty but not participation and obedience. In this study, we attempt to compare the effects of interactional justice on OCBO and OCBI. An important concept in the organizational justice literature is the source of fairness (Colquitt et al., 2001; Colquitt & Greenberg, 2003; Greenberg & Cropanzano, 1993). Different sources of justice perceptions will lead to different kinds of extra-role behaviors toward different targets (El Akremi, Vandenberghe, & Camerman, 2010; Karriker & Williams, 2009). For interactional justice generated from ethical leadership, employees can attribute it to two different sources – the leader and the organization – based on the agent-system and agent-dominance models of justice, respectively (Fassina, Jones, & Uggerslev, 2008). On the one hand, the agent-system model suggests that the source of interactional justice is the leader, whereas the source of distributive and procedural justice is the organization. In other words, it distinguishes the system (i.e., the organization) from its agents (i.e., the leader). Therefore, interactional justice will motivate employees to reciprocate in ways that benefit the agent (i.e., leaders). Wong et al. (2006) found that interactional justice is more closely related to employees’ trust in supervisors than in organizations.
Since leadership is directly related to interpersonal actions, the interactional justice that employees perceive will affect their interpersonal OCBs (i.e., OCBI) more than OCBO. Conversely, the agent-dominance model suggests that employees’ interpersonal relationships with the leader represent employees’ relations with the organization because employees tend to view the leader as the representative or agent of the organization (El Akremi et al., 2010; Fassina et al., 2008; Xu, Loi, & Ngo, 2016). Accordingly, ethical leaders’ behaviors will be interpreted as the implementation of the organization’s policies and norms. In this sense, the source of justice is the organization, and the positive feelings employees hold would go beyond the ethical leaders and eventually reside in the organization since the leaders are, in nature, agents that represent the organization. Therefore, the effect of ethical leadership on OCBs through interactional justice should be stronger toward OCBO than

OCBI. Due to the conflicting theoretical explanations of the influences of ethical leadership on the two forms of OCBs via interactional justice, we compare the mediating effects of interactional justice from ethical leadership on OCBO and OCBI.

Research Question: Does interactional justice have stronger explanatory power on OCBO than on OCBI regarding the mediation relationship between ethical leadership and OCBs through interactional justice, or the opposite?

Methods

Literature Search and Inclusion Criteria

We use three steps to identify the empirical studies that contain the four focal variables in the present study.¹ First, keywords including ethical leadership, interactional justice, OCBI, and OCBO were searched in the following databases: Web of Science (SSCI), EBSCO, ABI/INFORM, ERIC, PsycINFO, Google Scholar, and Scopus. OCBs have been classified in several different ways. For example, Organ (1988) classified OCBs into five dimensions (altruism, courtesy, conscientiousness, sportsmanship, and civic virtue); Moorman and Blakely (1995) divided OCBs into four types (personal industry, interpersonal helping, individual initiative, and loyal boosterism). In this study, we classified the altruism, courtesy, interpersonal helping, and individual initiative dimensions as aspects of OCBI, because the targets of these behaviors are employees, whereas conscientiousness, sportsmanship, civic virtue, personal industry, and loyal boosterism are considered as OCBO, as all these behaviors are targeted at the job, duty, or the whole organization. Second, we manually searched the reference lists of existing qualitative and quantitative review studies of variables in our model (e.g., Bedi et al., 2016). Third, to avoid publication bias, we also searched for unpublished studies based on the suggestions of Rothstein and Hopewell (2009).

Inclusion Criteria

We used the following inclusion criteria to screen the retrieved studies. First, the studies examined at least two variables in our model. Specifically, our study contains five pairs of bivariate relationships. The studies should examine at least one bivariate relationship. Second, the studies should include measurements of at least two focal variables. If the study failed to measure the variable empirically,

¹ The review protocol can be found via the following link (pages 4–7): https://www.psycharchives.org/bitstream/20.500.12034/658/1/Research%20Synthesis%202018_%20Abstract%20Collection.pdf


the study is not included. Third, the included studies must report bivariate relationships. In sum, 100 studies with 34,028 participants were included in the present meta-analysis.²
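The assignment of OCB dimensions to OCBI or OCBO described in the literature search can be written down as a small lookup table, which is one way a coder could apply the rule consistently. The sketch below is in Python; the dictionary and function names are hypothetical and are not taken from the authors’ coding materials.

```python
# Dimension-to-target mapping as stated in the literature search section.
OCB_TARGET = {
    "altruism": "OCBI",
    "courtesy": "OCBI",
    "interpersonal helping": "OCBI",
    "individual initiative": "OCBI",
    "conscientiousness": "OCBO",
    "sportsmanship": "OCBO",
    "civic virtue": "OCBO",
    "personal industry": "OCBO",
    "loyal boosterism": "OCBO",
}


def classify_dimension(name):
    """Return 'OCBI' or 'OCBO' for a known OCB dimension label."""
    key = name.strip().lower()
    if key not in OCB_TARGET:
        # Unknown labels are flagged rather than guessed.
        raise KeyError(f"No OCBI/OCBO assignment for dimension: {name!r}")
    return OCB_TARGET[key]


print(classify_dimension("Civic Virtue"))
```

Raising an error for unlisted dimensions, rather than assigning a default, keeps any label outside the nine dimensions named in the text visible for manual review.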


Coding Procedures

Following Krippendorff (2012), the two authors responsible for coding first agreed on a coding scheme and then independently coded the data according to it (Cohen’s κ = .86). For every included empirical study, they coded the sample size and the correlations among the variables.

Operationalization of Variables

All constructs are operationalized based on the definitions we adopted in the current study. Ethical leadership was operationalized as a leader’s behavior that is normatively appropriate in the interaction between leader and member (Brown et al., 2005). Interactional justice was operationalized based on the interpersonal treatment employees receive from their leaders (Colquitt et al., 2001). OCBI and OCBO are employees’ extra-role performance targeted at individuals and the organization, respectively, such as helping a coworker and conscientiousness (Hoffman, Blair, Meriac, & Woehr, 2007).

Results

Stage 1 Analysis

In the first stage, we use the fixed-effects model to test whether all studies are homogeneous. The results indicated that the homogeneity of correlations should be rejected (χ²(98) = 2,085.50, p < .001, CFI = .74, SRMR = .16, RMSEA = .29). Thus, we applied the random-effects model to pool the correlations. Table 1 shows the meta-analytic results for the bivariate relationships based on the random-effects model.

Table 1. Meta-analytic estimates of correlations among study variables

Variable                         1         2         3         4
1. Ethical leadership
2. Interactional justice
   r                          0.56
   SE                         0.06
   LL                         0.44
   UL                         0.68
   I²                         0.87
   N                         2,582
   K                            11
3. OCBO
   r                          0.23      0.32
   SE                         0.05      0.04
   LL                         0.14      0.24
   UL                         0.33      0.39
   I²                         0.61      0.70
   N                         2,025     5,772
   K                             9        17
4. OCBI
   r                          0.20      0.10      0.50
   SE                         0.02      0.06      0.03
   LL                         0.15     −0.02      0.43
   UL                         0.24      0.43      0.56
   I²                         0.25      0.82      0.95
   N                         2,627     2,726    18,296
   K                            12        10        72

Note. r = correlation based on the random-effects model; SE = standard error; LL = lower limit of the 95% confidence interval; UL = upper limit of the 95% confidence interval; I² = the I² statistic, a measure of the proportion of dispersion that can be attributed to real differences in effect sizes as opposed to within-study error; N = number of participants in each analysis; K = number of independent effect sizes included in each analysis.
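As a reading aid for Table 1, the relation between an estimate, its standard error, and the limits of a 95% confidence interval can be sketched as follows. The sketch assumes a symmetric normal-approximation interval (estimate ± 1.96 × SE); the software used for the analysis may compute its limits from unrounded values or by a different method, so other cells of the table need not reproduce exactly. The function name is hypothetical.

```python
def wald_ci(estimate, se, z=1.96):
    """Return (lower, upper) limits of a normal-approximation confidence interval."""
    return (estimate - z * se, estimate + z * se)


# r and SE for ethical leadership and interactional justice, copied from Table 1.
lower, upper = wald_ci(0.56, 0.06)
print(round(lower, 2), round(upper, 2))  # 0.44 0.68
```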

Stage 2 Analysis

As the proposed model (Model 1) is a just-identified model, it fits the data perfectly. Hypothesis 1 proposed that ethical leadership is positively related to employee OCBs, including OCBI and OCBO. Based on the TSSEM analysis, the effect of ethical leadership on OCBO (B = .08, SE = 0.08, p = .319) is not significant, whereas its effect on OCBI (B = .20, SE = 0.06, p = .001) is significant. Therefore, Hypothesis 1 was partially supported.

Hypothesis 2 proposed that ethical leadership is positively related to employee interactional justice. The effect of ethical leadership on interactional justice (B = .56, SE = 0.06, p < .001) is significant. Therefore, Hypothesis 2 was supported. Hypothesis 3a proposed that interactional justice mediates the relationship between ethical leadership and employee OCBO. Regarding this hypothesis, the results

² The supplementary materials for this article, which can be used to replicate our analysis, can be found online at http://dx.doi.org/10.17605/OSF.IO/EDGKJ


Figure 2. Results of TSSEM analysis. ***p < .001.

indicated that the mediating effect of interactional justice on the relationship between ethical leadership and OCBO is significant (B = .15, SE = 0.04, p < .001, CI [.07, .24]). In addition, Hypothesis 3b proposed that interactional justice mediates the relationship between ethical leadership and employee OCBI. The mediating effect of interactional justice on the relationship between ethical leadership and OCBI (B = −.01, SE = 0.05, p = .912, CI [−.11, .10]) is not significant. Thus, Hypothesis 3a was supported, whereas Hypothesis 3b was not supported.
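Because Model 1 is just-identified, its standardized path coefficients can be approximated directly from the pooled correlations in Table 1. The Python sketch below solves the two-predictor regression algebraically; it uses the rounded table entries and ordinary regression algebra rather than the weighted estimation in TSSEM, so the results should only be close to, not identical with, the reported estimates. The function name is hypothetical.

```python
def standardized_paths(r_xm, r_xy, r_my):
    """Solve a just-identified model X -> M -> Y that also has a direct X -> Y path."""
    a = r_xm                                   # X -> M
    denom = 1 - r_xm ** 2
    direct = (r_xy - r_my * r_xm) / denom      # X -> Y, controlling for M
    b = (r_my - r_xy * r_xm) / denom           # M -> Y, controlling for X
    return {"a": a, "b": b, "direct": direct, "indirect": a * b}


# X = ethical leadership, M = interactional justice; correlations from Table 1.
ocbo = standardized_paths(r_xm=0.56, r_xy=0.23, r_my=0.32)
ocbi = standardized_paths(r_xm=0.56, r_xy=0.20, r_my=0.10)

for label, res in (("OCBO", ocbo), ("OCBI", ocbi)):
    print(label, {k: round(v, 3) for k, v in res.items()})
```

The indirect effect is the product of the ethical leadership to interactional justice path and the interactional justice to outcome path, which is the quantity tested in Hypotheses 3a and 3b.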

TSSEM Analysis of the Relative Importance

To compare the mediating effects of interactional justice on the relationships of ethical leadership with OCBO and OCBI, we compared the models with and without the constraint on the paths. Specifically, we constrained the mediating effects of interactional justice on the two paths to be equal (Model 2). The results indicated that the constrained model showed a significantly worse model fit (Δχ²(1) = 5.97, p = .015, CFI = 0.99, RMSEA = 0.01, and SRMR = 0.06) compared with Model 1, which indicated that the mediating effects of interactional justice on the relationships of ethical leadership with OCBO and OCBI are not equal. In addition, the coefficient difference test showed that the mediating effect of interactional justice on the relationship between ethical leadership and OCBO is significantly stronger than that for OCBI (difference = −.16, SE = 0.07, p = .028, CI [−.30, −.02]). Figure 2 summarizes the results of our analysis.
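The p value attached to the model comparison follows from the chi-square distribution with one degree of freedom, because the constrained model adds a single equality constraint. A minimal Python sketch of that conversion is shown below; it uses only the standard library, and the function name is hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With df = 1, a chi-square variable is the square of a standard normal
    # variable, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


# Chi-square difference between Model 2 and Model 1 as reported in the text.
p_value = chi2_sf_df1(5.97)
print(f"{p_value:.4f}")
```

The result should land near the reported p = .015; the same function applies to the one-degree-of-freedom subgroup comparisons reported later in the Results.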

Robustness Check

To check the robustness of our analysis, we reanalyzed our data using the traditional MASEM approach. The results of the MASEM analysis are consistent with those of TSSEM. We found that the effects of ethical leadership on interactional justice (B = .56, SE = 0.02, p < .001), OCBO (B = .07, SE = 0.02, p < .001), and OCBI (B = .21, SE = 0.02, p < .001) are significant. In addition, the effect of interactional justice on OCBI (B = .02, SE = 0.02, p = .401) is nonsignificant, and its effect on OCBO (B = .28, SE = 0.02, p < .001) is significant. We found that the traditional MASEM approach underestimates the standard errors of path coefficients. The results of the replication indicated that our findings were robust.

Exploratory Analyses for Moderating Effects

Furthermore, because the results of the first-stage analysis indicated between-study heterogeneity among primary studies, we examined the moderating roles of country difference and publication year in the relationships between ethical leadership and its consequences to account for between-study variance.³ We followed the country classification of the United Nations Development Programme (UNDP), which is based on the human development index (HDI), and classified the countries into either developing or developed countries (United Nations Development Programme, 2018). In developed countries, the standards of professional ethics in the workplace are normally higher than those in developing countries. In the context of higher professional ethics standards, ethical leadership may be more salient and have stronger effects on employees than in contexts with lower professional ethics standards.

³ We applied a one-sample-removed analysis to evaluate the robustness of our analysis by detecting the impact of any individual study before conducting the moderation analysis (Borenstein et al., 2011). The results suggested that there is no substantial difference in results after removing one sample for the three bivariate relationships: ethical leadership–interactional justice (r = .54, 95% CI [.27, .73]), ethical leadership–OCBI (r = .21, 95% CI [.19, .23]), and ethical leadership–OCBO (r = .24, 95% CI [.22, .25]).


The subgroup analysis was conducted to compare the effect sizes of these two groups. For the relationship between ethical leadership and interactional justice, the effect in the developing country group (r = .40, 95% CI [−.94, .99]) is not significant and is lower than that in the developed country group (r = .63, 95% CI [.49, .74]). The difference between these two groups is statistically significant (χ²(1) = 11.00, p < .01). For the relationship between ethical leadership and OCBO, the effect in the developing country group (r = .23, 95% CI [.17, .30]) is lower than that in the developed country group (r = .25, 95% CI [.08, .42]). The difference between these two groups is statistically significant (χ²(1) = 10.00, p = .002). For the relationship between ethical leadership and OCBI, the effect in the developing country group (r = .22, 95% CI [.18, .25]) is higher than that in the developed country group (r = .19, CI [.10, .28]). The difference between these two groups is statistically significant (χ²(1) = 12.00, p = .001). In sum, these results suggested that ethical leadership has stronger effects on its consequences in developed countries than in developing countries for two of these three bivariate relationships. Regarding publication year, we applied meta-regression to analyze the data because publication year is a continuous variable. The results indicated that publication year did not significantly explain the between-study variance for the relationships between ethical leadership and interactional justice (B = .01, SE = 0.04, p = .765), OCBI (B = .00, SE = 0.01, p = .636), and OCBO (B = .01, SE = 0.02, p = .497).

Conclusions and Implications

The explanatory mechanisms underlying the relationship between ethical leadership and subordinates’ behaviors have drawn increasing attention in recent decades (Mayer, Aquino, Greenbaum, & Kuenzi, 2012). Based on previous studies, our meta-analysis reviewed and explained how ethical leadership influences employee OCBs through its impact on employees’ interactional justice. We also compared the explanatory powers of interactional justice in the relationships between ethical leadership and two forms of OCBs – OCBO and OCBI. Our results indicate that ethical leadership is positively related to OCBI. Furthermore, we found that employees’ interactional justice fully mediates the relationship between ethical leadership and OCBO, whereas no such mediating role was found in the ethical leadership–OCBI linkage.

Theoretical Implications

Our study advances the growing literature on ethical leadership in the following ways. First, we empirically reviewed prior studies to examine the ethical leadership–employee


OCB relationships as well as the underlying mechanism from a justice perspective. This meta-analysis provides more convincing evidence for the proposed relations since any single study has its own cultural roots and uniqueness (Tsui, 2007, p. 25). Specifically, we adopt the TSSEM technique to test our hypothesized model and research question, which is more rigorous than MASEM (Viswesvaran & Ones, 1995). TSSEM particularly suits our research question and hypotheses because it enables us to examine multiple outcomes simultaneously (Clarke, 2005; Viswesvaran & Ones, 1995) and to compare the explanatory powers of interactional justice on OCBI and OCBO (Cheung & Chan, 2005; Mackey et al., 2017). Second, our study is the first that systematically summarizes and theorizes the interactional justice perspective to explain the causal process from ethical leadership to multiple employee OCBs. While existing research has explained the effects of ethical leadership from various perspectives, such as cognitive and affective trust (Lu, 2014; Newman et al., 2014) and organizational politics (Kacmar et al., 2013), there is a critical omission in exploring its mediational pathways through the perspective of interactional justice. This oversight is surprising since interactional justice has been attracting great interest for decades, both in academia and in practice (Brown et al., 2005; Colquitt & Rodell, 2011). Our findings provide support for interactional justice’s importance in the social exchange process from ethical leadership to employees’ OCBs, especially those toward the organization. The mechanisms underlying the ethical leadership–OCB relationship have been investigated from various perspectives, such as group engagement (Yang et al., 2016). These perspectives supplement each other and provide an increasingly comprehensive picture regarding how leaders’ behaviors influence subordinates’ OCBs. Although using a different logic, Yang et al.
(2016) argued that employees’ perception regarding the quality of interpersonal treatment facilitates the positive impact of ethical leadership on OCBs. This study, thus, enhanced Yang et al. (2016)’s idea and examined the mediating role of interactional justice from a social exchange perspective. Third, an interesting extension from both theoretical and practical standpoints is that whether interactional justice is always effective in explaining how ethical leadership renders employee behavioral reactions regarding the changes in OCBO and OCBI. While accumulating research supported the important role of ethical leadership on both OCBI and OCBO, they did not explicitly compare these two relationships in a more detailed manner (e.g., Newman et al., 2014; Shin, 2012). We examined the influences of ethical leadership on OCBI and OCBO separately,providing a fine-grained understanding of ethical leadership– OCB linkage. Comparing the predicting roles of interactional justices on the impacts of ethical leadership to OCBO Zeitschrift für Psychologie (2019), 227(1), 18–30



and OCBI enabled us to better understand the unique emergence processes underlying different forms of OCBs. The results of our meta-analysis showed that interactional justice fully mediates the effects of ethical leadership on OCBO, whereas no such mediating effect was found in the relationship between ethical leadership and OCBI. This is surprising and interesting because the conventional belief is that the target of employees’ reciprocating behaviors tends to be aligned with the source of the benefits they receive (El Akremi et al., 2010). In our case, employees’ interactional justice (i.e., the benefit) is directly generated by their leaders’ behaviors (i.e., the source of the benefit), and as a result, employees are expected to pay back this benefit through OCBs toward their leaders (i.e., the target). Based on the resource-target alignment from social exchange theory, two explanations – target-oriented and source-oriented – are provided below.

One possible explanation (i.e., the target-oriented explanation) for this unexpected result could be that, in our definition, OCBI refers to OCBs that benefit other organizational members (e.g., co-workers) rather than the leaders themselves, which could obscure leaders as the targets of OCBI. There are conflicting views regarding which type of OCB – OCBI or OCBO – is more beneficial to leaders. Some scholars posit that OCBO benefits the leader more directly than OCBI (e.g., Matta, Scott, Koopman, & Conlon, 2015; Williams & Anderson, 1991).

The other explanation (i.e., the source-oriented explanation) for the distinct roles of interactional justice in OCBO and OCBI could lie in the agent-dominance model of justice (Fassina et al., 2008). In contrast to the system-agent perspective, which argues that employees tend to attribute interactional justice to their leader and other forms of organizational justice (i.e., distributive and procedural justice) to the organization (Bies & Moag, 1986; Tyler & Bies, 1990), the agent-dominance model of justice emphasizes the representativeness of leaders, who act on behalf of the organization. Following this logic, receiving interactional justice from leaders is more likely to induce employees’ OCBO than their OCBI, since leaders are the key social agents of the organization. Our results hence support the agent-dominance model of interactional justice.
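The distinction drawn above between a direct effect and an indirect effect through a mediator can be illustrated with a small path-analytic sketch. This is not the TSSEM procedure used in the study, which fits a structural equation model to a pooled correlation matrix; it only shows how, for one predictor, one mediator, and one outcome, standardized paths follow from three correlations. The function name `mediation_paths` and the correlation values are hypothetical and are not taken from the study's results.

```python
def mediation_paths(r_xm, r_xy, r_my):
    """Decompose a standardized total effect into direct and indirect parts.

    x = predictor (e.g., ethical leadership), m = mediator (e.g.,
    interactional justice), y = outcome (e.g., OCBO). Inputs are
    correlations; outputs are standardized path coefficients.
    """
    if not -1.0 < r_xm < 1.0:
        raise ValueError("r_xm must lie strictly between -1 and 1")
    denom = 1.0 - r_xm ** 2
    a = r_xm                               # path x -> m
    b = (r_my - r_xy * r_xm) / denom       # path m -> y, controlling for x
    direct = (r_xy - r_my * r_xm) / denom  # path x -> y, controlling for m
    indirect = a * b                       # effect transmitted through m
    return {"a": a, "b": b, "direct": direct,
            "indirect": indirect, "total": direct + indirect}


if __name__ == "__main__":
    # Hypothetical correlations, chosen only for illustration.
    paths = mediation_paths(r_xm=0.5, r_xy=0.3, r_my=0.5)
    for name, value in paths.items():
        print(f"{name}: {value:.3f}")
```

In this sketch, full mediation corresponds to a direct path near zero alongside a non-trivial indirect path, whereas the absence of mediation corresponds to an indirect path near zero. The total always equals the predictor–outcome correlation, which gives a quick consistency check.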

Managerial Implications

Our findings also have important managerial implications for practitioners. First, our findings confirm the importance of ethical leadership for organizations in a more generalized manner. Ethical leadership increases subordinates’ OCBO and OCBI, both of which are beneficial to the organization. Hence, organizations should motivate their leaders to exhibit ethical behaviors. To make the most of ethical leadership, such emphasis should not be limited to one aspect but should run throughout the whole human resource management system, from recruitment and training to appraisal, compensation, and so forth (Newman et al., 2014).

The second implication of our findings highlights the role of employees’ perception of interactional justice in the process from ethical leadership to employees’ extra-role behaviors. For ethical leadership to eventually induce employees’ OCBs, leaders are advised to pay special attention to the formation and development of their subordinates’ perceptions of interactional justice (Colquitt et al., 2001; Kacmar et al., 2013). Moreover, our findings indicate that interactional justice mediates only the effects of ethical leadership on employees’ OCBs toward the organization and does not affect employees’ OCBs toward other organizational members. Hence, depending on the types of employee OCBs needed, organizations should adopt different strategies for influencing employees’ perceptions and behaviors (Newman et al., 2014; Zhang, Zhang, Lei, Yue, & Zhu, 2016).

Limitations and Suggestions for Further Research

Despite the above implications, this research has several limitations that deserve further discussion to benefit future research. First, our study focuses on only one type of organizational justice as the mediator between ethical leadership and OCBs. This is because, as discussed in the previous sections, interactional justice is more closely aligned with leadership style than distributive and procedural justice are (Brown et al., 2005), in the sense that it emphasizes employees’ perceptions of how they are treated by organizational authorities. However, existing empirical studies have also found that ethical leadership can affect subordinates’ distributive and procedural justice perceptions (e.g., Xu et al., 2016). Thus, future studies could also explore the roles of distributive justice and procedural justice in the ethical leadership–OCB linkage, for example, by comparing the effects of ethical leadership on different forms of organizational justice as well as the underlying mechanisms. Further studies examining the role of ethical leadership in different forms of organizational justice will provide a more fine-grained understanding of how ethical leadership influences individual-level and organizational-level outcomes (Shin, Sung, Choi, & Kim, 2015).

Second, although we compared the causal processes that lead to OCBO and OCBI, other employee workplace behaviors and outcomes, such as task performance (e.g., Aryee, Budhwar, & Chen, 2002; Sharif & Scandura, 2014), deviance (e.g., El Akremi et al., 2010), and turnover intention (e.g., Aryee & Chay, 2001; Palanski, Avey, & Jiraporn, 2014), could also be examined under a more extended leadership–justice framework. For example, we speculate



that the three dimensions of organizational justice represent different exchange relations between employees and employers. These different exchange relations will eventually lead to different outcomes. Specifically, we posit that interactional justice is more relevant to affective social exchange, while distributive justice and procedural justice are more relevant to economic social exchange. Employees work for an employer with the expectation that they will be paid fairly for their work. Any violation of fairness in resource allocation will damage the employment relationship, which may then jeopardize employees’ in-role behaviors. In contrast, interactional justice reflects an affective human need that cannot be easily formalized in employment contracts. Because of the affective nature of interactional justice, we argue that it tends to affect employees’ extra-role behaviors, such as OCBs. The distinction between these two exchange relations is similar to that between transactional and socio-emotional relationships, respectively, in the leader-member exchange (LMX) literature (Liden, Sparrowe, & Wayne, 1997). Further research could compare the effects of different forms of organizational justice on employees’ different workplace behaviors. Such research is important because employees’ different behaviors (in-role or extra-role behaviors) will lead to different organizational outcomes, such as organizational efficiency, innovation, and competitive advantage (Becker & Kernan, 2003; Vey & Campbell, 2004; Zhang et al., 2017).

Third, one outcome of our study is that we demonstrate the moderating role of the OCB target in the interactional justice–OCB relationship (Colquitt et al., 2013), such that the positive relationship between interactional justice and OCBs is stronger when the OCB is organization-targeted. As such, the proposed mechanisms in this study would benefit from exploring other possible moderating conditions. Such conditions may affect the ethical leadership–justice relationship, the justice–OCB relationship, or both. For instance, on the one hand, some scholars have argued that culture at the organizational level (e.g., Erkutlu, 2011) or individual level (e.g., Özbek et al., 2016) may moderate the relationship between interactional justice and OCBs. Other possible moderators include employees’ perceived OCB roles (e.g., Tepper & Taylor, 2003), organizational ownership (e.g., Wong et al., 2006), and so forth. On the other hand, Wang et al. (2017) found that institutional culture (e.g., collectivistic orientation) moderates the impact of ethical leadership on interactional justice.

Conclusion

In conclusion, drawing upon a justice theoretical perspective, our meta-analytic review is the first to systematically examine the effects of ethical leadership on employees’ different forms of OCBs by comparing the mediating roles of interactional justice. Based on 100 empirical studies, we use TSSEM and show that ethical leadership is beneficial for organizations by directly increasing employees’ OCBI, and indirectly increasing employees’ OCBO through improving interactional justice. By highlighting the interactional justice-based mechanism underlying the relationship between ethical leadership and employees’ OCBs, our findings suggest that interactional justice is particularly important for employees’ OCBs toward the organization.

References

Alotaibi, A. G. (2001). Antecedents of organizational citizenship behavior: A study of public personnel in Kuwait. Public Personnel Management, 30, 363–376. https://doi.org/10.1177/009102600103000306
Arslantas, C., & Dursun, M. (2008). The impact of ethical leadership behavior on trust in manager and psychological empowerment: The mediating role of interactional justice. Anadolu University Journal of Social Sciences, 8, 111–128.
Aryee, S., Budhwar, P. S., & Chen, Z. X. (2002). Trust as a mediator of the relationship between organizational justice and work outcomes: Test of a social exchange model. Journal of Organizational Behavior, 23, 267–285. https://doi.org/10.1002/job.138
Aryee, S., & Chay, Y. W. (2001). Workplace justice, citizenship behavior, and turnover intentions in a union context: Examining the mediating role of perceived union support and union instrumentality. Journal of Applied Psychology, 86, 154–160. https://doi.org/10.1037/0021-9010.86.1.154
Avey, J. B., Palanski, M. E., & Walumbwa, F. O. (2011). When leadership goes unnoticed: The moderating role of follower self-esteem on the relationship between ethical leadership and follower behavior. Journal of Business Ethics, 98, 573–582. https://doi.org/10.1007/s10551-010-0610-2
Becker, T. E., & Kernan, M. C. (2003). Matching commitment to supervisors and organizations to in-role and extra-role performance. Human Performance, 16, 327–348. https://doi.org/10.1207/S15327043HUP1604_1
Bedi, A., Alpaslan, C. M., & Green, S. (2016). A meta-analytic review of ethical leadership outcomes and moderators. Journal of Business Ethics, 139, 517–536. https://doi.org/10.1007/s10551-015-2625-1
Bies, R. J., & Moag, J. F. (1986). Interactional justice: Communication criteria of fairness. In R. J. Lewicki, B. H. Sheppard, & M. H. Bazerman (Eds.), Research on negotiations in organizations (Vol. 1, pp. 43–55). Greenwich, CT: JAI Press.
Blau, P. M. (1964). Exchange and power in social life. New Brunswick, NJ: Transaction Publishers.
Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2011). Introduction to meta-analysis. Chichester, UK: Wiley.
Brown, M. E., & Treviño, L. K. (2006). Ethical leadership: A review and future directions. Leadership Quarterly, 17, 595–616. https://doi.org/10.1016/j.leaqua.2006.10.004
Brown, M. E., Treviño, L. K., & Harrison, D. A. (2005). Ethical leadership: A social learning perspective for construct development and testing. Organizational Behavior and Human Decision Processes, 97, 117–134. https://doi.org/10.1016/j.obhdp.2005.03.002
Carpenter, N. C., Berry, C. M., & Houston, L. (2014). A meta-analytic comparison of self-reported and other-reported organizational citizenship behavior. Journal of Organizational Behavior, 35, 547–574. https://doi.org/10.1002/job.1909



Cheung, M. W.-L. (2015a). Meta-analysis: A structural equation modeling approach. Hoboken, NJ: Wiley. https://doi.org/10.1002/9781118957813
Cheung, M. W.-L. (2015b). metaSEM: An R package for meta-analysis using structural equation modeling. Frontiers in Psychology, 5, 1–7. https://doi.org/10.3389/fpsyg.2014.01521
Cheung, M. W.-L., & Chan, W. (2005). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10, 40–64. https://doi.org/10.1037/1082-989X.10.1.40
Clarke, K. A. (2005). The phantom menace: Omitted variable bias in econometric research. Conflict Management and Peace Science, 22, 341–352. https://doi.org/10.1080/07388940500339183
Cohen-Charash, Y., & Spector, P. E. (2001). The role of justice in organizations: A meta-analysis. Organizational Behavior and Human Decision Processes, 86, 278–321. https://doi.org/10.1006/obhd.2001.2958
Colquitt, J. A., Conlon, D. E., Wesson, M. J., Porter, C. O., & Ng, K. Y. (2001). Justice at the millennium: A meta-analytic review of 25 years of organizational justice research. Journal of Applied Psychology, 86, 425–445. https://doi.org/10.1037/0021-9010.86.3.425
Colquitt, J. A., & Greenberg, J. (2003). Organizational justice: A fair assessment of the state of the literature. In J. Greenberg (Ed.), Organizational behavior: The state of the science (pp. 165–210). Mahwah, NJ: Erlbaum.
Colquitt, J. A., & Rodell, J. B. (2011). Justice, trust, and trustworthiness: A longitudinal analysis integrating three theoretical perspectives. Academy of Management Journal, 54, 1183–1206. https://doi.org/10.5465/amj.2007.0572
Colquitt, J. A., Scott, B. A., Rodell, J. B., Long, D. M., Zapata, C. P., Conlon, D. E., & Wesson, M. J. (2013). Justice at the millennium, a decade later: A meta-analytic test of social exchange and affect-based perspectives. Journal of Applied Psychology, 98, 199–236. https://doi.org/10.1037/a0031757
Cropanzano, R., & Mitchell, M. S. (2005). Social exchange theory: An interdisciplinary review. Journal of Management, 31, 874–900. https://doi.org/10.1177/0149206305279602
Cropanzano, R., Prehar, C. A., & Chen, P. Y. (2002). Using social exchange theory to distinguish procedural from interactional justice. Group & Organization Management, 27, 324–351. https://doi.org/10.1177/1059601102027003002
Ehrhart, M. G. (2004). Leadership and procedural justice climate as antecedents of unit-level organizational citizenship behavior. Personnel Psychology, 57, 61–94. https://doi.org/10.1111/j.1744-6570.2004.tb02484.x
El Akremi, A., Vandenberghe, C., & Camerman, J. (2010). The role of justice and social exchange relationships in workplace deviance: Test of a mediated model. Human Relations, 63, 1687–1717. https://doi.org/10.1177/0018726710364163
Erkutlu, H. (2011). The moderating role of organizational culture in the relationship between organizational justice and organizational citizenship behaviors. Leadership & Organization Development Journal, 32, 532–554. https://doi.org/10.1108/01437731111161058
Farh, J.-L., Earley, P. C., & Lin, S.-C. (1997). Impetus for action: A cultural analysis of justice and organizational citizenship behavior in Chinese society. Administrative Science Quarterly, 42, 421–444. https://doi.org/10.2307/2393733
Fassina, N. E., Jones, D. A., & Uggerslev, K. L. (2008). Meta-analytic tests of relationships between organizational justice and citizenship behavior: Testing agent-system and shared-variance models. Journal of Organizational Behavior, 29, 805–828. https://doi.org/10.1002/job.494
Ferris, D. L., Lian, H., Brown, D. J., & Morrison, R. (2015). Ostracism, self-esteem, and job performance: When do we self-verify and when do we self-enhance? Academy of Management Journal, 58, 279–297. https://doi.org/10.5465/amj.2011.0347
Folger, R. (2001). Fairness as deonance. In S. Gilliland, D. Steiner, & D. Skarlicki (Eds.), Theoretical and cultural perspectives on organizational justice (pp. 3–33). Greenwich, CT: Information Age.
Folger, R., & Cropanzano, R. (2001). Fairness theory: Justice as accountability. In J. Greenberg & R. Cropanzano (Eds.), Advances in organizational justice (pp. 1–55). Stanford, CA: Stanford University Press.
Greenberg, J., & Cropanzano, R. (1993). The social side of fairness: Interpersonal and informational classes of organizational justice. In R. Cropanzano (Ed.), Justice in the workplace: Approaching fairness in human resource management (pp. 79–103). Hillsdale, NJ: Erlbaum.
Hoffman, B. J., Blair, C. A., Meriac, J. P., & Woehr, D. J. (2007). Expanding the criterion domain? A quantitative review of the OCB literature. Journal of Applied Psychology, 92, 555–566. https://doi.org/10.1037/0021-9010.92.2.555
Kacmar, K. M., Andrews, M. C., Harris, K. J., & Tepper, B. J. (2013). Ethical leadership and subordinate outcomes: The mediating role of organizational politics and the moderating role of political skill. Journal of Business Ethics, 115, 33–44. https://doi.org/10.1007/s10551-012-1373-8
Kacmar, K. M., Bachrach, D. G., Harris, K. J., & Zivnuska, S. (2011). Fostering good citizenship through ethical leadership: Exploring the moderating role of gender and organizational politics. Journal of Applied Psychology, 96, 633–642. https://doi.org/10.1037/a0021872
Kalshoven, K., Den Hartog, D. N., & de Hoogh, A. H. (2013). Ethical leadership and followers’ helping and initiative: The role of demonstrated responsibility and job autonomy. European Journal of Work and Organizational Psychology, 22, 165–181. https://doi.org/10.1080/1359432X.2011.640773
Karriker, J. H., & Williams, M. L. (2009). Organizational justice and organizational citizenship behavior: A mediated multifoci model. Journal of Management, 35, 112–135. https://doi.org/10.1177/0149206307309265
Kim, W. C., & Mauborgne, R. A. (1996). Procedural justice and managers’ in-role and extra-role behavior: The case of the multinational. Management Science, 42, 499–515. https://doi.org/10.1287/mnsc.42.4.499
Krippendorff, K. (2012). Content analysis: An introduction to its methodology. London, UK: Sage.
Lee, K., & Allen, N. J. (2002). Organizational citizenship behavior and workplace deviance: The role of affect and cognitions. Journal of Applied Psychology, 87, 131–142. https://doi.org/10.1037/0021-9010.87.1.131
Lehmann-Willenbrock, N., Grohmann, A., & Kauffeld, S. (2012). Promoting multifoci citizenship behavior: Time-lagged effects of procedural justice, trust, and commitment. Applied Psychology, 62, 454–485. https://doi.org/10.1111/j.1464-0597.2012.00488.x
Liden, R. C., Sparrowe, R. T., & Wayne, S. J. (1997). Leader-member exchange theory: The past and potential for the future. In G. R. Ferris (Ed.), Research in personnel and human resources management (Vol. 15, pp. 47–120). Greenwich, CT: JAI Press.
Lind, E., Greenberg, J., & Cropanzano, R. (2001). Advances in organizational justice. Stanford, CA: Stanford University Press.
Loi, R., Lam, L. W., & Chan, K. W. (2012). Coping with job insecurity: The role of procedural justice, ethical leadership and power distance orientation. Journal of Business Ethics, 108, 361–372. https://doi.org/10.1007/s10551-011-1095-3


Loi, R., Yang, J., & Diefendorff, J. M. (2009). Four-factor justice and daily job satisfaction: A multilevel investigation. Journal of Applied Psychology, 94, 770–781. https://doi.org/10.1037/a0015714
Lu, X. (2014). Ethical leadership and organizational citizenship behavior: The mediating roles of cognitive and affective trust. Social Behavior and Personality, 42, 379–389. https://doi.org/10.2224/sbp.2014.42.3.379
Mackey, J. D., Frieder, R. E., Brees, J. R., & Martinko, M. J. (2017). Abusive supervision: A meta-analysis and empirical review. Journal of Management, 43, 1940–1965. https://doi.org/10.1177/0149206315573997
Matta, F. K., Scott, B. A., Koopman, J., & Conlon, D. E. (2015). Does seeing “eye to eye” affect work engagement and organizational citizenship behavior? A role theory perspective on LMX agreement. Academy of Management Journal, 58, 1686–1708. https://doi.org/10.5465/amj.2014.0106
Mayer, D. M., Aquino, K., Greenbaum, R. L., & Kuenzi, M. (2012). Who displays ethical leadership, and why does it matter? An examination of antecedents and consequences of ethical leadership. Academy of Management Journal, 55, 151–171. https://doi.org/10.5465/amj.2008.0276
Mayer, D. M., Kuenzi, M., Greenbaum, R., Bardes, M., & Salvador, R. B. (2009). How low does ethical leadership flow? Test of a trickle-down model. Organizational Behavior and Human Decision Processes, 108, 1–13. https://doi.org/10.1016/j.obhdp.2008.04.002
Moorman, R. H. (1991). Relationship between organizational justice and organizational citizenship behaviors: Do fairness perceptions influence employee citizenship? Journal of Applied Psychology, 76, 845–855. https://doi.org/10.1037/0021-9010.76.6.845
Moorman, R. H., & Blakely, G. L. (1995). Individualism-collectivism as an individual difference predictor of organizational citizenship behavior. Journal of Organizational Behavior, 16, 127–142. https://doi.org/10.1002/job.4030160204
Moorman, R. H., & Byrne, Z. S. (2005). What is the role of justice in promoting organizational citizenship behavior. In Handbook of organizational justice: Fundamental questions about fairness in the workplace (pp. 355–382). Mahwah, NJ: Erlbaum.
Neubert, M. J., Carlson, D. S., Kacmar, K. M., Roberts, J. A., & Chonko, L. B. (2009). The virtuous influence of ethical leadership behavior: Evidence from the field. Journal of Business Ethics, 90, 157–170. https://doi.org/10.1007/s10551-009-0037-9
Newman, A., Kiazad, K., Miao, Q., & Cooper, B. (2014). Examining the cognitive and affective trust-based mechanisms underlying the relationship between ethical leadership and organisational citizenship: A case of the head leading the heart? Journal of Business Ethics, 123, 113–123. https://doi.org/10.1007/s10551-013-1803-2
Ng, T. W. H., & Feldman, D. C. (2015). Ethical leadership: Meta-analytic evidence of criterion-related and incremental validity. Journal of Applied Psychology, 100, 948–965. https://doi.org/10.1037/a0038246
Organ, D. W. (1988). Organizational citizenship behavior: The good soldier syndrome. Lexington, MA: Lexington Books.
Organ, D. W., Podsakoff, P. M., & MacKenzie, S. B. (2005). Organizational citizenship behavior: Its nature, antecedents, and consequences. Thousand Oaks, CA: Sage.
Organ, D. W., & Ryan, K. (1995). A meta-analytic review of attitudinal and dispositional predictors of organizational citizenship behavior. Personnel Psychology, 48, 775–802. https://doi.org/10.1111/j.1744-6570.1995.tb01781.x
Özbek, M. F., Yoldash, M. A., & Tang, T. L.-P. (2016). Theory of justice, OCB, and individualism: Kyrgyz citizens. Journal of Business Ethics, 137, 365–382. https://doi.org/10.1007/s10551-015-2553-0
Palanski, M., Avey, J. B., & Jiraporn, N. (2014). The effects of ethical leadership and abusive supervision on job search behaviors in the turnover process. Journal of Business Ethics, 121, 135–146. https://doi.org/10.1007/s10551-013-1690-6
Podsakoff, N. P., Blume, B. D., Whiting, S. W., & Podsakoff, P. M. (2009). Individual- and organizational-level consequences of organizational citizenship behaviors: A meta-analysis. Journal of Applied Psychology, 94, 122–141. https://doi.org/10.1037/a0013079
Resick, C. J., Hargis, M. B., Shao, P., & Dust, S. B. (2013). Ethical leadership, moral equity judgments, and discretionary workplace behavior. Human Relations, 66, 951–972. https://doi.org/10.1177/0018726713481633
Roch, S. G., & Shanock, L. R. (2006). Organizational justice in an exchange framework: Clarifying organizational justice distinctions. Journal of Management, 32, 299–322. https://doi.org/10.1177/0149206305280115
Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature reviews. Annual Review of Psychology, 52, 59–82. https://doi.org/10.1146/annurev.psych.52.1.59
Rothstein, H. R., & Hopewell, S. (2009). Grey literature. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (Vol. 2, pp. 103–125). New York, NY: Russell Sage Foundation.
Ruiz-Palomino, P., Ruiz-Amaya, C., & Knörr, H. (2011). Employee organizational citizenship behaviour: The direct and indirect impact of ethical leadership. Canadian Journal of Administrative Sciences/Revue Canadienne des Sciences de l’Administration, 28, 244–258. https://doi.org/10.1002/cjas.221
Rupp, D. E., & Cropanzano, R. (2002). The mediating effects of social exchange relationships in predicting workplace outcomes from multifoci organizational justice. Organizational Behavior and Human Decision Processes, 89, 925–946. https://doi.org/10.1016/S0749-5978(02)00036-5
Sharif, M. M., & Scandura, T. A. (2014). Do perceptions of ethical conduct matter during organizational change? Ethical leadership and employee involvement. Journal of Business Ethics, 124, 185–196. https://doi.org/10.1007/s10551-013-1869-x
Shin, Y. (2012). CEO ethical leadership, ethical climate, climate strength, and collective organizational citizenship behavior. Journal of Business Ethics, 108, 299–312. https://doi.org/10.1007/s10551-011-1091-7
Shin, Y., Sung, S. Y., Choi, J. N., & Kim, M. S. (2015). Top management ethical leadership and firm performance: Mediating role of ethical and procedural justice climate. Journal of Business Ethics, 129, 43–57. https://doi.org/10.1007/s10551-014-2144-5
Skarlicki, D. P., & Latham, G. P. (1996). Increasing citizenship behavior within a labor union: A test of organizational justice theory. Journal of Applied Psychology, 81, 161–169. https://doi.org/10.1037/0021-9010.81.2.161
Tekleab, A. G., Takeuchi, R., & Taylor, M. S. (2005). Extending the chain of relationships among organizational justice, social exchange, and employee reactions: The role of contract violations. Academy of Management Journal, 48, 146–157. https://doi.org/10.5465/amj.2005.15993162
Tepper, B. J., & Taylor, E. C. (2003). Relationships among supervisors’ and subordinates’ procedural justice perceptions and organizational citizenship behaviors. Academy of Management Journal, 46, 97–105. https://doi.org/10.5465/30040679
Tsui, A. S. (2007). From homogenization to pluralism: International management research in the academy and beyond. Academy of Management Journal, 50, 1353–1364. https://doi.org/10.5465/amj.2007.28166121
Tyler, T. R., & Bies, R. J. (1990). Beyond formal procedures: The interpersonal context of procedural justice. Applied Social Psychology and Organizational Settings, 77–98. https://doi.org/10.4324/9781315728377-4
United Nations Development Programme. (2018). Human development indices and indicators, 2018 statistical update. New York, NY: United Nations.
Van Dyne, L., Graham, J. W., & Dienesch, R. M. (1994). Organizational citizenship behavior: Construct redefinition, measurement, and validation. Academy of Management Journal, 37, 765–802. https://doi.org/10.5465/256600
Vey, M. A., & Campbell, J. P. (2004). In-role or extra-role organizational citizenship behavior: Which are we measuring? Human Performance, 17, 119–135. https://doi.org/10.1207/S15327043HUP1701_6
Viswesvaran, C., & Ones, D. S. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48, 865–885. https://doi.org/10.1111/j.1744-6570.1995.tb01784.x
Walumbwa, F. O., Hartnell, C. A., & Misati, E. (2017). Does ethical leadership enhance group learning behavior? Examining the mediating influence of group ethical conduct, justice climate, and peer justice. Journal of Business Research, 72, 14–23. https://doi.org/10.1016/j.jbusres.2016.11.013
Wang, H., Lu, G., & Liu, Y. (2017). Ethical leadership and loyalty to supervisor in China: The roles of interactional justice and collectivistic orientation. Journal of Business Ethics, 146, 529–543. https://doi.org/10.1007/s10551-015-2916-6
Williams, L. J., & Anderson, S. E. (1991). Job satisfaction and organizational commitment as predictors of organizational citizenship and in-role behaviors. Journal of Management, 17, 601–617. https://doi.org/10.1177/014920639101700305
Wong, Y.-T., Ngo, H.-Y., & Wong, C.-S. (2006). Perceived organizational justice, trust, and OCB: A study of Chinese workers in joint ventures and state-owned enterprises. Journal of World Business, 41, 344–355. https://doi.org/10.1016/j.jwb.2006.08.003
Xu, A. J., Loi, R., & Ngo, H.-Y. (2016). Ethical leadership behavior and employee justice perceptions: The mediating role of trust in organization. Journal of Business Ethics, 134, 493–504. https://doi.org/10.1007/s10551-014-2457-4
Yang, C., Ding, C. G., & Lo, K. W. (2016). Ethical leadership and multidimensional organizational citizenship behaviors: The mediating effects of self-efficacy, respect, and leader–member exchange. Group & Organization Management, 41, 343–374. https://doi.org/10.1177/1059601115594973


Zeinabadi, H. (2010). Job satisfaction and organizational commitment as antecedents of organizational citizenship behavior (OCB) of teachers. Procedia – Social and Behavioral Sciences, 5, 998–1003. https://doi.org/10.1016/j.sbspro.2010.07.225
Zhang, L., Zhang, Y., Jiang, H., Yang, M., Huang, Y., & Li, S. (2017). Customer identification in the healthcare industry. International Journal of Market Research, 59, 803–822. https://doi.org/10.2501/ijmr-2017-054
Zhang, Y.-J. (2012). The influence of ethical leadership on employees’ CWB: From social learning and social exchange perspective. Journal of Business Economics, 12, 23–32. https://doi.org/10.14134/j.cnki.cn33-1336/f.2012.12.002
Zhang, Y., Zhang, L., Lei, H., Yue, Y., & Zhu, J. (2016). Lagged effect of daily surface acting on subsequent day’s fatigue. Service Industries Journal, 36, 809–826. https://doi.org/10.1080/02642069.2016.1272593

History
Received February 28, 2018
Revision received October 8, 2018
Accepted October 14, 2018
Published online March 29, 2019

Funding
This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 71602163, 71702043, and 71802077), and Young Scientists Fund from the Ministry of Education of Humanities and Social Sciences Project in China (Grant No. 16YJC630171).

Long Zhang
Business School
Hunan University
Yuelu, Changsha, 410082
China
lyon.long.zhang@gmail.com

Jiali Duan
School of Management
University of New South Wales Business School
Kingsford, NSW 2052 Sydney
Australia
jiali.duan@unsw.edu.au

© 2019 Hogrefe Publishing


Review Article

Impaired Interparental Relationships in Families of Children With Attention-Deficit/Hyperactivity Disorder (ADHD)
A Meta-Analysis
Lena Weyers, Martina Zemp, and Georg W. Alpers
Department of Psychology, University of Mannheim, Germany

Abstract: Research on attention-deficit/hyperactivity disorder (ADHD) in children and adolescents has traditionally focused on the genetic and neurobiological aspects of the disorder, but the role of family relationships has been much less systematically examined. There is growing evidence that the quality of interparental relationships and a child’s ADHD symptoms are reciprocally related. Because the literature appears to be inconsistent, this meta-analysis aims to summarize previous research and assess whether there are robust differences in the quality of interparental relationships between parents of children with ADHD and parents of healthy children. This meta-analysis of 15 studies with 43 effect sizes revealed a small but significant difference (d = .24), which indicates that parents of a child with ADHD report poorer relationship quality than parents of healthy children. This effect was moderated by the child’s age and did not depend on whether the child had a comorbid oppositional defiant disorder (ODD) or conduct disorder (CD). The causality of this relationship has yet to be determined.

Keywords: child ADHD, parents, couples, interparental conflict, relationship satisfaction

Zeitschrift für Psychologie (2019), 227(1), 31–41 https://doi.org/10.1027/2151-2604/a000354
L. Weyers et al., Interparental Relationships in Families of Children With ADHD

Attention-deficit/hyperactivity disorder (ADHD) is one of the most common mental disorders in childhood; it affects about 3–5% of children and adolescents worldwide, according to the most recent prevalence estimates (Polanczyk, de Lima, Horta, Biederman, & Rohde, 2007; Polanczyk, Salum, Sugaya, Caye, & Rohde, 2015). ADHD is a neurodevelopmental disorder characterized by a persistent pattern of inattention, hyperactivity, and impulsivity that substantially interferes with the child’s functioning or development (DSM-5; American Psychiatric Association, 2013). ADHD is usually accompanied by difficulties with executive functioning and maladaptive behavior, and it leads to various degrees of impairment in different areas, that is, in social settings, at school, and at home (Faraone et al., 2015). The disorder is often associated with long-term impairments: In about half of the affected children (30–70%), the disorder is chronic and persists throughout adolescence into adulthood, but the manifestation of the symptoms varies over the lifespan (Banaschewski, Poustka, & Holtmann, 2011; Caye et al., 2016; Schmidt & Petermann, 2009). Hyperactivity and impulsivity commonly decrease across childhood, whereas the attention deficit, as well as the symptoms of impaired mood regulation (Turgay et al., 2012), typically persist more constantly (Konrad & Rösler, 2009; O’Neill, Rajendran, Mahbubani, & Halperin, 2017).

As ADHD is one of the mental disorders with the strongest heritability (Grimm, Kittel-Schneider, & Reif, 2018), research has made headway toward a better understanding of the nature of the disorder in recent years. Its heritability is estimated to be above .70 (Faraone et al., 2005), and genetic as well as neurobiological factors have been well established in the pathogenesis of this disorder (Banaschewski, Roessner, Uebel, & Rothenberger, 2004). By the same token, scientists have gathered persuasive empirical support that family factors play a pivotal role in the etiology and maintenance of childhood ADHD (Johnston & Mash, 2001), but some of the findings appear to be less robust and less clear. The main focus of this line of research has been on parenting styles and the quality of the parent–child relationship (Theule, Wiener, Tannock, & Jenkins, 2013). One of the best replicated findings is that parents of a child with ADHD report more parenting problems and parenting stress, less parenting self-efficacy, and more parent–child conflicts than parents of healthy children (Fischer, 1990; Park, Hudec, & Johnston, 2017). Given this strain, it is quite conceivable that the interparental relationship (i.e., the couple relationship) in families with a child with ADHD is impaired as well, but this has not been well established so far.

Parental Relationship and Child ADHD
Research suggests that there are bidirectional effects between a child’s ADHD and the quality of the interparental relationship. Children’s symptomatology can burden the parents’ relationship and, vice versa, a poor parental relationship can affect the child’s symptoms (Zemp, 2018). On the one hand, the ADHD symptoms of the child often cause stress, frustration, and exhaustion on the parents’ part, which will likely strain the couple’s relationship (Zemp, Nussbeck, Cummings, & Bodenmann, 2017). Parenting a child with ADHD can be time consuming and demands energy. These perpetual challenges in childrearing and the elevated stress level can compromise the individual well-being of parents (Cheung, Aberdeen, Ward, & Theule, 2018) as well as the interparental relationship, leading to more couple distress and conflicts (Johnston & Mash, 2001). Interparental conflicts, in turn, have been identified as an important risk factor for children’s behavioral problems (Cummings & Davies, 2010; Heinrichs, Cronrath, Degen, & Snyder, 2010).

There may also be an interaction of the parents’ and the child’s characteristics. Due to their low behavioral inhibition, children with ADHD may be more sensitive to interparental conflict; overall they tend to be more sensitive to unpleasant signals (Melfsen, Alpers, Walitza, & Warnke, 2006), and they have been shown to react to parental conflict with denial or defiant behavior (El-Sheikh, Keller, & Erath, 2007). Furthermore, given the high heritability of the disorder, parents of a child with ADHD may share some of their child’s symptoms (Faraone et al., 2005), and some may display problematic behavior themselves (Cheung & Theule, 2016; Eisenbarth et al., 2008). Thus, it is plausible that dysfunctional interactions are especially pronounced in affected families and that this leads to a higher risk of family conflicts (Wymbs, Wymbs, & Dawson, 2015). In addition, it has been shown that the level of conflict and the probability of dysfunctional parenting behavior are generally higher if one or both parents suffer from ADHD themselves (Eakin et al., 2004). This can further amplify the crucial interaction between the child’s and the parents’ behavior.

In sum, it can be expected that the couple relationship of parents of a child with ADHD is strained compared to families with healthy children. However, previous research in this field still appears to be somewhat discrepant (Johnston & Mash, 2001). While some studies clearly support an association between child ADHD and lower quality of the interparental relationship (e.g., Befera & Barkley, 1985; Goldstein et al., 2007), at least two studies did not find this link (Kachooei, Daneshmand, Dolatshahi, Samadi, & Samiei, 2016; Wymbs, Pelham, Molina, & Gnagy, 2008). Recently, Zemp (2018) published a systematic review on the interparental relationship in families with children with ADHD, which sums up the inconsistent state of research but does not yet synthesize former findings statistically. It is thus necessary to evaluate meta-analytically whether the quality of the parental relationship in families of children with ADHD differs from that of families of healthy children. Furthermore, potential moderators of this likely relationship need to be examined quantitatively.

Potential Moderators of the Link Between Parental Relationship Quality and Child ADHD
Inconsistencies in the pertinent literature may partially be explained by factors that modulate the mean difference in the quality of the interparental relationship in families of children with ADHD compared to families of healthy children. That is, several moderators have been implicated by previous publications, but they were not systematically examined in all of the studies.

For instance, the child’s age seems to be a relevant factor. As the manifestation of ADHD symptoms changes over the lifespan, and often already in the transition from childhood to adolescence (Hart, Lahey, Loeber, Applegate, & Frick, 1995), the impact of ADHD on the interparental relationship may differ depending on the child’s age. Hyperactivity and impulsivity, which especially strain parents’ relationship (Cui, Donnellan, & Conger, 2007), often decline from childhood through adolescence (O’Neill et al., 2017). In the same vein, the challenges of parenthood for a couple’s relationship may decrease when children get older (Mitnick, Heyman, & Smith Slep, 2009), when parents have adapted to the new situation and learned effective ways to manage difficult child behavior. Taken together, there is reason to expect that the quality of the interparental relationship is especially affected in families with younger children with ADHD.

A second potential moderator of the difference in the interparental relationship between families of children with ADHD versus healthy children is the child’s gender. Girls are more likely to be diagnosed as primarily inattentive and boys as the combined type, because boys display more hyperactivity and impulsivity on average (Biederman et al., 2002). The latter symptoms emerged as a particular burden and challenge for parents, because hyperactivity and impulsivity affect family functioning more than the inattention component (Lewis, 1992). Apart from that, externalizing symptoms seem to be more strongly associated with interparental conflict than internalizing symptoms (Cui et al., 2007; Jenkins, Simpson, Dunn, Rasbash, & O’Connor, 2005). Thus, the gender of the child with ADHD plays a role in the manifestation of the symptoms and can, in turn, differentially affect the interparental relationship. Therefore, we expect that the quality of the interparental relationship is more strongly impaired in families with boys with ADHD.

Third, there is evidence for the notion that parents of a child suffering from ADHD with a comorbid oppositional defiant disorder (ODD) or conduct disorder (CD) report higher levels of interparental conflict than parents of a child with ADHD alone (Barkley, Anastopoulos, Guevremont, & Fletcher, 1992; Wymbs, Pelham, Molina, & Gnagy, 2008). Williamson and Johnston (2016) found that the presence of comorbid behavioral problems was more predictive of interparental conflict than the ADHD symptoms. Furthermore, parents of a child with ADHD and comorbid ODD report less marital satisfaction than parents of a child with ADHD alone (Lindahl, 1998). We thus expect that the quality of the interparental relationship is particularly low in families of children with ADHD and a comorbid behavioral disorder.

Fourth, because women generally report lower relationship satisfaction than men (Jackson, Miller, Oka, & Henry, 2014), it may make a difference who provides the data on the quality of the relationship. Therefore, we assess the rater of the relationship measure (mother, father, or both parents) as a potential moderator. We expect that the effects of ADHD are more pronounced in studies where the mother evaluates the interparental relationship than in studies where the father rates the relationship or the parents provide a joint assessment.

The Current Meta-Analysis
Meta-analyses are an effective tool to systematically summarize conflicting results on a given research topic and to examine whether potential factors moderate a specific association. Therefore, the current study has two major aims. The first is to determine the strength of the mean difference (Cohen’s d) in parental relationship quality in families of children with ADHD compared to families of healthy children. We expect that parents of a child with ADHD report lower relationship quality than parents of healthy children. Specifically, a lower relationship quality can be characterized by more negativity in the relationship (i.e., higher levels of interparental conflict), but also by less positivity (i.e., lower relationship satisfaction). Thus, we decided to include these two constructs as our target variables (Zemp, Bodenmann, Backes, Sutter-Stickel, & Bradbury, 2016).

The second objective is to investigate whether any of the factors discussed above, namely age and gender of the child, the comorbidity with either ODD or CD, and the source of the relationship rating, moderate the strength of the mean difference (Cohen’s d) in parental relationship quality between families of children with ADHD versus healthy children. In addition, we control for two methodological characteristics of the studies: the publication year and the sample size. We do not postulate specific hypotheses regarding these two moderators but aim to rule out their potential confounding impact.¹

Method

Criteria for Inclusion and Exclusion
To be included in the meta-analysis, the studies had to meet the following criteria:
(1) The index child had to be diagnosed with ADHD by an expert based on DSM criteria in the respective current version by the APA (for the latest version, see American Psychiatric Association, 2013). Studies reporting no ADHD diagnosis (e.g., only symptom manifestation on a dimensional measure) and studies referring to an ADHD diagnosis based on other diagnostic material (questionnaires) or solely on parent or teacher report were excluded.
(2) The index child had to be younger than 18 years, as this meta-analysis focuses on child ADHD.
(3) At least one quantitative measure of the quality of the interparental relationship (i.e., couple conflict or relationship satisfaction of parents) was required.
(4) A comparison of the interparental relationship quality between parents of children with ADHD and parents of healthy children needed to be reported; studies examining the association between the interparental relationship quality and child ADHD in a clinical group exclusively (without controls) were excluded.
(5) Statistical indices of means, standard deviations, and sample sizes of both groups (ADHD vs. healthy children) had to be obtainable.

¹ Raw data, R script, and supplementary material of the present meta-analysis are publicly accessible at https://madata.bib.uni-mannheim.de/id/eprint/277



Figure 1. Selection process of the studies included in the meta-analysis.
Identification: 3,125 articles identified from databases and reference lists; 2,915 articles excluded from further review (non-relevance of topic, no childhood sample).
Screening: 210 articles retrieved for relevance assessment.
Eligibility: 195 articles excluded (100 no relevant indicator measure; 72 no relevant effect size measure; 19 no clinical diagnosis; 4 unable to obtain the relevant data).
Inclusion: Final inclusion of 43 effect sizes from 15 studies.

Search Strategy
The first author conducted the search and the study screening. Studies were searched in the following databases: PsycINFO, PsycARTICLES, Academic Search Premier, PSYNDEX, PubMed, and Web of Science. The first search terms were “ADHD,” “attention deficit,” or “hyperactivity” in the title, abstract, or keywords of the paper. Second, the terms “relationship”/“couple” and “quality”/“satisfaction”/“distress,” or “couple/interparental conflict” were searched in combination with the first search terms throughout the entire text of the paper.

Figure 1 describes all stages of the selection process. The initial search yielded 3,125 results, which were then screened according to the criteria described above. If the title and the populations were relevant, the abstracts were read carefully. A large number of publications (k = 2,915) were excluded because they did not meet the inclusion criteria on the first assessment. For the remaining 210 articles, we examined whether the relevant measures were reported with the necessary statistics (N, M, SD) and whether ADHD was clinically diagnosed by an expert applying the DSM criteria. Nineteen studies had no ADHD group diagnosed by an expert, 100 articles lacked relevant dependent measures, and 72 studies did not compare families of children with ADHD with families of healthy children. We contacted the authors of five articles via e-mail, and one of them (Edwards, Barkley, Laneri, Fletcher, & Metevia, 2001) provided usable data for the current meta-analysis. The final set of studies for our data syntheses consists of 15 studies (marked with an asterisk [*] in the reference list) with k = 43 effect sizes based on n = 827 children with ADHD, n = 559 children with ADHD and comorbid ODD/CD, and n = 1,431 healthy control children. Characteristics of the final set of studies are summarized in Table 1.
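The counts in the selection process can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the numbers are those given in Figure 1 and in the Search Strategy text.

```python
# Counts as reported in Figure 1 and the Search Strategy text.
IDENTIFIED = 3125
EXCLUDED_AT_SCREENING = 2915

# Exclusion reasons at the eligibility stage; labels follow Figure 1.
ELIGIBILITY_EXCLUSIONS = {
    "no relevant indicator measure": 100,
    "no relevant effect size measure": 72,
    "no clinical diagnosis": 19,
    "unable to obtain the relevant data": 4,
}


def selection_flow():
    """Return the number of records remaining after each selection stage."""
    retrieved = IDENTIFIED - EXCLUDED_AT_SCREENING
    excluded = sum(ELIGIBILITY_EXCLUSIONS.values())
    return {
        "retrieved": retrieved,
        "excluded_at_eligibility": excluded,
        "included_studies": retrieved - excluded,
    }


flow = selection_flow()
print(flow)
```

Run as written, the counts reconcile: 210 articles retrieved, 195 excluded at the eligibility stage, and 15 studies included.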

Methodological Quality of Studies
All studies included in this meta-analysis are cross-sectional; according to our search, there are as yet no longitudinal data on the topic. The reliability of the measures used to assess the quality of the interparental relationship was high overall (Cronbach’s α ranged from .80 to .95). Apart from that, the scales are well established and frequently used in couples research (e.g., the Dyadic Adjustment Scale; Spanier, 1988). ADHD was in all cases diagnosed by an expert based on DSM criteria in the respective current version (for the most recent version, see APA, 2013). Depending on publication year, the studies used the DSM-III, DSM-III-R, DSM-IV, or DSM-IV-TR. Hence, the rigorous inclusion and exclusion criteria applied in the search strategy warrant high methodological quality of the primary studies, which used reliable and valid measures of the core constructs.

Table 1. Study characteristics and effect sizes

Authors                                 Year  Country    N^a  N^b  Age^c  Gender^d  ODD  CD  Outcome       Rater         Cohen’s d [95% CI]
Barkley et al.                          1992  USA        27   77   14.4   88.9      0    0   Satisfaction  Mother        −.04 [−.48, .40]
Barkley et al.                          1992  USA        56   77   13.9   91.2      1    0   Satisfaction  Mother        .64 [.29, .99]
Befera and Barkley                      1985  USA        15   15   8.48   100       0    0   Satisfaction  Mother        .84 [.10, 1.59]
Befera and Barkley                      1985  USA        15   15   8.76   0         0    0   Satisfaction  Mother        1.27 [.49, 1.59]
Ben-Naim, Gill, Laslo-Roth, and Einav   2018  Israel     63   119  –      –         0    0   Satisfaction  Both parents  .34 [.03, .65]
Castro                                  2017  Slovenia   77   73   10.79  –         0    0   Conflict      Both parents  .45 [.13, .78]
Dutta and Sanyal                        2015  India      32   32   –      –         0    0   Satisfaction  Both parents  .35 [−.14, .85]
Dutta and Sanyal                        2015  India      32   32   –      –         0    0   Satisfaction  Both parents  .08 [−.41, .57]
Edwards et al.                          2001  USA        80   31   14.8   100       1    0   Satisfaction  Mother        .33 [−.08, .75]
Goldstein et al.                        2007  USA        38   52   3      –         0    0   Conflict      Mother        .27 [−.15, .69]
Goldstein et al.                        2007  USA        70   52   3      –         1    0   Conflict      Mother        .68 [.32, 1.05]
Goldstein et al.                        2007  USA        29   38   3      –         0    0   Conflict      Father        .47 [−.02, .96]
Goldstein et al.                        2007  USA        52   38   3      –         1    0   Conflict      Father        .65 [.22, 1.08]
Green et al.                            2016  Australia  99   180  7.71   64        0    0   Conflict      Both parents  .28 [.04, .53]
Johnston                                1996  Canada     23   33   8.18   87        0    0   Satisfaction  Mother        .60 [.06, 1.15]
Johnston                                1996  Canada     25   33   8.75   96        1    0   Satisfaction  Mother        1.01 [.46, 1.46]
Johnston                                1996  Canada     19   29   8.18   87        0    0   Satisfaction  Father        .67 [.07, 1.26]
Johnston                                1996  Canada     14   29   8.75   96        1    0   Satisfaction  Father        .91 [.25, 1.58]
Kachooei et al.                         2016  Iran       35   35   –      77.14     0    0   Satisfaction  Mother        −.44 [−.92, .03]
Mohammadi et al.                        2012  Iran       200  200  –      –         0    0   Satisfaction  Both parents  .19 [−.01, .38]
Murphy and Barkley                      1996  USA        23   24   –      –         0    0   Satisfaction  Both parents  .70 [.11, 1.29]
Murphy and Barkley                      1996  USA        23   24   –      –         0    0   Satisfaction  Both parents  .76 [.17, 1.36]
Pheula, Rohde, and Schmitz              2011  Brazil     100  100  11.8   68        0    0   Satisfaction  Mother        .63 [.35, .91]
Sochos and Yahya                        2015  UK         98   153  8.51   86.5      1    1   Satisfaction  Both parents  .20 [−.06, .45]
Wymbs et al.                            2008  USA        26   88   15.27  96.15     0    0   Satisfaction  Mother        −.19 [−.63, .25]
Wymbs et al.                            2008  USA        46   88   14.2   82.61     1    0   Satisfaction  Mother        .08 [−.70, .44]
Wymbs et al.                            2008  USA        23   88   14.13  100       0    1   Satisfaction  Mother        .07 [−.39, .53]
Wymbs et al.                            2008  USA        25   81   15.27  96.15     0    0   Satisfaction  Mother        −.19 [−.64, .26]
Wymbs et al.                            2008  USA        46   81   14.2   82.61     1    0   Satisfaction  Mother        .07 [−.30, .43]
Wymbs et al.                            2008  USA        22   81   14.13  100       0    1   Satisfaction  Mother        .07 [−.41, .54]
Wymbs et al.                            2008  USA        26   88   15.27  96.15     0    0   Conflict      Mother        −.25 [−.69, .18]
Wymbs et al.                            2008  USA        46   88   14.2   82.61     1    0   Conflict      Mother        .13 [−.22, .49]
Wymbs et al.                            2008  USA        23   88   14.13  100       0    1   Conflict      Mother        .33 [−.13, .79]
Wymbs et al.                            2008  USA        25   81   15.27  96.15     0    0   Conflict      Mother        −.25 [−.70, .20]
Wymbs et al.                            2008  USA        46   81   14.2   82.61     1    0   Conflict      Mother        .10 [−.26, .46]
Wymbs et al.                            2008  USA        22   81   14.13  100       0    1   Conflict      Mother        .23 [−.24, .71]
Wymbs et al.                            2008  USA        26   88   15.27  96.15     0    0   Conflict      Mother        −.22 [−.66, .21]
Wymbs et al.                            2008  USA        46   88   14.2   82.61     1    0   Conflict      Mother        −.14 [−.50, .21]
Wymbs et al.                            2008  USA        23   88   14.13  100       0    1   Conflict      Mother        .09 [−.37, .55]
Wymbs et al.                            2008  USA        25   81   15.27  96.15     0    0   Conflict      Mother        −.28 [−.73, .17]
Wymbs et al.                            2008  USA        46   81   14.2   82.61     1    0   Conflict      Mother        −.16 [−.52, .20]
Wymbs et al.                            2008  USA        22   81   14.13  100       0    1   Conflict      Mother        .00 [−.47, .47]

Notes. Year = publication year; Country = country of study; N^a = clinical sample (ADHD or ADHD with CD/ODD, respectively); N^b = healthy controls; Age^c = mean age of child sample (in years); Gender^d = percentage male in child sample; ODD = oppositional defiant disorder; CD = conduct disorder; 0 = absence of comorbidity; 1 = presence of comorbidity; Outcome = outcome measure; Rater = rater of relationship; – = no value given in the table. Multiple effect sizes for one study indicate multiple outcome measures; the number of parents equates to the number of children in each group.

Data Extraction
Relevant data on study characteristics and effect sizes are listed in Table 1. Studies were coded by the first author, and the coding was independently double-checked by a research assistant who was blind to our expectations. The coding was computer-based and followed established guidelines (Lipsey & Wilson, 2001). The two coders reached an agreement of 97.8% across all coded data. In the divergent cases (2.2%), the first author rechecked the values in the paper and corrected them, as the errors were all typos.

We extracted the means, standard deviations, and sample sizes for the interparental relationship quality index in the ADHD and the control group to compute Cohen’s d, which reflects the standardized mean difference between the two groups. For a positive outcome variable (relationship satisfaction), d is computed as the group mean of the parents with a healthy child minus the group mean of the parents with a child with ADHD, divided by the pooled standard deviation. For a negative outcome variable (interparental conflict), the mean difference was calculated in the opposite direction, that is, the group mean of the ADHD group minus the group mean of the healthy group. This ensures that higher d values reflect either more negativity or less positivity in the ADHD group. We expect a positive average effect size, indicating that parents of children with ADHD report an overall lower relationship quality, thus more negativity and less positivity, than parents of children without ADHD.

In addition, we extracted information on potential moderators (age, gender, comorbidity with ODD or CD, rater of the relationship, publication year, sample size). We expect a larger group difference, and thus a higher Cohen’s d, for families with younger than with older children. We also expect a higher d for families with boys than with girls, if a comorbid disorder (ODD/CD) is present, and if the rating of the relationship is based only on the mother’s report. Age of child, publication year, and sample size were used as continuous measures. Child gender was also used as a continuous moderator, coded as the percentage of boys with ADHD in a given study. Comorbidity with CD and ODD was used as two separate categorical moderators (“yes” vs. “no”). The rater of the relationship was coded categorically as “mother” (reference category), “father,” and “both parents.”
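The effect size computation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' R script: the function name, argument names, and example values are hypothetical, the small-sample correction uses the common approximation 1 − 3/(4df − 1), and the sampling variance uses a standard large-sample formula that may differ in detail from what the escalc function in metafor computes.

```python
import math


def standardized_mean_difference(mean_adhd, sd_adhd, n_adhd,
                                 mean_ctrl, sd_ctrl, n_ctrl,
                                 higher_is_better):
    """Bias-corrected standardized mean difference with an approximate 95% CI.

    higher_is_better=True  -> positive outcome (e.g., satisfaction):
                              control mean minus ADHD mean.
    higher_is_better=False -> negative outcome (e.g., conflict):
                              ADHD mean minus control mean.
    Positive values therefore always indicate poorer relationship
    quality in the ADHD group.
    """
    df = n_adhd + n_ctrl - 2
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(((n_adhd - 1) * sd_adhd ** 2 +
                           (n_ctrl - 1) * sd_ctrl ** 2) / df)
    if higher_is_better:
        raw_diff = mean_ctrl - mean_adhd
    else:
        raw_diff = mean_adhd - mean_ctrl
    d = raw_diff / pooled_sd
    # Approximate correction for the positive small-sample bias of d.
    g = (1 - 3 / (4 * df - 1)) * d
    # Large-sample approximation of the sampling variance (an assumption).
    n_total = n_adhd + n_ctrl
    variance = n_total / (n_adhd * n_ctrl) + g ** 2 / (2 * n_total)
    half_width = 1.96 * math.sqrt(variance)
    return {"d": d, "g": g, "variance": variance,
            "ci": (g - half_width, g + half_width)}


# Hypothetical satisfaction scores: lower mean in the ADHD group.
example = standardized_mean_difference(45.0, 10.0, 20, 50.0, 10.0, 20, True)
print(example)
```

With this sign convention, a satisfaction deficit and a conflict surplus in the ADHD group both yield positive values, so the two outcome types can be pooled in one analysis.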

Data Syntheses
The analyses were done with the R package metafor, Version 1.9.9 (Viechtbauer, 2010). To determine the strength of the mean difference (Cohen’s d) in parental relationship quality in families of children with ADHD compared to families of healthy children, Cohen’s d was calculated with the escalc function, which computes the standardized mean difference with a correction for its positive bias and the corresponding sampling variances. We computed a Hedges–Olkin-type random effects model (Hedges & Olkin, 1985) using the rma function, which synthesizes observed effect sizes with a weighted mean procedure in which the inverse sampling variance of each effect size serves as its weight. This ensures that effect sizes with small sampling variance, presumably the more precise values, are weighted more heavily in the overall effect size. In addition, the estimated amount of variability around the true standardized mean difference is computed. In the next step, the homogeneity of the overall weighted mean was assessed by means of three statistics (Q, I², H²) in order to test whether the variability between observed effect sizes can be explained by sampling error only or by systematic differences among effect sizes.

To investigate the second objective, namely whether any of the proposed variables moderate the strength of the mean difference (Cohen’s d) in parental relationship quality, meta-regressions with Knapp and Hartung (2003) adjustments were performed in metafor. This method adjusts the standard errors of the estimated coefficients to partially account for the uncertainty in the estimate of the amount of heterogeneity. Because some of the studies provided more than one effect size, the source of our data is overlapping in some cases (see Table 1). Therefore, further analyses were conducted to assess whether dependence of effect sizes could be an issue. To this end, the R package robumeta, Version 2.0 (Fisher, Tipton, & Zhipeng, 2017), was used, which provides robust variance estimation for meta-regressions. With sensitivity analyses in robumeta, we investigated how the estimated amount of true variability around the mean effect size (τ²) changes for different values of the true correlation of the effect sizes (ρ). Only if these values remain robust can one assume that there is no observable dependence between the effects (Hedges, Tipton, & Johnson, 2010).

Results
The results support the proposed group differences between parents of a child with ADHD and parents of a healthy child and partly confirm the moderation hypotheses.

Relationship Quality in Parents of Children With ADHD Compared to Parents of Healthy Children
Our first research hypothesis, that parents of a child with ADHD report lower interparental relationship quality than parents of healthy children, was supported by our data. The meta-analytically computed overall standardized mean difference in the quality of the interparental relationship between the ADHD and the control group was d = .24 (p < .0001; 95% CI [.14, .34]). The small but significant and positive average effect size indicates a group difference in the proposed direction, that is, parents of a child with ADHD report a lower relationship quality than parents of a healthy child. A forest plot of all effect sizes can be found as Figure 2 in the Electronic Supplementary Material (ESM 1). The I² value indicates that 60.21% of the variability in the effect sizes was due to heterogeneity, and the Q statistic was significant (p < .0001), indicating substantial heterogeneity in the data.
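The pooling and heterogeneity statistics reported above can be illustrated with a short sketch. It assumes the DerSimonian–Laird moment estimator of τ² and the Q-based definitions of I² and H²; these are swapped in for simplicity and are not necessarily the estimator or definitions used by the rma function in the authors' analysis. The function name and the example data are hypothetical.

```python
import math


def random_effects_summary(effects, variances):
    """Random-effects pooled estimate with Q, I^2, and H^2.

    Uses the DerSimonian-Laird estimator of the between-study
    variance tau^2 (an assumption made for this sketch).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    # Fixed-effect (inverse-variance) weights and mean, needed for Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Moment estimator of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    h2 = q / df
    return {"pooled": pooled, "se": se,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2, "Q": q, "I2": i2, "H2": h2}


# Hypothetical effect sizes and sampling variances, not the study data.
print(random_effects_summary([0.10, 0.45, 0.30, -0.05],
                             [0.04, 0.03, 0.06, 0.05]))
```

Because τ² is added to every sampling variance, the random-effects weights are more even across studies than fixed-effect weights would be, which widens the confidence interval when heterogeneity is present.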

Moderators of the Mean Difference in Relationship Quality in Parents of Children With ADHD Compared to Parents of Healthy Children
The second aim of this meta-analysis was to examine whether child age or gender, comorbidity with ODD or CD, or the source of the relationship rating moderates the strength of the mean difference in interparental relationship quality between families of children with ADHD compared to families of healthy children. For this purpose, a meta-regression with all moderators entered simultaneously was performed as a first step to estimate the explained variance. In total, the moderators accounted for 89.91% of the heterogeneity (R²). The effects of child age (b = −.14, p < .0001, 95% CI [−.19, −.09]) and of the rater of the relationship (mother vs. parents: b = −.79, p < .0001, 95% CI [−1.16, −.42]) were significant. Comorbidity with either ODD or CD and gender were not significant as moderators. When the two methodological control variables (publication year and sample size) were included, 100% of the heterogeneity in the effect sizes was explained, supporting the viability of the tested moderators.

Next, the moderators were investigated separately (see Table 2 in ESM 1). Age explained the greatest amount of the heterogeneity (R² = 58.91%), and the slope of the meta-regression was b = −.05 (p < .001, 95% CI [−.08, −.03]). This negative coefficient suggests that the effect size gets smaller with increasing age of the child. The rater of the relationship explained 21.73% of the variability, and the slope was b = .15 for mothers (p = .011, 95% CI [.03, .27]), b = .60 for fathers (p < .001, 95% CI [.29, .91]), and b = .34 for both parents (p < .001, 95% CI [.14, .53]). The finding that all three categories of the variable yield significant effects does not support a moderation effect; under moderation, one would expect a significant effect in only one category and not in the others. The publication year explained 17.18% of the variability (b = −.02, p = .007, 95% CI [−.04, −.01]). The negative coefficient suggests that more recent studies tended to have a smaller effect size than older studies. Gender accounted for 9.56% of the variability, but the slope was not significant (b = −.01, p = .056, 95% CI [−.02, .00]). Comorbidity with either ODD or CD and sample size did not explain substantial variability (0.00% each).

To assess whether there is a publication bias in the data, Egger’s regression test (Egger, Smith, Schneider, & Minder, 1997) was used. The test showed a plot asymmetry (p = .025). A trim-and-fill analysis (Duval & Tweedie, 2000) was performed, estimating the number of studies missing due to suppression of the most extreme results on one side of the funnel plot. In this analysis, five studies were added to the funnel plot. The results indicate that there are no missing studies in the dataset (p = .500, df = 47), showing no publication bias.

Taken together, our pattern of results suggests that the mean age of the children was a significant moderator in the expected direction. That is, Cohen’s d was larger for families of younger children with ADHD than for families of older children, supporting our hypothesis that the interparental relationship is especially affected in families with younger children with ADHD. The impact of the rater of the relationship appears less clear, because the effect reached significance in the simultaneous moderator analysis but not in the separate analysis. Moreover, the publication year was identified as an important methodological factor, as the effect size is smaller for more recent studies than for older ones.
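The single-moderator analyses can be sketched as a weighted least squares regression of the effect sizes on one moderator, with a Knapp–Hartung-style scaling of the standard error. This is a simplified illustration under stated assumptions: the between-study variance tau2 is passed in rather than estimated, the function name and the data are hypothetical, and no p values are computed.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least squares meta-regression with one moderator.

    Weights are 1 / (sampling variance + tau2). The slope's standard
    error is scaled by the weighted residual mean square, in the
    spirit of the Knapp-Hartung adjustment. tau2 is taken as given,
    which is an assumption of this sketch.
    """
    k = len(effects)
    if k < 3 or len(variances) != k or len(moderator) != k:
        raise ValueError("need at least three studies with matching inputs")
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator has no variation")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual sum of squares; k - 2 = studies minus coefficients.
    resid = sum(wi * (yi - intercept - slope * xi) ** 2
                for wi, xi, yi in zip(w, moderator, effects))
    scale = resid / (k - 2)
    se_slope = math.sqrt(scale / sxx)
    return {"intercept": intercept, "slope": slope, "se_slope": se_slope}


# Hypothetical data: effect sizes that shrink with mean child age.
ages = [6.0, 8.0, 10.0, 12.0, 14.0]
effects = [0.72, 0.48, 0.33, 0.07, -0.08]
variances = [0.05, 0.04, 0.06, 0.05, 0.04]
print(meta_regression(effects, variances, ages, tau2=0.01))
```

A negative slope in such a model corresponds to the pattern described in the text, in which the group difference becomes smaller as the mean age of the children increases.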

Potential Dependence of Effect Sizes

To determine if dependence of the effect sizes could limit the explanatory power of our analyses, sensitivity analyses using robust variance estimation were performed. The results showed that the values for the estimated mean effect size, the standard error, and τ² remained robust for different set values of rho (ρ = 0; 0.2; 0.4; 0.6; 0.8; 1) indicating independence between the effect sizes (see Table 3 in ESM 1).

Discussion

Poor interparental relationship quality may aggravate their child’s ADHD symptoms or the child’s behavior may be a strain on the parents’ relationship, but existing studies did not uniformly support that there is a robust correlation. It was the aim of this meta-analysis to investigate the strength of the difference in the interparental relationship between parents of children with ADHD compared to parents of healthy children. Our results suggest that there is indeed a significant but relatively small difference in the relationship quality between these two groups of parents. That is, we found evidence that the couple relationship of parents of affected children is characterized by less satisfaction and challenged by higher levels of conflict and distress. Notably, this finding was independent of the child’s comorbidity with behavioral problems (ODD/CD). This shows that our finding is probably not simply an artifact resulting from frequently observed noncompliant or defiant behavior that can accompany ADHD (as previous scholars suggested; Johnston & Mash, 2001), but it may be genuinely related to symptoms of ADHD per se. Furthermore, our analyses included studies from a wide range of populations (e.g., from the USA, India, Brazil, Iran, Slovenia) which shows that our results have substance and can be generalized across diverse cultural contexts.

Zeitschrift für Psychologie (2019), 227(1), 31–41



L. Weyers et al., Interparental Relationships in Families of Children With ADHD

As expected, the moderation by the child’s age indicates that the interparental relationship is particularly strained in families with younger children with ADHD. Possible explanations can be the characteristic change of symptoms from childhood to adolescence (O’Neill et al., 2017); hyperactivity and impulsivity, which may be particularly challenging for parenthood and the parents’ relationship (Cui et al., 2007), often decline in this period. Furthermore, children learn to regulate their emotions and behavior better as they grow older (Fox, 1994) and parents may even get more adapted to and more effective in dealing with the child’s disorder. A potential modulating factor in this context may also be the use of child- or family-oriented therapy early upon the onset of children’s symptoms, which can help to improve the maladaptive interaction of the child’s symptoms and the interparental relationship (Daley et al., 2017). In addition, the meta-regression with all moderators simultaneously suggests that the effect was smaller in studies where both parents evaluated the relationship together than in studies where the relationship quality was reported exclusively by the mother. This result is in line with previous findings of couple research showing that women generally report less relationship satisfaction than men or than the average of both parents (Jackson et al., 2014). Moreover, in many Western and European countries, mothers still hold the primary caregiving function for their children in the majority of the households. Given this traditional distribution of gender roles, it is plausible that mothers may be more affected by the child’s symptoms and related parenting problems. The difference between relationship rating of mothers versus fathers was not significant in the simultaneous moderator analyses, which may be because there were fewer effect sizes using just the father (k = 5) than the mother (k = 29) or both parents (k = 9) as the raters. 
However, the separate analyses of the rater of relationship as a moderator yielded significant results in each group indicating that there is no moderation effect. Further research should aim to clarify this issue, preferably in studies that include ratings from both partners in one study. Moreover, the effect of the publication year suggests that more recent studies tend to have a smaller effect size (which is often the case) and this supports the necessity to take this methodological factor into account in future meta-analyses. We cannot identify the reasons underlying this finding in our data, but potential explanations could be a general improvement in the accuracy of ADHD diagnoses over time, as the DSM diagnosis criteria get more precise. A positive development is that there have been many improvements in the therapy of childhood ADHD. In the last decade the literature on effective treatment of childhood ADHD was mainly focused on parent training, whether in combination with medication or not (Daley et al., 2017). Recently, the family systems theory and therapy with family- and parent-based interventions have received increasing attention and seem to be especially effective in treating ADHD (Carr, 2014; von Sydow, Retzlaff, Beher, Haun, & Schweitzer, 2013). This approach integrates the child, but also the parents with their relationship problems more explicitly into the therapeutic process. This could contribute to an explanation for our finding that the effect sizes were smaller in more recent studies. There might also be a change in outcome reporting bias; generally, studies that report significant results are more likely to be published (Dwan et al., 2008). However, these interpretations remain tentative and additional research is needed to clarify this issue. Because some of the effect sizes that were available for our analyses came from the same sample, potential nonindependence of the effect sizes needed to be examined. Importantly, our additional analyses support the robustness of our results. This indicates that the implications of the current meta-analysis do not substantially depend on potentially overlapping features of the synthesised studies or the corresponding samples. Thus, we argue that our results are of high validity and generalizability.

Practical Implications

Taken together, this meta-analysis shows that parents of children with ADHD report higher negativity (more distress and conflict) and less positivity (less satisfaction). This finding is relevant for clinical evaluations and psychosocial interventions, particularly for parents with younger children. Our results highlight that it is important to integrate information about family relationships and, especially, the interparental relationship into diagnostic procedures and psychotherapy for ADHD. Very likely, it is vital to diminish the reciprocal effects and mutual escalation between the child’s symptoms and interparental conflict at an early stage. It is therefore not surprising that systemic family therapy appears to be a promising approach for treating childhood ADHD. From a systemic perspective, the family is regarded as an organized whole and behaviors or emotions of family members are inextricably interconnected and, as a result, must be conjointly addressed in treatments (Carr, 2014).

Limitations

A few limitations qualify the present meta-analysis. First, we only included studies on parents who are currently in a relationship although it is known that divorce is more prevalent in families with a child with ADHD (Wymbs, Pelham, Molina, Gnagy, Wilson, et al., 2008). Thus, it is likely that the small effect we found actually underestimates the true effect. Second, all studies included in the meta-analysis were cross-sectional so that it was not possible to examine causality. Therefore, we argue that the findings from the meta-analysis strongly encourage longitudinal studies in this field. Third, most of the studies did not control for parental ADHD and we were therefore unable to examine whether parents’ own psychopathology exacerbated effects. Likewise, only a few studies reported whether the children had pharmacological treatment and that made it impossible for us to investigate the potential impact of child medication on parental relationship quality. Fourth, a common problem of meta-analyses is the non-independence when multiple effect sizes are computed from one study (Beelmann & Bliesener, 1994). However, in this meta-analysis robust variance estimation was used to rule out potential dependence. According to recent methodological recommendations it would be even more powerful to conduct a multivariate meta-analysis with multiple levels to nest dependent data (Cheung, 2014), but the relatively small body of existing studies does not provide for adequate power for this approach (Van den Noortgate, López-López, Marín-Martínez, & Sánchez-Meca, 2013).

Conclusion

To our knowledge, this is the first meta-analysis to integrate the existing findings on the quality of the interparental relationship in families of children with ADHD compared to parents of healthy children. It convincingly supports the notion that parents of affected children face particular challenges and specific difficulties that likely spill over to their intimate relationship. We believe that this study emphasizes the importance of prevention of interparental discord in families of children with ADHD. In clinical practice the improvement of the negative interdependence of the child’s symptoms and interparental conflict may be pivotal to reap a double benefit: enhancement of the child’s and the parents’ well-being.

Electronic Supplementary Material

The electronic supplementary material is available with the online version of the article at https://doi.org/10.1027/2151-2604/a000354

ESM 1. Figure and tables (.pdf)
Forest plot of studies included in the meta-analysis. Effect sizes and variance estimates/explanations.

References

*Studies included in the meta-analysis are marked with an asterisk.


American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: American Psychiatric Association.
Banaschewski, T., Poustka, L., & Holtmann, M. (2011). Autismus und ADHS über die Lebensspanne [Autism and ADHD across the lifespan]. Der Nervenarzt, 82, 573–581. https://doi.org/10.1007/s00115-010-3239-6
Banaschewski, T., Roessner, V., Uebel, H., & Rothenberger, A. (2004). Überblick Neurobiologie der Aufmerksamkeitsdefizit-/Hyperaktivitätsstörung (ADHS) [Overview of the neurobiology of attention-deficit/hyperactivity disorder (ADHD)]. Kindheit und Entwicklung, 13, 137–147. https://doi.org/10.1026/0942-5403.13.3.137
*Barkley, R. A., Anastopoulos, A. D., Guevremont, D. C., & Fletcher, K. E. (1992). Adolescents with attention deficit hyperactivity disorder: Mother–adolescent interactions, family beliefs and conflicts, and maternal psychopathology. Journal of Abnormal Child Psychology, 20, 263–288. https://doi.org/10.1007/BF00916692
Beelmann, A., & Bliesener, T. (1994). Aktuelle Probleme und Strategien der Metaanalyse [Current issues and strategies in meta-analysis]. Psychologische Rundschau, 45, 211–233.
*Befera, M. S., & Barkley, R. A. (1985). Hyperactive and normal girls and boys: Mother-child interaction, parent psychiatric status and child psychopathology. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 26, 439–452. https://doi.org/10.1111/j.1469-7610.1985.tb01945.x
*Ben-Naim, S., Gill, N., Laslo-Roth, R., & Einav, M. (2018). Parental stress and parental self-efficacy as mediators of the association between children’s ADHD and marital satisfaction. Journal of Attention Disorders, 23, 506–516. https://doi.org/10.1177/1087054718784659
Biederman, J., Mick, E., Faraone, S. V., Braaten, E., Doyle, A., Spencer, T., . . . Johnson, M. A. (2002). Influence of gender on attention deficit hyperactivity disorder in children referred to a psychiatric clinic. American Journal of Psychiatry, 159, 36–42. https://doi.org/10.1176/appi.ajp.159.1.36
Carr, A. (2014). The evidence base for family therapy and systemic interventions for child-focused problems. Journal of Family Therapy, 36, 107–157. https://doi.org/10.1111/1467-6427.12032
*Castro, T. B. (2017). Slovenian families with children with attention-deficit/hyperactivity disorder: Interpersonal relations, parents’ attention-deficit/hyperactivity disorder symptoms and implications for family therapy. Journal of Family Psychotherapy, 28, 170–186. https://doi.org/10.1080/08975353.2017.1288993
Caye, A., Spadini, A. V., Karam, R. G., Grevet, E. H., Rovaris, D. L., Bau, C. H. D., . . . Kieling, C. (2016). Predictors of persistence of ADHD into adulthood: A systematic review of the literature and meta-analysis. European Child & Adolescent Psychiatry, 25, 1151–1159. https://doi.org/10.1007/s00787-016-0831-8
Cheung, K., Aberdeen, K., Ward, M. A., & Theule, J. (2018). Maternal depression in families of children with ADHD: A meta-analysis. Journal of Child and Family Studies, 27, 1015–1028. https://doi.org/10.1007/s10826-018-1017-4
Cheung, K., & Theule, J. (2016). Parental psychopathology in families of children with ADHD: A meta-analysis. Journal of Child and Family Studies, 25, 3451–3461. https://doi.org/10.1007/s10826-016-0499-1
Cheung, M. (2014). Modeling dependent effect sizes with three-level meta-analyses: A structural equation modeling approach. Psychological Methods, 19, 211–229. https://doi.org/10.1037/a0032968
Cui, M., Donnellan, M. B., & Conger, R. D. (2007). Reciprocal influences between parents’ marital problems and adolescent internalizing and externalizing behavior. Developmental Psychology, 43, 1544–1552. https://doi.org/10.1037/0012-1649.43.6.1544


Cummings, E. M., & Davies, P. T. (2010). Marital conflict and children: An emotional security perspective. New York, NY: Guilford Press.
Daley, D., Van Der Oord, S., Ferrin, M., Cortese, S., Danckaerts, M., Doepfner, M., . . . Asherson, P. (2017). Practitioner review: Current best practice in the use of parent training and other behavioural interventions in the treatment of children and adolescents with attention deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 59, 932–947. https://doi.org/10.1111/jcpp.12825
*Dutta, M. M., & Sanyal, N. (2015). A comparative study of marital quality and family pathology of parents of ADHD and non-ADHD children. Indian Journal of Community Psychology, 11, 226–233.
Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x
Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A.-W., Cronin, E., . . . Gamble, C. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One, 3, e3081. https://doi.org/10.1371/journal.pone.0003081
Eakin, L., Minde, K., Hechtman, L., Ochs, E., Krane, E., Bouffard, R., . . . Looper, K. (2004). The marital and family functioning of adults with ADHD and their spouses. Journal of Attention Disorders, 8, 1–10. https://doi.org/10.1177/108705470400800101
*Edwards, G., Barkley, R. A., Laneri, M., Fletcher, K., & Metevia, L. (2001). Parent–adolescent conflict in teenagers with ADHD and ODD. Journal of Abnormal Child Psychology, 29, 557–572. https://doi.org/10.1023/A:1012285326937
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634. https://doi.org/10.1136/bmj.315.7109.629
Eisenbarth, H., Alpers, G. W., Conzelmann, A., Jacob, C. P., Weyers, P., & Pauli, P. (2008). Psychopathic traits in adult ADHD patients. Personality and Individual Differences, 45, 468–472. https://doi.org/10.1016/j.paid.2008.05.022
El-Sheikh, M., Keller, P. S., & Erath, S. A. (2007). Marital conflict and risk for child maladjustment over time: Skin conductance level reactivity as a vulnerability factor. Journal of Abnormal Child Psychology, 35, 715–727. https://doi.org/10.1007/s10802-007-9127-2
Faraone, S. V., Asherson, P., Banaschewski, T., Biederman, J., Buitelaar, J. K., Ramos-Quiroga, J. A., . . . Franke, B. (2015). Attention-deficit/hyperactivity disorder. Nature Reviews Disease Primers, 1, 15020. https://doi.org/10.1038/nrdp.2015.20
Faraone, S. V., Perlis, R. H., Doyle, A. E., Smoller, J. W., Goralnick, J. J., Holmgren, M. A., & Sklar, P. (2005). Molecular genetics of attention-deficit/hyperactivity disorder. Biological Psychiatry, 57, 1313–1323. https://doi.org/10.1016/j.biopsych.2004.11.024
Fischer, M. (1990). Parenting stress and the child with attention deficit hyperactivity disorder. Journal of Clinical Child Psychology, 19, 337–346. https://doi.org/10.1207/s15374424jccp1904_5
Fisher, Z., Tipton, E., & Zhipeng, H. (2017). robumeta: An R-package for robust variance estimation in meta-analysis. Retrieved from http://cran.uni-muenster.de/web/packages/robumeta/robumeta.pdf
Fox, N. A. (1994). The development of emotion regulation: Biological and behavioral considerations (Vol. 59). Chicago, IL: University of Chicago Press.
*Goldstein, L. H., Harvey, E. A., Friedman-Weieneth, J. L., Pierce, C., Tellert, A., & Sippel, J. C. (2007). Examining subtypes of behavior problems among 3-year-old children, part II: Investigating differences in parent psychopathology, couple conflict, and other family stressors. Journal of Abnormal Child Psychology, 35, 111–123. https://doi.org/10.1007/s10802-006-9088-x
*Green, J. L., Rinehart, N., Anderson, V., Efron, D., Nicholson, J. M., Jongeling, B., . . . Sciberras, E. (2016). Association between autism symptoms and family functioning in children with attention-deficit/hyperactivity disorder: A community-based study. European Child & Adolescent Psychiatry, 25, 1307–1318. https://doi.org/10.1007/s00787-016-0861-2
Grimm, O., Kittel-Schneider, S., & Reif, A. (2018). Recent developments in the genetics of attention-deficit hyperactivity disorder. Psychiatry and Clinical Neurosciences, 72, 654–672. https://doi.org/10.1111/pcn.12673
Hart, E. L., Lahey, B. B., Loeber, R., Applegate, B., & Frick, P. J. (1995). Developmental change in attention-deficit hyperactivity disorder in boys: A four-year longitudinal study. Journal of Abnormal Child Psychology, 23, 729–749. https://doi.org/10.1007/BF01447474
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Cambridge, MA: Academic Press.
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1, 39–65. https://doi.org/10.1002/jrsm.17
Heinrichs, N., Cronrath, A.-L., Degen, M., & Snyder, D. K. (2010). The link between child emotional and behavioral problems and couple functioning. Family Science, 1, 152–172. https://doi.org/10.1080/19424620.2010.569366
Jackson, J. B., Miller, R. B., Oka, M., & Henry, R. G. (2014). Gender differences in marital satisfaction: A meta-analysis. Journal of Marriage and Family, 76, 105–129. https://doi.org/10.1111/jomf.12077
Jenkins, J., Simpson, A., Dunn, J., Rasbash, J., & O’Connor, T. G. (2005). Mutual influence of marital conflict and children’s behavior problems: Shared and nonshared family risks. Child Development, 76, 24–39. https://doi.org/10.1111/j.1467-8624.2005.00827.x
*Johnston, C. (1996). Parent characteristics and parent-child interactions in families of nonproblem children and ADHD children with higher and lower levels of oppositional-defiant behavior. Journal of Abnormal Child Psychology, 24, 85–104. https://doi.org/10.1007/BF01448375
Johnston, C., & Mash, E. J. (2001). Families of children with attention-deficit/hyperactivity disorder: Review and recommendations for future research. Clinical Child and Family Psychology Review, 4, 183–207. https://doi.org/10.1023/A:1017592030434
*Kachooei, H., Daneshmand, R., Dolatshahi, B., Samadi, R., & Samiei, M. (2016). Attention-deficit/hyperactivity disorder and marital satisfaction: The preliminary roles of employment and income. Iranian Journal of Psychiatry and Behavioral Sciences, 10, e4012. https://doi.org/10.17795/ijpbs-4012
Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22, 2693–2710. https://doi.org/10.1002/sim.1482
Konrad, K., & Rösler, M. (2009). Aufmerksamkeitsdefizit-/Hyperaktivitätssyndrom in der Lebensspanne [ADHD across the lifespan]. Der Nervenarzt, 80, 1302–1311. https://doi.org/10.1007/s00115-009-2810-5
Lewis, K. (1992). Family functioning as perceived by parents of boys with attention deficit disorder. Issues in Mental Health Nursing, 13, 369–386. https://doi.org/10.3109/01612849209010317
Lindahl, K. M. (1998). Family process variables and children’s disruptive behavior problems. Journal of Family Psychology, 12, 420–436. https://doi.org/10.1037/0893-3200.12.3.420
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.


Melfsen, S., Alpers, G. W., Walitza, S., & Warnke, A. (2006). Angstsensitivität bei Kindern mit Aufmerksamkeitsdefizit-/Hyperaktivitätsstörung [Anxiety sensitivity in children with ADHD]. Verhaltenstherapie, 16, 25–30. https://doi.org/10.1159/000091595
Mitnick, D. M., Heyman, R. E., & Smith Slep, A. M. (2009). Changes in relationship satisfaction across the transition to parenthood: A meta-analysis. Journal of Family Psychology, 23, 848–852. https://doi.org/10.1037/a0017004
*Mohammadi, M. R., Farokhzadi, F., Alipour, A., Rostami, R., Dehestani, M., & Salmanian, M. (2012). Marital satisfaction amongst parents of children with Attention deficit hyperactivity disorder and normal children. Iranian Journal of Psychiatry, 7, 120–125.
*Murphy, K. R., & Barkley, R. A. (1996). Parents of children with attention-deficit/hyperactivity disorder: Psychological and attentional impairment. American Journal of Orthopsychiatry, 66, 93–102. https://doi.org/10.1037/h0080159
O’Neill, S., Rajendran, K., Mahbubani, S. M., & Halperin, J. M. (2017). Preschool predictors of ADHD symptoms and impairment during childhood and adolescence. Current Psychiatry Reports, 19, 95. https://doi.org/10.1007/s11920-017-0853-z
Park, J. L., Hudec, K. L., & Johnston, C. (2017). Parental ADHD symptoms and parenting behaviors: A meta-analytic review. Clinical Psychology Review, 56, 25–39. https://doi.org/10.1016/j.cpr.2017.05.003
*Pheula, G. F., Rohde, L. A., & Schmitz, M. (2011). Are family variables associated with ADHD, inattentive type? A case–control study in schools. European Child & Adolescent Psychiatry, 20, 137–145. https://doi.org/10.1007/s00787-011-0158-4
Polanczyk, G., de Lima, M. S., Horta, B. L., Biederman, J., & Rohde, L. A. (2007). The worldwide prevalence of ADHD: A systematic review and metaregression analysis. American Journal of Psychiatry, 164, 942–948. https://doi.org/10.1176/appi.ajp.164.6.942
Polanczyk, G. V., Salum, G. A., Sugaya, L. A., Caye, A., & Rohde, L. A. (2015). Annual research review: A meta-analysis of the worldwide prevalence of mental disorders in children and adolescents. Journal of Child Psychology and Psychiatry, 56, 345–365. https://doi.org/10.1111/jcpp.12381
Schmidt, S., & Petermann, F. (2009). Developmental psychopathology: Attention deficit hyperactivity disorder (ADHD). BMC Psychiatry, 9, 58–68. https://doi.org/10.1186/1471-244X-9-58
*Sochos, A., & Yahya, F. (2015). Attachment style and relationship difficulties in parents of children with ADHD. Journal of Child and Family Studies, 24, 3711–3722. https://doi.org/10.1007/s10826-015-0179-6
Spanier, G. B. (1988). Assessing the strengths of the Dyadic Adjustment Scale. Journal of Family Psychology, 2, 92–94. https://doi.org/10.1037/h0080477
Theule, J., Wiener, J., Tannock, R., & Jenkins, J. M. (2013). Parenting stress in families of children with ADHD: A meta-analysis. Journal of Emotional and Behavioral Disorders, 21, 3–17. https://doi.org/10.1177/1063426610387433
Turgay, A., Goodman, D. W., Asherson, P., Lasser, R. A., Babcock, T. F., Pucci, M. L., & Barkley, R. (2012). Lifespan persistence of ADHD: The life transition model and its application. Journal of Clinical Psychiatry, 73, 192–201. https://doi.org/10.4088/JCP.10m06628
Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent effect sizes. Behavior Research Methods, 45, 576–594. https://doi.org/10.3758/s13428-012-0261-6


Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36, 1–48.
von Sydow, K., Retzlaff, R., Beher, S., Haun, M. W., & Schweitzer, J. (2013). The efficacy of systemic therapy for childhood and adolescent externalizing disorders: A systematic review of 47 RCT. Family Process, 52, 576–618. https://doi.org/10.1111/famp.12047
Williamson, D., & Johnston, C. (2016). Marital and coparenting relationships: Associations with parent and child symptoms of ADHD. Journal of Attention Disorders, 20, 684–694. https://doi.org/10.1177/1087054712471717
*Wymbs, B. T., Pelham, W. E. J., Molina, B. S. G., & Gnagy, E. M. (2008). Mother and adolescent reports of interparental discord among parents of adolescents with and without attention deficit/hyperactivity disorder. Journal of Emotional and Behavioral Disorders, 16, 29–41. https://doi.org/10.1177/1063426607310849
Wymbs, B. T., Pelham, W. E. J., Molina, B. S. G., Gnagy, E. M., Wilson, T. K., & Greenhouse, J. B. (2008). Rate and predictors of divorce among parents of youths with ADHD. Journal of Consulting and Clinical Psychology, 76, 735–744. https://doi.org/10.1037/a0012719
Wymbs, B. T., Wymbs, F. A., & Dawson, A. E. (2015). Child ADHD and ODD behavior interacts with parent ADHD symptoms to worsen parenting and interparental communication. Journal of Abnormal Child Psychology, 43, 107–119. https://doi.org/10.1007/s10802-014-9887-4
Zemp, M. (2018). Die elterliche Paarbeziehung in Familien mit Kindern mit ADHS: Wechselwirkungen zwischen Partnerschaftsstörungen und kindlicher Symptomatik [Interparental relationship in families with children with ADHD: Interrelations between couple distress and child’s symptoms]. Zeitschrift für Kinder- und Jugendpsychiatrie und Psychotherapie, 46, 285–297. https://doi.org/10.1024/1422-4917/a000558
Zemp, M., Bodenmann, G., Backes, S., Sutter-Stickel, D., & Bradbury, T. N. (2016). Positivity and negativity in interparental conflict: Implications for children. Swiss Journal of Psychology, 75, 167–173. https://doi.org/10.1024/1421-0185/a000182
Zemp, M., Nussbeck, F. W., Cummings, E. M., & Bodenmann, G. (2017). The spillover of child-related stress into parents’ relationship mediated by couple communication. Family Relations, 66, 317–330. https://doi.org/10.1111/fare.12244

History
Received February 27, 2018
Revision received October 26, 2018
Accepted October 28, 2018
Published online March 29, 2019

Acknowledgments
We thank our research assistant Nicole Groenhagen for helping with the coding procedure.

Martina Zemp
Department of Psychology
University of Vienna
Renngasse 6–8
1010 Vienna
Austria
martina.zemp@univie.ac.at



Review Article

Intra-Individual Value Change in Adulthood
A Systematic Literature Review of Longitudinal Studies Assessing Schwartz’s Value Orientations

Carolin Schuster¹, Lisa Pinkowski¹, and Daniel Fischer¹,²

¹ Institute of Psychology, Leuphana University, Lüneburg, Germany
² School of Sustainability, Arizona State University, Tempe, AZ, USA

Abstract: Values guide people in their lives as overarching principles of judgments and decision making. Focusing on Schwartz’s circumplex value model, the present work is the first systematic literature review (SLR) to comparatively synthesize the empirical evidence regarding stability and change of values in adulthood. Besides understanding the extent of value change, the aim of this review is to reveal the conditions under which values change. The search procedure and screening revealed 19 publications reporting empirical studies on 25 adult samples containing at least two measurements of Schwartz’s values in respondents. Results suggest moderate to high rank-order stabilities of values, even through potentially life-changing transitions. There is evidence of small changes, rarely consistent with theoretical predictions or cross-sectional findings. Preliminary experimental evidence shows that values can be changed with interventions. We identify considerable gaps in knowledge about value change and propose promising avenues for further research.

Keywords: value change, value stability, value profiles, Schwartz’s value theory, longitudinal studies

Zeitschrift für Psychologie (2019), 227(1), 42–52 https://doi.org/10.1027/2151-2604/a000355

Values represent “guiding principles in people’s lives” (Schwartz & Bardi, 2001, p. 269) that are thought to organize more context-specific attitudes and goals and predict various behaviors; for example, voting (Vecchione, Caprara, Dentale, & Schwartz, 2013) or pro-environmental behavior (Thøgersen & Ölander, 2002). Some values – namely, those related to helping or protecting others – are seen as more socially desirable than others. They are considered to be moral values (see Schwartz, 2007). Societies thus have an interest in promoting such values. However, it remains an open question whether, to what extent, and under what conditions values change in adults. Typically, the formation of a stable value system is seen as a developmental process during childhood and adolescence (see Knafo & Schwartz, 2010). Developmental theories also focus on how children and adolescents acquire moral reasoning abilities through several stages culminating in a mature state (Kohlberg & Kramer, 1969; Piaget, 1948). While values are conceptually different from moral reasoning, these developmental stages could be understood as the increasing adoption of moral, pro-social values, which then stabilize in early adulthood. Previous cross-sectional research has shown correlations of these kinds of values with age (Schwartz et al., 2001), possibly resulting from individual development. However, given the lack of longitudinal data it could also be a difference between age cohorts. While there does not seem to be much longitudinal research on value change in adults, there is a theoretical model that posits that value change can happen in either direction by processes similar to attitude change (Bardi & Goodwin, 2011). To a large extent, however, this model is not based on empirical research on value change but on analogy to other well-known processes (e.g., consistency maintenance). In order to clarify whether values change in adulthood, a systematic synthesis of existing findings on the changeability of values is needed. This paper undertakes a systematic literature review (SLR) to present such a synthesis. First, we outline the construct and measurement of basic personal values. Then, we describe the method of the SLR and summarize the resulting sample of studies. Finally, we discuss the findings with regard to conclusions on the stability and change of values and conclude by suggesting next steps for closing research gaps.

© 2019 Hogrefe Publishing

C. Schuster et al., Value Change in Adulthood

Values as a Psychological Construct

According to previous reviews on values (Schwartz, 1994; Schwartz & Bilsky, 1987), a value can be defined as “a (1) belief (2) pertaining to desirable end states or modes of conduct, that (3) transcends specific situations, (4) guides selection or evaluation of behavior, people, and events, and (5) is ordered by importance relative to other values to form a system of value priorities” (Schwartz, 1994, p. 20). Previous theories differ less in their definitions of values than in which values are considered relevant and in how their relationship is conceptualized (Rokeach, 1973; Schwartz & Bilsky, 1987; Vernon & Allport, 1931).

Psychological research on values has a long history (Vernon & Allport, 1931). One of the currently best theoretically founded and empirically validated models of values is the circumplex model of Schwartz (Schwartz, 1992, 1994; Schwartz & Bilsky, 1987). Based on extensive cross-cultural studies, the model proposes a universal set of 56 specific values clustered into 10 distinct value types. These specific values, and accordingly the value types, are arranged in a circular pattern on two orthogonal value dimensions, with four value clusters depicting the poles of these dimensions. The model predicts that values or value types on opposite sides of the circle correlate negatively, whereas neighboring values correlate positively. For example, people who prioritize having power also tend to care about achievement but less about universalism (e.g., values of equality and unity with nature). Due to its widespread use and comprehensiveness, this SLR focuses exclusively on Schwartz’s approach.
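The circular prediction described above (neighboring value types correlate positively, opposing ones negatively) can be illustrated with a small sketch. It rests on two assumptions that do not come from this review: the circular order of the 10 value types follows the commonly depicted Schwartz circle, with tradition and conformity given separate, equally spaced positions for simplicity, and the cosine of the angular distance is used only as an idealized stand-in for the expected sign and relative size of an association.

```python
import math

# Assumed circular order of the 10 value types (simplification: tradition and
# conformity get separate, equally spaced positions on the circle).
VALUE_TYPES = [
    "universalism", "benevolence", "tradition", "conformity", "security",
    "power", "achievement", "hedonism", "stimulation", "self-direction",
]


def angular_distance(a, b):
    """Smallest angle (in degrees) between two value types on the circle."""
    step = 360 / len(VALUE_TYPES)
    diff = abs(VALUE_TYPES.index(a) - VALUE_TYPES.index(b)) * step
    return min(diff, 360 - diff)


def expected_association(a, b):
    """Idealized expected association: cosine of the angular distance.

    Positive for neighbors, negative for value types on opposite sides.
    This is an illustrative assumption, not an estimate from any study.
    """
    return math.cos(math.radians(angular_distance(a, b)))


neighbors = expected_association("power", "achievement")
opposites = expected_association("power", "universalism")
```

Under these assumptions, the power/achievement pair receives a positive expected association and the power/universalism pair a negative one, mirroring the example in the text.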


Research Questions

In summary, there are various reasons to assume that values are relatively stable over time: by definition, they transcend situations; they are overarching, abstract principles in a person’s belief system; they are deeply engrained in the individual’s sense of identity (see Hitlin, 2003); and they might be the consequence of a developmental process completed by adulthood. However, it is also plausible to expect that life experiences or transitions, the continuous influence of other people, or even maturation will affect a person’s basic value priorities. Given societal interest in value change and the lack of a reliable synthesis of empirical evidence on value change, this review aims to answer two research questions with regard to a general population:

1. How intra-individually stable are basic human values over time in adulthood?
2. To what extent and under which conditions do intra-individual changes in values occur in adulthood?

Method

To answer these research questions, an SLR was conducted. An SLR is an “explicit, comprehensive, and reproducible method for identifying, evaluating, and interpreting the existing body of original work produced by researchers and scholars” (Fink, 2014, p. 36). The steps of this method are summarized in Figure 1.

Measuring Values

There are several ways to measure the values proposed by Schwartz. The original measure, the Schwartz Value Survey (SVS), asks participants to rate the importance of each of the 56 values on a 9-point scale ranging from –1 (= opposed to my values) to 0 (= not important) and finally 7 (= of supreme importance). Schwartz (2013) recommends correcting participant ratings for response tendencies (i.e., their mean rating). A short 12-item version of this scale has been proposed by Stern, Dietz, and Guagnano (1998). The Portrait Values Questionnaire (PVQ) was developed as a measure more suitable for some samples (Schwartz et al., 2001). It contains 40 short descriptions of persons, each of which is rated based on its similarity to the respondent on a 6-point scale (1 = very much like me to 6 = not like me at all). There is also a 21-item short version. Another survey, the Schwartz Values Best-Worst Survey (SVBWS), was developed by Lee, Soutar, and Louviere (2008), with respondents asked to choose one value item as the most important and another one as the least important from different sets of values constructed from all 56 value items.
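The correction for response tendencies mentioned above can be sketched as a simple within-person centering: each respondent's mean rating is subtracted from each of their ratings. This is a minimal illustration under that assumption only; the exact procedure recommended by Schwartz (2013) may differ in detail, and the function name and the ratings below are hypothetical.

```python
from statistics import mean


def center_ratings(ratings):
    """Subtract a respondent's mean rating from each of their value ratings.

    `ratings` maps value names to raw importance ratings for one person.
    The result expresses each value's importance relative to that person's
    overall tendency to give high or low ratings.
    """
    person_mean = mean(ratings.values())
    return {value: score - person_mean for value, score in ratings.items()}


# Hypothetical respondent with four rated values (not data from any study).
raw = {"power": 2, "achievement": 4, "benevolence": 6, "universalism": 4}
centered = center_ratings(raw)
```

After centering, the ratings of each respondent sum to zero, so two people who differ only in how generously they use the scale end up with the same value priorities.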


Search Strings and Inclusion Criteria

We chose SCOPUS as a major database for peer-reviewed psychological research to extract an initial database sample. The search was restricted to peer-reviewed journal articles written in English or German in the field of psychology. The search strings were chosen to find as many relevant papers as possible while producing a minimum number of irrelevant papers. We searched for the combined terms “value change” OR “values change” OR “change of values” OR “value stability” OR “stability of values” in article titles, keywords, and abstracts. The original search was conducted in April 2017 and updated during the revision of the article (2018/06/26).

In order to attain a sample of comparable publications well suited to answering our research questions, we defined four inclusion and exclusion criteria. First, we looked for empirical studies with at least two temporally distinct value measurements of the same individuals. Second, only studies with a focus on general values as conceptualized by Schwartz were included. Third, values had to be measured with a scale based on a version of Schwartz’s circumplex model to ensure comparability and conceptual clarity. Fourth, the samples had to be of the general adult population (18 years or older and without disabilities or clinical conditions).

Screening Procedure

Two screening procedures were applied to all publications identified in the iterative search process. In the practical screening, the titles and abstracts (if available) of the initial sample of articles were read and those not fulfilling the criteria were excluded. Next, in the in-depth screening (eligibility check) the full texts of the remaining papers were examined and further irrelevant papers were excluded. Out of the papers identified through the initial database search (n = 160) and the hand search (n = 2), n = 42 papers passed the first screening step. Of these papers, n = 13 passed the second screening step (eligibility check) and formed the preliminary sample. To enhance the comprehensiveness of the preliminary sample, a citation search and a reference search were conducted using the Scopus database. The papers identified through citation search (n = 153) yielded four additional articles fitting the inclusion criteria of both screening steps. In the reference search, another n = 45 papers were identified, with one fitting the inclusion criteria of both screening steps. The preliminary sample together with the five articles identified through citation and reference searches formed the pre-final sample (n = 18), which was then reviewed by external experts chosen among the authors most frequently appearing in the pre-final sample. Five authors were contacted (see Acknowledgments), all of whom replied. These experts suggested four additional papers, one of which fitted the inclusion criteria, resulting in a final sample of N = 19 articles. The screening protocol and the PRISMA statement are available as supplemental material at https://doi.org/10.17605/OSF.IO/E73CA or http://dx.doi.org/10.23668/psycharchives.926.

Figure 1. PRISMA flowchart of the search and screening procedure.

Results and Discussion

Categorization of the Final Sample of Studies

The 19 articles yielded a total of 27 relevant reported studies, all of which were published within the last two decades (see Table 1). Two articles presented studies on participant samples already analyzed in another article, thus reducing the total number of relevant studies to 25. The studies were categorized into three groups according to their design and focus. Five studies from four articles were general longitudinal studies, observing stability or changes of values in the general population over a period of 1–8 years. Twelve studies from ten articles were longitudinal transition studies of value changes in the context of certain potentially life-changing transitions or environmental changes. Nine experimental studies from five articles were intervention studies testing the effects of specific interventions to initiate value change.

Furthermore, the studies used different methods to relate the measurement points to each other. One common way was to determine mean-level changes in values over two time points. A second common way was to examine rank-order stability for each value within a sample over two time points. A third less frequently used approach was to examine the stability of value profiles (intra-individual correlations of value rankings) over two time points. On the group level, the mean profile stability thus indicates how stable average individual priorities within the whole set of values are across time. Besides indicators of stability and change, for each study the population, sample size, and, if available, theoretical reasoning will be reported in order to provide some general indicators of interpretability or potential bias of the results.
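The three indicators just described can be made concrete with a short sketch. The data are hypothetical (four persons, three value types, two time points), Pearson correlations are assumed for both rank-order and profile stability, and raw ratings stand in for the value rankings mentioned above; individual studies in the sample may have computed these indicators differently.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def column(data, j):
    """Ratings of all persons on the j-th value type."""
    return [row[j] for row in data]


# Hypothetical ratings: rows are persons, columns are value types.
VALUES = ["power", "benevolence", "security"]
t1 = [[2, 6, 4], [5, 3, 4], [1, 5, 6], [4, 4, 2]]
t2 = [[3, 6, 4], [5, 2, 5], [2, 5, 5], [4, 5, 2]]

# 1. Mean-level change: difference of sample means per value type.
mean_level_change = {
    v: mean(column(t2, j)) - mean(column(t1, j)) for j, v in enumerate(VALUES)
}

# 2. Rank-order stability: correlation across persons, per value type.
rank_order_stability = {
    v: pearson(column(t1, j), column(t2, j)) for j, v in enumerate(VALUES)
}

# 3. Profile stability: correlation across value types, per person.
profile_stability = [pearson(p1, p2) for p1, p2 in zip(t1, t2)]
mean_profile_stability = mean(profile_stability)
```

The sketch shows why the indicators can diverge: a value type can shift in its sample mean while persons keep their relative positions, and a sample with high average profile stability can still contain individuals whose priorities changed considerably.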

Table 1. Final sample of papers in the systematic literature review

| Reference | N | Age M (SD) | Time span | External influence |
|---|---|---|---|---|
| General longitudinal studies | | | | |
| 1. Dobewall and Aavik (2016) | 53 | 26 (9) | 3 years | (College graduation) |
| 2. Milfont et al. (2016) | 3,962/5,156 | 49 (15) | 3 years | |
| 3. Thøgersen and Ölander (2002) | 1,090 | > 17 | 1 year | 30%: New waste disposal |
| 4. Vecchione et al. (2016) | 107 | All 21–22 | 8 years | (Transition to adulthood) |
| Longitudinal transition studies | | | | |
| 5. Bardi et al. (2009) (Study 2) | 129 | 20 (4) | 1 year | College |
| 6. Bardi et al. (2009) (Study 3) | 119 | 20 (4) | 3 months | College |
| 7. Bardi et al. (2009) (Study 4) | 135 | 39 (12) | 2 years | Stressful life events |
| 8. Myyry et al. (2013) | 132 | 26 (7) | 3 years | College |
| 9. Bardi et al. (2014) (Study 1) | 81 | 27 (7) | 9 months | Police training |
| 10. Bardi et al. (2014) (Study 2) | 131/65 | 18/19 (3/1) | 2 years | Psychology/business major |
| 11. Bardi et al. (2014) (Study 3); see also Goodwin et al. (2011) | 151 | 27 (7) | 1.5 years | Migration |
| 12. Lönnqvist et al. (2011, 2013) | 136 | 44 (14) | 2.5 years | Migration |
| 13. Sundberg (2016) | 129 | In their 20s | 6–7 months | Deployment to war zone |
| 14. Vecchione et al. (2013) (Study 1) | 1,030 | 44 (18) | 7 weeks | Election |
| 15. Bègue and Apostolidis (2000) | 53 | 20 | 3 months | National involvement in war |
| 16. Lönnqvist et al. (2018) | 292 | women: 32 (4)/men: 34 (34) | < 1 year | Birth of first child |
| Experimental intervention studies | | | | |
| 17. Arieli et al. (2014) (Experiment 1) | 36 | 19 (1) | s. s. | Self-persuasion task |
| 18. Arieli et al. (2014) (Experiment 2) | 48 | 23 (2) | 2 weeks | Self-persuasion task |
| 19. Arieli et al. (2014) (Experiment 3) | 58 | | 4 weeks | Self-persuasion task |
| 20. Bernard et al. (2003) (Experiment 1) | 100 | 21 | s. s. | Reasoning task |
| 21. Maio and Olson (1998) (Experiment 1) | 77 | Students | s. s. | Reasoning task |
| 22. Maio and Olson (1998) (Experiment 2) | 138 | Students | s. s. | Reasoning task |
| 23. Maio and Olson (1998) (Experiment 3) | 144 | Students | s. s. | Reasoning task |
| 24. Maio et al. (2009) (Experiment 1) | 175 | Students | s. s. | Directed reasoning task |
| 25. Hirose (2004) | 135/48 | 19 | s. s./3 months | Anticipating actualization |

Note. Studies are presented and numbered in order of appearance in the SLR results section. Number 12 has two references because follow-up analyses of the same sample were published separately. Number 11 has two references because both studies use the same participant sample and fit inclusion criteria; however, the stability and change statistics are reported in Bardi et al. 2009, Study 3. N refers to the number of participants who completed all measurement points. Time span refers to the difference between the first and the last measurement points (s. s. = same session). External influences in parentheses are not reported as such by authors, but could be derived from the sample description.

General Longitudinal Studies: How Stable Are Values in General?

Four empirical studies examined the longitudinal change of values without any interventions or specific variations of conditions between measurement points. These studies provide some information about the general stability of values and possible aging effects.

Vecchione and colleagues (2016) surveyed the longest period (8 years; N = 107). Analyses of stability and change were, as in most studies, conducted on the level of value types. Even over an 8-year period, values were considerably stable, with an average correlation coefficient of r = .66. This is comparable to the rank-order stability of personality traits in adulthood (Roberts & DelVecchio, 2000). The least stable value type was power (r = .51) and the most stable was self-direction (r = .82). Despite the high stability of participants’ ranks in the sample, the means of all value types except the three in the openness to change category (self-direction, stimulation, and hedonism) changed significantly over the 8-year period. Self-transcendence and conservation values increased, whereas power and achievement values decreased. However, most changes were small, and they often occurred between the first and second measurements. The most interesting finding in this study may concern the stability of individual profiles, which ranged from r = –.30 to .89, M = 0.59, SD = 0.25. While most people’s profiles were at least moderately stable, 5% of all participants had profile correlations below r = .14. This shows that the extent of change in value priorities varies highly among individuals.

The only other study covering all three indicators examined a smaller sample (N = 53) over the course of 3 years (Dobewall & Aavik, 2016). The average rank-order stability of value self-reports was r = .50, ranging from r = .21 (ns, tradition) to r = .65 (conformity). Mean levels in conformity values slightly decreased. The mean stability of value profiles was M = 0.67, SD = 0.28, which is similar to the findings of Vecchione and colleagues.

The largest longitudinal sample stems from a countrywide New Zealand four-wave study (Milfont, Milojev, & Sibley, 2016). Different Bayesian analyses were used to examine two out of four measurements, with the respective sample sizes being 3,962 for the 3-year rank-order stability analysis and 5,156 for the 4-year mean-level change (i.e., latent growth models). The sample is also uniquely diverse in age (Min = 25, Max = 75, M = 50, SD = 15). The measures on the level of the value clusters conservation, openness to change, self-transcendence, and self-enhancement all showed moderate rank-order stabilities, with Bayesian point estimates ranging from β = .55 to .60. In addition, there were age effects on the stability of conservation values (but not of other value clusters), which became more stable in early adulthood, slightly less stable between the ages of 40 and 60, and then more stable again. The analyses of mean-level changes showed that all value clusters decreased significantly in importance between measurements, which might be an artifact (Shrout et al., 2017).

Finally, another large sample (N = 1,090) was surveyed twice over a 1-year period (Thøgersen & Ölander, 2002) but only with regard to self-transcendence and self-enhancement values (namely, benevolence, universalism, power, achievement, and hedonism). The 1-year stability of these values was high, ranging from r = .56 (hedonism) to r = .68 (universalism). There were no significant mean-level changes on the level of the value types, and the reported single-item changes might be spurious. Interestingly, the stability of universalism in the fraction of the sample with more opportunity to show universalism-consistent environmentally friendly behavior (participants from an area where a new recycling system had been introduced) tended to be even higher than for the rest.

In summary, this type of study provides convincing evidence of moderate-to-high rank-order stability of value types. While mean-level changes were significant in three out of four studies, there is no recognizable pattern and the changes were rather small. In addition, individual value profile stability was high on average but ranged widely.

Longitudinal Transition Studies: How Stable Are Values Throughout Life Change?

One approach to examining value change and its potential causes is to accompany participants through life transitions (i.e., measurements before and after potentially relevant events). Eleven studies in our final sample used this approach. One study (Bardi et al., 2009, Study 4) measured the extent of individual life-changing events (e.g., death of a spouse) between the two value measurements (N = 135). The stability of values over 2 years was lower than found previously with other measures, ranging from r = .26 (power) to .58 (self-direction). The only significant mean-level change in the sample was an increase in the importance of hedonism. A multiple regression analysis of absolute change in all values showed that the extent of life-changing events was a significant predictor, whereas age was not significant (R² = 0.08).

Educational Transitions

Five studies examined value change in the context of the transition to higher education or vocational training. Going to college is a life-changing event for many young adults. Two studies by Bardi and colleagues (2009; Studies 2 and 3) sampled university students at the beginning of their first year and again at the beginning of their second year (N = 129) or after 3 months (N = 119), respectively. The rank-order stability over a year (Study 2) ranged from r = .50 (conformity/achievement) to r = .70 (universalism) and over 3 months (Study 3) from r = .48 (benevolence) to r = .76 (universalism). In Study 2, the mean of benevolence decreased and that of power increased. In Study 3, means of universalism and power values increased. The authors also conducted further analyses that largely supported their hypothesis that intra-individual value changes (difference scores) occur in line with the circumplex structure of the value model.

A similar study examined students at the beginning of their bachelor programs and 3 years later (Myyry, Juujärvi, & Pesso, 2013; N = 132). The 3-year rank-order stabilities of value types were about as high as in the general longitudinal samples, ranging from r = .59 (hedonism) to r = .78 (universalism). There were small but significant decreases in achievement values and increases in universalism and security values.
In another study, Bardi and colleagues (2014, Study 2) hypothesized more specifically that psychology students would be socialized to endorse benevolence and universalism, while business students endorsed power and achievement. The values of students (131 psychology majors and 65 business majors) were measured at the beginning of the first, second, and third years of their studies. However, while there was evidence for value-based self-selection (psychology students valued universalism and benevolence higher and power lower than business students at T1), the mean-level changes of the two groups did not support the socialization hypothesis. Specifically, both groups decreased in conformity, and psychology students increased very slightly in stimulation, hedonism, and security values (ds < 0.05). This contradicts the assumption of value socialization throughout college. In line with that, another study (Bardi et al., 2014, Study 1) testing police trainees at the beginning and end of a 9-month training period (N = 81) found that there was not, as a socialization hypothesis would predict, an increase in conformity and power and decrease in self-direction values, nor did any other values change significantly.

In summary, in five studies no consistent patterns of mean-level value change were found resulting from the transition to college or vocational training. There is also no clearly recognizable pattern pointing to a specific susceptibility of important or unimportant values to change.

Migration as Transition

Another transition that has been examined with regard to value change is migration to another country. One such study (Bardi et al., 2014, Study 3), which examined value-based self-selection versus socialization in transition, tested Polish immigrants to Britain within 3 months after arrival and then two more times in 9-month intervals (N = 151). Based on country differences in values (as reported in public databases), the socialization hypothesis would predict an increase in self-direction, stimulation, hedonism, and benevolence, along with a decrease in tradition, conformity, security, and power values. However, only self-direction increased significantly. Power values also increased, an ambiguous finding, given that the migrants at the beginning had not only lower ratings with regard to power values than their fellow countrymen but also lower than the British. Another analysis of the same sample (R. Goodwin, Polek, & Bardi, 2011) shows that the belief that human behavior is highly variable and context sensitive predicts increases in universalism and self-direction values and decreases in tradition values.

A similar study (Lönnqvist, Jasinskaja-Lahti, & Verkasalo, 2011, 2013) compared Ingrian-Finnish migrants from Russia to Finland before and 3–15 months after migration (Lönnqvist et al., 2011; N = 145), and again 13–28 months after migration (Lönnqvist et al., 2013; N = 136).
The findings at the second measurement indicate a significant increase in universalism and security, and a decrease in power and achievement values. The authors explain these changes in their hypotheses with intergroup contact (universalism), stress or threat of migration (security), and downgrading as a response to discrimination (achievement and power). As reported in the follow-up article (Lönnqvist et al., 2013), individual values at the third measurement tended to rebound to their original rating. Interestingly, the two studies on migration propose different hypotheses about underlying processes (socialization in host country versus adaptation to stress) and the direction of value change in migrant populations.

Deployment to a War Zone

Another interesting analysis of value change comes from Sundberg (2016), who sampled Swedish ISAF soldiers (N = 129) before deployment to Afghanistan and again after their return 6 months later. In addition to values, measurements included the Big Five personality traits and the extent of combat exposure during the tour. Rank-order stabilities in this study were on average r = .82, with tradition as by far the least stable (r = .57) and benevolence and conformity the most stable (r = .92). Mean-level change was analyzed cross-sectionally only; however, value change at the individual level was calculated in the form of the reliable change index, a measure that compares an individual’s change score to an expected distribution of scores if no actual change were to take place. This analysis shows that the importance of at least one value changed for approximately 80% of the respondents. However, it also shows that for the vast majority of participants each value is stable and that increases and decreases in value importance are balanced. The individual’s profile stability reached a mean of r = .75 (SD = 0.22), which further indicates that value priorities stayed mostly unchanged. A striking type of analysis in this study involved the graphic visualization of change patterns. It indicates that changes mostly occurred toward the group mean. Finally, regression analyses of combat experiences and personality as predictors of value change show that the former was only marginally significant, and only with regard to whether or not change occurred at all. Personality traits, on the other hand, predicted the magnitude of value change for conscientiousness and emotional stability negatively and for openness positively. Despite the participants’ relatively extreme experiences, the stability indicators in this study were among the highest of all studies. In addition, the small changes that occurred are not as systematic and predictable as one might expect under similar external pressures.
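The reliable change index mentioned above can be sketched as follows. The sketch assumes the widely used formulation attributed to Jacobson and Truax, in which an individual's change score is divided by the standard error of the difference expected from measurement error alone; whether Sundberg (2016) used exactly this variant is not stated here, and all numbers, names, and the 1.96 cutoff in the example are hypothetical choices.

```python
from math import sqrt


def reliable_change_index(score_t1, score_t2, sd_t1, reliability):
    """Change score divided by the standard error of the difference.

    Assumed formulation: SEM = SD * sqrt(1 - reliability) and
    S_diff = sqrt(2) * SEM, i.e., the spread of difference scores that
    measurement error alone would produce if no true change occurred.
    """
    sem = sd_t1 * sqrt(1 - reliability)  # standard error of measurement
    s_diff = sqrt(2) * sem               # standard error of the difference
    return (score_t2 - score_t1) / s_diff


def classify_change(rci, critical=1.96):
    """Label a change as reliable if it exceeds the two-sided 5% cutoff."""
    if rci > critical:
        return "increase"
    if rci < -critical:
        return "decrease"
    return "no reliable change"


# Hypothetical respondent: one value type rated 4.0 before and 5.5 after.
rci = reliable_change_index(score_t1=4.0, score_t2=5.5, sd_t1=1.0, reliability=0.75)
label = classify_change(rci)
```

Applying such a classification to every value type of every respondent gives counts of reliable increases and decreases per value, which is the kind of summary the paragraph above describes.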
Becoming a Parent

One study (Lönnqvist, Leikas, & Verkasalo, 2018, Study 2) examines how values change during the role transition of becoming a parent for the first time. Their sample of 292 participants (146 couples) reported their values during the pregnancy and on average 3.3 months after birth of their child. Only mean-level changes on the value dimensions were reported, finding a small but significant shift toward conservation values in new mothers (but not fathers) and no significant change on the self-transcendence/self-enhancement dimension. The authors emphasize that having a child is a “prototypical example of the type of event that would be expected to induce value change” (p. 50).

Political Events

Finally, two more studies were included in the category of transitional studies. However, the external life changes in these studies are far subtler than in the other studies. Bègue and Apostolidis (2000) examined the values of 56 female French undergraduates before and during the Balkan war (in which the French army participated). They hypothesized an increase in conformity and security values but in a survey after the war found an increase only in universalism and stimulation values. However, these results have to be taken with caution, given several methodological and/or reporting issues in this paper (e.g., the scale means are only partially reported and are higher than the scale maximum).

The external event between measurements in Vecchione and colleagues’ (2013) study was an election in which the participants (N = 1,030) voted. The research question was concerned with reciprocal effects between values and voting behavior (center-right or center-left coalition). They do not report mean-level change. The prediction of values at T2 by T1, an indicator for rank-order stability, was very high, with estimates ranging from β = .75 (benevolence) to β = .87 (tradition). As hypothesized, several basic values predicted the vote, but the vote did not reciprocally affect values.

In summary, the longitudinal study of basic values under a variety of potentially influencing conditions shows little evidence of systematic change consistent with theoretically well-founded hypotheses. The exception might be that conservation values become more important to women after they become mothers (Lönnqvist et al., 2018). It may be that becoming a mother is a transition more similar across individuals, whereas, for example, migration involves more variable experiences and challenges, thus making it easier to predict a sample-wide direction of value change for new parents than for migrants. Otherwise, values are shown to be highly stable throughout transitions. In addition, except for the study by Sundberg (2016), only indicators of rank-order stability and mean-level change were examined in longitudinal transition studies, measures which are not designed to detect whether changes occur in individuals.
Their sample also shows high rank-order stability, with no significant mean-level changes, and a highly consistent value profile for most participants. Still, most participants changed their priorities somewhat, but in various different ways and depending on differential as well as situational factors. Of the two studies measuring the extent of relevant events, one shows it moderately predicts value change (Bardi et al., 2009, Study 4), and one shows it tends to (Sundberg, 2016), so this might be a worthwhile approach for further research.

Experimental Studies: Can Values Be Changed Intentionally?

Valuable insights into the processes of value change have also been gained by experimental studies, in particular a line of research by Gregory Maio and colleagues (Bernard, Maio, & Olson, 2003; Maio & Olson, 1998; Maio, Pakizeh, Cheung, & Rees, 2009). Based on the values-as-truisms hypothesis that people often hold values with little cognitive support, they developed an intervention requiring participants to write down reasons for a list of values. They then used elaborately disguised pre-post value measurements to test the effect of this intervention on items representing this specific value cluster itself, compared to other clusters, and compared to the control condition. Three experiments (Ns = 77/138/119) show that the absolute changes in relevant values (in this case self-transcendence) were higher than in the control condition (Maio & Olson, 1998) but not in irrelevant values (openness) and only if participants previously lacked cognitive support for their values (Study 2). This idea was extended in another experiment by systematically varying the value cluster in the reasoning intervention (Bernard et al., 2003; N = 100). The results show that value change occurs specifically on the value clusters about which participants in that experimental group reasoned. The reported changes in all four experiments were absolute changes, and so either an increase or a decrease in the respective values. However, the favorability of reasons (coded) for a value tended to correlate with the direction of change.

A slightly different intervention with a similar rationale (Maio et al., 2009; N = 175) was used to test the possibility of changing values in a specific direction. Here, participants in the experimental condition were given bogus rankings of their peer groups’ mean values, with either self-transcendence, self-enhancement, openness, or conservation values ranked highest. Then they had to compare their own rankings with the bogus rankings, read a positive-sounding explanation about the values and the people who hold them, and then write a short explanation of their value choice. The control group performed a memory task.
A mixed model analysis of the interaction effects shows that the intervention led to an increase in the importance of the values within a given cluster and a decrease in the opposite value cluster in the circumplex model (e.g., a self-enhancement value intervention increased self-enhancement and decreased self-transcendence values).

With the objective of changing specific values in a specific direction (increasing benevolence values), Arieli, Grant, and Sagiv (2014) tested a persuasion intervention in three experiments. In line with theoretically postulated facilitators of value change (Bardi & Goodwin, 2011), their 30-minute intervention contained elements of priming, consistency maintenance, and self-persuasion. All three experiments (Ns = 36/48/58) provided evidence for the benevolence-increasing effect of the intervention. Experiment 3 showed that the effect lasted until 4 weeks after the experimental session. On a methodological note, all experiments described so far construct elaborate cover stories to hide the intention of changing values.



A very different intervention was tested by Hirose (2004) in an experiment where participants first rated their values and then completed a second test that constituted the manipulation. In this test, they either rated their anticipated happiness if they hypothetically “actualized” a value (e.g., supporting environmental protection or becoming a millionaire) or estimated the degree of gender inequality with regard to the same items. The hypothesis was that actualization of values increases their importance. The analyses (probably regression analyses of T2 on T1 values separately for each condition; N = 140) suggest that 16 of the 20 specific values increased in importance in the experimental condition but only two increased in the control condition. In a follow-up with only the experimental condition (N = 50), only two values were found to be still more important compared to pretest values. All experimental studies used student samples.

In summary, experimental research on value change shows that interventions involving cognitive justification of value importance lead to at least moderate, consistent changes in values. There is only some evidence that values can be influenced in a specific direction, and only one study attempted this with the complete spectrum of Schwartz’s values. In addition, it is not yet clear how long these effects last.

Discussion

Summary: What Do We Know About Value Stability and Change?

Concerning the first research question, we conclude that there is good evidence for a moderate to high rank-order stability of the ten basic value types in Schwartz’s circumplex model over time, even over several years. The actual stability might even be underestimated, as measures of value types tend to have lower internal consistencies than conventional norms of reliability prescribe, especially if short scales are used (L. D. Goodwin & Leech, 2006; Schwartz, 2013). When intra-individual profile stability is examined, the results show that most people retain value priorities over time. Despite this finding, correlations of value profiles over time seem to vary considerably between persons, pointing to the possibility of value change (Studies 1 and 4; here and in the following, the studies are referenced by the study number in Table 1). With regard to the second research question, most studies explicitly examine how the importance of each value type changes in their sample. There are three broad categories of changes that have been examined. The first comprises changes that take place over time in general. A possible theoretical


explanation for mean-level changes in these studies is aging, although the times between measurements exceeded 3 years in only one study. The changes, if significant, are mixed in direction and small in size. Previous cross-sectional research has found that age positively correlates with self-transcendence values and negatively with self-enhancement values (Schwartz, 2005), which is supported by cross-sectional analyses in Study 2. However, only Study 4 provides consistent evidence of intra-individual change in this direction. Second, several studies examine changes through life transitions, most of them following one of two rationales. One line of studies examines whether changes of social context lead to value socialization but found little evidence that people become more similar to the social context to which they transition (Study 8); rather, people seem to self-select into settings that fit their values (Studies 5, 6, 9, 10, and 11). A second (more or less explicit) theoretical rationale for value change considers transitions as difficult, confusing, and possibly threatening situations that might trigger a re-evaluation of values (Studies 7, 12, 13, 15, and 16). With the exception of becoming a parent (Study 16), these transitions did not point to a consistent pattern of change (Studies 13 and 15). If changes in line with hypotheses were found, a follow-up measurement showed a rebound to the original value pattern (Study 12). This sparsity of evidence predicting value change through life transitions might be at least partially attributable to the individual nature of value change. Changes of group means might thus not be informative about individual value change. For instance, some people might react to their transition to college, with its social and educational opportunities and challenges, by valuing stimulation and openness more, while others might prioritize universalist responsibilities.
If the needs and challenges related to the new situation show little variance across individuals, as may be the case with becoming a mother, then mean-level change may be more likely (Study 16). The third approach involves using experimental studies to determine when and how changes of values might happen. An interesting theoretical concept is that values are often “truisms”, meaning that they feel true but it is difficult to explain why (Maio & Olson, 1998). Accordingly, as long as a value lacks cognitive support, being able to come up with good reasons for it might strengthen its importance, whereas not being able to do so may erode its importance (Studies 20–24, also 17–19 at least partially). Priming or providing reasons for a given value position leads participants to convince themselves of the importance of certain values or value types (Studies 17, 18, 19, and 24). However, if the purpose of the intervention was not carefully disguised, as it was in these studies, participants might rightfully feel


manipulated, and develop reactance. This would raise ethical problems for real-world applications to change values. With regard to the limitations of this SLR, the most important one may be its focus on Schwartz’s (1992) value model, excluding potentially informative studies based on older theories like Rokeach’s (1973). However, the theoretical comparability of the construct across studies seemed more important. A second limitation might be the effectiveness of the search strings. The fact that three articles were added from sources not found in the database search indicates that the search strings did not capture all relevant studies. However, trade-offs were required to limit the number of irrelevant papers. The risk of severe publication bias is low, at least with regard to general and transitional longitudinal studies. As the statistics we were interested in were often only a small part of the results, it is unlikely that publication would depend on their significance. In addition, most of the transitional longitudinal studies in the sample did not find the expected mean-level changes.

Implications and Research Agenda

This SLR reveals that basic value change is an emerging topic in psychological research. However, there are still large gaps in the research. Further research in the following areas seems crucial to narrowing these gaps. First, very few studies examine intra-individual profile stability and change. The data of longitudinal studies could be reanalyzed for ipsative profile correlations. In the only study reporting the full range of such correlations, there were also negative correlations (Study 4), meaning that some people’s value priorities tended to reverse. Getting a better grasp of the distribution of profile stability, as well as its predictors, could provide valuable insights into moderators of value change. Besides the profile correlations, indicators of individual-level change (e.g., difference scores) have been informative where reported (Studies 7 and 13), showing the relationship between life-changing events and value change (Study 7). Similarly, such indicators could be correlated in future research with, for instance, personality traits, environmental primes, incentives, or other possible facilitators of value change (Bardi & Goodwin, 2011; R. Goodwin et al., 2011). These indicators could also be used to compare groups with greater or lesser changes as well as to compare groups that changed in different directions under similar conditions. Second, and this holds for both longitudinal studies on transitions and for experiments, there is a need to better integrate findings into theoretical models of value change and describe causal relationships more clearly. The dual route model of value change (Bardi & Goodwin, 2011) is


a promising model as it describes several cognitive mechanisms by which value change can be facilitated (e.g., adaptation, consistency maintenance). The model integrates several processes involved in value change but remains vague as to the direction and boundary conditions of initial as well as long-term change. One boundary condition to value change via self-persuasion has already been identified by Maio and Olson (1998), namely existing cognitive support. We suggest that future researchers relate predictions to specific theoretical models, possibly refining the dual route model with better-supported theories on the effect of specific facilitators and moderators (e.g., strategies of consistency maintenance). Third, in the sample of studies the range of time between measurements varies from a few weeks to 3–4 years, with one outlier of over 8 years. Over such a short time period, aging effects on values cannot be examined, as they are more likely to occur over longer time periods or be mediated by changes in roles or experiences confounded with biological age (e.g., parenthood). In addition, most study samples consisted of adults in their twenties. The one large study with a better cross-sectional age variance points to variations in stability and change of values across age groups (Study 2). In addition, two studies indicating initial value changes in the context of migration (Study 12) or an intervention (Study 25) found at least partial rebound to the baseline in a follow-up measurement. Therefore, to learn more about the triggers and moderators of enduring value change, longer time intervals are needed between measurement points.
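The reanalysis proposed in the first point of this agenda (ipsative profile correlations and individual-level difference scores) can be illustrated with a minimal Python sketch. It is an illustration only: the respondent labels, the ratings, and the function names are hypothetical and are not taken from any of the reviewed studies. For each person, it correlates that person's ratings of the value types at two measurement points and also returns simple per-value difference scores.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long rating lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rating lists of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def profile_correlations(wave1, wave2):
    """Within-person (ipsative) profile correlation for every respondent."""
    return {pid: pearson(wave1[pid], wave2[pid]) for pid in wave1}


def difference_scores(t1, t2):
    """Change in each value rating from the first to the second wave."""
    return [b - a for a, b in zip(t1, t2)]


# Hypothetical ratings of ten value types on a 1-6 scale (illustration only).
wave1 = {
    "stable": [6, 5, 5, 4, 3, 2, 1, 2, 3, 4],
    "reversed": [6, 5, 5, 4, 3, 2, 1, 2, 3, 4],
}
wave2 = {
    "stable": [6, 5, 4, 4, 3, 2, 1, 2, 3, 5],
    "reversed": [1, 2, 2, 3, 4, 5, 6, 5, 4, 3],
}

if __name__ == "__main__":
    for pid, r in profile_correlations(wave1, wave2).items():
        print(pid, round(r, 2), difference_scores(wave1[pid], wave2[pid]))
```

In this toy data, the respondent labeled "reversed" has a strongly negative profile correlation, which corresponds to the kind of priority reversal mentioned for Study 4. The resulting per-person coefficients could then be related to possible predictors of value change.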

Conclusion

A systematic review of literature on value stability and change reveals an emerging interest in the topic as well as large gaps in the current state of published research. In summary, the high stability of basic values over time is not only theoretically plausible but also confirmed empirically, even though studies with greater measurement intervals are needed to better understand the role of age in value change. Studies on the development of values through life transitions remain inconclusive. This might at least partially be remedied by study designs that capture value change and its underlying processes on the individual level. Experimental studies imply that specific value types can be effectively targeted, but only a handful show effective change in a specific direction. In addition, it remains unclear how long value change triggered by interventions or external events persists. The results of this review should encourage researchers to intensify their efforts to provide further evidence on the conditions for changing values in adulthood.


References

Arieli, S., Grant, A. M., & Sagiv, L. (2014). Convincing yourself to care about others: An intervention for enhancing benevolence values. Journal of Personality, 82, 15–24. https://doi.org/10.1111/jopy.12029
Bardi, A., Buchanan, K. E., Goodwin, R., Slabu, L., & Robinson, M. (2014). Value stability and change during self-chosen life transitions: Self-selection versus socialization effects. Journal of Personality and Social Psychology, 106, 131–147. https://doi.org/10.1037/a0034818
Bardi, A., & Goodwin, R. (2011). The dual route to value change: Individual processes and cultural moderators. Journal of Cross-Cultural Psychology, 42, 271–287. https://doi.org/10.1177/0022022110396916
Bardi, A., Lee, J. A., Hofmann-Towfigh, N., & Soutar, G. (2009). The structure of intraindividual value change. Journal of Personality and Social Psychology, 97, 913–929. https://doi.org/10.1037/a0016617
Bègue, L., & Apostolidis, T. (2000). The 1999 Balkan war: Changes in ratings of values and pro war attitudes among French students. Psychological Reports, 86, 1127–1133. https://doi.org/10.1177/003329410008600309.2
Bernard, M. M., Maio, G. R., & Olson, J. M. (2003). Effects of introspection about reasons for values: Extending research on values-as-truisms. Social Cognition, 21, 1–25. https://doi.org/10.1521/soco.21.1.1.21193
Dobewall, H., & Aavik, T. (2016). Rank-order consistency and profile stability of self- and informant-reports of personal values in comparison to personality traits. Journal of Individual Differences, 37, 40–48. https://doi.org/10.1027/1614-0001/a000186
Fink, A. (2014). Conducting research literature reviews: From the Internet to paper. Thousand Oaks, CA: Sage.
Goodwin, L. D., & Leech, N. L. (2006). Understanding correlation: Factors that affect the size of r. The Journal of Experimental Education, 74, 251–266. https://doi.org/10.3200/JEXE.74.3.249-266
Goodwin, R., Polek, E., & Bardi, A. (2011). The temporal reciprocity of values and beliefs: A longitudinal study within a major life transition. European Journal of Personality, 26, 360–370. https://doi.org/10.1002/per.844
Hirose, H. (2004). Changes in values cognition after rating values-associated feelings. Japanese Psychological Research, 46, 115–120. https://doi.org/10.1111/j.0021-5368.2004.00242.x
Hitlin, S. (2003). Values as the core of personal identity: Drawing links between two theories of self. Social Psychology Quarterly, 66, 118–137. https://doi.org/10.2307/1519843
Knafo, A., & Schwartz, S. H. (2010). Identity formation and parent-child value congruence in adolescence. British Journal of Developmental Psychology, 22, 439–458. https://doi.org/10.1348/0261510041552765
Kohlberg, L., & Kramer, R. (1969). Continuities and discontinuities in childhood and adult moral development. Human Development, 12, 3–120. https://doi.org/10.1159/000270857
Lee, J. A., Soutar, G., & Louviere, J. (2008). The best–worst scaling approach: An alternative to Schwartz’s values survey. Journal of Personality Assessment, 90, 335–347. https://doi.org/10.1080/00223890802107925
Lönnqvist, J.-E., Jasinskaja-Lahti, I., & Verkasalo, M. (2011). Personal values before and after migration. Social Psychological and Personality Science, 2, 584–591. https://doi.org/10.1177/1948550611402362
Lönnqvist, J.-E., Jasinskaja-Lahti, I., & Verkasalo, M. (2013). Rebound effect in personal values. Journal of Cross-Cultural Psychology, 44, 1122–1126. https://doi.org/10.1177/0022022113480040
Lönnqvist, J.-E., Leikas, S., & Verkasalo, M. (2018). Value change in men and women entering parenthood: New mothers’ value priorities shift towards Conservation values. Personality and Individual Differences, 120, 47–51. https://doi.org/10.1016/j.paid.2017.08.019
Maio, G. R., & Olson, J. M. (1998). Values as truisms: Evidence and implications. Journal of Personality and Social Psychology, 74, 294–311. https://doi.org/10.1037/0022-3514.74.2.294
Maio, G. R., Pakizeh, A., Cheung, W.-Y., & Rees, K. J. (2009). Changing, priming, and acting on values: Effects via motivational relations in a circular model. Journal of Personality and Social Psychology, 97, 699–715. https://doi.org/10.1037/a0016420
Milfont, T. L., Milojev, P., & Sibley, C. G. (2016). Values stability and change in adulthood: A 3-year longitudinal study of rank-order stability and mean-level differences. Personality and Social Psychology Bulletin, 42, 572–588. https://doi.org/10.1177/0146167216639245
Myyry, L., Juujärvi, S., & Pesso, K. (2013). Change in values and moral reasoning during higher education. European Journal of Developmental Psychology, 10, 269–284. https://doi.org/10.1080/17405629.2012.757217
Piaget, J. (1948). The moral judgment of the child. New York, NY: Free Press.
Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126, 3–25. https://doi.org/10.1037/0033-2909.126.1.3
Rokeach, M. (1973). The nature of human values. New York, NY: Free Press.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. Advances in Experimental Social Psychology, 25, 1–65. https://doi.org/10.1016/s0065-2601(08)60281-6
Schwartz, S. H. (1994). Are there universal aspects in the structure and contents of human values? Journal of Social Issues, 50, 19–45. https://doi.org/10.1111/j.1540-4560.1994.tb01196.x
Schwartz, S. H. (2005). Robustness and fruitfulness of a theory of universals in individual human values. In A. Tamayo & J. B. Porto (Eds.), Valores e comportamento nas organizações (pp. 56–95). Petrópolis, Brazil: Vozes.
Schwartz, S. H. (2007). Universalism values and the inclusiveness of our moral universe. Journal of Cross-Cultural Psychology, 38, 711–728. https://doi.org/10.1177/0022022107308992
Schwartz, S. H. (2013). Human values. Chapter 4: Measuring values. Bergen, Norway: European Social Survey Education Net. Retrieved from http://essedunet.nsd.uib.no/cms/topics/1/4/
Schwartz, S. H., & Bardi, A. (2001). Value hierarchies across cultures. Journal of Cross-Cultural Psychology, 32, 268–290. https://doi.org/10.1177/0022022101032003002
Schwartz, S. H., & Bilsky, W. (1987). Toward a universal psychological structure of human values. Journal of Personality and Social Psychology, 53, 550–562. https://doi.org/10.1037/0022-3514.53.3.550
Schwartz, S. H., Melech, G., Lehmann, A., Burgess, S., Harris, M., & Owens, V. (2001). Extending the cross-cultural validity of the theory of basic human values with a different method of measurement. Journal of Cross-Cultural Psychology, 32, 519–542. https://doi.org/10.1177/0022022101032005001
Shrout, P. E., Stadler, G., Lane, S. P., McClure, M. J., Jackson, G. L., Clavél, F. D., . . . Bolger, N. (2017). Initial elevation bias in subjective reports. Proceedings of the National Academy of Sciences of the USA, 115, E15–E23. https://doi.org/10.1073/pnas.1712277115
Sundberg, R. (2016). Value stability and change in an ISAF contingent. Journal of Personality, 84, 91–101. https://doi.org/10.1111/jopy.12142
Stern, P. C., Dietz, T., & Guagnano, G. A. (1998). A brief inventory of values. Educational and Psychological Measurement, 58, 984–1001. https://doi.org/10.1177/0013164498058006008
Thøgersen, J., & Ölander, F. (2002). Human values and the emergence of a sustainable consumption pattern: A panel study. Journal of Economic Psychology, 23, 605–630. https://doi.org/10.1016/s0167-4870(02)00120-4
Vecchione, M., Caprara, G., Dentale, F., & Schwartz, S. H. (2013). Voting and values: Reciprocal effects over time. Political Psychology, 34, 465–485. https://doi.org/10.1111/pops.12011
Vecchione, M., Schwartz, S. H., Alessandri, G., Döring, A. K., Castellani, V., & Caprara, M. G. (2016). Stability and change of basic personal values in early adulthood: An 8-year longitudinal study. Journal of Research in Personality, 63, 111–122. https://doi.org/10.1016/j.jrp.2016.06.002
Vernon, P. E., & Allport, G. W. (1931). A test for personal values. Journal of Abnormal and Social Psychology, 26, 231–248. https://doi.org/10.1037/h0073233


History
Received February 27, 2018
Revision received November 1, 2018
Accepted November 5, 2018
Published online March 29, 2019

Acknowledgments
The authors thank the external experts for validating the final sample: Anat Bardi, Nadi Hofmann-Towfigh, Shalom Schwartz, Ralph Sundberg, and Michele Vecchione. We also thank the editor and the reviewers for their constructive comments and Paul Lauer for his valuable language suggestions.

Carolin Schuster
Institute of Psychology
Leuphana University Lüneburg
Universitätsallee 1
21335 Lüneburg
Germany
carolin.schuster@leuphana.de



Review Article

Scientific Misconduct in Psychology
A Systematic Review of Prevalence Estimates and New Empirical Data
Johannes Stricker and Armin Günther
Leibniz Institute for Psychology Information, Trier, Germany

Abstract: Spectacular cases of scientific misconduct have contributed to concerns about the validity of published results in psychology. In our systematic review, we identified 16 studies reporting prevalence estimates of scientific misconduct and questionable research practices (QRPs) in psychological research. Estimates from these studies varied due to differences in methods and scope. Unlike for other disciplines, no reliable lower bound prevalence estimate of scientific misconduct based on identified cases was available for psychology. Thus, we conducted an additional empirical investigation on the basis of retractions in the database PsycINFO. Our analyses showed that 0.82 per 10,000 journal articles in psychology were retracted due to scientific misconduct. Between the late 1990s and 2012, there was a steep increase in such retractions. Articles retracted due to scientific misconduct were identified in 20 out of 22 PsycINFO subfields. These results show that measures aiming to reduce scientific misconduct should be promoted equally across all psychological subfields.

Keywords: scientific misconduct, research practices, research integrity, article retractions

Cases of scientific misconduct undermine the credibility of published results and ultimately reduce the confidence in the value of scientific research as a whole (Fang, Steen, & Casadevall, 2012). The detection of some spectacular cases of scientific misconduct (e.g., the case of Diederik Stapel; Callaway, 2011) has contributed to concerns over the validity of published results in psychology, especially in social psychology (e.g., see Rovenpor & Gonzales, 2015). For instance, Carey (2011), referring to expert evaluation, stated in a New York Times article that “the [Stapel] case exposes deep flaws in the way science is done in a field, psychology, that has only recently earned a fragile respectability”. Similarly, some psychological researchers themselves seem to be unsettled about the credibility of their field. For example, Motyl et al. (2017, p. 10) found that their sample of social and personality psychology researchers had the impression that “the field overall might be pretty rotten”. Scientific misconduct includes data fabrication, data falsification, plagiarism, and other serious and intentional practices that distort scientific results or lead to incorrect information about contributions to research (e.g., undisclosed competing interests; Hofmann, Helgesson, Juth, & Holm, 2015; Resnik, Neal, Raymond, & Kissling, 2015). Honest errors or differences of opinion do not qualify as scientific misconduct (Office of Research Integrity, 2011; Office of Science and Technology Policy, 2000). Besides the negative effect on the credibility of scientific research, there is a large number of additional adverse effects of

scientific misconduct. These negative consequences include the misplacement of monetary investments (e.g., grant funding) and research capacity, misinformation of the public and policy makers, damage of the careers of colleagues and graduate students unknowingly involved in fraudulent projects, the delay of scientific progress, and costs associated with the investigation of misconduct cases (Michalek, Hutson, Wicher, & Trump, 2010; Stroebe, Postmes, & Spears, 2012). While the toxic consequences of scientific misconduct are indisputable, the prevalence of these practices has been subject to debate (Gross, 2016; Marshall, 2000). This question is particularly relevant because reliable data on the occurrence of a phenomenon are crucial to understanding its causes and to developing prevention strategies. Many factors contributing to the engagement in scientific misconduct have been discussed. Those include the academic “publish-or-perish” culture (e.g., De Rond & Miller, 2005) and academic capitalism (Münch, 2014) leading to competitive and individualist norms (Louis, Anderson, & Rosenberg, 1995; Motyl et al., 2017). Many researchers experience significant pressure to publish significant and preferably surprising results in high-ranking journals to achieve tenure or promotion (Nosek, Spies, & Motyl, 2012), job security or financial rewards (Franzoni, Scellato, & Stephan, 2011). There is some evidence that this pressure has increased in the last decades (e.g., Anderson, Ronning, De Vries, & Martinson, 2007).

© 2019 Hogrefe Publishing. Distributed as a Hogrefe OpenMind article under the license CC BY 4.0 (https://creativecommons.org/licenses/by/4.0)

Zeitschrift für Psychologie (2019), 227(1), 53–63 https://doi.org/10.1027/2151-2604/a000356



Quantification of Scientific Misconduct

Three different approaches have been used to estimate the prevalence of scientific misconduct: (1) In survey studies, researchers anonymously indicate their involvement in scientific misconduct or estimate the involvement of their colleagues. A meta-analysis of survey studies (Fanelli, 2009) showed that a pooled weighted average of 1.97% of scientists from all scientific fields admitted to having participated in fabricating, falsifying, or modifying data. In addition, 14.12% reported that they believed that their colleagues were involved in such practices. Survey studies on the prevalence of scientific misconduct have been criticized for providing varying estimates due to differences in item wording, survey distribution method, social desirability, and other factors (Fanelli, 2009; Fiedler & Schwarz, 2016). (2) Through statistical (re)analyses of reported findings, some researchers attempt to identify statistical inconsistencies in published studies (e.g., inconsistencies between a reported p value and its test statistic) indicating scientific misconduct or questionable research practices (QRPs; e.g., inappropriately “rounding down” p values just over .05; e.g., Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2016). Yet, a considerable proportion of statistical inconsistencies may be a result of inadvertent honest errors rather than scientific misconduct (Bakker & Wicherts, 2011). Thus, studies based on statistical (re)analyses might strongly overestimate the prevalence of scientific misconduct. (3) The analysis of retracted articles and retraction notices has recently emerged as a main format for investigating scientific misconduct (for a review, see Hesselmann, Graf, Schmidt, & Reinhart, 2017). Analyses investigating scientific misconduct via retracted articles are mostly based on cases that after thorough investigations have been judged to be guilty of scientific misconduct.
Yet, as it is often difficult to detect scientific misconduct (Stroebe et al., 2012), this approach provides an estimate only for the lower bound of the prevalence of scientific misconduct. Also, estimates derived from this approach are influenced by the quality of the monitoring systems implemented to detect scientific misconduct. Taken together, all three approaches in the quantification of scientific misconduct possess unique strengths and weaknesses in their ability to investigate the prevalence, distribution, and development of scientific misconduct. Thus, findings from all three approaches should be integrated in a field addressing scientific misconduct.
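The logic of approach (2), recomputing a p value from the reported test statistic and comparing it with the reported p value, can be sketched as follows. This is a deliberately simplified illustration and not the procedure used in the cited studies: it assumes a two-tailed z test, a reported p value rounded to three decimals, and a significance threshold of .05. The function names and the labels it returns are hypothetical. An inconsistency is called "gross" here when the reported and recomputed p values fall on opposite sides of the threshold.

```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p value of a standard normal (z) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_p(reported_p, z, alpha=0.05, decimals=3):
    """Compare a reported p value with the one recomputed from z.

    Assumption: the reported value was rounded to `decimals` places.
    """
    computed = two_tailed_p_from_z(z)
    if abs(round(computed, decimals) - reported_p) < 1e-12:
        return "consistent"
    # Reported and recomputed values disagree about significance.
    if (reported_p < alpha) != (computed < alpha):
        return "gross inconsistency"
    return "inconsistency"


if __name__ == "__main__":
    # Hypothetical reported results: (reported p, z statistic).
    for reported_p, z in [(0.012, 2.5), (0.04, 1.5), (0.20, 1.5)]:
        print(reported_p, z, check_reported_p(reported_p, z))
```

A check of this kind only flags a mismatch. As noted above, it cannot tell whether the mismatch stems from misconduct, a questionable research practice, or an honest reporting error.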

The Present Study

The aim of the present study was to examine the prevalence and the development of scientific misconduct in psychology and its subfields. First, we conducted a systematic review of articles reporting quantitative prevalence estimates of scientific misconduct in psychology. Another concept linked to concerns about the validity of published psychological research is that of QRPs (e.g., Świątkowski & Dompnier, 2017). QRPs comprise practices that unambiguously qualify as scientific misconduct (e.g., falsifying data) and others that are less clear (e.g., failing to report all of a study’s dependent measures; John, Loewenstein, & Prelec, 2012; Motyl et al., 2017; Stürmer, Oeberst, Trötschel, & Decker, 2017). Thus, there is some degree of overlap between scientific misconduct and some of the behaviors subsumed under the term “QRPs”. Consequently, we also included prevalence estimates of QRPs in our review. Second, we analyzed new empirical data on the prevalence and development of retractions due to scientific misconduct in psychology, accounting for subfields of psychology, their size, and the number of unique authors responsible for scientific misconduct. A preliminary version of our data set was reported by Margraf (2015). That work did not take into account retraction reasons (misconduct or not), psychological subfields, or responsible authors. Our data, scripts for data analysis, and materials (for the systematic review and the empirical study of article retractions) are accessible via the PsychArchives repository (https://doi.org/10.23668/psycharchives.872).

Method

Systematic Review

We searched the databases PsycINFO and Scopus with the search-string “(prevalence OR incidence) AND (“scientific fraud” OR “research fraud” OR “scientific misconduct” OR “research misconduct” OR “scientific integrity” OR “data falsification” OR “data fabrication” OR plagiarism OR “research practices” OR “p-hacking” OR “HARKing” OR retract*)” in abstracts and titles (last update: June 2018). Results from Scopus were limited to the subject area “psychology”. No other limits were set. Additionally, we conducted an exploratory literature search by entering our key words in Google Scholar and by following up references in the included studies. Our only inclusion criterion was that studies had to report quantitative prevalence estimates of scientific misconduct or QRPs in psychological research. In three studies, prevalence estimates of scientific misconduct were measured but not reported. We contacted the corresponding authors of these articles via e-mail and


received the relevant prevalence estimates from one article (Sacco, Bruton, & Brown, 2018). Studies addressing scientific misconduct in non-psychological research fields and in students (i.e., plagiarism and cheating in course work) were excluded.

Empirical Study

We used the search string “(retract*.ab. or retract*.ti.) and “01*”.pt.” (limit 1860–2017) to search PsycINFO for “retract*” in titles and abstracts of journal contributions (last update: January 2018). All records reporting that the respective article has been retracted or reporting the retraction of a previously published article were included in the analysis. Next, the original retraction notices were collected. Two independent raters categorized the retraction notices by reason for retraction (1. fraud, 2. plagiarism, 3. other misconduct, 4. author error, 5. publisher error, 6. other reason, 7. no explanation/justification). Categories 1, 2, and 3 were regarded as scientific misconduct. In case of scientific misconduct in multi-authored papers, responsible authors were identified based on the retraction notice. Coders were instructed to use the Retraction Watch Database (Center for Scientific Integrity, n.d.) to obtain additional information if needed. We used the articles’ content classification in PsycINFO to allocate the retracted articles to the respective psychological subfield. For the calculation of the prevalence rate, we divided the number of retracted articles or responsible authors by the size of the field (i.e., the number of records with document type “Journal Article” in the respective field). In the case that there was no clear indication which author of a retracted paper was responsible for the scientific misconduct, the entire author collective was incorporated as a single responsible author in the analyses.
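The prevalence calculation described above (misconduct categories 1 to 3, counted per subfield and divided by the size of the field) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script deposited by the authors: the subfield names, field sizes, and retraction records are hypothetical, and the scaling to 10,000 articles simply mirrors the unit used in the abstract.

```python
from collections import Counter

# Retraction reasons that count as scientific misconduct (categories 1-3).
MISCONDUCT_REASONS = {"fraud", "plagiarism", "other misconduct"}


def rate_per_10000(n_cases, field_size):
    """Number of cases per 10,000 journal articles in a field."""
    if field_size <= 0:
        raise ValueError("field size must be positive")
    return 10_000 * n_cases / field_size


def misconduct_rates(retractions, field_sizes):
    """Misconduct retraction rate per subfield.

    retractions: iterable of (subfield, reason) pairs.
    field_sizes: mapping of subfield to number of journal articles.
    """
    counts = Counter(
        subfield for subfield, reason in retractions
        if reason in MISCONDUCT_REASONS
    )
    return {
        subfield: rate_per_10000(counts[subfield], size)
        for subfield, size in field_sizes.items()
    }


if __name__ == "__main__":
    # Hypothetical records and field sizes (illustration only).
    retractions = [
        ("Subfield A", "fraud"),
        ("Subfield A", "author error"),
        ("Subfield A", "plagiarism"),
        ("Subfield B", "other misconduct"),
        ("Subfield B", "publisher error"),
    ]
    field_sizes = {"Subfield A": 50_000, "Subfield B": 20_000}
    print(misconduct_rates(retractions, field_sizes))
```

Dividing by field size is what makes subfields of very different size comparable. The same function could be applied to counts of responsible authors instead of retracted articles.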

Results

Systematic Review

The literature search yielded 136 results from PsycINFO and 56 results from Scopus leading to 139 results after removing duplicates and retraction notices. In the first step, we evaluated the titles and abstracts. We excluded 121 articles in this step because no quantitative prevalence estimates of scientific misconduct were measured. The full text of the remaining 18 articles was examined resulting in the inclusion of four articles for the systematic review. The explorative literature search and suggestions from the review process of this article yielded 12 additional relevant articles. In the final database, there were 16 studies: six survey studies, nine studies with statistical (re)analyses and one study analyzing retracted articles. Methods and prevalence estimates of scientific misconduct and QRPs from all included studies can be found in Table 1.


Empirical Study

Searching PsycINFO for “retract*” in title and abstract yielded 2,302 records, including 402 retractions. Of these, 401 original retraction notices could be collected and were categorized for retraction reason by two independent raters. Interrater agreement (100 × number of agreeing values / number of all coded values) was 82.54%. Discrepancies were resolved by consulting the original retraction notice and by discussion. Of the 401 retractions, 260 (64.84%) were attributable to scientific misconduct (29.18% fraud, 26.68% plagiarism, 8.98% other misconduct). The overall retraction rate (1860–2017) due to scientific misconduct was 0.82 journal articles per 10,000 journal articles in PsycINFO. The development of retractions due to scientific misconduct since 1982 is shown in Figure 1. The rate of articles retracted due to scientific misconduct in psychological subfields can be found in Table 2.
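The percent agreement measure and the reported shares can be reproduced with a short sketch. The functions below are illustrative, and their names are hypothetical. Only the totals 401 and 260 and the percentages come from the text. The category counts (117, 107, and 36) and the number of agreeing codes (331) are back-calculated assumptions that happen to reproduce the reported percentages; they are not figures stated in the article.

```python
def percent_agreement(rater_a, rater_b):
    """Percent agreement: 100 * agreeing codes / all coded values."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set")
    agreeing = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agreeing / len(rater_a)


def share(count, total):
    """Percentage share of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    total = 401  # retraction notices that were coded (from the text)

    # Back-calculated assumption: 331 agreeing codes give 82.54%.
    rater_a = ["fraud"] * 331 + ["plagiarism"] * 70
    rater_b = ["fraud"] * 331 + ["author error"] * 70
    print(round(percent_agreement(rater_a, rater_b), 2))

    # Back-calculated assumption: counts per misconduct category.
    counts = {"fraud": 117, "plagiarism": 107, "other misconduct": 36}
    for reason, count in counts.items():
        print(reason, share(count, total))
    print("all misconduct", share(sum(counts.values()), total))
```

Percent agreement does not correct for agreement expected by chance, so it tends to be more lenient than chance-corrected indices.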

Discussion

Systematic Review

This is the first systematic review synthesizing existing studies that report quantitative prevalence estimates of scientific misconduct and QRPs in psychology. In survey studies, self-admission rates for data falsification ranged between 0.6% and 2.3%. Prevalence estimates for the involvement of other researchers in data falsification ranged between 9.3% and 18.7%. Self-admission rates were higher for other QRPs that may or may not qualify as scientific misconduct, such as inappropriately altering or “cooking” research data (e.g., 6%, Braun & Roussos, 2012) or “rounding down” p values just over .05 (e.g., 33%, Motyl et al., 2017). The prevalence definition applied in some of the survey studies (e.g., John et al., 2012) has been criticized because the percentage of researchers who admitted to having engaged in a QRP at least once was equated with the prevalence of the respective QRP (Fiedler & Schwarz, 2016). Also, the validity of researchers’ estimates of their colleagues’ involvement in QRPs is questionable (Agnoli, Wicherts, Veldkamp, Albiero, & Cubelli, 2017; Fiedler & Schwarz, 2016). Studies reporting statistical (re)analyses found gross inconsistencies (i.e., reported p value significant, computed p value non-significant or vice versa) in 12.4%–20.5% of the published studies. However, the proportion of studies in which inconsistencies are attributable to scientific misconduct, QRPs, or honest errors remains unclear. In the only study that investigated retractions (Grieneisen & Zhang, 2012), the number of analyzed retracted articles from psychology was low (n = 32 for psychology and n = 169 for Neurosciences; numbers derived from Supplementary Material). Also, the proportion of articles in psychology that

© 2019 Hogrefe Publishing. Distributed as a Hogrefe OpenMind article under the license CC BY 4.0 (https://creativecommons.org/licenses/by/4.0)

Zeitschrift für Psychologie (2019), 227(1), 53–63



Table 1. Methods and prevalence estimates from all studies included in the systematic review

Survey studies

Reference: Braun and Roussos (2012)
Method: 257 psychotherapy researchers from various professional associations completed an online survey about their scientific misbehavior.
Prevalence estimates of scientific misconduct: The self-admission rates were 6% for inappropriately altering or “cooking” research data (10% in Europe, 5% in North America, 3% in Latin America), 2% for making up research data (1% in Europe, 0% in North America, 4% in Latin America) and 2% for denying authorship to someone who has contributed substantively to a manuscript (2% in Europe, 0% in North America, 1% in Latin America).
Prevalence estimates of other QRPs and potential scientific misconduct: The self-admission rates for other QRPs ranged from 5% for compromising the rigor of a study’s design or methodology in response to pressure from a commercial funding source to 23% for conducting research involving human subjects without prior approval from an IRB or Ethics Committee.

Reference: John et al. (2012)ᵃ
Method: 2,155 psychological researchers from major US universities completed an online questionnaire about their involvement in 10 QRPs and the involvement of other research psychologists in these practices. For half of the participants, there was an incentive for truth-telling.
Prevalence estimates of scientific misconduct: The self-admission rate for falsifying data was 0.6% (1.7% in a group with an incentive for truth-telling). The percentage of other psychologists who have falsified data was estimated at 9.33% (9.86% with an incentive for truth-telling).
Prevalence estimates of other QRPs and potential scientific misconduct: The self-admission rates for other QRPs ranged between 3.0% (4.5% with an incentive for truth-telling) for claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do) and 63.4% (66.5% with an incentive for truth-telling) for failing to report all of a study’s dependent measures. The estimated percentage of other psychologists who ever engaged in other QRPs ranged between 18.72% (21.37% with an incentive for truth-telling) for claiming that results are unaffected by demographic variables when one is actually unsure (or knows that they do) and 61.01% (62.70% with an incentive for truth-telling) for deciding whether to collect more data after looking to see whether the results were significant. The mean self-admission rates for all QRPs were 40% in Social Psychology, 37% in Cognitive Psychology, 35% in Neuroscience, 32% in Personality Psychology, 31% in Industrial Psychology, 31% in Developmental Psychology, 30% in Health Psychology, 28% in Forensic Psychology, and 27% in Clinical Psychology.

Reference: Agnoli et al. (2017)
Method: 277 members of the Italian Association of Psychology (AIP) responded to an online survey about their involvement in 10 QRPs and the involvement of other Italian research psychologists in these practices using a translated version of the questionnaire by John et al. (2012).
Prevalence estimates of scientific misconduct: The self-admission rate for falsifying data was 2.3%. The respondents estimated that 18.7% of the other Italian psychology researchers have falsified data at least once.
Prevalence estimates of other QRPs and potential scientific misconduct: The self-admission rates for other QRPs ranged between 3.1% for claiming that results are unaffected by demographic variables when one is actually unsure (or knows that they do) and 53.2% for deciding whether to collect more data after looking to see whether the results were significant.

Reference: Motyl et al. (2017)
Method: 1,166 researchers in social and personality psychology responded to an online survey about their use of QRPs.
Prevalence estimates of scientific misconduct: 2% of the respondents admitted to having falsified data at least once (although most respondents seem to have misinterpreted this item). 33% admitted to having at least once rounded down p values that were just over .05. Independent raters rated 0% of the justifications for “falsifying” data and 6% of the justifications for rounding down p values as unacceptable. Regarding the frequency, respondents on average indicated that they rarely or never “falsified” data or rounded down p values.
Prevalence estimates of other QRPs and potential scientific misconduct: The self-admission rates for other QRPs ranged between 16% (claiming results were unaffected by demographics when they were) and 84% (selectively reporting studies that worked).

Reference: Stürmer et al. (2017)
Method: 88 early-career researchers responded to an online survey, distributed via mailing lists, about the prevalence of 14 QRPs in the German social psychology community.
Prevalence estimates of scientific misconduct: 3.5% of the participants rated inventing data as “moderately” prevalent and 1.2% as “very” prevalent. 5.9% rated manipulating/faking data as “moderately” prevalent and 1.2% as “very” prevalent.
Prevalence estimates of other QRPs and potential scientific misconduct: Other QRPs were rated as “fairly” or “very” prevalent by 22.9% (transforming data to yield the significance level) to 82.2% (conducting many studies, but reporting only those producing significant results) of the participants.

(Continued)

J. Stricker & A. Günther, Scientific Misconduct in Psychology


Table 1. (Continued)

Survey studies (continued)

Reference: Sacco et al. (2018)
Method: 136 researchers from various fields who held at least one grant from the US National Institutes of Health, of whom eight indicated psychology as their field, completed an online survey about their willingness to engage in 40 QRPs and about the prevalence of these QRPs in their field.
Prevalence estimates of scientific misconduct: Fabricating data by adding data for participants who in fact did not participate and assigning participants to study conditions based on pre-screen data in a way that is intended to maximize the likelihood of treatment effects were rated as very uncommon or uncommon by six participants and as somewhat uncommon and neither common nor uncommon by one participant each. Using others’ ideas, words, images, or other materials without citation was rated as very uncommon or uncommon by five participants and as neither uncommon nor common and as somewhat common by one participant each.
Prevalence estimates of other QRPs and potential scientific misconduct: The other QRPs had higher prevalence ratings. For instance, failing to report all of a study’s outcome measures was rated as somewhat common or common by 50% of the psychological researchers.

Statistical (re)analyses

Reference: Bakker and Wicherts (2011)ᵇ
Method: In study 1, p values from 281 articles in psychological high-impact and low-impact journals were checked for errors (inconsistencies with the reported test statistic and df). In study 2, p values from 63 randomly selected psychological articles were checked.
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: 12.4% of all articles contained at least one gross inconsistency (i.e., reported p value significant, computed p value non-significant or vice versa). There was no statistical difference in the proportion of gross errors between high-impact and low-impact journals. Moreover, the gross inconsistencies were more likely to render a nonsignificant effect significant than vice versa, which showed that the gross errors were predominantly in favor of the researchers’ hypotheses. For instance, all rounding errors around a p value of .05 were in line with the researchers’ hypotheses. In study 2, 6.3% of all articles contained at least one gross inconsistency.

Reference: Caperos and Pardo Merino (2013)
Method: 1,212 p values from 102 studies published in three Spanish psychology journals in 2011 and 2012 were checked for errors (inconsistencies with the reported test statistic and df).
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: 17.6% of all articles with complete information contained at least one gross inconsistency (i.e., reported p value significant, computed p value non-significant or vice versa).

Reference: Mazzola and Deuling (2013)ᶜ
Method: The percentages of supported and unsupported hypotheses were compared between 215 articles from six industrial–organizational (I–O) psychology journals (2010–2012) and a sample of 127 dissertations from 16 PhD programs in I–O psychology (2010–2012).
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: 1,231 (73.10%) of the hypotheses in published I–O psychology studies and only 404 (32.93%) of the hypotheses in I–O psychology dissertations were supported.

Reference: Bakker and Wicherts (2014)ᵈ
Method: 2,667 statistical results from 153 articles in high-ranking psychological journals from 2001 to 2010 were analyzed. The authors compared the median p value, sample sizes, and the prevalence of reporting errors between studies with and without removal of outliers.
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: No differences between studies that reported outlier removal and studies that did not report outlier removal were found regarding the median p value, the sample size, or reporting errors. In 41% of the articles without reported outlier removal, there was a discrepancy between the reported degrees of freedom (df) of t tests and the reported sample sizes (reporting error).

(Continued)


Table 1. (Continued)

Statistical (re)analyses (continued)

Reference: Veldkamp et al. (2014)ᵇ
Method: 8,105 statistical results were retrieved from articles published in six high-ranking journals from different psychological subfields from January 2012 to October 2012 and were checked for errors.
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: 20.5% of the articles contained at least one gross inconsistency (i.e., reported p value significant, computed p value non-significant or vice versa). No journal differed significantly from any other journal in the prevalence of articles with at least one gross inconsistency.

Reference: Bosco, Aguinis, Field, Pierce, and Dalton (2016)ᵉ
Method: In study 1, 247 effect sizes of the relations between job performance and nine other variables from two top-ranking I–O psychology journals were analyzed. The authors tested whether the effect sizes were larger for hypothesized than for nonhypothesized relations. Also, the HARKing self-admission rate was established by contacting the authors. In study 2, 281 hypothesized and nonhypothesized effect sizes from a meta-analysis of the relation between job satisfaction and job performance were compared.
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: In study 1, hypothesized relations (mean r = .20) were larger than the nonhypothesized relations (mean r = .09). Also, 38% of the responding authors reported that (at least) one hypothesis had changed after the completion of data collection. In study 2, hypothesized job satisfaction–job performance relations (mean r = .22) were larger than nonhypothesized job satisfaction–job performance relations (mean r = .16).

Reference: Franco, Malhotra, and Simonovits (2016)
Method: 32 published psychological studies for which the complete experimental design and the full set of measured variables were available from a competitive grant program in the United States were analyzed. The authors checked whether all experimental conditions and outcome variables were reported in the published manuscript.
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: 41% of the investigated studies failed to report all experimental conditions and outcome measures. 63% of the reported tests but only 23% of the unreported tests were significant at the p < .05 level. Also, the reported effect sizes were about twice as large as unreported effect sizes.

Reference: Nuijten et al. (2016)ᵇ
Method: Over 250,000 p values from 30,717 articles published in eight major psychology journals from 1985 to 2013 were checked for errors (inconsistencies with the reported test statistic and df).
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: 12.9% of all articles with null-hypothesis significance testing (NHST) contained at least one gross inconsistency (i.e., reported p value significant, computed p value non-significant or vice versa). Overall, 1.4% (3,581) of the p values were grossly inconsistent. Between 1985 and 2013, the prevalence of articles with gross inconsistencies has declined.

Reference: Cortina, Green, Keeler, and Vandenberg (2017)ᵍ
Method: 784 structural equation models from 75 papers published in the Journal of Applied Psychology and the Academy of Management Journal from 2011 to 2013 and 1993 to 1995 were checked. The authors tested whether there were discrepancies between the reported df and the df calculated from the model description in the article introductions (i.e., whether the models that were tested in the manuscripts differed from the models that the manuscripts claimed to test).
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: There were discrepancies between the calculated and reported df in 38.38% of the models that reported df and provided sufficient information to calculate df. These discrepancies could be reconciled in only 14.91% of the cases.

(Continued)


Note.
ᵃThis study has been criticized by Fiedler and Schwarz (2016) for overestimating questionable research practices (QRPs) due to methodological problems (e.g., ambiguous item wording and equating the prevalence of a QRP with the percentage of researchers who ever engaged in a QRP).
ᵇActions that qualify as scientific misconduct (e.g., deliberately misreporting p values) provide an explanation for the reported gross statistical inconsistencies. Yet, a large proportion of the identified gross inconsistencies might be attributable to honest error rather than scientific misconduct.
ᶜMazzola and Deuling (2013) interpreted their findings as evidence for selective reporting and HARKing. Yet, the authors mention alternative interpretations (e.g., “file drawing” null result projects) that may be equally plausible.
ᵈBakker and Wicherts (2014) regarded outlier removal as a potential indicator of significance chasing or p-hacking. Discrepancies between the reported sample sizes and df were interpreted as an indicator for the failure to report outlier removal or missing values.
ᵉBosco et al. (2016) interpreted their findings as evidence of HARKing and its negative impact. 13 alternative explanations of the findings were tested and ruled out.
ᶠWe were unable to obtain the number of psychological articles retracted due to scientific misconduct from this study. It is unlikely that all of the included retractions reflect cases of scientific misconduct.
ᵍThere were various reasons for the detected inconsistencies. Some reasons seem to reflect insufficient reporting (e.g., not enough detail provided to know which items go into which parcels) whereas others seem to reflect QRPs (e.g., unreported freeing of measurement error covariances).

were retracted due to scientific misconduct was not reported. Taken together, the existing studies show that the self-admission rates for scientific misconduct are lower than self-admission rates for other QRPs that are regarded as less severe (see Sacco et al., 2018). Also, the self-admission rates for scientific misconduct were considerably lower than prevalence estimates regarding the actions of other psychological researchers and lower than the percentage of gross statistical inconsistencies. Even among the survey studies, estimates varied strongly and might overestimate (e.g., because of difficulties in item interpretation; Motyl et al., 2017) or underestimate (e.g., because of social desirability; Edwards, 1957) the prevalence of scientific misconduct. Thus, additional empirical data were required to obtain a reliable lower-bound prevalence estimate of scientific misconduct in psychology.

Table 1. (Continued)

Retraction analyses

Reference: Grieneisen and Zhang (2012)ᶠ
Method: 42 literature databases were used to locate retracted articles from 1928 to 2011 (n = 4,449) across the full spectrum of scientific disciplines. Ratios of retractions were calculated by dividing the number in each scientific field by the Web of Science 2010 records in this field. PsycINFO was not included as a data source.
Prevalence estimates of scientific misconduct: –
Prevalence estimates of other QRPs and potential scientific misconduct: Retraction rates were 0.16% in Psychology, Mathematical (1 article), 0.12% in Psychology, Social (4 articles), 0.11% in Psychology, Psychoanalysis (1 article), 0.09% in Psychology (8 articles), 0.09% in Psychology, Developmental (4 articles), 0.06% in Psychology, Experimental (5 articles), 0.06% in Psychology, Multidisciplinary (5 articles), 0.04% in Psychology, Biological (1 article), 0.04% in Psychology, Clinical (3 articles), 0% in Psychology, Applied (0 articles), 0% in Psychology, Educational (0 articles) and 0.38% in Neurosciences (169 articles).


Empirical Study

This study was the first empirical investigation analyzing a large number of psychological articles retracted due to scientific misconduct. Our empirical analyses revealed that the percentage of retractions attributable to scientific misconduct (64.84% in PsycINFO) was similar to that in the biomedical and life-science literature (67.40% in PubMed; Fang et al., 2012) and similar to estimates derived from a variety of scientific disciplines and databases (47% “publishing misconduct” and 20% “research misconduct”; Grieneisen & Zhang, 2012). The overall rate of journal articles retracted due to scientific misconduct was somewhat higher in PsycINFO (0.82 per 10,000 journal articles) than in Medline (0.56 per 10,000 journal articles; Wager & Williams, 2011). Importantly, all comparisons with other disciplines should be interpreted with caution due to differences in methods and covered time periods. For example, Fang et al. (2012) consulted further information in addition to the retraction notices to classify reasons for retractions whereas other authors did not (e.g., Wager & Williams, 2011). With regard to the temporal development, there was a steep increase in retractions of journal articles in PsycINFO due to scientific misconduct between the late 1990s and 2012. There were almost no retractions due to scientific misconduct in psychology before the late 1990s. Grieneisen and Zhang (2012) identified a similar trend in their study covering a wide range of scientific disciplines. This could be explained either by an increase in scientific misconduct or by changing mechanisms (e.g., plagiarism screening) and standards (e.g., journal policies) to detect and retract fraudulent articles. Fanelli (2013) argued that the increase in article retractions is attributable to improved detection and retraction systems. For instance, he found that the proportion of journals that retract articles has grown


Figure 1. Development in number of journal articles retracted due to scientific misconduct per 10,000 published journal articles in PsycINFO from 1982 to 2017 by publication year of the retracted article.

Table 2. Number of article retractions due to scientific misconduct and number of authors responsible for scientific misconduct per 10,000 journal articles in PsycINFO subfields (1860–2017)

                                                       Retracted articlesᵃ     Responsible authorsᵃ
Subfield                                                      N   Per 10,000          N   Per 10,000
Social psychology                                            31         3.80          7         0.86
Consumer psychology                                           9         2.52          7         1.96
Sport psychology & leisure                                    5         1.85          5         1.85
Engineering & environmental psychology                        7         1.70          6         1.45
Personality psychology                                       17         1.64          5         0.48
Physiological psychology & neuroscience                      57         1.48         54         1.40
Intelligent systems                                           4         1.25          4         1.25
Industrial & organizational psychology                       11         0.77         11         0.77
Psychological & physical disorders                           55         0.72         49         0.64
Health & mental health treatment & prevention                44         0.72         41         0.67
Human experimental psychology                                15         0.69          8         0.37
Communication systems                                         3         0.61          3         0.61
Social processes & social issues                              8         0.46          7         0.40
General psychology                                            1         0.38          1         0.38
Developmental psychology                                      7         0.38          5         0.27
Animal experimental & comparative psychology                  3         0.34          3         0.34
Educational psychology                                        8         0.33          8         0.33
Forensic psychology & legal issues                            1         0.30          1         0.30
Professional psychological & health personnel issues          3         0.27          3         0.27
Psychometrics & statistics & methodology                      4         0.21          3         0.16
Psychology & the humanities                                   0         0.00          0         0.00
Military psychology                                           0         0.00          0         0.00

Note. “Per 10,000” = per 10,000 published articles. ᵃTo allocate the retracted articles to the respective psychological subfield, we used the articles’ content classification in PsycINFO. About 13% of the journal articles in PsycINFO were assigned to two subfields. Accordingly, in the determination of the subfield size, the number of retracted articles and the number of responsible authors, these articles are included in both subfields.
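The difference between the two ways of counting in Table 2 can be made concrete with a short Python sketch. It uses only the five subfields with the highest article-based rates, copied from Table 2; the variable and function names are hypothetical. Because the published rates are rounded to two decimals, the implied field size and the consistency check are approximations rather than exact reconstructions.

```python
# (subfield, retracted articles N, articles per 10,000,
#  responsible authors N, authors per 10,000), values from Table 2
ROWS = [
    ("Social psychology", 31, 3.80, 7, 0.86),
    ("Consumer psychology", 9, 2.52, 7, 1.96),
    ("Sport psychology & leisure", 5, 1.85, 5, 1.85),
    ("Engineering & environmental psychology", 7, 1.70, 6, 1.45),
    ("Personality psychology", 17, 1.64, 5, 0.48),
]


def implied_field_size(n: int, rate_per_10k: float) -> int:
    """Approximate number of journal articles implied by N and its rate."""
    return round(n / rate_per_10k * 10_000)


def expected_author_rate(row) -> float:
    """Author rate implied by rescaling the article rate (same field size)."""
    _, art_n, art_rate, auth_n, _ = row
    return art_rate * auth_n / art_n


# The ranking changes depending on whether articles or authors are counted.
top_by_articles = max(ROWS, key=lambda r: r[2])[0]
top_by_authors = max(ROWS, key=lambda r: r[4])[0]
print("highest article rate:", top_by_articles)
print("highest author rate:", top_by_authors)

# Both rates share one denominator, so they should agree up to rounding.
for row in ROWS:
    name, art_n, art_rate, auth_n, auth_rate = row
    print(name, implied_field_size(art_n, art_rate),
          round(expected_author_rate(row), 2), auth_rate)
```

The author-based rate is the more conservative of the two when one person accounts for many retractions, which is the situation discussed for Social Psychology in the text.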

dramatically while the cases of misconduct identified by the US Office of Research Integrity have not increased. Interestingly, the trend that was identified for article retractions in psychology was not found for gross statistical inconsistencies in published psychological articles, which are regarded as a potential indicator of scientific misconduct or QRPs (Nuijten et al., 2016). This finding supports Fanelli’s (2013) notion that the increase in article retractions is mostly attributable to improved detection and retraction systems (see also Gross, 2016). In recent years,


the rate of articles retracted due to scientific misconduct seemed to decline. This is likely to be due to the time delay with which cases of scientific misconduct are usually detected (Fang et al., 2012). In 20 out of 22 psychological subfields, there were articles retracted due to scientific misconduct. Based on the number of retracted journal articles, the highest prevalence was identified for Social Psychology. However, 80.65% of these cases were attributable to one author (D. Stapel). Based on the number of different responsible authors, Consumer Psychology had the highest prevalence. This finding shows that the perception of some psychological subfields as being more fraudulent than others might be attributable to spectacular cases in which single authors were responsible for a large number of fraudulent studies (“repeat offenders”; Grieneisen & Zhang, 2012).

General Discussion and Limitations

Our systematic review showed that scientific misconduct in psychology, including data falsification, data fabrication, and other severe forms of misconduct, is relatively rare in comparison to other QRPs. As expected, our empirical study yielded a somewhat lower prevalence estimate of scientific misconduct than the survey studies. This reflects the fact that scientific misconduct is not always detected. Yet, scientific misconduct was prevalent across a variety of geographic regions (Agnoli et al., 2017; Braun & Roussos, 2012; John et al., 2012) and in almost all psychological subfields. Even single incidents of scientific misconduct can have immense effects (Michalek et al., 2010). Consequently, we believe that it is important to promote measures that diminish the incentives and opportunities to engage in scientific misconduct equally across all psychological subfields. In our eyes, a promising approach lies in the advancement of open data and open materials (Tenopir et al., 2011) and in the improvement of systems for reporting suspected scientific misconduct (Crocker & Cooper, 2011). However, we do not believe that scientific misconduct can be entirely prevented through detection systems. Thus, fostering an ethical organizational culture that clearly communicates acceptable and unacceptable behavior in psychology departments and research groups (e.g., through reward systems; Kish-Gephart, Harrison, & Treviño, 2010) seems equally important.

Our study has, of course, some limitations. First, the number of studies in the systematic review was relatively low. The heterogeneity in methods did not allow meta-analytic integration of the results. Similarly, the number of retracted articles in our empirical study was low for some subfields, so comparisons between subfields should be interpreted with caution. Second, retraction notices provide only a lower-bound estimate of the prevalence of scientific misconduct because many cases can remain unnoticed. In this respect, the investigation of scientific misconduct is similar to the calculation of crime rates, because only reported offenses enter the statistics (Bechtel & Pearson, 1985). Third, our empirical method was designed to quantify confirmed cases of scientific misconduct. Other, subtler but potentially equally damaging QRPs (Simmons, Nelson, & Simonsohn, 2011) were only covered in our systematic review.

Despite these constraints, the present study contributes to the understanding of scientific misconduct in psychology. Our study yielded reliable lower-bound estimates of scientific misconduct, which showed that scientific misconduct occurs across almost all psychological subfields. Also, the increase in the retraction rate compared with the 1980s and 1990s shows that there are mechanisms that are generally able to detect scientific misconduct. Thus, initiatives to strengthen these systems (e.g., by increasing research transparency) should be promoted across all psychological subfields and not be restricted to fields with prominent cases of scientific misconduct.

References

*References marked with an asterisk were included in the systematic review.

*Agnoli, F., Wicherts, J. M., Veldkamp, C. L., Albiero, P., & Cubelli, R. (2017). Questionable research practices among Italian research psychologists. PLoS One, 12, e0172792. https://doi.org/10.1371/journal.pone.0172792
Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C. (2007). The perverse effects of competition on scientists’ work and relationships. Science and Engineering Ethics, 13, 437–461. https://doi.org/10.1007/s11948-007-9042-5
*Bakker, M., & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43, 666–678. https://doi.org/10.3758/s13428-011-0089-5
*Bakker, M., & Wicherts, J. M. (2014). Outlier removal and the relation with reporting errors and quality of psychological research. PLoS One, 9, e103360. https://doi.org/10.1371/journal.pone.0103360
Bechtel, H. K. Jr., & Pearson, W. Jr. (1985). Deviant scientists and scientific deviance. Deviant Behavior, 6, 237–252. https://doi.org/10.1080/01639625.1985.9967676
*Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A., & Dalton, D. R. (2016). HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology, 69, 709–750. https://doi.org/10.1111/peps.12111
*Braun, M., & Roussos, A. J. (2012). Psychotherapy researchers: Reported misbehaviors and opinions. Journal of Empirical Research on Human Research Ethics, 7, 25–29. https://doi.org/10.1525/jer.2012.7.5.25
Callaway, E. (2011). Report finds massive fraud at Dutch universities. Nature, 479, 15. https://doi.org/10.1038/479015a
*Caperos, J. M., & Pardo Merino, A. (2013). Consistency errors in p-values reported in Spanish psychology journals. Psicothema, 25, 408–414. https://doi.org/10.7334/psicothema2012.207
Carey, B. (2011, November 2). Fraud case seen as a red flag for psychology research (p. A3). New York, NY: New York Times.


Center for Scientific Integrity. (n.d.). Retraction watch database. Retrieved from http://retractiondatabase.org/RetractionSearch.aspx
*Cortina, J. M., Green, J. P., Keeler, K. R., & Vandenberg, R. J. (2017). Degrees of freedom in SEM: Are we testing the models that we claim to test? Organizational Research Methods, 20, 350–378. https://doi.org/10.1177/1094428116676345
Crocker, J., & Cooper, M. L. (2011). Addressing scientific fraud. Science, 334, 1182. https://doi.org/10.1126/science.1216775
De Rond, M., & Miller, A. N. (2005). Publish or perish: Bane or boon of academic life? Journal of Management Inquiry, 14, 321–329. https://doi.org/10.1177/1056492605276850
Edwards, A. L. (1957). The social desirability variable in personality assessment and research. Worth, TX: Dryden Press.
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One, 4, e5738. https://doi.org/10.1371/journal.pone.0005738
Fanelli, D. (2013). Why growing retractions are (mostly) a good sign. PLoS Medicine, 10, e1001563. https://doi.org/10.1371/journal.pmed.1001563
Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America, 109, 17028–17033. https://doi.org/10.1073/pnas.1212247109
Fiedler, K., & Schwarz, N. (2016). Questionable research practices revisited. Social Psychological and Personality Science, 7, 45–52. https://doi.org/10.1177/1948550615612150
*Franco, A., Malhotra, N., & Simonovits, G. (2016). Underreporting in psychology experiments: Evidence from a study registry. Social Psychological and Personality Science, 7, 8–12. https://doi.org/10.1177/1948550615598377
Franzoni, C., Scellato, G., & Stephan, P. (2011). Changing incentives to publish. Science, 333, 702–703. https://doi.org/10.1126/science.1197286
*Grieneisen, M. L., & Zhang, M. (2012). A comprehensive survey of retracted articles from the scholarly literature. PLoS One, 7, e44118. https://doi.org/10.1371/journal.pone.0044118
Gross, C. (2016). Scientific misconduct. Annual Review of Psychology, 67, 693–711. https://doi.org/10.1146/annurev-psych-122414-033437
Hesselmann, F., Graf, V., Schmidt, M., & Reinhart, M. (2017). The visibility of scientific misconduct: A review of the literature on retracted journal articles. Current Sociology, 65, 814–845. https://doi.org/10.1177/0011392116663807
Hofmann, B., Helgesson, G., Juth, N., & Holm, S. (2015). Scientific dishonesty: A survey of doctoral students at the major medical faculties in Sweden and Norway. Journal of Empirical Research on Human Research Ethics, 10, 380–388. https://doi.org/10.1177/1556264615599686
*John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953
Kish-Gephart, J. J., Harrison, D. A., & Treviño, L. K. (2010). Bad apples, bad cases, and bad barrels: Meta-analytic evidence about sources of unethical decisions at work. Journal of Applied Psychology, 95, 1–31. https://doi.org/10.1037/a0017103
Louis, K. S., Anderson, M. S., & Rosenberg, L. (1995). Academic misconduct and values: The department’s influence. The Review of Higher Education, 18, 393–422. https://doi.org/10.1353/rhe.1995.0007
Margraf, J. (2015). Zur Lage der Psychologie [On the state of psychology]. Psychologische Rundschau, 66, 1–30. https://doi.org/10.1026/0033-3042/a000247

Zeitschrift für Psychologie (2019), 227(1), 53–63

Marshall, E. (2000). Scientific misconduct–How prevalent is fraud? That’s a million-dollar question. Science, 290, 1662–1663. https://doi.org/10.1126/science.290.5497.1662
*Mazzola, J. J., & Deuling, J. K. (2013). Forgetting what we learned as graduate students: HARKing and selective outcome reporting in I–O journal articles. Industrial and Organizational Psychology, 6, 279–284. https://doi.org/10.1111/iops.12049
Michalek, A. M., Hutson, A. D., Wicher, C. P., & Trump, D. L. (2010). The costs and underappreciated consequences of research misconduct: A case study. PLoS Medicine, 7, e1000318. https://doi.org/10.1371/journal.pmed.1000318
*Motyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J., Mueller, A. B., . . . Skitka, L. J. (2017). The state of social and personality science: Rotten to the core, not so bad, getting better, or getting worse? Journal of Personality and Social Psychology, 113, 34–58. https://doi.org/10.1037/pspa0000084
Münch, R. (2014). Academic capitalism: Universities in the global struggle for excellence. New York, NY: Routledge.
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615–631. https://doi.org/10.1177/1745691612459058
*Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48, 1205–1226. https://doi.org/10.3758/s13428-015-0664-2
Office of Research Integrity. (2011). Definition of research misconduct. Rockville, MD: US Department of Health and Human Services. Retrieved from http://ori.hhs.gov/definition-misconduct
Office of Science and Technology Policy (OSTP). (2000). Federal policy on research misconduct. Federal Register, 65, 76260–76264. Retrieved from https://ori.hhs.gov/federal-research-misconduct-policy
Resnik, D. B., Neal, T., Raymond, A., & Kissling, G. E. (2015). Research misconduct definitions adopted by US research institutions. Accountability in Research, 22, 14–21. https://doi.org/10.1080/08989621.2014.891943
Rovenpor, D. R., & Gonzales, J. E. (2015). Replicability in psychological science: Challenges, opportunities, and how to stay up-to-date. Psychological Science Agenda, 29(1). Retrieved from www.apa.org/science/about/psa/2015/01/replicability.aspx
*Sacco, D. F., Bruton, S. V., & Brown, M. (2018). In defense of the questionable: Defining the basis of research scientists’ engagement in questionable research practices. Journal of Empirical Research on Human Research Ethics, 13, 101–110. https://doi.org/10.1177/1556264617743834
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632
Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7, 670–688. https://doi.org/10.1177/1745691612460687
*Stürmer, S., Oeberst, A., Trötschel, R., & Decker, O. (2017). Early-career researchers’ perceptions of the prevalence of questionable research practices, potential causes, and open science. Social Psychology, 48, 365–371. https://doi.org/10.1027/1864-9335/a000324
Świątkowski, W., & Dompnier, B. (2017). Replicability crisis in social psychology: Looking at the past to find new pathways for the future. International Review of Social Psychology, 30, 111–124. https://doi.org/10.5334/irsp.66

© 2019 Hogrefe Publishing. Distributed as a Hogrefe OpenMind article under the license CC BY 4.0 (https://creativecommons.org/licenses/by/4.0)



Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., . . . Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS One, 6, e21101. https://doi.org/10.1371/journal.pone.0021101
*Veldkamp, C. L., Nuijten, M. B., Dominguez-Alvarez, L., van Assen, M. A., & Wicherts, J. M. (2014). Statistical reporting errors and collaboration on statistical analyses in psychological science. PLoS One, 9, e114876. https://doi.org/10.1371/journal.pone.0114876
Wager, E., & Williams, P. (2011). Why and how do journals retract articles? An analysis of Medline retractions 1988–2008. Journal of Medical Ethics, 37, 567–570. https://doi.org/10.1136/jme.2010.040964


History
Received February 28, 2018
Revision received October 17, 2018
Accepted October 18, 2018
Published online March 29, 2019

Armin Günther
Leibniz Institute for Psychology Information
Universitätsring 15
54296 Trier
Germany
ague@leibniz-psychology.org



Review Article

Which Data to Meta-Analyze, and How? A Specification-Curve and Multiverse-Analysis Approach to Meta-Analysis

Martin Voracek, Michael Kossmeier, and Ulrich S. Tran

Department of Basic Psychological Research and Research Methods, Faculty of Psychology, University of Vienna, Austria

Abstract: Which data to analyze, and how, are fundamental questions of all empirical research. As there are always numerous flexibilities in data-analytic decisions (a “garden of forking paths”), this poses perennial problems to all empirical research. Specification-curve analysis and multiverse analysis have recently been proposed as solutions to these issues. Building on the structural analogies between primary data analysis and meta-analysis, we transform and adapt these approaches to the meta-analytic level, in tandem with combinatorial meta-analysis. We explain the rationale of this idea, suggest descriptive and inferential statistical procedures, as well as graphical displays, provide code for meta-analytic practitioners to generate and use these, and present a fully worked real example from digit ratio (2D:4D) research, totaling 1,592 meta-analytic specifications. Specification-curve and multiverse meta-analysis holds promise to resolve conflicting meta-analyses, contested evidence, controversial empirical literatures, and polarized research, and to mitigate the associated detrimental effects of these phenomena on research progress.

Keywords: combinatorial meta-analysis, digit ratio (2D:4D), graphical display, multiverse analysis, specification-curve analysis

Structural analogies between meta-analysis and the analysis of primary studies in empirical research have been noted since the inception of meta-analytic methods in the late 1970s. In particular, whereas primary studies deal with and analyze a collection of observations (predominantly, data obtained from individual study participants), meta-analyses deal with and analyze a collection of outcomes of primary studies (predominantly, effect sizes extracted from individual studies).

In the wake of the current (2010s) reproducibility debate and method-reform movement in psychological science and other empirical disciplines (Nelson, Simmons, & Simonsohn, 2018), it has increasingly come to attention that there are numerous flexibilities in data-analytic decisions (now variously termed researcher degrees of freedom, p-hacking, or the data-analytic “garden of forking paths”; Gelman & Loken, 2014; Simmons, Nelson, & Simonsohn, 2011). More specifically, it appears that researchers often disagree, or are uncertain, about which individual observations to include (vs. to exclude) in the analysis of an empirical dataset, and further, which data-analytic strategy is fitting. These phenomena are strikingly demonstrated in field experiments of crowdsourced data analysis, wherein many data analysts, independently of each other, tackle the very same, seemingly simple, research question using the very same dataset (Silberzahn et al., 2018).

In a similar vein, meta-analyses often are criticized with regard to their study inclusion criteria (i.e., which studies are eligible vs. which are not), and further, which meta-analytic strategy might be appropriate or optimal (beginning from the choice of the effect-size metric, over possible transformations of these, to the type of meta-analytic modeling itself). Apart from that, at least some questionable research practices (such as p-hacking) that pervade primary research (Nelson et al., 2018; Simmons et al., 2011) might be less prevalent or likely in meta-analyses.

For empirical primary studies, there have been recent proposals of methodologists to address and resolve these concerns (which data to include, and how to analyze them). Here, we adopt, modify, and apply the framework of these solutions to meta-analysis; illustrate the potential of this approach with a concrete, fully worked practical application example; indicate further such examples in diverse research fields; include appropriate data-visualization techniques; provide software code within the R software environment for practitioners; and discuss the implications of the approach for broader debates surrounding meta-analyses.

Zeitschrift für Psychologie (2019), 227(1), 64–82 https://doi.org/10.1027/2151-2604/a000357


M. Voracek et al., Which Data to Meta-Analyze, and How?

Specification-Curve and Multiverse-Analysis Approaches to the Analysis of Primary Studies

Simonsohn, Simmons, and Nelson (2015) noted that, in the data analysis of primary studies, researchers, perhaps oftentimes, disagree on which data points to include, and further, disagree on which statistical tests to calculate. Briefly, these considerations boil down to the fundamental questions of which data to analyze, and how to analyze them. Simonsohn et al. (2015) foremost illustrated these points by discussing a highly publicized, controversial paper (Jung, Shavitt, Viswanathan, & Hilbe, 2014a), which claimed to show that the perceived femininity (vs. masculinity) of (arbitrarily chosen) names for hurricanes was associated with higher (vs. lower) death toll of these hurricanes in the USA. The paper appeared as a full report in the prestigious Proceedings of the National Academy of Sciences of the USA (PNAS) and was highly publicized in diverse news outlets, online social media, and the like, as indicated by an Altmetric attention score of 2,334 (as of end of August 2018; see https://www.altmetric.com/details/2397628#score). To put such an exceedingly high media response in appropriate context, among the 11.68 million research outputs so far indexed by Altmetric, the Jung et al. (2014a) paper ranks 344th (i.e., at the 99.997th percentile). At the same time, the Jung et al. (2014a) paper triggered a daisy chain of critical, published letters to the editor (Bakkensen & Larson, 2014; Christensen & Christensen, 2014; Maley, 2014; Malter, 2014), along with published author replies and rebuttals (Jung et al., 2014b, 2014c, 2014d), all offering discrepant, and seemingly irreconcilable, views on which hurricane data to include and how to analyze them.

Simonsohn et al. (2015) assembled all these views (or, alternative specifications for data analysis) combinatorially, showed that this yielded 1,728 different ways to analyze more or less the same underlying data set, and further showed that the specific finding of female-named hurricanes being deadlier, as reported in Jung et al. (2014a), belonged to a small subset of analyses (37 out of a total of 1,728 specifications, or 2.1%) which yielded a nominally significant result. Hence, the published main finding of the hurricane paper clearly was not supported. As a side note, a later, independent replication attempt (Smith, 2016), published in a specialist journal and utilizing a much broader data set, also found no support for the main finding of the hurricane paper.

Simonsohn et al. (2015) termed their approach specification-curve analysis, which comprises the following steps: (1) identification of the (reasonable) specifications for analysis (which data to analyze, and how); (2) combinatorial assembly of these specifications (statistically analyzing all of these); (3) visualization of the different results emerging; (4) inferential statistical procedures (permutation/randomization tests or bootstrap techniques, dependent on the data structure and type of research hypothesis), in order to test whether the results as a whole deviate from the null hypothesis. Most recently, specification-curve analysis has successfully been applied for clarifying the role of birth-order effects in personality traits and cognitive abilities, a line of inquiry which hitherto has produced notoriously inconsistent findings (Rohrer, Egloff, & Schmukle, 2017).

A quite similar proposal was made by Steegen, Tuerlinckx, Gelman, and Vanpaemel (2016), who called their approach multiverse analysis. Comparing multiverse analysis with specification-curve analysis shows that the above steps (1) and (2) are identical, that multiverse analysis proposes different graphical displays for step (3) (a histogram, and an additional tile plot, of the p values, as they emerge from multiverse analysis) than specification-curve analysis does (a specification-curve plot), and that multiverse analysis lacks the inferential statistics of step (4).

These approaches are not without forerunners and congenial ideas, which are rooted in the robustness analysis practices in economics and more generally in the predictor-selection problem in regression analysis. Such similar approaches have recently been advocated in sociology (multimodel analysis: Young, 2018; Young & Holsteen, 2017) and epidemiology (vibration-of-effect analysis: Patel, Burford, & Ioannidis, 2015). Generally, these approaches formally appear less worked-out than the above ones and proceed more incrementally than systematically or fully combinatorially. That is to say, available control variables (covariates or confounders) are added step by step to a model to test their influence. Owing to this narrower design and intention, we do not discuss them further here.
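Steps (1) to (3) of specification-curve analysis can be illustrated with a minimal sketch in Python. The toy data, the two "which data" options, and the two "how" options below are purely hypothetical and are not taken from any of the cited studies; the inferential step (4) is deliberately omitted.

```python
from itertools import product
from statistics import mean, median

# Hypothetical toy observations: (value, flagged_as_outlier)
data = [(0.2, False), (0.4, False), (0.1, False), (0.3, False), (2.5, True)]

# Step 1: identify the reasonable specifications.
# "Which data" options decide which observations enter the analysis.
which_data = {
    "all": lambda rows: rows,
    "no_outliers": lambda rows: [r for r in rows if not r[1]],
}
# "How" options decide which statistic is computed (assumed choices).
how = {"mean": mean, "median": median}

# Step 2: assemble all specifications combinatorially and analyze each one.
results = []
for (w_name, select), (h_name, estimator) in product(which_data.items(), how.items()):
    values = [value for value, _ in select(data)]
    results.append({"which": w_name, "how": h_name, "estimate": estimator(values)})

# Step 3 (descriptive part): order the estimates by magnitude; plotting this
# ordered sequence would give the specification curve.
curve = sorted(results, key=lambda r: r["estimate"])
for spec in curve:
    print(f'{spec["which"]:>12} / {spec["how"]:<6} -> {spec["estimate"]:.3f}')
```

With two options per factor, the grid has 2 × 2 = 4 specifications; each further factor multiplies this count, which is how a handful of defensible choices can grow into totals such as the 1,728 specifications reported for the hurricane data.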

Research Synthesis of All Possible Study Subsets: Combinatorial Meta-Analysis

One meta-analytic idea somewhat akin to the specification-curve and multiverse analysis approaches is combinatorial meta-analysis (Olkin, Dahabreh, & Trikalinos, 2012). Mainstream sensitivity analysis in meta-analysis can be viewed as similar to regression diagnostics, in that it follows the leave-one-out method (i.e., of k studies, leaving out one study at a time and recalculating the statistic of interest based on the remaining k − 1 studies in the meta-analysis). In contrast, combinatorial meta-analysis calculates the statistic of interest for all possible subsets of studies in the meta-analysis (of which there are 2^k − 1 subsets, when there are k studies). In addition, to visualize the results of combinatorial meta-analysis (in particular, cross-study effect heterogeneity, depending on the selected study subset included in one meta-analytic scenario of the combinatorial meta-analytic universe), Olkin et al. (2012) proposed a novel meta-analytic graphical display, namely the GOSH (graphical display of study heterogeneity) plot.

Although combinatorial meta-analysis is an elegant and excellent means to identify influential studies in a meta-analysis, as of yet this approach has rarely been used. Further, it quickly becomes computationally infeasible, owing to the combinatorial explosion inherent in the term 2^k − 1 as the number of primary studies to be included in a meta-analysis increases.
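The difference between leave-one-out and all-subsets analysis can be made concrete with a small Python sketch. The effect sizes and sampling variances are hypothetical, and an inverse-variance fixed-effect mean is assumed as the "statistic of interest"; Olkin et al. (2012) should be consulted for the procedure as actually proposed.

```python
from itertools import combinations

def fixed_effect(effects, variances):
    """Inverse-variance weighted mean of the given effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def combinatorial_meta_analysis(effects, variances):
    """Pooled estimate for every non-empty subset of studies.

    Returns a dict mapping a tuple of study indices to the pooled estimate.
    """
    k = len(effects)
    results = {}
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            results[subset] = fixed_effect(
                [effects[i] for i in subset],
                [variances[i] for i in subset],
            )
    return results

# Hypothetical toy data for k = 4 studies
effects = [0.30, 0.05, -0.02, 0.10]
variances = [0.020, 0.005, 0.004, 0.010]

results = combinatorial_meta_analysis(effects, variances)
print(len(results))  # 2**4 - 1 = 15 subsets

# The leave-one-out analyses are the subsets of size k - 1.
leave_one_out = {s: est for s, est in results.items() if len(s) == len(effects) - 1}
print(len(leave_one_out))  # 4
```

Because the number of subsets doubles with every added study, this exhaustive loop is practical only for small k; for the 31 samples of the worked example it would already require more than two billion pooled estimates.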

A Specification-Curve and Multiverse-Analysis Approach to Meta-Analysis

Our proposal is straightforward and simple: briefly, we suggest adopting, transforming, and blending the specification-curve and multiverse analysis approaches, which were developed for the analysis of primary studies, into a specification-curve and multiverse approach to meta-analysis (see Taylor & Munafò, 2016, for a recent call for method triangulation of meta-analytic evidence). This includes adaptations of the inferential statistical test (specifically, a parametric bootstrap procedure) of specification-curve analysis, as well as adaptations of the graphical displays of both specification-curve analysis (descriptive and inferential statistical specification-curve plots) and multiverse analysis (histograms and tile plots of p values for all specifications) to the meta-analytic framework. We supply software code for these data visualizations and all respective analyses (https://osf.io/nkv46).

Also central to the context considered here is that, in essence, combinatorial meta-analysis is a brute-force method which automatically (and thus more or less blindly) tests all possible study subsets in one meta-analysis. However, the vast majority of these conceivable subsets would not be regarded as reasonable alternative specifications vis-à-vis study eligibility in any meta-analysis. In that regard, the specification-curve and multiverse approach to meta-analysis can be viewed as a theoretically and conceptually guided, and thus parsimonious, minimal variant of combinatorial meta-analysis. A further important difference is that combinatorial meta-analysis analyzes all study subsets with the same meta-analytic technique, whereas the specification-curve and multiverse meta-analytic approach introduced here allows for several techniques (e.g., fixed-effect vs. random-effects modeling). Bearing these differences in mind, we suggest applying combinatorial meta-analysis in tandem with the conceptually more refined approach introduced here.
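To illustrate how the choice of meta-analytic technique can itself act as a specification, the following Python sketch pools the same hypothetical effect sizes once with a fixed-effect model and once with a random-effects model. The DerSimonian-Laird estimator of the between-study variance is assumed here as one common choice; the toy data and function names are illustrative only and do not reproduce the authors' R code.

```python
def fixed_effect(effects, variances):
    """Fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, (1.0 / sum(weights)) ** 0.5

def random_effects_dl(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]
    fe_estimate, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - fe_estimate) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    re_weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return estimate, (1.0 / sum(re_weights)) ** 0.5, tau2

# Hypothetical toy data (same scale assumed for all effect sizes)
effects = [0.30, 0.05, -0.02, 0.10]
variances = [0.020, 0.005, 0.004, 0.010]

fe_est, fe_se = fixed_effect(effects, variances)
re_est, re_se, tau2 = random_effects_dl(effects, variances)
print(f"fixed effect:   {fe_est:.3f} (SE {fe_se:.3f})")
print(f"random effects: {re_est:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.4f}")
```

When the estimated between-study variance is zero, the two models coincide; otherwise the random-effects model gives relatively more weight to small studies and a larger standard error, which is why the modeling choice can move a result across a significance threshold.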

Worked Example: Meta-Analytic Specification-Curve and Multiverse Analysis of the Effect of Androgen Receptor Gene CAG Repeat Polymorphisms on Digit Ratio (2D:4D)

Explanatory Background

Our fully worked example is taken from a real, and contested, line of inquiry. In particular, it updates and expands two extant meta-analyses (Hönekopp, 2013; Voracek, 2014) on the same topic with new data, and for the first time utilizes a specification-curve and multiverse analysis approach to meta-analysis. In the following, we provide necessary background information about the research field underlying our example, explain the reasons for selecting this research example, and illustrate why we think that the meta-analytic specification-curve and multiverse analysis approach, along with combinatorial meta-analysis, is informative and insightful with regard to research constellations similar to this one.

Diverse strands of animal research, as accumulated since the late 1950s, suggest that prenatal androgen action (PAA; foremost, testosterone levels and exposure) has long-lasting, permanent (i.e., so-called organizational, or organizing) effects on the brain, behavioral traits, and disease susceptibility postnatally (Berenbaum & Beltz, 2011; Hines, 2010, 2011). This phenomenon is termed prenatal programming and, for the above reasons, is of interest for a wide array of research fields (including biological, clinical, developmental, differential, economic, health, personality, and sport psychology). However, there are obvious barriers to studying such effects in humans and in psychological science. For one thing, animal endocrine systems and routes, and the effects of these, may not be directly translatable to humans.
On the other hand, prenatal hormone measurement is intractable for human research; human sex-hormonal experimentation (e.g., manipulating embryonic testosterone levels) is infeasible for ethical reasons; and experiments of nature (as provided by early-onset endocrine disorders in humans, such as congenital adrenal hyperplasia, complete androgen insensitivity syndrome, or polycystic ovary syndrome) have their own limitations of insight. Hence, having valid retrospective markers for PAA (i.e., endocrine-sensitive endpoints which are observable and measurable) would be of great value for progress on this nexus of research questions, and such markers thus are a research desideratum (Cohen-Bendahan, van de Beek, & Berenbaum, 2005; Voracek, 2011).

Of all such PAA markers proposed over the past few decades (e.g., age at menarche, anogenital distance, finger-ridge count, otoacoustic emissions, and twin-type comparisons of same-sex vs. other-sex dizygotic twin pairs), the second-to-fourth digit ratio (2D:4D) by far is the most frequently investigated one. 2D:4D is a finger-length ratio, namely the length of the index finger (2D), relative to the length of the ring finger (4D). On average, men show lower (smaller) 2D:4D than women. This sex effect is of small-to-medium size (d < 0.50; Hönekopp & Watson, 2010). From embryologic studies, it is known that these sex differences and individual differences in 2D:4D emerge early, namely already prenatally, during the testosterone peak occurring after one third of gestational length, which in turn gives rise to sexual differentiation and masculinization of the brain and other tissues. Many contributors to the 2D:4D literature believe that sex and individual differences in 2D:4D are developmentally sufficiently stable to ensure that 2D:4D can indeed be taken as the long-desired retrospective PAA marker.

The popularity of the 2D:4D marker for research is indicated by the fact that about one decade after the initiation of this line of inquiry (Manning, Scutt, Wilson, & Lewis-Jones, 1998), according to a scientometric analysis of 2D:4D research, the literature totaled more than 300 published journal reports (Voracek & Loibl, 2009).
Using the same literature search strategies as this scientometric account to keep track of the growth of this literature, our current estimate (as of end of August 2018) of the size of the 2D:4D literature amounts to more than 1,400 published journal reports, along with more than 150 published journal abstracts and about 300 unpublished academic theses. While these surely are formidable numbers, after 20 years of research, questions of validity (or, lack thereof) of the 2D:4D marker still permeate the literature. This is mainly due to an apparently widespread lack of replicability of initial 2D:4D research findings by subsequent large-scale investigations and corresponding meta-analyses (e.g., Voracek, Kaden, Kossmeier, Pietschnig, & Tran, 2018; Voracek, Pietschnig, Nader, & Stieger, 2011; Voracek, Tran, & Dressler, 2010). In this sense, 2D:4D research may well be characterized as a contested, if not polarized (Hofmann, 2018), field of investigation.

As for its perceived importance to, and popularity in, psychological science, we note that published 2D:4D research preponderantly is conducted at psychology departments, and the journals most frequently publishing 2D:4D papers also are from psychology (Voracek & Loibl, 2009). Further, existing journal special issues on 2D:4D research have appeared in psychology journals (Hennig & Rammsayer, 2007; Voracek, 2011).

Our worked example deals with one central validity claim of the 2D:4D marker, namely, its postulated association with a functional length-variant polymorphism found in the human androgen receptor (AR) gene (i.e., gene variants characterized by varying repetitive patterns, which alter the function of the gene). This association has been characterized as the “strongest evidence that androgens affect digit ratio” (Breedlove, 2010, p. 4117). Exon 1 of the human AR gene codes for an amino acid tract, in the form of CAG (polyglutamine) stretches of variable length. These repeat-length polymorphisms vary interindividually and, of particular importance, mediate the efficacy of testosterone action, such that longer CAG stretches are less efficacious, whereas shorter CAG stretches are more efficacious. Various studies have found that, within physiologic limits, these CAG effects are linear. The genetically based differential efficacy existing in the human AR is therefore expected to correlate positively with 2D:4D, to the extent that the latter reflects testosterone sensitivity. That is, a shorter (and more efficacious) CAG repeat number should correspond to lower (masculinized, or male-typed) 2D:4D, whereas longer (and less efficacious) CAG repeats should correspond to higher (feminized, or female-typed) 2D:4D. This is what has been observed in the first such study (Manning, Bundred, Newton, & Flanagan, 2003). Despite being based on a small sample (N = 50), numerous failures to replicate its findings in subsequent reports, which partly were based on much larger samples, and two meta-analyses of the cumulative empirical evidence, which both yielded null findings (Hönekopp, 2013; Voracek, 2014), the Manning et al.
(2003) paper is one of the most-cited 2D:4D publications (Voracek & Loibl, 2009), with about 370 citations in Google Scholar (as of end of August 2018). Of these citations, more than one third (about 150) have accrued after the appearance of the two meta-analyses summarizing this literature. In contrast, citation counts for the two meta-analyses in the same database are comparatively low (25 citations each for Hönekopp, 2013, and Voracek, 2014). Further, a citation analysis (Voracek, 2014) of Manning et al. (2003) found that 80% of citations to Manning et al. (2003) cited the report confirmatively (as if there were evidence for 2D:4D/CAG correlations) and 70% cited the report solely (as if there were no further 2D:4D/CAG studies). In addition, citation analyses conducted in the Web of Science database (by citing source and science category) show that the citations garnered by Manning et al. (2003) preponderantly come from psychology journals. In line with this, the top-citing journal is from psychology, as are four further of the top-10-citing journals of Manning et al. (2003).

Here, we are able to provide an appreciable update of the most recent meta-analysis on this topic (Voracek, 2014) only a few years afterward, because more than a few further 2D:4D/CAG studies have since been published. All of this shows the oftentimes uncertain and limited impact of meta-analyses on their respective literatures, and their sometimes disappointing ability to prevent redundant follow-up research (Habre, Tramèr, Pöpping, & Elia, 2014). Owing to its epistemological scope, a specification-curve/multiverse meta-analysis should be more difficult to ignore, or to dismiss, than a further conventional (single-specification) meta-analysis, and may as well safeguard against subsequent, largely overlapping and thus redundant (Ioannidis, 2016; Naudet, Schuit, & Ioannidis, 2017), conventional meta-analyses. This is why we opted to select this research question as the worked example.

Methods

Literature Search for the Meta-Analytic Update

For our worked example (for study details and findings, see Table 1), we update the most recent, and largest, published meta-analysis of CAG effects on 2D:4D (Voracek, 2014), which encompassed 13 studies published up to 2014 (Butovskaya et al., 2012; De Naeyer et al., 2014; Durdiaková et al., 2013; Folland et al., 2012; Hampson & Sankar, 2012; Hurd, Vaillancourt, & Dinsdale, 2011; Knickmeyer, Woolson, Hamer, Konneker, & Gilmore, 2011; Kubranská et al., 2014; Latourelle, Elwess, & Elwess, 2008; Loehlin, Medland, & Martin, 2012; Manning et al., 2003; Mas et al., 2009; Zhang et al., 2013). These provided a maximum of 18 samples (total N = 2,909) for meta-analytic inclusion, originating from nine countries located on five continents (Australia, Belgium, Canada, China, Slovakia, Spain, Tanzania, UK, and USA). Using the same multi-pronged literature search and data retrieval strategies and the same eligibility criteria as in the previous meta-analysis (see Voracek, 2014, for details), we ascertained seven further, more recent, studies (Babková Durdiaková et al., 2017; Chang et al., 2015; Cheng, Zhao, Lu, Liu, & Liu, 2016; Durdiaková, Celec, Laznibatová, Minárik, & Ostatníková, 2016; Durdiaková et al., 2015; Warrington et al., 2018; Zhang et al., 2018), which provided 13 additional samples for inclusion, including two samples from a further country (Denmark: Chang et al., 2015). The updated meta-analysis comprises a maximum of 31 samples, with total N = 10,183.

The literature search also detected a duplicate publication (not included in analysis): Zhang et al. (2016), not citing Zhang et al. (2013), analyzed exactly the same sample, and used one half of the data, of the earlier report (by calculating correlations within subgroups defined by the lower/upper quartiles of study variables’ distributions). This corpus of primary studies largely is without author overlap; only one group contributed multiple (albeit relatively small) studies to the meta-analysis (Babková Durdiaková et al., 2017; Durdiaková et al., 2013, 2015, 2016; Kubranská et al., 2014).

Apart from the CAG studies, Voracek (2014) also reported 2D:4D meta-analyses for a further AR gene repeat-length polymorphism, namely GGC (also termed GGN, polyglycine) stretches. We skip this further evidence, because the respective literature is much smaller and no additional data have emerged.

Table 1 displays the 2D:4D correlations with CAG repeats for right-hand digit ratio (R2D:4D), as well as for left-hand digit ratio (L2D:4D), and the right-minus-left-hand difference in digit ratio (ΔR−L). Although R2D:4D and L2D:4D are substantially positively correlated, and the ΔR−L difference variable is not independent from its constituents either, here we follow common conventions of digit ratio research and investigate all three of them separately. Specifically, it has been argued (Hönekopp & Watson, 2010) that R2D:4D shows larger sex differences and stronger, or more reliable, effects with variables of interest than L2D:4D, and that there is directional asymmetry, as well as a sex effect, in ΔR−L (on average, ΔR−L < 0, and more often so, or more pronounced, for men, as compared to women).

The Specification Factors: Which Data to Meta-Analyze, and How

We now turn to the specifications we make for the specification-curve and multiverse meta-analysis of the effects of CAG repeats on 2D:4D. We distinguish between external, or “How” factors (i.e., how to meta-analyze the data), and internal, or “Which” factors (i.e., which data to meta-analyze). We decided to consider two of the former and six of the latter type of factors, as follows.
The first external factor concerns the choice of effect size, because, instead of meta-analyzing Pearson r coefficients, one could opt for transforming these to Fisher’s zr coefficients prior to meta-analysis (as in Voracek, 2014). The second external factor concerns the choice of the meta-analytic model. For instance, whereas Hönekopp (2013) used a random-effects model (REM), Voracek (2014) used the fixed-effect model (FEM). Further, we consider two REM variants, differing in how the between-study variance is estimated, namely the DerSimonian-Laird estimator (DL) and the restricted maximum-likelihood estimator (REML), and an unweighted meta-analytic model (UWM) as well. Although the latter approach clearly is atypical for meta-analysis (wherein the credo is that empirical evidence should be weighted according to its information value, a proxy of which is sample size), it nevertheless is


Table 1. Correlations of 2D:4D with CAG repeats length in the androgen receptor gene: Individual studies and updated meta-analysis

[Table body garbled in extraction; cell values cannot be reliably assigned to rows. Columns: Study (first author); Country; Sample; 2D:4D measurement; N; r for R2D:4D, L2D:4D, and ΔR−L. Studies on this page: Manning (2003); Latourelle (2008); Mas (2009); Hurd (2011); Knickmeyer (2011); Butovskaya (2012); Folland (2012); Hampson (2012); Loehlin (2012); Durdiaková (2013); Zhang (2013); De Naeyer (2014); Kubranská (2014); Chang (2015); Durdiaková (2015); Cheng (2016); Durdiaková (2016); Babková Durdiaková (2017); Warrington (2018), ALSPAC cohort. Continued on next page.]


Note. aExact effect size not reported in the original study (but definitely was not nominally significant), and requested additional result details were not received (hence, effect set to zero). bEffect size not reported in the original study, or the sample size was further amplified after publication (in either case, supply of the additional results details is gratefully acknowledged). cThe correlation is for the biallelic mean of CAG repeats. dEffect reported merely as “significant” in the original study, and requested additional result details were not received (hence, effect set to just-significant, p = .05, two-tailed). eEffect size (β coefficient) estimated from linear mixed-effects model, accounting for dependent data structure (siblings) and adjusted for age, height, and weight. fSpearman’s rs. gEffect size calculated from t statistic and group sizes, according to dichotomized (short vs. long) CAG repeats. *p < .05 (two-tailed). To ensure analytic reproducibility, effect sizes are not rounded. Datasets for the table are available at https://osf.io/2h73x/ (R2D:4D), https://osf.io/ac96w (L2D:4D), and https://osf.io/5xud3 (ΔR−L).

Table 1. (Continued)

[Individual-study rows garbled in extraction. Samples on this page: Warrington (2018), QIMR cohort (Australia; boys and girls; photocopies) and Zhang (2018) (China; men and women; flatbed scans).]

Updated meta-analysis:
R2D:4D: 31 samples (total N = 10,183); combined r = .019, 95% CI [−.001, .038]; Q = 35 (I² = 14%)
L2D:4D: 25 samples (total N = 9,014); combined r = .007, 95% CI [−.013, .028]; Q = 39.7 (I² = 40%)
ΔR−L: 18 samples (total N = 2,912); combined r = .013, 95% CI [−.024, .049]; Q = 18.3 (I² = 7%)

interesting because the UWM has similarities with the “cognitive algebra” done in traditional, narrative, unsystematic reviews, namely the attitude of taking evidence “as is”, no matter what the respective underlying sample size is. Together, the two How factors yield 2 × 4 = 8 different ways to meta-analyze the same data.

Considering the Manning et al. (2003) study in terms of potentially relevant study features, we notice six of these, which therefore constitute our internal, or Which, factors. Manning et al. (2003) was a study of healthy adult White men, with 2D:4D directly measured from the fingers, and published as a full journal report, with all outcomes relevant for this meta-analysis reported therein. All six of these study features (participant sex, age group, group status, ethnicity, 2D:4D measurement method, and publication status) are dichotomous; in theory, these Which factors thus yield 2⁶ = 64 ways to meta-analyze different data subsets. We note that, although specification factors generally are categorical, they are not necessarily confined to dichotomies, as they are in this example. Further, there are no missing values on these, as the information either is directly reported in the study or self-evident.

The six study features considered here are topically relevant for the following reasons. Regarding participant sex, analyzing AR gene CAG repeats in women is not as straightforward as it is in men, because the human AR gene is located on the X sex-chromosome, of which men (karyotype 46,XY) have but one, whereas women (46,XX) have two, and therefore two AR alleles, of which one per cell is randomly inactivated. 2D:4D/CAG studies involving female samples (see Table 1) therefore use the biallelic mean of CAG repeats for analysis. For this reason, some researchers could object to meta-analyzing female samples alongside male samples, or object to considering the evidence from female samples at all.
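As a brief aside on the two How factors introduced above, the following minimal sketch crosses the effect-size metric (Pearson r vs. Fisher's z) with three of the four models (FEM, REM with the DerSimonian-Laird estimator, and UWM). It is an illustration under stated assumptions, not the authors' code (their R code is on the OSF): the `TOY` data are hypothetical, REML is omitted for brevity (so 2 × 3 = 6 rather than 8 specifications result), and the standard error used for the unweighted model is one plausible choice among several.

```python
import math
from itertools import product

# Hypothetical (r, N) pairs for illustration only; not taken from Table 1.
TOY = [(0.29, 50), (0.05, 180), (-0.03, 70), (0.02, 600)]

def pool(y, v, model):
    """Summary effect and standard error under one meta-analytic model."""
    k = len(y)
    if model == "UWM":                       # unweighted mean of the effects
        return sum(y) / k, math.sqrt(sum(v)) / k
    w = [1.0 / vi for vi in v]               # inverse-variance weights
    fem = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    if model == "FEM":
        return fem, math.sqrt(1.0 / sum(w))
    if model == "REM-DL":                    # DerSimonian-Laird estimate of tau^2
        q = sum(wi * (yi - fem) ** 2 for wi, yi in zip(w, y))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
        ws = [1.0 / (vi + tau2) for vi in v]
        est = sum(wi * yi for wi, yi in zip(ws, y)) / sum(ws)
        return est, math.sqrt(1.0 / sum(ws))
    raise ValueError(f"unknown model: {model}")

def how_specifications(samples):
    """Cross the effect-size metric with the model; results are reported as r."""
    out = {}
    for metric, model in product(("r", "z"), ("FEM", "REM-DL", "UWM")):
        if metric == "z":                    # Fisher z, var = 1 / (N - 3)
            y = [math.atanh(r) for r, _ in samples]
            v = [1.0 / (n - 3) for _, n in samples]
        else:                                # raw r, large-sample variance
            y = [r for r, _ in samples]
            v = [(1.0 - r * r) ** 2 / (n - 1) for r, n in samples]
        est, se = pool(y, v, model)
        lo, hi = est - 1.96 * se, est + 1.96 * se
        if metric == "z":                    # back-transform to the r metric
            est, lo, hi = math.tanh(est), math.tanh(lo), math.tanh(hi)
        out[(metric, model)] = (est, lo, hi)
    return out

RESULTS = how_specifications(TOY)
for (metric, model), (est, lo, hi) in RESULTS.items():
    print(f"{metric} {model:<6} r = {est:+.3f} [{lo:+.3f}, {hi:+.3f}]")
```

In this toy example the unweighted model gives the small first sample as much influence as the largest one, which is the "as is" attitude described above.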
In a similar vein, researchers might object to considering non-adult samples (as opposed to adult samples), patient samples (as opposed to healthy individuals), non-White samples (as opposed to White samples), and samples with image-based 2D:4D measurement (as opposed to direct measurement), because the original evidence (Manning et al., 2003) was for healthy adult White males, whose fingers were directly measured. Regarding publication status, a few of the primary studies appeared only as a published journal abstract, and not as a full report, and there also are a few studies for which effect-size guesstimates had to be imputed, because of lack of reporting detail in the published study and nonreceipt of requested additional study results information (Latourelle et al., 2008; Mas et al., 2009; for details, see Table 1). These latter studies have been incorporated in one of the prior meta-analyses (Voracek, 2014), but not in the other one (Hönekopp, 2013). It therefore appears fitting to account for publication status (full report, no guesstimates vs. no full report, and/or guesstimates) as the sixth, and final, of our internal factors. Evidently, this last factor merges several things. This is due to the specifics of the primary literature (see Table 1 note details) and its limited size. For larger meta-analyses, it would be both beneficial and feasible to disentangle these.

As mentioned above, the six study features (our internal factors) give rise to potentially 2⁶ = 64 different study designs. In terms of these study features, the exact antitype of the Manning et al. (2003) study would be a study conducted with a patient sample of non-adult non-White females, with image-based 2D:4D measurement, and not published as a full journal report (and/or involving an effect guesstimate). Unsurprisingly, such an antitype study, with study features maximally dissimilar to those in Manning et al. (2003), does not occur among the known primary studies (Table 1). Rather, the sample most dissimilar to the original report is the female patient sample of Cheng et al. (2016), which still is identical in terms of two study features (adult sample and full report). On the other hand, there are two samples (De Naeyer et al., 2014; Chang et al., 2015: male sample) which are exactly identical on all these six study features to Manning et al. (2003), and the majority of samples is identical for at least four or even five out of the six study features. Table 2 displays the specification matrix of the six internal (or study-feature) factors for the

Table 2. Specification matrix for individual studies accounting for six study feature variables

[Matrix cell entries (X vs. blank) garbled in extraction and not reliably assignable to rows. Columns: Study; Participant sex; Age group; Group status; Ethnicity; 2D:4D measurement; Publication status. Rows, in table order: Manning (2003); De Naeyer (2014); Chang (2015), men; Hurd (2011); Butovskaya (2012); Folland (2012); Hampson (2012); Kubranská (2014); Chang (2015), patients; Babková Durdiaková (2017); Latourelle (2008), men; Mas (2009); Knickmeyer (2011), boys; Loehlin (2012), boys; Durdiaková (2013); Zhang (2013), men; Durdiaková (2015); Warrington (2018), ALSPAC cohort boys; Warrington (2018), QIMR cohort boys; Zhang (2018), men; Latourelle (2008), women; Mas (2009), patients; Knickmeyer (2011), girls; Loehlin (2012), girls; Zhang (2013), women; Cheng (2016), controls; Durdiaková (2016); Warrington (2018), ALSPAC cohort girls; Warrington (2018), QIMR cohort girls; Zhang (2018), women; Cheng (2016), patients.]

Note. Studies are ordered in decreasing similarity of study features to the original report of Manning et al. (2003) and, within degree of feature similarity, chronologically and alphabetically. The table entries (X vs. cell left blank) correspond to: male versus female sample (for participant sex), adult versus nonadult sample (for age group), healthy individuals versus patient sample (for group status), White versus non-White sample (for ethnicity), direct versus image-based measurement (for 2D:4D measurement), and published as full journal report and with no effect guesstimates necessary versus any of these (for publication status). The dataset for the table is available at https://osf.io/2h73x/.


primary studies detailed in Table 1, listed by decreasing study-feature similarity to Manning et al. (2003). From this assembly it can also be gleaned that, of the theoretical number of 2⁶ = 64 different study designs, only 12 have so far been implemented by research across the 31 retrievable samples.

In accounting for the six internal (Which) factors, we fully crossed the factor levels, along with the respective superset, across all six factors. That is, the subset of male samples only, the subset of female samples only, and the superset of samples regardless of participant sex (either male or female) were combined with those of adult samples only, non-adult samples only, and samples of either age group; in turn, with healthy samples only, patient samples only, and samples of either group status; and so forth across all six factors. This yields 3⁶ = 729 combinations potentially available for analysis. From these combinations, only those containing at least two samples were kept for meta-analysis, and duplicated combinations were not included in analysis. This finally yielded 85, 62, and 52 combinations for the R2D:4D, L2D:4D, and ΔR−L analyses, respectively (or 12%, 9%, and 7% of the theoretically possible total). Each of these 85, 62, and 52 subsets (specified according to the Which, or study-feature, factors) was analyzed according to the 2 (effect-size metric) × 4 (meta-analytic model) = 8 different ways (or How factors) to analyze the same meta-analytic subset, thus yielding a grand total of (8 × 85) + (8 × 62) + (8 × 52) = 1,592 different meta-analytic specifications calculated.
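The crossing of the Which factors described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrix `TOY_SAMPLES` is hypothetical (it is not Table 2), and "duplicated combinations" is interpreted here as factor-level combinations that select exactly the same set of samples.

```python
from itertools import product

FACTORS = ("sex", "age", "group", "ethnicity", "measurement", "publication")

# Hypothetical feature matrix: 1 = same level as Manning et al. (2003), 0 = other level.
TOY_SAMPLES = {
    "A": (1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1),
    "C": (1, 0, 1, 1, 0, 1),
    "D": (0, 0, 1, 1, 0, 1),
    "E": (0, 1, 0, 0, 0, 1),
}

def which_subsets(samples, min_size=2):
    """Cross three options per factor (level 1, level 0, or either) and keep
    each distinct sample subset that contains at least `min_size` samples."""
    options = (1, 0, None)                   # None = superset (either level)
    all_combos = list(product(options, repeat=len(FACTORS)))
    unique = {}                              # subset of samples -> first combination found
    for combo in all_combos:
        members = frozenset(
            name for name, feats in samples.items()
            if all(c is None or c == f for c, f in zip(combo, feats))
        )
        if len(members) >= min_size and members not in unique:
            unique[members] = combo
    return all_combos, unique

ALL_COMBOS, UNIQUE = which_subsets(TOY_SAMPLES)
print(len(ALL_COMBOS), "combinations;", len(UNIQUE), "distinct subsets with >= 2 samples")

# Arithmetic from the text: 8 How specifications per retained Which subset.
TOTAL_SPECIFICATIONS = 8 * 85 + 8 * 62 + 8 * 52
print(TOTAL_SPECIFICATIONS)
```

Only a small share of the 729 combinations survives the two filters, which mirrors the drop reported above from 729 to 85, 62, and 52 subsets.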


Algorithm for the Combinatorial Meta-Analysis

With a maximum number of 31 available samples (for R2D:4D), well over 2 billion unique subsets (2³¹ − 1 = 2,147,483,647 exactly) emerge for a full (exhaustive) combinatorial meta-analysis. While this might still be computationally feasible, it is time-consuming and poses problems with graphically displaying the results, because of an abundance of overplotting data points. For convenience, we therefore drew a random sample of 100,000 different subsets for a combinatorial meta-analysis representative of the full set, using a stratified sampling approach with respect to subset size, such that the most prevalent subset sizes (those of intermediate size) were undersampled, while the rarest subset sizes (those of smallest and of largest size) were oversampled. This was achieved by randomly drawing unique subsets for each possible subset size (one to 31 samples for R2D:4D) separately, until the desired number of 100,000 unique subsets was reached.

Parametric Bootstrap for the Inferential Test of the Specification-Curve Meta-Analysis

To evaluate the descriptive meta-analytic specification-curve plot against the null hypothesis of no effect with an inferential statistical test, we used a parametric bootstrap approach. For each sample from the literature (Table 1), we regarded all study features as fixed, but generated random values as new effect sizes under the assumption that the null hypothesis is true: that is, values were randomly drawn from a normal distribution with an expectation of zero throughout, and a standard deviation equal to the respective sample’s observed standard error (thus corresponding to the FEM of meta-analysis). Then, descriptive specification-curve analysis was applied. This whole procedure was repeated 1,000 times, and the resulting 1,000 bootstrapped specification curves were then used to find the respective pointwise 2.5% and 97.5% quantiles as the lower and upper limits for each specification number separately. Exceeding one of these limits would indicate that the actual, descriptive specification curve deviates from the under-the-null scenario of no effect (r = 0) with two-tailed testing (in parentheses, we note that, if desired, one-tailed testing would also be possible).

Open Science Practices

We disclose how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study (Simmons, Nelson, & Simonsohn, 2012). Specifically, as this is a meta-analysis, sample size is not determined, but rather arrived at through literature-search strategies and study inclusion/exclusion criteria (detailed above). The full meta-analytic dataset (see Table 1 note) is accessible via the OSF (Open Science Framework). For the full (i.e., conventional) meta-analytic model (see Results section below and Table 1), we did not exclude any data. Owing to the meta-analytic study format, there were no experimental manipulations. Also, there were no further measures than those appearing in Tables 1 and 2. All statistical manipulations (the How factors) and those due to study inclusion versus exclusion (the Which factors) are detailed above.

The focus here is on method development, and the meta-analysis is just an illustrative example; hence, we did not preregister it. However, all components necessary for reproducible data analysis (open data, open materials, and open code) are accessible via the OSF and, because of this repository’s characteristics, also comply with the FAIR (findable, accessible, interoperable, re-usable) guiding principles for scientific data (Wilkinson et al., 2016).

Results and Discussion

Table 1 (bottom) contains the results of the updated meta-analysis for the associations of digit ratios (R2D:4D, L2D:4D, and ΔR−L) with CAG repeats. According to these simple fixed-effect meta-analytic summaries, which use Fisher’s zr transformation of the Pearson r coefficients for synthesis, there is no evidence for positive correlations between these variables. All combined effects are very close to zero and have rather tight 95% confidence intervals. Cross-study effect heterogeneity (as indicated by the Q tests and the I² values) is relatively low. This updated meta-analysis exactly follows the meta-analytic decisions of Voracek (2014). As such, it is important to note that, as seen through the lens of specification-curve and multiverse meta-analysis, this constitutes not more than a single specification, whereas there are numerous alternative specifications.

Figure 1 (to the left) provides a graphical display corresponding to these summary results (bottom of Table 1). Instead of the classic meta-analytic forest plot, we use an advancement of it, namely the meta-analytic rainforest plot (for details, see Schild & Voracek, 2015; Zhang, Kossmeier, Tran, Voracek, & Zhang, 2017). Figure 1 (to the right) contains the visualization (GOSH plots) of the all-subsets (combinatorial) meta-analyses. As mentioned above, we display random samples of 100,000 meta-analytic subsets, as drawn from the much larger number of possible subsets. The impression from the GOSH plots is straightforward: density estimates of the effect distributions are unimodal (thus not suggestive of influential subsets of studies or individual studies, including Manning et al., 2003, which is highlighted in these plots) and closely centered around zero (thus not suggesting any real effects). Effect heterogeneity is predominantly low, but somewhat larger when Manning et al. (2003) is included, which indicates that this study really is an outlier.

Figure 2 provides the descriptive meta-analytic specification-curve plots for the three meta-analyses (R2D:4D, L2D:4D, ΔR−L). Whereas Simonsohn et al.
(2015), in their corresponding plot for primary data analysis, depicted the regression-model point estimates resulting from the alternative specifications, we display the specifications’ summary effects with their associated 95% confidence intervals. Similar to meta-analytic caterpillar plots (i.e., magnitude-sorted forest plots), the summary effects are sorted by magnitude. The number of samples contained in each meta-analytic specification is depicted directly beneath, and the combination of Which and How factors constituting each meta-analytic specification through the area pattern below. This area pattern needs to be read vertically. To facilitate this, we use a spectral-color design, comprising six spectral colors (ordered from red, orange, yellow, green, blue, to violet), which, once more, signifies the number of samples involved in the respective meta-analytic specification (because of this near-redundancy, researchers may, of course, choose which components of this graphical display to keep). The array is such that red, orange, or yellow color (think of hot colors as alarm signals) means that in a given specification only the minimum number of samples is (or small numbers of samples are) involved, whereas violet, blue, or green color (think of cool colors as calming) codes for the maximum number (or at least large numbers) of samples involved.

Again, the overall pattern is easy to follow: more or less regardless of the meta-analytic specifications made (in terms of which data are meta-analyzed, and how), no evidence for 2D:4D/CAG associations arises. For R2D:4D, 56 out of 680 specifications (8.2%) yield nominally significant (p < .05) positive combined effects, which would support the findings of Manning et al. (2003); for ΔR−L, as few as 4 out of 416 specifications (2.7%); and for L2D:4D, only 6 out of 496 specifications (1.2%). Paralleling the results of Simonsohn et al. (2015) in their specification-curve analysis of the hurricane paper of Jung et al. (2014a), the rate of stray positive results is so low as to be perfectly plausible by chance alone. It is emphasized that it is the dominant pattern (i.e., the majority vote) arising from the space of specifications that counts, not any aggregation of these (e.g., their grand total). The latter neither is intended nor seems justified, as individual specifications partly are appreciably similar and there likely is no “deeper truth” calculable by averaging (Patel et al., 2015).

It is interesting to note that the above small number of nominally significant specifications to a great extent involve rather small meta-analytic subsets (signified through hot colors and long confidence intervals in Figure 2) and tend to surface with UWM analyses. Meta-analysts certainly would not draw inferences from a completely unweighted model which summarizes only a small portion of available studies from the literature.
However, at the same time this particular scenario has strong similarities with the actual reasoning and the usual procedures involved in writing stand-alone traditional (narrative, unsystematic) literature reviews, or in drafting the literature review section for the introductory part of an empirical research article. The idiosyncrasy inherent in these scenarios is that only a small portion of the totality of research evidence is seen and accounted for, and moreover is evaluated as if all studies had identical information value (see Kühberger, Scherndl, Ludwig, & Simon, 2016, for a demonstration of the detrimental effects of this misleading approach). For these reasons, it may well be informative to incorporate UWMs into specification-curve/multiverse meta-analyses on a regular basis. In the case of our worked example, this may also serve to understand the persistence of citations to Manning et al. (2003) in this literature, as well as the neglect of available meta-analyses on the same topic (Hönekopp, 2013; Voracek, 2014). The inferential meta-analytic specification plots (Figure 3) corroborate the above findings, in that they nowhere deviate from the under-the-null scenario of an underlying zero effect. The slight results differences between these



Figure 1. Combinatorial meta-analysis of 2D:4D with CAG repeats length in the androgen receptor gene. Note. Rainforest plots (on the left; see Schild & Voracek, 2015) for all three meta-analyses visualize study effects as raindrops and the meta-analytic summary effect as a diamond at the bottom. Raindrop widths correspond to conventional 95% confidence intervals, while raindrop heights and their shading correspond to the likelihood (i.e., plausibility) of underlying true values, considering the observed study effects, and are proportional to the meta-analytic weight. Study Manning et al. (2003) is highlighted in red. GOSH plots (on the right; see Olkin et al., 2012) show the FEM meta-analytic summary effects on the x axis and the heterogeneity statistic I² on the y axis for a random sample of 100,000 different study subsets. The distributions of these 100,000 values are visualized by density estimates at the top (for the summary effect) and to the right (for the I² values). Study subsets including Manning et al. (2003) are highlighted in red in the color version of this figure available with the online version of the article. R code to reproduce the figure is available at https://osf.io/kqgey/.
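The data behind a GOSH plot of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' R code: the `TOY` data are hypothetical, only a handful of subsets is drawn instead of 100,000, and the stratification is approximated by cycling over subset sizes so that each size contributes roughly equally until the requested number of unique subsets is reached.

```python
import math
import random

# Hypothetical (r, N) pairs for illustration only; not taken from Table 1.
TOY = [(0.29, 50), (0.05, 180), (-0.03, 70), (0.02, 600), (0.10, 75), (-0.01, 300)]

def fem_i2(samples):
    """Fixed-effect summary on Fisher's z (returned as r) and I^2 in percent."""
    z = [math.atanh(r) for r, _ in samples]
    w = [n - 3 for _, n in samples]          # inverse of var(z) = 1 / (N - 3)
    est = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - est) ** 2 for wi, zi in zip(w, z))
    df = len(samples) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 and df > 0 else 0.0
    return math.tanh(est), i2

def sample_subsets(k, total, rng):
    """Draw `total` unique index subsets of range(k), cycling over subset sizes."""
    total = min(total, 2 ** k - 1)           # cannot exceed the number of non-empty subsets
    seen = set()
    drawn = {size: 0 for size in range(1, k + 1)}
    while len(seen) < total:
        for size in range(1, k + 1):
            if len(seen) >= total:
                break
            if drawn[size] >= math.comb(k, size):
                continue                     # all subsets of this size already used
            subset = frozenset(rng.sample(range(k), size))
            if subset not in seen:
                seen.add(subset)
                drawn[size] += 1
    return seen

SUBSETS = sample_subsets(len(TOY), 40, random.Random(1))
# One (summary r, I^2) point per subset, as plotted on the x and y axes.
GOSH_POINTS = [fem_i2([TOY[i] for i in sorted(s)]) for s in SUBSETS]
print(len(GOSH_POINTS), "subsets;", "first point:", GOSH_POINTS[0])
```

Subsets that contain a given study (here, for instance, index 0) can then be flagged and colored separately, as done for Manning et al. (2003) in the figure.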



Figure 2. Descriptive meta-analytic specification plots for R2D:4D, L2D:4D, and ΔR−L. Descriptive meta-analytic specification plots depict the three specification-curve meta-analyses for R2D:4D, L2D:4D, and ΔR−L. Within each plot, the vertical columns (in the lower half) represent which factor-level combinations of internal (Which) and external (How) specification factors constitute a given specification. In addition, each vertical column is color-coded, signifying the number of samples included in a specification (hot vs. cool spectral colors code for smaller vs. larger number of samples included). The panel in the middle (filled black line chart) likewise shows how many samples are included in a given specification. The top panel shows the resulting meta-analytic summary effects (r) for each specification, along with 95% confidence intervals. The summary effects are sorted by their magnitude and connected, resulting in a specification curve. A horizontal dotted line of no effect is inserted at r = 0. R code to reproduce the figure is available at https://osf.io/e4bs8/ (R2D:4D), https://osf.io/738sr/ (L2D:4D), https://osf.io/8tw59/ (ΔR−L).

descriptive versus inferential specification plots (some stray positive results vs. none) are understandable through the different evaluation criteria used (null-hypothesis significance testing vs. parametric bootstrap); the conclusions, however, are identical.

Finally, Figure 4 displays histograms of the p value distributions for the summary effects of the various meta-analytic specifications (as adopted from the multiverse analysis approach of Steegen et al., 2016). Further conforming with the above evidence of zero effects, no consistent or clear piling up of p < .05 values is evident. In principle, the gist of the information provided by these histograms can already be gleaned from the topmost part of the descriptive meta-analytic specification plots (Figure 2). For the sake of completeness, we note that a tile plot of p values (constructed similarly to a two-dimensional nested cross-table design, as introduced by Steegen et al., 2016, for multiverse analysis of primary data) would furthermore make it possible to look up the exact meta-analytic specifications in which p < .05 values occur. Since we considered hundreds of specifications, the p value tile plot would be cluttered and thus is omitted here. Researchers working with fewer meta-analytic specifications might however wish to present a p value tile plot in addition (see Steegen et al., 2016, for examples).

All in all, although the evidence from our worked example casts a bleak view on the validity status of the 2D:4D marker vis-à-vis genetically based testosterone sensitivity, it is precisely the exhaustiveness and convergence of this evidence that matters and is reassuring.



Figure 3. Inferential meta-analytic specification plots for R2D:4D, L2D:4D, and ΔR−L. Inferential meta-analytic specification plots show the specification curve (solid line) of the magnitude-sorted observed meta-analytic summary effects for all specifications. The same curves appear in the corresponding descriptive meta-analytic specification plots (Figure 2). The limits of the gray area correspond to the pointwise 97.5% and 2.5% quantiles of 1,000 specification curves simulated under the null hypothesis for a given specification number, using a parametric bootstrap procedure. Exceeding these limits would constitute evidence against the null hypothesis (r = 0, regardless of specification). R code to reproduce the figure is available at https://osf.io/ru264.
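The parametric bootstrap behind the gray band can be sketched as follows. This is a simplified illustration, not the authors' R code: the sample sizes, effects, and the four specifications in `SPECS` are hypothetical, every specification is analyzed with the fixed-effect model on Fisher's z only, and the quantiles are taken by a simple nearest-rank rule.

```python
import math
import random

# Hypothetical inputs for illustration only; not taken from Table 1.
NS = (50, 180, 70, 600)
SES = [1.0 / math.sqrt(n - 3) for n in NS]             # standard errors of Fisher's z
OBSERVED = [math.atanh(r) for r in (0.29, 0.05, -0.03, 0.02)]
SPECS = [(0, 1, 2, 3), (0, 1), (1, 3), (0, 2, 3)]      # sample indices per specification

def fem(effects, ses):
    """Inverse-variance weighted (fixed-effect) summary."""
    w = [1.0 / s ** 2 for s in ses]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w)

def spec_curve(effects, ses, specs):
    """Magnitude-sorted summary effects, one per specification."""
    return sorted(fem([effects[i] for i in s], [ses[i] for i in s]) for s in specs)

def null_band(ses, specs, n_boot=1000, seed=1):
    """Pointwise 2.5% and 97.5% limits of specification curves simulated under r = 0."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_boot):
        fake = [rng.gauss(0.0, s) for s in ses]        # null effects, SD = observed SE
        curves.append(spec_curve(fake, ses, specs))
    lower, upper = [], []
    for pos in range(len(specs)):
        column = sorted(curve[pos] for curve in curves)
        lower.append(column[int(0.025 * (n_boot - 1))])
        upper.append(column[math.ceil(0.975 * (n_boot - 1))])
    return lower, upper

LOWER, UPPER = null_band(SES, SPECS, n_boot=500)
CURVE = spec_curve(OBSERVED, SES, SPECS)
# True wherever the observed curve leaves the band simulated under the null.
OUTSIDE = [not (lo <= obs <= hi) for obs, lo, hi in zip(CURVE, LOWER, UPPER)]
print(CURVE, LOWER, UPPER, OUTSIDE, sep="\n")
```

Because each simulated curve is sorted before the quantiles are taken, the limits refer to the specification number (rank position) rather than to a particular specification, as in the figure.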

Figure 4. Histograms of the p value distributions for the summary effects of all meta-analytic specifications. Histograms of p values for all meta-analytic specifications, testing whether the meta-analytic summary effect differs from zero (Figure 2). The proportion of nominally significant values (p < .05) is in the leftmost column (light gray). R code to reproduce the figure is available at https://osf.io/yu98x.
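How the p values in such a histogram arise can be shown with a short sketch. It assumes a normal-approximation (z) test of each summary effect against zero, which is an assumption of this illustration rather than a documented detail of the authors' analysis; the summary effects and standard errors in `TOY_SUMMARIES` are hypothetical.

```python
import math

def two_sided_p(estimate, se):
    """Two-tailed p value for H0: summary effect = 0 (normal approximation)."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))

def p_histogram(p_values, width=0.05):
    """Counts per bin of the given width; the first bin holds p < width."""
    n_bins = round(1.0 / width)
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p / width), n_bins - 1)] += 1
    return counts

# Hypothetical (summary effect, standard error) pairs, one per specification.
TOY_SUMMARIES = [(0.019, 0.010), (0.007, 0.0105), (0.013, 0.0186), (0.060, 0.020)]
P_VALUES = [two_sided_p(est, se) for est, se in TOY_SUMMARIES]
COUNTS = p_histogram(P_VALUES)
SHARE_SIGNIFICANT = COUNTS[0] / len(P_VALUES)
print([round(p, 3) for p in P_VALUES], COUNTS[0], SHARE_SIGNIFICANT)
```

The leftmost bin corresponds to the light gray column in the figure, that is, the proportion of specifications with p < .05.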


Conclusions and Implications

We conclude with some further considerations regarding the presented approach. It is important to note that in specification-curve/multiverse meta-analysis the Which and How factors constituting the specifications cannot be adopted automatically: rather, they need to be tailor-made each time anew, informed by specific debates in the primary literature or by prior related meta-analyses. Still, this leaves room for subjectivity (researcher degrees of freedom) and disagreement about what the relevant and reasonable specifications are. Like primary studies, conventional meta-analyses increasingly are preregistered. This could also be done for meta-analytic specification designs. Relatedly, higher consensus might also be achieved by diversifying specification decisions via web-based frameworks, such as community-augmented meta-analysis (Tsuji, Bergmann, & Cristia, 2014) and Curate Science (LeBel, McCarthy, Earp, Elson, & Vanpaemel, 2018). Adversarial collaborations might be expedient (Kahneman, 2003; Kerr, Ao, Hogg, & Zhang, 2018). Combinatorial meta-analysis may act as the final arbiter in such matters.

It has been observed that early decisions in meta-analyses (foremost, the study inclusion-exclusion criteria) frequently generate more result variation than the subsequent statistical modeling (Goodyear-Smith, van Driel, Arroll, & Del Mar, 2012). In other words, the Which factors take precedence over the How factors. Others have noted just the opposite pattern (e.g., Young & Holsteen, 2017; albeit for primary data analyses). It will therefore be interesting to see which importance relations between Which versus How factors will typically arise in applications of specification-curve/multiverse meta-analysis.

We feel confident that there are abundant instances of empirical research suited for, and worthy of the effort of, specification-curve/multiverse meta-analysis.
We briefly allude here to just three such examples, all taken from current psychological research.

Example 1: Are there ovulatory-cycle effects on women’s mating preferences, as predicted by evolutionary psychological theorizing? Yes, according to one meta-analysis, published in the premier journal Psychological Bulletin (Gildersleeve, Haselton, & Fales, 2014a), which however prompted an exchange between commentators (Harris, Pashler, & Mickes, 2014; Wood & Carden, 2014) and the authors (Gildersleeve, Haselton, & Fales, 2014b). No, according to another meta-analysis, published almost simultaneously (Wood, Kressel, Joshi, & Louie, 2014), whose counterevidence triggered an even more extensive debate (van Anders, 2014; Brown, Cross, Street, & Brand, 2014; Ferguson, 2014; Hyde & Salk, 2014; Jones, 2014; Wood, 2014).

M. Voracek et al., Which Data to Meta-Analyze, and How?

Example 2: The research question of possible associations between brain size and cognitive abilities (IQ) has a long and checkered history. According to a widely cited meta-analysis (McDaniel, 2005), these correlations are substantial. According to the Web of Science database, this report currently ranks within the top-20 most-cited articles out of about 1,800 articles published in the journal Intelligence since 1980. Based on a substantially larger corpus of primary studies, and accounting for many hitherto unreported effects, other meta-analysts (Pietschnig, Penke, Wicherts, Zeiler, & Voracek, 2015) have found that these associations are noticeably smaller than previously thought and further show a decline in more recent studies (which would be consistent with stronger publication bias in earlier research). Subsequently, others (Gignac & Bates, 2017) applied alternative study eligibility criteria to the same meta-analytic database (i.e., did not retrieve and assemble new data), and in their meta-analysis of merely a subset of the Pietschnig et al. (2015) database, again observed a larger effect. Of note, the specification justified in Gignac and Bates (2017), even if reasonable, remains just one out of many more specifications that are conceivable.

Example 3: Over the years, research about aggressive effects of violent video games has become known for controversies surrounding the veracity of this evidence. It appears that multiple (and throughout highly cited) meta-analyses have not resolved the question of to what extent such effects are indeed real (Anderson & Bushman, 2001; Anderson et al., 2010; Greitemeyer & Mügge, 2014) or more likely due to publication bias (Ferguson, 2007a, 2007b, 2015).

As diverse as these examples may appear on the surface, their in-depth commonalities are more important. These include: (1) the conflicting meta-analyses are rooted in controversies already found in the respective literatures which they attempt to synthesize and clarify; (2) even multiple meta-analyses apparently can fail to resolve contentious issues that pervade corresponding primary research; and (3) this sometimes can lead to debates which, likely by more than a few in the research community, are viewed as agonizing and fruitless. We see potential in the proposed approach to mitigate and counteract such detrimental phenomena and undesired developments.

In conclusion, whether it be primary studies or meta-analyses, there often seems to be a lack of consensus about which data to analyze and how to analyze them. Paralleling the potential of specification-curve analysis and multiverse analysis for clarifying the trustworthiness and robustness of evidence from primary studies, an analogously pursued approach to meta-analysis, as introduced here, holds similar promise. Instead of presenting just one meta-analysis and then defending this specification of one’s own (or criticizing others’ alternative specifications), it is better to assess all possible study subsets (combinatorial meta-analysis) and to focus on relevant and justifiable meta-analytic specifications (specification-curve and multiverse meta-analysis).

© 2019 Hogrefe Publishing

References Anderson, C. A., & Bushman, B. J. (2001). Effects of violent video games on aggressive behavior, aggressive cognition, aggressive affect, physiological arousal, and prosocial behavior: A metaanalytic review of the scientific literature. Psychological Science, 12, 353–359. https://doi.org/10.1111/1467-9280.00366 Anderson, C. A., Shibuya, A., Ihori, N., Swing, E. L., Bushman, B. J., Sakamoto, A., . . . Saleem, M. (2010). Violent video game effects on aggression, empathy, and prosocial behavior in Eastern and Western countries: A meta-analytic review. Psychological Bulletin, 136, 151–173. https://doi.org/10.1037/a0018251 Babková Durdiaková, J., Celec, P., Koborová, I., Sedláčková, T., Minárik, G., & Ostatníková, D. (2017). How do we love? Romantic love style in men is related to lower testosterone levels. Physiological Research, 66, 695–703. Bakkensen, L. A., & Larson, W. D. (2014). Population matters when modeling hurricane fatalities [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E5331–E5332. https://doi.org/10.1073/pnas. 1417030111 Berenbaum, S. A., & Beltz, A. M. (2011). Sexual differentiation of human behavior: Effects of prenatal and pubertal organizational hormones. Frontiers in Neuroendocrinology, 32, 183–200. https://doi.org/10.1016/j.yfrne.2011.03.001 Breedlove, S. M. (2010). Organizational hypothesis: Instances of the fingerpost. Endocrinology, 151, 4116–4122. https://doi.org/ 10.1210/en.2010-0041 Brown, G. R., Cross, C. P., Street, S. E., & Brand, C. O. (2014). Comment: Beyond “evolutionary versus social”: Moving the cycle shift debate forward. Emotion Review, 6, 250–251. https://doi.org/10.1177/1754073914523050 Butovskaya, M. L., Vasilyev, V. A., Lazebny, O. E., Burkova, V. N., Kulikov, A. M., Mabulla, A., . . . Ryskov, A. P. (2012). Aggression, digit ratio, and variation in the androgen receptor, serotonin transporter, and dopamine D4 receptor genes in African foragers: The Hadza. 
Behavior Genetics, 42, 647–662. https:// doi.org/10.1007/s10519-012-9533-2 Chang, S., Skakkebæk, A., Trolle, C., Bojesen, A., Hertz, J. M., Cohen, A., . . . Gravholt, C. H. (2015). Anthropometry in Klinefelter syndrome: Multifactorial influences due to CAG length, testosterone treatment and possibly intrauterine hypogonadism. Journal of Clinical Endocrinology and Metabolism, 100, E508–E517. https://doi.org/10.1210/jc.2014-2834 Cheng, F., Zhao, J., Lu, H., Liu, D., & Liu, L. (2016). The association of the digit ratio and androgen receptor gene CAG polymorphism in patients with premature ovarian failure [in Chinese]. Journal of Ningxia Medical University, 38, 856–859, 867. Christensen, B., & Christensen, S. (2014). Are female hurricanes really deadlier than male hurricanes? [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E3497–E3498. https://doi.org/10.1073/ pnas.1410910111


Cohen-Bendahan, C. C. C., van de Beek, C., & Berenbaum, S. A. (2005). Prenatal sex hormone effects on child and adult sextyped behavior: Methods and findings. Neuroscience and Biobehavioral Reviews, 29, 353–384. https://doi.org/10.1016/j. neubiorev.2004.11.004 De Naeyer, H., Bogaert, V., De Spaey, A., Roef, G., Vandewalle, S., Derave, W., . . . Kaufman, J. M. (2014). Genetic variations in the androgen receptor are associated with steroid concentrations and anthropometrics but not with muscle mass in healthy young men. PLoS One, 9, e86235. https://doi.org/10.1371/ journal.pone.0086235 Durdiaková, J., Celec, P., Laznibatová, J., Minárik, G., Lakatošová, S., Kubranská, A., & Ostatníková, D. (2015). Differences in salivary testosterone, digit ratio and empathy between intellectually gifted and control boys. Intelligence, 48, 76–84. https://doi.org/10.1016/j.intell.2014.11.002 Durdiaková, J., Celec, P., Laznibatová, J., Minárik, G., & Ostatníková, D. (2016). Testosterone metabolism: A possible biological underpinning of non-verbal IQ in intellectually gifted girls. Acta Neurobiologiae Experimentalis, 76, 66–74. https://doi.org/ 10.21307/ane-2017-006 Durdiaková, J., Lakatošová, S., Kubranská, A., Laznibatová, J., Ficek, A., Ostatníková, D., & Celec, P. (2013). Mental rotation in intellectually gifted boys is affected by the androgen receptor CAG repeat polymorphism. Neuropsychologia, 94, 1693–1698. https://doi.org/10.1016/j.neuropsychologia.2013.05.016 Ferguson, C. J. (2007a). Evidence for publication bias in video game violence effects literature: A meta-analytic review. Aggression and Violent Behavior, 12, 470–482. https://doi.org/ 10.1016/j.avb.2007.01.001 Ferguson, C. J. (2007b). The good, the bad and the ugly: A metaanalytic review of positive and negative effects of violent video games. Psychiatric Quarterly, 78, 309–316. https://doi.org/ 10.1007/s11126-007-9056-9 Ferguson, C. J. (2014). Comment: Why meta-analyses rarely resolve ideological debates. 
Emotion Review, 6, 251–252. https://doi.org/10.1177/1754073914523046 Ferguson, C. J. (2015). Do angry birds make for angry children? A meta-analysis of video game influences on children’s and adolescents’ aggression, mental health, prosocial behavior, and academic performance. Perspectives on Psychological Science, 10, 646–666. https://doi.org/10.1177/1745691615592234 Folland, J. P., McCauley, T. M., Phypers, C., Hanson, B., & Mastana, S. S. (2012). Relationship of 2D:4D finger ratio with muscle strength, testosterone, and androgen receptor CAG repeat genotype. American Journal of Physical Anthropology, 148, 81–87. https://doi.org/10.1002/ajpa.22044 Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102, 460–465. https://doi.org/10.1511/2014. 111.460 Gignac, G. E., & Bates, T. C. (2017). Brain volume and intelligence: The moderating role of intelligence measurement quality. Intelligence, 64, 18–29. https://doi.org/10.1016/j.intell.2017. 06.004 Gildersleeve, K., Haselton, M. G., & Fales, M. R. (2014a). Do women’s mate preferences change across the ovulatory cycle? A meta-analytic review. Psychological Bulletin, 140, 1205–1259. https://doi.org/10.1037/a0035438 Gildersleeve, K., Haselton, M. G., & Fales, M. R. (2014b). Metaanalyses and p-curves support robust cycle shifts in women’s mate preferences: Reply to Wood and Carden (2014) and Harris, Pashler, and Mickes (2014). Psychological Bulletin, 140, 1272– 1280. https://doi.org/10.1037/a0037714 Goodyear-Smith, F. A., van Driel, M. L., Arroll, B., & Del Mar, C. (2012). Analysis of decisions made in meta-analyses of depression screening and the risk of confirmation bias: A case


study. BMC Medical Research Methodology, 12, 76. https://doi. org/10.1186/1471-2288-12-76 Greitemeyer, T., & Mügge, D. O. (2014). Video games do affect social outcomes: A meta-analytic review of the effects of violent and prosocial video game play. Personality and Social Psychology Bulletin, 40, 578–589. https://doi.org/10.1177/ 0146167213520459 Habre, C., Tramèr, M. R., Pöpping, D. M., & Elia, N. (2014). Ability of a meta-analysis to prevent redundant research: Systematic review of studies on pain from propofol injection. British Medical Journal, 349, g5219. https://doi.org/10.1136/bmj.g5219 Hampson, E., & Sankar, J. S. (2012). Re-examining the Manning hypothesis: Androgen receptor polymorphism and the 2D:4D digit ratio. Evolution and Human Behavior, 33, 557–561. https://doi.org/10.1016/j.evolhumbehav.2012.02.003 Harris, C. R., Pashler, H., & Mickes, L. (2014). Elastic analysis procedures: An incurable (but preventable) problem in the fertility effect literature. Comment on Gildersleeve, Haselton, and Fales (2014). Psychological Bulletin, 140, 1260–1264. https://doi.org/10.1037/a0036478 Hennig, J., & Rammsayer, T. (2007). Research on 2D:4D: A promising challenge for the study of individual differences [Editorial]. Journal of Individual Differences, 28, 53–54. https:// doi.org/10.1027/1614-0001.28.2.53 Hines, M. (2010). Sex-related variation in human behavior and the brain. Trends in Cognitive Sciences, 14, 448–456. https://doi. org/10.1016/j.tics.2010.07.005 Hines, M. (2011). Gender development and the human brain. Annual Review of Neuroscience, 34, 69–88. https://doi.org/ 10.1146/annurev-neuro-061010-113654 Hofmann, B. (2018). Fake facts and alternative truths in medical research. BMC Medical Ethics, 19, 4. https://doi.org/10.1186/ s12910-018-0243-z Hönekopp, J. (2013). No evidence that 2D:4D is related to the number of CAG repeats in the androgen receptor gene. Frontiers in Endocrinology, 4, 185. 
https://doi.org/10.3389/ fendo.2013.00185 Hönekopp, J., & Watson, S. (2010). Meta-analysis of digit ratio 2D:4D shows greater sex difference in the right hand. American Journal of Human Biology, 22, 619–630. Hurd, P. L., Vaillancourt, K. L., & Dinsdale, N. L. (2011). Aggression, digit ratio and variation in androgen receptor and monoamine oxidase A genes in men. Behavior Genetics, 41, 543–556. https://doi.org/10.1007/s10519-010-9404-7 Hyde, J. S., & Salk, R. H. (2014). Comment: Menstrual cycle fluctuations in women’s mate preferences. Emotion Review, 6, 253–254. https://doi.org/10.1177/1754073914523049 Ioannidis, J. P. (2016). The mass production of redundant, misleading, and conflicted systematic reviews and metaanalyses. Milbank Quarterly, 94, 485–514. https://doi.org/ 10.1111/1468-0009.12210 Jones, B. C. (2014). Comment: Alternatives to Wood et al.’s conclusions. Emotion Review, 6, 254–256. https://doi.org/ 10.1177/1754073914523048 Jung, K., Shavitt, S., Viswanathan, M., & Hilbe, J. M. (2014a). Female hurricanes are deadlier than male hurricanes. Proceedings of the National Academy of Sciences of the United States of America, 111, 8782–8787. https://doi.org/10.1073/ pnas.1402786111 Jung, K., Shavitt, S., Viswanathan, M., & Hilbe, J. M. (2014b). Reply to Bakkensen and Larson: Population may matter but does not alter conclusions [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E5333. https://doi.org/10.1073/pnas.1419330111 Jung, K., Shavitt, S., Viswanathan, M., & Hilbe, J. M. (2014c). Reply to Christensen and Christensen and to Malter: Pitfalls of


erroneous analyses of hurricanes names [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E3499–E3500. https://doi.org/10.1073/ pnas.1411652111 Jung, K., Shavitt, S., Viswanathan, M., & Hilbe, J. M. (2014d). Reply to Maley: Yes, appropriate modeling of fatality counts confirms female hurricanes are deadlier [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E3835. https://doi.org/10.1073/pnas.1414111111 Kahneman, D. (2003). Experiences of collaborative research. American Psychologist, 58, 723–730. https://doi.org/10.1037/ 0003-066X.58.9.723 Kerr, N. L., Ao, X., Hogg, M. A., & Zhang, J. (2018). Addressing replicability concerns via adversarial collaboration: Discovering hidden moderators of the minimal intergroup discrimination effect. Journal of Experimental Social Psychology, 78, 66–76. https://doi.org/10.1016/j.jesp.2018.05.001 Knickmeyer, R. C., Woolson, S., Hamer, R. M., Konneker, T., & Gilmore, J. H. (2011). 2D:4D ratios in the first 2 years of life: Stability and relation to testosterone exposure and sensitivity. Hormones and Behavior, 60, 256–263. https://doi.org/10.1016/ j.yhbeh.2011.05.009 Kubranská, A., Lakatošová, S., Schmidtová, E., Durdiaková, J., Celec, P., & Ostatníková, D. (2014). Spatial abilities are not related to testosterone levels and variation in the androgen receptor in healthy young men. General Physiology and Biophysics, 33, 311–319. https://doi.org/10.4149/gpb_2014005 Kühberger, A., Scherndl, T., Ludwig, B., & Simon, D. M. (2016). Comparative evaluation of narrative reviews and meta-analyses: A case study. Zeitschrift für Psychologie, 224, 145–156. https://doi.org/10.1027/2151-2604/a000250 Latourelle, S. M., Elwess, N. L., & Elwess, J. M. (2008). Finger forecasting: A pointer to athletic prowess in women – a preliminary investigation by an undergraduate biology class. American Biology Teacher, 70, 411–414. 
https://doi.org/ 10.1662/0002-7685(2008)70[411:FFAPTA]2.0.CO;2 LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified framework to quantify the credibility of scientific findings. Advances in Methods and Practices in Psychological Science, 1, 389–402. https://doi.org/10.1177/ 2515245918787489 Loehlin, J. C., Medland, S. E., & Martin, N. G. (2012). Is CAG sequence length in the androgen receptor gene correlated with finger-length ratio? Personality and Individual Differences, 52, 224–227. https://doi.org/10.1016/j.paid.2011.09.009 Maley, S. (2014). Statistics show no evidence of gender bias in the public’s hurricane preparedness [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E3834. https://doi.org/10.1073/ pnas.1413079111 Malter, D. (2014). Female hurricanes are not deadlier than male hurricanes [Letter to the editor]. Proceedings of the National Academy of Sciences of the United States of America, 111, E3496. https://doi.org/10.1073/pnas.1411428111 Manning, J. T., Bundred, P. E., Newton, D. J., & Flanagan, B. F. (2003). The second to fourth digit ratio and variation in the androgen receptor gene. Evolution and Human Behavior, 24, 399–405. https://doi.org/10.1016/S1090-5138(03)00052-7 Manning, J. T., Scutt, D., Wilson, J., & Lewis-Jones, D. I. (1998). The ratio of 2nd to 4th digit length: A predictor of sperm numbers and concentrations of testosterone, luteinizing hormone and oestrogen. Human Reproduction, 13, 3000–3004. https://doi.org/10.1093/humrep/13.11.3000 Mas, M., Alonso, C., Hernandez, P., Fernandez, M., Gutierrez, P., Salido, E., & Baez, D. (2009). Androgen receptor CAG and GGN polymorphisms and 2D:4D finger ratio in male to female


transsexuals [Abstract]. Journal of Sexual Medicine, 6(Suppl. 5), 419–420. McDaniel, M. A. (2005). Big-brained people are smarter: A metaanalysis of the relationship between in vivo brain volume and intelligence. Intelligence, 33, 337–346. https://doi.org/10.1016/ j.intell.2004.11.005 Naudet, F., Schuit, E., & Ioannidis, J. P. A. (2017). Overlapping network meta-analyses on the same topic: Survey of published studies. International Journal of Epidemiology, 46, 1999–2008. https://doi.org/10.1093/ije/dyx138 Nelson, L. D., Simmons, J., & Simonsohn, U. (2018). Psychology’s renaissance. Annual Review of Psychology, 69, 511–534. https://doi.org/10.1146/annurev-psych-122216-011836 Olkin, I., Dahabreh, I. J., & Trikalinos, T. A. (2012). GOSH: A graphical display of study heterogeneity. Research Synthesis Methods, 3, 214–223. https://doi.org/10.1002/jrsm.1053 Patel, C. J., Burford, B., & Ioannidis, J. P. (2015). Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. Journal of Clinical Epidemiology, 68, 1046–1058. https://doi.org/10.1016/j.jclinepi. 2015.05.029 Pietschnig, J., Penke, L., Wicherts, J. M., Zeiler, M., & Voracek, M. (2015). Meta-analysis of associations between human brain volume and intelligence differences: How strong are they and what do they mean? Neuroscience and Biobehavioral Reviews, 57, 411–432. https://doi.org/10.1016/j.neubiorev.2015.09.017 Rohrer, J. M., Egloff, B., & Schmukle, S. C. (2017). Probing birthorder effects on narrow traits using specification-curve analysis. Psychological Science, 28, 1821–1832. https://doi.org/ 10.1177/0956797617723726 Schild, A. H. E., & Voracek, M. (2015). Finding your way out of the forest without a trail of breadcrumbs: Development and evaluation of two novel displays of forest plots. Research Synthesis Methods, 6, 74–86. https://doi.org/10.1002/jrsm.1125 Silberzahn, R., Uhlmann, E. L., Martin, D., Anselmi, P., Aust, F., Awtrey, E. C., . . . 
Nosek, B. A. (2018). Many analysts, one dataset: Making transparent how variations in analytical choices affect results. Advances in Methods and Practices in Psychological Science, 1, 337–356. https://doi.org/10.1177/ 2515245917747646 Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). Falsepositive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/ 0956797611417632 Simmons, J., Nelson, L., & Simonsohn, U. (2012). A 21 word solution. Dialogue, 26, 4–7. https://doi.org/10.2139/ssrn.2160588 Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Specification curve: Descriptive and inferential statistics on all reasonable specifications. Retrieved from http://sticerd.lse.ac.uk/ seminarpapers/psyc16022016.pdf Smith, G. (2016). Hurricane names: A bunch of hot air? Weather and Climate Extremes, 12, 80–84. https://doi.org/10.1016/j. wace.2015.11.006 Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11, 702–712. https://doi. org/10.1177/1745691616658637 Taylor, A. E., & Munafò, M. R. (2016). Triangulating metaanalyses: The example of the serotonin transporter gene, stressful life events and major depression. BMC Psychology, 4, 23. https://doi.org/10.1186/s40359-016-0129-0 Tsuji, S., Bergmann, C., & Cristia, A. (2014). Community-augmented meta-analyses: Toward cumulative data assessment. Perspectives on Psychological Science, 9, 661–665. https:// doi.org/10.1177/1745691614552498


van Anders, S. M. (2014). Comment: The social neuroendocrinology example: Incorporating culture resolves biobehavioral evolutionary paradoxes. Emotion Review, 6, 256–257. https:// doi.org/10.1177/1754073914523047 Voracek, M. (2011). Special issue preamble: Digit ratio (2D:4D) and individual differences research. Personality and Individual Differences, 51, 367–370. https://doi.org/10.1016/j.paid.2011. 04.018 Voracek, M. (2014). No effects of androgen receptor gene CAG and GGC repeat polymorphisms on digit ratio (2D:4D): A comprehensive meta-analysis and critical evaluation of research. Evolution and Human Behavior, 35, 430–437. https://doi.org/ 10.1016/j.evolhumbehav.2014.05.009 Voracek, M., Kaden, A., Kossmeier, M., Pietschnig, J., & Tran, U. S. (2018). Meta-analysis shows associations of digit ratio (2D:4D) and transgender identity are small at best. Endocrine Practice, 24, 386–390. https://doi.org/10.4158/EP-2017-0024 Voracek, M., & Loibl, L. M. (2009). Scientometric analysis and bibliography of digit ratio (2D:4D) research, 1998–2008. Psychological Reports, 104, 922–956. https://doi.org/10.2466/ PR0.104.3.922-956 Voracek, M., Pietschnig, J., Nader, I. W., & Stieger, S. (2011). Digit ratio (2D:4D) and sex-role orientation: Further evidence and meta-analysis. Personality and Individual Differences, 51, 417–422. https://doi.org/10.1016/j.paid.2010.06.009 Voracek, M., Tran, U. S., & Dressler, S. G. (2010). Digit ratio (2D:4D) and sensation seeking: New data and meta-analysis. Personality and Individual Differences, 48, 72–77. https://doi.org/ 10.1016/j.paid.2009.08.019 Warrington, N. M., Shevroja, E., Hemani, G., Hysi, P. G., Jiang, Y., Auton, A., . . . Evans, D. M. (2018). Genome-wide association study identifies nine novel loci for 2D:4D finger ratio, a putative retrospective biomarker of testosterone exposure in utero. Human Molecular Genetics, 27, 2025–2038. https://doi.org/ 10.1093/hmg/ddy121 Wilkinson, M. D., Dumontier, M., Aalbersberg, I. 
J., Appleton, G., Axton, M., Baak, A., & Bouwman, J. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18 Wood, W. (2014). Author reply: Once again, menstrual cycles and mate preferences. Emotion Review, 6, 258–260. https://doi.org/ 10.1177/1754073914523053 Wood, W., & Carden, L. (2014). Elusiveness of menstrual cycle effects on mate preferences: Comment on Gildersleeve, Haselton, and Fales (2014). Psychological Bulletin, 140, 1265–1271. https://doi.org/10.1037/a0036722 Wood, W., Kressel, L., Joshi, P. D., & Louie, B. (2014). Metaanalysis of menstrual cycle effects on women’s mate preferences. Emotion Review, 6, 229–249. https://doi.org/10.1177/ 1754073914523073 Young, C. (2018). Model uncertainty and the crisis in science. Socius. Advance online publication. https://doi.org/10.1177/ 2378023117737206 Young, C., & Holsteen, K. (2017). Model uncertainty and robustness: A computational framework for multimodel analysis. Sociological Methods & Research, 46, 3–40. https://doi.org/ 10.1177/0049124115610347 Zhang, C., Dang, J., Pei, L., Guo, M., Zhu, H., Qu, L., . . . Huo, Z. (2013). Relationship of 2D:4D finger ratio with androgen receptor CAG and GGN repeat polymorphism. American Journal of Human Biology, 25, 101–106. https://doi.org/10.1002/ ajhb.22347 Zhang, C., Lu, H., Hao, S., Yan, Y., Dang, J., Zheng, L., . . . Huo, Z. (2016). Relationship between androgen receptor CAG/GGN repeat polymorphisms and the ratio of 2D:4D [in Chinese]. Acta Anatomica Sinica, 47, 409–414.


Zhang, K., Yang, X., Yang, Y., Xue, M., Fang, P., Wang, B., . . . Gong, P. (2018). Revisiting the relation of ratio of 2D:4D with the androgen receptor (AR) gene and the circulating testosterone levels: Cross-sectional study and meta-analyses. Manuscript submitted for publication.
Zhang, Z., Kossmeier, M., Tran, U. S., Voracek, M., & Zhang, H. (2017). Rainforest plots for the presentation of patient-subgroup analysis in clinical trials. Annals of Translational Medicine, 5, 24. https://doi.org/10.21037/atm.2017.10.07

History
Received February 28, 2018
Revision received October 27, 2018
Accepted October 28, 2018
Published online March 29, 2019

Acknowledgments
The authors gratefully acknowledge the input generated by inspiring audience questions and discussions at the Research Synthesis 2018 Conference (June 2018, Leibniz Institute for Psychology Information, University of Trier, Germany) and at the COSB (Centre for Organismal Systems Biology) Colloquium (May 2018, University of Vienna, Austria), where earlier versions of this work were presented, as well as excellent anonymous reviewer comments, all of which helped to improve the presentation of this work.

Martin Voracek
Department of Basic Psychological Research and Research Methods
Faculty of Psychology
University of Vienna
Liebiggasse 5
1010 Vienna
Austria
martin.voracek@univie.ac.at

Michael Kossmeier
Department of Basic Psychological Research and Research Methods
Faculty of Psychology
University of Vienna
Liebiggasse 5
1010 Vienna
Austria
michael.kossmeier@univie.ac.at



Review Article

Visual Inference for the Funnel Plot in Meta-Analysis
Michael Kossmeier, Ulrich S. Tran, and Martin Voracek
Department of Basic Psychological Research and Research Methods, Faculty of Psychology, University of Vienna, Austria

Abstract: The funnel plot is widely used in meta-analyses to assess potential publication bias. However, experimental evidence suggests that informal, mere visual, inspection of funnel plots is frequently prone to incorrect conclusions, and formal statistical tests (Egger regression and others) entirely focus on funnel plot asymmetry. We suggest using the visual inference framework with funnel plots routinely, including for didactic purposes. In this framework, the type I error is controlled by design, while the explorative, holistic, and open nature of visual graph inspection is preserved. Specifically, the funnel plot of the actually observed data is presented simultaneously, in a lineup, with null funnel plots showing data simulated under the null hypothesis. Only when the real-data funnel plot is identifiable among all the funnel plots presented might funnel plot-based conclusions be warranted. Software to implement visual funnel plot inference is provided via a tailored R function.

Keywords: funnel plot, meta-analysis, publication bias, small-study effects, visual inference

Zeitschrift für Psychologie (2019), 227(1), 83–89 https://doi.org/10.1027/2151-2604/a000358

The funnel plot is a widely used diagnostic plot in meta-analysis to assess small-study effects and publication bias in particular (Light & Pillemer, 1984). It was one of the first genuine plots proposed to visualize meta-analytic data and, next to the forest plot (Lewis & Clarke, 2001), is the most iconic and popular display for this purpose (Schild & Voracek, 2013). In essence, the funnel plot is a scatter plot of study effects on the abscissa and a study precision measure (preponderantly, the study standard error) on the ordinate (Sterne & Egger, 2001). Its main idea is that observed effects should scatter randomly and symmetrically around the meta-analytic summary effect. As smaller studies are displayed toward the bottom and higher effect size variability is expected for these, this gives rise to the characteristic shape of an inverted funnel for this graphical display. Certain deviations from this expected funnel plot shape are commonly taken as suggestive of publication bias, although they similarly emerge via true effect heterogeneity or chance alone. In particular, a frequent observation is that smaller studies on average report larger effects. This so-called small-study effect, in turn, leads to asymmetric funnel plots (see Figure 1).

However, despite the popularity of the funnel plot, its suitability to detect publication bias has been questioned (Lau, Ioannidis, Terrin, Schmid, & Olkin, 2006). Indeed, empirical research suggests that subjective interpretations of whether publication bias is present or absent, based on mere visual funnel plot inspection, are often wrong (Hunter et al., 2014; Simmonds, 2015; Tang & Liu, 2000; Terrin, Schmid, & Lau, 2005). Formal statistical tests (e.g., Egger regression test, and others) based on funnel plot asymmetry are widely used to establish objectivity, while controlling for type I errors. All these tests have in common that they are based on funnel plot asymmetry quantified via the association of study effects with study standard errors (or respective functions of these).

On the other hand, visual inspection of funnel plots allows incorporating a multitude of visually displayed statistical information in an exploratory fashion, in order to assess the presence and severity of publication bias. Important questions that can be addressed by visual funnel plot examinations include the following: What role does statistical significance play (as indicated by significance-contour funnel plots)? Is there an abundance of just-significant studies? Is asymmetry driven by single-study outliers or clusters of studies? As prominently outlined in the Cochrane Collaboration handbook for systematic reviews, formal tests for funnel plot asymmetry should never be interpreted in isolation, but rather always, and first of all, in the light of a visual inspection of the funnel plot (Sterne, Egger, & Moher, 2008, p. 317). Hence, visual inference might be the sought-after bridge between both worlds: it allows researchers to formally safeguard against type I errors, while still being in keeping with the more general diagnostic, open, and explorative nature of visual examination of statistical graphs.
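To make concrete what "asymmetry quantified via the association of study effects with study standard errors" can look like, the following is a minimal Python sketch of an Egger-type regression in its classic textbook formulation: the standardized effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error), and an intercept far from zero points to funnel plot asymmetry. The function name `egger_regression` and its return format are hypothetical choices for this illustration; the sketch is not taken from the article or from the authors' R software.

```python
import math


def egger_regression(effects, std_errors):
    """Ordinary least squares fit of (effect / se) on (1 / se).

    Returns (intercept, slope, t_statistic). The intercept is the
    asymmetry term; the slope estimates the underlying effect.
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching inputs")

    y = [d / se for d, se in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / se for se in std_errors]                 # precisions

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))

    # The t statistic is undefined for a perfect fit (zero residual variance).
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat
```

In practice, the t statistic would be compared against a t distribution with n − 2 degrees of freedom. As the text stresses, such a test only captures asymmetry and cannot replace a visual inspection of the plot.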



M. Kossmeier et al., Visual Inference for the Funnel Plot in Meta-Analysis

Figure 1. Two examples of funnel plots including 95% confidence contours and significance contours at the .05 and .01 levels. Left: unsuspicious funnel plot with symmetrical scatter of studies around the summary effect (vertical black line). Right: funnel plot showing conspicuous small-study effects. Evidently, studies with larger standard errors on average observe larger effects, whereas studies with null or negative effects are missing. Hence, the funnel plot is asymmetric. R code to reproduce the figure: https://osf.io/drws7/.

Visual Inference for Statistical Graphs

Visual inference is a formal inferential framework (Buja et al., 2009) that allows researchers to test whether graphically displayed data support a hypothesis. The principal idea is that if a suitable statistical plot of the actually observed data is visually distinguishable from corresponding plots of data simulated under the null hypothesis, then this constitutes evidence against the null hypothesis.

In this framework, valid inferences are drawn via the so-called lineup protocol. That is, a lineup of diagnostic plots is constructed for the actually observed data and the null hypothesis a researcher wants to reject. The total lineup comprises k plots, of which k − 1 plots show data simulated under the null hypothesis, and in which the single plot with the actually observed data is positioned randomly. This lineup is then inspected by a viewer who is unfamiliar with the actually observed data and their peculiarities, with the aim of identifying the real-data plot. In practice, the primary researcher and the viewer of the lineup will often be the same person, which is valid as long as the primary researcher is unfamiliar with the shape of the real-data plot. After the viewer has selected one of the plots in the lineup, the position of the real-data plot in the lineup is revealed. If the real-data plot was noticeably different from all the other plots in the lineup, and therefore identifiable by the viewer, then the null hypothesis is rejected. If the actually observed data are in fact realizations of the null hypothesis, the probability of identifying the real-data plot, and therefore of falsely rejecting the null hypothesis, is 1/k. Hence, the alpha level is controlled by design (i.e., by the size of the lineup). A natural choice for the number of plots in a lineup therefore is 20, corresponding to the conventional alpha level (5%).

Just as with conventional statistical tests, a test statistic (the actual plot) is compared to the null distribution (the plots showing data simulated under the null hypothesis). At the same time, visual inference differs from conventional inference in that the test statistic is not compared to the entire null distribution, but rather to a finite number of realizations thereof (Majumder, Hofmann, & Cook, 2013).

Evaluations made by several independent viewers of a lineup, instead of by a single viewer, can be used for visual inference as well. This extension of the basic procedure has the potential to increase the power to correctly reject the null hypothesis. For two to seven viewers of the same lineup of 20 plots, it is sufficient that at least two viewers identify the real-data plot to reject the null hypothesis with the alpha level of the procedure not exceeding 5% (for further details, see Majumder et al., 2013). In recent studies, visual inference has shown promise across different statistical plots and data contexts (Chowdhury et al., 2015; Loy, Follett, & Hofmann, 2016; Loy, Hofmann, & Cook, 2017; Majumder et al., 2013).
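To make the arithmetic behind these error rates concrete, the following Python sketch computes the type I error rate of a lineup. It rests on a simplifying assumption of ours, not taken from Majumder et al. (2013): under the null hypothesis, each viewer picks one of the k plots independently and uniformly at random, so the number of correct identifications is binomial with success probability 1/k. The function name `lineup_alpha` is hypothetical.

```python
from math import comb


def lineup_alpha(n_viewers: int, min_hits: int = 1, k: int = 20) -> float:
    """Probability that at least `min_hits` of `n_viewers` pick the real-data
    plot when H0 is true (assumed: independent, uniform random picks)."""
    p = 1.0 / k  # chance that a single viewer picks the real-data plot under H0
    return sum(
        comb(n_viewers, x) * p**x * (1 - p) ** (n_viewers - x)
        for x in range(min_hits, n_viewers + 1)
    )


if __name__ == "__main__":
    # Single viewer, lineup of 20 plots.
    print("1 viewer:", round(lineup_alpha(1), 4))
    # Several viewers, at least two correct identifications required.
    for n in range(2, 9):
        print(n, "viewers:", round(lineup_alpha(n, min_hits=2), 4))
```

Under this simplified model, requiring at least two correct identifications keeps the error rate below 5% for two to seven viewers and exceeds 5% with eight viewers, which is consistent with the range stated above.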

Visual Inference for the Meta-Analytic Funnel Plot

We suggest evaluating meta-analytic funnel plots via visual inference for two main reasons. First, by controlling for type I errors, visual inference has the potential to increase the (often low) validity of conclusions based on mere visual inspection of the funnel plot. The lineup protocol allows researchers to safeguard against prematurely interpreting funnel plot patterns which might be perfectly plausible by chance. Second, formal statistical tests for funnel plot asymmetry exclusively focus on the association of study effects with their standard errors, whereas visual inference preserves the explorative nature of diagnostic graph inspection. Using visual perception as a formal statistical test allows researchers to flexibly incorporate a multitude of visual information when assessing the plausibility of the observed data under the null hypothesis.

Specifically, we propose using visual funnel plot inference as a pretest before drawing further conclusions from the visual inspection of a funnel plot. Only if the actually observed data displayed in a funnel plot are visually distinguishable from random patterns might any further conclusions based on mere visual inspection of the real-data funnel plot be warranted. The procedural details for conducting valid statistical inference using funnel plots are outlined in Figure 2. As an illustrative example, Figure 3 shows a lineup for visual inference using a published meta-analytic funnel plot (Shanks et al., 2015).

[Figure 2: flowchart of the lineup protocol. Step 1, Data: table of study ID, effect size d, and standard error se (first rows shown: d = 1.32, 1.01, 1.13, 0.69, 0.78, . . .; se = 0.48, 0.47, 0.25, 0.32, 0.30, . . .). Step 2, Create lineup. Step 3, Inspect lineup: Which funnel plot stands out? Typical evaluations: Small-study effects discernible? Role of statistical significance? Abundance of just-significant study outcomes? Conspicuous outliers or clustering of studies? Step 4, Draw inference: Real-data funnel plot identified? Yes: reject H0. No: retain H0.]

Figure 2. Visual inference testing procedure using the lineup protocol with funnel plots. Starting from the effect sizes and their standard errors actually observed in the meta-analysis (step 1), a lineup is constructed, showing the real-data funnel plot randomly positioned among null funnel plots (step 2). A viewer visually inspects the funnel plots in the lineup and picks the one that seems most noticeable or eye-catching (step 3). If the picked funnel plot indeed is the real-data funnel plot, the null hypothesis (H0) used for null plot simulation is rejected (step 4).
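The four steps of the lineup protocol in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the names `make_lineup`, `draw_inference`, and `toy_null` are hypothetical, the data are only the five rows visible in Figure 2, and `toy_null` is a deliberately crude stand-in for a proper null-plot simulator (it centers the simulated effects on the unweighted mean).

```python
import random
from typing import Callable, List, Optional, Tuple

Dataset = List[Tuple[float, float]]  # one (effect size, standard error) pair per study


def toy_null(real: Dataset, rng: random.Random) -> Dataset:
    """Placeholder null simulator (assumption): effects drawn around the
    unweighted mean effect, with the observed standard errors kept fixed."""
    mean_d = sum(d for d, _ in real) / len(real)
    return [(rng.gauss(mean_d, se), se) for _, se in real]


def make_lineup(
    real: Dataset,
    simulate_null: Callable[[Dataset, random.Random], Dataset],
    k: int = 20,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Dataset], int]:
    """Step 2: hide the real data at a random position among k - 1 null datasets."""
    rng = rng or random.Random()
    position = rng.randrange(k)  # 0-based slot of the real-data plot
    lineup = [simulate_null(real, rng) for _ in range(k - 1)]
    lineup.insert(position, real)
    return lineup, position


def draw_inference(picked: int, position: int) -> str:
    """Step 4: reject H0 only if the viewer picked the real-data plot."""
    return "reject H0" if picked == position else "retain H0"


if __name__ == "__main__":
    # Step 1: the five (d, se) rows visible in Figure 2; illustration only.
    real_data = [(1.32, 0.48), (1.01, 0.47), (1.13, 0.25), (0.69, 0.32), (0.78, 0.30)]
    lineup, position = make_lineup(real_data, toy_null, k=20, rng=random.Random(2019))
    picked = 4  # Step 3: hypothetical viewer choice (0-based index)
    print(draw_inference(picked, position))
```

Keeping the position hidden from the viewer until after the pick is what makes the decision rule valid; in this sketch that corresponds to not inspecting `position` before choosing `picked`.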

Null-Plot Simulation for Visual Funnel-Plot Inference

Essential for visual inference is the null distribution used to simulate the data displayed in the null plots, as this directly corresponds to the null hypothesis one seeks to reject. For meta-analysis, natural choices are the fixed-effect model (FEM) and the random-effects model (REM). The FEM assumes the observed effects $y_i$ can be modeled as

$$y_i = \mu + u_i, \quad \text{with } u_i \sim N(0, \sigma_i^2).$$

That is, the study effects $y_i$ are independent realizations of normal distributions with the same shared expected value $\mu$, but with study-specific variances $\sigma_i^2$, which are mainly due to different sample sizes. The FEM therefore assumes that differences between study effects are entirely due to sampling error. The REM allows the modeling of additional (unsystematic) random variability between the study effects, which exceeds the amount of variability expected under the FEM. The REM assumes that the observed effects $y_i$ can be modeled as

$$y_i = \mu + u_i + e_i, \quad \text{with } u_i \sim N(0, \sigma_i^2) \text{ and } e_i \sim N(0, \tau^2).$$

In the REM, the variance of each effect is $\sigma_i^2 + \tau^2$, and is therefore increased by the constant $\tau^2$ as compared to the FEM. Based on these models, two straightforward ways to construct null plots for visual funnel plot inference are as follows. Given n actually observed effects $y_{i,\mathrm{obs}}$, with estimated standard errors $\hat{\sigma}_i$, the estimated meta-analytic summary effect $\hat{\mu}_{\mathrm{obs}}$, and an optional estimate for the between-study variance $\hat{\tau}^2_{\mathrm{obs}}$, the effects displayed in each null plot are simulated using the following model:

$$y_{i,\mathrm{simul}} = \hat{\mu}_{\mathrm{obs}} + u_i + e_i, \quad \text{with } u_i \sim N(0, \hat{\sigma}_i^2) \text{ and } e_i \sim N(0, \hat{\tau}^2_{\mathrm{obs}}).$$

That is, the effects in each null plot are randomly drawn from normal distributions with expected value equal to the actually observed meta-analytic summary effect $\hat{\mu}_{\mathrm{obs}}$ and study-specific variances equal to the sum of the observed variance $\hat{\sigma}_i^2$ and the between-study variance $\hat{\tau}^2_{\mathrm{obs}}$ estimated from the actually observed data (REM). For the FEM, $\hat{\tau}^2_{\mathrm{obs}}$ is simply set to zero. The null dataset for one null plot is then given as the n simulated effects $y_{i,\mathrm{simul}}$ and the initially observed corresponding standard errors $\hat{\sigma}_i$. To emphasize, effect sizes are randomly drawn from a null distribution, with the actually observed standard errors regarded as fixed. Under both the FEM and REM scenarios, the null hypothesis tested in visual inference is that study effect sizes are independent realizations of normal distributions with the same expected value, but different (nonrandom) variances. In the context of meta-analysis, null funnel plots simulated in this way show well-behaved, symmetric random noise.

Figure 3. Example of a funnel plot lineup using data from a published meta-analysis on romantic priming (Shanks et al., 2015). One funnel plot shows the actually observed data, whereas the data in the 19 other funnel plots have been simulated under the null hypothesis of a random-effects meta-analytic model. Shown are 95% confidence contours (black lines), the summary effect (vertical line), and significance contours (dark area indicates the .05 and .01 levels). The null hypothesis can be rejected, and any further conclusions based on mere visual inspection of the real-data funnel plot might be warranted, only if the real-data funnel plot showing the actually observed data is distinguishable from the null plots and therefore identifiable. The randomly positioned real-data funnel plot is at position 12. R code to reproduce the lineup figure and to conduct visual inference: https://osf.io/6qyg4/.

Which of the two models above should be used for null-plot simulation? In most cases, the REM is a suitable default choice. It allows researchers to rule out the possibility that the null hypothesis was rejected via visual inference solely because of an excess of unsystematic between-study variation in the real-data plot. An exception to this rule is when (paralleling the Cochran Q procedure as the conventional meta-analytic test) the excess of between-study variability itself is a target for visual inference. In this case, the FEM should be used for null-plot simulation. Finally, alternative models to simulate the data for the null funnel plots, including Bayesian models, may well be proposed and used.

Quite a number of variants of the classic meta-analytic funnel plot have been proposed, which differ in the statistical information conveyed and in their diagnostic purpose (Langan, Higgins, Gregory, & Sutton, 2012). These variants include different choices for the ordinate, the display of study subgroups, confidence contours, significance contours, the regression line from Egger’s test, and the Duval-Tweedie trim-and-fill method. The visual inference framework can accommodate all these variants. As an example, Figure 4 shows a lineup for visual inference of a meta-analytic funnel plot, incorporating subgroups, and using data from a published meta-analysis (Pietschnig, Voracek, & Formann, 2010).

Figure 4. Example of a funnel plot lineup incorporating subgroups of studies. One funnel plot shows the actually observed data from a published meta-analysis on the Mozart effect (Pietschnig, Voracek, & Formann, 2010), whereas the other 19 funnel plots show data simulated under the null hypothesis of a meta-analytic random-effects model. Study subgroups are depicted with different plotting symbols (white squares: studies from one author group of interest; dark circles: studies from all other authors). Subgroup membership is randomly drawn without replacement from the actually observed data and randomly assigned for each study in each null plot. The randomly positioned real-data funnel plot is at position 9. R code to reproduce the lineup figure and to conduct visual inference: https://osf.io/mx9zy/.
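Returning to the null-plot simulation model of this section, the following Python sketch draws one null dataset under the FEM or the REM. It is an illustration under stated assumptions rather than the metaviz implementation: the function names are hypothetical, the summary effect is computed with the standard inverse-variance weighted mean (the text does not prescribe an estimator), and the between-study variance is taken as a given input rather than estimated.

```python
import random
from typing import List, Optional, Sequence


def fixed_effect_summary(effects: Sequence[float], ses: Sequence[float]) -> float:
    """Inverse-variance weighted mean (assumed estimator for the summary effect)."""
    weights = [1.0 / se**2 for se in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def simulate_null_effects(
    mu_obs: float,
    se_obs: Sequence[float],
    tau2_obs: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw one simulated effect per study from N(mu_obs, se_i^2 + tau2_obs).

    tau2_obs = 0 corresponds to the FEM; tau2_obs > 0 to the REM.
    The observed standard errors are treated as fixed.
    """
    rng = rng or random.Random()
    return [rng.gauss(mu_obs, (se**2 + tau2_obs) ** 0.5) for se in se_obs]


if __name__ == "__main__":
    # Hypothetical effect sizes and standard errors, for illustration only.
    d = [0.40, 0.10, 0.25, 0.60]
    se = [0.10, 0.20, 0.15, 0.30]
    mu_hat = fixed_effect_summary(d, se)
    rng = random.Random(42)
    print("FEM null effects:", simulate_null_effects(mu_hat, se, rng=rng))
    print("REM null effects:", simulate_null_effects(mu_hat, se, tau2_obs=0.04, rng=rng))
```

Pairing each simulated effect with its original standard error gives the null dataset for one null plot; repeating the draw k − 1 times gives the null plots of a lineup.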

Software for Visual Funnel-Plot Inference

For meta-analytic practitioners, an important question is how to conveniently conduct visual funnel-plot inference. Within the statistical computing environment R (R Core Team, 2018), the package nullabor (Wickham, Chowdhury, & Cook, 2014) is available, which provides general-purpose functions to conduct visual inference with arbitrary graphical displays, including functionality to reveal the position of the real-data plot only after the lineup has been inspected. Building on this, we have developed and documented the R function funnelinf within the R package metaviz (Kossmeier, Tran, & Voracek, 2018) specifically for conducting visual funnel-plot inference. The funnelinf function provides tailored features, which currently include: (1) options for null-plot simulation under both FEM and REM meta-analysis; (2) subgroup analysis; (3) graphical options specific to the funnel plot (significance and confidence contours, and choice of the ordinate); and (4) additional options to display various statistical information (Egger’s regression line, as well as the studies imputed by, and the adjusted summary effect from, the trim-and-fill method). For further details, example code, and example data we refer to the documentation of package metaviz (https://CRAN.R-project.org/package=metaviz).

Conclusions and Implications

We propose to present, contemplate, and evaluate the funnel plot of the actually observed data simultaneously with null-hypothesis funnel plots. Only if the real-data funnel plot is distinguishable from the null plots is the null hypothesis formally rejected, and only then might conclusions based on visual inspection of the real-data funnel plot be warranted. We suggest using visual funnel-plot inference routinely, as it is a convenient way to increase the validity of conclusions based on funnel plots by saving investigators from interpreting funnel-plot patterns which might be perfectly plausible by chance.

Empirical experiments are suited to examine the power of the procedure to reject the null hypothesis in different scenarios. Using datasets simulated under an alternative hypothesis of interest, the proportion of corresponding lineups leading to a correct rejection of the null hypothesis can be used as a direct estimate of the power of the lineup procedure (Majumder et al., 2013). Ideally, for this purpose larger numbers of independent viewers would work through funnel plot lineups from different experimental conditions. A difficulty in recruiting larger numbers of viewers is the likely effect of formal education in meta-analysis and expertise with funnel plots in particular. The magnitude of this effect is unknown, but it at least calls into question the use of online recruitment systems like Amazon’s Mechanical Turk, which has been regularly used in visual inference experiments in the past (e.g., Loy et al., 2016; Majumder et al., 2013). Hence, innovative ways have to be found either to recruit viewers with existing funnel plot expertise or to efficiently train viewers unfamiliar with the funnel plot prior to the experiment.

Questions for future experimental research include: What is the power of the procedure in different scenarios, for instance, for varying levels of publication bias or between-study heterogeneity? How does the procedure compare to conventional statistical tests for funnel plot asymmetry? Are there power differences in detecting publication bias when using different graphical variants of the funnel plot for visual inference, for instance, by additionally showing Egger’s regression line? What role do viewer characteristics and expertise with funnel plots play in successfully conducting visual inference with funnel plots? What is the gain in power when basing the decision to reject the null hypothesis on more than one viewer’s evaluation per lineup? As a promising topic for future empirical inquiry, visual inference with funnel plots is still in its early stages.

Visual funnel plot inference also holds potential to serve didactic purposes in meta-analysis and research synthesis, by allowing students and users to gain experience with the manifold shapes and patterns appearing in funnel plots just by chance. For this specific purpose, plot lineups composed entirely of null plots, also known as the Rorschach protocol of visual inference (Buja et al., 2009), might be used. Software to conduct all these forms of visual inference with meta-analytic funnel plots is readily available in the form of a tailored function within the R package metaviz (Kossmeier et al., 2018).
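The power estimate described above, the proportion of lineups built from alternative-hypothesis data that lead to rejection, can be sketched as follows. This is a minimal illustration: the function name `estimate_power` and the example outcomes are hypothetical, and the standard error uses the simple binomial formula, which assumes independent lineup evaluations.

```python
from math import sqrt
from typing import Sequence, Tuple


def estimate_power(rejections: Sequence[bool]) -> Tuple[float, float]:
    """Return (estimated power, binomial standard error).

    Each entry states whether one lineup, built from data simulated under the
    alternative hypothesis, led to rejection of the null hypothesis.
    """
    n = len(rejections)
    if n == 0:
        raise ValueError("at least one lineup evaluation is required")
    power = sum(rejections) / n
    se = sqrt(power * (1.0 - power) / n)  # assumes independent evaluations
    return power, se


if __name__ == "__main__":
    # Hypothetical outcomes: 30 rejections in 100 evaluated lineups.
    outcomes = [True] * 30 + [False] * 70
    power, se = estimate_power(outcomes)
    print(f"estimated power = {power:.2f} (SE = {se:.3f})")
```

If several viewers evaluate the same lineup, their evaluations are unlikely to be independent, so the standard error from this sketch would then be too optimistic.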

References

Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E. K., Swayne, D. F., & Wickham, H. (2009). Statistical inference for exploratory data analysis and model diagnostics. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 367, 4361–4383. https://doi.org/10.1098/rsta.2009.0120

Chowdhury, N. R., Cook, D., Hofmann, H., Majumder, M., Lee, E. K., & Toth, A. L. (2015). Using visual statistical inference to better understand random class separations in high dimension, low sample size data. Computational Statistics, 30, 293–316. https://doi.org/10.1007/s00180-014-0534-x

Hunter, J. P., Saratzis, A., Sutton, A. J., Boucher, R. H., Sayers, R. D., & Bown, M. J. (2014). In meta-analyses of proportion studies, funnel plots were found to be an inaccurate method of assessing publication bias. Journal of Clinical Epidemiology, 67, 897–903. https://doi.org/10.1016/j.jclinepi.2014.03.003

Kossmeier, M., Tran, U. S., & Voracek, M. (2018). metaviz [R software package]. Retrieved from https://CRAN.R-project.org/package=metaviz

Langan, D., Higgins, J. P., Gregory, W., & Sutton, A. J. (2012). Graphical augmentations to the funnel plot assess the impact of additional evidence on a meta-analysis. Journal of Clinical Epidemiology, 65, 511–519. https://doi.org/10.1016/j.jclinepi.2011.10.009

Lau, J., Ioannidis, J. P., Terrin, N., Schmid, C. H., & Olkin, I. (2006). Evidence based medicine: The case of the misleading funnel plot. British Medical Journal, 333, 597. https://doi.org/10.1136/bmj.333.7568.597

Lewis, S., & Clarke, M. (2001). Forest plots: Trying to see the wood and the trees. British Medical Journal, 322, 1479–1480. https://doi.org/10.1136/bmj.322.7300.1479

Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing research. Cambridge, MA: Harvard University Press.

Loy, A., Follett, L., & Hofmann, H. (2016). Variations of Q-Q plots: The power of our eyes! American Statistician, 70, 202–214. https://doi.org/10.1080/00031305.2015.1077728

Loy, A., Hofmann, H., & Cook, D. (2017). Model choice and diagnostics for linear mixed-effects models using statistics on street corners. Journal of Computational and Graphical Statistics, 26, 478–492. https://doi.org/10.1080/10618600.2017.1330207

Majumder, M., Hofmann, H., & Cook, D. (2013). Validation of visual statistical inference, applied to linear models. Journal of the American Statistical Association, 108, 942–956. https://doi.org/10.1080/01621459.2013.808157

Pietschnig, J., Voracek, M., & Formann, A. K. (2010). Mozart effect–Shmozart effect: A meta-analysis. Intelligence, 38, 314–323. https://doi.org/10.1016/j.intell.2010.03.001

R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Schild, A. H., & Voracek, M. (2013). Less is less: A systematic review of graph use in meta-analyses. Research Synthesis Methods, 4, 209–219. https://doi.org/10.1002/jrsm.1076

Shanks, D. R., Vadillo, M. A., Riedel, B., Clymo, A., Govind, S., Hickin, N., . . . Puhlmann, L. M. C. (2015). Romance, risk, and replication: Can consumer choices and risk-taking be primed by mating motives? Journal of Experimental Psychology: General, 144, e142–e158. https://doi.org/10.1037/xge0000116

Simmonds, M. (2015). Quantifying the risk of error when interpreting funnel plots. Systematic Reviews, 4, 24. https://doi.org/10.1186/s13643-015-0004-8

Sterne, J. A., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046–1055. https://doi.org/10.1016/S0895-4356(01)00377-8

Sterne, J. A., Egger, M., & Moher, D. (2008). Addressing reporting bias. In J. P. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions (pp. 297–333). Chichester, England: Wiley.

Tang, J. L., & Liu, J. L. (2000). Misleading funnel plot for detection of bias in meta-analysis. Journal of Clinical Epidemiology, 53, 477–484. https://doi.org/10.1016/S0895-4356(99)00204-8

Terrin, N., Schmid, C. H., & Lau, J. (2005). In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. Journal of Clinical Epidemiology, 58, 894–901. https://doi.org/10.1016/j.jclinepi.2005.01.006

Wickham, H., Chowdhury, N. R., & Cook, D. (2014). nullabor [R software package]. Retrieved from https://CRAN.R-project.org/package=nullabor

History
Received February 28, 2018
Revision received September 27, 2018
Accepted October 22, 2018
Published online March 29, 2019

Michael Kossmeier
Department of Basic Psychological Research and Research Methods
Faculty of Psychology
University of Vienna
Liebiggasse 5
1010 Vienna
Austria
michael.kossmeier@univie.ac.at

© 2019 Hogrefe Publishing

Zeitschrift für Psychologie (2019), 227(1), 83–89


Call for Papers

“The Psychology of Forensic Evidence”
A Topical Issue of the Zeitschrift für Psychologie

Guest Editors: Anna Sagana and Melanie Sauerland
Faculty of Psychology and Neuroscience, University of Maastricht, The Netherlands

This topical issue focuses on one of the major challenges in criminal proceedings, namely the evaluation and application of forensic evidence. How information is gathered during criminal investigation has a direct influence on the nature and quality of evidence that is presented to key decision makers such as prosecutors and judges. Hence, the quality of evidence can make or break a case. Unfortunately, the collection and interpretation of forensic evidence are susceptible to contextual influences and cognitive biases. This is true not only for the so-called “soft” pieces of evidence such as alibis and witness statements, but also for the “hard” evidence such as fingerprints and DNA traces. Furthermore, it is becoming apparent that not all evidence is equal in terms of reliability and accuracy and that, in court, different types of evidence are assigned different weight, depending on early influences and occasionally premature interpretations. Interestingly, there often exists a mismatch in court between the perceived and actual quality of evidence. Therefore, the challenges and failures surrounding the collection and interpretation of evidence can shape the future of defendants and change the lives of those who come into contact with the legal system. This topical issue aspires to reach out to researchers who examine the role, validity, and corruptive influences of different pieces of evidence and their interplay in criminal proceedings. We invite articles that advance our current knowledge by addressing new perspectives, theoretical frameworks, and methods related to: (a) the evidentiary weight of different types of evidence, (b) their interaction with other forms of evidence, and (c) cognitive and behavioral changes that the knowledge of and beliefs about such evidence might generate. 
Articles may touch upon all players involved in the investigative process (e.g., eyewitnesses, suspects, police officers, forensic experts) and in court (e.g., attorneys, jurors, judges).

Zeitschrift für Psychologie (2019), 227(1), 90
https://doi.org/10.1027/2151-2604/a000349

How to Submit

There is a two-stage submission process. Initially, interested authors are requested to submit only abstracts of their proposed papers. Authors of the selected abstracts will then be invited to submit full papers. All papers will undergo blind peer review. Interested authors should submit a letter of intent including: (1) a working title for the manuscript, (2) names, affiliations, and contact information of all authors, and (3) an abstract of no more than 500 words detailing the content of the proposed manuscript to guest editor Anna Sagana (E-mail anna.sagana@maastrichtuniversity.nl). Feedback on whether or not the editors encourage authors to submit a full paper will be given by August 15, 2019. Deadline for submission of abstracts is June 23, 2019. Deadline for submission of full papers is November 15, 2019. The journal seeks to maintain a short turnaround time, with the final version of the accepted papers being due by February 22, 2020. The topical issue is scheduled as issue 3 (2020). For additional information please contact the guest editors. For detailed author guidelines please see the journal’s website at www.hogrefe.com/j/zfp/

About the Journal

The Zeitschrift für Psychologie, founded in 1890, is the oldest psychology journal in Europe and the second oldest in the world. One of the founding editors was Hermann Ebbinghaus. Since 2007 it has been published in English and is devoted to publishing topical issues that provide state-of-the-art reviews of current research in psychology.


Instructions to Authors

The Zeitschrift für Psychologie publishes high-quality research from all branches of empirical psychology that is clearly of international interest and relevance, and does so in four topical issues per year. Each topical issue is carefully compiled by guest editors. The subjects being covered are determined by the editorial team after consultation within the scientific community, thus ensuring topicality. The Zeitschrift für Psychologie thus brings convenient, cutting-edge compilations of the best of modern psychological science, each covering an area of current interest. Zeitschrift für Psychologie publishes the following types of articles: Review Articles, Original Articles, Research Spotlights, Horizons, and Opinions.

Manuscript submission: A call for papers is issued for each topical issue. Current calls are available on the journal’s website at www.hogrefe.com/j/zfp. Manuscripts should be submitted as Word or RTF documents by e-mail to the responsible guest editor(s). An article can only be considered for publication in the Zeitschrift für Psychologie if it can be assigned to one of the topical issues that have been announced. The journal does not accept general submissions. Detailed instructions to authors are provided at http://www.hogrefe.com/j/zfp

Copyright Agreement: By submitting an article, the author confirms and guarantees on behalf of him-/herself and any coauthors that he or she holds all copyright in and titles to the submitted contribution, including any figures, photographs, line drawings, plans, maps, sketches and tables, and that the article and its contents do not infringe in any way on the rights of third parties. The author indemnifies and holds harmless the publisher from any third-party claims.

The author agrees, upon acceptance of the article for publication, to transfer to the publisher on behalf of him-/herself and any coauthors the exclusive right to reproduce and distribute the article and its contents, both physically and in nonphysical, electronic, and other form, in the journal to which it has been submitted and in other independent publications, with no limits on the number of copies or on the form or the extent of the distribution. These rights are transferred for the duration of copyright as defined by international law. Furthermore, the author transfers to the publisher the following exclusive rights to the article and its contents:

1. The rights to produce advance copies, reprints, or offprints of the article, in full or in part, to undertake or allow translations into other languages, to distribute other forms or modified versions of the article, and to produce and distribute summaries or abstracts.
2. The rights to microfilm and microfiche editions or similar, to the use of the article and its contents in videotext, teletext, and similar systems, to recordings or reproduction using other media, digital or analog, including electronic, magnetic, and optical media, and in multimedia form, as well as for public broadcasting in radio, television, or other forms of broadcast.
3. The rights to store the article and its content in machine-readable or electronic form on all media (such as computer disks, compact disks, magnetic tape), to store the article and its contents in online databases belonging to the publisher or third parties for viewing or downloading by third parties, and to present or reproduce the article or its contents on visual display screens, monitors, and similar devices, either directly or via data transmission.
4. The rights to reproduce and distribute the article and its contents by all other means, including photomechanical and similar processes (such as photocopying or facsimile), and as part of so-called document delivery services.
5. The right to transfer any or all rights mentioned in this agreement, as well as rights retained by the relevant copyright clearing centers, including royalty rights to third parties.

Online Rights for Journal Articles: Guidelines on authors’ rights to archive electronic versions of their manuscripts online are given in the document “Guidelines on sharing and use of articles in Hogrefe journals” on the journal’s web page at www.hogrefe.com/j/zfp

July 2017

© 2019 Hogrefe Publishing

Zeitschrift für Psychologie (2019), 227(1)


Alternatives to traditional self-reports in psychological assessment

“A unique and timely guide to better psychological assessment.” Rainer K. Silbereisen, Research Professor, Friedrich Schiller University Jena, Germany Past-President, International Union of Psychological Science

Tuulia Ortner / Fons J. R. van de Vijver (Editors)

Behavior-Based Assessment in Psychology
Going Beyond Self-Report in the Personality, Affective, Motivation, and Social Domains
(Series: Psychological Assessment – Science and Practice – Vol. 1)
2015, vi + 234 pp. US $63.00 / € 44.95 ISBN 978-0-88937-437-9 Also available as eBook

Traditional self-reports can be an insufficient source of information about personality, attitudes, affect, and motivation. What are the alternatives? This first volume in the authoritative series Psychological Assessment – Science and Practice discusses the most influential, state-of-the-art forms of assessment that can take us beyond self-report. Leading scholars from various countries describe the theoretical background and psychometric properties of alternatives to self-report, including behavior-based assessment, observational methods, innovative computerized procedures, indirect assessments, projective techniques, and narrative reports. They also look at the validity and practical application of such forms of assessment in domains as diverse as health, forensic, clinical, and consumer psychology.

www.hogrefe.com


“It is the benefits that our products can bring, both for individuals and for society, that spur us on.” Dr. G.-Jürgen Hogrefe, Publisher


This third collection of “Hotspots in Psychology” presents state-of-the-art meta-analytic research in psychology. It features the first systematic literature review of the empirical evidence regarding stability and change of values in adults, as well as meta-analyses examining the effects of descriptive and injunctive norms on consumer decision making, how ethical leadership shapes organizational citizenship behavior, and whether interparental relationship quality differs between parents of children with or without ADHD. Methodological contributions include a systematic review of questionable research practices, an exploration of the role of researchers’ analytic flexibility when meta-analyzing data using specification-curve analysis and multiverse analysis, as well as a new approach to improving the quality of inferences drawn from funnel plots.

Contents include:

The Mechanisms of Social Norms’ Influence on Consumer Decision Making: A Meta-Analysis
Vladimir Melnyk, Erica van Herpen, Suzanne Jak, and Hans C. M. van Trijp

How Does Ethical Leadership Impact Employee Organizational Citizenship Behavior? A Meta-Analytic Review Based on Two-Stage Meta-Analytic Structural Equation Modeling
Yucheng Zhang, Long Zhang, Guangjian Liu, Jiali Duan, Shan Xu, and Mike W.-L. Cheung

Impaired Interparental Relationships in Families of Children With Attention-Deficit/Hyperactivity Disorder (ADHD): A Meta-Analysis
Lena Weyers, Martina Zemp, and Georg W. Alpers

Intra-Individual Value Change in Adulthood: A Systematic Literature Review of Longitudinal Studies Assessing Schwartz’s Value Orientations
Carolin Schuster, Lisa Pinkowski, and Daniel Fischer

Scientific Misconduct in Psychology: A Systematic Review of Prevalence Estimates and New Empirical Data
Johannes Stricker and Armin Günther

Which Data to Meta-Analyze, and How? A Specification-Curve and Multiverse-Analysis Approach to Meta-Analysis
Martin Voracek, Michael Kossmeier, and Ulrich S. Tran

Visual Inference for the Funnel Plot in Meta-Analysis
Michael Kossmeier, Ulrich S. Tran, and Martin Voracek

Hogrefe Publishing Group Göttingen · Berne · Vienna · Oxford · Paris Boston · Amsterdam · Prague · Florence Copenhagen · Stockholm · Helsinki · Oslo Madrid · Barcelona · Seville · Bilbao Zaragoza · São Paulo · Lisbon www.hogrefe.com

ISBN 978-0-88937-555-0

