Verbal Autopsy: innovations, applications, opportunities


Verbal autopsy: innovations, applications, opportunities Population Health Metrics Reproduced from Volume 9 August 2011

BioMed Central



Verbal autopsy: innovations, applications, opportunities Improving cause of death measurement A thematic series


Journal information Population Health Metrics is published by: BioMed Central Ltd, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK. T: +44 (0) 20 3192 20000 F: +44 (0) 20 3192 2010 E: info@biomedcentral.com Population Health Metrics is an open access, peer-reviewed, online journal featuring innovative research that addresses issues relating to concepts, methods, ethics, applications, and results in the measurement of the health of populations. This includes areas of health state measurement and valuation, summary measures of level of population health, inequality in population health, descriptive epidemiology at the population level, burden of disease and injury analysis, disease and risk factor modelling for populations, and comparative assessment of risks to health at the population level. The journal aims to provide a platform for researchers in all these areas to share their findings with the global research community. The journal can be found at http://www.pophealthmetrics.com (ISSN 1478-7954).

Open Access Online access to Population Health Metrics is free and available to all. All articles published by BioMed Central in Population Health Metrics are open access, which means they are universally and freely accessible via the Internet and deposited in at least one widely and internationally recognized open access repository (such as PubMed Central). Open access also means that the authors or copyright owners grant any third party the right to use, reproduce and disseminate the article. BioMed Central is committed to maintaining open access for all research articles that it publishes, both retrospectively and prospectively, in all eventualities, including any future changes in ownership. For more information please refer to the BioMed Central Open Access Charter and permanency of articles web pages accessible from http://www.biomedcentral.com/info/about

Advertising For information about advertising in Population Health Metrics, including the rate card and specifications, contact the Advertising Department at the above address. Alternatively send an email to: advertising@biomedcentral.com

Copyright Unless stated otherwise, copyright rests with the publisher, BioMed Central; copyright for articles rests with the authors. With the exception of articles labelled ‘Open Access’, no part of this publication may be reproduced or transmitted, electronic or otherwise, without prior permission of the copyright owner. For all other use, permission should be sought directly from BioMed Central Ltd, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK. For articles labelled ‘Open Access’, verbatim copying and redistribution are permitted in all media provided the copyright notice is preserved (see any article labelled ‘Open Access’) along with the article’s original URL.

Reprints BioMed Central can provide high quality reprints and offers a rapid delivery service. For further information and prices, contact the above address, or send an email to: reprints@biomedcentral.com

Indexing/abstracting Population Health Metrics is indexed by: • CABI • CAS • Cinahl • Citebase • EmBase • EmCare • Google • Google Scholar • Index Copernicus • OAIster • PubMed • PubMed Central • Scirus • Scopus • SOCOLAR • Zetoc

Disclaimer Whilst every effort is made by the publishers, editors-in-chief and editorial board to see that no inaccurate or misleading data, opinion, or statement appear in this publication, they wish to make it clear that the data and opinions appearing in the articles herein are the responsibility of the contributor concerned. Accordingly, the publishers, the editors-in-chief and editorial board, and their respective employees, officers and agents accept no liability whatsoever for the consequences of any such inaccurate or misleading data, opinion or statement.

Cover images Photo 1 credit: The International Rescue Committee’s mortality survey teams interview villagers in Misoke, Democratic Republic of Congo, 2007. The teams’ findings were published in the report “Measuring Mortality in the Democratic Republic of Congo” released by the IRC. Photo by Peter Biro/The IRC. © 2011 International Rescue Committee. Photo 2 and 3 credit: Institute for Health Metrics and Evaluation




Thematic Series Editors: Alan D Lopez University of Queensland School of Population Health, Australia Rafael Lozano Institute for Health Metrics and Evaluation, United States of America Christopher JL Murray Institute for Health Metrics and Evaluation, United States of America Kenji Shibuya University of Tokyo, Japan

Acknowledgments Submissions for this thematic series were encouraged greatly by the “Global Congress on Verbal Autopsy: State of the Science,” held in February 2011 in Bali, Indonesia. The editors would like to acknowledge the critical contributions of individuals and organizations whose collective efforts led to that event, including the members of the Congress’ Organizing Committee, the Institute for Health Metrics and Evaluation, the University of Queensland, the Indonesian National Institute of Health Research and Development, the INDEPTH Network, and the participants from many organizations who helped fuel a vibrant and stimulating discussion. We would like to thank both the Australian Agency for International Development (AusAID) and the Bill & Melinda Gates Foundation for their contributions of resources to help fund both Congress participants and submissions for this thematic series. Finally, we would like to thank the peer reviewers who provided their insights and comments.

http://www.pophealthmetrics.com/series/verbal_autopsy



Editorial Governance Editors-in-Chief Christopher JL Murray (Institute for Health Metrics and Evaluation, University of Washington, USA) Alan D Lopez (University of Queensland School of Population Health, Australia)

Associate Editors Majid Ezzati (Imperial College London, UK) Emmanuela Gakidou (IHME, University of Washington, USA) Michel Guillot (University of Pennsylvania Population Studies Center, USA) Aisha O Jumaan (Centers for Disease Control and Prevention, USA) Rafael Lozano (IHME, University of Washington, USA) Ali Mokdad (IHME, University of Washington, USA) Joshua Salomon (Harvard School of Public Health, USA) Kenji Shibuya (Global Health Policy, University of Tokyo, Japan)

Managing Editors Kate Muller (IHME, University of Washington, USA) Jill Oviatt (IHME, University of Washington, USA)

Editorial Board Dan Brock (USA), Peter Byass (Sweden), David Cutler (USA), Lalit Dandona (India), George Davey-Smith (UK), Marie Louise Essink-Bot (Netherlands), Tim Evans (Bangladesh), Marc Fleurbaey (France), Prabhat Jha (Canada), Ichiro Kawachi (USA), Paul Kind (UK), Gary King (USA), Anton E Kunst (Netherlands), Ana Langer (USA), Martin McKee (UK), Matthew McKenna (USA), Vikram Patel (India), Samuel Preston (USA), Juergen Rehm (Canada), Jonathan Samet (USA), Vladimir M Shkolnikov (Germany), Katarzyna Skarbek (Belgium), Cynthia Stanton (USA), Martin Tobias (New Zealand), Stephen Tollman (South Africa), Eddy van Doorslaer (Netherlands), Michael Wolfson (Canada)



Contents EDITORIAL Verbal autopsy: advancing science, facilitating application Christopher JL Murray, Alan D Lopez, Kenji Shibuya, Rafael Lozano


COMMENTARIES Verbal autopsy: who needs it? Carla AbouZahr


Verbal autopsy and global mortality statistics: if not now, then when? Philip W Setel


Opportunities and challenges for verbal autopsy in the national Death Registration System in Sri Lanka: past and future Samath D Dharmaratne, Rajitha L Jayasuriya, Buddhipani Y Perera, EM Gunesekera, A Sathasivayyar


Validation and validity of verbal autopsy procedures Daniel Chandramohan


Whither verbal autopsy? Peter Byass


Advances in verbal autopsy: pragmatic optimism or optimistic theory? Edward Fottrell


Synergism of verbal autopsy and diagnostic pathology autopsy for improved accuracy of mortality data Corinne L Fligner, Jill Murray, Drucilla J Roberts


Computer-based analysis of verbal autopsies: revolution or evolution? Ian Riley


ARTICLES Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets Christopher JL Murray, Alan D Lopez, Robert Black, Ramesh Ahuja, Said M Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D Flaxman, Sara Gómez, Bernardo Hernández, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer L Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo


Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies Christopher JL Murray, Rafael Lozano, Abraham D Flaxman, Alireza Vahdatpour, Alan D Lopez



Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards Abraham D Flaxman, Alireza Vahdatpour, Sean Green, Spencer L James and Christopher JL Murray for the Population Health Metrics Research Consortium (PHMRC)


Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards Christopher JL Murray, Spencer L James, Jeanette K Birnbaum, Michael K Freeman, Rafael Lozano and Alan D Lopez for the Population Health Metrics Research Consortium (PHMRC)


Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies Spencer L James, Abraham D Flaxman and Christopher JL Murray for the Population Health Metrics Research Consortium (PHMRC)


Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards Rafael Lozano, Alan D Lopez, Charles Atkinson, Mohsen Naghavi, Abraham D Flaxman and Christopher JL Murray for the Population Health Metrics Research Consortium (PHMRC)


Effects on the estimated cause-specific mortality fraction of providing physician reviewers with different formats of verbal autopsy data Rohina Joshi, Devarsetty Praveen, Clara Chow, Bruce Neal


An improved method for physician-certified verbal autopsy reduces the rate of discrepancy: experiences in the Nouna Health and Demographic Surveillance Site (NHDSS), Burkina Faso Maurice Yé, Eric Diboulo, Louis Niamba, Ali Sié, Boubacar Coulibaly, Cheik Bagagnan, Jonas Dembélé, Heribert Ramroth


Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards Abraham D Flaxman, Alireza Vahdatpour, Spencer L James, Jeanette K Birnbaum and Christopher JL Murray for the Population Health Metrics Research Consortium (PHMRC)


Using verbal autopsy to assess the prevalence of HIV infection among deaths in the ART period in rural Uganda: a prospective cohort study, 2006-2008 Billy N Mayanja, Kathy Baisley, Norah Nalweyiso, Freddie M Kibengo, Joseph O Mugisha, Lieve Van der Paal, Dermot Maher, Pontiano Kaleebu


Epidemiologic application of verbal autopsy to investigate the high occurrence of cancer along Huai River Basin, China Xia Wan, Maigeng Zhou, Zhuang Tao, Ding Ding, Gonghuan Yang


Assessing quality of medical death certification: Concordance between gold standard diagnosis and underlying cause of death in selected Mexican hospitals Bernardo Hernández, Dolores Ramírez-Villalobos, Minerva Romero, Sara Gómez, Charles Atkinson, Rafael Lozano


Use of verbal autopsy in a national health information system: Effects of the investigation of ill-defined causes of death on proportional mortality due to injury in small municipalities in Brazil Elisabeth França, Deise Campos, Mark DC Guimarães, Maria de Fátima M Souza


Feasibility of using a World Health Organization-standard methodology for Sample Vital Registration with Verbal Autopsy (SAVVY) to report leading causes of death in Zambia: results of a pilot in four provinces, 2010 Sheila S Mudenda, Stanley Kamocha, Robert Mswia, Martha Conkling, Palver Sikanyiti, Dara Potter, William C Mayaka, Melissa A Marx


Verbal autopsy completion rate and factors associated with undetermined cause of death in a rural resource-poor setting of Tanzania Mathew A Mwanyangala, Honorathy M Urassa, Jensen C Rutashobya, Chrisostom C Mahutanga, Angelina M Lutambi, Deodatus V Maliti, Honorati M Masanja, Salim K Abdulla, Rose N Lema


Classifying perinatal mortality using verbal autopsy: is there a role for nonphysicians? Cyril Engmann, John Ditekemena, Imtiaz Jehan, Ana Garces, Mutinta Phiri, Vanessa Thorsten, Manolo Mazariegos, Elwyn Chomba, Omrana Pasha, Antoinette Tshefu, Elizabeth M McClure, Dennis Wallace, Robert L Goldenberg, Waldemar A Carlo, Linda L Wright, Carl Bose


Trends in causes of death among children under 5 in Bangladesh, 1993-2004: an exercise applying a standardized computer algorithm to assign causes of death using verbal autopsy data Li Liu, Qingfeng Li, Rose A Lee, Ingrid K Friberg, Jamie Perin, Neff Walker, Robert E Black



Social autopsy: INDEPTH Network experiences of utility, process, practices, and challenges in investigating causes and contributors to mortality Karin Källander, Daniel Kadobera, Thomas N Williams, Rikke T Nielsen, Lucy Yevoo, Aloysius Mutebi, Jonas Akpakli, Clement Narh, Margaret Gyapong, Alberta Amu, Peter Waiswa


Social autopsy for maternal and child deaths: a comprehensive literature review to examine the concept and the development of the method Henry D Kalter, Rene Salgado, Marzio Babille, Alain K Koffi, Robert E Black


Using verbal autopsy to track epidemic dynamics: the case of HIV-related mortality in South Africa Peter Byass, Kathleen Kahn, Edward Fottrell, Paul Mee, Mark A Collinson, Stephen M Tollman


Verbal autopsy-based cause-specific mortality trends in rural KwaZulu-Natal, South Africa, 2000-2009 Abraham J Herbst, Tshepiso Mafojane, Marie-Louise Newell


Adaptation of a probabilistic method (InterVA) of verbal autopsy to improve the interpretation of cause of stillbirth and neonatal death in Malawi, Nepal, and Zimbabwe Stefania Vergnano, Edward Fottrell, David Osrin, Peter N Kazembe, Charles Mwansambo, Dharma S Manandhar, Stephan P Munjanja, Peter Byass, Sonia Lewycka, Anthony Costello


Validating physician-certified verbal autopsy and probabilistic modeling (InterVA) approaches to verbal autopsy interpretation using hospital causes of adult deaths Evasius Bauni, Carolyne Ndila, George Mochamah, Gideon Nyutu, Lena Matata, Charles Ondieki, Barbara Mambo, Maureen Mutinda, Benjamin Tsofa, Eric Maitha, Anthony Etyang, Thomas N Williams


Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards Rafael Lozano, Michael K Freeman, Spencer L James, Benjamin Campbell, Alan D Lopez, Abraham D Flaxman and Christopher JL Murray for the Population Health Metrics Research Consortium (PHMRC)



Murray et al. Population Health Metrics 2011, 9:18 http://www.pophealthmetrics.com/content/9/1/18

EDITORIAL

Open Access

Verbal autopsy: advancing science, facilitating application
Christopher JL Murray1*, Alan D Lopez2, Kenji Shibuya3 and Rafael Lozano1

Editorial
Critical information on population health is needed to inform planning, resource allocation, program implementation, monitoring, and evaluation. One of the key descriptors of a population’s health is information about causes of death. Since many countries lack complete vital registration systems with medical certification of deaths, cause of death information is often missing. Verbal autopsy (VA) can be used to determine individuals’ causes of death and cause-specific mortality fractions in populations without a complete vital registration system. A standard VA instrument paired with easy-to-implement and reliable analytic methods could help bridge significant gaps in information about causes of death, particularly in resource-poor settings.

A great deal of research has been conducted in the past several decades about VA and its application in the field, particularly in research settings, but some traditional methods of implementation and analysis can be costly, time-consuming, and potentially of varying quality. Verbal autopsies can now be analyzed using a much wider array of innovative techniques, most of which will be less expensive and yield higher quality results than current practice. What has been missing from the field of verbal autopsy is a collection of the most up-to-date research to help decision-makers choose the best and most cost-effective VA techniques to identify causes of death in their populations. This thematic series of Population Health Metrics, “Verbal autopsy: innovations, applications, opportunities,” was developed in response to this need.

The research published in this thematic series emerged from the “Global Congress on Verbal Autopsy: State of the Science,” held in Bali, Indonesia, in February 2011. The conference was co-sponsored by the Institute for Health Metrics and Evaluation, the University of Queensland School of Population Health, and Population Health Metrics. The Congress convened the global research and policy community who currently work with VA data, or who could greatly benefit from doing so. The conference inspired vibrant discussions about critical aspects of VA, including instrument design, analysis methods, and the potential use of VA in national health information systems. By convening a wide array of participants with different perspectives, a greater exchange of ideas, collaboration, and intellectual innovation was encouraged to advance the use and understanding of VA as a mechanism for gathering valuable information about causes of death in populations. The innovative research presented at the conference has motivated the creation of a community of scientists, policymakers, and practitioners dedicated to furthering this important field of population health.

In an effort to promote and disseminate the key research breakthroughs discussed at the Global Congress, we are publishing this thematic series. After peer review, 24 papers and eight commentaries were accepted for publication. The innovations in VA detailed in these papers represent a substantial increase in knowledge about the comparative performance of various methods to assign causes of death, from applications of methods used in current practice, including physician review, to a rigorous validation of new automated methods with significant potential for future application in routine national and research data collection platforms. We expect that this thematic series of Population Health Metrics will provide an opportunity for informed discussion and debate and hopefully will stimulate the widespread application of VA where it is needed.

This collection of research clearly shows that automated methods for VA are more accurate, faster, and cheaper than traditional physician review. Scientific innovation has taken VA from infancy to maturity. While methods innovation will and must continue, we hope that this thematic series will stimulate debate, increase knowledge, and facilitate application of VA in national health information systems. We believe that the global health community, including national governments, can now more confidently measure causes of death to monitor progress toward health and development goals. We urge them to seize the opportunities for improved population health measurement that are now available.

Author details
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA. 2 University of Queensland, School of Population Health, 288 Herston Road, Herston, Queensland 4006, Australia. 3 Graduate School of Medicine, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan.

* Correspondence: cjlm@u.washington.edu

Received: 15 July 2011 Accepted: 27 July 2011 Published: 27 July 2011

© 2011 Murray et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

doi:10.1186/1478-7954-9-18 Cite this article as: Murray et al.: Verbal autopsy: advancing science, facilitating application. Population Health Metrics 2011 9:18.



AbouZahr Population Health Metrics 2011, 9:19 http://www.pophealthmetrics.com/content/9/1/19

COMMENTARY

Open Access

Verbal autopsy: who needs it?
Carla AbouZahr

Commentary
Verbal autopsy has long been used to generate mortality data, often with the needs of specific programs, such as child and maternal mortality, in mind [1,2]. This led to a proliferation of instruments and the resulting data were rarely comparable across research sites or over time [3]. Demands for standardization led to the 2007 publication of the World Health Organization (WHO) verbal autopsy standards, which many researchers have adopted [4,5]. Increased convergence around standards has stimulated interest in using verbal autopsy outside research settings on a routine basis. Decision-makers, program managers, donors, and development partners have identified the need for simple data collection instruments, implemented using mobile phones or other hand-held devices and linked to the provision of care [6]. These potential users of verbal autopsy methods have different perspectives from researchers, tending to prioritize instrument simplicity, feasibility, and program relevance above technical performance.

Verbal autopsy offers a solution to the challenge of generating cause of death information in settings where deaths occur outside the health care system. The “gold standard” is medical certification of cause according to the International Classification of Diseases (ICD) [7]. The WHO verbal autopsy tools are designed to generate causes of death that are “ICD compatible.” WHO recommends that mortality data derived from verbal autopsy be tabulated separately from data derived using medical certification and ICD coding [7].

While verbal autopsy may not be as reliable as hospital-based certification for identifying causes of death, it is able to produce information not available from a medical certificate. Alongside questions about signs and symptoms in the deceased person, verbal autopsy can ask about risk factors and health care seeking prior to death, elucidating social, economic, behavioral, and health system issues that may have contributed to death. This contextual knowledge is invaluable to health care managers and planners.

Potential users of data generated through verbal autopsy include communities, health care planners and managers, researchers, global decision-makers, and donors [8]. While there is a degree of overlap, these users have different perspectives on the uses of mortality data. These in turn have an impact on the desirable characteristics of data collection instruments.

Researchers, epidemiologists, and global-level decision-makers want mortality data to inform burden of disease estimation and program evaluation. Cause of death estimates must be scientifically validated, meet high standards of accuracy, and be comparable over time and across countries. Uncertainty in cause-specific mortality fractions can be managed.

National/subnational decision-makers and health system managers want cause of death data for planning, budgeting, and resource allocation and for monitoring and reporting to donors. The ability to track trends over time is more important than cross-country comparability. Uncertainty is problematic, especially when data are needed to inform allocation of resources. Data need to be actionable and program relevant, implying an interest in information on socio-economic determinants of mortality [9]. Data collection instruments should be feasible and cost-effective to implement and adaptable to local circumstances and conditions.

Disease-specific programs want data that highlight specific areas of interest and data collection instruments that are feasible, appropriate, and program relevant. They often have a particular interest in data on socioeconomic and health system factors associated with avoidable mortality.

Beyond the health sector, civil registrars-general and national statistics offices want mortality information generated through verbal autopsy to complement data from routine administrative sources. Data collection instruments should be endorsed by technical partners such as WHO and should be simple and straightforward for implementation in routine registration encounters.

Can one size fit all?
Given the variety of users and uses, it is legitimate to ask whether a single data collection instrument can respond to all user demands [10]. However, it would be a retrograde step to revert to a situation characterized by the coexistence of multiple, divergent tools. Instead, the research community should focus on developing methods that are aligned with core standards but can be implemented through modular approaches, adapted to local circumstances and information needs. In doing so, it is important to reflect on the potential of extending the current WHO verbal autopsy standards to incorporate socio-economic, community, behavioral, and health system determinants of mortality.

For verbal autopsy methods to successfully extend their reach from research to routine application, a solid standards-based foundation will be essential, but so will a degree of flexibility and responsiveness to user requirements. New challenges will emerge, many of which will require local-level operations and implementation research to resolve. Building the evidence base of what works and where will be critical for demonstrating to potential users that the techniques can indeed generate data that are both robust and fit for purpose. This implies rigorous validation to ensure that the data generated are reliable and scientifically sound, coupled with testing of the instruments for feasibility, sustainability, and local relevance. Both sets of criteria will need to be met if the results of verbal autopsy are to be used effectively to inform policies and programming.

Correspondence: abouzahr.carla@gmail.com
Health Metrics Network, World Health Organization, Ave Appia, 1211 Geneva 27, Switzerland

© 2011 AbouZahr; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Competing interests
The author declares that they have no competing interests.

Received: 13 April 2011 Accepted: 27 July 2011 Published: 27 July 2011

References
1. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998, 3(6):436-446.
2. Ronsmans C, Vanneste AM, Chakraborty J, Van Ginneken J: A comparison of three verbal autopsy methods to ascertain levels and causes of maternal deaths in Matlab, Bangladesh. Int J Epidemiol 1998, 27(4):660-666.
3. Anker M, Black RE, Coldham C, Kalter HD, Quigley, Ross D, Snow RW: A standard verbal autopsy method for investigating causes of death in infants and children. Geneva: World Health Organization; 1999, (WHO/CDS/CSR/ISR/99.4).
4. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, Jakob R, Kahn K, Kunii O, Lopez AD, Murray CJ, Nahlen B, Rao C, Sankoh O, Setel PW, Shibuya K, Soleman N, Wright L, Yang G: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85:570-571.
5. World Health Organization: Verbal autopsy standards: ascertaining and attributing cause of death. ISBN 978 92 4 154721 (NLM classification: WA900) WHO; 2007.
6. Oluoch J: Millennium Villages Blog - The MVP introduces enhanced mobile technology to reduce child and maternal mortality. 2010 [http://blogs.millenniumpromise.org/index.php/2010/03/03/the-mvp-introduces-enhanced-mobile-technology-to-reduce-child-and-maternal-mortality].
7. World Health Organization: ICD International Statistical Classification of Diseases, 10th Revision. Second edition. Geneva: World Health Organization; 2004.
8. Byass P: Who needs cause-of-death data? PLoS Med 2007, 4(11).
9. Iyengar K, Iyengar SD, Suhalka V, Dashora K: Pregnancy-related deaths in rural Rajasthan, India: exploring causes, context, and care-seeking through verbal autopsy. J Health Popul Nutr 2009, 27(2):293-302.
10. Fottrell E, Byass P: Verbal autopsy: methods in transition. Epidemiol Rev 2010, 32(1):38-55.

doi:10.1186/1478-7954-9-19
Cite this article as: AbouZahr: Verbal autopsy: who needs it? Population Health Metrics 2011 9:19.



Setel Population Health Metrics 2011, 9:20 http://www.pophealthmetrics.com/content/9/1/20

COMMENTARY

Open Access

Verbal autopsy and global mortality statistics: if not now, then when? Philip W Setel Commentary More than a decade ago, the World Health Organization pointed out the degree to which deficits in the production, sharing, and use of critical health information hampered evidence-based health development in countries with the poorest health status [1]. This “information paradox” in global health came to refer to the predicament in which countries with the greatest need for timely, accurate, and comprehensive health information - including on causes of death at the population level have had the least access to it [2,3]. Since that time, some (but not enough) improvements in information systems and technologies have begun to fill voids in our knowledge of population health [4,5]. Throughout this period, however, comparatively little attention has been paid to advancing the science and practice of direct measurement of mortality and its causes - particularly among adults [6-11]. In 2000, the state of knowledge on verbal autopsy (VA, a term that covers the design and application of postmortem caregiver interviews, procedures for assigning one or more probable causes of death, and the aggregation and tabulation of population-level mortality statistics based on this data source) centered on a small group of demographers and epidemiologists, many of whom ran intervention trials in various demographic surveillance sites. Almost the entire community of scholarship was on a first-name basis; we could easily gather in a medium-sized conference room, and any of our students or colleagues could become an expert on the VA literature with a week or two of focused reading. Throughout this period, those who remained dedicated to maximizing the potential of VA made steady progress. Yet throughout, a deep and sometimes reflexive scepticism remained that VA could ever really deliver the goods as a reliable measurement tool. 
The persistent shortcomings in cause of death data, and reluctance to widely embrace VA outside of demographic surveillance sites, have forced the global health community to make do with sources of limited coverage and dubious quality and consistency, applying increasingly complex statistical analyses to “correct” for all manner of bias and nonsampling error. The papers in this issue of Population Health Metrics go far in addressing central questions about how much VA can contribute to our measurement of health and health impact. How close to truth can VA ever get? How good is “good enough” for decision-making? Is our putative “gold standard” of medically certified deaths all that robust to begin with - in industrialized or lower-income countries? Can we make the production of VA data better, faster, and cheaper? What alternatives to demographic surveillance systems exist to permit the collection of mortality data from large, representative population samples? Can VA detect disease outbreaks, the population effects of antiretroviral therapy scale-up, and long-term trends in causes of death? Collectively, this special issue should just about put to rest the conventional wisdom that in the 21st century those living on the margins of the global economy must continue to make do with models and guesstimates about leading causes of death for priority-setting and decision-making. This is not to deny that an important implementation research agenda remains. For example, there is an urgent need to identify optimal platforms, systems, and sample sizes for administration of VA, and to understand if and how VA-based death registration and cause of death statistics might be a stepping stone to increasing the coverage of functioning civil registration and death certification systems.
While there have been a few examples of VA being administered on a large scale as an explicit part of the development of national statistics, such as the national causes of death rider survey to the 2007 Mozambique national census [12], they have been one-offs and not widely published. Those who tout the impact of development assistance for health by confidently proclaiming the numbers of lives they have saved due to their investments - especially from particular diseases - rely on modeled estimates that fall far short of the state of the art in observational epidemiology and medical demography. Part of the solution is for the international health sector to move on from the question of whether VA can be used to how we will use it to its maximum potential. New advances in automated assignment of probable causes of death, such as those described in this series, combined with smartphone and tablet computing technology and the opportunity to further streamline VA interviews, promise to remove the remaining obstacles to meeting a far higher standard of credible evidence: to measure impact rather than just continue to model it. These game-changing innovations open the door ever wider to a future in which no one will go uncounted, and all lives will be more equally valued. Will this be the decade when we finally help those who die unseen to be seen and documented? Will we now start to support information systems capable of providing direct measurement of births, deaths, and causes of death among the most marginalized populations whose lives and deaths currently leave no trace in any official record or statistic?

Correspondence: philip.setel@gatesfoundation.org Measurement Learning and Evaluation for Global Health, Bill & Melinda Gates Foundation, PO Box 23350, Seattle, WA 98102, USA

© 2011 Setel; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Competing interests
As indicated in the acknowledgments of the relevant articles, the Bill & Melinda Gates Foundation funded some of the research and findings presented in this issue of Population Health Metrics.

Received: 29 June 2011 Accepted: 27 July 2011 Published: 27 July 2011

References
1. WHO: The World Health Report 2000. Geneva: WHO; 2000.
2. Ustun BT, Jakob R: Calling a spade a spade: meaningful definitions of health conditions. Bull World Health Organ 2005, 83:802.
3. Walker S: Health information at a global level: working to support the information paradox countries. HIM J 2004, 33:78.
4. Murray CJ: Towards good practice for health statistics: lessons from the Millennium Development Goal health indicators. Lancet 2007, 369:862-873.
5. Walker N, Bryce J, Black RE: Interpreting health statistics for policymaking: the story behind the headlines. Lancet 2007, 369:956-963.
6. Gakidou E, Hogan M, Lopez AD: Adult mortality: time for a reappraisal. Int J Epidemiol 2004, 33:710-717.
7. AbouZahr C, Cleland J, Coullare F, Macfarlane SB, Notzon FC, Setel P, Szreter S, Anderson RN, Bawah AA, Betran AP, et al: The way forward. Lancet 2007, 370:1791-1799.
8. Hill K, Lopez AD, Shibuya K, Jha P, AbouZahr C, Anderson RN, Bawah AA, Betran AP, Binka F, Bundhamcharoen K, et al: Interim measures for meeting needs for health sector data: births, deaths, and causes of death. Lancet 2007, 370:1726-1735.
9. Lopez AD, AbouZahr C, Shibuya K, Gollogly L: Keeping count: births, deaths, and causes of death. Lancet 2007, 370:1744-1746.
10. Mahapatra P, Shibuya K, Lopez AD, Coullare F, Notzon FC, Rao C, Szreter S: Civil registration systems and vital statistics: successes and missed opportunities. Lancet 2007.
11. Setel PW, Macfarlane SB, Szreter S, Mikkelsen L, Jha P, Stout S, AbouZahr C: A scandal of invisibility: making everyone count by counting everyone. Lancet 2007, 370:1569-1577.
12. Instituto Nacional de Estatistica: Mortalidade em Mocambique. Inquerito Nacional Sobre Causas de Mortalidade, 2007/8. Maputo: Instituto Nacional de Estatistica; 2009.

doi:10.1186/1478-7954-9-20
Cite this article as: Setel: Verbal autopsy and global mortality statistics: if not now, then when? Population Health Metrics 2011 9:20.



Dharmaratne et al. Population Health Metrics 2011, 9:21 http://www.pophealthmetrics.com/content/9/1/21

COMMENTARY

Open Access

Opportunities and challenges for verbal autopsy in the national Death Registration System in Sri Lanka: past and future

Samath D Dharmaratne1*, Rajitha L Jayasuriya1, Buddhipani Y Perera2, EM Gunesekera2 and A Sathasivayyar2

Commentary

Analyses of cause of death (COD) statistics are fundamental for monitoring the health situation of populations and for planning suitable interventions. The accuracy of national COD statistics is important for reliably prioritizing health problems in order to decide upon interventions and for resource allocation. The completeness of registration in the Death Registration System (DRS), which is a major component of the Civil Registration System (CRS) of Sri Lanka, is above 90% [1]. However, the quality of COD statistics is deficient, with 30% of deaths categorized as being due to “signs, symptoms, and ill-defined causes” [2]. Deaths coded to these categories are of little use for decision-making. There is considerable potential for verbal autopsy to complement the DRS to improve the quality of COD statistics in Sri Lanka [3].

In Sri Lanka, once a death occurs, it has to be registered before the deceased can be cremated or buried. For deaths that occur outside hospitals, the relatives of the deceased notify the Death Registrar (DR). The majority of these notifications will not have a medically determined COD, and the Death Registrar determines it by interviewing the relatives regarding events preceding the death. For deaths that occur in hospitals, a COD is declared by the medical officer who attended the deceased by filing a Death Declaration Form (DDF). Except for “sudden deaths,” the COD for three out of four deaths that occur outside a hospital is given by the Death Registrar. Sudden deaths (which are a small proportion of total deaths) that occur outside a hospital are attended by an Inquirer into sudden death or by a court of law. The majority of the Death Registrars are lay people with minimal or no training in how to decide on the probable COD. Deaths that occur outside a hospital, in the majority of instances, do not have a death declaration made by a medical officer (A. Sathasivayyar, Assistant Registrar General of Sri Lanka, personal communication, 3-11-2010).

Several studies conducted in Sri Lanka [1,4] have highlighted the biases that are present in the DRS. They point out that only 30% to 40% of the registered deaths occur in a government hospital and that 80% of the registration and certification of deaths is done by nonmedical registrars. A study carried out in 1996 to assess the quality and coverage of death certification found that 15.5% of the medical officers misclassified the underlying cause of death. The study also found that the use of ill-defined terms (e.g., cardiovascular arrest) was frequent (76.4%), as was the use of abbreviations leading to misclassification (26.4%) [1].

The Verbal Autopsy Questionnaire (VAQ), introduced into Sri Lanka in 2006, has several important limitations: only a limited number of diseases are included that encompass very broad categories, such as high blood pressure, heart disease, diabetes, kidney disease, paralysis, wheeze, any fever and cancer; symptoms asked about are limited; and the ability of the Death Registrars to identify the correct disease using this VAQ is also limited. Nonetheless, the introduction of a VAQ with the support of policymakers is an important step towards improving the quality of cause of death data in Sri Lanka. Not only has this sensitized the government to the technique of verbal autopsy, but users (i.e., Death Registrars) now accept it as an integral part of their function in certifying deaths and as an important step towards improving data quality. The difficult part of the policy change, establishing the system and ensuring that the Death Registrars accept it, has already been achieved. What is now needed is to improve it: to restructure and expand the VAQ to include additional items to help the Death Registrars to arrive at a probable COD using the VAQ. The Standard VAQ developed by the World Health Organization (WHO), which is also being used in a number of other countries for improving COD statistics, is being translated and validated for routine use in Sri Lanka. This innovation, particularly if combined with automated methods for diagnosing cause of death, has the potential to substantially improve the quality and timeliness of critical cause of death information for policy and planning in Sri Lanka.

* Correspondence: samath20@gmail.com 1 Department of Community Medicine, Faculty of Medicine, University of Peradeniya, Sri Lanka Full list of author information is available at the end of the article

© 2011 Dharmaratne et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Acknowledgements
The authors would like to thank the Registrar General and his staff who helped in the data collection. We also acknowledge all the others who helped us in numerous ways to make this study a success.

Author details
1 Department of Community Medicine, Faculty of Medicine, University of Peradeniya, Sri Lanka. 2 Registrar General’s Department, Colombo, Sri Lanka.

Authors’ contributions
SDD planned the study and data collection, analyzed and interpreted the data, and prepared the manuscript. RLJ helped in the data acquisition and helped in the preparation of the manuscript. The other authors were involved in acquisition of data and manuscript preparation.

Competing interests
The authors declare that they have no competing interests.

Received: 11 February 2011 Accepted: 1 August 2011 Published: 1 August 2011

References
1. Fonseka WAAP: A study in the quality and coverage of death registration in a district of Sri Lanka. MD Thesis, Postgraduate Institute of Medicine, University of Colombo, Sri Lanka; 1996.
2. Vital Statistics. Department of Census and Statistics, Colombo, Sri Lanka; 2010 [http://www.statistics.gov.lk/].
3. World Health Organization: Verbal Autopsy Standards: ascertaining and attributing cause of death. Geneva, Switzerland: World Health Organization; 2007.
4. Banduthillake C: Epidemiology of maternal mortality in Sri Lanka. Postgraduate Institute of Medicine, University of Colombo, Sri Lanka; 1997.

doi:10.1186/1478-7954-9-21
Cite this article as: Dharmaratne et al.: Opportunities and challenges for verbal autopsy in the national Death Registration System in Sri Lanka: past and future. Population Health Metrics 2011 9:21.



Chandramohan Population Health Metrics 2011, 9:22 http://www.pophealthmetrics.com/content/9/1/22

COMMENTARY

Open Access

Validation and validity of verbal autopsy procedures

Daniel Chandramohan

Commentary

Methods for interpreting verbal autopsy (VA) that have been validated fall into two major categories: (1) physician-certified verbal autopsy (PCVA), the commonly used method in which one or more physicians ascertain causes of death based on their clinical judgment; and (2) computerized coding of verbal autopsy (CCVA), in which causes of death are derived using predefined criteria. Decision rules for CCVA can be expert opinion-based or data driven. The accuracy of these VA interpretation methods varies by cause of death, while the effect of misclassification error in VA on the estimates of cause-specific mortality fractions (CSMF) depends on the distribution of causes of death. The importance of acknowledging the effects of misclassification of causes of death by VA has been highlighted by the recent controversial estimates of malaria mortality in India [1]. The parameters of validity of VA obtained from a validation study may be useful to measure the uncertainty limits of CSMFs due to misclassification errors of VA, and in some contexts, to adjust the estimate of CSMF for the effect of misclassification error [2,3].

The gold standard diagnosis of cause of death (COD) for assessing the validity of VA has been the COD derived from hospital medical records. The main limitations of using hospital-based CODs as the gold standard are: (1) the accuracy of medical records-based COD is debatable, even though some studies have refined the diagnosis with expert review of hospital records; and (2) the composition and distribution of hospital CODs may not be representative of deaths occurring in the community. In addition, if diagnostic algorithms for CCVA are developed from subsets of validation study datasets, their external validity may be compromised. Nevertheless, hospital diagnoses of COD based on defined clinical and laboratory criteria are the only useful gold standard available at present for validating VAs.

The validity of InterVA has not previously been tested against a gold standard diagnosis. The reliability of InterVA has been determined by examining the concordance of CSMFs estimated by InterVA and PCVA. Given that the accuracy of PCVA is questionable, estimating concordance between causes of death derived by PCVA and InterVA as a measure of validity needs to be interpreted with caution.

Measures used to assess the validity of VA include sensitivity, specificity, positive predictive value, and absolute (absolute error) or relative (relative error) difference between CSMF estimated by VA and true CSMF in the validation data. Sensitivity and specificity, which measure accuracy at the individual level, vary substantially between causes of death across different VA interpretation methods. The absolute and relative errors of CSMF measure the accuracy of VA at the population level. The variability of the absolute error in CSMF appears to be reasonable for most CODs because often the number of false positive and false negative diagnoses balance out. However, the relative error in CSMF tends to be exaggerated, especially if the CSMF is low.

Murray and colleagues in this series recommend determining the validity of VAs using cause-specific and average chance-corrected concordance across causes for single cause assignment methods, as well as for one to k causes across causes for individual multiple cause assignment methods [4]. For estimation of CSMFs, they recommend CSMF accuracy and cause-specific concordance correlation coefficients of estimated CSMFs compared to true CSMFs. These measures are useful to compare the performance of different VA interpretation methods and could also be used to estimate the uncertainty limits of CSMF estimates attributable to misclassification errors of VA. Methods to estimate uncertainty limits for CSMFs attributable to misclassification errors of VA need to be further developed.
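The validity measures discussed above can be illustrated with a short calculation. The following is a minimal sketch on invented toy data, not an implementation taken from any of the cited studies. The cause list, the ten deaths, and all function names are hypothetical. The formulas for chance-corrected concordance and CSMF accuracy are assumed to follow the definitions commonly attributed to the metrics paper cited as [4]: sensitivity rescaled against the 1/N expected by chance among N causes, and one minus the summed absolute CSMF error divided by its maximum possible value.

```python
from collections import Counter

CAUSES = ["malaria", "pneumonia", "injury"]  # hypothetical cause list

# Toy validation data, invented for illustration only:
# gold standard cause vs. cause assigned by VA for the same ten deaths.
true_cod = ["malaria"] * 2 + ["pneumonia"] * 4 + ["injury"] * 4
va_cod = (["malaria", "pneumonia"]            # the 2 true malaria deaths
          + ["pneumonia"] * 3 + ["malaria"]   # the 4 true pneumonia deaths
          + ["injury"] * 3 + ["malaria"])     # the 4 true injury deaths


def csmf(assigned):
    """Cause-specific mortality fraction for each cause in CAUSES."""
    counts = Counter(assigned)
    return {c: counts[c] / len(assigned) for c in CAUSES}


def sensitivity(true, pred, cause):
    """Share of deaths truly due to `cause` that VA also assigns to it."""
    hits = [p == cause for t, p in zip(true, pred) if t == cause]
    return sum(hits) / len(hits)


def specificity(true, pred, cause):
    """Share of deaths not due to `cause` that VA does not assign to it."""
    rejections = [p != cause for t, p in zip(true, pred) if t != cause]
    return sum(rejections) / len(rejections)


def chance_corrected_concordance(true, pred, cause):
    """Sensitivity rescaled so that random assignment scores 0 (assumed form)."""
    n = len(CAUSES)
    return (sensitivity(true, pred, cause) - 1 / n) / (1 - 1 / n)


def csmf_accuracy(true_frac, est_frac):
    """1 minus total absolute CSMF error over its maximum value (assumed form)."""
    total_abs_error = sum(abs(true_frac[c] - est_frac[c]) for c in CAUSES)
    return 1 - total_abs_error / (2 * (1 - min(true_frac.values())))


true_frac, est_frac = csmf(true_cod), csmf(va_cod)
abs_error = {c: est_frac[c] - true_frac[c] for c in CAUSES}
rel_error = {c: abs_error[c] / true_frac[c] for c in CAUSES}
ccc = {c: chance_corrected_concordance(true_cod, va_cod, c) for c in CAUSES}
accuracy = csmf_accuracy(true_frac, est_frac)

for c in CAUSES:
    print(c, round(abs_error[c], 3), round(rel_error[c], 3), round(ccc[c], 3))
print("CSMF accuracy:", round(accuracy, 3))
```

In this toy dataset one pneumonia death is missed and one malaria death is wrongly labelled pneumonia, so the pneumonia CSMF is estimated without error even though individual-level sensitivity for pneumonia is only 0.75. Malaria, the cause with the lowest true CSMF, shows the largest relative error, which is the pattern described in the text.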

Correspondence: Daniel.chandramohan@lshtm.ac.uk London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK

© 2011 Chandramohan; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Flaxman et al. [5] have developed and validated a new CCVA, the Random Forest (RF) Method, for interpreting VA in a large multicountry validation dataset. The median chance-corrected concordance rate of the RF Method is higher than that of PCVA for adult, child, and neonatal VAs. These are very promising results and, if confirmed in other validation datasets, software for coding VAs based on the RF Method would greatly improve the reliability and timeliness of CSMFs collected using VAs. What is urgently required is an objective assessment of the performance of the RF Method versus InterVA, based on this high-standard VA validation study dataset, and then to actively promote and facilitate the implementation of the best-performing method in all mortality surveillance systems using VAs. This would likely greatly improve the quality and comparability of cause-specific mortality data obtained using VAs.

Received: 19 May 2011 Accepted: 1 August 2011 Published: 1 August 2011

References
1. Valecha N, Staedke S, Filler S, Mpimbaza A, Greenwood B, Chandramohan D: Malaria-attributed death rates in India. Lancet 2011, 377:992-93.
2. Korenromp EL, Williams BG, Gouws E, Dye C, Snow RW: Measurement of trends in childhood malaria mortality in Africa: an assessment of progress toward targets based on verbal autopsy. Lancet Infect Dis 2003, 3:349-58.
3. Polprasert W, Rao C, Adair T, Pattaraarchachai J, Porapakkham Y, Lopez AD: Cause-of-death ascertainment for deaths that occur outside hospitals in Thailand: application of verbal autopsy methods. Popul Health Metr 2010, 8:13.
4. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
5. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL: Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29.

doi:10.1186/1478-7954-9-22
Cite this article as: Chandramohan: Validation and validity of verbal autopsy procedures. Population Health Metrics 2011 9:22.



Byass Population Health Metrics 2011, 9:23 http://www.pophealthmetrics.com/content/9/1/23

COMMENTARY

Open Access

Whither verbal autopsy?

Peter Byass1,2

Commentary

Wherever the field of verbal autopsy (VA) may be heading, the exciting and considerable extent of new work presented in this Population Health Metrics series clearly shows that the topic is not withering. The Global Congress on Verbal Autopsy held in Bali in February 2011 undoubtedly marked a significant milestone: VA has come of age as an area of scientific interest in its own right. We may, however, be at something of a tipping point in that most of the work over the past few decades has (perhaps largely unconsciously) concentrated on presenting VA (usually interpreted by physicians) as a second-best substitute for medical certification of cause of death, particularly for application in areas where routine certification is either practiced selectively or not required [1]. However, it now emerges that medical certification of death is not as reliable as is often assumed, and physicians are also not particularly good at interpreting VA data consistently and reliably [2]. We have also learned that evaluations of cause-specific mortality are generally compromised by a lack of true gold standard data and metrics for comparative purposes [3,4]. At the same time, the dominance of research domains in VA applications is partly giving way to concepts of using VA in more routine ways, at least as an interim strategy in countries where universal routine death certification remains some way off. These perceived needs, coupled with new methodological developments, offer exciting prospects.

The VA literature has extensively used and abused the concept of “gold standards” for validating cause of death determination. Metallurgists would say that 100% pure gold is an impossibility; the highest possible quality is normally certified as being 99.9% gold, while most of the quality-assured gold we encounter on an everyday basis ranges from 37% to 75% purity. It is perhaps also worth reflecting that 99% pure gold is an extremely soft and somewhat impractical material. Cause of death, on the spectrum of measurable biomedical phenomena, is also a somewhat soft commodity. For that reason, any approach to assessing cause of death involves alloying professional expertise with the best evidence in order to generate robust outcomes. Different approaches to cause of death determination do this in different ways. Pathologists undertaking autopsies combine their specific expertise with visualized intracorporeal evidence to arrive at a cause of death (which frequently varies from a nonautopsy cause of death [5,6]). Physicians certifying a patient’s death combine their expertise with antemortem data, the quality and extent of which may vary considerably. Verbal autopsy interpreted by physicians relies on similar expertise to medical certification, but using the very different evidence base of the VA interview. Modeled approaches to cause of death determination need some kind of expert input - whether it be, for example, the physician committee that established the mapping between clinical criteria and causes of death in the new Population Health Metrics Research Consortium (PHMRC) dataset [3] or the expert group that refined prior probability estimates in the InterVA model [7] - and need to incorporate that captured expertise with available evidence to deliver a reliable model.

As in any field of science, methods for cause of death determination evolve and develop over time. Any “good” approach ideally needs to demonstrate both a satisfactory quantitative metric of performance and established widespread confidence among its users. In this respect the ground is currently somewhat unstable; the widespread confidence in physician-derived cause of death is being challenged, and InterVA, the cause of death model that has been most widely applied during the past decade, has so far primarily established its performance against physicians [8].
New ideas for models may perform well in terms of quantitative metrics against test datasets [4] but as yet have not achieved widespread confidence among actual users. The future for VA is therefore likely to be dynamic and exciting - and will hopefully help the world to move to a position where mortality patterns are well documented and available as evidence to feed into health service planning.

Correspondence: peter.byass@epiph.umu.se 1 Umeå Centre for Global Health Research, Umeå University, 90187 Umeå, Sweden Full list of author information is available at the end of the article

© 2011 Byass; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Acknowledgements
The Umeå Centre for Global Health Research is supported by FAS, the Swedish Council for Working Life and Social Research (http://www.fas.se) (grant no. 2006-1512). PB’s participation in the Global Congress on Verbal Autopsy was funded by the Institute for Health Metrics and Evaluation, University of Washington.

Author details
1 Umeå Centre for Global Health Research, Umeå University, 90187 Umeå, Sweden. 2 MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.

Competing interests
PB is a member of the PHM Editorial Board.

Received: 3 May 2011 Accepted: 1 August 2011 Published: 1 August 2011

References
1. Fottrell E, Byass P: Verbal Autopsy - methods in transition. Epidemiologic Reviews 2010, 32:38-55.
2. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL: Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.
3. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gomez S, Hernandez B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Devarsetty P, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
4. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
5. Cox JA, Lukande RL, Lucas S, Nelson AM, Van Marck E, Colebunders R: Autopsy causes of death in HIV-positive individuals in sub-Saharan Africa and correlation with clinical diagnoses. AIDS Review 2010, 12:183-194.
6. Shojania K, Burton E, McDonald K, Goldman L: The Autopsy as an Outcome and Performance Measure. Rockville, MD: Agency for Healthcare Research and Quality; 2002, Evidence Report/Technology Assessment No. 58 (Prepared by the University of California at San Francisco-Stanford Evidence-based Practice Center under Contract No. 290-97-0013). AHRQ Publication No. 03-E002.
7. Byass P, Fottrell E, Huong DL, Berhane Y, Corrah T, Kahn K, Muhe L, Van DD: Refining a probabilistic model for interpreting verbal autopsy data. Scandinavian Journal of Public Health 2006, 34:26-31.
8. Byass P, Kahn K, Fottrell E, Mee P, Collinson MA, Tollman SM: Using verbal autopsy to track epidemic dynamics: the case of HIV-related mortality in South Africa. Popul Health Metr 2011, 9:46.

doi:10.1186/1478-7954-9-23
Cite this article as: Byass: Whither verbal autopsy? Population Health Metrics 2011 9:23.


Fottrell Population Health Metrics 2011, 9:24 http://www.pophealthmetrics.com/content/9/1/24

COMMENTARY

Open Access

Advances in verbal autopsy: pragmatic optimism or optimistic theory?

Edward Fottrell1,2

Commentary

In recent decades, verbal autopsy (VA) methods have been increasingly used to identify likely causes of death in settings where the majority of deaths occur without medical attention or certification as to cause [1]. Developments in the 1980s and 1990s advanced the conceptual and methodological aspects of the science considerably but fell short of providing a clear message about best practices for those who rely on VA data [2-10]. There has been a hiatus of methodological development since then, due in part to persistent, narrow assumptions as to the desire and need for cause of death data and unrealistic evaluation standards. This has not limited the application of VA methods in the world’s poorest settings, but it has almost certainly limited the usefulness and comparability of the data. There remains scarce evidence on which to base choice of methods at the various stages of the VA data process. However, the first ever Global Congress on Verbal Autopsy, held in Bali, Indonesia in February 2011, represents a resurgence of methodological and conceptual developments - VA is arguably one of the most important fields in global health today.

Methodological development in recent years, particularly in relation to probabilistic interpretation of VA data, has brought VA into an exciting era that is creating new opportunities for reliable, timely, and useful cause-specific mortality measurement. A shift away from limited individual-level and clinical paradigms towards population-based epidemiological thinking and public health utility has been characterized by a flurry of new methodological thinking and innovations from relatively small groups of researchers. Among these like-minded researchers, however, there is risk of a divide between pragmatic optimists and optimistic theorists. The pragmatic optimists are driven by the realities, perils, and pitfalls of real-life health measurement in low-income settings and strive to enhance health knowledge with methods that are good enough to fill data gaps reliably and efficiently. The optimistic theorists, whose methodological developments are often theoretically superior, are often far from offering practical solutions to those on the ground who need to know the major burdens of cause-specific mortality in their populations simply, quickly, and cheaply in the absence of pre-existing data and where “true validity” is difficult to establish. Such dichotomization is perhaps somewhat artificial, but there is a real risk that unrealistic standards and expectations in method development and evaluation will become the enemy of good enough methods that are able to provide essential data to those who need it.

The Global Congress on VA was attended by over 100 delegates representing numerous agencies and academic institutions from around the globe. This level of participation illustrates not only the persistent desire to know who died from what in the world’s poorest settings, but also a desire for clear leadership on what the best methods are and how to use them. Whilst methodological favoritism, ideology, and competition can be a threat to an objective answer to this question, such factors are, in reality, likely to be minor and secondary to the more fundamental difficulty of recognizing the range of users who need cause of death data and what data they need. Differing cause of death data needs have been well described previously and the nonexistence of a one-size-fits-all solution to all needs is likely to persist [11]. This statement is grounded in recognition of the realities of health metrics in the world’s poorest populations and the imperfect world of lay-reported signs and symptoms, dubious record keeping, and biases in remembering, reporting, and recording certain events.
It is in recognition of these realities that the gap between the pragmatic optimists and the optimistic theorists may grow: the first believing that method development cannot necessarily rely on pre-existing data and should be evaluated in terms of comparability to reference (but not

Correspondence: Edward.Fottrell@epiph.umu.se 1 Umeå Centre for Global Health Research, Department of Public Health and Clinical Medicine, Umeå University, Sweden Full list of author information is available at the end of the article

© 2011 Fottrell; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.




gold) standards, plausibility in relation to given knowledge, and the ability to characterize population-level cause-specific mortality well enough for the use of such data for planning, monitoring, and evaluation; the latter continuing on a quest for true validity, believing that this does exist in relation to cause of death.

Validation is of course desirable for any method, but do global cause of death gold standards for the validation of VA methods really exist? Hospital-based data have been used in the past and continue to be used for validation studies; however, deaths for which such data are available are not representative of deaths in the majority of individuals who live their lives, get sick, and die with limited or no contact with formal health services. Similarly, the symptom profiles of individuals who die in the community in the absence of health care, or at least the symptoms that are recalled and reported by relatives of those individuals during VA interviews, are likely to be considerably different from those that were informed by contact with medical services. Even if sophisticated methods attempt to adjust for this, the costs of this alchemy are high and the gold is without doubt alloyed, limiting its relevance for true community-based populations and those who must plan services for such populations. This fact does not mean that such comparisons are not valuable; to a certain extent they may highlight gross inconsistencies, for example.

A key achievement of the Bali congress was a “Bali Declaration” that physician review of VA data as the default method of choice for all VA interpretation should be a thing of the past. On this, most experts agreed, and this declaration represents a substantial step forward, which will have untold implications for the timeliness and utility of VA-derived cause of death data. Such unity in communication on best practice in VA methods from the world’s leaders in the field is timely and commendable.
The same unified voice on the best method for VA interpretation that many might have hoped to hear in Bali is not yet loud and clear, and indeed better methods are likely to evolve over the coming years. Before this can happen, however, there needs to be agreement on flexibility in evaluation standards, recognizing that the utility of a method is highly dependent on who wants the data and what they want to do with them. Single-mindedness with regard to absolute measures of validity, true gold standards, and the utility of data is likely to hinder ongoing developments in VA at the expense of immediate public health benefits and to the detriment of conceptual advances that have been made in recent years.
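As a purely illustrative sketch of what population-level comparability to a reference (but not gold) standard might look like in practice, the following Python fragment computes cause-specific mortality fractions from two sets of assigned causes and the gap between them for each cause. The function names, the cause labels, and the counts are hypothetical and are not drawn from any study cited here; which gaps are tolerable would still depend on who wants the data and what they want to do with them.

```python
from collections import Counter


def cause_fractions(causes):
    """Return the share of deaths assigned to each cause (fractions sum to 1)."""
    counts = Counter(causes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one death is needed to compute fractions")
    return {cause: n / total for cause, n in counts.items()}


def fraction_gaps(method_causes, reference_causes):
    """Absolute gap, per cause, between two population-level cause distributions."""
    method = cause_fractions(method_causes)
    reference = cause_fractions(reference_causes)
    all_causes = sorted(set(method) | set(reference))
    return {c: abs(method.get(c, 0.0) - reference.get(c, 0.0)) for c in all_causes}


# Hypothetical example: ten deaths classified by a reference standard
# and by a VA interpretation method under evaluation.
reference = ["malaria"] * 4 + ["pneumonia"] * 3 + ["injury"] * 3
method = ["malaria"] * 5 + ["pneumonia"] * 2 + ["injury"] * 3

gaps = fraction_gaps(method, reference)
largest_gap = max(gaps.values())
```

In this made-up case the two distributions differ by about ten percentage points for malaria and pneumonia and agree for injury. A comparison of this kind says nothing about whether individual deaths were classified identically, which is the sense in which population-level utility and individual-level validity are separate questions.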

Author details 1 Umeå Centre for Global Health Research, Department of Public Health and Clinical Medicine, Umeå University, Sweden. 2 Centre for International Health and Development, Institute of Child Health, University College London, UK.

Received: 28 April 2011 Accepted: 1 August 2011 Published: 1 August 2011

References
1. Fottrell E, Byass P: Verbal Autopsy: Methods in Transition. Epidemiologic Reviews 2010, 32:38-55.
2. Kalter HD, Gray RH, Black RE, Gultiano SA: Validation of postmortem interviews to ascertain selected causes of death in children. Int J Epidemiol 1990, 19:380-6.
3. Snow B, Marsh K: How Useful Are Verbal Autopsies to Estimate Childhood Causes of Death? Health Policy and Planning 1992, 7:22-9.
4. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol 1994, 23:213-22.
5. Quigley MA, Armstrong Schellenberg JR, Snow RW: Algorithms for verbal autopsies: a validation study in Kenyan children. Bull World Health Organ 1996, 74:147-54.
6. Anker M: The effect of misclassification error on reported cause-specific mortality fractions from verbal autopsy. Int J Epidemiol 1997, 26:1090-6.
7. Maude GH, Ross DA: The effect of different sensitivity, specificity and cause-specific mortality fractions on the estimation of differences in cause-specific mortality rates in children from studies using verbal autopsies. Int J Epidemiol 1997, 26:1097-106.
8. Reeves BC, Quigley M: A review of data-derived methods for assigning causes of death from verbal autopsy data. Int J Epidemiol 1997, 26:1080-9.
9. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998, 3:436-46.
10. Ronsmans C, Vanneste AM, Chakraborty J, Van Ginneken J: A comparison of three verbal autopsy methods to ascertain levels and causes of maternal deaths in Matlab, Bangladesh. Int J Epidemiol 1998, 27:660-6.
11. Byass P: Who needs cause-of-death data? PLoS Medicine 2007, 4:1715-6.

doi:10.1186/1478-7954-9-24
Cite this article as: Fottrell: Advances in verbal autopsy: pragmatic optimism or optimistic theory? Population Health Metrics 2011 9:24.


Acknowledgements EF is supported at the Umeå Centre for Global Health Research by FAS, the Swedish Council for Working Life and Social Research (grant 2006-1512).



Fligner et al. Population Health Metrics 2011, 9:25 http://www.pophealthmetrics.com/content/9/1/25

COMMENTARY

Open Access

Synergism of verbal autopsy and diagnostic pathology autopsy for improved accuracy of mortality data
Corinne L Fligner 1*, Jill Murray 2 and Drucilla J Roberts 3

Commentary
This series provides an important opportunity to consider how diagnostic pathology autopsy could be used in conjunction with verbal autopsy to provide more accurate cause of death and mortality data in all countries, and specifically in those countries with inadequate or nonexistent death registration systems. For the purposes of this commentary, the term “autopsy” will denote the medical-pathology diagnostic procedure, in contrast to “verbal autopsy.” The term “autopsy” means “to see or observe for oneself,” but traditional use has been reserved for the postmortem examination of a (dead) body by a physician/pathologist, in order to identify diseases and injuries and determine the cause(s) of death. This medical-diagnostic pathology procedure integrates trained observation of the external and internal body with dissection or other invasive procedures, in order to obtain tissue samples, which are evaluated by microscopy and other specialized laboratory modalities, including chemical, toxicologic, genetic, and molecular biologic analyses. Used more broadly, the term “autopsy” reflects the aggregate of procedures used for postmortem medical diagnosis or death investigation, including investigative procedures that identify information about the deceased’s medical history and the circumstances and scene of his/her death.

To most pathologists and physicians, the term “verbal autopsy” seems a contradiction in terms. However, it is a clearly defined procedure, which allows classification of cause of death and cause-specific mortality by the analysis of data derived from structured interviews of family, friends, and caretakers, as well as review of any available medical records [1]. As more than two-thirds of the world’s population lives and dies in countries that lack functional vital registration systems, and in which most deaths occur outside of medical facilities and are neither enumerated nor classified by cause, verbal autopsy has become the primary methodology for determining population-based cause-specific mortality [2,3]. The development of computerized algorithmic systems for determination of cause of death by analysis of verbal autopsy data is a major focus of health metrics research, and current emphasis is on using the recently completed dataset from the Population Health Metrics Research Consortium (PHMRC) project that will allow analysis of verbal autopsy data collected from more than 12,000 hospitalized patients with causes of death established by rigorous clinical criteria.

The accuracy of verbal autopsy depends in large part on the quality of the diagnostic criteria, as well as on the age of the deceased and the type of diseases that are involved. Deaths associated with nonspecific signs and symptoms are especially problematic. The recent controversy about malaria mortality in India was a newsworthy example of the difficulty of differentiating malaria-caused deaths from those due to other febrile illnesses, such as septicemia, meningitis, encephalitis, and pneumonia [4]. Other areas of poor diagnostic specificity include maternal deaths, perinatal deaths, and stillbirths. Verbal autopsy has not been validated using deaths in which diagnostic pathology autopsies have been performed to determine the cause of death.

Diagnostic pathology autopsies have long been considered the “gold standard” for cause of death determination. Although autopsy rates are generally low in many developed countries, estimated at less than 5% in US hospital deaths, studies have continued to demonstrate substantial discordance between clinically- and autopsy-determined causes of death despite technologic advances in diagnostic modalities. These discrepancies are reflected in both clinical records and death certificates.
Major diagnostic error rates involving the primary cause

* Correspondence: fligner@u.washington.edu 1 Departments of Pathology and Laboratory Medicine, University of Washington School of Medicine, Box 356100, Seattle, WA, 98195, USA Full list of author information is available at the end of the article

© 2011 Fligner et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.




of death have ranged from 10% to more than 30%, even in a recent study that suggested a decline in the autopsy detection of unsuspected diagnoses [5]. The extent of antemortem diagnostic workup did not predict autopsy discrepancy rates. It is still widely accepted that a properly performed autopsy and death review or investigation can provide the most accurate determination of cause of death. The contributions of autopsy to a family’s understanding of a death, to the clinician’s understanding of a death, to discovery of new disease processes or effects of therapy, and to medical education are well established [6-8].

Autopsy-based studies of cause of death in Africa have also confirmed a high rate of diagnostic discrepancy. In a population-based autopsy study of HIV-1 infected gold miners, discrepancies between clinical and autopsy diagnoses were high, with 51% of infections and 55% of pulmonary tuberculosis diagnosed at autopsy having been missed clinically [9]. A study in Maputo, Mozambique noted that autopsies identified more specific diagnoses than death registries, resulting in a different distribution of leading causes of death [10]. A review of autopsy series in sub-Saharan Africa from 1992 to 2010 showed only a weak correlation between clinical diagnosis and pathologic findings in HIV-positive individuals [11]. In a unique autopsy-based study of maternal deaths, a 40% discrepancy rate between clinical and autopsy diagnoses was identified in a tertiary referral hospital in Mozambique from 2002 to 2004 [12]. Given that verbal autopsy datasets frequently do not have any medical sources of information, it is even more likely that there are substantial discrepancies between diagnostic autopsy- and verbal autopsy-determined causes of death.

A major advantage of a diagnostic autopsy is that it can identify a cause or causes of death based on pathologic tissue diagnosis. Even in deaths in which pathologic examination does not identify a definite cause of death, that information can be integrated into the death investigation process to guide the performance of additional studies and the formulation of the most likely cause of death based on all available diagnostic information. Autopsy also identifies chronic and infectious disease processes that may not be the direct cause of death, permitting assessment of prevalence for these processes. Another advantage is that the creation of tissue repositories by preservation of tissue for histologic studies may facilitate later disease discovery and characterization, allowing modification of mortality data based on medical scientific advancement.

A major impediment to the performance of diagnostic pathology autopsies is the requirement for trained pathologists and assistants and histologic laboratory infrastructure for processing tissues for microscopy and preserving tissues, not currently available in many low- and middle-resource countries, particularly in nonurban settings. However, compared with many diagnostic modalities, such as those required for radiologic imaging, autopsy is a relatively low-tech and inexpensive procedure, which utilizes the same clinical and anatomic pathology laboratory resources needed for the provision of quality medical care in these settings. Another concern is the impact of cultural and religious attitudes about death and the handling of dead bodies. As this varies by country, culture and religion, this issue will need to be addressed on a case-by-case basis. Education, discussion, respectful communication and practice, and modification and limitation of invasive procedures for community acceptance could contribute to acceptance of this postmortem diagnostic procedure.

Verbal autopsy is a valuable indirect system for establishing cause of death and cause-specific mortality, but like any clinical and historical investigation, it will misclassify a substantial number of deaths when compared to the true “gold standard” for death classification, diagnostic pathology autopsy. Too many natural disease processes have similar presentations, symptoms, and signs, and the same assumptions that result in inaccurate cause of death determination in up to 40% of physician-certified deaths will be present in verbal autopsy data. What is now needed in low- and middle-resource countries are robust studies of the concordance of verbal and diagnostic pathology autopsy to assess the contribution that diagnostic pathology autopsy can make to the quality of verbal autopsy and other mortality data. Similar studies would also be valuable in developed countries with reportedly high-quality cause of death data.

Continuous evaluation of a proportion of deaths by diagnostic pathology autopsy in conjunction with verbal autopsy could promote continuous quality assessment and improvement of mortality data, and at the same time facilitate the identification of new or emerging disease processes.

Author details 1 Departments of Pathology and Laboratory Medicine, University of Washington School of Medicine, Box 356100, Seattle, WA, 98195, USA. 2 National Institute for Occupational Health, National Health Laboratory Service and School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, PO Box 4788, Johannesburg 2000, South Africa. 3 Massachusetts General Hospital (MGH)-Pathology, 55 Fruit Street WRN 219, Boston, MA, 02114, USA. Authors’ contributions CLF, JM, and DJR participated in the discussion of the concepts. CLF drafted the manuscript. All authors read, revised, edited, and approved the manuscript. Competing interests CLF has received grant funding from the Washington Global Health Alliance (formerly Puget Sound Partners in Global Health) for the development of a minimally-invasive autopsy for low-resource settings.




CLF, JM, and DJR received funding from the INDEPTH Network to attend the 2011 Global Congress on Verbal Autopsy: State of the Science and a postmeeting session focused on research opportunities related to diagnostic pathology autopsies and verbal autopsies in Bali, Indonesia, in February 2011.

Received: 15 April 2011 Accepted: 1 August 2011 Published: 1 August 2011

References
1. World Health Organization: Verbal Autopsy Standards: Ascertaining and attributing causes of death [http://www.who.int/whosis/mort/verbalautopsystandards/en/].
2. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: Current practices and challenges. Bulletin of the World Health Organization 2006, 84:239-245.
3. Fottrell E, Byass P: Verbal autopsy: Methods in transition. Epidemiol Rev 2010, 32:38-55.
4. Butler D: Verbal autopsy methods questioned: Controversy flares over malaria mortality levels in India. Nature 2010, 467:1015.
5. Shojania KG, Burton EC, McDonald KM, Goldman L: Changes in rates of autopsy-detected diagnostic errors over time–A systematic review. JAMA 2003, 289:2849-2856.
6. Goldman L, Sayson R, Robbins ZAZ, Cohn LH, Bettman M, Weisberg M: The value of the autopsy in three medical eras. N Engl J Med 1983, 308:1000-1005.
7. Scordi-Bello IA, Kalb TH, Lento PA: Clinical setting and extent of premortem evaluation do not predict autopsy discrepancy rates. Modern Pathology 2010, 23:1225-1230.
8. Ayoub T, Chow J: The conventional autopsy in modern medicine. J R Soc Med 2008, 101:177-181.
9. Murray J, Sonnenberg P, Nelson G, Bester A, Shearer S, Glynn JR: Cause of death and presence of respiratory disease at autopsy in an HIV-1 seroconversion cohort of southern African gold miners. AIDS 2007, 21(suppl 6):S97-104.
10. Dgedge M, Novoa A, Macassa G, Sacarlal J, Black J, Michaud C, Cliff J: The burden of disease in Maputo City, Mozambique: Registered and autopsied deaths in 1994. Bulletin of the World Health Organization 2001, 79:546-552.
11. Cox JA, Lukande RL, Lucas S, Nelson AM, Van Marck E, Colebunders R: Autopsy causes of death in HIV positive individuals in sub-Saharan Africa and correlation with clinical diagnoses. AIDS Rev 2010, 12:183-194.
12. Ordi J, Ismail MR, Carrilho C, Romagosa C, Osman N, Machungo F, Bombí JA, Balasch J, Alonso PL, Menéndez C: Clinico-pathological discrepancies in the diagnosis of causes of maternal death in sub-Saharan Africa: retrospective analysis. PLOS Med 2009, 6(2):e100036.

doi:10.1186/1478-7954-9-25
Cite this article as: Fligner et al.: Synergism of verbal autopsy and diagnostic pathology autopsy for improved accuracy of mortality data. Population Health Metrics 2011 9:25.



Riley Population Health Metrics 2011, 9:26 http://www.pophealthmetrics.com/content/9/1/26

COMMENTARY

Open Access

Computer-based analysis of verbal autopsies: revolution or evolution?
Ian Riley

Scientific revolution and the clinico-pathological paradigm
In this edition of Population Health Metrics, a series of papers describes new, automated methods for analyzing verbal autopsy questionnaires. Are we witnessing a revolution in the computer-based analysis of verbal autopsies? The use of the word “revolution” brings to mind Thomas Kuhn’s seminal essay, The Structure of Scientific Revolutions, first published in 1962, and his concept of the scientific paradigm [1]. Verbal autopsies, as we now know them, are not a new concept, but rather are late developments within the clinico-pathological paradigm that displaced the humoral theory of disease in the late 18th and early 19th centuries.

We could mark the beginnings of this scientific revolution from the publication in 1761 of Morgagni’s The Seats and Causes of Disease Investigated by Anatomy [2]. Encyclopedic in scope, it is over 2,000 pages in length in its English translation. Morgagni’s general plan is to present symptom histories of patients who had died, describe the findings at autopsy, speculate on the likely causal relationships between pathology and symptoms, and finally to discuss similar cases found in a literature extending back from recent centuries to the ancients. The first level of classification is by the three major body cavities (head, thorax, and belly [sic]) and surgery; the second level is by symptom group. At the end of the three volumes are his indices, which he regarded as critical: one of these lists symptoms alphabetically, cross-referencing them to the pathology of disease case-by-case; another lists pathological lesions, similarly cross-referencing them to symptom histories. His underlying thesis, that clinical symptoms reflect organ dysfunction, was not to be fully accepted for over half a century. This thesis can be briefly stated, but it is the sheer weight of evidence, somewhat in the manner of The Origin of Species, that makes his case. That was not his only purpose: he wanted this to be a working manual (albeit a very large one) for physicians and anatomists.

Equally, we could mark the endpoint of this scientific revolution with the publication in 1819 of another massive work, Laennec’s A Treatise on Diseases of the Chest [3]. Laennec argued that the symptom history was inaccurate. Using his new invention, the stethoscope, he linked detailed auscultatory findings to autopsy findings of pulmonary pathology. All of modern medicine rests upon this linkage. Morgagni validated pathological anatomy as the cause of disease by using the symptom history as his gold standard. Sixty years later, Laennec dismissed the symptom history as inaccurate and validated auscultation using pathological anatomy as his gold standard. In the course of this paradigm shift, pathology had been moved from the margins of the medical solar system to its heart. In Foucault’s words, “The space of configuration of the disease” was now superimposed “upon the space of the localization of illness.” [4]

Medical practice in the 18th century had depended heavily on what we would now call unstructured symptom histories. Physical examination was limited in the main to the observation of the fully clothed patient and palpation of the pulse. Laennec’s work led to a wave of enthusiasm for auscultation to the exclusion of other methods of diagnosis. In attempting to restore balance, Pierre Louis “initiated a method of clinical teaching based on precise observation and statistical analysis.” He began the structuring of the clinical history by introducing direct questioning. In the words of Stanley Reiser [5]:

Louis objectified each footprint of disease by numeration. For him all signs had equal merit, the criteria for their excellence being the care with which they were observed and described, and their statistical correlation with a particular disease, checked when possible by autopsy.

Correspondence: i.riley@uq.edu.au School of Population Health, University of Queensland, Brisbane, Australia

© 2011 Riley; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.




If we wanted a direct ancestor for our work on verbal autopsies, we could do worse than adopt Pierre Louis, for whom it was “indispensable to count.” [6] By the middle of the 19th century, the new paradigm had won general acceptance. From then on we can think of clinical diagnosis in terms of its three pillars: clinical history, physical examination, and laboratory investigation. The clinical history took its current form as a semistructured interview in the early 20th century. The formal physical examination with its successive steps of observation, palpation, percussion, and auscultation, too, was fixed comparatively early. The great scientific advances lay in the development of new branches of pathology - histopathology, pathophysiology, microbiology, clinical biochemistry, and medical genetics - and in imaging. New methods of investigation gave the physician increasingly direct access to the pathology of the living body. Autopsies became rarities and the pathological anatomy of Morgagni and his successors was relegated to bottles in museums.

The verbal autopsy: clinical parallels
The lineage of the verbal autopsy instrument should now be clear. It was based on the clinical history and, like the clinical history, passed through a phase of unstructured narrative before being adapted as a structured survey instrument. Verbal autopsy diagnostics then passed through phases of physician review, of Bayesian analysis based on prior probabilities, and most recently of machine learning. An estimate based on machine learning is that the verbal autopsy is 75% accurate when measured against clinical gold standards, well exceeding human accuracy in the analysis of the same instrument [7]. The reaction of physicians to this demonstration of the power of machine learning to realize the information content of the verbal autopsy is reminiscent of the reactions of chess players on hearing of Garry Kasparov’s defeat by the IBM supercomputer, Deep Blue, in 1997: one senses feelings not only of chagrin but also of regret that yet another domain of what had appeared to be peculiarly human reasoning had yielded to the power of computers.

It would be appropriate to ask, therefore, exactly how physicians do reason when they make clinical diagnoses. This question assumed increasing importance for medical educators with the introduction of curricula based on problem-solving. Hitherto the emphasis had been on the accumulation of factual knowledge allied to bedside experience in hospital wards. The challenge they faced was whether diagnostic skills could be taught as such.

Initially, diagnosis was regarded as a linear process referred to as hypothetico-deductive reasoning, which involved successive steps of problem definition, formulation of tentative hypotheses, collection of preliminary data, formulation of a specific hypothesis, accumulation of further data that tested the hypothesis, and drawing of diagnostic conclusions [8]. It became obvious, however, that this was a weak process used principally by novices and not by expert diagnosticians. A second method, pattern recognition, used by experts working within a familiar field, was associated with 10-fold greater odds of diagnostic success in a test situation than was hypothetico-deductive reasoning. Aptitude depended on both an extensive knowledge base and extensive experience. The development of the necessary skills entailed a much more complex cognitive process than the name implies, and it could not be taught to novices. A third method, known as scheme-inductive reasoning, was associated with five-fold greater odds of diagnostic success. This method was used by experts in less-familiar situations and was also suitable for novice learners. Schemes reflect the organized structure of knowledge. They can be drawn on paper like inductive trees to recreate the major divisions, or chunks of information, used by expert clinicians for both storage and retrieval of knowledge in memory. Decisions are made explicitly at branches of the tree. “After several branching points, when the number of diagnostic options has been considerably reduced, deductive reasoning or pattern recognition may be exploited. Finally, the scheme-inductive process is not content-independent; each of the organizational schemes is specific to the clinical presentation.” [9]

It is a rewarding intellectual experience for a clinician to work his or her way to an accurate diagnosis. Awareness of this, allied to pride in performance, will have colored physicians’ reactions to learning of the superiority of machine learning over expert judgement. The suggestion that analysis of verbal autopsies is qualitatively different from other epidemiological analyses would seem to reflect intuitive awareness of the underlying complexity of cognitive processes. Large differences in diagnostic success rates between physicians and in one physician at different times are most probably a consequence of the large differences in success rates for different cognitive processes.

Criticisms of the clinical history are that it is overly time-consuming, unreliable and, because of its subjectivity, unscientific. It certainly would not be regarded as the main platform for diagnosis. Yet a study in medical referral clinics in England, again by educationists, demonstrated that over 80% of diagnoses could be made accurately on the basis of general practitioners’ referral




... an impersonal clinical attitude and a wall of technology were ways of shielding medical personnel from anxiety-producing thoughts prompted by the critically ill or dying person, about their limitations and failures as healers, and their own mortality.

letters and medical history [10]. The remaining patients required physical examination and/or laboratory investigation before a diagnosis could be made. These results need to be treated with caution but, given that a number of referral letters described symptoms alone and many patients would have been referred because they presented diagnostic difficulty, it seems reasonable to equate these results with the accuracy of verbal autopsy diagnoses based on a combination of symptom history and medical record recall by families. In short, for purposes of diagnosis the information content of the clinical history is much greater than many clinicians would be prepared to acknowledge. Kuhn wrote about the transformation of a world view and of the “conversion” of scientists to a new paradigm [1]. Paradigm shift in Kuhn’s terms was not of particular interest to Foucault [4]. He was concerned with the evolution of perception, language, and discourse, with their interdependence, and with the emergence of clinical science from the medicine of the 18th century. One of his important arguments is that the disease of the 19th century could not have been realized by the discourse of the 18th. To understand the processes of transformation of the medical worldview one should turn to Foucault. Considering his reputation as one of the architects of post-modernism, any doubts about the significance of the clinico-pathological paradigm to Western thought should be dispelled by this judgement [4]:

However, if prejudice against machine-based diagnosis rests on similar attitudes, it is misplaced. The arguments against continuing with physician-based certification of cause of death from verbal autopsies are too compelling: they relate to unreliability of diagnosis and the difficulties of maintaining quality work over long periods of time. The human interaction is between trained interviewers and families. If we have neglected the ethics of verbal autopsies then it is because we have not focused sufficiently on this interaction. As physicians and epidemiologists, we work within the clinico-pathological paradigm without acknowledging it for what it is. We regard a way of thinking that links signs, symptoms, disease, and death as “natural” when, in fact, it is highly derived. It is odd that any high school science student is likely to be familiar with the work of Newton and Darwin but many health professionals would be pressed to give a coherent account of the great paradigm shift between the 18th and 19th centuries. This affects our judgements about ourselves. The use of machine learning in assigning causes to verbal autopsies is likely to be revolutionary in terms of its impact in assigning causes to deaths outside hospitals worldwide, but the scientific revolution took place long before our time. “If we have seen further it is by standing on the shoulders of giants,” [12] is a familiar saying but one containing much truth.

This structure in which space, death, and language are articulated - what is known, in fact, as the anatomo-clinical method - constitutes the historical condition of a medicine that is given and accepted as positive. Positive here should be taken in the strong sense. Disease breaks away from the metaphysic of evil, to which it had been related for centuries; and finds in the visibility of death the full form in which its content appears in positive terms. It will no doubt remain a decisive fact about our culture that its first scientific discourse concerning the individual had to pass through this stage of death. ... from the integration of death into medical thought is born a medicine that is given as a science of the individual.

Author’s information The author chaired the penultimate session on Future Directions at the Global Congress on Verbal Autopsy, Feb. 17, 2011. Acknowledgements Writing of this commentary was supported by funding from the Bill & Melinda Gates Foundation under Grand Challenges in Global Health initiative #13. Competing interests The authors declare that they have no competing interests. Received: 13 April 2011 Accepted: 1 August 2011 Published: 1 August 2011

The structure of this commentary owes much to Reiser’s Medicine and the Rise of Technology and his clear understanding of how overdependence on technology has distanced the physician from the patient as person. Given the above comments from Foucault, it is ironic that Reiser summarized observations by Kubler-Ross in these terms [11]:

References
1. Kuhn TS: The Structure of Scientific Revolutions. 3 edition. Chicago and London: University of Chicago Press; 1996.
2. Morgagni JB: The Seats and Causes of Diseases Investigated by Anatomy. New York: Macmillan (Hafner Press); 1960, Translated by Benjamin Alexander, 1769 Facsimile. With a preface, introduction, and a new translation of five letters by Paul Klemperer.
3. Laennec RTH: A Treatise on the Diseases of the Chest. Cambridge: Cambridge University Press; 1981, Translation by John Forbes of the 1st French edition. London: Underwood, 1821. (Reprinted by MacMillan (Hafner Press). New York, 1962.) Cited by Reiser SJ. Medicine and the reign of technology.
4. Foucault M: The Birth of the Clinic: An Archaeology of Medical Perception. USA: Vintage Books; 1975, (Trans A.M. Sheridan Smith).
5. Reiser SJ: Medicine and the Reign of Technology. Cambridge: Cambridge University Press; 1981.
6. Louis P: An Essay on Clinical Instruction. Cambridge: Cambridge University Press; 1981, London: S Highley, 1834. Cited by Reiser SJ. Medicine and the reign of technology.
7. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL: Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29.
8. Doran GA: Aspects of the content versus process debate in medical education. Medical Education 1984, 18:401-406.
9. Coderre S, Mandin H, Harasym PH, Fick GH: Diagnostic reasoning strategies and diagnostic success. Medical Education 2003, 37:695-703.
10. Hampton JR, Harrison MJG, Mitchell JRA, Prichard JS, Seymour C: Relative Contributions of History-taking, Physical Examination, and Laboratory Investigation to Diagnosis and Management of Medical Outpatients. Brit Med J 1975, 2:486-489.
11. Kubler-Ross E: On Death and Dying. Cambridge: Cambridge University Press; 1981, New York: Macmillan, 1970. Cited by Reiser SJ. Medicine and the reign of technology.
12. Newton, Isaac: Letter to Robert Hooke. 1675.

doi:10.1186/1478-7954-9-26 Cite this article as: Riley: Computer-based analysis of verbal autopsies: revolution or evolution? Population Health Metrics 2011 9:26.



Murray et al. Population Health Metrics 2011, 9:27 http://www.pophealthmetrics.com/content/9/1/27

RESEARCH

Open Access

Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets

Christopher JL Murray1*, Alan D Lopez2, Robert Black3, Ramesh Ahuja4, Said Mohd Ali5, Abdullah Baqui3, Lalit Dandona1,6, Emily Dantzer7, Vinita Das8, Usha Dhingra3, Arup Dutta3, Wafaie Fawzi9, Abraham D Flaxman1, Sara Gómez10, Bernardo Hernández10, Rohina Joshi11, Henry Kalter3, Aarti Kumar4, Vishwajeet Kumar4, Rafael Lozano1, Marilla Lucero12, Saurabh Mehta13, Bruce Neal11, Summer Lockett Ohno1, Rajendra Prasad8, Devarsetty Praveen14, Zul Premji15, Dolores Ramírez-Villalobos10, Hazel Remolador12, Ian Riley2, Minerva Romero10, Mwanaidi Said15, Diozele Sanvictores12, Sunil Sazawal3 and Veronica Tallo12

Abstract

Background: Verbal autopsy methods are critically important for evaluating the leading causes of death in populations without adequate vital registration systems. With a myriad of analytical and data collection approaches, it is essential to create a high quality validation dataset from different populations to evaluate comparative method performance and make recommendations for future verbal autopsy implementation. This study was undertaken to compile a set of strictly defined gold standard deaths for which verbal autopsies were collected to validate the accuracy of different methods of verbal autopsy cause of death assignment.

Methods: Data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. The Population Health Metrics Research Consortium (PHMRC) developed stringent diagnostic criteria including laboratory, pathology, and medical imaging findings to identify gold standard deaths in health facilities as well as an enhanced verbal autopsy instrument based on World Health Organization (WHO) standards. A cause list was constructed based on the WHO Global Burden of Disease estimates of the leading causes of death, potential to identify unique signs and symptoms, and the likely existence of sufficient medical technology to ascertain gold standard cases. Blinded verbal autopsies were collected on all gold standard deaths.

Results: Over 12,000 verbal autopsies on deaths with gold standard diagnoses were collected (7,836 adults, 2,075 children, 1,629 neonates, and 1,002 stillbirths). Difficulties in finding sufficient cases to meet gold standard criteria as well as problems with misclassification for certain causes meant that the target list of causes for analysis was reduced to 34 for adults, 21 for children, and 10 for neonates, excluding stillbirths. To ensure strict independence for the validation of methods and assessment of comparative performance, 500 test-train datasets were created from the universe of cases, covering a range of cause-specific compositions.

Conclusions: This unique, robust validation dataset will allow scholars to evaluate the performance of different verbal autopsy analytic methods as well as instrument design. This dataset can be used to inform the implementation of verbal autopsies to more reliably ascertain cause of death in national health information systems.

Keywords: Verbal autopsy, VA, validation, Philippines, Tanzania, India, Mexico, gold standard, cause of death

* Correspondence: cjlm@uw.edu
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA
Full list of author information is available at the end of the article

© 2011 Murray et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Verbal autopsy (VA) is a critically important tool to measure causes of death in populations without complete medical certification of causes of death. A variety of methods have been proposed for VA cause assignment [1,2], ranging from physician-certified verbal autopsy (PCVA) [3,4] to data-derived algorithms [5-7], various applications of Bayes’ theorem [8-13], and direct statistical estimation of cause fractions [14]. New methods to analyze VAs and attribute causes of death to them are now being developed [15-19], and it is likely that there will continue to be new methods and refinements. Given both the increasing demand for good cause of death information for the world’s poorest populations and the expanding array of VA approaches, it is essential to be able to assess the performance of these options in a scientific and comparable manner.

Several validation studies of VA cause assignment methods have been published [2,3,12,20-31]. Results of validation studies to date, however, have been challenged on several grounds [32-34]. First, previously published validation studies compare the cause of death for individuals derived from verbal autopsy to the cause of death recorded in hospital records or that derived from independent review of hospital medical records. The quality of record keeping and the laboratory, medical imaging, and pathological services available in many developing country hospitals can be extremely poor. This is especially true in resource-poor remote areas where validation studies have been undertaken. As a result, many of these validation studies are actually comparisons of two imperfect cause of death assignment approaches: low-quality hospital-assigned cause of death and the verbal autopsy. In the language of psychometrics, most studies provide information on convergent validity rather than a comparison to a true gold standard known as criterion validity [35].

Second, many studies start with a community sample and then trace back as many deaths to hospital records as possible. The resulting studies often yield small numbers for many causes, so that published results only cover the convergent validity of VA with hospital-assigned (or derived) causes of death for a limited number of causes of death. For many important causes of death such as liver cirrhosis, chronic obstructive pulmonary disease (COPD), or specific sites of cancer, there is essentially no published information on performance of VA.

Third, validation studies often do not provide details on the exact items in the VA instrument, the training of interviewers, the training of physicians for PCVA, the coding of death certificates completed by physicians for PCVA, or the protocol used to extract a cause of death from the hospital records.

The Population Health Metrics Research Consortium (PHMRC) gold standard verbal autopsy validation study was initiated in 2005 to address these research limitations and to ensure that comparative assessments of VA performance were based on clinically reliable diagnoses. We designed the study as a multisite collaboration that aims to address some of the key limitations of previous validation studies and stimulate the development of new methods or refinements of existing methods. The primary goal was to collect a dataset that would help provide more definitive answers as to which VA approaches are more valid and to capture data in a standardized way. In this paper, we describe the design of the study, the criteria used to establish a gold standard (GS) cause of death, the implementation of fieldwork, and the creation of standardized datasets for developing and testing new methods.

Methods

Data collection sites

Gold standard VA data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. Table 1 shows the age and sex distribution for the decedents represented in this study, as well as the national life expectancy.

Research at the Andhra Pradesh, India, site was implemented and coordinated through the George Institute for Global Health, India, and was centered in the main capital city, Hyderabad, as well as the neighboring areas of Ranga Reddy, Medak, and Nalgonda. Hyderabad is 100% urban with a population of roughly 3,830,000 inhabitants. The neighboring area Ranga Reddy has a similar population size (3,575,000) and is roughly half urban and half rural. The Medak and Nalgonda areas are similar to each other, both roughly 14% urban, comprised of 3,248,000 people in Nalgonda and 2,670,000 in Medak.

The Bohol Island site was led by the Research Institute for Tropical Medicine in Manila. Bohol is a tropical island province located in the Central Visayas of the Philippines, with 46 municipalities and Tagbilaran City. Verbal autopsies were collected over the entire island, as well as a small proportion from Manila. According to the 2007 census, 1,230,000 people live in Bohol. Manila is urban, while Bohol is divided into roughly 46% urban and 54% rural.

The research site in Dar es Salaam, Tanzania, was managed by collaborators at the Muhimbili University of Health and Allied Sciences. Verbal autopsies were collected from all over the city of Dar es Salaam, which has a population of roughly 2,487,000 people according to the 2002 census, with 94% of people living in urban areas and 6% living in rural areas.

The Mexican study was coordinated by the National Institute of Public Health in the Federal District and the state of Morelos. According to the 2010 Census, 8.85 million inhabitants live in the Federal District and 1.8 million live in Morelos. Sixteen percent of the population of the state lives in rural areas [36].

Pemba Island, Tanzania, is the smaller of the two islands of the Zanzibar archipelago. The research there was coordinated through the Public Health Laboratory Ivo de Carneri as part of a collaboration between the Ministry of Health and Social Welfare and Johns Hopkins University. Verbal autopsies were collected from all areas of the island. This island has a population of roughly 400,000 inhabitants. The island is 99% rural and 1% semi-urban.

Finally, the Uttar Pradesh site in India was led by collaborators at the CSM Medical University (CSMMU, formerly, King George Medical College) in Lucknow. Verbal autopsies were collected from a wide range of districts in the state of Uttar Pradesh: Ambedkar Nagar, Bahraich, Barabanki, Basti, Faizabad, Gonda, Hardoi, Lakhimpur, Lucknow, Rae Bareli, Sitapur, Sultanpur, and Unnao. Table 2 shows the population and urban percentage for each of these districts.

Table 1 The age and sex distribution of the decedents represented in the verbal autopsy sample and the national life expectancy for the country according to the 2010 United Nations numbers

| Site | National life expectancy | Decedents sampled: % Male | % Female | % Under age 5 | % Ages 5 - 59 | % Ages 60+ |
|---|---|---|---|---|---|---|
| Andhra Pradesh, India | 64.2 | 59 | 41 | 28 | 55 | 17 |
| Bohol, Philippines | 67.8 | 56 | 44 | 31 | 38 | 31 |
| Dar es Salaam, Tanzania | 55.4 | 48 | 52 | 44 | 41 | 15 |
| Federal District and Morelos, Mexico | 76.2 | 53 | 46 | 21 | 46 | 34 |
| Pemba Island, Tanzania | 55.4 | 52 | 48 | 60 | 31 | 10 |
| Uttar Pradesh, India | 64.2 | 58 | 42 | 24 | 58 | 18 |

Table 2 The population size in thousands and percent of population that is urban for the Uttar Pradesh, India field sites, according to the 2001 Census of India

| District | Population Size | % Urban |
|---|---|---|
| Ambedkar Nagar | 2,026 | 9 |
| Bahraich | 2,381 | 10 |
| Barabanki | 2,673 | 9 |
| Basti | 2,084 | 6 |
| Faizabad | 2,088 | 13 |
| Gonda | 2,765 | 7 |
| Hardoi | 3,398 | 12 |
| Lakhimpur | 889 | 7 |
| Lucknow | 3,647 | 64 |
| Rae Bareli | 2,872 | 10 |
| Sitapur | 3,619 | 12 |
| Sultanpur | 3,214 | 4 |
| Unnao | 2,700 | 15 |

Instrument

The instrument development was based on the WHO standardized verbal autopsy instrument [37], which in turn was based in part on the work of Chandramohan et al. (1994) for adult deaths and of Anker et al. (1999) for neonatal and child deaths [38,39]. Separate questions were developed for neonatal deaths and stillbirths, children 1 month to 11 years, and adults 12 years and older. Experience gained from VA studies in Andhra Pradesh and China where the WHO instrument, or slight variants of it, had been applied was also considered [40,41]. A committee drawn from the principal and associate investigators considered modifications based on published and unpublished experiences with the WHO instrument, including fieldwork conducted as part of a large VA study in Thailand. The final instrument was translated into the respective local languages, and then back-translated to English by a different translator to ensure accuracy.

The PHMRC instrument is comprised of a general information module, an adult module, and a child and neonatal module. Skip patterns were integrated into the general information module to collect the age of the deceased and then direct interviewers to the correct module to administer. In administering the WHO instrument, the interviewer must first determine the age of the deceased and select the correct instrument to deliver, which results in the potential for more interviewer error and a less fluid interview. The general information module, which is administered in all verbal autopsies, collects items such as education of the decedent, household characteristics, and a household roster. The adult module collects a history of chronic conditions, symptoms of the deceased, women’s health questions if the decedent is female, alcohol and tobacco use, and injury information; it also transcribes any available medical record and death certificate information. The child and neonatal module first asks background questions on information such as whether the mother is still alive, where the deceased was born, the size of the decedent at birth, and the delivery date. The questionnaire then ascertains whether the decedent was a stillbirth and, if so, collects symptom questions, such as signs of injury. If not, the questionnaire collects more general information such as the age of the baby or child when they became ill and the age at death. If the decedent is under 28 days (inclusive of stillbirths), a maternal history is collected. In addition, if the decedent is under 28 days and was born live, a full set of neonatal symptom questions are collected. If the decedent is between 28 days to 11 years, infant and child symptom questions are asked. All available health records and death certificates are transcribed for both neonatal and child deaths. Finally, for all ages, the open narrative section was moved to the end of the interview, after the structured questions. This was done to ensure that in future work, we could remove the open-ended items without concern that the results collected in this study were a function of the open-ended items coming prior to structured content.

In addition to the structural changes, there are important differences between the PHMRC instrument and the WHO instrument. First, the WHO adult module is administered on ages 15 and above, while the PHMRC adult module begins at age 12. This expansion of the ages included in the adult module ensures that conditions clinically present, such as maternal mortality in 12 to 14 year olds, are captured through this instrument. Second, a substantial portion of the questions were reworded to ensure clarity. Medical terminology was converted to easily understandable descriptions to target a lay population. For example, “Did s/he have abdominal distension?” was reworded to “Did [NAME] have a more than usual protruding belly?” Information was also added for precision, or removed to ensure only the most diagnostically relevant information was collected. Similarly, we added or dropped entire questions to capture the most essential information, while reducing the duration of the interview as much as possible. One common question type dropped from the instrument was the duration of certain symptoms. For example, the PHMRC instrument asks whether adults had developed a lump in the neck, armpit, breast, or groin but dropped the follow up question “For how long did s/he have the lumps?” as the presence of the symptom alone was the most important information. Another common question type dropped from the WHO instrument was about treatment that had been received by the decedent, as they were less important in informing the cause of death. Finally, the PHMRC instrument did not include questions about chronic conditions in children, such as cancer, tuberculosis, and diabetes. Additional file 1 illustrates the content questions, such as symptoms experienced by the decedent that were added or dropped when converted from the WHO instrument to the PHMRC instrument. The small wording changes are not included in this additional file, though the full PHMRC instrument is included in Additional file 2 (general module), Additional file 3 (adults), and Additional file 4 (children and neonates) for reference.

Cause list

A key challenge for the study was to identify the cause list for each of the three age groups for which we would seek to collect a sample of gold standard deaths. Our selection of the target cause list was based on consideration of the WHO estimates of the leading causes of death in the developing world in each age group, those causes for which verbal autopsy might be able to function adequately because unique signs and symptoms could potentially be collected in an interview, and the potential to find, in the six sites, deaths with sufficient laboratory, medical imaging, and pathological detail in order that a gold standard cause of death assignment could be made. The cause lists were also designed so that they were mutually exclusive and collectively exhaustive. The target cause list for adults, children, and neonates included 53, 27, and 13 GS causes, respectively, plus stillbirths (for a complete list of causes, see Additional file 5). These cause lists are much longer than for any previously undertaken VA validation study. In fact, nearly all previous VA validation studies have started with a community or convenience sample of deaths and then ascertained cause in hospital records rather than seeking to collect data on a list of causes by design. Gold standard criteria

A critical component of the study was the development, for each cause, of clear criteria that had to be fulfilled for a death to be assigned as a GS cause of death. Depending on the cause of death, these criteria included clinical endpoints, laboratory findings, medical imaging,

25


Murray et al. Population Health Metrics 2011, 9:27 http://www.pophealthmetrics.com/content/9/1/27

Page 5 of 15

to the gold standard criteria (levels 1, 2A, and 2B). The medical information from all qualifying cases selected by the clinicians was extracted and sent to the George Institute Hyderabad office for enrollment in the verbal autopsy study. In Bohol, the majority of deaths were reviewed at the Bohol Regional Hospital. This facility is the referral hospital for Bohol Province with the highest available standards of clinical investigation and hence diagnosis. Three nurses monitored all deaths in the hospital. They ensured that all reports of investigations (imaging and laboratory) were located and attached to the charts. In addition, to augment the number of deaths collected, 467 deaths were recruited from two hospitals in Manila: the Veterans Memorial Medical Center and the Rizal Medical Center. In all locations, the nurses summarized the case notes, including reports of investigations, onto the medical data extraction forms. MDEFs were first reviewed by two study physicians who assigned cause of death and decided by diagnosis and GS level which VAs should not be collected. Deaths were reviewed as soon as possible after the death. At the Dar es Salaam site, five health facilities were used as recruitment sites. These were Mwananyamala Hospital, Temeke Hospital, Muhimbili National Hospital, Ocean Road Cancer Institute, and Hindu Mandal Hospital. Mwananyamala and Temeke are both district hospitals, each of which records roughly 1,500 deaths per year. Ocean Road Cancer Institute is the only cancer treatment facility in Tanzania and was an important source for causes such as cervical cancer, esophageal cancer, breast cancer, leukemia, prostate cancer, and lymphomas. Muhimbili National Hospital is a referral and teaching hospital with a higher mortality rate than the other enrolled facilities. Hindu Mandal Hospital is a private hospital in the heart of Dar es Salaam. It has a well-established HIV/AIDS clinic and commonly receives noncommunicable disease cases. 
At each location, a nurse affiliated with the study reviewed medical records to identify qualifying cases. The cases identified by the nurses were reviewed by physicians, who filled out the MDEFs with the gold standard levels for the cases that were eligible for enrollment. The nurses spoke with family members of the deceased if present at the hospital to enroll them in the study, collect their consent, and obtain mapping information and directions for a verbal autopsy interview. In Mexico, after obtaining authorization to work in each medical unit, a group of six trained physicians reviewed the medical records of cases (and when available the reports from autopsies) that could be included in the study, filled an extraction form for each case, and classified them as levels 1, 2, or 3 according to the gold standard criteria proposed by the PHMRC. Only cases

and pathology. Additional file 6 (adults) and Additional file 7 (children and neonates) provide the gold standard criteria for each cause. These gold standard criteria were developed by a committee of physicians involved in the study and underwent multiple cycles of group review. Preliminary review of hospital records in the sites indicated it would be very difficult to identify any deaths for some causes that would meet the strict gold standard criteria. In order to ensure that as many potentially eligible deaths in each site as possible were collected for the study, a less strict but nevertheless detailed level 2 set of criteria were also developed (see Additional files 6 and 7). In some cases, these level 2 criteria were further disaggregated into level 2A and level 2B. By way of example, the criteria for determining a death as being due to adult breast cancer, adult acute myocardial infarction, child pneumonia, and neonatal birth asphyxia are shown in Table 3. By recording the level of diagnosis for each death, we are able to test whether the assessment of performance for any method is affected by the level of cause of death assignment according to our criteria. Data collection Identification of gold standard deaths

As described above, a stringent set of diagnostic criteria for each cause of death was developed by a team of study physicians before fieldwork began. Each site then enrolled local health facilities at which medical records would be reviewed. Consortium members led a two-day training at each of the sites to train the reviewers in the gold standard definitions, the protocols for identifying cases meeting these criteria, and the procedure for extracting the pertinent medical information. Each reviewer was provided a pocket guide detailing the necessary criteria for each gold standard cause of death. The medical information from qualifying records was extracted using a standard medical data extraction form (MDEF, see Additional file 8), which the study team developed. Once eligible records were extracted, a local physician reviewed the medical information and determined the gold standard level of the particular case according to the diagnostic criteria outlined for each level for each cause. The following information details the specific protocol followed by each research site. In Andhra Pradesh, four hospitals were recruited for the study. Three are government hospitals - Gandhi Hospital, Osmania General Hospital, and Chest Hospital - and one is a private hospital, CARE Foundation. There was 24-hour surveillance at the hospitals and all patients were enrolled with their addresses. Study supervisors collected information on all deceased patients from all wards, and clinicians involved in the study then reviewed the case sheets to select those that conformed

26


Murray et al. Population Health Metrics 2011, 9:27 http://www.pophealthmetrics.com/content/9/1/27

Page 6 of 15

Table 3 Examples of gold standard criteria for adult breast cancer, adult acute myocardial infarction, child pneumonia, and neonatal birth asphyxia

Adult breast cancer

Level 1
One of the following:
• Operative specimen with histological confirmation
• Biopsy/fine needle aspiration cytology

Level 2A
Both of the following:
• Mammography diagnosis
• Imaging evidence of metastases in bone, lung, etc. based on CT scan/MRI/X-rays

Level 2B
Patient under treatment from a recognized cancer hospital or cancer unit for breast cancer in cases where the basis for the initial diagnosis is no longer available.

Adult acute myocardial infarction

Level 1
Evidence of acute MI within three months preceding death based upon one or more of the following:
• Cardiac perfusion scan
• ECG changes
• Documented history of CABG or PTCA or stenting
• Coronary angiography
• Enzyme changes (any troponin elevation or CK-MB isoenzyme elevation >2 times the upper limit of normal) in the context of myocardial ischemia

Level 2A
Clinical evidence of the following:
• Sudden death within six hours of the onset of characteristic shock and chest pain when the case has been witnessed by a physician

Child pneumonia

Level 1
Chest X-ray showing primary end-point consolidation, pleural effusion or other consolidation/infiltration, plus two or more of the following:
• Respiratory rate >70/minute
• Severe lower chest indrawing
• Abnormal breath sounds (i.e., grunting, decreased breath sounds, crepitations)
• Rectal temperature >38°C or <36°C
• Oral or axillary temperature >37.5°C or <35.5°C

Neonatal birth asphyxia

Level 1
Each of the following:
• Failure both to breathe spontaneously and to cry at birth
• No major congenital abnormality
• Not a stillbirth (one or more signs of life at birth like pulse or movement)
Plus one of the following in the 24 hours after birth:
• Not feeding
• Hypotonia
• Seizures
• Needed and failed resuscitation at birth

Level 1 is the most stringent criteria, while level 2A or 2B were also collected for some causes.
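To illustrate how criteria of this kind can be applied mechanically to extracted records, the child pneumonia level 1 rule in Table 3 could be sketched as below. This is only a sketch: the record structure and all field names are hypothetical (the actual extraction fields are those of the MDEF in Additional file 8), and it assumes that each bullet in the table counts as one separate sign.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildRecord:
    # Hypothetical field names; not taken from the study's MDEF.
    xray_consolidation_or_effusion: bool
    respiratory_rate: Optional[float] = None        # breaths per minute
    severe_chest_indrawing: bool = False
    abnormal_breath_sounds: bool = False
    rectal_temp_c: Optional[float] = None
    oral_or_axillary_temp_c: Optional[float] = None


def meets_child_pneumonia_level1(r: ChildRecord) -> bool:
    """Chest X-ray finding plus two or more of the listed signs (Table 3)."""
    if not r.xray_consolidation_or_effusion:
        return False
    # Assumption: each bullet in Table 3 is counted as its own sign.
    signs = [
        r.respiratory_rate is not None and r.respiratory_rate > 70,
        r.severe_chest_indrawing,
        r.abnormal_breath_sounds,
        r.rectal_temp_c is not None
        and (r.rectal_temp_c > 38.0 or r.rectal_temp_c < 36.0),
        r.oral_or_axillary_temp_c is not None
        and (r.oral_or_axillary_temp_c > 37.5 or r.oral_or_axillary_temp_c < 35.5),
    ]
    return sum(signs) >= 2
```

Missing measurements are treated here as "sign not documented", which errs toward excluding a case rather than admitting it.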

In Pemba, there are four major government hospitals on the island, though most facilities do not have a certified medical doctor present and are managed by medical assistants and nurses. Surveillance systems were put in place in all four hospitals to identify deaths and to classify them into GS categories. The hospital supervisor recorded complete identification information upon admission of each patient, and the attending physician or medical assistant confirmed the admission diagnosis. Hospital supervisors ensured that the signs and

classified as levels 1 and 2 were considered eligible for the study. The original design considered the inclusion of only one to three large hospitals in Mexico City, but due to the difficulty of completing the quota of gold standard cases, hospitals from the health service network of the Federal District government and from the Ministry of Health of the state of Morelos were included. The data were collected from 36 public hospitals: 33 from the Federal District and three from Morelos.


symptoms experienced by the patient were recorded and that a mortality form with the cause(s) of death was filled out by the attending physician in the event of a death. All forms were sent back to the field headquarters for data entry. A computer algorithm was run to identify cases meeting GS criteria, and all GS cases were recorded in a database. A computer listing was prepared with identifier information to schedule the VA interviews. In Uttar Pradesh, the gold standard deaths were enrolled at CSMMU, Lucknow, which is a tertiary care government facility with patient inflow from all over Uttar Pradesh and bordering states, including districts in the neighboring country of Nepal. The catchment area spreads over a radius of more than 500 km, and about 85% of cases come from 13 districts surrounding Lucknow. There was 24-hour surveillance at facilities and all patients were enrolled with an address. When a death occurred, the project medical officer reviewed the patient case sheet in consultation with the resident doctor in order to assess the GS levels against standard criteria.

Quality control of fieldwork and data entry

To ensure the highest quality data were collected, quality control checks were performed both at the individual site level and at the Institute for Health Metrics and Evaluation (IHME), where all data were transmitted through a secured password-protected site for analysis. In all sites, supervisors were trained in the protocols for monitoring quality control at the site level. Supervisors were instructed to observe VA interviewers in the field during the early stage of data collection to ensure interviews were conducted properly and to provide guidance. Supervisors additionally checked every VA form collected throughout the study to ensure that it was filled out consistently and correctly. If issues were identified by the supervisor, a reinterview was conducted as needed. The field interviewers had periodic meetings with their supervisors to discuss performance, progress, and challenges. Supervisors at most sites additionally reinterviewed a portion of the verbal autopsies to spot check the quality of the information collected. At IHME, we systematically evaluated all datasets electronically for numerous types of quality issues using a comprehensive set of codes. First, we reviewed the dataset for missing values and for incorrect skip patterns that result in specific questions having been filled in or left blank erroneously. The dataset was also evaluated to determine if any of the observed values fell outside of expected ranges. For example, if the response for a neonatal symptom duration was greater than 28 days (the cutoff for classification as a neonatal death), this value was flagged. Next, if the dataset was submitted in multiple sections, we examined the final comprehensive database for any technical issues that may have occurred in merging the individual files. Finally, we merged the dataset with the gold standard medical record information, which was separately transmitted to IHME by the site coordinator.
We examined the observations for consistency between the two sources of information, such as the sex of the decedent as reported in the medical record and as reported by the verbal autopsy respondent. Any issues determined through this stringent checking process were compiled into a report and sent to the site to review. Site coordinators were asked to speak with the interview staff and rectify any correctable issues such as data entry mistakes.
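Two of the checks described above, range checks and consistency between the VA and the medical record, could be sketched as follows. This is an illustrative sketch rather than the code used at IHME: the dictionary-based record layout and the field names (`id`, `sex`, `neonatal_symptom_days`) are assumptions made for the example.

```python
def flag_out_of_range(records, field, low, high):
    """Return IDs of records whose value for `field` lies outside [low, high].

    Missing values are skipped here; a separate missing-value check
    would report them.
    """
    return [
        r["id"]
        for r in records
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def flag_sex_mismatch(va_records, gs_records):
    """Return IDs where the sex reported in the VA differs from the medical record."""
    gs_sex = {r["id"]: r["sex"] for r in gs_records}
    return [
        r["id"]
        for r in va_records
        if r["id"] in gs_sex and r["sex"] != gs_sex[r["id"]]
    ]
```

For the neonatal example in the text, `flag_out_of_range(records, "neonatal_symptom_days", 0, 28)` would list every record reporting a symptom duration above 28 days so that it can be included in the report sent back to the site.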

VA interview

Once enrolled, the VA interviewers at each site attended a training session led by consortium members using standardized materials and an interviewer’s manual. The training manuals provided information on the study background, the roles and responsibilities of the VA interviewer, background on how VA cases were selected, instructions for administering the questionnaire, and information on every question in the instrument. The manual provided guidance on how to handle an array of questions or concerns, tips for building rapport with the respondents, and probing as needed to collect reliable information. Following the training, VA assignments were given to interviewers blinded to the medical information or cause of death of the decedent, along with directions or map cues to the households. In some sites the families were contacted in advance to schedule an appointment, though this decision was left to the sites’ discretion. All interviews were collected after a culturally appropriate grieving period had passed. The minimum grieving period was six days in Bohol and the maximum was six months in Mexico (as required by the ethics boards at the hospitals). The maximum amount of time post-death that an interview was collected was eight months in the Mexico site. The rate of interview refusals varied by site from 1.8% to 9.5%. For those that consented to a verbal autopsy, the instrument was administered on paper in the field, and returned to the field headquarters for double data entry. Interviews lasted an average of 45 minutes across all of the sites.

Generation of dichotomized variables

In addition to the full dataset as it was collected, we have also created a series of dichotomous variables from each of the polytomous (categorical) and continuous (duration) variables. Some analytical methods can only use dichotomized variables, so this effort to create the dichotomous variables increases the information available to these types of empirical methods. For each


questions asked if the deceased had any specified conditions, which would likely indicate a health care provider had diagnosed the individual. Each of the following conditions was asked: “Did decedent have [asthma, hypertension, obesity, stroke, tuberculosis, AIDS, arthritis, cancer, COPD, dementia, depression, diabetes, epilepsy, heart disease]?” Second, if any medical records were available, the interviewer was asked to provide a transcription of the last note on the medical record. Third, if a death certificate was available, the interviewer was asked to record the immediate cause of death, first underlying cause, second underlying cause, third underlying cause, and contributing causes from the death certificate. Finally, at the end of the questionnaire, an open-ended section was provided to collect any comments from the interviewer, as well as to ask the respondent “to summarize, or tell us in your own words, any additional information about the illness and/or death of your loved one?” Excluding this entire section excludes not only open narrative recall of HCE but also, in the case of PCVA, any other information on timing and sequencing of signs and symptoms that might be conveyed in this section.

continuous duration item, depending on the item, we identified a short or long cutoff. For example, a duration of 8.8 days marks long duration of a fever. If a VA reports a fever of 10 days, it is considered to have the symptom of “having a long fever.” We determine the cutoff as being two median absolute deviations above the median of the mean durations across causes (MAD estimator). The MAD estimator can be used as a robust measure of the standard deviation and is especially useful in cases where extremely long durations may be reported, which would bias measures such as the standard deviation. Additional file 9 shows the cutoffs for each item developed in this way. For polytomous variables, we examined the pattern of the endorsement rates across causes and mapped the categories into two, thus creating a dichotomous version of the variable. For example, we judged that there was a stronger signal produced by combining moderate and severe fevers. Additional file 10 shows the mapping of each response category into dichotomous variables. Based on the data collected, some polytomous variables appeared to have little or no information content and were not mapped into a dichotomous form. These low information content items are shown in Additional file 11. This exercise was undertaken for neonatal, child, and adult modules separately.
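The cutoff rule for duration items can be sketched as follows: take the mean duration for each cause, then set the cutoff two median absolute deviations above the median of those means. The sketch makes two assumptions that the text does not settle: the MAD is used unscaled (no normal-consistency constant), and a duration equal to the cutoff counts as long. The function names are illustrative.

```python
from statistics import mean, median


def long_duration_cutoff(durations_by_cause):
    """Median of the cause-specific mean durations plus two MADs.

    durations_by_cause maps a cause name to the list of reported
    durations (in days) for deaths from that cause.
    """
    cause_means = [mean(v) for v in durations_by_cause.values() if v]
    center = median(cause_means)
    # Median absolute deviation of the cause means (unscaled; an assumption).
    mad = median([abs(m - center) for m in cause_means])
    return center + 2 * mad


def has_long_duration(reported_days, cutoff):
    """Dichotomize a reported duration; treating equality as long is an assumption."""
    return reported_days >= cutoff
```

With the fever example from the text, `has_long_duration(10, 8.8)` is true, so that VA would be given the symptom “having a long fever.”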

Processing free text for use in empirical methods

The structured instrument includes various open text items. First, some questions in the instrument ask the respondent to choose from a list of specified response options. For example, “Where was the rash located?” has the following response options: face, trunk, extremities, everywhere, or “other (specify: ____).” If the response is not one of the listed options, the respondent is asked to fill in the location of the rash as the “other” response. The questions that include an “other” free text response option are as follows: “Where was the rash located?”; “Where was the pain located?”; “Which were the limbs or body parts paralyzed?”; “What kind of tobacco did [NAME] use?”; “Did [NAME] suffer from an injury or accident such as a ____?”; “Where was the deceased born?”; “What were the abnormalities?” in reference to any abnormalities at time of delivery; “Where did the deceased die?”; “What was the color of the liquor when the water broke?” in reference to labor; “Where did the delivery occur?”; and “Who delivered the baby?” In the questions that collect information about a health facility or midwife, free text responses collected the name and address of the place or person. In addition to these free text items, if any medical record or death certificates were available, the interviewer was asked to transcribe the information from the records as free text. Finally, at the end of each interview, the open narrative question “Summarize, or tell us in your own words, any additional information about the illness and/or death of your loved one?” (as described

Inclusion of health care experience

There has long been concern that the performance of a VA instrument and the associated analytical method for assigning cause could be different for deaths where the decedent died in a hospital or had made extensive use of health services prior to death, compared to deaths with no health care experience (HCE). As an attempt to examine how VA may work in communities with limited or no access to health care services, Murray et al. [12] studied how PCVA and the Symptom Pattern Method performed when all items referring to use of health services such as “Have you ever been diagnosed with...” or hospital records or death certificates were excluded from the analysis. They showed that, in China, recall of the household or possession of medical records recorded in the VA interview had a profound effect on both the concordance for PCVA as well as the performance of the Symptom Pattern Method. Given this empirical finding, we believe it is useful to test how VA performs when household recall of health care experience is excluded, since this likely provides a more realistic assessment of how VA performs in communities without access to health services. As such, we have created two versions of the datasets developed above, one version with all variables and one version excluding recall of health care and medical records. Specifically, the without-HCE dataset excludes the following information. First, a series of


above) was collected in addition to any notes from the interviewer. Open text could in theory be highly informative, especially household recall of HCE and an interviewer’s direct recording of death records or hospital records kept by the household. These observations are likely to be available in populations with some access to health care services. To make this information available to automated methods, we processed open text in the following steps. First, all free text was compiled into a database and a dictionary was created to map all similar words to the same stem word. For example, the terms AMI, myocardial infarction syndrome, acute myocardial infarction, ISHD, MI, coronary heart disease, CHD, IHD, MCI, and MYIN would all be mapped by the dictionary into the same variable (“IHD: Acute Myocardial Infarction”). Next, a program called README [42] extracts each individual variable and assigns a frequency count for the number of times it appears in the entire free text database. Variables that are not deemed to be diagnostically relevant or that are very low in frequency are then dropped from the dataset. The final product is a condensed dictionary of medically important terms consisting of 106 variables for adults, 90 for children, and 39 for neonates. These terms are added as additional binary symptoms (present or not present) in the VA database. If any of the terms appear in the free text for a particular death, it is counted as a positive endorsement for that symptom. These symptoms are not used in the “without HCE” dataset. Additional file 12 provides the comprehensive dictionary that was developed.
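The dictionary mapping and binary coding described above could be sketched as follows. This is a simplified illustration and not the README program [42]: the synonym list is only a fragment of the example given in the text, matching is by whole lowercase phrase, and terms are filtered by the number of deaths in which they occur rather than by raw frequency in the text database.

```python
import re
from collections import Counter

# Fragment of the example mapping from the text; the full dictionary is in
# Additional file 12.
SYNONYMS = {
    "IHD: Acute Myocardial Infarction": [
        "ami", "acute myocardial infarction", "mi", "ihd", "chd",
        "coronary heart disease",
    ],
}


def extract_terms(text, synonyms):
    """Return the set of dictionary variables mentioned in one free-text entry."""
    normalized = " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "
    found = set()
    for variable, phrases in synonyms.items():
        # Pad with spaces so that "mi" does not match inside "vomiting".
        if any(f" {phrase} " in normalized for phrase in phrases):
            found.add(variable)
    return found


def build_binary_symptoms(narratives, synonyms, min_deaths=1):
    """Map death ID -> {variable: 0/1}, dropping variables seen in too few deaths."""
    per_death = {d: extract_terms(t, synonyms) for d, t in narratives.items()}
    counts = Counter(v for terms in per_death.values() for v in terms)
    kept = sorted(v for v, c in counts.items() if c >= min_deaths)
    return {d: {v: int(v in terms) for v in kept} for d, terms in per_death.items()}
```

Each kept variable then behaves like any other dichotomous symptom: 1 if any of its synonyms appears in the free text for that death, 0 otherwise.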

Figure 1 The process of generating 500 test and training datasets (done separately for each cause of death). [Flowchart: the original data with validated gold standard are split by sampling without replacement into a train dataset (75%) and a test data pool (25%); the test data pool is then sampled with replacement, following a random CSMF drawn via a Dirichlet distribution, to produce the test dataset.]
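The procedure summarized in Figure 1 could be sketched as follows for a single train-test pair. The sketch rests on assumptions that the paper does not spell out: “uninformative Dirichlet” is taken to mean a Dirichlet with all parameters equal to 1 (generated here by normalizing Gamma(1, 1) draws), and the number of test deaths per cause is obtained by drawing causes at random with the CSMF as weights. Function and variable names are illustrative.

```python
import random


def make_train_test_pair(deaths_by_cause, n_test, rng):
    """Build one train/test pair: 75%/25% split by cause, then resample the test pool.

    deaths_by_cause maps a cause to a list of death identifiers.
    Returns (train, test, csmf); train and test are lists of (cause, death_id).
    """
    train, test_pool = [], {}
    for cause, deaths in deaths_by_cause.items():
        shuffled = list(deaths)
        rng.shuffle(shuffled)                      # split without replacement
        cut = int(round(0.75 * len(shuffled)))
        train.extend((cause, d) for d in shuffled[:cut])
        test_pool[cause] = shuffled[cut:]

    causes = [c for c, pool in test_pool.items() if pool]
    # Dirichlet(1, ..., 1) draw via normalized Gamma(1, 1) variates (assumption).
    gammas = [rng.gammavariate(1.0, 1.0) for _ in causes]
    total = sum(gammas)
    csmf = {c: g / total for c, g in zip(causes, gammas)}

    # Resample the test pool with replacement to follow the drawn CSMF.
    drawn = rng.choices(causes, weights=[csmf[c] for c in causes], k=n_test)
    test = [(c, rng.choice(test_pool[c])) for c in drawn]
    return train, test, csmf
```

Repeating the call 500 times with independent random draws would give 500 pairs in which the test CSMF varies widely and is unrelated to the cause composition of the training data.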

Analysis datasets

For empirical VA methods that must be developed using the pattern of responses observed in a dataset, validation needs to be undertaken on a set of deaths that were not included in the development of the method. This is the concept of a training dataset distinct from a test dataset. Further, as recommended in Murray et al. [15] it is important to have test datasets with widely varying cause-specific mortality fractions (CSMFs) so that a VA method does not by chance appear to be better than another because of the specific CSMF composition in the training set. To facilitate strict comparability, we have created 500 train-test dataset pairs. Each pair was created by first splitting the data randomly (without replacement) into 75%/25% training and test datasets, cause by cause, and then resampling the data in the test dataset (with replacement) to have 7,836 adult, 2,075 child, 1,629 neonatal, and 1,002 stillbirth deaths, matching a cause composition drawn from an uninformative Dirichlet distribution (Figure 1). In other words, each test dataset has been resampled to have a different CSMF composition. Because the CSMF compositions

have been drawn from an uninformative Dirichlet, across the 500 test datasets, there are cases where any given cause has a cause fraction near zero and cause fractions as high as 20% or more. By the nature of this sampling strategy, there is no correlation between the CSMF composition of the training and test dataset pairs.

Shortened cause lists

In order to have an efficient cause list for the analysis, we have reduced it in two steps as illustrated in Table 4. From the original gold standard target cause list we received deaths from the sites for 53 diseases in adults, 27 in children, and 13 in neonates, excluding stillbirths. The first step was to select only those causes with 15 or more deaths (see Additional file 5 for a detailed mapping), and due to that decision we reduced the list into 46 adult causes, 22 child causes, and 12 neonate causes, excluding stillbirths. For instance, pelvic inflammatory


(88%) were deaths that met the highest level of GS criteria (level 1). This number varies from 84% in Bohol to 91% in Dar es Salaam; and by age, 86% of adult deaths were level 1, 81% of child deaths, and 99.7% of neonate deaths. The majority of the remaining 12% level 2 deaths were adults. It is interesting to note the cause distribution by quality of the gold standards. Table 6 presents the breakdown of how many level 1 and level 2 GS cases were collected for each of the 53 adult causes. Eighty-six percent of adult deaths were level 1, 13% were level 2A, and 1% were level 2B. Twenty-five causes of death, which represent 47% of all adult causes, were exclusively level 1. For the remaining 28 causes, the frequency of level 1 deaths varies, such as cirrhosis and asthma with less than 30% level 1 cases; pneumonia and sepsis with between 30% and 60% level 1 cases; and stroke, lung and esophageal cancers, and tuberculosis with between 60% and 75% level 1 cases. Table 7 shows the results for the 2,075 deaths in children. Eighteen causes of death, which comprise 67% of all of the child causes, reached the level 1 gold standard. Another six causes do not achieve more than 60% of gold standard level 1 and vary from 0% (measles) to more than 50% (malaria, pneumonia, and sepsis). Table 8 shows that the level of quality was very high for the 1,629 neonatal deaths and 1,002 stillbirths. The distribution of cases (all criteria levels combined) across the six sites is shown in Additional file 13. The relative distribution of cases by age of death across sites reflects their overall progress with mortality transition. Thus adult deaths were comparatively fewer in Pemba compared to all other sites where 1,200 to 1,600 cases were typically collected. Larger numbers of child deaths were collected in Dar es Salaam and Uttar Pradesh, where child death rates are higher than elsewhere. Similar numbers of neonatal deaths were collected in each site (250 to 400) except for Dar es Salaam.
In this case, the site collected VAs on a significantly higher number of neonatal deaths (1,049) than was targeted, as the site had the VA interviewer capacity to easily add these cases as they were identified. For example, while the targeted number of stillbirth deaths was 100, the Dar es Salaam site was able to easily collect interviews on 432 cases to help build a more robust dataset.

Table 4 Reduction in number of causes to the final analysis cause list, excluding stillbirths

| | Adult | Child | Neonate |
|---|---|---|---|
| Target cause list | 53 | 27 | 13 |
| >15 deaths | 46 | 22 | 12 |
| Cross classification | 34 | 21 | 10 |

diseases, uterine cancer, and dementia in adults; AIDS with tuberculosis in children; and meningitis in neonates had fewer than 15 deaths each. We also eliminated pertussis in children and neonatal tetanus because no pertussis and only four neonatal tetanus deaths were gathered. These deaths were assigned to one of the remaining categories, such as residual categories like “other defined cancers” or “other childhood infectious diseases.” In the next step we explored the frequency with which one cause was erroneously classified as another cause in the analysis. For example, deaths due to maternal hemorrhage were often assigned to anemia in the analysis and vice versa. Similarly, all types of diabetes in adults (diabetes with coma, with renal failure, or with skin infection), sepsis with and without local bacterial infection in children, and respiratory distress syndrome in neonates regardless of the gestational age were all frequently hard to differentiate in the analysis. The causes that were frequently confused with each other were aggregated into a new cause in the final analysis cause list. For example, all six maternal causes were combined into one maternal category. After this step, the final cause list for analysis had 34 causes for adults, 21 for children, and 10 for neonates, excluding stillbirths.
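The two reduction steps could be sketched as follows: causes below the minimum number of deaths are folded into a residual category, and causes that are frequently confused with each other are merged into a single analysis cause. The mappings used in the usage note are hypothetical illustrations; the mapping actually applied in the study is given in Additional file 5.

```python
def reduce_cause_list(counts, min_deaths, residual_of, merged_into):
    """Collapse a target cause list into an analysis cause list.

    counts:       cause -> number of deaths collected
    residual_of:  cause -> residual category, needed for every cause
                  with fewer than min_deaths deaths
    merged_into:  cause -> aggregated cause for frequently confused causes
    """
    reduced = {}
    for cause, n in counts.items():
        target = cause
        if n < min_deaths:
            target = residual_of[cause]            # step 1: too few deaths
        target = merged_into.get(target, target)   # step 2: cross classification
        reduced[target] = reduced.get(target, 0) + n
    return reduced
```

For instance, using adult counts from Table 6 (all levels combined), `reduce_cause_list({"Uterine cancer": 3, "Other cancers": 142, "Diabetes with coma": 144, "Diabetes with renal failure": 156, "Diabetes with skin infection/sepsis": 114}, 15, {"Uterine cancer": "Other cancers"}, {...})` with the three diabetes causes mapped to a single hypothetical "Diabetes" label would return two analysis causes instead of five.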

Results

Table 5 shows that of the 12,542 deaths collected as gold standard cases for the study, the vast majority

Table 5 Numbers of VAs collected by site and gold standard level

| Site | Adult Level 1 | Adult Level 2 | Child Level 1 | Child Level 2 | Neonate Level 1 | Neonate Level 2 | Total |
|---|---|---|---|---|---|---|---|
| Andhra Pradesh | 1,285 | 269 | 385 | 66 | 376 | 1 | 2,382 |
| Bohol | 998 | 262 | 234 | 30 | 374 | 0 | 1,898 |
| Dar es Salaam | 1,556 | 162 | 366 | 106 | 1,047 | 2 | 3,239 |
| Mexico | 1,373 | 215 | 124 | 4 | 313 | 2 | 2,031 |
| Pemba Island | 266 | 31 | 156 | 105 | 261 | 3 | 822 |
| Uttar Pradesh | 1,277 | 142 | 412 | 87 | 251 | 1 | 2,170 |
| Total | 6,755 | 1,081 | 1,677 | 398 | 2,622 | 9 | 12,542 |
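As a quick arithmetic check, the level 1 shares quoted in the Results can be recomputed from the Total row of Table 5. The snippet below is only a worked calculation; the neonate columns appear to include stillbirths, since 2,622 + 9 equals the 1,629 neonatal deaths plus 1,002 stillbirths reported in the text.

```python
# Level 1 and level 2 totals copied from the "Total" row of Table 5.
TABLE5_TOTALS = {
    "adult": (6755, 1081),
    "child": (1677, 398),
    "neonate": (2622, 9),
}


def level1_share(level1, level2):
    """Percentage of cases that met level 1 criteria."""
    return 100.0 * level1 / (level1 + level2)


def overall_level1_share(totals):
    """Level 1 percentage pooled over all age groups."""
    level1 = sum(l1 for l1, _ in totals.values())
    level2 = sum(l2 for _, l2 in totals.values())
    return level1_share(level1, level2)


for group, (l1, l2) in TABLE5_TOTALS.items():
    print(f"{group}: {level1_share(l1, l2):.1f}% level 1")
print(f"overall: {overall_level1_share(TABLE5_TOTALS):.1f}% level 1")
```

The results (86.2% for adults, 80.8% for children, 99.7% for neonates, and 88.1% overall) agree with the rounded figures of 86%, 81%, 99.7%, and 88% given in the text.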

Discussion

PHMRC was able to obtain completed VA interviews for more than 12,000 deaths with GS assignment of true cause of death. Because of the poor quality of medical record-keeping and limitations of diagnostic technology in many hospitals, to identify more than 12,000 GS deaths required reviewing and screening a much larger number of records. While it was difficult in many sites


Table 6 Numbers of VAs collected by cause of death and gold standard level for adult causes

| Adult causes | Level 1 | Level 2A | Level 2B |
|---|---|---|---|
| AIDS | 345 | 0 | 8 |
| AIDS with TB | 148 | 0 | 0 |
| Acute myocardial infarction | 376 | 24 | 0 |
| Anemia | 68 | 0 | 0 |
| Asthma | 13 | 34 | 0 |
| Bite of venomous animal | 66 | 0 | 0 |
| Breast cancer | 179 | 3 | 12 |
| COPD | 170 | 1 | 0 |
| Cervical cancer | 127 | 23 | 5 |
| Cirrhosis | 82 | 231 | 0 |
| Colorectal cancer | 85 | 6 | 8 |
| Dementia | 1 | 0 | 0 |
| Diabetes with coma | 144 | 0 | 0 |
| Diabetes with renal failure | 156 | 0 | 0 |
| Diabetes with skin infection/sepsis | 114 | 0 | 0 |
| Diarrhea/dysentery | 221 | 7 | 0 |
| Drowning | 106 | 0 | 0 |
| Epilepsy | 47 | 1 | 0 |
| Esophageal cancer | 26 | 13 | 1 |
| Falls | 173 | 0 | 0 |
| Fires | 122 | 0 | 0 |
| Hemorrhage | 111 | 3 | 0 |
| Homicide | 167 | 0 | 0 |
| Hypertensive disorder | 107 | 6 | 0 |
| Congestive heart failure | 221 | 0 | 0 |
| Inflammatory heart disease | 42 | 0 | 0 |
| Leukemia | 71 | 2 | 5 |
| Liver cancer | 29 | 0 | 2 |
| Lung cancer | 66 | 36 | 4 |
| Lymphomas | 74 | 0 | 3 |
| Malaria | 89 | 11 | 0 |
| Mouth/oropharynx cancer | 22 | 0 | 0 |
| Obstructed labor | 17 | 1 | 0 |
| Other cancers | 142 | 0 | 0 |
| Other cardiovascular diseases | 153 | 0 | 0 |
| Other digestive diseases | 166 | 0 | 0 |
| Other infectious diseases | 258 | 0 | 0 |
| Other injuries | 103 | 0 | 0 |
| Other noncommunicable diseases | 200 | 0 | 0 |
| Other pregnancy-related deaths | 89 | 0 | 0 |
| Ovarian cancer | 32 | 1 | 0 |
| Pelvic inflammatory disease | 5 | 0 | 0 |
| Pneumonia | 310 | 229 | 0 |
| Poisonings | 86 | 0 | 0 |
| Prostate cancer | 40 | 8 | 0 |
| Renal failure | 411 | 2 | 0 |
| Road traffic | 202 | 0 | 0 |
| Sepsis | 24 | 46 | 0 |
| Stomach cancer | 50 | 10 | 2 |
| Stroke | 378 | 252 | 0 |
| Suicide | 124 | 0 | 0 |
| TB | 196 | 79 | 0 |
| Uterine cancer | 1 | 1 | 1 |

to obtain sufficient documentation for some causes of death, overall across all six sites we were able to find enough deaths for 46 adult causes, 22 child causes, and 12 neonate causes, excluding stillbirths, from the original cause list. The implementation of the project revealed just how poor the quality of medical records and diagnosis is in some institutions. This finding reaffirms our original hypothesis that convergent validity between verbal autopsy and poorly assigned hospital cause of death is not a measure of criterion validity. An important potential limitation of the study is the extent to which the cause of death based on fulfilling the clinical, laboratory, medical imaging, and tissue pathology criteria in this study is the true cause of death. Studies in high-resource settings [43] suggest that clinical diagnosis compared to postmortem autopsy may differ in up to 25% of cases. These studies, however, exaggerate the limitations of our study using clinical diagnostic criteria for three reasons. First, autopsies are much more likely to be undertaken in medico-legal cases or cases with uncertain clinical diagnosis. Shojania et al. found that once the inherent selection bias of postmortem autopsy is taken into account, clinical diagnosis and postmortem autopsy agree more than 90% of the time [44]. Second, these comparisons are for all clinical diagnoses, not for the subset that meets our clearly defined and stringent criteria. In general, less than one-third of hospital deaths in our study fulfilled our diagnostic criteria even in the most sophisticated hospitals. It is a reasonable assumption that the concordance between the clinical diagnosis and postmortem autopsy would be even higher in the subset meeting our criteria. Finally, the definition in these studies of major diagnostic discrepancy is for clinical purposes, not for the purposes of assigning underlying cause of death.
For the latter effort, some of the major discrepancies would not move deaths between cause of death categories used in this study. Some readers may object to the use of “gold standard” in describing our dataset. We believe, however, that we have implemented the best possible approach to


assigning causes of death. In nearly all settings, postmortem rates are low and subject to severe selection bias toward diagnostically challenging and nonrepresentative deaths for a cause. For both implementation and selection bias reasons, we do not foresee VA validation studies being undertaken using large samples of deaths with postmortem autopsies. Clearly defined clinical, laboratory, imaging, and tissue pathology criteria as used in this study are the best that can be implemented. As such, we believe the use of the term gold standard for this dataset is appropriate. A particularly vexing issue in VA validation studies is that by their nature they are conducted on deaths that have occurred in hospital. What would be the performance of VA for deaths in the community? There are potentially three distinct aspects to this question. First, the cause-composition of deaths in the hospital and the community will be different. Fortunately, because we create multiple test datasets with widely varying cause compositions, this issue will not influence the results from VA validation studies as long as the methods recommended by Murray et al. [15] are followed. Second, contact and experience with the health system could change the way in which household members recall certain symptoms or signs. If it does, then VA may capture more information in those cases with hospital experience than when implemented in a population with little or no experience of health care. Given that all validation studies require some diagnostic information on the course of illness prior to death, no validation study can ever investigate this question. This is an unfortunate reality; we believe that constructing a dataset, as we have done, that excludes all information from the household about medical experience prior to death

Table 7 Numbers of VAs collected by cause of death and gold standard level for child causes

| Child causes | Level 1 | Level 2A | Level 2B |
|---|---|---|---|
| AIDS | 19 | 0 | 0 |
| AIDS with TB | 1 | 0 | 0 |
| Bite of venomous animal | 54 | 0 | 0 |
| Diarrhea/dysentery | 255 | 1 | 0 |
| Drowning | 82 | 1 | 0 |
| Encephalitis | 41 | 0 | 0 |
| Falls | 49 | 0 | 0 |
| Fires | 68 | 0 | 0 |
| Hemorrhagic fever | 51 | 0 | 0 |
| Malaria | 59 | 58 | 0 |
| Measles | 0 | 23 | 0 |
| Meningitis | 58 | 0 | 0 |
| Other cancers | 28 | 0 | 0 |
| Other cardiovascular diseases | 76 | 0 | 0 |
| Other defined causes of child deaths | 182 | 0 | 0 |
| Other digestive diseases | 48 | 0 | 0 |
| Other infectious diseases | 60 | 0 | 0 |
| Other respiratory diseases | 12 | 0 | 0 |
| Pertussis | 0 | 0 | 0 |
| Pneumonia | 272 | 224 | 1 |
| Pneumonia and diarrhea | 35 | 3 | 0 |
| Poisonings | 18 | 0 | 0 |
| Road traffic | 92 | 0 | 0 |
| Sepsis (with local bacterial infection) | 22 | 15 | 0 |
| Sepsis (without local bacterial infection) | 39 | 67 | 0 |
| TB | 4 | 5 | 0 |
| Violent death | 52 | 0 | 0 |

Table 8 Numbers of VAs collected by cause of death and gold standard level for neonatal causes

| Neonate causes | Level 1 | Level 2A | Level 2B |
|---|---|---|---|
| Birth asphyxia | 461 | 0 | 0 |
| Congenital malformation | 250 | 0 | 0 |
| Meningitis (serious infection) | 6 | 0 | 0 |
| Pneumonia (serious infection) | 84 | 5 | 0 |
| Preterm delivery (<33 weeks gestational age [GA]) without respiratory distress syndrome (RDS) | 353 | 0 | 0 |
| Preterm delivery (with or without RDS) and sepsis | 75 | 1 | 0 |
| Preterm delivery (without RDS) and birth asphyxia | 89 | 0 | 0 |
| Preterm delivery (without RDS) and sepsis and birth asphyxia | 34 | 0 | 0 |
| Respiratory distress syndrome (33-36 weeks GA) | 13 | 0 | 0 |
| Respiratory distress syndrome (<33 weeks GA) | 97 | 0 | 0 |
| Sepsis (serious infection) | 127 | 1 | 0 |
| Sepsis with local bacterial infection | 32 | 1 | 0 |
| Stillbirth | 1,001 | 1 | 0 |
| Tetanus | 4 | 0 | 0 |


improve VA performance. Rather than free text, items could be included such as “Did anyone tell you or do you have any documentation mentioning acute myocardial infarction, MI, ischemic heart disease, or coronary heart disease?” These checklist items would be completed by the interviewer after questioning the respondent and examining the medical records and other documentation available. In this way, the task of reading free text and translating it through a dictionary would be simplified and focused only where it is likely to change the results.

is the closest we can come in a validation study to understanding how VA will perform in a poor, underserved community. While it is theoretically possible that household recall of symptoms and signs will be different if someone has experienced health care prior to death, there is in fact no direct evidence for this hypothesis, nor is it clear how it would be tested. Third, the clinical course and thus the signs and symptoms related to a cause of death may be influenced through contact with the health system. As with the second limitation, there is unfortunately no way to investigate this important issue. We simply have no way to figure out the true cause of death for deaths that have occurred in the community with no contact with health services. Ideally, all countries would have in place functioning vital registration systems that capture all deaths and include a medically certified cause of death according to the procedures and rules of the International Classification of Diseases in force at the time. While progress toward this goal is being made, it is painfully slow, and without greater government commitment, will not be a reality for most developing countries for decades to come [45,46]. To meet urgent policy and planning needs, countries will have no alternative but to introduce verbal autopsy, at least for deaths that occur outside hospitals. It is critically important that they have confidence in the VA methods they use, and that they understand the validation and performance characteristics of those methods. We believe that to do so, validity and comparative performance must be assessed against rigorous, standardized criteria that unambiguously identify the cause of death, and that are not influenced whatsoever by the quality, usually very poor, of medical records or the diagnostic biases of physicians who review them. Our study has compiled the first ever dataset of gold standard cause of death assignments across six sites in four countries. 
It is unlikely that a comparable dataset on VA with true gold standard cause of death ascertainment will be collected in the near future, if for no other reason than the substantial cost and time investment. For quite some time, therefore, the PHMRC will be the largest and most rigorously collected VA validation set. We intend to make the dataset publicly available in the hope that it will serve as a resource for the broader VA scientific community interested in developing and testing new methods. For this reason, we plan to release to the public an anonymized version of the dataset once the primary set of analyses from the investigators have been published. One lesson learned from the complexity of converting free text into dichotomous variables is that future VA instruments may want to incorporate a series of checklist questions based on the free text variables that

Conclusion We have described the development and usefulness of the largest, perhaps only dataset with gold standard cause of death assignment and matching verbal autopsies for more than 12,000 deaths in four countries. We expect that this will facilitate further development of verbal autopsy and perhaps other cause of death measurement approaches in countries with poor vital registration and certification practices. The utility of this dataset will undoubtedly improve if additional cases, in different populations, and for different diseases than those reported here, are added in future studies, provided the same protocols and standards are applied. In this way, confidence in the utility of verbal autopsy methods will increase and result in their wider application in countries to reduce ignorance about the comparative importance of leading causes of death.

Additional material
Additional file 1: Differences between the standardized WHO instrument and the PHMRC instrument.
Additional file 2: General module of the full verbal autopsy instrument used in the field in the PHMRC study.
Additional file 3: Adult module of the full verbal autopsy instrument used in the field in the PHMRC study.
Additional file 4: Child and neonate module of the full verbal autopsy instrument used in the field in the PHMRC study.
Additional file 5: Target cause list reduced to analysis cause list.
Additional file 6: Gold standard (GS) definitions in the PHMRC study for adults.
Additional file 7: Gold standard (GS) definitions in the PHMRC study for children/neonates.
Additional file 8: Medical data extraction form (MDEF) used to extract gold standard data for the PHMRC study.
Additional file 9: Duration cutoffs used when making variables dichotomous.
Additional file 10: Conversion of polytomous symptoms into dichotomous symptoms.
Additional file 11: Low-content polytomous items including response frequencies.
Additional file 12: README dictionary developed to convert free text responses into usable key terms.



Murray et al. Population Health Metrics 2011, 9:27 http://www.pophealthmetrics.com/content/9/1/27


Additional file 13: Final analysis cause list and numbers of deaths by site.


Abbreviations CSMF: cause-specific mortality fractions; GS: gold standard; HCE: health care experience; MAD: median absolute deviation; MDEF: medical data extraction form; PCVA: physician-certified verbal autopsy; PHMRC: Population Health Metrics Research Consortium; VA: verbal autopsy; WHO: World Health Organization


Acknowledgements The authors are grateful for the many respondents, local field staff, facilities, and physicians who have contributed to this research. In addition, the authors thank Charles Atkinson for managing the verbal autopsy database and performing quality control analysis, and Alireza Vahdatpour, Spencer L James, Michael K Freeman, and Ben Campbell for their intellectual contributions to the research. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.


Author details 1Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA. 2University of Queensland, School of Population Health, Brisbane, Australia. 3Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA. 4Community Empowerment Lab, Shivgarh, India, and The INCLEN Trust International, New Delhi, India. 5Public Health Laboratory-IdC, Pemba, Tanzania. 6Public Health Foundation of India, New Delhi, India. 7Brigham and Women’s Hospital, Boston, MA, USA. 8CSM Medical University, Lucknow, India. 9Harvard University, School of Public Health, Boston, MA, USA. 10National Institute of Public Health, Cuernavaca, Mexico. 11The George Institute for Global Health, Camperdown, Australia. 12Research Institute for Tropical Medicine, Manila, Philippines. 13Cornell University, Division of Nutritional Sciences, Ithaca, NY, USA. 14The George Institute for Global Health, India, Hyderabad, India. 15Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania.


Authors’ contributions CJLM, ADL, and RB conceptualized and organized the study. CJLM, ADL, RL, and SLO drafted the manuscript. DP, RJ, and BN directed the data collection at the Andhra Pradesh site; IR, ML, DS, VT, and HR directed the data collection at the Bohol site; ZP, ED, WF, SM, and MS directed the data collection at the Dar es Salaam site; BH, SG, RL, DRV, and MR directed the data collection at the Mexico site; SS, SMA, UD, and AD directed the data collection at the Pemba site; and VK, RA, VD, AK, and RP directed the data collection at the Uttar Pradesh site. AB, LD, ADF, HK, RL, and IR conceptualized analytic strategies and development of methods. SLO organized and managed collaboration, data collection, and analytics. All authors have read and approved the final manuscript.


Competing interests The authors declare that they have no competing interests.


Received: 15 April 2011 Accepted: 4 August 2011 Published: 4 August 2011 References 1. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245. 2. Quigley MA, Chandramohan D, Rodrigues LC: Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. Int J Epidemiol 1999, 28:1081-1087. 3. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson LJ, Alberti KGMM, Lopez AD: Validity of verbal autopsy procedures for


determining cause of death in Tanzania. Trop Med Int Health 2006, 11:681-696. 4. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32. 5. Boulle A, Chandramohan D, Weller P: A case study of using artificial neural networks for classifying cause of death from verbal autopsy. Int J Epidemiol 2001, 30:515-520. 6. Reeves B, Quigley M: A review of data-derived methods for assigning causes of death from verbal autopsy data. Int J Epidemiol 1997, 26:1080-1089. 7. Quigley MA, Chandramohan D, Setel P, Binka F, Rodrigues LC: Validity of data-derived algorithms for ascertaining causes of adult death in two African sites using verbal autopsy. Trop Med Int Health 2000, 5:33-39. 8. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37. 9. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31. 10. Fantahun M, Fottrell E, Berhane Y, Wall S, Högberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84:204-210. 11. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Popul Health Metr 2010, 8:21. 12. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4:e327.
13. Lozano R, Freeman MK, James SL, Campbell B, Lopez AD, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:50. 14. King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statistical Science 2008, 23:78-91. 15. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28. 16. Flaxman AD, Vahdatpour A, James SL, Birnbaum JK, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:35. 17. Murray CJL, James SL, Birnbaum JK, Freeman MK, Lozano R, Lopez AD, the Population Health Metrics Research Consortium (PHMRC): Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:30. 18. James SL, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 2011, 9:31. 19. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29. 20. Lopman BA, Barnabas RV, Boerma JT, Chawira G, Gaitskell K, Harrop T, Mason P, Donnelly CA, Garnett GP, Nyamukapa C, Gregson S: Creating and Validating an Algorithm to Measure AIDS Mortality in the Adult Population using Verbal Autopsy. PLoS Med 2006, 3:e312.
21. Lopman B, Cook A, Smith J, Chawira G, Urassa M, Kumogola Y, Isingo R, Ihekweazu C, Ruwende J, Ndege M, Gregson S, Zaba B, Boerma T: Verbal autopsy can consistently measure AIDS mortality: a validation study in Tanzania and Zimbabwe. Journal of Epidemiology and Community Health 2010, 64:330-334. 22. Polprasert W, Rao C, Adair T, Pattaraarchachai J, Porapakkham Y, Lopez A: Cause-of-death ascertainment for deaths that occur outside hospitals in Thailand: application of verbal autopsy methods. Population Health Metrics 2010, 8:13.



23. Yang G, Rao C, Ma J, Wang L, Wan X, Dubrovsky G, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. Int J Epidemiol 2006, 35:741-748. 24. Tensou B, Araya T, Telake DS, Byass P, Berhane Y, Kebebew T, Sanders EJ, Reniers G: Evaluating the InterVA model for determining AIDS mortality from verbal autopsies in the adult population of Addis Ababa. Trop Med Int Health 2010, 15:547-553. 25. Marsh DR, Sadruddin S, Fikree FF, Krishnan C, Darmstadt GL: Validation of verbal autopsy to determine the cause of 137 neonatal deaths in Karachi, Pakistan. Paediatr Perinat Epidemiol 2003, 17:132-142. 26. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998, 3:436-446. 27. Aggarwal AK, Jain V, Kumar R: Validity of verbal autopsy for ascertaining the causes of stillbirth. Bull World Health Organ 2011, 89:31-40. 28. Freeman JV, Christian P, Khatry SK, Adhikari RK, LeClerq SC, Katz J, Darmstadt GL: Evaluation of neonatal verbal autopsy using physician review versus algorithm-based cause-of-death assignment in rural Nepal. Paediatr Perinat Epidemiol 2005, 19:323-331. 29. Kahn K, Tollman SM, Garenne M, Gear JS: Validation and application of verbal autopsies in a rural area of South Africa. Trop Med Int Health 2000, 5:824-831. 30. Khademi H, Etemadi A, Kamangar F, Nouraie M, Shakeri R, Abaie B, Pourshams A, Bagheri M, Hooshyar A, Islami F, Abnet CC, Pharoah P, Brennan P, Boffetta P, Dawsey SM, Malekzadeh R: Verbal autopsy: reliability and validity estimates for causes of death in the Golestan Cohort Study in Iran. PLoS ONE 2010, 5:e11183. 31. Kumar R, Thakur JS, Rao BT, Singh MMC, Bhatia SPS: Validity of verbal autopsy in determining causes of adult deaths. Indian J Public Health 2006, 50:90-94. 32. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. 
Int J Epidemiol 1994, 23:213-222. 33. Snow RW, Armstrong JR, Forster D, Winstanley MT, Marsh VM, Newton CR, Waruiru C, Mwangi I, Winstanley PA, Marsh K: Childhood deaths in Africa: uses and limitations of verbal autopsies. Lancet 1992, 340:351-355. 34. Snow B, Marsh K: How useful are verbal autopsies to estimate childhood causes of death? Health Policy and Planning 1992, 7:22-29. 35. Cohen L, Manion L, Morrison K, Morrison KRB: Research methods in education New York: Psychology Press; 2007. 36. Basic Sociodemographic Indicators | National Population Council (CONAPO) | National Institute of Statistics and Geography (INEGI). [http://www.conapo.gob.mx/]. 37. Verbal autopsy standards: Ascertaining and attributing causes of death | WHO. [http://www.who.int/whosis/mort/verbalautopsystandards/en/index.html]. 38. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol 1994, 23:213-222. 39. Anker M, Black RE, Coldham C, Kalter HD, Quigley MA, Ross D, Snow RW: A standard verbal autopsy method for investigating causes of death in infants and children Geneva: World Health Organization; 1999. 40. Joshi R, Cardona M, Iyengar S, Sukumar A, Raju CR, Raju KR, Raju K, Reddy KS, Lopez AD, Neal B: Chronic diseases now a leading cause of death in rural India–mortality data from the Andhra Pradesh Rural Health Initiative. Int J Epidemiol 2006, 35:1522-1529. 41. Yang G, Hu J, Rao KQ, Ma J, Rao C, Lopez AD: Mortality registration and surveillance in China: History, current situation and challenges. Popul Health Metr 2005, 3:3. 42. Hopkins D: A method of automated nonparametric content analysis for social science. American Journal of Political Science 2010, 54:229-247. 43. Shojania KG, Burton EC, McDonald KM, Goldman L: The autopsy as an outcome and performance measure. Evid Rep Technol Assess (Summ) Rockville: Agency for Healthcare Research and Quality (US); 2002, 1-5. 44.
Shojania KG, Burton EC, McDonald KM, Goldman L: Changes in rates of autopsy-detected diagnostic errors over time: a systematic review. JAMA 2003, 289:2849-2856. 45. Mahapatra P, Shibuya K, Lopez AD, Coullare F, Notzon FC, Rao C, Szreter S: Civil registration systems and vital statistics: successes and missed opportunities. Lancet 2007, 370(599):1653-1663.

46. Hill K, Lopez AD, Shibuya K, Jha P, AbouZahr C, Anderson RN, Bawah AA, Betrán AP, Binka F, Bundhamcharoen K, Castro R, Cleland J, Coullare F, Evans T, Carrasco Figueroa X, George CK, Gollogly L, Gonzalez R, Grzebien DR, Huang Z, Hull TH, Inoue M, Jakob R, Jiang Y, Laurenti R, Li X, Lievesley D, Fat DM, Macfarlane S, Mahapatra P, Merialdi M, Mikkelsen L, Nien JK, Notzon FC, Rao C, Rao K, Sankoh O, Setel PW, Soleman N, Stout S, Szreter S, Tangcharoensathien V, van der Maas PJ, Wu F, Yang G, Zhang S, Zhou M: Interim measures for meeting needs for health sector data: births, deaths, and causes of death. Lancet 2007, 370:1726-1735. doi:10.1186/1478-7954-9-27 Cite this article as: Murray et al.: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Population Health Metrics 2011 9:27.



Murray et al. Population Health Metrics 2011, 9:28 http://www.pophealthmetrics.com/content/9/1/28

RESEARCH

Open Access

Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies

Christopher JL Murray1*, Rafael Lozano1, Abraham D Flaxman1, Alireza Vahdatpour1 and Alan D Lopez2

Abstract Background: Verbal autopsy (VA) is an important method for obtaining cause of death information in settings without vital registration and medical certification of causes of death. An array of methods, including physician review and computer-automated methods, have been proposed and used. Choosing the best method for VA requires the appropriate metrics for assessing performance. Currently used metrics such as sensitivity, specificity, and cause-specific mortality fraction (CSMF) errors do not provide a robust basis for comparison. Methods: We use simple simulations of populations with three causes of death to demonstrate that most metrics used in VA validation studies are extremely sensitive to the CSMF composition of the test dataset. Simulations also demonstrate that an inferior method can appear to have better performance than an alternative due strictly to the CSMF composition of the test set. Results: VA methods need to be evaluated across a set of test datasets with widely varying CSMF compositions. We propose two metrics for assessing the performance of a proposed VA method. For assessing how well a method does at individual cause of death assignment, we recommend the average chance-corrected concordance across causes. This metric is insensitive to the CSMF composition of the test sets and corrects for the degree to which a method will get the cause correct due strictly to chance. For the evaluation of CSMF estimation, we propose CSMF accuracy. CSMF accuracy is defined as one minus the sum of all absolute CSMF errors across causes divided by the maximum total error. It is scaled from zero to one and can generalize a method’s CSMF estimation capability regardless of the number of causes. Performance of a VA method for CSMF estimation by cause can be assessed by examining the relationship across test datasets between the estimated CSMF and the true CSMF. 
Conclusions: With an increasing range of VA methods available, it will be critical to objectively assess their performance in assigning cause of death. Chance-corrected concordance and CSMF accuracy assessed across a large number of test datasets with widely varying CSMF composition provide a robust strategy for this assessment. Keywords: Verbal autopsy, metrics, validation
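As a minimal sketch of the two proposed metrics, the Python below computes CSMF accuracy and a chance-corrected concordance for a single cause. The abstract does not spell out the maximum total error or the form of the chance correction, so both are labeled assumptions here: the maximum total error is taken to be 2(1 - minimum true CSMF), and chance agreement is taken to be 1/N for N causes. The function names are illustrative only.

```python
import numpy as np


def csmf_accuracy(true_csmf, est_csmf):
    """One minus total absolute CSMF error divided by the maximum total error.

    Assumption: the maximum total error is 2 * (1 - min(true_csmf)), the
    error obtained when every death is assigned to the rarest true cause.
    """
    true_csmf = np.asarray(true_csmf, dtype=float)
    est_csmf = np.asarray(est_csmf, dtype=float)
    total_abs_error = np.abs(true_csmf - est_csmf).sum()
    max_error = 2.0 * (1.0 - true_csmf.min())
    return 1.0 - total_abs_error / max_error


def chance_corrected_concordance(sensitivity, n_causes):
    """Rescale sensitivity so that random assignment scores zero.

    Assumption: chance agreement for one cause among n_causes is 1 / n_causes.
    """
    chance = 1.0 / n_causes
    return (sensitivity - chance) / (1.0 - chance)
```

Under these assumptions, a sensitivity of 0.70 among three causes gives a chance-corrected concordance of 0.55, which agrees with the value Table 2 reports for cause A under method 1.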

* Correspondence: cjlm@uw.edu 1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA Full list of author information is available at the end of the article

© 2011 Murray et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background Verbal autopsy (VA) has been in use in various field studies, surveillance sites, and national systems for more than four decades [1-4]. The instruments and analytical tools used to assign cause of death are rapidly evolving. New automated methods [4-7] have been proposed and are in use alongside traditional physician-certified verbal autopsy (PCVA). With new Bayesian statistical methods and machine learning approaches being developed, we can expect a wide range of new methods and refinements of existing methods in the coming years. It will become increasingly important for users of VA instruments and analytical tools to compare the performance of all the options in a balanced, objective fashion.

Large, but we argue inadequate, validation datasets in which VA is compared to medical records have been collected and reported in the literature for China and Thailand [8,9]. The multisite Population Health Metrics Research Consortium has collected a very large validation dataset for neonates, children, and adults in Mexico, Tanzania, India, and the Philippines. These studies, as opposed to all previous efforts, provide the opportunity to compare VA results to gold standard cause of death assignment based on strict clinical diagnostic criteria [10]. All of these datasets provide rich empirical opportunities to assess the validity of existing and proposed VA methods. Robust comparison of performance requires standardization of the metrics used to assess the validity of VA and respect for some basic principles for the validation of empirically-derived approaches. Many metrics, including cause-specific sensitivity, specificity, concordance, absolute error in cause-specific mortality fractions (CSMFs), relative error in CSMFs, and Cohen’s kappa have been reported in the literature [2,8,9,11-22]. The purpose of this paper is to identify and discuss the key issues that must be addressed to choose a set of metrics for VA validation studies and make recommendations based on this assessment for future reporting.

A wide array of different types of VA methods has been proposed. We can classify the various methods into four groups, based on the nature of the task that they attempt to perform: 1) individual death cause assignment to a single cause, which includes PCVA and variants of Symptom Pattern, Tariff, and machine learning [2,9,21,23-27]; 2) individual death cause assignment to multiple causes with probabilities across causes for each death summing to 100%; 3) direct estimation of CSMFs without assigning causes to individual deaths; and 4) combined methods that use both direct estimation of CSMFs and individual cause of death assignment so that the sum of the individual cause of death assignments equals the CSMFs from direct estimation. Proposed metrics need to be useful for comparing the performance of methods across this entire spectrum. Further, the metrics and validation study design need to be able to help identify methods that are likely to perform better than others in many diverse settings with varying population CSMFs and cause lists.

Published studies on the validity of verbal autopsy have used a wide variety of measures, many of them coming from the literature on the evaluation of diagnostic tests. Authors have generally reported measures of the performance of a VA method for assigning causes to individual deaths such as sensitivity, specificity, concordance, and more recently, kappa [8,9,11,12,14,16-20]. In addition, they have used measures to assess how well a VA method estimates CSMFs, including the sum of the absolute values of CSMF errors, average CSMF error, and relative error in CSMFs [2,8,9,11,12,14-17,21,22]. There are many other measures proposed in the literature on nominal association such as phi, contingency coefficient, adjusted contingency coefficient, Tschuprow’s T, Cramer’s V, and Matthews correlation coefficient [28-32]. When applied to the comparison of true cause and predicted cause, these measures capture in a single quantity how often the true cause is predicted correctly as a complex function of misclassification of the true negatives. In VA, however, different uses, such as a research study or monitoring population health, imply different priorities on correct individual cause assignment or accurate CSMF prediction. For this reason, we do not believe that the measures of nominal association that produce a single measure reflecting both will be useful. We focus in this paper on separate measures of individual cause assignment and CSMF accuracy following the general VA tradition. This approach is also required because some of the proposed VA methods, such as the method of King and Lu [33], do not predict individual causes of death, only the CSMFs directly. In other words, metrics that require the full N by N matrix of true and predicted cause to be complete cannot be applied to some VA methods.

Methods Many metrics are a function of the CSMF composition of a test dataset

We use a simple hypothetical case of a VA method to demonstrate why some currently-reported metrics may be difficult to interpret in a robust fashion. This illustration uses a hypothetical case of a population with three causes of death: A, B, and C. Imagine a VA method (by which we mean the combination of the instrument and the analytical tool applied to generate cause of death assignments), method 1, that produces a predicted cause for each death. Table 1 shows the probability that for a given true cause, method 1 will assign the death to one of the three possible causes. We can consider the matrix of these probabilities as the fundamental attribute of a VA assignment method. Given the matrix of these probabilities and the CSMF composition of a test dataset, we can easily compute the standard array of metrics, including sensitivity, specificity, concordance, absolute error in CSMFs, and relative error in the CSMFs.

Table 1 The hypothetical method 1 shows the probability of assigning a death from a true cause to each of the three possible causes; the hypothetical method 2 differs only in the higher probability of assigning deaths from cause A to cause A.

Method 1
              Estimated A   Estimated B   Estimated C
True A        0.70          0.03          0.27
True B        0.04          0.60          0.36
True C        0.065         0.585         0.35

Method 2
              Estimated A   Estimated B   Estimated C
True A        0.80          0.02          0.18
True B        0.04          0.60          0.36
True C        0.065         0.585         0.35

We have created 500 test datasets by randomly varying the cause composition of the test set (using random draws from an uninformative Dirichlet distribution). We use the Dirichlet distribution because it creates an even distribution across all possible combinations of causes that sum to 100%. By holding constant the probabilities of classification as a function of each true cause as shown in Table 1, we have quantified the range of each metric due purely to changes in the test set cause composition. Table 2 shows the mean, median, maximum, and minimum values of each metric across the randomly-varied cause compositions. Because we are holding constant the probability of correct and incorrect classification of each true cause, sensitivity for each cause in these simulations does not vary. But specificity for each cause, kappa, overall concordance, summed absolute CSMF error, and relative CSMF error vary widely. The ranges are large enough that one cannot meaningfully compare results of a method from one test dataset with results for another method in a different test dataset. We have demonstrated using a simple case

Table 2 Range of values for selected cause-specific and overall metrics of individual cause assignment and CSMF estimation for two different hypothetical VA assignment methods across 500 test datasets where the cause composition of the test datasets has been randomly varied.

                                      Method 1                          Method 2
                                      Mean    Median  Max     Min       Mean    Median  Max     Min
Cause A
  Sensitivity                         0.70    0.70    0.70    0.70      0.80    0.80    0.80    0.80
  Specificity                         0.95    0.95    0.96    0.94      0.95    0.95    0.96    0.94
  Absolute CSMF error                 0.08    0.06    0.29    0.00      0.05    0.04    0.19    0.00
  Relative CSMF error                 0.74    0.24    53.38   0.00      0.71    0.15    53.48   0.00
  Chance-corrected concordance        0.55    0.55    0.55    0.55      0.70    0.70    0.70    0.70
  Estimated versus true regression
    Intercept                         0.52                              0.52
    Slope                             0.64                              0.74
    RMSE                              0.00                              0.00
Cause B
  Sensitivity                         0.60    0.60    0.60    0.60      0.60    0.60    0.60    0.60
  Specificity                         0.69    0.69    0.97    0.42      0.70    0.70    0.98    0.42
  Absolute CSMF error                 0.17    0.15    0.57    0.00      0.17    0.15    0.57    0.00
  Relative CSMF error                 4.50    0.37    229.07  0.00      4.43    0.37    228.56  0.00
  Chance-corrected concordance        0.40    0.40    0.40    0.40      0.40    0.40    0.40    0.40
  Estimated versus true regression
    Intercept                         0.30                              0.30
    Slope                             0.29                              0.30
    RMSE                              0.11                              0.11
Cause C
  Sensitivity                         0.35    0.35    0.35    0.35      0.35    0.35    0.35    0.35
  Specificity                         0.69    0.69    0.73    0.64      0.73    0.73    0.82    0.64
  Absolute CSMF error                 0.20    0.19    0.63    0.00      0.19    0.17    0.63    0.00
  Relative CSMF error                 6.75    0.50    793.85  0.00      6.01    0.49    780.54  0.00
  Chance-corrected concordance        0.03    0.03    0.03    0.02      0.03    0.03    0.03    0.02
  Estimated versus true regression
    Intercept                         0.31                              0.26
    Slope                             0.03                              0.08
    RMSE                              0.01                              0.03
Overall causes
  Kappa                               0.26    0.28    0.47    0.00      0.30    0.33    0.53    0.00
  Total absolute CSMF error           0.46    0.45    1.26    0.01      0.42    0.37    1.26    0.03
  CSMF accuracy                       0.75    0.75    1.00    0.37      0.77    0.80    0.98    0.36


Murray et al. Population Health Metrics 2011, 9:28 http://www.pophealthmetrics.com/content/9/1/28

Page 4 of 11

how VA method performance can be affected by CSMF composition of the test set in principle; in multiple applications of this approach to different real VA methods [25-27,34-36] we have also found that this theoretical result holds true. Figure 1 compares a measure of performance for assigning cause to individual deaths, kappa, with the total absolute error in the CSMFs. This comparison highlights that a method’s ability to assign individual causes is not closely related to how well it can estimate CSMFs. The reason is simple: even when sensitivities for the three causes are low and therefore kappa is low, false positives can be balanced by true negatives for each cause. When false positives and true negatives are exactly balanced, there will be no error in the estimated CSMFs. However, these simulations highlight that this can occur because of the particular and, quite possibly, idiosyncratic CSMF composition of the test dataset.

Even though results of all standard metrics except sensitivity are strongly affected by the CSMF composition of the test dataset, are comparisons of two VA methods made on one test dataset with one particular CSMF composition still robust? We can adapt this simple three-cause simulation environment to explore this question. Table 1 shows the probabilities of assigning each true cause to the three predicted causes for a second VA method, method 2. This method is superior to method 1. For true causes B and C it assigns the deaths in exactly the same proportions as method 1, but for cause A, sensitivity is higher in method 2, and the relative pattern of misclassification is the same. Using the same 500 test datasets with widely varying CSMF compositions, Table 3 counts the number of times that method 1 or 2 has better performance for absolute CSMF error by cause. In fact, 32%, 36%, and 49% of the time for cause A, cause B, and cause C, respectively, the inferior method (method 1) reports smaller absolute CSMF error. This simple finding illustrates how it could be extremely misleading to draw conclusions about the performance of one method compared to another on the basis of only one test dataset.

In any real comparison of alternative VA methods with longer cause lists, it is highly likely that for some causes, sensitivities will be higher and for others, lower. The pattern of misclassification is also likely to vary substantially. In these more complicated cases, conclusions about which method performs better cannot be drawn from one test dataset; performance needs to be carefully assessed for a diverse range of cause compositions in a series of test datasets.

These three-cause cases also point out that the performance of individual cause assignment in predicting the true cause correctly is quite distinct from how well a VA method does at predicting the true CSMFs. Clearly, when sensitivities for each cause equal 100% for all causes, the CSMFs will be correctly predicted. But for all realistic cases of VA where sensitivities will be far below 100%, we need to quantify the performance of a VA method both at assigning individual causes correctly and for predicting CSMFs accurately. We explore metrics for individual cause assignment in more detail. The key issues examined include correcting for chance, dealing with the cause composition of the test dataset, and partial cause assignment metrics. In the following section, we discuss measures of CSMF accuracy, including the choice between measures of absolute and relative error, adjusting for the number of causes, comparison to random assignment and taking into account cause composition of the test set.
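The three-cause comparison of method 1 and method 2 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two assignment matrices below are hypothetical stand-ins (Table 1 is not reproduced here), the helper names `dirichlet_uniform` and `expected_csmf` are invented for the example, and predicted CSMFs are computed as expected values rather than by resampling individual deaths.

```python
import random

# Hypothetical assignment probabilities (rows: true cause A, B, C;
# columns: predicted cause A, B, C). Illustrative only, not Table 1.
METHOD_1 = [[0.50, 0.30, 0.20],
            [0.20, 0.60, 0.20],
            [0.25, 0.15, 0.60]]
# Method 2 differs only for true cause A: higher sensitivity, with the
# misclassified deaths split between B and C in the same 3:2 ratio.
METHOD_2 = [[0.60, 0.24, 0.16],
            [0.20, 0.60, 0.20],
            [0.25, 0.15, 0.60]]


def dirichlet_uniform(n, rng):
    """One draw from the uninformative Dirichlet(1, ..., 1) over n causes."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def expected_csmf(true_csmf, matrix):
    """Expected predicted CSMFs: sum over true causes i of CSMF_i * p_ij."""
    n = len(true_csmf)
    return [sum(true_csmf[i] * matrix[i][j] for i in range(n))
            for j in range(n)]


rng = random.Random(0)
N_DATASETS = 500
wins_method_1 = [0, 0, 0]  # times method 1 has the smaller error, by cause
for _ in range(N_DATASETS):
    true = dirichlet_uniform(3, rng)
    pred_1 = expected_csmf(true, METHOD_1)
    pred_2 = expected_csmf(true, METHOD_2)
    for j in range(3):
        if abs(pred_1[j] - true[j]) < abs(pred_2[j] - true[j]):
            wins_method_1[j] += 1
wins_method_2 = [N_DATASETS - w for w in wins_method_1]

print("method 1 better (A, B, C):", wins_method_1)
print("method 2 better (A, B, C):", wins_method_2)
```

With these assumed matrices, method 1 has the smaller error for cause A whenever the true fraction for A is small, which is why a comparison on a single test dataset can favor the weaker method.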

Murray et al. Population Health Metrics 2011, 9:28 http://www.pophealthmetrics.com/content/9/1/28

Figure 1 Kappa versus total absolute CSMF error for method 1 for 500 iterations of experiment with varying true CSMFs. This graph shows why kappa should not be used as a metric for CSMF accuracy. [Plot not reproduced; axes: kappa and total absolute CSMF error.]

Table 3 The number of times method 1 or 2 has better performance for the absolute CSMF error in 500 randomly-generated test datasets with varying CSMF composition.

Cause   Method   Times with smaller absolute CSMF error
A       1        160
A       2        340
B       1        181
B       2        319
C       1        247
C       2        253

Results

Metrics for individual cause assignment

The performance assessment of a method that operates at the individual level has two components: the fraction

of true deaths from a cause that are correctly assigned to that cause and the balance between true negatives (true deaths from that cause assigned to other causes) and false positives (deaths from other causes assigned to that cause). The balance between true negatives and false positives only matters as it affects the estimates of the CSMF. Given that we will recommend separate metrics for the accuracy of CSMF prediction, the only aspect of individual cause assignment that matters is whether the true cause is correctly predicted. In Table 1, these are the deaths in the diagonal cells of the matrix compared to the total number of deaths in each row. In the literature on diagnostic tests, the number of deaths in the diagonal cell divided by the total of the row is defined as the sensitivity for a given cause. The generalized version for multiple causes has been referred to as concordance [21,37,38]. As a measure of agreement for a cause, neither sensitivity nor concordance takes into account agreement expected by chance alone. If we had a VA algorithm that randomly assigned deaths to each cause, we would expect it to have a concordance of (1/n), where n is the number of causes, as long as there are large numbers for each cause. In other words, if there are five causes of death and we randomly assign deaths to each of the five causes, we would be right 20% of the time. The general concept of correcting for concordance based on chance can be represented as:

$$K_j = \frac{P(\text{observed})_j - P(\text{expected})_j}{1 - P(\text{expected})_j}$$

Where P(observed)j is the fraction that are correctly assigned for a cause j and P(expected)j is the fraction correctly assigned on the basis of chance alone. There are two choices that affect the exact formulation of this class of measures: whether to compute an overall measure of chance-corrected association and/or a cause-specific measure of chance-corrected association and how to estimate the association expected on the basis of chance alone. There are at least two methods for estimating the P(expected).

1. Cohen’s kappa calculated P(expected) as:

$$P(\text{expected}) = \sum_{k=1}^{n} \left( \sum_{i=1}^{n} p_{ik} \times \sum_{i=1}^{n} p_{ki} \right)$$

Where pij is the probability of assigning a death of cause i to cause j. In addition, P(observed) is calculated as:

$$P(\text{observed}) = \sum_{k=1}^{n} p_{kk}$$

Note that since P(expected) and P(observed) are defined over all causes, Cohen’s kappa is an overall-causes measure of chance-corrected association.

2. Cohen’s kappa assumes that the chance prediction is informed by the true test set cause composition. A more naïve assumption, perhaps more appropriate for VA validation studies, is that the method is uninformed about the true test composition, and chance assignment would simply be equal assignment to all causes. An alternative method for estimating P(expected) is to assume it is simply (1/n), where n is the number of causes.

Cohen’s kappa has been reported in the VA literature, but it is not the most attractive approach to correcting for chance in VA applications. As shown in Table 2, Cohen’s kappa is quite sensitive to the cause composition of the test dataset, while option two above is not at all sensitive to this cause composition. Furthermore, Cohen’s kappa provides a measure of association across all causes and not a cause-specific measure of concordance, although logically this approach to correcting for chance could be applied at the cause level. Based on simplicity and the robustness to the CSMF composition of the test dataset, we propose to measure chance-corrected concordance for cause j (CCCj) as:

$$CCC_j = \frac{\dfrac{TP_j}{TP_j + TN_j} - \dfrac{1}{N}}{1 - \dfrac{1}{N}}$$

Where TP is true positives, TN is true negatives, and N is the number of causes. TP plus TN equals the true number of deaths from cause j. Reporting this measure enhances the comparability across studies with different numbers of causes. When there are only a small number of causes, the chance-corrected concordance will be substantially lower than sensitivity. When a VA algorithm gets less than (1/n) fraction of the deaths correct for a cause, it will have a chance-corrected concordance that is negative. In all other cases, the chance-corrected concordance will range from 0 to 1.

In addition to reporting the chance-corrected concordance for each cause, we will also be concerned with how well a VA method performs overall at individual cause assignment for most applications of VA. This



summary judgment requires a summary metric for VA individual cause assignment for a given test dataset of the form:

$$\text{Overall CCC} = \sum_{j=1}^{k} w_j \, CCC_j \quad \text{and} \quad \sum_{j=1}^{k} w_j = 1$$

The question is how to choose the set of weights across causes to yield an overall summary for a given test dataset. There are three logical options available: the CSMFs in the test dataset, a standardized distribution of CSMFs such as the global cause of death distribution, and equal weights. Using the test set CSMFs appears to be undesirable, as the results across VA validation studies would not be comparable. If there is a positive or negative correlation between the chance-corrected concordances by cause and the CSMFs in the test set, the overall chance-corrected concordance will vary substantially. The second option, using weights equal to the global cause of death distribution as currently known, is appealing. The problem, however, is that in many validation studies, not all causes present in the global distribution are included. This can be handled as long as the validation study includes categories for other causes. But in a validation study on three or four specific causes with residual causes grouped under “other causes,” the chance-corrected concordance for “other causes” would dominate the results if these were standardized to the global cause of death distribution. An alternative would be to rescale the cause fractions in the global distribution for each study such that the sum of the weights on the included causes equals one. But this would remove some of the appeal of using the global CSMFs as weights. The third option, in which the weights on each cause are equal for all causes included in the study, is the easiest to implement and the most comparable. Based on considerations of simplicity of explanation, ease of implementation, and comparability, we recommend the overall chance-corrected concordance be calculated as the average of the cause-specific chance-corrected concordances, namely equal weights, in the above equation.

Even when the overall chance-corrected concordance is calculated as the average of the cause-specific chance-corrected concordances, the CSMF composition of the test set may influence the result. Some more complex VA analytical methods may not have constant probabilities of assignment to causes conditional on the true cause of death. In other words, it is possible that concordance for a cause may vary as a function of the test dataset CSMFs. To avoid making the wrong inference on a method’s performance, we recommend that a set of 100 or more test datasets be created with varying CSMF compositions using sampling with replacement of the test deaths by cause. Draws should be taken from an uninformative Dirichlet distribution to capture the range of possible CSMF compositions and sampling with replacement used to generate a range of test datasets. For each test dataset, the overall chance-corrected concordance should be estimated and the median value of these results should be reported as the single summary measure of individual cause assignment.

Some VA methods proposed or under development assign probabilities to more than one cause for each death [33,37]. These probabilities are assigned such that they sum to one for each death. There is literature on a range of measures for these types of cases [39,40]. These take into account the probability attached to the correct cause, not just its presence in the top k causes. For simplicity and ease of communication, we can compute a partial death assignment concordance as the fraction of deaths for which the true cause is included in the top k causes, ranked by their predicted probability. For example, a method could predict for a particular death that it is 50% tuberculosis, 20% pneumonia, 10% lung cancer, 10% AIDS, 5% heart failure, and 5% other infectious diseases. We can compute the fraction of the time that the true cause is the top cause (tuberculosis), the top two causes (tuberculosis or pneumonia), the top three causes, and so on. By definition, as the number of causes that are considered for calculating concordance (top two, top three, top four, etc.) increases, the calculated concordance must increase or at least remain equal. As for single cause concordance, we should correct the partial cause concordance for how much better the VA method is than random assignment. The formula for the partial concordance from random assignment takes into account the combinatorics of cases where the same cause is selected at random more than once and simplifies to:

$$PC(k) = \frac{k}{N}$$

Where PC(k) is the partial concordance due to random assignment for the top k causes, and N is the number of causes in the study. The partial chance-corrected concordance for the top k causes, PCCC(k), becomes:

$$PCCC(k) = \frac{C - \dfrac{k}{N}}{1 - \dfrac{k}{N}}$$

Where C is the fraction of deaths where the true cause is in the top k causes assigned to that death. As k

increases, it is not necessary that PCCC(k) increases. In fact, at the limit where k equals N, the PC(k) will equal 1.0, and the PCCC(k) will not be defined. By computing the PCCC(k), we facilitate comparisons across studies with different numbers of causes and perhaps different choices of k. As for individual cause assignment, median PCCC(k) across 100 or more test datasets in which the CSMFs have been sampled from an uninformative Dirichlet distribution should be reported.

CSMF accuracy

When true negatives for a cause do not equal the false positives estimated for that same cause, the predicted CSMF will be too large or too small. A key choice in the design of metrics for CSMF accuracy is whether we are interested in absolute or relative errors in the CSMF. If the true CSMF for a cause is 15% and we predict 16%, this is an error of one percentage point. If, for another cause, the true CSMF is 1% and we predict 2%, the error is also one percentage point. Should we be equally concerned about both of these one percentage point errors? Or is a doubling of the second cause from 1% to 2% a worse error than the 6.7% overestimation of the cause fraction for the first cause? This is the classic problem that has been discussed in several fields: whether we care about absolute or relative errors [41,42]. The answer is strictly a normative choice; as such, our answer must depend on how we intend to use VA results and what the consequences are of making various types of errors.

What are the potential effects of misclassification when true negatives do not equal false positives on population health or well-being? If the size of the burden of a problem influences the allocation of resources to programs or research or changes the allocation of managerial or political attention, then inaccurate CSMFs could affect health or well-being. In this sense, is the harm from inaccurate CSMFs related to absolute or relative errors? Financial resources will have less health impact if we move resources away from cost-effective intervention areas to less cost-effective areas. Such harm would be related to the absolute error in the CSMF, not the relative error. Imagine a case where we underestimate the CSMF by 100 deaths for a cause of death with a highly cost-effective intervention strategy available. Because we have underestimated the magnitude of the cause, fewer resources are allocated to the program dealing with this cause, and resources are moved to address a health problem that has been overestimated but for which the intervention strategy is less cost-effective. The misallocation of resources translates in this hypothetical case into 10 fewer lives being saved. The reduction in the number of lives saved is a negative consequence that can be traced to the misestimation of the CSMFs. Resources scale to the absolute size of the problem (and cost effectiveness of interventions). In this example, which can be confirmed in an optimization model, the negative consequence scales to the absolute error in cause estimation, not the relative error. In the absence of a detailed understanding of which causes have more or less cost-effective intervention strategies and how over- or underestimation will lead to misallocation of resources, it appears prudent to treat all deaths misclassified where true negatives and false positives are not in balance as equally problematic. In other words, we should be concerned with absolute errors in the CSMFs, not relative errors. Given that negative consequences can come from underestimation or overestimation, we should, in fact, be interested in the absolute value of absolute errors in the CSMFs across each cause.

For a summary metric across all causes, we could report the average of the absolute value of the CSMF error. Absolute errors in the CSMFs will tend to be smaller the larger the number of causes in the cause list. For any given cause list, the maximum possible average or total error would occur when we estimate 100% of all deaths due to the cause with the smallest true cause fraction. For any given number of causes, the total of the absolute value of the CSMF errors across causes will always be

$$\text{CSMF Maximum Error} = 2\left(1 - \text{Minimum}\left(CSMF_j^{true}\right)\right)$$

The average of the absolute value of the errors is this quantity divided by N, where N is the number of causes. This convenient result means that we can compute the performance of any VA method compared to the worst possible method. This comparison is then independent of the number of causes in the cause list. Therefore, we define CSMF accuracy as:

$$\text{CSMF Accuracy} = 1 - \frac{\sum_{j=1}^{k} \left| CSMF_j^{true} - CSMF_j^{pred} \right|}{2\left(1 - \text{Minimum}\left(CSMF_j^{true}\right)\right)}$$
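As a minimal sketch of the definition above, CSMF accuracy can be computed directly from two lists of cause fractions. The function name `csmf_accuracy` and the example fractions are assumptions made for illustration; they do not come from the study data.

```python
def csmf_accuracy(true_csmf, pred_csmf):
    """CSMF accuracy: 1 minus total absolute error over its maximum.

    Both arguments are sequences of cause fractions that each sum to one.
    The maximum total error, 2 * (1 - min(true_csmf)), arises when every
    death is assigned to the cause with the smallest true fraction.
    Undefined (division by zero) if a single cause has a true CSMF of 1.
    """
    total_abs_error = sum(abs(t - p) for t, p in zip(true_csmf, pred_csmf))
    max_error = 2.0 * (1.0 - min(true_csmf))
    return 1.0 - total_abs_error / max_error


true_fractions = [0.5, 0.3, 0.2]                           # hypothetical
perfect = csmf_accuracy(true_fractions, [0.5, 0.3, 0.2])   # no error
worst = csmf_accuracy(true_fractions, [0.0, 0.0, 1.0])     # all to smallest
typical = csmf_accuracy(true_fractions, [0.4, 0.35, 0.25])
print(perfect, worst, round(typical, 3))
```

In this example the perfect prediction scores 1, assigning every death to the smallest cause scores 0, and the intermediate prediction, with a total absolute error of 0.2 against a maximum of 1.6, scores approximately 0.875.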

CSMF accuracy will always range from zero to one, where a value of one means no error in the predicted CSMFs and a value of zero means the method is equivalent to the worst possible method of assigning cause fractions.

Cause composition of the test set can matter because chance assignment does better or worse depending on the test set. Perhaps more important are two other reasons that CSMF composition can influence the results. First, as shown in Table 2, even when the percentage distribution of a true cause is constant across predicted causes - for example, for true cause A, 50% are assigned to A, 30% to B, and 20% to C - variation in true CSMFs


changes the CSMF average absolute error dramatically. Second, for some of the more complex VA methods, the probability of the predicted cause conditional on the true cause will also vary as a function of the cause composition of the test set. Since the purpose of VA validation studies is to identify which method will work in a variety of population epidemiological conditions, reporting CSMF error or CSMF accuracy for one test set would risk drawing an incorrect inference on relative performance.

Given that the CSMF composition of the test set can have multiple influences, to generate robust conclusions about the performance of one VA method compared to another, the cause composition of the test set should be varied using resampling methods. We can use draws from an uninformative Dirichlet distribution to evenly sample all possible cause compositions that sum to one. The Dirichlet distribution can be used because we can generate widely varying cause compositions of the test dataset that sum to 100% for any number of causes. Further, the expected value for each cause of the uninformative Dirichlet is equal cause fractions, but for any given draw from the distribution there is a wide range of cause fractions. For each sample from the cause composition, we can sample the test data with replacement to generate a new matching dataset with an alternative cause composition. After generating predictions for each alternative test dataset using a proposed VA method, we can compute CSMF accuracy. A summary metric would be the median CSMF accuracy across the draws. The median value will be the preferred metric in this case because CSMF accuracy can take on extreme values for some cause compositions. Repeated draws from the uninformative Dirichlet distribution should be continued until the median value of CSMF accuracy stabilizes. Graphing the median value as a function of the number of draws can provide a visual indication of at what point CSMF accuracy changes little with further sampling. The number of draws depends on the tolerance for changes in the median. A reasonable tolerance is that further draws do not alter the median value by more than 0.5%.

Many users of verbal autopsy will also be interested in the robustness of CSMF estimation for specific causes. CSMF performance can be assessed by examining the relationship between the estimated CSMF for a cause and the true CSMF for a cause. Because several hundred test datasets have been created by sampling from an uninformative Dirichlet distribution and then sampling with replacement from the test data, it is possible to examine the relationship between estimated CSMF and true CSMF cause by cause. Figure 2 illustrates the relationship between estimated and true CSMFs using the hypothetical VA method 1 across the 500 test datasets for causes A, B, and C. There are three important aspects that relate to CSMF performance that can be best understood in terms of the relationship between the estimated CSMF and the true CSMF:

$$CSMF_{estimated} = \alpha + \beta \cdot CSMF_{true} + \varepsilon$$

The intercept in the relationship between estimated CSMF and true CSMF, α, is an indication of how much a method tends to assign deaths to a cause even when there are no deaths from that cause in the test dataset. Some methods tend towards assigning an equal share of deaths to each cause. These methods will tend to have large nonzero intercepts that approach in the extreme (1/n), where n is the number of causes. The slope of the relationship, β, indicates by how much the estimated CSMF increases for each one percentage point increase in the true CSMF. Because some or many causes have nonzero intercepts, the slopes for almost all causes for almost all methods will be below 1. In other words, most methods will tend to overestimate small causes and underestimate large causes. The slopes, however, will be highly variable. Finally, the error term in the relationship between estimated and true CSMF provides an indication of how much an estimated cause fraction varies given a particular value of the true cause fraction. Using Ordinary Least Squares regression, the values for α, β, and the standard deviation of the error term (root mean squared error [RMSE]) can be estimated and reported by cause. These three values provide an easily-interpreted assessment of the performance of a VA method at estimating the CSMF for a given cause.
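The cause-by-cause regression described above can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: the assignment matrix `ASSIGN` is hypothetical, each test dataset is simulated directly from a Dirichlet draw instead of being resampled from real test deaths, and the RMSE here is the square root of the mean squared residual with no degrees-of-freedom adjustment.

```python
import math
import random

# Hypothetical assignment probabilities p_ij (rows: true cause, columns:
# predicted cause); the first row echoes the 50/30/20 example in the text.
ASSIGN = [[0.50, 0.30, 0.20],
          [0.20, 0.60, 0.20],
          [0.25, 0.15, 0.60]]
N_CAUSES, N_DATASETS, N_DEATHS = 3, 500, 1000
rng = random.Random(1)


def simulate_dataset():
    """Return (true CSMFs, estimated CSMFs) for one simulated test dataset."""
    # Uninformative Dirichlet draw via normalised Gamma(1, 1) variates.
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(N_CAUSES)]
    target = [g / sum(gammas) for g in gammas]
    true_causes = rng.choices(range(N_CAUSES), weights=target, k=N_DEATHS)
    true_counts = [true_causes.count(c) for c in range(N_CAUSES)]
    pred_counts = [0] * N_CAUSES
    for cause, count in enumerate(true_counts):
        # Assign each death of this true cause to a predicted cause.
        for predicted in rng.choices(range(N_CAUSES),
                                     weights=ASSIGN[cause], k=count):
            pred_counts[predicted] += 1
    return ([c / N_DEATHS for c in true_counts],
            [c / N_DEATHS for c in pred_counts])


def ols(x, y):
    """Closed-form simple regression: returns (alpha, beta, rmse)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ssr = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, math.sqrt(ssr / n)


datasets = [simulate_dataset() for _ in range(N_DATASETS)]
fits = {}
for j, name in enumerate("ABC"):
    true_j = [true[j] for true, _ in datasets]
    est_j = [est[j] for _, est in datasets]
    fits[name] = ols(true_j, est_j)
    print(name, "alpha=%.3f beta=%.3f rmse=%.3f" % fits[name])
```

Under these assumptions every cause has a positive intercept and a slope below 1, which is the pattern the text describes: small causes are overestimated and large causes underestimated.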

Discussion Our explication of performance metrics for VA leads to the following conclusions. First, for VA methods that assign individual causes to deaths, chance-corrected concordance should be reported for each cause, and the average chance-corrected concordance should be used as a summary measure of individual cause assignment. Second, for VA methods that assign multiple causes to deaths, the partial chance-corrected concordance for the top k causes should be reported for each cause, and the average partial chance-corrected concordance for the top k causes should be used as a summary measure. Third, for all VA methods, median CSMF accuracy computed for a set of test datasets with different CSMF composition drawn from an uninformative Dirichlet distribution should be reported. Because some readers of VA validation studies may not want a single summary measure of performance for assigning individual causes of death or a single summary of CSMF estimation, it will be important to make

available the full N by N classification matrix comparing true to assigned cause for all the test datasets. While for most readers this detail will be hard to interpret, it is an important aspect of transparency for validation studies to have this information available at least on demand.

For methods that are based on empirical patterns in the data, such as machine learning, Symptom Pattern, Tariff, direct CSMF estimation, or combined methods, great care needs to be taken to ensure that the data used to test the validity of the proposed method are not used for developing or “training” the method. These methods are extremely effective at identifying patterns in the data and can easily overfit the data. Strict separation of the test and training data is a critical aspect of any validation study. To avoid chance results from a particular train-test split in the data, validation studies for empirical methods should use multiple train-test splits and report the distribution of values for chance-corrected concordance and median CSMF accuracy. It is also essential to ensure that the CSMF composition of the test datasets is selected at random and is not the same as the CSMF composition of the training datasets.
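The combination of repeated train-test splits with a randomly drawn test composition can be sketched as below. The record layout (dictionaries with `id` and `cause` keys), the function names, the split fraction, and the dataset sizes are all assumptions made for this illustration; a real validation study would substitute its own gold standard records and fit the VA method on each training split.

```python
import random
from collections import defaultdict


def train_test_split(deaths, test_fraction, rng):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(deaths)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def resample_test_set(test_deaths, size, rng):
    """Resample test deaths with replacement to a Dirichlet-drawn CSMF."""
    by_cause = defaultdict(list)
    for death in test_deaths:
        by_cause[death["cause"]].append(death)
    causes = sorted(by_cause)
    # Uninformative Dirichlet draw via normalised Gamma(1, 1) variates.
    gammas = [rng.gammavariate(1.0, 1.0) for _ in causes]
    target = [g / sum(gammas) for g in gammas]
    drawn_causes = rng.choices(causes, weights=target, k=size)
    resampled = [rng.choice(by_cause[c]) for c in drawn_causes]
    return resampled, dict(zip(causes, target))


rng = random.Random(2)
# Hypothetical gold standard records: 300 deaths spread over causes A, B, C.
deaths = [{"id": i, "cause": "ABC"[i % 3]} for i in range(300)]
splits = []
for _ in range(5):
    train, test = train_test_split(deaths, 0.25, rng)
    resampled, target = resample_test_set(test, 200, rng)
    splits.append((train, test, resampled, target))
    print({c: round(f, 2) for c, f in target.items()})
```

Because resampling draws only from the held-out test records, the training split stays untouched, and the test composition is set by the Dirichlet draw rather than by the composition of the training data.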

Figure 2 Estimated CSMF versus true CSMF for causes A, B, and C using method 1 for 500 iterations of experiment with varying true CSMFs. [Plots not reproduced; three panels, A, B, and C, each showing estimated cause fraction (%) against true cause fraction (%).]

To simplify computational needs, the steps of generating different train-test splits and varying the CSMF composition of the test data through resampling can be combined.

Several published studies [43,44] have used Cohen’s kappa as a measure of how accurately CSMFs are predicted by the method. In fact, Cohen’s kappa is a summary measure of how well individual causes of death are assigned. CSMF errors of near zero are possible with kappa values that are less than 0.1. Cohen’s kappa is an alternative to average chance-corrected concordance; it is not a measure of CSMF estimation error. Cohen’s kappa, however, will be influenced by the composition of the test set, as illustrated in Table 2, while average chance-corrected concordance is not affected by the test set cause composition.
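The contrast between Cohen’s kappa and average chance-corrected concordance can be illustrated with a short calculation. This sketch assumes a method whose assignment probabilities are constant given the true cause; the matrix `ASSIGN` and the two test compositions are hypothetical, and both measures are computed from expected proportions rather than from sampled deaths.

```python
# Hypothetical probabilities of predicted cause (columns) given true
# cause (rows); sensitivities on the diagonal are 0.5, 0.6, and 0.6.
ASSIGN = [[0.50, 0.30, 0.20],
          [0.20, 0.60, 0.20],
          [0.25, 0.15, 0.60]]


def cohen_kappa(csmf, assign):
    """Cohen's kappa from the joint matrix p_ij = CSMF_i * P(j | i)."""
    n = len(csmf)
    joint = [[csmf[i] * assign[i][j] for j in range(n)] for i in range(n)]
    p_observed = sum(joint[k][k] for k in range(n))
    p_expected = sum(
        sum(joint[i][k] for i in range(n))      # column total for cause k
        * sum(joint[k][i] for i in range(n))    # row total for cause k
        for k in range(n))
    return (p_observed - p_expected) / (1.0 - p_expected)


def mean_ccc(assign):
    """Equal-weight average of CCC_j = (sensitivity_j - 1/N) / (1 - 1/N)."""
    n = len(assign)
    cccs = [(assign[j][j] - 1.0 / n) / (1.0 - 1.0 / n) for j in range(n)]
    return sum(cccs) / n


kappa_mostly_a = cohen_kappa([0.8, 0.1, 0.1], ASSIGN)
kappa_mostly_c = cohen_kappa([0.1, 0.1, 0.8], ASSIGN)
print(round(kappa_mostly_a, 3), round(kappa_mostly_c, 3))  # about 0.184, 0.235
print(round(mean_ccc(ASSIGN), 3))                          # about 0.35
```

The same hypothetical method yields two different kappa values depending only on the test composition, while the average chance-corrected concordance depends on the sensitivities alone and so is identical for both test sets.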

Acknowledgements The authors would like to thank Spencer L James, Michael K Freeman, Benjamin Campbell, and Charles Atkinson for intellectual contributions and Roger Ying and Allyne M Delossantos for conducting a literature review. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Author details 1Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA. 2University of Queensland, School of Population Health, Brisbane, Australia.

Authors’ contributions CJLM, RL, and ADL conceptualized the study and guided analyses. ADF and AV performed analyses and helped write the manuscript. CJLM drafted the manuscript and approved the final version. CJLM accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Conclusion Even if other measures are reported in addition to those recommended here, inclusion of this standard set of metrics will facilitate comparison across different studies with likely different numbers of causes and different CSMF compositions. The metrics reported here will also encourage an explicit recognition of the potential tradeoffs for some methods between individual cause assignment and CSMF accuracy. Different users are likely to attach different importance to these dimensions; making standardized measurements of both dimensions available for all VA methods will facilitate choosing among the different options. These two standard metrics also reflect the principal information needs of the main users of cause of death data, namely population-level monitoring of leading causes of death (policy) and risk attribution in epidemiological enquiries (research). We expect that standardized metrics will facilitate further methods innovation in the future by providing a clear answer if a new method is leading to improved performance either in the dimension of individual cause assignment or CSMF accuracy. Future validation studies of verbal autopsy methods will also have greater credibility, not only if the appropriate metrics are used, but also if great care is taken in establishing true gold standard cause of death assignment. In the absence of rigorous gold standards, reporting chance-corrected concordance and CSMF accuracy will remain only measures of similarity between two imperfect assessments of cause of death. Robust validation studies require the right metrics as well as the appropriate study design.

Competing interests The authors declare that they have no competing interests. Received: 14 April 2011 Accepted: 4 August 2011 Published: 4 August 2011 References 1. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245. 2. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson LJ, Alberti KGMM, Lopez AD: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Trop Med Int Health 2006, 11:681-696. 3. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol 1994, 23:213-222. 4. Huong DL, Minh HV, Byass P: Applying verbal autopsy to determine cause of death in rural Vietnam. Scand J Public Health Suppl 2003, 62:19-25. 5. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37. 6. Fantahun M, Fottrell E, Berhane Y, Wall S, Högberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84:204-210. 7. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol 1994, 23:213-222. 8. Polprasert W, Rao C, Adair T, Pattaraarchachai J, Porapakkham Y, Lopez A: Cause-of-death ascertainment for deaths that occur outside hospitals in Thailand: application of verbal autopsy methods. Population Health Metrics 2010, 8:13. 9. Yang G, Rao C, Ma J, Wang L, Wan X, Dubrovsky G, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. Int J Epidemiol 2006, 35:741-748. 10. 
Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gomez S, Hernandez B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, RamírezVillalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27. 11. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Popul Health Metr 2010, 8:21.

Abbreviations CSMF: cause-specific mortality fraction; PCCC: partial chance-corrected concordance; PCVA: physician-certified verbal autopsy; RMSE: root mean squared error; VA: verbal autopsy


12. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7:e1000325.
13. Chandramohan D, Setel P, Quigley M: Effect of misclassification of causes of death in verbal autopsy: can it be adjusted? Int J Epidemiol 2001, 30:509-514.
14. Freeman JV, Christian P, Khatry SK, Adhikari RK, LeClerq SC, Katz J, Darmstadt GL: Evaluation of neonatal verbal autopsy using physician review versus algorithm-based cause-of-death assignment in rural Nepal. Paediatr Perinat Epidemiol 2005, 19:323-331.
15. Gajalakshmi V, Peto R: Verbal autopsy of 80,000 adult deaths in Tamilnadu, South India. BMC Public Health 2004, 4:47.
16. Khademi H, Etemadi A, Kamangar F, Nouraie M, Shakeri R, Abaie B, Pourshams A, Bagheri M, Hooshyar A, Islami F, Abnet CC, Pharoah P, Brennan P, Boffetta P, Dawsey SM, Malekzadeh R: Verbal Autopsy: Reliability and Validity Estimates for Causes of Death in the Golestan Cohort Study in Iran. PLoS ONE 2010, 5:e11183.
17. Kumar R, Thakur JS, Rao BT, Singh MMC, Bhatia SPS: Validity of verbal autopsy in determining causes of adult deaths. Indian J Public Health 2006, 50:90-94.
18. Lopman BA, Barnabas RV, Boerma JT, Chawira G, Gaitskell K, Harrop T, Mason P, Donnelly CA, Garnett GP, Nyamukapa C, Gregson S: Creating and Validating an Algorithm to Measure AIDS Mortality in the Adult Population using Verbal Autopsy. PLoS Med 2006, 3:e312.
19. Lopman B, Cook A, Smith J, Chawira G, Urassa M, Kumogola Y, Isingo R, Ihekweazu C, Ruwende J, Ndege M, Gregson S, Zaba B, Boerma T: Verbal autopsy can consistently measure AIDS mortality: a validation study in Tanzania and Zimbabwe. Journal of Epidemiology and Community Health 2010, 64:330-334.
20. Maude GH, Ross DA: The effect of different sensitivity, specificity and cause-specific mortality fractions on the estimation of differences in cause-specific mortality rates in children from studies using verbal autopsies. Int J Epidemiol 1997, 26:1097-1106.
21. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4:e327.
22. Quigley MA, Chandramohan D, Rodrigues LC: Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. Int J Epidemiol 1999, 28:1081-1087.
23. Boulle A, Chandramohan D, Weller P: A case study of using artificial neural networks for classifying cause of death from verbal autopsy. Int J Epidemiol 2001, 30:515-520.
24. Reeves B, Quigley M: A review of data-derived methods for assigning causes of death from verbal autopsy data. Int J Epidemiol 1997, 26:1080-1089.
25. James SL, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 2011, 9:31.
26. Murray CJL, James SL, Birnbaum JK, Freeman MK, Lozano R, Lopez AD, the Population Health Metrics Research Consortium (PHMRC): Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:30.
27. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29.
28. Agresti A: An Introduction to Categorical Data Analysis. 1 edition. New York, NY: Wiley-Interscience; 1996.
29. Goodman LA, Kruskal WH: Measures of Association for Cross Classifications. Journal of the American Statistical Association 1954, 49:732-764.
30. Liebetrau AM: Measures of association. Newbury Park, CA: SAGE; 1983.
31. Rosenberg M: Logic of Survey Analysis. 9 edition. New York, NY: Basic Books; 1968.
32. Baldi P, Brunak S, Chauvin Y, Andersen CAF, Nielsen H: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16:412-424.
33. King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statistical Science 2008, 23:78-91.
34. Flaxman AD, Vahdatpour A, James SL, Birnbaum JK, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:35.
35. Lozano R, Freeman MK, James SL, Campbell B, Lopez AD, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:50.
36. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.
37. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31.
38. Snow B, Marsh K: How useful are verbal autopsies to estimate childhood causes of death? Health Policy and Planning 1992, 7:22-29.
39. Gneiting T, Raftery AE: Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 2007, 102:359-378.
40. Savage LJ: Elicitation of Personal Probabilities and Expectations. Journal of the American Statistical Association 1971, 66:783-801.
41. Liao H: Medical Imaging and Augmented Reality: 5th International Workshop, MIAR 2010, Beijing, China, September 19-20, 2010, Proceedings. Volume 6326 of Lecture Notes in Computer Science, Springer 2010.
42. Wang G, Jiang M: Axiomatic characterization of nonlinear homomorphic means. Journal of Mathematical Analysis and Applications 2005, 303:350-363.
43. Krishnan A, Kumar R, Nongkynrih B, Misra P, Srivastava R, Kapoor SK: Adult mortality surveillance by routine health workers using a short verbal autopsy tool in rural north India. Journal of Epidemiology and Community Health 2011.
44. Joshi R, Lopez AD, MacMahon S, Reddy S, Dandona R, Dandona L, Neal B: Verbal autopsy coding: are multiple coders better than one? Bull World Health Organ 2009, 87:51-57.

doi:10.1186/1478-7954-9-28
Cite this article as: Murray et al.: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Population Health Metrics 2011 9:28.



Flaxman et al. Population Health Metrics 2011, 9:29 http://www.pophealthmetrics.com/content/9/1/29

RESEARCH

Open Access

Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards

Abraham D Flaxman1*, Alireza Vahdatpour1, Sean Green2, Spencer L James1 and Christopher JL Murray1 for the Population Health Metrics Research Consortium (PHMRC)

Abstract

Background: Computer-coded verbal autopsy (CCVA) is a promising alternative to the standard approach of physician-certified verbal autopsy (PCVA), because of its high speed, low cost, and reliability. This study introduces a new CCVA technique and validates its performance using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 verbal autopsies (VAs).

Methods: The Random Forest (RF) Method from machine learning (ML) was adapted to predict cause of death by training random forests to distinguish between each pair of causes, and then combining the results through a novel ranking technique. We assessed quality of the new method at the individual level using chance-corrected concordance and at the population level using cause-specific mortality fraction (CSMF) accuracy as well as linear regression. We also compared the quality of RF to PCVA for all of these metrics. We performed this analysis separately for adult, child, and neonatal VAs. We also assessed the variation in performance with and without household recall of health care experience (HCE).

Results: For all metrics, for all settings, RF was as good as or better than PCVA, with the exception of a nonsignificantly lower CSMF accuracy for neonates with HCE information. With HCE, the chance-corrected concordance of RF was 3.4 percentage points higher for adults, 3.2 percentage points higher for children, and 1.6 percentage points higher for neonates. The CSMF accuracy was 0.097 higher for adults, 0.097 higher for children, and 0.007 lower for neonates. Without HCE, the chance-corrected concordance of RF was 8.1 percentage points higher than PCVA for adults, 10.2 percentage points higher for children, and 5.9 percentage points higher for neonates. The CSMF accuracy was higher for RF by 0.102 for adults, 0.131 for children, and 0.025 for neonates.

Conclusions: We found that our RF Method outperformed the PCVA method in terms of chance-corrected concordance and CSMF accuracy for adult and child VA with and without HCE and for neonatal VA without HCE. It is also preferable to PCVA in terms of time and cost. Therefore, we recommend it as the technique of choice for analyzing past and current verbal autopsies.

Keywords: Verbal autopsy, cause of death certification, validation, machine learning, random forests

* Correspondence: abie@uw.edu
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA
Full list of author information is available at the end of the article

© 2011 Flaxman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Verbal autopsy (VA) is a technique for measuring the cause-specific mortality burden for deaths that occur outside of hospitals. In VA, a trained interviewer collects detailed information on signs and symptoms of illness from laypeople familiar with the deceased. These interviews are analyzed by experts or by computer to estimate 1) the cause of death for each individual and 2) the distribution of causes of death in a population. This information can then be used by policy developers, donors, governments, or decision-makers to choose wisely in developing, requesting, and allocating health resources. For VA to provide useful information to individuals or to society, it is essential that the results of these interviews be mapped to the underlying cause of death accurately and quickly. Physician-certified verbal autopsy (PCVA) is currently the most common approach to mapping VA interviews to underlying cause of death, but this approach is expensive and time-consuming [1].

Machine learning (ML) methods are computer algorithms that infer patterns from examples [2]. In a classification task like VA analysis, an ML method processes a set of examples (“training data”) that has gold standard classifications, and develops a model to classify additional data. Developing and refining ML methods is a vibrant area of research in computer science, and numerous new methods have been introduced over the past 50 years. One influential ML method, the artificial neural network (ANN), was applied to VA 10 years ago [3]. This approach was deemed potentially useful, pending further evaluation. By casting VA analysis as an application of general ML methods, incremental advances in ML techniques can be directly applied to improve the accuracy of VA analysis.

The Random Forest (RF) is an exciting innovation in ML technology [4]. The RF has been used extensively in many domains for classification tasks, and is consistently one of the top approaches [5]. Examples of using ML techniques in various domains include gene selection and classification of microarray data [6], modeling structural activity of pharmaceutical molecules [7], and protein interaction prediction [8]. For this study, we developed an application of the RF Method to VA analysis and compared the performance of RF to PCVA.

Figure 1 Expert algorithm and RF decision trees. A right branch from a node represents “yes” and a left branch represents “no.” a) Decision tree representation of expert algorithm to identify malaria deaths in child VAs (one-versus-all approach); b) Two random decision trees generated by RF to distinguish AIDS deaths from maternal sepsis deaths (one-versus-one approach).

[Figure not reproduced. Panel a node questions: Fever? Convulsions? Stiff Neck? Bulging Fontanelle? Measles? (leaves: Malaria, Other). Panel b node questions: Did decedent have AIDS? Excessive bleeding after delivery or abortion? Pregnant at the time of death? Die within 6 weeks of childbirth? Die within 6 weeks after having an abortion? Did decedent have TB? (leaves: AIDS, Maternal Sepsis).]

Methods

An overview of random forests

Our RF Method for VA analysis seems complicated at first, but is actually a combination of several simple ideas. The first of these is the “decision tree,” a structure for representing a complex logical function concisely as branching decisions [9]. The decision trees in Breiman’s Random Forest method are generated by a randomized algorithm from bootstrap-resampled training data, but the resulting trees are somewhat analogous to the expert algorithms used in early approaches to automatic VA analysis. In Figure 1, Panel a shows a decision-tree representation of an expert algorithm for deciding if a child death was due to malaria or other causes [10], while Panel b depicts decision trees generated as part of the random forest for distinguishing maternal sepsis deaths from AIDS deaths. In each, the decision between two possibilities is made by starting from the top level, and progressing to the next level following the branch to the right if the symptom at the current level was endorsed and to the left otherwise. For example, the expert algorithm in Figure 1a will only predict that the cause was malaria if the respondent said that the decedent had fever and convulsions and no stiff neck, no bulging fontanelle, and no measles.

Unlike expert algorithms, however, the decision trees in Breiman’s Random Forest are generated automatically from labeled examples (the training dataset), without guidance from human experts. Instead, a random resampling of the training dataset is generated by drawing examples with replacement from the training dataset, and then a decision tree is constructed sequentially from this, starting from the root. At each node, the algorithm selects a random subset of signs and symptoms to consider branching on, and then branches on the one that best distinguishes between the labels for examples relevant to that node, halting when all relevant examples have the same label. Because of the randomness in this process, running the approach repeatedly on the same training dataset yields different trees, and two such trees are depicted in Figure 1b.

Breiman’s original formulation of RF proposed generating hundreds or thousands of decision trees this way, and then using them for prediction by calculating the prediction of each tree and taking a vote between their predictions. However, because the cause list in verbal autopsy is long, we followed the “pairwise coupling” approach developed by Hastie [11]. We considered every pair of causes on the cause list, and generated 100 decision trees to distinguish between each pair. This resulted in a table of random forests, depicted schematically in Figure 2. The size of the forest was thus a function of the length of the cause list; for example, for the child VA module, the 21 causes produced a random forest of $\binom{21}{2} \times 100 = 21{,}000$ trees.

To aggregate the predictions of all of these trees, we tallied cause-specific scores by counting the number of trees that predicted each cause. We then normalized the score for each cause using a novel ranking procedure. The complete process of mapping from scores through ranks to predictions is demonstrated in Figure 3, where, for example, Test C is predicted to be caused by Cause 1, which is not the highest scored cause for this example, but is the highest ranked cause. The full process is as follows: the Test Score Matrix is converted to a Test Rank Matrix on an entry-by-entry basis, by finding the rank of each entry among the corresponding column in the Train Score Matrix. For example, Test A, Cause 3 has score 20, which is the second-highest score when compared with the Cause 3 column of the Train Score Matrix, so it has a rank of 2 in the Test Rank Matrix. Although Cause 1 and Cause 2 are ranked similarly for Test A (ranks 4 and 3), the procedure predicts that Test A was caused by Cause 3 because this is the highest ranked cause for Test A. This is a nonparametric form of whitening, which makes the scores for different causes directly comparable. This approach has a natural generalization to predicting multiple causes for a single death, where the second-highest ranked cause is predicted as the second most likely, etc.
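The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the scores are those shown in Figure 3 (the Train A row is only partly legible in the figure, so its values are an assumption), and the tie rule, in which a training score equal to the test score counts as outranking it, is an assumption chosen because it agrees with the ranks printed in the figure.

```python
from math import comb

# Scores as read from Figure 3. The Train A row is partly illegible in the
# extracted figure, so its values (10, 9, 25) are an assumption.
TRAIN_SCORES = [
    [10, 9, 25],   # Train A (assumed)
    [11, 3, 19],   # Train B
    [14, 5, 5],    # Train C
    [4, 12, 13],   # Train D
    [3, 2, 18],    # Train E
    [7, 13, 6],    # Train F
]
TEST_SCORES = [
    [10, 10, 20],  # Test A
    [5, 16, 18],   # Test B
    [13, 10, 14],  # Test C
    [12, 9, 13],   # Test D
]


def pairwise_tree_count(n_causes, trees_per_pair):
    """Total number of trees when one forest is grown per pair of causes."""
    return comb(n_causes, 2) * trees_per_pair


def rank_against_training(test_scores, train_scores):
    """Convert each test score to its rank within the matching training column.

    Rank 1 is best. A training score equal to the test score is counted as
    outranking it (an assumed tie rule).
    """
    n_causes = len(train_scores[0])
    columns = [[row[j] for row in train_scores] for j in range(n_causes)]
    return [
        [1 + sum(t >= s for t in columns[j]) for j, s in enumerate(row)]
        for row in test_scores
    ]


def predict_max_rank(rank_row):
    """Return the 1-based index of the cause with the best (smallest) rank."""
    return 1 + min(range(len(rank_row)), key=lambda j: rank_row[j])


def predict_max_score(score_row):
    """Return the 1-based index of the cause with the largest raw score."""
    return 1 + max(range(len(score_row)), key=lambda j: score_row[j])


TEST_RANKS = rank_against_training(TEST_SCORES, TRAIN_SCORES)
RANK_PREDICTIONS = [predict_max_rank(row) for row in TEST_RANKS]
SCORE_PREDICTIONS = [predict_max_score(row) for row in TEST_SCORES]
```

With these inputs the max-score rule assigns Cause 3 to all four test deaths, whereas the max-rank rule assigns Causes 3, 2, 1, and 1 to Tests A through D, which agrees with the Test A and Test C examples in the text.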

Figure 2 Schematic representation of RF.

[Figure not reproduced. Axis labels recovered from the figure: Cause 1 to Cause 5 on one axis and Cause 2 to Cause 6 on the other.]

Train Score Matrix

| | Cause 1 | Cause 2 | Cause 3 |
|---|---|---|---|
| Train A | 10 | 9 | 25 |
| Train B | 11 | 3 | 19 |
| Train C | 14 | 5 | 5 |
| Train D | 4 | 12 | 13 |
| Train E | 3 | 2 | 18 |
| Train F | 7 | 13 | 6 |

Test Score Matrix

| | Cause 1 | Cause 2 | Cause 3 |
|---|---|---|---|
| Test A | 10 | 10 | 20 |
| Test B | 5 | 16 | 18 |
| Test C | 13 | 10 | 14 |
| Test D | 12 | 9 | 13 |

Test Rank Matrix

| | Cause 1 | Cause 2 | Cause 3 |
|---|---|---|---|
| Test A | 4 | 3 | 2 |
| Test B | 5 | 1 | 4 |
| Test C | 2 | 3 | 4 |
| Test D | 2 | 4 | 5 |

Figure 3 Schematic representation of “ranking” technique for cause prediction from random forest scores.

Validation using the PHMRC gold standard test/train datasets

The Population Health Metrics Research Consortium (PHMRC) gold standard verbal autopsy validation study provides a large multisite dataset to assess the performance of new or existing verbal autopsy methods. The PHMRC study identified deaths that met defined clinical diagnostic criteria for cause of death. Then, interviewers visited the households of the deceased to conduct full verbal autopsies. Thus, the gold standard cause of death is paired with the responses from a verbal autopsy. The numbers of records from each site are provided in Table 1. As part of the PHMRC study, all variables including free-text were converted into a series of dichotomous items. All aspects of the study are described elsewhere in more detail [12]. Additional files 1, 2, and 3 list the 40 most informative variables for each cause in the adult, child, and neonatal modules after this data preparation phase was completed.

Murray et al. have shown that many traditional metrics of performance, such as specificity or relative and absolute error in CSMFs, are sensitive to the CSMF composition of the test dataset [13] and recommend that robust assessment of performance be undertaken on a range of test datasets with widely varying CSMF compositions. Further, metrics of individual concordance need to be corrected for chance to adequately capture how well a method does over random or equal assignment across causes.

The PHMRC has developed a set of 500 test/train splits of the data, which we analyzed. The splits were generated randomly, stratified by cause. Each has a random 75% of examples of each cause in the training set and 25% in the test set. For each split, we used the training data to generate random forests for each pair of causes and then we applied these forests to the test dataset. We never allowed contamination between the training data and the test data; they were kept strictly separate in all steps of the analysis. Further, the cause composition of the test dataset is based on a random draw from an uninformative Dirichlet distribution. The Dirichlet distribution

specifies random fractions that sum to 1. Each test split is resampled with replacement to meet the cause fractions specified by a Dirichlet draw. Consequently, each test split has a different distribution of cause fractions, and the cause composition of the training data and test data are always different.

Table 1 Numbers of VAs collected by site and gold standard level

| Site | Adult Level 1 | Adult Level 2 | Child Level 1 | Child Level 2 | Neonate Level 1 | Neonate Level 2 | Total |
|---|---|---|---|---|---|---|---|
| Andhra Pradesh, India | 1,285 | 269 | 385 | 66 | 376 | 1 | 2,382 |
| Bohol, Philippines | 998 | 262 | 234 | 30 | 374 | 0 | 1,898 |
| Dar es Salaam, Tanzania | 1,556 | 162 | 366 | 106 | 1,047 | 2 | 3,239 |
| Mexico City, Mexico | 1,373 | 215 | 124 | 4 | 313 | 2 | 2,031 |
| Pemba Island, Tanzania | 266 | 31 | 156 | 105 | 261 | 3 | 822 |
| Uttar Pradesh, India | 1,277 | 142 | 412 | 87 | 251 | 1 | 2,170 |
| Total | 6,755 | 1,081 | 1,677 | 398 | 2,622 | 9 | 12,542 |

We assessed the performance of RF at assigning individual causes of death using median chance-corrected concordance by cause across the 500 test datasets and the median average chance-corrected concordance across causes in the 500 test datasets, following the recommendations of Murray et al [13]. For assessing the performance of RF in estimating CSMFs, we calculated the median CSMF accuracy as well as slope, intercept, and root mean squared error (RMSE) of a linear regression for each cause as a summary of the relationship between estimated CSMFs for a cause and the true CSMF in a particular test dataset [13]. We benchmark RF against PCVA on the same dataset using the results reported by Lozano et al [14].

Murray et al. analyzed data in China two ways: including all items and excluding items that reflected the decedent’s health care experience (HCE) [15]. The purpose of excluding the HCE items is to assess how RF would perform on VA for communities without access to health care. They found, for example, that a considerable component of PCVA performance was related to the household recall of hospital experience or availability of a death certificate or other records from the hospital. We assessed the performance of RF in adults, children, and neonates both with and without the free-response items and the structured questions that require contact with health care to answer (marked in Additional files 1, 2, and 3).

There are many potential variations in implementing RF. Specifically:

• Continuous and categorical variables can be included as is, or can be dichotomized to reduce noise
• The training data can be reweighted so that all causes are represented equally or left as is
• Decision trees can compare cause j to all other causes at once, or compare cause j to each other individual cause to come up with “votes”
• The signal-to-noise ratio can be improved by removing low-information items using the Tariff Method [16], or all items can be used
• Different numbers of signs and symptoms can be used at each decision node
• Different numbers of trees can be used in the forest
• Cause assignment can be based on the highest scoring cause for each death or on ranking the scores and assigning to the cause with the highest rank

We conducted an extensive sensitivity analysis to understand the importance of decisions between levels of Tariff-based item reduction, the choice of number of signs and symptoms at every decision node (m), the choice of number of trees (n) in each one-versus-one cause classification, and the difference between max-score and max-rank cause assignment. To avoid overfitting the data when selecting between the model variants, we conducted our sensitivity analysis using splits 1 to 100 and repeated the analysis using splits 101 to 200 and a random subset of 50 splits. The results of the sensitivity analysis are included in Additional file 4 and show that cause assignment by rank is superior to assignment by score but that the other parameters do not affect chance-corrected concordance or CSMF accuracy. The results shown in the next section are all for the one-versus-one model, with dichotomized variables, with training data reweighted to have equal class sizes, using the 40 most important Tariff-based symptoms per cause, m = 5, n = 100, and the max-rank cause assignment, which produced the highest CSMF accuracy for seven of the first 200 splits of the child VA data with HCE and the highest chance-corrected concordance for 14.

Results

Individual cause assignment compared to PCVA

Table 2 shows that, for RF over 500 splits, the median value of average chance-corrected concordance for adult VAs without HCE was 37.7% (95% uncertainty interval [UI]: 37.6%, 38.0%), and for adult VAs with HCE it was 48.0% (47.8%, 48.2%); for child VAs without HCE it was 46.5% (46.1%, 47.0%), and for child VAs with HCE it was 51.1% (50.7%, 51.6%). For neonatal VAs without HCE the median average chance-corrected concordance was 33.5% (33.0%, 33.9%), and for neonatal VAs with HCE it was 34.9% (34.5%, 35.4%). Note that the neonatal VA results presented in the tables for PCVA are for a shorter cause list that only includes six causes, where all the preterm delivery causes are grouped together. This is because PCVA performed very poorly on a cause list with 11 causes.

Table 2 Median chance-corrected concordance (%) for RF and PCVA, by age group with and without HCE

| Age group | HCE | RF median | RF 95% UI | PCVA median | PCVA 95% UI |
|---|---|---|---|---|---|
| Adult | No HCE | 37.7 | (37.6, 38.0) | 29.7 | (29.4, 29.8) |
| Adult | HCE | 48.0 | (47.8, 48.2) | 44.6 | (44.3, 44.8) |
| Child | No HCE | 46.5 | (46.1, 47.0) | 36.3 | (35.9, 36.6) |
| Child | HCE | 51.1 | (50.7, 51.6) | 47.8 | (47.1, 48.3) |
| Neonate | No HCE | 33.5 | (33.0, 33.9) | 27.6 | (27.2, 28.0) |
| Neonate | HCE | 34.9 | (34.5, 35.4) | 33.3 | (32.8, 33.7) |

The differential value of HCE to RF in adult VA is more substantial than in child or neonatal VAs. Including HCE responses yields a significant increase of 10.3 percentage points in median chance-corrected concordance for adult VA. This could be because adults have more substantial experience with health care, and hence more relevant information is generated that aids in VA analysis, or it could be confounded by the differences between the adult, child, and neonate cause lists. In PCVA, however, including HCE responses produces a large increase in median chance-corrected concordance for all modules. In all six of these settings, the median chance-corrected concordance is significantly higher for RF than for PCVA.

Figure 4 shows that partial-cause assignment increases the partial-cause chance-corrected concordance for all age groups with and without HCE. The increasing partial-cause chance-corrected concordance as a function of the number of causes shows that RF contains additional information in the second, third, etc., most likely causes. However, as the partial-cause assignment continues, the added value from new cause assignment decreases due to the chance-correcting element in the partial-chance-corrected concordance formula, as demonstrated by the decreasing slope.
Figure 4 Partial-cause assignment increases partial chance-corrected concordance for adult, child, and neonate VAs with and without HCE. Slope of increase is higher between one and two cause assignments.

[Plot not reproduced. x-axis: Number of Causes Assigned (0 to 7); y-axis: Chance-Corrected Concordance (0% to 100%); series: Adult, Child, and Neonate, each with and without HCE.]

Figure 5 Median chance-corrected concordance (%) for RF across 500 splits, by cause, for adult VA, with and without HCE.

[Plot not reproduced. Bars: HCE and No HCE; x-axis: Chance-Corrected Concordance (%), 0 to 100. Causes listed on the axis: Colorectal Cancer, Asthma, Malaria, Renal Failure, Stomach Cancer, Other Infectious Diseases, Pneumonia, Other Noncommunicable Diseases, Other Cardiovascular Diseases, COPD, Lung Cancer, TB, Suicide, Leukemia/Lymphomas, Prostate Cancer, AIDS, Acute Myocardial Infarction, Diarrhea/Dysentery, Falls, Diabetes, Epilepsy, Fires, Cirrhosis, Esophageal Cancer, Poisonings, Stroke, Cervical Cancer, Other Injuries, Road Traffic, Maternal, Homicide, Drowning, Breast Cancer, Bite of Venomous Animal.]

Figures 5, 6, and 7 show the chance-corrected concordance of RF on a cause-by-cause basis for adult, child, and neonatal VAs with and without HCE (also see Additional file 5). Figure 8 shows that on a cause-by-cause basis, RF is better than PCVA with HCE by at least 10 percentage points of chance-corrected concordance for 13 causes for adult deaths (lung cancer, fires, renal failure, pneumonia, homicide, drowning, cirrhosis, leukemia/lymphomas, breast cancer, prostate cancer, epilepsy, cervical cancer, and poisonings). On the other hand, PCVA performed substantially better in detecting suicide, acute myocardial infarction, stomach cancer, other noncommunicable diseases, and AIDS. In addition, as depicted in Figure 9, in five causes of child deaths, RF concordance is at least 10 percentage points higher with HCE (falls, sepsis, fires, other cardiovascular diseases, and measles). Among causes of child deaths, PCVA performed better in detecting other cancers, drowning, encephalitis, violent death, diarrhea/dysentery, and other defined causes of child deaths. Head-to-head comparison of the neonatal performance between PCVA and RF is not possible though, as PCVA utilized a shorter cause list.

Another advantage of RF over PCVA is its relatively consistent performance in the presence and absence of HCE variables. PCVA concordances vary significantly with absence of HCE variables (e.g., for 22 causes of adult deaths, without HCE, concordance decreased by more than 10 percentage points). On the other hand, RF concordance only decreases substantially in 15 adult causes. In addition, RF shows more consistency among all causes. For example, its minimum median chance-corrected concordance in adult causes is 7.9% (without HCE) and 10.7% (with HCE), while minimum median chance-corrected concordance for PCVA without HCE is negative for two causes (meaning PCVA did worse than chance). RF does benefit substantially from HCE variables for certain important causes, however. For example, for adult deaths due to tuberculosis, AIDS, diabetes, and asthma, chance-corrected concordance increased by more than 20 percentage points when HCE variables were included.
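For readers unfamiliar with the metric used throughout this section, the following sketch shows how chance-corrected concordance and its partial-cause variant can be computed, following the definitions of Murray et al [13]: for cause j on a list of N causes, the sensitivity is rescaled so that random assignment scores 0 and perfect assignment scores 1, and for the top-k variant the chance level becomes k/N. The function names and toy data are hypothetical.

```python
def chance_corrected_concordance(true_causes, predicted_causes, cause, n_causes):
    """Chance-corrected concordance for one cause.

    (sensitivity - 1/N) / (1 - 1/N), where sensitivity is the fraction of
    deaths truly due to `cause` that were assigned to it.
    """
    assigned = [p for t, p in zip(true_causes, predicted_causes) if t == cause]
    if not assigned:
        raise ValueError("no deaths with this true cause")
    sensitivity = sum(p == cause for p in assigned) / len(assigned)
    chance = 1.0 / n_causes
    return (sensitivity - chance) / (1.0 - chance)


def partial_chance_corrected_concordance(true_causes, ranked_predictions, k, n_causes):
    """Partial chance-corrected concordance when k causes are assigned per death.

    (C - k/N) / (1 - k/N), where C is the fraction of deaths whose true cause
    appears among the k highest-ranked predicted causes.
    """
    hits = sum(t in ranked[:k] for t, ranked in zip(true_causes, ranked_predictions))
    fraction = hits / len(true_causes)
    chance = k / n_causes
    return (fraction - chance) / (1.0 - chance)


# Toy data: three causes, six deaths.
TRUE = ["A", "A", "A", "A", "B", "B"]
PRED = ["A", "A", "A", "B", "B", "A"]
CCC_A = chance_corrected_concordance(TRUE, PRED, "A", 3)
CCC_B = chance_corrected_concordance(TRUE, PRED, "B", 3)
```

A value below zero, as reported for PCVA on two adult causes without HCE, arises whenever the sensitivity for a cause falls below the chance level 1/N.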

Figure 6 Median chance-corrected concordance (%) for RF across 500 splits, by cause, for child VA, with and without HCE. [Plot by cause with separate HCE and No HCE series; axis: chance-corrected concordance (%).]

Figure 7 Median chance-corrected concordance (%) for RF across 500 splits, by cause, for neonatal VA, with and without HCE. [Plot by cause with separate HCE and No HCE series; axis: chance-corrected concordance (%).]

Figure 8 Scatter of median chance-corrected concordance of RF versus PCVA, for adult module. [Scatter plot labeled by cause; x-axis: PCVA chance-corrected concordance (%); y-axis: RF chance-corrected concordance (%).]

CSMF estimation compared to PCVA

Table 3 compares the median CSMF accuracy for RF and PCVA. Over 500 splits, the median value of CSMF accuracy for RF for adult VAs with HCE was 0.772 (0.769, 0.776), and for adult VAs without HCE it was 0.726 (0.721, 0.730); for child VAs with HCE it was 0.779 (0.775, 0.785), and for child VAs without HCE it was 0.763 (0.755, 0.769); for neonatal VAs with HCE it was 0.726 (0.717, 0.734), and for neonatal VAs without HCE it was 0.720 (0.710, 0.732). The patterns for this population-level estimation quality metric are qualitatively the same as those observed in the individual-level metric above. The value of HCE information is more substantial for adult VA, although it yielded a smaller increase, changing the median CSMF accuracy by 0.046. For child VA, the value is small, where it yields an increase of 0.016, and for neonate, the HCE value is not significant (increase of 0.006). In all of these settings except for neonates with HCE, the median CSMF accuracy was significantly higher for RF than for PCVA. For the neonates with HCE, the difference was not statistically significant, and the comparison was done for a six cause list for PCVA and a more challenging 11 cause list for RF.

Table 3 Median CSMF accuracy for RF and PCVA, by age group with and without HCE

Age group  HCE     RF median  RF 95% UI       PCVA median  PCVA 95% UI
Adult      No HCE  0.726      (0.721, 0.730)  0.624        (0.619, 0.631)
Adult      HCE     0.772      (0.769, 0.776)  0.675        (0.669, 0.680)
Child      No HCE  0.763      (0.755, 0.769)  0.632        (0.626, 0.642)
Child      HCE     0.779      (0.775, 0.785)  0.682        (0.671, 0.690)
Neonate    No HCE  0.720      (0.710, 0.732)  0.695        (0.682, 0.705)
Neonate    HCE     0.726      (0.717, 0.734)  0.733        (0.719, 0.743)

Figure 10 shows scatter plots of the estimated versus true CSMF for four select causes of adult deaths (each of the 500 splits contributes a single point to the scatter). The figure shows how RF estimation quality tends to be different for different causes. As depicted, RF estimations for AIDS, maternal, and ischemic heart disease (IHD) are closely correlated with the true CSMFs. However, for colorectal cancer, estimations are noisier, and regardless of the true CSMF, RF assigns similar CSMFs in all 500 splits.

To summarize the quality of RF estimation for each cause for all age groups, Additional file 6 shows the slope, intercept, and RMSE from linear regression of estimated versus true CSMFs. This population-level metric of analysis quality gave results qualitatively similar to the individual-level metric on a cause-specific basis. The RF CSMF slopes range from 0.097 to 0.904 for adult VAs, 0.105 to 0.912 for child VAs, and 0.079 to 0.845 for neonatal VAs. PCVA has similar ranges for the three age groups. However, on a cause-by-cause basis, PCVA and RF show different characteristics. A comparison revealed that, for the causes on which both methods have high chance-corrected concordance, the CSMF regression slope is higher for RF. This shows that RF attains higher cause-specific chance-corrected concordances as a result of better classification, not simply by assigning a higher portion of deaths to some causes.

The results of performing RF with a higher number of trees in each one-versus-one cause classifier showed that the method is already stable with only 100 trees per classifier.
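The two population-level summaries used in this subsection can be illustrated with a short sketch. It assumes that CSMF accuracy is defined as in the metrics paper cited as [13], that is, one minus the total absolute CSMF error divided by its largest possible value, 2(1 - min true CSMF), and that the per-cause regression is an ordinary least-squares fit of estimated on true CSMF across splits with RMSE taken over the fit residuals. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def csmf_accuracy(true_csmf, est_csmf):
    """CSMF accuracy for one test split (assumed definition, see lead-in).

    Both arguments list the cause fractions in the same cause order.
    """
    total_abs_error = sum(abs(t - e) for t, e in zip(true_csmf, est_csmf))
    # Largest possible total error: all deaths assigned to the rarest cause.
    worst_case_error = 2.0 * (1.0 - min(true_csmf))
    return 1.0 - total_abs_error / worst_case_error


def regress_estimated_on_true(true_fracs, est_fracs):
    """Least-squares slope, intercept and residual RMSE for one cause.

    Each position holds the true and estimated CSMF of that cause in one split.
    """
    n = len(true_fracs)
    mean_true = sum(true_fracs) / n
    mean_est = sum(est_fracs) / n
    sxx = sum((t - mean_true) ** 2 for t in true_fracs)
    sxy = sum((t - mean_true) * (e - mean_est)
              for t, e in zip(true_fracs, est_fracs))
    slope = sxy / sxx
    intercept = mean_est - slope * mean_true
    residuals = [e - (slope * t + intercept)
                 for t, e in zip(true_fracs, est_fracs)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, rmse


if __name__ == "__main__":
    # Toy three-cause split (made-up numbers).
    print(csmf_accuracy([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))
    # Toy cause observed in three splits (made-up numbers).
    print(regress_estimated_on_true([0.1, 0.2, 0.3], [0.15, 0.2, 0.25]))
```

Under this reading, a slope near one with a small intercept means the estimated fraction tracks the true fraction, while a slope near zero corresponds to the colorectal cancer pattern described above, where similar fractions are assigned regardless of the truth.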
It should be noted that, while the literature suggests that increasing the number of trees increases the classification precision, our overall RF Method includes an ensemble of one-versus-one classifiers (e.g., for adult VAs, RF has $\binom{46}{2} = 1035$ one-versus-one classifiers, each including 100 trees), so the overall number of trees is high, which results in stable performance.

Figure 9 Scatter of median chance-corrected concordance of RF versus PCVA, for child module. [Scatter plot labeled by cause; x-axis: PCVA chance-corrected concordance (%); y-axis: RF chance-corrected concordance (%).]

Figure 10 Estimated versus true CSMFs for 500 Dirichlet splits, showing that for selected causes of adult mortality (AIDS, colorectal cancer, maternal, and IHD), the performance of RF varies. For AIDS and IHD, RF tends to overestimate the cause fraction when the true CSMF is small and underestimate otherwise. For colorectal cancer, RF mostly assigns the same CSMF regardless of true CSMF, and for maternal causes, RF is more accurate. [Panels: Adult AIDS (with HCE); Adult Colorectal Cancer (with HCE); Adult IHD - Acute Myocardial Infarction (with HCE); Adult Maternal (with HCE).]

Discussion

We found that the RF Method outperforms PCVA for all metrics and settings, with the exception of having slightly lower CSMF accuracy in neonates when HCE was available. Even in this single scenario, the difference in CSMF accuracy is not statistically significant, and furthermore, the PCVA analysis for neonates was limited to a six cause list, while the RF analysis was done on the full 11 cause list. The degree of the improvement varies among metrics, among age modules, and with the presence or absence of HCE variables. When the analysis is conducted without HCE variables, RF is particularly dominant.

The superior performance of RF compared to PCVA with respect to all of our quality metrics is especially welcome because this method also reduces cost, speeds up the analysis process, and increases reliability. While it may take days for a team of physicians to complete a VA survey analysis, a computer approach requires only seconds of processing on hardware that is currently affordably available. In addition, using machine learning leads to reliability, since the same interview responses will lead to the same cause assignment every time. This is an important advantage over PCVA, which can produce results of widely varying quality among different physicians, according to their training and experience [14].

Despite these strengths of RF, the method does have weaknesses in individual-level prediction of certain causes. For example, chance-corrected concordances for malaria and pneumonia in adults are around 25% even with HCE. Chance-corrected concordances for encephalitis, sepsis, and meningitis in children are in the 15% to 25% range. However, in many applications, it is the population-level estimates that are most important, and the linear regression of true versus estimated cause fraction shows that for these causes, RF has a RMSE of at most 0.009 for the adult causes and 0.02 for the child causes. It may be possible to use these RMSEs together with the slopes and intercepts to yield an adjusted CSMF with uncertainty.

While the ANN method used by Boulle et al. 10 years ago [3] showed the potential of using ML techniques, the RF Method we have validated here has proven that ML is ready to be put into practice as a VA analysis method. ML is an actively developing subdiscipline of computer science, so we expect that further advances in ML classification will emerge over the coming years, and VA analysis techniques will continue to benefit from this innovation. During the development of our approach, we considered many variants of RF. However, the possibilities are endless, and even some other variant of RF may improve on the method presented here. For example, nonuniformly increasing the number of trees in the forest to have proportionately more for select causes (in the spirit of Boosting [17]) is a potential direction for future exploration.

For any ML classifier to be successful, several requirements should be met. As discussed earlier, the accuracy of classification relies considerably on the quality of the training data (deaths with gold standard cause known to meet clinical diagnostic criteria). While the PHMRC study design collected VA interviews distributed among a wide array of causes from a variety of settings, certain causes were so rare that too few cases occurred to train any ML classifier to recognize them. Future studies could focus on collecting additional gold standard VAs for priority diseases to complement the PHMRC dataset. These additional data could improve the accuracy of RF and other ML models on certain selected causes. Future research should also focus on assessing VA’s performance in different settings. For example, users in India may be interested specifically in how RF performs in India instead of across all of the PHMRC sites, particularly if it is possible to train the model only on validation deaths from India.

All VA validation studies depend critically on the quality of validation data, and this RF validation is no exception. A unique feature of the PHMRC validation dataset, the clinical diagnostic criteria, ensures that the validation data are very precise about the underlying cause of death. However, this clinical diagnosis also requires that the deceased have some contact with the health system. The validity of the method therefore depends critically on the assumption that the signs and symptoms observed in the deaths that occur in hospitals for a given cause are not substantially different from those observed in deaths from that cause that occur in communities without access to hospitals. We have investigated this assumption by conducting our analysis with and without HCE items, which gives some indication of the potential differences.

The machine learning technique described in this paper will be released as free open source software, both as stand-alone software to run on a PC and also as an application for Android phones and tablets, integrated into an electronic version of the VA instrument.

Conclusions

We presented an ML technique for assigning cause of death in VA studies, together with the optimization steps taken to improve the accuracy of RF classifiers in the VA application. We found that our RF Method outperformed PCVA in chance-corrected concordance and CSMF accuracy for adult and child VA with and without HCE and for neonatal VA without HCE. In addition, it is preferable to PCVA in terms of both cost and time. Therefore, we recommend it as the technique of choice for analyzing past and current verbal autopsies.

Additional material

Additional file 1: Top 40 items based on Tariff Method for adult causes (Items that are extracted from open text field are marked with *, other HCE variables are marked with +).
Additional file 2: Top 40 items based on Tariff Method for child causes (Items that are extracted from open text field are marked with *, other HCE variables are marked with +).
Additional file 3: Top 40 items based on Tariff Method for neonate causes (Items that are extracted from open text field are marked with *, other HCE variables are marked with +).
Additional file 4: Sensitivity analysis for 54 variants of the RF algorithm applied to 200 splits of the child VA data with HCE.
Additional file 5: Median chance-corrected concordance (%) across 500 splits, by age group and cause with and without HCE.
Additional file 6: Slope, intercept, and RMSE for linear regression of true versus estimated CSMF.

Abbreviations

ANN: artificial neural network; CCVA: computer-coded verbal autopsy; CSMF: cause-specific mortality fraction; VA: verbal autopsy; ML: machine learning; PCVA: physician-certified verbal autopsy; PHMRC: Population Health Metrics Research Consortium; RF: Random Forest; RMSE: root mean squared error; HCE: health care experience; IHD: ischemic heart disease.

Acknowledgements

This research was conducted as part of the Population Health Metrics Research Consortium: Christopher JL Murray, Alan D Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Charles Atkinson for managing the PHMRC verbal autopsy database and Michael K Freeman, Benjamin Campbell, and Charles Atkinson for intellectual contributions to the analysis.

This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Author details

1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA. 2 Bill & Melinda Gates Foundation, Seattle, USA.

Authors’ contributions

ADF, AV, and SG performed analyses. AV and CJLM edited the manuscript. SLJ helped prepare the data and results. ADF drafted the manuscript and approved the final version. ADF accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 14 April 2011 Accepted: 4 August 2011 Published: 4 August 2011

References

1. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245.
2. Mitchell TM: Machine Learning. 1 edition. New York, NY: McGraw-Hill Science/Engineering/Math; 1997.
3. Boulle A, Chandramohan D, Weller P: A case study of using artificial neural networks for classifying cause of death from verbal autopsy. Int J Epidemiol 2001, 30:515-520.
4. Breiman L: Random Forests. Machine Learning 2001, 45:5-32.
5. Caruana R, Karampatziakis N, Yessenalina A: An empirical evaluation of supervised learning in high dimensions. Proceedings of the 25th International Conference on Machine Learning - ICML ‘08, Helsinki, Finland 2008, 96-103.
6. Diaz-Uriarte R, Alvarez de Andres S: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006, 7:3.
7. Svetnik V, Liaw A, Tong C, Wang T: Application of Breiman’s Random Forest to modeling Structure-Activity Relationships of pharmaceutical molecules. In Multiple Classifier Systems, Fifth International Workshop, MCS 2004, 9-11 June 2004, Proceedings, Cagliari, Italy. Volume 3077. Edited by: Roli F, Kittler J, Windeatt T. Lecture Notes in Computer Science, Berlin: Springer; 2004:334-343.
8. Qi Y, Klein-Seetharaman J, Bar-Joseph Z: Random forest similarity for protein-protein interaction prediction from multiple sources. Pac Symp Biocomput 2005, 531-542.
9. Breiman L, Friedman J, Olshen R, Stone C: Classification and Regression Trees. 1 edition. Wadsworth and Brooks, Monterey, CA; 1984.
10. Anker M, Black RE, Coldham C, Kalter HD, Quigley MA, Ross D, Snow RW: A Standard Verbal Autopsy Method for Investigating Causes of Death in Infants and Children. Geneva: World Health Organization; 1999.
11. Hastie T, Tibshirani R: Classification by pairwise coupling. Ann Statist 1998, 26:451-471.
12. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
13. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
14. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.
15. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the symptom pattern method for analyzing verbal autopsy data. PLoS Med 2007, 4:e327.
16. James SL, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 2011, 9:31.
17. Schapire RE, Freund Y, Bartlett P, Lee WS: Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics 1998, 26:1651-1686.

doi:10.1186/1478-7954-9-29
Cite this article as: Flaxman et al.: Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Population Health Metrics 2011 9:29.



Murray et al. Population Health Metrics 2011, 9:30 http://www.pophealthmetrics.com/content/9/1/30

RESEARCH

Open Access

Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards

Christopher JL Murray1*, Spencer L James1, Jeanette K Birnbaum2, Michael K Freeman1, Rafael Lozano1 and Alan D Lopez3 for the Population Health Metrics Research Consortium (PHMRC)

Abstract

Background: Verbal autopsy can be a useful tool for generating cause of death data in data-sparse regions around the world. The Symptom Pattern (SP) Method is one promising approach to analyzing verbal autopsy data, but it has not been tested rigorously with gold standard diagnostic criteria. We propose a simplified version of SP and evaluate its performance using verbal autopsy data with accompanying true cause of death.

Methods: We investigated specific parameters in SP’s Bayesian framework that allow for its optimal performance in both assigning individual cause of death and in determining cause-specific mortality fractions. We evaluated these outcomes of the method separately for adult, child, and neonatal verbal autopsies in 500 different population constructs of verbal autopsy data to analyze its ability in various settings.

Results: We determined that a modified, simpler version of Symptom Pattern (termed Simplified Symptom Pattern, or SSP) performs better than the previously-developed approach. Across 500 samples of verbal autopsy testing data, SSP achieves a median cause-specific mortality fraction accuracy of 0.710 for adults, 0.739 for children, and 0.751 for neonates. In individual cause of death assignment in the same testing environment, SSP achieves 45.8% chance-corrected concordance for adults, 51.5% for children, and 32.5% for neonates.

Conclusions: The Simplified Symptom Pattern Method for verbal autopsy can yield reliable and reasonably accurate results for both individual cause of death assignment and for determining cause-specific mortality fractions. The method demonstrates that verbal autopsies coupled with SSP can be a useful tool for analyzing mortality patterns and determining individual cause of death from verbal autopsy data.

Keywords: Verbal autopsy, Symptom Pattern, validation, gold standard

* Correspondence: cjlm@uw.edu
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA
Full list of author information is available at the end of the article

© 2011 Murray et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Methods for analyzing verbal autopsies (VAs) seek to predict causes of death and/or cause-specific mortality fractions (CSMFs) based solely on a decedent’s signs and symptoms leading up to death. The signs and symptoms for a given death are recorded in an interview with a member of the decedent’s family. The family member’s responses can then be analyzed to deduce the true cause of death through either physician-certified verbal autopsy (PCVA) or computer-coded verbal autopsy (CCVA).

One CCVA approach proposed in 2007 by Murray et al. [1] was the Symptom Pattern (SP) Method. SP is a Bayesian approach that implements statistical machinery similar to the InterVA program [2], developed by Byass et al. [3] in 2003. InterVA relies on expert judgment to determine the probability of a particular cause of death given a reported symptom, while SP is a data-driven approach which invokes 1) King-Lu direct CSMF estimation [4] as the prior probability distribution, and 2) the actual probability of responses to combinations of items conditional on true cause in verbal autopsy data, which includes the true cause of death. The validated verbal autopsy data essentially trains the model, and the resulting model can then be applied to verbal autopsy questionnaires for which the true cause of death is unknown. These unknown deaths are then assigned a predicted cause of death based on the posterior distribution of the probability of death being due to each cause. Each cause’s predicted deaths can then be aggregated to produce estimates of cause-specific mortality fractions in the population of verbal autopsy data being analyzed.

The SP Method has previously been implemented in the R programming language due to its flexibility and compatibility with the King-Lu algorithm. For users unfamiliar with computer programming, this interface can pose difficulties. Furthermore, the computational complexity and depth used in both the King-Lu and SP algorithms can make it difficult for operators to unpack the quantitative rationale of a cause assignment for a particular death. Despite these obstacles, SP has demonstrated success in both assigning individual cause of death and determining cause-specific mortality fractions. In a study of verbal autopsy data from China, SP performed better than PCVA [1].

During the last four years of verbal autopsy research, a number of conceptual, methodological, and empirical innovations have occurred. First, it is increasingly clear that methods such as King-Lu and SP can identify very complex patterns in data. It is essential in evaluating these methods to strictly separate training and test data even when complex resampling is undertaken. Preventing the contamination of test data with train data permits the evaluation of how well a given method will work in practice. Second, Murray et al. [5] have identified that many metrics of performance such as specificity or relative and absolute error in CSMFs are sensitive to the CSMF composition of the test data set. Robust assessment of performance must be undertaken across a range of test datasets with widely varying CSMF compositions. Further, metrics of individual concordance need to be corrected for chance to adequately capture how well a method does over and above random or equal assignment across causes. Third, the Population Health Metrics Research Consortium (PHMRC) multisite study [6] provides the first large-scale data set where rigorous clinical diagnostic criteria have been used to assign cause of death in the validation dataset. The availability of improved gold standards provides an opportunity to assess more accurately how well methods perform.

Several developments suggest that SP as originally proposed can be simplified with enhanced performance. Flaxman et al. [7] have studied when the King-Lu method of direct CSMF estimation provides accurate CSMFs. They report that when the cause list is larger than seven to 10 causes, the results of King-Lu can be quite inaccurate. Using these CSMFs as a prior in SP may actually make performance of the method worse. Lessons learned in studies of pairwise analysis [8] have also suggested that two strategies may improve performance: 1) developing models for each cause compared to all other causes, one at a time, may be better than a model for all causes at once, and 2) using a smaller, more informative set of items for each cause may improve performance. Building on these insights, we propose a simplified version of the Symptom Pattern Method and assess its performance using the PHMRC gold standard validation train and test datasets.
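The chance correction mentioned in the Background can be made concrete with a small sketch. It assumes the cause-level definition from the metrics paper cited as [5]: for a cause list of N causes, chance-corrected concordance is (sensitivity - 1/N) / (1 - 1/N), so random or equal assignment across causes scores about zero and perfect assignment scores one. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def chance_corrected_concordance(true_causes, predicted_causes, cause_list):
    """Chance-corrected concordance per cause (assumed definition, see lead-in).

    true_causes and predicted_causes are parallel lists with one entry per death.
    Causes with no deaths in the test data are skipped because their
    sensitivity is undefined.
    """
    chance = 1.0 / len(cause_list)
    deaths = Counter(true_causes)
    hits = Counter(t for t, p in zip(true_causes, predicted_causes) if t == p)
    result = {}
    for cause in cause_list:
        if deaths[cause] == 0:
            continue
        sensitivity = hits[cause] / deaths[cause]
        result[cause] = (sensitivity - chance) / (1.0 - chance)
    return result


if __name__ == "__main__":
    # Toy example with three causes and six deaths (made-up labels).
    truth = ["a", "a", "b", "b", "c", "c"]
    predicted = ["a", "b", "b", "b", "a", "a"]
    print(chance_corrected_concordance(truth, predicted, ["a", "b", "c"]))
```

A negative value, as for the cause that is never predicted correctly in the toy example, indicates performance below what equal assignment across causes would achieve.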

Methods

Options for modifying the Symptom Pattern Method

The basis for the SP Method is Bayes’ theorem applied to cause of death analysis. Formally:

$$P(D_i = j \mid S_i) = \frac{P(S_i \mid D_i = j)\, P(D_i = j)}{\sum_{j'=1}^{J} P(S_i \mid D_i = j')\, P(D_i = j')}$$

where $S_i$ is the response pattern on a set of k items in the VA (not simply one item), and where $P(D_i = j \mid S_i)$ is the probability of individual i dying from cause j, conditional on the observed vector of symptom responses, $S_i$. Examination of Bayes’ theorem highlights four options for SP modification.

First, we can develop a model for one cause at a time that produces a posterior probability of a death being from that cause or not from that cause. In the notation above, $D_i = j$ or $D_i \neq j$. Alternatively we can develop a model as originally proposed for all causes at the same time where $D_i = j$ for j from 1 to the last cause.

Second, the prior can be based as originally proposed on the application of the King-Lu approach to direct CSMF estimation, or it can be based on a uniform prior where all causes are considered to be equally likely. In the case of single cause models, a uniform prior would say that the probability of a death being from cause j and the probability of it being from any cause other than j are equal.

Third, in the original SP the responses on all items were used simultaneously. Alternatively, we have observed in other verbal autopsy research that it is possible to improve signals in the data by only including the most informative items for a given cause in that cause-specific model. Specifically, we can use the top items for a cause ordered by their tariff [9]. Tariff is most easily viewed as a robust Z score identifying when particular signs or symptoms have high information content for a particular cause. In this analysis, we tested a range of options and conducted our comparative analyses using the top 40 items per cause in terms of the absolute value of the tariff.

Fourth, we can vary the number of items evaluated at each time to determine a response pattern. The original SP paper used 16. Here we have evaluated using a cluster size of 10 versus one. The lower cluster size of 10 compared to 16 improves speed and stability of the results without reducing performance. We have evaluated dropping all interdependencies, because a method with cluster size one could be implemented much more efficiently in many computational platforms. Understanding the importance of clustering is an important dimension to SP.

Because using the top 40 symptoms ordered by tariff is only meaningful for single cause models, in total these four options yield 12 possible modifications of SP (eight single cause variants and four all-cause variants). In all of these modifications, including the single cause models, we have assigned the final cause of death using the highest posterior value by cause. When assigning more than one cause of death, we have assigned the highest posterior first, the second highest next, etc.

Validation using the PHMRC gold standard train-test datasets

As described elsewhere in more detail [6], the PHMRC gold standard verbal autopsy validation study provides a unique and large multisite dataset to assess the performance of new or existing verbal autopsy methods. The PHMRC study collected VAs on deaths that met defined clinical diagnostic criteria for cause of death. For example, a death from an acute myocardial infarction required evidence as obtained by one or more of the following: a cardiac perfusion scan; ECG changes; documented history of coronary artery bypass surgery, percutaneous transluminal coronary angioplasty, or stenting; coronary angiography; and/or enzyme changes in the context of myocardial ischemia. As part of the PHMRC study, all variables including free-text responses regarding health care experiences (HCE) have been converted into a series of dichotomous items, which can be analyzed by SP. Table 1 provides the number of items in the adult, child, and neonatal modules.

The PHMRC has developed a fixed set of 500 train and test splits of the data to allow for direct performance comparison between methods. We have analyzed all 500 of these splits for the final validation results presented in this paper. We have used the first 100 and second 100 splits to select the best variant of SP for simplifying the approach. For each split, we use the training data for SP to establish the $P(S_{ik} \mid D_i = j)$ and then apply these patterns to the test dataset. In no case are there deaths in the training data that are replicated in the test data. Further, the cause composition of the test dataset is based on a random draw from an uninformative Dirichlet distribution so that the cause composition of the training data and test data are always different.

Simplifying Symptom Pattern

To select the best-performing variant, we conducted three types of analyses. We assess the performance of the different variants of SP at assigning individual causes of death using median chance-corrected concordance by cause across the first 100 test datasets and the median average chance-corrected concordance across causes in the 100 test datasets following the recommendations of Murray et al. [5]. For assessing the performance of SP in estimating CSMFs, we report median CSMF accuracy [5] as well as concordance correlation coefficients by cause as a summary of the relationship between estimated CSMFs for a cause and the true CSMF in a particular test dataset. To explore the comparative performance of all 12 SP variants, we have undertaken this assessment for adults, children, and neonates using household recall of HCE. On the basis of these results, we have selected a simplified approach, which we have implemented for children and neonates.

To ensure that this analysis did not yield results that were biased by analyzing the first 100 train-test splits, we repeated this analysis for the second 100 splits. We also confirmed that the results were robust to the selection of splits by analyzing five sets of randomly-drawn test-train splits of size 50. In the text, we present results for the analysis of the first 100 splits, but our findings are robust across the other tests. On the basis of these results, we select one variant as the Simplified Symptom Pattern (SSP) Method.
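The Bayes update and the highest-posterior assignment described under "Options for modifying the Symptom Pattern Method" can be sketched for one corner of the option space: an all-cause model with a cluster size of one (each item treated as conditionally independent given the cause) and an optional uniform prior. This is an illustration under those assumptions, not the authors' implementation; the function names, cause names, items, and probabilities are hypothetical, and the conditional probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def posterior_by_cause(responses, item_probs, prior=None):
    """Posterior P(cause | responses) with items taken one at a time.

    responses: dict mapping item name to 0 or 1.
    item_probs: dict mapping cause to {item: P(item = 1 | cause)}, which in
        practice would be estimated from training data (hypothetical here).
    prior: dict mapping cause to prior probability; uniform when omitted.
    """
    causes = list(item_probs)
    if prior is None:
        prior = {cause: 1.0 / len(causes) for cause in causes}
    log_post = {}
    for cause in causes:
        total = math.log(prior[cause])
        for item, answer in responses.items():
            p_yes = item_probs[cause][item]
            total += math.log(p_yes if answer else 1.0 - p_yes)
        log_post[cause] = total
    # Normalise in a numerically stable way (the denominator of Bayes' theorem).
    peak = max(log_post.values())
    weights = {cause: math.exp(v - peak) for cause, v in log_post.items()}
    norm = sum(weights.values())
    return {cause: w / norm for cause, w in weights.items()}


def rank_causes(posterior):
    """Causes ordered from highest to lowest posterior probability."""
    return sorted(posterior, key=posterior.get, reverse=True)


if __name__ == "__main__":
    # Made-up conditional probabilities for two causes and two items.
    probs = {
        "malaria": {"fever": 0.9, "cough": 0.2},
        "pneumonia": {"fever": 0.6, "cough": 0.8},
    }
    post = posterior_by_cause({"fever": 1, "cough": 1}, probs)
    print(post, rank_causes(post))
```

Passing an estimated CSMF as `prior` instead of leaving it uniform corresponds to the second option in the text; a single cause model would apply the same update to two classes, cause j and not j.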

Validation of Simplified Symptom Pattern Method

Using the full 500 train-test splits in the PHMRC dataset, we assess the performance of the SSP Method. We benchmark variants of SP with each other and against PCVA in the same dataset using the results reported by Lozano et al. [10]. Murray et al. [1] analyzed data for China two ways: including all items and excluding items that reflected the decedent’s contact with health services. The purpose of excluding the latter structured and free-text items was to assess how VA would perform in poor rural populations without access to care. They found, for example, that a considerable component of PCVA performance was related to the household recall of hospital experience or availability of a death certificate

Table 1 Numbers of items in adult, child, and neonate modules

Module    Dichotomous  Continuous  Categorical  Free text  Total
Adult     130          25          32           7*         194
Child     55           13          29           7*         104
Neonate   76           21          33           7*         137

*Free text responses were dichotomized as individual words and expanded into 106, 90, and 39 items for adults, children, and neonates, respectively. Total does not include these expanded items.



Murray et al. Population Health Metrics 2011, 9:30 http://www.pophealthmetrics.com/content/9/1/30


or other records from the hospital. We have assessed the performance of our SSP Method in adults, children, and neonates excluding the household recall of HCE.

Simplified SP applied to adults, children, and neonates compared to PCVA

Individual cause assignment

Table 3 shows the comparative performance of SSP versus PCVA in terms of chance-corrected concordance. For adults, SSP outperforms PCVA on the same test datasets both with and without household recall of health care experience. For children, SSP produces better chance-corrected concordance in comparison to PCVA both when health care information is added and withheld. For neonates, SSP does better than PCVA without HCE and slightly worse than PCVA when HCE information is added, though direct comparison is not possible since PCVA analysis was limited to six neonatal causes, while SSP predicted for 11 neonatal causes. Figures 1, 2, and 3 highlight the hierarchy of cause-specific chance-corrected concordances in the adult, child, and neonatal modules, respectively. These figures also emphasize the extent to which the addition of health care experience information can inform the predictions for certain causes. AIDS in the adult module, for example, achieves much higher chance-corrected concordance upon addition of HCE. Additional file 1 provides the chance-corrected concordances by cause with and without HCE for SSP. Remarkably, for 15 adult causes with HCE, chance-corrected concordances are above 50%. These causes include all the injuries but also causes such as stroke, AIDS, cirrhosis, cervical cancer, esophageal cancer, and breast cancer. Even when HCE is excluded, chance-corrected concordance is higher than 50% for 13 causes. The causes with the worst performance included some cancers such as colorectal, stomach, prostate, and leukemia/lymphoma. Residual categories such as other noncommunicable, other cardiovascular, and other infectious diseases do particularly poorly. In addition, both renal failure and pneumonia are notable for very low chance-corrected concordances. Additional file 1 for children highlights good performance for the injuries but also for measles, hemorrhagic fever, AIDS, pneumonia, and malaria. As with adults, poor performance is notable for residual categories such as other cancers, other infectious diseases, and other cardiovascular diseases. In neonates (also shown in Additional file 1) SSP does well for stillbirths, preterm delivery and sepsis/birth asphyxia, meningitis/sepsis, and birth asphyxia.

Results

Analysis of the performance of SP alternatives

Table 2 summarizes the median chance-corrected concordance and CSMF accuracy for all 12 SP variants on each age module including household recall of HCE. The table identifies each variant in terms of four attributes: symptom cluster size (10 versus one), cause-models (models for each single cause compared to noncause versus one model for multiple causes), the number of symptoms used in the likelihood step of Bayes’ theorem (all versus the top 40), and the prior CSMF distribution (based on the application of King-Lu versus a uniform prior). The best results for adults are for the variant that uses a cluster size of 10, models for each cause compared to noncause, the top 40 symptoms, and a uniform prior. However, we observed that other variants produced higher performance in children and neonates. We chose to use the model specifications that produced the most consistent results across age modules by considering the rank of each variant for each age group on both chance-corrected concordance and CSMF accuracy. In particular, we found that using a cluster size of 10, running single cause models, using all symptoms, and using a uniform prior would produce the best results across modules. A close second in terms of overall performance is the variant using a cluster size of 10, running single cause models, using the top 40 symptoms based on tariff, and using a uniform prior. In fact, this variant did best on both metrics for adults but worse for neonates and children than the variant selected. The only difference between the two top performing variants is the set of symptoms included. In general, changes from single cause models to one model for multiple causes produce small decrements in performance. Large drops in performance are associated with shifting from the uniform prior to the King-Lu prior and with shifting from a symptom cluster size of 10 to one.
Our findings on which variant performs best were consistent across other tests, including reassessment of performance for the second 100 test-train splits and assessment on randomly drawn test-train splits. In all cases, the shift from uniform priors to King-Lu priors and from cluster size 10 to cluster size one is associated with substantial decrements in performance. This simplified variant of SP, Simplified Symptom Pattern, performs substantially better than the original version published in 2007.
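To make the "single cause model with a uniform prior" attribute more concrete, the following toy sketch in Python applies Bayes' theorem once per cause (cause versus noncause) and then picks the cause with the highest posterior. It is an illustration under strong assumptions, not the implementation used in this study: it uses a cluster size of one (symptoms treated as independent) purely for brevity, and the function names, the three causes, and every probability in HYPOTHETICAL_MODELS are invented for the example rather than taken from the PHMRC data.

```python
from math import prod

# Hypothetical P(symptom = yes | cause) and P(symptom = yes | any other cause)
# for three symptoms, in the order: convulsions, fever, cough.
HYPOTHETICAL_MODELS = {
    "epilepsy": {"cause": [0.80, 0.20, 0.10], "other": [0.05, 0.50, 0.40]},
    "pneumonia": {"cause": [0.05, 0.80, 0.90], "other": [0.30, 0.30, 0.10]},
    "road traffic": {"cause": [0.05, 0.20, 0.10], "other": [0.30, 0.60, 0.50]},
}


def likelihood(responses, probs):
    """Probability of the yes/no response pattern, assuming independent items."""
    return prod(p if r else 1.0 - p for r, p in zip(responses, probs))


def single_cause_posterior(responses, p_given_cause, p_given_other, prior):
    """Posterior that the death is from one cause rather than from any other."""
    numerator = likelihood(responses, p_given_cause) * prior
    denominator = numerator + likelihood(responses, p_given_other) * (1.0 - prior)
    return numerator / denominator


def assign_cause(responses, models):
    """Run one cause-versus-noncause model per cause with a uniform prior."""
    prior = 1.0 / len(models)  # uniform prior across the cause list
    posteriors = {
        cause: single_cause_posterior(responses, m["cause"], m["other"], prior)
        for cause, m in models.items()
    }
    return max(posteriors, key=posteriors.get), posteriors


if __name__ == "__main__":
    # One VA reporting convulsions but neither fever nor cough.
    best, posteriors = assign_cause([True, False, False], HYPOTHETICAL_MODELS)
    print(best, {c: round(p, 3) for c, p in posteriors.items()})
```

Moving to a cluster size greater than one would replace the per-symptom product with probabilities of joint response patterns, which is why that variant needs a large training dataset.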

CSMF estimation

Table 4 shows the CSMF accuracy achieved by SSP in comparison to PCVA for adults, children, and neonates with and without HCE. In all cases, SSP performs substantially better and generates more accurate estimated CSMFs than PCVA on exactly the same validation datasets. Neonate results for CSMF accuracy are not


Table 2 Comparisons of different Symptom Pattern variants based on 100 splits for the adult, child, and neonate modules, including use of health care experience information

Adult module:
Cluster  Single/Multiple  Symptom  Prior    CSMF accuracy (95% uncertainty interval [UI])  Chance-corrected concordance (%) (95% UI)
10       Single           Top 40   Uniform  0.726 (0.714, 0.737)                           47.8 (47.4, 48.2)
10       Single           All      Uniform  0.703 (0.687, 0.718)                           45.6 (44.9, 46.3)
10       Single           Top 40   King-Lu  0.653 (0.640, 0.672)                           42.6 (42.1, 43.4)
10       Single           All      King-Lu  0.311 (0.291, 0.349)                           18.4 (17.4, 20.3)
10       Multiple         All      Uniform  0.714 (0.697, 0.721)                           46.1 (45.7, 46.5)
10       Multiple         All      King-Lu  0.708 (0.696, 0.719)                           46.0 (45.6, 46.6)
1        Single           Top 40   Uniform  0.668 (0.652, 0.681)                           42.7 (42.2, 43.0)
1        Single           All      Uniform  0.632 (0.620, 0.643)                           40.3 (39.8, 40.5)
1        Single           Top 40   King-Lu  0.163 (0.147, 0.212)                           9.3 (8.3, 11.0)
1        Single           All      King-Lu  0.043 (0.031, 0.057)                           0.4 (0.0, 0.9)
1        Multiple         All      Uniform  0.651 (0.636, 0.665)                           39.2 (38.4, 39.4)
1        Multiple         All      King-Lu  0.646 (0.630, 0.664)                           38.6 (38.1, 39.2)

Child module:
Cluster  Single/Multiple  Symptom  Prior    CSMF accuracy (95% UI)  Chance-corrected concordance (%) (95% UI)
10       Single           Top 40   Uniform  0.718 (0.699, 0.738)    45.2 (44.4, 46.2)
10       Single           All      Uniform  0.740 (0.727, 0.757)    50.9 (50.1, 51.8)
10       Single           Top 40   King-Lu  0.633 (0.617, 0.666)    40.4 (39.2, 40.9)
10       Single           All      King-Lu  0.469 (0.453, 0.516)    36.8 (35.4, 38.0)
10       Multiple         All      Uniform  0.749 (0.736, 0.766)    51.8 (50.7, 52.9)
10       Multiple         All      King-Lu  0.759 (0.745, 0.771)    52.1 (51.5, 53.0)
1        Single           Top 40   Uniform  0.696 (0.676, 0.715)    44.4 (44.0, 45.5)
1        Single           All      Uniform  0.705 (0.692, 0.727)    46.9 (45.6, 47.5)
1        Single           Top 40   King-Lu  0.263 (0.228, 0.280)    16.6 (14.2, 17.7)
1        Single           All      King-Lu  0.125 (0.104, 0.161)    3.6 (2.3, 4.5)
1        Multiple         All      Uniform  0.716 (0.701, 0.733)    47.9 (46.5, 48.7)
1        Multiple         All      King-Lu  0.723 (0.705, 0.741)    47.9 (47.1, 48.6)

Neonate module:
Cluster  Single/Multiple  Symptom  Prior    CSMF accuracy (95% UI)  Chance-corrected concordance (%) (95% UI)
10       Single           Top 40   Uniform  0.748 (0.730, 0.766)    29.7 (28.7, 30.6)
10       Single           All      Uniform  0.741 (0.720, 0.787)    31.7 (31.2, 33.0)
10       Single           Top 40   King-Lu  0.679 (0.647, 0.704)    27.9 (25.9, 28.5)
10       Single           All      King-Lu  0.603 (0.553, 0.624)    19.1 (18.0, 21.8)
10       Multiple         All      Uniform  0.732 (0.712, 0.745)    34.1 (32.8, 35.5)
10       Multiple         All      King-Lu  0.736 (0.711, 0.752)    33.6 (32.9, 35.5)
1        Single           Top 40   Uniform  0.663 (0.634, 0.691)    28.8 (27.4, 29.6)
1        Single           All      Uniform  0.604 (0.571, 0.639)    26.4 (25.2, 27.6)
1        Single           Top 40   King-Lu  0.425 (0.391, 0.462)    10.0 (9.2, 11.6)
1        Single           All      King-Lu  0.363 (0.325, 0.384)    0.0 (0.0, 0.5)
1        Multiple         All      Uniform  0.564 (0.550, 0.580)    29.5 (27.7, 30.4)
1        Multiple         All      King-Lu  0.565 (0.541, 0.591)    29.4 (27.8, 30.8)


regardless of the true CSMF size. It has a tendency to slightly overestimate the CSMF when the true CSMF is very small. Indeed, results from the regression show that SSP will predict a CSMF of 1.4% even if there are no actual deaths from breast cancer. The slope of the regression in addition to the scatter show, though, that beyond very small CSMFs for breast cancer, SSP will typically produce predicted CSMFs that are very close to the truth. Road traffic in Figure 5 shows a very similar relationship. Both breast cancer and road traffic are causes that also obtain a high chance-corrected concordance, suggesting a strong relationship between success at individual-level assignment and population-level estimates. Figure 6 shows how for epilepsy, SSP will overestimate at lower true CSMFs, but as the true fraction increases, SSP begins to underestimate. The regression results confirm this observation. The intercept of the regression for epilepsy is 0.017, indicating an estimated CSMF of 1.7% will occur even if no true epilepsy deaths exist. The slope of 0.636 and the accompanying scatter both suggest that beyond a CSMF of approximately 4%, SSP will begin to systematically underestimate the mortality fraction from epilepsy. Cervical cancer, shown in Figure 7, highlights a case where SSP more dramatically overestimates the CSMF when the true CSMF is less than approximately 9%. Beyond 9%, however, the estimations tend to be closer to truth. The RMSE for the cervical cancer regression is 0.013, twice as large as the RMSE for breast cancer, indicating a noisier range of estimates for any given true CSMF. Acute myocardial infarction in Figure 8 is another cause for which SSP systematically underestimates beyond a 5% true cause fraction, and has an RMSE of 0.008. A very similar relationship is shown for COPD in Figure 9. The RMSE in the adult results with HCE ranges from 0.003 to 0.015.
In the child results with HCE, the RMSE is typically higher, ranging from 0.006 to 0.027, highlighting the noisier CSMF estimations that result from SSP’s use with child VAs. For example, Figure 10 shows the true and estimated CSMFs for hemorrhagic fever in children, which evidently produces a range of estimates for any given true CSMF. The neonate CSMF estimation is also typically less precise than the adult results, with an RMSE ranging from 0.012 to 0.056. The true and estimated CSMFs for stillbirths are shown in Figure 11 and demonstrate a cause which is essentially always subject to overestimation by SSP. Overall, the analysis of the true versus estimated relationships suggests that while systematic underestimation or overestimation beyond a certain threshold CSMF may be an intrinsic characteristic of SSP’s predictions, in many cases the trend is still predictable and precise.
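For a regression of the form Estimated CSMF = True CSMF × slope + intercept, the fitted line crosses the line of perfect agreement where the true CSMF equals intercept / (1 − slope), provided the slope is below one and the intercept is positive. The short Python sketch below works this out for the epilepsy coefficients quoted above (intercept 0.017, slope 0.636). The function name is an illustrative assumption, and the result describes only the fitted line, not the scatter around it.

```python
def crossover_csmf(intercept, slope):
    """True CSMF at which the fitted line equals the truth.

    Below this value the line lies above truth (overestimation);
    above it the line lies below truth (underestimation).
    """
    if slope >= 1.0:
        raise ValueError("no crossover: slope must be below 1")
    return intercept / (1.0 - slope)


if __name__ == "__main__":
    # Epilepsy coefficients as reported in the text.
    threshold = crossover_csmf(intercept=0.017, slope=0.636)
    print(f"fitted line crosses truth at a true CSMF of {threshold:.1%}")
```

For epilepsy this gives a threshold between 4% and 5%, in the same range as the approximate value read from the scatter in Figure 6.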

Table 3 Median chance-corrected concordance (%) for SSP and PCVA, by age group with and without HCE

                   SSP                    PCVA
                   Median  95% UI         Median  95% UI
Adult    No HCE    38.0    (37.8, 38.1)   29.7    (29.4, 29.8)
         HCE       45.8    (45.7, 45.9)   44.6    (44.3, 44.8)
Child    No HCE    46.8    (46.5, 47.3)   36.3    (35.9, 36.6)
         HCE       51.5    (51.1, 51.9)   47.8    (47.1, 48.3)
Neonate  No HCE    30.4    (30.0, 30.7)   27.6    (27.2, 28.0)
         HCE       32.5    (32.0, 33.0)   33.3    (32.8, 33.7)

The median chance-corrected concordance is computed as the median across 500 splits of the mean chance-corrected concordance across causes. These results show how SSP outperforms physicians in individual cause assignment in every situation where head-to-head comparison is possible, except for the neonatal module with the HCE information added. In the neonatal module, SSP cannot be directly compared to PCVA since PCVA analysis could only be conducted for six neonate causes, while SSP can predict for 11 causes.
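As a reminder of what is being summarized in Table 3, the sketch below computes chance-corrected concordance for one test split in Python. It assumes the definition given in the metrics paper cited as [5], as we understand it: for each cause, sensitivity is rescaled so that random assignment across N causes scores zero and perfect assignment scores one. The function names and the six example deaths are hypothetical.

```python
def chance_corrected_concordance(true_causes, predicted_causes, cause_list):
    """Per-cause chance-corrected concordance for one test split.

    For cause j: (sensitivity_j - 1/N) / (1 - 1/N), with N causes in the list.
    Causes with no true deaths in the split are skipped.
    """
    n_causes = len(cause_list)
    chance = 1.0 / n_causes
    result = {}
    for cause in cause_list:
        predictions = [p for t, p in zip(true_causes, predicted_causes) if t == cause]
        if not predictions:
            continue
        sensitivity = sum(p == cause for p in predictions) / len(predictions)
        result[cause] = (sensitivity - chance) / (1.0 - chance)
    return result


def mean_ccc(true_causes, predicted_causes, cause_list):
    """Mean across causes, the per-split quantity whose median is reported."""
    per_cause = chance_corrected_concordance(true_causes, predicted_causes, cause_list)
    return sum(per_cause.values()) / len(per_cause)


if __name__ == "__main__":
    # Hypothetical split with three causes and six deaths.
    truth = ["a", "a", "b", "b", "c", "c"]
    predicted = ["a", "b", "b", "b", "c", "a"]
    print(chance_corrected_concordance(truth, predicted, ["a", "b", "c"]))
    print(mean_ccc(truth, predicted, ["a", "b", "c"]))
```

Taking the median of the per-split mean over all splits would then give a value comparable in construction to the entries in Table 3.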

comparable from PCVA to SSP because the PCVA results are compiled at a six-cause level, whereas SSP is capable of producing estimates for 11 different causes. The difference in adults and children can be as large as 0.077 for children without HCE. This represents a substantial increment in performance at the population level relative to PCVA. To explore the variation by cause in SSP’s mortality fraction estimation, we modeled the estimated CSMF as a function of true CSMF. Additional file 2 shows this relationship based on the true and estimated results from 500 different test splits in the form Estimated CSMF = True CSMF × slope + intercept.
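A regression of this form can be fitted by ordinary least squares. The following Python sketch shows one way to obtain the slope, intercept, and RMSE for a single cause from paired true and estimated CSMFs. It is an assumption-laden illustration: the function name is invented, the RMSE is computed as the square root of the mean squared residual (the exact estimator behind Additional file 2 is not stated here), and the five data points are made up rather than drawn from the 500 test splits.

```python
from math import sqrt


def fit_csmf_regression(true_csmf, estimated_csmf):
    """Least-squares fit of estimated = true * slope + intercept, plus RMSE."""
    n = len(true_csmf)
    mean_x = sum(true_csmf) / n
    mean_y = sum(estimated_csmf) / n
    sxx = sum((x - mean_x) ** 2 for x in true_csmf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_csmf, estimated_csmf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(true_csmf, estimated_csmf)]
    rmse = sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, rmse


if __name__ == "__main__":
    # Hypothetical true and estimated fractions for one cause across five splits.
    true_fraction = [0.00, 0.02, 0.05, 0.10, 0.15]
    estimated_fraction = [0.015, 0.030, 0.050, 0.085, 0.110]
    slope, intercept, rmse = fit_csmf_regression(true_fraction, estimated_fraction)
    print(f"slope={slope:.3f} intercept={intercept:.3f} rmse={rmse:.4f}")
```

A positive intercept corresponds to a predicted CSMF even when no true deaths from the cause exist, and a slope below one implies underestimation at sufficiently large true fractions.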

This regression allows us to observe the predicted size of any cause’s mortality fraction even if no true deaths from that cause exist in the dataset and then to determine whether SSP will tend to overestimate or underestimate if the true mortality fraction is greater than zero. Extracting the root mean square error (RMSE) allows for assessment of the range of estimated CSMFs for a given true CSMF, therefore indicating whether any over- or underestimation will be systematic and predictable. This analysis is a useful way to predict how SSP could perform in the field, particularly considering the different settings and project aims that may be focused on different disease burdens. Based on the results from this regression, we chose six causes that highlight characteristics of SSP’s predictions. Figures 4, 5, 6, 7, 8 and 9 show a comparison of estimated CSMFs and true CSMFs for these six causes: breast cancer (Figure 4), road traffic (Figure 5), epilepsy (Figure 6), cervical cancer (Figure 7), acute myocardial infarction (Figure 8), and chronic obstructive pulmonary disease (COPD) (Figure 9). Breast cancer, shown in Figure 4, exemplifies a cause for which SSP produces accurate CSMF estimates


Figure 1 Median chance-corrected concordance (%) across 500 Dirichlet splits, by adult cause with and without HCE.

Discussion

These results suggest that Simplified Symptom Pattern performs better than the original version proposed by Murray et al. in 2007. In fact, by dropping the use of the King-Lu direct CSMFs as the prior in SSP, performance has improved. This is consistent with the finding of Flaxman et al. [7] that King-Lu has poor accuracy when there are more than seven to 10 causes in the cause list. SSP performance is also enhanced by developing models for each cause, one at a time, that


Figure 2 Median chance-corrected concordance (%) across 500 Dirichlet splits, by child cause with and without HCE.

Figure 3 Median chance-corrected concordance (%) across 500 Dirichlet splits, by neonate cause with and without HCE.


implementations of PCVA and SSP presented in this paper may seem minimal. However, we have observed that incremental increases in CSMF accuracy in fact represent substantial improvements. The CSMF accuracy ranges from 0.624 to 0.751 across all the cases in this paper. Two methods would differ in CSMF accuracy by 10 percentage points if on average over 500 tests, one cause was misestimated to be 10 CSMF percentage points higher on average. For the purposes of studying population health, this difference is quite important. Lozano et al. [2] report that InterVA, which is also based on Bayes’ theorem, performs markedly worse than PCVA or the SSP Method in the same validation dataset. For individual cause assignment, SSP has a chance-corrected concordance for adults that is twice as high, with similarly large increments in performance in children and neonates. The substantially improved performance of SSP in the same validation datasets can be readily understood in terms of the same dimensions that have been tested in the simplification of the method. SSP can be transformed into InterVA by four steps: use a specific InterVA subset of symptoms, use a cluster size of one, estimate a model for all causes at once, and use expert judgment about the probability of a symptom conditional on a cause of death rather than empirical patterns observed in the training data. All of these choices

Table 4 Median CSMF accuracy for SSP and PCVA, by age group with and without HCE

                   SSP                      PCVA
                   Median  95% UI           Median  95% UI
Adult    No HCE    0.671   (0.664, 0.676)   0.624   (0.619, 0.631)
         HCE       0.710   (0.704, 0.714)   0.675   (0.669, 0.680)
Child    No HCE    0.709   (0.700, 0.717)   0.632   (0.626, 0.642)
         HCE       0.739   (0.733, 0.745)   0.682   (0.671, 0.690)
Neonate  No HCE    0.748   (0.736, 0.759)   0.695   (0.682, 0.705)
         HCE       0.751   (0.737, 0.764)   0.733   (0.719, 0.743)
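For readers less familiar with the metric in Table 4, the sketch below computes CSMF accuracy in Python. It assumes the definition from the metrics paper cited as [5], as we understand it: one minus the summed absolute CSMF error divided by the largest error possible, which is twice one minus the smallest true CSMF. The function name and the four-cause composition are hypothetical.

```python
def csmf_accuracy(true_csmf, estimated_csmf):
    """CSMF accuracy: 1 - sum|true - est| / (2 * (1 - min true)).

    Both arguments map cause -> fraction; each set of fractions should sum to 1.
    A value of 1 means a perfect match; 0 is the worst possible estimate.
    """
    total_error = sum(
        abs(true_csmf[cause] - estimated_csmf.get(cause, 0.0)) for cause in true_csmf
    )
    worst_possible = 2.0 * (1.0 - min(true_csmf.values()))
    return 1.0 - total_error / worst_possible


if __name__ == "__main__":
    # Hypothetical true cause composition and an estimate that shifts
    # ten percentage points from the second cause to the first.
    truth = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    estimate = {"a": 0.5, "b": 0.2, "c": 0.2, "d": 0.1}
    print(round(csmf_accuracy(truth, estimate), 3))
```

Because the denominator is close to two whenever the smallest true fraction is small, a single cause overestimated by a given number of percentage points lowers CSMF accuracy by roughly the same number of points.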

predict whether a death is from a given cause compared to all other causes and then picking the cause with the highest posterior probability across the individual cause models. SSP is further improved by using a cluster size of 10. These simplifications have led to substantial improvement in performance. Simplified Symptom Pattern performs remarkably well both at individual cause assignment and CSMF estimation. SSP has chance-corrected concordance and CSMF accuracy higher than or equivalent to those of PCVA in all cases, except for the chance-corrected concordance for neonates with the inclusion of HCE information. The relative differences in performance, particularly concerning CSMF accuracy, between the various

Figure 4 True versus estimated mortality fractions for breast cancer, adult module with HCE information. [Scatter plot titled "Breast Cancer": Estimated Cause Fraction (%) against True Cause Fraction (%), both axes 0 to 20.]



Figure 5 True versus estimated mortality fractions for road traffic, adult module with HCE information. [Scatter plot titled "Road Traffic": Estimated Cause Fraction (%) against True Cause Fraction (%), both axes 0 to 20.]

Figure 6 True versus estimated mortality fractions for epilepsy, adult module with HCE information. [Scatter plot titled "Epilepsy": Estimated Cause Fraction (%) against True Cause Fraction (%), both axes 0 to 20.]



Figure 7 True versus estimated mortality fractions for cervical cancer, adult module with HCE information. [Scatter plot titled "Cervical Cancer": Estimated Cause Fraction (%) against True Cause Fraction (%), both axes 0 to 20.]

Figure 8 True versus estimated mortality fractions for acute myocardial infarction, adult module with HCE information.


Figure 9 True versus estimated mortality fractions for COPD, adult module with HCE information. [Scatter plot titled "COPD": Estimated Cause Fraction (%) against True Cause Fraction (%), both axes 0 to 20.]

Figure 10 True versus estimated mortality fractions for hemorrhagic fever, child module with HCE information.


Figure 11 True versus estimated mortality fractions for stillbirths, neonate module with HCE information. [Scatter plot titled "Stillbirth": Estimated Cause Fraction (%) against True Cause Fraction (%), both axes 0 to 20.]

actually make the performance of a Bayesian approach worse as demonstrated in this analysis. Lozano et al. [2] do in fact test SSP and show that one can reduce the performance of SSP by taking on these InterVA assumptions. The main practical limitation of the SSP Method is that using a symptom cluster size greater than one requires any analysis of test data to sample from a large training dataset that captures the complex patterns in symptom clusters conditional on cause. This means that SSP cannot be easily delivered to a local analyst for the assessment of a single cause of death. The computational power required to implement SSP on a single-death basis is greater than other methods, such as the Tariff Method or Random Forest Method. For analysis of large groups of deaths or for research studies, this computational power may be a reasonable trade-off given the reliable results produced by the Simplified Symptom Pattern Method. The SSP code will be trained on the full PHMRC dataset and the model will be available for use on the Internet following publication of this paper.

Conclusions

First developed in 2007, the Symptom Pattern Method for verbal autopsy has been subject to in-depth investigation and experimentation. The application of Bayes’ theorem to verbal autopsy responses is an intuitive approach from a statistical standpoint; however, the method may be difficult for some users to fully comprehend. Consequently, it is important for the method to be implemented on a user-friendly computational platform with the option to work with different verbal autopsy instruments. In such a setting, the Simplified Symptom Pattern Method presented in this paper can produce reliable, accurate results for both individual cause of death assignment and cause-specific mortality fraction estimates. The growing demand for more comprehensive cause of death data in settings without functioning health information systems could be met by further implementation of verbal autopsy surveys and the use of the Simplified Symptom Pattern Method to analyze the results.

Additional material

Additional file 1: Median chance-corrected concordance (%) across 500 Dirichlet splits, by age group and cause with and without HCE.

Additional file 2: Slope, intercept, and RMSE from linear regression of estimated versus true CSMFs, by age group and cause with and without HCE.


Abbreviations

CCVA: computer-coded verbal autopsy; CSMF: cause-specific mortality fraction; HCE: health care experience; PCVA: physician-certified verbal autopsy; PHMRC: Population Health Metrics Research Consortium; RMSE: root mean square error; SP: Symptom Pattern; SSP: Simplified Symptom Pattern; VA: verbal autopsy

Acknowledgements

This research was conducted as part of the Population Health Metrics Research Consortium: Christopher J.L. Murray, Alan D. Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D. Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Charles Atkinson for managing the PHMRC verbal autopsy database and Abraham D. Flaxman, Alireza Vahdatpour, Benjamin Campbell, and Charles Atkinson for intellectual contributions to the analysis. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Author details

1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA. 2 Department of Health Services, University of Washington, Seattle, WA, USA. 3 School of Population Health, University of Queensland, Brisbane, Australia.

Authors’ contributions

CJLM, JKB, and SLJ conceptualized the method and algorithm. SLJ performed analyses and helped write the manuscript. MKF produced testing data. RL and ADL guided the study design and paper writing. CJLM drafted the manuscript and approved the final version. CJLM accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 14 April 2011 Accepted: 4 August 2011 Published: 4 August 2011

References

1. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the symptom pattern method for analyzing verbal autopsy data. PLoS Med 2007, 4:e327.
2. Lozano R, Freeman MK, James SL, Campbell B, Lopez AD, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:50.
3. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37.
4. King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statistical Science 2008, 23:78-91.
5. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
6. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gomez S, Hernandez B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
7. Flaxman AD, Vahdatpour A, James SL, Birnbaum JK, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:35.
8. Hastie T: Classification by pairwise coupling. Ann Statist 1998, 26:451-471.
9. James SL, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 2011, 9:31.
10. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.

doi:10.1186/1478-7954-9-30
Cite this article as: Murray et al.: Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Population Health Metrics 2011 9:30.



James et al. Population Health Metrics 2011, 9:31 http://www.pophealthmetrics.com/content/9/1/31

RESEARCH

Open Access

Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies

Spencer L James1, Abraham D Flaxman1 and Christopher JL Murray1* for the Population Health Metrics Research Consortium (PHMRC)

Abstract

Background: Verbal autopsies provide valuable information for studying mortality patterns in populations that lack reliable vital registration data. Methods for transforming verbal autopsy results into meaningful information for health workers and policymakers, however, are often costly or complicated to use. We present a simple additive algorithm, the Tariff Method (termed Tariff), which can be used for assigning individual cause of death and for determining cause-specific mortality fractions (CSMFs) from verbal autopsy data.

Methods: Tariff calculates a score, or “tariff,” for each cause, for each sign/symptom, across a pool of validated verbal autopsy data. The tariffs are summed for a given response pattern in a verbal autopsy, and this sum (score) provides the basis for predicting the cause of death in a dataset. We implemented this algorithm and evaluated the method’s predictive ability, both in terms of chance-corrected concordance at the individual cause assignment level and in terms of CSMF accuracy at the population level. The analysis was conducted separately for adult, child, and neonatal verbal autopsies across 500 pairs of train-test validation verbal autopsy data.

Results: Tariff is capable of outperforming physician-certified verbal autopsy in most cases. In terms of chance-corrected concordance, the method achieves 44.5% in adults, 39% in children, and 23.9% in neonates. CSMF accuracy was 0.745 in adults, 0.709 in children, and 0.679 in neonates.

Conclusions: Verbal autopsies can be an efficient means of obtaining cause of death data, and Tariff provides an intuitive, reliable method for generating individual cause assignment and CSMFs. The method is transparent and flexible and can be readily implemented by users without training in statistics or computer science.

Keywords: Verbal autopsy, validation, gold standard, Tariff Method, cause of death, mortality, cause-specific mortality fractions
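The additive step described in the Methods summary can be sketched in a few lines of Python. This is only an illustration of summing tariffs over the items endorsed in one verbal autopsy: the tariff values, item names, and function name are hypothetical, and choosing the cause with the highest raw sum is a simplifying assumption made here, since the abstract states only that the summed score provides the basis for prediction.

```python
# Hypothetical tariffs: one value per cause per sign/symptom.
HYPOTHETICAL_TARIFFS = {
    "epilepsy": {"convulsions": 8.0, "fever": -1.0, "cough": -0.5},
    "pneumonia": {"convulsions": -1.0, "fever": 3.0, "cough": 5.0},
}


def tariff_scores(responses, tariffs):
    """Sum, for each cause, the tariffs of the items answered "yes"."""
    return {
        cause: sum(value for item, value in item_tariffs.items() if responses.get(item))
        for cause, item_tariffs in tariffs.items()
    }


if __name__ == "__main__":
    # One verbal autopsy reporting convulsions and fever but no cough.
    va_responses = {"convulsions": True, "fever": True, "cough": False}
    scores = tariff_scores(va_responses, HYPOTHETICAL_TARIFFS)
    print(scores, max(scores, key=scores.get))
```

Because each cause's score is a plain sum, a user can see exactly which items pushed a death toward or away from a given cause.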

Background

Verbal autopsies (VAs) are increasingly being used to provide information on causes of death in demographic surveillance sites (DSSs), national surveys, censuses, and sample registration schemes [1-3]. Physician-certified verbal autopsy (PCVA) is the primary method used to assign cause once VA data are collected. Several alternative expert-based algorithms [4-6], statistical methods [7-9], and computational algorithms [7] have been developed. These methods hold promise, but their comparative performance needs to be evaluated. Large-scale validation studies, such as the Population Health Metrics Research Consortium (PHMRC) [10], provide objective information on the performance of these different approaches. The main limitation to date of PCVA is the cost and feasibility of implementation. Finding and training physicians to read VAs in resource-poor settings has proven challenging, leading in some cases to long delays in the analysis of data [1,11]. In some rural areas with marked shortages of physicians, assigning the few available physicians to read VAs may have a very high opportunity cost in terms of health care delivery. Lozano et al. [12]

* Correspondence: cjlm@uw.edu 1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA

© 2011 James et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



James et al. Population Health Metrics 2011, 9:31 http://www.pophealthmetrics.com/content/9/1/31


have also shown that there is a substantial idiosyncratic element to PCVA related to physician diagnostic performance. In contrast, some automated methods (whether statistical or computational in nature) have demonstrated performance similar to PCVA [7,8], but some users may be uncomfortable with the “black box” nature of these techniques. It is often very difficult for users to unpack how decisions on a cause are reached. Furthermore, the actual statistics and mechanics that form the basis for cause assignments are difficult to access and understand due to the myriad computations involved. One method, the King-Lu method, is a direct cause-specific mortality fraction (CSMF) estimation approach [13,14] that does not assign cause to specific deaths, making it even harder for a user to understand how the cause of death is being determined. Empirical methods that use the observed response pattern from VAs in a training dataset have an advantage over expert judgment-based methods in that they capture the reality that some household respondents in a VA interview may respond “yes” to some items even when they would not be considered part of the classical clinical presentation for that cause. For example, 43% of households report coughing as a symptom for patients who died from a fall, and 58% of households report a fever for patients who died from a road traffic accident. However, a limitation of many existing methods such as Simplified Symptom Pattern and Random Forest is that they may not give sufficient emphasis to pathognomonic signs and symptoms. For example, if 20% of patients dying of epilepsy report convulsions, and only 2% of nonepilepsy patients report convulsions, a statistical model will not assign this symptom as much significance as these data imply. 
Put another way, Bayesian methods such as InterVA and Symptom Pattern and statistical methods such as King-Lu direct CSMF estimation assume that the probability of signs and symptoms conditional on true cause is constant, but in reality it is not. There are subsets of patients who may have signs and symptoms that are extremely informative, and other subsets with less clearly defined signs/symptoms. In this paper, we propose a simple additive approach using transparent, intuitive computations based on responses to a VA instrument. Our premise is that there ought to be highly informative signs or symptoms for each cause. Our goal is to develop an approach to cause of death estimation based on reported signs and symptoms that is simple enough to be implemented in a spreadsheet so that users can follow each step of cause assignment. We illustrate the development of this approach and then use the PHMRC gold standard VA validation study dataset [10] to assess the performance of this approach compared to PCVA, which is current practice.

Methods

Logic of the method

The premise behind the Tariff Method is to identify signs or symptoms collected in a VA instrument that are highly indicative of a particular cause of death. The general approach is as follows. A tariff is developed for each sign and symptom for each cause of death to reflect how informative that sign and symptom is for that cause. For a given death, based on the response pattern in the VA instrument, the tariffs are then summed, yielding a cause-specific tariff score for that death for each cause. The cause that claims the highest tariff score for a particular death is assigned as the predicted cause of death for that individual. The tariffs, tariff scores, and ranks are easily observable at each step, and users can readily inspect the basis for any cause decision.

Based on a training dataset in which the true cause is known and a full verbal autopsy has been collected, we can compute a tariff as a function of the fraction of deaths for each variable or item that has a positive response. The tariff can be thought of as a robust estimate of how different an item response pattern is for a cause compared to other causes, formally:

$$\mathrm{Tariff}_{ij} = \frac{x_{ij} - \mathrm{Median}(x_{ij})}{\mathrm{Interquartile\ Range}(x_{ij})}$$

where $\mathrm{Tariff}_{ij}$ is the tariff for cause i, item j; $x_{ij}$ is the fraction of VAs with a positive response to item j among deaths from cause i; $\mathrm{Median}(x_{ij})$ is the median fraction with a positive response for item j across all causes; and $\mathrm{Interquartile\ Range}(x_{ij})$ is the interquartile range of positive response rates for item j across causes. Note that as defined, tariffs can be positive or negative in value. As a final step, tariffs are rounded to the nearest 0.5 to avoid overfitting and to improve predictive validity.

For each death, we compute summed tariff scores for each cause:

$$\mathrm{Tariff\ Score}_{ki} = \sum_{j=1}^{w} \mathrm{Tariff}_{ij} \, x_{jk}$$

where $x_{jk}$ is the response for death k on item j, taking on a value of 1 when the response is positive and 0 when the response is negative, and w is the number of items used for the cause prediction. It is key to note that for each death, a different tariff score is computed for each of the possible causes. In the adult module of the PHMRC study, for example, there are 46 potential causes and so there are 46 different tariff scores based on the tariffs and the response pattern for that death. For actual implementation, we use only the top 40 items for each cause in terms of tariff to compute a tariff score. The sets of 40 items used for each cause prediction are not mutually exclusive, though cumulatively across all cause predictions the majority of items in the PHMRC VA questionnaire are used for at least one cause prediction.

Once a set of tariff scores has been obtained for a given death, the cause of death can be assigned in several ways. The easiest method is to simply assign the cause with the highest tariff score. However, some causes may have inherently higher tariffs. To address this issue, each test death’s cause-specific score is ranked in comparison to all of that cause’s scores for deaths in the training dataset, which has been resampled to have a uniform cause distribution. This ranking transformation normalizes the tariff scores and draws on the information found in the training dataset. The cause that claims the highest rank on each death being tested receives the cause assignment for that death. In repeated tests, we have found the ranking transformation improves performance and is the preferred final step for assigning cause. By making cause assignments based on rank for each individual death through the use of the training dataset, we also emulate how the method could be used for individual cause assignment in the field, since cause assignment in the field would be based on ranking a single death relative to the entire validation dataset’s tariff scores. This entire process is illustrated in Figure 1.

Implementation of the Tariff Method

We use the PHMRC gold standard VA training datasets to develop tariffs and then to assess the performance of Tariff compared to PCVA. Details on the design of this multicountry study are provided elsewhere [10]. The study collected 7,836 adult, 2,075 child, and 2,631 neonatal deaths with rigorously defined clinical diagnostic and pathological criteria. For each death, the PHMRC VA instrument was applied. The resulting VA dataset consists of responses to symptoms and signs that may be expressed as dichotomous, continuous, and categorical variables. The survey instrument also included items for the interviewer to transcribe medical record text from the household and to take notes during the “open response” portion of the interview, when the respondent explains anything else that he/she feels is relevant. The text from these responses has been converted to dichotomous items. The continuous and categorical variables, such as “how long did the fever last?” were also converted to dichotomous variables. These data processing steps are described in more detail elsewhere [10].

We use the dichotomized training datasets to develop tariffs. We then compute tariff scores for each death in the test and train datasets and assign a cause of death to each death in the test dataset. We compute chance-corrected concordance and CSMF accuracy [15] on the cause of death predictions in the test dataset to avoid in-sample analysis. Chance-corrected concordance is a sensitivity assessment that measures the method’s ability to correctly determine individual cause of death. CSMF accuracy is an index that measures a VA method’s ability to estimate a population’s cause-specific mortality fractions and is determined by calculating the sum of the absolute value of CSMF errors compared to the maximum possible error in CSMFs. Examination of the tariff score ranks can yield a second, third, etc., most likely cause of death. We also compute partial chance-corrected concordance for up to six causes [15]. We undertake separate analyses for adult, child, and neonatal deaths. It is important to note that for each train-test data split from the PHMRC study, we compute a new set of tariffs based only on that particular training set. In other words, in no case are test data used in the development of the tariff that is applied to that particular test dataset.

We have repeated the development of tariffs and tariff scores both including household recall of health care experience (HCE) and excluding these variables [10] in order to estimate the method’s performance in settings where access to health care is uncommon. HCE items capture any information that the respondent may know about the decedent’s experiences with health care. For example, the items “Did [name] have AIDS?” or “Did [name] have cancer?” would be considered HCE items. Text collected from the medical record is also classified as HCE information. For example, the word “malaria” might be written on the decedent’s health records and would be considered an HCE item. Based on the validation dataset collected by the PHMRC [10], we were able to estimate causes of death and evaluate the method for 34 causes for adults, 21 causes for children, and 11 causes for neonates.

We compared Tariff’s performance to PCVA for the same cause lists and item sets for the adult and child results; however, PCVA produces estimates for only six neonate causes and consequently direct comparison for neonates was not possible. In order to analyze the performance of Tariff in comparison with PCVA across a variety of cause of death distributions, 500 different cause compositions based on uninformative Dirichlet sampling [10] were processed with both Tariff and PCVA. The frequency with which Tariff outperforms PCVA in both chance-corrected concordance and CSMF accuracy is then computed across these 500 population cause-specific constructs.
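As a rough illustration of the tariff and tariff score definitions given in the Methods, the following Python sketch computes rounded tariffs from dichotomized training responses, keeps the top items per cause by absolute tariff, and sums the tariffs of endorsed items for one death. The function names, the dict-of-lists layout, and the `iqr_floor` guard against a zero interquartile range are assumptions made for this example and are not taken from the study's code; quartiles use the standard library's "inclusive" rule, which may differ from the quartile rule the authors used.

```python
from statistics import median, quantiles


def compute_tariffs(responses, causes, iqr_floor=0.001):
    """Return {cause: [tariff for each item]} from 0/1 training responses.

    responses: list of rows, one per death, each a list of 0/1 item responses.
    causes: list of true cause labels, one per death.
    iqr_floor: hypothetical guard so a zero interquartile range cannot
        cause a division by zero (not described in the source).
    """
    labels = sorted(set(causes))
    n_items = len(responses[0])
    # x[cause][j]: fraction of deaths from that cause endorsing item j
    x = {}
    for cause in labels:
        rows = [r for r, c in zip(responses, causes) if c == cause]
        x[cause] = [sum(r[j] for r in rows) / len(rows) for j in range(n_items)]

    tariffs = {cause: [] for cause in labels}
    for j in range(n_items):
        rates = [x[cause][j] for cause in labels]
        q1, _, q3 = quantiles(rates, n=4, method="inclusive")
        iqr = max(q3 - q1, iqr_floor)
        med = median(rates)
        for cause in labels:
            raw = (x[cause][j] - med) / iqr
            tariffs[cause].append(round(raw * 2) / 2)  # round to nearest 0.5
    return tariffs


def keep_top_items(tariffs, top_n=40):
    """Zero out all but the top_n items per cause, ranked by absolute tariff."""
    kept = {}
    for cause, row in tariffs.items():
        order = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        top = set(order[:top_n])
        kept[cause] = [t if j in top else 0.0 for j, t in enumerate(row)]
    return kept


def tariff_scores(tariffs, response):
    """Sum the tariffs of endorsed items (response == 1) for each cause."""
    return {cause: sum(t * x for t, x in zip(row, response))
            for cause, row in tariffs.items()}
```

Because each item's tariff depends only on that item's endorsement rates, the same arithmetic can be laid out column by column in a spreadsheet, which is the transparency the method aims for.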

Results

Tariffs

Table 1 shows selected tariffs that exemplify pathological plausibility and how certain signs/symptoms are

[Figure 1: flowchart. Steps shown, starting from the full dataset for the adult module: (1) random sampling without replacement, by cause, into a 75% train dataset and a 25% test dataset; (2) the test dataset is resampled with replacement based on an uninformative Dirichlet draw; (3) a tariff is computed from the train dataset for each cause, for each sign/symptom, giving the tariff matrix; (4) for each death in the resampled test dataset and in the train dataset, scores for each cause are calculated by summing tariffs (from the train dataset tariff matrix) for endorsed items; (5) the training dataset is resampled with replacement to have a uniform cause distribution; (6) each cause score for each death is ranked relative to the corresponding cause score distribution in the train data; (7) cause assignment is decided by the cause with the highest rank on each death; (8) cause-specific mortality fraction accuracy and chance-corrected concordance are computed for all deaths in the resampled test dataset.]

Figure 1 Schematic diagram showing the process of making cause assignments starting with the full dataset. All steps within the boxed area are repeated 500 times.
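The resampling and ranking steps summarized in Figure 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and data layout are hypothetical, the Dirichlet draw is generated from normalized Gamma(1, 1) variates, ties in rank are broken by whichever cause comes first, and the exact resampling sizes used in the study are not specified here.

```python
import random
from bisect import bisect_right


def group_by_cause(causes):
    """Map each cause label to the list of death indices with that cause."""
    by_cause = {}
    for idx, cause in enumerate(causes):
        by_cause.setdefault(cause, []).append(idx)
    return by_cause


def dirichlet_resample(causes, n_out, rng):
    """Resample death indices with replacement so the cause composition
    follows one uninformative Dirichlet(1, ..., 1) draw."""
    by_cause = group_by_cause(causes)
    labels = sorted(by_cause)
    draws = [rng.gammavariate(1.0, 1.0) for _ in labels]
    total = sum(draws)
    csmf = [d / total for d in draws]
    picked_causes = rng.choices(labels, weights=csmf, k=n_out)
    indices = [rng.choice(by_cause[c]) for c in picked_causes]
    return indices, dict(zip(labels, csmf))


def uniform_resample(causes, n_per_cause, rng):
    """Resample training death indices with replacement so every cause
    contributes the same number of deaths."""
    by_cause = group_by_cause(causes)
    picked = []
    for cause in sorted(by_cause):
        picked.extend(rng.choices(by_cause[cause], k=n_per_cause))
    return picked


def rank_assign(test_scores, train_scores):
    """Rank one death's score for each cause against that cause's scores
    over all deaths in the uniformly resampled training set.

    test_scores: {cause: tariff score for this death}
    train_scores: {cause: [scores for that cause over training deaths]}
    Rank 1 is best; the cause with the best rank is assigned.
    """
    ranks = {}
    for cause, score in test_scores.items():
        ordered = sorted(train_scores[cause])
        n_higher = len(ordered) - bisect_right(ordered, score)
        ranks[cause] = n_higher + 1
    best = min(ranks, key=ranks.get)
    return best, ranks
```

For example, with training scores `{"A": [10, 8, 6, 4], "B": [0, 1, 2, 3]}` and a test death scoring `{"A": 5, "B": 2.5}`, the raw score favors A, but the death ranks fourth for A and second for B, so B is assigned. This is the situation the ranking step is meant to handle when some causes have inherently higher tariffs.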

strongly predictive of certain causes as compared to other causes. For example, in predicting diabetes with skin infection, the sign of an “ulcer oozing pus” has a positive response frequency that is 25 interquartile ranges above the median frequency for this sign across causes. This will cause any death reporting this sign to be highly ranked within the cause prediction scores. The word “cancer” being written on one’s health care records has a relatively high tariff for both esophageal cancer and cervical cancer, demonstrating that it has predictive value despite being less specific than other items. It is interesting to note that approximately 50% of


Table 1 Selected tariffs in the adult module of the PHMRC dataset

Causes                             Ulcer oozed pus   Lump in the neck   Convulsions   Pain in left arm   Free text: “cancer”
Diabetes with skin infection       25                0.5                0.5           1                  -0.5
Esophageal cancer                  -0.5              8.5                -1.5          1                  4.5
Hypertensive disorder (maternal)   -0.5              0.5                7             -0.5               -0.5
Acute myocardial infarction        0                 -1                 -0.5          4.5                0.5
Cervical cancer                    0                 0.5                -0.5          -0.5               7

Columns are signs/symptoms; rows are causes. Tariffs were calculated as explained in the Methods section. These particular tariffs were selected because they demonstrate how the method can be somewhat intuitive from a medical perspective. For example, an ulcer oozing pus is a plausible sign for a person who died of diabetes with skin infection, though it seems somewhat less plausible for someone dying of acute myocardial infarction.

maternal hypertensive disorder deaths reported convulsions, and 50% of diabetes with skin infection deaths reported ulcer oozing pus, yet these two sign-cause combinations have markedly different tariffs. This reflects how the tariff computation can capture both the strength and uniqueness of a sign/symptom in predicting a cause. These two examples have equal strength in terms of the sign/symptom-cause endorsement rate, but the sign “ulcer oozing pus” is more unique to diabetes with skin infection than convulsions are to hypertensive disorders. Additional files 1, 2, and 3 show the tariffs (derived from the full dataset) for the top 40 items based on tariff absolute value for each cause for the adult, child, and neonate modules, respectively.

Validation of Tariff cause assignment

Individual death assignment

Table 2 compares overall median chance-corrected concordance across 500 train-test data splits for Tariff and PCVA for adults, children, and neonates. Among adults, Tariff outperforms PCVA when health care experience is excluded and is not significantly different than PCVA when health care experience information is included. PCVA outperforms Tariff in chance-corrected concordance for the child module both with and without health care experience information. Tariff achieves 21.6% (without HCE) and 23.9% (with HCE) chance-corrected concordance in the neonate module analysis. Neonate results between Tariff and PCVA are not directly comparable because PCVA cannot predict causes of death for all 11 neonate causes and consequently aggregates the five premature delivery causes into a single premature delivery cause.

Figure 2 provides details on how well Tariff identifies the true cause as the second, third, fourth through to sixth cause in the list. For all age groups, partial chance-corrected concordance increases steadily as extra causes are considered on the list. It is important to note that partial chance-corrected concordance includes a correction factor for concordance due to chance. Tariff achieves 66% partial chance-corrected concordance if three cause assignments are made for adults, 62% for children, and 52% for neonates.

Additional file 4 provides cause-specific chance-corrected concordances for Tariff. For adults, when excluding household recall of health care experience, Tariff yields median chance-corrected concordances over 50% for a number of causes, including bite of venomous animal, breast cancer, cervical cancer, drowning, esophageal cancer, fires, homicide, maternal, other injuries, and road traffic. Addition of health care experience raises chance-corrected concordance over 50% for AIDS, asthma, and stroke. Additional file 4 also shows that in children without household recall of health care experience, median chance-corrected concordance is over 50% for falls, malaria, and measles. With HCE, the list expands to also include AIDS, bite of venomous animal, drowning, fires, road traffic, and violent death. In neonates, the best performance for Tariff is for preterm delivery and sepsis/birth asphyxia, preterm delivery with respiratory distress syndrome, congenital malformation, and stillbirth. Figures 3, 4, and 5 show visual comparisons of each cause-specific chance-corrected concordance with and without HCE for adults, children, and

Table 2 Median chance-corrected concordance (%) for Tariff and PCVA with 95% uncertainty interval (UI), by age group with and without HCE information

Age group   HCE information   Tariff median   Tariff 95% UI   PCVA median   PCVA 95% UI
Adult       No HCE            34.3            (34.1, 34.5)    29.7          (29.4, 29.8)
Adult       HCE               44.5            (44.2, 44.7)    44.6          (44.3, 44.8)
Child       No HCE            28.8            (28.4, 29.2)    36.3          (35.9, 36.6)
Child       HCE               39.0            (38.4, 39.4)    47.8          (47.1, 48.3)
Neonate     No HCE            21.6            (21.2, 22.2)    27.6          (27.2, 28.0)
Neonate     HCE               23.9            (23.6, 24.4)    33.3          (32.8, 33.7)

Tariff outperforms or matches the performance of PCVA in the adult module, while PCVA outperforms Tariff in the child module. Results are not comparable for neonates since PCVA can only make cause assignments on six neonate causes, while Tariff makes cause assignments on 11 neonate causes.
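The text describes chance-corrected concordance only as a sensitivity-based measure and refers to [15] for its definition. The Python sketch below assumes the commonly used form, in which the sensitivity for each cause is corrected by the 1/N probability of a correct assignment by chance among N causes, and the overall value is the mean of the cause-specific values. That formula, the averaging rule, and the function names are assumptions of this example rather than statements from this article.

```python
def cause_specific_ccc(true_causes, predicted_causes, cause_list):
    """Chance-corrected concordance per cause.

    Assumed form: (sensitivity - 1/N) / (1 - 1/N), with N = len(cause_list).
    Causes with no true deaths in the data are skipped.
    """
    n = len(cause_list)
    result = {}
    for cause in cause_list:
        preds = [p for t, p in zip(true_causes, predicted_causes) if t == cause]
        if not preds:
            continue
        sensitivity = sum(p == cause for p in preds) / len(preds)
        result[cause] = (sensitivity - 1 / n) / (1 - 1 / n)
    return result


def overall_ccc(true_causes, predicted_causes, cause_list):
    """Mean of the cause-specific values (assumed aggregation rule)."""
    per_cause = cause_specific_ccc(true_causes, predicted_causes, cause_list)
    return sum(per_cause.values()) / len(per_cause)
```

Under this form, a method that assigns causes uniformly at random scores about 0 and a perfect method scores 1, so values such as those in Table 2 can be compared across cause lists of different lengths.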


Figure 2 Partial chance-corrected concordance for the adult, child, and neonate predictions for making multiple cause of death assignments for each death. Multiple assignments can be made by looking at the top-ranked causes based on the tariff scores for each cause. For a given death, for example, AIDS, TB, and pneumonia might be the three most likely causes of death, thus improving the probability that one of those causes is correct. The partial chance-corrected concordance calculation includes a correction term to compensate for the inherently higher probability of making a correct assignment when multiple causes are assigned.
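The correction term mentioned in the Figure 2 caption is not written out in this article. A minimal Python sketch, assuming the form (C - k/N) / (1 - k/N), where C is the fraction of deaths whose true cause is among the top k assigned causes and N is the number of causes, is given below. Both the formula and the function name are assumptions for illustration; the authoritative definition is in [15].

```python
def partial_ccc(true_causes, ranked_predictions, k, n_causes):
    """Partial chance-corrected concordance for the top k assigned causes.

    true_causes: list of true cause labels, one per death.
    ranked_predictions: list of cause lists, best-ranked cause first.
    Assumed form: (C - k/N) / (1 - k/N); requires k < n_causes.
    """
    if not 0 < k < n_causes:
        raise ValueError("k must satisfy 0 < k < n_causes")
    hits = sum(t in ranked[:k]
               for t, ranked in zip(true_causes, ranked_predictions))
    concordance = hits / len(true_causes)
    chance = k / n_causes
    return (concordance - chance) / (1 - chance)
```

With k = 1 this reduces to an overall chance-corrected concordance computed over all deaths at once, which makes clear why the correction grows as more causes are listed per death.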

neonates, respectively. These figures also highlight the value of adding HCE information and demonstrate how individual cause assignment is difficult for certain causes when HCE information is not available. For example, the important adult causes of AIDS, malaria, and TB have low concordance when HCE information is withheld, though performance does improve dramatically when HCE information is added. Similarly, chance-corrected concordance improves roughly four-fold for AIDS in the child module when HCE is added. Figure 6 shows a comparison for adults with HCE of concordance achieved with Tariff and PCVA applied to the same 500 test datasets. These results show that PCVA varies more than Tariff in chance-corrected concordance, despite their median across 500 splits being approximately the same.

CSMF estimation

To estimate Tariff’s ability to accurately determine CSMFs, we predicted causes of death for 500 different test datasets with varying cause compositions. Table 3 shows that Tariff yields more accurate estimates of CSMFs than PCVA for adults and children, both with and without health care experience information. Since PCVA cannot make cause assignments on the full list of 11 neonate causes, it is not possible to directly compare PCVA and Tariff in accuracy.

Additional file 5 shows the slope, intercept, and root mean squared error (RMSE) of regressing the estimated CSMF as a function of true CSMF for all causes across 500 test splits. We have selected four adult causes based on Additional file 5 to illustrate a range of cases where Tariff produces good to relatively poor estimates of the CSMF as a function of the true CSMF. Figure 7 shows the estimated CSMF for drowning compared to the true CSMF for drowning in adults across 500 test datasets. In general, across a wide range of true CSMFs, Tariff performs well in estimating the CSMF from this cause. This quality is further evidenced by the results from the regression. Drowning has an intercept of 1.5%, which means that even if there are no true deaths from drowning in a VA dataset, Tariff will tend to predict a CSMF of approximately 1.5%. However, the slope of 0.817 and the RMSE of 0.006 also indicate that estimations tend to track the true CSMFs fairly closely, and that estimated CSMFs will not vary widely for a given true CSMF. For breast cancer, shown in Figure 8, Tariff can accurately determine the mortality fractions in test splits with small to modest numbers of true breast cancer deaths;

[Figure 3: horizontal bar chart with one pair of bars (HCE, No HCE) per adult cause; x-axis: Chance-Corrected Concordance (%), 0 to 100.]

Figure 3 Median chance-corrected concordance (%) across 500 test splits, by adult cause with and without HCE.

however, in test splits with high breast cancer mortality fractions, Tariff tends to underestimate the fraction. The results from the regression for breast cancer show that estimates are slightly less noisy than for drowning and that the method will start to systematically underestimate CSMFs beyond a true CSMF of approximately 2.5%. Figure 9 shows the same relationship for maternal, with a slightly higher threshold for when the method begins to underestimate CSMFs. In this case, however, while there is still a generally good relationship between the true and estimated CSMFs, at low true CSMFs Tariff tends to overestimate the cause fraction, while at very high CSMFs, it has a slight tendency to underestimate. At the other end of the spectrum, Tariff does a poor job of estimating the population fraction of deaths due to stomach cancer, shown in Figure 10, and tends to

[Figure 4: horizontal bar chart with one pair of bars (HCE, No HCE) per child cause; x-axis: Chance-Corrected Concordance (%), 0 to 100.]

Figure 4 Median chance-corrected concordance (%) across 500 test splits, by child cause with and without HCE.

[Figure 5: horizontal bar chart with one pair of bars (HCE, No HCE) per neonate cause; x-axis: Chance-Corrected Concordance (%), 0 to 100.]

Figure 5 Median chance-corrected concordance (%) across 500 test splits, by neonate cause with and without HCE.


Figure 6 Chance-corrected concordance comparison scatter for 500 splits of PCVA and Tariff adult module estimations. These results included the use of HCE information.

underestimate the true cause fraction above 2%. The RMSEs provide a measure of the noise or precision in each cause’s predictions. In the adult predictions including the use of HCE information, the RMSE ranged from 0.005 for maternal causes to 0.019 for other noncommunicable diseases. We performed similar analyses for the child and neonate results (full regression results also shown in Additional file 5). Figure 11 demonstrates how Tariff tends to overpredict measles CSMFs in populations with a smaller measles fraction. As the true measles fraction increases, however, Tariff does not systematically over- or underestimate the mortality fractions to the extent seen in other causes. Furthermore, the estimates for measles CSMF in children are much noisier than other examples for adults. This quality is also evidenced by the higher RMSE of 0.019. For child sepsis, in contrast, Tariff tends to underestimate CSMFs as the true cause fraction increases. The true versus estimated sepsis CSMFs are shown in Figure 12. The RMSEs for children are higher than for adults, ranging from 0.013 for road traffic accidents to 0.033 for malaria. The neonate CSMF estimation tends to differ from the true cause fraction more frequently than for child or adult deaths. Congenital malformation, shown in Figure 13, exemplifies a cause for which Tariff can roughly determine the correct CSMF regardless of the true CSMF size. However, other neonatal causes such as preterm delivery with respiratory distress syndrome are subject to much noisier estimates, as shown in Figure

Table 3 Median CSMF accuracy for Tariff and PCVA with 95% UI, by age group with and without HCE information

Age group   HCE information   Tariff median   Tariff 95% UI    PCVA median   PCVA 95% UI
Adult       No HCE            0.695           (0.690, 0.699)   0.624         (0.619, 0.631)
Adult       HCE               0.745           (0.739, 0.753)   0.675         (0.669, 0.680)
Child       No HCE            0.642           (0.635, 0.651)   0.632         (0.626, 0.642)
Child       HCE               0.709           (0.704, 0.715)   0.682         (0.671, 0.690)
Neonate     No HCE            0.663           (0.655, 0.671)   0.695         (0.682, 0.705)
Neonate     HCE               0.679           (0.670, 0.689)   0.733         (0.719, 0.743)

PCVA results for neonates are shown at a six-cause level, since analysis was not possible at the same 11-cause level as Tariff.
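CSMF accuracy is described in the Methods as the sum of absolute CSMF errors compared to the maximum possible error. The Python sketch below assumes the usual concrete form, in which the maximum possible total error is 2 × (1 − the smallest true CSMF); that denominator and the function names are assumptions of this example, with the authoritative definition given in [15].

```python
from collections import Counter


def csmf(causes, cause_list):
    """Cause-specific mortality fractions for a list of cause labels."""
    counts = Counter(causes)
    return {cause: counts[cause] / len(causes) for cause in cause_list}


def csmf_accuracy(true_causes, predicted_causes, cause_list):
    """1 - (sum of absolute CSMF errors) / (maximum possible error).

    Assumed maximum error: 2 * (1 - min true CSMF), reached when every
    death is assigned to the cause with the smallest true fraction.
    """
    true_f = csmf(true_causes, cause_list)
    pred_f = csmf(predicted_causes, cause_list)
    abs_error = sum(abs(true_f[c] - pred_f[c]) for c in cause_list)
    max_error = 2 * (1 - min(true_f.values()))
    return 1 - abs_error / max_error
```

As a worked case, true fractions of 0.5, 0.3, and 0.2 against predicted fractions of 0.4, 0.4, and 0.2 give a total absolute error of 0.2 and a maximum error of 1.6, so the accuracy is 1 − 0.2/1.6 = 0.875.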

[Figures 7 to 14: scatter plots, one point per test dataset; x-axis: True Cause Fraction (%), 0 to 20; y-axis: Estimated Cause Fraction (%), 0 to 20.]

Figure 7 True versus estimated mortality fractions for drowning, adult module with HCE information.

Figure 8 True versus estimated mortality fractions for breast cancer, adult module with HCE information.

Figure 9 True versus estimated mortality fractions for maternal causes, adult module with HCE information.

Figure 10 True versus estimated mortality fractions for stomach cancer, adult module with HCE information.

Figure 11 True versus estimated mortality fractions for measles, child module with HCE information.

Figure 12 True versus estimated mortality fractions for sepsis, child module with HCE information.

Figure 13 True versus estimated mortality fractions for congenital malformation, neonate module with HCE information.

Figure 14 True versus estimated mortality fractions for preterm delivery with respiratory distress syndrome, neonate module with HCE information.
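The slope, intercept, and RMSE reported for each cause come from regressing estimated CSMF on true CSMF across the test splits. A minimal Python sketch of such a fit, using ordinary least squares and taking RMSE as the root mean squared residual around the fitted line, is shown below. Whether the authors computed RMSE in exactly this way is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def fit_estimated_vs_true(true_csmf, estimated_csmf):
    """Ordinary least squares fit of estimated CSMF on true CSMF.

    Returns (slope, intercept, rmse), where rmse is the root mean squared
    residual around the fitted line (assumed definition).
    """
    n = len(true_csmf)
    mean_x = sum(true_csmf) / n
    mean_y = sum(estimated_csmf) / n
    sxx = sum((x - mean_x) ** 2 for x in true_csmf)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_csmf, estimated_csmf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(true_csmf, estimated_csmf)]
    rmse = sqrt(sum(r ** 2 for r in residuals) / n)
    return slope, intercept, rmse
```

Read this way, the drowning result (intercept 1.5%, slope 0.817) corresponds to the line estimated = 0.015 + 0.817 × true: a population with no drowning deaths would still be assigned about 1.5%, and a slope below 1 means the estimate rises more slowly than the true fraction.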


based cause assignments can be made immediately after data collection in the field. One of the key strengths of Tariff is its flexibility. Each item’s tariff for a cause is computed independently from all other items. Consequently, any instrument’s verbal autopsy items that can be mapped to one of the items in the PHMRC dataset can be evaluated using Tariff. Other methods, such as Random Forest and Simplified Symptom Pattern, require the testing data to have the same item set as the data on which the model was trained. This is an important asset of Tariff because it allows users to implement the method without having to recalculate tariffs or revise the algorithm. It can essentially be used as is for any verbal autopsy instrument with overlapping items with the PHMRC instrument. Tariff does not take into account the interdependencies of signs and symptoms conditional on particular causes. It does not take into account the complex time sequence captured in open narratives, which are often used by physicians. How can such a simple algorithm be more effective than physicians? The answer may lie in the key attributes of Tariff that distinguish it from other methods: identification of items that are unusually important for different causes through computation of the tariff and the additive rather than multiplicative nature of the tariff score. The tariffs focus attention on the specific subset of items that are most strongly related to a given cause. The additive approach may make Tariff more robust to measurement error either in the train or test datasets. Because of its simplicity, we plan to make available several different platforms on which to apply Tariff. Programs in R, Stata, and Python will be available for assigning a cause for a given death or set of deaths, as well as a version of Tariff in Excel for users without training in statistics packages. 
Tariff will also be available in the Open Data Kit for use on the Android operating system for cell phones and tablets. We hope these tools will lead to widespread testing and application of Tariff. The full sign/symptom-cause tariff matrix will also be available for user inspection and application to other verbal autopsy diagnostic methods such as Random Forest and Simplified Symptom Pattern, which rely on tariffs to identify meaningful signs and symptoms. The tariffs can also be used to refine further verbal autopsy instruments, possibly in reducing the number of survey items, since they show which specific signs/symptoms should be included for accurately predicting certain causes of death. For example, one strategy for item reduction would be to drop items that have low tariffs for all causes and then assess the change in CSMF accuracy or chance-corrected concordance when cause assignment is undertaken with the restricted item set.

14. These results are further reflected in the corresponding coefficients and intercepts seen in Additional file 5, which allow for assessment of the relationship between true and estimated CSMFs. As for adults and children, the RMSE from these regressions indicate which causes can be estimated with greater precision, even if the estimation is systematically high or low. In the neonate results including the use of HCE information, the RMSE ranged from a low of 0.023 for stillbirths to 0.051 for preterm delivery and birth asphyxia and for preterm delivery, sepsis, and birth asphyxia.

Discussion

The Tariff Method is a simple additive approach based on identifying items in a VA interview that are indicative of particular diseases. It is based on the premise that individual items or signs/symptoms should be more prominently associated with certain causes (the “signal”) compared with others (the “noise”). This simple approach performs as well as or better than PCVA for adult causes in assigning an underlying cause of death, though PCVA performs better in this comparison for child deaths. At the level of particular causes, Tariff has higher chance-corrected concordances than PCVA for 14/34 adult and 8/21 child causes. Results for neonatal deaths are not comparable due to differences in cause lists. For estimating CSMFs, Tariff performs better than PCVA for adult and child deaths in all comparisons with and without household recall of health care experience. In all comparable cases, Tariff yields higher median CSMF accuracy than PCVA. Overall, at the individual and the CSMF level, Tariff in general offers a competitive alternative to PCVA. Performance for assigning neonatal causes of death, however, is worse than for PCVA. The tariffs for each cause-item pair have already been established using Stata code, which will be available online. Using this pre-existing tariff matrix, the Tariff Method requires only multiplication and addition to make cause of death assignments for each individual death in a given dataset. Though we processed VA response data to develop our method, users need not conduct additional processing to use Tariff since our processing steps can be integrated into the code that makes cause of death assignments. The absence of a statistical model or complex computational algorithm means that the steps involved in assigning cause of death to a particular death can be completed in a spreadsheet and are readily available for user scrutiny.
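The "multiplication and addition" step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' released code: the dictionary layout, the function name `tariff_scores`, and the toy tariff values are hypothetical, and any further steps defined in the Methods section are omitted. Here the cause with the highest summed score is simply selected.

```python
def tariff_scores(tariffs, responses):
    """Sum tariff * response over items, separately for each cause.

    tariffs:   dict cause -> dict item -> tariff (hypothetical layout).
    responses: dict item -> 0/1 response for a single death.
    Items missing from the tariff matrix contribute nothing to the score.
    """
    scores = {}
    for cause, by_item in tariffs.items():
        scores[cause] = sum(by_item.get(item, 0.0) * value
                            for item, value in responses.items())
    return scores


# Toy tariff matrix with made-up values, for illustration only.
toy_tariffs = {
    "drowning": {"found_in_water": 9.0, "fever": -1.0, "headache": 0.5},
    "malaria": {"found_in_water": -0.5, "fever": 4.0, "headache": 0.8},
}
one_death = {"found_in_water": 1, "fever": 0, "headache": 1}

scores = tariff_scores(toy_tariffs, one_death)
assigned = max(scores, key=scores.get)  # cause with the highest summed score
```

Because each item's contribution is added independently, an instrument that lacks some items still produces a score from the items it shares with the tariff matrix.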
Further, the tariff matrix and algorithm can be implemented on a simple device such as a cell phone - the Open Data Kit research team at the University of Washington has already implemented the tariff algorithm on an Android cell phone using their Free/Libre Open-Source Survey Platform. In other words, tariff-



James et al. Population Health Metrics 2011, 9:31 http://www.pophealthmetrics.com/content/9/1/31


Given that PCVA can be costly and time consuming, it would seem that Tariff provides an attractive alternative. Compared to the current version of InterVA [16], Tariff performs markedly better. We believe that users interested in rapid, low-cost, easy-to-understand VA methods should consider Tariff. As indicated by analysis of CSMF accuracy and true versus estimated CSMF regressions, there are certain cases where Tariff may overestimate or underestimate CSMFs for particular causes. It will be important for users of Tariff to understand these limitations, particularly for the purposes of using Tariff to better inform public health decision-making. Future research may yield new techniques to more accurately determine CSMFs based on verbal autopsy through back calculation. Tariff is also attractive to those who wish to examine the exact computation by which a verbal autopsy algorithm makes a cause of death assignment. In the future, as more gold standard deaths are collected to augment existing causes in the PHMRC dataset, or for new causes, it will be straightforward to revise existing tariffs or report tariffs for new causes. This step is particularly easy compared to other computer-automated methods, for which expansion with more causes requires revision of the algorithm itself.

Additional file 2: Top 40 signs/symptoms based on absolute value tariffs for each cause in the child module. These tariffs were calculated using the formula provided in the Methods section. Additional file 3: Top 40 signs/symptoms based on absolute value tariffs for each cause in the neonate module. These tariffs were calculated using the formula provided in the Methods section. Additional file 4: Median chance-corrected concordance (%) across 500 Dirichlet splits, by age group and cause with and without HCE. Additional file 5: Slope, intercept, and RMSE from linear regression of estimated versus true CSMFs, by age group and cause with and without HCE.

Abbreviations

CSMF: cause-specific mortality fraction; HCE: health care experience; PCVA: physician-certified verbal autopsy; RMSE: root mean squared error; VA: verbal autopsy

Acknowledgements

This research was conducted as part of the Population Health Metrics Research Consortium: Christopher J.L. Murray, Alan D. Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D. Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Charles Atkinson for managing the PHMRC verbal autopsy database and Alireza Vahdatpour, Benjamin Campbell, Michael K. Freeman, and Charles Atkinson for intellectual contributions to the analysis. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Conclusion Verbal autopsies are likely to become an increasingly important data collection platform in areas of the world with minimal health information infrastructure. To date, methods for evaluating verbal autopsies have either been expensive or time-consuming, as is the case with PCVA, or they have been computationally complex and difficult for users to implement in different settings. This has inhibited the widespread implementation of verbal autopsy as a tool for policymakers and health researchers. Tariff overcomes both of these challenges. The method is transparent, intuitive, and flexible, and, importantly, has undergone rigorous testing to ensure its validity in various settings through the use of the PHMRC verbal autopsy dataset. Using the method on verbal autopsies to determine both individual-level cause assignment and cause-specific mortality fractions will greatly increase the availability and utility of cause of death information for populations in which comprehensive and reliable medical certification of deaths is unlikely to be achieved for many years to come, but is urgently needed for health policies, programs, and monitoring progress with development goals.

Authors’ contributions

SLJ, ADF, and CJLM conceptualized the method and algorithm. SLJ performed analyses and helped write the manuscript. CJLM drafted the manuscript and approved the final version. CJLM accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 14 April 2011 Accepted: 4 August 2011 Published: 4 August 2011

References

1. Fottrell E, Byass P: Verbal Autopsy: Methods in Transition. Epidemiol Rev 2010, 32:38-55.
2. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245.
3. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, Jakob R, Kahn K, Kunii O, Lopez AD, Murray CJL, Nahlen B, Rao C, Sankoh O, Setel PW, Shibuya K, Soleman N, Wright L, Yang G: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85:570-571.
4. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31.
5. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37.

Additional material Additional file 1: Top 40 signs/symptoms based on absolute value tariffs for each cause in the adult module. These tariffs were calculated using the formula provided in the Methods section.


6. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7:e1000325.
7. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29.
8. Murray CJL, James SL, Birnbaum JK, Freeman MK, Lozano R, Lopez AD, the Population Health Metrics Research Consortium (PHMRC): Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:30.
9. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4:e327.
10. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
11. Gakidou E, Lopez AD: What do children die from in India today? The Lancet 2010, 376:1810-1811.
12. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.
13. King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statistical Science 2008, 23:78-91.
14. Flaxman AD, Vahdatpour A, James SL, Birnbaum JK, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:35.
15. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
16. Lozano R, Freeman MK, James SL, Campbell B, Lopez AD, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:50.

doi:10.1186/1478-7954-9-31 Cite this article as: James et al.: Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Population Health Metrics 2011 9:31.



Lozano et al. Population Health Metrics 2011, 9:32 http://www.pophealthmetrics.com/content/9/1/32

RESEARCH

Open Access

Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards

Rafael Lozano1*, Alan D Lopez2, Charles Atkinson1, Mohsen Naghavi1, Abraham D Flaxman1 and Christopher JL Murray1 for the Population Health Metrics Research Consortium (PHMRC)

Abstract

Background: Physician review of a verbal autopsy (VA) and completion of a death certificate remains the most widely used approach for VA analysis. This study provides new evidence about the performance of physician-certified verbal autopsy (PCVA) using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 VAs. The study was also designed to analyze issues related to PCVA, such as the impact of a second physician reader on the cause of death assigned, the variation in performance with and without household recall of health care experience (HCE), and the importance of local information for physicians reading VAs.

Methods: The certification was performed by 24 physicians. The assignment of VA was random and blinded. Each VA was certified by one physician. Half of the VAs were reviewed by a different physician with household recall of health care experience included. The completed death certificate was processed for automated ICD-10 coding of the underlying cause of death. PCVA was compared to gold standard cause of death assignment based on strictly defined clinical diagnostic criteria that are part of the Population Health Metrics Research Consortium (PHMRC) gold standard verbal autopsy study.

Results: For individual cause assignment, the overall chance-corrected concordance for PCVA against the gold standard cause of death is less than 50%, with substantial variability by cause and physician. Physicians assign the correct cause around 30% of the time without HCE, and addition of HCE improves performance in adults to 45% and slightly higher in children to 48%. Physicians estimate cause-specific mortality fractions (CSMFs) with considerable error for adults, children, and neonates. Only for neonates for a cause list of six causes with HCE is accuracy above 0.7. In all three age groups, CSMF accuracy improves when household recall of health care experience is available.

Conclusions: Results show that physician coding for cause of death assignment may not be as robust as previously thought. The time and cost required to initially collect the verbal autopsies must be considered in addition to the analysis, as well as the impact of diverting physicians from servicing immediate health needs in a population to review VAs. All of these considerations highlight the importance and urgency of developing better methods to more reliably analyze past and future verbal autopsies to obtain the highest quality mortality data from populations without reliable death certification.

Keywords: Verbal autopsy, cause of death certification, validation, physician review

* Correspondence: rlozano@uw.edu
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA
Full list of author information is available at the end of the article

© 2011 Lozano et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

Verbal autopsy (VA) is widely used in research studies, demographic surveillance sites, and population monitoring systems [1-6]. While alternative approaches such as InterVA, the Symptom Pattern Method, and direct estimation of cause-specific mortality fractions (CSMFs) [7-13] have been used, physician review of a verbal autopsy and completion of a death certificate remains the most widely used approach for VA analysis. Physician review of VAs is based on the premise that a physician assigned the task in a given setting can correctly interpret reported signs and symptoms and occasionally household recall of health care experience (HCE) to accurately assign causes of death. Validation studies comparing physician-certified verbal autopsy (PCVA) to hospital records have shown mixed results [14-21]. The fraction of deaths where the true cause is accurately predicted has varied from 0% to 95% for different causes in these studies.

PCVA can be implemented in many different ways. Some studies or population sites use the World Health Organization-recommended VA instrument [22,23] while other sites use much more abbreviated approaches with more or less emphasis on the open or free-text component of an instrument [24,25]. PCVA also varies in the degree to which physicians undertaking VA review are trained and the curriculum of the training. Operationalization differs by the number of physicians reading each VA, the methods used to adjudicate when different physicians disagree, and the procedures to map International Classification of Diseases (ICD) codes to the physician-assigned underlying cause of death [26,27]. Interpreting the available validation studies is complicated by the considerable heterogeneity across studies in these various dimensions [28,29].

Many of the existing validation studies have several other limitations. First, in principle, validation studies compare a physician-assigned cause of death to a gold standard cause of death. But all published validation studies to date have used some form of hospital-assigned cause of death or chart review of deaths in hospital as the gold standard [30]. The quality of hospital records is highly variable, as is the underlying quality of clinical diagnosis by physicians given differences in the availability of laboratory, imaging, and pathology services. The lack of clear gold standards means that validation studies are effectively a comparison of two imperfect assignments of cause of death, not a real assessment of criterion validity. Second, by design, VA validation studies analyze deaths that occurred in a hospital or had hospital visits just prior to death. Household recall of the health care experience, including whether health workers provided documentation for the cause of hospitalization or cause of death, is part of most VA instruments. Studies in China have already shown that physician readers of VA are strongly influenced by this household recall of health care experience [11]. When health care experience recall is included in the validation studies, performance will be exaggerated when compared to how the VA will perform in populations with little or reduced access to health care. Finally, different VA validation studies have reported a wide range of metrics of validity including cause-specific sensitivity, specificity, concordance, Cohen’s kappa, absolute CSMF errors, and relative CSMF errors, further complicating comparisons of performance [21,24,31,32].

The Population Health Metrics Research Consortium (PHMRC) has undertaken a five-year study to develop a range of new analytical methods for verbal autopsy and test these methods using data collected at six sites in four countries (Mexico, Tanzania, India, and the Philippines) [33]. The PHMRC study is unique both in terms of the size of the validation dataset (12,542 deaths in neonates, children, and adults) and the use of rigorously defined clinical diagnostic criteria for a death to be included in the study as a gold standard cause of death. The study was also designed to provide new evidence on issues related to PCVA, such as the impact of a second physician reader on the cause of death assigned, the variation in performance with and without household recall of health care experience, and the importance of local prior information for physicians reading VAs.

Methods Gold standard cause of death assignment

The design, implementation, and general descriptive results for the PHMRC gold standard verbal autopsy validation study are described elsewhere [33]. Of note for this study, gold standard cause of death assignment was based on strict clinical diagnostic criteria defined prior to data collection. The study protocol defined three levels of cause of death assignment based on the diagnostic documentation: level 1, 2A, and 2B. Level 1 diagnoses are the highest level of diagnostic certainty possible for that condition, consisting of either an appropriate laboratory test or X-ray with positive findings, as well as medically observed and documented illness signs. Level 2A diagnoses are of moderate certainty, consisting of medically observed and documented illness signs. Level 2B was used in place of level 2A if medically observed and documented illness signs were not available, but records existed for treatment of a particular condition. Level 1 criteria were intended for all gold standard cases, and only if it proved impossible to gather enough cases of a particular condition was it allowable to use the level 2A or 2B definition. In addition to specific causes included


material and information from the death certificate was first translated into English. For each VA, the physician would read the instrument and complete a WHO standard death certificate. The completed death certificate was processed through the US Centers for Disease Control and Prevention’s Mortality Medical Data System (MMDS) software [36] for automated ICD-10 coding of the underlying cause of death. Approximately 25% of certificates were rejected by the MMDS software. These rejected certificates were sent to the National Institute of Health Sciences in Sri Lanka for manual ICD-10 coding. The ICD-10 codes were then mapped to the PHMRC cause list to allow for direct comparison to the gold standard. Figure 2 summarizes the physician review process.

in the list, residual categories include deaths that occur from other causes, clustered according to Global Burden of Disease categories to allow for a balanced distribution of residual causes in the data [34]. For the analysis in this paper, we present results pooling both level 1 and level 2 gold standard causes of death. Additional file 1 provides the number of adult, child, and neonatal deaths by cause used for the comparative analyses reported in this paper. Organization of physician review of VAs

Physician reviews of VAs were organized to allow testing of multiple hypotheses regarding PCVA. We wanted to evaluate the performance of PCVA in settings with and without access to health care services. To achieve this, each VA was read by a single physician, excluding items on household recall of HCE by the respondent. Half of the VAs were additionally reviewed by another physician chosen at random with household recall of health care experience included. Variables reflecting household recall of health care experience include knowledge of clinical diagnoses, records from hospital visits, death certificates, and the open-ended narrative response [33]. VAs excluding HCE are a proxy for how PCVA will perform in the community for deaths that have not occurred in a hospital or where the deceased did not have contact with the health care system. Figure 1 illustrates this review process. To assess whether having two readers changes the performance of VA, 10% of VAs (5% with HCE) were chosen at random within each cause for review by a second physician at the same site. When the two physicians assigned different causes of death, the VA was sent to a third reader. If all three physicians disagreed, the death was assigned as indeterminate. In this paper, we do not present the results of this substudy but note that second and third review did not improve performance and in some cases made performance worse. To assess the impact of local knowledge on reading VAs, an additional 10% of VAs (5% with HCE) were assigned to a different physician from another site in another country. Physicians in four sites were recruited to read VAs. The 24 physicians were active practitioners, English-speaking, and computer-literate. A three-day training course was organized and conducted by an experienced VA analyst to provide all physicians with a similar basis for their work.
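The second- and third-reader rule described above can be written as a short Python sketch. The function name `adjudicate` and the string label are hypothetical. The text states what happens when all three readers disagree; that a third reader who agrees with one of the first two settles the cause is an assumption made here.

```python
def adjudicate(first, second, third=None):
    """Combine up to three physician-assigned causes for one VA.

    Assumption (not stated in the text): when the third reader agrees
    with either of the first two, that shared cause is assigned.
    """
    if first == second:
        return first  # the two readers agree, no third review needed
    if third is None:
        raise ValueError("a third review is required when two readers disagree")
    if third == first or third == second:
        return third  # two of three readers agree
    return "indeterminate"  # all three readers disagree
```

For example, `adjudicate("stroke", "asthma", "asthma")` returns `"asthma"` under this assumption, while three different causes yield `"indeterminate"`.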
The training curriculum was based on a customized version of the Sample Vital Registration with Verbal Autopsy (SAVVY) manual [35]. VAs were randomly assigned to physicians. Household recall of health care experience and records were identified as direct diagnosis questions, medical records, death certificates, and open-ended responses. For reviews excluding these items, physicians were shown a PDF of the VA instrument without this information provided. For the 10% of VAs sent to another country, the open-ended

Data analysis

We have analyzed the performance of physician review using the metrics recommended by Murray et al. (2011) [37]. The analyses for neonates, children, and adults were conducted separately. The numbers of causes including residual causes of death were 34 causes for adults, 21 for children, and six for neonates. The reasons behind the decision to reduce the number of causes from the original design are explained in detail elsewhere [33]. In the case of neonates and specifically for PCVA analysis, the cause list had to be reduced to five causes of death plus stillbirths. This is because the set of causes included for the validation study of combinations of prematurity with various other conditions do not have unique ICD codes in the 10th revision [38]. For this study, underlying cause of death was assigned following the rules of the ICD for each sequence of causes of death that the physicians produced after reading the VA. For example, we aggregated in preterm delivery all deaths from five causes from the original list, such as preterm delivery without respiratory distress syndrome (RDS), preterm delivery (without RDS) and birth asphyxia, preterm delivery (with or without RDS) and sepsis, preterm delivery (without RDS) and sepsis/birth asphyxia, and preterm delivery with RDS. These more refined causes of death for neonates reflect the presence of comorbid conditions; while they have clear relevance to understanding patterns of neonatal mortality, they do not map to the ICD-10. To compute the median chance-corrected concordance and CSMF accuracy for each category, we first created 500 test datasets with true CSMF compositions drawn from an uninformative Dirichlet distribution for the relevant number of causes by sampling within each cause with replacement. For each draw, we compute chance-corrected concordance and CSMF accuracy and report the median value across the draws. We also calculated a linear regression of true and estimated CSMFs for each cause. 
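The resampling procedure can be sketched in Python using only the standard library. Several details are assumptions: the function and variable names are hypothetical, per-cause counts are obtained by rounding, and the CSMF accuracy formula is recalled from the cited metrics paper [37] rather than stated in this article, so it should be checked against that source.

```python
import random


def draw_csmf(causes, rng):
    """Draw one CSMF composition from a flat Dirichlet(1, ..., 1).

    Normalizing independent Gamma(1, 1) draws yields a Dirichlet sample.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in causes]
    total = sum(draws)
    return {cause: d / total for cause, d in zip(causes, draws)}


def build_test_set(deaths_by_cause, n_deaths, rng):
    """Resample deaths with replacement within each cause to match a drawn CSMF."""
    causes = sorted(deaths_by_cause)
    csmf = draw_csmf(causes, rng)
    test_set = []
    for cause in causes:
        n_cause = round(csmf[cause] * n_deaths)  # rounding is an assumption
        picks = rng.choices(deaths_by_cause[cause], k=n_cause)
        test_set.extend((cause, death_id) for death_id in picks)
    return test_set


def csmf_accuracy(true_csmf, est_csmf):
    """1 - sum|true - est| / (2 * (1 - min(true))); formula is an assumption."""
    abs_error = sum(abs(true_csmf[c] - est_csmf.get(c, 0.0)) for c in true_csmf)
    return 1.0 - abs_error / (2.0 * (1.0 - min(true_csmf.values())))
```

Repeating `build_test_set` 500 times, scoring each draw, and taking `statistics.median` of the results would give the median values reported in the tables.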
The slope and intercept measure how


Figure 1 Illustration of the review framework used for physician certification. [Flowchart: verbal autopsies to be reviewed are split into single review (90%) and double review (10%). Single review: 100% of these reviewed with HCE blinded; 50% of these reviewed by a different physician with HCE included. Double review: 100% of these reviewed by 2 different physicians with HCE blinded; 50% of these reviewed by 2 different physicians with HCE included. Test for cross-cultural reliability: a different 10% of the total set of verbal autopsies were additionally sent to other sites for single review; 100% of these reviewed with HCE blinded, 50% reviewed by another physician with HCE included.]

Figure 2 Diagram of the process for physician review and data analysis.



cross the 50% threshold. Of note, PCVA does extremely poorly for some important causes of death such as prostate cancer, stomach cancer, leukemia/lymphoma, epilepsy, renal failure, colorectal cancer, poisonings, diabetes, asthma, and pneumonia. Addition of HCE notably improves performance for asthma and diabetes in this grouping. The same analysis in children shows that physician review does well for a number of injuries including violence, road traffic, drowning, fires, falls, and bite of a venomous animal. Falls is one case where addition of the health care experience information actually lowers chance-corrected concordance. Some major causes of death such as diarrhea/dysentery, malaria, and AIDS have intermediate levels of performance. On the other hand, pneumonia has a chance-corrected concordance below 33% with and without HCE. Somewhat surprisingly, PCVA has quite poor performance for the limited number of measles deaths in the study. Physicians do not perform better than or worse than chance for some causes such as sepsis, other cardiovascular diseases, and other digestive diseases. For the neonatal death analysis examining only a five-cause list and stillbirths, PCVA achieves chance-corrected concordance greater than 50% only for stillbirths. Chance-corrected concordance is intermediate in value for birth asphyxia and preterm delivery but very poor for congenital malformation, pneumonia, and meningitis/sepsis. Table 2 reports on the determinants of concordance using mixed-effects logistic regression. The regression controls for cause (coefficients not shown) and site/physician, and includes independent variables for the availability of HCE, whether the review was in-site or out-of-site, and a dummy variable indicating whether the death met only level 2 gold standard criteria.
Table 2 confirms the overall finding that availability of HCE makes a profound difference in the probability that a physician will assign the true cause as the underlying cause of death. The odds ratio is highest in adults and much lower in neonates, indicating that there is perhaps more useful information in health care experience for assigning adult causes than for neonates and children. For all age groups, physicians performed slightly better reviewing in-site VAs, suggesting that prior knowledge of causes of death and associated symptoms may influence their concordance, with the greatest effect in children. In adults, physicians are less likely to get the true cause correct when the diagnostic criteria only meet level 2, but the reverse is true in children. This may be explained by the fact that the same clinical history used in the absence of laboratory confirmation for some level 2 diagnoses in children are what physicians use to assign cause in a VA.

accurately the estimated cause matches the true cause, with a slope of 1 and intercept of 0 indicating a perfect match. The root mean square error (RMSE) indicates how precisely the cause is estimated, with lower RMSE values indicating greater correlation. We used random effects logistic regression to study the factors associated with physicians assigning the true cause to a death. Independent variables included fixed effects for level of gold standard diagnosis, whether the VA was reviewed at the site it was collected or a different site, and inclusion of information on the household recall of health care experience, as well as random effects for cause and physician nested by site. We also conducted a sensitivity analysis to determine if physicians assigned the correct cause of death in any of the diagnoses from the death certificate rather than as just the underlying cause itself.
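The slope, intercept, and RMSE described above can be computed with an ordinary least-squares fit, sketched here in plain Python. Treating the estimated CSMF as the response and the true CSMF as the predictor, and dividing the squared residuals by the number of draws, are assumptions; the function name `fit_line` is hypothetical.

```python
import math


def fit_line(true_csmf, est_csmf):
    """Least-squares fit of estimated on true CSMFs for one cause.

    true_csmf, est_csmf: equal-length lists, one value per test dataset.
    Returns (slope, intercept, rmse); rmse divides by n (an assumption).
    """
    n = len(true_csmf)
    mean_x = sum(true_csmf) / n
    mean_y = sum(est_csmf) / n
    sxx = sum((x - mean_x) ** 2 for x in true_csmf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_csmf, est_csmf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(true_csmf, est_csmf)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, rmse
```

A slope below 1 with a positive intercept would indicate that small cause fractions tend to be overestimated and large ones underestimated, even when the RMSE is low.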

Results
Individual cause assignment

Table 1 shows the overall results for the performance of PCVA against the gold standard cause of death. Without household recall of health care experience, a proxy for PCVA in communities with limited access, physicians get the cause right after correcting for chance less than 30% of the time in adults and neonates, and 36% of the time in children. Providing physicians with items on health care experience and the free-text components improves performance markedly in adults to 45% and slightly higher in children to 48%. Despite the short cause list in neonates, chance-corrected concordance only increases to 33%. In all cases, PCVA has chance-corrected concordances of less than 50%. Chance-corrected concordance by cause with and without HCE is shown in Figure 3 for adults, Figure 4 for children, and Figure 5 for neonates; detailed values and uncertainty intervals are provided in Additional file 2. Physicians are able to achieve a chance-corrected concordance of 50% or greater in adults for a number of injuries (bite of a venomous animal, road traffic accidents, homicides, drowning), maternal causes, and breast cancer. When HCE is included in the VA, chance-corrected concordance increases enough so that other injuries, suicides, AIDS, acute myocardial infarction, and stroke

Table 1 Median chance-corrected concordance (%) and 95% uncertainty interval [UI], by age group with and without HCE

            No HCE                     HCE
            Median   95% UI            Median   95% UI
Adults      29.7     (29.4, 29.8)      44.6     (44.3, 44.8)
Children    36.3     (35.9, 36.6)      47.8     (47.1, 48.3)
Neonates    27.6     (27.2, 28.0)      33.3     (32.8, 33.7)
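To make the metric in Table 1 concrete, the sketch below computes chance-corrected concordance for a single cause. It assumes the definition given in the companion metrics paper [37], as we understand it: the fraction of deaths truly due to the cause that are assigned to it, rescaled so that random assignment across N causes scores zero. The function name, cause labels, and data are hypothetical.

```python
def chance_corrected_concordance(true_causes, assigned_causes, cause, n_causes):
    """Chance-corrected concordance for one cause.

    Assumed form: (sensitivity - 1/N) / (1 - 1/N), where sensitivity is the
    share of deaths truly due to `cause` that were assigned to `cause`, and
    N is the number of causes on the list.
    """
    assigned_for_cause = [
        assigned for truth, assigned in zip(true_causes, assigned_causes)
        if truth == cause
    ]
    if not assigned_for_cause:
        raise ValueError("no deaths with this true cause")
    sensitivity = sum(a == cause for a in assigned_for_cause) / len(assigned_for_cause)
    chance = 1.0 / n_causes
    return (sensitivity - chance) / (1.0 - chance)


# Hypothetical gold standard and physician assignments on a four-cause list.
truth = ["stroke"] * 4 + ["asthma"] * 4
assigned = ["stroke", "stroke", "stroke", "other",
            "other", "stroke", "asthma", "other"]
for cause in ("stroke", "asthma"):
    print(cause, chance_corrected_concordance(truth, assigned, cause, n_causes=4))
```

In this toy example asthma is identified in one of four deaths, which is exactly what random assignment over four causes would achieve, so its chance-corrected concordance is zero.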



Lozano et al. Population Health Metrics 2011, 9:32 http://www.pophealthmetrics.com/content/9/1/32


Figure 3 Median chance-corrected concordance (%), by adult cause with and without HCE.

0.7. In all three age groups, CSMF accuracy improves when household recall of health care experience is available. A more fine-grained appreciation of how well PCVA does in estimating CSMFs is provided in Figure 7 for adult bite of a venomous animal without HCE and Figure 8 for adult bite of a venomous animal with HCE, Figure 9 for adult asthma without HCE and Figure 10 for adult asthma with HCE, Figure 11 for adult other noncommunicable diseases without HCE and Figure 12 for adult other noncommunicable diseases with HCE, and Figure 13 for child falls without HCE and Figure 14 for child falls with HCE. For selected causes with and without HCE, CSMFs as estimated through PCVA are compared to the true CSMFs in the test datasets. Figures 7 and 8 show that with or without HCE, PCVA does a reasonably good job estimating the cause fraction due to bite of a venomous animal. Even in this case, inclusion of the HCE, especially the open-ended narrative, improves CSMF estimation. Figure 9 shows that for

Figure 6 shows the odds ratio of assigning the correct cause as a function of the physician reading the VA for adult, child, and neonatal causes. For adult causes, the odds ratio for getting the true cause correct ranges from 0.65 to 1.43. For children, there is a similarly wide range across physicians and an even broader variation in performance across physicians for neonates. One physician, for example, has an odds ratio of 0.20 for neonates. This analysis demonstrates that after controlling for cause and information available on the VA, there is substantial variation in physician performance. We cannot determine the attributes of success but they most likely include training, clinical experience, and diagnostic skill.

CSMF estimation

The overall accuracy of physicians in estimating CSMFs for the test set is given in Table 3. CSMF accuracy across 500 test sets shows that physicians estimate CSMFs with considerable error for adults, children, and neonates. Only for neonates with HCE is accuracy above


Figure 4 Median chance-corrected concordance (%), by child cause with and without HCE.

causes with accurate estimation (injuries, breast cancer, maternal, stillbirths) have a slope near 1 and intercept near 0, while causes with inaccurate estimation (sepsis, meningitis, pneumonia, asthma, and the other residual categories) have a lower slope and higher intercept. Similarly, high-correlation causes (injuries, cancers, stillbirths) have a low RMSE, and low-correlation causes (pneumonia, malaria, diarrhea/dysentery, birth asphyxia, and other residual categories) have a high RMSE. Some causes have accurate estimation and low correlation (homicide, violent death) while other causes have inaccurate estimation and high correlation (cancers, epilepsy, asthma). Physicians are better overall at estimating CSMFs for adults than for children and neonates. For nearly all causes, addition of HCE leads to more accurate CSMF estimation. Notable exceptions are diarrhea/dysentery in adults and falls in children, for which we observed a similar decrease in chance-corrected concordance. Interestingly, addition of HCE decreases the correlation of CSMF estimation for most causes, most substantially for asthma and diabetes in adults, other

asthma without HCE, estimated CSMFs are almost always too low and do not tend to be higher when the true CSMF is higher. In contrast, adding HCE to the VA (Figure 10) yields CSMF estimates that are too high at low true CSMFs and too low at high true CSMFs. Figures 11 and 12 illustrate a systematic problem with PCVA: the tendency to assign to the residual category of other noncommunicable diseases far too many deaths. In fact, in nearly every case, the estimated CSMF is substantially higher than the true CSMF. Further, there is no correlation between the estimated and true CSMFs. Where PCVA says there are more deaths from other noncommunicable diseases compared to another population, this relationship implies there may not be more deaths in reality. Figures 13 and 14 show that, for child falls, addition of HCE actually causes both overestimation and underestimation to increase when the true CSMF is higher. Additional file 3 shows the slope, intercept, and RMSE results from the linear regression by cause. As expected,

Figure 5 Median chance-corrected concordance (%), by neonate cause with and without HCE.

Table 2 Mixed-effects logistic regression odds ratios (OR) and standard errors (SE), by determinant of concordance

                         Adult           Child           Neonate
                         OR     SE       OR     SE       OR     SE
With HCE                 2.03   0.08     1.38   0.11     1.11   0.08
In-site                  1.22   0.10     1.71   0.28     1.29   0.16
Gold Standard Level 2    0.87   0.06     1.36   0.16     1.61   0.85



Figure 6 Random effect logistic regression odds ratios (OR) and standard errors (SE) by physician, of assigning the correct cause as a function of the physician reading the VA for adult, child, and neonatal causes.

and 1.2% respectively. In neonates, the partial chance-corrected concordance actually declines by 2.9%. With HCE, the change is more substantial, 4.5% and 2.3% in adults and children respectively. For neonates, as without HCE, it declines, this time by 4.6%.
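The partial chance-corrected concordance used in this sensitivity analysis can be sketched as below. The general form, (C - k/N) / (1 - k/N), follows the companion metrics paper [37] as we understand it, where C is the fraction of deaths whose true cause appears among k assigned causes and N is the length of the cause list. How k was set for death certificate lines is not stated in this section, so k, the function name, and the data are illustrative assumptions.

```python
def partial_chance_corrected_concordance(true_causes, certificate_causes, n_causes, k):
    """Partial chance-corrected concordance over the first k certificate lines.

    Assumed form: (C - k/N) / (1 - k/N), where C is the fraction of deaths
    whose true cause appears in any of the first k causes listed.
    """
    hits = sum(
        truth in lines[:k] for truth, lines in zip(true_causes, certificate_causes)
    )
    concordant_fraction = hits / len(true_causes)
    chance = k / n_causes
    return (concordant_fraction - chance) / (1.0 - chance)


# Hypothetical data: four deaths, a ten-cause list, two certificate lines each.
truth = ["stroke", "asthma", "AIDS", "falls"]
certificates = [
    ["stroke", "diabetes"],   # true cause on the underlying line
    ["pneumonia", "asthma"],  # true cause only on an additional line
    ["AIDS", "pneumonia"],
    ["stroke", "epilepsy"],   # true cause missed entirely
]
print(partial_chance_corrected_concordance(truth, certificates, n_causes=10, k=2))
```

Because the chance term k/N grows with the number of lines considered, counting additional lines as concordant does not guarantee a higher score, which is consistent with the decline reported for neonates.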

infectious diseases and poisonings in children, and congenital malformation and meningitis/sepsis in neonates.

Coding sensitivity

In the study protocol, following recommendations from the WHO, the physician reading the VA completes a death certificate. The final underlying cause assigned is based on processing this death certificate using MMDS software or manual coding for those rejected by the software. We studied the extent to which the physician may be assigning the true cause of death on the death certificate in one of the additional cause lines as opposed to the underlying cause, or where the other causes assigned combined with ICD rules lead to the assignment of an underlying cause that is different from the gold standard cause of death. We tested this by calculating the partial chance-corrected concordance, assigning a physician as concordant if s/he assigns the true cause of death in any of the lines of the death certificate. Partial chance-corrected concordance takes into account that, automatically by chance, physicians would assign the true cause in either the underlying or associated causes of death more often. Table 4 shows that the partial chance-corrected concordance increases in reviews without HCE in adults and children by 2.1%

Discussion
When physicians review VA results for individuals who died without contact with health care services, the median chance-corrected concordance ranges from -3% to 77.6% with an average value across causes of 29.7% for adults; -5% to 89.5% with an average value of 36.3% for children; and 1.6% to 72.9% with an average value of 27.6% for neonates. This basic result is the same whether one or two physicians review the VA but is lower when physicians from other locations review the

Table 3 Median CSMF accuracy and 95% UI, by age group with and without HCE

            No HCE                       HCE
            Median   95% UI              Median   95% UI
Adults      0.624    (0.619, 0.631)      0.675    (0.669, 0.680)
Children    0.632    (0.626, 0.642)      0.682    (0.671, 0.690)
Neonates    0.695    (0.682, 0.705)      0.733    (0.719, 0.743)
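For readers unfamiliar with the CSMF accuracy values in Table 3, the sketch below shows one way to compute the metric. It assumes the definition from the companion metrics paper [37], as we understand it: one minus the summed absolute CSMF error divided by the largest possible error, 2 x (1 - minimum true CSMF). The function name and the cause fractions are hypothetical.

```python
def csmf_accuracy(true_csmf, est_csmf):
    """CSMF accuracy for one test set.

    Both arguments map cause -> fraction of deaths and should each sum to 1.
    Assumed form: 1 - sum(|true - est|) / (2 * (1 - min(true))).
    """
    causes = set(true_csmf) | set(est_csmf)
    total_abs_error = sum(
        abs(true_csmf.get(c, 0.0) - est_csmf.get(c, 0.0)) for c in causes
    )
    smallest_true_fraction = min(true_csmf.get(c, 0.0) for c in causes)
    worst_possible_error = 2.0 * (1.0 - smallest_true_fraction)
    return 1.0 - total_abs_error / worst_possible_error


# Hypothetical three-cause population.
true_fractions = {"injuries": 0.5, "asthma": 0.3, "other NCD": 0.2}
estimated_fractions = {"injuries": 0.4, "asthma": 0.3, "other NCD": 0.3}
print(csmf_accuracy(true_fractions, estimated_fractions))
```

A value of 1 means the estimated cause composition matches the truth exactly, and a value of 0 corresponds to assigning every death to the cause with the smallest true fraction.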

Figure 7 Estimated versus true CSMFs across 500 Dirichlet splits, for adult bite of venomous animal without HCE.


Figure 8 Estimated versus true CSMFs across 500 Dirichlet splits, for adult bite of venomous animal with HCE.

Figure 10 Estimated versus true CSMFs across 500 Dirichlet splits, for adult asthma with HCE.

VA. Performance improves when physicians are given access to household recall of health care experience and medical records retained by the household. Both results, the improvement with HCE and the difference between physicians from within the country versus physicians from another country, highlight that a substantial component of VA diagnoses is a function not of signs and symptoms but of the combination of prior epidemiological views of the physician reader and filtered information on medical records provided by the household. In other words, the validity of PCVA is highly contextual. It will perform better when respondents have more access to health care and when physicians are strongly guided by their prior beliefs on the prevalence of diseases. Performance of a VA method on estimating CSMFs is a complex function of both individual death assignment concordance and the pattern of how true negatives are larger or smaller than false positives. The median CSMF

accuracy found in this study was 0.624 without HCE and 0.675 with HCE for adults; 0.632 without HCE and 0.682 with HCE for children; and 0.695 without HCE and 0.733 with HCE for neonates. The performance of PCVA must be interpreted in light of the performance of medical certification of causes of death in a functioning vital registration system. Hernández et al. (2011) [39] have found in Mexico, for example, that routine medical certification using the same gold standard deaths has a median chance-corrected concordance of 66.5% for adults, 38.5% for children, and 54.3% for neonates; and a CSMF accuracy of 0.780 for adults, 0.683 for children, and 0.756 for neonates. This is one of the few studies with comparable assessment of medical certification of death using the same methods and metrics. PCVA provides less accurate measurement than medical certification for adults but comparable results for children and neonates.

Figure 9 Estimated versus true CSMFs across 500 Dirichlet splits, for adult asthma without HCE.

Figure 11 Estimated versus true CSMFs across 500 Dirichlet splits, for adult other noncommunicable diseases without HCE.


Figure 12 Estimated versus true CSMFs across 500 Dirichlet splits, for adult other noncommunicable diseases with HCE.

Figure 14 Estimated versus true CSMFs across 500 Dirichlet splits, for child falls with HCE.

To many readers, the relatively modest performance of PCVA will come as a surprise. Some previously published studies [14-20] have reported substantially higher concordances compared to medical record review and quite small errors in estimated CSMFs. The less impressive performance reported here must be viewed taking into account two factors. First, in this study PCVA is being compared to a true gold standard. It is possible that the same signs and symptoms that lead to diagnoses in some facilities without laboratory tests or diagnostic imaging are those used by physicians reading a VA, leading to falsely inflated performance when no gold standard is available. Second, by assessing PCVA performance estimating CSMFs across 500 test datasets, we get a much more robust assessment of CSMF estimation performance, one that is not simply a function of the CSMF composition in one particular test dataset.
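The idea of evaluating across many test datasets with varying cause compositions can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it draws a cause composition from a flat Dirichlet distribution (all concentration parameters equal to 1) and then resamples gold standard deaths with replacement to match it. The function names, death identifiers, and seed are hypothetical.

```python
import random


def draw_dirichlet_csmf(causes, rng):
    """Draw one cause composition from a flat Dirichlet distribution.

    Normalizing independent Gamma(1, 1) draws yields a Dirichlet(1, ..., 1)
    sample, so every composition is equally likely.
    """
    gamma_draws = {cause: rng.gammavariate(1.0, 1.0) for cause in causes}
    total = sum(gamma_draws.values())
    return {cause: value / total for cause, value in gamma_draws.items()}


def build_test_set(deaths_by_cause, n_deaths, rng):
    """Resample deaths with replacement to follow a randomly drawn composition."""
    causes = sorted(deaths_by_cause)
    target_csmf = draw_dirichlet_csmf(causes, rng)
    sampled_causes = rng.choices(
        causes, weights=[target_csmf[c] for c in causes], k=n_deaths
    )
    test_set = [(cause, rng.choice(deaths_by_cause[cause])) for cause in sampled_causes]
    return target_csmf, test_set


# Hypothetical pool of gold standard deaths, keyed by true cause.
pool = {"falls": ["d1", "d2"], "asthma": ["d3"], "stroke": ["d4", "d5", "d6"]}
rng = random.Random(2011)
test_sets = [build_test_set(pool, n_deaths=50, rng=rng) for _ in range(500)]
print(len(test_sets), "test sets; first target composition:", test_sets[0][0])
```

Because each of the 500 test sets has a different cause composition, a method cannot look good simply because its habitual assignments happen to match one particular population.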

Figure 13 Estimated versus true CSMFs across 500 Dirichlet splits, for child falls without HCE.

Table 4 Sensitivity analysis comparing partial chance-corrected concordance (%) for correct cause assignment with underlying versus all diagnoses

            Underlying           All Diagnoses
            No HCE   HCE         No HCE   HCE
Adults      29.7     44.6        31.8     49.1
Children    36.3     47.8        37.5     50.1
Neonates    27.6     33.3        24.7     28.7

The findings on PCVA must also be interpreted in light of the results of the sensitivity analysis. In the adult case with HCE, in 5% of the deaths, physicians assign the true cause somewhere on the death certificate but not as underlying cause. Our study is a fair assessment of the cause of death pattern yielded through PCVA using a rigorous protocol for coding causes of death. The sensitivity result, however, suggests that better training of physicians in completing the death certificate might improve performance. In this study, physicians were carefully trained in this part of the completion of a VA. The difference for children and neonates is less marked. In addition to the discrepancy in coding sensitivity, several of the physicians experienced difficulty in completing their assigned VAs due to the length of time involved in reading each VA. In some cases, VAs had to be reassigned to a different physician at the same site to ensure completion. The results of this study are based on 95% of the total VAs sent out for review.

We present results based on a single physician review of each VA. We have as part of this broader study a substudy comparing single review and double review with adjudication of conflicting reviews. For reasons of space, we have not presented the results from that substudy here. Our overall conclusions, however, presented



in this paper on PCVA will not be affected by using only single review. In fact, we find that two readers do not improve performance over a single reader, confirming a result published for Andhra Pradesh [40]. On purely probability-theoretic grounds, double review should only improve the results of VA if a single physician is more than 50% likely to get the true cause correct. Given that a single physician is less than 50% likely to get the true cause correct, there is no theoretical argument in favor of double review, nor is there empirical support in our study.

Our finding that physicians vary markedly in their ability to assign the true cause controlling for cause of death, availability of HCE, and whether a physician is from the site or another location has important implications. It suggests that despite standardized training, not all physicians are equal in their ability to assign causes of death. Given that physicians vary in diagnostic skill for patients when they are alive, it should not be surprising that some physicians are better than others at reading verbal autopsies. This reality is one further challenge to implementing PCVA.

The marked sensitivity of the results to the diagnostic ability of different physicians and their prior views on the prevalence of diseases suggests that more rigorous screening and training of physicians who undertake PCVA could improve the results. This highlights the major implementation challenge that many are facing: it is costly, time-consuming, and difficult to recruit and motivate physicians to read large numbers of VAs. Recruiting physicians with better diagnostic acumen and ability to accurately assign causes of death given a VA could be even more problematic. PCVA by its nature has substantially lower reproducibility than automated statistical or machine-learning methods for VA analysis.
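The 50% threshold in the probability argument about double review made earlier in this discussion can be illustrated with a simplified model. The sketch treats double review with adjudication as a majority vote among three independent reviews, each correct with the same probability p; this independence and equal-skill setup is our simplifying assumption, not a description of the study protocol, and the function name is hypothetical.

```python
def majority_of_three_accuracy(p):
    """Probability that at least two of three independent reviews are correct.

    Illustrative assumptions: each review is correct with probability p,
    reviews are independent, and the final cause is correct only when a
    majority of the three reviews is correct.
    """
    all_three_correct = p ** 3
    exactly_two_correct = 3 * p ** 2 * (1 - p)
    return all_three_correct + exactly_two_correct


for p in (0.30, 0.45, 0.50, 0.70):
    pooled = majority_of_three_accuracy(p)
    print(f"single review {p:.2f} -> majority of three {pooled:.3f}")
```

Under these assumptions the pooled accuracy minus the single-review accuracy equals p(1 - p)(2p - 1), which is positive only when p exceeds 0.5, so pooling reviews from readers who are individually right less than half the time lowers rather than raises accuracy.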

death are available, they may have an important role in medical certification of death outside of health facilities. To our knowledge, this is the first true validation study where the performance of PCVA has been compared to a rigorously defined gold standard cause of death. Given that verbal autopsy remains the global standard for assessing causes of death and prioritizing health interventions in areas lacking complete vital registration systems, it is essential to develop analytical methods that are low-cost, quick to implement, and consistently accurate. Physician review meets none of these criteria, and yet it is still the most widely implemented method for analysis of VAs today. As a result, verbal autopsy studies that rely on physician coding for cause of death assignment may not be as robust as previously thought. The time and cost required to initially collect the verbal autopsies must be considered in addition to the analysis, as well as the impact of diverting physicians from servicing immediate health needs in a population to review VAs. All of these considerations highlight the importance and urgency of developing better methods to more reliably analyze past and future verbal autopsies to obtain the highest quality mortality data from populations without reliable death certification.

Conclusions
Given the cost, implementation difficulty, and idiosyncratic nature of PCVA, what should be its role in future VA data analysis? Clearly, more rigorous standardization of questionnaire implementation, tests of diagnostic skill, and training might be able to improve concordance and perhaps increase CSMF accuracy. These efforts will likely increase costs and delays in implementation. If lower-cost, more-reproducible methods can perform as well as PCVA, they would have substantial advantages for many data-collection platforms. The challenge for physicians to assign an accurate cause of death on the basis of the recall of signs, symptoms, and health care experience raises questions about the accuracy of medical certification of deaths that occur outside of a health facility. In many countries, medical certification of these deaths has the same or a more limited information basis available for the physician completing the death certificate. If alternative methods for assigning verbal autopsy causes of

Abbreviations
CSMF: cause-specific mortality fraction; HCE: health care experience; ICD: International Classification of Diseases; MMDS: Mortality Medical Data System; PCVA: physician-certified verbal autopsy; PHMRC: Population Health Metrics Research Consortium; RMSE: root mean square error; SAVVY: Sample Vital Registration with Verbal Autopsy; VA: verbal autopsy; WHO: World Health Organization

Additional material
Additional file 1: Number of deaths for adult, child, and neonate causes in the PHMRC study.
Additional file 2: Median chance-corrected concordance (%) and 95% UI, by cause with and without HCE.
Additional file 3: Slope, intercept, and RMSE from linear regression of estimated versus true CSMFs, by cause with and without HCE.

Acknowledgements
This research was conducted as part of the Population Health Metrics Research Consortium: Christopher JL Murray, Alan D Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Michael K Freeman, Spencer L James, Alireza Vahdatpour, and Benjamin Campbell for intellectual contributions to the analysis. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding


author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Author details
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA. 2 University of Queensland, School of Population Health, Brisbane, Australia.

Authors’ contributions
RL, ADL, ADF, and CJLM conceptualized and guided the study. CA performed analyses and helped write the manuscript. MN mapped cause lists and ICD codes. CJLM drafted the manuscript and approved the final version. CJLM accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 13 April 2011 Accepted: 4 August 2011 Published: 4 August 2011

References
1. Bang AT, Bang RA: Diagnosis of causes of childhood deaths in developing countries by verbal autopsy: suggested criteria. The SEARCH Team. Bull World Health Organ 1992, 70:499-507.
2. Losos J: Routine and sentinel surveillance methods. East Mediterr Health J 1996, 2:45-60.
3. Binka F, Ngom P, Phillips J, Adazu K, Macleod B: Assessing population dynamics in a rural African society: The Navrongo Demographic Surveillance System. J Biosoc Sci 1999, 31:375-391.
4. Cleland J: Demographic data collection in less developed countries 1946-1996. Popul Stud (Camb) 1996, 50:433-450.
5. Adjuik M, Smith T, Clark S, Todd J, Garrib A, Kinfu Y, Kahn K, Mola M, Ashraf A, Masanja H, Adazu K, Adazu U, Sacarlal J, Alam N, Marra A, Gbangou A, Mwageni E, Binka F: Cause-specific mortality rates in sub-Saharan Africa and Bangladesh. Bull World Health Organ 2006, 84:181-188.
6. Gajalakshmi V, Peto R: Verbal autopsy of 80,000 adult deaths in Tamilnadu, South India. BMC Public Health 2004, 4:47.
7. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37.
8. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31.
9. Fottrell E, Byass P, Ouedraogo TW, Tamini C, Gbangou A, Sombié I, Högberg U, Witten KH, Bhattacharya S, Desta T, Deganus S, Tornui J, Fitzmaurice AE, Meda N, Graham WJ: Revealing the burden of maternal mortality: a probabilistic model for determining pregnancy-related causes of death from verbal autopsies. Popul Health Metr 2007, 5:1.
10. King G: Verbal Autopsy Methods with Multiple Causes of Death. Statistical Science 2008, 23:78-91.
11. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4:e327.
12. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7:e1000325.
13. King G, Lu Y, Shibuya K: Designing verbal autopsy studies. Popul Health Metr 2010, 8:19.
14. Snow RW, Armstrong JR, Forster D, Winstanley MT, Marsh VM, Newton CR, Waruiru C, Mwangi I, Winstanley PA, Marsh K: Childhood deaths in Africa: uses and limitations of verbal autopsies. Lancet 1992, 340:351-355.
15. Quigley MA, Armstrong Schellenberg JR, Snow RW: Algorithms for verbal autopsies: a validation study in Kenyan children. Bull World Health Organ 1996, 74:147-154.
16. Rodriguez L, Reyes H, Tome P, Ridaura C, Flores S, Guiscafre H: Validation of the verbal autopsy method to ascertain acute respiratory infection as cause of death. Indian J Pediatr 1998, 65:579-584.
17. Kahn K, Tollman SM, Garenne M, Gear JS: Validation and application of verbal autopsies in a rural area of South Africa. Trop Med Int Health 2000, 5:824-831.
18. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson LJ, Alberti KGMM, Lopez AD: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Trop Med Int Health 2006, 11:681-696.
19. Quigley MA, Chandramohan D, Setel P, Binka F, Rodrigues LC: Validity of data-derived algorithms for ascertaining causes of adult death in two African sites using verbal autopsy. Trop Med Int Health 2000, 5:33-39.
20. Yang G, Rao C, Ma J, Wang L, Wan X, Dubrovsky G, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. Int J Epidemiol 2006, 35:741-748.
21. Freeman JV, Christian P, Khatry SK, Adhikari RK, LeClerq SC, Katz J, Darmstadt GL: Evaluation of neonatal verbal autopsy using physician review versus algorithm-based cause-of-death assignment in rural Nepal. Paediatr Perinat Epidemiol 2005, 19:323-331.
22. Aggarwal AK, Jain V, Kumar R: Validity of verbal autopsy for ascertaining the causes of stillbirth. Bull World Health Organ 2011, 89:31-40.
23. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, Jakob R, Kahn K, Kunii O, Lopez AD, Murray CJL, Nahlen B, Rao C, Sankoh O, Setel PW, Shibuya K, Soleman N, Wright L, Yang G: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85:570-571.
24. Krishnan A, Kumar R, Nongkynrih B, Misra P, Srivastava R, Kapoor SK: Adult mortality surveillance by routine health workers using a short verbal autopsy tool in rural north India. Journal of Epidemiology and Community Health 2011.
25. Census of India - Vital Statistics - Sample Registration System (SRS). [http://censusindia.gov.in/Vital_Statistics/SRS/Sample_Registration_System.aspx].
26. Engmann C, Jehan I, Ditekemena J, Garces A, Phiri M, Mazariegos M, Chomba E, Pasha O, Tshefu A, McClure EM, Thorsten V, Chakraborty H, Goldenberg RL, Bose C, Carlo WA, Wright LL: An alternative strategy for perinatal verbal autopsy coding: single versus multiple coders. Trop Med Int Health 2011, 16:18-29.
27. Morris SK, Bassani DG, Kumar R, Awasthi S, Paul VK, Jha P: Factors associated with physician agreement on verbal autopsy of over 27000 childhood deaths in India. PLoS ONE 2010, 5:e9583.
28. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245.
29. Reeves B, Quigley M: A review of data-derived methods for assigning causes of death from verbal autopsy data. Int J Epidemiol 1997, 26:1080-1089.
30. Polprasert W, Rao C, Adair T, Pattaraarchachai J, Porapakkham Y, Lopez AD: Cause-of-death ascertainment for deaths that occur outside hospitals in Thailand: application of verbal autopsy methods. Popul Health Metr 2010, 8:13.
31. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Popul Health Metr 2010, 8:21.
32. Huong DL, Minh HV, Byass P: Applying verbal autopsy to determine cause of death in rural Vietnam. Scand J Public Health Suppl 2003, 62:19-25.
33. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
34. Murray CJL, Lopez AD: Alternative projections of mortality and disability by cause 1990-2020: Global Burden of Disease Study. Lancet 1997, 349:1498-1504.
35. Setel PW, Rao C, Hemed Y, Whiting DR, Yang G, Chandramohan D, Alberti KGMM, Lopez AD: Core verbal autopsy procedures with comparative validation results from two countries. PLoS Med 2006, 3:e268.
36. Mortality Medical Data System (MMDS) | U.S. CDC NVSS. [http://www.cdc.gov/nchs/nvss/mmds.htm].


37. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
38. International Classification of Diseases (ICD) | WHO. [http://www.who.int/classifications/icd/en/].
39. Hernández B, Ramírez-Villalobos D, Romero M, Gómez S, Atkinson C, Lozano R: Assessing quality of medical death certification: concordance between gold standard diagnosis and underlying cause of death in selected Mexican hospitals. Popul Health Metr 2011, 9:38.
40. Joshi R, Lopez AD, MacMahon S, Reddy S, Dandona R, Dandona L, Neal B: Verbal autopsy coding: are multiple coders better than one? Bull World Health Organ 2009, 87:51-57.

doi:10.1186/1478-7954-9-32
Cite this article as: Lozano et al.: Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Population Health Metrics 2011 9:32.



Joshi et al. Population Health Metrics 2011, 9:33 http://www.pophealthmetrics.com/content/9/1/33

RESEARCH

Open Access

Effects on the estimated cause-specific mortality fraction of providing physician reviewers with different formats of verbal autopsy data
Rohina Joshi1*, Devarsetty Praveen2, Clara Chow1 and Bruce Neal1

Abstract
Background: The process of data collection and the methods used to assign the cause of death vary significantly among different verbal autopsy protocols, but there are few data to describe the consequences of the choices made. The aim of this study was to objectively define the impact of the format of data presented to physician reviewers on the cause-specific mortality fractions defined by a verbal autopsy-based mortality-surveillance system.
Methods: Verbal autopsies were done by primary health care workers for all deaths between October 2006 and September 2007 in a community in rural Andhra Pradesh, India (total population about 180,162). Each questionnaire had a structured section, composed of a series of check boxes, and a free-text section, in which a narrative description of the events leading to death was recorded. For each death, a physician coder was presented first with one section and then the other in random order with a 20- to 40-day interval between. A cause of death was recorded for each data format at the level of ICD-10 chapter headings or else the death was documented as unclassified. After another 20- to 40-day interval, both the structured and free-text sections of the questionnaire were presented together and an index cause of death was assigned.
Results: In all, 1,407 verbal autopsies were available for analysis, representing 94% of all deaths recorded in the population that year. An index cause of death was assigned using the combined data for 1,190 with the other 217 remaining unclassified. The observed cause-specific mortality fractions were the same regardless of whether the structured, free-text or combined data sources were used. At the individual level, the assignments made using the structured format matched the index in 1,012 (72%) of cases with a kappa statistic of 0.66. For the free-text format, the corresponding figures were 989 (70%) and 0.64.
Conclusions: The format of the verbal autopsy data used to assign a cause of death did not substantively influence the pattern of mortality estimated. Substantially abbreviated and simplified verbal autopsy questionnaires might provide robust information about high-level mortality patterns.
Keywords: verbal autopsy, questionnaire format, physician reviewer, mortality

Introduction

Verbal autopsy methods have their origins in the 17th century lay-reporting systems developed for monitoring epidemics [1]. Those early “death searches” centered upon an interview of the family of the deceased person with the goal of establishing whether the cause of death was attributable to the disease under investigation. During the last 60 years, verbal autopsy methods have

evolved in developing countries to allow the broader evaluation of population mortality as well as the study of specific conditions [2]. There are now more than 35 population laboratories in developing-country settings using verbal autopsy methods to track mortality patterns on an ongoing basis [3-5]. While the underlying principles behind the verbal autopsy methodologies used at these sites are the same, the process of data collection and the methods used to assign the cause of death vary significantly [6,7]. It would be anticipated that both these aspects of the

* Correspondence: rjoshi@georgeinstitute.org.au 1 The George Institute for Global Health Australia, Sydney, Australia Full list of author information is available at the end of the article

© 2011 Joshi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Joshi et al. Population Health Metrics 2011, 9:33 http://www.pophealthmetrics.com/content/9/1/33


and very old age groups. A quarter of the population was below 15 years of age and a tenth was above 60 years of age. The majority of the adult population were engaged in work related to agriculture or aquaculture and the average household income was 2,000 Indian rupees (US$50) a month. The literacy level of the population was 54% [11].

verbal autopsy process play an important role in the validity of the causes of death assigned, but robust quantitative evaluations of the impact of using different methods are few [8]. In regard to data collection, a typical verbal autopsy questionnaire collects information about a broad range of characteristics of the deceased including age, sex, education, and occupation, as well as information directly related to the death such as disease risk factors, signs and symptoms of illness, and health-service utilization. Questionnaire format varies markedly, with data variously collected in a structured format, free-text format, or combined format. The structured format comprises a list of check boxes and directed responses, and the free-text format a narrative description of the illness that led to the death. Assignment of cause of death is usually done by physician review of these data, although expert algorithms and data-driven algorithms are increasingly widely used [6]. Once again, a broad range of different cause of death assignment processes are in use [9,10]. In this study, we sought to determine the impact of the format of the verbal autopsy data on the cause-specific mortality fractions reported for a rural community in Andhra Pradesh, India, using a verbal autopsy process based upon single-physician review.

Identification of deaths

The primary responsibility for identifying all deaths in the village lay with the MPHW. Identification of deaths by the MPHW was facilitated by her daily contact with the villagers and a network of key informants including the village headman, the “Panchayat” (village governing body responsible for registration of deaths), priests and cremation staff, and other community leaders.

Data collection

For each death recorded, the MPHW attempted to visit the deceased’s household within a month of the date of death. The family member or other caregiver best able to report on the events preceding the death was identified, consent was obtained, and a systematic inquiry into the events leading up to the death was made using a verbal autopsy tool. The verbal autopsy tool used in this project had two sections. The first section was composed of a series of structured questions beginning with a filter question for each symptom group to allow skipping past more detailed questions that were not likely to be relevant to the death. The second free-text section recorded an open-ended narrative documenting the history of the illness leading to death as described by the family member. This free-text section was completed with the aid of a defined symptom list with specific inquiry about treatments, medical procedures, and associated documentation. Different questionnaires were used for deaths in each of three age groups (0 to 28 days, 29 days to less than five years, and five years onward). The questionnaires were based on validated verbal autopsy tools used by studies in China [13] and Tanzania [14] and the Registrar General of India’s Sample Registration System [15] with minor modifications to terminology made to suit local circumstances. The MPHWs were trained in data collection prior to commencement of the study and were provided with a manual of operations developed specifically for the administration of the questionnaire. Refresher training was provided every six months.

Methods

This study was conducted by a research collaboration (the Andhra Pradesh Rural Health Initiative) [11] involving five Indian and Australian institutions. The data used for this analysis were collected between Oct. 1, 2006 and Sept. 30, 2007. Ethics approval for the project was received from the Ethics Committees of the CARE Foundation, Hyderabad, India; the Indian Council of Medical Research, New Delhi, India and the University of Sydney, Australia. Informed consent was obtained from each respondent prior to the collection of any data, and we sought to design and conduct the project in line with the Declaration of Helsinki and its subsequent amendments. For participants who could not read or write, the participant information sheet and consent form were explained by the Multipurpose Primary Healthcare Worker (MPHW) and a thumbprint was recorded.

Population studied

This project was conducted in 45 villages in the East and West Godavari districts in Andhra Pradesh, India. The population (n = 180,162) age and sex structure was defined by a population census conducted in 2002-2003 [12]. The age distribution of the population in the villages was characteristic of populations in which fertility has decreased recently, with relatively low proportions of the population in the very young

Cause of death assignment

Cause of death assignment was done by single-physician review using validated materials and processes developed for the Registrar General of India’s Sample Registration System [4,15]. This included providing the


diagnosis was any other cause or unclassified. Analyses were done using SPSS version 16 [17].
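The individual-level analysis described above can be illustrated with a short Python sketch. It assumes the unweighted two-rater (Cohen's) form of kappa and is not the SPSS procedure used in the study; the function names and the six toy assignments are hypothetical, and confidence intervals are omitted.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of deaths given the same cause both times.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal proportions, summed over causes.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical category throughout
    return (observed - expected) / (1.0 - expected)


def indicator(labels, cause):
    """1 if the assigned cause is the cause of interest, 0 for any other cause."""
    return [1 if label == cause else 0 for label in labels]


# Hypothetical toy assignments for six deaths (not study data).
structured = ["circulatory", "injury", "circulatory",
              "neoplasm", "unclassified", "injury"]
index = ["circulatory", "injury", "infectious",
         "neoplasm", "unclassified", "circulatory"]

# First analysis: all possible assignments at once.
overall = cohen_kappa(structured, index)

# Second analysis: one cause compared with all others.
circulatory_only = cohen_kappa(indicator(structured, "circulatory"),
                               indicator(index, "circulatory"))

print(round(overall, 3), round(circulatory_only, 3))
```

The same two calls would be repeated for each pair of data formats and for each cause of interest.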

physician coders, who had received specific training in verbal autopsies, a series of algorithms to facilitate the cause of death assignment process [16]. In this study, the process was modified such that the information obtained for each death was presented to the same physician coder three times, each time in a different format, with a cause of death assigned independently on each occasion. In brief, for each recorded death, the physician coder was, at random, presented first with either the structured data alone or the free-text data alone and asked to assign a cause of death. After an interval of 20 to 40 days, the data were presented again in the alternate format and a cause of death was assigned a second time. Finally, after another 20 to 40 days, the full data for the death (composed of the structured and free-text sections together) were presented and an index cause of death was assigned. On each occasion, a single underlying cause of death was selected for each individual from a restricted list of causes drawn from the 10th revision of the International Classification of Diseases (ICD-10).
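The three-stage presentation just described might be scheduled along the following lines. This is a minimal sketch only: the text does not say how the random order or the exact interval within the 20- to 40-day window was chosen, so the uniform draws, the function name, and the example date are all assumptions.

```python
import random
from datetime import date, timedelta


def presentation_schedule(first_review, rng):
    """Return three (date, format) pairs for one death.

    Assumption: the first format is picked with equal probability and each
    gap is drawn uniformly from 20 to 40 days inclusive.
    """
    single_formats = ["structured", "free_text"]
    rng.shuffle(single_formats)  # random order of the two single formats
    second_review = first_review + timedelta(days=rng.randint(20, 40))
    third_review = second_review + timedelta(days=rng.randint(20, 40))
    return [
        (first_review, single_formats[0]),
        (second_review, single_formats[1]),
        (third_review, "combined"),  # index cause assigned from both sections
    ]


rng = random.Random(2006)  # fixed seed so the example is reproducible
schedule = presentation_schedule(date(2007, 1, 15), rng)
for review_date, data_format in schedule:
    print(review_date.isoformat(), data_format)
```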

Results

During the 12 months of the study, a total of 1,497 deaths were recorded, of which 1,407 (94%) had a verbal autopsy completed. There were structured and free-text data available for all 1,407 cases for which a verbal autopsy was done. There was a slightly greater proportion of unclassified deaths (ICD chapter R00-R99) when assignment was based on the free-text data format alone (21.6%) or structured data format alone (19.3%) compared to when the two sets of data were used together (15.4%).

Overall pattern of mortality

The cause-specific mortality fractions described for this population varied little with the format of the data presented to the physician coders (Figure 1). The rank order of the leading causes of death was almost identical for cause of death assignment based on the structured, free-text, and combined data formats with only small differences in the estimated cause-specific mortality fractions. Although neoplasms ranked fourth in the structured and free-text formats but third in the combined format, the differences in the proportions of deaths assigned to these main causes among data formats were small and may simply reflect the play of chance. While the cause-specific mortality fractions varied little with the format of the data used by the physician for cause of death assignment, there was moderate variation in the causes of death assigned to individuals after each presentation of the data (Table 1). The causes of death assigned using the structured format matched the index in 1,012/1,407 (72%) of cases with a kappa statistic of 0.66 (95% confidence interval [CI]: 0.63, 0.68). For the free-text format, the corresponding figures were 989/1,407 (70%) and 0.64 (95% CI: 0.61, 0.67). Further examination of the main causes of death shows that correlations between individual diagnoses were moderate or high for most causes when comparing either format to the index. Correlations were particularly good for the four leading causes (circulatory, injury, infections, and neoplasms) but appeared consistently lower for the next four leading causes (respiratory, digestive, genitourinary, and endocrine) although confidence intervals were not tight. For the remaining seven causes, there were too few cases assigned to each to enable reliable estimates of kappa coefficients. The overall correlation (0.53, 95% CI: 0.50, 0.56) was lower for the comparison of structured versus free text as were the great majority of the cause-specific estimates made for this comparison.
This is unsurprising because there

Outcomes

The main outcome for this study was the proportion of deaths that were attributed to each of 15 main causes (defined at the chapter heading level in the ICD-10) or else remained unclassified.

Analysis

The primary analysis was a comparison of the cause-specific mortality fractions and rank order of the leading causes of death ascribed to the population using cause of death assignment based on the structured data alone, the free-text data alone, and the combined data (with the index cause based on using the two formats of data together). The proportions of deaths assigned to each main cause (or left unclassified) were presented graphically side by side in a column chart to enable a direct visual comparison of the cause-specific mortality fractions described by the structured, free-text, and combined methods. At the level of the individual case, kappa statistics and their confidence intervals were calculated to quantify the consistency of reporting of cause of death between each of the three pairs of methods. This was done in two ways: first, using the full 16 possible assignments to obtain an overall estimate of the correlation of diagnoses and, second, for each individual cause compared to all others to identify whether particular causes were more or less likely to have consistent diagnoses made. For the second set of analyses, an indicator variable was created in each dataset and set as 1 if the diagnosis was the cause of interest and 0 if the


Figure 1 The proportion of deaths assigned to each main underlying cause for each of the different data formats presented to the physician coders. Legend: structured, free text, combined.
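The quantity plotted in Figure 1, the cause-specific mortality fraction, is the share of all deaths assigned to each cause. As a worked illustration, the Python sketch below computes it from the numbers of deaths per cause listed in Table 1 (which sum to 1,407 and are taken here to be the index assignments; that reading is an assumption). The function and variable names are hypothetical.

```python
# Numbers of deaths per cause as listed in Table 1 (assumed index assignments).
index_counts = {
    "Circulatory": 487, "Injury": 187, "Neoplasm": 139, "Infectious": 134,
    "Respiratory": 61, "Digestive": 43, "Genitourinary": 42, "Endocrine": 31,
    "Nervous": 23, "Mental": 16, "Perinatal": 14, "Pregnancy": 4,
    "Blood": 3, "Congenital": 3, "Skin": 3, "Unclassified": 217,
}


def cause_specific_mortality_fractions(counts):
    """Divide each cause's death count by the total number of deaths."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no deaths to classify")
    return {cause: n / total for cause, n in counts.items()}


csmf = cause_specific_mortality_fractions(index_counts)

# Rank the classified causes from largest to smallest fraction.
ranked = sorted((c for c in csmf if c != "Unclassified"),
                key=csmf.get, reverse=True)

for cause in ranked[:4]:
    print(f"{cause}: {100 * csmf[cause]:.1f}%")
print(f"Unclassified: {100 * csmf['Unclassified']:.1f}%")
```

Repeating the calculation with the counts from the structured-only and free-text-only assignments would give the other two series in the chart; those counts are not tabulated in the text.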

Table 1 Kappa statistics describing constancy of individual cause of death assignments between different data formats presented to the physician coders

Kappa statistics (95% confidence interval)

Cause of death (number of deaths)   Structured vs. combined format   Unstructured vs. combined format   Structured vs. unstructured
Circulatory (487)                   0.71 (0.67 - 0.75)               0.70 (0.66 - 0.74)                 0.57 (0.52 - 0.61)
Injury (187)                        0.81 (0.76 - 0.86)               0.77 (0.72 - 0.82)                 0.74 (0.68 - 0.79)
Infectious (134)                    0.75 (0.69 - 0.80)               0.70 (0.63 - 0.76)                 0.65 (0.58 - 0.72)
Neoplasm (139)                      0.83 (0.78 - 0.88)               0.79 (0.73 - 0.84)                 0.76 (0.70 - 0.82)
Respiratory (61)                    0.57 (0.47 - 0.67)               0.57 (0.47 - 0.68)                 0.48 (0.38 - 0.58)
Digestive (43)                      0.38 (0.24 - 0.51)               0.52 (0.40 - 0.65)                 0.36 (0.23 - 0.50)
Genitourinary (42)                  0.53 (0.40 - 0.66)               0.61 (0.48 - 0.74)                 0.36 (0.22 - 0.50)
Endocrine (31)                      0.53 (0.37 - 0.69)               0.55 (0.40 - 0.70)                 0.31 (0.15 - 0.48)
Nervous (23)                        0.61 (0.44 - 0.77)               0.63 (0.46 - 0.80)                 0.37 (0.19 - 0.55)
Mental (16)                         0.62 (0.42 - 0.82)               0.46 (0.22 - 0.70)                 0.53 (0.30 - 0.77)
Perinatal (14)                      0.83 (0.68 - 0.98)               0.66 (0.44 - 0.88)                 0.48 (0.23 - 0.72)
Pregnancy# (4)                      -                                -                                  -
Blood# (3)                          -                                -                                  -
Congenital# (3)                     -                                -                                  -
Skin# (3)                           -                                -                                  -
Unclassified (217)                  0.44 (0.38 - 0.50)               0.40 (0.34 - 0.46)                 0.31 (0.25 - 0.37)
Overall (1407)                      0.66 (0.63 - 0.68)               0.64 (0.61 - 0.67)                 0.53 (0.50 - 0.56)

*A kappa statistic of 0.75 or above is generally considered to reflect a high correlation, 0.40 - 0.75 a moderate correlation, and below 0.40 a poor correlation [21].
# No kappa coefficient was calculated because too few deaths were recorded to allow informative estimates to be obtained.
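The interpretation bands in the table footnote can be written as a small helper. This is a sketch only: the footnote leaves the treatment of a value of exactly 0.75 or 0.40 slightly ambiguous, so assigning 0.75 to "high" and 0.40 to "moderate" is an assumption, and the function name is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa statistic to the bands quoted in the Table 1 footnote."""
    if kappa >= 0.75:
        return "high"
    if kappa >= 0.40:  # assumption: the lower bound belongs to "moderate"
        return "moderate"
    return "poor"


# Overall kappa statistics reported in Table 1.
overall_kappas = {
    "structured vs. combined": 0.66,
    "unstructured vs. combined": 0.64,
    "structured vs. unstructured": 0.53,
}

bands = {pair: interpret_kappa(k) for pair, k in overall_kappas.items()}
for pair, band in bands.items():
    print(pair, band)
```

By these bands all three overall comparisons are moderate, while neoplasm and injury reach the high band in every comparison with the combined format.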


and this would expand the base from which interviewers could be recruited and reduce the average cost of an interview. Structured questions also make it much easier to collect standardized data across a range of interviewers without specialization [10] and reduce the likelihood that important aspects of the events leading to death will be missed. There are also several possible disadvantages to a switch to a fully structured questionnaire. First, the questionnaire could miss important information if the scope of the questions is inadequate, and unless very long it will always miss unusual signs or symptoms associated with uncommon causes of death. This will not be an issue for the evaluation of broad mortality patterns in populations, but if verbal autopsy is required to identify particular, less-common conditions then additional specific questions might need to be added. The list of structured questions in the verbal autopsy tool used for this study was fairly exhaustive and included a number of questions about health care utilization and investigation results that are not always covered by the structured sections of verbal autopsy questionnaires. This is likely a key reason for the success with which causes of death were assigned using the structured data alone in the current study. Another potential challenge with fully structured questionnaires is that the structured format can impede the ability of the interviewer to develop a rapport with the respondent and that the structured format is at odds with the way that medical education teaches practitioners to collect and review clinical information [19]. Careful design of a structured questionnaire and the interview flow, or a switch to automated probabilistic methods of cause of death assignment would, however, address these issues. 
While the findings of this research are in contrast to usual practice in the conduct of verbal autopsy projects, the results are not totally unexpected, since no prior attempt has been made to address this issue in a robust quantitative design. Prior work done as part of the Andhra Pradesh Rural Health Initiative has shown that another well-established practice in verbal autopsy, the duplicate coding of cases by physicians, is no more effective than single coding [20]. These two pieces of work serve to highlight the importance of robust quantitative evaluation of all aspects of verbal autopsy design to ensure that the most effective and efficient systems are in place. The analyses done for this study were based on a fairly short cause list (chapter headings of ICD-10) and this will have decreased the potential for misclassification. While the list of causes used here would be sufficient for many high-level health-planning or program-monitoring functions, a longer cause list may have increased the variability of the cause-specific mortality

was no overlap of data when assigning the causes of death for this comparison, whereas for each of the comparisons against the index cause the comparator data format was also used to help allocate the index cause of death.

Discussion

The two key findings from this project are that neither the format of the data presented to the physician reviewer, nor the joint provision of data in two different formats versus a single format alone had any substantive impact on the cause-specific mortality fractions estimated for the population. The cause-specific mortality fractions obtained from the provision of each data format, or the two combined, were highly comparable in every case with no substantive differences detected for any major cause. These findings are somewhat in conflict with usual practice, with the majority of verbal autopsy programs collecting both structured and free-text data in the belief that the combined data will provide physician reviewers with important additional insight into the likely cause of death [7,8], perhaps because the two different formats of questioning are complementary to one another [10]. On the basis of the results reported here, it seems possible that simplified approaches could provide much the same information. It may be that if the cause of death cannot be assigned on the basis of one format of data alone, then the cause is so unclear that the addition of further information will assist in making a diagnosis in too few cases for it to be important. Indirectly, the results also suggest that if such profoundly different questionnaire formats can give such similar results there is probably rather little to be gained from making minor adjustments to existing questionnaires and that the real advances in verbal autopsy methodology are to be made elsewhere. The findings are important because the format of the questionnaire has a significant influence on many aspects of the verbal autopsy process. Most importantly of all, if the questionnaire need only collect data in one format, it could be substantially reduced in length, avoiding the duplication of data collection consequent upon having a structured and free-text-narrative component to an interview [18].
This would make the verbal autopsy interview both more acceptable to the respondents and more feasible for those collecting the data. The restriction of data collection to only structured questions would produce even greater dividends because complexity would be reduced. Interviewers collecting free-text-narrative data require much more extensive training and supervision and must have at least a basic understanding of disease processes and the pattern in which symptoms develop [6]. For a completely structured questionnaire, no such understanding is required


fractions estimated with each data format and likely would have decreased the strengths of the correlations obtained at the individual level. It is likely that our use of the same physician to make all three cause of death assignments has overestimated the correlation among the causes assigned using the different data formats because some cases may have been remembered from one cause of death assignment to the next. The alternate strategy of using different physicians to assign each cause would almost certainly have done the opposite because it would have introduced between-physician variability. We undertook the study in this way in an effort to specifically test the impact of the data format while holding all other factors constant. Nonetheless it has to be recognized that the current results are likely to be biased toward, rather than away from, congruence in the findings for the two data formats. Repeating the study using different physicians would nicely answer this question. It is also possible that the conclusions drawn here may not be fully generalizable to all other settings in which verbal autopsy is done. In part at least, the findings are likely to be dependent upon both the specifics of the methods used in this study and the true cause-specific mortality fractions in the population.

Foundation (Hyderabad, India), the Centre for Chronic Disease Control (New Delhi, India), The George Institute for Global Health (Sydney, Australia) and the School of Population Health, University of Queensland (Brisbane, Australia). We would like to thank all the Multipurpose Primary Healthcare Workers, the physician coders, the project staff, and all the respondents who participated in the study. Funding support for the India-based component of this project was provided by the Byrraju Foundation and the Wellcome Trust (grant number GR076471MF). The George Institute’s contribution to this project was made possible by an award from the George Foundation. Rohina Joshi is supported by an International Post-graduate Research Scholarship and International Post-graduate Award from the University of Sydney and Bruce Neal by an Australian Research Council Future Fellowship.

Author details
1 The George Institute for Global Health Australia, Sydney, Australia. 2 The George Institute for Global Health India, Hyderabad, India.

Authors’ contributions
RJ, CC, and BN participated in the conception, design, and coordination of the study. RJ and DP performed the statistical analysis and drafted the paper. All authors read, contributed to, and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 11 March 2011 Accepted: 4 August 2011 Published: 4 August 2011

References
1. Garenne M, Fauveau V: Potential and limits of verbal autopsies. Bull World Health Organ 2006, 84(3):164-65.
2. Fauveau V: Assessing probable causes of death without registration or certificates: a new science? Bull World Health Organ 2006, 84(3):246-47.
3. Yang G, Hu J, Rao KQ, Ma J, Rao C, Lopez AD: Mortality registration and surveillance in China: history, current situation and challenges. Population Health Metrics 2005, 3(3).
4. Jha P, Gajalakshmi V, Gupta PC, Kumar R, Mony P, Dhingra N, Peto R: Prospective study of one million deaths in India: Rationale, design and validation results. PLoS Medicine 2006, 3(2):e18.
5. Bangha M, Diagne A, Bawah A, Sankoh O: Monitoring the millennium development goals: the potential role of the INDEPTH Network. Glob Health Action 2010, 13:3.
6. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84(3):239-45.
7. Joshi R, Kengne A, Neal B: Methodological trends in studies based on verbal autopsies before and after published guidelines. Bull World Health Organ 2009, 87:678-82.
8. Soleman N, Chandramohan D, Shibuya K: WHO Technical Consultation on Verbal Autopsy Tools. Geneva; 2005 [http://www.who.int/healthinfo/statistics/mort_verbalautopsy.pdf].
9. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4(11):e327.
10. Fottrell E, Byass P: Verbal Autopsy - methods in transition. Epidemiol Rev 2010, 32:38-55.
11. Chow C, Cardona M, Raju P, Iyengar S, Sukumar A, Raju R, Colman S, Madhav P, Raju R, Reddy KS, Celermajer D, Neal B: Cardiovascular disease and risk factors among 345 adults in rural India - the Andhra Pradesh Rural Health Initiative. Int J Cardiol 2007, 116:180-85.
12. Joshi R, Cardona M, Iyengar S, Sukumar A, Raju R, Raju R, Raju K, Reddy KS, Lopez A, Neal B: Chronic diseases now a leading cause of death in rural India - mortality data from the Andhra Pradesh Rural Health Initiative. Int J Epidemiol 2006, 35(6):1522-29.
13. Yang G, Rao C, Ma J, Wang L, Wan X, Dubrovsky G, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. Int J Epidemiol 2006, 35(3):741-48.
14. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson LJ, Alberti KG, Lopez A: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Tropical Medicine & International Health 2006, 11(5):681-96.

Conclusions

The results of this study suggest that the format of the verbal autopsy data used to assign a cause of death does not substantively influence the pattern of mortality estimated, and that the collection of data in both structured and free-text formats is probably unnecessary. This conclusion is supported by other work showing that the inclusion of free-text data in probabilistic cause of death assignment models had no appreciable effect. This also provides support for the notion that automated cause of death assignment processes might have considerable potential for the reliable allocation of cause of death: computer-based, machine-learning techniques more easily utilize data in structured formats, and the research presented here suggests that rather little information would be lost by forgoing the free-text component of data collection. These findings have substantial implications for the design and implementation of future verbal autopsy studies that seek to describe the mortality pattern in a community. The data suggest that abbreviated and simplified verbal autopsy questionnaires could provide robust information for functions such as health-service planning, program evaluation and the long-term tracking of cause-specific mortality fractions.

Acknowledgements and funding
The Andhra Pradesh Rural Health Initiative is a collaboration among the Byrraju Satyanarayana Raju Foundation (Hyderabad, India), the CARE


15. Registrar General of India, Sample Registration System Academic Partners, Centre for Global Health Research, St. Michael’s Hospital, University of Toronto: Registrar General of India Prospective Study of One Million Deaths in India: SRS verbal autopsy form. [http://www.cghr.org/project.htm].
16. Registrar General of India, Sample Registration System Academic Partners, Centre for Global Health Research, St. Michael’s Hospital, University of Toronto: Manual for assigning causes of death from verbal autopsy. 2005 [http://www.cghr.org/CODA_Manual_July_2006___july31_LS.pdf].
17. SPSS [program]. Version 16.0.1. Chicago, IL, USA: SPSS Inc; 2007.
18. Fottrell E, Byass P, Ouedraogo TW, Tamini C, Gbangou A, Sombié I, Högberg U, Witten KH, Bhattacharya S, Desta T, Deganus S, Tornui J, Fitzmaurice AE, Meda N, Graham WJ: Revealing the burden of maternal mortality: a probabilistic model for determining pregnancy-related causes of death from verbal autopsies. Population Health Metrics 2007, 5:1.
19. Measurement of overall and cause-specific mortality in infants and children: memorandum from a WHO/UNICEF meeting. Bull World Health Organ 1994, 72:707-713.
20. Joshi R, Lopez A, MacMahon S, Reddy KS, Dandona R, Dandona L, Neal B: Verbal autopsy coding - are multiple coders better than one? Bull World Health Organ 2009, 87:51-7.
21. Fleiss J: Statistical methods for rates and proportions. 2nd edition. New York: John Wiley; 1981.

doi:10.1186/1478-7954-9-33
Cite this article as: Joshi et al.: Effects on the estimated cause-specific mortality fraction of providing physician reviewers with different formats of verbal autopsy data. Population Health Metrics 2011 9:33.



Yé et al. Population Health Metrics 2011, 9:34 http://www.pophealthmetrics.com/content/9/1/34

RESEARCH

Open Access

An improved method for physician-certified verbal autopsy reduces the rate of discrepancy: experiences in the Nouna Health and Demographic Surveillance Site (NHDSS), Burkina Faso Maurice Yé1*, Eric Diboulo1, Louis Niamba1, Ali Sié1, Boubacar Coulibaly1, Cheik Bagagnan1, Jonas Dembélé1 and Heribert Ramroth2

Abstract Background: Through application of the verbal autopsy (VA) approach, trained fieldworkers collect information about the probable cause of death (COD) by using a standardized questionnaire to interview family members who were present at the time of death. The physician-certified VA (PCVA), an independent review of this questionnaire data by up to three physicians trained in VA coding, is currently recommended by the World Health Organization (WHO) and is widely used in the INDEPTH Network. Even given its appropriateness in these contexts, a large percentage of causes of death assigned by VAs remains undetermined. As physicians often do not agree upon a final COD classification, there remains substantial room to improve the standard VA method, potentially leading to a reduction in physician discordance in COD coding. Methods: We present an extension of the current method of PCVA and compare it to the standard WHO-recommended procedure. We used VA data collected in the Nouna Health and Demographic Surveillance Site (NHDSS) between 2009 and 2010 using a locally-adapted version of an INDEPTH standard verbal autopsy questionnaire. Until 2009, physicians in the NHDSS followed the WHO method (Method 1). As an extension of Method 1, starting in 2010, the use of a panel of physicians was added to the coding process in the case where a third physician’s final conclusions resulted in an undetermined COD (Method 2). Two independent samples of VA questionnaires were compared for the year 2009 (using Method 1) and the year 2010 (using Method 2). Results: The WHO-recommended method used for 2009 yielded a high level of undetermined CODs, where the final coding was “undetermined” in 50.8% of all questionnaires due to disagreement among participating physicians (Method 1).
By introducing a panel of physicians in 2010 for cases where the principal physicians disagreed on the cause of death, the revised method significantly reduced the proportion of undetermined CODs to 1.5% (Method 2). Conclusions: As the extended method of PCVA significantly improved the accuracy of the VA procedure, we suggest the adoption of this method for those countries where alternatives like computer-based VA coding are not available. Based on the results of our study, further research should be pursued. Keywords: Verbal autopsy, cause of death, discrepancy, concordance, Nouna, Burkina Faso

* Correspondence: yemaure@yahoo.fr 1 Centre de Recherche en Santé de Nouna, Burkina Faso, PO BOX 02 Nouna, Burkina Faso Full list of author information is available at the end of the article © 2011 Yé et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Yé et al. Population Health Metrics 2011, 9:34 http://www.pophealthmetrics.com/content/9/1/34


Background

Verbal autopsy (VA) is a technique used to determine the cause of death by asking caregivers, friends, or family members about signs and symptoms exhibited by the deceased in the period before death. It is usually done by trained fieldworkers using a standardized questionnaire that collects details on signs, symptoms, complaints, and any medical history or events prior to death [1]. The World Health Organization (WHO) recommends the use of verbal autopsy to measure specific causes of death [2,3]. The purpose of verbal autopsy is to describe the causes of death at the community or population level where limited or no vital registration is completed with medical certificates. Indeed, medically certified cause of death data are available for less than one-third of the more than 57 million deaths occurring worldwide annually. The majority of deaths lacking such data are from developing countries [4]. Information about cause of death is essential for public health planning, priority setting, monitoring, and evaluation, but the collection of such information in countries with incomplete or no vital registration systems remains a substantial challenge [5]. Reliable data on cause-specific mortality are also needed by countries to keep track of progress toward the Millennium Development Goals [3,6].

The use of physician-certified verbal autopsies (PCVA) is common in the majority of developing countries, as well as in Health and Demographic Surveillance Sites (HDSS) that are members of the International Network for the Demographic Evaluation of Populations and Their Health in Developing Countries (INDEPTH) [6]. Within the INDEPTH Network, 36 HDSS in 20 countries regularly use VA to assess cause of death [6]. However, the data collection tools are not yet harmonized, which has led to substantial variability in the coding process across sites [7,8].

Recently, there have been several attempts to introduce alternative methods such as a computer-based verbal autopsy coding method (InterVA) to replace the PCVA approach [6]. This probability-based method was tested in several settings [9-11]. However, the results still show some discrepancies in comparison to PCVA results [12]. Few studies are available on the use of different physician coding methods that produce better results. In contrast, the study by Joshi and colleagues comparing results involving multiple coders versus one single coder suggests that the advantages attained from the multiple-coding system remain limited [13]. However, in this study, the approaches for cause of death assignment used either a panel of expert physicians or involved two or more physician coders who independently reviewed the data to arrive at a final diagnosis [14,15]. The method proposed in this paper was tested with the intention to build on former approaches, such as those presented in the study by Joshi et al [13]. In this study we compare the usual WHO-recommended PCVA procedure with a locally-adapted method that incorporates the use of a panel of physicians after a discrepancy among three physician coders arises.

Methods

Study area

The Nouna Health and Demographic Surveillance Site (NHDSS) has existed since 1992 and is in the rural western part of Burkina Faso (Figure 1). It currently covers 58 villages and one semi-urban town, with a population of about 85,000 inhabitants. The NHDSS is part of Kossi province, which has a primarily rural, multi-ethnic population. The predominant activities are subsistence farming and cattle keeping. The region is a dry orchard savannah and has a sub-Sahelian climate, which is characterized by heat and a short rainy season lasting from June to September, with rainfall varying between 400 and 1000 millimeters. The vegetation is mainly scattered short trees. The mean temperature varies from 26°C to 34°C, often reaching 40°C in April, the hottest period [11]. The NHDSS is a member of the INDEPTH Network, a global network of HDSSs with the aim of conducting longitudinal health and demographic evaluation of populations in low- and middle-income countries [16]. The health facilities within the NHDSS consist of one secondary care facility (the district hospital) and 14 primary health centers. The NHDSS has been used as a sampling frame for numerous studies in the fields of clinical research, epidemiology, health economics, and health-systems research. Nouna has a functional vital event registration system, which allows continuous collection of data on pregnancies, births, deaths, and migration [17].

The VA questionnaire

The Nouna questionnaire covers background characteristics of the deceased using structured filter questions on specific signs and symptoms experienced by the deceased up to the point of death. Additionally, a narrative section provides an opportunity to describe conditions not covered in the structured questions (see Additional file 1). Although the questionnaire is written in French, interviews with the HDSS population are performed by trained fieldworkers who translate the content into local languages. The Dioula language is the most spoken local language, but several other languages are common, such as Bwamu, Moore, and Fulfulde. Verbal autopsy questionnaire data are collected every four to five months at the household level by interviewers. They are then coded by physicians familiar with the 10th revision of the WHO International Classification of Diseases codes (ICD-10). We used the ICD-10, which was endorsed by the World Health Assembly in 1990 and has been in use since 1994. Its main use here is to classify causes of mortality as recorded at the registration of death. The ICD-10 also covers a conceptual framework of definitions, standards, and methods that have been closely linked and developed along with the classifications themselves. A restricted list based on ICD-10 has been used for the final physician coding (see Additional file 2).

Physicians' coding organization

The VA coding sessions were organized locally by gathering 12 physicians working in the district hospital, with an average working experience as general practitioners of four years. One of these physicians, who had a detailed public health background, guided the coding process. All physicians had good knowledge of patient management covering the areas of general medicine, care for pediatric inpatients, care for HIV patients, and basic gynecological and obstetrical care for women. Nevertheless, the panel sought opinion from external specialists in the area of interest when required. Based on the number of available physicians, the panel consisted of three to four members. An agreement upon a given cause of death was only reached when two out of three members (67%) or three out of four members (75%) of the panel arrived at a consensus. Thus the panel coding process required a clear majority: more than 50% of the panel members had to come up with the same cause of death. That cause was then ascribed as the final cause of death. The panel agreed to classify the cause of death as undetermined if the available VA information did not allow them to make a final decision.

Figure 1 Map of NHDSS.

Study design

This study was designed as a comparative study applying two methods of PCVA to two independent samples of VA questionnaires, collected in 2009 and 2010, to ascertain causes of death. The first sample, from 2009, was coded using the WHO-recommended method (Method 1). The second sample, from 2010, was coded using the extended method (Method 2).
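The decision logic of the two coding methods compared here (two independent physicians, a third in case of disagreement, and, for Method 2 only, a panel of three to four physicians) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the cause labels, and the representation of each physician's diagnosis as a plain string are hypothetical and do not reproduce the actual procedure of Figure 2 or the ICD-10-based restricted list.

```python
from collections import Counter
from typing import Optional, Sequence

UNDETERMINED = "undetermined"


def code_method1(phys1: str, phys2: str, phys3: Optional[str] = None) -> str:
    """Method 1 (hypothetical sketch): a cause is assigned only if at least
    two physicians independently give the same diagnosis."""
    if phys1 == phys2:
        return phys1
    # Disagreement: the third physician's diagnosis must match one of the two.
    if phys3 is not None and phys3 in (phys1, phys2):
        return phys3
    return UNDETERMINED


def panel_decision(votes: Sequence[str]) -> str:
    """Panel rule (hypothetical sketch): more than half of the members must
    agree, i.e. 2 of 3 or 3 of 4; otherwise the cause stays undetermined."""
    cause, count = Counter(votes).most_common(1)[0]
    return cause if 2 * count > len(votes) else UNDETERMINED


def code_method2(phys1: str, phys2: str, phys3: Optional[str],
                 panel_votes: Sequence[str]) -> str:
    """Method 2 (hypothetical sketch): Method 1 first; the panel is consulted
    only when Method 1 ends without an agreed cause."""
    result = code_method1(phys1, phys2, phys3)
    if result != UNDETERMINED:
        return result
    return panel_decision(panel_votes)


# Illustrative case: three physicians disagree, the panel settles the cause.
print(code_method2("malaria", "pneumonia", "diarrhea",
                   ["malaria", "malaria", "pneumonia"]))
```

In this sketch a panel vote of "undetermined" by the majority is returned unchanged, which mirrors the panel's option of leaving a death uncoded when the VA information is insufficient.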


Coding methods

Method 1: As recommended by WHO [3], two experienced local physicians interpret the answers to the questionnaire and independently determine the most probable cause of death. In the case of disagreement, a third physician is consulted. The cause of death is attributed only if supported by at least two physicians using ICD-10.

Method 2: In 2010, Method 1 was extended using a panel of physicians in the case of a coding discordance between referee physicians. The VA coding procedure has been combined in a stepwise process shown in Figure 2.

Verbal autopsy data collection

Two key actors are generally involved in the process of VA data collection. Since the creation of Nouna HDSS, the event of death is registered in an active reporting system using community reporters, called community key informants (CKIs). Overall, 58 CKIs (one per village) report deaths occurring within households. Afterward, an assigned village interviewer collects information on the death. The trained field staff who visit households with a registered death have no medical background. As described above, they conduct the interview with the caregivers or relatives, translating the French VA questionnaire into the local language. The interview usually takes place several months after the event with the person who assisted the deceased before the death. Figure 3 presents the VA data collection flow chart in Nouna describing the interaction between fieldworkers and the community.

Quality control

Quality control is ensured by several checking mechanisms put in place at different stages of the data collection process. Whenever inconsistencies in collected information do not allow for a final diagnosis, a second interview is done by a field supervisor for consistency. In addition, village supervisors follow up the interview process at the household level through random checks. At the data-entry level, attention is given to the attributed codes to reduce errors of coding.

Statistical method used

The concordance rate was obtained for each method by dividing the number of VAs on which the physician coders agreed by the total number of VAs coded. The proportion test for two independent samples was applied to compare the proportions of undetermined cause of death achieved using the different methods.

Figure 2 Verbal autopsy coding procedure. Phy: physician, D: diagnosis.

Figure 3 Verbal autopsy flowchart. [Flowchart labels: community (CKI, fieldworkers, VA interviewers; within and outside the HDSS); supervisors; computer center (filing process; data processing; updating of the database; DB report/production of deaths list; updating forms; coding of updates); list of deaths; VA forms; pool of physicians who diagnose; coding causes of death; data analysis; results dissemination.] The filing process steps are as follows: 1) Events processing for data entry; 2) Processing output of VA interviews; 3) Processing of VA forms for data entry; 4) Process followed in the case of problems reported on the VA forms; 5) Processing of VA coding for data entry; HRB: compound registration book, DB: database, CKI: community key informants.

Results

Verbal autopsy data coded

Out of 1,256 deaths collected over the study period, 640 were coded in 2009 using the first coding method (WHO), while 616 deaths were coded in 2010 using the locally-adapted method.

Agreement between physician coders

Out of 640 deaths coded in 2009 using the WHO method, there was an agreement on 315 diagnoses. This represents a concordance rate of 49.2% for the first method. Applying the same procedure to the VA records of 2010, agreement could be achieved for only 219 records, resulting in a preliminary concordance rate of 35.6%. Involvement of the physician panel increased agreement on the final cause of death to 607 diagnoses. Thus, the latter method yielded a concordance rate of 98.5% among physician coders, given the two stages of analysis. With additional involvement of a physician panel in the case of disagreement among the principal coders, the discrepancy among physician coders could thus be substantially reduced to less than 1.5%. The results of the proportion test showed that the proportions of undetermined causes of death achieved by the two methods were significantly different (p value < 0.0001). The study thus shows a significant reduction in the percentage of undetermined causes of death.

Predominant cause-specific mortality fraction

Our findings indicate that malaria is the leading cause of death, accounting for 37.3% of total deaths registered in 2009 and 37.9% in 2010 (Figure 4). Here the undetermined CODs are not displayed, assuming that the non-coded CODs follow the same pattern as the coded CODs. Malaria is followed by pneumonia and diarrheal diseases. Figure 4 shows similar patterns in deaths using the two methods.

Figure 4 Comparative cause-specific mortality fractions of Methods 1 and 2.
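The concordance rates and the comparison of undetermined proportions reported in the Results can be reproduced from the published counts. The sketch below is illustrative only: the paper does not state which form of the proportion test was used, so a pooled two-sample z-test with a two-sided p value is assumed here, and the function names are hypothetical.

```python
import math


def concordance_rate(agreed: int, coded: int) -> float:
    """Share of coded VAs on which the physician coders agreed."""
    return agreed / coded


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sample z-test for proportions (assumed form of the test).
    Returns the z statistic and the two-sided p value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the Results section.
coded_2009, agreed_2009 = 640, 315                             # Method 1
coded_2010, agreed_initial_2010, agreed_2010 = 616, 219, 607   # Method 2

print(round(100 * concordance_rate(agreed_2009, coded_2009), 1))          # 49.2
print(round(100 * concordance_rate(agreed_initial_2010, coded_2010), 1))  # 35.6
print(round(100 * concordance_rate(agreed_2010, coded_2010), 1))          # 98.5

# Undetermined CODs are the coded VAs without an agreed diagnosis.
z, p_value = two_proportion_z_test(coded_2009 - agreed_2009, coded_2009,
                                   coded_2010 - agreed_2010, coded_2010)
print(p_value < 0.0001)  # True
```

Under this assumed test the difference between 50.8% and 1.5% undetermined CODs is far beyond the reported significance threshold, consistent with the p value given in the text.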


Discussion

Our findings provide evidence that the choice of verbal autopsy coding method has a highly significant impact on the results of PCVA. They indicate that improving an empirical method of PCVA, like the WHO-recommended method [3], through use of a physician panel in case of COD-coding disagreements leads to a large reduction in the proportion of undetermined causes of death. Although this method of panel coding requires additional resources and time from physician coders compared with the standard coding procedure described by Soleman et al [5], and especially with the single-coding procedure described by Joshi et al [13], it brings a large improvement over the existing methods for determining the probable causes of death. The findings of Fottrell et al [9] support our results, as an initial agreement of 60% between two physicians was shown to increase to more than 80% when a review was done by one additional physician. Thus, our findings are in stark contrast to those of Joshi and colleagues, who suggest reducing coding to one physician only [13].

We cannot exclude the possibility that the quality of coding depends on the VA questionnaire used at each HDSS site, as discussed in several INDEPTH meetings over the past five years. The current VA questionnaire available through WHO/INDEPTH tries to overcome limitations of existing VA questionnaires, offering separate versions for different age groups and providing comparability over different countries. Thus, NHDSS moved to the updated WHO/INDEPTH questionnaire in 2011. For coding, the Nouna site uses the restricted classification list suggested during the INDEPTH Meeting in Uganda in 2008, comparable with other HDSS sites. Our data suggest that the new multicoding system of deaths doesn't necessarily affect the mortality pattern, although it results in changes in the proportion of deaths within the different groups of leading causes of death.

Undeniably, the suggested method is more time-consuming and costly, but it also leaves far fewer CODs undetermined. However, this is the first time that such a panel discussed the questionable cases. In summary, the procedure might be especially helpful in HDSS sites where high rates of undetermined CODs are observed.

The use of automated Bayesian models to assign the most likely causes of death, tested by Byass [18], is currently under investigation in the Nouna HDSS. The main gains achieved from this method are a reduction in time and cost needed to complete the coding process. Additionally, as the model doesn't involve different physicians over time or in different countries, it aims to provide comparable results within HDSS sites over time and across different HDSS sites. However, its use is still limited to certain sites, and comparisons with PCVA still show some discrepancies [12]. Given that the computer-based probability approach to VA interpretation is designed to overcome the weaknesses of physicians' reviews, preliminary results are promising but not yet fully convincing [9-11]. At present, the main problem in choosing an optimal method of coding for VA is that no gold standard is available and comparison among various methods remains limited. Currently, both methods might profit from comparing their results with those of the other method. For resource-poor settings, a reliable and affordable method of VA coding remains a necessity, as mortality data remain important to guide decision-makers for health planning purposes. While waiting to scale up use of the computer-based model, the improved WHO method proposed in our study could be applied as an alternative method for coding, as it offers a good rate of concordance among physician coders.

Despite verbal autopsy being a useful tool in determining causes of death, the method has some limitations. Previous studies note these shortcomings in detail [2,5,9,13]. Because verbal autopsy is based on data collected through an interview process, and based on signs and symptoms exhibited, it is subject to recall bias and misreporting. Physicians have different experiences and knowledge in coding that could lead to different interpretations of the diagnosis [5,9]. While PCVA has some well-known limitations [4], the shortcomings of the tool are known and quantifiable. These deficiencies, however, should not prevent countries requiring information on causes of death from benefiting from the use of VA when no practical alternative for obtaining these data exists. Few studies are available on the use of different physician coding methods that produce better results. However, the approaches for cause of death assignment most commonly used either a panel of expert physicians or two or more physician coders who independently review the data and arrive at a diagnosis [14,15]. Despite its acknowledged limitations [13], PCVA is still considered the best possible method to get cause of death estimates in areas where vital events registration systems are limited or not available.

Cohen's kappa as a measure of agreement could not be applied, as the two samples used here were independent (years 2009 and 2010) and the coding was done independently in a blind manner by different physicians. Given this constraint, we focused only on the comparison of concordance rate between the two samples. This approach to analysis allowed us to attain a simple but effective measure for the improvement of the extended coding method.


Conclusions

Verbal autopsy remains essential to capture and determine probable cause of death, especially in the context of low-income countries like Burkina Faso, where it is estimated that roughly 75% of deaths occur at home [19]. The VA process has the ability to contribute substantially by informing policymakers on real mortality data and allowing countries to monitor trends toward attainment of Millennium Development Goals, in particular those related to maternal and child health outcomes. The advantage of involving a physician panel in the coding process as suggested here is obvious, as it allows coding of an additional 50% of VAs. Importantly, this method promotes interactive discussions among physicians involved in the coding process, similar to what physicians are already doing during their clinical presentations on patients. Thus, the panel method provides a framework for scientific discussion among physicians, allowing everyone to update their knowledge. Our study presents an alternative method of PCVA that substantially reduces the proportion of undetermined causes of death and therefore contributes to improved coding of deaths. We also aim to advocate for harmonization in the PCVA process, while encouraging the validation of the computer-based method of death coding.

There is a need for further study to confirm our findings in other settings. This would make it possible to adopt a single method for HDSS sites within the INDEPTH Network and, to some extent, for other sites outside the network interested in more accurate physician-certified verbal autopsy coding methods.

Author details

1 Centre de Recherche en Santé de Nouna, Burkina Faso, PO BOX 02 Nouna, Burkina Faso. 2 University of Heidelberg, Institute of Public Health, Heidelberg, Germany.

Authors' contributions

MY: Participated in the study design, data analysis and interpretation, drafted the paper, and coordinated the manuscript revision process. ED: Participated in the study design, performed statistical analysis, and helped write the paper. LN: Participated in the data analysis and interpretation and helped write the paper. AS: Participated in data analysis and interpretation and helped write the paper. BC: Participated in data analysis and interpretation and helped write the paper. CB: Supervised data entry, performed data analysis, and helped write the paper. JD: Supervised data collection and contributed to data acquisition, analysis, and interpretation. HR: Contributed to the study design, data analysis and interpretation, and helped draft and revise the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 15 March 2011 Accepted: 4 August 2011 Published: 4 August 2011

References

1. Sample Vital Registration with Verbal Autopsy. USAID, MS-07-26-VAC; 2007.
2. World Health Organization: Mortality: guideline for certification and rules for coding. In International Statistical Classification of Diseases and Related Health Problems, tenth revision. Volume 2. Instruction Manual. Geneva: World Health Organization; 1993:124-38.
3. Verbal autopsy standards: ascertaining and attributing cause of death. WHO; 2007, ISBN 978 92 4 1547215 (NLM classification: WA 900).
4. World Health Organization: World health report 2004: changing history. Geneva: WHO; 2004.
5. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84(3):239-245.
6. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Dieltiens G, Jakob R, Kahn K, Kunii O, Lopez AD, Murray CJL, Nahlen B, Rao C, Sankoh O, Setel PW, Shibuya K, Soleman N, Wright L, Yang G: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85(8).
7. Bang AT, Bang RA: Diagnosis of causes of childhood deaths in developing countries by verbal autopsy: suggested criteria. Bull World Health Organ 1992, 70:499-507.
8. World Health Organization: WHO technical consultation on verbal autopsy tools. Geneva: WHO; 2005.
9. Fottrell E, Byass P, Ouédraogo WT, Tamini C, Gbangou A, Sombié I, Högberg U, et al: Revealing the burden of maternal mortality: a probabilistic model for determining pregnancy-related causes of death from verbal autopsies. Population Health Metrics 2007, 5:1.
10. Fantahun M, Fottrell E, Berhane Y, Wall S, Högberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84:204-210.
11. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsy: methodology and preliminary validation in Vietnam. Scandinavian Journal of Public Health 2003, 31:32-37.
12. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS Population. Population Health Metrics 2010, 8:21.
13. Joshi R, Lopez AD, MacMahon S, Reddy S, Dandona R, Dandona L, Neal B: Verbal autopsy coding: are multiple coders better than one? Bull World Health Organ 2009, 87(1):51.

Additional material

Additional file 1: The VA questionnaire used in the NHDSS.
Additional file 2: The restricted cause of death list based on ICD-10 used for the final physician coding.

List of abbreviations

CKI: community key informants; COD: cause of death; ICD: International Classification of Diseases; INDEPTH: International Network for the Demographic Evaluation of Populations and Their Health in Developing Countries; NHDSS: Nouna Health and Demographic Surveillance Site; PCVA: physician-certified verbal autopsy; VA: verbal autopsy; WHO: World Health Organization

Acknowledgements

We acknowledge Dr. Markus Elsner, editor at Nature Biotechnology, for his advice on the manuscript structure. We are grateful to the physicians in Nouna who coded the VA data and to the interviewers and village key informants who helped to collect the VA data at the household level. We wish also to thank Mr. Jake Robyn, PhD student from Harvard University, who revised the English spelling and grammar of the manuscript.


14. Jha P, Gajalakshmi V, Gupta PC, Kumar R, Mony P, Dhingra N, et al: Prospective study of one million deaths in India: rationale, design and validation results. PLoS Med 2006, 3:e18, PMID:16354108.
15. Garenne M, Fauveau V: Potential and limits of verbal autopsies. Bull World Health Organ 2006, 84:164-5, PMID:16583068.
16. The International Network for the Demographic Evaluation of Populations and their Health. [http://www.indepth-network.org], Site visited on 22 April 2011.
17. Sié A, Louis V, Gbangou A, Müller O, Niamba L, Stieglbauer G, Yé M, Kouyaté B, Sauerborn R, Becher H: Global Health Action 2010, 3:5284.
18. Byass P, Fottrell E, et al: Refining a probabilistic model for interpreting verbal autopsy data. Scandinavian Journal of Public Health 2006, 34(1):26-31.
19. Centre de Recherche en Santé de Nouna: Rapport annuel 2010 du système de surveillance démographique et de santé. Rapport périodique N°2 Janvier; 2011.

doi:10.1186/1478-7954-9-34

Cite this article as: Yé et al.: An improved method for physician-certified verbal autopsy reduces the rate of discrepancy: experiences in the Nouna Health and Demographic Surveillance Site (NHDSS), Burkina Faso. Population Health Metrics 2011 9:34.



Flaxman et al. Population Health Metrics 2011, 9:35 http://www.pophealthmetrics.com/content/9/1/35

RESEARCH

Open Access

Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards

Abraham D Flaxman1*, Alireza Vahdatpour1, Spencer L James1, Jeanette K Birnbaum2 and Christopher JL Murray1 for the Population Health Metrics Research Consortium (PHMRC)

Abstract

Background: Verbal autopsy (VA) is used to estimate the causes of death in areas with incomplete vital registration systems. The King and Lu method (KL) for direct estimation of cause-specific mortality fractions (CSMFs) from VA studies is an analysis technique that estimates CSMFs in a population without predicting individual-level cause of death as an intermediate step. In previous studies, KL has shown promise as an alternative to physician-certified verbal autopsy (PCVA). However, it has previously been impossible to validate KL with a large dataset of VAs for which the underlying cause of death is known to meet rigorous clinical diagnostic criteria.

Methods: We applied the KL method to adult, child, and neonatal VA datasets from the Population Health Metrics Research Consortium gold standard verbal autopsy validation study, a multisite sample of 12,542 VAs where gold standard cause of death was established using strict clinical diagnostic criteria. To emulate real-world populations with varying CSMFs, we evaluated the KL estimations for 500 different test datasets of varying cause distribution. We assessed the quality of these estimates in terms of CSMF accuracy as well as linear regression and compared this with the results of PCVA.

Results: KL performance is similar to PCVA in terms of CSMF accuracy, attaining values of 0.669, 0.698, and 0.795 for adult, child, and neonatal age groups, respectively, when health care experience (HCE) items were included. We found that the length of the cause list has a dramatic effect on KL estimation quality, with CSMF accuracy decreasing substantially as the length of the cause list increases. We found that KL is not reliant on HCE the way PCVA is, and without HCE, KL outperforms PCVA for all age groups.

Conclusions: Like all computer methods for VA analysis, KL is faster and cheaper than PCVA. Since it is a direct estimation technique, though, it does not produce individual-level predictions. KL estimates are of similar quality to PCVA and slightly better in most cases. Compared to other recently developed methods, however, KL would only be the preferred technique when the cause list is short and individual-level predictions are not needed.

Keywords: Verbal autopsy, cause of death certification, validation, direct estimation

* Correspondence: abie@uw.edu 1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA Full list of author information is available at the end of the article © 2011 Flaxman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

In settings where a non-negligible proportion of the population dies outside of the hospital system, verbal autopsies (VAs) are emerging as a vital tool for understanding the population-level patterns of cause-specific mortality fractions (CSMFs). By combining this with robust information on levels of age-specific all-cause mortality (also collected through household surveys, e.g., of sibling survivorship), it is possible to estimate age- and cause-specific mortality rates. Most population-level estimates derived from VAs are created in two phases, by first assigning a cause or several causes to each death and then calculating CSMFs from the number of deaths or partial deaths assigned to each cause. Direct estimation is an alternative approach that produces population-level estimates of CSMFs directly from the VAs without the intermediate stage of assigning a cause to each VA. The direct estimation method proposed by King and Lu (which we will call the KL method) is designed to capture complex patterns of interdependence between various signs and symptoms in the VA instrument [1,2]. This approach can be interpreted as a sophisticated multiclass generalization of the classic back-calculation approach of epidemiology and has been shown to be a promising method in theoretical simulation and small-scale validation studies [2].

The KL method is based on the following matrix expression:

$$\underbrace{P(S)}_{2^k \times 1} = \underbrace{P(S \mid D)}_{2^k \times n} \times \underbrace{P(D)}_{n \times 1}$$

where P(S) is the distribution of symptom profiles in the test dataset, P(S|D) is the distribution of symptom profiles for each cause of death (calculated using the training dataset), and P(D) is the distribution of the n causes of death in the test dataset. A symptom profile is a combination of k different symptoms. Each symptom is dichotomous, so k symptoms yield $2^k$ symptom profiles. P(S) and P(S|D) are calculated by tabulation. For a symptom profile $s_0$, $P(S = s_0)$ is calculated by counting the fraction of VAs to be analyzed that endorse symptom profile $s_0$. For a symptom profile $s_0$ and cause j, $P(S = s_0 \mid D = j)$ is calculated by counting the fraction of VAs in the “training set” with disease j as the cause of death that endorse symptom profile $s_0$. Quadratic programming or least squares approaches may be used to solve this equation. King and Lu reported that the expected value of CSMFs estimated by their direct estimation method in repeated samples yields plausible CSMFs in a simulation study using data for 13 adult causes of death in China and 11 causes of child death in Tanzania. King and Lu [1] further stress that the direct CSMF estimation approach does not depend on the presence in the VA instruments of items with high sensitivity or specificity for particular causes. They argue that it provides an efficient, low-cost approach for estimating CSMFs and they derive analytical strategies for choosing symptoms from an instrument that will optimize performance. At least two studies have taken the KL method and applied it to real-world verbal autopsy datasets [3,4].

Despite the impressive results with small errors in CSMFs reported by King and Lu, there are several outstanding issues that need to be understood before widespread adoption of the method. First, King and Lu report in repeated experiments the expected value of the CSMF produced by their method compared to the true CSMFs using test and train datasets. They do not report a metric of the average error in CSMFs across repeated experiments, leaving it unclear how well the method will work in a given real-world application. Second, in all of the cases that they report, the CSMF compositions of the train and test datasets are either identical or very close to each other. The performance of the KL method when the CSMF composition of the training set differs from that of the test dataset has not been established. Third, the validation data reported by King and Lu pertain to relatively short cause lists of length 11 and 13, respectively. The performance of the KL method for the longer cause lists desired in most VA studies has not yet been established. Fourth, until recently [5] there have been no standardized metrics to compare the performance of different VA methods for the estimation of CSMFs, limiting the comparison of KL to other methods such as PCVA, InterVA, Symptom Pattern, or others [6-8].

In this paper we present the results of a validation study of the KL method, using a large dataset with a realistically diverse cause list collected in the Population Health Metrics Research Consortium (PHMRC) gold standard verbal autopsy validation study [9]. The study was undertaken to develop a range of new analytical methods for verbal autopsy and to test these methods using data collected in six sites in four countries (Mexico, Tanzania, India, and the Philippines). The study is unique, both in terms of the size of the validation dataset (7,836, 2,075, and 2,631 deaths in adults, children, and neonates respectively) and the use of rigorously defined clinical diagnostic criteria for a death to be included in the study as a gold standard cause of death. The dataset collected through the PHMRC is sufficiently large to be able to explore the relationship between CSMF errors by cause and overall CSMF accuracy and the size of training and test datasets.
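As an illustration of the matrix relation P(S) = P(S|D) × P(D), the following minimal Python sketch tabulates symptom profiles and recovers P(D) for a toy problem. It is not the KL software: it uses a single fixed set of k symptoms rather than repeated random subsets, and it substitutes projected gradient descent on the probability simplex for the quadratic programming step. The function names, step size, and iteration count are assumptions made for this example only, and the step size would need tuning for larger problems.

```python
from collections import Counter
from itertools import product


def profile_distribution(records, k):
    """Fraction of records endorsing each of the 2**k symptom profiles."""
    counts = Counter(tuple(r) for r in records)
    return [counts[p] / len(records) for p in product((0, 1), repeat=k)]


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    theta, running = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        running += u
        candidate = (running - 1.0) / i
        if u - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_csmf(p_s, p_s_given_d, steps=2000, step_size=0.5):
    """Least squares fit of P(S) = P(S|D) P(D) with P(D) kept on the simplex.

    p_s_given_d[j] is the profile distribution for cause j, i.e. one
    column of P(S|D). steps and step_size are illustrative settings.
    """
    n_causes = len(p_s_given_d)
    p_d = [1.0 / n_causes] * n_causes
    for _ in range(steps):
        fitted = [sum(col[s] * w for col, w in zip(p_s_given_d, p_d))
                  for s in range(len(p_s))]
        residual = [f - o for f, o in zip(fitted, p_s)]
        gradient = [2.0 * sum(c * r for c, r in zip(col, residual))
                    for col in p_s_given_d]
        p_d = project_to_simplex(
            [w - step_size * g for w, g in zip(p_d, gradient)])
    return p_d


# Toy data (hypothetical): k = 2 dichotomous symptoms and two causes.
train_a = [(0, 0)] + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 4
train_b = [(0, 0)] * 4 + [(0, 1)] * 3 + [(1, 0)] * 2 + [(1, 1)]
test_set = train_a * 7 + train_b * 3  # test set is 70% cause A, 30% cause B
columns = [profile_distribution(train_a, 2), profile_distribution(train_b, 2)]
estimate = estimate_csmf(profile_distribution(test_set, 2), columns)
print([round(x, 3) for x in estimate])
```

In this toy case the test profiles are an exact mixture of the training profiles, so the fit recovers the mixing weights. Real VA data violate that idealization, which is what the validation experiments in this paper probe.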

Methods

We use the PHMRC gold standard VA dataset to undertake three distinct analyses to understand the


performance of the KL method in different settings. Details of the methods used for establishing the gold standard cause of death and for the collection of the VA data are reported elsewhere in detail [9]. The PHMRC instrument uses separate modules for neonate, child, and adult deaths so these sets of deaths have been analyzed separately. The final cause lists are mutually exclusive and collectively exhaustive for all causes, and contain 11 causes for neonates, 21 causes of child death, and 34 causes of adult death. The development of training and test datasets is described in detail elsewhere [9] and is summarized in Figure 1. Figure 1 outlines the basic simulation design to generate a range of test and training datasets. First, for each cause we split the data randomly without replacement, with 75% into a training set and 25% into a test set. This step was repeated 500 times to avoid results being influenced by the idiosyncrasies of a particular data split. We then sampled CSMF compositions from an uninformative Dirichlet distribution and randomly resampled (with replacement) the available deaths in the test set to generate a test dataset with the prescribed total number of deaths and CSMF composition. By varying the CSMF compositions of test datasets as well as the total number of deaths, we generated a wide array of validation datasets. Each one maintained a strict separation of training and test data, which guarantees that our metrics are for “out-of-sample” prediction quality. This method generates test/train datasets with independent CSMF composition. Over the course of the PHMRC gold standard VA validation study, it became clear that metrics for gauging the quality of VA methods are quite subtle and are not standardized between research efforts. The complex issues are described fully by Murray et al. [5], who also proposed new metrics that allow for quality comparison across cause lists and cause compositions.
Following their recommendations, we report median CSMF accuracy across 500 test datasets. At the cause-specific level we report the intercept, slope, and root mean squared error (RMSE) for the relationship between estimated CSMF and the true CSMF assessed using linear regression. Murray et al. [10] showed that in China, the recall of the household or possession of medical records recorded in the VA interview had a profound effect on both the concordance for PCVA as well as the performance of the computer-coded VAs. However, obtaining useful information from this health care experience (HCE) cannot be assumed for many settings where VA will be used. Therefore, we identified all signs and symptoms that we suspected could be much more informative for people who have received health care and performed all validation experiments on two versions of the datasets

Figure 1 The process of generating 500 test and train datasets and applying KL estimation to them. After dividing the whole dataset into 25% testing and 75% training portions (randomly, stratified by cause), a draw from an uninformative Dirichlet distribution was used to perturb the cause combination of the test set (by resampling each cause with replacement according to a CSMF that was drawn from Dirichlet distribution). Accuracy of the KL method was calculated by comparing the KL-estimated CSMFs and the true CSMF of the test dataset. [Flowchart: Original Data with Validated Gold Standard is split by sampling without replacement into a Train Dataset (75%) and a Test Data Pool (25%); the Test Data Pool is sampled with replacement, following a random CSMF drawn via Dirichlet, to form the Test Dataset; KL Direct Estimation of CSMFs is compared with the True CSMFs to give Accuracy.]
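The simulation design in Figure 1 can be sketched in a few lines of Python. This is a hedged illustration, not the code used in the study: we assume that "uninformative Dirichlet" means Dirichlet(1, ..., 1), we draw the cause of each resampled death from the sampled CSMF (so the realized composition is returned alongside the dataset), and all function names are hypothetical.

```python
import random


def split_by_cause(deaths_by_cause, rng, train_share=0.75):
    """Split each cause's deaths without replacement into train and test pools."""
    train, test = {}, {}
    for cause, deaths in deaths_by_cause.items():
        shuffled = list(deaths)
        rng.shuffle(shuffled)
        cut = int(round(train_share * len(shuffled)))
        train[cause], test[cause] = shuffled[:cut], shuffled[cut:]
    return train, test


def draw_uninformative_dirichlet(n_causes, rng):
    """Draw from Dirichlet(1, ..., 1) by normalizing Gamma(1, 1) variates."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_causes)]
    total = sum(draws)
    return [d / total for d in draws]


def build_test_dataset(test_pool, n_deaths, rng):
    """Resample the test pool with replacement under a random CSMF composition.

    Returns the resampled (cause, death) pairs and the realized CSMF.
    Assumes every cause has at least one death in the test pool.
    """
    causes = sorted(test_pool)
    target = draw_uninformative_dirichlet(len(causes), rng)
    drawn = rng.choices(causes, weights=target, k=n_deaths)
    dataset = [(c, rng.choice(test_pool[c])) for c in drawn]
    realized = {c: drawn.count(c) / n_deaths for c in causes}
    return dataset, realized


# Hypothetical toy pool: three causes with eight deaths each.
rng = random.Random(0)
pool = {"a": list(range(8)), "b": list(range(8, 16)), "c": list(range(16, 24))}
train_pool, test_pool = split_by_cause(pool, rng)
test_dataset, true_csmf = build_test_dataset(test_pool, 200, rng)
```

Because the training pool is never touched when the test dataset is built, the separation of training and test data described above is preserved by construction.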

developed above, one with all variables (noted as with HCE) and one version excluding recall of health care experience (without HCE).

Validating KL CSMFs for neonates, children, and adults

In the first test, we apply the KL software to the 500 pairs of training and test datasets for each of the three age groups. We assess the performance of the KL method by reporting median CSMF accuracy and the relationship between the estimated CSMFs and true


CSMFs by cause. The KL method requires the user to select two parameters: the number of symptoms to be subset from all symptoms (nSymp), and the total number of draws of different subsets (n.subset). For these main results, we used settings of 10 symptoms and 400 iterations. We also investigated the effect of these parameters on the accuracy of the KL method by an extensive exploration of the range of settings. We repeated our assessment while varying the nSymp from eight to 18. We also varied n.subset from 200 to 600.
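The summary metric used in these tests, CSMF accuracy, is defined by Murray et al. [5]. As we understand that definition, it is one minus the total absolute CSMF error divided by its maximum possible value, 2(1 − min_j CSMF_j^true), so that 1 is a perfect estimate and 0 is the worst possible one. The Python sketch below follows that reading; the function names are illustrative and the code is not taken from the study.

```python
from statistics import median


def csmf_accuracy(true_csmf, estimated_csmf):
    """CSMF accuracy as we read the definition in Murray et al. [5]."""
    total_abs_error = sum(abs(t - e) for t, e in zip(true_csmf, estimated_csmf))
    max_error = 2.0 * (1.0 - min(true_csmf))  # worst case: all mass on the rarest cause
    return 1.0 - total_abs_error / max_error


def median_csmf_accuracy(pairs):
    """Median accuracy over (true, estimated) CSMF pairs, one per test dataset."""
    return median(csmf_accuracy(t, e) for t, e in pairs)
```

Reporting the median over many test datasets with different true compositions, rather than a single split, is what makes the figure robust to the idiosyncrasies of any one test set.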

Table 1 Median CSMF Accuracy for KL and PCVA, by age group with and without HCE

Assessing the relationship between KL CSMF accuracy and the number of causes

variation in the CSMF estimation accuracy by changing the symptom cluster size when n.subset is large enough (greater than 200). Figure 2 shows the variation of CSMF accuracy when the symptom cluster size is varied between eight and 18. (The KL method requires that the number of causes in the module be fewer than the number of symptom profiles $2^k$. Hence, theoretically k = 6 is the smallest allowed. In addition, since some symptom profiles never appear in the data, k = 8 is the smallest nSymp we could use for all adult, child, and neonate datasets.) As shown in Table 1, without HCE the KL method slightly outperforms PCVA. We remark that the PCVA accuracy for child VAs in the absence of HCE variables is 0.05 below the median KL accuracy. For neonatal VAs without and with HCE variables, the KL method CSMF accuracy is 0.797 (95% uncertainty interval [UI]: 0.784, 0.805) and 0.795 (0.783, 0.806), respectively, which are also substantially higher than the CSMF accuracy of PCVA. The relationship between estimated and true CSMFs for each cause in adults, children, and neonates is shown in Additional file 1. A good estimation should have an intercept close to zero and a slope close to one. With slope 0.631, intercept 0.015, and RMSE 0.013, drowning is the most accurately estimated cause of death in adult VA. In the same module, stomach cancer and other cardiovascular diseases are the least accurately estimated causes, with slopes of approximately 0.08. Other cardiovascular disease also has a high intercept (0.047), which shows it is substantially overestimated when the true CSMF is low. In the child module, violent death is the most accurately estimated CSMF with slope 0.480, intercept 0.024, and RMSE 0.016, and other digestive disease is the worst estimated cause, where slope, intercept, and RMSE are 0.092, 0.031, and 0.010, respectively. In the neonatal module, stillbirth is almost perfectly estimated with slope, intercept, and RMSE being 0.98, 0.003, and 0.017, respectively. Pneumonia has the lowest accuracy of estimation with a slope, intercept, and RMSE of 0.199, 0.053, and 0.026. As can be seen, the quality of prediction is generally higher in the


Assessing if KL accuracy is influenced by the correlation between training and test dataset CSMF composition

The technique described for the experiments above generates test and training sets that have independently random CSMFs. We suspected that the KL performance in previous studies has been exaggerated because the CSMF compositions of test and train datasets have been similar. To investigate this hypothesis, we conducted an additional analysis using training and test sets generated by sampling deaths from training and test pools uniformly at random (with replacement). In contrast to previous experiments in which the CSMFs of the test and train datasets are independent, the test and train datasets in this case both have CSMF combinations similar to those of the original pool. The same metrics are used for this assessment.
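The cause-level metrics used in these assessments (intercept, slope, and RMSE from a linear regression of estimated on true CSMF for one cause across test datasets) can be sketched as follows. This is an illustrative ordinary least squares fit in plain Python. Taking RMSE as the root mean squared residual around the fitted line is our assumption rather than a detail stated in the text, and the function names are hypothetical.

```python
from math import sqrt


def fit_line(true_csmf, estimated_csmf):
    """Ordinary least squares fit of estimated CSMF on true CSMF for one cause.

    Returns (slope, intercept, rmse), where rmse is the root mean squared
    residual around the fitted line (an assumption of this sketch).
    """
    n = len(true_csmf)
    mean_x = sum(true_csmf) / n
    mean_y = sum(estimated_csmf) / n
    sxx = sum((x - mean_x) ** 2 for x in true_csmf)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_csmf, estimated_csmf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(true_csmf, estimated_csmf)]
    rmse = sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, rmse


def crossover_point(slope, intercept):
    """True CSMF at which the fitted line meets the identity line (slope < 1)."""
    return intercept / (1.0 - slope)
```

A slope near one and an intercept near zero indicate good estimation. When the slope is well below one and the intercept is positive, the fitted line lies above the identity line for true CSMFs below the crossover point and beneath it above that point, which is the over- and underestimation pattern discussed for KL.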

Results

CSMF accuracy of KL for adult, child, and neonatal VA analysis was found to be largely independent of symptom cluster size and of including or excluding HCE (Table 1 and Figure 2). For all experiments, the n.subset parameter of the KL method, which specifies the total number of draws of different subsets of symptoms, was set to 400. Through our experiments we saw no significant


Table 1 values: median CSMF accuracy with 95% UI

Age group   HCE      KL median   KL 95% UI        PCVA median   PCVA 95% UI
Adult       No HCE   0.661       (0.654, 0.665)   0.624         (0.619, 0.631)
Adult       HCE      0.669       (0.664, 0.673)   0.675         (0.669, 0.680)
Child       No HCE   0.687       (0.682, 0.692)   0.632         (0.626, 0.642)
Child       HCE      0.698       (0.692, 0.702)   0.682         (0.671, 0.690)
Neonate     No HCE   0.797       (0.784, 0.805)   0.695         (0.682, 0.705)
Neonate     HCE      0.795       (0.783, 0.806)   0.733         (0.719, 0.743)

To evaluate the dependence of the method’s CSMF accuracy on the number of causes in the cause list, we performed the following experiment. For n = 5, 6, ..., 46 we randomly chose n causes of death and used a CSMF drawn from an uninformative Dirichlet to construct a test dataset that contains exactly n causes of death. (The maximum is 46, as our original adult dataset has 46 causes of death.) The deaths were sampled from the original 25% test and 75% train pool datasets described above. We performed 500 iterations for each n. By the nature of this test, the numbers of deaths in the train and test datasets do not vary as the number of causes is altered. This provides a direct assessment of performance strictly as a function of the number of causes.



Figure 2 Variation of CSMF accuracy of the KL method as a function of symptom cluster size (nSymp). For all age groups, with and without HCE, varying the symptom cluster size had little effect on CSMF accuracy. [Line plot: x-axis, number of symptoms in each draw (nSymp), 8 to 18; y-axis, accuracy, 0.5 to 0.85; series: adult, child, and neonate, each with and without HCE.]

neonatal module. It is observed that for causes for which estimation is not accurate, KL tends to assign close to constant cause fractions, which results in higher intercepts and lower slopes. As a result, small CSMFs are overestimated and large CSMFs are underestimated in such causes. We found that in adult VA, the KL method is most effective in predicting CSMF for maternal causes and causes that are due to injuries, such as drowning. In child VA, measles, malaria, bite of venomous animal, and violent death were most accurately predicted. For neonatal VA, stillbirth and the preterm delivery cause group were best. In contrast, KL performs poorly in predicting stomach cancer and other noncommunicable disease in adults, other digestive disease and other infectious disease in children, and pneumonia in neonates. As shown in Table 1, in general, the effect of the HCE variable on the accuracy of CSMF estimation is not large (the change is 0.008, 0.011, and -0.002 for adults, children, and neonates). For the majority of causes in all age groups, accuracy slightly increased when HCE variables were added; however, the change was not large. For example, in the adult module, average slope increases from 0.236 to 0.247 and average intercept decreases from 0.024 to 0.023 (mean RMSE does not change).

Figures 3, 4, and 5 show the estimated and true CSMF of a selection of causes in the three age groups. A lower slope in the regression shown in Additional file 1 shows more deviation from the perfect estimation line in the figures. We found that KL tends to equally distribute deaths among causes, which overestimates the CSMF when the true CSMF is very low and underestimates when it is high. As shown in Figure 6, the number of causes on the cause list has a very large impact on the accuracy of KL CSMF estimations. While these results are acquired by randomly dropping causes from the adult module, a comparison with the neonate and child modules’ accuracy results (Table 1) suggests that the most important parameter in the KL method’s superior performance in child and neonate modules is the lower number of causes in these modules. Accuracy is above 0.75 when the cause list contains fewer than 12 causes. For larger cause lists, such as those used for practical applications in adults and children, the KL method generates progressively lower levels of CSMF accuracy. We found that KL is extremely sensitive to the level of similarity between cause composition in the train and test datasets. We observed that if both test and train sets are randomly sampled with the same cause


Figure 3 Estimated versus true cause fractions for AIDS, maternal, pneumonia, and drowning in adults in 500 random resamplings of the validation dataset. Causes like pneumonia were overestimated when rare but underestimated when common, while causes like drowning were estimated with accuracy that does not depend closely on true cause fraction. [Four scatter panels (AIDS, Maternal, Pneumonia, Drowning): x-axis, true cause fraction (%), 0 to 20; y-axis, estimated cause fraction (%), 0 to 20.]

composition, KL estimation will yield dramatically higher CSMF accuracy. For example, for adult VAs with HCE when the test and train set have the same CSMF, the median CSMF accuracy is 0.947 (0.945, 0.951), which is 0.28 points higher than the accuracy of KL for redistributed test sets and within 0.05 of the maximum possible accuracy.

Discussion

In this first large-scale validation of the KL method for direct CSMF estimation compared to gold standard cause of death assignment, we found that the method performs about as well as PCVA in terms of CSMF accuracy. Compared with some new methods [8,11,12], KL generates substantially less accurate CSMFs for adults and children. The KL method yields CSMF estimates that tend to be biased upwards when the true CSMFs in the test datasets are low and biased downwards when the true CSMFs are high. The extent of these biases is highly variable across causes. The biases in the KL estimates of CSMFs bear considerable resemblance to the biases observed in PCVA by cause, although there is some variation in performance by cause.


Figure 4 Estimated versus true cause fraction for AIDS, malaria, pneumonia, and violent death in children in 500 random resamplings of the validation dataset. These causes were underestimated when rare and overestimated when common. [Four scatter panels (AIDS, Malaria, Pneumonia, Violent Death): x-axis, true cause fraction (%), 0 to 20; y-axis, estimated cause fraction (%), 0 to 20.]

Our findings contradict several previous claims about details of the method. First, we found that varying symptom cluster size from eight to 18 made essentially no difference to the results. Second, KL does well in estimating CSMFs for causes such as road traffic accidents and drowning for which there are sensitive and specific symptoms. These are the same causes on which physicians also perform well. Our experiments show that, similarly to individual-level cause assignment techniques, KL is inaccurate in finding CSMFs for causes with weak symptom presence. Where there is not a clear set of sensitive and specific symptoms, the KL method tends to yield CSMF estimates that are biased towards the cause fraction in the training dataset rather than the test dataset. This tendency of the KL method to project the training dataset CSMF onto the test dataset is confirmed by the experiment in which we found that KL accuracy was exaggerated when the training and test datasets have identical CSMF compositions. One clear advantage of KL compared to PCVA is in the tests in which household recall of health care experience is excluded from physician review and the KL


Figure 5 Estimated versus true cause fraction for stillbirth and pneumonia in neonates in 500 random resamplings of the validation dataset. Stillbirth estimations were highly accurate, while pneumonia was either underestimated or overestimated in most cases. [Two scatter panels (Stillbirth, Pneumonia): x-axis, true cause fraction (%), 0 to 20; y-axis, estimated cause fraction (%), 0 to 20.]

method. Thus, in settings where populations are expected to have little exposure to health care, the KL approach should be preferred to PCVA. This finding, however, must be tempered with the comparison to other methods (Symptom Pattern, Tariff, and Machine Learning) that all have better performance than KL in the absence of household recall of health care experience.

The relatively disappointing performance of KL compared to published claims will surprise some readers. The key explanation is the number of causes included in our study for adults and children. Our finding that the KL method’s accuracy dramatically decreases as the number of causes increases explains why KL has performed well in previous validation studies (e.g., [2]). These have all used lists of causes that contain fewer than 15 causes. For studies with a smaller number of causes (e.g., neonatal VA studies usually consider fewer than eight to 10 causes of death) our findings suggest that the KL method produces very good results with a CSMF accuracy greater than 0.75. A further reason for the exaggerated performance previously reported for KL may be that previous studies used test and train datasets that had similar CSMF compositions. Our experiments here show that the KL method in this special case yields substantially higher levels of CSMF accuracy. In real populations, there is no reason to expect that a training dataset collected in a hospital will have the same CSMF composition as the population. In fact, a method that largely returns the training dataset CSMF composition adds little information beyond the CSMF composition of the training dataset. Thus, a more realistic assessment of KL performance follows from the cases in which the CSMF compositions in the test and train datasets are unrelated.

A central assumption of the KL approach is that, conditional on the cause of death, the symptom profiles of

Figure 6 Median CSMF accuracy versus number of causes on a cause list for the KL method. The test datasets for this experiment were generated by randomly selecting a set of causes and constructing test datasets using an uninformative Dirichlet distribution. The KL method has excellent performance for short cause lists, but rapidly degrades as the length of the list increases. [Scatter plot: x-axis, number of causes, 5 to 42; y-axis, estimation accuracy, 0.65 to 0.8.]


reference deaths, usually from hospitals, are the same as community deaths. The data in the PHMRC study were collected from deaths that met stringent gold standard diagnostic criteria, and most of these necessarily occur within the hospital system (community deaths simply cannot meet the diagnostic criteria for many causes). As a result, this validation study cannot directly investigate the importance of this assumption to the KL method. However, by excluding HCE variables from the study, we have emulated this setting and found little change to our results.

Conclusion

Our validation of the KL method for direct estimation of CSMF from VA data collected in the PHMRC study showed that KL performs at about the same level as PCVA for adults, slightly better for children, and much better for neonates. Since it is a direct method, it does not yield cause of death assignments for individual deaths. We also found that KL performance is sensitive to the number of causes on the cause list, and as the number of causes under consideration increases, the quality of KL estimation decreases precipitously. This degradation is especially relevant when using VA to understand population-level patterns of adult mortality, in which the accuracy of KL becomes comparable to PCVA. Thus we judge KL to be a reasonable approach for neonatal VA and other settings with very short cause lists, but not as useful in its current form for adult or child VA. For adults and children, other methods, such as the Simplified Symptom Pattern, Random Forest, and Tariff, have better CSMF accuracy and also provide individual death cause assignment.

Additional material

Additional file 1: Slope, intercept, and RMSE from linear regression of estimated versus true CSMFs, by age group and cause with and without HCE.

Abbreviations

CSMF: cause-specific mortality fraction; KL: King and Lu cause-specific mortality fraction direct estimation method; PCVA: physician-certified verbal autopsy; PHMRC: Population Health Metrics Research Consortium; RMSE: root mean squared error; HCE: health care experience; VA: verbal autopsy

Acknowledgements

This research was conducted as part of the Population Health Metrics Research Consortium: Christopher J.L. Murray, Alan D. Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D. Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Charles Atkinson for managing the PHMRC verbal autopsy database and Michael Freeman, Benjamin Campbell, and Charles Atkinson for intellectual contributions to the analysis. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Author details

1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA. 2 University of Washington, Department of Health Services, Seattle, USA.

Authors’ contributions

AV performed analyses and helped write the manuscript. SLJ and JKB helped in data preparation and preliminary studies. CJLM designed the study and drafted the manuscript. ADF contributed in the study design, edited the manuscript, and approved the final version. ADF accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 14 April 2011 Accepted: 4 August 2011 Published: 4 August 2011

References

1. King G, Lu Y, Shibuya K: Designing verbal autopsy studies. Popul Health Metr 2010, 8:19.
2. King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statist Sci 2008, 23:78-91.
3. MISAU: UNICEF Mozambique - National Child Mortality Study 2009: Summary 2009 [http://www.unicef.org/mozambique/NCMS_Summary_Report_ENG_220909.pdf].
4. Lee AC, Mullany LC, Tielsch JM, Katz J, Khatry SK, LeClerq SC, Adhikari RK, Shrestha SR, Darmstadt GL: Verbal Autopsy Methods to Ascertain Birth Asphyxia Deaths in a Community-based Setting in Southern Nepal. Pediatrics 2008, 121:e1372-1380.
5. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
6. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.
7. Lozano R, Freeman MK, James SL, Campbell B, Lopez AD, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:50.
8. Murray CJL, James SL, Birnbaum JK, Freeman MK, Lozano R, Lopez AD, the Population Health Metrics Research Consortium (PHMRC): Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:30.
9. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
10. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the symptom pattern method for analyzing verbal autopsy data. PLoS Med 2007, 4:e327.


11. James SL, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 2011, 9:31.
12. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29.

doi:10.1186/1478-7954-9-35
Cite this article as: Flaxman et al.: Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards. Population Health Metrics 2011 9:35.



Mayanja et al. Population Health Metrics 2011, 9:36 http://www.pophealthmetrics.com/content/9/1/36

RESEARCH

Open Access

Using verbal autopsy to assess the prevalence of HIV infection among deaths in the ART period in rural Uganda: a prospective cohort study, 2006-2008

Billy N Mayanja1*, Kathy Baisley2, Norah Nalweyiso1, Freddie M Kibengo1, Joseph O Mugisha1, Lieve Van der Paal1,3, Dermot Maher1,4 and Pontiano Kaleebu1

Abstract

Background: Verbal autopsy is important for detecting causes of death including HIV in areas with inadequate vital registration systems. Before antiretroviral therapy (ART) introduction, a verbal autopsy study in rural Uganda found that half of adult deaths assessed were in HIV-positive individuals. We used verbal autopsy to compare the proportion of HIV-positive adult deaths in the periods before and after ART introduction.

Methods: Between 2006 and 2008, all adult (≥ 13 years) deaths in a prospective population-based cohort study were identified by monthly death registration, and HIV serostatus was determined through annual serosurveys. A clinical officer interviewed a relative of the deceased using a verbal autopsy questionnaire. Two clinicians independently reviewed the questionnaires and classified the deaths as HIV-positive or not. A third clinician was the tie-breaker in case of disagreement. The performance of the verbal autopsy tool was assessed using HIV serostatus as the gold standard of comparison. We compared the proportions of HIV-positive deaths as assessed by verbal autopsy in the early 1990s and the 2006-2008 periods.

Results: Of 333 deaths among 12,641 adults of known HIV serostatus, 264 (79.3%) were assessed by verbal autopsy, of whom 59 (22.3%) were HIV-seropositive and 68 (25.8%) were classified as HIV-positive by verbal autopsy. Verbal autopsy had a specificity of 90.2% and positive predictive value of 70.6% for identifying deaths among HIV-infected individuals, with substantial interobserver agreement (80.3%; kappa statistic = 0.69). The HIV-attributable mortality fraction estimated by verbal autopsy decreased from 47.0% (pre-ART period) to 25.8% (ART period), p < 0.001.

Conclusions: In resource-limited settings, verbal autopsy can provide a good estimate of the prevalence of HIV infection among adult deaths. In this rural population, the proportion of deaths identified by verbal autopsy as HIV-positive declined between the early 1990s and the 2006-2008 period. Verbal autopsy findings can inform policy on HIV health care needs.

Background

Most resource-limited settings have inadequate or no vital registration systems, although many deaths in these areas occur outside health care facilities and vital registration data are essential for public health planning [1,2]. Since 2004, many countries including Uganda have scaled up access to antiretroviral therapy (ART). Knowing the impact of ART introduction on the proportion of deaths that are associated with HIV is important for policymakers [3,4]. The World Health Organization (WHO) has stimulated interest in the use of verbal autopsy as a tool to obtain information on causes of death in areas with inadequate vital registration systems, for public health planning and resource allocation [5]. In a WHO

* Correspondence: billy.mayanja@mrcuganda.org 1 MRC/UVRI Uganda Research Unit on AIDS. P.O.Box 49, Entebbe, Uganda Full list of author information is available at the end of the article

© 2011 Mayanja et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Mayanja et al. Population Health Metrics 2011, 9:36 http://www.pophealthmetrics.com/content/9/1/36


workshop, priority research questions were identified for the use of verbal autopsy tools in longitudinal population-based HIV studies: the accuracy of verbal autopsy for HIV/AIDS identification, especially in terms of detecting changes in causes of death over time; the best verbal autopsy questions to identify HIV infection in deceased persons; and verbal autopsy questions to be used for ART-related deaths [6]. From 1990 to 1993, before ART was available in rural Uganda, the verbal autopsy tool was validated using the HIV test results obtained from consecutive annual HIV serosurveys in the same population. Half of the deaths among adults aged 13 years and above were among HIV-positive people and the specificity and positive predictive value of the verbal autopsy tool for identifying HIV-positive deaths were both 92%. The study methods and verbal autopsy questionnaire used during this study have been described in detail before [7]. We designed a study to evaluate the usefulness of verbal autopsy in identifying deaths among adults with HIV infection after the introduction of ART in 2004 in the same population. We also compared the proportions of deaths identified as HIV-positive before and after ART introduction, by comparing our results to those from the 1990-1993 verbal autopsy study.

Methods

Study setting and participants

In 1989, a general population cohort (GPC) was established in rural southwest Uganda to describe the population dynamics of HIV infection. The GPC originally consisted of around 4,500 adults (aged 13 years and above) in 15 neighboring villages [8]. In 1999, 10 more villages were added to the survey area, bringing the total number of adults to around 14,530 in 2006-2008 [9]. Annual house-to-house census and HIV serosurveys are conducted. The average annual survey participation is 60% to 65%, although 84% have ever participated in the HIV serosurvey. In 1990, a clinical cohort nested within the GPC was established to study the natural history of HIV infection. The study setting has a stable homogeneous population, in which adult HIV prevalence initially declined from 8.5% in 1990-1991 to 6.2% in 1999-2000 but then rose to 7.7% in 2004-2005 [10,11]. Since January 2004, ART has been provided for all eligible participants at the study clinic according to the Uganda National ART guidelines [12,13]. At the time of the study, the population of Uganda was 29.6 million people with a national average HIV prevalence of 6.4% [14].

Notification of deaths and condolence visit

The GPC study purposely instituted 26 community-based recorders, who every month collect and report information on all deaths and births that occur in their villages in the study area, and this information is updated by the annual census conducted in the same cohort. From January 2006 until December 2008, all deaths that occurred in adults in the study area were assessed by verbal autopsy. A home visitor paid a condolence visit to relatives of the eligible deceased soon after the death (usually within one month) and informed them about the study, giving them the opportunity to ask questions. A copy of the study information sheet was given to the relatives most closely associated with the deceased. A verbal autopsy interview appointment was made (approximately two months after death occurred) with consenting relatives.

The verbal autopsy visit

The interviewer (a clinical officer) and a home visitor visited the relative of the deceased on the agreed date and, after obtaining written informed consent, administered a verbal autopsy questionnaire. The verbal autopsy questionnaire was the same as that validated in 1990-1993 in the same study area but with additional questions on ART use (Appendix). The questionnaire was comparable to the WHO SAVVY (sample vital registration with verbal autopsy) International verbal autopsy questionnaire 3 for death of a person aged 15 years and above. However, it omitted some of the questions in Section 5: History of previously known medical conditions (especially high blood pressure, diabetes, asthma, and cancer), Section 7: Symptoms and signs associated with illnesses of women, Section 8: Symptoms and signs associated with pregnancy, and Section 10: Treatment and health service use for the final illness.

During the verbal autopsy interview, the respondent was asked to give a narration of the events preceding the death and then specific questions were asked on the symptoms prior to death and whether the deceased was on ART. At the end of the verbal autopsy interview, the opinion of the respondent on the deceased’s cause of death was sought by asking whether the respondent thought that he/she knew the cause of death. If the response was ‘yes’, the respondent was asked how or from whom he/she knew (for example, own opinion, from another household member, from doctor/nurse treating the deceased). The interviewer also recorded what she thought was the cause of death. Each completed questionnaire was checked for completeness and consistency by a physician and any queries were discussed and clarified with the interviewer.

Independent allocation of cause of death

The verbal autopsy questionnaires were independently reviewed by two experienced physicians (both clinical epidemiologists) from outside the study area to ascertain the likely cause(s) of death and whether the deceased was HIV-positive. There were no specified criteria given to the physicians upon which to make the assessment as to whether the deceased was HIV-positive or not; physicians were allowed the freedom to use their clinical knowledge of HIV disease presentation. The assessing physician had access to the respondent’s opinion on the cause of death. In case of nonagreement between the two clinicians on whether the deceased was HIV-positive or not, the opinion of a third, more senior physician (an internist) was sought. The physicians recorded their findings on specifically designed verbal autopsy forms.

Sample size

We aimed to assess by verbal autopsy all deaths that occurred in the study population between 2006 and 2008. No formal sample-size calculations were done.

Statistical methods

Data were double-entered into an MS Access database (Microsoft Corp., USA) and analyzed using Stata version 10 (Stata Corporation, College Station, Texas, USA). The analysis of verbal autopsy data was restricted to deaths in individuals who had participated in at least one annual census round and so had a valid unique census identification number. This identification number was used to obtain the individual’s HIV serostatus by linkage with the annual HIV serosurvey results in the population cohort database.

We assessed the performance of the verbal autopsy tool in identifying HIV infection among deaths in adults aged 13 years and older. Since we did not have data on the actual cause of death, we defined as “HIV-associated” any death occurring in a person who was HIV-seropositive. We calculated specificity and positive predictive values of the verbal autopsy tool, using HIV serostatus as the gold standard. True positive was defined as HIV-positive status, and true negative was defined as HIV-negative status at the last available serosurvey. A verbal autopsy test positive was defined as an HIV-associated diagnosis by both assessors, or two out of three assessors in the case of nonagreement between the first two. A verbal autopsy test negative was defined as an HIV-negative diagnosis, or unknown HIV association, by two out of three assessors. We assessed interobserver agreement between the first two assessors using a kappa statistic. We calculated HIV-attributable mortality fraction as the proportion of deaths of known HIV serostatus that were classified as “HIV-positive” by verbal autopsy.

Ethical aspects

The study was approved by the Science and Ethics Committee of the Uganda Virus Research Institute and the Uganda National Council for Science and Technology. Informants gave written informed consent prior to participating in the study. A condolence fee of 5,000 Uganda shillings (US$ 2.20) was given to each informant.

Results

Between January 2006 and December 2008, there were 387 deaths in the study area among 14,530 adult participants aged 13 and older in the GPC census and 333 deaths among 12,641 adults with known HIV serostatus (mortality rate = 10.8 per 1000 person-years, 95% confidence interval (CI): 9.8, 12.1). Among the 333 deaths of known HIV status, verbal autopsy was done for 264 (79.3%). Of the 69 deaths of known HIV serostatus where verbal autopsy was not done, in 29 (42%) there was no relative to interview, in 15 (22%) the relative refused to be interviewed, and in 25 (36%) no reason was given. Of the 333 deaths of known HIV status, a similar proportion of HIV-positive (78.7%) and HIV-negative (80.1%) deaths were assessed by verbal autopsy (Table 1).

Among the 264 deaths assessed by verbal autopsy, 59 (22.3%) were HIV-positive and had a mean (SD) age of 41.0 (14.4) years while 205 (77.7%) were HIV-negative with a mean (SD) age of 60.1 (22.8) years. Among all HIV-seropositive deaths in the GPC, there was no evidence of a difference in mean age between deaths assessed by verbal autopsy and those that were not (p = 0.78). However, among the HIV-seronegative deaths in the GPC, there was some evidence that the mean age of deaths assessed by verbal autopsy was greater than that of deaths that were not assessed (61.4 versus 54.9 years, respectively, p = 0.06).

Overall, 68 (25.8%) deaths were classified as HIV-positive by verbal autopsy. The proportion of deaths that were classified as HIV-positive decreased with increasing age: from 50.6% in those aged 13 to 44 years to 5.1% among those aged 65 years and older (Table 2). Similarly, the proportion of true HIV-seropositive deaths decreased with age, from 41.4% of those aged 13 to 44, to 3.4% of those aged 65 years and older.

The verbal autopsy had an overall specificity of 90.2% and positive predictive value of 70.6% for diagnosis of HIV-positive death. In participants aged 13 to 44 years, the corresponding values were 76.5% and 72.7%, respectively, while among those aged 65 years and over, the values were 96.5% and 33.3% respectively (Table 3). The assessors’ agreement as to whether the death was HIV-positive was found in 212 (80.3%) deaths (kappa statistic 0.69). The 52 discordant assessments were evenly distributed between deaths of participants who were HIV-positive (20.3%) and negative (19.5%) (p = 0.89). The HIV-attributable mortality fraction estimated by verbal autopsy decreased from 47.0% (pre-ART period) to 25.8% (ART period) (p < 0.001; Table 4).
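The kappa statistic reported above can be related to the raw agreement figure. The following is a minimal sketch in Python, assuming the standard Cohen's kappa definition (the article does not say which variant was computed); the function names are illustrative and not taken from the study. It back-calculates the chance agreement implied by the reported observed agreement (212 of 264 deaths) and kappa of 0.69.

```python
def cohen_kappa(p_observed: float, p_expected: float) -> float:
    """Cohen's kappa: agreement beyond chance, scaled by the maximum possible."""
    return (p_observed - p_expected) / (1.0 - p_expected)


def implied_chance_agreement(p_observed: float, kappa: float) -> float:
    """Invert the kappa formula to recover the chance-agreement term."""
    return (p_observed - kappa) / (1.0 - kappa)


# Reported in the Results: the two assessors agreed on 212 of 264 deaths.
p_o = 212 / 264
# ASSUMPTION: the chance agreement is not reported; it is implied by kappa = 0.69.
p_e = implied_chance_agreement(p_o, kappa=0.69)

print(f"observed agreement       = {p_o:.3f}")
print(f"implied chance agreement = {p_e:.3f}")
print(f"kappa recomputed         = {cohen_kappa(p_o, p_e):.2f}")
```

The sketch only illustrates why 80% raw agreement corresponds to a lower kappa: part of the raw agreement would be expected by chance alone.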



Table 1 Total deaths of known HIV serostatus and available verbal autopsy (VA) deaths, by age and HIV status.

                         VA assessed/total (%)
Age group (years)        HIV-positive      HIV-negative        Deaths known HIV status
13-24                    4/7 (57.1%)       19/29 (65.5%)       23/36 (63.9%)
25-34                    20/24 (83.3%)     19/22 (86.4%)       39/46 (84.8%)
35-44                    12/16 (75.0%)     13/20 (65.0%)       25/36 (69.4%)
45-54                    12/13 (92.3%)     12/16 (75.0%)       24/29 (82.8%)
55-64                    7/9 (77.8%)       28/36 (77.8%)       35/45 (77.8%)
65+                      4/6 (66.7%)       114/135 (84.4%)     118/141 (83.7%)
Total                    59/75 (78.7%)     205/258 (80.1%)     264/333 (79.3%)

Mean (SD) age (years)
VA assessed              41.2 (14.0)       61.4 (22.0)         56.9 (22.1)
Not assessed             40.1 (16.2)       54.9 (25.2)         51.5 (24.2)
All deaths               41.0 (14.4)       60.1 (22.8)         55.8 (22.6)

Discussion

In this study population, which has no official vital registration system, we found that verbal autopsy had an overall specificity of 90% and positive predictive value of 71% in identifying HIV infection among adult deaths. Since the introduction of ART in 2004, the prevalence of HIV infection among adult deaths, as estimated by verbal autopsy, has declined, with the HIV-attributable mortality fraction falling from 47% in the early 1990s to 25.8% in 2006 to 2008.

We found that verbal autopsy performed well in identifying HIV-positive deaths in the age groups 13 to 44 years and 45 to 64 years. This finding is in agreement with findings from a study conducted in Tanzania and Zimbabwe, which showed that, in generalized HIV epidemics, verbal autopsy can be used to reliably measure AIDS mortality, especially in the 15 to 44 year age group [15]. The positive predictive value of verbal autopsy is lower among people aged 65 and older than in the younger age groups because of the lower prevalence of HIV-positive deaths. In our study, only 3% of the deaths in those 65 years and older were HIV-positive. In contrast, a study in Kenya reported that AIDS was the principal cause of death among older people up until the age of 70 years [16]. In populations such as ours with a lower prevalence of HIV-positive deaths among people over 65, improved verbal autopsy tools are needed to reliably measure HIV-positive deaths in this age group.

In 2006 to 2008, during the time of this verbal autopsy study, the estimated national ART coverage in Uganda was 33% [17]. As expected, when compared to the earlier verbal autopsy study in the period before ART was introduced in this community [7], we found

Table 2 Verbal autopsy classification by age and HIV serostatus.

                                                                 True HIV serostatus
Age group      HIV status by verbal autopsy    Total (%)         HIV-positive     HIV-negative
13-44 years    Positive                        44 (50.6%)        32               12
               Negative                        32 (36.8%)        2                30
               Don’t know                      11 (12.6%)        2                9
               Total                           87 (100%)         36 (41.4%)       51 (58.6%)
45-64 years    Positive                        18 (30.5%)        14               4
               Negative                        35 (59.3%)        4                31
               Don’t know                      6 (10.2%)         1                5
               Total                           59 (100%)         19 (32.2%)       40 (67.8%)
65+ years      Positive                        6 (5.1%)          2                4
               Negative                        103 (87.3%)       1                102
               Don’t know                      9 (7.6%)          1                8
               Total                           118 (100%)        4 (3.4%)         114 (96.6%)
All ages       Positive                        68 (25.8%)        48               20
               Negative                        170 (64.4%)       7                163
               Don’t know                      26 (9.8%)         4                22
               Total deaths                    264 (100%)        59 (22.3%)       205 (77.7%)
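The counts in Table 2 are enough to recompute the specificity, positive predictive value, and verbal autopsy-estimated HIV-attributable fraction reported in the Results. The following is a minimal Python sketch; the dictionary layout and function names are illustrative assumptions, while the counts are transcribed from Table 2 and the treatment of "don't know" as test negative follows the definition in the Statistical methods.

```python
# Counts transcribed from Table 2. Each value is
# (true HIV-positive, true HIV-negative) according to the serosurvey.
TABLE_2 = {
    "13-44": {"positive": (32, 12), "negative": (2, 30), "dont_know": (2, 9)},
    "45-64": {"positive": (14, 4), "negative": (4, 31), "dont_know": (1, 5)},
    "65+": {"positive": (2, 4), "negative": (1, 102), "dont_know": (1, 8)},
}


def performance(counts):
    """Return specificity, PPV and the fraction classified HIV-positive by VA.

    Following the paper's definition, a "don't know" classification is
    counted as a test negative.
    """
    tp, fp = counts["positive"]
    fn = counts["negative"][0] + counts["dont_know"][0]
    tn = counts["negative"][1] + counts["dont_know"][1]
    total = tp + fp + fn + tn
    return {
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "va_positive_fraction": (tp + fp) / total,
    }


def pooled(table):
    """Sum the age-specific counts to obtain the all-ages rows."""
    return {
        key: tuple(sum(group[key][i] for group in table.values()) for i in (0, 1))
        for key in ("positive", "negative", "dont_know")
    }


results = {age: performance(counts) for age, counts in TABLE_2.items()}
results["All ages"] = performance(pooled(TABLE_2))

for age, r in results.items():
    print(f"{age:>8}: specificity {r['specificity']:.1%}, "
          f"PPV {r['ppv']:.1%}, VA-positive fraction {r['va_positive_fraction']:.1%}")
```

Sensitivity is deliberately left out because the article reports only specificity and positive predictive value.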


Table 3 Performance of the verbal autopsy tool using HIV serostatus as gold standard, by age group

                                          Age group (years)
                                          13-44                 45-64                 65+                   All ages
Specificity (95% CI)                      76.5% (62.5-87.2%)    90.0% (76.3-97.2%)    96.5% (91.3-99.0%)    90.2% (85.3-93.9%)
Positive predictive value (95% CI)        72.7% (57.2-85.0%)    77.8% (52.4-93.6%)    33.3% (4.3-77.7%)     70.6% (58.3-81.0%)
HIV-attributable mortality fraction (95% CI)
  this study                              50.6% (39.6-61.5%)    30.5% (19.2-43.9%)    5.1% (1.9-10.7%)      25.8% (20.6-31.5%)
  cohort (based on ALPHA data 2)          40.3%                 24.4%                 1.6%                  17.0%

1 Test (verbal autopsy) positive defined as positive by both assessors, or by 2 out of 3 assessors in case of nonagreement; test negative defined as negative or unknown by both assessors, or by 2 out of 3 assessors in case of nonagreement.
2 Unpublished Analysing Longitudinal Population-based HIV cohorts in Africa (ALPHA) data
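The article does not state how the 95% confidence intervals in Table 3 were calculated. The sketch below assumes exact binomial (Clopper-Pearson) intervals, an assumption that reproduces the interval reported for the positive predictive value in the 65+ group (2 of 6 deaths, 4.3% to 77.7%). It is written in plain Python with a bisection search; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, lo=0.0, hi=1.0, iterations=60):
    """Bisection root finder for a function rising from f(lo) < 0 to f(hi) > 0."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# PPV in the 65+ age group (Table 2): 2 true positives among 6 VA-positive deaths.
low, high = exact_ci(2, 6)
print(f"PPV 65+: {2 / 6:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The same function can be applied to the other cells, for example `exact_ci(185, 205)` for the overall specificity.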

a decline in the proportion of deaths that were HIV-positive. Our estimated HIV-attributable mortality fraction from the verbal autopsy data is somewhat higher than that estimated from the main GPC cohort based on HIV serostatus (25.8% versus 17.0%). The observed decline in HIV-attributable mortality fraction since the early 1990s reflects improved clinical care and cotrimoxazole prophylaxis before ART was available, as well as the introduction of ART in 2004. The impact of ART in reducing HIV-associated mortality has been documented before [18-20]. Access to ART has improved the survival of HIV-infected individuals, making HIV infection a chronic manageable disease and allowing these individuals to live socially- and economically-active lives [21]. Although worldwide the number of people living with HIV increased from 29 million in 2001 to 33.4 million in 2008, HIV-associated deaths increased only from 1.9 million to 2 million over the same period [18].

We assessed a higher proportion of all adult deaths by verbal autopsy than was done in the 1990-1993 study. However, in both studies, there was no difference by HIV status in the proportion of deaths assessed. Therefore, our observed decrease in the estimated HIV-attributable fraction is unlikely to be a result of HIV-positive selection bias in the verbal autopsy assessed deaths.

In this study we used two physicians to review and independently ascertain whether the deceased was HIV-positive or not, and a tie-breaker third clinician in case of nonagreement. There has been debate as to which is the best method to use to ascertain the cause of death from verbal autopsy questionnaires in resource-limited settings. Though physician review of verbal autopsy questionnaires is the most commonly used method, issues have been raised about the possible inter- and intra-observer variability, the demands on physicians’ time, and the remuneration costs of these physicians [22,23]. Other techniques like the probabilistic (InterVA) model [24,25], symptom pattern [26], machine learning, and tariff method have been developed and validated, but a universally accepted method is yet to be decided.

The assessing clinicians’ ascertainment of whether the deceased was HIV-positive could have been biased by the opinion of the deceased’s relative. Without access to this information, the performance of the verbal autopsy tool may have been lower, and our results may not be generalizable to settings where the respondent’s opinion was not available. However, the majority of deaths in our cohort occurred at home or outside health care facilities without medical attention, and so the responses of caregivers and relatives may reflect general awareness of HIV and recognition of its symptoms in the community. Thus our results may be generalizable to other rural settings with similar HIV prevalence, in which verbal autopsy information is obtained from family members.

This study had several strengths. The annual HIV survey allowed the HIV serostatus of the majority of deaths to be known. The monthly reports of deaths in the area by the community-based recorders allowed us to have a fairly complete death registration. We ensured consistency in the interviewing process by

Table 4 Comparison of verbal autopsy-assessed HIV-associated deaths in pre-ART and ART periods.

                                                     Verbal autopsy study period
Characteristics                                      Pre-ART (1990-93)     ART (2006-08)
HIV prevalence                                       8.0%                  7.5%
Deaths of known HIV serostatus                       293                   333
Deaths where verbal autopsy was done                 155 (53%)             264 (79%)
HIV-positive deaths with verbal autopsy              78 (50%)              59 (22%)
Specificity                                          92%                   90%
Positive predictive value                            92%                   71%
Assessors’ agreement on HIV-associated deaths        91%                   80%
Kappa statistic                                      n/a                   0.69
HIV-attributable mortality fraction-VA study         47%                   25.8%
HIV-attributable mortality fraction-cohort data      50%                   17%
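The comparison of the two verbal autopsy-estimated fractions in Table 4 (47% versus 25.8%, reported p < 0.001) can be illustrated with a two-sample test of proportions. The article does not name the test the authors used, so the pooled z-test below is an assumption, as is the pre-ART count of 73, which is back-calculated from the reported 47% of 155 deaths. The function name is illustrative.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for equality of two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value


# ASSUMPTION: 73 is back-calculated from the reported 47% of 155 pre-ART deaths;
# the exact count is not given in this article.
pre_art_positive, pre_art_total = 73, 155
# Reported directly: 68 of 264 deaths were classified HIV-positive in 2006-2008.
art_positive, art_total = 68, 264

z, p = two_proportion_z_test(pre_art_positive, pre_art_total, art_positive, art_total)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

Under these assumptions the p-value falls below 0.001, in line with the reported result, although the authors' own calculation may have differed.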


using the same interviewer for all the relatives of the deceased. The use of the same questionnaire as in the previous study enabled comparison of the findings of the two studies. There was substantial interobserver agreement between the two assessing clinicians and verbal autopsy had good specificity and positive predictive value for the identification of HIV-positive deaths. The two-month interval between death and the verbal autopsy interview likely minimized any difficulty participants had in recalling the circumstances of their loved ones’ deaths and minimized the stress caused by the verbal autopsy interview, resulting in a high participation rate.

Our study limitations included the fact that some of the censused GPC participants had not participated in the annual HIV serosurvey and therefore had no HIV test results and were excluded from analysis. Furthermore, some individuals may not have participated in a recent serosurvey and thus may have seroconverted since their last negative HIV test. However, the HIV incidence in this population is relatively low (4.3, 95% CI: 3.6, 5.1 per 1,000 person-years; unpublished data), so any misclassification bias is likely to be small. We also could not carry out verbal autopsy for all the deceased participants as some relatives refused to be interviewed and, in other cases, there were no relatives to interview. Using HIV status (instead of the actual cause of death) as the comparison gold standard for HIV-associated deaths may not be ideal, as some HIV-infected participants might have died of other non-HIV-related causes.

Conclusions

The good specificity and positive predictive value of verbal autopsy in this study in identifying deaths in HIV-positive persons in a resource-limited setting with limited official vital registration systems add to the evidence of the value of verbal autopsy in determining HIV-positive deaths. Policymakers should consider verbal autopsy as a source of information about deaths occurring among HIV-positive individuals and HIV-associated deaths that can be added in routine health information systems.

Appendix

Items included in ART period (2006 - 2008) verbal autopsy study questionnaire.

General
• Where death occurred
• Any spouse of the deceased died in the last 5 years, if so, believed cause of death
• Respondent’s detailed account of illness

Circumstances of death
• Duration of terminal illness
• Was death sudden and unexpected
• Type of medical treatment during the last 3 months before death
• Whether deceased was on ART, and if yes, when ART was started
• Ever treated for tuberculosis
• Health unit(s) attended
• Records available in home (extract findings)

Signs, symptoms and their severity during the last illness
• Fever
• Diarrhea
• Dehydration
• Vomiting/associated abdominal pain
• Breathing (chest pain/chest in-drawing/difficult/rapid/wheezing)
• Cough (severe/productive/blood in sputum/followed by vomiting)

Neurological
• Neck stiffness
• Unconscious
• Fits (jerking of whole body/individual limbs/frequency per day)
• History of epileptic illness in earlier years
• Paralysis of limbs
• Rigid body stiffness, unable to open mouth

Others
• Skin rash and itching
• Red and sore eyes
• Loss of weight
• Abscesses/body sores
• Body swelling (edema/which parts)
• Constipation (associated abdominal pain)
• Hair changes (light colored or thin)
• Yellowing of eyes/passing brown urine
• Herpes zoster (at any time in life)

Respondent’s opinion of cause of death
• Specify
• How or from whom did you know

Interviewer’s diagnosis
• Illness 1
• Illness 2

Final classification (Clinician)
• Death HIV-related
• Death ART-related
• Leading cause of death
• Subsidiary causes

List of abbreviations
ART: antiretroviral therapy; GPC: general population cohort

Acknowledgements
We acknowledge the contributions of the community-based recorders for the death registrations, the GPC survey office clerks for providing us with a list of GPC-censused deaths, cohort home visitors for mobilizing the relatives of the deceased, the assessing clinicians for assigning causes of death, the Statistics staff for data management, and the relatives of the deceased participants without whom this study would not have been possible.
Funding: This study was funded by the Medical Research Council (UK).

Author details
1 MRC/UVRI Uganda Research Unit on AIDS. P.O.Box 49, Entebbe, Uganda. 2 MRC Tropical Epidemiology Group, London School of Hygiene and Tropical Medicine, Keppel Street, WC1E 7HT, UK. 3 International Rescue Committee, Tanzania Office, P.O Box 106048, Dar es Salaam, Tanzania. 4 London School of Hygiene and Tropical Medicine, Keppel Street, WC1E 7HT, London.

Authors’ contributions
BNM was involved in data acquisition and data interpretation and wrote the first draft of the manuscript; KB was involved in data analysis and interpretation; NN, FMK, and JOM were involved in data acquisition and interpretation; LVP was involved in the study conception and design, data acquisition, and interpretation; DM was involved in data interpretation; PK was involved in the study conception and design, data acquisition, and interpretation and was the overall custodian of the study. All authors were involved in the preparation of the manuscript and have read and given approval of the final version.

Competing interests
The authors declare that they have no competing interests.

Received: 15 March 2011 Accepted: 4 August 2011 Published: 4 August 2011

References
1. Mathers CD, Fat DM, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bull World Health Organ 2005, 83:171-177.
2. Setel PW, Macfarlane SB, Szreter S, Mikkelsen L, Jha P, et al: A scandal of invisibility: making everyone count by counting everyone. Lancet 2007, 370:1569-1577.
3. Diaz T, De Cock K, Brown T, Ghys PD, Boerma JT: New strategies for HIV surveillance in resource-constrained settings: an overview. AIDS 2005, 19(Suppl 2):S1-8.
4. Diaz T, Loth G, Whitworth J, Sutherland D: Surveillance methods to monitor the impact of HIV therapy programmes in resource-constrained countries. AIDS 2005, 19(Suppl 2):S31-37.
5. WHO: WHO technical consultation on verbal autopsy tools. Talloires, France; 2004, Final report. 2005.
6. WHO: Surveillance in the era of treatment. Workshop report. Streatley, UK; 2005.
7. Kamali A, Wagner HU, Nakiyingi J, Sabiiti I, Kengeya-Kayondo JF, et al: Verbal autopsy as a tool for diagnosing HIV-related adult deaths in rural Uganda. Int J Epidemiol 1996, 25:679-684.
8. Mulder DW, Nunn AJ, Wagner HU, Kamali A, Kengeya-Kayondo JF: HIV-1 incidence and HIV-1-associated mortality in a rural Ugandan population cohort. AIDS 1994, 8:87-92.
9. Mbulaiteye SM, Mahe C, Ruberantwari A, Whitworth JA: Generalizability of population-based studies on AIDS: a comparison of newly and continuously surveyed villages in rural southwest Uganda. Int J Epidemiol 2002, 31:961-967.
10. Shafer LA, Biraro S, Nakiyingi-Miiro J, Kamali A, Ssematimba D, et al: HIV prevalence and incidence are no longer falling in southwest Uganda: evidence from a rural population cohort 1989-2005. AIDS 2008, 22:1641-1649.
11. Nakibinge S, Maher D, Katende J, et al: Community engagement in health research: two decades of experience from a research project on HIV in rural Uganda. Trop Med Int Health 2009, 14:190-195.
12. Ministry of Health, Republic of Uganda: National Antiretroviral Treatment and Care Guidelines for Adults and Children. Kampala; First 2003.
13. Ministry of Health, Republic of Uganda: National Antiretroviral Treatment and Care Guidelines for Adults and Children. Kampala; Second 2008.
14. Uganda Government: The State of Uganda Population report 2008. [http://www.popsec.org/documents/state_of_uganda_population_report_2008.pdf].
15. Lopman B, Cook A, Smith J, Chawira G, Urassa M, et al: Verbal Autopsy can consistently measure AIDS mortality: a validation study in Tanzania and Zimbabwe. J Epidemiol Community Health 2010, 64:330-334.
16. Negin J, Wariero J, Cumming RG, Mutuo P, Pronyk PM: High rates of AIDS-related mortality among older adults in rural Kenya. J Acquir Immune Defic Syndr 2010, 55:239-244.
17. WHO, UNAIDS, UNICEF: Towards universal access: scaling up priority HIV/AIDS interventions in the health sector. Progress report 2008. Geneva, World Health Organization; [http://www.who.int/hiv/pub/towards_universal_access_report_2008.pdf], accessed 8 March 2011.
18. UNAIDS/WHO: AIDS Epidemic update December 2009. UNAIDS. Geneva; [http://data.unaids.org/pub/Report/2009/jc1700_epi_update_2009_en.pdf], accessed 28 February 2011.
19. Mermin J, Were W, Ekwaru JP, Moore D, Downing R, et al: Mortality in HIV-infected Ugandan adults receiving antiretroviral treatment and survival of their HIV-uninfected children: a prospective cohort study. Lancet 2008, 371:752-759.
20. Reniers G, Araya T, Davey G, Nagelkerke N, Berhane Y, et al: Steep declines in population-level AIDS mortality following the introduction of antiretroviral therapy in Addis Ababa, Ethiopia. AIDS 2009, 23:511-518.
21. Badri M, Cleary S, Maartens G, Pitt J, Bekker LG, et al: When to initiate highly active antiretroviral therapy in sub-Saharan Africa? A South African cost-effectiveness study. Antivir Ther 2006, 11:63-72.
22. Ronsmans C, Vanneste AM, Chakraborty J, Van Ginneken J: A comparison of three verbal autopsy methods to ascertain levels and causes of maternal deaths in Matlab, Bangladesh. Int J Epidemiol 1998, 27:660-666.
23. Todd JE, De Francisco A, O’Dempsey TJ, Greenwood BM: The limitations of verbal autopsy in a malaria-endemic region. Ann Trop Paediatr 1994, 14:31-36.
24. Byass P, Fottrell E, Huong DL, Berhane Y, Corrah T, et al: Refining a probabilistic model for interpreting verbal autopsy data. Scandinavian Journal of Public Health 2006, 34:26-31.
25. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Popul Health Metr 2010, 8:21.
26. Murray CJ, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the symptom pattern method for analyzing verbal autopsy data. PLoS Med 2007, 4:e327.

doi:10.1186/1478-7954-9-36 Cite this article as: Mayanja et al.: Using verbal autopsy to assess the prevalence of HIV infection among deaths in the ART period in rural Uganda: a prospective cohort study, 2006-2008. Population Health Metrics 2011 9:36.



Wan et al. Population Health Metrics 2011, 9:37 http://www.pophealthmetrics.com/content/9/1/37

RESEARCH

Open Access

Epidemiologic application of verbal autopsy to investigate the high occurrence of cancer along Huai River Basin, China Xia Wan1, Maigeng Zhou2, Zhuang Tao2, Ding Ding3,4 and Gonghuan Yang1,2*

Abstract

Background: In 2004, the media repeatedly reported water pollution and “cancer villages” along the Huai River in China. Due to the lack of death records for more than 30 years, a retrospective survey of causes of death using verbal autopsy was carried out to investigate cancer rates in this area.

Methods: An epidemiologic study was designed to compare numbers of deaths and causes of death between the study areas with water pollution and the control areas without water pollution in S County and Y District in 2005. The study areas were selected based on the distribution of the Huai River and its tributaries. Verbal autopsy was used to assist cause of death (COD) diagnoses and to verify mortality rates. The standard mortality rates (SMRs) of cancer in the study area were compared with those in the control areas. In order to verify the difference between mortality rates due to cancers in the study and the control areas, patients who reported having cancer in the survey received a second diagnosis by national and provincial oncologists with pathological and laboratory examinations. Comparisons were made to determine if differential cancer prevalence rates in the study and control areas were similar to the difference in mortality due to cancer in these study and control areas. Mortality rates of cancers in study and control areas were also compared with national statistics for the rural population of China.

Results: Over five years, 3,301 deaths were identified, including 1,158 cancer deaths. The annual average SMRs of cancer in the study areas of S County and Y District were 277.8/100,000 and 223.6/100,000, respectively, which is three to four times higher than those in the control areas. In addition, a total of 626 cases of cancer in the study and control areas were confirmed. The prevalence rates of cancer were 545/100,000 and 128.1/100,000 per year in the study and control areas in S County, respectively, and 440.9/100,000 and 200/100,000 per year in the study and control areas in Y District, respectively. The mortality and prevalence rates of digestive cancers were higher in the study areas than the control areas. In 2000, the SMR for cancer in rural areas nationwide was 120.9/100,000, and in study areas in S County and Y District, the excess rates of deaths were 184/100,000 and 138.8/100,000, respectively.

Conclusions: The death rates of digestive cancers were much higher in the study areas of S County and Y District. The patterns for between-area differences in prevalence and mortality rates of cancer were similar. Verbal autopsy is shown to be a useful tool in retrospective mortality surveys in low-resource areas with limited access to health care.

Keywords: Verbal autopsy, cancer, mortality rate, prevalence rate, water pollution

* Correspondence: yangghuan@vip.sina.com 1 Institute of Basic Medical Sciences of Chinese Academy of Medical Sciences/School of Basic Medicine of Peking Union Medical College, Beijing, China Full list of author information is available at the end of the article © 2011 Wan et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Wan et al. Population Health Metrics 2011, 9:37 http://www.pophealthmetrics.com/content/9/1/37


they are adjacent to the Huai River. There was one study area and one control area in each county/district. In S County, the study area was irrigated by the main branch and stream of the Huai River, while the control area was up in the hills where the Huai did not pass through. In the Y District, the study area was selected within five kilometers from the main branch of the Huai River, and the control area was more than five kilometers away from the river (Figure 1). Based on the power and sample size calculations [7] (assuming the cancer mortality rates in the study and control areas are 0.03% and 0.01%, respectively), it was determined that 50,000 people from each study and control area should be investigated. Based on inclusion criteria and power and sample size calculations, 25 villages were selected as study areas and 19 were selected as control areas in S County, and 31 villages were selected as study areas and 32 as control areas in Y District.

Background

In 2004, China Central Television and other media outlets reported water pollution and “cancer villages” [1] along the Huai River, which caused concern among the public and the government. An important question to be answered is whether cancer prevalence and mortality rates are significantly higher in these villages. Normally, an investigation of this type would depend on a vital registration system with accurate data, especially on COD. However, there has been no routine vital registration system on COD in these impoverished areas, and the only retrospective survey of COD was conducted from 1973 to 1975 [2]. In addition, there were no population-based reports on COD in the following years. To answer the above research question, a retrospective survey of causes of death was conducted using verbal autopsy (VA). VA is an interview carried out with family members and/or caregivers of the deceased, using a structured questionnaire to elicit signs and symptoms and other pertinent information that can be used to assign a probable underlying cause of death [3]. The VA procedure was validated for adult deaths in China in 2005 [4], and the sensitivity and specificity for cancers exceeded 85% and 95%, respectively. Moreover, VA has been used to evaluate the quality of the urban [5] and rural [6] COD reporting system of China’s Disease Surveillance Points (DSP) System, demonstrating the feasibility of using this tool in practice. In the current study, we use the validated VA to determine whether a high prevalence of cancer exists along the Huai River basin. Findings will influence further research on the relationship between water pollution and cancers.

Death cause investigation with VA

The VA questionnaire was derived from standard questions [8], the operational characteristics of which have been previously assessed in China [4]. This questionnaire was used to ask a family member to report the symptoms of the decedent and the duration, diagnosis, and medication history during illness. The questionnaire also included open-ended questions for the respondent, covering a narrative of events leading to death, diagnostic records from the hospital, medication records, etc. When available, hospital records, laboratory tests, and death certificates were photocopied and included in the review process. The questions asked of family members varied by age, gender, and disease of the decedent. The total number of questions ranged from 66 to 82, and the interview took about 30 to 40 minutes. Of all the questions, about 30 were related to symptoms caused by cancer (Additional file 1). All of the deaths between July 1, 2002 and June 30, 2005 in the study and control areas were collected. Then, family members or main caregivers of the deceased above 5 years old were interviewed using the VA questionnaire. The interviews were conducted by local health workers who had completed three days of training. Completed VA questionnaires were reviewed by an independent panel of local senior clinical experts who had completed a training course on how to use VA questions to diagnose specific cancers. Two experts independently identified the COD based on the information recorded on the form and in available records. Only one cause was assigned for each death. Each physician was unaware of the diagnosis assigned by the other physician. If the two diagnoses were discordant, the forms were reviewed by a third physician. A total of 11.7% of cases went to a third review. If the diagnoses of the

Methods

Based on the distribution of the Huai River and its tributaries, an epidemiologic study was designed to compare death rates and COD between the study areas with water pollution and the control areas without water pollution. The VA procedure was used to assist COD diagnosis and to verify the mortality rates. The investigation was conducted from August to December 2005. The standardized mortality rates (SMRs) of cancers in the study areas were compared with those in the control areas. In order to verify the difference in cancer mortality rates between the study and control areas, a prevalence survey of cancers was used to verify the mortality rates obtained by VA. Additionally, mortality rates of cancers in the study and control areas were compared with national statistics for China’s rural population.

Selection of study areas

The S County and Y District were selected for this study because they were reported to have high cancer rates, and


Figure 1 Map of study and control areas in S County and Y District.

sufficient evidence were confirmed. This evidence usually involves pathological and laboratory examinations. Those with insufficient evidence were accepted for further diagnosis. In addition, a prevalence survey was conducted in patients. The prevalence survey included demographic characteristics, and other structured questions regarding disease-related symptoms and duration, hospital diagnosis level, diagnosis certificate, diagnosis record, and related laboratory examination.

three experts were not consistent with each other, the three experts were asked to discuss the case and to reach a consensus. If they could come to an agreement, the diagnosis would be accepted. Otherwise, the death was recorded as having an unidentified cause. Deaths with unclear diagnoses would be further diagnosed by a team of experts in Beijing. Deaths with an unspecified cause accounted for 6.6% of all cases. Finally, local encoding professionals carried out coding based on the 10th revision of the International Classification of Diseases (ICD-10) for each VA questionnaire. Provincial and national ICD-10 coding experts randomly checked COD coding and corrected incorrect coding.
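The review procedure described in this section (two independent reviews, a third review when they disagree, then a consensus discussion or an "unidentified" outcome) can be sketched in a few lines of Python. This is only an illustration: the function name, the plain-string cause labels, and the rule that a third review matching one of the first two settles the case are assumptions, not details given by the authors.

```python
from typing import Optional

UNIDENTIFIED = "unidentified"  # label used here for deaths left without a cause


def assign_cod(first: str, second: str,
               third: Optional[str] = None,
               consensus: Optional[str] = None) -> str:
    """Return one cause of death from up to three independent reviews.

    first, second: causes assigned by the two blinded reviewers.
    third: cause assigned by the third reviewer (needed only on disagreement).
    consensus: cause agreed in discussion, or None if no agreement was reached.
    """
    # Two concordant independent reviews: accept the shared diagnosis.
    if first == second:
        return first
    # Discordant reviews must go to a third reviewer.
    if third is None:
        raise ValueError("discordant reviews require a third review")
    # Assumption: a third review that matches either earlier review settles it.
    if third in (first, second):
        return third
    # All three differ: accept the consensus if one was reached.
    return consensus if consensus is not None else UNIDENTIFIED


if __name__ == "__main__":
    print(assign_cod("liver cancer", "stomach cancer", "liver cancer"))
```

Keeping the consensus result as an explicit argument mirrors the text: the discussion happens outside the algorithm, and only its outcome is recorded.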

Data analysis

All statistical analyses were conducted using SAS 9.2 software (PUMC, China). Nationwide rural mortality rates came from the 2000 annual report of the DSP System [9]. Population census data from 2000 were used as the standard population for the SMR [10]. From 1991 to 2000, the DSP System for COD covered 10 million people in 145 locations in all provinces, selected by multiple-stratified random sampling. This nationally representative sample reflected regional population distributions, urban and rural areas, age and sex, and the eastern, central, and western regions of the country. Analytical procedures for age distribution in the study and control areas included descriptive statistics (mean, standard deviation) and Student’s t-test. Mortality rates, prevalence rates, relative risks, 95% confidence intervals (CIs) for the standardized mortality rate [11], and p-values using the cross product difference test method [11] were calculated for the study and control areas. The formula for the 95% CI is:

Verification of COD diagnosis with prevalence surveys

In order to verify the difference in cancer mortality between the study and control areas, a prevalence survey was used to determine whether the difference in cancer prevalence rates between the study and control areas was similar to the difference in cancer mortality between them. First, lists of patients with cancer who were still alive during the survey period were enumerated by village doctors through contact with each individual in the villages of the study and control areas. (Village doctors are primary caregivers in rural China who provide basic medical procedures or referrals to county-level medical facilities. There is usually one village doctor per village. Village doctors usually know each villager well and can easily contact villagers for medical purposes.) Then, all of these patients were examined by provincial or national oncologists through review of their medical records. Based on the guidelines for cancer diagnosis, those patients with

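$$\left( p - u_{\alpha/2} \times \sqrt{\frac{\sum W_i\, p_i\, q_i}{\left(\sum W_i\right)^{2}}},\quad p + u_{\alpha/2} \times \sqrt{\frac{\sum W_i\, p_i\, q_i}{\left(\sum W_i\right)^{2}}} \right)$$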



areas. In the study and control areas in S County, the prevalence rates were 545/100,000 and 128.1/100,000 per year, respectively. In the study and control areas in Y District, the rates were 440.9/100,000 and 200/100,000 per year, respectively. For all types of digestive cancer under investigation, the mortality and prevalence rates in the study areas were higher than those in the control areas (e.g., liver cancer [Figure 3]).
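As a rough check on how closely the prevalence pattern follows the mortality pattern, the study-to-control ratios can be computed side by side. The sketch below uses the prevalence rates quoted in this paragraph and the total-cancer SMRs reported in Table 2; the variable and function names are illustrative only, and the ratio is a plain quotient with no significance test attached.

```python
# Rates per 100,000 as (study area, control area).
# Prevalence comes from the paragraph above; SMRs are the total-cancer rows of Table 2.
prevalence = {"S County": (545.0, 128.1), "Y District": (440.9, 200.0)}
smr = {"S County": (277.8, 72.1), "Y District": (223.6, 78.4)}


def ratio(pair):
    """Study-area rate divided by control-area rate."""
    study, control = pair
    return study / control


if __name__ == "__main__":
    for area in prevalence:
        print(f"{area}: prevalence ratio {ratio(prevalence[area]):.1f}, "
              f"mortality ratio {ratio(smr[area]):.1f}")
```

In both areas the two ratios point in the same direction (study area higher than control area), which is the sense in which the text calls the patterns consistent.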

Here, $p$ is the standardized mortality rate; $u_{\alpha/2}$ is the critical value of the U (standard normal) distribution; $p_i$ is the age-specific mortality rate; $q_i = 1 - p_i$; and $W_i$ is the proportion of the standard population in age group $i$. To calculate the excess cancer mortality rates in the study and control areas relative to the national statistics for the rural population, the following formula was used:

excess mortality rate for cancer = ∑[(age-specific mortality rate for cancer in a given area − age-specific mortality rate for cancer at the national level) × age-specific population in that area] / total population in that area
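The excess mortality formula above can be written as a short function. This is a minimal sketch under stated assumptions: the function name is hypothetical, rates are taken to be per 100,000, and the four age groups and all numbers in the usage example are invented for illustration, not data from the study.

```python
def excess_mortality_rate(local_rates, national_rates, local_population):
    """Excess mortality per 100,000 relative to national age-specific rates.

    local_rates, national_rates: age-specific rates per 100,000, same age groups.
    local_population: number of people in each age group in the local area.
    """
    if not (len(local_rates) == len(national_rates) == len(local_population)):
        raise ValueError("all inputs must cover the same age groups")
    total_population = sum(local_population)
    # Sum of (local rate - national rate) weighted by the local age structure.
    weighted_difference = sum(
        (local - national) * population
        for local, national, population
        in zip(local_rates, national_rates, local_population)
    )
    return weighted_difference / total_population


if __name__ == "__main__":
    # Hypothetical age groups 5-14, 15-34, 35-64, 65+ (illustrative values only).
    local = [10.0, 20.0, 300.0, 2000.0]
    national = [5.0, 15.0, 150.0, 900.0]
    population = [20_000, 40_000, 30_000, 10_000]
    print(excess_mortality_rate(local, national, population))
```

Because the differences are weighted by the local age structure, the result need not equal the simple difference between a local SMR and the national SMR, both of which are standardized to the 2000 census population.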

Cancer mortality rates in study areas compared to the general rural population

In 2000, the SMR of cancer in people aged 5 years and above in rural populations nationwide was 120.9/100,000. Compared with this average cancer mortality rate, the study areas in S County and Y District had an excess mortality of 184/100,000 and 138.8/100,000, respectively. The mortality rates among people in the two control areas were lower than the national average (Table 3).

Results

Demographic characteristics of deaths

In total, 3,301 deaths in people above 5 years of age were identified, of which 1,158 cases were caused by cancers. Among the 1,158 people who died of cancer, 83.8% in the study areas and 83.5% in the control areas visited a county-level or above health facility. The differences in the age of the deceased between the study and control areas were not statistically significant (66.9 versus 68.1 years in S County, p = 0.17; 70.1 versus 71.6 years in Y District, p = 0.08). The cancer mortality rates in the study areas were higher than those in the control areas, regardless of age group (Table 1).

Discussion

In a location where there is no registration of COD and not all medical records are kept by family members after a death, it is very difficult to acquire accurate data on COD. VA can be used for this investigation, especially in places with a lack of health services and no vital registration of COD. In most previous studies, VA was mainly used to assist COD diagnosis for infectious, infant, child, or maternal diseases. The use of VA among adults is increasing in many developing countries, such as India [12], rural Ethiopia [13], and other African countries [14,15]. By means of VA, reliable COD can be inferred. VA can have high sensitivity for common diseases, and in particular for cancer. In India, the sensitivity of VA for cancer reaches 94% to 95% [16,17]. In China, its sensitivity reaches 85% and its specificity exceeds 95% [4]. VA plays an important role in finding patterns of COD in these areas. Accuracy is a major issue of concern for verbal autopsy. The accuracy of VA is influenced by many factors, such as COD, characteristics of the deceased, classification of COD, the design and content of the questionnaire, and procedures carried out in the field [18]. Several studies have attempted to assess the validity of the VA instrument by validating it against a “gold standard,” namely medical records of people who have died [19]. However, in this study, we had few medical records, making it difficult to validate the results of VA. Therefore, in order to assess the COD from VA more accurately, strict quality control strategies were adopted. First, because deaths are commonly underreported as a result of cultural burial rituals and concerns in rural China [20], we collected death tolls from the registered

Cause-specific mortality fractions for major causes of death

In the study areas of S County and Y District, the cancer mortality fractions were 48% and 44%, respectively, suggesting that cancer was the leading COD, while in the control areas those fractions were only about 20%, similar to the general rural population (Figure 2).

Cancer mortality rates in the study areas

In the study areas in S County and Y District, the annual average SMR of cancer in people 5 years of age and above was 277.8/100,000 and 223.6/100,000, respectively, which is three to four times higher than the SMR in the control areas. The mortality rates of lung, stomach, esophageal, liver, and colorectal cancer in the study areas were two to six times higher than those in the control areas. The differences in cancer mortality between the study and control areas for different types of cancers were significant, except for colorectal cancer (Table 2).

Consistency in cancer mortality and prevalence rates

A total of 657 patients were reported to have cancer, of whom 37 had insufficient evidence for a cancer diagnosis. After further physical examination, 31 patients were excluded from having a diagnosis of cancer. In total, there were 626 cases of cancer in the study and control


Table 1 Mortality rate (per 100,000) of main causes of death by age group in study and control areas in S County and Y District, July 2002-June 2005 Male Study area Disease

5-14 15-34 35-64

Female Control area

65+ 140.3

5-14 15-34 35-64

Study area 65+

Control area

5-14 15-34 35-64

65+

S Infectious diseases County

0.0

3.7

23.7

0.0

0.0

19.9

77.0

0.0

0.0

16.9

83.7

Neoplasms Blood and bloodforming organs Endocrine system

38.7 0.0

11.0 0.0

386.3 2587.7 11.0 3.4 0.0 0.0

4.2 0.0

152.9 0.0

449.0 12.8

11.1 0.0

19.6 0.0

192.4 1548.8 0.0 0.0

0.0

0.0

0.0

46.8

0.0

0.0

13.3

25.7

0.0

0.0

3.4

Neuropsychiatric system

0.0

7.3

10.2

124.7

0.0

4.2

6.6

102.6

0.0

0.0

6.8

Circulatory system

0.0

25.7

84.7

1091.2

0.0

0.0

76.4

769.7

0.0

0.0

Respiratory system

0.0

3.7

40.7

592.4

0.0

0.0

16.6

500.3

0.0

0.0

Digestive system

0.0

0.0

10.2

233.8

0.0

0.0

19.9

115.5

0.0

Genitourinary system

0.0

0.0

0.0

46.8

0.0

0.0

3.3

12.8

0.0

5-14 15-34 35-64

65+

0.0

0.0

3.3

50.1

0.0 0.0

4.4 0.0

82.1 6.6

238.0 0.0

27.9

0.0

4.4

0.0

12.5

97.7

0.0

4.4

6.6

87.7

47.3

906.9

0.0

0.0

42.7

927.0

6.8

669.7

0.0

4.4

16.4

400.9

0.0

13.5

223.2

0.0

4.4

9.9

112.7

0.0

10.1

14.0

0.0

0.0

0.0

12.5

Maternity diseases

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

7.8

0.0

0.0

0.0

0.0

0.0

0.0

Accident or injury

48.4

62.4

77.9

140.3

44.1

59.4

86.4

243.7

0.0

23.5

74.3

139.5

0.0

13.1

32.9

200.4

Congenital malformation

0.0

3.7

3.4

0.0

0.0

0.0

0.0

0.0

11.1

0.0

0.0

0.0

0.0

0.0

0.0

0.0

6.8

109.1

192.4

0.0

9.9

488.5

Other

0.0

7.3

Total

87.1

124.8

11.0

4.2

10.0

0.0

0.0

251.2

0.0

0.0

647.2 5113.0 66.2

72.2

405.5 2501.6 22.3

51.0

371.4 3962.6

0.0

35.0

Infectious diseases Y District

0.0

4.0

28.8

0.0

0.0

13.9

13.1

0.0

0.0

11.4

0.0

0.0

0.0

39.7

Neoplasms Blood and bloodforming organs Endocrine system

17.7 0.0

12.1 0.0

392.8 1864.0 0.0 0.0

0.0 0.0

3.9 0.0

107.8 0.0

809.6 0.0

0.0 0.0

20.9 0.0

144.3 1114.0 0.0 0.0

0.0 0.0

8.1 0.0

68.4 0.0

304.5 0.0

Neuropsychiatric system

0.0

0.0

3.6

56.5

0.0

0.0

3.5

13.1

0.0

4.2

0.0

64.3

0.0

4.1

0.0

39.7

0.0

0.0

7.2

22.6

0.0

0.0

13.9

26.1

0.0

0.0

0.0

42.8

0.0

0.0

3.6

92.7

Circulatory system

0.0

12.1

86.5

1107.1

0.0

11.7

90.4

1188.3

0.0

4.2

57.0

1135.4

0.0

0.0

46.8

1668.0

Respiratory system

0.0

0.0

18.0

338.9

0.0

0.0

27.8

705.1

0.0

0.0

0.0

353.5

0.0

0.0

14.4

423.6

Digestive system

0.0

4.0

25.2

169.5

0.0

3.9

24.3

195.9

0.0

0.0

15.2

192.8

0.0

0.0

18.0

198.6

Genitourinary system

8.8

8.1

10.8

33.9

0.0

0.0

0.0

91.4

0.0

0.0

11.4

32.1

0.0

4.1

7.2

26.5

67.8

75.0

210.3 2530.4

Maternity diseases

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

8.1

3.6

0.0

Accident or injury

44.1

44.3

50.5

101.7

21.6

46.8

66.1

248.1

9.5

12.5

30.4

96.4

12.1

12.2

36.0

304.5

Congenital malformation

0.0

0.0

3.6

0.0

0.0

3.9

3.5

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

225.9

3.8

321.3

0.0

0.0

10.8

357.4

273.4 3427.6 12.1

36.6

208.7 3455.1

Other

0.0

0.0

14.4

0.0

0.0

38.2

404.8

0.0

4.2

Total

70.6

84.5

641.5 3987.8 21.6

70.2

389.4 3695.5

9.5

46.0

diagnose whether the deceased suffered from cancer. Because we lacked a gold standard, we used prevalence rates to test and verify the distribution of COD between the study and control areas in S County and Y District. The similarity in patterns of prevalence and mortality rates serves as an extra validation of the VA. The relationship between the mortality and morbidity patterns in the study and control areas helps to establish the accuracy of the VA diagnoses and to show that the impact of the exposure tested (i.e., water pollution) was being reliably measured by VA.

permanent residence administration departments and countryside health departments to avoid underestimation. Second, we derived COD from verbal autopsy results using physicians’ reviews. This approach is one of the most widely used in VA studies and is regarded as having high sensitivity and specificity for selected COD but low repeatability for deriving COD [19,21]. In this study, senior experts located in Beijing re-examined 5% of the deaths, selected by random sampling, after the local clinical experts completed the COD assignments. Third, to prevent reporting bias, we used VA to


Figure 2 Comparison of cause-specific mortality fractions for major causes of death in study and control areas in S County and Y District and rural areas nationwide.

Table 2 Comparison of mortality rates among people above 5 years of age in S County and Y District, July 2002-June 2005

                     Number of deaths    Mortality rate (1/100,000)   SMR (1/100,000)* (95% CI)                       Relative
Disease              Study    Control    Study     Control            Study area              Control area            risk (RR)   p-value

S County
Total                972      617        671.4     439.0              586.1 (549.1, 623.1)    338.6 (311.6, 365.7)    1.7         0.000
Total cancer         462      128        319.1     91.1               277.8 (252.4, 303.2)    72.1 (59.5, 84.7)       3.9         0.000
Esophageal cancer    108      21         74.6      14.9               64.1 (52.0, 76.2)       11.8 (6.7, 16.8)        5.5         0.000
Stomach cancer       83       14         57.3      10.0               49.4 (38.8, 60.0)       8.0 (3.8, 12.1)         6.2         0.000
Liver cancer         84       25         58.0      17.8               50.2 (39.4, 60.9)       14.0 (8.5, 19.6)        3.6         0.000
Colorectal cancer    14       5          9.7       3.6                8.4 (4.0, 12.8)         2.5 (0.3, 4.8)          3.3         0.018
Lung cancer          107      36         73.9      25.6               63.6 (51.6, 75.7)       19.8 (13.3, 26.2)       3.2         0.000

Y District
Total                967      745        676.7     534.0              493.2 (461.4, 525.1)    415.7 (385.7, 445.8)    1.2         0.000
Total cancer         429      139        300.2     99.6               223.6 (201.9, 245.3)    78.4 (65.3, 91.6)       2.9         0.000
Esophageal cancer    51       22         35.7      15.8               25.7 (18.5, 33.0)       11.8 (6.8, 16.7)        2.2         0.002
Stomach cancer       88       23         61.6      16.5               43.2 (33.9, 52.4)       12.9 (7.6, 18.3)        3.3         0.000
Liver cancer         97       22         67.9      15.8               53.3 (42.4, 64.2)       12.6 (7.3, 17.9)        4.2         0.000
Colorectal cancer    21       7          14.7      5.0                10.2 (5.7, 14.6)        4.0 (1.0, 7.0)          2.5         0.028
Lung cancer          90       26         63.0      18.6               46.1 (36.3, 55.8)       14.6 (9.0, 20.3)        3.2         0.000

*: 2000 census data were used for the standard population
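The relative risk column in Table 2 appears to be the ratio of the study-area SMR to the control-area SMR. The following sketch recomputes that ratio for the S County cancer rows from the rounded SMRs printed in the table. The names are illustrative, and the assumption that RR is a plain ratio of SMRs is inferred from the numbers rather than stated by the authors, so small differences from the published values are expected because the inputs are already rounded.

```python
# S County rows of Table 2: (SMR study, SMR control, reported RR), rates per 100,000.
s_county = {
    "Total cancer": (277.8, 72.1, 3.9),
    "Esophageal cancer": (64.1, 11.8, 5.5),
    "Stomach cancer": (49.4, 8.0, 6.2),
    "Liver cancer": (50.2, 14.0, 3.6),
    "Colorectal cancer": (8.4, 2.5, 3.3),
    "Lung cancer": (63.6, 19.8, 3.2),
}


def relative_risk(smr_study, smr_control):
    """Ratio of the study-area SMR to the control-area SMR."""
    return smr_study / smr_control


if __name__ == "__main__":
    for disease, (study, control, reported) in s_county.items():
        rr = relative_risk(study, control)
        print(f"{disease}: recomputed RR {rr:.2f}, reported RR {reported}")
```

Recomputed values that differ from the reported ones in the first decimal place most likely reflect the authors working from unrounded rates.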


Figure 3 Comparison of mortality and prevalence rates (1/100,000) for all types of cancer and liver cancer in S County and Y District, July 2002-June 2005.

prevalence rates remained low in control areas (60% to 65% of the 2000 national prevalence for rural areas), while prevalence rates increased dramatically in the study areas, where villagers relied on polluted water

Based on the survey conducted from 1973 to 1975, both S County and Y District had low prevalence rates of cancer (around 70% of the national prevalence for rural areas). The 2005 investigation found that cancer


Table 3 Average excess mortality rates of cancers among people in study and control areas in S County and Y District compared to the rural national level (1/100,000), July 2002-June 2005

Group           Total cancer   Esophageal cancer   Stomach cancer   Liver cancer   Colorectal cancer   Lung cancer

S County
Study area      184.0          57.5                28.0             28.6           4.0                 49.2
Control area    -61.9          -4.8                -23.4            -15.4          -2.9                -2.8

Y District
Study area      138.8          14.4                25.2             33.8           7.8                 33.0
Control area    -48.2          -3.4                -16.0            -15.4          -1.3                -8.5

die at home [22], using VA to assign COD can be useful and feasible. Using VA, we have found evidence for higher cancer-caused mortality rates in the areas near the Huai River, where water pollution is a serious concern. Our findings demonstrate the feasibility of using VA as a diagnostic tool for cancer in rural China.

sources for drinking water. Prevalence rates were particularly high for gastrointestinal cancers. In this study, we also examined a number of risk factors, but we did not find significant differences between the study and control areas. These risk factors included infections with Helicobacter pylori and hepatitis B, smoking, alcohol use, dietary behaviors (e.g., consuming pickled, smoked, or moldy food), and indoor cooking and heating practices. Based on these findings, we can infer that there was a higher occurrence of cancer along the Huai River basin. We made efforts not to introduce biases during the VA process (for example, VA procedures were conducted similarly in the study and control areas).

However, this study has limitations. First, the retrospective study design poses a risk of recall bias. A long recall period is likely to impair a respondent’s ability to recollect and report relevant information. A recall period ranging from one to 12 months is generally thought to be acceptable. One validation study showed no significant effect on sensitivity or specificity for recall periods ranging from one to 21 months [19]. In this study, because all deaths occurred between July 1, 2002 and June 30, 2005, the recall period for respondents was longer. Therefore, we used the prevalence study to confirm the VA results. Second, the questionnaire used in this paper was long, and it took at least 30 minutes to complete. The VA questionnaire should be further improved in future studies to make it easier to administer. Third, we used the 2000 DSP data as a reference for the general rural population because 2002 to 2005 data were not available. Therefore, data from the study and control areas were not collected at the same time as data from the reference group (i.e., the overall rural population). Finally, other cancer risk factors, such as other types of environmental pollution, might have confounded the observed patterns between the study and control areas. There may also be residual confounding from other unidentified cancer risk factors, which could affect the observed patterns.

Additional material

Additional file 1: Questionnaire (translated into English) used for verbal autopsy collection in this study.

Acknowledgements

The authors thank the Ministry of Science and Technology for supporting the project “Research on evaluation of the association between water pollution and cancer along Huai River (2006BAI19B03).”

Author details

1Institute of Basic Medical Sciences of Chinese Academy of Medical Sciences/School of Basic Medicine of Peking Union Medical College, Beijing, China. 2Chinese Center for Disease Control and Prevention, Beijing, China. 3Graduate School of Public Health, San Diego State University, San Diego, California, USA. 4School of Medicine, University of California, San Diego, San Diego, La Jolla, California, USA.

Authors’ contributions

XW participated in the design of the study, performed statistical analyses, and drafted the manuscript. MZ and ZT participated in the design of the study and performed statistical analyses. DD participated in drafting the manuscript. GY conceptualized the study, designed and organized the study, and helped to draft the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 14 March 2011 Accepted: 4 August 2011 Published: 4 August 2011

References

1. [http://ah.anhuinews.com/system/2004/11/15/001046066.shtml], accessed 2004-11-15.

2. The study on Cancer death survey in Chinese population. Cancer preventive and control office of Ministry of Health edition. Beijing (China): People’s Health Press; 1979.

3. Verbal autopsy standards: Ascertaining and attributing cause of death. World Health Organisation edition; 2007.

4. Yang Gonghuan, Rao Chalapati, Jiemin Ma, Wang Lijun, Wan Xia, Guillermo Dubrovsky, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. International Journal of Epidemiology 2006, 35:741-748.

Conclusions

In conclusion, though limitations apply, the symptom-based VA can be used to infer COD in places where medical services are insufficient and people usually die at home. In rural China, where almost 80% of people

5. Rao Chalapati, Yang Gonghuan, Jianping Hu, Jiemin Ma, Wan Xia, Lopez AD: Validation of cause-of-death statistics in urban China. International Journal of Epidemiology 2007, 36:642-651.

6. Lijun Wang, Gonghuan Yang, Jiemin Ma, Chalapati Rao, Xia Wan, Dubrovsky Guillermo, Lopez AD: Evaluation of the quality of cause of death statistics in rural China using verbal autopsies. Journal of Epidemiology Community Health 2007, 61:519-526.

7. Lemeshow SHJD, Klar J, Lwanga SK: Adequacy of Sample Size in Health Studies. Fudan University Press; 2010.

8. AMMP-Tanzania: Adult mortality and morbidity project. [http://www.Ncl.ac.uk/ammp/tools_methods/index.html], accessed 2003.

9. Yang Gonghuan, Zeng Guang, Zheng Xiwen, et al: Selection of the second stage of DSP system and its representative. Chinese Journal of Epidemiology 1992, 13(4):79-81.

10. Tabulation on the 2000 population census of the people’s republic of China. Population census office under the State council; Dept. of population statistics, Social, Science and Technology Statistics of National Bureau of Statistics of China edition. Beijing (China): China Statistics Press; 2002.

11. Wang G: A new significance test for standardized rate: an application of CPD in comprehensive evaluation (1). Chinese Journal of Hospital Statistics 1997, 4(1):28-29.

12. Kanungo S, Tsuzuki A, Deen JL, Lopez AL, Rajendran K, Manna B, Sur D, Kim DR, Gupta VK, Ochiai RL, et al: Use of verbal autopsy to determine mortality patterns in an urban slum in Kolkata, India. Bull World Health Organ 2010, 88(9):667-674.

13. Fantahun M, Fottrell E, Berhane Y, Wall S, Hogberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84(3):204-210.

14. Quigley MA, Chandramohan D, Setel P, Binka F, Rodrigues LC: Validity of data-derived algorithms for ascertaining causes of adult death in two African sites using verbal autopsy. Trop Med Int Health 2000, 5(1):33-39.

15. Mathers CD, Sadana R, Salomon JA, Murray CJ, Lopez AD: Healthy life expectancy in 191 countries, 1999. Lancet 2001, 357(9269):1685-1691.

16. Gajalakshmi V, Peto R, Kanaka S, Balasubramanian S: Verbal autopsy of 48 000 adult deaths attributable to medical causes in Chennai (formerly Madras), India. BMC Public Health 2002, 2:7.

17. Gajalakshmi V, Peto R: Verbal autopsy of 80,000 adult deaths in Tamilnadu, South India. BMC Public Health 2004, 4:47.

18. Huong DL, Minh HV, Byass P: Applying verbal autopsy to determine cause of death in rural Vietnam. Scand J Public Health Suppl 2003, 62:19-25.

19. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84(3):239-245.

20. Xia Wan, Zhou MG, Wang LJ, Chen AP, Yang GH: Using the general growth balance method and synthetic extinct generations method to evaluate the underreporting of death reporting in Disease Surveillance Points from 1991 to 1998. Chinese Journal of Epidemiology 2009, 39(3):927-932.

21. Jo J, M KK, Blasi A, Baydur A, Juarez R: Elucidating nonlinear baroreflex and respiratory contributions to heart rate variability in obstructive sleep apnea syndrome. Conf Proc IEEE Eng Med Biol Soc 2005, 4:4430-4433.

22. A series of reports on Chinese disease surveillance (10) - 1999 annual report on Chinese disease surveillance. Dept. of Control Disease of MOH, Chinese Academy of Preventive Medicine; 199935.

doi:10.1186/1478-7954-9-37 Cite this article as: Wan et al.: Epidemiologic application of verbal autopsy to investigate the high occurrence of cancer along Huai River Basin, China. Population Health Metrics 2011 9:37.



Hernández et al. Population Health Metrics 2011, 9:38 http://www.pophealthmetrics.com/content/9/1/38

RESEARCH

Open Access

Assessing quality of medical death certification: Concordance between gold standard diagnosis and underlying cause of death in selected Mexican hospitals

Bernardo Hernández1, Dolores Ramírez-Villalobos1, Minerva Romero1, Sara Gómez1, Charles Atkinson2 and Rafael Lozano2*

Abstract

Background: In Mexico, the vital registration system relies on information collected from death certificates to generate official mortality figures. Although the death certificate has high coverage across the country, there is little information regarding its validity. The objective of this study was to assess the concordance between the underlying cause of death in official statistics obtained from death certificates and a gold standard diagnosis of the same deaths derived from medical records of hospitals.

Methods: The study sample consisted of 1,589 deaths that occurred in 34 public hospitals in the Federal District and the state of Morelos, Mexico in 2009. Neonatal, child, and adult cases were selected for causes of death that included infectious diseases, noncommunicable diseases, and injuries. We compared the underlying cause of death, obtained from medical death certificates, against a gold standard diagnosis derived from a review of medical records developed by the Population Health Metrics Research Consortium. We used chance-corrected concordance and accuracy as metrics to evaluate the quality of performance of the death certificate.

Results: Analysis considering only the underlying cause of death resulted in a median chance-corrected concordance between the cause of death in medical death certificates versus the gold standard of 54.3% (95% uncertainty interval [UI]: 52.2, 55.6) for neonates, 38.5% (37.0, 40.0) for children, and 66.5% (65.9, 66.9) for adults. The accuracy resulting from the same analysis was 0.756 (0.747, 0.769) for neonates, 0.683 (0.663, 0.701) for children, and 0.780 (0.774, 0.785) for adults. Median chance-corrected concordance and accuracy increased when considering the mention of any cause of death in the death certificate, not just the underlying cause. Concordance varied substantially depending on cause of death, and accuracy varied depending on the true cause-specific mortality fraction composition.

Conclusions: Although we cannot generalize our conclusions to Mexico as a whole, the results demonstrate important problems with the quality of the main source of information for causes of death used by decisionmakers in settings with highly technological vital registration systems. It is necessary to improve death certification procedures, especially in the case of child and neonatal deaths. This requires an important commitment from the health system and health institutions.

* Correspondence: rlozano@uw.edu 2 Institute for Health Metrics and Evaluation, University of Washington, USA. 2301 5th Ave, Suite 600. Seattle, WA 98121, USA Full list of author information is available at the end of the article © 2011 Hernández et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

Vital registration (VR) constitutes a key element for planning and evaluation of health systems in all countries. In Mexico, the VR system is managed by the National Institute of Statistics and Geography, Mexico (INEGI), in conjunction with the Ministry of Health (MoH) and the Civil Registry offices. The Mexican VR system relies on information regarding cause of death that is registered annually using data from medical death certificates. According to international assessments, Mexico’s VR system is rated among the best in terms of quality and completeness [1]. Important efforts have been made over the past years to improve the coverage and quality of the mortality registry [2]; however, there is still room for improvement.

Mexico has a Center for Disease Classification (CEMECE), which was established in 1985 and in 2008 was officially recognized by the Pan American Health Organization/World Health Organization as a Collaborating Centre for the Family of International Classifications [3]. The CEMECE is responsible for monitoring the quality and standardization of the use of the 10th revision of the International Classification of Diseases (ICD-10) in all areas of the health system. Since 2007, INEGI has used an automated coding system for underlying cause of death, adopting the Automated Coding of Medical Entities (ACME) system [4] used by the US Centers for Disease Control and Prevention and adapting it to the Mexican context. There are currently 25 countries participating in the International Collaborative Effort on Automating Mortality Statistics [5].

The quality of the information provided in the medical death certificate, however, varies according to the personnel responsible for completing it. In Mexico in 2009, 97% of medical death certificates were completed by physicians and 3% by lay people with authorization of the MoH. These figures vary within the country. For example, 99.9% of medical death certificates are filled out by physicians in urban areas and 93.2% in rural areas. Of those medical death certificates filled out by physicians in the Federal District, only 20% are filled out by the physician who treated the deceased. In MoH hospitals in the Federal District, this figure is 8.5% for adult deaths, 17.3% for child deaths, and 28.6% for neonatal deaths [6].

Another way to assess quality of death certification is by using the percentage of ill-defined deaths and examining the percentage of deaths at home. In 2009, Mexico registered around 565,000 deaths, 44.4% of which occurred in health facilities, 47.3% at home, and 8.3% in public areas. Of the 66,062 deaths registered in the Federal District, 62% occurred in health facilities. In addition, while 2.1% of deaths were coded as ill-defined at the national level, this figure was 0.5% for hospitals run by the MoH in the Federal District [6].

Studies in various countries [7-13], including Mexico [14,15], have assessed the validity of death certification by comparing the underlying cause of death in the medical death certificate with other sources that provide additional information regarding cause of death, such as medical records. In general, these studies indicate that the concordance between causes of death from death certificates and those obtained using other sources varies by place and cause of death. Most of these studies have been conducted in developed countries, comparing information on underlying causes of death obtained from death certificates with hospital records. They estimate either kappa coefficients or sensitivity and specificity to assess the validity of death certificates. The information from Latin America comes mainly from studies in Brazil, analyzing the validity and reliability of cause of death registration in specific areas of the country [9,10]. In Mexico, these studies have concentrated on infant deaths [14,15]. To our knowledge, there is no single study in Mexico that has analyzed the reliability of causes of death based on death certificates for a wide range of diseases.

The objective of this study was to assess the concordance between the cause of death obtained from the medical death certificate and a rigorously-defined gold standard diagnosis based on medical records in hospitals in the Federal District and the state of Morelos, Mexico in 2009. We measured the quality of official mortality statistics in a sample of deaths that had occurred in medical units with high quality diagnoses. Gold standard criteria used in this study were developed by the Population Health Metrics Research Consortium (PHMRC) as part of a multisite study to validate verbal autopsy questionnaires in diverse populations [16].

Methods

Population and sample

The sample was selected from deaths that occurred in public hospitals in the Federal District and the state of Morelos, Mexico in 2009. Following the protocol of PHMRC, 211 neonatal, 94 child, and 1,284 adult cases were selected to cover a list of causes of death that included infectious diseases, noncommunicable diseases, and injuries. The protocol of the PHMRC considered the selection of cases (100 or 30 depending on cause of death) from three main age groups (neonates, children, and adults). A list of these causes and final number of cases included in the study is described in Additional file 1. This list was used as part of the PHMRC project, although some causes of death were omitted in Mexico due to the lack of deaths from those causes. Deaths were identified in 34 public hospitals (see Additional file 2 for more details). Inclusion criteria for the study were deaths that occurred in the selected hospitals between January and December 2009 with a medical record available at the hospital. The age of each patient at the time of death was obtained from hospital records. Deaths were classified as neonatal deaths (first 27 days of life), child deaths (deaths from 28 days to <12 years) and adult deaths (12 years and up), following the general design of the PHMRC project.
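The age grouping described above can be written as a small function. This is only an illustrative sketch: the function name, the choice of age in days as input, and the approximation of 12 years as 12 × 365.25 days are assumptions made here, not part of the PHMRC protocol. Stillbirths, which the study handles within the neonatal cause list, are not modeled.

```python
def classify_age_group(age_days: float) -> str:
    """Assign a death to one of the study's three age groups.

    age_days is the age at death in days (hypothetical input format).
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 28:
        # Neonatal death: first 27 days of life.
        return "neonate"
    if age_days < 12 * 365.25:
        # Child death: 28 days up to, but not including, 12 years
        # (12 years approximated in days; an assumption of this sketch).
        return "child"
    # Adult death: 12 years and up.
    return "adult"
```

The boundaries are half-open so that every non-negative age falls into exactly one group.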

Gold standard cause of death

This study used the gold standard criteria developed by the PHMRC [16]. These gold standard criteria were developed by a committee of physicians involved in the study and underwent multiple cycles of group review. The gold standard criteria classified deaths into three levels based on the degree to which the information from the medical record provided certainty to classify the death as a given cause: level 1, level 2A, and level 2B. Level 1 diagnoses provide the highest level of diagnostic certainty possible for that condition, consisting of either an appropriate laboratory test or x-ray with positive findings, as well as medically observed and documented illness signs. Level 2A diagnoses are of moderate certainty, consisting of medically observed and documented illness signs. Level 2B was used in place of level 2A if medically observed and documented illness signs were not available but records exist for treatment of a particular condition. Level 1 criteria were used for all gold standard cases, and cases classified as level 2A or 2B were only accepted in situations where it proved impossible to gather sufficient level 1 cases for a particular condition. For the analysis in this paper, we present results pooling levels 1, 2A, and 2B gold standard causes of death.

The following is an example of the gold standard criteria for breast cancer. For a case to be considered gold standard level 1 it had to have either an operative specimen with histological confirmation or a biopsy/fine needle aspiration cytology documented in the medical records. To be considered level 2A it had to have a mammography diagnosis and imaging evidence of metastases in other tissues based on CT scan/MRI/x-rays. In cases where the basis for the initial diagnosis was no longer available, the case could be considered level 2B if there was documented evidence in the medical record of the patient having been under treatment for breast cancer at a recognized cancer hospital or cancer unit.

Cause of death from official statistics

The process of death certification in Mexico involves several participants and includes the generation of two documents for each death: the medical death certificate and the legal death certificate. The medical death certificate is a compulsory document that allows the relatives of the deceased to obtain the legal death certificate in the corresponding civil registry office, and is also the source of the official statistics on cause of death. The legal death certificate is the legal document required in order to proceed with the burial and any administrative procedure related to the deceased, such as legal proceedings related to inheritance, insurance, and pension payment. Only an authorized judge of the Civil Registry can grant the legal death certificate. According to the General Law of Health, only physicians and personnel authorized by the MoH can fill out the medical death certificate. A hard copy of the medical death certificate is collected by the regional offices of INEGI and sent to their national headquarters where the automated coding to generate official figures takes place.

Procedure

This study is part of the PHMRC project. Following the protocol of that study, once access to the medical records of each hospital was granted, we began a general review of the mortality database of every hospital in the study to identify potential gold standard cases. When a potential case was identified, a trained physician reviewed the medical record, and when available, the autopsy report, to classify the case as one of the three levels of gold standard: level 1, 2A, or 2B. Review of medical records was carried out by six physicians who had received extensive training. A standardization and pilot study was carried out before the review of cases began. The physician team remained under strict supervision, and weekly meetings were held with all the members to review special cases and harmonize decision criteria. As part of the PHMRC study to validate the verbal autopsy questionnaire, verbal autopsy interviews were conducted with relatives of deceased persons whose diagnoses were classified as gold standard. For the current study, identification numbers for the death certificates of successfully interviewed gold standard cases were recovered from the hospitals and provided to personnel of the General Directorate of Health Information of the MoH. They, in turn, provided us with the coding for the underlying and other causes of death stated in the medical death certificate. The research protocol of this study was approved by the ethics and research committee of the National Institute of Public Health and of the participant institutions that required it.

Analysis

As has been shown elsewhere, to assess concordance and accuracy between a diagnosis considered true (gold standard) and the underlying cause of death in official vital registration, it is necessary to use performance metrics that allow us to make comparisons with different cause-specific mortality fraction (CSMF) compositions and variable cause lists [17]. We conducted the analysis using a list of 27 causes for adults, seven causes for children, and five causes for neonates including stillbirths (Additional file 1). Because we had information not only on the underlying cause of death, but also on the sequence of causes of death stated in the death certificate, we estimated concordance first considering only the underlying cause of death and then considering all causes of death recorded in the death certificate. We calculated the chance-corrected concordance for each cause of death and the median chance-corrected concordance as a summary measure by age group for adults, children, and neonates. Chance-corrected concordance constitutes a measure of concordance between two classification methods (in this case the assignment of causes in the official figures and the gold standard), correcting for the probability of agreement expected by chance. In addition, we calculated the median CSMF accuracy, which is a summary of the performance of a method in estimating CSMFs in the sample.

Because results for both chance-corrected concordance and CSMF accuracy can be extremely sensitive to the CSMF composition of the test set, it is essential to report results for a sufficiently large set of randomly-generated CSMF test sets with different CSMF compositions. These CSMF compositions should be drawn randomly from an uninformative Dirichlet distribution [18]. To avoid bias, we used 500 test datasets (splits) drawn from an uninformative Dirichlet distribution to estimate how well the estimated CSMFs compare to the true CSMFs, and we generated scatterplots to show the association between true and estimated CSMFs for each split. We also calculated a linear regression for each cause. The slope and intercept measure how accurately the estimated cause matches the true cause, with a slope of 1 and intercept of 0 indicating a perfect match. The root mean square error (RMSE) indicates how precisely the cause is estimated, with lower RMSE values indicating a closer fit between estimated and true CSMFs.

Results

Individual cause assignment

For this study, a total of 8,573 medical records of deaths from 36 hospitals were reviewed. Gold standard levels were assigned to 2,995 cases and informed consent was granted by participants to apply the verbal autopsy in 2,031 of those cases for deaths that had occurred in 2009. It was possible to recover information on the medical death certificate of 1,729 deaths from 2009 included in the verbal autopsy database, and this information was sent to the General Directorate of Health Information to recover the underlying cause of death in official statistics for those deaths. Most of the cases for which we could not recover the information from the medical death certificate were violent deaths (for which the medical death certificate was not available at the hospital). Because we could not identify their underlying cause of death in the official statistics, 140 cases were dropped from the analysis (many of these were stillbirths or violent deaths). The final analysis included 1,589 deaths from 34 hospitals. The mean number of causes of death mentioned in each medical death certificate for generation of the underlying cause of death was 2.97 (95% uncertainty interval [UI]: 2.92, 3.02) for adults, 3.18 (3.00, 3.36) for children, and 2.40 (2.18, 2.61) for neonates.

The first step in the analysis was to assess the concordance between the cause of death that appeared in the medical death certificate and the gold standard for each age group, shown in Table 1. When analyzing only the underlying cause of death, the median chance-corrected concordance ranged from 38.5% for children to 66.5% for adults. When analyzing the concordance considering the sequence of causes of death mentioned in the medical death certificate compared to the gold standard, the median chance-corrected concordance increased, ranging from 58.9% for neonates to 75.9% for adults. This increase was the most substantial for children. A detailed analysis of the concordance between underlying causes of death from the medical death certificate versus the gold standard is presented in Additional file 3. As we can see in this analysis, some causes, like diabetes, are more often recorded in the medical death certificate than in the gold standard, suggesting that the physicians are overstating this cause in the medical death certificates. There is also important misclassification of diarrhea, pneumonia, burns, lung cancer, falls, and poisonings. Other causes such as AIDS, cervical cancer, and leukemia/lymphomas have very little misclassification.

The median chance-corrected concordance between the medical death certificate and the gold standard varied substantially by cause of death, shown in Figure 1 for adults, Figure 2 for children, and Figure 3 for neonates. For adults, prostate cancer, suicide, AIDS, leukemia/lymphomas, and cervical cancer had the highest concordances, while other infectious diseases, falls, and poisonings had the lowest. For children, the highest concordance was found for other infectious diseases and other defined causes and the lowest for other cardiovascular diseases. In the case of neonatal deaths, stillbirths and meningitis/sepsis had the highest concordance, and birth asphyxia had the lowest. Detailed values for chance-corrected concordance are given in Additional file 4.

Table 1 Median chance-corrected concordance (%) by age group, for underlying diagnosis and all diagnoses

            Underlying Diagnosis        All Diagnoses
            Median   95% UI             Median   95% UI
Adults      66.5     (65.9, 66.9)       75.9     (75.4, 76.3)
Children    38.5     (37.0, 40.0)       64.0     (61.4, 66.3)
Neonates    54.3     (52.2, 55.6)       58.9     (56.9, 60.5)

Figure 1 Median chance-corrected concordance (%) by adult cause, for underlying diagnosis and all diagnoses. [Bar chart; horizontal axis: chance-corrected concordance (%), 0 to 100; series: underlying diagnosis, all diagnoses. Causes in plotted order: Poisonings, Falls, Other Infectious Diseases, Pneumonia, Other Cardiovascular Diseases, Renal Failure, TB, Fires, Diarrhea/Dysentery, Acute Myocardial Infarction, Lung Cancer, COPD, Road Traffic, Stroke, Other Noncommunicable Diseases, Cirrhosis, Breast Cancer, Maternal, Colorectal Cancer, Stomach Cancer, Homicide, Diabetes, Leukemia/Lymphomas, Cervical Cancer, AIDS, Suicide, Prostate Cancer.]

Figure 2 Median chance-corrected concordance (%) by child cause, for underlying diagnosis and all diagnoses. [Bar chart; horizontal axis: chance-corrected concordance (%), 0 to 100; series: underlying diagnosis, all diagnoses. Causes in plotted order: Other Cardiovascular Diseases, Diarrhea/Dysentery, Sepsis, Pneumonia, Other Digestive Diseases, Other Infectious Diseases, Other Defined Causes of Child Deaths.]

Figure 3 Median chance-corrected concordance (%) by neonate cause, for underlying diagnosis and all diagnoses. [Bar chart; horizontal axis: chance-corrected concordance (%), 0 to 100; series: underlying diagnosis, all diagnoses. Causes in plotted order: Birth Asphyxia, Preterm Delivery, Congenital Malformation, Stillbirth, Meningitis/Sepsis.]
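The per-cause chance-corrected concordance summarized in Table 1 and Figures 1 to 3 can be sketched as follows. The sketch assumes the form of the metric commonly given in the PHMRC methods literature cited as [17], CCC_j = (TP_j / (TP_j + FN_j) - 1/N) / (1 - 1/N), where N is the number of causes on the list. The function name, argument names, and the toy data are hypothetical; this is not the authors' code, and the study additionally takes the median across 500 resampled test sets.

```python
from collections import Counter
from statistics import median


def chance_corrected_concordance(gold, certified, n_causes=None):
    """Return {cause: CCC} for paired gold standard and certificate causes.

    gold and certified are equal-length sequences of cause labels.
    n_causes is the size of the cause list; if omitted, it is taken
    from the distinct gold standard causes (an assumption of this sketch).
    """
    if len(gold) != len(certified):
        raise ValueError("gold and certified must have the same length")
    causes = sorted(set(gold))
    n = n_causes if n_causes is not None else len(causes)
    if n < 2:
        raise ValueError("at least two causes are needed")
    totals = Counter(gold)                                   # TP + FN per cause
    hits = Counter(g for g, c in zip(gold, certified) if g == c)  # TP per cause
    result = {}
    for cause in causes:
        sensitivity = hits[cause] / totals[cause]
        # Subtract the agreement expected from random assignment (1/N)
        # and rescale so that perfect agreement equals 1.
        result[cause] = (sensitivity - 1 / n) / (1 - 1 / n)
    return result


# Toy data, invented for illustration only.
gold = ["AIDS", "AIDS", "Pneumonia", "Pneumonia", "Diabetes", "Diabetes"]
certified = ["AIDS", "AIDS", "Diabetes", "Pneumonia", "Diabetes", "Diabetes"]
ccc = chance_corrected_concordance(gold, certified)
median_ccc = median(ccc.values())
```

In the toy data one of the two pneumonia deaths is certified as diabetes, which lowers the concordance for pneumonia while leaving diabetes at full concordance, mirroring the overstatement of diabetes noted above.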


CSMF estimation

We estimated the CSMF accuracy of the medical death certificate in predicting the cause of death as identified in the gold standard, shown in Table 2. The accuracy indicates the ability of the medical death certificate to resemble the CSMFs as they are according to the gold standard. When considering only the underlying cause of death, the median accuracy ranged from 0.683 for child deaths to 0.780 for adults. Median accuracy increased when considering the mention of any cause of death in the medical death certificate versus the gold standard, ranging from 0.822 for child deaths to 0.887 for neonatal deaths. The true and estimated CSMFs vary substantially across the 500 Dirichlet splits. Figures 4 through 9 show estimated versus true CSMFs for AIDS, maternal deaths, lung cancer, pneumonia, diabetes, and other noncommunicable diseases in adults. The red line indicates perfect concordance between estimated and true CSMFs, and data points closer to the red line more accurately predict the CSMF for a particular cause. As we can see, in the case of AIDS (Figure 4) the accuracy is very high for different true CSMFs. In the case of maternal deaths (Figure 5), the death certificate overestimates the occurrence of these deaths when the true CSMF is low but underestimates it when the true CSMF is higher. In the case of lung cancer (Figure 6), the death certificate overestimates when the true CSMF is very low, while it underestimates the occurrence of these deaths when the true CSMF increases. In the case of pneumonia (Figure 7), the accuracy is very low, overestimating at low levels of the true CSMF and underestimating for high true CSMFs. For diabetes (Figure 8) and other noncommunicable diseases (Figure 9), we see a substantial overestimation in the number of cases at any level of the true CSMF. Additional file 5 shows the slope, intercept, and RMSE results from the linear regression by cause. 
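The cause-level regression of estimated on true CSMFs can be illustrated with ordinary least squares. This is a minimal sketch under stated assumptions: the function name is hypothetical, the inputs are one true and one estimated CSMF value per split for a single cause, and RMSE is taken here as the root mean square of the residuals around the fitted line, which may differ from the exact definition used for Additional file 5.

```python
from math import sqrt


def regress_estimated_on_true(true_csmf, est_csmf):
    """Fit est = intercept + slope * true and return (slope, intercept, rmse)."""
    n = len(true_csmf)
    if n != len(est_csmf) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(true_csmf) / n
    mean_y = sum(est_csmf) / n
    sxx = sum((x - mean_x) ** 2 for x in true_csmf)
    if sxx == 0:
        raise ValueError("true CSMFs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_csmf, est_csmf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual scatter around the fitted line (assumed RMSE definition).
    residuals = [y - (intercept + slope * x) for x, y in zip(true_csmf, est_csmf)]
    rmse = sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, rmse


# Invented values: estimates too high at low true CSMFs and too low at
# high true CSMFs, the pattern described for maternal deaths and lung cancer.
slope, intercept, rmse = regress_estimated_on_true(
    [0.0, 0.1, 0.2, 0.3], [0.05, 0.10, 0.15, 0.20]
)
```

A slope below 1 combined with a positive intercept corresponds to overestimation when the true CSMF is low and underestimation when it is high.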
As expected, high-accuracy causes (AIDS) have a slope near 1 and intercept near 0, while low-accuracy causes (diabetes, other noncommunicable diseases) have a lower slope and higher intercept. Similarly, high-precision causes have a low RMSE and vice versa.
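The median CSMF accuracy across Dirichlet splits can be sketched in the same spirit. The sketch assumes the definition of CSMF accuracy used in the PHMRC methods literature, 1 - sum_j |CSMF_true,j - CSMF_est,j| / (2 (1 - min_j CSMF_true,j)), and a simple resampling scheme: draw a CSMF composition from a flat Dirichlet, sample gold standard causes with those weights, and for each sampled death pick a certificate cause from the study deaths with that gold standard cause. All function names, the split size, and the resampling details are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import Counter
from statistics import median


def csmf_accuracy(true_csmf, est_csmf):
    """CSMF accuracy for two {cause: fraction} dictionaries."""
    causes = set(true_csmf) | set(est_csmf)
    abs_error = sum(
        abs(true_csmf.get(c, 0.0) - est_csmf.get(c, 0.0)) for c in causes
    )
    return 1.0 - abs_error / (2.0 * (1.0 - min(true_csmf.values())))


def draw_dirichlet_csmf(causes, rng):
    """Draw one composition from a flat Dirichlet (all alphas equal to 1)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in causes]
    total = sum(draws)
    return {c: d / total for c, d in zip(causes, draws)}


def resample_split(records, rng, size):
    """Build one test set from (gold, certified) pairs; return both CSMFs."""
    by_gold = {}
    for gold, certified in records:
        by_gold.setdefault(gold, []).append(certified)
    causes = sorted(by_gold)
    target = draw_dirichlet_csmf(causes, rng)
    gold_draws = rng.choices(causes, weights=[target[c] for c in causes], k=size)
    certified_draws = [rng.choice(by_gold[g]) for g in gold_draws]
    true_counts = Counter(gold_draws)
    est_counts = Counter(certified_draws)
    true_csmf = {c: true_counts[c] / size for c in causes}
    est_csmf = {c: k / size for c, k in est_counts.items()}
    return true_csmf, est_csmf


def median_csmf_accuracy(records, n_splits=500, size=1000, seed=0):
    """Median CSMF accuracy over n_splits resampled test sets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        true_csmf, est_csmf = resample_split(records, rng, size)
        scores.append(csmf_accuracy(true_csmf, est_csmf))
    return median(scores)
```

Collecting the per-split true and estimated fractions for one cause from `resample_split` would give the points plotted in scatterplots such as Figures 4 through 9.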

Table 2 Median CSMF accuracy by age group, for underlying diagnosis and all diagnoses

            Underlying Diagnosis        All Diagnoses
            Median   95% UI             Median   95% UI
Adults      0.780    (0.774, 0.785)     0.839    (0.835, 0.846)
Children    0.683    (0.663, 0.701)     0.822    (0.802, 0.845)
Neonates    0.756    (0.747, 0.769)     0.887    (0.880, 0.895)

Figure 4 Estimated versus true CSMFs across 500 Dirichlet splits for adult AIDS.

Figure 5 Estimated versus true CSMFs across 500 Dirichlet splits for maternal deaths.

Figure 6 Estimated versus true CSMFs across 500 Dirichlet splits for adult lung cancer.

Figure 7 Estimated versus true CSMFs across 500 Dirichlet splits for adult pneumonia.

Figure 8 Estimated versus true CSMFs across 500 Dirichlet splits for adult diabetes.

Figure 9 Estimated versus true CSMFs across 500 Dirichlet splits for adult other noncommunicable diseases.

Discussion

The importance of evaluating reliability and validity of underlying causes of death in mortality statistics has been recognized for a long time in the area of public health [19,20]. Generation of reliable statistical mortality data requires precise and consistent cause of death data, which in turn depends on the completeness and accuracy of cause of death diagnoses on the medical death certificate and its correct completion. There are different approaches in assessing the accuracy of the diagnoses in the medical death certificates. For example, several publications have used postmortem results from autopsies as a gold standard to compare agreements and errors with medical death certificates. A meta-analysis of 53 autopsy series published in 2003 yielded a median error rate of 23.5% (range: 4.1%-49.8%). The analysis of diagnostic error rates in the same study, adjusting for the effects of case mix, country, and autopsy rate, yielded relative decreases of 19.4% (1.8%, 33.8%) for a period of 10 calendar years [21].

Not all diseases can be diagnosed with a postmortem examination. Adequate clinical examinations prior to death are also useful for correct determination and certification of causes of death. Using medical records as a gold standard (with review by pathologists or nosologists), some studies have validated the quality of death certificates in different countries. These include a population-based study of 1,068 deaths in Valencia, Spain [22] and another that was a review of 2,813 medical death certificates in Finland [23]. We calculated, for both studies, the same metrics that we used for our sample. In the first study the median chance-corrected concordance was 58.9% and in the second 60.3%. The accuracy was 0.94 and 0.90, respectively. It is important to mention that when we calculated the same metrics for only 1,284 adults we computed a mean chance-corrected concordance of 66% and a CSMF accuracy of 0.85 without sampling across the 500 Dirichlet splits.

To the best of our knowledge, this is the first study in Mexico assessing the validity of medical death certificates using a robust gold standard. Although the sample may be biased (more than 66% of the cases came from hospitals with high technical capabilities for diagnoses as well as good pathology departments) the results are consistent with other studies that used a sample of hospital deaths. Johansson and Westerling published a study of 31,785 death certificates that were linked to the national hospital discharge register and found an agreement of 46% between the main diagnosis of the hospital discharge and the underlying cause of death in medical death certificates [24]. For deaths that occurred in the hospital, the agreement increased to 84%, but for those that occurred at home, the agreement fell to 43%. The same study found an increasing trend in agreement with age: 43.8% in children under 1 year old, 44.7% in children from 1 to 14 years of age, and 49% in adults aged 15 and over.

Our study found a reasonably high concordance and accuracy of the assignment of individual causes of death in the underlying cause of death of medical death certificates compared to the gold standard. For adults, the list of 34 causes of death used in our study is reasonable and captures the epidemiological pattern for causes of death in the Federal District and Morelos, but this is not the case for the 21 causes for children. There were difficulties in obtaining the quota of deaths for some diseases, particularly for children aged 1 month to 12 years. According to official statistics, there were 868 deaths in the MoH health facilities of the Federal District and the state of Morelos in 2009. That year there were no deaths due to measles, meningitis, encephalitis, hemorrhagic fever, malaria, or bites of venomous animals in those age ranges, and there were only 39 deaths (4.5%) related to injuries, 28 in the Federal District and 11 in Morelos. None of these cases fit the inclusion criteria due to the lack of quality of the medical records. In the case of neonates we did not find any deaths due to pneumonia.

This study also shows a substantial variability in the concordance and accuracy depending on cause of death. In the case of adults, it is worth mentioning that for diabetes, a highly prevalent disease considered the number one cause of death in Mexico, this analysis shows a substantial overreporting of deaths based on the death certificate. Previous studies have shown that validity and comparability of diabetes can be affected because the diagnosis usually appears in only two-thirds of death certificates for people who had diabetes before death [25,26]. The order of the sequence of causes can also be a factor in whether or not diabetes is assigned as underlying cause of death [27,28]. Murray et al. show that when controlling for individual and community factors, mortality from diabetes can be reduced by 10% in the US and 24% in Mexico [29]. In this study we have seen a poor performance of diabetes CSMF prediction despite high chance-corrected concordance (86.8%) due to an overlap between diabetes in 38% of cardiovascular deaths and in 32% of pneumonia deaths. It is clear, on the other hand, that chance-corrected concordance and CSMF prediction are good for diseases where the diagnosis should be evidence-based, such as HIV/AIDS, leukemia/lymphomas, and cervical cancer. More than 95% of the death certificates with these causes match the gold standard and have concordance over 90%. There are other causes, such as cirrhosis, homicides, and maternal deaths for which more than 95% of death certificates match the gold standard, but their chance-corrected concordance is lower than 85%.

The case of maternal deaths is important to highlight because Mexico has undertaken a major effort since 2002 to improve the completeness and quality of their diagnoses. Chance-corrected concordance for maternal deaths was 80%, which included false positive cases diagnosed as HIV/AIDS and noncommunicable diseases that could have been considered indirect obstetric deaths.

The low concordance and accuracy in the case of child and neonatal deaths, as well as the variability across causes at these ages, could be associated with different factors, such as the type of causes selected as gold standard, the number of gold standard cases gathered by cause, and death certification itself. Regarding the last point, in Mexico, as in many other countries, death certification is perceived as unglamorous routine paperwork or a “burdensome task” of low priority. It is sometimes even interpreted as punishment or a task for doctors with a low level of training. This may be the case in the pediatric hospitals because the correlation between the medical death certificates and the gold standard was very low for all causes. In our study, when we considered not only the underlying cause of death, but the mention of any cause of death in the medical death certificate, the median chance-corrected concordance for children increased from 38.5% to 64.0% with a very dramatic increase in diarrhea, sepsis, and pneumonia. In neonates, the median chance-corrected concordance increased from 54.3% to 58.9%, mainly due to an increase in the concordance of birth asphyxia and preterm deaths. This is consistent with Hunt and Barr [30] who demonstrated in their study that including all causes written in the medical death certificates regardless of the sequence of diagnosis increased the concordance from 58% to 91% in neonatal deaths. In other words, the medical knowledge to assign a cause of death is present, but it could be used more efficiently in correctly filling out the death certificates.

These results suggest that using multiple cause of death analysis could better support decision-makers, because assigning “one cause to one death” is an exercise that is not easily understood by physicians, and this directly affects the reliability of the cause of death statistics. This problem becomes apparent when we consider all the causes reported on the medical death certificate, where the consistency of individual cause assignment and accuracy of the CSMF composition improve significantly. However, improving the quality of medical certification by using the multiple cause approach does not help to increase the validity of the cause of death statistics themselves because they are based on the underlying cause of death.

This study had various methodological strengths: in contrast to other validation studies using medical

151


Hernández et al. Population Health Metrics 2011, 9:38 http://www.pophealthmetrics.com/content/9/1/38

Page 9 of 10

target health policies related to these age groups. Although we know our results are not generalizable to the rest of the country, it is important to consider that quality may be lower elsewhere. Each year there are an average of 40,000 deaths in these age groups (7% of total deaths), and in some states in Mexico the relative contribution reaches 10% or more. This requires an important commitment from the health system and health institutions and a review of coding procedures for deaths. It is necessary to provide tools and training to physicians so they can conduct a proper certification of the deaths. Evidence in the literature suggests that this is feasible [31,32], and manuals exist that can assist in putting this into practice [33]. It is crucial to address the issue of the importance of an accurate certification of cause of death and to implement quality controls in medical institutions. In terms of research, this study underlines the need to expand analysis of this type to other areas of the country using similar robust gold standards.

records as the gold standard, the cases selected in this study were based on robust gold standard criteria used in a multisite study; in addition, the metrics used to assess the performance of the VR system (chance-corrected concordance, CSMF accuracy, and linear regression, all estimated using a set of 500 test splits) are less sensitive to the cause composition of the test sample than other metrics traditionally used to assess performance, such as sensitivity and specificity. The study had some limitations that should be considered in the interpretation of results. It is important to take into account that the cases included in this study are a sample of cases with complete medical records, which allowed their classification as gold standard. The cases came mostly from high-specialty hospitals in the Federal District and as a result may have better death certification than deaths taking place in nonspecialty medical units. For the same reason, the concordance and accuracy reported in this paper may be higher than one we might find in other settings. This study is based on high quality registries and cannot be extrapolated to the entire country. It could be argued that the concordance may be affected not only by the information registered in the medical death certificate, but also by the coding procedures of the underlying cause of death. In this study, we used the coding information from INEGI, which generates the official mortality figures, and we assume that their procedures follow robust quality standards. However, the effect of possible coding problems on concordance and accuracy should be the subject of future research. In addition, the sample size was small for child and neonatal deaths, which may have limited our ability to analyze concordance and accuracy in these age groups. 
The reduced sample size can be explained by the low mortality in these age groups in medical units of the Federal District, as well as by the presence of a different mortality pattern in the study area.

Additional material Additional file 1: Number of deaths for adults, children, and neonates by cause of death Additional file 2: List of participant hospitals and number of cases obtained from each hospital Additional file 3: Misclassification matrix showing causes from the gold standard and medical death certificate Additional file 4: Median chance-corrected concordance by age group and cause, for underlying diagnosis and all diagnoses Additional file 5: Slope, intercept, and RMSE from linear regression of estimated versus true CSMFs, by age group and cause

Abbreviations ACME: Automated Coding of Medical Entities; CEMECE: Center for Disease Classification in Mexico; CSMF: cause-specific mortality fraction; INEGI: National Institute of Statistics and Geography, Mexico; MoH: Ministry of Health; PHMRC: Population Health Metrics Research Consortium; VR: vital registration. Acknowledgements and Funding We acknowledge the support of Dr. Bernardo Bidart, Director of the Coordination of Federal Reference Hospitals, MoH, for his support in the access to hospitals’ records. We would also like to acknowledge the support in the data collection for this study from the General Hospital of Mexico (Dr. Patricia Alonso and Dr. Virgilia Soto), Dr. Manuel Gea González Hospital General Hospital (Dr. Octavio Serrano and Dr. Lourdes Suárez), the National Institute of Respiratory Diseases (Dr. Cecilia Garcia Sancho and Dr. Rogelio Perez Padilla), and the National Institute of Oncology of Mexico (Dr. Roberto Herrera, Dr. Carlos Ochoa, and Dr. Alejandro Mohar). We also acknowledge the contribution of the hospitals belonging to the Secretary of Health from the Federal District (Dr. Armando Ahued, Dr. Ignacio Villasenor, and Dr. Patricia Cravioto) and the state of Morelos (Dr. Ma. Luisa Gontes and Dr. Mario Jorge López Arango). We would like to recognize the work of Dr. Javier Ernesto Alquicira Martínez, Dr. Zamira Castillo Salazar, Dr. Sully Claudio Riquelme, Dr. Romina Michelle Galindo Arestegui, Dr. Violeta Martínez Alcántara, Dr. David Osvaldo González López, Dr. Alejandra Piñera Cruz, Dr. Blanca Araceli Reyes Murillo, Dr. Román Rogelio Rodríguez Soria, Dr. Pilar Soel Encalada, Dr. Tania Stephenson Gussinye, Dr. Hector Mauricio Torres Valencia Dr. Angélica Urquiza Flores, Dr.América Villaseñor Domínguez, and Dr. Daniela Minerva Zárate Romero in the data

Conclusions Using a different approach to test the quality of the underlying cause of death for a sample of deaths from high-specialty hospitals, this study shows high concordance for some causes of death in adults but not for children and neonates. However, in future studies it would be worth including more causes of death in each category to reduce the size of the residual categories and better capture the epidemiological profile of a middle-income country. The results indicate the need to improve death certification procedures, especially in the case of children and neonates. While mortality in neonates and children under 12 has decreased significantly in recent years, it is desirable to improve the quality of records to better

152


Hernández et al. Population Health Metrics 2011, 9:38 http://www.pophealthmetrics.com/content/9/1/38

Page 10 of 10

collection, as well as the support of Juan Jose González Vilchis in the data analysis. In particular, the authors are thankful for the support of Kelsey Pierce and Rebecca Mandell, project officers for this project, and to Spencer L. James, Michael K. Freeman, Abraham D. Flaxman, and Summer Lockett Ohno, all from the Institute for Health Metrics and Evaluation. Finally, the authors would like to acknowledge the members of the Population Health Metrics Research Consortium. This research was supported by the grant 51488 “Population Health Metrics Research Consortium - Mexico” from the Bill & Melinda Gates Foundation. The sponsors of the study had no role in study design, data collection, data analysis, data interpretation, or the writing of the report.
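As an illustration of the two performance metrics named in the Discussion, the following is a minimal sketch of chance-corrected concordance and CSMF accuracy, using the definitions given by Murray et al. [17]. The three-cause list and the ten toy deaths are hypothetical and are not study data, and the sketch does not reproduce the 500 test splits used in the analysis.

```python
from collections import Counter


def chance_corrected_concordance(true_causes, assigned_causes, cause, n_causes):
    """Sensitivity for one cause, rescaled so that random assignment scores 0."""
    assigned_when_true = [a for t, a in zip(true_causes, assigned_causes) if t == cause]
    sensitivity = assigned_when_true.count(cause) / len(assigned_when_true)
    chance = 1 / n_causes
    return (sensitivity - chance) / (1 - chance)


def csmf_accuracy(true_causes, assigned_causes, cause_list):
    """1 minus the total absolute CSMF error, scaled by its worst possible value."""
    n = len(true_causes)
    true_counts = Counter(true_causes)
    assigned_counts = Counter(assigned_causes)
    true_csmf = {c: true_counts[c] / n for c in cause_list}
    assigned_csmf = {c: assigned_counts[c] / n for c in cause_list}
    total_abs_error = sum(abs(true_csmf[c] - assigned_csmf[c]) for c in cause_list)
    worst_case = 2 * (1 - min(true_csmf.values()))
    return 1 - total_abs_error / worst_case


# Hypothetical toy data: every diabetes death is certified correctly, but one
# stroke death and one pneumonia death are also certified as diabetes.
cause_list = ["diabetes", "stroke", "pneumonia"]
gold = ["diabetes"] * 4 + ["stroke"] * 3 + ["pneumonia"] * 3
certificate = (["diabetes"] * 4
               + ["stroke", "stroke", "diabetes"]
               + ["pneumonia", "pneumonia", "diabetes"])

ccc_diabetes = chance_corrected_concordance(gold, certificate, "diabetes", len(cause_list))
ccc_stroke = chance_corrected_concordance(gold, certificate, "stroke", len(cause_list))
accuracy = csmf_accuracy(gold, certificate, cause_list)
print(ccc_diabetes, ccc_stroke, accuracy)
```

In this toy example diabetes has perfect concordance while its estimated mortality fraction is inflated by deaths drawn from other causes, which is the kind of pattern the Discussion describes for diabetes.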

Author details
1Center for Population Health Research, National Institute of Public Health, Mexico. Av. Universidad 655, Cuernavaca, Morelos, 62508, Mexico. 2Institute for Health Metrics and Evaluation, University of Washington, USA. 2301 5th Ave, Suite 600, Seattle, WA 98121, USA.

Authors' contributions
BH participated in the design of the project and the acquisition, analysis, and interpretation of data. He wrote the first draft. DRV, MR, and SG participated in the acquisition and analysis of data. CA contributed to the analysis of data. RL is the principal investigator of the project and participated in the conceptualization, design, data collection, analysis, and interpretation. He revised the first draft and wrote the final version. All authors read and approved the manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 2 April 2011 Accepted: 4 August 2011 Published: 4 August 2011

References
1. Mathers CD, Fat DM, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bull. World Health Organ 2005, 83:171-177.
2. Lozano-Ascencio R: Is it possible to improve the death registries in Mexico? Gac Med Mex 2008, 144:525-533.
3. El Centro Colaborador para la Familia de Clasificaciones Internacionales de la OMS en México (CEMECE) | Dirección General de Información en Salud (DGIS). [http://sinais.salud.gob.mx/cemece/acercade/index.html].
4. Mortality Medical Data System (MMDS) | U.S. CDC NVSS. [http://www.cdc.gov/nchs/nvss/mmds.htm].
5. International Collaborative Effort (ICE) on Automating Mortality Statistics | U.S. CDC NVSS. [http://www.cdc.gov/nchs/nvss/ice_automation.htm].
6. Base de datos de defunciones 1979-2007 | Dirección General de Información en Salud (DGIS). [http://www.sinais.salud.gob.mx/basesdedatos/defunciones.html].
7. Alpérovitch A, Bertrand M, Jougla E, Vidal J-S, Ducimetière P, Helmer C, Ritchie K, Pavillon G, Tzourio C: Do we really know the cause of death of the very old? Comparison between official mortality statistics and cohort study classification. Eur. J. Epidemiol 2009, 24:669-675.
8. Sinha S, Myint PK, Luben RN, Khaw K-T: Accuracy of death certification and hospital record linkage for identification of incident stroke. BMC Med Res Methodol 2008, 8:74.
9. Laurenti R, de Mello Jorge MHP, Gotlieb SLD: Underlying cause-of-death mortality statistics: considering the reliability of data. Rev. Panam. Salud Publica 2008, 23:349-356.
10. Nunes J, Koifman RJ, Mattos IE, Monteiro GTR: Reliability and validity of uterine cancer death certificates in the municipality of Belém, Pará, Brazil. Cad Saude Publica 2004, 20:1262-1268.
11. Pattaraarchachai J, Rao C, Polprasert W, Porapakkham Y, Pao-In W, Singwerathum N, Lopez AD: Cause-specific mortality patterns among hospital deaths in Thailand: validating routine death certification. Popul Health Metr 2010, 8:12.
12. Jedrychowski W, Mróz E, Wiernikowski A, Flak E: Validity study on the certification and coding of underlying causes of death for mortality statistics. Przegl Epidemiol 2001, 55:313-322.
13. Almada R, Ciriacos C, Piñeyrúa M, Logaldo R, González D: Calidad del registro en el certificado de defunción en un hospital público de referencia. Montevideo, Uruguay, octubre-noviembre 2009. Rev Med Urug 2010, 26:216-223.
14. Tomé P, Reyes H, Guiscafré H, Martínez H, Rodríguez L, Gutiérrez G: Sobrediagnóstico de infección respiratoria y diarrea aguda en muertes de niños en Tlaxcala, México: análisis comparativo entre certificados de defunción y "autopsias verbales". Bol. méd. Hosp. Infant. Méx 1994, 51:159-166.
15. Alvarez G, Harlow SD, Denman C, Hofmeister MJ: Quality of cause-of-death statements and its impact on infant mortality statistics in Hermosillo, Mexico. Rev. Panam. Salud Publica 2009, 25:120-127.
16. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
17. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
18. Frigyik B, Kapila A, Gupta M: Introduction to the Dirichlet Distribution and Related Processes. Seattle, WA: University of Washington Electrical Engineering, Technical Report UWEETR-2010-0006; 2010.
19. Moriyama IM: Problems in measurement of accuracy of cause-of-death statistics. Am J Public Health 1989, 79:1349-1350.
20. Moriyama IM, Dawber TR, Kannel WB: Evaluation of diagnostic information supporting medical certification of deaths from cardiovascular disease. Natl Cancer Inst Monogr 1966, 19:405-419.
21. Shojania KG, Burton EC, McDonald KM, Goldman L: Changes in rates of autopsy-detected diagnostic errors over time: a systematic review. JAMA 2003, 289:2849-2856.
22. Benavides FG, Bolumar F, Peris R: Quality of death certificates in Valencia, Spain. Am J Public Health 1989, 79:1352-1354.
23. Lahti RA, Penttilä A: The validity of death certificates: routine validation of death certification and its effects on mortality statistics. Forensic Sci. Int 2001, 115:15-32.
24. Johansson LA, Westerling R: Comparing Swedish hospital discharge records with death certificates: implications for mortality statistics. Int J Epidemiol 2000, 29:495-502.
25. Geiss L, Herman W, Smith P: Mortality in non-insulin dependent diabetes. Diabetes in America. 2 edition. Washington D.C.: U.S. Department of Health and Human Services, National Institutes of Health; 1995, 233-257.
26. Ochi JW, Melton LJ, Palumbo PJ, Chu CP: A population-based study of diabetes mortality. Diabetes Care 1985, 8:224-229.
27. Jougla E, Papoz L, Balkau B, Maguin P, Hatton F: Death certificate coding practices related to diabetes in European countries–the "EURODIAB Subarea C" Study. Int J Epidemiol 1992, 21:343-351.
28. Lu TH, Hsu PY, Bjorkenstam C, Anderson RN: Certifying diabetes-related cause-of-death: a comparison of inappropriate certification statements in Sweden, Taiwan and the USA. Diabetologia 2006, 49:2878-2881.
29. Murray CJL, Dias RH, Kulkarni SC, Lozano R, Stevens GA, Ezzati M: Improving the comparability of diabetes mortality statistics in the U.S. and Mexico. Diabetes Care 2008, 31:451-458.
30. Hunt R, Barr P: Errors in the certification of neonatal death. J Paediatr Child Health 2000, 36:498-501.
31. Myers KA, Farquhar DR: Improving the accuracy of death certification. CMAJ 1998, 158:1317-1323.
32. Lakkireddy DR, Basarakodu KR, Vacek JL, Kondur AK, Ramachandruni SK, Esterbrooks DJ, Markert RJ, Gowda MS: Improving death certificate completion: a trial of two training interventions. J Gen Intern Med 2007, 22:544-548.
33. Guía para el llenado de los certificados de defunción y muerte fetal | Secretaría de Salud | Mexico. [http://sinais.salud.gob.mx/descargas/pdf/GuiaLlenadoCertDefuncionyMFetal.pdf].

doi:10.1186/1478-7954-9-38 Cite this article as: Hernández et al.: Assessing quality of medical death certification: Concordance between gold standard diagnosis and underlying cause of death in selected Mexican hospitals. Population Health Metrics 2011 9:38.



França et al. Population Health Metrics 2011, 9:39 http://www.pophealthmetrics.com/content/9/1/39

RESEARCH

Open Access

Use of verbal autopsy in a national health information system: Effects of the investigation of ill-defined causes of death on proportional mortality due to injury in small municipalities in Brazil

Elisabeth França1*, Deise Campos2, Mark DC Guimarães3 and Maria de Fátima M Souza4

Abstract

Background: The Mortality Information System (MIS) in Brazil records mortality data in hospitals and civil registries with the responsibility of compiling underlying cause of death. Despite continuous improvements in the MIS, some areas still maintain a high proportion of deaths assigned to ill-defined causes. Deaths coded to this category have most likely been considered as miscoded deaths from communicable and noncommunicable diseases. However, some local studies have provided evidence of underreporting of injury in Brazil. The aim of this study was to investigate ill-defined causes of death using the verbal autopsy (VA) method to estimate the injury-specific mortality fraction in small municipalities in northeastern Minas Gerais, Brazil.

Methods: A sample of reported death certificates with ill-defined conditions was obtained from a random sample of 10 municipalities, and trained interviewers then questioned family members using a standardized VA questionnaire to elicit information on symptoms experienced by the deceased before death. All attempts were made to collect existing information about the disease or death from health facility records. Probable causes of death were assigned by a physician after review of the completed questionnaires, following the rules of the 10th revision of the International Classification of Diseases (ICD-10).

Results: Of 202 eligible ill-defined deaths, 151 were investigated using the VA methodology, and 12.6% had injury as the underlying cause of death. The proportional mortality fraction from injury among all causes of death increased from 4.4% to 8.2% after investigation. The distribution of specific injury causes differed between recorded injury causes and those detected by VA. Drowning was the top specific injury cause detected after investigation.

Conclusions: This study provides evidence that the use of VA in the investigation of registered ill-defined conditions in an existing MIS can furnish information on the relevance of injury as a priority health problem in small municipalities of Minas Gerais. Local research with VA should be brought to the attention of regional health policymakers to improve the quality of data for their planning.

* Correspondence: efranca@medicina.ufmg.br 1 Graduate Program in Public Health and Research Group in Epidemiology and Health Evaluation, Faculty of Medicine, Federal University of Minas Gerais. Av. Alfredo Balena, 190. Belo Horizonte, 30130-100, Brazil Full list of author information is available at the end of the article © 2011 França et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.




Introduction
Mortality data obtained from vital statistics registration systems (VR) are critical for rational public health planning. However, their usefulness will depend on the quality of death certificates, with emphasis on the reliability of cause-specific mortality data. Indicators of the quality of VR include completeness of registration and the proportion of deaths coded as ill-defined condition [1]. In Brazil, the Mortality Information System (MIS) is one of the key components of the National VR, and it was entrusted with the active recording of mortality data in hospitals and registry offices. The MIS is responsible for compiling cause of death data throughout the country, with a current estimate of more than 1 million coded deaths annually. Despite the continuous improvements of the MIS, Brazil is classified as having medium-quality death registration data, based on estimates of the national level of registration completeness and the proportion of ill-defined causes of death [1].

Despite the high proportion of ill-defined conditions in Brazil, up to 2010 there were only two published studies that investigated the distribution of defined causes of death among ill-defined conditions [2,3]. In both studies, 4-5% of these conditions were found to be due to injury, which indicates an unusual result, as more complete registration from injuries would have been expected [2,4].

During the first decade of this century, the Ministry of Health implemented a series of policies focused on helping states of the north and northeast regions to reduce ill-defined causes of death. Since 2005, there has been a substantial decline in the proportion of ill-defined causes in these regions, particularly in the northeast. In 2007, the level of completeness was estimated to be 90% for the whole country, and 7.7% of deaths were assigned to symptoms, signs, and ill-defined causes of death [5,6]. In spite of the steady decrease in the proportions of ill-defined conditions, these still remain high in several specific areas of the country [7].

Although some initiatives for investigating ill-defined causes of death reported to MIS by health professionals used hospital records, health department records, and interviews with relatives of the deceased [5,8], until 2008 there was no national standard procedure for this investigation. In 2007, the Ministry of Health began to discuss the adoption of adapted verbal autopsy (VA) questionnaires developed by the World Health Organization (WHO) for the investigation of ill-defined causes of death. As part of these initiatives, we conducted a study to assess the use of VA in the investigation of the causes of underregistered deaths and of deaths reported to MIS from symptoms, signs, and ill-defined conditions in the northeast region of Minas Gerais state, in the Brazilian southeast region. We found that 14% of recorded ill-defined conditions and nonrecorded deaths in 2007 were due to injuries [7].

The finding of injury among ill-defined conditions is a striking issue, as injuries cause a significant health burden in Brazil, even taking undercount into consideration, and the death rates from violence rank among the highest in the Americas [9]. Therefore, proportional mortality for ill-defined causes of death must be considered in the use of data for epidemiological analysis such as burden of disease estimation, with suitable adjustments to account for potential bias. Miscoding of injury to ill-defined causes is of concern, and it may be indicative of an urgent need to check the quality of the VR system. It should also be investigated whether it is necessary to consider injury in the redistribution rules to estimate levels of cause-specific mortality. Thus, the aim of the present study was to investigate in a more detailed way causes of death reclassified as due to injury among those initially recorded as ill-defined causes in the routine information system in the northeast region of Minas Gerais.

Methods
This is a cross-sectional study of death certificates with ill-defined causes officially reported to the MIS in the northeast region of Minas Gerais state, Brazil. Minas Gerais is located in the southeast region of Brazil with an estimated population of 19 million inhabitants in 2007, and it is divided into 13 health administrative regions. These regions differ widely in living standards and population features. The northeast region has approximately 880,000 inhabitants, and it has the highest burden of disease and the poorest quality of cause-specific mortality data in the state [10].

The sample size was estimated at 164 death certificates (DC) with ill-defined causes to be investigated, considering the following parameters: 1) number of total death certificates with ill-defined causes in 2007 in the region reported up to April 2008: 1,124; 2) anticipated percentage of ill-defined causes to be reclassified into well-defined causes: 80%; 3) absolute precision: 6%; 4) confidence level: 90%; and 5) design effect: 1.5. We then added 20% due to potential losses, yielding 197 DCs with ill-defined causes for reclassification, which was rounded to 200.

The resident population of this region is geographically dispersed into 63 municipalities as follows: a) 49.2% with fewer than 10,000 inhabitants; b) 31.8% between 10,000 and 19,999 inhabitants; and c) 19.0% with 20,000 or more inhabitants. In order to obtain the 200 DCs, we used a two-stage sampling procedure. In the first stage, we randomly chose 10 municipalities among the 63, proportional to their population size as indicated above. Assuming that ill-defined causes were distributed similarly among the municipalities, these 10 municipalities would suffice to reach the required sample size. In the second stage, all DCs with ill-defined causes reported in these municipalities were then selected.

Verbal autopsy interviews were conducted with family members of the deceased within 13 months following the date of death, on average. In the majority of cases respondents were family members, such as son/daughter (36% of the total), husband/wife (19%), parents (7%), or other family members (30%). Informed consent was obtained before the interviews, which were carried out by trained interviewers with at least secondary schooling. All interviewers were carefully trained in interview techniques. Almost all interviews were conducted with the assistance of a community health worker from the Family Health Program who accompanied the interviewer during the home visits. This facilitated the address location and provided emotional support to the family. However, the health worker was not present at the time of the interview because of the possibility of introducing respondent bias and for ethical reasons.

A standardized VA questionnaire was used to elicit information on symptoms experienced by the deceased before death. This questionnaire was based on the Portuguese version of the WHO instrument previously used in Mozambique and adapted by the Ministry of Health for the Brazilian reality, including cultural differences and the most prevalent diseases [5,11]. In addition, all attempts were made to trace existing information about the disease or death, including data obtained from hospital, health department, autopsy, family health program, or civil registry office records and by interviewing family health program professionals. Two forms were applied according to the age at death: (i) from 28 days old to younger than 10 years old and (ii) 10 years old or over. As we did not find registered ill-defined causes for neonatal deaths, it was not necessary to use the specific form for this age group.

A medical professional trained in cause of death certification assigned the underlying cause of death of each individual using both the VA and the medical records, or any other documented evidence available that might have helped in determining the probable cause of death. Rules of the 10th revision of the International Classification of Diseases (ICD-10) were applied. Causes of death were then classified using a standardized mortality tabulation list from WHO comprising groupings of ICD-10 codes, including 18 specific items for external causes of death [11].

Descriptive analysis was performed for data from the VA interviews of ill-defined conditions. For diagnosing deaths due to accidental or violent injuries, the criterion used was the report of an accident (intentional or unintentional) that was considered the underlying cause of death by the physician if there was a logical sequence between the injury and the other causes. We used a classification approach with a few broad categories of injuries to increase the validity of VA. All the data collectors and the medical professionals were blinded to the objective of the study. Persons with injury cases detected among investigated ill-defined conditions were compared to people with injury cases registered in VR using Fisher's exact test to examine whether the range of injury causes and other selected characteristics of death were significantly different between these two groups. The significance level was 0.05.

We obtained the registered mortality statistics in the VR system from the Brazilian Ministry of Health [12]. Deaths coded to ill-defined conditions in the studied municipalities of the Minas Gerais northeast region were reassigned to injury and medical causes according to the proportions observed among ill-defined causes in the investigated sample. The resultant estimates from these adjustments and the registered VR causes were summed to obtain cause of death estimates. Proportions of deaths from ill-defined causes and injuries, deaths occurring at home in VR data, and demographic information (number of municipalities and proportion of resident population) in selected administrative areas of the country were calculated by municipality size using data made available by the Brazilian Ministry of Health [12]. Data entry and analysis were carried out using Excel and the Stata 9.0 statistical package.

Ethical approval for this study was obtained from the Federal University of Minas Gerais Ethical Committee. The study was also implemented with the full knowledge and support of the Brazilian Ministry of Health and the State Health Department of Minas Gerais.
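The sample size figures reported in the Methods can be checked with a short sketch. The text lists the parameters but does not state the formula, so the finite-population formula for estimating a proportion used below is an assumption; with the reported parameters it gives the 164 certificates stated in the text and, after adding 20% for losses, 197.

```python
import math
from statistics import NormalDist


def sample_size(population, expected_prop, precision, confidence, design_effect):
    """Sample size for a proportion with finite population correction.

    Assumed formula (not stated in the source):
    n = deff * N*p*q / ((d^2 / z^2) * (N - 1) + p*q)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    pq = expected_prop * (1 - expected_prop)
    n_srs = population * pq / ((precision ** 2 / z ** 2) * (population - 1) + pq)
    return math.ceil(design_effect * n_srs)


# Parameters as listed in the Methods.
n = sample_size(
    population=1124,     # ill-defined death certificates reported for 2007
    expected_prop=0.80,  # anticipated share reclassified to well-defined causes
    precision=0.06,      # absolute precision
    confidence=0.90,     # confidence level
    design_effect=1.5,
)
n_with_losses = math.ceil(n * 1.20)  # 20% added for potential losses
print(n, n_with_losses)
```

The authors then rounded the result up to 200 certificates.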

Results
A sample of 202 (35.3%) ill-defined conditions among 572 deaths was obtained from the 10 municipalities randomly selected. Six cases with defined causes of death at hospital investigation were excluded, and 45 cases could not be located due to migration of households or missing address. Thus, we investigated 151 (74.8%) ill-defined deaths with the verbal autopsy methodology. No statistical difference was found between the participants in the study sample (n = 151) and losses (n = 45) with regard to the proportion of deaths occurring in hospitals and the age distribution of the deceased. On the other hand, being male (76%), living in a rural area (62%), and having no coverage from the Family Health Program (58%) were significantly more frequent in losses. Table 1 shows the distribution of selected characteristics in the VA interviews of ill-defined conditions.


Table 1 Proportion of injuries and other variables in the sampled municipalities after investigation of ill-defined conditions using verbal autopsy (VA), Minas Gerais northeast, Brazil, 2007

Variables                n (%)
Cause of death
  Injury                 19 (12.6)
  Medical causes         110 (72.8)
  Ill-defined            22 (14.6)
Age (years)
  0-19                   5 (3.3)
  20-59                  40 (26.5)
  ≥60                    106 (70.2)
Sex
  Male                   78 (51.7)
  Female                 73 (48.3)
Rural area
  Yes                    29 (19.2)
  No                     122 (80.8)
Death place*
  Home                   119 (79.3)
  Hospital               15 (10.0)
  Others                 16 (10.7)
Having FHP**
  Yes                    138 (91.4)
  No                     13 (8.6)

*missing, not included (n = 1); ** Family Health Program

Among the investigated deaths, 12.6% were due to injuries. The majority of people were over 60 years old, the sex distribution was similar, and 81% of the deceased lived in urban areas. Although the majority of deaths occurred at home, 62.9% of the respondents to VA questionnaires reported that the deceased had visited a health facility during the period in which the terminal events leading to death occurred. This proportion, however, was lower (32.5%) when we checked the health facility registries (data not presented). On the other hand, we should note that 91% of the deceased with a registered ill-defined cause of death were covered by the Family Health Program and also that some deaths took place in a health facility.

As Table 2 indicates, the distribution of specific causes within the injury category, sex, and place of death differed between injury causes coded by the VR system before investigation and those reclassified as injury among the ill-defined conditions by the VA method. Road traffic accidents were the top cause of injury deaths in VR, while drowning ranks first in the reclassified injury deaths (VA data). Although the age distribution was relatively similar, female deaths were relatively more common in the VA data than in the VR data. Thirty-seven percent of injuries detected among ill-defined conditions occurred at home, as compared to none in the VR data. The proportion of residents in rural areas was similar in both groups.

The proportion of injuries in VR data in the sampled municipalities was 4.4% (n = 25), considering all causes of death (n = 572). Two hundred and two cases were registered as ill-defined causes of death, but six cases had medical records with defined causes, and three deaths among them were due to injury. Of the 151 investigated ill-defined cases, 19 were due to injury. Adding the 25 registered injury deaths, the 3 identified from medical records, and the 19 detected by VA gives 47 injury deaths. The injury fraction after investigation is therefore 8.2%; that is to say, the proportional mortality fraction due to injury among defined causes in the 10 municipalities increased from 4.4% to 8.2% after investigation (Table 3).

Table 4 provides a summary of the proportional distribution of ill-defined causes of death, injuries, and home deaths, and also some demographic characteristics of Brazil and some administrative areas according to municipality size. Although the proportions of injuries in the state of Minas Gerais (11.1%), the Brazilian southeast region (11.0%), and Brazil (12.5%) were relatively similar, the proportion in the northeast region of Minas Gerais was 9.2%. Within each area, the fraction of injuries is likely to be directly related to the size of the municipality. We also should point out that while smaller municipalities have a higher proportion of ill-defined conditions and home deaths than larger ones, this pattern is similar throughout the country in all administrative areas, and it is far greater in the northeast region of Minas Gerais.

Discussion

Our findings indicate that routine vital statistics may underrepresent the burden of external causes in northeastern Minas Gerais, as 12.6% of the deaths registered with no defined cause in small municipalities were found to be due to injuries after VA investigation. We also found a different proportional composition of injury types in the investigated study sample, indicating that some types, e.g., drowning and falls, are more likely to be misclassified. Although there have been some previous initiatives using VA in Brazil, they focused on the investigation of infant deaths [13]. To our knowledge, this study is one of the first attempts to use standardized VA procedures in the investigation of recorded ill-defined conditions for deaths occurring predominantly outside of


Table 2 Injury in vital registration (VR) data before investigation and among investigated ill-defined conditions by the VA method in the sampled municipalities, Minas Gerais northeast, Brazil, 2007

Variables                          Injury in VR data        Injury among              p value
                                   before investigation     ill-defined conditions    (Fisher’s exact test)
                                   n (%)                    n (%)
Cause of death                                                                        0.00
  Road traffic accidents           10 (40.0)                1 (5.3)
  Falls                            1 (4.0)                  4 (21.1)
  Drowning                         2 (8.0)                  6 (31.6)
  Exposures* and poisonings        3 (12.0)                 1 (5.3)
  Assault                          9 (36.0)                 4 (21.1)
  Events of undetermined intent    0                        3 (15.8)
Age (years)                                                                           0.32
  1-19                             4 (16.0)                 2 (10.5)
  20-59                            18 (72.0)                11 (57.9)
  60 +                             3 (12.0)                 6 (31.6)
Sex                                                                                   0.07
  Male                             22 (88.0)                12 (63.2)
  Female                           3 (12.0)                 7 (36.8)
Death place**                                                                         0.00
  Home                             0                        7 (36.8)
  Hospital                         6 (25.0)                 3 (15.8)
  Others                           18 (75.0)                9 (47.4)
Rural***                                                                              0.74
  Yes                              7 (31.8)                 5 (26.3)
  No                               15 (68.2)                14 (73.7)

*exposure to fires, venomous animals, force of nature; ** missing, not included (n = 1); *** missing, not included (n = 3)
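
The p values in Table 2 come from Fisher's exact test. As a rough check on the two 2 x 2 comparisons only (sex and rural residence), the sketch below computes a two-sided exact p value from the table margins using the Python standard library. The function name `fisher_exact_two_sided` is illustrative and not taken from the paper, and the software and settings the authors used are not stated, so this is an approximation of their procedure rather than a reproduction of it. The multi-row comparisons (cause of death, age, death place) would need the r x c generalization, which this sketch does not attempt.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    # Range of the top-left cell compatible with the fixed margins.
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Exact integer weights, proportional to the hypergeometric probability
    # of each possible table with these margins.
    weights = [comb(row1, x) * comb(row2, col1 - x)
               for x in range(lowest, highest + 1)]
    observed = weights[a - lowest]
    # Sum the tables that are no more probable than the observed one.
    extreme = sum(w for w in weights if w <= observed)
    return extreme / comb(total, col1)


# Rows: VR injuries, VA-reclassified injuries (counts from Table 2).
p_sex = fisher_exact_two_sided(22, 3, 12, 7)      # columns: male, female
p_rural = fisher_exact_two_sided(7, 15, 5, 14)    # columns: rural yes, rural no
print(f"sex: p = {p_sex:.2f}")      # sex: p = 0.07
print(f"rural: p = {p_rural:.2f}")  # rural: p = 0.74
```

Working with integer weights avoids floating-point ties when deciding which tables are at least as extreme as the observed one; only the final division is done in floating point.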

hospitals as an adjunct to medical certification in the routine mortality information system in Brazil. VA is a method of ascertaining probable causes of death based on an interview with relatives or caregivers, using a questionnaire to elicit information on the signs and symptoms experienced by the deceased before death and on the circumstances preceding the death. It has been used extensively for estimating the cause-of-death structure of a defined population, mainly in countries with Demographic Surveillance Sites or sentinel/sample vital registration systems, and more recently for verifying registered causes of death in a national VR system [14-17]. In Brazil, it has been proposed as a complement to the VR system for all states with high proportions of home deaths [5]. But it is important to note that VA is a limited substitute for proper medical certification [18], and its use in some areas of the country is part of the Brazilian government initiatives to strengthen the VR system [5].

The proportion of ill-defined causes of death in VR in Brazil has been the focus of several studies concerning its characteristics, distribution, and determinants, showing that a higher proportion of ill-defined conditions occurs among children (1-4 years old) and the elderly. The majority of cases occurred at home [19-21], and unattended deaths make up the highest proportion of mortality [21-23], except in capitals [24]. In our study, we found that only 10% of the ill-defined deaths occurred in hospitals. Thus, it is likely that hospital investigation would have a smaller impact in decreasing

Table 3 Proportional mortality due to injuries in the sampled municipalities in vital registration (VR) data before and after investigation, Minas Gerais northeast, Brazil, 2007

Cause of death        VR before investigation    VR after investigation
                      n        %                 n        %
Injuries              25       4.4               47       8.2
Medical causes        345      60.3              458      80.1
Ill-defined causes    202      35.3              67       11.7
Total                 572      100.0             572      100.0
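
Table 3 can be reproduced from the counts reported in the Results. The sketch below does the bookkeeping under one assumption that the table totals imply but the text does not state explicitly: the three medical-record cases that were not injuries were reassigned to medical causes. The variable and function names are illustrative only.

```python
# Counts for the 10 sampled municipalities, 2007.
TOTAL_DEATHS = 572
VR_BEFORE = {"injuries": 25, "medical": 345, "ill_defined": 202}

# Ill-defined deaths resolved from medical records: 6 cases, 3 of them injuries.
# Assumption: the other 3 were assigned to medical causes.
RECORDS_INJURY, RECORDS_MEDICAL = 3, 3
# Verbal autopsy outcomes among the 151 investigated ill-defined deaths (Table 1);
# the 22 that stayed ill-defined are simply not moved.
VA_INJURY, VA_MEDICAL = 19, 110


def reclassify(before):
    """Return cause counts after moving the resolved ill-defined deaths."""
    resolved = RECORDS_INJURY + RECORDS_MEDICAL + VA_INJURY + VA_MEDICAL
    return {
        "injuries": before["injuries"] + RECORDS_INJURY + VA_INJURY,
        "medical": before["medical"] + RECORDS_MEDICAL + VA_MEDICAL,
        "ill_defined": before["ill_defined"] - resolved,
    }


def percent(count, total=TOTAL_DEATHS):
    """Proportional mortality as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


after = reclassify(VR_BEFORE)
for cause, n in after.items():
    print(f"{cause}: {VR_BEFORE[cause]} ({percent(VR_BEFORE[cause])}%) -> "
          f"{n} ({percent(n)}%)")
```

Under these assumptions the ill-defined deaths that were never investigated, together with the 22 that remained ill-defined after VA, account for the 67 ill-defined deaths left in the table.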


Table 4 Proportion of ill-defined conditions, injuries, home deaths, municipalities, and population by municipality size in selected administrative areas of Brazil, 2007

Administrative area and            % Ill-defined     % Injury          % Home deaths     % Municipalities   % Population
municipality size (inhabitants)

Minas Gerais northeast             (n = 5,409)       (n = 5,409)       (n = 5,409)       (n = 63)           (n = 881,340)
  < 50,000                         29.8              8.7               35.9              98.4               85.6
  50,000-99,999                    0.0               0.0               0.0               0.0                0.0
  100,000 +                        9.5               10.6              22.3              1.6                14.4
  Missing*                         2.5               2.5               2.5               0.0                0.0
  Total                            25.8              9.2               32.5              100.0              100.0

Minas Gerais state                 (n = 111,366)     (n = 111,366)     (n = 111,366)     (n = 853)          (n = 19,719,285)
  < 50,000                         14.3              9.2               25.7              92.1               42.3
  50,000-99,999                    12.2              10.0              19.2              4.5                13.6
  100,000 +                        7.5               13.5              16.3              3.4                44.1
  Missing*                         0.2               0.2               0.2               0.0                0.0
  Total                            11.2              11.1              20.9              100.0              100.0

Southeast                          (n = 495,877)     (n = 495,877)     (n = 495,877)     (n = 1,668)        (n = 80,641,101)
  < 50,000                         12.2              9.2               22.0              85.7               22.0
  50,000-99,999                    11.1              10.3              16.9              6.2                9.3
  100,000 +                        6.1               11.4              13.8              8.1                68.7
  Missing*                         0.5               0.5               0.5               0.0                0.0
  Total                            8.0               11.0              15.9              100.0              100.0

Brazil                             (n = 1,047,824)   (n = 1,047,824)   (n = 1,047,824)   (n = 5,564)        (n = 189,335,191)
  < 50,000                         10.6              11.3              30.2              89.5               33.8
  50,000-99,999                    9.0               12.5              23.0              5.6                11.4
  100,000 +                        5.5               13.0              16.0              4.9                54.8
  Missing*                         0.4               0.4               0.4               0.0                0.0
  Total                            7.7               12.5              21.4              100.0              100.0

Source: Ministry of Health (http://www2.datasus.gov.br/DATASUS/index.php; accessed May 2011)
* Record linkage of the population size data bank and the Mortality Information System was not possible.
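
As a worked example of how the northeast Minas Gerais figures might be adjusted, the sketch below applies the injury fraction found among VA-investigated ill-defined deaths (19 of 151, about 12.6%) to the 1,393 ill-defined deaths and 499 injury deaths that the Discussion reports for the region's 5,409 registered deaths in 2007. It assumes the unrounded fraction 19/151 and rounding to the nearest whole death, which reproduces the 175 reclassified deaths quoted in the Discussion; a rounded 12.6% would give 176 instead. The function name is hypothetical.

```python
def adjust_injury_fraction(all_deaths, ill_defined, injuries, va_injury, va_total):
    """Reassign a VA-derived share of ill-defined deaths to injury."""
    # Assumption: use the unrounded VA fraction and round to whole deaths.
    reclassified = round(ill_defined * va_injury / va_total)
    adjusted = injuries + reclassified
    return {
        "reclassified": reclassified,
        "adjusted_injuries": adjusted,
        "fraction_before_pct": round(100 * injuries / all_deaths, 1),
        "fraction_after_pct": round(100 * adjusted / all_deaths, 1),
    }


# Northeast Minas Gerais, 2007: all deaths, ill-defined deaths, injury deaths,
# then the VA result from the sampled municipalities (19 injuries of 151).
result = adjust_injury_fraction(5409, 1393, 499, 19, 151)
print(result)
```

The before and after fractions correspond to the 9.2% shown in the Table 4 total row for northeast Minas Gerais and the 12.5% adjusted value discussed in the text.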

the proportion of ill-defined diseases, as suggested by a study in São Paulo [22], and that household investigation using the VA method would be the best option in this case. As expected, in-hospital deaths were negatively associated with ill-defined conditions [24], even after controlling for socio-demographic characteristics and chronic diseases [25]. In our study, when the municipality was adopted as the unit of analysis in an ecological approach, we found that the proportions of ill-defined conditions and home deaths were inversely related to the size of the municipality; i.e., municipalities with fewer than 50,000 inhabitants have higher proportions of home deaths and ill-defined causes of death, as shown in Table 4. These findings are probably related to better access to hospital treatment in larger cities.

Recent evidence from an ecological study demonstrated a statistically significant negative association between Family Health Program (FHP) coverage levels and the under-5 mortality rate due to ill-defined causes in Brazil after controlling for socio-demographic characteristics (fertility rate, percentage of illiterates, and per capita income) and population size [26]. This program is a primary health care strategy adopted in 1994 by the Brazilian Ministry of Health. It operates through a multiprofessional team that covers a defined geographical area of about 3,500 inhabitants, with active home visits by community health workers and, when


necessary, by physicians as well [26,27]. In 2008 the FHP was present in 94% of Brazilian municipalities [26]. In accordance with this high estimate, the majority of deceased persons in our study had FHP coverage, so they had access to health facilities. It is possible that the FHP has less impact on the quality of VR in low-income municipalities [26] owing to the lack of qualified human resources, which leads to weaknesses in identifying the cause of death and errors in the procedures of death notification to the MIS at the municipal level [7]. Thus, administrative errors in passing information from the local level to the appropriate level are a major possibility, as we also found some nonrecorded death certificates in registry offices with injury as a cause of death. In addition, we found that some registered ill-defined causes of death had defined causes after hospital investigation.

Although the investigation of ill-defined causes of death considering hospitals, forensic institutes, and home interviews has been advocated as a simple methodology of information recovery [2], we should also point out the scarcity of systematic studies in small communities in the country. In poor-resource areas, where training of health professionals may be a challenge, we understand that all possible sources of medical/health information about the disease or death should be considered. In our investigation, this information was reviewed before conducting VA interviews with the family members. Despite relatively good access to health care in the region, our findings indicate a possible weak link between health care activities and the MIS, and they highlight the importance of standardizing the data collection process at the municipal level to improve the quality of the VR system.

In Brazil, several studies have been undertaken to validate the medical certification of causes of death [20]. However, very few studies have investigated the distribution of reclassified causes of death among ill-defined conditions. In our study, as indicated above, we found that a high proportion of ill-defined deaths in small municipalities were due to injuries (12.6%), and we think that this finding could well be related to poor quality in filling out death certificates and/or problems with the data flow at the municipal level. A previous field study in 13 heterogeneous municipalities in the states of Sergipe, Mato Grosso, and São Paulo and other research in the state of Rio de Janeiro using data from hospital admission forms have also found injury among ill-defined conditions [2,3].

Although we cannot be certain that our results can be generalized to the whole northeast region of Minas Gerais, we understand that we are able to suggest a comparison of the VR data taking into account the estimate of the proportion of ill-defined causes reclassified as due to injury. Thus, if we apply the 12.6% adjustment factor to VR data in this region, out of the 1,393 ill-defined causes before adjusting, 175 cases should be due to injury. After adjusting, we should then have 674 injury deaths (the pre-adjustment number of injuries was 499), and the adjusted injury fraction should be 12.5%. That is to say, the injury fraction in VR data was 9.2% before adjusting (Table 4), compared with 12.5% after adjusting. However, some characteristics of the losses could affect these results. Household interviews were less successful for deceased residents of rural areas and for those without FHP coverage, and there was a higher proportion of males among nonparticipants in the study sample. But we believe that the losses are unlikely to significantly affect the study findings.

In general, injury is not considered in the redistribution procedures for ill-defined conditions, as more complete registration of deaths from this cause is expected [28,29]. Thus, we hope that our findings help to clarify the importance of using empirical studies to adjust VR data and derive “best” estimates of injuries in areas with data quality problems. Our results may have important implications for policy planning, as injury is considered a public health concern in urban areas of high population density, and there is some evidence that violence has been spreading to the interior of the country [9].

Data from other countries also indicate that this problem is not unique to Brazil. Studies from Oklahoma and Boston in the United States have detected inaccuracies in death certificates that may underestimate the contribution of injury to elderly mortality [30,31]. A cross-sectional study in Iran of six leading ill-defined causes of death also found that 10% and 5% of the deaths were due to injuries in the age groups 15-69 years and 70 years and over, respectively. According to the authors, a possible explanation for these findings might be that injury deaths of undetermined intent pending investigation by the Iranian Legal Medicine Organization were coded as ill-defined conditions [32]. In Brazil, the cause of death is certified by a coroner if there are suspicious circumstances surrounding the event. It is possible that in small municipalities a medical examiner is less often available and the cause of death is declared as undetermined by a physician or even by the civil registrar, in which case the cause of death is usually not recorded.

In this study, although our eligible sample of 202 ill-defined deaths was based on a sampling frame for the whole year of 2007, the population source available in April 2008 when we started the field research (n = 1,124 ill-defined causes) did not include the total number of ill-defined deaths registered in 2007 in the


Northeast region of Minas Gerais, due to changes in the reporting deadline for the final data to the health department. The final numbers (n = 1,393 ill-defined deaths, all deaths = 5,409) became available from the Ministry of Health only in March 2009. For practical and logistic reasons, it was not possible to consider in our research the additional ill-defined deaths (n = 269). However, we believe that this constraint did not introduce selection bias, as the proportional cause-of-death distribution was very similar in both groups (data not presented).

Although the proportions of injuries in the state of Minas Gerais, the southeast region, and Brazil as a whole were relatively similar, the proportion in northeastern Minas Gerais was lower. This area has a higher proportion of ill-defined causes of death, and it seems that the fraction of injuries is inversely related to the proportion of ill-defined causes of death (as seen in Table 4), probably due to better registration of injuries on VR. On the other hand, the proportion of injuries in our study sample (4.4%) is lower than in the entire northeast region of Minas Gerais (9.2%). A possible explanation for this observation is misclassification due to higher fractions of ill-defined causes in the sampled municipalities.

This study also has some other limitations concerning the time period between death and VA interview. VA interviews were conducted 13 months after the death, on average. Although recall bias is possible, we consider that the effect of the recall period is less important for injury causes, as misclassification has been observed less often among injuries [33,34]. Information about the disease or death (health worker information and/or hospital visits) was available for almost half of the cases (n = 72), and this was used to complement VA data, thus helping the medical professional to assign the underlying cause of death. Although this proportion is similar for injury, in only four cases was it related to violence or accidents, and in only one case would the final diagnosis not have been injury if based on the VA data alone. Thus, the application of standardized VA procedures has significant potential for improving knowledge of causes of death in small municipalities and should be conducted in such settings to complement the VR. It is also important to note that active searching for additional documented information could reveal weaknesses in the collection and processing of mortality data by the MIS at the municipal level.

In conclusion, the findings of this study suggest that the use of VA procedures for diagnosing causes of death among recorded ill-defined conditions might provide information on the relevance of injury as a priority health problem in small municipalities in Brazil. VA can be conducted to correct for the large proportion of ill-defined deaths in the vital registration system and estimate the additional burden of violence and accidents. Therefore, for these areas it might be more appropriate to apply the defined cause-specific fractions from VA when dealing with ill-defined deaths. Beyond a better understanding of causes of death, the use of VA in these areas can also help to check the quality of the existing VR system. Thus, VA is an important tool in the investigation of ill-defined conditions in existing VR systems with data quality problems. It could represent a feasible midterm strategy for improving mortality information when the proportion of ill-defined causes is substantial, such as in small Brazilian municipalities, until the MIS reaches adequate levels of quality in the definition of the underlying cause of death [7]. On the other hand, our results emphasize the need to investigate other specific populations to better understand the true causes of these deaths. Local research with VA should be brought to the attention of regional health policymakers to improve the quality of data for their planning.

Acknowledgements
This work was sponsored by the Foundation for Research Support of Minas Gerais State (FAPEMIG) EDT-3292/06 and had partial support of the Health Department of the State of Minas Gerais. D. Campos was sponsored by the Brazilian Coordination of Improvement of Professional Higher Education (CAPES). E. França and M.D. Guimarães are fellows of the Brazilian Council for Scientific and Technological Development (CNPq).

Author details
1 Graduate Program in Public Health and Research Group in Epidemiology and Health Evaluation, Faculty of Medicine, Federal University of Minas Gerais. Av. Alfredo Balena, 190. Belo Horizonte, 30130-100, Brazil.
2 Research Group in Epidemiology and Health Evaluation, Faculty of Medicine, Federal University of Minas Gerais, Brazil. Av. Alfredo Balena, 190. Belo Horizonte, 30130-100, Brazil.
3 Department of Social Medicine and Graduate Program in Public Health, Faculty of Medicine, Federal University of Minas Gerais; Research Group in Epidemiology and Health Evaluation, Faculty of Medicine, Federal University of Minas Gerais, Av. Alfredo Balena, 190. Belo Horizonte, 30130-100, Brazil.
4 Area of Health Surveillance and Disease Prevention and Control, Pan American Health Organization. 525 Twenty-third St. N.W., 20037-2895, Washington, DC.

Authors’ contributions
EF, DC, and MFMS participated in the conception and design of the study. EF and DC performed the analysis of the data. EF, DC, and MDCG participated in the discussion and interpretation of the results. EF wrote the first draft of the manuscript, and MDCG reviewed the manuscript. All authors read, contributed to, and approved the final manuscript.

Conflicts of interest
The authors declare that they have no competing interests.

Received: 16 March 2011 Accepted: 4 August 2011 Published: 4 August 2011

References
1. Mathers CD, Fat DM, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bulletin of the World Health Organization 2005, 83:171-77.
2. Mello-Jorge MHP, Gotlieb SL, Laurenti R: [The national mortality information system: problems and proposals for solving them. I - Deaths due to natural causes]. Revista Brasileira de Epidemiologia 2002, 5(2):197-211.
3. Teixeira CL, Klein CH, Bloch KV, Coeli CM: [Probable cause of death after reclassification of ill-defined causes on hospital admissions forms in the Unified National Health System, Rio de Janeiro, Brazil]. Cadernos de Saúde Pública 2006, 22(6):1315-1324.
4. Murray CJL, Lopez AD: Estimating causes of death: new methods and global and regional applications for 1990. In The Global Burden of Disease. Boston: Harvard School of Public Health; 1996, 118-200.
5. Ministério da Saúde: Manual para investigação do óbito com causa mal definida. Brasília; 2009 [http://svs.aids.gov.br/download/manuais/manual_obito_mal_definida.pdf].
6. RIPSA. Rede Interagencial de Informações para a Saúde: IDB 2009. Brasil. Brasília; 2009 [http://tabnet.datasus.gov.br/cgi/idb2009/matriz.htm].
7. Campos D, França E, Loshi RH, Souza MFM: Uso da autópsia verbal na investigação de óbitos com causa mal definida em Minas Gerais, Brasil. Cadernos de Saúde Pública 2010, 26(6):1221-1233.
8. Rosa JAR, Garbin T: Redução das taxas de mortalidade por causas mal definidas em Bento Gonçalves. In Anais da 3ª. Expoepi - Mostra Nacional de Experiências Bem Sucedidas em Epidemiologia, Prevenção e Controle de Doenças. Edited by: Ministério da Saúde. 2003, 123-127.
9. Souza ER, Lima MLC: The panorama of urban violence in Brazil and its capitals. Cien Saude Colet 2006, 11(2):363-373.
10. França E, Abreu D, Campos D, Rausch MC: Avaliação da qualidade da informação sobre mortalidade infantil em Minas Gerais, em 2000-2002: utilização de uma metodologia simplificada. Revista Médica de Minas Gerais 2006, 16:S28-35.
11. World Health Organization: Verbal autopsy standards: ascertaining and attributing cause of death. Geneva; 2007.
12. Ministério da Saúde. Informações de Saúde. Estatísticas Vitais. [http://www2.datasus.gov.br/DATASUS/index.php].
13. Barreto ICHC, Pontes LK, Correa L: Vigilância de óbitos infantis em sistemas locais de saúde: avaliação da autópsia verbal e das informações de agentes de saúde. Rev Panam Salud Publica 2000, 7(5):303-312.
14. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, Jakob R, Kahn K, Kunii O, Lopez AD, Murray CJL, Nahlen B, Rao C, Sankoh O, Setel PW, Shibuya K, Soleman N, Wright L, Yang G: Setting international standards for verbal autopsy. Bulletin of the World Health Organization 2007, 85(8):570-71.
15. Whiting DR, Setel PW, Chandramohan D, Wolfson LJ, Hemed Y, Lopez AD: Estimating cause-specific mortality from community- and facility-based data sources in the United Republic of Tanzania: options and implications for mortality burden estimates. Bulletin of the World Health Organization 2006, 84(12):940-948.
16. Adjuik M, Smith T, Clark S, Todd J, Garrib A, Kinfu Y, Kahn K, Mola M, Ashraf A, Masanja H, Adazu U, Sacarla J, Alam N, Marra A, Gbangou A, Mwageni E, Binka F: Cause-specific mortality rates in sub-Saharan Africa and Bangladesh. Bulletin of the World Health Organization 2006, 84(12):181-188.
17. Rao C, Porapakkham Y, Pattaraarchachai J, Polpraser W, Swampunyalert N, Lopez AD: Verifying causes of death in Thailand: rationale and methods for empirical investigation. Population Health Metrics 2010, 8:1-13.
18. Setel PW, Sankoh O, Rao C, Velkoff VA, Mathers C, Gonghuan Y, Hemed Y, Jha P, Lopez AD: Sample registration of vital events with verbal autopsy: a renewed commitment to measuring and monitoring vital statistics. Bulletin of the World Health Organization 2005, 83(8):611-17.
19. Mathias TAF, Mello Jorge MHP, Laurenti R, Aidar T: Considerações sobre a qualidade de informações de mortalidade na população idosa residente no município de Maringá, Estado do Paraná, Brasil, no período de 1979 a 1998. Epidemiologia e Serviços de Saúde 2005, 14(3):159-169.
20. França EB, Abreu DMX, Rao C, Lopez AD: Evaluation of cause-of-death statistics for Brazil, 2002-2004. Int J Epidemiol 2008, 37:891-901.
21. Mello Jorge MHP, Laurenti R, Lima-Costa MF, Gotlieb SLD, Chiavegatto Filho ADP: A mortalidade de idosos no Brasil: a questão das causas mal definidas. Epidemiologia e Serviços de Saúde 2008, 17(4):271-281.
22. Rozman MA, Eluf-Neto J: Necropsia e mortalidade por causa mal definida no Estado de São Paulo, Brasil. Rev Panam Salud Publica 2006, 20(5):307-313.
23. Santo AH: Causas mal definidas de morte e óbitos sem assistência. Rev Assoc Med Bras 2008, 54(1):23-8.
24. Abreu DMX, Sakurai E, Campos LN: A evolução da mortalidade por causas mal definidas na população idosa em quatro capitais brasileiras, 1996-2007. Revista Bras Estudos Popul 2010, 27(1):75-88.
25. Lima-Costa MF, Matos DL, Laurenti R, Mello-Jorge MHP, Cesar CC: Time trends and predictors of mortality from ill-defined causes in old age: 9 year follow-up of the Bambuí cohort study (Brazil). Cadernos de Saúde Pública 2010, 26(3):514-522.
26. Rasella D, Aquino R, Barreto ML: Impact of the Family Health Program on the quality of vital information and reduction of child unattended deaths in Brazil: an ecological longitudinal study. BMC Public Health 2010, 10:380.
27. Ministério da Saúde: Programa de Saúde da Família [Family Health Program]. Revista de Saúde Pública 2000, 34(3):316-319.
28. Paes NA, Gouveia JF: [Recovery of the main causes of death in the Northeast of Brazil: impact on life expectancy]. Revista de Saúde Pública 2010, 44(2):301-9.
29. Gamarras CJ, Valente JG, Silva CA: Correction for reported cervical cancer mortality data in Brazil, 1996-2005. Revista de Saúde Pública 2010, 44(4):629-38.
30. Rodriguez SR, Mallonee S, Archer P, Gofton J: Evaluation of death-certificate based surveillance for traumatic brain injury, Oklahoma 2002. Public Health Report 2006, 121(3):382-9.
31. Betz ME, Kelly SP, Fisher J: Death certificate inaccuracy and underreporting of injury in elderly people. J Am Geriatr Soc 2008, 56(12):2267-72.
32. Khosravi A, Rao C, Naghavi M, Taylor R, Jafar N, Lopez AD: Impact of misclassification on measures of cardiovascular disease mortality in the Islamic Republic of Iran: a cross-sectional study. Bulletin of the World Health Organization 2008, 86(9):688-96.
33. Chandramohan D, Setel P, Quigley M: Effect of misclassification of causes of death in verbal autopsy: can it be adjusted? Int J Epidemiol 2001, 30:509-14.
34. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson L, Alberti KGMM, Lopez AD: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Tropical Medicine and International Health 2006, 2(5):688-703.

doi:10.1186/1478-7954-9-39
Cite this article as: França et al.: Use of verbal autopsy in a national health information system: Effects of the investigation of ill-defined causes of death on proportional mortality due to injury in small municipalities in Brazil. Population Health Metrics 2011 9:39.



Mudenda et al. Population Health Metrics 2011, 9:40 http://www.pophealthmetrics.com/content/9/1/40

RESEARCH

Open Access

Feasibility of using a World Health Organization-standard methodology for Sample Vital Registration with Verbal Autopsy (SAVVY) to report leading causes of death in Zambia: results of a pilot in four provinces, 2010

Sheila S Mudenda1†, Stanley Kamocha2†, Robert Mswia4†, Martha Conkling3, Palver Sikanyiti1, Dara Potter2, William C Mayaka1 and Melissa A Marx2*

Abstract

Background: Verbal autopsy (VA) can be used to describe leading causes of death in countries like Zambia where vital events registration does not produce usable data. The objectives of this study were to assess the feasibility of using verbal autopsy to determine age-, sex-, and cause-specific mortality in a community-based setting in Zambia and to estimate overall age-, sex-, and cause-specific mortality in the four provinces sampled.

Methods: A dedicated census was conducted in January 2010 in regions of four provinces chosen by cluster-sampling methods. Deaths in the 12-month period prior to the census were identified during the census. Subsequently, trained field staff conducted verbal autopsy interviews with caregivers or close relatives of the deceased using structured and unstructured questionnaires. Additional deaths were identified and respondents were interviewed during 12 months of fieldwork. After the interviews, two physicians independently reviewed each VA questionnaire to determine a probable cause of death.

Results: Among the four provinces assessed (1,056 total deaths), the all-cause mortality rate was 17.2 per 1,000 person-years (95% confidence interval [CI]: 12.4, 22). The seven leading causes of death were HIV/AIDS (287, 27%), malaria (111, 10%), injuries and accidents (81, 8%), diseases of the circulatory system (75, 7%), malnutrition (58, 6%), pneumonia (56, 5%), and tuberculosis (50, 5%). Those who died were more likely to be male, to have a primary education or less, and to be unmarried, widowed, or divorced compared to the baseline population. Nearly half (49%) of all reported deaths occurred at home.

Conclusions: The all-cause mortality rate of 17.2 per 1,000 is somewhat similar to modeled country estimates. The leading causes of death (HIV/AIDS, malaria, injuries, circulatory diseases, and malnutrition) were similar to those reported for the African region and by other countries in the region. Results can enable the targeting of interventions by region, disease, and population to reduce preventable death. Collecting vital statistics using standardized Sample Vital Registration with Verbal Autopsy (SAVVY) methods appears feasible in Zambia. If conducted regularly, these data can be used to evaluate trends in estimated causes of death over time.

Keywords: cause of death, cause-specific mortality, mortality, verbal autopsy, Zambia

* Correspondence: marxm@zm.cdc.gov † Contributed equally 2 Global AIDS Program, Centers for Disease Control and Prevention, Government of the United States of America, Lusaka, Zambia Full list of author information is available at the end of the article © 2011 Mudenda et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Botswana. The country is divided into nine provinces, within which there are 72 districts. Districts are further stratified into Census Supervisory Areas (CSAs) [20]. With the exception of Lusaka and Copperbelt provinces, Zambia is predominantly rural; an estimated 61% of the population resides in rural areas [21]. Information on mortality is collected by health facilities throughout Zambia. However, this system fails to collect data on home deaths, which are thought to represent a substantial proportion of deaths in the country [12]. Therefore, in spite of having the necessary regulatory framework that supports the maintenance of a vital statistics system in Zambia, the current system does not generate usable vital statistics. Mortality estimates have not been sufficiently reliable for setting health sector priorities or for assessing program progress and impact. The goal of this study was to pilot implementation of a standardized process for collecting vital events data in Zambia. The objectives were to determine the feasibility of using SAVVY for this purpose and to estimate age-, sex-, and cause-specific mortality fractions for four pilot provinces in Zambia for a two-year period in 2009 and 2010.

Background

Mortality is one of the most important indicators for measuring the health of the population in a country. But population-based causes of death have not been well described in many developing countries. Vital registration requires robust and systematic data collection, which is often difficult in these settings. Data on causes of death in most developing countries are incomplete and of poor quality, partly because most deaths are not attended by physicians or medically certified [1-3]. Only 12% of countries worldwide have high-quality mortality data from vital events registration, while 75 countries do not have any information on cause-specific mortality [4]. Less than a third of all deaths worldwide have causes that are medically certified [5]. Because of this, verbal autopsy (VA) methods are increasingly being used to ascertain the rate and leading causes of death, particularly in less-developed countries [6,7]. In Zambia, vital events registration is not robust or systematic and has failed to report on home deaths. Verbal autopsy methodology could become an important way to collect and report mortality data in Zambia [8-11]. Sample Vital Registration with Verbal Autopsy (SAVVY) is one method used to collect vital events data in regions where vital events registration is poor [12]. During a verbal autopsy, an interviewer trained in verbal autopsy methods asks the next of kin or caregiver open and structured questions about symptoms of the illness and events leading to the death. Specific symptoms reported are used to code causes of death using the 10th revision of the International Classification of Diseases (ICD-10) [13]. Currently, the most common way to code symptoms into causes of death is by physician review. Increasingly, however, computer-generated algorithms are being tested and validated as a way to code causes of death without personnel costs [11,14,15].
Standard World Health Organization (WHO)-recommended procedures suggest that cause of death be determined by administering verbal autopsy interviews using standard questionnaires after a baseline survey is conducted to identify deaths in a certain discrete period [16]. Other countries in the Southern African region have conducted VA studies using slight variations on the recommended WHO methodology. For example, Mozambique implemented a post-census mortality survey using VA methods [17], while other VA studies conducted in the region have focused on smaller communities [18,19]. Zambia has a population estimated at 13 million and is located in sub-Saharan Africa, bordering eight countries: Namibia, Angola, the Democratic Republic of the Congo, Tanzania, Mozambique, Malawi, Zimbabwe, and

Methods

Sampling

Data for SAVVY were collected by the Government of the Republic of Zambia’s Central Statistical Office (CSO). SAVVY was implemented in four provinces from January to December 2010. We used the 2000 Zambia Census of Population and Housing data [20] as the sampling frame and selected a stratified one-stage random sample. In order to increase the efficiency of the sample design, the sampling frame of CSAs was divided into urban and rural strata that were as homogeneous as possible. This pilot phase of SAVVY was conducted in 33 of 10,869 CSAs in Central, Luapula, Lusaka, and Southern provinces. The 33 CSAs were selected to represent different population densities and socioeconomic characteristics, and to present various potential logistical challenges.

Data collection

A baseline census was conducted in selected CSAs in January 2010 to count and describe the populations of the areas selected. VA interviews were conducted for all deaths reported to have occurred in the 12 months preceding the baseline census as well as for all deaths that occurred between January and December 2010. We carried out quarterly independent re-enumeration of populations to verify resident populations and death registration


adding the number of individuals who were recorded as deceased in the 12 months preceding the survey to the total population in sampled areas from the census so that we had a complete census for the years we were recording deaths.
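The denominator adjustment described above can be sketched in a few lines. This is a minimal illustration, not the authors' Stata code: the function name, the assumption that the adjusted population is held constant over the observation years, and the input numbers are all hypothetical.

```python
def crude_mortality_rate_per_1000(deaths, census_population,
                                  deaths_before_census, years=1.0):
    """Crude all-cause mortality per 1,000 person-years (illustrative sketch).

    Assumption: people who died in the 12 months before the census are added
    back to the census count, and that adjusted population is treated as
    constant over the observation period.
    """
    if census_population < 0 or deaths_before_census < 0 or years <= 0:
        raise ValueError("inputs must be non-negative and years must be positive")
    adjusted_population = census_population + deaths_before_census
    if adjusted_population == 0:
        raise ValueError("adjusted population must be greater than zero")
    person_years = adjusted_population * years
    return 1000.0 * deaths / person_years


# Hypothetical inputs, chosen only to show the arithmetic.
rate = crude_mortality_rate_per_1000(
    deaths=110, census_population=5000, deaths_before_census=60, years=2
)
print(f"{rate:.1f} per 1,000 person-years")
```

Adding the earlier deaths back matters because the census, taken after those deaths, would otherwise understate the population at risk during the first year of observation.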

completeness as a quality measure [12]. Zambia used the standard and recommended WHO [22] VA questionnaires with slight adaptations to reflect the Zambian context (e.g., inclusion of a question on type of health facilities) to record neonatal, child, and adult deaths and their causes. Characteristics collected included sex, age at death, marital status, education, place of death, health services utilization, rural or urban residence, and province. Nurses, other medical personnel, and, in some cases, teachers were trained as VA interviewers. VA interviewers were assisted in the field by key informants whose duty was to inform the VA interviewers of every death that occurred in the CSAs in which they worked. Key informants, mostly community health workers and traditional birth attendants, were chosen from the CSAs in which SAVVY was implemented. MEASURE Evaluation’s SAVVY methods are also described elsewhere [12].

Ethics

The protocol was approved by the Research Ethics Committee at the University of Zambia and the Zambian Ministry of Health. The Centers for Disease Control and Prevention (CDC) Institutional Review Board approved the evaluation as nonresearch.

Results

Among the four provinces assessed, a total of 1,107 deaths were identified as having occurred during 2009 and 2010. During visits to the households to conduct VA interviews, 51 of these deaths were determined to have occurred prior to 2009 and were excluded from analysis. All subsequent summaries and analyses reflect the remaining 1,056 deaths. A close adult relative (mother, father, sibling, or spouse) participated in the VA interview for 687 (65%) reported deaths, a child of the deceased participated for 96 (9%) deaths, other relatives participated for 259 (25%) deaths, and nonrelatives participated for 14 (1%) deaths. Of the total 1,056 deaths, 1,006 (95%) respondents reported that they had lived with the deceased in the period leading to death. An eligible respondent agreed to participate in the study for each death; there were no refusals. The crude all-cause mortality rate was 17.2 per 1,000 (95% CI: 12.4, 22.0). More deaths occurred among males (584, 55%) than females (472, 45%), although slightly more females (51%) than males (49%) were reported in the dedicated census. The number of deaths in children under 5 years of age (365, 34%) was disproportionately high relative to this group’s share of the population (15%). Among the 70 (7%) neonatal (0 to 27 days) deaths, 40 (57%) were male and 30 (43%) were female. People who had a primary or no education (371, 57%) also contributed a disproportionately higher number of deaths than the population they represented (7,103, 41%). Marriage was as common in those who died as in the baseline census (approximately 50%). But those who had died were three to six times as likely to be widowed (115, 18%) or divorced (67, 10%) compared to the baseline population (5% and 3% respectively; Table 1). Of all reported deaths, 518 (49%) occurred at home. The place of death varied considerably among the provinces.
In rural Luapula Province, approximately 64% (198) of reported deaths occurred at home, while in urban Lusaka, 37% (127) of deaths occurred at home (Figure 1). The majority (819, 77%) of the deceased sought some form of medical treatment in the period

Coding

Nine physicians were trained on VA questionnaire review, how to produce a death certificate, and how to assign an immediate and underlying cause of death based on ICD-10 guidelines and coding principles. After the VA interviews were conducted, two physicians independently reviewed each VA questionnaire to determine a probable cause of death. They each completed a death certificate for the VA death and assigned an ICD-10 code. The death certificates and ICD-10 codes completed by the two physicians were then compared. If they agreed, the cause of death assigned was considered final. If they disagreed, they reviewed the VA questionnaire together to reach an agreement. If they failed to reach consensus on the underlying cause, the cause of death for that particular VA death was considered undetermined.

Data analysis

SAVVY data for the four provinces were analyzed using Stata v11 (StataCorp LP, College Station, Texas). Characteristics of the population are presented along with mortality rates and cause-specific mortality fractions (CSMFs) by age and sex for the leading causes of death. A CSMF is the number of deaths due to a specific cause divided by the total number of deaths. For CSMF estimates, we aggregated and tabulated causes of death based on ICD-10 classification according to the WHO Tabulation List [13] adapted for Zambia. Additionally, we explored the type and place of services sought for medical care by those who died in the period leading to death (generally in the three months prior to death). We used Pearson’s chi-square tests to compare characteristics of those who died to the baseline census population and selected CSMFs between selected groups. We calculated the denominator for all-cause mortality by


of the liver, other digestive system disorders, mental disorders, stomach and other digestive system disorders) represented approximately 8% (75) of the total causes of death while 5% (48) had causes that were undetermined (Figure 3). Cause-specific mortality fractions varied by disease and sex. Males died most often of HIV/AIDS, and most (125, 82%) HIV deaths among males occurred in those over 15 years of age. Malaria, injuries (transport related, drowning, falls, exposure to smoke, fire and flames, accidental poisoning, and assault), and diseases of the circulatory system were the other leading causes of death for males. Men died of injuries more often than women did (10% vs. 4%, p < 0.01; Figure 3). The leading causes of death for females were HIV/AIDS, malaria, diseases of the circulatory system, and malnutrition (Figure 3). Most (105, 78%) females who died of HIV/AIDS were over 15 years of age. Approximately 93% of the deaths in females due to malnutrition occurred in children and young teens between 4 weeks and 14 years of age. Meningitis accounted for a significantly greater share of deaths in girls aged 4 weeks to 14 years than in boys in the same age group (7% vs. 2%, p < 0.05). Maternal conditions accounted for 13 (3%) deaths among females. Although the difference was not statistically significant, stillbirths were more common among males than females (1.5% vs. 0.6%, p < 0.25).
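As a worked illustration of the cause-specific mortality fraction (deaths from one cause divided by all deaths), the sketch below uses the leading-cause counts and the total of 1,056 deaths reported in the Results. The function name is hypothetical, and the fractions it computes may differ by a percentage point from the rounded figures printed in the text, whose rounding rules are not stated.

```python
def cause_specific_mortality_fractions(cause_counts, total_deaths):
    """Return {cause: deaths from that cause / total deaths}."""
    if total_deaths <= 0:
        raise ValueError("total_deaths must be positive")
    if sum(cause_counts.values()) > total_deaths:
        raise ValueError("cause counts exceed total deaths")
    return {cause: n / total_deaths for cause, n in cause_counts.items()}


# Counts for the leading causes as reported in the Results (all ages, both sexes).
leading_causes = {
    "HIV/AIDS": 287,
    "Malaria": 111,
    "Injuries and accidents": 81,
    "Diseases of the circulatory system": 75,
    "Malnutrition": 58,
    "Pneumonia": 56,
    "Tuberculosis": 50,
}

csmf = cause_specific_mortality_fractions(leading_causes, total_deaths=1056)

# Print causes from largest to smallest fraction.
for cause, fraction in sorted(csmf.items(), key=lambda item: item[1], reverse=True):
    print(f"{cause}: {fraction:.1%}")
```

Because every fraction shares the same denominator, the fractions over all causes (including undetermined ones) sum to 1, which makes CSMFs convenient for comparing the relative burden of causes across groups of different sizes.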

Table 1 Socio-demographics of the deceased identified by SAVVY interview, and of the baseline population from the dedicated census

Characteristic                        Deceased (N = 1056)    Baseline Census (N = 30,315)
                                      n (%)                  n (%)
Male sex                              584 (55.3)             14,851 (49.0)
Age group
  0-4 yrs                             365 (34.5)             4566 (15.1)
  5-14                                52 (4.9)               8400 (27.7)
  15-49                               430 (40.7)             15,542 (51.3)
  50-64                               95 (9.0)               1295 (4.3)
  65+                                 114 (10.8)             512 (1.7)
Highest education level reported 1
  None                                82 (12.7)              556 (3.2)
  Primary                             289 (44.7)             6547 (37.7)
  Secondary                           195 (30.2)             7860 (45.3)
  Higher                              38 (5.9)               1954 (11.3)
  Unknown                             42 (6.5)               432 (2.5)
Marital status 1
  Never married                       119 (18.4)             6853 (39.5)
  Married/Living with partner         322 (49.9)             8763 (50.6)
  Widowed                             115 (17.8)             928 (5.4)
  Divorced                            67 (10.4)              588 (3.4)
  Separated                           19 (2.9)               198 (1.1)
  Unknown                             4 (0.6)                19 (0.1)
Province
  Central                             180 (17.1)             3310 (10.9)
  Luapula                             312 (29.6)             1210 (4.0)
  Lusaka                              348 (32.9)             14,268 (47.1)
  Southern                            216 (20.4)             11,527 (38.0)

Discussion

All-cause crude mortality was 17.2 per 1,000 person-years in the four provinces studied from 2009 to 2010; nearly half (49%) of deaths occurred at home. Overall leading causes of death were HIV/AIDS, malaria, and circulatory diseases, generally reflecting the order of sex-specific causes. Sex-specific differences in the order included injuries, the third leading cause of death for men, and malnutrition, the fourth leading cause of death for women. These leading causes of death were similar to those previously reported in the region [18,23,24]. In a worldwide summary of mortality published in 2009, leading causes of adult mortality in the African region were derived largely from population-based studies, and included HIV/AIDS (35%), other infectious causes (including malaria), and injuries (for men) [22]. Separate SAVVY-based studies in Tanzania and Kenya showed that the majority of non-infant deaths were attributable to HIV/AIDS, tuberculosis, and malaria, and approximately 6% were attributable to cardiovascular disease (CVD). With modification based on lessons learned from this pilot, new technologies being developed, and a gradually increasing governmental commitment to fund collection of vital statistics data, it is feasible that vital registration

1 Among adults and teenagers only; Deceased: N = 646, Population: N = 17,349.
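The comparison of the deceased with the baseline census population by Pearson’s chi-square test can be illustrated with the sex distribution in Table 1. The sketch below is an assumption-laden example rather than the authors' Stata analysis: it treats the two columns as independent groups, applies no continuity correction, and uses a hypothetical helper function. The p-value relies on the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g., deceased, baseline); columns are categories
    (e.g., male, female). Returns (statistic, p_value) for 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if min(row_totals + col_totals) == 0:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Table 1: males and females among the deceased and in the baseline census.
deceased_male, deceased_total = 584, 1056
baseline_male, baseline_total = 14851, 30315

chi2, p = pearson_chi_square_2x2(
    deceased_male, deceased_total - deceased_male,
    baseline_male, baseline_total - baseline_male,
)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

The same function could be applied to any other two-category comparison in Table 1 by collapsing a characteristic into "in category" versus "not in category" counts.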

before death and many sought care at more than one facility (Figure 2). Most people (> 80%) who sought care went to a government clinic prior to death, and more than 53% sought treatment at a government hospital. Only 30% of the deceased in rural Luapula, but approximately 68% in rural Southern Province, were reported to have sought treatment at a government hospital. Overall, more than 46% of the deceased received home-based care. The leading causes of death were HIV/AIDS (287, 27%), malaria (111, 10%), injuries and accidents (81, 8%), diseases of the circulatory system (75, 7%), malnutrition (58, 6%), pneumonia (56, 5%), and tuberculosis (50, 5%) (Figure 3). The remaining deaths were caused by perinatal and neonatal causes, diarrheal diseases, cancers, meningitis, maternal conditions, measles, stillbirth, other anemias, diabetes mellitus, and mental and behavioral disorders due to substance use and other causes. Combined, other specified conditions (including the remainder of infectious and parasitic diseases, diseases


Figure 1 Map of Zambia showing the percent of deaths reported to have occurred at home in 2009 and 2010 in each of the four pilot provinces. Map labels: Luapula 63.5%, Central 51.7%, Lusaka 36.5%, Southern 49.1%.

scheduled national censuses. Using standardized approaches to verbal autopsy and SAVVY-based vital registration could also improve the ability to compare results across countries and regions [16]. Zambia employed physicians to code interviews into ICD-10 causes of death (physician-certified verbal autopsy [PCVA]). Two physicians coded each interview and, if their codes differed, they discussed the case to agree on a final code. Physicians were unable to consider medical records in coding deaths, as recommended by WHO, because most families of those who had died did not keep these records. Despite this limitation, we viewed PCVA and duplicate coding as strengths. However, recent publications have suggested a low (30%) concordance between PCVA and a gold standard, in this case known cause of death [26]. Recent reports have also cast doubt on whether duplicate coding improves data quality [27,28]. Computer-based algorithms such as InterVA, Symptom Pattern, and the newly developed Tariff and Random Forest methods are promising alternatives to

data may be collected using SAVVY methodology in the future. Zambia was the first country in Africa to use WHO-recommended SAVVY methodology to collect vital events data. WHO recommends conducting a dedicated census just prior to conducting verbal autopsy interviews. These activities require donor funding and the undivided attention of professional government staff. While costly in money, opportunity cost, and staff time, this rigorous methodology allows for collection of standardized census data specific to vital events registration. Adaptations of WHO-recommended SAVVY methodology that rely on national censuses, like the one conducted in Mozambique in 2007 [17], likely realize cost and time efficiencies that may improve sustainability. However, these efficiencies must be weighed against the adverse impact that longer recall periods may have on data quality [6,25]. Using a dedicated census allows for a shorter recall period for retrospective identification of deaths. Additionally, the recall period can be determined by the study implementers, rather than by


Figure 2 Health care utilized by the deceased whose relatives reported that they received some type of treatment between final illness onset and death (n = 819). * The proportions across the different types of care/health facility sought do not add up to 100% because many individuals visited more than one type of health care/service during illness in the period leading to death. “Other places” includes hospices.

Figure 3 Causes of death for males and females of all ages from all deaths recorded by Zambia’s SAVVY (N = 1056) from 2009-2010. *p < 0.05. **p < 0.001. Note: All other specified causes include the remainder of infectious and parasitic diseases, diseases of the liver, other digestive system disorders, other CNS disorders, other chronic obstructive pulmonary diseases, disorders of the kidney, other mental and behavioural disorders, asthma, senility/old age, sickle-cell disorders, other respiratory diseases, duodenal ulcer, hernias, disorders of the skin and subcutaneous tissue, other diseases of the urinary system, other disorders of the genital organs, abdominal pain, Sudden Infant Death Syndrome, leprosy, viral hepatitis, other blood diseases, epilepsy, diseases of the oesophagus, stomach and duodenum, hyperplasia of prostate, stomach and other digestive system disorders.


Despite rapid scale-up of national programs to provide free highly active antiretroviral treatment (HAART) and prevention of mother-to-child transmission, HIV is still the leading cause of death in these four provinces. This finding is similar to findings from other countries in sub-Saharan Africa [24,28] and the entire African region [23]. Research suggests that HIV-related death is most common in the three months following treatment initiation and is associated with advanced HIV disease at presentation [36], thought to indicate delays in seeking care. Long distances from homes to health care centers providing HAART have been linked to delayed treatment, particularly in rural areas [37]. Zambia is currently incorporating new, more aggressive treatment guidelines that may improve survival [38]. It is hoped that implementation of these guidelines will lead to reductions in HIV-related mortality, although they do not address the distance barrier. Zambia should be able to evaluate trends in HIV-related mortality before and after implementation of the new guidelines with the continued and ongoing collection of vital events data using SAVVY. Other leading causes of death reported here were also reported by others in the region, including malaria [24,39-41], circulatory diseases [23], and injury [23,42]. Challenges existed with data collection and analysis that should be taken into account when interpreting the data. First, verbal autopsy interviews were conducted by nurses who could have introduced their own professional judgment and biases into the coding process by selecting keywords associated with the illnesses that they inadvertently “diagnosed” during the interview. Second, the census did not collect any information about ages under 1, so it was not possible to calculate neonatal mortality rates. Third, when field staff assessed health care sought, they meant in the three months prior to death, but we understand that this time frame was not consistently explained.
Because of this oversight, in addition to health care sought for the illness leading to death, we may have also captured health care sought for conditions that the person had previously but from which they did not die. Fourth, this was the pilot phase of SAVVY in Zambia and the sampled areas do not necessarily represent the country. The next phase is designed to complement the pilot and, together, provide nationally representative estimates. Fifth, most of the households interviewed were unable to provide clinical records such as laboratory results. Our reliance on a lay description of the family member’s symptoms likely resulted in misclassification of cause of death in some cases. Sixth, as a cultural practice, stillbirths and neonatal deaths are not generally acknowledged by families as deaths, and so are likely greatly undercounted in this study. Seventh and finally, sample randomization for this 2009-2010 assessment was based on a national census from 2000, which was clearly out of date.

PCVA [29-31]. Algorithmic methods have been shown to perform as well as or better than PCVA in cause of death assignment without the personnel cost [29,30,32]. For particular individual causes of death, some algorithms have been shown to perform better than PCVA. But algorithms lack the ability to identify and prioritize causes of death that are of public health importance in specific settings, to adapt to changing disease patterns, and to accurately identify less common causes of death [28,30]. Overall, in recent comparisons, despite statistical differences in results generated by PCVA and algorithm-coded methods, leading causes of disease and groups most burdened have been similar and have had the same policy implications [19,28]. During data collection for this pilot phase, we did not use algorithms to code causes of death because they have only recently been developed and validated. Once they are refined and made available for tailoring and testing in Zambia, they could be used here. Although this was a pilot and included just four of the nine provinces in Zambia, our crude all-cause mortality rate, 17.2 per 1,000 person-years, was somewhat similar to the 13.3 per 1,000 person-years estimated for 2009 and 2010 by the Central Statistical Office’s (CSO) “Population Projections Report” based on projections from 2000 census data [33]. Our estimate, which included a substantial proportion of residents of Lusaka Province, was essentially equal to CSO’s projections for Lusaka Province (17.1 per 1,000) and similar to results from a recently conducted analysis reporting 14.1 to 14.5 deaths per 1,000 in Lusaka Province [34]. Although samples were not drawn to be representative of the country, our crude maternal mortality rate (1.6 per 100,000 women aged 15-49) was somewhat similar to the 1.2 per 100,000 rate reported by the Demographic Health Survey (DHS) for 2002-2003 [35].
However, our under-5 mortality was substantially lower (80 per 1,000) than the DHS reported with regard to 2003-2004 data (119 per 1,000) [35]. This difference could suggest an ascertainment gap in our data that should be investigated. Under-5 mortality could also be underreported because of stigma associated with discussing early childhood deaths. Otherwise, few sources of mortality data exist in Zambia. To our knowledge, there is no other source of representative data on the distribution and causes of death in Zambia. More than 80% of people who subsequently died were reported to have sought treatment at a government clinic at some time prior to death. Many delayed too long. These findings suggest that educational material should be posted in waiting areas of government health clinics to alert people to early signs and symptoms of illnesses that should prompt them to seek medical care. Based on the demographic profile of those who died, messaging should be simple, pictorial, and in large print.


hopes to incorporate SAVVY into national strategic plans so that this essential vital statistics information can be used to identify gaps in quality and access to health care and identify shortfalls and successes of interventions.

Other aspects of Zambia’s application of SAVVY likely contributed to high quality and completeness of data. For instance, a dedicated census allowed for shorter recall periods for our interviewees. Additionally, community health workers and traditional birth attendants were employed and trained to identify deaths in their own communities for verbal autopsy interview. As community members, they also facilitated entry of SAVVY interviewers into their neighbors’ households. In part because of these strengths, and despite the weaknesses, these data can be used to determine needs and gaps in the health care system. Results could be used to develop community-based interventions to improve survival in the groups identified as most at risk for death. Based on our results, interventions could include improvements in HAART access for people with HIV; access to treated mosquito nets for malaria prevention and access to prompt treatment for those with malaria; clinically attended birthing and nutritional support for females; access to information about preventing and treating circulatory diseases; and increased education of parents about signs and symptoms that should prompt urgent medical attention for their young children. Further in-depth study is needed to develop interventions to avert deaths, especially those that are preventable and treatable. While SAVVY data have not yet been linked with Zambia’s national electronic health records (EHR) system, “SmartCare,” identifiers used in the system are compatible with those used in SAVVY. With dedicated effort, appropriate approvals, and confidentiality protections, the health care, morbidity, and mortality data collected in health facilities that are captured in SmartCare could be linked with the verbal autopsy and census data captured in SAVVY. This linkage could identify clinical antecedents of mortality, providing a more comprehensive description of gaps in care and prevention.
Finally, discussion is also needed worldwide to determine whether the gains enjoyed from standardizing SAVVY methods across countries are worth the potential loss of cost efficiency and perhaps the additional sustainability gained by integrating it with other activities.

Acknowledgements

The cooperation of all the households in the selected clusters was essential for this study, which was conducted by the Central Statistical Office with the financial and technical support of the Centers for Disease Control and Prevention (under Cooperative Agreement number 1 U2G PS000635-01) and MEASURE Evaluation. The authors would like to acknowledge the dedication and efforts made by the field staff and the physicians for ensuring that the data collection and death certification were carried out in an efficient manner with high quality. We acknowledge with special gratitude the Centers for Disease Control and Prevention (CDC) for their financial, material, and technical support that made this survey possible. The findings and conclusions in this paper are those of the authors and do not necessarily represent the views of the CDC.

Author details

1 Central Statistical Office, Government of the Republic of Zambia, Lusaka, Zambia. 2 Global AIDS Program, Centers for Disease Control and Prevention, Government of the United States of America, Lusaka, Zambia. 3 CTS Global, Inc. Assigned to: Centers for Disease Control and Prevention, Lusaka, Zambia. 4 Futures Group/MEASURE Evaluation, North Carolina, USA.

Authors’ contributions

PS, SK, and RM participated in the conception and design of the study. SSM, SK, RM, MC, PS, and MM participated in the analysis of the results. RM and MC wrote the statistical code and generated the computer output. SSM, SK, MC, RM, and MM drafted the paper. WM and DP reviewed and commented on the manuscript, and WM provided technical and managerial support for the Central Statistical Office authors. All authors read, contributed to, and approved the manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 21 February 2011 Accepted: 5 August 2011 Published: 5 August 2011

References

1. Beaglehole R, Bonita R: Public health at the crossroads. Cambridge: Cambridge University Press; 1997.
2. World Health Organization: Verbal autopsy for maternal deaths. Report of a WHO workshop, London, 10-13 January. Geneva: World Health Organization, Division of Family Health; 1994.
3. Lopez A: Causes of Death: an assessment of global patterns of mortality around 1985. World Health Stat Q 1990, 43:91-104.
4. Mathers C, Ma Fat D, Inoue M, Rao C, Lopez A: Counting the dead and what they died from: An assessment of the global status of cause of death data. Bulletin of the World Health Organization 2005, 83:171-177c.
5. Lopez A, Ahmed O, Guillot M, Ferguson B, Salomon J, Murray C, Hill K: World Mortality in 2000: Life Tables for 191 Countries. Geneva: World Health Organization; 2000.
6. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bulletin of the World Health Organization 2006, 84:239-245.
7. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D: Setting international standards for verbal autopsy. Bulletin of the World Health Organization 2007, 85:570-571.
8. Bang A, Bang R, the SEARCH team: Diagnosis of causes of childhood deaths in developing countries by verbal autopsy: suggested criteria. Bull World Health Organ 1992, 70:499-507.
9. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol 1994, 23:213-222.
10. Kahn K, Tollman S, Garenne M, Gear J: Validation and application of verbal autopsies in a rural area of South Africa. Trop Med Int Health 2000, 5:824-831.

Conclusion

Results from this pilot study indicate that collection of verbal autopsies to estimate causes of mortality using SAVVY methods is feasible in Zambia. The SAVVY methodology enabled Zambia, a country where half of all deaths occur at home, to collect field-based vital records data and report causes of death in selected provinces for the first time. Lessons learned from Zambia’s pilot of SAVVY and the experiences of other countries will enable us to modify our methods with an aim of improving data quality and sustainability of SAVVY. Zambia


11. Reeves BC, Quigley M: A review of data-derived methods for assigning causes of death from verbal autopsy data. Int J Epidemiol 1997, 26:1080-1088.
12. SAVVY: Sample Vital Registration with Verbal Autopsy. [http://www.cpc.unc.edu/measure/tools/monitoring-evaluation-systems/savvy].
13. World Health Organization: ICD-10: International Statistical Classification of Diseases and Related Health Problems. Second edition. Geneva: World Health Organization; 2004.
14. Lawn J, Wilczynska-Ketende K, Cousens S: Estimating the causes of 4 million neonatal deaths in the year 2000. Int J Epidemiol 2006, 35:706-718.
15. Winbo I, Serenius F, Dahlquist G, Kallen B: A computer based method for cause of death classification in stillbirths and neonatal deaths. Int J Epidemiol 1997, 26:1298-1306.
16. Verbal Autopsy Standards: Ascertaining and Attributing Cause of Death. [http://www.who.int/whosis/mort/verbalautopsystandards/en/].
17. Mbofana F, Lewis R, West L, Mazive E, Cummings S, Mswia R: Using verbal autopsy in a post-census mortality survey to capture causes of death in Mozambique, 2006-2007. Global Congress on Verbal Autopsy: State of the Science; February 15-17; Bali, Indonesia 2011.
18. Lulu K, Berhane Y: The Use of simplified verbal autopsy in identifying causes of adult death in a predominantly rural population in Ethiopia. BMC Public Health 2005, 5:58.
19. Oti S, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Population Health Metrics 2010, 8:21.
20. Central Statistical Office: Zambia Census of Population and Housing, 2000. Lusaka, Zambia; 2003.
21. Central Statistical Office: Zambia Census of Population and Housing, 2010: Preliminary results. Lusaka, Zambia; 2011.
22. Chandrasekar C, Deming W: On a method of estimating birth and death rates and the extent of registration. Journal of American Statistics Association 1949, 101-115.
23. Mathers C, Boerma T, Ma Fat D: Global and regional causes of death. British Medical Bulletin 2009, 92:7-32.
24. van Eijk A, Adazu K, Ofware P, Vulule J, Hamel M, Slutsker L: Causes of deaths using verbal autopsy among adolescents and adults in rural western Kenya. Trop Med Int Health 2008, 13:1314-1324.
25. Mobley C, Boerman J, Titus S, Lohrke B, Shangula K, Black R: Validation Study of a verbal autopsy method for causes of childhood mortality in Namibia. J Trop Pediatr 1996, 42:365-369.
26. Atkinson C, Lozano R, Flaxman A, James S, Lopez A, Murray C: Experimental validation of the Physician Coded Verbal Autopsy (PCVA) method for verbal autopsy: establishing a baseline for performance. Global Congress on Verbal Autopsy: State of the Science; February 15-17; Bali, Indonesia 2011.
27. Joshi R, Lopez A, MacMahon S, Reddy S, Dandona R, Dandona L, Neal B: Verbal autopsy coding: are multiple coders better than one? Bulletin of the World Health Organization 2009, 87:51-57.
28. Byass P, Kahn K, Fottrell E, Collinson M, Tollman S: Moving from Data on Deaths to Public Health Policy in Agincourt, South Africa: Approaches to Analyzing and Understanding Verbal Autopsy Findings. PLoS Med 2010, 7:e1000325.
29. James S, Flaxman A, Vahdatpour A, Murray C: Experimental validation of the Tariff Method for Verbal Autopsy: using empirical cause symptom associations to levy cause-of-death assignments. Global Congress on Verbal Autopsy: State of the Science; February 15-17; Bali, Indonesia 2011.
30. Vahdatpour A, Green S, James S, Flaxman A, Lozano R, Naghavi M, Lopez A, Murray C: Machine Learning for Verbal Autopsy Analysis: Validation Study of Random Forest. Global Congress on Verbal Autopsy: State of the Science; February 15-17; Bali, Indonesia 2011.
31. Lozano R, James S, Flaxman A, Vahdatpour A, Green S, Birnbaum J, Campbell B, Atkinson C, Kalter H, Naghavi M, et al: Comparative methods in adult verbal autopsy: examining the ability of Symptom Pattern, Machine Learning, Tariff, and InterVA to accurately determine causes of death. Global Congress on Verbal Autopsy: State of the Science; February 15-17; Bali, Indonesia 2011.
32. Bauni E, Ndila C, Mochamah G, Nyutu G, Williams T: Validating physician review and probabilistic modeling (InterVA) approaches to verbal autopsy interpretation using hospital causes of death. Global Congress on Verbal Autopsy: State of the Science; February 15-17; Bali, Indonesia 2011.

33. Zambia Central Statistical Office: Population Projections Report: Project population, with and without AIDS, Zambia, 2000-2025. Lusaka 2003. 34. Rathod S, Kusnthan T, Sachingongu N, Stringer J, Chi B: Trends in all-cause mortality and HIV attitudes and behaviours during public antiretroviral scale-up: a multiple cross-sectional study in Lusaka Zambia 2004-2010. Presented at Congress of Epidemiology; Montreal, Quebec, Canada 2011. 35. Central Statistical Office Ministry of Health, University of Zambia, Tropical Diseases Research Centre, Macro International Inc: Zambia: 2007 Demographic and Health Survey. Calverton, Maryland: MEASURE Evaluation; 2009. 36. Stringer S, Zulu I, Levy J, Stringer E, Mwango A, Chi B, Mtonga V, Reid S, Cantrell R, Bulterys M, et al: Rapid Scale-up of Antiretroviral Therapy at Primary Care Sites in Zambia, Feasibility and Early Outcomes. Journal of the American Medical Association 2006, 296:782-793. 37. van Dijk J, Sutcliffe C, Munsanje B, Hamangaba F, Thuma P, Moss W: Barriers to the care of HIV-infected children in rural Zambia: a crosssectional analysis. BMC Infectious Diseases 2009, 9:169. 38. Ministry of Health: Adult and Adolescent Antiretroviral Therapy Protocols 2010. Lusaka: Government of the Republic of Zambia; 2010. 39. Becher H, Kynast-Wolf G, Sie A, Ndugwa R, Ramroth H, Kouyate B, Muller O: Patterns of malaria: cause-specific and all-cause mortality in a malariaendemic area of west Africa. The American journal of tropical medicine and hygiene 2008, 78:106. 40. Kaatano G, Mashauri F, Kinung’hi S, Mwanga J, Malima R, Kishamawe C, Nnko S, Magesa S, Mboera L: Patterns of malaria related mortality based on verbal autopsy in Muleba District, north-western Tanzania. Tanzania Journal of Health Research 2009, 11:210-218. 41. Sacarlal J, Nhacolo AQ, Sigaúque B, Nhalungo DA, Abacassamo F, Sacoor CN, Aide P, Machevo S, Nhampossa T, Macete EV: A 10 year study of the cause of death in children under 15 years in Manhiça, Mozambique. 
BMC Public Health 2009, 9:67. 42. Garrib A, Herbst AJ, Hosegood V, Newell ML: Injury mortality in rural South Africa 2000-2007: rates and associated factors. Tropical Medicine & International Health 2011. doi:10.1186/1478-7954-9-40 Cite this article as: Mudenda et al.: Feasibility of using a World Health Organization-standard methodology for Sample Vital Registration with Verbal Autopsy (SAVVY) to report leading causes of death in Zambia: results of a pilot in four provinces, 2010. Population Health Metrics 2011 9:40.



Mwanyangala et al. Population Health Metrics 2011, 9:41 http://www.pophealthmetrics.com/content/9/1/41

RESEARCH

Open Access

Verbal autopsy completion rate and factors associated with undetermined cause of death in a rural resource-poor setting of Tanzania

Mathew A Mwanyangala1, Honorathy M Urassa1, Jensen C Rutashobya1, Chrisostom C Mahutanga1, Angelina M Lutambi1,2, Deodatus V Maliti1, Honorati M Masanja1,2, Salim K Abdulla1,2 and Rose N Lema1,2*

Abstract

Background: Verbal autopsy (VA) is a widely used tool to assign probable cause of death in areas with inadequate vital registration systems. Its uses in priority setting and health planning are well documented in sub-Saharan Africa (SSA) and Asia. However, there is a lack of data related to VA processing and completion rates in assigning causes of death in a community. There is also a lack of data on factors associated with undetermined causes of death documented in SSA. Such information is needed for understanding the gaps in VA processing and better estimating disease burden.

Objective: The study's intent was to determine the completion rate of VA and factors associated with assigning undetermined causes of death in rural Tanzania.

Methods: A database of deaths reported from the Ifakara Health and Demographic Surveillance System from 2002 to 2007 was used. Completion rates were determined at the following stages of processing: 1) death identified; 2) VA interviews conducted; 3) VA forms submitted to physicians; 4) coding and assigning of cause of death. Logistic regression was used to determine factors associated with deaths coded as "undetermined."

Results: The completion rate of VA after identification of death and the VA interview ranged from 83% in 2002 to 89% in 2007. Ninety-four percent of VA forms submitted to physicians were assigned a cause of death, with 31% of these coded as undetermined. Neonatal and child deaths that occurred outside health facilities were associated with a high rate of undetermined classification (33%, odds ratio [OR] = 1.33, 95% confidence interval [CI] (1.05, 1.67), p = 0.016). Respondents reporting high education levels were less likely to be associated with deaths that were classified as undetermined (24%, OR = 0.76, 95% CI (0.60, 0.96), p = 0.023). A respondent who was a child of the deceased, compared to a partner (husband or wife), was more likely to be associated with undetermined cause of death classification (OR = 1.35, 95% CI (1.04, 1.75), p = 0.023).

Conclusion: Every year, there is a high completion rate of VA in the initial stages of processing; however, a number of VAs are lost during the processing. Most of the losses occur at the final step, physicians' determination of cause of death. The type of respondent and place of death had a significant effect on final determination of the plausible cause of death. The findings provide some insight into the factors affecting full coverage of verbal autopsy diagnosis and the limitations of causes of death based on VA in SSA. Although physician review is the most commonly used method in ascertaining probable cause of death, we suggest further work needs to be done to address the challenges faced by physicians in interpreting VA forms. There is a need for an alternative to, or improvement of, the methods of physician review.

Keywords: Verbal autopsy (VA), completion rate, undetermined, HDSS

* Correspondence: rnathan@ihi.or.tz 1 Ifakara Health Institute, Off passage, P.o.Box 53, Off Mlabani, Ifakara, Kilombero, Morogoro, Tanzania Full list of author information is available at the end of the article © 2011 Mwanyangala et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

Verbal autopsy (VA) is a commonly used tool to ascertain causes of death. In many developing countries, cause of death data are limited because most deaths take place outside health facilities [1]. In addition, in some countries, vital statistics from vital registration systems are incomplete or do not exist. As a result, VA is often necessary for determining cause of death [2-4], and results from VA are widely used for health planning, priority setting, monitoring, and evaluation [5-7]. In sub-Saharan Africa (SSA) and Asia, VA is used to obtain estimates of the distribution of causes of death and has become a routinely used tool to provide information on the burden of disease [5,8-10]. VA has been shown to provide the best results to obtain the specific causes of death in most of SSA [11]. In order to play this potential role, VA methodology needs to be generalizable and responsive to community needs.

VA is a process involving completion of death identifications, VA interviews, and cause of death ascertainment. VA is based on the premise that the primary caregiver, usually a family member, can recall, volunteer, and recognize symptoms experienced by the deceased that can be interpreted later to derive a probable cause of death. Several studies have documented challenges with the process of the interview in terms of interviewers, respondents, recall period, and language [12-15]. There have been problems with questionnaires, such as grouping and comprehensiveness of VA forms, closed- versus open-ended questions, and linguistic appropriateness [16-19]. Another overarching issue is the diversity in VA questionnaires used across different countries, although recently there has been great international effort to harmonize these tools [13-16]. Also, there are different methods for interpreting VA data to derive the probable causes of death, including physician review, algorithms, probabilistic methods, and artificial neural networks.

The process of VA includes identification of deaths in the community, documentation of the event [20-22], and interview of the caretaker of the deceased person. However, not all reported deaths result in interviews or specific assignment of causes of death. There has been limited systematic documentation of the completion rates at each step of the VA process, ending when a cause of death is assigned. The current study set out to determine the completion rates of the VA process and factors associated with failure to assign a cause of death. It is important to understand the gaps in current VA methods and explore how to improve them [17]. Such information is needed for optimal design of VA tools that will enable better estimates of disease burden and understanding of the limitations of VA questionnaire administration from the stage of identifying a death to the end point of assigning a cause of death. Better understanding of the VA process will contribute to decision-making on whether to use physician review, algorithms, artificial neural networks, or probabilistic methods to interpret and assign causes of death to estimate cause-specific mortality in rural settings of Tanzania.

Methods

Study area

The Ifakara Health and Demographic Surveillance System (HDSS) is a part of the INDEPTH Network (http://www.indepth-network.org). It was established in 1996, and since January 1997 all individuals have been followed through household visits once every four months. The surveillance area covers a total of 2,400 km² of Guinea savanna in the floodplain of the Kilombero River, which divides the two districts of Kilombero and Ulanga in the Morogoro region. During the household visit, the field interviewer updates and records basic demographic events including deaths, births, pregnancies, and migrations. Since 2002, all reported deaths have been followed up with VA in order to ascertain the possible cause of death.

Death identification

The HDSS field interviewers identify and register deaths during the routine household visit. During that visit, the interviewer informs the respondent that within a specified period of time another person will make a visit to document details about the death. Each death is recorded in forms that are collected, as well as in household register books. These forms are submitted to a data clerk for logging and to data management for entry in the database. The lists of deaths per VA supervisor zone are presented with basic demographic and household information in order to facilitate finding the residence of the deceased.

The VA tool

The VA is a postmortem in-depth interview with the primary caregivers of the deceased [17]. VA questionnaires are structured into sections, including background, short narrative history, checklist of signs and symptoms (including duration), list of health services used during the terminal illness, and medical evidence (if any). The history of the illness elicits an unprompted account of the sequence of events that eventually led to the death. The questionnaires are age-specific; there are separate forms for neonates (0 to 28 days), children (29 days to 12 years), and adults (above 12 years). Therefore, it is important to check the age at death of the deceased to know the appropriate questionnaire to use. The tools are widely used in most HDSS sites [8,23-25]. The questionnaire used was the 2002 VA form from INDEPTH, based on the form from WHO/CDS/CSR/ISR/99.4, which has been well described [5]. A separate team of interviewers, specifically trained to conduct VAs and administer the age-specific VA tool, interviews a family member who was closest to the deceased during the terminal illness and death. The interview is conducted at least 40 days after the date of death to allow for the mourning period.

Physician VA review

Each completed form is submitted to two physicians independently to ascertain the probable cause of death; in case of discordance, a third physician is invited and majority rule is applied. If the third physician determines a different cause, the case is coded as undetermined [26,27]. This is the most common method used to assign causes of death using VA [26,28-30]. Classification from the 10th revision of the International Classification of Diseases (ICD-10) was used. Physicians are updated on coding with trainings conducted at least once a year. In Ifakara HDSS, off-site physicians are used deliberately to avoid potential bias in coding by those who have an intimate knowledge of the population and intervention.

VA completion rates

Four indicators are used to assess completion rates at each stage in the process of assigning causes of death: 1) number of interviews/total number of deaths identified in the community; 2) number of forms completed (i.e., deaths)/total number of forms submitted to the physicians; 3) number of deaths coded with a specific cause assigned/total number of completed forms submitted for coding; and 4) number of deaths coded with specific causes assigned/total number of forms reviewed for coding. All of the proportions are converted to percentages.

Factors associated with undetermined cause of death

Variables related to household composition and sociodemographic characteristics of the respondents were included: residing with father or mother, number of deaths in the household, place of death, age category at death (neonates, children, or adult), the relationship of the respondent with the deceased, level of education of the respondent, and age and sex of the respondent.

Data analysis and management

The data collected within the HDSS framework were used for analysis. Variables included in the analysis were extracted from different files of the Ifakara HDSS database. We performed the descriptive analysis by age and sex and by other variables, including the respondent's relationship to the deceased, the relationship to the head of household, and the place of death. The proportions were expressed as percentages and used to determine the completion rates in each step of the VA processing. All percentages refer to the preceding step in the VA processing sequence. Factors associated with undetermined cause of death were determined using a univariate logistic regression model. In order to adequately adjust for confounders, multivariate logistic regression was also used to determine the association between selected independent variables and the outcome variable ("undetermined" cause of death). Two models were fitted, one for neonates and children and the other for adult deaths. Stata version 10 was used for analysis.

Results

Socio-demographic characteristics of the respondent

From 2002 to 2007, a total of 5,027 deaths (an average of 838 per year) were identified by the Ifakara HDSS field interviewers during the routine rounds. Of the deceased, 50% were male. The mean age at death was 31 years for the entire period of the study. Fifty-six percent of all deaths were among those aged 12 years and above. Most respondents (68%) had completed primary education, and 34% of the respondents for adult deaths were either the deceased's son or daughter. Sixty-eight percent of deaths occurred outside formal health facilities. Deaths ranged from one to four per household over the period of analysis. Over the study period, 38% of respondents were children of the heads of household. About 52% and 65% of the respondents reported residing with their mothers and fathers, respectively. Swahili was the main language used in the interviews during the VA in Ifakara HDSS (Table 1).

Completion rates of VA

Of the deaths reported during the study, 4,244 (84%) had VA interviews conducted. The completion rate in conducting the VA over that period ranged from 83% in 2002 to 89% in 2007. Of the 4,094 VA forms submitted to physicians for ascertaining the possible cause of death, 94% ended with a cause of death specified. The coding completeness was lowest in 2003 (92%) compared to other years. There were significant differences across years in the proportion of deaths assigned a cause of death as undetermined (14% in 2007 and 40% in 2004) (Figure 1). During the period of analysis, 16% of the deaths identified for VA were lost at the stage between community identification and VA interviews, 4% were lost due to logistic issues when sending forms to physicians, and 6% were lost between submission to physicians and determination of cause of death. A total of 1,178 (23%) of the deaths identified were lost before cause of death assignment. In addition, physicians did not assign a specific cause of death (undetermined death assignment) for 1,174 deaths for which interviews were conducted. Over the period of analysis, 2,352 (47%) deaths were not assigned a specific cause, whether because they were lost or because they were assigned an undetermined cause (Figure 2).

Table 1 Socio-demographic characteristics of deaths and VA respondents

| Characteristic | n | % |
|---|---|---|
| Age at death (mean: 31 years)ŧ | | |
| 0 to 28 days | 555 | 11.0 |
| 1 month to 12 years | 1653 | 32.9 |
| Above 12 years | 2819 | 56.1 |
| Sex of the deceasedŧ | | |
| Female | 2515 | 50.1 |
| Male | 2512 | 49.9 |
| Respondent's level of education | | |
| Primary | 2886 | 68 |
| Secondary | 570 | 13.4 |
| Above secondary | 249 | 5.9 |
| No education | 539 | 12.7 |
| Relationship of the respondent with the deceased* | | |
| Partner | 666 | 27.8 |
| Son/daughter | 817 | 34.1 |
| Parent | 138 | 5.8 |
| Other | 773 | 32.3 |
| Relationship of the respondent with the deceased** | | |
| Mother | 1026 | 57.1 |
| Father | 498 | 27.8 |
| Grandmother | 149 | 8.3 |
| Grandfather | 28 | 1.6 |
| Other | 93 | 5.2 |
| Relationship to the head of household | | |
| Child | 1601 | 37.7 |
| Grandchild | 288 | 6.8 |
| Partner | 421 | 9.9 |
| Self | 1092 | 25.7 |
| Other | 842 | 19.8 |
| Residing with mother | | |
| Yes | 2229 | 52 |
| No | 2015 | 48 |
| Residing with father | | |
| Yes | 2775 | 65.4 |
| No | 1469 | 34.6 |
| Number of deaths per household | | |
| One | 2339 | 55.1 |
| More than one | 1905 | 44.9 |
| Place of death | | |
| Health facility | 1345 | 31.7 |
| Outside health facility | 2899 | 68.3 |
| Undetermined cause of death assigned | | |
| Yes | 3095 | 72.9 |
| No | 1149 | 27.1 |

ŧ Includes all eligible deaths. * Adult deaths. ** Neonatal and child deaths.

Figure 1 The death distribution and completion rate in processing the VA.

[Figure 2 flow chart: Deaths identified (5,027); 783 lost (16%). VA interviews conducted (4,244); 150 lost (4%). VA forms submitted to physicians (4,094); 245 lost (6%). Forms completed with a specific cause of death (3,849); 1,174 classified as undetermined (23%). Total forms lost, 2002-2007: 2,352 (47%).]

Figure 2 Forms lost in processing VA at Ifakara 2002-2007.

Factors associated with undetermined cause

The current study has shown that about 31% of the death forms submitted to clinicians ended with an undetermined cause of death. For neonatal and childhood deaths, age at death, the level of education of the respondent, and place of death were associated with the likelihood of an undetermined cause of death. After adjusting for confounders, neonatal and childhood deaths that occurred outside health facilities had 33% higher odds of ending with an undetermined cause (odds ratio [OR] = 1.33, 95% confidence interval [CI] (1.05, 1.67), p = 0.016). If the respondent had attained a secondary level of education, the odds of the death ending with an undetermined cause were 24% lower than for respondents who had no education (OR = 0.76, CI (0.60, 0.96), p = 0.023). If the respondent was related to the deceased in a way other than as mother, father, or grandmother, an undetermined cause assignment was more likely, but the association did not remain significant after adjusting for other variables (OR = 1.57, CI (0.66, 3.77), p = 0.309).
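As a rough consistency check on the adjusted odds ratios reported in the Results, the standard error on the log-odds scale can be recovered from a 95% confidence interval and converted into an approximate two-sided Wald p-value. The sketch below assumes the published intervals are symmetric Wald intervals on the log scale, which the paper does not state; the function names are illustrative and are not part of the authors' Stata analysis.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z, and a two-sided p-value from an OR and its 95% CI.

    Assumes a symmetric Wald interval on the log-odds scale.
    """
    log_or = math.log(odds_ratio)
    # Half the CI width on the log scale equals z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_sided


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


# Adjusted estimates for neonatal and child deaths, as reported in the text:
# (OR, lower CI bound, upper CI bound, reported p-value).
reported = {
    "died outside health facility": (1.33, 1.05, 1.67, 0.016),
    "respondent education": (0.76, 0.60, 0.96, 0.023),
}

for label, (or_, low, high, p_reported) in reported.items():
    se, z, p = wald_summary(or_, low, high)
    print(f"{label}: change in odds {percent_change_in_odds(or_):+.0f}%, "
          f"SE {se:.3f}, z {z:.2f}, p {p:.3f} (reported {p_reported})")
```

Under these assumptions the recovered p-values land close to the published ones. The percentages quoted with each estimate describe a change in the odds of an undetermined code, not a change in its probability.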


Among adult deaths (12 years and above), the relationship with the respondent was the only variable significantly associated with undetermined cause of death. A respondent who was a child of the deceased increased the odds of the death being coded as undetermined compared to respondents that were partners (husband or wife) (OR = 1.35, CI (1.04, 1.75), p = 0.023) (Table 2).
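The "undetermined" code analysed in these results is the end point of the physician review rule described in the Methods: two independent codings, a third physician when they disagree, and an undetermined code when no two physicians agree. A minimal sketch of that rule follows. The function name, the `None` return for a pending third review, and the ICD-10 codes in the usage lines are illustrative assumptions, not the HDSS coding software.

```python
from collections import Counter
from typing import Optional

UNDETERMINED = "undetermined"


def final_cause(first: str, second: str, third: Optional[str] = None) -> Optional[str]:
    """Apply two-physician review with a third coder on discordance.

    Returns the agreed cause, UNDETERMINED when all three coders differ,
    or None when the first two disagree and no third coding exists yet.
    """
    if first == second:
        return first  # concordant: no third review needed
    if third is None:
        return None  # discordant: the form must go to a third physician
    # Majority rule over the three codings.
    cause, votes = Counter([first, second, third]).most_common(1)[0]
    return cause if votes >= 2 else UNDETERMINED


# Illustrative ICD-10 codes only (B54 malaria, A09 diarrhoea, J18 pneumonia).
examples = [
    ("B54", "B54", None),
    ("B54", "A09", None),
    ("B54", "A09", "A09"),
    ("B54", "A09", "J18"),
]
for first, second, third in examples:
    print(first, second, third, "->", final_cause(first, second, third))
```

Keeping the pending state separate from the undetermined code matters for the completion rates: a form that never reaches a third physician is a processing loss, whereas a form on which three physicians disagree is a coded outcome.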

found in other studies [8,31]. The verbal autopsy interview was completed for 84% of the deaths identified by Ifakara HDSS between 2002 and 2007. This is considered high in resource-constrained rural settings [22,32]. This achievement reflects the strength of the HDSS system in tracking vital events, the field operation, and the timing of the VA interviews. All interviews were conducted in Swahili, unlike other studies that reported language as a limit in processing the verbal autopsy [33]. For the 16% of the deaths for which VA interviews were not conducted, this was likely due to outmigration of

Discussion The current analysis has found that annual VA interview completion rates are high and similar with those

Table 2 Factors associated with undetermined cause of deatha Neonates and children Age

Neonates and children

Adults

OR

CI (95%)

P

OR

Adults CI (95%)

P

OR

CI (95%)

P

OR

CI (95%)

P

0.98

(0.93-1.03)

0.381

0.99

(0.99-1.00)

0.09

0.95

(0.89-1.00)

0.09

0.99

(0.99-0.99)

0.03

Sex Female

1

Male

1.06

(0.85-1.32)

0.587

0.93

(0.78-1.10)

0.405

1.06

(0.85-1.32)

0.615

0.99

(0.80-1.23)

0.939

0.78

(0.63-0.98)

0.035

0.7 0.87

(0.41-1.18) (0.70-1.07)

0.183 0.189

0.76

(0.60-0.96)

0.023

0.74 0.88

(0.46-1.34) (0.71-1.10)

0.373 0.28

Respondent’s level of education Primary Secondary

No education 1 Relationship of the respondent with the deceased* Partner

1

Son/daughter 1.11

(0.89-1.40)

0.345

1.11

(0.89-1.40)

0.345

1.35

(1.04-1.75)

0.023

Parent

1.04

(0.69-1.56)

0.857

1.04

(0.69-1.56)

0.857

1.26

(0.82-1.95)

0.298

Other

1.06

(0.85-1.33)

0.597

1.06

(0.84-1.34)

0.597

1.21

(0.93-1.56)

0.147

0.259

Relationship of the respondent with the deceased* * Mother

1

Father

1.11

(0.86-1.43)

0.408

1.08

(0.83-140)

0.565

Grandmother

0.87

(0.57-1.33)

0.534

0.96

(0.60-1.53)

0.864

Other Relation to the head of household

1.45

(0.96-2.20)

0.08

1.57

(0.66-3.77)

0.309

Child Grandchild

1 0.92

(0.67-1.26)

0.606

0.92

(0.65-1.45)

0.643

0.42

(0.09-1.89)

0.251

0.41

(0.09-1.91)

Partner

1.41

(0.99-1.98)

0.051

1.68

(1.02-2.76)

0.04

Self

1.15

(0.84-1.56)

0.378

1.42

(0.91-2.24)

0.125

Other

0.98

(0.68-1.43)

0.929

1.09

(0.79-1.52)

0.576

0.96

(0.64-1.45)

0.862

1.3

(0.83-2.04)

0.255

Residing with mother No

1

Yes

1.05

(0.61-1.82)

0.858

0.91

(0.69-1.19)

0.48

0.94

(0.50-1.74)

0.833

0.97

(0.65-1.45)

0.894

1 1.31

(1.04-1.64)

0.02

0.9

(0.74-1.09)

0.302

1.33

(1.05-1.67)

0.016

0.91

(0.74-1.11)

0.342

Place of death Health facility Outside health facility

** Neonatal and child deaths * Adult deaths.

176


Mwanyangala et al. Population Health Metrics 2011, 9:41 http://www.pophealthmetrics.com/content/9/1/41

Page 6 of 7

the choice of the respondent and the location of the death have an impact on the final assignment of the cause of death across all age groups. This study has provided insight into factors affecting full coverage of verbal autopsy diagnoses and limitations to using verbal autopsy-based causes of death in SSA. Although physician review is the most commonly used method to ascertain probable causes of death, it may have limitations, and further work is needed to provide more information on the challenges faced by physicians in interpreting VA forms. There may be a need to identify alternative methods or improve physician review.

the care takers within or soon after the 40 days of mourning. Refusal is also a potentially limiting factor for conducting the VA interviews. These factors have not been quantified in this study. The subsequent stages in the process of assigning cause of death presented more challenges than the community identification of deaths. This poses a risk of underestimation of the burden of disease. Logistic issues preventing submission of the VA forms to physicians for coding causes a significant proportion of forms to have missing causes of death. In this study, 4% of the forms were not submitted to the physicians. Furthermore, although the undetermined cause of death can be redistributed among the three causes as assigned by different physicians, still there is high proportion (31%) of deaths that were not assigned specific causes of death (they were coded as undetermined cause). Most of the undetermined cases were children and adults. This contradicts other studies that reported problems applying VA to neonatal deaths [17,34,35]. This observation might be due to the fact that most of the respondents to the neonatal deaths were the mothers, fathers, or grandmothers, who were likely to have good understanding of the illness and were likely very close to the deceased. These findings underscore the importance of the relationship of the deceased and the VA respondent. A significant number of deaths occurred outside health facilities, and this underscores the continued relevance of VA in determination of cause of death in settings with inadequate vital registration systems [17]. As observed in this study, children who died outside of health facilities were more likely to be coded as undetermined. As this group is the target for VA, perhaps the tool needs to be improved further to identify the most appropriate respondent. Another point to note is that the proportion of specific causes coded as undetermined varied significantly across years but improved markedly in 2007. 
This might be due to the fact that in 2007 there was more than one retraining session, unlike in other years. The current analysis has shown the continued relevance of VA as tool for determination of cause of death in settings without or limited vital registration systems. The results raise several concerns about the continuing use of physicians in reviewing and interpreting VA data [36,37].

List of abbreviations SSA: sub-Saharan Africa; HDSS: Health and Demographic Surveillance System; VA: verbal autopsy. Acknowledgements and funding I gratefully acknowledge the staff of the Ifakara Health Demographic Surveillance System, VA supervisors, all Ifakara Health Institute staff, the HDSS community, INDEPTH Network, Novartis Foundation, Swiss Tropical and Public Health Institute, the Tanzania Ministry of Health and Social Welfare, USAID, and the London School of Hygiene & Tropical Medicine for their generous support in running the surveillance system. Author details 1 Ifakara Health Institute, Off passage, P.o.Box 53, Off Mlabani, Ifakara, Kilombero, Morogoro, Tanzania. 2Ifakara Health Institute, Plot 463, Kiko Avenue, Mikocheni, P.o.Box 78373, Dar es Salaam, Tanzania. Authors’ contributions MAM is responsible for coordinating the HDSS, data preparation for analysis, performing all data analysis and interpretation, drafting and revising the manuscript. MAM gave final approval for submission. RN participated in coordination the HDSS and provided guidance for reviewing and data analysis. HM, AL, HU, and SA reviewed the statistical analysis and the manuscript. CM and JC participated in managing the all field data collection in the period of analysis. All authors have read and approved the final manuscript. Competing interests The authors declare that they have no competing interests. Received: 9 March 2011 Accepted: 5 August 2011 Published: 5 August 2011 References 1. Asuzu MC, Johnson OO, Owoaje ET, Kaufman JS, Rotimi C, Cooper RS: The Idikan adult mortality study. African journal of medicine and medical sciences 2000, 29(2):115-118. 2. Kamugisha ML, Gesase S, Mlwilo TD, Mmbando BP, Segeja MD, Minja DT, Massaga JJ, Msangeni HA, Ishengoma DR, Lemnge MM: Malaria specific mortality in lowlands and highlands of Muheza district, north-eastern Tanzania. Tanzania health research bulletin 2007, 9(1):32-37. 3. 
Yang G, Rao C, Ma J, Wang L, Wan X, Dubrovsky G, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. International journal of epidemiology 2006, 35(3):741-748. 4. Aspray TJ: The use of verbal autopsy in attributing cause of death from epilepsy. Epilepsia 2005, 46(Suppl 11):15-17. 5. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bulletin of the World Health Organization 2006, 84(3):239-245. 6. McLarty DG, Unwin N, Kitange HM, Alberti KG: Diabetes mellitus as a cause of death in sub-Saharan Africa: results of a community-based study in Tanzania. The Adult Morbidity and Mortality Project. Diabet Med 1996, 13(11):990-994.

Conclusion There is a high completion rate in the initial stages of VA, but a number of deaths are still lost during the later stages of the VA process. The highest proportion of loss was due to physicians not assigning a definite cause of death after receiving the VA forms. Results suggest that



Mwanyangala et al. Population Health Metrics 2011, 9:41 http://www.pophealthmetrics.com/content/9/1/41

7. Kamali A, Wagner HU, Nakiyingi J, Sabiiti I, Kengeya-Kayondo JF, Mulder DW: Verbal autopsy as a tool for diagnosing HIV-related adult deaths in rural Uganda. International journal of epidemiology 1996, 25(3):679-684.
8. Sacarlal J, Nhacolo AQ, Sigauque B, Nhalungo DA, Abacassamo F, Sacoor CN, Aide P, Machevo S, Nhampossa T, Macete EV, et al: A 10 year study of the cause of death in children under 15 years in Manhica, Mozambique. BMC public health 2009, 9:67.
9. Parkar SR, Nagarsekar B, Weiss MG: Explaining suicide in an urban slum of Mumbai, India: a sociocultural autopsy. Crisis 2009, 30(4):192-201.
10. Gajalakshmi V, Peto R: Verbal autopsy of 80,000 adult deaths in Tamilnadu, South India. BMC public health 2004, 4:47.
11. Kaufman JS, Asuzu MC, Rotimi CN, Johnson OO, Owoaje EE, Cooper RS: The absence of adult mortality data for sub-Saharan Africa: a practical solution. Bulletin of the World Health Organization 1997, 75(5):389-395.
12. Mirza NM, Macharia WM, Wafula EM, Agwanda RO, Onyango FE: Verbal autopsy: a tool for determining cause of death in a community. East African medical journal 1990, 67(10):693-698.
13. Joshi R, Kengne AP, Neal B: Methodological trends in studies based on verbal autopsies before and after published guidelines. Bulletin of the World Health Organization 2009, 87(9):678-682.
14. Snow RW, Armstrong JR, Forster D, Winstanley MT, Marsh VM, Newton CR, Waruiru C, Mwangi I, Winstanley PA, Marsh K: Childhood deaths in Africa: uses and limitations of verbal autopsies. Lancet 1992, 340(8815):351-355.
15. Kumar V, Datta N: Lay reporting and verbal autopsy in assessment of infant mortality. Indian journal of pediatrics 1986, 53(6):672-674.
16. van Eijk AM, Adazu K, Ofware P, Vulule J, Hamel M, Slutsker L: Causes of deaths using verbal autopsy among adolescents and adults in rural western Kenya. Trop Med Int Health 2008, 13(10):1314-1324.
17. Thatte N, Kalter HD, Baqui AH, Williams EM, Darmstadt GL: Ascertaining causes of neonatal deaths using verbal autopsy: current methods and challenges. J Perinatol 2009, 29(3):187-194.
18. Kalaichandran A, Zakus D: The obstetric pathology of poverty: maternal mortality in Kep province, Cambodia. World health & population 2007, 9(2):38-47.
19. Rosenstein MG, Romero M, Ramos S: Maternal mortality in Argentina: a closer look at women who die outside of the health system. Maternal and child health journal 2008, 12(4):519-524.
20. Huy TQ, Johansson A, Long NH: Reasons for not reporting deaths: a qualitative study in rural Vietnam. World health & population 2007, 9(1):14-23.
21. Setel PW, Rao C, Hemed Y, Whiting DR, Yang G, Chandramohan D, Alberti KG, Lopez AD: Core verbal autopsy procedures with comparative validation results from two countries. PLoS medicine 2006, 3(8):e268.
22. Campos D, Franca E, Loschi RH, Souza Mde F: [Verbal autopsy for investigating deaths from ill-defined causes in Minas Gerais State, Brazil]. Cadernos de saude publica/Ministerio da Saude, Fundacao Oswaldo Cruz, Escola Nacional de Saude Publica 26(6):1221-1233.
23. Deressa W, Fantahun M, Ali A: Malaria-related mortality based on verbal autopsy in an area of low endemicity in a predominantly rural population in Ethiopia. Malaria journal 2007, 6:128.
24. Huong DL, Minh HV, Vos T, Janlert U, Van do D, Byass P: Burden of premature mortality in rural Vietnam from 1999-2003: analyses from a Demographic Surveillance Site. Population health metrics 2006, 4:9.
25. Hammer GP, Some F, Muller O, Kynast-Wolf G, Kouyate B, Becher H: Pattern of cause-specific childhood mortality in a malaria endemic area of Burkina Faso. Malaria journal 2006, 5:47.
26. Fikree FF, Azam SI, Berendes HW: Time to focus child survival programmes on the newborn: assessment of levels and causes of infant mortality in rural Pakistan. Bulletin of the World Health Organization 2002, 80(4):271-276.
27. Garrib A, Jaffar S, Knight S, Bradshaw D, Bennish ML: Rates and causes of child mortality in an area of high HIV prevalence in rural South Africa. Trop Med Int Health 2006, 11(12):1841-1848.
28. Iriya N, Manji KP, Mbise RL: Verbal autopsy in establishing cause of perinatal death. East African medical journal 2002, 79(2):82-84.
29. Bhandari N, Bahl R, Taneja S, Martines J, Bhan MK: Pathways to infant mortality in urban slums of Delhi, India: implications for improving the quality of community- and hospital-based programmes. Journal of health, population, and nutrition 2002, 20(2):148-155.

30. Jafarey SN, Rizvi T, Koblinsky M, Kureshy N: Verbal autopsy of maternal deaths in two districts of Pakistan–filling information gaps. Journal of health, population, and nutrition 2009, 27(2):170-183.
31. Ngo AD, Rao C, Hoa NP, Adair T, Chuc NT: Mortality patterns in Vietnam, 2006: Findings from a national verbal autopsy survey. BMC research notes 3:78.
32. Hetzel MW, Alba S, Fankhauser M, Mayumana I, Lengeler C, Obrist B, Nathan R, Makemba AM, Mshana C, Schulze A, et al: Malaria risk and access to prevention and treatment in the paddies of the Kilombero Valley, Tanzania. Malaria journal 2008, 7:7.
33. Ba MG, Kodio B, Etard JF: [Verbal autopsy to measure maternal mortality in rural Senegal]. Journal de gynecologie, obstetrique et biologie de la reproduction 2003, 32(8 Pt 1):728-735.
34. Marsh DR, Sadruddin S, Fikree FF, Krishnan C, Darmstadt GL: Validation of verbal autopsy to determine the cause of 137 neonatal deaths in Karachi, Pakistan. Paediatric and perinatal epidemiology 2003, 17(2):132-142.
35. Morris SK, Bassani DG, Kumar R, Awasthi S, Paul VK, Jha P: Factors associated with physician agreement on verbal autopsy of over 27000 childhood deaths in India. PloS one 5(3):e9583.
36. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Population health metrics 8:21.
37. Ronsmans C, Vanneste AM, Chakraborty J, Van Ginneken J: A comparison of three verbal autopsy methods to ascertain levels and causes of maternal deaths in Matlab, Bangladesh. International journal of epidemiology 1998, 27(4):660-666.

doi:10.1186/1478-7954-9-41
Cite this article as: Mwanyangala et al.: Verbal autopsy completion rate and factors associated with undetermined cause of death in a rural resource-poor setting of Tanzania. Population Health Metrics 2011 9:41.



Engmann et al. Population Health Metrics 2011, 9:42 http://www.pophealthmetrics.com/content/9/1/42

RESEARCH

Open Access

Classifying perinatal mortality using verbal autopsy: is there a role for nonphysicians?

Cyril Engmann1*, John Ditekemena2, Imtiaz Jehan3, Ana Garces4, Mutinta Phiri5, Vanessa Thorsten6, Manolo Mazariegos7, Elwyn Chomba5, Omrana Pasha3, Antoinette Tshefu2, Elizabeth M McClure6, Dennis Wallace6, Robert L Goldenberg8, Waldemar A Carlo9, Linda L Wright10 and Carl Bose1

Abstract

Background: Because of a physician shortage in many low-income countries, the use of nonphysicians to classify perinatal mortality (stillbirth and early neonatal death) using verbal autopsy could be useful.

Objective: To determine the extent to which underlying perinatal causes of death assigned by nonphysicians in Guatemala, Pakistan, Zambia, and the Democratic Republic of the Congo using a verbal autopsy method are concordant with underlying perinatal causes of death assigned by physician panels.

Methods: Using a train-the-trainer model, 13 physicians and 40 nonphysicians were trained to determine cause of death using a standardized verbal autopsy training program. Subsequently, panels of two physicians and individual nonphysicians from this trained cohort independently reviewed verbal autopsy data from a sample of 118 early neonatal deaths and 134 stillbirths. With the cause of death assigned by the physician panel as the reference standard, sensitivity, specificity, positive and negative predictive values, and cause-specific mortality fractions were calculated to assess nonphysicians’ coding responses. Robustness criteria were used to assess how well nonphysicians performed.

Results: Causes of early neonatal death and stillbirth assigned by nonphysicians were concordant with physician-assigned causes 47% and 57% of the time, respectively. Tetanus met the robustness criteria for early neonatal death, and cord prolapse met the robustness criteria for stillbirth.

Conclusions: There are significant differences in underlying cause of death as determined by physicians and nonphysicians even when they receive similar training in cause of death determination. Currently, it does not appear that nonphysicians can be used reliably to assign underlying cause of perinatal death using verbal autopsy.

* Correspondence: cengmann@med.unc.edu
1 Departments of Pediatrics and Maternal Child Health, University of North Carolina at Chapel Hill, North Carolina, USA
Full list of author information is available at the end of the article

© 2011 Engmann et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background
Understanding population-based causes of perinatal death (stillbirth [SB] and early neonatal death [END], i.e., newborn death in the first seven days of life) is essential when developing an effective perinatal health policy [1]. Because there will always be competing demands for health care resources, a robust system constructed to identify and assign a medically determined cause of death (COD) for each perinatal death is highly desirable [2]. In many high-income countries, there is a complete record of each death, and 90% of these have medical certification of COD [3]. By contrast, many low- and middle-income countries, which have the highest burden of poverty and disease, continue to lack routine, representative, and high-quality information on COD and population-based cause-specific mortality fractions (CSMF) [4]. Fewer than 3% of all perinatal deaths in low- and middle-income countries have medical certification of COD [5]. In part, this may be because more than half of all births and perinatal deaths occur in the home and are frequently unrecorded in vital registration or health systems [6].

Increasing numbers of low- and middle-income countries are using verbal autopsy (VA) methods as an epidemiologic tool to inform mortality surveillance systems [7]. To determine perinatal mortality, the VA method relies on information obtained from an interview with the primary caregiver (usually the mother) of the deceased. During this process, the symptoms, signs, and behaviors during the illness of the deceased, or of the mother in the case of fetal death, are recorded [8]. This information is summarized and reviewed, and the most probable COD is assigned. VA is proving to be a cost-effective, practical, and sustainable alternative to a thorough medical diagnostic evaluation where vital registration systems are weak [9].

A variety of methods exist for interpreting VA interviews to arrive at a COD. The most commonly used method has two or three trained physician coders review the data and independently assign a COD [10]. Any discrepancies between the COD assigned by each physician member of the panel are resolved by discussion and review of the VA data, and a final consensus COD is agreed upon by the physician panel. Alternatively, COD can be assigned by the use of predetermined criteria/algorithms, computer simulations, or probabilistic approaches, none of which requires the presence of a physician [11-15].

There is a widespread physician shortage in many low-income countries, and significant costs are incurred in recruiting, training, and utilizing physicians. Reports suggest that nonphysician providers can conduct specified clinical tasks with adequate training [16-18]. We previously reported that when taught a standardized VA package in a classroom setting, nurses and midwives achieve a level of cognitive and applied knowledge comparable to physicians in determining perinatal COD [19]. Thus, we sought to investigate whether, following this training, nonphysicians can determine causes of SB and END in rural communities as reliably as physicians.

Methods
Setting, subjects, and study design

This prospective observational study was nested within the FIRST BREATH Trial conducted by the Eunice Kennedy Shriver National Institute of Child Health and Human Development Global Network for Women’s and Children’s Health Research [20]. The FIRST BREATH Trial was a cluster randomized, controlled trial that investigated the effects of implementing a package of newborn care practices and newborn resuscitation in community settings. This VA study included 38 communities from Guatemala (Chimaltenango province), the Democratic Republic of the Congo (DRC) (Equateur province), Zambia (Kafue district), and Pakistan (Thatta district). Each community comprised a cluster of villages with approximately 300 deliveries per year. Data describing births were collected by birth attendants and reviewed by trained nurse-midwives (with three to four years of health training) or community health workers (high school graduates with 18 months of health training) designated as community coordinators.

Within one week of an END or SB, birth attendants notified community coordinators, who then visited the family, determined eligibility for the study, and requested consent from eligible mothers. Perinatal deaths were excluded if they occurred in a hospital, if a birth attendant was not present at delivery, if the mother was unavailable for any reason (including peripartum death), or if the mother could not be enrolled within seven days of death. A seven-day enrollment window was chosen to reduce the variability in the quality of reporting introduced by recall bias [21-23]. Because the conventional perinatal VA respondents are mothers, we included only those subjects whose mothers were available for interview. Informed consent was obtained from mothers in a private and confidential setting. The consent form was read to all mothers, who then provided their signatures or, if they were illiterate, thumbprints.

Training and VA methodology

Neither community coordinators nor physicians had used VA to determine COD before the study. All community coordinators and physicians participating in this study received standardized training in VA methods over three days, via a train-the-trainer method [19]. Community coordinators were trained to interview mothers using the VA questionnaire. To assign underlying COD, both community coordinators and physicians were trained in the classification, rules, and guidelines of the 10th revision of the International Classification of Diseases (ICD-10). Underlying COD was defined as the single most important disease or condition that initiated the train of morbid events leading directly to fetal or neonatal death.

Uniform data describing the circumstances surrounding a perinatal death were collected from each mother using a standardized VA questionnaire developed specifically for this study from a validated VA tool [24]. The questionnaire was administered by the community coordinators, who then sent these data separately to two local physicians. Additionally, the community coordinators and physicians were provided with demographic and other descriptive data collected as part of the FIRST BREATH Trial. Each community coordinator and physician first independently determined whether the death occurred prior to birth, and was therefore classified as an SB, or after a live birth, and was therefore classified as an END. Then they assigned one underlying COD. After the COD was assigned and entered independently, any discrepancy in assignment of COD between physicians was discussed and a consensus underlying COD was assigned. The underlying COD assigned by the community coordinator was then compared to the consensus underlying COD assigned by the physician panel.

Data collection and analysis

Data were entered and transmitted electronically to the data coordinating center (Research Triangle Institute, Research Triangle Park, NC, USA) where data edits, including inter- and intraform consistency checks, were performed. The study was reviewed and approved by the institutional ethics review committees of the Research Triangle Institute, the University of North Carolina at Chapel Hill, and in-country Institutional Review Boards.

Data were analyzed using SAS (SAS/STAT® Software version 9.2). Physician perinatal COD responses were viewed as the reference standard for calculations of sensitivity, specificity, positive and negative predictive values, and CSMF, which were calculated using conventional two-by-two table analysis. The Delta method was used to calculate confidence intervals for the CSMFs [25]. We defined the CSMF as the number of perinatal deaths (END or SB) due to a specific cause divided by the total number of deaths. Before the start of the study, our a priori hypothesis was that the COD assigned by community coordinators would be concordant with the COD assigned by the physician panel in greater than 70% of perinatal deaths, and we powered our study accordingly.

We also assessed the degree of robustness of community coordinator responses. We defined robustness using criteria previously described, utilized, and published by Setel et al [26]. To be considered “robust” a condition must meet the following criteria: 1) sensitivity > 50%, 2) specificity > (1-CSMF of the physician consensus), and 3) relative difference between the CSMF for the community coordinator and the CSMF for the physician consensus within 20%. The relative difference was calculated as follows: absolute value ((CSMF of the physician consensus - CSMF of the community coordinator)/CSMF of the physician consensus × 100%). Additionally, we calculated the level of agreement between the physician consensus and community coordinators, using Cohen’s kappa statistic. Levels of agreement based on ranges of kappa values were defined as follows: 0.81-0.99, almost perfect agreement; 0.61-0.80, substantial agreement; 0.41-0.60, moderate agreement; and less than 0.4, slight to fair agreement [24].
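To make the two-by-two measures and the robustness criteria concrete, the following minimal Python sketch recomputes them for tetanus as a cause of early neonatal death, using the counts reported in Table 2 (4 deaths coded as tetanus by both the community coordinator and the physician panel, 2 by the community coordinator only, 1 by the physician panel only, and 111 by neither). This is an illustration only, not the SAS analysis used in the study. The class and method names are hypothetical, and "within 20%" is assumed here to mean a relative difference of at most 20%.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one cause, with the physician consensus (PC) as reference."""

    tp: int  # CC assigned the cause and PC assigned the cause
    fp: int  # CC assigned the cause, PC did not
    fn: int  # PC assigned the cause, CC did not
    tn: int  # neither assigned the cause

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def csmf_pc(self) -> float:
        # Deaths attributed to the cause by PC, divided by all deaths.
        return (self.tp + self.fn) / self.total

    def csmf_cc(self) -> float:
        # Deaths attributed to the cause by CC, divided by all deaths.
        return (self.tp + self.fp) / self.total

    def relative_difference(self) -> float:
        # |CSMF_PC - CSMF_CC| / CSMF_PC, returned as a fraction (0.20 = 20%).
        # The shared denominator cancels, so the counts can be used directly.
        n_pc = self.tp + self.fn
        n_cc = self.tp + self.fp
        return abs(n_pc - n_cc) / n_pc

    def is_robust(self) -> bool:
        # Assumption: "within 20%" is read as <= 0.20.
        return (
            self.sensitivity() > 0.50
            and self.specificity() > 1 - self.csmf_pc()
            and self.relative_difference() <= 0.20
        )


# Tetanus, early neonatal death (counts from Table 2).
tetanus = TwoByTwo(tp=4, fp=2, fn=1, tn=111)
print(f"SE={tetanus.sensitivity():.2f} SP={tetanus.specificity():.2f} "
      f"PPV={tetanus.ppv():.2f} NPV={tetanus.npv():.2f}")
print("robust:", tetanus.is_robust())
```

The sketch does not handle causes that the physician panel never assigned (for example, fetal trauma in Table 2), for which sensitivity and the relative difference are undefined.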

Results
The study period was from May 2007 to June 2008, during which 9,461 infants were born in the designated communities. Among these, birth attendants identified 518 SB and END (Figure 1). The SB, END, and perinatal mortality rates were 30/1000 births, 25/1000 live births, and 55/1000 births, respectively. Of the 518 deaths, 81 were ineligible for the study because the delivery occurred in a hospital (79) or the birth attendant was absent at the time of delivery (2). Among eligible deaths, 185 were not enrolled because the mother was not available for interview within seven days after the death (145) or did not provide consent (40). This study reports on 252 perinatal deaths (134 SBs and 118 ENDs), based on determinations by the physicians regarding the timing of perinatal deaths.

Figure 1 The verbal autopsy study population. Flow diagram: 9,461 births; 518 perinatal deaths (229 early neonatal deaths, 289 stillbirths); 81 ineligible (infant delivered in a hospital, 79; birth attendant absent at delivery, 2); 437 eligible deaths; 185 not enrolled (mother unavailable for interview within seven days after the death, 145; mother did not provide consent, 40); 252 enrolled (134 stillbirths, 118 early neonatal deaths).

Concordance of stillbirth and early neonatal death between physicians and nonphysicians

Ninety-three percent of perinatal deaths determined by physicians to be SBs were classified as SBs by community coordinators; the remainder were classified as ENDs. Ninety-five percent of perinatal deaths determined by physicians to be ENDs were classified by community coordinators as ENDs; the remainder were classified as SBs. Concordance between physicians and community coordinators in the determination of timing of perinatal deaths did not vary between the two classes of community coordinators (nurse-midwives and community health workers).

Early neonatal death

Table 1 compares underlying causes of END assigned by physician panels and community coordinators. Overall, causes of END assigned by community coordinators were concordant with causes of END assigned by physician panels 47% of the time. Table 2 describes the sensitivity, specificity, positive and negative predictive values, and CSMF of specific underlying causes of END assigned by community coordinators. Kappa values are additionally included. Sensitivity and specificity were high for preterm/low birth weight and tetanus. By contrast, sensitivity was low for infections and asphyxia, although specificity for both of these was 0.90 or above. The positive predictive values for infections and tetanus were 0.83 and 0.67, respectively, and the negative predictive values for preterm/low birth weight, trauma, and tetanus were 0.93 or above. The relative difference between CSMF assigned by physician panels and community coordinators was 20% for tetanus and > 20% for all other diagnostic categories. Only the diagnosis of tetanus fulfilled criteria for robustness. When the level of agreement among the different diagnostic categories was considered using Cohen’s kappa statistic, there was substantial agreement for the diagnosis of tetanus (0.71); all other categories showed slight or only moderate agreement.

Stillbirth

Table 3 summarizes the comparison of community coordinator and physician underlying COD for SB. Overall, causes of SB assigned by community coordinators were concordant with causes of SB assigned by physician panels 57% of the time. Table 4 presents the sensitivity, specificity, positive and negative predictive values, and CSMF of underlying SB COD assigned by community coordinators. Kappa values are additionally included. Sensitivity for antepartum hemorrhage and maternal infection was 0.62 and 0.64, respectively, and for all other known COD categories was 0.57 or less. Specificity was generally high (0.94 or higher) except for prematurity, for which the specificity was 0.86. The positive predictive values of antepartum hemorrhage, maternal infection, maternal accident, and prolonged labor were 0.80 or higher. Maternal infections had a negative predictive value of 0.81; all the other SB COD categories had values of 0.94 or above. The relative difference between CSMF assigned by physician panels and community coordinators was 13% for cord prolapse; all other SB diagnostic categories were greater than 20%.

Table 1 Comparison of physician consensus (PC) and community coordinator (CC) for underlying neonatal cause of death (COD)

Rows: CC for underlying cause of death. Columns: PC for underlying cause of death.

| CC \ PC | Preterm/low birth weight | Infection | Asphyxia | Fetal trauma | Tetanus | Unknown/no cause | Other1 | CC total |
| Preterm/low birth weight | 14 | 11 | 3 | 0 | 1 | 0 | 2 | 31 |
| Infection | 1 | 19 | 3 | 0 | 0 | 0 | 0 | 23 |
| Asphyxia | 4 | 5 | 13 | 0 | 0 | 0 | 0 | 22 |
| Fetal trauma | 0 | 4 | 3 | 0 | 0 | 0 | 0 | 7 |
| Tetanus | 0 | 2 | 0 | 0 | 4 | 0 | 0 | 6 |
| Unknown/no cause | 0 | 6 | 3 | 0 | 0 | 2 | 1 | 12 |
| Other1 | 1 | 5 | 6 | 0 | 0 | 2 | 3 | 17 |
| PC total | 20 | 52 | 31 | 0 | 5 | 4 | 6 | 118 |

1 Other includes congenital malformation, birth trauma, and neonatal accident.
Percent concordant = (55/118) × 100 = 46.6%. (This includes COD coded as Other.)
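The percent concordant figure in the Table 1 footnote is the sum of the diagonal cells (deaths given the same cause by the community coordinator and the physician panel) divided by all deaths. A minimal Python sketch of that arithmetic follows; the matrix is transcribed from Table 1, and the variable and function names are hypothetical.

```python
# Rows: cause assigned by the community coordinator (CC).
# Columns: cause assigned by the physician consensus (PC), in the same order.
CAUSES = [
    "Preterm/low birth weight", "Infection", "Asphyxia", "Fetal trauma",
    "Tetanus", "Unknown/no cause", "Other",
]
TABLE_1 = [
    [14, 11, 3, 0, 1, 0, 2],
    [1, 19, 3, 0, 0, 0, 0],
    [4, 5, 13, 0, 0, 0, 0],
    [0, 4, 3, 0, 0, 0, 0],
    [0, 2, 0, 0, 4, 0, 0],
    [0, 6, 3, 0, 0, 2, 1],
    [1, 5, 6, 0, 0, 2, 3],
]


def percent_concordant(matrix):
    """Return (agreeing deaths, total deaths, percent concordant)."""
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agree, total, 100 * agree / total


agree, total, pct = percent_concordant(TABLE_1)
print(f"{agree}/{total} = {pct:.1f}%")
```

The same function applies unchanged to the nine-by-nine stillbirth matrix in Table 3.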



Table 2 Comparison of neonatal underlying cause reported by physician consensus (PC) to underlying cause of death reported by community coordinator (CC) for select causes of early neonatal death (n = 118)

Two-by-two counts, n (% of the PC column), with the underlying COD as determined by PC as the reference standard:

| Underlying COD | PC: identified as COD (n) | PC: not identified as COD (n) | CC: COD, PC: COD | CC: COD, PC: not COD | CC: not COD, PC: COD | CC: not COD, PC: not COD |
| Preterm/LBW | 20 | 98 | 14 (70.0) | 17 (17.3) | 6 (30.0) | 81 (82.7) |
| Infection | 52 | 66 | 19 (36.5) | 4 (6.1) | 33 (63.5) | 62 (93.9) |
| Asphyxia | 31 | 87 | 13 (41.9) | 9 (10.3) | 18 (58.1) | 78 (89.7) |
| Fetal trauma | 0 | 118 | 0 (0.0) | 7 (5.9) | 0 (0.0) | 111 (94.1) |
| Tetanus | 5 | 113 | 4 (80.0) | 2 (1.8) | 1 (20.0) | 111 (98.2) |
| Unknown/no cause | 4 | 114 | 2 (50.0) | 10 (8.8) | 2 (50.0) | 104 (91.2) |
| Other1 | 6 | 112 | 3 (50.0) | 14 (12.5) | 3 (50.0) | 98 (87.5) |

Measures2 (blank cells are not reported):

| Underlying COD | SE | SP | PPV | NPV | CSMFPC (95% CI) | 1-CSMFPC | CSMFCC (95% CI) | 1-CSMFCC | RD (95% CI) | Kappa (95% CI) |
| Preterm/LBW | 0.70 | 0.83 | 0.45 | 0.93 | 0.17 (0.09, 0.25) | 0.83 | 0.26 (0.16, 0.37) | 0.74 | 0.55 (0.08, 1.04) | 0.43 (0.24, 0.62) |
| Infection | 0.37 | 0.94 | 0.83 | 0.65 | 0.44 (0.29, 0.59) | 0.56 | 0.19 (0.11, 0.28) | 0.81 | 0.56 (0.40, 0.70) | 0.32 (0.17, 0.48) |
| Asphyxia | 0.42 | 0.90 | 0.59 | 0.81 | 0.26 (0.16, 0.37) | 0.74 | 0.19 (0.10, 0.27) | 0.81 | 0.29 (0.03, 0.54) | 0.35 (0.15, 0.54) |
| Fetal trauma | | 0.94 | 0.00 | 1.00 | | | 0.06 (0.01, 0.11) | 0.94 | | |
| Tetanus | 0.80 | 0.98 | 0.67 | 0.99 | 0.04 (0.00, 0.08) | 0.96 | 0.05 (0.01, 0.09) | 0.95 | 0.20 (0.00, 2.00) | 0.71 (0.41, 1.00) |
| Unknown/no cause | 0.50 | 0.91 | 0.17 | 0.98 | 0.03 (0.00, 0.07) | 0.97 | 0.10 (0.04, 0.16) | 0.90 | 2.00 (0.27, 11.0) | 0.21 (-0.07, 0.49) |
| Other1 | 0.50 | 0.88 | 0.18 | 0.97 | 0.05 (0.01, 0.09) | 0.95 | 0.14 (0.07, 0.22) | 0.86 | 1.83 (0.38, 7.33) | 0.20 (-0.04, 0.44) |

1 Other includes congenital malformation, birth trauma, and neonatal accident.
2 Definitions of measures are provided in the methods section. Measures are as follows: sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), cause-specific mortality fraction (CSMF) of physician consensus (PC) and community coordinator (CC), and absolute relative difference of the CSMF (RD). RD was calculated using the actual numeric values rather than the rounded CSMFs reported in the table.
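The kappa values in Table 2 can be recomputed from the two-by-two counts for each cause. The Python sketch below does this for tetanus and preterm/low birth weight and maps the result onto the agreement ranges given in the methods section. It is illustrative only: the function names are hypothetical, and because the published ranges leave small gaps (for example, between 0.60 and 0.61), the cut-offs used here (0.81, 0.61, 0.41) are an assumption about how boundary values are handled.

```python
def cohen_kappa_2x2(both, cc_only, pc_only, neither):
    """Cohen's kappa for one cause coded yes/no by CC and by PC."""
    n = both + cc_only + pc_only + neither
    observed = (both + neither) / n          # proportion of deaths on which CC and PC agree
    cc_yes = (both + cc_only) / n            # proportion coded "yes" by CC
    pc_yes = (both + pc_only) / n            # proportion coded "yes" by PC
    # Agreement expected by chance if the two ratings were independent.
    expected = cc_yes * pc_yes + (1 - cc_yes) * (1 - pc_yes)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to the agreement levels named in the methods section."""
    if kappa >= 0.81:
        return "almost perfect agreement"
    if kappa >= 0.61:
        return "substantial agreement"
    if kappa >= 0.41:
        return "moderate agreement"
    return "slight to fair agreement"


# Counts from Table 2: (CC and PC, CC only, PC only, neither).
kappa_tetanus = cohen_kappa_2x2(4, 2, 1, 111)
kappa_preterm = cohen_kappa_2x2(14, 17, 6, 81)
print(f"tetanus: {kappa_tetanus:.2f} ({agreement_label(kappa_tetanus)})")
print(f"preterm/LBW: {kappa_preterm:.2f} ({agreement_label(kappa_preterm)})")
```

The confidence intervals reported in Table 2 are not reproduced here, since the source does not state how they were derived for kappa.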


Table 3 Comparison of physician consensus (PC) and community coordinator (CC) for underlying maternal cause of stillbirth

Rows: CC for underlying maternal cause of death. Columns: PC for underlying maternal cause of death.

| CC \ PC | Antepartum hemorrhage | Maternal infection/sepsis | Prematurity | Accident | Prolonged labor | Cord prolapse/complication | Malpresentation | Unknown/no cause | Other1 | CC total |
| Antepartum hemorrhage | 8 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 10 |
| Maternal infection/sepsis | 1 | 32 | 0 | 0 | 0 | 1 | 0 | 2 | 1 | 37 |
| Prematurity | 3 | 9 | 4 | 1 | 0 | 0 | 0 | 2 | 2 | 21 |
| Maternal accident | 0 | 1 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 5 |
| Prolonged labor | 0 | 0 | 0 | 0 | 8 | 0 | 2 | 0 | 0 | 10 |
| Cord prolapse/complication | 1 | 2 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 9 |
| Malpresentation | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 5 |
| Unknown/no cause | 0 | 3 | 5 | 2 | 1 | 1 | 0 | 11 | 4 | 27 |
| Other1 | 0 | 1 | 0 | 0 | 3 | 2 | 0 | 1 | 3 | 10 |
| PC total | 13 | 50 | 9 | 7 | 15 | 8 | 4 | 16 | 12 | 134 |

1 Other includes multiple delivery, hypertension/eclampsia, and post-term delivery.
Percent concordant = (76/134) × 100 = 56.7%. (This includes COD coded as Other.)
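The cause-specific mortality fractions and relative differences for stillbirth follow directly from the row and column totals of Table 3. The short Python sketch below shows the calculation for the named causes; the residual categories (Unknown/no cause and Other) are left out for brevity. The dictionary and function names are hypothetical, and the relative difference is returned as a fraction rather than a percentage.

```python
N_STILLBIRTHS = 134

# Column totals of Table 3: deaths assigned to each cause by the physician consensus.
PC_TOTALS = {
    "Antepartum hemorrhage": 13,
    "Maternal infection/sepsis": 50,
    "Prematurity": 9,
    "Maternal accident": 7,
    "Prolonged labor": 15,
    "Cord prolapse/complication": 8,
    "Malpresentation": 4,
}
# Row totals of Table 3: deaths assigned to each cause by the community coordinator.
CC_TOTALS = {
    "Antepartum hemorrhage": 10,
    "Maternal infection/sepsis": 37,
    "Prematurity": 21,
    "Maternal accident": 5,
    "Prolonged labor": 10,
    "Cord prolapse/complication": 9,
    "Malpresentation": 5,
}


def csmf(n_cause, n_deaths):
    """Cause-specific mortality fraction: deaths from the cause / all deaths."""
    return n_cause / n_deaths


def relative_difference(csmf_pc, csmf_cc):
    """|CSMF_PC - CSMF_CC| / CSMF_PC, as a fraction (0.20 corresponds to 20%)."""
    return abs(csmf_pc - csmf_cc) / csmf_pc


results = {}
for cause, n_pc in PC_TOTALS.items():
    f_pc = csmf(n_pc, N_STILLBIRTHS)
    f_cc = csmf(CC_TOTALS[cause], N_STILLBIRTHS)
    results[cause] = (f_pc, f_cc, relative_difference(f_pc, f_cc))
    print(f"{cause}: CSMF_PC={f_pc:.3f} CSMF_CC={f_cc:.3f} RD={results[cause][2]:.3f}")
```

Because both fractions share the same denominator, the relative difference depends only on the two counts, which is why a single reclassified death produces a large relative difference for rare causes.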

Cord prolapse fulfilled the robustness criteria. It is worth pointing out that where relative difference values are small, it may be because there are very small CSMFs. When the level of agreement among the different diagnostic categories was considered using Cohen’s kappa statistic, no category demonstrated almost perfect agreement.

Discussion
There are three main findings from this study. The first is that given identical data from the VA questionnaires, community coordinators and physicians draw the same conclusions about the timing of perinatal death (SB and END) 95% of the time. Second, causes of SB and END assigned by community coordinators were concordant with causes of SB and END assigned by physician panels 57% and 47% of the time, respectively. Third, only one cause of END assigned by community coordinators (tetanus) met robustness criteria. Similarly, when robustness criteria were applied to SB diagnostic categories to assess the performance of community coordinators, only cord prolapse met criteria.

Task-shifting of physician-domain responsibilities is an increasingly important concept that is gaining support in the literature [16,17]. Numerous authors have assessed the utilization and impact of nonphysician providers after being taught a structured curriculum [18,19]. These authors report that nonphysicians, specifically nurse-midwives, can perform comparably to physicians when taught a structured teaching program with adequate supervision. To our knowledge, only one study has compared nonphysicians to physicians in determining perinatal COD in the field using verbal autopsy methods, and none has compared nonphysicians from a variety of countries with a range of backgrounds such as nurse-midwives in Zambia and DRC and community health workers in Pakistan and Guatemala [27].

Our group previously examined how well community coordinators and physicians performed when taught a structured VA program in a classroom setting [19]. In both cognitive and applied knowledge, community coordinators’ pretest results were lower than physicians’; however, these results improved significantly post-test, with nurse-midwives showing comparable results to physicians. In light of these data we undertook the present study. Our study showed that despite the ability to improve cognitive and applied knowledge in the classroom setting, this knowledge did not result in nonphysicians reaching similar conclusions about COD in actual practice.

Chowdhury et al. reported on the use of medical assistants (with three years of institutional training) in a single site in Matlab, Bangladesh, to interpret neonatal VA data and assign COD [27]. When specific diagnostic categories assigned by medical assistants were compared to physician panels, birth asphyxia showed good reliability with kappa values of 0.77, while prematurity, respiratory distress syndrome, pneumonia, and sepsis/meningitis showed moderate agreement, with kappa values between 0.51 and 0.59. The authors concluded that medical assistants are generally knowledgeable about the disease profile of a geographic area, can generally use their clinical judgment and knowledge to determine COD for all ICD-10 classes of neonatal death, and may be considered an alternative for determining neonatal COD in rural areas where physicians



Engmann et al. Population Health Metrics 2011, 9:42 http://www.pophealthmetrics.com/content/9/1/42

Table 4 Comparison of maternal underlying cause reported by physician consensus (PC) to underlying cause reported by community coordinator (CC) for select causes of stillbirth (n = 134)

Cross-classification, n (%). Columns: underlying COD, as determined by PC (reference standard). Rows: underlying COD, as determined by CC.

Cause / CC assignment          PC: identified as COD    PC: not identified as COD
Antepartum hemorrhage          13                       121
  COD                          8 (61.5)                 2 (1.7)
  Not COD                      5 (38.5)                 119 (98.3)
Maternal infection             50                       84
  COD                          32 (64.0)                5 (6.0)
  Not COD                      18 (36.0)                79 (94.0)
Prematurity                    9                        125
  COD                          4 (44.4)                 17 (13.6)
  Not COD                      5 (55.6)                 108 (86.4)
Maternal accident              7                        127
  COD                          4 (57.1)                 1 (0.8)
  Not COD                      3 (42.9)                 126 (99.2)
Prolonged labor                15                       119
  COD                          8 (53.3)                 2 (1.7)
  Not COD                      7 (46.7)                 117 (98.3)
Cord prolapse/complication     8                        126
  COD                          4 (50.0)                 5 (4.0)
  Not COD                      4 (50.0)                 121 (96.0)
Malpresentation                4                        130
  COD                          2 (50.0)                 3 (2.3)
  Not COD                      2 (50.0)                 127 (97.7)
Unknown/no cause               16                       118
  COD                          11 (68.8)                16 (13.6)
  Not COD                      5 (31.3)                 102 (86.4)
Other1                         12                       122
  COD                          3 (25.0)                 7 (5.7)
  Not COD                      9 (75.0)                 115 (94.3)

Measures2

Cause                        SE    SP    PPV   NPV   CSMFPC (95% CI)    1-CSMFPC  CSMFCC (95% CI)    1-CSMFCC  RD (95% CI)        Kappa (95% CI)
Antepartum hemorrhage        0.62  0.98  0.80  0.96  0.10 (0.04, 0.15)  0.90      0.07 (0.03, 0.12)  0.93      0.23 (0.00, 0.57)  0.67 (0.44, 0.9)
Maternal infection           0.64  0.94  0.86  0.81  0.37 (0.25, 0.50)  0.63      0.28 (0.17, 0.38)  0.72      0.26 (0.87, 0.42)  0.62 (0.47, 0.75)
Prematurity                  0.44  0.86  0.19  0.96  0.07 (0.02, 0.11)  0.93      0.16 (0.08, 0.23)  0.84      1.33 (0.24, 4.40)  0.19 (-0.02, 0.4)
Maternal accident            0.57  0.99  0.80  0.98  0.05 (0.01, 0.09)  0.95      0.04 (0.00, 0.07)  0.96      0.29 (0.00, 0.80)  0.65 (0.33, 0.97)
Prolonged labor              0.53  0.98  0.80  0.94  0.11 (0.05, 0.17)  0.89      0.07 (0.03, 0.12)  0.93      0.33 (0.00, 0.64)  0.6 (0.37, 0.84)
Cord prolapse/complication   0.50  0.96  0.44  0.97  0.06 (0.02, 0.10)  0.94      0.07 (0.02, 0.11)  0.93      0.13 (0.00, 1.67)  0.43 (0.13, 0.74)
Malpresentation              0.50  0.98  0.40  0.98  0.03 (0.00, 0.06)  0.97      0.04 (0.00, 0.07)  0.96      0.25 (0.00, 3.00)  0.42 (0.01, 0.84)
Unknown/no cause             0.69  0.86  0.41  0.95  0.12 (0.06, 0.18)  0.88      0.20 (0.12, 0.29)  0.80      0.69 (0.12, 1.83)  0.43 (0.23, 0.62)
Other1                       0.25  0.94  0.30  0.93  0.09 (0.04, 0.14)  0.91      0.07 (0.03, 0.12)  0.93      0.17 (0.00, 0.82)  0.21 (0.05, 0.47)

1 Other includes multiple delivery, hypertension/eclampsia, post-term delivery, and other causes.
2 Definitions of measures are provided in the methods section. Measures are as follows: sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), cause-specific mortality fraction (CSMF) of physician consensus (PC) and community coordinator (CC), and absolute relative difference of CSMF (RD). RD was calculated using the actual numeric values rather than the rounded CSMFs reported in the table.
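The measures reported in Table 4 can be recomputed from each cause's two-by-two counts. The sketch below is illustrative only and is not the authors' code: the class and field names are hypothetical, and the RD formula, |CSMF_CC - CSMF_PC| / CSMF_PC, is an assumption (the formal definition is in the methods section), chosen because it reproduces the tabulated values for the rows checked here. Confidence intervals are not computed.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one cause; physician consensus (PC) is the reference standard."""
    both: int      # CC says COD, PC says COD
    cc_only: int   # CC says COD, PC says not COD
    pc_only: int   # CC says not COD, PC says COD
    neither: int   # CC says not COD, PC says not COD

    def measures(self) -> dict:
        n = self.both + self.cc_only + self.pc_only + self.neither
        pc_pos = self.both + self.pc_only   # deaths PC attributes to this cause
        cc_pos = self.both + self.cc_only   # deaths CC attributes to this cause
        pc_neg, cc_neg = n - pc_pos, n - cc_pos
        csmf_pc, csmf_cc = pc_pos / n, cc_pos / n
        p_obs = (self.both + self.neither) / n                  # observed agreement
        p_exp = (pc_pos * cc_pos + pc_neg * cc_neg) / n ** 2    # chance agreement
        return {
            "SE": self.both / pc_pos,
            "SP": self.neither / pc_neg,
            "PPV": self.both / cc_pos,
            "NPV": self.neither / cc_neg,
            "CSMF_PC": csmf_pc,
            "CSMF_CC": csmf_cc,
            # Assumed definition of the absolute relative difference.
            "RD": abs(csmf_cc - csmf_pc) / csmf_pc,
            "kappa": (p_obs - p_exp) / (1 - p_exp),  # Cohen's kappa
        }


# Antepartum hemorrhage row of Table 4.
aph = TwoByTwo(both=8, cc_only=2, pc_only=5, neither=119)
for name, value in aph.measures().items():
    print(f"{name}: {value:.2f}")
```

For the antepartum hemorrhage counts this gives SE 0.62, SP 0.98, PPV 0.80, NPV 0.96, RD 0.23, and kappa 0.67, matching the corresponding row of the table.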


are scarce. A number of reasons may explain the differences observed between our study and those of Chowdhury et al. First, in our study causes of both SB and END were assigned, in contrast to neonatal outcomes only from Matlab. Also, community coordinators had different educational backgrounds. Nurses and midwives had three to four years post-high school health training, whereas community health workers had only 18 months post-high school health training, in contrast to medical assistants from Matlab who had three years training. The community coordinators in our study had no a priori experience with VA and assigning COD, unlike the Matlab cohort, and our study was a multisite study compared to the single site of Matlab, where the medical assistants over the years have been closely involved in verbal autopsy work in the demographic surveillance program.

The concept of underlying COD, the single most important disease or condition that initiated the chain of events leading directly to fetal or neonatal death, is complex. It requires a deep appreciation of pathophysiology and, especially in the case of perinatal death, consideration of both the mother and the fetus. To effectively utilize this concept, the coder has to “construct a story” of what happened. The key initiating factor, without which the death most likely might not have occurred, is the cornerstone of the “story.” It is possible that community coordinator responses might reflect other categories of COD, such as the final and contributing COD as described in the ICD-10. Further research is needed to determine whether concordance of community coordinator-assigned COD when utilizing multiple or other categories of COD might yield higher concordance with physician panels. Comparing only underlying COD may be a potential limitation in this study. It appears increasingly evident that individual perinatal deaths in low-income countries may have several causes. Thus, forcing assignment of a perinatal death into a single underlying cause, as required for ICD-10, may be less useful than previously appreciated [14,28]. For example, a combination of prematurity, birth asphyxia, and infection may coexist in an END, and it may be more useful from a public health policy perspective to consider all causes of death collectively. Additionally, some authors use final COD instead of underlying COD assignments [27].

Although the use of nonphysician coders to assign COD may not be a suitable alternative to the use of physician coders, other alternatives may have a role. A number of computer simulation techniques have been developed that address multiple COD and CSMF. Byass et al. have described a Bayesian approach called InterVA that simultaneously adjusts the probability of a finite list of causes according to affirmative answers to specific symptoms [29]. This approach calculates the likelihood of each COD and displays as many as three of the most probable COD, along with their associated likelihoods [11]. More recently, King and Lu developed an alternative probabilistic method which directly estimates CSMF without individual COD attribution [1]. Data on symptoms reported by caregivers along with COD are collected, and the COD distribution is estimated in the population in which only symptom data are available. Each of these methods has its advantages and drawbacks in terms of cost effectiveness, complexity, repeatability, and validity. For example, the King-Lu method depends on the availability of high-quality, facility-based, or valid mortality data, which are lacking in most settings where VA is needed.

A major strength of this study is the use of a standardized VA training program for both nonphysicians and physicians. There are some limitations to this study. We did not use the harmonized VA questionnaire developed and published by WHO, since this was published after the start of the study. However, we believe that the results of our study would have been similar had we used this tool because the tools are broadly similar. Other limitations of this study include the lack of available medical diagnostic aids (laboratory, radiologic, or microbiologic studies) and lack of a postmortem examination for validating the underlying COD assigned by physicians. Although the COD determined by the physician panel is often the traditional reference standard in VA methodology, physician panels have their limitations: their assignments of COD may contain systematic biases, they may not readily code diseases unexpected in certain demographic groups, they tend to focus on the presence rather than the absence of symptoms, and they show a preference for highly specific diagnosis [12,30,31].

It is conceivable that direct interactions between the community coordinator and the respondents (mothers and birth attendants) may have provided the community coordinators with more information from the respondents and the environment about the circumstances, signs, and symptoms of the deceased before death than was recorded on the standardized VA questionnaire.

Verbal autopsy has been used in a variety of ways, including: to determine priority diseases and programmatic intervention; to conduct rapid assessments in emergency/disaster situations; as sample registration of vital events; and perhaps most importantly, to describe population-level CSMF. If Millennium Development Goals related to pregnancy outcomes are to be achieved, it is imperative to understand more about the perinatal CODs which contribute disproportionately to under-5 mortality in low-income countries. The shortage of human and material resources makes routine perinatal autopsies for deaths that occur in a community setting in low-income countries unlikely. The relatively limited number of symptoms and signs exhibited by the fetus and neonate compared to adults and the strength of studies conducted make perinatal verbal autopsy an attractive medium-term alternative.

Conclusions

The most common method of assigning COD is the use of physician panels, which are costly and draw on scarce physician time. In this study, the COD assigned by nonphysicians agreed with the COD determined by physicians about 50% of the time, and only tetanus and cord prolapse met robustness criteria. Although it may be too early to recommend against using nonphysicians to determine perinatal COD, based on our data, we recommend that further research be performed before nonphysicians are asked to determine perinatal COD in any settings in low-income countries.

Funding

Funding was provided by grants from the National Institute of Child Health and Human Development (U01 HD 40636) and the Bill & Melinda Gates Foundation.

Acknowledgements

The authors would like to thank all the NICHD Global Network staff, community health workers, nurses, midwives, and physicians. Most of all, we thank the mothers and their babies.

Author details

1 Departments of Pediatrics and Maternal Child Health, University of North Carolina at Chapel Hill, North Carolina, USA.
2 Kinshasa School of Public Health, Kinshasa, Democratic Republic of Congo.
3 Department of Community Health Sciences, The Aga Khan University, Karachi, Pakistan.
4 IMSALUD/San Carlos University, Guatemala City, Guatemala.
5 Department of Pediatrics and Child Health, University Teaching Hospital, Lusaka, Zambia.
6 Research Triangle Institute, Durham, North Carolina, USA.
7 Institute of Nutrition for Central America and Panama (INCAP), Guatemala City, Guatemala.
8 College of Medicine, Drexel University, Philadelphia, Pennsylvania, USA.
9 Department of Pediatrics, University of Alabama at Birmingham, Alabama, USA.
10 Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA.

Authors’ contributions

CE, VT, EM, DW, CB, WC, RG and LL had significant intellectual input in the conception and design of the study, data acquisition and analyses, draft writing, and final approval of the study. AG, IJ, JD, MP, MM, EC, OP and AT had significant input in the design of this study, data acquisition and analyses, draft writing, and final approval of the study. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 24 March 2011 Accepted: 5 August 2011 Published: 5 August 2011

References

1. Lopez AD, Mathers CD: Measuring the global burden of disease and epidemiological transitions: 2002-2030. Ann Trop Med Parasitol 2006, 100:481-99.
2. Engmann C, Matendo R, Kinoshita R, et al: Stillbirth and early neonatal mortality in rural Central Africa. Int J Gynaecol Obstet 2009, 105:112-7.
3. Mathers CD, Fat DM, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bull World Health Organ 2005, 83:171-7.
4. Setel PW, Macfarlane SB, Szreter S, et al: A scandal of invisibility: making everyone count by counting everyone. Lancet 2007.
5. Lawn J, Shibuya K, Stein C: No cry at birth: global estimates of intrapartum stillbirths and intrapartum-related neonatal deaths. Bull World Health Organ 2005, 83:409-17.
6. Lawn JE, Osrin D, Adler A, Cousens S: Four million neonatal deaths: counting and attribution of cause of death. Paediatr Perinat Epidemiol 2008, 22:410-6.
7. Hill K, Lopez AD, Shibuya K, et al: Interim measures for meeting needs for health sector data: births, deaths, and causes of death. Lancet 2007, 370:1726-35.
8. Chandramohan D, Setel P, Quigley M: Effect of misclassification of causes of death in verbal autopsy: can it be adjusted? Int J Epidemiol 2001, 30:509-14.
9. Baiden F, Bawah A, Biai S, et al: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85:570-1.
10. Garenne M, Fauveau V: Potential and limits of verbal autopsies. Bull World Health Organ 2006, 84:164.
11. Byass P, Fottrell E, Dao LH, et al: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31.
12. Fantahun M, Fottrell E, Berhane Y, Wall S, Hogberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84:204-10.
13. Freeman JV, Christian P, Khatry SK, et al: Evaluation of neonatal verbal autopsy using physician review versus algorithm-based cause-of-death assignment in rural Nepal. Paediatr Perinat Epidemiol 2005, 19:323-31.
14. King G, Lu Y, Shibuya K: Designing verbal autopsy studies. Popul Health Metr 2010, 8:19.
15. Quigley MA, Chandramohan D, Setel P, Binka F, Rodrigues LC: Validity of data-derived algorithms for ascertaining causes of adult death in two African sites using verbal autopsy. Trop Med Int Health 2000, 5:33-9.
16. Chu K, Rosseel P, Gielis P, Ford N: Surgical task shifting in Sub-Saharan Africa. PLoS Med 2009, 6:e1000078.
17. Labhardt ND, Balo JR, Ndam M, Grimm JJ, Manga E: Task shifting to nonphysician clinicians for integrated management of hypertension and diabetes in rural Cameroon: a programme assessment at two years. BMC Health Serv Res 2010, 10:339.
18. Enweronu-Laryea C, Engmann C, Osafo A, Bose C: Evaluating the effectiveness of a strategy for teaching neonatal resuscitation in West Africa. Resuscitation 2009, 80:1308-11.
19. Engmann C, Jehan I, Ditekemena J, et al: Using verbal autopsy to ascertain perinatal cause of death: are trained nonphysicians adequate? Trop Med Int Health 2009, 14:1496-504.
20. Carlo WA, Goudar SS, Jehan I, et al: Newborn-care training and perinatal mortality in developing countries. N Engl J Med 2010, 362:614-23.
21. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-45.
22. Lee AC, Mullany LC, Tielsch JM, et al: Verbal autopsy methods to ascertain birth asphyxia deaths in a community-based setting in southern Nepal. Pediatrics 2008, 121:e1372-80.
23. Fottrell E, Byass P: Verbal autopsy: methods in transition. Epidemiol Rev 2010, 32:38-55.
24. Engmann C, Jehan I, Ditekemena J, et al: An alternative strategy for perinatal verbal autopsy coding: single versus multiple coders. Trop Med Int Health 2011, 16:18-29.
25. Agresti A: Categorical Data Analysis. New York: John Wiley & Sons; 1990.
26. Setel PW, Whiting DR, Hemed Y, et al: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Trop Med Int Health 2006, 11:681-96.
27. Chowdhury HR, Thompson SC, Ali M, Alam N, Yunus M, Streatfield PK: A comparison of physicians and medical assistants in interpreting verbal autopsy interviews for allocating cause of neonatal death in Matlab, Bangladesh: can medical assistants be considered an alternative to physicians? Popul Health Metr 2010, 8:23.



28. Joshi R, Kengne AP, Neal B: Methodological trends in studies based on verbal autopsies before and after published guidelines. Bull World Health Organ 2009, 87:678-82.
29. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7:e1000325.
30. Kahn K, Tollman SM, Garenne M, Gear JS: Validation and application of verbal autopsies in a rural area of South Africa. Trop Med Int Health 2000, 5:824-31.
31. Snow RW, Basto de Azevedo I, Forster D, et al: Maternal recall of symptoms associated with childhood deaths in rural east Africa. Int J Epidemiol 1993, 22:677-83.

doi:10.1186/1478-7954-9-42
Cite this article as: Engmann et al.: Classifying perinatal mortality using verbal autopsy: is there a role for nonphysicians? Population Health Metrics 2011 9:42.



Liu et al. Population Health Metrics 2011, 9:43 http://www.pophealthmetrics.com/content/9/1/43

RESEARCH

Open Access

Trends in causes of death among children under 5 in Bangladesh, 1993-2004: an exercise applying a standardized computer algorithm to assign causes of death using verbal autopsy data

Li Liu1*, Qingfeng Li2, Rose A Lee1, Ingrid K Friberg1, Jamie Perin1, Neff Walker1 and Robert E Black1

Abstract Background: Trends in the causes of child mortality serve as important global health information to guide efforts to improve child survival. With child mortality declining in Bangladesh, the distribution of causes of death also changes. The three verbal autopsy (VA) studies conducted with the Bangladesh Demographic and Health Surveys provide a unique opportunity to study these changes in child causes of death. Methods: To ensure comparability of these trends, we developed a standardized algorithm to assign causes of death using symptoms collected through the VA studies. The original algorithms applied were systematically reviewed and key differences in cause categorization, hierarchy, case definition, and the amount of data collected were compared to inform the development of the standardized algorithm. Based primarily on the 2004 cause categorization and hierarchy, the standardized algorithm guarantees comparability of the trends by only including symptom data commonly available across all three studies. Results: Between 1993 and 2004, pneumonia remained the leading cause of death in Bangladesh, contributing to 24% to 33% of deaths among children under 5. The proportion of neonatal mortality increased significantly from 36% (uncertainty range [UR]: 31%-41%) to 56% (49%-62%) during the same period. The cause-specific mortality fractions due to birth asphyxia/birth injury and prematurity/low birth weight (LBW) increased steadily, with both rising from 3% (2%-5%) to 13% (10%-17%) and 10% (7%-15%), respectively. The cause-specific mortality rates decreased significantly due to neonatal tetanus and several postneonatal causes (tetanus: from 7 [4-11] to 2 [0.4-4] per 1,000 live births (LB); pneumonia: from 26 [20-33] to 15 [11-20] per 1,000 LB; diarrhea: from 12 [8-17] to 4 [2-7] per 1,000 LB; measles: from 5 [2-8] to 0.2 [0-0.7] per 1,000 LB; injury: from 11 [7-17] to 3 [1-5] per 1,000 LB; and malnutrition: from 9 [6-13] to 5 [2-7]). 
Conclusions: Pneumonia remained the top killer of children under 5 in Bangladesh between 1993 and 2004. The increasing importance of neonatal survival is highlighted by the growing contribution of neonatal deaths and several neonatal causes. Notwithstanding the limitations, standardized computer-based algorithms remain a promising tool to generate comparable causes of child death using VA data.

* Correspondence: liliu@jhsph.edu
1 Department of International Health, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205, USA
Full list of author information is available at the end of the article

© 2011 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

Trends in the causes of child mortality serve as important global health information to guide efforts to improve child survival [1,2]. For a low- or middle-income country (LMIC) like Bangladesh, these indicators are particularly important for assisting child health policy development and scarce resource allocation. Child mortality rates are declining in many countries [3,4]. In Bangladesh, the under-5 mortality rate (U5MR) decreased from 148 to 52 deaths per 1,000 live births (LB) between 1990 and 2009 [4]. During the same period of time, the neonatal mortality rate also dropped from 58 to 30 deaths per 1,000 LB. As a result, neonatal deaths contributed 57% of all under-5 deaths in 2009, compared to only 39% two decades earlier. Accompanying this steady decline in child mortality is a changing distribution of child causes of deaths [1,5], which has not been well described previously. The three verbal autopsy (VA) studies conducted with the Bangladesh Demographic and Health Surveys (BDHS) provide a unique opportunity to fill this gap [1,6,7]. The goal of the current study is to apply data from the three surveys to generate nationally representative empirical trends of child causes of death. Specifically, we aim to first develop a standardized computer algorithm to assign causes of death and then to apply the standardized algorithm to generate comparable estimates over time for causes of child death.

Methods

Data source and datasets

The three VA studies were conducted in 1993-1994, 1996-1997, and 2004, following the corresponding BDHS. Details of the design and implementation of each study were published elsewhere [6-8]. Briefly, all three BDHS have a multistage stratified and clustered sampling design, with a total sample size of 9,174; 8,682; and 10,500 households and 12,924 (9,174 women and 3,284 men); 12,473 (9,127 women and 3,346 men); and 15,627 (11,330 women and 4,297 men) individuals for 1993-1994, 1996-1997, and 2004, respectively. Households with under-5 deaths in the past five years were revisited for the VA studies after the main DHS was conducted. The VA interviews were performed after the completion of the 1993-1994 and 1996-1997 DHS, but were done within one day of identifying the eligible households in 2004. Standardized VA questionnaires, also including a narrative of the conditions regarding the fatal illness, were administered to primary caretakers of deceased children. Only data collected through the structured questionnaires were analyzed here. Information on 828, 678, and 587 under-5 deaths was collected in 1993-1994, 1996-1997, and 2004, respectively, which corresponds to 91%, 94%, and 99% of the eligible under-5 deaths in the three surveys.

We obtained the VA instruments and datasets from the three studies with help from colleagues at the Johns Hopkins Bloomberg School of Public Health and DHS at Inner City Fund (ICF) Macro. The 1993-1994 and 1996-1997 VA studies are nearly identical in design, but both differ from the 2004 study in many aspects. Some of the differences are summarized elsewhere [7]. In general, more symptom data were collected in 2004. In addition, different algorithms were applied when assigning causes of death in each study. In order to generate comparable trends in child causes of death, a standardized algorithm that was compatible with all three studies was developed.

Development of the standardized algorithm

Computer-based algorithms following a hierarchical process were originally applied in all three VA studies [6-8]. We chose to develop the standardized algorithm in a similar fashion. During the development, the original algorithms applied in each study were systematically reviewed. Four aspects of the algorithms - cause categorization, hierarchy, case definition, and amount of information collected - were compared and considered in the process.

Cause categorization

As shown in Table 1, the same categorization was used for many causes across the three studies, including injury, neonatal tetanus, measles only, measles followed by acute respiratory infections (ARI)/diarrhea, ARI (note that the terms ARI and pneumonia are used interchangeably in this paper), malnutrition, and unidentified causes. For most of the other cause categories, the differences were trivial. For example, possible ARI, possible diarrhea, and possible ARI and diarrhea appeared as individual causes previously, but were combined with other possible serious infections in the last study. However, one important difference exists between cause categorizations across studies. In the first study, deaths occurring in the first three days of life except congenital abnormality and prematurity were collapsed to form the category of “early perinatal deaths” [8]. In the second study, all deaths occurring in the first three days, including those due to congenital abnormality, prematurity, and complications of delivery, were included in the category of “early neonatal or pregnancy/delivery related” deaths [6]. These collapsed categories masked the relative importance of individual neonatal causes. In fact, information was collected and available in both studies to assign the specific neonatal causes, but was not originally used to classify subcategories.


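Table 1 Cause categories applied in the three VA studies in Bangladesh

1993-1994                        1996-1997                        2004
Injury                           Injury                           Injury
Neonatal tetanus                 Neonatal tetanus                 Neonatal tetanus
Measles only                     Measles only                     Measles only
Measles followed by              Measles followed by              Measles followed by
  ARI/diarrhea                     ARI/diarrhea                     ARI/diarrhea
ARI                              ARI                              ARI
Watery diarrhea                  Watery diarrhea                  Diarrhea
Persistent diarrhea              Persistent diarrhea
Dysentery                        Dysentery
ARI and watery diarrhea          ARI and watery diarrhea          ARI and diarrhea
ARI and persistent diarrhea      ARI and persistent diarrhea
ARI and dysentery                ARI and dysentery
Congenital abnormality           Early neonatal or                Congenital abnormality
Prematurity                        pregnancy/delivery related     Premature birth/LBW
Early perinatal                                                   Birth asphyxia
                                                                  Birth injury
Possible ARI                     Possible ARI                     Other possible serious
Possible diarrhea                Possible diarrhea                  infections
Possible ARI and diarrhea        Possible ARI and diarrhea
Malnutrition                     Malnutrition                     Malnutrition
Not identified                   Not identified                   Not identified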

Case definition

The definitions for each cause, or the case definitions, are similar across the three studies (see Additional file 1 for details), but are slightly different for a number of causes. For example, neonatal tetanus was defined in the first two studies as deaths occurring between 4 and 14 days of life, with convulsions, and where the baby either cried normally after birth but stopped crying in the final illness or suckled normally after birth but stopped suckling in the final illness or both. In 2004, the case definition included one additional condition: i.e., the symptoms of “cried normally after birth but stopped crying” and “suckled normally after birth but stopped suckling” needed to happen not only in the terminal stage of illness, but also at least one day before the final illness. Other differences in case definitions are underscored in Additional file 1 where applicable.

Hierarchy

The hierarchies applied in the first two VA studies are nearly identical [6,7], but are quite different from the one used in 2004 (Figure 1, left and middle columns). As discussed above, one important difference is the classification of early neonatal deaths in the first two studies. Because different cause categorizations were applied, the total number of tiers was different in the hierarchies. The same causes could also be given different priorities and assigned in different tiers. For example, neonatal tetanus was assigned in the third tier in the first two studies but in the first tier in 2004. Prematurity/LBW was assigned before possible pneumonia/diarrhea initially, but was moved to be assigned after other possible serious infections in 2004.

Amount of information collected

Even if the same case definitions were applied across the three studies, the causes of deaths would not always be comparable due to the fact that different amounts of information were collected in each study. For example, the definition of other possible serious infections entails having two or more signs of serious infections. However, a different number of signs of serious infections were collected in each study. Specifically, five signs of serious infections (stopped suckling, difficult breathing, chest indrawing, convulsions, and fever) were collected in all three studies, with one more sign (stopped crying) collected in 1996-1997 and 2004, and 12 additional signs (rapid breathing, cold to touch, lethargic, unresponsive or unconscious, bulging fontanels, redness or drainage from the umbilical cord stump, skin rash with bumps containing pus, vomiting everything, stopped being able to grasp, stopped being able to respond to a voice, stopped being able to follow movements with the eyes, and stiff neck) collected in 2004 alone. As a result, a larger proportion of other possible serious infections would have been assigned in 2004 because more signs were available in that year.


Figure 1 Hierarchies applied in the three Bangladesh VA studies and the standardized hierarchy.
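A hierarchical assignment of this kind can be sketched as an ordered list of case definitions in which the first definition a death satisfies determines its cause. The code below is a minimal illustration, not the algorithm used in the paper: the field names are hypothetical, only two tiers are shown, and their order is chosen for demonstration rather than taken from Figure 1. The two rules follow the text: the neonatal tetanus definition from the first two studies, and other possible serious infections defined as two or more of the five signs collected in all three studies.

```python
from typing import Any, Callable, Dict, List, Tuple

Death = Dict[str, Any]  # one VA record; keys are hypothetical field names

# The five signs of serious infection collected in all three studies.
COMMON_SIGNS = (
    "stopped_suckling",
    "difficult_breathing",
    "chest_indrawing",
    "convulsions",
    "fever",
)


def is_neonatal_tetanus(d: Death) -> bool:
    """Definition from the first two studies: death at 4 to 14 days of life,
    with convulsions, and normal crying or suckling that later stopped."""
    return (
        4 <= d.get("age_days", -1) <= 14
        and bool(d.get("convulsions"))
        and (bool(d.get("cried_then_stopped")) or bool(d.get("suckled_then_stopped")))
    )


def is_other_possible_serious_infection(d: Death) -> bool:
    """Two or more of the five commonly available signs."""
    return sum(bool(d.get(sign)) for sign in COMMON_SIGNS) >= 2


# Illustrative two-tier hierarchy; earlier tiers take priority.
HIERARCHY: List[Tuple[str, Callable[[Death], bool]]] = [
    ("neonatal tetanus", is_neonatal_tetanus),
    ("other possible serious infections", is_other_possible_serious_infection),
]


def assign_cause(d: Death) -> str:
    """Return the first cause in the hierarchy whose case definition is met."""
    for cause, rule in HIERARCHY:
        if rule(d):
            return cause
    return "unspecified"


death = {"age_days": 7, "convulsions": True, "suckled_then_stopped": True, "fever": True}
print(assign_cause(death))
```

The example record satisfies both definitions (convulsions and fever are two of the five signs), so the tier order alone decides that it is assigned to neonatal tetanus. This is why moving a cause between tiers, as happened between the earlier studies and 2004, can change the resulting cause distribution.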


The standardized algorithm

After systematically reviewing the original algorithms, we consider the 2004 cause categorization and hierarchy to be more logical, and developed the standardized algorithm mainly based on them. Even though some specific neonatal causes, such as birth asphyxia, were not assigned in the two earlier studies, relevant signs and symptoms were collected. This allowed us to further classify the causes of most cases originally assigned as early neonatal deaths. The final cause categories are in principle consistent with those adopted by the Child Health Epidemiology Reference Group (CHERG) [1,2,9]. Specifically, results for the following causes are presented: pneumonia, diarrhea, prematurity/LBW, birth asphyxia/birth injury, congenital abnormalities, neonatal tetanus, measles, injury, possible serious infections, malnutrition, and unspecified causes. It is noted that drowning, as an important cause among children aged 1 to 4 years, is included in injury.

We also incorporated into the final hierarchy certain tiers from the 1993-1994 and 1996-1997 studies, specifically, possible pneumonia and possible diarrhea, and considered them to be meaningfully different from the remaining other possible serious infections. We promoted prematurity/LBW to be classified before the remaining other possible serious infections. The final hierarchy is shown in Figure 1 (the right column), which indicates the specific steps that were taken to assign deaths by cause in this study. We also attempted to assign meningitis in the hierarchy by applying the case definitions borrowed from the 1999 World Health Organization VA monograph [10], but were not successful due to the lack of the necessary signs of serious infections in the first two studies.

Given that our objective was to generate comparable trends in the causes of under-5 deaths, we decided to adopt case definitions that could be employed in all three studies. For example, we chose to apply the case definition of neonatal tetanus from the first two studies and ignore the additional information available in 2004 despite the fact that the 2004 definition may have higher specificity. Similarly, when deciding how much information to apply in the standardized algorithm, to serve our purpose of maintaining comparability, we chose to only include the amount of information that was available across all studies. In the case of other possible serious infections, only the five commonly available signs of serious infections were employed in the algorithm.

Some causes could occur both among neonates and children from 1 to 59 months old, such as diarrhea, possible diarrhea, ARI, possible ARI, and other possible serious infections. They were assigned in the same tier, but separately for the two age groups applying somewhat different case definitions (see Additional file 1 for details). Measles, diarrhea, and ARI were assigned at the same time, which allowed any two or all three causes to be assigned. The resulting cause categories include measles only, measles followed by ARI or diarrhea, diarrhea only, pneumonia only, and ARI and diarrhea [6-8]. When presenting the final causes, measles and measles followed by ARI or diarrhea were combined as measles. ARI and diarrhea were redistributed among ARI only and diarrhea only according to their relative importance as assigned single causes. Similar redistribution was done for possible pneumonia and possible diarrhea. Then possible pneumonia and possible diarrhea as two causes were grouped into pneumonia and diarrhea, respectively, to form the final categories of pneumonia and diarrhea. The standardized algorithm was reviewed by members of the CHERG before submission for publication.

Sensitivity analysis and uncertainty estimation

After the standardized algorithm was finalized, it was applied to the three VA studies to obtain cause-specific fractions. The cause fractions were weighted to take into account oversampling of certain subgroups in the BDHS [6-8]. To reduce the influence of the hierarchy, we conducted a sensitivity analysis that first assigned causes allowing multiple diagnoses without following a hierarchy. Then, among deaths with multiple causes, we applied the standardized hierarchy to determine a single cause. To directly compare trends in each cause across time, cause-specific mortality rates (CSMR) were calculated by multiplying cause-specific fractions by U5MR. Estimates of U5MR are available from multiple sources for Bangladesh, including the DHS [7,11,12], the Interagency Group for Child Mortality Estimation (IGME) [4], and the Institute for Health Metrics and Evaluation (IHME) [3]. The three sets of U5MR are generally similar, but larger discrepancies were observed in 2004. For 1993-1994 and 1996-1997, U5MR were interpolated for the midpoints between the two years in all three series. All three sets of U5MR were applied to calculate CSMR, and the results were not qualitatively different. The IGME series is presented here for demonstration since it ranked in the middle of the three sets of estimates. When producing uncertainty estimates, to take into account the complex survey design of the studies, a bootstrap sample of clustered data was taken for each DHS survey within strata, where strata contained multiple primary sampling units. Strata with only one primary sampling unit contributing cause of death data were treated interchangeably, or as if they were in a single stratum. Because deaths were only observed at the household or secondary sampling unit, this led to a substantial estimated variability in the cause of death



Liu et al. Population Health Metrics 2011, 9:43 http://www.pophealthmetrics.com/content/9/1/43


Figure 2 Cause-specific fraction of deaths among children under 5 years of age in Bangladesh, 1993-1994, 1996-1997, and 2004.

Neonatal pneumonia remained one of the most important causes across time, contributing to 9% (6%-12%), 8% (5%-11%), and 12% (9%-17%) of under-5 deaths in 1993-1994, 1996-1997, and 2004, although the change is not statistically significant. The relative importance of birth asphyxia/birth injury increased steadily and significantly, claiming 3% (2%-5%), 6% (4%-9%), and 13% (10%-17%) of deaths among children younger than 5 in the three studies. Similarly, the proportion of deaths due to prematurity/LBW increased significantly from 3% (2%-5%) and 4% (2%-7%) to 10% (7%-15%) during the same period of time. Among children aged 1 to 59 months, pneumonia remained the top killer over the decade, claiming 16% to 21% (1993-1994: 20% [16%-25%]; 1996-1997: 16% [12%-20%]; 2004: 21% [16%-26%]) of lives of children under 5 in Bangladesh. The contribution of diarrhea decreased from 10% (6%-13%) and 11% (7%-14%) in the first two studies to 6% (3%-9%) in 2004. The cause-specific fraction of measles declined from 4% (2%-6%) to 0.2% (0%-0.9%) in 1993-2004. Injury and malnutrition also posed major threats to child survival, contributing 4% to 9% (1993-1994: 9% [5%-13%]; 1996-1997: 7% [4%-9%]; 2004: 4% [2%-8%]) and 6% to 8% (1993-1994: 7%

estimates. This uncertainty was used as a factor together with the uncertainty of the U5MR [4] to determine the total uncertainty in the CSMR. Uncertainty ranges (UR) were defined as the 2.5 to 97.5 percentiles. A two-sample bootstrap test was implemented to examine whether the differences were statistically significant between the 1993-1994 and 2004 studies. The analyses were conducted using STATA 10.0 Special Edition [13] and R [14].
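The resampling scheme described above (whole primary sampling units drawn with replacement within strata, single-unit strata pooled into one stratum, and uncertainty ranges taken as the 2.5 to 97.5 percentiles) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' STATA or R code: survey weights, the comorbidity reallocation, and the U5MR uncertainty are all omitted, every function name is hypothetical, and the toy data are invented.

```python
import random
import statistics


def merge_singleton_strata(psus_by_stratum):
    """Pool strata that contain a single PSU into one pseudo-stratum."""
    merged, pooled = {}, []
    for stratum, psus in psus_by_stratum.items():
        if len(psus) > 1:
            merged[stratum] = list(psus)
        else:
            pooled.extend(psus)
    if pooled:
        merged["pooled_singletons"] = pooled
    return merged


def cause_fraction(psus, cause):
    """Unweighted share of deaths attributed to `cause` across the PSUs."""
    deaths = [c for psu in psus for c in psu]
    return sum(1 for c in deaths if c == cause) / len(deaths)


def bootstrap_ur(psus_by_stratum, cause, n_boot=1000, seed=0):
    """Percentile uncertainty range (2.5th, 97.5th) for a cause fraction."""
    rng = random.Random(seed)
    strata = merge_singleton_strata(psus_by_stratum)
    replicates = []
    for _ in range(n_boot):
        sample = []
        for psus in strata.values():
            # Resample whole PSUs with replacement within each stratum.
            sample.extend(rng.choices(psus, k=len(psus)))
        replicates.append(cause_fraction(sample, cause))
    # With n=40 the cut points fall every 2.5 percentage points.
    cuts = statistics.quantiles(replicates, n=40)
    return cuts[0], cuts[-1]


# Invented toy data: each PSU is a list of assigned causes (one per death).
toy = {
    "stratum_A": [["pneumonia", "diarrhea"], ["pneumonia"], ["injury", "pneumonia"]],
    "stratum_B": [["diarrhea"]],             # single PSU
    "stratum_C": [["pneumonia", "injury"]],  # single PSU
}
lo, hi = bootstrap_ur(toy, "pneumonia", n_boot=200, seed=1)
print(lo, hi)
```

Resampling whole units rather than individual deaths keeps the within-cluster correlation in each replicate, which is why the resulting ranges are wider than a simple random sample would suggest.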

Results

Trends in the cause-specific fractions

In Bangladesh, the under-5 mortality rate dropped from 128 deaths per 1,000 live births (LB) in 1993-1994 to 110 in 1996-1997, and then to 70 per 1,000 LB in 2004 [4]. During the same time period, the neonatal mortality rate (NMR) declined from 53 to 49 and then to 36 deaths per 1,000 LB. Corresponding to the reduction in under-5 mortality, the proportion of neonatal deaths increased significantly from 36% (UR: 31%-41%) to 41% (37%-46%) and then to 56% (49%-62%) in the three VA studies (Figure 2, also refer to Additional file 2 for a complete list of uncertainty estimates).
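As a worked illustration of how a cause-specific mortality rate follows from a cause-specific fraction and the U5MR, the short Python sketch below multiplies the two. The U5MR values are those quoted above, and the pneumonia fractions are the rounded all-under-5 values reported in this article; the function name `csmr` is hypothetical. Because the inputs are rounded, the products only approximate the published rates (for example, 38 and 23 deaths per 1,000 LB for pneumonia in 1993-1994 and 2004).

```python
def csmr(cause_fraction, u5mr):
    """Cause-specific mortality rate per 1,000 live births."""
    return cause_fraction * u5mr


# U5MR per 1,000 live births and rounded pneumonia fractions from the text.
u5mr = {"1993-1994": 128, "1996-1997": 110, "2004": 70}
pneumonia_fraction = {"1993-1994": 0.29, "1996-1997": 0.24, "2004": 0.33}

for period in u5mr:
    rate = csmr(pneumonia_fraction[period], u5mr[period])
    print(period, round(rate, 1))
```

The example also shows why a cause can gain in proportional importance while its rate falls: the pneumonia fraction is higher in 2004 than in 1993-1994, yet the rate is lower because the U5MR fell by almost half.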


from 9 (6-13) and 19 (14-24) to 5 (2-7) and 4 (2-6) between 1993-1994 and 2004, respectively. Among children younger than 5, the cause-specific mortality rates of pneumonia and diarrhea both dropped significantly, from 38 (31-46) to 23 (18-29) per 1,000 LB for pneumonia and from 14 (9-19) to 5 (2-7) per 1,000 LB for diarrhea. The top three causes changed from pneumonia, diarrhea, and injury in 1993-1994 (CSMR: 38 [31-46], 14 [9-19], and 11 [7-17] deaths per 1,000 LB, respectively), to pneumonia, birth asphyxia/birth injury, and prematurity in 2004 (CSMR: 23 [18-29], 9 [6-13] and 7 [4-11] deaths per 1,000 LB, respectively). Our results are replicated in the sensitivity analysis, where deaths were first assigned to all possible causes without applying a hierarchy, and then multiple causes were assigned applying the standardized hierarchy. Most of the trends in CSMR are not statistically significant among neonates during the 10-year period, except those of birth asphyxia/birth injury and neonatal tetanus. However, postneonatal causes excluding other possible serious infections all saw significant declines in CSMR.
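The two-step procedure used in the sensitivity analysis (flag every cause whose case definition is met, then reduce multiple diagnoses to a single cause with a hierarchy) can be sketched as below. This is only an illustration: the tier order in `HIERARCHY` and the symptom rules in `CASE_DEFINITIONS` are hypothetical placeholders, not the hierarchy of Figure 1 or the case definitions of Additional file 1.

```python
# Illustrative tier order only; NOT the hierarchy shown in Figure 1.
HIERARCHY = ["injury", "measles", "diarrhea", "pneumonia", "malnutrition"]

# Toy case definitions with invented symptom flags; NOT those of Additional file 1.
CASE_DEFINITIONS = {
    "injury": lambda d: d.get("accident", False),
    "measles": lambda d: d.get("rash", False) and d.get("fever_days", 0) >= 3,
    "diarrhea": lambda d: d.get("loose_stools", False),
    "pneumonia": lambda d: d.get("cough", False) and d.get("fast_breathing", False),
    "malnutrition": lambda d: d.get("very_thin", False),
}


def assign_all_causes(death, case_definitions):
    """Step 1: flag every cause whose case definition is met (no hierarchy)."""
    return [cause for cause, rule in case_definitions.items() if rule(death)]


def apply_hierarchy(causes, hierarchy=HIERARCHY):
    """Step 2: keep the highest-ranked flagged cause, else 'unspecified'."""
    for cause in hierarchy:
        if cause in causes:
            return cause
    return "unspecified"


# One hypothetical death record with two diagnoses.
death = {"cough": True, "fast_breathing": True, "very_thin": True}
flagged = assign_all_causes(death, CASE_DEFINITIONS)
print(flagged)
print(apply_hierarchy(flagged))
print(apply_hierarchy(flagged, hierarchy=["malnutrition", "pneumonia"]))
```

Deaths with one flagged cause, or none, are unaffected by the tier order; only records with multiple diagnoses can change cause when tiers are reordered, as the last call shows.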

[5%-10%]; 1996-1997: 8% [5%-11%]; 2004: 6% [3%-10%]) of total under-5 deaths, respectively. None of the changes in cause-specific fractions are significant among 1- to 59-month-olds except those due to measles and injury. Among all children under 5, pneumonia was responsible for 24% to 33% (1993-1994: 29% [25%-34%]; 1996-1997: 24% [20%-29%]; 2004: 33% [28%-39%]) of deaths. Diarrhea was the second most important cause in 1993-1994 and 1996-1997, accounting for 11% (8%-14%) to 13% (9%-16%) of under-5 deaths. Birth asphyxia took over as the second most important cause in 2004, claiming 13% (10%-17%) of all under-5 deaths.

Trends in the cause-specific mortality rates

CSMRs and their uncertainty ranges are presented and compared in Figure 3 (also refer to Additional file 3 for the numeric values of these estimates). Among neonatal causes, the mortality rates of all causes dropped over the period with two exceptions: birth asphyxia/birth injury and prematurity/LBW. Birth asphyxia/birth injury increased from 4 (2-7) deaths per 1,000 live births (LB) in 1993-1994 to 7 (4-10) in 1996-1997 and then to 9 (6-13) per 1,000 LB in 2004. Prematurity increased from 4 (2-7) and 5 (2-8) deaths per 1,000 LB in the first two studies to 7 (4-11) deaths per 1,000 LB in 2004. The increase in birth asphyxia/birth injury between 1993-1994 and 2004 was statistically significant, but the increase was not significant for prematurity despite the significant increase in its cause-specific fraction. Among declining causes, tetanus had the steepest drop between 1996-1997 and 2004, decreasing significantly from 7 (4-11) to 2 (0.4-4) per 1,000 LB. Other neonatal causes, such as pneumonia, diarrhea, and congenital abnormality, all showed a moderate decline in cause-specific mortality rates over the 10-year period, but the decline did not reach statistical significance. Among deaths in the 1- to 59-month age group, the mortality rates of all of the causes dropped between 1993 and 2004, except for a slight increase in other possible serious infections, rising from 0 in 1993-1994 to 1 (0.1-2) per 1,000 LB in 2004 (Figure 3). The declines in cause-specific mortality rates for all other causes were significant. Pneumonia showed the fastest decline between 1993-1994 and 1996-1997, but declined more slowly between 1996-1997 and 2004, dropping from 26 (20-33) to 15 (11-20) deaths per 1,000 LB between 1993 and 2004. Diarrhea showed the opposite pattern, staying relatively constant between the first two studies but dropping quickly and significantly afterwards, declining from 12 (8-17) to 4 (2-7) deaths per 1,000 LB. 
The mortality rates of measles and injury decreased steadily and significantly from 5 (2-8) and 11 (7-17) to 0.2 (0-0.7) and 3 (1-5) per 1,000 LB, respectively. Malnutrition and unspecified causes dropped

Discussion

This study developed and applied a standardized computer-based algorithm to assign child causes of deaths using nationally representative verbal autopsy (VA) data in Bangladesh. The results provide distinctive insights into patterns and trends of child causes of death at the national level for one decade. The empirical trends are of particular interest as child causes of death are often modeled for most LMICs [1,2]. Our results corroborate the previous finding that pneumonia remains the top ranking cause of death among children below 5 years of age [1], despite the fact that the mortality rates of pneumonia had been declining significantly in Bangladesh. The proportional contribution of neonatal causes increased steadily during the study period. The proportion of under-5 deaths due to birth asphyxia/birth injury and prematurity/LBW rose progressively and significantly. The increase in their CSMR was also noticeable, although only the increase in the CSMR of birth asphyxia/birth injury reached statistical significance, possibly due to the competing significant reduction in the U5MR [4]. Likely for a similar reason, the cause-specific fraction and the CSMR of neonatal tetanus both decreased, although only the reduction in CSMR was statistically significant. The composition of the top three ranking causes transitioned from including no neonatal causes to including two neonatal causes, which signifies again the increasing relative importance of neonatal mortality as the under-5 mortality rate continues to decrease [15]. The CSMR of all postneonatal causes dropped significantly between 1993-1994 and 2004 except for other


(A) Neonatal causes

(B) Postneonatal causes

Figure 3 Cause-specific mortality rate of children under 5 years of age in Bangladesh, 1993-1994, 1996-1997, and 2004 (* indicates the change was statistically significant between 1993-1994 and 2004).


that possible infections were likely to be true cases of pneumonia/diarrhea with less severe symptoms. Second, after combining the possible diagnoses with the confirmed causes, the time trends were more stable. This practice is also consistent with previous approaches taken by CHERG [1,2]. However, the resulting trends in pneumonia need to be interpreted with caution. Further investigation shows that possible pneumonia is responsible for 12% to 26% of the deaths coded as pneumonia among children aged 1 to 59 months but 59% to 71% of the deaths coded as pneumonia among neonates. We suspect that the high proportion of neonatal possible pneumonia could be due to the inclusion of other serious infections, such as neonatal sepsis. As is a common caveat of VA studies [17], additional misclassification errors may also exist in our results. Deaths with comorbid diarrhea and pneumonia were reallocated between the two causes based on their relative importance in this study. Alternatively, cases with this type of comorbidity can be treated solely as pneumonia deaths. We took the former approach based on biological and medical considerations. However, more research is clearly needed to determine which method is more appropriate. In fact, CHERG has an ongoing activity to examine the comorbidity patterns between pneumonia and diarrhea, which may contribute some knowledge to this area in the near future. We estimated that malnutrition was responsible for 6% to 8% of under-5 deaths in Bangladesh in the study period. However, malnutrition could contribute to additional under-5 deaths as a risk factor [18]. In our sensitivity analysis where multiple causes were initially allowed, we found that comorbidity between malnutrition and any of the four infectious causes (diarrhea, ARI, measles, and other serious infections) existed in 42% to 55% of these deaths. Therefore, the estimated contribution of malnutrition to under-5 deaths needs to be interpreted carefully. 
Several limitations are acknowledged regarding the present study. First, despite our attempts, several important causes, such as meningitis and neonatal sepsis, were not classified due to a lack of necessary symptom data in the two earlier studies. Moreover, unspecified causes contributed to 18% to 25% of under-5 deaths in Bangladesh between 1993 and 2004. It is suggested that more symptom data, especially relating to serious infections, should be routinely collected in future VA studies to facilitate ascertainment of additional causes. The case definitions used herein were validated in Bangladesh, Nicaragua, and Uganda, and their validity has been shown to be reasonably good [10,19]. Similar hierarchies, though not validated, have been applied to the three Bangladesh VA studies originally and to a VA study conducted in parallel with the 2006 Nepal DHS

possible serious infections. Some of these changes, such as those in measles and diarrhea, may be partly explained by the increase in the coverage of measles vaccine and oral rehydration salts [7,11,12]. Other changes, however, have a less obvious association with changes in intervention coverage. For example, despite the significant decrease in the mortality rates of postneonatal pneumonia, the case management of pneumonia did not seem to improve [7,11,12], although access to high-quality, low-cost antibiotics is suggested to have been increasing during this period. Several approaches are in widespread use to estimate the variability of population-level estimates for complex multistage sample survey data. The Jackknife and Balanced Repeated Replication (BRR) methods need multiple primary sampling units per sampling stratum [16]. These methods are both based on resampling of sample survey data. In this study, where each DHS sample had many levels of stratification, there were multiple strata where only one sampling unit was informative concerning the causes of death for children under 5. Resampling can accommodate stratification and clustering in multistage sampling data, and a more general approach relative to the Jackknife and BRR was used in our case, where an additional complexity was introduced due to the cause reallocation from the comorbidity of pneumonia and diarrhea to each individual cause as a nonlinear function of population estimates. We developed a standardized algorithm to assign child causes of deaths using signs and symptoms collected through the VA studies to ensure comparability of the trends. The original algorithms applied in each study were systematically reviewed and key differences in cause categorization, hierarchy, case definition, and amount of data collected were compared to inform the development of the standardized algorithm. 
The standardized algorithm, primarily based on the 2004 cause categories and hierarchy, guarantees comparability of the trends by only including information commonly available across all studies. However, this was realized at the expense of losing additional information collected in the later studies. When determining the cause categories, instead of mapping the causes to specific codes based on the ICD-10, we employed the CHERG cause categorization, which is in principle consistent with the ICD-10 rules. However, the CHERG classification does emphasize the relative public health importance of major child causes and encourages linkage with the relevant health interventions [9]. When presenting the final causes, we chose to combine possible pneumonia and possible diarrhea with pneumonia and diarrhea, respectively, for several reasons. First, based on the case definitions, we considered


VA component in their future DHSs. In addition, cause of death data collected through other sources, such as the International Network for the Demographic Evaluation of Populations and Their Health in Developing Countries (INDEPTH) and the Mozambique post-census survey applying the Sample Vital Registration using Verbal Autopsy (SAVVY) methodology, could all contribute to a better empirical understanding of cause of death [23]. Often, however, the heterogeneity in cause of death estimation could also be due to differences in methods for assigning or ascertaining causes [2,23]. Standardized algorithms are becoming an indispensable tool to generate child causes of death estimates that are comparable across time, countries, and settings [24]. Computer-based algorithms have the advantage of being objective, feasible, and affordable, and may better serve the needs of LMICs as compared to physician review. The current exercise is among the efforts to develop such standardized tools with the hope that more research of this type will be stimulated. The resulting trends in child causes of death also provide a platform to link with trends in intervention coverage in Bangladesh. Close examination of these linkages, using packages like the Lives Saved Tool (LiST) [25], would help us better understand what drives the changes in cause-specific mortality and identify mechanisms that work in the context of Bangladesh to reduce child mortality. These successful experiences or lessons learned can be shared with other countries to help accelerate their progress toward Millennium Development Goal 4.

[20]. Our standardized hierarchy, however, has the inherent limitations of any hierarchical process. In particular, the resulting cause-specific fractions are quite sensitive to the tier in which the causes were assigned [21,22]. For example, for comparison purposes, we assigned prematurity/LBW after other possible serious infections. The prematurity fractions were reduced by about 30% across all three studies. However, the hierarchy does not affect all deaths in the studies. In fact, our sensitivity analysis reveals that more than half (52%-54%) of the deaths were either due to a single cause or considered as belonging to “unspecified causes”. Their cause fractions will remain the same irrespective of the application of hierarchies. Among the other half of the deaths with multiple diagnoses, many deaths could have been assigned to the same causes during medical certification. Further validation research, such as the ongoing Grand Challenges in Global Health Initiative #13 Study done by the Population Health Metrics Research Consortium (PHMRC), may help determine a better computer-based algorithm. However, preliminary results of the PHMRC show that infectious causes, such as pneumonia, are still among the few causes that are hard to assign even after applying advanced machine-learning methods. Whether these validated algorithms can be readily applied to secondary data is questionable since only limited information on signs and symptoms is available. In addition, the external validity of these algorithms in other countries and settings still needs to be shown. Limited by the validity of the algorithm, the absolute level of the estimated cause-specific mortality rates may not be accurate. In other words, our uncertainty ranges did not take into account the unknown uncertainty in the cause of death assigning process. 
But with the employment of the standardized algorithm over the three datasets, the trends in the cause-specific mortality rates should be reasonably reliable. The uncertainties of the trends have been quantified by incorporating known uncertainties in the complex survey design and the U5MR. The values of these trends are further appreciated considering the fact that they originated from nationally representative empirical data. In the absence of a more reliable alternative, such information should start to be utilized to facilitate child health policymaking and resource allocation. In the past, few empirical data on child causes of deaths have been available in LMICs, but recently more data are being collected and shared. Bangladesh is among a number of countries that have had at least one DHS with the VA module, and more countries are either planning on or considering including the

Conclusions

Despite a declining trend in the cause-specific mortality rate, pneumonia remained the top killer of children under 5 in Bangladesh from 1993 to 2004. The increasing importance of neonatal survival is highlighted by the growing contribution of neonatal deaths and several neonatal causes. Neonatal tetanus, birth asphyxia/birth injury, postneonatal pneumonia, diarrhea, measles, injury, and malnutrition all saw significant reductions in mortality rates, the driving factors of which still need to be better understood. Notwithstanding the limitations, standardized computer-based algorithms remain a promising tool to generate comparable child causes of death using VA data. The tool should prove particularly useful in places where routine medical certification is impractical yet information on the causes of death is regularly requested to improve child-survival planning. Repeated VA studies employing standardized instruments and algorithms would be especially appreciated for tracking trends in causes of child death.


Additional material

Additional file 1: Case definitions of major child causes of death applied in the three Bangladesh VA studies and the standardized case definitions (differences in the case definitions between studies are underscored where applicable).

Additional file 2: Cause-specific fractions and uncertainty ranges (in parentheses) in Bangladesh, 1993-1994, 1996-1997, and 2004 (* indicates the change was statistically significant between 1993-1994 and 2004).

Additional file 3: Cause-specific mortality rates (per 1,000 live births) and uncertainty ranges (in parentheses) in Bangladesh, 1993-1994, 1996-1997, and 2004 (* indicates the change was statistically significant between 1993-1994 and 2004).

List of abbreviations

BDHS: Bangladesh Demographic and Health Survey; BRR: Balanced Repeated Replication; CHERG: Child Health Epidemiology Reference Group; CSMR: cause-specific mortality rates; IGME: the Inter-agency Group for Child Mortality Estimation; IHME: the Institute for Health Metrics and Evaluation; INDEPTH: International Network for the Demographic Evaluation of Populations and Their Health in Developing Countries; LB: live births; LBW: low birth weight; LiST: Lives Saved Tool; LMIC: low- and middle-income country; NMR: neonatal mortality rate; SAVVY: Sample Vital Registration using Verbal Autopsy; U5MR: under-5 mortality rate; VA: verbal autopsy.

Acknowledgements and funding

We want to express our sincere appreciation to Abudullah Baqui and Emma Williams for their assistance with obtaining the three Bangladesh VA datasets, instruments, and computer programs. We would also like to acknowledge Bridgette Wellington for her help with the 2004 VA questionnaire for neonates. We thank Henry Kalter for useful discussion about the hierarchy and the sensitivity analysis. And we appreciate the constructive comments from Jennifer Bryce, Agbessi Amouzou, and Jennifer Requejo on some preliminary results. The study was supported by the Bill & Melinda Gates Foundation (Grant numbers: 50140 and 43386).

Author details

1 Department of International Health, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205, USA. 2 Department of Population, Family and Reproductive Health, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205, USA.

Authors’ contributions

LL developed the standardized algorithm, conducted the analysis, and wrote the first draft of this manuscript. QL assisted with data analysis. RAL helped review the original algorithms. IF contributed to the development of the standardized algorithm. JP assisted with the uncertainty estimation. REB and NW conceived the idea and supervised the study in general. All authors participated in results interpretation and subsequent revision of the manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 15 February 2011 Accepted: 5 August 2011 Published: 5 August 2011

References

1. Black RE, Cousens S, Johnson H, Lawn JE, Rudan I, Bassani DG, Jha P, Campbell H, Walker CF, Cibulskis R, et al: Global, Regional and National Causes of Child Mortality, 2008. Lancet 2010, 375:1969-87.
2. Johnson H, Liu L, Walker CF, Black RE: Estimating the distribution of causes of child deaths in high mortality countries with incomplete death certification. Int J Epidemiol 2010, 39:1103-1114.
3. Rajaratnam JK, Marcus JR, Flaxman AD, Wang H, Levin-Rector A, Dwyer L, Costa M, Lopez AD, Murray CJ: Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970-2010: a systematic analysis of progress towards Millennium Development Goal 4. Lancet 2010, 375:1988-2008.
4. You D, Jones G, Hill K, Wardlaw T, Chopra M: Levels and trends in child mortality, 1990-2009. Lancet 2010, 376:931-933.
5. Bryce J, Boschi-Pinto C, Shibuya K, Black RE: WHO estimates of the causes of death in children. Lancet 2005, 365:1147-1152.
6. Baqui AH, Sabir AA, Begum N, Arifeen SE, Mitra SN, Black RE: Causes of childhood deaths in Bangladesh: an update. Acta Paediatr 2001, 90:682-690.
7. National Institute of Population Research and Training (NIPORT), Mitra and Associates, ORC Macro: Bangladesh Demographic and Health Survey 2004 Dhaka, Bangladesh and Calverton, Maryland [USA]: National Institute of Population Research and Training (NIPORT), Mitra and Associates, and ORC Macro; 2005.
8. Baqui AH, Black RE, Arifeen SE, Hill K, Mitra SN, al SA: Causes of childhood deaths in Bangladesh: results of a nationwide verbal autopsy study. Bull World Health Organ 1998, 76:161-171.
9. Fottrell E, Byass P: Verbal autopsy: methods in transition. Epidemiol Rev 2010, 32:38-55.
10. Anker M, Black RE, Coldham C, Kalter H, Quigley MA, Ross D, Snow RW: A Standard Verbal Autopsy Method for Investigating Causes of Death in Infants and Children Geneva, Switzerland: WHO; 1999.
11. Mitra SN, Ali MN, Islam S, Cross AR, Saha T: Bangladesh Demographic and Health Survey 1993-1994 Calverton, Maryland: National Institute of Population Research and Training (NIPORT), Mitra and Associates, and Macro International Inc.; 1994.
12. Mitra SN, Al-Sabir A, Cross AR, Jamil K: Bangladesh Demographic and Health Survey, 1996-1997 Dhaka and Calverton, Maryland: National Institute of Population Research and Training (NIPORT), Mitra and Associates, and Macro International Inc.; 1997.
13. StataCorp: Stata Statistical Software: Release 10 College Station, TX: StataCorp LP; 2007.
14. R Development Core Team: R: A language and environment for statistical computing Vienna, Austria: R Foundation for Statistical Computing; 2009.
15. Lawn JE, Wilczynska-Ketende K, Cousens SN: Estimating the causes of 4 million neonatal deaths in the year 2000. Int J Epidemiol 2006, 35:706-718.
16. Rao JNK, Wu CFJ: Resampling inference with complex survey data. Journal of the American Statistical Association 1988, 83:231-241.
17. Anker M: The effect of misclassification error on reported cause-specific mortality fractions from verbal autopsy. Int J Epidemiol 1997, 26:1090-1096.
18. Black RE, Allen LH, Bhutta ZA, Caulfield LE, de OM, Ezzati M, Mathers C, Rivera J: Maternal and child undernutrition: global and regional exposures and health consequences. Lancet 2008, 371:243-260.
19. Kalter HD, Hossain M, Burnham G, Khan NZ, Saha SK, Ali MA, Black RE: Validation of caregiver interviews to diagnose common causes of severe neonatal illness. Paediatr Perinat Epidemiol 1999, 13:99-113.
20. Ministry of Health and Population (MOHP) [Nepal], New ERA, Macro International Inc.: Nepal Demographic and Health Survey 2006 Kathmandu, Nepal: Ministry of Health and Population, New ERA, and Macro International Inc.; 2007.
21. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998, 3:436-446.
22. Thatte N, Kalter HD, Baqui AH, Williams EM, Darmstadt GL: Ascertaining causes of neonatal deaths using verbal autopsy: current methods and challenges. J Perinatol 2009, 29:187-194.
23. Adjuik M, Smith T, Clark S, Todd J, Garrib A, Kinfu Y, Kahn K, Mola M, Ashraf A, Masanja H, et al: Cause-specific mortality rates in sub-Saharan Africa and Bangladesh. Bull World Health Organ 2006, 84:181-188.
24. WHO: Verbal Autopsy Standards: Ascertaining and Attributing Causes of Death Geneva, Switzerland: World Health Organization; 2007.
25. Steinglass R, Cherian T, Vandelaer J, Klemm RD, Sequeira J: Development and use of the Lives Saved Tool (LiST): a model to estimate the impact of scaling up proven interventions on maternal, neonatal and child mortality. Int J Epidemiol 2010, 40(2):519-520.

doi:10.1186/1478-7954-9-43 Cite this article as: Liu et al.: Trends in causes of death among children under 5 in Bangladesh, 1993-2004: an exercise applying a standardized computer algorithm to assign causes of death using verbal autopsy data. Population Health Metrics 2011 9:43.



Källander et al. Population Health Metrics 2011, 9:44 http://www.pophealthmetrics.com/content/9/1/44

RESEARCH

Open Access

Social autopsy: INDEPTH Network experiences of utility, process, practices, and challenges in investigating causes and contributors to mortality

Karin Källander1,2,3,4*, Daniel Kadobera2, Thomas N Williams5,6, Rikke Thoft Nielsen7,8, Lucy Yevoo9, Aloysius Mutebi1,2, Jonas Akpakli9, Clement Narh9, Margaret Gyapong9, Alberta Amu9 and Peter Waiswa1,2

Abstract

Background: Effective implementation of child survival interventions depends on improved understanding of cultural, social, and health system factors affecting utilization of health care. Nevertheless, no standardized instrument exists for collecting and interpreting information on how to avert death and improve the implementation of child survival interventions.

Objective: To describe the methodology, development, and first results of a standard social autopsy tool for the collection of information to understand common barriers to health care, risky behaviors, and missed opportunities for health intervention in deceased children under 5 years old.

Methods: Under the INDEPTH Network, a social autopsy working group was formed to reach consensus around a standard social autopsy tool for neonatal and child death. The details around 434 child deaths in Iganga/Mayuge Health and Demographic Surveillance Site (HDSS) in Uganda and 40 child deaths in Dodowa HDSS in Ghana were investigated over 12 to 18 months. Interviews with the caretakers of these children elicited information on what happened before death, including signs and symptoms, contact with health services, details on treatments, and details of doctors. These social autopsies were used to assess the contributions of delays in care seeking and case management to the childhood deaths.

Results: At least one severe symptom had been recognized prior to death in 96% of the children in Iganga/Mayuge HDSS and in 70% in Dodowa HDSS, yet 32% and 80% of children were first treated at home, respectively. Twenty percent of children in Iganga/Mayuge HDSS and 13% of children in Dodowa HDSS were never taken for care outside the home. In both countries most went to private providers. In Iganga/Mayuge HDSS the main delays were caused by inadequate case management by the health provider, while in Dodowa HDSS the main delays were in the home.

Conclusion: While delay at home was a main obstacle to prompt and appropriate treatment in Dodowa HDSS, there were severe challenges to prompt and adequate case management in the health system in both study sites in Ghana and Uganda. Meanwhile, caretaker awareness of danger signs needs to improve in both countries to promote early care seeking and to reduce the number of children needing referral. Social autopsy methods can improve this understanding, which can assist health planners to prioritize scarce resources appropriately.

* Correspondence: k.kallander@malariaconsortium.org 1 Department of Health Policy, Planning & Management, School of Public Health, Makerere University, P.O. Box 7072, Kampala, Uganda Full list of author information is available at the end of the article © 2011 Källander et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Källander et al. Population Health Metrics 2011, 9:44 http://www.pophealthmetrics.com/content/9/1/44


Background

Each year approximately 8.8 million deaths occur in children younger than 5 years old worldwide, with the majority (68%) dying of infectious causes such as pneumonia, diarrhea, and malaria [1]. Reliable estimates of the numbers, causes of, and contributors to death are key elements of functional health systems where health policies and programs are evidence based [2]. However, especially in high mortality settings, the civil registration systems are limited or nonexistent, and most deaths go unrecorded [3]. Fewer than one-third of the 57 million global annual deaths are issued medical certificates [4]. Countries that cannot record the number of people who die or why they die cannot realize the full potential of their health systems [5]. As an alternative, cause-specific mortality can be estimated through methods such as verbal autopsies (VAs), which use standard interview tools with caretakers on symptoms preceding death in a patient [6,7]. Most of the INDEPTH coordinated Health and Demographic Surveillance Sites (HDSS) (http://www.indepthnetwork.org), which carry out longitudinal surveillance of births, deaths, and migrations in defined populations, have already adopted VA for routine investigation of cause of death. These data are used to generate area-specific disease profiles, which are shared with subnational and national health planners for better allocation of health resources. However, these VA tools do not provide information on critical delays and care seeking that could have saved the child. Several previous attempts have been made to understand the reasons why diseases like malaria, diarrhea, and pneumonia continue to cause so many child deaths. The contributing factors leading to death are complex, but poor recognition of illness symptoms by parents and inappropriate medical care provided to children appear to play an important role in many countries [8-20].

Better information about the social processes, the timing and type of care-seeking actions, and treatments received prior to death is critical to identifying modifiable factors that can be addressed by new policies or better resource planning. In this paper we suggest bringing these methodologies together under one standard umbrella called social autopsy (SA). This paper presents the consensus products and results of a three-year effort by an expert group of researchers, pediatricians, statisticians, and other stakeholders under the sponsorship of the INDEPTH Network. It is intended to serve the needs of various users and producers of mortality information, including researchers, policymakers, program managers, and evaluators in the 37 INDEPTH field sites in 19 countries. To make these resources as easily and widely accessible as possible, they will be published on the INDEPTH website. The expert group on social autopsy systematically reviewed, debated, and refined the accumulated experience and evidence from the most widely used social autopsy questionnaires and procedures. This resulted in standard social autopsy questionnaires for two age groups (neonatal and child deaths). The results from applying the child death SA tool in Uganda and Ghana are presented, and points of intervention that could, in the future, prevent other deaths are identified and discussed. The neonatal tool was piloted in Guinea-Bissau and in Uganda, and the results will be presented elsewhere.

Methods

Study area and population

Uganda

The Iganga/Mayuge HDSS is a defined area across parts of Iganga and Mayuge districts at the shores of Lake Victoria in eastern Uganda. The HDSS area is predominantly rural, but partly peri-urban in some trading areas. It consists of 185 villages drawn from both Iganga and Mayuge districts. It spans a total area of 400 km² with a total population of approximately 159,000. The main source of income is subsistence farming, and the population is culturally homogeneous (more than 80% are Lusoga speaking). About 17% are children less than 5 years old. The overall under-5 mortality is 137 deaths per 1,000 live births [21], and the key causes of child mortality are neonatal conditions (24%), malaria (23%), pneumonia (21%), and diarrheal diseases (17%) [22]. The Iganga/Mayuge HDSS has one hospital, eight public health centers, three non-governmental organization (NGO) clinics, and 122 drug shops. The site is a member of the international HDSS organization INDEPTH and largely follows its standard methods.

Ghana

The Dodowa HDSS is housed within the Dodowa Health Research Centre. Studies conducted in the center have a focus on developing and evaluating community- and district-based health interventions and obtaining information to improve health policy, planning, and service delivery in the Ghana Health Service. It has strong links with the District Health Management Team and the Local Government Authority. The HDSS is sited in Dodowa, the district capital of the Dangme West District of the Greater Accra Region. It covers a population of approximately 98,000 people in 381 communities. It is purely rural but gradually catching up with the rapid urbanization of the peripheral areas surrounding the city of Accra. The most common form of transportation in the district is the bicycle. Under-5 mortality in Ghana is 80 deaths per 1,000 live births in the most recent five-year period [23], and the top causes are similar to those in Uganda, namely neonatal conditions (28%), malaria (33%), pneumonia (15%), and diarrheal diseases (12%) [24]. The Dangme West District has four health centers and six community clinics spread throughout the district. In the traditional sector, there are 300 traditional healers, 92 trained traditional birth attendants (TBAs) and “wanzams” (local circumcisers), and an equal number of untrained TBAs who provide alternative medical services. In addition, there are approximately 58 chemical sellers and an unknown number of drug peddlers operating throughout the district. The district has no hospitals. The inhabitants use surrounding hospitals for referral care as well as for some primary care. The same hospitals are the referral hospitals for the community health insurance scheme, which operates in the district.

Sampling

The HDSSs in Uganda and Ghana generate population-based data on key demographic events two to three times a year and household, socioeconomic, and education data annually or biannually. In Iganga/Mayuge HDSS, deaths are identified either by information obtained from the routinely (every six months) updated demographic population registry, or through reporting by the one or two village scouts who are selected in each village to report all births and deaths in their respective villages in the study area. In Dodowa HDSS, deaths are identified from three sources: 1) a list from the Dodowa HDSS Field Office based on information collected from the routinely updated demographic population registry, 2) deaths reported from the health facilities, and 3) deaths reported from community key informants. In Iganga/Mayuge HDSS all reported child deaths from January 2009 to July 2010 were included in the study (n = 434). In Dodowa HDSS it was found after the pilot that the community key informants only concentrated on some of the communities and, since the routine population registry is only updated every six months, many child deaths were missed by the system. Hence, only 40 child deaths were reported between December 2008 and December 2009, and all of these were included in the study.

Data collection

For all deaths in the Iganga/Mayuge and Dodowa HDSSs, verbal autopsies are used to assign likely cause of death. In Iganga/Mayuge HDSS in Uganda, a modified version of the social autopsy questionnaire developed in Bolivia [25] has been integrated into the standardized verbal autopsy tool and used routinely since 2006. In December 2008, both Iganga/Mayuge and Dodowa HDSS introduced the modified social autopsy tools developed by the social autopsy working group (SAWG). In Dodowa, the SA tools were integrated into the verbal autopsy tool four months after its introduction (April 2009). Whenever a child death was reported, after a mourning period of four to six weeks, the merged VA/SA questionnaire was used by a trained native field worker fluent in the local language (four in Dodowa and 12 in Iganga/Mayuge) to interview the deceased’s immediate care giver about symptoms presented, preventive behavior, treatment-seeking behavior, and detailed information about care received by first and last provider visited.

Instruments

Through Internet-based searches using Medline and Google free-text search and communications with researchers known to have experience with death inquiry methodology, published and unpublished reports were retrieved and reviewed for good and bad practices in using postmortem questionnaires to collect data on care-seeking practices prior to death. A total of nine questionnaires were retrieved and reviewed, of which one had been used in Guinea-Bissau, one in India, two in Kenya, four in Uganda, and one in Bolivia. Observations from the review included the inability of previously used tools to assess the timing of treatment seeking in relation to the severity of symptoms, the sequence of the many providers visited, and the quality of care provided. The results of the review were compiled and discussed in the SAWG, a group consisting of medical doctors, social scientists, epidemiologists, demographers, and statisticians from INDEPTH HDSSs in Guinea-Bissau, Uganda, Kenya, and Ghana. The group reached consensus on the content of the tools that would result in a minimum level of complexity during data collection and analysis. Based on the review findings, it was decided that there was a need for specific tools for neonatal (0 to 28 days old) and under-5 (29 days old to 5 years old) death events. The new SA tool for under-5 deaths incorporated learning from the mortality surveys in Bolivia [20], where the analysis of the care-seeking process for all providers seen was found to be too complex. Hence, the new SA tool was designed to capture only information on the first and last provider seen before death. The new tool also begins with a history section with probes for the respondents to describe recognition of symptoms, timing of recognition, actions taken at home and outside, and provider behavior. Other detailed information collected in the new tool includes:

• Symptoms presented (e.g., symptoms in chronological order starting with day 1, time from first symptom to death, and in relation to provider seen)
• Preventive behavior (e.g., bed net use, vaccinations)
• Treatment-seeking behavior (e.g., type, place, and timing of treatment at home and outside the home; chronological order of providers seen; reasons for not giving treatment or seeking care; timing and sequence of providers seen; transport used for seeking care; cost of transport, treatment, and care; hospitalization of child; and reason for and compliance with referral)

The new SA tool for under-5 deaths was piloted for 12 to 18 months for all deaths in these age groups in Dodowa HDSS in Ghana and in Iganga/Mayuge HDSS in Uganda, and the results from data collection were entered and analyzed. Bandim HDSS in Guinea-Bissau only piloted the newborn SA tool, and the results will be presented elsewhere.

Data analysis

Data were entered in FoxPro (Microsoft Corporation, Seattle, WA, USA) and analyzed in STATA 10 (Stata Corporation, College Station, TX, USA). The BASICS conceptual framework [26], which suggests a number of indicators that can be quantified for standardized analysis and comparability over time and space, was adopted for analysis of care-seeking processes preceding death. The data were subjected to standard descriptive analysis using proportions (overall and conditional) for categorical data (e.g., children with severe symptoms treated at home) and median values with interquartile range for continuous data (e.g., length of illness before death). Numbers were inserted into an Excel spreadsheet that listed all indicators and was programmed to automatically yield proportions, conditional proportions, and graphical outputs (available on request from first author). Caretaker recognition of severe illness was defined by the caretakers’ mention of at least one of the danger signs defined by the Integrated Management of Childhood Illness (IMCI) guidelines [27]. These include convulsions, chest indrawing, nasal flaring, grunting, bulging fontanelle, umbilical redness extending to the skin, many or severe skin pustules, lethargic or unconscious or less than normal movement, not able to drink or breastfeed, vomits everything, loose stools/diarrhea > 2 days, and heavy bleeding. Similarly, “possibly severe” illness symptoms (wheezing, fever, fast breathing, and difficult breathing) were also retrieved from the IMCI guidelines. Informal care was defined as any care provider that was not a public/private health facility, hospital, or trained community health worker (CHW). Severity of illness at the time of care seeking was not explored in the version of the SA tool that was piloted in this study but was later added to the revised version to capture this potential source of bias.

A modified version of Thaddeus’ and Maine’s three-delay model for maternal death [28] was adapted, as described by Waiswa et al. 2010 [11], for determining the occurrence of delays in the home, on the way to, and in the health facility. The definitions of delay at the different levels are provided in Tables 1 and 2. Delay in the home included lack of recognition of at least one severe symptom; treatment of children recognized to have at least one possibly severe or severe symptom at home; treatment at home without going for any outside care; treatment of children with severe signs who were not taken for outside care on the same day or with possibly severe symptoms who were not taken for outside care after 24 hours; treatment from an informal care provider as both first and last source; or not complying with referral advice because of reasons such as waiting to get permission from husband, belief that the child was too sick to go, waiting to finish ongoing treatment, or other answers related to perception of illness. Transport delay was defined as taking more than two hours to reach the first or last provider after a decision to seek care was made or a child not being taken for referral because of a lack of transport or lack of money for transport. Health facility delay was defined as the first or last provider taking more than one hour to attend to the child after the child had reached the health facility, referral to another facility because of lack of equipment or drugs, or the first or last formal care provider not providing any care or treatment.

The relative contribution of delays at home, during transport, or at the health facility was calculated in four steps.

1) First, the conditional proportion of children exposed to delay was calculated by dividing the number of children exposed to each delay indicator by the total number of deceased children who could have been exposed to that delay indicator.
2) Second, each indicator’s relative contribution to the delay within a delay cluster (e.g., home, transport, or health facility) was calculated by multiplying the conditional proportions for each delay indicator with the fraction of each delay cluster to avoid overweighting clusters containing more delay indicators.
3) The total contribution to delay of each delay cluster was calculated by summing up the relative weights within each delay cluster.
4) The proportional contribution to delay of the different clusters was calculated by dividing each cluster total by the overall cluster total.

The data from Iganga/Mayuge HDSS and Dodowa HDSS were analyzed independently, and the results are presented separately, as we had no intention of comparing the two different countries but merely wished to present the results of the SA tool in two different contexts.

Table 1 Three-delay model for child deaths in Uganda

| Delay clusters | Number of children | Denominator | Conditional proportion | Relative weight¤ | Cluster total weight* | Proportional contribution (total = 0.70)∞ |
|---|---|---|---|---|---|---|
| Delay 1 - home delay | | | | 1/6 | 0.17 | 24% |
| # children whose caregivers did not mention at least one severe symptom | 17 | f | 4% | 0.01 | | |
| # children with possibly severe or severe symptom who were treated at home | 126 | d | 32% | 0.05 | | |
| # children only receiving treatment at home without going outside for care | 85 | f | 20% | 0.03 | | |
| # children with severe symptoms who were brought outside the home for care after > 1 day | 174 | c | 42% | 0.07 | | |
| # children who only received informal health care for their fatal illnesses as both first and last source of care | 3 | b | 1% | 0.00 | | |
| # not going for referral because of caretaker decision-making | 6 | e | 4% | 0.01 | | |
| Delay 2 - transport delay | | | | 1/2 | 0.22 | 32% |
| # delaying > 2 hrs to reach first or last provider | 84 | a | 36% | 0.18 | | |
| # not going for referral because of lack of money for transport | 17 | e | 12% | 0.04 | | |
| Delay 3 - health facility delay | | | | 1/3 | 0.31 | 44% |
| # children obtaining treatment from provider after > 1 hr from first or last provider | 71 | a | 20% | 0.07 | | |
| # children referred because of lack of equipment or lack of drugs | 92 | e | 65% | 0.22 | | |
| # deceased children who did not receive any treatment after visiting first or last formal provider | 17 | b | 7% | 0.02 | | |
| # who went to at least one outside provider (a) | 349 | | | | | |
| # who went to more than one outside provider (b) | 234 | | | | | |
| # children reported with at least one severe symptom (c) | 417 | | | | | |
| # children reported with at least one severe or possibly severe symptom (d) | 398 | | | | | |
| # children referred from first or last provider (e) | 141 | | | | | |
| Total # child deaths (f) | 434 | | | | | |

¤ Each indicator’s relative contribution to the delay within a cluster, assuming each indicator is equally important.
* The total contribution to delay of each cluster (total = 0.70).
∞ The proportional contribution to delay of each cluster.
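The four-step calculation of relative delay contributions described under Data analysis can be sketched as follows. This is a minimal illustration rather than the authors' spreadsheet: the function names are hypothetical, the indicator counts are the health facility rows of Table 1, and the cluster totals passed to step 4 are the rounded values printed in Table 1, so recomputed shares may differ from the published percentages by about a percentage point.

```python
# Health facility delay indicators for Iganga/Mayuge HDSS, taken from Table 1:
# (children exposed, children who could have been exposed).
HEALTH_FACILITY = [
    (71, 349),   # waited > 1 hr at first or last provider
    (92, 141),   # referred because of lack of equipment or drugs
    (17, 234),   # no treatment after visiting first or last formal provider
]


def cluster_total(indicators):
    """Steps 1-3: sum of equally weighted conditional proportions."""
    fraction = 1 / len(indicators)          # e.g. 1/3 for three indicators
    total = 0.0
    for exposed, eligible in indicators:
        conditional = exposed / eligible    # step 1: conditional proportion
        total += conditional * fraction     # steps 2 and 3: weight, then sum
    return total


def proportional_contributions(cluster_totals):
    """Step 4: each cluster total divided by the sum over all clusters."""
    overall = sum(cluster_totals.values())
    return {name: value / overall for name, value in cluster_totals.items()}


health_total = cluster_total(HEALTH_FACILITY)

# Rounded cluster totals as printed in Table 1 (assumed inputs for step 4).
published_totals = {"home": 0.17, "transport": 0.22, "health facility": 0.31}
shares = proportional_contributions(published_totals)

print(f"health facility cluster total: {health_total:.2f}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Weighting each indicator by the reciprocal of the number of indicators in its cluster is what keeps the six-indicator home cluster from dominating the two-indicator transport cluster simply because it has more rows.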

Results

A. Iganga/Mayuge HDSS

Child characteristics

A total of 434 deaths in children 29 days old to 5 years old were reported and surveyed between January 2009 and July 2010 in Iganga/Mayuge HDSS, Uganda. The mean age of the deceased children was 17 months (standard deviation [SD]: 13), and 48% were female. Forty-two percent (181/434) died at home.

Length of illness

Sick children in Iganga/Mayuge HDSS tended to be sick for about one week before they died. The length of illness was the time between the mother first noticing that her child was sick and the day of the child’s death. The median number of days from recognition of the first illness symptom until death was six (interquartile range [IQ]: 2-28).
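The summary statistics used in this section (means with standard deviation for age, medians with interquartile range for length of illness) can be illustrated with a short sketch. The durations below are hypothetical and are not study data, and the quartile convention chosen here is an assumption: statistical packages differ in how they interpolate percentiles, so STATA output for the same data could differ slightly.

```python
import statistics

# Hypothetical illness durations in days (illustrative only, not study data).
days_ill = [1, 2, 2, 3, 5, 6, 7, 10, 14, 28, 30]

median_days = statistics.median(days_ill)

# quantiles(n=4) returns the three quartile cut points; the "inclusive"
# method treats the list as the full population rather than a sample.
q1, _, q3 = statistics.quantiles(days_ill, n=4, method="inclusive")

mean_days = statistics.mean(days_ill)
sd_days = statistics.stdev(days_ill)   # sample standard deviation

print(f"median {median_days} days (IQ: {q1}-{q3})")
print(f"mean {mean_days:.1f} days (SD: {sd_days:.1f})")
```

With a long right tail, as in this made-up list, the mean sits well above the median, which is why a median with interquartile range is the more informative summary for length of illness.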

The pathway model

Using the pathway model (Figure 1), the proportion of caretakers in Iganga/Mayuge HDSS i) recognizing illness and treating at home, ii) seeking outside care from a formal or informal provider, and iii) receiving treatment or referral advice from a provider was calculated.


Table 2 Three-delay model for child deaths in Ghana

| Delay clusters | Number of children | Denominator | Conditional proportion | Relative weight¤ | Cluster total weight* | Proportional contribution (total = 0.54)∞ |
|---|---|---|---|---|---|---|
| Delay 1 - home delay | | | | 1/6 | 0.34 | 63% |
| # children whose caregivers did not mention at least one severe symptom | 12 | f | 30% | 0.04 | | |
| # children with possibly severe or severe symptom who were treated at home | 28 | d | 80% | 0.11 | | |
| # children only receiving treatment at home without going outside for care | 5 | f | 13% | 0.02 | | |
| # children with severe symptoms who were brought outside the home for care after > 1 day | 23 | c | 82% | 0.12 | | |
| # children who only received informal health care for their fatal illnesses as both first and last source of care | 3 | b | 12% | 0.02 | | |
| # not going for referral because of caretaker decision-making | 3 | e | 20% | 0.03 | | |
| Delay 2 - transport delay | | | | 1/2 | 0.06 | 11% |
| # delaying > 2 hrs to reach first or last provider | 1 | a | 3% | 0.02 | | |
| # not going for referral because of lack of money for transport | 2 | e | 13% | 0.04 | | |
| Delay 3 - health facility delay | | | | 1/3 | 0.14 | 26% |
| # children obtaining treatment from provider after > 1 hr from first or last provider | 5 | a | 16% | 0.05 | | |
| # children referred because of lack of equipment or lack of drugs | 1 | e | 7% | 0.02 | | |
| # deceased children who did not receive any treatment after visiting first or last formal provider | 5 | b | 20% | 0.07 | | |
| # who went to at least one outside provider (a) | 32 | | | | | |
| # who went to more than one outside provider (b) | 25 | | | | | |
| # children reported with at least one severe symptom (c) | 28 | | | | | |
| # children reported with at least one severe or possibly severe symptom (d) | 35 | | | | | |
| # children referred from first or last provider (e) | 15 | | | | | |
| Total # child deaths (f) | 40 | | | | | |

¤ Each indicator’s relative contribution to the delay within a cluster, assuming each indicator is equally important.
* The total contribution to delay of each cluster (total = 0.54).
∞ The proportional contribution to delay of each cluster.
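The transport and health facility indicators tabulated in Tables 1 and 2 rest on the threshold definitions given under Data analysis (more than two hours to reach a provider, more than one hour waiting to be attended, and so on). A minimal sketch of how those two definitions could be applied to a single interview record is shown below. The `CareRecord` class and all of its field names are hypothetical, invented for illustration; they are not variables from the SA questionnaire, and the more involved home-delay definition is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CareRecord:
    """Hypothetical per-child record; field names are illustrative only."""
    hours_to_reach_provider: Optional[float] = None   # None: no outside care sought
    referral_missed_for_transport: bool = False       # no transport, or no money for it
    hours_waited_at_facility: Optional[float] = None  # None: never reached a facility
    referred_for_lack_of_supplies: bool = False       # lack of equipment or drugs
    untreated_by_formal_provider: bool = False        # no care or treatment given


def has_transport_delay(r: CareRecord) -> bool:
    """More than two hours to reach a provider, or referral missed for transport reasons."""
    slow_journey = (
        r.hours_to_reach_provider is not None and r.hours_to_reach_provider > 2
    )
    return slow_journey or r.referral_missed_for_transport


def has_health_facility_delay(r: CareRecord) -> bool:
    """More than one hour waiting, referral for lack of supplies, or no treatment."""
    long_wait = (
        r.hours_waited_at_facility is not None and r.hours_waited_at_facility > 1
    )
    return long_wait or r.referred_for_lack_of_supplies or r.untreated_by_formal_provider


# A child who took three hours to reach care but was attended within 30 minutes.
example = CareRecord(hours_to_reach_provider=3.0, hours_waited_at_facility=0.5)
print(has_transport_delay(example), has_health_facility_delay(example))
```

Keeping unanswered durations as `None` rather than zero avoids counting children who never sought outside care as having reached a provider promptly.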

i) Caretakers’ recognition of illness and care provided in the home

Although most caretakers (417/434; 96%) recognized that their child was seriously ill, one-third (126/398; 32%) of children with at least one possibly severe or severe symptom were first given treatment at home. Of these, 85 (20%) did not seek any further care outside the home (Table 1). Sixty-four percent (210/326) of caretakers treated with medicines whereas 23% (74/326) used herbs or traditional remedies. The medicines most commonly used at home were antimalarials (148/210; 70%), paracetamol (89/210; 42%), and antibiotics (48/210; 23%). The most commonly used antimalarials included chloroquine, Coartem (artemether/lumefantrine) and quinine, and the most frequently mentioned antibiotics were cotrimoxazole (trimethoprim/sulfamethoxazole) and ampicillin. All of the traditional remedies mentioned were harmless, including tea infusions from local leaves, egg yolk, and glucose.

ii) Caretakers seeking care outside the home

Overall, 80% (349/434) of the caretakers sought some type of care outside the home for their child’s illness, though 9% (28/324; 25 missing values) of these first went to an informal provider (traditional healers or drug shop). Still, more than half (173/324; 53%; 25 missing values) went to a public provider first and 38% (123/324; 25 missing values) went to a formal private provider. Almost half of the 349 caretakers (48%) who sought outside care went to more than one provider and 96% (163/169) went to a public provider for their last source of care. The median time to access a formal health care provider after noticing a severe symptom in the child was three days (IQ: 1-4).

iii) Providers giving care

Of the children who were taken outside their home for care, 13% (45/349) did not receive any treatment from the first provider seen, and 10% (17/169) did not receive any treatment from either the first or the last provider. A total of 141 of the 349 children (40%) who were taken to a provider were referred for further treatment, but only half (62/125; 22 missing values) of the caretakers adhered to the referral advice. Among the 62 caretakers who gave a reason for their nonadherence with the referral advice, the vast majority (87%) stated that it was because of a lack of money. Twenty-eight of the caretakers of the deceased children (6%) had been given a death certificate, but none of these stated the cause of death. None of the caretakers who had visited a hospital stated they had been given postmortem results.

Figure 1 Conceptual framework of the possible care seeking processes preceding death (modified after Kalter et al. 2004).

The three-delay model

The relative contribution of delays at the home, on the way to, or in the health facility was calculated (Table 1). Most delays in the care-seeking process were caused by problems at the health facility (44%) (Figure 2). At the health facility, approximately 20% of children (71/349) were reported to have waited more than one hour to be attended by a health worker, and 65% (92/141) of children who were referred were referred because of lack of drugs or equipment. Seventeen of the 234 caretakers (7%) who went to more than one source of care did not receive any treatment during the care-seeking episode.

The second biggest contribution to delay (33%) was caused by problems with transport. After the caretakers had decided to seek care outside the home, 24% (84/349) spent more than two hours traveling to the health provider. Of the children whose caretakers did not adhere to the referral advice, 12% did so because of lack of money for transport.

Delays in the home contributed to 21% of the total delay. Four percent (17/434) of caretakers did not recognize severe illness symptoms, 32% (126/398) treated severe or possibly severe symptoms at home, and 4% (18/434) never took their child for care outside the home. Of the children whose caretakers had recognized the severe symptoms, 42% (174/417) still waited more than one day before they went outside the home to seek treatment. Among the 349 children who were brought outside the home for care, 3 (1%) saw only informal providers during the illness episode. For children who were referred to another provider, 4% (6/141) did not go because of perceptions that the referral was unnecessary or that the child was improving.

Figure 2 The relative contribution of the three delays leading up to child death in Iganga/Mayuge HDSS, Uganda (n = 434).
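Several proportions in the Iganga/Mayuge results exclude missing values from the denominator; for example, the first-provider figures are reported out of 324 because 25 of the 349 caretakers who sought outside care had no recorded answer. The sketch below shows that arithmetic using the counts quoted in the text. The helper name `conditional_proportion` is hypothetical, and excluding missing answers from the denominator is an assumption inferred from the reported fractions rather than a documented rule.

```python
def conditional_proportion(count, eligible, missing=0):
    """Share with an outcome among eligible children who have a recorded answer."""
    answered = eligible - missing
    if answered <= 0:
        raise ValueError("no recorded answers to use as the denominator")
    return count / answered


# First provider visited, Iganga/Mayuge HDSS: 349 caretakers sought outside
# care and 25 had missing values, leaving 324 recorded answers.
first_provider = {"informal": 28, "public": 173, "formal private": 123}
first_provider_shares = {
    kind: conditional_proportion(n, eligible=349, missing=25)
    for kind, n in first_provider.items()
}

for kind, share in first_provider_shares.items():
    print(f"{kind}: {share:.0%}")
```

Reporting the number of missing values next to each fraction, as the text does, lets a reader reconstruct the denominator and see how much of the sample each percentage actually describes.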

B. Dodowa HDSS, Ghana

Child characteristics

The analysis was based on the 40 deaths in children 29 days old to 5 years old between December 2008 and December 2009. One child was excluded because he died without having been ill. The mean age of the deceased children was 21.7 months (SD: 15.7), and 43% were female. Forty-three percent (17/40) had died at home.

Length of illness

Sick children in Dodowa HDSS tended to die quickly after the illness was recognized. The median length of illness, from the time the caretaker first noticed that the child was sick to the day of the child’s death, was only 3.5 days (IQ: 2-6).

The pathway model

As for Iganga/Mayuge HDSS, the pathway model (Figure 1) was used to calculate the proportion of caretakers in Dodowa HDSS who i) recognized illness and treated the child at home, ii) sought outside care from formal or informal providers and iii) received treatment or referral advice from a provider.

i) Caretakers’ recognition of illness and care provided in the home

Of the 40 caretakers interviewed, 28 (70%) mentioned having seen at least one severe symptom in the child. Of these, five (18%) only gave treatment at home without going for outside care whereas three died without receiving any kind of treatment. Most caretakers (28/40; 70%) first gave herbal or orthodox treatments or both at home before seeking care outside. Half of the caretakers gave drugs bought from drug shops, 36% (10/28) gave herbs, and 14% (4/28) gave a combination of orthodox and herbal medicines. A mix of paracetamol, antimalarials, and antibiotics was the most frequently mentioned medicine used in the home. The traditional medicines included an unspecified mix of leaves, herbal teas, and herbs for bathing the child.

ii) Caretakers seeking care outside the home

Overall, 80% (32/40) of caretakers sought care outside the home for the child’s illness, though 44% (14/32) first went to informal health providers like traditional healers and drug shops. Nineteen percent (6/32) first went to formal private providers, and 38% (12/32) went to a public provider. Most of the 32 caretakers (78%) who sought care outside the home went to more than one provider, and 56% (18/32) went to a public provider as the last source of care. The median time to access a formal health care provider after noticing a severe symptom in the child was two days (IQ: 1-5).

iii) Provider giving care

Of the children who were taken outside the home for care, 16% (5/32) did not receive any treatment from the first provider, and 20% (5/25) did not receive any treatment from the last provider. A total of 15 of the 32 children (47%) who were taken outside the home for care were referred for further treatment, but only nine (60%) of the caretakers adhered to the referral advice. The reasons for nonadherence with the referral advice were the death of the child before reaching the referral point (4 children), delayed caretaker decision making (3 children) and financial barriers hampering care seeking (2 children). Three caretakers (7.5%) stated they had received a death certificate, and only five caretakers received a postmortem result. However, the certificates were not with the respondents at the time of the interview (the document could not be found, was with someone else, or was kept at the hospital).

The three-delay model

The relative contribution of delays at the home, on the way to, or in the health facility was calculated (Table 2). Most delays (63%) were caused by factors in the household, where the main barriers were poor caretaker recognition of severe illness and delayed decision-making to seek formal care outside the house for the sick child (Figure 3). Thirty percent (12/40) of caretakers did not mention any severe symptoms seen in the child before death, and all caretakers who recognized severe or possibly severe symptoms first treated at home. Of the caretakers who recognized that their children had severe symptoms, 82% (23/28) still waited more than one day before they took the child outside the home for treatment. The second biggest delay was caused by problems in the health facility (26%). At the health care provider, 16% of children (5/32) were reported to have waited



Källander et al. Population Health Metrics 2011, 9:44 http://www.pophealthmetrics.com/content/9/1/44


explained by the social autopsy. However, other studies have previously shown that it is usually a combination of factors, of which poverty and distance to health centers are key determinants for late care seeking [30]. To improve policies and programs, it is critical for local health authorities to understand cultural issues and social beliefs, and we recommend that program implementers complement SA investigation with in-depth qualitative methods for a sample of cases (e.g., using case narratives with probing). Although most sick children were taken to qualified providers at some stage, many caretakers waited more than 24 hours after illness recognition before seeking any care outside the home. A median of two and three days passed between a caretaker noticing a severe symptom and seeking care from a formal health practitioner in Dodowa and in Iganga/Mayuge, respectively. Waiting for self-prescribed medicines to have an effect is one likely explanation for some of the later attendance at formal providers. Having used antibiotics in the home has previously been shown to be the only significant risk factor for late care seeking in children who later died of pneumonia in Uganda [19]. Our findings are also in accordance with other studies on fatal childhood illnesses in Guinea-Bissau, Tanzania, South Africa, and India, which showed high attendance at health facilities before child death [9,12,16,18]. Care in district hospitals is of poor quality in both Ghana and Uganda [36,37], and the fatal outcome of Ugandan children, who in a majority of cases had seen a formal health care provider prior to death, is likely explained by a combination of factors, such as late arrival of very sick children and district hospital incapacity to cater to critically ill children. Yet even though caretakers are aware of the poor quality of care in the public health system [31], they often have no choice when circumstances become grave. 
They will seek care there anyway, even though they know that treatment and staff may not be available. Hence, the quality of primary health care providers, both private and public, cannot be overlooked, and training of health workers is needed in sick child case management and in caretaker education on symptom recognition. The social autopsy approach provided descriptive information on fatal cases that is not routinely available, such as the timing of the events, the place of death, caretaker behavior during the illness episode, the number and type of health professionals involved, and details of treatments and advice given. Elaborating this sequence is essential to understanding the factors and constraints external to the disease itself that may be associated with the childhood deaths and that must be addressed when designing, implementing, and monitoring intervention strategies to reduce childhood deaths,

Figure 3 The relative contribution of the three delays leading up to child death in Dodowa HDSS, Ghana (n = 40).

more than one hour to be attended by a health care worker, and 20% (5/25) did not receive any treatment from either the first or last health care provider. The smallest contribution to delay (11%) was caused by problems with transport. Of the 32 caretakers who decided to seek care outside the home, all but one spent less than two hours traveling to the health provider; the remaining caretaker could not tell the actual time she spent traveling. Two of the caretakers who did not adhere to referral advice did not go because they lacked money for transport.
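The proportions quoted in this subsection can be rechecked from the reported numerators and denominators. The following is a minimal sketch in Python, not part of the original analysis; the function name `percent` and the dictionary labels are illustrative assumptions, while the counts and the three delay shares are the ones reported in the text.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to a whole number."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator)


# Counts as reported in the text for Dodowa HDSS (labels are illustrative).
reported_counts = {
    "no severe symptom mentioned": (12, 40),
    "waited more than one day before outside care": (23, 28),
    "waited more than one hour at provider": (5, 32),
    "no treatment from first or last provider": (5, 25),
}

percentages = {
    label: percent(numerator, denominator)
    for label, (numerator, denominator) in reported_counts.items()
}

# Relative contribution of the three delays as reported (percent of delays).
delay_shares = {"household": 63, "health facility": 26, "transport": 11}

for label, value in percentages.items():
    print(f"{label}: {value}%")
print("delay shares sum to", sum(delay_shares.values()))
```

Rounding to whole percentages in this way gives the figures quoted in the text, and the three reported delay shares account for all delays.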

Discussion

The SA questionnaire for deceased children under 5 years old developed by the INDEPTH SAWG was found to be useful for quantifying contributing factors to child death at the health facility, community, and household levels. In our study, we found that lack of symptom recognition and household practices (delay 1) was the most common cause of delay in Dodowa HDSS, while health facility-related delays (delay 3) were the primary reasons for delay in Iganga/Mayuge HDSS in Uganda. Our results are consistent with earlier studies of the possible contribution of inadequate care seeking and poor case management to childhood deaths [10,11,19]. An ethnographic study in Ghana found that mothers may not be able to recognize serious illness in their babies, and they often do not seek care outside the home even when they do realize that their child is seriously ill [29]. Studies in older children conducted in the same setting [30,31] and elsewhere in Uganda [32] have shown similar challenges in care seeking and referral. Other barriers, which could not be elucidated by the present semistructured social autopsy tool, include gender aspects of decision-making, other responsibilities at home, local perceptions of illness and care providers, poverty, and distance to health facilities [30,31,33-35]. Which of these barriers led to delays at home cannot be


importance of illness severity as a parameter in the analysis, a question on alertness, activeness, and feeding practices was added in the final tool to assess the status of the child at the time of illness recognition and care seeking. As we collected information only on children that died, we cannot conclude whether any of the delays had a negative effect on survival, as this would require a case-control approach. However, some practices reported, such as mothers only seeking outside care when the child was severely sick or severely sick children not receiving any treatment for the illness after having seen one or more providers, are likely to be detrimental to survival of sick children.

such as the IMCI initiative or community case management programs for sick children. One challenge faced by sites that implemented the SA tool in this study was how to integrate the SA tool into the VA tool without creating an instrument that was overly cumbersome. Collecting the SA data separately from the VA data during different household visits may shorten the time for the SA interview but would negatively affect the flow of questions, as many VA and SA indicators are interlinked and difficult to separate (e.g., timing of care seeking in relation to symptoms mentioned). Another challenge was the complexity of analyzing care-seeking data. Following the advice provided by Aguilar et al. (1998) [20] and learning from the experiences from Bandim HDSS in Guinea-Bissau, where caretakers had a difficult time remembering the order in which different providers were seen (Personal Communication with Rikke Thoft Nielsen 2011), the tool was designed to only capture information on the first and last providers. While this may exclude valuable information that could explain behavior, it was deemed a necessary revision of the tool to reduce the complexity for the data collectors and analysts to a manageable level. To overcome this complexity, developing a computerized coding of SA data as is done for verbal autopsy [2,38] would be preferable. With more programming, these tools could potentially adopt the structure of the Tanzania Essential Health Interventions Project (TEHIP) District Health Intervention Profile, which not only displays the local disease burden but also links the data to displays of health systems actions that can be implemented to address the most common problems and barriers [39]. Another option used in other studies [10,11] is to have an expert panel review each death using standard checklists and frameworks, which is a model commonly used for the interpretation of VA data. This study has some limitations. 
The illness history and care-seeking information are based on interviews by nonmedical personnel. Interviews depending on recall pose reliability and validity problems [40]. However, severe symptoms are normally remembered longer than mild symptoms [41]. Since most interviews were made within four to six weeks of death, recall bias was likely low. In these retrospective interviews we were not able to determine the actual quality of care provided to these children, nor were we able to adequately determine social and cultural processes in the family that affected the care seeking. Other methods (e.g., clinical audits and in-depth interviews) will be needed to investigate these factors. Another important limitation was the failure to analyze the sequence of symptoms in the order that they appeared and, hence, the care-seeking patterns in relation to the severity of the disease. Given the

Conclusions

Our team has shown that social autopsy data can be collected as part of verbal autopsy data and that such data could be useful for informing the design, implementation, and monitoring of interventions. It is often perceived that the key challenge to effective coverage of child survival interventions is poor caretaker care-seeking behavior. Our findings reveal that while delays at home were indeed common in Dodowa HDSS in Ghana, a main challenge in Iganga/Mayuge HDSS in Uganda was the sick child case management in the health system. Meanwhile, caretaker awareness of danger signs needs to improve in both countries to promote early care seeking and reduce the number of children needing referral. Social autopsy is a promising method to improve the understanding of the circumstances preceding death. Access to this information can help health planners and policymakers prioritize scarce resources appropriately by identifying the most suitable interventions for the specific context.

Acknowledgements and Funding

The Social Autopsy Working Group is grateful to Dr. Osman Sankoh, Executive Director/INDEPTH Network, for his continued support and valuable input throughout the conception, implementation, and analysis of this study. We are also grateful to Evasius Bauni and Rebecca Njue for their input in the tool design phase. This study was supported by a grant from the INDEPTH Network to the Sida-supported Health and Demographic Surveillance Site in Iganga and Mayuge districts in Uganda. TNW is supported through a fellowship awarded by the Wellcome Trust (grant 076934). We thank the study participants, research assistants, and the staff of the Health and Demographic Surveillance Sites in Iganga/Mayuge, Bandim, Dodowa, and Kilifi. We are grateful for the fruitful discussions we have had with the Child Health Epidemiology Research Group (CHERG), especially with Henry Kalter and his team.
Author details

1Department of Health Policy, Planning & Management, School of Public Health, Makerere University, P.O. Box 7072, Kampala, Uganda. 2Iganga/Mayuge Health & Demographic Surveillance Site (HDSS), P.O. Box 111, Iganga, Uganda. 3Department of Public Health Sciences, Division of International Health (IHCAR), Nobels Väg 9, Karolinska Institutet, Stockholm 17176, Sweden. 4Malaria Consortium Africa, P.O. Box 8045, Kampala, Uganda. 5KEMRI-Wellcome Trust Research Programme, Epidemiological and Demographic Surveillance System (EPI-DSS) Group, Kilifi, Kenya. 6Nuffield


Department of Clinical Medicine, Centre for Tropical Medicine, University of Oxford, Churchill Hospital, Old Road, Oxford OX3 7LJ, UK. 7Bandim Health Project, Apartado 861, Bissau, 1004 Bissau Codex, Guinea-Bissau. 8Statens Serum Institut, 5 Artillerivej, Copenhagen 2300, Denmark. 9Dodowa Health Research Centre, Ghana Health Service, P.O. Box 1, Dodowa, Ghana.

17. Krug A, Pattinson RC, Power DJ: Saving children–an audit system to assess under-5 health care. S Afr Med J 2004, 94(3):198-202. 18. de Savigny D, Mayombana C, Mwageni E, Masanja H, Minhaj A, Mkilindi Y, Mbuya C, Kasale H, Reid G: Care-seeking patterns for fatal malaria in Tanzania. Malar J 2004, 3(1):27. 19. Kallander K, Hildenwall H, Waiswa P, Galiwango E, Peterson S, Pariyo G: Delayed care seeking for fatal pneumonia in children aged under five years in Uganda: a case-series study. Bull World Health Organ 2008, 86(5):332-338. 20. Aguilar AMAR, Cordero D, Kelly P, Zamora D, Salgado R: Mortality Survey in Bolivia: The Final Report. Investigating and Identifying the Causes of Death for Children Under Five. Arlington, Va: USAID by the Basic Support for Institutionalizing Child Survival (BASICS) Project; 1998. 21. Uganda Bureau of Statistics: Uganda Demographic and Health Survey 2006. Calverton, Maryland, USA, UBOS and ORC Macro.; 2006. 22. Mortality country fact sheet - Uganda. [http://www.who.int/whosis/mort/ profiles/mort_afro_uga_uganda.pdf]. 23. Ghana Statistical Service (GSS), Ghana Health Service (GHS), ICF Macro: Ghana Demographic and Health Survey 2008. Accra, Ghana: GSS, GHS, and ICF Macro; 2009. 24. Mortality country fact sheet - Ghana. [http://www.who.int/whosis/mort/ profiles/mort_afro_gha_ghana.pdf]. 25. Kalter HD, Salgado R, Gittelsohn J, Parades P: A Guide to Conducting Mortality Surveys and Surveillance. Arlington, Virginia: Basic Support for Institutionalizing Child Survival Project (BASICS II) for the United States Agency for International Development; 2004. 26. Waldman R, Bartlett A, Campbell C, Steketee R: Overcoming Remaining Barriers: The Pathway to Child Survival. Arlington, VA: United States Agency for International Development, the BASICS Project; 1996. 27. Gove S: Integrated management of childhood illness by outpatient health workers: technical basis and overview. 
The WHO Working Group on Guidelines for Integrated Management of the Sick Child. Bull World Health Organ 1997, 75(Suppl 1):7-24. 28. Thaddeus S, Maine D: Too far to walk: maternal mortality in context. Soc Sci Med 1994, 38(8):1091-1110. 29. Bazzano AN, Kirkwood BR, Tawiah-Agyemang C, Owusu-Agyei S, Adongo PB: Beyond symptom recognition: care-seeking for ill newborns in rural Ghana. Trop Med Int Health 2008, 13(1):123-128. 30. Rutebemberwa E, Kallander K, Tomson G, Peterson S, Pariyo G: Determinants of delay in care-seeking for febrile children in eastern Uganda. Trop Med Int Health 2009, 14(4):1-8. 31. Rutebemberwa E, Nsabagasani X, Pariyo G, Tomson G, Peterson S, Kallander K: Use of drugs, perceived drug efficacy and preferred providers for febrile children: implications for home management of fever. Malar J 2009, 8(1):131. 32. Peterson S, Nsungwa-Sabiiti J, Were W, Nsabagasani X, Magumba G, Nambooze J, Mukasa G: Coping with paediatric referral–Ugandan parents’ experience. Lancet 2004, 363(9425):1955-1956. 33. Nsungwa-Sabiiti J, Kallander K, Nsabagasani X, Namusisi K, Pariyo G, Johansson A, Tomson G, Peterson S: Local fever illness classifications: implications for home management of malaria strategies. Trop Med Int Health 2004, 9(11):1191-1199. 34. Tanner M, Vlassoff C: Treatment-seeking behaviour for malaria: a typology based on endemicity and gender. Soc Sci Med 1998, 46(45):523-532. 35. Comoro C, Nsimba SE, Warsame M, Tomson G: Local understanding, perceptions and reported practices of mothers/guardians and health workers on childhood malaria in a Tanzanian district–implications for malaria control. Acta Tropica 2003, 87(3):305-313. 36. Nolan T, Angos P, Cunha AJ, Muhe L, Qazi S, Simoes EA, Tamburlini G, Weber M, Pierce NF: Quality of hospital care for seriously ill children in less-developed countries. Lancet 2001, 357(9250):106-110. 37. Issah K, Nang-Beifubah A, Opoku CF: Maternal and neonatal survival and mortality in the Upper West Region of Ghana. 
Int J Gynaecol Obstet 2011, 113(3):208-210. 38. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34(1):26-31. 39. Tanzania Essential Health Interventions Project. [http://www.idrc.ca/en/ev3170-201-1-DO_TOPIC.html].

Authors’ contributions KK and DK were involved in the conception and design of this work, the data collection, the analysis and interpretation of the data, and in the writing of the manuscript. LY, RTN, and AM were involved in the conception and design of this work, data collection, and in the writing of the manuscript. JA and CN were involved in data collection, data analysis, and in the writing of the manuscript. TNW, MG, AA, and PW were involved in the conception and design of this work and in the writing of the manuscript. All authors read and approved the final manuscript. Competing interests The authors declare that they have no competing interests. Received: 15 March 2011 Accepted: 5 August 2011 Published: 5 August 2011 References 1. Black RE, Cousens S, Johnson HL, Lawn JE, Rudan I, Bassani DG, Jha P, Campbell H, Walker CF, Cibulskis R, et al: Global, regional, and national causes of child mortality in 2008: a systematic analysis. Lancet 2010, 375(9730):1969-1987. 2. Murray CJ, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the symptom pattern method for analyzing verbal autopsy data. PLoS Med 2007, 4(11):e327. 3. Setel PW, Macfarlane SB, Szreter S, Mikkelsen L, Jha P, Stout S, AbouZahr C: A scandal of invisibility: making everyone count by counting everyone. Lancet 2007, 370(9598):1569-1577. 4. WHO: The World Health Report 2008: Primary Health Care - Now More Than Ever. Geneva: WHO; 2008. 5. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, et al: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85(8):570-571. 6. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84(3):239-245. 7. WHO: Verbal autopsy standards: ascertaining and attributing cause of death. Geneva: WHO; 2007. 8. 
Terra de Souza AC, Peterson KE, Andrade FM, Gardner J, Ascherio A: Circumstances of post-neonatal deaths in Ceara, Northeast Brazil: mothers’ health care-seeking behaviors during their infants’ fatal illness. SocSciMed 2000, 51(11):1675-1693. 9. de Zoysa I, Bhandari N, Akhtari N, Bhan MK: Careseeking for illness in young infants in an urban slum in India. Soc Sci Med 1998, 47(12):2101-2111. 10. Bojalil R, Kirkwood BR, Bobak M, Guiscafre H: The relative contribution of case management and inadequate care-seeking behaviour to childhood deaths from diarrhoea and acute respiratory infections in Hidalgo, Mexico. Trop Med Int Health 2007, 12(12):1545-1552. 11. Waiswa P, Kallander K, Peterson S, Tomson G, Pariyo GW: Using the three delays model to understand why newborn babies die in eastern Uganda. Trop Med Int Health 2010, 15(8):964-972. 12. Sodemann M, Jakobsen MS, Molbak K, Alvarenga IC, Aaby P: High mortality despite good care-seeking behaviour: a community study of childhood deaths in Guinea-Bissau. Bull World Health Organ 1997, 75(3):205-212. 13. Sutrisna B, Reingold A, Kresno S, Harrison G, Utomo B: Care-seeking for fatal illnesses in young children in Indramayu, west Java, Indonesia. Lancet 1993, 342(8874):787-789. 14. Reyes H, Perez-Cuevas R, Salmeron J, Tome P, Guiscafre H, Gutierrez G: Infant mortality due to acute respiratory infections: the influence of primary care processes. Health Policy Plan 1997, 12(3):214-223. 15. Durrheim DN, Frieremans S, Kruger P, Mabuza A, de Bruyn JC: Confidential inquiry into malaria deaths. Bull World Health Organ 1999, 77(3):263-266. 16. Krug A, Pattinson RC, Power DJ: Why children die: an under-5 health care survey in Mafikeng region. S Afr Med J 2004, 94(3):202-206.


40. Kroeger A: Health interview surveys in developing countries: a review of the methods and results. Int J Epidemiol 1983, 12:465-481. 41. Linder FE: National health interview surveys: Trends in the study of morbidity and mortality. Geneva: WHO; 1965. doi:10.1186/1478-7954-9-44 Cite this article as: Källander et al.: Social autopsy: INDEPTH Network experiences of utility, process, practices, and challenges in investigating causes and contributors to mortality. Population Health Metrics 2011 9:44.




Kalter et al. Population Health Metrics 2011, 9:45 http://www.pophealthmetrics.com/content/9/1/45

REVIEW

Open Access

Social autopsy for maternal and child deaths: a comprehensive literature review to examine the concept and the development of the method

Henry D Kalter1*, Rene Salgado2, Marzio Babille3, Alain K Koffi1 and Robert E Black1

Abstract

“Social autopsy” refers to an interview process aimed at identifying social, behavioral, and health systems contributors to maternal and child deaths. It is often combined with a verbal autopsy interview to establish the biological cause of death. Two complementary purposes of social autopsy include providing population-level data to health care programmers and policymakers to utilize in developing more effective strategies for delivering maternal and child health care technologies, and increasing awareness of maternal and child death as preventable problems in order to empower communities to participate and engage health programs to increase their responsiveness and accountability. Through a comprehensive review of the literature, this paper examines the concept and development of social autopsy, focusing on the contributions of the Pathway Analysis format for child deaths and the Maternal and Perinatal Death Inquiry and Response program in India to social autopsy’s success in meeting key objectives. The Pathway Analysis social autopsy format, based on the Pathway to Survival model designed to support the Integrated Management of Childhood Illness approach, was developed from 1995 to 2001 and has been utilized in studies in Asia, Africa, and Latin America. Adoption of the Pathway model has enriched the data gathered on care seeking for child illnesses and supported the development of demand- and supply-side interventions. The instrument has recently been updated to improve the assessment of neonatal deaths and is soon to be utilized in large-scale population-representative verbal/social autopsy studies in several African countries. Maternal death audit, starting with confidential inquiries into maternal deaths in Britain more than 50 years ago, is a long-accepted strategy for reducing maternal mortality. More recently, maternal social autopsy studies that supported health programming have been conducted in several developing countries. 
From 2005 to 2009, 10 high-mortality states in India conducted community-based maternal verbal/social autopsies with participatory data sharing with communities and health programs that resulted in the implementation of numerous data-driven maternal health interventions. Social autopsy is a powerful tool with the demonstrated ability to raise awareness, provide evidence in the form of actionable data and increase motivation at all levels to take appropriate and effective actions. Further development of the methodology, along with standardized instruments and supporting tools, is needed to promote its widescale adoption and use.

Introduction and background

In developing country settings with inadequate vital registration systems and where many deaths occur at home, verbal autopsy is the investigative method most often used to determine the prevailing biological causes of death. Health policymakers and programmers require

these data to identify health priorities, allocate sparse resources, and evaluate the impact of health programs. Social autopsy consists of questions on modifiable social, cultural, and health system factors that contribute to the same deaths investigated by verbal autopsy. Because social autopsy studies are often conducted without a control group of survivors, it is important that the factors included be based on interventions of proven efficacy. Health care programmers and policymakers need these data to identify strategies for increasing health-

* Correspondence: hkalter@jhsph.edu 1 Department of International Health, Johns Hopkins Bloomberg School of Public Health, (615 North Wolfe Street), Baltimore, (21205), USA Full list of author information is available at the end of the article

© 2011 Kalter et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


health program implementation. But national level data, important for advocating and securing resources for community health approaches, as well as for developing regional and global estimates of social and behavioral determinants of health, were lacking. An anomaly emerged as well – that, while the methodology grew from a programmatic approach that acknowledges the importance of community participation, few of the programs or researchers conducting social autopsies have sought participation below the level of health programmers and policymakers in sharing or utilizing the data for program or intervention development. However, a track did emerge among practitioners and external users of maternal death reviews, including those based on verbal autopsy, recognizing the power of the data to increase the visibility and awareness of the problem [6] and, in the process, to raise the demand for access to quality maternal health care as a human right [7]. In this way, social autopsy has made an important contribution to the political process and formation of health policy at the global, national, and subnational levels. This paper undertakes a comprehensive review of the literature in order to examine the concept of social autopsy, the development of the methodology, and the quality of its execution in the pursuit of five key objectives. The analysis of study outcomes is organized around seminal efforts in which the authors have participated, while also assessing how widely and successfully the social autopsy method has been adopted.

promotive behaviors and access to and utilization of quality health care services. Two complementary purposes of social autopsy are to increase awareness of maternal and child mortality to empower communities to participate and to engage health programs to increase responsiveness and accountability; and to provide largescale population-level data to support advocacy and securing of the necessary resources to tackle these problems. Verbal autopsy instruments for child deaths have most often included only limited elements that could be termed “social autopsy,” usually consisting of a few questions regarding whether and where care was sought for the fatal illness. In contrast, verbal autopsies for maternal deaths earlier on and more frequently have examined the social contributors to death alongside the medical causes. Factors influencing this approach include the success of the nationwide system of health facility-based confidential inquiry into maternal deaths conducted in the United Kingdom since 1952, which from the beginning recognized the importance of social factors and examined these by constructing illustrative vignettes of individual maternal deaths [1]; and later, the widespread adoption of the “Three Delays” model of maternal mortality [2], which highlights the social/behavioral causal chain linking the household, community, and health system and provides a clear framework for the development of maternal social autopsy tools. The World Health Organization (WHO) helped promote the spread of maternal death reviews using several methods, including verbal autopsy with a strong social element, with its Beyond the Numbers effort [3], which was highly influenced by the earlier work in Britain. 
Similarly influenced by Mosley and Chen’s 1984 framework for the study of child survival in developing countries, which posited a set of socioeconomic determinants underlying and operating through proximate biological factors to affect mortality [4], child survival strategies in the 1990s evolved in the same direction toward integrated approaches and an appreciation for the importance of household and community factors in health promotion, disease prevention, and treatment. This culminated in the development of the Pathway to Survival framework in 1995 [5]. In response, social autopsy efforts for fatally ill children emerged that holistically track the entire process and determinants of health care provision, care seeking (or not) from home to facility, and the quality of care provided. The size and scope of the enhanced social autopsy efforts both for child and maternal deaths varied, but most were limited to studies at subdistrict, district, or country-region level. This was often appropriate, as important social factors may vary by site and many social autopsy studies were intended to support local

Methods

Search strategy

We conducted computerized searches of Embase, PubMed, and SCOPUS databases using the keywords and phrases: (careseeking OR care-seeking OR care seeking) AND (death OR mortality), “social autopsy,” and “verbal autopsy.” We then manually searched references quoted in original publications for additional information.

Study inclusion and exclusion criteria

To be included in the review, studies had to fulfill the following criteria:
1. Published after 1989 in a peer-reviewed journal or as a report accessible through a Web search;
2. Examine the care-seeking process for fatal illnesses of children from birth to 5 years or for maternal deaths;
3. Investigate a minimum of 50 child or maternal deaths;
4. Include an abstract accessible through the search database; and
5. Written in English or French.
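As an illustration only, the five inclusion criteria can be expressed as a simple screening function. This is a hedged sketch in Python and does not describe how the reviewers actually screened studies; the class `CandidateStudy`, its field names, and the function `meets_inclusion_criteria` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class CandidateStudy:
    """Hypothetical screening record; field names are illustrative assumptions."""
    year_published: int
    peer_reviewed_or_web_report: bool   # criterion 1 (publication type)
    examines_care_seeking: bool         # criterion 2 (child birth to 5 years, or maternal)
    deaths_investigated: int            # criterion 3
    has_accessible_abstract: bool       # criterion 4
    language: str                       # criterion 5


def meets_inclusion_criteria(study: CandidateStudy) -> bool:
    """Return True only if all five stated criteria are satisfied."""
    return (
        study.year_published > 1989
        and study.peer_reviewed_or_web_report
        and study.examines_care_seeking
        and study.deaths_investigated >= 50
        and study.has_accessible_abstract
        and study.language in {"English", "French"}
    )


# Two made-up records used purely to exercise the function.
eligible = CandidateStudy(1997, True, True, 125, True, "English")
too_small = CandidateStudy(2004, True, True, 32, True, "French")
print(meets_inclusion_criteria(eligible), meets_inclusion_criteria(too_small))
```

The two example records are invented for demonstration and do not correspond to any study in the review.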


prior to the development of the Pathway to Survival model in 1995, which positively influenced the scope of care-seeking factors considered by subsequent studies. Understanding the Pathway sets the context for examining the development of the social autopsy method.

Study characteristics

Standard information was abstracted from all eligible studies by two reviewers (AK and HDK). The information included the following: the dates the data were collected and published; the setting, i.e., the country and site in which the work was conducted; the group studied (maternal, child, or both); the number of deaths observed; the study objective and design; and the format (open-ended, closed-ended, or combined) and source of the social autopsy questionnaire.

Outcomes

Data were also extracted to assess whether the study met five key objectives of social autopsy, as follows: (i) essential elements of the care-seeking process were described, including recognition of the illness, whether adequate home care was provided, whether and what type of outside-the-home care was sought (informal, formal, or both), delays to formal health care seeking and related constraints (e.g., lack of knowledge of illness danger signs, seeking traditional care, lack of transportation, costs), and the quality of health care provided (from the client’s perspective); (ii) a social diagnosis of the contributors to death was made, i.e., household (behavioral), community (social), and health system determinants of the deaths were identified; (iii) the study provided representative national or large area data; and the data were utilized to support (iv) health program or policy development; and/or (v) community empowerment.

Ethical considerations

The social autopsy studies conducted in Bolivia [8] and Guinea [9], in which two of the authors participated and which were central to the work described in this paper, were programmatic efforts approved by the national and regional ministries of health (MOH) of the respective countries without undergoing formal ethical review. The Maternal and Perinatal Death Inquiry and Response (MAPEDIR) program in India [10], also central to this paper, was reviewed by the Johns Hopkins University institutional review board and found to be a programmatic effort, rather than research, and so was exempt from board oversight. Nevertheless, key Helsinki principles were upheld in the conduct of all these studies, including administering informed consent to all respondents and maintaining the confidentiality of the information they provided.

Results

The search of the databases using the keywords and phrases identified 14 articles and reports of child deaths and eight of maternal deaths that met the inclusion criteria (Table 1). Only three child studies were conducted

Child social autopsy: providing evidence on failures in the Pathway to Survival

The multicountry assessment of WHO/UNICEF’s Integrated Management of Childhood Illness (IMCI) approach found that, although IMCI health facilities provided the gold standard of child illness care in developing countries, the strategy failed to decrease child mortality. In part this was due to weak implementation of IMCI’s family and community component and to the assumption that quality health care services alone would lead to increased care seeking and appropriate home care practices [11]. Access, coverage, and utilization were all found to lag, resulting in ineffective delivery of appropriate child survival technologies. The Pathway to Survival conceptual framework (Figure 1) was designed to support the implementation and monitoring of IMCI, with the aim of highlighting the essential steps that need to be taken both inside the home and in the community to prevent child illness and return sick children to health. The pathway identifies and organizes modifiable social, cultural, and health system factors affecting home care practices, health care access and utilization, and the delivery of quality health care [12]. As seen in Table 1 (and Additional file 1, which provides more details), the three child social autopsy studies conducted prior to the development of the Pathway model [13-15], as well as the three non-Pathway studies conducted after the Pathway model was developed [16-18], examined, on average, three aspects of the care-seeking process. In contrast, the eight later studies that followed the Pathway model [8,9,19-24] on average assessed eight care-seeking elements, providing a more complete understanding of the care-seeking process and the factors affecting health care utilization for severely ill children in developing countries. Nevertheless, even the non-Pathway studies, with their limited set of data variables and none examining a representative sample of deaths at the district or higher level (two of the Pathway studies met this objective), attempted to form a social diagnosis of mortality determinants. More of the Pathway studies were also conducted by or in support of health programming or health policy development (5/8) and to support community participation and empowerment (4/8), compared, respectively, to 2/6 and 1/6 of the non-Pathway studies. Somewhat higher percentages of the Pathway than non-Pathway studies were also rated



Kalter et al. Population Health Metrics 2011, 9:45 http://www.pophealthmetrics.com/content/9/1/45

Table 1 Studies and reports meeting the inclusion criteria of the comprehensive review

Column key:
Care-seeking data = data collected on the care-seeking process: 1) illness recognition; 2) home care; 3) recognition of severe illness; 4a) times, 4b) sequence, and 4c) type of health care sought; 5) CS delays; 6) CS constraints; 7) quality of care; 8) referral; 9) compliance with home care and/or referral advice
Social diagnosis = social diagnosis of contributors to death was made
Data provided = data provided were: 1) representative; 2) large area (district/regional/national)
Program support = data were utilized to support health program or policy development: 1) advocacy/accountability; 2) data sharing and interpretation; 3) intervention development = responsiveness
Community support = data were utilized to support community empowerment: 1) data sharing and interpretation; 2) intervention development; 3) monitoring and revision

Study author and reference # | Publication date | Study setting | Age group studied | No. of deaths investigated | Care-seeking data | Social diagnosis | Data provided | Program support | Community support
Sustrisna [13] | 1993 | Indonesia: 10,000 HHs, Indramayu, West Java | Under 5 years old | 139 | 4c; 6 | Implied | 1) Unclear; 2) No | 1/2/3) None stated | 1/2/3) None stated
Gutierrez [14] | 1994 | Mexico: Tlaxcala state | 3 days-5 years old | 98 ARI & 34 acute diarrhea | 4c; 5; 7; 8 | Implied | 1) Unclear; 2) Yes | □ 1/2/3) Yes | 1/2/3) None stated
Sodemann [15] | 1997 | Guinea-Bissau: 2 suburbs of Bissau | 1-30 months old | 125 | 4a, c; 5; 7 | Yes | 1) Yes; 2) No | 1/2/3) None stated | 1/2/3) None stated
Aguilar [8] | 1998 | Bolivia: El Alto city | Under 5 years old | 271 | PtoS study: 1; 2; 3; 4b, 4c; 5; 7; 9 | Yes | 1) Likely; 2) No | △ 1) Yes; 2/3) None stated | △ 1) Yes; 2/3) None stated
de Bocaletti [19] | 1999 | Guatemala: 4 towns | Stillbirths & 0-6 days old | 101/36 | PtoS study: A) Mother: delivery place & decision maker; B) Mother & child: 1; 2; 3; 4c; 5; 6; 7; 9 | Yes | 1) Likely; 2) No | □ 1/2) Yes; 3) Goals stated | □ 1/2) Yes; 3) None stated
de Souza [20] | 2000 | Brazil: 11 municipalities, Ceara state | 1-11 months old | 127 | PtoS study: 2; 3; 4a, b, c; 5; 6; 7; 9 | Yes | 1) Possible; 2) Yes | 1/2/3) None stated | △ 1) Yes; 2/3) None stated
RACHA [21] | 2000 | Cambodia: 40 villages in 4 provinces | Perinates & 1 wk-59 mo old | 59/119 | PtoS study: A) Mother: delivery place & decision maker; B) Mother & child: 1; 2; 3; 4a, b, c; 5; 6; 7; 9 | Yes | 1) Possible; 2) Yes | □ 1/2) Yes; 3) Goals stated | 1) None stated; 2) Goal to mobilize the community; 3) None stated
Bhandari [22] | 2002 | India: 2 urban slums, Delhi | 0-365 days | 162 | PtoS study: 3; 4a, b, c; 5; 7; 9; referral compliance constraints | Yes | 1) No; 2) No | 1/2/3) None stated | 1/2/3) None stated
Schumacher [9] | 2002 | Guinea: Mandiana prefecture | 0 days-59 months old | 330 | PtoS study: 1; 2; 3; 4a, b, c; 5; 6; 7; 9 | Yes | 1) Yes; 2) Yes | □ 1/2/3) Yes | □ 1/2) Yes; 3) None stated
Hinderaker [16] | 2003 | Tanzania: 2 divisions in 2 districts | Stillbirths and neonates | 136 | A) Mother: delivery place; B) Child: 5; 7; 8 | Yes | 1) Probably not; 2) No | 1/2/3) None stated | 1/2/3) None stated
de Savigny [17] | 2004 | Tanzania: Rufiji DSS | Under 5 years old | 320 (all malaria) | 2; 4a, b, c | Presumed, not demonstrated | 1) Yes; 2) No | □ 1) Yes; 2/3) None stated | □ 1/2) Yes; 3) None stated
Bojalil [23] | 2007 | Mexico: Hidalgo state | Under 5 years old | 75 ARI & diarrhea | PtoS study: 3; 4a, c; 5; 6; 7 | Yes | 1) Yes; 2) Yes | △ 1/2/3) None stated, but the study aimed to “provide information to better implement interventions linked with IMCI program” | 1/2/3) None stated
Beiersmann [18] | 2007 | Burkina Faso: sub-portion of 1 district | Under-5 years old with malaria | 100 | 4c; 6 | Yes | 1) Yes; 2) No | 1/2/3) None stated | 1/2/3) None stated
Waiswa [24] | 2010 | Uganda: Iganga/Mayuge DSS | Neonates | 64 | PtoS study: A) Mother: delivery place and attendant; B) Child: 3; 5; 7 | Yes | 1) No; 2) No | 1/2/3) None stated | 1/2/3) None stated
Fawcus [39] | 1996 | Zimbabwe: 1 province and urban Harare | Maternal | 166 | 5; 6; 7; 8; 9 | Yes | 1) Possible; 2) Yes | 1/2/3) None stated | △ 1) Yes; 2/3) None stated
Castro [36] | 2000 | Mexico: 3 states | Maternal | 145 | 3; CS decision maker; 4a, c; 5; 6; 7; 8 | Yes | 1) Yes; 2) Yes | □ 1) Yes; 2/3) None stated | 1/2/3) None stated
Supratikto [38] | 2002 | Indonesia: 3 districts, S. Kalimantan | Maternal | 130 | 4c; 5; 6; 7 | Yes | 1) Possible; 2) Yes | □ 1/2/3) Yes | △ 1) Yes; 2/3) None stated
Bartlett [34] | 2005 | Afghanistan: Kabul & 3 districts | Maternal | 133 | 4c; 5; 6; 7 | No | 1) Possible; 2) Yes | □ 1) None stated; 2/3) Yes | 1/2/3) None stated
Campbell [35] | 2005 | Egypt | Maternal | 718 (1992/3) / 580 (2000) | 3; 4c; 5; 7 | Yes | 1) Yes; 2) Yes | □ 1/2/3) Yes | △ 1) Passive; 2/3) None stated
UNICEF [10] | 2008 | India: 4 districts in 3 states | Maternal | 102 (1 district) | 3; 4a, b, c; 5; 6 | Yes | 1) Possible; 2) Yes | □ 1/2/3) Yes | □ 1/2) Yes; 3) None stated
Jafarey [37] | 2009 | Pakistan: 2 districts | Maternal | 128 | 3; 4c; 5; 6; 7; 8 | Yes | 1) Possible/No; 2) Yes | □ 1) Yes; 2/3) None stated | 1/2/3) None stated
D’Ambruoso [40] | 2010 | Burkina Faso: 1 district; Indonesia: 2 districts | Maternal | 70 (BF) / 104 (Indonesia) | 5; 6; 7 | Yes | 1) No; 2) Yes | 1/2/3) None stated | 1/2/3) None stated


CS: care seeking; HH: household; ARI: acute respiratory infection; PtoS: Pathway to Survival; IMCI: Integrated Management of Childhood Illness; DSS: demographic and surveillance site; △ and □: studies ranked, respectively, as providing “any” and “strong” support to health programs and communities.


Figure 1 The Pathway to Survival.

as strongly supporting health programs (3/8 versus 2/6) and communities (2/8 versus 1/6).

The first analysis of child deaths following the Pathway model was a survey conducted in 1995 of 271 child deaths from randomly selected census tracts over a then-recent nine-month period in El Alto, Bolivia [8]. The social autopsy format used in the Bolivia study, consisting of a separate sheet duplicating the same open-ended questions on possible problems along the pathway for each day of the illness, was found to be cumbersome both for data collection and analysis. Subsequent work produced a one-page social autopsy tool formatted as a matrix, with each row constituting one action taken for the illness, and columns for recording the action, the illness day the action was taken, the illness signs at the time the action was taken, reasons for the action being taken, and each of the remaining steps along the pathway. The Pathway Analysis social autopsy format was published online as part of a manual aimed at health programs describing how to undertake a child mortality study to determine the biological causes and social determinants of death [25]. Later work added a module for investigating perinatal deaths. Subsequent social autopsy studies utilizing or based on this format and manual were conducted in Guatemala [19], Cambodia [21], and Guinea [9] with support from the group producing the materials, as well as by an independent group working in Uganda [24].

Figure 2 illustrates the type of information gathered by the social autopsy instrument with data from Guinea for 330 child deaths. It can be seen, for example, that while 290 (88%) of the children’s caretakers recognized one or more signs of a severe illness, 34 (10%) of the children received no care whatsoever, 238 (72%) were taken for some outside-the-home care, on average 2.3 days after the illness began, and only 132 (40%) children received some formal health care, on average 3.5 days after the illness onset. The Guinea study identified only 13 referrals, perhaps because it used an early version of the social autopsy questionnaire that did not ask about this directly; a later refinement improved the assessment of referrals.

In 2009, the WHO/UNICEF-supported Child Health Epidemiology Reference Group (CHERG) [26] undertook to review and update the Pathway Analysis social autopsy format. The main issues considered were: 1) in response to the increased contribution of neonatal deaths to overall child mortality resulting from recent decreases in post-neonatal deaths [27], to improve the format’s assessment of stillbirths and neonatal deaths and related care-seeking issues by adding modules on maternal and newborn care,


Figure 2 Pathway analysis for 330 child deaths in Mandiana Prefecture, Guinea; denominators: * = 212 children seen by an informal or formal provider and not referred or hospitalized, ** = 238 children seen by a formal or informal provider, *** = 132 children seen by a formal health provider.
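The percentages quoted in the text for the Guinea pathway analysis can be recomputed from the reported counts. The following is a minimal sketch in Python, using only the counts and the 330-death denominator given in the text; the step labels and the function name are illustrative assumptions and are not part of the Pathway Analysis format.

```python
# Counts reported in the text for the 330 child deaths in Mandiana Prefecture, Guinea.
TOTAL_DEATHS = 330
pathway_counts = {
    "caretaker recognized a sign of severe illness": 290,
    "received no care whatsoever": 34,
    "taken for some outside-the-home care": 238,
    "received some formal health care": 132,
}


def pathway_percentages(counts, denominator):
    """Express each pathway step as a whole-number percentage of the denominator."""
    return {step: round(100 * n / denominator) for step, n in counts.items()}


percentages = pathway_percentages(pathway_counts, TOTAL_DEATHS)
for step, pct in percentages.items():
    print(f"{step}: {pathway_counts[step]}/{TOTAL_DEATHS} = {pct}%")
```

The same function could be applied with the other denominators given in the figure caption (212, 238, and 132 children) for the steps that are conditional on contact with a provider.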

including care seeking for maternal complications; 2) to strengthen the evaluation of child preventive care; 3) to examine a host of behavioral and social factors not previously considered; and 4) to include questions on the utilization of trained community health workers, in accord with the recent inclusion of these workers in some formal health systems. Most of the social, behavioral, and preventive factors that were added to the questionnaire (Table 2) were based on interventions included in the Lives Saved Tool [28], which undergo rigorous review for evidence of their efficacy. In addition, where possible, the questions were worded similarly to those in the Demographic and Health Surveys (DHS) [29] in order to facilitate comparisons of the social autopsy data with similar data for survivors in settings where a recent DHS was conducted. CHERG integrated the updated social autopsy instrument with the Population Health Metrics Research Consortium verbal autopsy questionnaire, which is currently being extensively validated by studies described in other articles in this series of Population Health Metrics. While the updated Pathway Analysis format is considerably longer than the original version, CHERG is developing CAPI (computer-assisted personal interview) software for field-based data capture on a netbook or tablet computer with built-in consistency checks, automatic mapping of skip patterns, and correct question wording depending on who the respondent is and the child’s age at death. This should significantly ease the interview process and increase the quality of the data. Software versions are being developed both for the integrated verbal/social autopsy (VASA) interview and for the social autopsy alone to enable using it with other verbal autopsy instruments. These tools will be available open access on the CHERG website in order to facilitate their widespread use.

The largest scale of the prior Pathway Analysis studies was at the provincial level. One last objective that CHERG is working to fulfill is to collaborate with government and international partners in several countries in Africa to develop national and other large-scale VASA studies. The purpose is to provide evidence of modifiable social, cultural, and health systems factors contributing to neonatal and child mortality for advocacy and health policy and planning exercises and to


begin gathering the data needed to develop global estimates of these factors.

Table 2 Social, behavioral, and preventive factors included in the updated Pathway Analysis social autopsy questionnaire
Social factors
• Mother’s education, literacy, age at marriage
• Household possessions, husband’s education, breadwinner’s occupation
• Duration of residence in community and time to reach usual health provider
• Social capital (community joint action, helpful persons/groups, denial of services)
Maternal factors (including care seeking for complications)
• Antenatal care (blood pressure, urine and blood, counseling on food and care seeking), tetanus toxoid, insecticide-treated bed net use, malaria prophylaxis
• Birthplace and attendant, partograph, handwashing, clean delivery surface
• Knowledge of and care seeking for pregnancy, labor, and delivery complications
• Constraints to health care seeking and compliance with referral advice for maternal complications
• Quality of health care services (treatment, referral, and reasons for referral for complications)
Care seeking for child illnesses
• Newborn and child illness recognition, health care seeking, compliance with treatment, and referral advice
• Constraints to health care seeking and compliance with treatment and referral advice
• Quality of health care services (treatment, referral, and reasons for referral of sick children)

Maternal death inquiry and response

Maternal death audit has been undertaken in many forms, including clinical audit, which evaluates the quality of care provided in health facilities against an accepted standard; confidential inquiry of all or a sample of deaths in a population, most often focusing on medical factors but sometimes including community aspects; facility-based maternal death review, preferably augmented with information from the community; and community-based VASA, most useful in areas where many deaths occur outside of a health facility, and which can be combined with a facility review of cases that did access care for a more accurate assessment of medical factors. Since its beginnings with Great Britain’s nationwide system of confidential inquiry into maternal deaths, and its later promotion by WHO’s Beyond the Numbers effort, maternal death audit has been implemented as a system in several developing countries. Examples include Sri Lanka [30] and Malaysia [31], which review both hospital and home deaths, and South Africa [32], which conducts confidential inquiries of hospital deaths. The most recent large-scale effort has been the MAPEDIR program undertaken in 10 high-mortality states of India with assistance from UNICEF [10]. Because up to half or more of maternal deaths in these states are thought to go unreported, and many of these are thought to occur at home, it was decided to initiate MAPEDIR with community-based VASA of maternal deaths. In 2010, this effort culminated in the Government of India announcing its plan to commence a nationwide program of facility- and community-based maternal death audits [33].

The rationale for maternal social autopsy is the same as for child deaths. There are several highly efficacious interventions against maternal mortality, including, for example, antibiotics for preterm premature rupture of the membranes to prevent maternal (and fetal) sepsis, a skilled birth attendant providing active management of the third stage of labor to prevent postpartum hemorrhage, and treatment of primary postpartum hemorrhage with rectal misoprostol. Yet, as illustrated by the findings of 800 VASA interviews in Orissa, India (Figure 3), many women in developing countries may die at home without ever seeking health care for their maternal complications, and many who do seek care never receive effective treatment. As in the Indian context, this may often be true despite the fact that multiple facilities at a level that should be capable of providing basic or comprehensive emergency obstetric care are visited during the fatal illness. To effectively tackle these problems, data are needed on the social, behavioral, and health system factors contributing to the deaths. Effective sharing of such data, in settings where many of the deaths would otherwise not even have been registered, much less investigated, can raise awareness of the magnitude, causes, and determinants of the problem and support the development of effective interventions with communities and health programs. Where programs and government are less responsive, the data can be used for advocacy to promote accountability.

Figure 3 Pathway analysis for 800 maternal deaths, April 2005 to September 2007, in eight districts of Orissa, India; PHC: primary health care center, CHC: community health center.

Of the seven non-MAPEDIR studies included in the comprehensive review of maternal social autopsy, five strongly supported health programs and three provided some support to community participation and empowerment (Table 1). Two of the studies were either conducted or commissioned by the country’s national maternal health program and their findings were used to help guide the country’s reproductive health strategy [34,35]. In such cases, it is evident that responsive programs are in place and making good use of the social autopsy data. Two additional papers included an author from the government health authority [36,37], suggesting that practical use might also be made of their findings; one of these stressed the importance of disseminating the findings to policymakers, health planners, health professionals, and the community to ensure sensitization and action and recommended implementation of community- and hospital-based maternal death audits [37]. One other paper described the use of the social autopsy data in a participatory audit system consisting of periodic meetings of community and district health staff, with the audits of some deaths also involving community representatives [38]. The two remaining papers proposed health system or community interventions based on their study findings, but took the process no further [39,40]. Similar to the child studies, all but one of these maternal studies made a social diagnosis of mortality determinants, although only two collected data on a representative sample of deaths from a district or larger area.

The eighth maternal study reviewed, the MAPEDIR program, collected data on a similar number of care-seeking variables (6) as the other seven studies (mean = 4.9), but was unique in its extensive sharing, interpretation, and use of the data for health planning and intervention development by the community, as well as health authorities and government officials (Additional file 1) [10]. The first level of data provided by MAPEDIR was a simple but powerful one in raising awareness and the visibility of the problem. MAPEDIR increased the reporting of maternal deaths by the community and local health providers, most markedly in locales where reporting had previously lagged far behind. It accomplished this by first engaging health officials, talking about the problem, and highlighting the reluctance of first-line health workers to report maternal deaths for fear of being blamed and penalized. The importance of a nonblaming approach and a search for systemic causes for the deaths was discussed. Similar sensitization sessions were held with villagers as part of discussions on the need for birth preparedness, complication readiness, and reporting and investigating maternal deaths to discover what went wrong. Data from the resulting death inquiries were sometimes disturbing to health officials and spurred them to take action. For example, in several states it was found that a disproportionate share of maternal deaths occurred among women of lower castes. Data from the inquiries shared with communities empowered them to understand how future deaths could be prevented and to develop appropriate interventions. Table 3 lists some of the many interventions developed at multiple levels as part of the MAPEDIR response.

Discussion

Meeting the social autopsy objectives

The comprehensive review of the English and French literature found that adoption of the Pathway to Survival framework, starting with Aguilar et al.’s 1998 study in Bolivia [8], increased the richness of the data on the care-seeking process collected by child social autopsy. The Pathway studies provided valuable information and an increased understanding of the social, behavioral, and health systems factors affecting care seeking for severe neonatal and child illnesses in developing countries. Thus, while many of the non-Pathway studies made a social diagnosis of mortality determinants, it is likely that the Pathway studies were able to reach a more accurate diagnosis. The Pathway studies also were used somewhat more often to support health programs and communities, but even they fell short of fully achieving this objective: fewer than half were rated as strongly supportive of health programs and only one-quarter were strongly supportive of communities.

Table 3 Some maternal health interventions undertaken in India in response to MAPEDIR’s social autopsy findings
• Dholpur, Rajasthan: taxi union, local NGO, and district health society collaborated in planning and running an obstetric help line and referral transport system
• Guna, Madhya Pradesh: district mapped maternal deaths and revitalized SHC and PHCs in high mortality areas for 24/7 safe-delivery services; district ensured referral transport to all PHCs via call center and secured vehicles (local communities donated 6/22 vehicles)
• Purulia, West Bengal: four gram panchayats (local governance board) initiated and supported van rickshaws intervention for referral transport from isolated villages
• West Bengal: state made all public maternity beds nonpaying; expanded JSY to all SC/ST and BPL women
• Eight Navajyoti districts, Orissa: functional blood banks and blood storage units
• Orissa: state considered how to target men with maternal care-seeking messages
BPL = below poverty line; JSY = Janani Suraksha Yojana (institutional care incentive scheme); MAPEDIR = Maternal and Perinatal Death Inquiry and Response program; NGO = non-governmental organization; PHC = primary health care center; RCH II PIP = Reproductive and Child Health Program phase 2 program implementation plan; SC/ST = scheduled castes and tribes; SHC = subhealth care center.


The reviewed maternal social autopsy studies collected data on a range of care-seeking variables, with the MAPEDIR program ranking somewhat above average in the number of factors examined, perhaps enhancing its ability to reach an accurate social diagnosis of mortality determinants. However, while most of the maternal studies strongly supported health programs, only the MAPEDIR program also offered strong support to communities. Recent evidence suggests that this is more than a philosophical choice: community participation and empowerment can strongly affect neonatal and perhaps maternal mortality [41].

Social autopsy serves varied purposes; among these are providing large-scale population-level data to contribute to country and global estimates of mortality determinants, and increasing awareness of maternal and child death as preventable problems in order to empower communities and engage health programs. Most of the reviewed studies did not meet the objective of selecting a representative sample of deaths at the district or larger area level, and so may not be generalizable to overall mortality in the study areas or serve the first listed purpose of social autopsy. Nonetheless, several were found to strongly support health programs and communities in improving health interventions and access to care. This does not negate the importance of selecting cases in as representative a fashion as possible, within existing limitations. For example, if it is known that many deaths in the service area are occurring at home, then a social autopsy investigation should be designed to ensure that home deaths are included in the study sample.

The relationship of social autopsy to death audit

The term “social autopsy” implies that a social diagnosis is to be made of the most common or otherwise important social, behavioral, and health systems determinants of mortality. While data on social factors can play a role in audits of individual deaths, as in facility-based death reviews that include interviews of the deceased’s family members, the term as used in this review refers to aggregate diagnoses made using quantitative data. Social autopsy can also provide qualitative data in the form of individual illness narratives, with the purpose of illustrating the overall care-seeking process, showing how several problems can conspire to cause a death, and putting a human face on the numbers. Sharing and discussion of these narratives, especially at the community level, provides a powerful learning opportunity, but the process must take care to preserve confidentiality and foster a nonblaming approach. Otherwise, there is the danger that the sharing session could deteriorate into a negative atmosphere, diverting attention from the necessary focus on systemic problems that can be fixed through collective action and discouraging community participation. Supratikto et al. conducted community-based audits of local, individual maternal deaths, and discussed the challenge of preserving confidentiality and nonblaming in this setting [38]. In their research study of three different approaches to village-based audit of neonatal deaths, Patel et al. did not face this problem, though they observed that the bereaved family’s presence at the audit could decrease participants’ willingness to discuss the case, and cautioned that in other settings scrutinizing individual cases might adversely affect family and community relations [42]. The MAPEDIR program’s approach to maintaining confidentiality while utilizing illness narratives was to share composite vignettes of cases illustrating common problems or actual cases from across borders so the participants would be unlikely to know the family of the case being discussed. In the end, the discussion would be brought back to the quantitative data to help prioritize problems illustrated by the individual stories within the context of the community’s overall experience.

Limitations

Just as in the sharing of social autopsy data with the community, care must be taken in collecting and handling the data to ensure the highest possible level of confidentiality and the respondent’s comfort with providing the information. There is potential for the information to be stigmatizing, for example, if the respondent or another family member delayed taking an action perceived as possibly life-saving. This could affect the respondent’s openness during the interview and the accuracy of the illness reports. Limited recall for remote events is another potential problem, as sample size needs for child social autopsy may require interviewing families with a child death three or more years ago. This issue is likely to be exacerbated in a study of maternal deaths.

Technical issues

As with verbal autopsy, some challenges remain for social autopsy, both to improve its ease of use and assess the validity of the information it gathers. Item reduction might be possible, both in terms of the number of questions asked and the number of potential responses to multiple-choice questions. This would shorten the duration of the interview and simplify the questions, making it more practical to conduct social autopsy on the platform of national surveys and in the context of local health program planning and monitoring. An integrated VASA interview can take 60 to 90 minutes to complete, and some have argued that the interviews should therefore be separated. Counterarguments are that respondents would rather not undergo a second visit; that a return visit would be even more impractical on the platform of a national survey; that elements of the verbal autopsy, such as the illness signs and symptoms, must in any case be discussed as part of the social autopsy care-seeking questions; and that by following a more natural chronology an integrated interview promotes an improved interviewer-respondent dialogue and enhances recall of the illness events. The new CAPI software also promises to ease the interview process and shorten its duration. Another argument in favor of separating the verbal and social autopsies is that this might help overcome the potential for the stigma of the social autopsy discussed above to affect disclosure of verbal autopsy information and hence determination of the biological diagnoses. This separation could also foster a specialized approach to social autopsy, enabling a more elaborate informed consent process and a dedicated cadre of interviewers trained to more effectively deal with feelings of anger, guilt, and shame over the death that might arise during the interview. Determining which of the implementation models works best will require additional experience and possibly experimentation.

Social autopsy analysis at its most basic level involves outputting frequency distributions of all the questionnaire’s mortality determinants. However, many important questions can be answered only by cross tabulations, such as who the decision-makers were for facility and home delivery, and for which illness signs and symptoms particular actions were taken. Comparisons can also be made of women or children for whom care was and was not sought. This can enhance the identification of care-seeking constraining factors. Such analyses might also help with item reduction by identifying noninformative variables, and potentially expose needed areas of inquiry not covered by the questionnaire.
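The two levels of analysis described above, frequency distributions and cross tabulations, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented for the example and are not study data, and the field names (`sign`, `action`, `formal_care`) are hypothetical rather than taken from the social autopsy questionnaire.

```python
from collections import Counter

# Invented example records; field names and values are hypothetical.
records = [
    {"sign": "fever", "action": "home care", "formal_care": False},
    {"sign": "fever", "action": "formal provider", "formal_care": True},
    {"sign": "fast breathing", "action": "formal provider", "formal_care": True},
    {"sign": "fast breathing", "action": "traditional healer", "formal_care": False},
    {"sign": "diarrhea", "action": "home care", "formal_care": False},
    {"sign": "fever", "action": "home care", "formal_care": False},
]


def frequency(rows, field):
    """Frequency distribution of one questionnaire variable."""
    return Counter(row[field] for row in rows)


def crosstab(rows, row_field, col_field):
    """Cross tabulation: counts of col_field values within each row_field value."""
    table = {}
    for row in rows:
        table.setdefault(row[row_field], Counter())[row[col_field]] += 1
    return table


action_freq = frequency(records, "action")
sign_by_action = crosstab(records, "sign", "action")

# Comparison of cases for whom formal care was and was not sought.
sought = [r for r in records if r["formal_care"]]
not_sought = [r for r in records if not r["formal_care"]]

for sign, actions in sign_by_action.items():
    print(sign, dict(actions))
```

Splitting the records by whether care was sought, as in the last step, is one simple way to set up the comparison of care-seeking constraints mentioned in the text; a real analysis would also need to account for sample design and small cell counts.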
While there is a history of validating verbal autopsy, only one study included in the comprehensive review validated any aspect of the social autopsy. Bojalil et al. independently assessed the clinical competence of doctors mentioned in mothers’ narratives of their child’s fatal illness and found a high correlation between the quality of care they provided as assessed from the narratives and by the competence scores [23]. Jafarey et al. attempted to use medical records to validate third delays identified in narrative accounts of care received for maternal complications at tertiary facilities, but were unable to do so because the medical records contained inadequate information [37]. In addition to the quality of medical care received, other aspects of social autopsy to consider for validation include caregivers’ reports of care sought or other actions taken, and care-seeking delays and constraining factors.

The way forward

It is apparent that the Pathway to Survival and Three Delays models are useful for organizing the care-seeking process for severe child and maternal illnesses. Future social autopsy studies should be guided by these models, and should also make use of social autopsy’s other strengths of raising awareness through participatory data sharing and intervention development. Just as there have been international efforts to standardize verbal autopsy instruments and analytic methods for neonatal, child, and adult deaths, the utility and acceptability of social autopsy findings could be advanced by reaching agreement on a core dataset to be gathered and on standardized formats and methods for accomplishing this. CHERG’s Pathway Analysis format has the advantage of having been reviewed and updated by scientists from several international organizations, including WHO, but the team was small, the instrument has not been officially endorsed, and there are other groups working to develop their own social autopsy instrument for child deaths, the most prominent being the INDEPTH Network. A process is needed to bring together interested parties, finalize these efforts, and reach agreement on a standardized format and data analysis plan. Similarly, a standardized social autopsy format and analysis plan for maternal deaths is needed. WHO published suggested questions for maternal verbal autopsy, including questions on care seeking [43], that at least two of the papers in the present review used as the basis for their study instruments. UNICEF’s MAPEDIR program developed a standardized VASA format for investigating maternal deaths as well as accompanying materials for training surveyors and sharing the findings with the community, but has not yet published these resources. Developing standardized questionnaires and analysis plans for child and maternal deaths could promote the agenda of routinely collecting and utilizing quality social autopsy data. 
These instruments should be provided together with tools for country adaptation, fieldworker training, and community data sharing. These efforts will be enhanced by further work to develop the social autopsy method, such as identifying ways to minimize the stigmatizing effect of certain questions and assessing the optimal model for combining or sequencing the verbal and social autopsy interviews.

Conclusions

Social autopsy is a powerful tool for raising awareness and the visibility of child and maternal death as preventable problems in the community, among health workers, health authorities, and government officials. It provides evidence in the form of actionable data to communities, health programs, and health policymakers,


and increases motivation at all levels to take appropriate and effective actions. Health systems and communities of implementing countries, individually and as partners, used the Pathway to Survival and MAPEDIR findings to develop appropriately focused interventions. Additional social autopsy studies reviewed for this paper were utilized by governments to improve health programming. Social autopsy data can also build institutional awareness and political commitment, thus helping to increase health system and governmental accountability and responsiveness. Community participation in the death inquiry and response process may in itself act as an intervention by increasing awareness and motivating communities to take action in a way that increases care seeking. Social autopsy studies conducted in representative, large-scale populations can provide data to develop national and global estimates of social, cultural, and health systems determinants of mortality and to advocate for the resources needed to overcome these problems. New neonatal and child VASA studies are being planned at the country or subnational level in several African countries. Further development of the social autopsy method and development and wide-scale adoption of standardized tools based on the Pathway to Survival and Three Delays models will promote social autopsy’s overall objectives of providing evidence on failures in the pathway to survival and increasing awareness to empower communities and engage health programs in the battle against child and maternal mortality.

Additional material

Additional file 1: Studies and reports meeting the inclusion criteria of the comprehensive review (with detailed findings). CS: care seeking; SA: social autopsy; HH: household; ARI: acute respiratory infection; MOH: ministry of health; PHC: primary health care; AD: acute diarrhea; VA: verbal autopsy; DSS: demographic and surveillance site; PtoS: Pathway to Survival; SB: stillbirth; DHS: Demographic and Health Survey; CHW: community health worker; VHC: village health committee; ANC: antenatal care; IMCI: Integrated Management of Childhood Illness; MOHP: ministry of health and planning; GOWB: Government of West Bengal; NGO: non-governmental organization; PRI: panchayat raj institutions; Δ and □: studies ranked, respectively, as providing “any” and “strong” support to health programs and communities.

Acknowledgements and funding

The authors thank Save the Children for permission to present the data in Figure 2, which were collected by Save the Children in Guinea, and Eric Swedberg of Save the Children for his leadership in conducting the Pathway Analysis study at the Guinea project site. We also acknowledge the contributions of dozens of NGO project staff and health workers in Bolivia, Guinea, and India to the social autopsy studies that we participated in and led in those countries; and of the many Government of India MOHFW officials and health workers and UNICEF health officers and fieldworkers who helped bring the MAPEDIR program to life in order to overcome death. Most of all, we acknowledge the contributions of the thousands of family members of deceased children and women who gave of their time and energy to participate in the verbal/social autopsy inquiries of the deaths of their loved one, to increase our understanding so that we might increase theirs. Support for the Bolivia and Guinea VASA studies as well as for development of the Pathway Analysis format was provided by the United States Agency for International Development through The BASICS Project. Support for updating the Pathway Analysis questionnaire and conducting integrated VASA studies in sub-Saharan Africa was provided by the Child Health Epidemiology Reference Group through a grant from the Bill & Melinda Gates Foundation to the US Fund for UNICEF. The conceptualization, piloting and scaling up of MAPEDIR were funded by UNICEF as part of the UNICEF-assisted Country Programme in India.

Author details

1Department of International Health, Johns Hopkins Bloomberg School of Public Health, (615 North Wolfe Street), Baltimore, (21205), USA. 2President’s Malaria Initiative, United States Agency for International Development, (1300 Pennsylvania Avenue), Washington, DC, (20523), USA. 3UNICEF, Chad country office, (PO Box 1146), N’Djamena, Chad.

Authors’ contributions

HDK co-led the team that developed the Pathway Analysis questionnaire, led the teams that updated this instrument and developed the MAPEDIR questionnaire, participated in the design and conduct of the Pathway Analysis study in Guinea and the MAPEDIR program in India, drafted the manuscript, and co-conducted the comprehensive review of social autopsy studies. RS participated in the design and conduct of the Bolivia Pathway Analysis study, co-led the team that developed the Pathway Analysis questionnaire, and was a member of the team that updated the instrument. MB conceived of the MAPEDIR program in India and led UNICEF monitoring and technical assistance to the government. AK helped draft the methods section of the manuscript and co-conducted the comprehensive review of social autopsy studies. REB conceived of updating the Pathway Analysis questionnaire. All authors critically reviewed and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 1 April 2011 Accepted: 5 August 2011 Published: 5 August 2011

References

1. Why Mothers Die 2000-2002: The Sixth Report of the Confidential Enquiries into Maternal Deaths in the United Kingdom, accessed 8 March 2011.
2. Thaddeus S, Maine D: Too far to walk: maternal mortality in context. Social Science & Medicine 1994, 38:1091-1110.
3. Beyond the Numbers: reviewing maternal deaths and complications to make pregnancy safer. World Health Organization, Geneva; 2004 [http://www.who.int/making_pregnancy_safer/documents/9241591838/en/index.html], accessed 14 July 2011.
4. Mosley WH, Chen LC: An analytic framework for the study of child survival in developing countries. Popul Dev Rev 1984, 10(Suppl):25-45.
5. Claeson M, Waldman RJ: The evolution of child health programmes in developing countries: from targeting diseases to targeting people. Bull World Health Organ 2000, 78:1234-1255.
6. De Brouwere V, Tonglet R, Van Lerberghe W: Strategies for reducing maternal mortality in developing countries: what can we learn from the history of the industrialized West? Trop Med Internat Health 1998, 3:771-782.
7. Hunt P: Report of the Special Rapporteur on the right of everyone to the enjoyment of the highest attainable standard of health. United Nations General Assembly; 2010 [http://www.unhcr.org/refworld/country,,,FACTFINDING,IND,,4c0367cf2,0.html], accessed 14 July 2011 A/HRC/14/20/Add.2.
8. Aguilar AM, Alvarado R, Cordero D, Kelly P, Zamora A, Salgado R: Mortality Survey in Bolivia: The Final Report. Investigating and Identifying the Causes of Death for Children Under Five. The Basic Support for Institutionalizing Child Survival (BASICS) Project. Arlington, VA; 1998 [http://pdf.usaid.gov/pdf_docs/PNACF082.pdf], accessed 14 July 2011.
9. Schumacher R, Swedberg E, Diallo MO, Keita DR, Kalter HD, Pasha O: Mortality study in Guinea: Investigating the causes of death for children under 5. The Basic Support for Institutionalizing Child Survival Project (BASICS II). Arlington, VA; 2002 [http://www.basics.org/documents/pdf/guinea_mort.pdf], accessed 14 July 2011.
10. Maternal and Perinatal Death Inquiry and Response: Empowering communities to avert maternal deaths in India. UNICEF, New Delhi; 2008 [http://www.unicef.org/india/MAPEDIRMaternal_and_Perinatal_Death_Inquiry_and_Response-India.pdf], accessed 8 March 2011.
11. Bryce J, Victora CG, Habicht JP, Black RE, Scherpbier RW, the MCE-IMCI technical advisors: Programmatic pathways to child survival: results of a multi-country evaluation of Integrated Management of Childhood Illness. Health Policy Plan 2005, 20(suppl 1):15-17.
12. Waldman R, Campbell CC, Steketee RW: Overcoming remaining barriers: the pathway to survival. Current issues in child survival series, The Basic Support for Institutionalizing Child Survival (BASICS) Project. Arlington, VA; 1996 [http://pdf.usaid.gov/pdf_docs/PNABZ644.pdf], accessed 14 July 2011.
13. Sustrisna B, Reingold A, Kresno S, Harrison G, Utomo B: Care-seeking for fatal illnesses in young children in Indramayu, West Java, Indonesia. Lancet 1993, 342:787-780.
14. Gutiérrez G, Reyes H, Martinez H, Tomé P, Guiscafré H: Study of the disease-health seeking-death process: another use of the verbal autopsy. International Journal of Epidemiology 1994, 23:427-428.
15. Sodemann M, Jakobsen MS, Molbak K, Alvarenga IC Jr, Aaby P: High mortality despite good care-seeking behavior: a community study of childhood deaths in Guinea-Bissau. Bull World Health Organ 1997, 75:205-212.
16. Hinderaker SG, Olsen BE, Bergsjo PB, Gasheka P, Lie RT, Havnen J, Kvale G: Avoidable stillbirths and neonatal deaths in rural Tanzania. BJOG 2003, 110:616-623.
17. de Savigny D, Mayombana C, Mwageni E, Masanja H, Minhaj A, Mkilindi Y, Mbuya C, Kasale H, Reid G: Care-seeking patterns for fatal malaria in Tanzania. Malaria Journal 2004, 3:27.
18. Beiersmann C, Sanou A, Wladarsch E, De Allegri M, Kouyaté B, Müller O: Malaria in rural Burkina Faso: local illness concepts, patterns, traditional treatment and influence on health-seeking behaviour. Malaria Journal 2007, 6:106.
19. de Bocaletti E, Schumacher R, Hurtado E, Bailey P, Matute J, McDermott J, Moore J, Kalter HD, Salgado R: Perinatal mortality in Guatemala: Community study. MotherCare; 1999 [http://pdf.usaid.gov/pdf_docs/PNACJ798.pdf], accessed 14 July 2011.
20. Terra de Souza AC, Peterson KE, Andrade FMO, Gardner J, Ascherio A: Circumstances of post-neonatal deaths in Ceara, Northeast Brazil; mother’s health care-seeking behaviors during their infant’s fatal illness. Social Science & Medicine 2000, 51:1675-1693.
21. The Pathway to child health (Siem Reap, Pursat, Stung Treng, and Kampot). The Reproductive and Child Health Alliance (RACHA); 2000 [http://rc.racha.org.kh/docDetails.asp?resourceID=32&categoryID=3], accessed 14 July 2011.
22. Bandari N, Bahl R, Taneja S, Martines J, Bhan M: Pathways to infant mortality in urban slums of Delhi, India: implications for improving the quality of community- and hospital-based programmes. J Health Popul Nutr 2002, 20:148-155.
23. Bojalil R, Kirkwood BR, Bobak M, Guiscafre H: The relative contribution of case management and inadequate care-seeking behaviour to childhood deaths from diarrhea and acute respiratory infections in Hidalgo, Mexico. Trop Med and International Health 2007, 12:1545-1552.
24. Waiswa P, Källander K, Peterson S, Tomson G, Pariyo GW: Using the three delays model to understand why newborn babies die in eastern Uganda. Trop Medicine and International Health 2010, 15:964-972.
25. Kalter HD, Salgado R, Gittelsohn J, Parades P: A Guide to Conducting Mortality Surveys and Surveillance. The Basic Support for Institutionalizing Child Survival Project (BASICS II). Arlington, VA; 2004 [http://www.jsi.com/JSIInternet/Resources/Publications/childsurvival.cfm], accessed 22 June 2011.
26. Child Health Epidemiology Reference Group: [http://cherg.org/main.html], accessed 8 March 2011.
27. Lawn JE, Cousens S, Zupan J, for the Lancet Neonatal Survival Steering Team: 4 million neonatal deaths: When? Where? Why? Lancet 2005, 365:891-900.
28. Boschi-Pinto C, Young M, Black RE: The child health epidemiology reference group reviews of the effectiveness of interventions to reduce maternal, neonatal and child mortality. International J Epidemiol 2010, 39:i3-i6.
29. Fisher AA, Way AA: The demographic and health surveys program: An overview. International Family Planning Perspectives 1988, 14:15-19.
30. Perera MALR: High maternal mortality and morbidity: the shame of the South-East Asia region. Regional Health Forum WHO South-East Asia Region 2006, 6(2) [http://www.searo.who.int/en/Section1243/Section1310/Section1343/Section1344/Section1356_5336.htm], accessed 21 March 2011.
31. Suleiman AB, Mathews A, Jegasothy R, Ali R, Kandiah N: A strategy for reducing maternal mortality. Bull World Health Organ 1999, 77:190-3.
32. Saving Mothers 2005-2007: Fourth Report on Confidential Enquiries into Maternal Deaths in South Africa; Expanded Executive Summary. National Committee on Confidential Enquiries into Maternal Deaths; 2008 [http://www.doh.gov.za/docs/reports/2007/savingmothers.pdf], accessed 21 March 2011.
33. iGovernment: India announces audit of maternal deaths. [http://igovernment.in/site/india-announces-audit-maternal-deaths-38295], accessed 8 March 2011.
34. Bartlett LA, Mawji S, Whitehead S, Crouse C, Dalil S, Lonete D, Salama P, the Afghan Maternal Mortality Study Team: Where giving birth is a forecast of death: maternal mortality in four districts of Afghanistan, 1999-2002. Lancet 2005, 365:864-870.
35. Campbell O, Gipson R, Issa AH, Matta N, El Deeb B, El Mohandes A, Alwen A, Mansour E: National maternal mortality ratio in Egypt halved between 1992-93 and 2000. Bull World Health Organ 2005, 83:462-471.
36. Castro R, Campero L, Hernández B, Langer A: A study on maternal mortality in Mexico through a qualitative approach. J of Women’s Health & Gender-Based Medicine 2000, 9:679-690.
37. Jafarey SN, Rizvi T, Koblinsky M, Kureshy N: Verbal autopsy of maternal deaths in two districts of Pakistan–filling information gaps. J Health Popul Nutri 2009, 27:170-183.
38. Supratikto G, Wirth ME, Achadi E, Cohen S, Ronsmans C: A district-based audit of the causes and circumstances of maternal deaths in South Kalimantan, Indonesia. Bull World Health Organ 2002, 80:228-234.
39. Fawcus S, Mbizvo M, Lindmark G, Nystrőm L: A community-based investigation of avoidable factors for maternal mortality in Zimbabwe. Studies in Family Planning 1996, 27:319-327.
40. D’Ambruoso L, Byass P, Qomariyah N, Quedraogo M: A lost cause? Extending verbal autopsy to investigate biomedical and socio-cultural causes of maternal death in Burkina Faso and Indonesia. Social Science & Medicine 2010, 71:1728-1738.
41. Manandhar DS, Osrin D, Shrestha BP, Mesko N, Morrison J, Tumbahangphe KM, Tamang S, Thapa S, Shrestha D, Thapa B, Shrestha JR, Wade A, Borghi J, Standing H, Manandhar M, Costello AM de L, members of the MIRA Makwanpur trial team: Effect of a participatory intervention with women’s groups on birth outcomes in Nepal: cluster-randomized controlled trial. Lancet 2004, 364:970-79.
42. Patel Z, Kumar V, Singh P, Singh V, Yadav R, Baqui AH, Santosham M, Awasthi S, Singh JV, Darmstadt GL: Feasibility of community neonatal death audits in rural Uttar Pradesh, India. Journal of Perinatology 2007, 27:556-64.
43. Campbell O, Ronsmans C: Verbal autopsies for maternal deaths, report of a WHO workshop, London 10-13 January 1994. World Health Organization, Geneva; (WHO/FHE/MSM/95.15); 1995.

doi:10.1186/1478-7954-9-45
Cite this article as: Kalter et al.: Social autopsy for maternal and child deaths: a comprehensive literature review to examine the concept and the development of the method. Population Health Metrics 2011 9:45.



Byass et al. Population Health Metrics 2011, 9:46 http://www.pophealthmetrics.com/content/9/1/46

RESEARCH

Open Access

Using verbal autopsy to track epidemic dynamics: the case of HIV-related mortality in South Africa

Peter Byass1,2*, Kathleen Kahn1,2, Edward Fottrell2,3, Paul Mee1, Mark A Collinson1,2 and Stephen M Tollman1,2

Abstract

Background: Verbal autopsy (VA) has often been used for point estimates of cause-specific mortality, but seldom to characterize long-term changes in epidemic patterns. Monitoring emerging causes of death involves practitioners’ developing perceptions of diseases and demands consistent methods and practices. Here we retrospectively analyze HIV-related mortality in South Africa, using physician and modeled interpretation.

Methods: Between 1992 and 2005, 94% of 6,153 deaths which occurred in the Agincourt subdistrict had VAs completed, and coded by two physicians and the InterVA model. The physician causes of death were consolidated into a single consensus underlying cause per case, with an additional physician arbitrating where different diagnoses persisted. HIV-related mortality rates and proportions of deaths coded as HIV-related by individual physicians, physician consensus, and the InterVA model were compared over time.

Results: Approximately 20% of deaths were HIV-related, ranging from early low levels to tenfold-higher later population rates (2.5 per 1,000 person-years). Rates were higher among children under 5 years and adults 20 to 64 years. Adult mortality shifted to older ages as the epidemic progressed, with a noticeable number of HIV-related deaths in the over-65 year age group latterly. Early InterVA results suggested slightly higher initial HIV-related mortality than physician consensus found. Overall, physician consensus and InterVA results characterized the epidemic very similarly. Individual physicians showed marked interobserver variation, with consensus findings generally reflecting slightly lower proportions of HIV-related deaths. Aggregated findings for first versus second physician did not differ appreciably.

Conclusions: VA effectively detected a very significant epidemic of HIV-related mortality. Using either physicians or InterVA gave closely comparable findings regarding the epidemic. The consistency between two physician coders per case (from a pool of 14) suggests that double coding may be unnecessary, although the consensus rate of HIV-related mortality was approximately 8% lower than by individual physicians. Consistency within and between individual physicians, individual perceptions of epidemic dynamics, and the inherent consistency of models are important considerations here. The ability of the InterVA model to track a more than tenfold increase in HIV-related mortality over time suggests that finely tuned “local” versions of models for VA interpretation are not necessary.

* Correspondence: peter.byass@epiph.umu.se
1 MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
Full list of author information is available at the end of the article

© 2011 Byass et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Verbal autopsy (VA) has become a widely established approach for characterizing cause of death patterns in settings where individual deaths are not routinely certified as to cause, with a variety of methods being used for both interview and interpretation phases [1]. Most often, VA has been applied for particular times, or over relatively short periods, to obtain point estimates of cause-specific mortality. However, as archives of VA data accumulate over time, possibilities of studying epidemic dynamics using VA approaches emerge. This is of interest in terms of measuring potential newly emerging causes of death [2], as well as for monitoring the dynamics of epidemiological transition [3]. But it also raises new methodological challenges, for example around consistent interpretation of VA into causes of death over long periods of time and consequently around practitioners’ developing perceptions of new situations. More generally, it raises the question of how effectively VA methods are able to detect newly emerging causes of death.

Over the past two decades, southern Africa has experienced a massive and rapidly developing epidemic of HIV infection and associated mortality [4-6]. However, large-scale modeled estimates provide a rather imperfect picture of the epidemic, given that most deaths in southern Africa are neither certified nor medically investigated [7]. Localized populations with intensive surveillance, such as member centers of the INDEPTH Network [8], provide opportunities to look at specific examples in detail [9-11], even if this may generate a subsequent debate as to generalizability. A number of studies elsewhere have established the validity of VA methods for attributing deaths to HIV/AIDS, particularly among adults [12-16]. Nevertheless, there remain some unresolved issues about how to best handle co-causes of mortality in cases of HIV-related death, and willingness to attribute deaths to HIV, whatever methods are used, may be influenced by nonmedical factors such as social stigmatization [17,18].

HIV-related deaths are complex to count, since HIV-positive individuals are frequently affected by other diseases as a result of being immunologically compromised, and it can be difficult from VA data, in the absence of HIV serology, to determine the relative significance of AIDS versus other diseases in the processes leading to death. The 10th version of the International Classification of Diseases (ICD-10) uses codes B20 to B24 as underlying causes representing HIV/AIDS in combination with other disease categories (B20 infectious and parasitic diseases, B21 malignant neoplasms, B22 other diseases including wasting, B23 other conditions, and B24 nonspecific AIDS) [19]. However, differentiating probable HIV-related deaths detected by VA into these subcategories may not be easy to achieve, particularly where there is no explicit evidence of HIV positivity.

The ability to interpret any VA interview reliably depends on several factors, including the quality and detail of information on signs and symptoms provided by the informant. In settings where stigma is high around a particular cause of death - as is often the case for HIV - sensitive information may be withheld from the interviewer. Extent of nondisclosure is likely to vary as an epidemic develops, starting from minimal levels when key symptoms are not yet widely known by informants, and when physicians may also not yet be attuned to a particular diagnosis. As a significant epidemic such as HIV/AIDS develops, stigma is likely to rise, together with nondisclosure of relevant details. In a mature epidemic - particularly in the case of HIV as antiretroviral treatments are rolled out - nondisclosure may wane. These patterns may have significant effects on the outcomes of VA interpretation.

The Agincourt Health and Socio-Demographic Surveillance Site in the rural northeast of South Africa has been documenting a geographically-defined population (around 70,000 people in 2005) since 1992, including registering deaths and following those up with VA interviews [20]. The start of this surveillance in 1992 coincided with the early stages of the HIV epidemic (at least in terms of HIV-related mortality) in this area, and hence the accumulated VA data enable a methodological exploration as to how the epidemic evolved. Our primary aim is to characterize the epidemic of HIV-related mortality in this population, comparing both physician-interpreted causes of death and probabilistically modeled causes of death from the same VA interview material. As subsidiary aims, we investigate (1) approaches for handling common co-causes of HIV-related mortality, such as tuberculosis, malnutrition, and chronic gastroenteritis, and (2) variations between different coding physicians’ responses to the emerging epidemic. Although this paper deals specifically with an epidemic of HIV-related mortality, findings are discussed in terms of using VA for monitoring long-term dynamics in mortality patterns.

Methods

The analyses in this paper are based on the entire series of 6,153 deaths (among all ages) in the Agincourt population from 1992 to 2005, as previously described in terms of primary-care planning [21] and in a comparison between physician and modeled VA interpretation [22]. VA interviews were successfully completed for 5,794 deaths (94.2%), using a questionnaire developed before international standards were agreed upon. These VA interviews were subsequently coded by two independent physicians who attempted to reach consensus where their diagnoses differed, with a third reviewing and intervening in case of disagreement. If no consensus could be reached, the cause of death was recorded as “undetermined.” During the period from 1992 to 2005, 14 physician reviewers were involved in VA interpretation during various subperiods. In 373 (6.4%) of VA reviews, it was not possible to trace the identities of the coding physicians.

The InterVA model (http://www.interva.net) was also applied to the VA interview material, as described previously [22]. This public-domain model relates input indicators (history, signs, symptoms from VA interview material) to likely cause(s) of death using Bayesian probabilities. A standard grid of conditional prior probabilities was defined by an expert panel of physicians [23]. The model has subsequently been evaluated in a number of settings [22,24]. As a standard model designed for cause of death determination in low- and middle-income countries, it has the advantage of consistency over time and place [25].
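The general idea of relating reported indicators to likely causes through Bayesian probabilities can be sketched as follows. This is not the InterVA algorithm or its probability grid: the cause list, the prior probabilities, the conditional probabilities, and the function name `update_cause_probabilities` are all hypothetical values chosen for illustration, and the sketch assumes indicators are treated as conditionally independent given the cause.

```python
def update_cause_probabilities(priors, conditionals, reported_indicators):
    """Apply Bayes' rule once per reported indicator, renormalizing each time.

    priors: {cause: P(cause)}
    conditionals: {indicator: {cause: P(indicator | cause)}}
    """
    posterior = dict(priors)
    for indicator in reported_indicators:
        for cause in posterior:
            posterior[cause] *= conditionals[indicator][cause]
        total = sum(posterior.values())
        posterior = {cause: p / total for cause, p in posterior.items()}
    return posterior


# Hypothetical numbers for illustration only; they are not taken from
# the InterVA grid or from the Agincourt data.
priors = {"HIV-related": 0.2, "tuberculosis": 0.1, "other": 0.7}
conditionals = {
    "weight_loss": {"HIV-related": 0.8, "tuberculosis": 0.7, "other": 0.1},
    "chronic_diarrhoea": {"HIV-related": 0.6, "tuberculosis": 0.1, "other": 0.05},
}

posterior = update_cause_probabilities(
    priors, conditionals, ["weight_loss", "chronic_diarrhoea"]
)
for cause, probability in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{cause:15s} {probability:.3f}")
```

Because the grid of conditional probabilities is fixed in advance, the same inputs always yield the same output, which is the property of consistency over time and place that the text attributes to a standard model.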


“low” HIV, the number of cases most likely due to HIVrelated causes decreased from 57 to eight out of a total of 707 deaths (8.1% to 1.1%). Physician consensus findings for the same period recorded 13 cases (1.8%), although a total of 20 cases (2.8%) were HIV-related according to at least one physician. However, among the 51 cases rated as HIV-related by the model ("high” setting) but not by physician consensus for this period, the most common underlying cause attributed by physicians was malnutrition (nine cases, 17.6%). By contrast, overall physician consensus results for 1992-1994 recorded 4.2% for malnutrition, compared with 1.3% for 1995-2005.

A dataset was compiled (using Microsoft FoxPro) containing the two independent physician interpretations (main cause, possible immediate and contributing causes with ICD-10 codes), the physicians’ consensus finding as to underlying cause (based primarily on the individual physicians’ main cause findings), and the InterVA version 3.2 results (up to three likely causes per case, each associated with a quantified likelihood). The HIV level for the InterVA model was set to “high” and malaria set to “low,” based on existing knowledge of causes of death in this population, as discussed previously [22]. The concept behind this setting in the InterVA model is analogous to a coding physician knowing that HIV or malaria represent more-common or less-common public health problems in a particular population, irrespective of the details around any individual death or detailed prior knowledge of cause-specific mortality. Age groups were defined as under 1 year, 1 to 4 years, 5 to 19 years, 20 to 49 years, 50 to 64 years, and 65 years and over. Analyses used Stata 10. Surveillance-based studies in the Agincourt subdistrict were reviewed and approved by the Committee for Research on Human Subjects (Medical) of the University of the Witwatersrand, Johannesburg, South Africa (protocol M960720). Informed consent was obtained at the individual and household levels at every follow-up visit, whereas community consent from civic and traditional leadership was secured at the start of surveillance and reaffirmed from time to time. Feedback on cause of death patterns is presented to local communities and health service providers annually.

Effects of different approaches for estimating HIV-related mortality

In addition to the physician consensus material on underlying causes of death that were identified as HIVrelated, an additional 18 cases involved HIV as the physician consensus contributory cause. From this revised total of 1,154 HIV-related deaths, 693 (60.0%) were concluded in the physician consensus to have an infection (ICD B20), out of which 148 (12.7%) were specifically mentioned as tuberculosis. Ten cases (0.9%) had malignancies (B21), and 99 (8.6%) had chronic gastroenteritis or malnutrition (B22). Using the alternative approach of the InterVA model, a total of 1,237 cases were rated as probably HIVrelated, although in 91 of these HIV was not the most likely cause. Of the 1,237 cases, 156 (12.6%) were also identified as being associated with tuberculosis and 10 (0.8%) with other infections (B20), three (0.2%) with malignancies (B21), and 18 (1.5%) with chronic gastroenteritis or malnutrition (B22).

Results

The evolving epidemic of HIV-related mortality

Figure 1 shows the evolution of HIV-related mortality, both overall and by age group, in the Agincourt population, calculated as the rates (per 1,000 person-years) of physician consensus underlying cause being coded as ICD-10 B20-B24 (1,136 deaths, 18.4%), or the rates of most likely cause from InterVA being HIV/AIDS-related death (1,146 deaths, 18.6%). Both approaches showed very similar patterns over time and within age groups, with a huge increase from no HIV-related deaths in 1992 to 2.5 per 1,000 person-years in 2005 according to physician coding, and correspondingly from 0.2 to 2.6 per 1,000 person-years according to InterVA. Table 1 shows numbers of deaths according to physicians and InterVA, by age, sex, and period. Only 63/6,153 (1.0%) of the overall VA records explicitly mentioned HIV positivity in the interview material, so the overwhelming majority of conclusions on HIV-related deaths both by the physicians and the model reflected circumstantial findings. When data for the period from 1992 to 1994 were rerun with InterVA set to

Interphysician variations in attributing HIV-related mortality

Of the 14 physicians coding this series of VAs, two completed very few (two and 16 cases respectively) and have been excluded from further consideration of interphysician variation. Of the 12 remaining, there were between two and five physicians coding VAs in any one year. No individual carried out work over the entire period. Figure 2 shows the overall proportions of physician consensus and InterVA HIV-related deaths by year, together with the proportions rated by the various physicians. In addition, the “low” HIV InterVA results for 1992-1994 are shown. Table 2 shows the proportions of HIV-related deaths as coded by first and second physician coders (irrespective of individual physician identity) compared with the revised physician consensus proportions, by year. The overall proportion of HIV-related mortality after achieving consensus was around 8% lower than single physician opinions (19.9% compared with 21.6%, ratio 0.92).
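The consensus-to-single-physician ratio quoted above can be checked directly from the "overall" row of Table 2. The short Python sketch below is only an arithmetic illustration; the variable and function names are ours and are not part of the original analysis, which was carried out in Stata.

```python
# Overall counts copied from the "overall" row of Table 2.
total_deaths = 5794
first_physician_hiv = 1251
second_physician_hiv = 1254
consensus_hiv = 1154


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


first_pct = pct(first_physician_hiv, total_deaths)
second_pct = pct(second_physician_hiv, total_deaths)
consensus_pct = pct(consensus_hiv, total_deaths)

# Ratio of the consensus proportion to the single-physician proportion.
ratio = round(consensus_pct / first_pct, 2)

print(first_pct, second_pct, consensus_pct, ratio)
```

Both single-physician proportions round to the same value, so either can serve as the denominator of the ratio.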



Byass et al. Population Health Metrics 2011, 9:46 http://www.pophealthmetrics.com/content/9/1/46


Figure 1 HIV-specific mortality rates by age group by (a) physician consensus interpretation of VA data and (b) InterVA interpretation (HIV-related death as most likely cause). [Both panels plot rate per 1,000 person-years (0 to 12) against year (1992 to 2005), with separate series for under 1 y, 1-4 y, 5-19 y, 20-49 y, 50-64 y, 65 y & over, and overall.]

Discussion

It is clear that the progression of the epidemic of HIV-related mortality in this rural South African community, with population-based rates increasing more than tenfold over a 14-year period, was successfully detected and tracked by means of VA, in the absence of any more rigorous routine procedures for following up deaths and their causes. Although one might not argue


the epidemic. Intuitively plausible trends, such as the increasing age of HIV-related deaths observed as the epidemic developed (according to both approaches), presumably following developments in care and treatment, are encouraging. The InterVA model was not specifically designed to deliver ICD-10 codes, and so the major comparison here was equivalence at the B2* level rather than at the third digit level.

As is usually the case where VA is used, there is no gold standard against which to absolutely compare these findings. Even if we knew the HIV serostatus for every death, there would still be difficulties in determining which deaths were actually attributable to HIV. However, it is very unlikely that the closely similar epidemic patterns shown for the two methods in Figure 1 would arise entirely by chance, and in that sense each lends credence to the other. But, as we have noted previously [22], the physician approach was very time-consuming and expensive compared with probabilistic modelling, and the delays and expense involved in the physician process may be hard to justify from these results.

Since the “two physicians plus arbitrator” model of physician interpretation seems to have become a de facto (but not necessarily “gold”) standard in much VA work, it is perhaps surprising that there have been few detailed analyses of individual physicians’ opinions compared with physician consensus findings in VA studies using this method, with some exceptions [29,30]. It is also important in this context to remember that concurrent findings do not necessarily constitute “truth” [31]. In the particular setting of this epidemic, where the incidence of HIV-related deaths was changing at a rate that was not necessarily clear to physicians at the time, especially in the early stages of the epidemic, it was particularly relevant to examine the ways in which individual physician interpreters responded to the changing situation, as well as the effect on consensus findings.
It is also noteworthy that a relatively large number of individual physicians were involved in the process over the 14-year period; it would be surprising if this were not the case in most longer-term VA operations. It is worth noting that large studies using multiple physicians to interpret cause of death are difficult to interpret and understand if details about interobserver effects are not presented. It is also clear from the results in Figure 2 that, in general, consensus rates tended to be slightly lower than individual physician rates, particularly in the later years. This could have important implications in considering whether to use only a single coding physician per case, as has previously been suggested [32]. While there was generally good consistency between first and second physician findings (averaging over individual physicians) as shown in Table 2, the generally slightly lower rates of HIV-related mortality from the consensus process

Table 1 Characteristics of HIV-related deaths in Agincourt, South Africa, using VA data interpreted according to physician consensus on underlying cause and InterVA most likely cause

characteristic | 1992-94 | 1995-97 | 1998-2001 | 2002-05

physician consensus underlying cause = B20-B24 (n = 1,136)
total | 13 | 69 | 357 | 697
sex: female | 6 | 30 | 177 | 365
sex: male | 7 | 39 | 180 | 332
age group: under 1 year | 2 | 4 | 27 | 55
age group: 1 to 4 years | 1 | 12 | 52 | 71
age group: 5 to 19 years | 0 | 1 | 6 | 16
age group: 20 to 49 years | 10 | 48 | 255 | 480
age group: 50 to 64 years | 0 | 4 | 17 | 67
age group: 65 years & over | 0 | 0 | 0 | 8
mean age (SD) | 27.5 (15.7) | 30.2 (17.7) | 27.6 (16.1) | 30.9 (17.4)

InterVA most likely cause = HIV-related (n = 1,146)
total | 57 | 79 | 313 | 697
sex: female | 34 | 42 | 177 | 425
sex: male | 23 | 37 | 136 | 272
age group: under 1 year | 5 | 7 | 41 | 96
age group: 1 to 4 years | 14 | 20 | 72 | 122
age group: 5 to 19 years | 7 | 5 | 7 | 26
age group: 20 to 49 years | 26 | 41 | 168 | 386
age group: 50 to 64 years | 5 | 5 | 19 | 63
age group: 65 years & over | 0 | 1 | 6 | 4
mean age (SD) | 25.0 (20.6) | 25.7 (20.8) | 24.3 (19.8) | 26.4 (19.7)

for VA as the epidemiological method of choice for this purpose, the reality across much of the world is that there is no realistic alternative for the time being [26]. Even where deaths are supposed to be certified, there can be considerable difficulties in accurately capturing and recording deaths related to HIV/AIDS [27]. How VA material can best be interpreted into cause of death findings including HIV-related mortality is thus a very important issue, which can then form the basis of understandings of population health, for example patterns of social disparities [28]. The validity, reliability, and consistency with which VA data can be interpreted, particularly in terms of HIV-related mortality, are important issues. Both the physician-based and modeled approaches presented here yielded very similar results in terms of characterizing


Table 2 HIV-related deaths (numbers and proportions) according to first and second physician coders and physician consensus, by year

HIV-related deaths (ICD codes B20-B24), n (%)

year | deaths | first physician | second physician | physician consensus
1992 | 152 | 0 (0) | 0 (0) | 0 (0)
1993 | 260 | 3 (1.2) | 1 (0.4) | 1 (0.4)
1994 | 254 | 15 (5.9) | 12 (4.7) | 12 (4.7)
1995 | 300 | 17 (5.7) | 13 (4.3) | 14 (4.7)
1996 | 276 | 25 (9.1) | 24 (8.7) | 24 (8.7)
1997 | 249 | 33 (13.3) | 35 (14.1) | 31 (12.4)
1998 | 361 | 59 (16.3) | 66 (18.3) | 62 (17.2)
1999 | 381 | 58 (15.2) | 68 (17.8) | 68 (17.8)
2000 | 422 | 97 (23.0) | 96 (22.7) | 98 (23.2)
2001 | 499 | 123 (24.6) | 126 (25.3) | 131 (26.3)
2002 | 594 | 137 (23.1) | 148 (24.9) | 150 (25.3)
2003 | 701 | 182 (26.0) | 176 (25.1) | 173 (24.7)
2004 | 648 | 237 (36.6) | 236 (36.4) | 210 (32.4)
2005 | 697 | 265 (38.0) | 253 (36.3) | 180 (25.8)
overall | 5,794 | 1,251 (21.6) | 1,254 (21.6) | 1,154 (19.9)

Figure 2 Proportion of HIV-related deaths by year, according to physician consensus (heavy solid line) and opinions from 12 individual physicians who participated in coding during various periods (thin lines joining markers). The InterVA model results are represented by the heavy dashed line, with the alternative “low” HIV setting for 1992-1994 represented by the dotted line. [Plot of % (0 to 50) against year (1992 to 2005).]

would probably result in slightly higher levels of “undetermined” cause of death in an all-cause analysis than might have resulted from using only a single physician coder. Around the inception of this HIV-related mortality epidemic, the relationship between individual physicians, consensus results and the “low” and “high” HIV settings for the InterVA model are particularly interesting. The proportional differences in rates among the various approaches were greatest during the first three years, as is clear from Figure 2. Initial work on the InterVA model suggested that only causes likely to vary by an order of magnitude in terms of overall proportion needed to have an adjustment [23], with the crossover between “low” and “high” being at around 1% of total mortality. The “high” setting was therefore the appropriate one overall here. The analogous “setting” in physician coding is represented by a physician’s awareness of how common HIV-related mortality is in a population, irrespective of the detailed circumstances of a particular case. Physician consensus rates gave the lowest measure of HIV in the early years, and it seems that in the uncertain early stages of the epidemic it was particularly difficult to achieve consensus, even though some deaths were considered as HIV-related by one physician. This supposition is indirectly supported by finding that the physicians’ highest rates of malnutrition-related mortality were recorded during that period, probably representing a misclassification of deaths that were at least partly HIV-related. Thus the reality here is that the HIV-related mortality rates between 1992 and 1994 were probably somewhere in between the various

estimates shown in Figure 2. Conversely, individual physicians recorded appreciably more HIV-related mortality in the later years, compared with both the consensus and modeled findings, possibly reflecting physicians’ inflated views of HIV latterly. Additionally, nondisclosure of sensitive details in VA interviews at various stages of the epidemic may have compromised both the physicians’ and model’s findings. In the case of the model, it is important to note that the HIV rates over the period increased tenfold without any information being given to the model about a likely increase over time. This illustrates the relatively noncritical magnitudes of the cause-specific prior probabilities incorporated in the model, and supports the notion that a single model can be used for interpreting VA data over wide ranges of time and place, maximizing the benefits of consistency for comparative purposes over different settings.
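The role of the “low” and “high” prior settings can be illustrated with a toy application of Bayes’ theorem. The sketch below is not the InterVA model: the three causes, the indicator likelihoods and the prior values are all hypothetical numbers chosen for illustration, and the function name `posterior` is ours. It only shows how a change in the prior for one cause shifts the posterior for a single case.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive causes.

    The posterior for each cause is proportional to prior * likelihood,
    normalised so that the posteriors sum to 1.
    """
    unnormalised = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    total = sum(unnormalised.values())
    return {cause: value / total for cause, value in unnormalised.items()}


# Hypothetical probabilities of observing one reported symptom pattern
# (for example, prolonged weight loss) given each cause.
likelihoods = {"HIV-related": 0.6, "malnutrition": 0.5, "other": 0.05}

# Hypothetical priors standing in for a "low" and a "high" HIV setting.
low_prior = {"HIV-related": 0.001, "malnutrition": 0.02, "other": 0.979}
high_prior = {"HIV-related": 0.10, "malnutrition": 0.02, "other": 0.88}

low_post = posterior(low_prior, likelihoods)
high_post = posterior(high_prior, likelihoods)

for cause in likelihoods:
    print(f"{cause}: low setting {low_post[cause]:.3f}, high setting {high_post[cause]:.3f}")
```

In this toy case the prior matters a great deal because only one weakly discriminating indicator is used; with many indicators per death the likelihood terms would be expected to dominate, which is consistent with the robustness argument made in the text.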


Conclusions

VA was clearly able to identify the emergence and growth of a very significant epidemic of HIV-related mortality in this population, and using either physicians or probabilistic modeling to derive cause of death findings gave closely similar results. The evidence suggests that physicians were perhaps a little slow to recognize the early stages of the epidemic, while the model (at least when set to expect a “high” level of HIV mortality) may have slightly overestimated initially. However, the fact that a numerically constant model was able to characterize a greater-than-tenfold increase in HIV-related mortality over time is an important demonstration of the relative robustness of probabilistic modeling for VA interpretation. This suggests that there is no need for finely tuned “local” versions of models for VA interpretation, the proliferation of which would detract from the comparability of results over time and place.

Acknowledgements

We value the contribution of verbal autopsy informants, community leaders, study communities, supervisors, and field workers. The Umeå Centre for Global Health Research is supported from FAS, the Swedish Council for Working Life and Social Research (http://www.fas.se) (grant no. 2006-1512). The Agincourt Health and Socio-Demographic Surveillance System, including conduct of verbal autopsies, is funded by The Wellcome Trust, UK (http://www.wellcome.ac.uk) (grant numbers 058893/Z/99/A, 069683/Z/02/Z and 085477/Z/08/Z) and the University of the Witwatersrand and Medical Research Council, South Africa. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author details

1 MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
2 Umeå Centre for Global Health Research, Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden.
3 Centre for International Health and Development, Institute of Child Health, University College London, London, UK.

Authors’ contributions

PB conceived the study, analyzed data, and drafted the manuscript. All authors contributed to critical reviews and developments in the paper. KK established and managed the VA system, and KK and SMT were involved in physician interpretation and arbitration of VA material. EF, PM, and MAC managed data and quality control. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.


Received: 9 February 2011 Accepted: 5 August 2011 Published: 5 August 2011


References

1. Fottrell E, Byass P: Verbal Autopsy - methods in transition. Epidemiologic Reviews 2010, 32:38-55.
2. Nsubuga P, Nwanyanwu O, Nkengasong JN, Mukanga D, Trostle M: Strengthening public health surveillance and response using the health systems strengthening agenda in developing countries. BMC Public Health 2010, 10(Suppl 1):S5.
3. Karar ZA, Alam N, Streatfield PK: Epidemiological transition in rural Bangladesh, 1986-2006. Global Health Action 2009, 2.
4. Joint United Nations Program on HIV/AIDS: Global report: UNAIDS report on the global AIDS epidemic 2010 Geneva: UNAIDS; 2010 [http://www.unaids.org/en/media/unaids/contentassets/documents/unaidspublication/2010/20101123_globalreport_en.pdf], ISBN 978-92-9173-871-7.
5. Dorrington RE, Johnson LF, Bradshaw D, Daniel T: The Demographic Impact of HIV/AIDS in South Africa. National and Provincial Indicators for 2006 Cape Town: Centre for Actuarial Research, South African Medical Research Council and Actuarial Society of South Africa; 2006 [http://www.mrc.ac.za/bod/DemographicImpactHIVIndicators.pdf].
6. Bradshaw D, Nannan N, Groenewald P, Joubert J, Laubscher R, Nojilana B, Norman R, Pieterse D, Schneider M: Provincial mortality in South Africa, 2000 - priority-setting for now and a benchmark for the future. South African Medical Journal 2005, 95:496-503.
7. Byass P: The Imperfect World of Global Health Estimates. PLoS Medicine 2010, 7:e1001006.
8. Bangha M, Diagne A, Bawah A, Sankoh O: Monitoring the Millennium Development Goals: the potential role of the INDEPTH Network. Global Health Action 2010, 3:5517.
9. Kanjala C, Alberts M, Byass P, Burger S: Spatial and temporal clustering of mortality in Digkale HDSS in rural northern South Africa. Global Health Action 2010, Supp 1.
10. Hosegood V, Vanneste A-M, Timæus IM: Levels and causes of adult mortality in rural South Africa: the impact of AIDS. AIDS 2004, 18:663-671.
11. Garrib A, Jaffar S, Knight S, Bradshaw D, Bennish ML: Rates and causes of child mortality in an area of high HIV prevalence in rural South Africa. Tropical Medicine and International Health 2006, 11:1841-1848.
12. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: their development and validation in a multicentre study. Tropical Medicine and International Health 1998, 3:436-446.
13. Doctor HV, Weinrebb AA: Estimation of AIDS adult mortality by verbal autopsy in rural Malawi. AIDS 2003, 17:2509-2513.
14. Lopman B, Cook A, Smith J, Chawira G, Urassa M, Kumogola Y, Isingo R, Ihekweazu C, Ruwende J, Ndege M, Gregson S, Zaba B, Boerma T: Verbal autopsy can consistently measure AIDS mortality: a validation study in Tanzania and Zimbabwe. Journal of Epidemiology and Community Health 2010, 64:330-334.
15. Quigley MA, Chandramohan D, Rodrigues LC: Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. International Journal of Epidemiology 1999, 28:1081-1087.
16. Tensou B, Araya T, Telake DS, Byass P, Berhane Y, Kebebew T, Sanders EJ, Reniers G: Evaluating the InterVA model for determining AIDS mortality from verbal autopsies in the adult population of Addis Ababa. Tropical Medicine and International Health 2010, 15:547-553.
17. Groenewald P, Nannan N, Bourne D, Laubscher R, Bradshaw D: Identifying deaths from AIDS in South Africa. AIDS 2005, 19:193-201.
18. Blacker J: The impact of AIDS on adult mortality: evidence from national and regional statistics. AIDS 2004, 18(suppl 2):S19-S26.
19. World Health Organization: ICD-10: international statistical classification of diseases and related health problems: tenth revision - 2nd edition, volume 2 Geneva: World Health Organization; 2004 [http://www.who.int/classifications/icd/ICD-10_2nd_ed_volume2.pdf], ISBN 92 4 154653 0.
20. Kahn K, Tollman SM, Collinson MA, Clark SJ, Twine R, Clark BD, Shabangu M, Gomez-Olive FX, Mokoena O, Garenne ML: Research into health, population, and social transitions in rural South Africa: Data and methods of the Agincourt Health and Demographic Surveillance System. Scandinavian Journal of Public Health 2007, 35(supplement 69):8-20.
21. Tollman SM, Kahn K, Sartorius B, Collinson MA, Clark SJ, Garenne ML: Implications of mortality transition for primary health care in rural South Africa: a population-based surveillance study. Lancet 2008, 372:893-901.
22. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from Data on Deaths to Public Health Policy in Agincourt, South Africa: Approaches to Analysing and Understanding Verbal Autopsy Findings. PLoS Medicine 2010, 7:e1000325.
23. Byass P, Fottrell E, Huong DL, Berhane Y, Corrah T, Kahn K, Muhe L, Van DD: Refining a probabilistic model for interpreting verbal autopsy data. Scandinavian Journal of Public Health 2006, 34:26-31.
24. Fantahun M, Fottrell E, Berhane Y, Wall S, Hogberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bulletin of the World Health Organization 2006, 84:204-210.
25. Fottrell E, Kahn K, Ng N, Sartorius B, Huong DL, Minh HV, Fantahun M, Byass P: Mortality measurement in transition: proof of principle for standardised multi-country comparisons. Tropical Medicine and International Health 2010, 15:1256-1265.
26. Garenne M, Fauveau V: Potential and limits of verbal autopsies. Bulletin of the World Health Organization 2006, 84:164.
27. Nojilana B, Groenewald P, Bradshaw D, Reagon G: Quality of cause of death certification at an academic hospital in Cape Town, South Africa. South African Medical Journal 2009, 99:648-652.
28. Groenewald P, Bradshaw D, Daniels J, Zinyakatira N, Matzopoulos R, Bourne D, Shaikh N, Naledi T: Local-level mortality surveillance in resource-limited settings: a case study of Cape Town highlights disparities in health. Bulletin of the World Health Organization 2010, 88:444-451.
29. Morris SK, Bassani DG, Kumar R, Awasthi S, Paul VK, Jha P: Factors Associated with Physician Agreement on Verbal Autopsy of over 27000 Childhood Deaths in India. PLoS ONE 2010, 5(3):e9583.
30. Khademi H, Etemadi A, Kamangar F, Nouraie M, Shakeri R, Abaie B, Pourshams A, Bagheri M, Hooshyar A, Islami F, Abnet CC, Pharoah P, Brennan P, Boffetta P, Dawsey SM, Malekzadeh R: Verbal Autopsy: Reliability and Validity Estimates for Causes of Death in the Golestan Cohort Study in Iran. PLoS ONE 2010, 5(6):e11183.
31. Byass P: The democratic fallacy in matters of clinical opinion: implications for analysing cause-of-death data. Emerging Themes in Epidemiology 2011, 8:1.
32. Joshi R, Lopez AD, MacMahon S, Reddy S, Dandona R, Dandona L, Neal B: Verbal autopsy coding: are multiple coders better than one? Bulletin of the World Health Organization 2009, 87:51-57.

doi:10.1186/1478-7954-9-46 Cite this article as: Byass et al.: Using verbal autopsy to track epidemic dynamics: the case of HIV-related mortality in South Africa. Population Health Metrics 2011 9:46.



Herbst et al. Population Health Metrics 2011, 9:47 http://www.pophealthmetrics.com/content/9/1/47

RESEARCH

Open Access

Verbal autopsy-based cause-specific mortality trends in rural KwaZulu-Natal, South Africa, 2000-2009

Abraham J Herbst1*, Tshepiso Mafojane1 and Marie-Louise Newell1,2

Abstract

Background: The advent of the HIV pandemic and the more recent prevention and therapeutic interventions have resulted in extensive and rapid changes in cause-specific mortality rates in sub-Saharan Africa, and there is demand for timely and accurate cause-specific mortality data to steer public health responses and to evaluate the outcome of interventions. The objective of this study is to describe cause-specific mortality trends based on verbal autopsies conducted on all deaths in a rural population in KwaZulu-Natal, South Africa, over a 10-year period (2000-2009).

Methods: The study used population-based mortality data collected by a demographic surveillance system on all resident and nonresident members of 12,000 households. Cause of death was determined by verbal autopsy based on the standard INDEPTH/WHO verbal autopsy questionnaire. Cause of death was assigned by physician review and the Bayesian-based InterVA program.

Results: There were 11,281 deaths over 784,274 person-years of observation of 125,658 individuals between Jan. 1, 2000 and Dec. 31, 2009. The cause-specific mortality fractions (CSMF) for the population as a whole were: HIV-related (including tuberculosis), 50%; other communicable diseases, 6%; noncommunicable lifestyle-related conditions, 15%; other noncommunicable diseases, 2%; maternal, perinatal, nutritional, and congenital causes, 1%; injury, 8%; indeterminate causes, 18%. Over the course of the 10 years of observation, the CSMF of HIV-related causes declined from a high of 56% in 2002 to a low of 39% in 2009 with the largest decline starting in 2004 following the introduction of an antiretroviral treatment program into the population. The all-cause age-standardized mortality rate (SMR) declined over the same period from a high of 174 (95% confidence interval [CI]: 165, 183) deaths per 10,000 person-years observed (PYO) in 2003 to a low of 116 (95% CI: 109, 123) in 2009. The decline in the SMR is predominantly due to a decline in the HIV-related SMR, which declined in the same period from 96 (95% CI: 89, 102) to 45 (95% CI: 40, 49) deaths per 10,000 PYO. There was substantial agreement (79%; kappa = 0.68 [95% CI: 0.67, 0.69]) between physician coding and InterVA coding at the burden of disease group level.

Conclusions: Verbal autopsy based methods enabled the timely measurement of changing trends in cause-specific mortality to provide policymakers with the much-needed information to allocate resources to appropriate health interventions.

* Correspondence: kherbst@africacentre.ac.za
1 Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Somkhele, South Africa
Full list of author information is available at the end of the article

© 2011 Herbst et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

The advent of the HIV pandemic and the more recent prevention and therapeutic interventions have resulted in extensive and rapid changes in cause-specific mortality rates in sub-Saharan Africa during the last two decades [1-3]. South Africa, in particular, with a severe HIV epidemic, experienced a steep rise in adult mortality during the 1990s and the early part of this decade [4,5] and is one of few countries where child mortality increased from the 1990 baseline [6]. In the past few years, evidence is emerging from South Africa [7,8] and elsewhere in sub-Saharan Africa [9-12] that HIV-related mortality is declining following the introduction of prevention and treatment programs. In South Africa, these changes have occurred against the backdrop of a steadily increasing noncommunicable disease burden [13] and high trauma-related mortality [14]. With limited reliable data as to whether HIV-related health care is flourishing to the detriment of non-HIV care, there is demand for timely and accurate cause-specific mortality data to steer public health responses and to evaluate the outcome of interventions [15].

Due to the lack of death registration systems in the majority of the world’s poorest settings, verbal autopsy-based mortality surveillance has become one of the methods of choice to obtain the much needed cause-specific mortality data [15]. In particular, health and demographic surveillance sites have a role to play in this context to provide timely and accurate data [16]. Although South Africa has a functioning death registration system, the quality of cause of death data has been questioned [17,18] and data from demographic surveillance studies using verbal autopsies have contributed to determining cause-specific mortality rates [19-21] in the country.

Verbal autopsy (VA) methods have traditionally depended on physician assessment of the verbal autopsy interview data to determine a cause of death [22], but the efficiency, reliability, and repeatability of this approach have recently been questioned [22,23]. Alternative methods to determine the cause of death on the basis of a verbal autopsy interview have been developed [22,24], such as the InterVA (http://www.interva.net) system, which applies Bayes’ theorem to derive probable causes of death from VA data [25].

Using data from a well-established longitudinal demographic surveillance, this study describes cause-specific mortality trends based on verbal autopsies conducted on all deaths in a rural population in KwaZulu-Natal, South Africa, over a 10-year period (2000-2009).

Methods

Study area and population

The Africa Centre for Health and Population Studies hosts a demographic surveillance program in the district of Umkhanyakude in the province of KwaZulu-Natal, South Africa [26]. Although it is largely rural, the demographic surveillance area (DSA), consisting of 435 square kilometers, also includes a township and periurban informal settlements. The population is characterized by high HIV prevalence [27] and incidence [28], but following the introduction of prevention of mother-to-child transmission of HIV infection (in 2001) and antiretroviral treatment and care (in 2004) programs, child [8] and adult [7] all-cause mortality have started to decline. The study population has comparatively high levels of cardiovascular risk factors [29] and high trauma-related mortality [30].

Mortality data

The approximately 12,000 households in the DSA were visited thrice annually by fieldwork teams and all deaths were notified. Upon death notification, a trained nurse conducted an interview with the closest caregiver (parent or grandparent, 26%; spouse, child, or grandchild, 25%; other relative, 20%; sibling, 14%) of the deceased on average six months after the death. The nurses recorded a narrative of the circumstances leading up to the death and completed a questionnaire based on the standard INDEPTH/WHO verbal autopsy questionnaire [31,32]. Interviews were conducted in the local language and transcribed by the interviewer, after verbal consent. There were 228 (2%) refusals and 102 (0.9%) cases where a suitable interviewee could not be identified. A total of 13 nurses acted as interviewers over the course of the study; 80% of the interviews were conducted by six of these nurses. The majority of the nurses were graduate professional nurses; the remainder had two-year nursing diplomas. All were previously employed with local health services and received training in administering the verbal autopsy questionnaire.

Two methods were used to determine cause of death for each case: physician coding and an automated method using the InterVA v3 probabilistic verbal autopsy interpretation model. In the physician-coded method, two clinicians independently assigned cause of death on the basis of the information collected during the verbal autopsy and their clinical judgement. If consensus could not be reached between the physicians, the VA interview was refused, or no suitable interviewee could be found, the cause of death was recorded as “undetermined.” A third clinician reviewed all cases and codified the causes of death using the International Classification of Diseases, 10th revision (ICD-10) [33]. The ICD-10 codes were mapped into global burden of disease groups (Table 1) using the crosswalk in Lopez [34].

A total of 27 physicians were involved over the course of the study period (2000 to 2009) in assigning


Table 1 Burden of disease groups

Cause category | Abbreviation | Burden of disease codes
HIV-related causes | HIV-related | U003, U009
Other communicable diseases | Other CD | U005, U007, U008, U010, U015, U016, U017, U018, U020, U033, U037, U039
Maternal, perinatal, nutritional, and congenital causes | MPNC | U042, U043, U044, U045, U047, U048, U050, U051, U052, U054, U131, U136, U139, U140, U142
Noncommunicable lifestyle-related conditions | Lifestyle | U079, U105, U106, U107, U108, U109, U110, U112, U113, U114, U116, U117, U119, U120, U121, U122, U123
Other noncommunicable conditions | Other NCD | U059, U060, U061, U062, U063, U064, U065, U067, U068, U069, U070, U071, U072, U073, U074, U075, U076, U077, U078, U080, U081, U082, U084, U085, U086, U087, U096, U097, U124, U125, U130
Injuries | Injuries | U148, U150, U151, U152, U153, U154, U155, U157, U158
Indeterminate causes | Indeterminate | U000, Z900, Z993, Z994, Z997, Z998, Z999

they became nonresidents for as long as they remained a member of (retained links with) the household under surveillance. Out-migrants continued to be followed as nonresident household members in 83% of emigrations over the duration of the study.

We stratified the mortality analysis by five age groups (under 5, 5-14, 15-49, 50-64, and over 65 years). The age-group boundaries were chosen to separate groups of public health importance and different patterns of mortality. There were no major changes in age group composition over the course of the study, and age-standardized mortality rates did not differ significantly on a year-to-year basis; crude mortality rates are therefore reported throughout. Under-5 mortality rates were expressed as deaths per person-years observed, rather than the customary deaths per live births, for consistent comparison with the mortality rates in the other age groups. Exact Poisson confidence intervals were calculated for all-cause mortality rates, and for cause-specific mortality rates and fractions where those were based on the single-cause physician-coded causes [35]. Confidence intervals for cause-specific mortality rates and fractions based on the InterVA-coded causes were calculated with R [36] using bootstrapping. Cause-specific mortality rates and fractions were based on the InterVA-determined causes unless otherwise stated. The kappa analysis was done using STATA v11 [37].

Ethical approval for the Africa Centre Demographic Surveillance was provided by the University of KwaZulu-Natal Bio-Medical Research Ethics Committee.
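As an illustration of the rate calculation described above, the following is a minimal sketch of a crude mortality rate per 10,000 person-years with an exact Poisson confidence interval. It is not the authors' code (the paper used STATA and R); the function names `poisson_cdf`, `exact_poisson_ci`, and `rate_per_10000` are hypothetical, and the interval is obtained by bisection on the Poisson distribution function, which is assumed here to be equivalent in spirit to the exact method cited in [35].

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with each term computed in log space."""
    if mu == 0:
        return 1.0
    return sum(
        math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1)
    )


def _bisect(f, lo, hi, tol=1e-9):
    """Root of a monotone function f on [lo, hi] whose end values differ in sign."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def exact_poisson_ci(deaths, alpha=0.05):
    """Exact two-sided confidence limits for the mean of a Poisson count."""
    bound = deaths + 10 * math.sqrt(deaths + 1) + 10  # safely above the upper limit
    # Upper limit: the mean at which observing <= deaths has probability alpha/2.
    upper = _bisect(lambda mu: poisson_cdf(deaths, mu) - alpha / 2, 0.0, bound)
    if deaths == 0:
        return 0.0, upper
    # Lower limit: the mean at which observing >= deaths has probability alpha/2.
    lower = _bisect(
        lambda mu: (1 - poisson_cdf(deaths - 1, mu)) - alpha / 2, 0.0, bound
    )
    return lower, upper


def rate_per_10000(deaths, person_years, alpha=0.05):
    """Crude rate and exact limits, all per 10,000 person-years observed."""
    lower, upper = exact_poisson_ci(deaths, alpha)
    scale = 10_000 / person_years
    return deaths * scale, lower * scale, upper * scale


# Example: under-5 age group in 2000 (185 deaths over 9,304 PYO, from Table 2).
rate, low, high = rate_per_10000(185, 9304)
print(f"{rate:.1f} per 10,000 PYO (95% CI: {low:.1f}, {high:.1f})")
```

Bisection keeps the sketch free of third-party dependencies; with a statistics library available, the same limits are usually obtained from chi-square quantiles.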

diagnoses; however, eight physicians were responsible for 79% of the recorded diagnoses. Complete cause of death coding for the physician-coded diagnoses was available for deaths between 2000 and the end of 2008.

The InterVA model is based on Bayesian calculations of the probabilities that a particular death was due to particular causes, given a set of symptoms and circumstances associated with the death [25]. The verbal autopsy questionnaire data were converted into the 106 input indicators required by the InterVA probabilistic model using an SQL script. Cause of death categories were obtained by running InterVA in batch mode on the input indicators with malaria prevalence set to “low” and HIV prevalence set to “high.” The 35 possible InterVA cause categories were mapped to the corresponding burden-of-disease codes so that the cause categories were comparable to the physician-coded diagnoses. InterVA produced up to three possible causes of death per case, with a likelihood value between 0 and 1 for each cause. Where the likelihood values for a case did not sum to 1, the difference between 1 and the sum of the likelihood values for the identified causes was allocated to the “indeterminate” cause. As recommended by the developers of InterVA, all identified causes were considered proportionate to their likelihood values in the rate calculations.

Data analysis

Deaths and person-years of observation were aggregated annually for the period from Jan. 1, 2000, to Dec. 31, 2009, for all individuals in the study population. Individuals contributed to the person-years denominator from Jan. 1, 2000, or from any later date of birth or in-migration, until Dec. 31, 2009, and they ceased to contribute to the denominator at death, termination of household membership, household out-migration, or the last surveillance visit in which household membership was confirmed. Thus, individuals who were previous homestead residents continued to be followed when

Results Mortality

There were 11,281 deaths over 784,274 person years of observation of 125,658 individuals between Jan. 1, 2000 and Dec. 31, 2009 (Table 2). All causes of death were coded using InterVA; the 10,267 deaths between Jan. 1, 2000 and Dec. 31, 2008, were physician-coded as well. All-cause age-standardized mortality (SMR) for the total


Table 2 Person-years observed, number of deaths, and sum of InterVA likelihood values per cause of death by age group and year

0-4 yr
Year | PYO | Deaths | HIV-related | MPNC | Injuries | Lifestyle | Other NCD | Other CD | Indeterminate
2000 | 9,304 | 185 | 93.7 | 9.2 | 2.4 | 0.3 | - | 40.5 | 38.9
2001 | 9,761 | 202 | 95.4 | 10.2 | 0.7 | 2.4 | - | 57.4 | 35.9
2002 | 9,969 | 216 | 87.5 | 13.7 | 5.5 | - | - | 64.8 | 44.5
2003 | 9,806 | 201 | 71.3 | 4.1 | 1.7 | 1.9 | - | 60.3 | 61.8
2004 | 9,837 | 140 | 38.1 | 4.9 | 1.9 | - | - | 30.0 | 65.1
2005 | 9,939 | 141 | 38.0 | 3.5 | 0.7 | - | - | 47.6 | 51.1
2006 | 10,127 | 118 | 35.5 | 4.2 | 1.9 | 0.6 | 0.8 | 48.6 | 25.5
2007 | 10,508 | 109 | 36.0 | 3.8 | 2.3 | 0.7 | - | 39.9 | 26.3
2008 | 10,539 | 118 | 28.6 | 4.7 | 3.0 | 0.7 | - | 53.9 | 27.1
2009 | 10,483 | 102 | 12.2 | 1.0 | 3.2 | 0.5 | - | 50.8 | 34.4

5-14 yr
Year | PYO | Deaths | HIV-related | MPNC | Injuries | Lifestyle | Other NCD | Other CD | Indeterminate
2000 | 19,642 | 22 | 3.2 | - | 6.8 | 1.7 | 1.4 | 4.7 | 4.3
2001 | 20,723 | 31 | 13.7 | - | 4.5 | 2.6 | 0.1 | 4.6 | 5.5
2002 | 21,710 | 35 | 13.0 | - | 4.0 | 1.0 | 1.6 | 8.6 | 6.9
2003 | 22,218 | 41 | 14.2 | - | 10.5 | 2.2 | 0.6 | 2.8 | 10.7
2004 | 22,416 | 49 | 25.5 | - | 10.6 | 3.0 | - | 1.6 | 8.4
2005 | 22,181 | 38 | 18.1 | - | 6.9 | 2.7 | - | 1.3 | 8.9
2006 | 22,052 | 33 | 15.7 | - | 2.8 | 1.0 | 0.7 | 5.4 | 7.4
2007 | 22,078 | 27 | 11.8 | - | 4.5 | 0.6 | - | 4.7 | 5.5
2008 | 21,697 | 36 | 8.3 | - | 9.2 | 2.0 | 1.8 | 4.1 | 9.7
2009 | 21,288 | 22 | 12.9 | - | 3.2 | 0.2 | - | 1.6 | 4.1

15-49 yr
Year | PYO | Deaths | HIV-related | MPNC | Injuries | Lifestyle | Other NCD | Other CD | Indeterminate
2000 | 28,618 | 441 | 327.4 | 4.1 | 34.9 | 32.3 | 0.3 | 5.2 | 36.8
2001 | 32,154 | 547 | 394.0 | 2.4 | 56.1 | 30.7 | 2.2 | 9.7 | 52.0
2002 | 35,475 | 625 | 485.4 | 2.6 | 44.2 | 29.5 | 3.2 | 6.6 | 53.5
2003 | 37,350 | 701 | 523.1 | 5.8 | 55.2 | 28.1 | 7.2 | 10.2 | 71.5
2004 | 39,059 | 694 | 480.6 | 3.1 | 81.2 | 26.3 | 5.1 | 4.3 | 93.3
2005 | 40,277 | 647 | 438.2 | 5.2 | 60.2 | 29.9 | 7.2 | 6.4 | 100.1
2006 | 41,826 | 580 | 375.6 | 2.1 | 73.2 | 26.2 | 3.3 | 7.4 | 92.3
2007 | 43,399 | 624 | 413.2 | 4.7 | 62.3 | 28.5 | 3.4 | 6.9 | 105.1
2008 | 44,235 | 509 | 294.6 | 1.8 | 85.4 | 23.7 | 6.3 | 4.9 | 92.3
2009 | 45,236 | 495 | 274.2 | 4.7 | 73.3 | 26.0 | 6.3 | 6.1 | 104.6

50-64 yr
Year | PYO | Deaths | HIV-related | MPNC | Injuries | Lifestyle | Other NCD | Other CD | Indeterminate
2000 | 4,325 | 124 | 49.3 | - | 10.8 | 36.4 | 4.4 | 3.5 | 19.6
2001 | 4,433 | 153 | 62.6 | - | 14.2 | 49.2 | 4.3 | 3.9 | 18.9
2002 | 4,553 | 125 | 60.7 | - | 4.2 | 40.0 | 1.1 | 2.8 | 16.3
2003 | 4,666 | 159 | 70.0 | - | 8.1 | 46.5 | 8.4 | 2.8 | 23.3
2004 | 4,735 | 136 | 64.1 | - | 3.0 | 30.5 | 6.5 | 3.4 | 28.5
2005 | 4,714 | 167 | 84.8 | - | 5.9 | 35.6 | 8.2 | 2.7 | 29.8
2006 | 4,715 | 164 | 60.7 | - | 7.8 | 43.1 | 6.6 | 4.7 | 41.2
2007 | 4,836 | 169 | 74.2 | - | 5.0 | 41.6 | 3.3 | 2.9 | 41.9
2008 | 4,990 | 155 | 55.7 | - | 7.0 | 45.5 | 9.6 | 3.7 | 33.5
2009 | 5,152 | 174 | 62.1 | - | 6.8 | 46.7 | 10.8 | 4.2 | 43.5

65+ yr
Year | PYO | Deaths | HIV-related | MPNC | Injuries | Lifestyle | Other NCD | Other CD | Indeterminate
2000 | 3,059 | 139 | 28.4 | 1.6 | 2.8 | 64.0 | 4.3 | 8.0 | 29.9
2001 | 3,134 | 196 | 24.8 | 1.9 | 3.6 | 98.3 | 7.7 | 13.4 | 46.3
2002 | 3,170 | 226 | 34.8 | 2.8 | 8.9 | 103.8 | 14.2 | 7.9 | 53.5
2003 | 3,187 | 216 | 31.7 | 1.6 | 5.2 | 102.2 | 12.1 | 12.2 | 51.0
2004 | 3,268 | 202 | 33.2 | 2.5 | 10.1 | 92.8 | 5.8 | 9.1 | 48.5
2005 | 3,396 | 198 | 36.0 | - | 6.1 | 74.2 | 15.9 | 8.0 | 57.8
2006 | 3,494 | 197 | 30.2 | 1.8 | 8.7 | 77.1 | 13.2 | 7.1 | 59.0
2007 | 3,526 | 229 | 42.6 | 0.7 | 8.8 | 100.6 | 10.1 | 3.8 | 62.4
2008 | 3,539 | 212 | 25.3 | 0.4 | 9.8 | 104.7 | 6.0 | 5.7 | 60.1
2009 | 3,474 | 211 | 28.0 | 1.1 | 5.9 | 100.6 | 10.4 | 3.3 | 61.7
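The cause columns in Table 2 are sums of InterVA likelihood values, which is why they are fractional and add up, within rounding, to the number of deaths in each row. The sketch below illustrates the allocation rule described in the Methods: each death contributes its likelihood values to the corresponding cause groups, and any shortfall from 1 goes to the indeterminate group. The function names and the input layout (a list of `(cause group, likelihood)` pairs per death) are assumptions made for illustration, not the InterVA output format.

```python
from collections import defaultdict


def allocate_causes(cases):
    """Sum likelihood values per cause group over all deaths.

    `cases` holds one list per death, each containing up to three
    (cause_group, likelihood) pairs. This layout is hypothetical.
    """
    totals = defaultdict(float)
    for causes in cases:
        assigned = 0.0
        for group, likelihood in causes:
            totals[group] += likelihood
            assigned += likelihood
        residual = 1.0 - assigned
        if residual > 1e-9:
            # Likelihood not attributed to any cause is counted as indeterminate.
            totals["Indeterminate"] += residual
    return dict(totals)


def cause_specific_mortality_fractions(totals):
    """Cause-specific mortality fractions: each group's share of all deaths."""
    n_deaths = sum(totals.values())
    return {group: value / n_deaths for group, value in totals.items()}


# Three illustrative deaths; the last has no identified cause at all.
example = [
    [("HIV-related", 0.8), ("Other CD", 0.1)],
    [("Injuries", 1.0)],
    [],
]
totals = allocate_causes(example)
fractions = cause_specific_mortality_fractions(totals)
```

Dividing a group total by the person-years observed, rather than by the number of deaths, gives the corresponding cause-specific mortality rate.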

population changed from 139 (95% CI: 130, 148) deaths per 10,000 person-years observed (PYO) in 2000 to a maximum of 174 (95% CI: 165, 183) in 2003 and then declined to 116 (95% CI: 109, 123) in 2009 (Figure 1). HIV-related causes were responsible for 50% of the deaths over the period from 2000 to 2009, followed by indeterminate causes at 18% and lifestyle-related noncommunicable diseases at 15% of all deaths. HIV-related causes were responsible for a maximum of 56% of deaths in 2002, declining to a minimum of 39% of deaths in 2009. Indeterminate causes were responsible for a minimum of 14% of deaths during 2000, rising to a maximum of 25% in 2009. The largest proportion (52%) of deaths occurred in the 15-49 age group, followed by the 65 and older age group at 18%. The under-5 age group contributed 20% of the deaths in 2000, but this declined to between 9% and 11% from 2007 to 2009. Overall, 82% of all deaths

Figure 1 Contribution of each cause to all-cause mortality by year (deaths: 11,281; person-years observed (PYO): 784,274). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: HIV-related, other CD, MPNC, injury, lifestyle, other NCD, indeterminate, and all-cause mortality (age standardized).



occurred before the age of 65 years, declining from a high of 85% in 2000 to 79% in 2009 (Figure 2).

The under-5 all-cause SMR peaked at 217 (95% CI: 189, 245) deaths per 10,000 PYO in 2002 and then declined rapidly to 103 (95% CI: 83, 103) in 2009 (Figure 3), a more than twofold decline in mortality. HIV-related mortality (Table 3) in this age group was highest in 2000 at 101 deaths per 10,000 PYO (95% CI: 86, 115), plateaued at around 37 deaths per 10,000 PYO between 2004 and 2007, and then declined further to a low of 12 (95% CI: 6, 17) in 2009. Other communicable disease SMR varied between 31 (95% CI: 22, 38) in 2004 and 65 (95% CI: 54, 76) deaths per 10,000 PYO in 2002. Mortality due to indeterminate causes was high in this age group, varying around 41 deaths per 10,000 PYO and peaking at around 60 deaths per 10,000 PYO between 2003 and 2005. The cause-specific mortality fraction (CSMF) due to HIV-related causes declined from 51% (95% CI: 44, 58) in 2000 to 12% (95% CI: 6, 17) in 2009. As a result of the overall decline in the mortality rate in this age group, the CSMF for other communicable diseases increased from 22% (95% CI: 17, 27) in 2000 to 50% (95% CI: 42, 57) in 2009.

The all-cause SMR for the 5-14 age group peaked at 22 (95% CI: 16, 28) deaths per 10,000 PYO in 2004 and then declined to 10 (95% CI: 6, 15) in 2009 (Figure 4).

The all-cause SMR for the 15-49 age group peaked at 190 (95% CI: 176, 203) deaths per 10,000 PYO in 2003 and then declined to 109 (95% CI: 100, 119) in 2009 (Figure 5). HIV-related mortality peaked in 2003 at 140 (95% CI: 135, 145) deaths per 10,000 PYO and then declined to 61 (95% CI: 56, 65) in 2009 (Table 3). The CSMF for HIV-related causes declined from 74% (95% CI: 71, 78) in 2000 to 55% (95% CI: 52, 59) in 2009. This decline in HIV-related CSMF appears to be at the expense of an increase in indeterminate CSMF, which increased over the same period from 8% (95% CI: 7, 10) to 21% (95% CI: 19, 24).

There was no discernible trend in all-cause mortality in the 50-64 age group (Figure 6). HIV-related causes constituted the largest CSMF at 42% for the entire period, followed by lifestyle-related noncommunicable diseases at 27% and indeterminate causes at 19%. There was no significant trend in HIV-related mortality, and the highest mortality rates observed occurred in 2005 (180 (95% CI: 156, 206)) and 2007 (154 (95% CI: 129, 179)), after the introduction of an antiretroviral therapy

Figure 2 Contribution of each age group to all-cause mortality by year (deaths: 11,281; person-years observed (PYO): 784,274). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: 0-4 yr, 5-14 yr, 15-49 yr, 50-64 yr, 65+ yr, and all-cause mortality (age standardized).


Figure 3 Under-5 age-standardized mortality (deaths: 1,531; person-years observed (PYO): 100,274). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: HIV-related, other CD, MPNC, injury, noncommunicable, indeterminate, and all-cause mortality (age standardized).

(ART) program in the area in 2004 (Table 3). Lifestyle-related noncommunicable mortality remained stable at around 88 deaths per 10,000 PYO.

Age-standardized mortality in the over-65 age group appeared to increase in the early part of the decade, rising from 453 (95% CI: 377, 528) deaths per 10,000 PYO in 2000 to a peak of 721 (95% CI: 631, 812) in 2002, and then plateaued at around 611 deaths per 10,000 PYO (Figure 7). Noncommunicable diseases were responsible for the largest CSMF at 50%, with lifestyle-related CSMF at 45%. Indeterminate (26%) and HIV-related (16%) causes were also important.

Comparison of physician with InterVA coding

There was substantial agreement (79%; kappa = 0.68 (95% CI: 0.67, 0.69)) between physician coding and InterVA coding at the burden of disease group level. This agreement varied significantly among age groups (Table 4), with the lowest agreement in the under-5 age group (kappa = 0.44 (95% CI: 0.41, 0.47)) and the over-65 age group (kappa = 0.50 (95% CI: 0.48, 0.52)), and with the remaining age groups around the overall agreement level. There were no significant changes in agreement over time (Table 5).

Discussion

Cause-specific mortality trends in the under-5 age group and the 15-49 age group were dominated by trends in HIV-related mortality in response to the introduction of interventions such as preventing mother-to-child transmission of HIV (PMTCT) and ART [7,8]. There were no significant trends in other causes of mortality in these age groups. The CSMF for HIV-related causes for the under-5 age group (34%) for the 2002-2005 period was higher than the value (26%) reported by Byass et al. [20] using InterVA (with the same set of input indicators as used by this study) for the same period in Agincourt, a demographic surveillance site in the Mpumalanga province of South Africa. In the case of the 15-49 age group, the HIV-related CSMF (72%) in this study was also higher than the value (54%) reported for the Agincourt site. In the older age groups, HIV-related mortality remained important but showed no significant decline in the second half of the 2000-2009 period.


Table 3 Cause-specific mortality rates by age group and year (per 10,000 person years observed)

2000
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 100.8 (88.1-113.9) | 1.6 (0.0-3.1) | 114.3 (109.1-119.5) | 113.8 (89.5-137.0) | 94.3 (65.6-122.2)
MPNC | 8.0 (2.8-12.9) | - | 1.4 (0.1-2.5) | - | 3.9 (0.0-7.6)
Injuries | 2.6 (0.0-5.2) | 3.5 (1.4-5.3) | 11.9 (8.2-15.6) | 24.4 (9.6-36.7) | 9.3 (0.0-16.8)
Lifestyle | 0.3 (0.0-0.7) | 0.9 (0.0-1.8) | 11.1 (7.6-14.4) | 83.8 (62.8-103.7) | 211.2 (180.8-240.9)
Other NCD | - | 0.7 (0.0-1.4) | 0.2 (0.0-0.5) | 11.1 (2.6-18.0) | 18.1 (3.8-29.5)
Other CD | 45.0 (35.4-54.5) | 2.4 (0.3-4.2) | 1.6 (0.3-2.7) | 5.5 (0.0-11.5) | 22.0 (11.0-32.8)
Indeterminate | 42.1 (33.7-49.1) | 2.2 (1.2-3.0) | 13.4 (11.1-15.6) | 48.2 (35.9-59.0) | 95.6 (78.0-113.6)

2001
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 97.7 (84.6-110.8) | 6.6 (4.4-9.1) | 122.9 (116.9-128.9) | 143.9 (118.4-170.3) | 77.3 (50.9-102.3)
MPNC | 10.0 (4.1-15.2) | - | 0.4 (0.0-0.9) | - | 6.1 (0.0-12.2)
Injuries | 0.7 (0.0-1.4) | 2.2 (0.4-3.7) | 17.2 (12.5-21.5) | 31.9 (15.7-47.4) | 10.3 (0.0-19.3)
Lifestyle | 2.5 (0.1-4.5) | 1.5 (0.0-2.7) | 9.9 (6.5-12.6) | 111.4 (87.4-135.0) | 324.3 (283.6-365.0)
Other NCD | - | 0.1 (0.0-0.1) | 0.8 (0.0-1.5) | 9.7 (1.1-16.6) | 24.4 (8.5-38.4)
Other CD | 59.2 (46.6-70.2) | 2.2 (0.4-3.6) | 2.9 (1.0-4.6) | 7.2 (0.0-12.9) | 38.2 (21.3-54.6)
Indeterminate | 36.9 (29.3-43.0) | 2.5 (1.3-3.4) | 15.9 (13.4-18.2) | 41.1 (28.8-52.3) | 144.8 (121.5-168.7)

2002
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 87.7 (74.6-101.9) | 5.9 (3.6-8.3) | 137.1 (132.1-142.0) | 134.7 (111.5-156.7) | 106.8 (73.3-138.1)
MPNC | 12.8 (6.3-18.0) | - | 0.7 (0.0-1.5) | - | 8.8 (0.9-15.1)
Injuries | 5.5 (1.3-9.2) | 1.8 (0.0-3.4) | 12.0 (8.5-15.3) | 7.8 (0.0-14.3) | 26.3 (9.0-41.1)
Lifestyle | - | 0.5 (0.0-1.0) | 8.3 (5.2-10.9) | 88.0 (66.2-108.8) | 341.3 (298.3-388.4)
Other NCD | - | 0.7 (0.0-1.5) | 0.9 (0.0-1.7) | 2.4 (0.0-4.9) | 46.9 (27.1-65.0)
Other CD | 65.6 (54.5-76.9) | 4.0 (1.9-5.8) | 1.8 (0.4-3.0) | 5.5 (0.0-10.6) | 22.7 (7.9-35.8)
Indeterminate | 45.0 (37.4-51.7) | 3.2 (1.7-4.4) | 15.3 (13.0-17.6) | 36.2 (23.9-46.9) | 160.3 (132.6-187.5)

2003
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 72.7 (58.6-85.8) | 6.4 (3.8-8.9) | 140.6 (135.7-145.5) | 148.2 (123.6-172.4) | 99.3 (67.9-126.6)
MPNC | 4.1 (0.2-7.4) | - | 1.4 (0.0-2.4) | - | 5.0 (0.0-9.6)
Injuries | 1.8 (0.0-3.5) | 4.8 (2.5-7.0) | 13.8 (10.4-17.1) | 17.4 (3.6-27.5) | 16.3 (0.8-28.0)
Lifestyle | 1.9 (0.0-3.6) | 0.9 (0.0-1.8) | 8.3 (5.8-10.7) | 105.5 (81.3-127.6) | 330.0 (292.1-369.7)
Other NCD | - | 0.3 (0.0-0.6) | 1.9 (0.6-3.1) | 18.4 (7.4-28.2) | 39.6 (18.2-58.8)
Other CD | 61.6 (49.5-73.0) | 1.3 (0.1-2.3) | 2.7 (1.2-4.1) | 5.0 (0.0-9.6) | 31.3 (16.0-44.9)
Indeterminate | 62.9 (51.6-73.0) | 4.9 (2.9-6.6) | 18.9 (16.6-21.1) | 46.3 (36.7-55.4) | 156.2 (131.2-181.2)

2004
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 38.7 (27.8-49.1) | 11.4 (8.7-14.3) | 123.8 (118.5-129.4) | 130.7 (111.0-151.6) | 100.9 (71.1-129.3)
MPNC | 4.6 (0.3-7.9) | - | 0.8 (0.0-1.5) | - | 6.3 (0.0-11.2)
Injuries | 1.9 (0.0-3.8) | 4.8 (2.3-6.9) | 20.1 (15.9-23.8) | 6.3 (0.0-12.5) | 24.2 (8.9-39.0)
Lifestyle | - | 1.3 (0.0-2.6) | 6.8 (4.7-9.0) | 71.8 (53.4-88.5) | 302.2 (266.5-337.7)
Other NCD | - | - | 1.3 (0.3-2.2) | 13.8 (4.3-21.7) | 18.4 (4.7-30.4)
Other CD | 30.5 (23.0-38.9) | 0.7 (0.0-1.4) | 1.0 (0.2-1.7) | 5.2 (0.0-9.4) | 19.8 (8.0-29.9)
Indeterminate | 66.6 (55.9-76.7) | 3.6 (1.8-5.2) | 23.9 (20.5-27.2) | 59.6 (45.6-72.5) | 146.1 (122.4-169.3)

2005
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 38.3 (29.5-46.9) | 8.2 (5.3-10.9) | 108.9 (103.7-114.6) | 179.4 (154.0-201.9) | 104.3 (74.4-131.6)
MPNC | 3.6 (0.2-6.3) | - | 1.2 (0.3-2.0) | - | -
Injuries | 0.7 (0.0-1.4) | 3.1 (1.1-5.0) | 14.5 (11.0-17.5) | 12.5 (2.0-21.0) | 17.8 (3.9-30.0)
Lifestyle | - | 1.2 (0.0-2.4) | 7.9 (5.3-10.2) | 78.7 (59.2-97.4) | 232.4 (199.1-264.9)
Other NCD | - | - | 1.8 (0.8-2.7) | 17.6 (5.7-27.3) | 52.3 (30.9-73.2)
Other CD | 47.9 (38.7-57.3) | 0.6 (0.0-1.2) | 1.5 (0.5-2.4) | 4.0 (0.0-7.4) | 17.8 (4.6-27.8)
Indeterminate | 51.4 (43.0-58.6) | 4.0 (2.1-5.9) | 24.8 (21.2-27.9) | 62.1 (46.8-76.9) | 158.3 (130.4-183.0)

2006
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 35.1 (26.2-42.8) | 7.1 (5.0-9.5) | 89.8 (84.8-95.0) | 126.8 (104.5-149.4) | 86.8 (57.1-111.7)
MPNC | 4.1 (0.0-7.6) | - | 0.5 (0.0-0.9) | - | 4.7 (0.0-8.7)
Injuries | 1.8 (0.0-3.7) | 1.3 (0.0-2.5) | 16.8 (12.7-20.5) | 14.9 (2.0-24.4) | 24.9 (7.5-38.5)
Lifestyle | 0.6 (0.0-1.2) | 0.4 (0.0-1.0) | 6.2 (4.2-8.1) | 95.0 (74.9-114.4) | 224.2 (190.7-257.3)
Other NCD | 0.8 (0.0-1.6) | 0.3 (0.0-0.7) | 0.8 (0.0-1.4) | 13.9 (4.2-21.4) | 39.4 (21.0-57.1)
Other CD | 47.9 (38.9-56.6) | 2.5 (0.7-3.8) | 1.5 (0.5-2.3) | 9.5 (2.4-15.2) | 18.1 (8.1-27.6)
Indeterminate | 26.2 (20.5-31.8) | 3.8 (1.3-5.7) | 23.0 (19.9-26.2) | 87.7 (73.2-102.6) | 165.8 (140.4-189.7)

2007
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 34.3 (25.1-42.8) | 5.4 (3.3-7.5) | 95.4 (91.1-100.3) | 153.4 (130.8-177.3) | 116.3 (85.5-144.9)
MPNC | 3.0 (0.0-5.5) | - | 1.4 (0.2-2.3) | - | 2.0 (0.0-4.3)
Injuries | 2.2 (0.0-4.1) | 2.0 (0.4-3.5) | 13.7 (10.4-16.8) | 10.4 (1.1-18.1) | 24.1 (10.0-36.3)
Lifestyle | 0.7 (0.0-1.5) | 0.3 (0.0-0.6) | 6.4 (4.3-8.4) | 88.9 (66.7-109.4) | 296.1 (261.9-328.5)
Other NCD | - | - | 0.8 (0.1-1.4) | 5.6 (0.0-10.1) | 29.5 (14.4-42.7)
Other CD | 38.6 (29.9-47.2) | 2.1 (0.2-3.7) | 1.6 (0.6-2.5) | 4.3 (0.0-8.2) | 8.3 (1.3-14.3)
Indeterminate | 25.1 (19.9-30.4) | 2.5 (0.8-3.9) | 24.5 (20.9-27.5) | 86.9 (68.2-103.5) | 173.3 (144.5-199.4)

2008
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 27.1 (18.5-35.7) | 3.8 (1.6-6.0) | 66.9 (62.4-72.1) | 111.7 (89.1-134.3) | 70.5 (46.9-93.4)
MPNC | 4.4 (0.5-7.6) | - | 1.0 (0.1-1.8) | - | 0.3 (0.0-0.7)
Injuries | 2.8 (0.0-5.1) | 4.2 (1.9-6.4) | 18.7 (14.7-22.4) | 14.7 (2.1-24.6) | 27.7 (10.8-41.2)
Lifestyle | 0.7 (0.0-1.5) | 0.9 (0.0-1.9) | 5.3 (3.4-7.1) | 96.7 (77.8-116.4) | 295.6 (260.9-333.1)
Other NCD | - | 1.1 (0.0-2.1) | 1.6 (0.6-2.5) | 19.8 (6.3-30.2) | 19.7 (6.8-31.1)
Other CD | 51.2 (42.8-60.0) | 1.7 (0.2-3.0) | 1.0 (0.1-1.8) | 7.5 (0.0-13.6) | 14.5 (4.9-23.1)
Indeterminate | 25.8 (19.4-31.5) | 4.8 (2.8-6.5) | 20.5 (17.5-23.3) | 60.3 (44.7-74.0) | 173.6 (145.9-197.4)

2009
Cause | 0-4 yrs (95% CI) | 5-14 yrs (95% CI) | 15-49 yrs (95% CI) | 50-64 yrs (95% CI) | 65+ (95% CI)
HIV-related | 11.7 (6.2-16.7) | 5.6 (3.6-7.7) | 58.3 (53.9-62.3) | 114.9 (94.5-137.2) | 78.3 (51.7-103.4)
MPNC | 0.9 (0.0-2.1) | - | 1.2 (0.2-2.1) | - | 3.1 (0.0-6.2)
Injuries | 3.0 (0.0-5.7) | 1.5 (0.0-2.9) | 15.4 (11.7-18.9) | 12.3 (1.6-20.9) | 15.7 (2.6-27.0)
Lifestyle | 0.4 (0.0-0.9) | 0.1 (0.0-0.2) | 5.3 (3.3-7.0) | 94.5 (74.9-113.8) | 285.8 (249.9-322.0)
Other NCD | - | - | 1.5 (0.4-2.3) | 20.6 (8.4-30.6) | 29.2 (14.4-42.9)
Other CD | 45.2 (38.2-52.2) | 0.8 (0.0-1.5) | 1.1 (0.2-1.8) | 6.2 (0.3-11.0) | 9.6 (0.9-16.9)
Indeterminate | 36.1 (29.0-44.0) | 2.3 (1.0-3.5) | 26.9 (23.7-30.1) | 89.4 (70.0-106.1) | 185.6 (157.9-212.9)

Noncommunicable-disease mortality increased with increasing age and was the major cause of death in the 65-and-older age group. There were no clear trends in noncommunicable disease mortality over time. The proportions of indeterminate causes in this study for the different age groups were generally lower than the levels reported by Byass et al. [20] for the Agincourt site, with the exception of the under-5 age group. The proportions were 32% in this study compared with 26% in the Byass study in the under-5 age group; 21% compared with 38% in the 5-14 age group; 12% compared with 26% in the 15-49 age group; 17% compared with 31% in the 50-64 age group; and 25% compared with 33% in the 65-and-older age group.

The verbal autopsy questionnaire was not designed from inception with the InterVA input indicators in mind. A number of indicators (18 out of 106) did not map directly to the questionnaire and had to be derived indirectly. Data quality in relation to the InterVA input indicators could not be monitored on an ongoing basis, and some of the changes over time in the indeterminate-cause proportion could reflect data-quality issues.

Conclusions There has been a substantial decline in the proportion of deaths due to HIV-related causes in the under-5 age group over the study period, coupled with a fairly constant mortality rate due to other communicable diseases and indeterminate causes. This would indicate that further gains in reducing under-5 mortality would


Figure 4 5-14 yr age-standardized mortality (deaths: 333; person-years observed (PYO): 216,005). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: HIV-related, other CD, MPNC, injury, noncommunicable, indeterminate, and all-cause mortality (age standardized).

Figure 5 15-49 yr age-standardized mortality (deaths: 5,863; person-years observed (PYO): 387,630). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: HIV-related, other CD, MPNC, injury, lifestyle, other NCD, indeterminate, and all-cause mortality (age standardized).



Figure 6 50-64 yr age-standardized mortality (deaths: 1,526; person-years observed (PYO): 47,118). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: HIV-related, other CD, MPNC, injury, lifestyle, other NCD, indeterminate, and all-cause mortality (age standardized).

Figure 7 65+ yr age-standardized mortality (deaths: 1,526; person-years observed (PYO): 47,118). Axes: deaths per 10,000 PYO by year, 2000-2009. Series: HIV-related, other CD, MPNC, injury, lifestyle, other NCD, indeterminate, and all-cause mortality (age standardized).



require investigation of causes other than HIV and possibly changes in public health services. Further research is required to determine whether the resilience to change in other communicable disease mortality is due to interaction between interventions aimed at HIV and those aimed at other causes of child mortality, or due to unrelated factors.

In the 15-49 age group, the positive impact of HIV-related interventions was substantial. We did not explore sex-specific mortality, but given the different age pattern in HIV prevalence [27], one would expect some differences in the sex-specific HIV-related mortality trends. Although trauma mortality is dwarfed by HIV-related mortality in this age group, it is still considerably higher than the global mortality estimate [38].

In the older age groups, the dual burdens of communicable and noncommunicable diseases were evident. The lack of a substantial decline in HIV-related mortality in those over 50 years old requires further investigation to determine whether this was due to lack of access or response to treatment programs or an artifact of competing risks from other mortality causes.

The InterVA verbal autopsy program performed well, and the conclusions based on InterVA mortality cause allocation would have been no different had they been based on physician mortality cause allocation. The InterVA program allowed more timely analysis of cause-specific mortality; as a result we could include the 2009 deaths in this analysis in spite of the fact that physician coding is not yet complete for 2009. Verbal autopsy based methods enabled the timely measurement of changing trends in cause-specific mortality to provide policymakers with the much-needed information to allocate resources to appropriate health interventions.

Table 4 Agreement between physician and InterVA cause of death allocation by age group

Age group | Agreement | Kappa (95% confidence interval) | p
0-4 | 61% | 0.43 (0.40-0.46) | < 0.0001
5-14 | 76% | 0.68 (0.63-0.74) | < 0.0001
15-49 | 86% | 0.71 (0.70-0.72) | < 0.0001
50-64 | 77% | 0.68 (0.65-0.70) | < 0.0001
65+ | 68% | 0.50 (0.48-0.53) | < 0.0001
Overall | 79% | 0.68 (0.67-0.69) | < 0.0001

Acknowledgements We thank the community members in the demographic surveillance area who have contributed their data to the study since 2000. We appreciate the contribution of the research operations staff of the Africa Centre in collecting the data used in this paper. Erofili Grapsa assisted with the statistical analysis. Nuala McGrath and Till Bärnighausen provided statistical advice. Peter Byass and Ed Fottrell provided assistance with implementing InterVA. The demographic surveillance is funded by the Wellcome Trust. The verbal autopsy program was partially funded by the MTN Foundation.

Author details
1 Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Somkhele, South Africa. 2 MRC Centre of Epidemiology for Child Health, UCL ICH, London, UK.

Authors’ contributions
AJH was responsible for the data analysis and drafting the manuscript. TM supervised the verbal autopsy data collection and participated in the data analysis. MLN is the director of the Africa Centre and contributed to the drafting and reviewing of the manuscript. All authors have read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 12 February 2011 Accepted: 5 August 2011 Published: 5 August 2011

References
1. Rajaratnam JK, Marcus JR, Levin-Rector A, Chalupka AN, Wang H, Dwyer L, Costa M, Lopez AD, Murray CJ: Worldwide mortality in men and women aged 15-59 years from 1970 to 2010: a systematic analysis. Lancet 2010, 375:1704-1720.
2. Black RE, Cousens S, Johnson HL, Lawn JE, Rudan I, Bassani DG, Jha P, Campbell H, Walker CF, Cibulskis R: Global, regional, and national causes of child mortality in 2008: a systematic analysis. The Lancet 2010.
3. Stanecki K, Daher J, Stover J, Akwara P, Mahy M: Under-5 mortality due to HIV: regional levels and 1990-2009 trends. Sexually Transmitted Infections 2010, 86:ii56.
4. Bradshaw D, Laubscher R, Dorrington R, Bourne DE, Timaeus IM: Unabated rise in number of adult deaths in South Africa. South African Medical Journal 2008, 94:278.
5. Statistics South Africa: Mortality and causes of death in South Africa, 2008: Findings from death notification. Pretoria; 2010.
6. Chopra M, Daviaud E, Pattinson R, Fonn S, Lawn JE: Saving the lives of South Africa’s mothers, babies, and children: can the health system deliver? The Lancet 2009, 374:835-846.
7. Herbst A, Cooke G, Bärnighausen T, KanyKany A, Tanser F, Newell M: Adult mortality and antiretroviral treatment roll-out in rural KwaZulu-Natal, South Africa. Bull World Health Organ 2009, 87:754-762.
8. Ndirangu J, Newell M, Tanser F, Herbst A, Bland R: Decline in early life mortality in a high HIV prevalence rural area of South Africa: evidence of HIV prevention or treatment impact? AIDS 2010, 24:593-602.
9. Floyd S, Molesworth A, Dube A, Banda E, Jahn A, Mwafulirwa C, Ngwira B, Branson K, Crampin AC, Zaba B, et al: Population-level reduction in adult mortality after extension of free anti-retroviral therapy provision into rural areas in northern Malawi. PLoS One 2010, 5:e13499.
10. Bendavid E, Bhattacharya J: The President’s Emergency Plan for AIDS Relief in Africa: an evaluation of outcomes. Annals of internal medicine 2009, 150:688.

Table 5 Agreement between physician and InterVA cause of death allocation by year

Year | Agreement | Kappa (95% confidence interval) | p
2000 | 78% | 0.67 (0.64-0.70) | < 0.0001
2001 | 82% | 0.72 (0.69-0.74) | < 0.0001
2002 | 82% | 0.71 (0.68-0.74) | < 0.0001
2003 | 78% | 0.66 (0.63-0.69) | < 0.0001
2004 | 77% | 0.66 (0.63-0.69) | < 0.0001
2005 | 75% | 0.63 (0.61-0.66) | < 0.0001
2006 | 76% | 0.66 (0.63-0.69) | < 0.0001
2007 | 79% | 0.69 (0.66-0.72) | < 0.0001
2008 | 79% | 0.72 (0.69-0.75) | < 0.0001
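The agreement percentages and kappa values in Tables 4 and 5 compare two cause allocations for the same deaths. As a minimal sketch of how such statistics are defined (the paper's own analysis was done in STATA, and the function name `cohen_kappa` and the toy data here are hypothetical), raw agreement is the share of deaths given the same cause group by both methods, and Cohen's kappa rescales it by the agreement expected by chance from the marginal frequencies.

```python
from collections import Counter


def cohen_kappa(coding_a, coding_b):
    """Return (observed agreement, Cohen's kappa) for two codings of the same cases."""
    if len(coding_a) != len(coding_b) or not coding_a:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(coding_a)
    # Share of cases on which the two codings agree.
    observed = sum(a == b for a, b in zip(coding_a, coding_b)) / n
    # Agreement expected by chance, from each coding's marginal frequencies.
    count_a, count_b = Counter(coding_a), Counter(coding_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return observed, (observed - expected) / (1 - expected)


# Toy example with four deaths coded by a physician and by an automated method.
physician = ["HIV-related", "HIV-related", "Injuries", "Other NCD"]
automated = ["HIV-related", "Other NCD", "Injuries", "Other NCD"]
agreement, kappa = cohen_kappa(physician, automated)
```

Because InterVA can return up to three causes per death, applying this to real data would first require choosing a single cause per case, for example the most likely one; that choice is an assumption of this sketch rather than something stated in the text.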


34. Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJL, (Eds.): Global Burden of Disease and Risk Factors. Washington: Oxford University Press and The World Bank; 2006.
35. Ulm K: A simple method to calculate the confidence interval of a standardized mortality ratio (SMR). American journal of epidemiology 1990, 131:373-375.
36. R Development Core Team: R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2010.
37. STATACorp: Statistical Software Release 11.0. College Station, TX: Stata Corporation; 2010.
38. Peden M, McGee K, Sharma G: The injury chart book: a graphical overview of the global burden of injuries. World Health Organization; 2002.

11. Reniers G, Araya T, Davey G, Nagelkerke N, Berhane Y, Coutinho R, Sanders EJ: Steep declines in population-level AIDS mortality following the introduction of antiretroviral therapy in Addis Ababa, Ethiopia. AIDS (London, England) 2009, 23:511. 12. Gregson S, Gonese E, Hallett TB, Taruberekera N, Hargrove JW, Lopman B, Corbett EL, Dorrington R, Dube S, Dehne K: HIV decline in Zimbabwe due to reductions in risky sex? Evidence from a comprehensive epidemiological review. International Journal of Epidemiology 2010, 1-13. 13. Mayosi BM, Flisher AJ, Lalloo UG, Sitas F, Tollman SM, Bradshaw D: The burden of non-communicable diseases in South Africa. The Lancet 2009, 374:934-947. 14. Seedat M, Van Niekerk A, Jewkes R, Suffla S, Ratele K: Health in South Africa 5–violence and injuries in South Africa: prioritising an agenda for prevention. Lancet 2009, 374:1011-1022. 15. Fottrell E: Dying to count: mortality surveillance in resource-poor settings. Glob Health Action 2009, 2. 16. Sankoh O: Global health estimates: stronger collaboration needed with low- and middle-income countries. PLoS Med 2010, 7:e1001005. 17. Nojilana B, Groenewald P, Bradshaw D, Reagon G: Quality of cause of death certification at an academic hospital in Cape Town, South Africa. S Afr Med J 2009, 99:648-652. 18. Yudkin PL, Burger EH, Bradshaw D, Groenewald P, Ward AM, Volmink J: Deaths caused by HIV disease under-reported in South Africa. AIDS 2009, 23:1600-1602. 19. Hosegood V, Vanneste AM, Timaeus IM: Levels and causes of adult mortality in rural South Africa: the impact of AIDS. AIDS 2004, 18:663-671. 20. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7:e1000325. 21. Cook I, Alberts M, Burger S, Byass P: All-cause mortality trends in Dikgale, rural South Africa, 1996–2003. Scandinavian journal of public health 2008, 36:753.
22. Fottrell E, Byass P: Verbal autopsy: methods in transition. Epidemiol Rev 2010, 32:38-55. 23. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245. 24. King G, Lu Y, Shibuya K: Designing verbal autopsy studies. Popul Health Metr 2010, 8:19. 25. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31. 26. Tanser F, Hosegood V, Bärnighausen T, Herbst K, Nyirenda M, Muhwava W, Newell C, Viljoen J, Mutevedzi T, Newell M: Cohort Profile: Africa Centre Demographic Information System (ACDIS) and population-based HIV survey. Int J Epidemiol 2008, 37:956-962. 27. Welz T, Hosegood V, Jaffar S, Bätzing-Feigenbaum J, Herbst K, Newell M: Continued very high prevalence of HIV infection in rural KwaZulu-Natal, South Africa: a population-based longitudinal study. AIDS 2007, 21:1467-1472. 28. Bärnighausen T, Tanser F, Gqwede Z, Mbizana C, Herbst K, Newell M: High HIV incidence in a community with high HIV prevalence in rural South Africa: findings from a prospective population-based study. AIDS 2008, 22:139-144. 29. Bärnighausen T, Welz T, Hosegood V, Bätzing-Feigenbaum J, Tanser F, Herbst K, Hill C, Newell M: Hiding in the shadows of the HIV epidemic: obesity and hypertension in a rural population with very high HIV prevalence in South Africa. J Hum Hypertens 2008, 22:236-239. 30. Garrib A, Herbst AJ, Hosegood V, Newell ML: Injury mortality in rural South Africa 2000 - 2007: rates and associated factors. Tropical Medicine & International Health 2011, 16:439-446. 31. INDEPTH Standardized Verbal Autopsy questionnaire (Revised August 2003). [http://www.indepth-network.org/index.php?option=com_content&task=view&id=96&Itemid=184].
32. Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, et al: Setting international standards for verbal autopsy. Bulletin of the World Health Organization 2007, 85:570-571. 33. World Health Organization: International statistical classification of diseases and related health problems, 10th revision Switzerland, Geneva: WHO Library; 1992.

doi:10.1186/1478-7954-9-47 Cite this article as: Herbst et al.: Verbal autopsy-based cause-specific mortality trends in rural KwaZulu-Natal, South Africa, 2000-2009. Population Health Metrics 2011 9:47.



Vergnano et al. Population Health Metrics 2011, 9:48 http://www.pophealthmetrics.com/content/9/1/48

RESEARCH

Open Access

Adaptation of a probabilistic method (InterVA) of verbal autopsy to improve the interpretation of cause of stillbirth and neonatal death in Malawi, Nepal, and Zimbabwe

Stefania Vergnano1*, Edward Fottrell1,2, David Osrin1, Peter N Kazembe3, Charles Mwansambo4, Dharma S Manandhar5, Stephan P Munjanja6, Peter Byass2, Sonia Lewycka1 and Anthony Costello1

Abstract

Background: Verbal autopsy (VA) is a widely used method for analyzing cause of death in the absence of vital registration systems. We adapted the InterVA method to extrapolate causes of death for stillbirths and neonatal deaths from verbal autopsy questionnaires, using data from Malawi, Zimbabwe, and Nepal.

Methods: We obtained 734 stillbirth and neonatal VAs from recent community studies in rural areas: 169 from Malawi, 385 from Nepal, and 180 from Zimbabwe. Initial refinement of the InterVA model was based on 100 physician-reviewed VAs from Malawi. InterVA indicators and matrix probabilities for cause of death were reviewed for clinical and epidemiological coherence by a pediatrician-researcher and an epidemiologist involved in the development of InterVA. The modified InterVA model was evaluated by comparing population-level cause-specific mortality fractions and individual agreement from two methods of interpretation (physician review and InterVA) for a further 69 VAs from Malawi, 385 from Nepal, and 180 from Zimbabwe.

Results: Case-by-case agreement between InterVA and reviewing physician diagnoses for 69 cases from Malawi, 180 cases from Zimbabwe, and 385 cases from Nepal was 83% (kappa 0.76 (0.75-0.80)), 71% (kappa 0.41 (0.32-0.51)), and 74% (kappa 0.63 (0.60-0.63)), respectively. The proportion of stillbirths identified as fresh or macerated by the different methods of VA interpretation was similar in all three settings. Comparing across countries, the modified InterVA method found that proportions of preterm births and deaths due to infection were higher in Zimbabwe (44%) than in Malawi (28%) or Nepal (20%).

Conclusion: The modified InterVA method provides plausible results for stillbirths and newborn deaths, broadly comparable to physician review but with the advantage of internal consistency. The method allows standardized cross-country comparisons and eliminates the inconsistencies of physician review in such comparisons.

* Correspondence: stev@doctors.org.uk
1 Centre for International Health and Development, UCL, Institute of Child Health 30 Guilford St, London WC1N1EH, UK
Full list of author information is available at the end of the article

© 2011 Vergnano et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Cause-specific mortality data on childhood deaths are vital to identify health needs, compare patterns of death across populations, plan and monitor interventions, and inform policy [1-3]. In high-income countries, all births and deaths are enumerated through vital registration systems, and death certification is routine. In low-income settings, most births and deaths occur at home, death certificates are rarely available, and vital registration systems are often inadequate or nonexistent [2-4].

Verbal autopsies (VAs) provide an alternative means of identifying probable causes of death through interviews with a close caregiver of the deceased, in which information about the circumstances, signs, and symptoms leading to death is gathered. VAs have limitations: they require recollection of events at the time of death, rely on understanding and reporting of signs and symptoms by interviewees, and may be influenced by interviewer skills. The data must also be interpreted to establish a diagnosis [5].

Conventionally, VA questionnaires are read by two or more physicians separately and one or more causes of death are attributed. A cause of death is established when physicians’ opinions correspond; otherwise the diagnosis is reconsidered and discussed, with or without the input of an additional physician. If no agreement is reached, the cause of death is considered undetermined. Repeatability of this diagnostic process over time and in different settings is problematic, particularly when diagnostic criteria are not standardized amongst different clinicians [6-8]. In some situations, disagreement between physicians is such that a large proportion of causes of death remain indeterminate [7,9]. Moreover, the method is costly, time-consuming, and requires the involvement of physicians, who are already an overstretched resource in low-income countries [6,10].

Despite these limitations, VAs are useful in estimating cause-specific mortality fractions (CSMFs) in population studies [6,8,11]. They have been used extensively in epidemiological studies, household surveys, and sentinel surveillance sites, and have been piloted in subsamples from sample registration systems. There remains a need to refine the technique to make it more comparable, repeatable, easy to apply, and cost-effective. VA questionnaires devised by the WHO attempt to standardize the interview process, but more standardized approaches to interpreting VA data are needed. Hierarchical algorithms and computer programs based on logistic regression have been used, but they are difficult to standardize across cultures and age groups and can usually only identify single causes of death [12,13]. InterVA uses a probabilistic method and has been tested in a range of settings for deaths at all ages, across sexes, and for maternal deaths [14,15].

We describe the refinement and evaluation of InterVA to identify causes of death in the perinatal (stillbirths and neonatal deaths in the first seven days) and neonatal periods, using data from three different settings: Malawi, Zimbabwe, and Nepal.

Methods

Based on Bayes’ theorem [16], the InterVA model calculates the probability of a set of causes of death given the presence of circumstances, signs, and symptoms (collectively called ‘indicators’) reported in VA interviews. The method is described in detail elsewhere [10,17]. Briefly, a finite number of causes of death are assigned to a predefined matrix of estimated probabilities of occurrence. The presence of indicators (Table 1) modifies the predefined probabilities of each cause of death upward or downward using Bayes’ theorem according to the formula

\[
p(C \mid I) = \frac{p(I \mid C)\, p(C)}{p(I \mid C)\, p(C) + p(I \mid !C)\, p(!C)}
\]

where p(C|I) is the probability of a cause of death (C) given the presence of the indicator (I), and p(I|!C) is the probability of I in the absence of C [10]. Probabilities of final-cause categories increase or decrease in relation to specific signs and symptoms reported in the VA interview. If symptoms are not reported, the probabilities do not change. The program is available online at http://www.InterVA.net. Users can enter the data as single cases or in batches, and the model generates up to three causes of death and their respective likelihoods. Prior to the current study, the probability matrix consisted of 34 cause of death classifications and 104 indicators [10].

Data sources

To explore the performance of InterVA in different settings, 734 stillbirth and neonatal VAs were obtained from rural areas of three low-income countries. In Malawi (Mchinji District), 169 stillbirth and neonatal VAs were collected from 2004 to 2005, as part of a cluster-randomized study evaluating two community interventions to improve maternal and child health [18]. Although designed for the study, the VA questionnaire was comparable in structure and content with the subsequent WHO questionnaire [19]. Completed questionnaires were interpreted independently by two Malawian pediatricians, who assigned up to three causes of death on the basis of a hierarchical classification and algorithm [20]. They were able to use alternative diagnoses where necessary. Discrepancies were resolved by discussion and, if consensus could not be reached, the cause of death was recorded as indeterminate. In Nepal (Makwanpur district), 385 VAs were collected from 2001 to 2003 as part of a cluster-randomized study of a community intervention to improve maternal and child health [21]. The questionnaire was again comparable with the subsequent WHO tool. Questionnaires were interpreted independently by two Nepalese pediatricians, who each assigned a single cause of death on the basis of the same algorithm used in Malawi. Discrepancies were resolved after review by a third physician. The third data source included 180 neonatal deaths from Zimbabwe, identified as part of a maternal and perinatal mortality study conducted in 2007 and 2008 [22]. Neonatal VAs were conducted using the WHO tool. Questionnaires were interpreted independently by two physicians, who each assigned a single cause of death using the International Classification of Diseases and Related Health Problems (ICD-10). Discrepancies were resolved after review by a third physician (Table 2).
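The Bayesian updating step described under Methods, p(C|I) = p(I|C) p(C) / (p(I|C) p(C) + p(I|!C) p(!C)), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the InterVA program: the function names, the cause and indicator labels, and every probability below are hypothetical, and the sketch simply applies the formula to each cause in turn for each reported indicator, leaving probabilities unchanged for indicators that are not reported.

```python
def bayes_update(p_c, p_i_given_c, p_i_given_not_c):
    """Posterior p(C|I) from the prior p(C) and the two indicator probabilities."""
    numerator = p_i_given_c * p_c
    denominator = numerator + p_i_given_not_c * (1.0 - p_c)
    if denominator == 0.0:
        # The indicator is impossible under both hypotheses; keep the prior.
        return p_c
    return numerator / denominator


def apply_indicators(priors, matrix, reported):
    """Update every cause once for each reported indicator.

    priors:   dict cause -> p(C)
    matrix:   dict indicator -> dict cause -> (p(I|C), p(I|!C))
    reported: indicators present in the interview; unreported ones are skipped
    """
    posterior = dict(priors)
    for indicator in reported:
        posterior = {
            cause: bayes_update(p_c, *matrix[indicator][cause])
            for cause, p_c in posterior.items()
        }
    return posterior


# Hypothetical values chosen only to show the direction of the update.
priors = {"prematurity": 0.2, "infection": 0.3}
matrix = {"born early": {"prematurity": (0.9, 0.1), "infection": (0.2, 0.2)}}
posterior = apply_indicators(priors, matrix, ["born early"])
print(posterior)
```

In this toy case the indicator is far more likely under prematurity than otherwise, so that probability rises, while for infection the indicator is equally likely with or without the cause and the probability stays at its prior.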


Table 1 InterVA indicators and cause of death categories

Indicators
was this an elder 65+ years
was this an adult 50-64 years
was this a female 15-49 years
was this a male 15-49 years
was this a child 5-14 years
was this a child 1-4 years
was this an infant 4 wks-1 yr
was this a neonate < 4 wks
was she pregnant at death
did pregnancy end within 6 weeks
did final illness last at least 3 weeks
did final illness last < 3 weeks
was death very sudden/unexpected
was death during wet season
was death during dry season
was s/he in a transport accident
did s/he drown
had s/he fallen recently
any poisoning, bite, sting
was s/he a known smoker
any obvious recent injury
was s/he known to drink alcohol
any suggestion of homicide
any convulsions or fits
any diagnosis of epilepsy
was the fontanelle raised
was the fontanelle or eyeball sunken
any headache
was there paralysis on both sides
any paralysis/weakness on 1 side
any stiff neck
any oral candidiasis
any rigidity/lockjaw
abnormal hair coloring
any coughing with blood
any chest pain
was there a cough for > 3 wks
was there a cough for up to 3 wks
any productive cough
any rapid breathing
any breathlessness on exertion
any breathlessness lying flat
any chest indrawing
any difficulty breathing
any breast lump or lesion
any wheezing
any cyanosis
any abdominal mass
any abdominal pain
any diarrhea with blood
any vomiting with blood
any acute diarrhea (< 2 wks)
any persistent diarrhea (2-4 wks)
any chronic/recurrent diarr (4+w)
any abdominal swelling
any vomiting
any yellowness/jaundice
any abnormality of urine
any urinary retention
any haematuria
any swelling of ankles/legs
no bilateral swelling of ankle
any skin lesions/ulcers
any rash (non-measles)
any herpes zoster
any measles rash
any excessive night sweats
any excessive water intake
any excessive urination
any excessive food intake
any acute fever
any persistent fever (> 2 wk)
any enlarged/swollen glands
any facial swelling
was there a coma > 24 hrs
any weight loss
any anaemia/paleness
any drowsiness
any delayed/regressed development
any diagnosis of asthma
any diagnosis of diabetes
any diagnosis of heart disease
any diagnosis of HIV/AIDS
any diagnosis of hypertension
been discharged from hospital very ill
any suggestion of suicide
any surgery just before death
any diagnosis of TB
was s/he adequately vaccinated
any diagnosis of liver disease
any diagnosis of cancer
any diagnosis of stroke
any diagnosis of measles
any diagnosis of kidney disease
any diagnosis of hemoglobinopathy
any diagnosis of malaria
any delivery complications
any heavy bleeding around delivery
was there prolonged labor > 24 hrs
were there convulsions during delivery
was the baby born early < 34 wks
was the baby small < 2500 g
was there difficulty breathing at birth
any congenital malformations
was this a multiple birth
any umbilical infection

Additional indicators
did baby have arched back after 2 days
baby stopped sucking after day 3
did the baby die on day 1
did the mother fail to receive tetanus toxoid vaccine
did convulsions happen on day 1
was there no cry/move/breath at birth
was baby’s skin puffy/mushy at birth
did the baby fail to cry at birth

Cause of Death Categories
Perinatal asphyxia
Congenital malformation
Prematurity
Tetanus
Pneumonia
Malaria
Measles
Meningitis
Diarrhea
Bloody diarrhea
Other acute infection
Malnutrition
Kwashiorkor
HIV/AIDS related
Pulmonary tuberculosis
Chronic infection
Maternal causes
Acute respiratory disease (not pnem.)
Chronic respiratory disease
Acute cardiac
Chronic cardiac
Stroke
Diabetes
Malignancy
Liver disease
Kidney disease
Disorders of the digestive system
Diseases of the nervous system
Sickle cell anemia
Transport-related accident
Accidental poisoning
Accidental drowning
Other accident
Homicide
Suicide

Additional causes of death
Fresh stillbirth
Macerated stillbirth

Indicators added following refinement of InterVA for neonatal deaths are highlighted in bold.


Table 2 Characteristics of the three studies used as data sources

Neonatal mortality rate
  Malawi: 27/1000 [30]
  Nepal: 33/1000 [31]
  Zimbabwe: 24/1000 [32]

Study period
  Malawi: 1 year
  Nepal: 3 years
  Zimbabwe: 2 years

Number of VA questionnaires
  Malawi: 169
  Nepal: 385
  Zimbabwe: 180

Questionnaire
  Malawi: Mixed open and closed questions
  Nepal: Mixed open and closed questions
  Zimbabwe: Standard WHO tool incorporating open and closed questions (24)

Interviewers
  Malawi: 5 lay Malawian interviewers with secondary education
  Nepal: Lay local field coordinators
  Zimbabwe: 45 midwife enumerators

Physician review
  Malawi: 2 experienced local pediatricians; predefined algorithm; 3 causes of death
  Nepal: 3 experienced local pediatricians; predefined algorithm; 1 cause of death
  Zimbabwe: 3 experienced local physicians; International Classification of Diseases and Related Health Problems (ICD-10); 1 cause of death

Refining InterVA

Initial refinement of the InterVA model was based on 100 (59%) physician-reviewed VAs from Malawi. The use of these data for refinement was pragmatic in that, at the time of refinement, they were the only data available. Data from the VA questionnaire were entered in the InterVA model, which assigned causes of death and associated likelihoods. The open histories, where the caregiver reported the events leading to death, were coded and also entered in the model. CSMFs obtained using the original InterVA and physician review were compared. CSMFs were calculated from the InterVA output as the sum of the likelihoods computed for each single cause of death category, divided by the sum of the likelihoods for all causes. For the calculation of CSMFs from physician-review data, if more than one cause of death was assigned, each was considered as a proportion of the total death. Therefore, if a single cause of death was assigned by all physicians, or if only one was available, it explained 100% of that death. If more than one cause of death was attributed, each contributed an equal proportion of the total 100%. For example, if both reviewing physicians assigned prematurity as a cause of death and one of them also assigned sepsis, then prematurity contributed 75% and sepsis 25% to the death. In this way, every available physician diagnosis contributed to the cause-specific mortality profile, avoiding a potential loss of information and bias that might have been introduced by using consensus diagnoses alone.

Fifty-four neonatal-death questionnaires were analyzed with the original InterVA model. Stillbirths were initially excluded, as InterVA was not designed to classify them. The results of this first analysis identified the need for greater differentiation in the model among causes of death in the neonatal period. The InterVA indicators and matrix probabilities were therefore reviewed for clinical and epidemiological coherence by a pediatrician-researcher (SV) and an epidemiologist involved in the development of InterVA (EF). Following this initial refinement, InterVA was evaluated by comparing case-by-case diagnoses with physician-assigned diagnoses for the same 100 VA cases, as well as the population-level CSMFs. A process of refinement and comparisons with physician review was undertaken until InterVA elicited mortality profiles deemed by the researchers to be plausible and satisfactorily comparable to physician review.
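The two CSMF calculations described in this section can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function names and input layout are hypothetical, and the physician weighting (each physician carries equal weight, split equally among the causes that physician listed) is one reading of the text, chosen because it reproduces the worked example in which prematurity receives 75% and sepsis 25%.

```python
from collections import defaultdict


def csmf_from_interva(cases):
    """cases: one dict per death mapping cause -> likelihood from the model."""
    totals = defaultdict(float)
    for likelihoods in cases:
        for cause, likelihood in likelihoods.items():
            totals[cause] += likelihood
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no likelihoods to summarise")
    # Sum of likelihoods per cause divided by the sum over all causes.
    return {cause: value / grand_total for cause, value in totals.items()}


def physician_shares(diagnoses):
    """diagnoses: one list of causes per physician, for a single death."""
    reviewers = [causes for causes in diagnoses if causes]
    if not reviewers:
        raise ValueError("at least one physician diagnosis is required")
    shares = defaultdict(float)
    for causes in reviewers:
        for cause in causes:
            # Assumed weighting: equal weight per physician, split across causes.
            shares[cause] += 1.0 / (len(reviewers) * len(causes))
    return dict(shares)


def csmf_from_physicians(deaths):
    """deaths: one `diagnoses` structure per death; each death counts once."""
    totals = defaultdict(float)
    for diagnoses in deaths:
        for cause, share in physician_shares(diagnoses).items():
            totals[cause] += share
    return {cause: value / len(deaths) for cause, value in totals.items()}


# Worked example from the text: both physicians list prematurity, one adds sepsis.
shares = physician_shares([["prematurity"], ["prematurity", "sepsis"]])
# Hypothetical model output for two deaths.
interva_csmf = csmf_from_interva([{"prematurity": 0.6, "infection": 0.3}, {"infection": 0.8}])
print(shares, interva_csmf)
```

Under either calculation each death contributes to several causes at once, so no diagnosis is discarded in favour of a single consensus cause.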

Evaluating the refined InterVA model

The modified InterVA model was evaluated by comparing population-level CSMFs derived from the two methods of interpretation (physician review and InterVA) for a further and hitherto-untouched 69 VAs from Malawi, 385 from Nepal, and 180 from Zimbabwe. A diversity of data sources was chosen to assess the performance of InterVA in a range of settings. Comparisons of population-level CSMFs were considered paramount as InterVA is intended as a public health tool for health monitoring and program evaluation, rather than for use in clinical settings. Nevertheless, individual level, case-by-case comparisons between physician diagnoses and InterVA were also conducted and the kappa statistic for interrater agreement was calculated to further evaluate the InterVA against the only available alternative method in our populations [23].

Ethical considerations

The Maimwana study (Malawi) received ethical approval from the Malawi National Health Sciences Research Committee; the MIRA Makwanpur, Nepal, study was approved by the Nepal Health Research Council and the Institute of Child Health and Great Ormond Street Hospital ethics committees; and the Zimbabwe Maternal and Perinatal Mortality Study received ethical approval from the Medical Research Council of Zimbabwe (MRCZ/A/1368).


Results

Refining InterVA

InterVA was modified to include two extra cause of death categories: fresh stillbirth and macerated stillbirth. To define the stillbirth diagnoses and differentiate among possible causes of stillbirth and neonatal death, nine further indicators were added to the model. The resulting modifications to the specific indicators and cause of death categories included in InterVA are shown in Table 1. As these are extra entities in the model, they run in parallel to the existing indicators and causes without directly affecting them. To compare the InterVA output and physician diagnoses in the three settings, some rationalization between the physician-assigned causes and the causes obtained from InterVA was necessary; therefore, causes of death not included in the InterVA classification were grouped as “other.” Similarly, infectious causes of neonatal deaths, including sepsis, pneumonia, and meningitis, were grouped together into an “infection” category, since distinguishing them clinically in newborn infants is difficult. There were no cases of neonatal tetanus. The resulting CSMFs for InterVA and physician review of the 100 VA cases from Malawi used to refine the model are shown in Figure 1. In 73% of cases, at least one of the InterVA diagnoses agreed with at least one of the physician diagnoses (kappa 0.60 (95% confidence interval [CI]: 0.57, 0.70)).

Evaluation of the Refined InterVA Model

After refining the model, case-by-case agreement between InterVA and reviewing physician diagnoses, for 69 cases from Malawi, 180 cases from Zimbabwe, and 385 cases from Nepal, was 83% (kappa 0.76 (0.75-0.80)), 71% (kappa 0.41 (0.32-0.51)), and 74% (kappa 0.63 (0.60-0.63)), respectively. CSMFs derived from InterVA and physician review in Malawi, Zimbabwe, and Nepal are shown in Table 3. In Malawi and Zimbabwe, the rank order of causes of death was identical when derived from InterVA or physician review. In Nepal, the most common cause of death according to InterVA was perinatal asphyxia, while it was neonatal infections according to physicians. Prematurity was diagnosed more commonly by InterVA than by physicians in Nepal and Zimbabwe. InterVA detected a higher proportion of neonatal infections than physicians in Zimbabwe, but a lower proportion in Nepal.

Stillbirths

The proportion of total stillbirths identified by the two methods of VA interpretation was similar in all three settings. Data from Malawi and Nepal allowed for a more detailed comparison of the relative proportions of fresh and macerated stillbirths (Table 4).

Multicountry mortality comparison

Considering the above evaluations and taking the refined model to be adequate for the purposes of characterizing cause compositions of neonatal mortality for population health planning and monitoring, a three-country comparison of neonatal cause-specific mortality was conducted (Figure 2). It showed some differences in cause compositions of neonatal deaths, particularly in Zimbabwe compared to the other two settings. In Zimbabwe, the proportions of preterm births and deaths due to infection were higher (44%) than in Malawi (28%) or Nepal (20%).

Discussion

The deadline for the Millennium Development Goals (MDGs) is less than five years away and the need to quantify childhood mortality, understand its causes, and assess the effects of proposed interventions is central to MDG4. Neonatal deaths contribute about 40% of under-5 mortality globally [24]. A recent evaluation of the INDEPTH network of Health and Demographic Surveillance Sites [25] calls for all sites to use InterVA for coding of causes of death, since such approaches represent “the only viable strategy to produce timely and comparable cause of death statistics” [26]. Our study has revised the InterVA method for verbal autopsy to improve its ability to identify causes of stillbirth and newborn death and tested it in three populations.

In this study, physician review was used as the reference standard against which to compare InterVA. Physician review was the only alternative source of cause of death assessment for our study populations. This choice has limitations, however. Physicians are influenced by their experience, perception, and interpretation of local epidemiology [23,27]. Moreover, they mostly use the open history to reach a decision and may not account consistently for all the indicators. Sensitivity and specificity of physician review compared with hospital diagnosis in neonatal populations varied between 64% and 74% in a recent study [20] and concerns about inter- and intrarater reliability are well described [23].

An alternative to physician diagnoses is the use of hospital records. Hospital diagnoses have been used to establish the sensitivity, specificity, and positive predictive values of VA diagnoses [8,12,20]. The main pitfall of hospital diagnoses in developing countries, particularly in rural settings, is that the CSMFs of deaths occurring in hospitals are likely to be different from those in communities [23]. There is therefore the risk of increasing precision of an interpretative method, defined as its ability to reproduce hospital diagnoses in the population where it is tested. This would not necessarily produce results that are correct when used in populations where

access to hospitals and health care is limited. Moreover, the ability to recognize, recall, and report signs of illnesses may be different among hospital users and nonhospital users.

The results of InterVA as compared with physician review showed an almost identical ranking of causes of death. However, differences exist. Some of these differences can be explained by the way the model was constructed. Prematurity, for example, was over-diagnosed by InterVA in Zimbabwe and Nepal. This probably resulted from using a dataset where clinicians were allowed more than a single cause of death to refine InterVA. In fact, when multiple causes of death are allowed, prematurity is more likely to be listed as a coexisting cause of death than when a single cause is selected [28]. The model did not include “other” as a cause of death and would have classified such causes of death in one of the available diagnoses. InterVA over-diagnosed neonatal infections compared with physician review in Zimbabwe, while the opposite happened in Nepal. This inconsistency could be due to the interpretation of signs by different physicians. Alternatively, it could be due to the selection of a priori probabilities. Greater understanding of the way physicians decide to value or ignore signs and symptoms may help in future refinements and evaluations of InterVA.

Stillbirths were included for practical and public health reasons. Although globally there are about 3.2 million stillbirths per year, reliable statistics are lacking [29]. This information gap has to be addressed. About half of perinatal deaths are accounted for by stillbirths [29]. The refinements including stillbirths in the model eliminate the need to differentiate between live births and stillbirths before processing VA data, making the method more suitable for use in large surveys. The separation between fresh and macerated stillbirths is relevant, as prevention strategies are different. The comparisons between InterVA and physician review in Malawi and Nepal suggest that InterVA can differentiate the two categories, although, as with neonatal deaths, there may be room for further refinement.

Case-by-case agreement was moderate in all datasets; however, it was lower for Zimbabwe than for Nepal and Malawi. The new indicators and matrix probabilities have been chosen and modified on the basis of the personal experience of the researchers, and subsequently tested and modeled on a subset of the Malawi data. There is a risk, therefore, that the tool may be too closely modeled on a sub-Saharan African setting (although the results from Nepal do not support this) or on a

Table 4 Fresh/macerated split of stillbirths from Malawi and Nepal based on interpretation by InterVA and physician review

                   Malawi 169 VA                       Nepal 385 VA
                   Physician review  InterVA           Physician review  InterVA
Fresh              23.2              33.4              24.7              39.4
Macerated          4.8               10.9              19.2              5.9
Total Stillbirth   28.0              44.3              44.0              45.2

Figure 1 Cause-specific mortality fractions from InterVA and physician review (PR) for the 100 VA cases from Malawi used to develop and refine the model. Note to Figure 1: Other causes include “jaundice,” “multiple pregnancies,” “maternal causes,” “hypothermia,” and “hypoglycemia.”
[Bar chart not reproduced. Categories: Indeterminate, Congenital malformation, Stillbirth, Perinatal asphyxia, Neonatal infections, Prematurity, Other. Series: InterVA and PR.]

Table 3 Comparison of cause-specific mortality fractions (%) according to InterVA and physician review (PR)

                            Malawi (69 VA)      Nepal (385 VA)      Zimbabwe (180 VA)
Cause                       PR      InterVA     PR      InterVA     PR      InterVA
Stillbirth                  28.0    44.3        44.0    45.2        16.5    20.1
Perinatal asphyxia          18.8    19.4        21.5    26.4        11.3    9.9
Neonatal infections         23.3    26.0        28.0    20.4        30.6    44.5
Prematurity                 10.4    7.5         3.1     6.5         18.2    23.9
Congenital malformations    2.2     1.3         0.8     0.9         1.3     0.4
Other                       13.3    -           1.6     -           9.5     -
Indeterminate               4.0     1.5         1.0     0.6         12.8    1.3


Vergnano et al. Population Health Metrics 2011, 9:48 http://www.pophealthmetrics.com/content/9/1/48


Figure 2 Neonatal death cause compositions from InterVA interpretation of VA data from 169 deaths in Malawi, 180 deaths in Zimbabwe, and 385 deaths in Nepal. (Stacked bar chart, vertical axis 0% to 100%; the values shown with the chart, in %, are listed below.)

Cause                      Zimbabwe   Malawi   Nepal
Prematurity                23.91      6.59     6.50
Neonatal infections        44.47      28.22    20.44
Perinatal asphyxia         9.89       16.17    26.37
Stillbirth                 20.06      46.70    45.23
Congenital malformation    0.36       1.02     0.91
Indeterminate              1.32       1.28     0.56

particular research setup. In addition, the modifications have so far not been put to a panel of experts and may need to be subject to a wider consensus. There may be important epidemiological and social explanations for the difference in the CSMF in Malawi, Zimbabwe, and Nepal. However, even if the interpretation of verbal autopsy data by InterVA was consistent, methodological variability in other aspects of VA may have contributed to the observed cause distribution. Indeed, the close comparability of CSMF between Malawi and Nepal may to some degree reflect common data capture processes that differ from those used in Zimbabwe. It is possible that in Nepal and Malawi, the populations were part of research areas and might have been sensitized to recognize, describe, and recall signs of neonatal diseases, while in Zimbabwe the community was part of a government surveillance system and may have responded differently. Nevertheless, this is a reality of all VA studies conducted in research settings. Use of lay (in Malawi and Nepal) versus health-professional (in Zimbabwe) interviewers and their gender may also have had an impact on data capture. This highlights the need for further methodological research into the effects of other aspects of VA. It is likely that a number of strategies and international collaborations will be necessary to ensure the success of such investigations.

The modified version of InterVA for stillbirths and neonatal deaths produced plausible results when compared with physicians’ opinions but had the advantage of being completely internally consistent, allowing standardized comparisons of data from different countries. Ultimately, standardized methods are essential and their application and evaluation in a wide range of settings is encouraged. Through wider application, the strengths and weaknesses of InterVA, and VA in general, will become more apparent, thereby better informing the application and public health utility of surrogate methods for measuring mortality in the absence of vital registration systems.

Acknowledgements Maimwana trial funding in Malawi was provided by Saving Newborn Lives/ Save the Children, with additional funding from the UK Department for International Development (DFID), The Wellcome Trust and UNICEF Malawi.


We thank the Maimwana office and field staff who made this project happen: Tambosi Phiri, Mikey Rosato, Delia Chikuse, Levie Kamtambe, Queen Sara Soho, Joseph Jaffu, Jeremia Mvula and the Mchinji community, without which the study would not have been possible, the Mchinji District Health Management Team and district Executive Committee, and traditional leaders working in the district that supported the project. Prof. Marie-Louise Newell also helped set the study up and advised on its development. Makwanpur trial funding was provided by the UK Department for International Development, with additional support from the Division of Child and Adolescent Health, WHO, UNICEF, and the UN Fund for Population Activities. We thank the MIRA team in Makwanpur and Kathmandu, project managers Bhim Shrestha and Kirti Tumbahhamphe, Drs. S Manandhar and A Ojha, who read and interpreted the verbal-autopsy questionnaires, the communities in Makwanpur district who allowed the study to take place, and the Makwanpur District Development Committee and its members. Data from Zimbabwe were available thanks to the support and funding provided by the UK Department for International Development (DfID), the World Health Organization (WHO), the United Nations Fund for Population Activities (UNFPA), and the United Nations Children’s Fund (UNICEF). We are particularly grateful to Gwendoline Kandawasvika for assistance in data handling.

Author details
1 Centre for International Health and Development, UCL, Institute of Child Health 30 Guilford St, London WC1N 1EH, UK.
2 Umeå Centre for Global Health Research, Division of Epidemiology and Global Health, Department of Public Health and Clinical Medicine, Umeå University, SE 901-85 Sweden.
3 Baylor College of Medicine Children’s Foundation Malawi; Private Bag 397, Lilongwe, Malawi.
4 Kamuzu Central Hospital Lilongwe, Department of Pediatrics, PO Box 149, Lilongwe, Malawi.
5 Mother and Infant Research Activities (MIRA), Kathmandu Medical College G.P.O. Box 921, Kathmandu, Nepal.
6 College of Health Science, University of Zimbabwe, Department of Obstetrics and Gynaecology, PO Box A178, Harare, Zimbabwe.

Authors’ contributions
SV contributed to the setup of the Maimwana study in Malawi and to the formulation of the VA questionnaire used in Malawi and was involved in the adaptation and evaluation of the existing InterVA method, processing VA data from Malawi and Nepal, and drafting and reviewing the manuscript. EF was involved in the initial development and testing of InterVA refinements and evaluation of the model for the current study, processing the Zimbabwe VA data, and drafting and reviewing the manuscript. DO contributed to the setup of the Makwanpur study in Nepal and to the formulation of the VA questionnaire in Nepal and Malawi. He provided the Nepal data and contributed to the interpretation of the results and revisions of the manuscript. PNK was a PI of the Maimwana study and interpreted the VA questionnaires from Malawi. CM was a PI of the Maimwana study and interpreted the VA questionnaires from Malawi. DSM was a PI of the Makwanpur study, interpreted the VA questionnaires from Malawi, and contributed to the final draft of this manuscript. SPM was PI of the maternal and neonatal mortality survey in Zimbabwe and contributed to the interpretation of the study results. PB devised the InterVA method to interpret VA and contributed to the interpretation of the results and revisions of the manuscript. SL contributed to the setup of the Maimwana study in Malawi, to the formulation of the VA questionnaires used in Malawi, and to the final draft of this manuscript. AC conceived the Maimwana and Makwanpur studies and contributed to the interpretation of the results and revisions of the manuscript. All authors read and approved the final draft of this manuscript.

Competing interests
EF & PB contributed to this study with support from FAS, the Swedish Council for Working Life and Social Research (grant 2006-1512). DO is supported by a Wellcome Trust Fellowship (081052/Z/06/Z).

Received: 18 November 2010 Accepted: 5 August 2011 Published: 5 August 2011

References
1. Murray CJ, Lopez AD, Wibulpolprasert S: Monitoring global health: time for new solutions. BMJ 2004, 329:1096-1100.
2. Mathers CD, Fat DM, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bull World Health Organ 2005, 83:171-177.
3. Setel PW, Macfarlane SB, Szreter S, Mikkelsen L, Jha P, Stout S, et al: A scandal of invisibility: making everyone count by counting everyone. Lancet 2007.
4. Setel PW, Sankoh O, Rao C, Velkoff VA, Mathers C, Gonghuan Y, et al: Sample registration of vital events with verbal autopsy: a renewed commitment to measuring and monitoring vital statistics. Bull World Health Organ 2005, 83:611-617.
5. Snow B, Marsh K: How Useful are Verbal Autopsies to Estimate Childhood Causes of Death? Health Policy and Planning 1992, 7:22-29.
6. Quigley MA, Chandramohan D, Rodrigues LC: Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. Int J Epidemiol 1999, 28:1081-1087.
7. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245.
8. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998, 3:436-446.
9. Freeman JV, Christian P, Khatry SK, Adhikari RK, LeClerq SC, Katz J, et al: Evaluation of neonatal verbal autopsy using physician review versus algorithm-based cause-of-death assignment in rural Nepal. Paediatr Perinat Epidemiol 2005, 19:323-331.
10. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37.
11. Quigley MA, Armstrong Schellenberg JR, Snow RW: Algorithms for verbal autopsies: a validation study in Kenyan children. Bull World Health Organ 1996, 74:147-154.
12. Kalter HD, Hossain M, Burnham G, Khan NZ, Saha SK, Ali MA, et al: Validation of caregiver interviews to diagnose common causes of severe neonatal illness. Paediatr Perinat Epidemiol 1999, 13:99-113.
13. Marsh DR, Sadruddin S, Fikree FF, Krishnan C, Darmstadt GL: Validation of verbal autopsy to determine the cause of 137 neonatal deaths in Karachi, Pakistan. Paediatr Perinat Epidemiol 2003, 17:132-142.
14. Fantahun M, Fottrell E, Berhane Y, Wall S, Hogberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84:204-210.
15. Fottrell E, Byass P, Ouedraogo TW, Tamini C, Gbangou A, Sombie I, et al: Revealing the burden of maternal mortality: a probabilistic model for determining pregnancy-related causes of death from verbal autopsies. Popul Health Metr 2007, 5:1.
16. Bayes T: An essay towards solving a problem in the doctrine of chances. 1763. MD Comput 1991, 8:157-171.
17. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, et al: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31.
18. Lewycka S, Mwansambo C, Kazembe PN, Phiri T, Mganga A, Rosato M, et al: A cluster randomised controlled trial of the community effectiveness of two interventions in rural Malawi to improve health care and to reduce maternal, newborn and infant mortality. Trials 2010, 11.
19. Verbal Autopsy Standards: Ascertaining and Attributing Cause of Death. WHO France, WHO; 2007.
20. Edmond KM, Quigley MA, Zandoh C, Danso S, Hurt C, Owusu AS, et al: Diagnostic accuracy of verbal autopsies in ascertaining the causes of stillbirths and neonatal deaths in rural Ghana. Paediatr Perinat Epidemiol 2008, 22:417-429.
21. Manandhar DS, Osrin D, Shrestha BP, Mesko N, Morrison J, Tumbahangphe KM, et al: Effect of a participatory intervention with women’s groups on birth outcomes in Nepal: cluster-randomised controlled trial. Lancet 2004, 364:970-979.
22. Munjanja SP: Ministry of Health and Child Welfare, Zimbabwe: Maternal and perinatal mortality study 2007. 2009 [http://www.unicef.org/zimbabwe/ZMPMS_report.pdf], Unicef.
23. Fottrell E, Byass P: Verbal autopsy: methods in transition. Epidemiol Rev 2010, 32:38-55.
24. Lawn JE, Cousens S, Zupan J: 4 million neonatal deaths: when? Where? Why? Lancet 2005, 365:891-900.
25. Indepth: International network of field sites with continuous demographic evaluation of populations and their health in developing countries. 2010 [http://www.indepth-network.org].
26. Kinyanjui S, Timaeus IM: The international network for the demographic evaluation of populations and their health (INDEPTH), the importance of core support. SIDA, Stockholm; 2010.
27. Coldham C, Ross D, Quigley M, Segura Z, Chandramohan D: Prospective validation of a standardized questionnaire for estimating childhood mortality and morbidity due to pneumonia and diarrhoea. Trop Med Int Health 2000, 5:134-144.
28. Lee AC, Mullany LC, Tielsch JM, Katz J, Khatry SK, LeClerq SC, et al: Verbal autopsy methods to ascertain birth asphyxia deaths in a community-based setting in southern Nepal. Pediatrics 2008, 121:e1372-e1380.
29. Cousens S, Blencowe H, Stanton C, Chou D, Ahmed S, Steinhardt L, et al: National, regional, and worldwide estimates of stillbirth rates in 2009 with trends since 1995: a systematic analysis. Lancet 2011, 377(9774):1319-1330.
30. Mwale MW: Infant and Child Mortality. In Malawi Demographic Health and Survey. Edited by: National Statistic Office M, Macro ORC. Calverton, Maryland: National Statistic Office, Malawi; ORC Macro; 2005:123-132.
31. Ministry of Health and Population KN: Infant and Child Mortality. Nepal Demographic Health Survey, 2006 Katmandu, Nepal: New ERA and Macro International; 2007, 123-130.
32. Central Statistical Office: Early Childhood Mortality. Zimbabwe Demographic Health Survey 2005-2006 Calverton Maryland: CSO and Macro International Inc.; 2007, 109-117.

doi:10.1186/1478-7954-9-48
Cite this article as: Vergnano et al.: Adaptation of a probabilistic method (InterVA) of verbal autopsy to improve the interpretation of cause of stillbirth and neonatal death in Malawi, Nepal, and Zimbabwe. Population Health Metrics 2011 9:48.



Bauni et al. Population Health Metrics 2011, 9:49 http://www.pophealthmetrics.com/content/9/1/49

RESEARCH

Open Access

Validating physician-certified verbal autopsy and probabilistic modeling (InterVA) approaches to verbal autopsy interpretation using hospital causes of adult deaths

Evasius Bauni1*, Carolyne Ndila1, George Mochamah1, Gideon Nyutu1, Lena Matata1, Charles Ondieki4, Barbara Mambo4, Maureen Mutinda4, Benjamin Tsofa1,4, Eric Maitha4, Anthony Etyang1 and Thomas N Williams1,2,3,5

Abstract

Background: The most common method for determining cause of death is certification by physicians based either on available medical records, or where such data are not available, through verbal autopsy (VA). The physician-certification approach is costly and inconvenient; however, recent work shows the potential of a computer-based probabilistic model (InterVA) to interpret verbal autopsy data in a more convenient, consistent, and rapid way. In this study we validate separately both physician-certified verbal autopsy (PCVA) and the InterVA probabilistic model against hospital cause of death (HCOD) in adults dying in a district hospital on the coast of Kenya.

Methods: Between March 2007 and June 2010, VA interviews were conducted for 145 adult deaths that occurred at Kilifi District Hospital. The VA data were reviewed by a physician and the cause of death established. A range of indicators (including age, gender, physical signs and symptoms, pregnancy status, medical history, and the circumstances of death) from the VA forms were included in the InterVA for interpretation. Cause-specific mortality fractions (CSMF), Cohen’s kappa (κ) statistic, receiver operating characteristic (ROC) curves, sensitivity, specificity, and positive predictive values were applied to compare agreement between PCVA, InterVA, and HCOD.

Results: HCOD, InterVA, and PCVA yielded the same top five underlying causes of adult deaths. The InterVA overestimated tuberculosis as a cause of death compared to the HCOD. On the other hand, PCVA overestimated diabetes. Overall, CSMF for the five major cause groups by the InterVA, PCVA, and HCOD were 70%, 65%, and 60%, respectively. PCVA versus HCOD yielded a higher kappa value (κ = 0.52, 95% confidence interval [CI]: 0.48, 0.54) than the InterVA versus HCOD, which yielded a kappa (κ) value of 0.32 (95% CI: 0.30, 0.38). Overall, κ agreement across the three methods was 0.41 (95% CI: 0.37, 0.48). The areas under the ROC curves were 0.82 for InterVA and 0.88 for PCVA. The observed sensitivities and specificities across the five major causes of death varied from 43% to 100% and 87% to 99%, respectively, for the InterVA/PCVA against the HCOD.

Conclusion: Both the InterVA and PCVA compared well with the HCOD at a population level and determined the top five underlying causes of death in the rural community of Kilifi. We hope that our study, albeit small, provides new and useful data that will stimulate further definitive work on methods of interpreting VA data.

Keywords: verbal autopsy, InterVA, validation, cause-specific mortality fraction, kappa, ROC

* Correspondence: ebauni@kilifi.kemri-wellcome.org
1 Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, PO Box 230 Kilifi 80108, Kenya
Full list of author information is available at the end of the article

© 2011 Bauni et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background

Vital registration data in developing countries are incomplete and capture few physician-certified deaths [1]. Nevertheless, any meaningful health intervention policy or program must be informed by the causes of illness and death that are of greatest importance locally. Verbal autopsy (VA), the interviewing of family members or caregivers about the circumstances of death after the event, offers one approach to the supplementation of this scarce but useful information. The government of Kenya suggested that the Kilifi, Nairobi, and Kisumu Demographic Surveillance System (DSS) sites use this approach to supplement national cause of death data. To allow data comparability, the latest version of the World Health Organization (WHO) Sample Vital Registration with Verbal Autopsy (SAVVY) tools were recommended for the sites [2]. The Kilifi Health Demographic Surveillance System (KHDSS) covers an area of 900 km² and a resident population of 250,000. Approximately 80% of patients admitted to Kilifi District Hospital (KDH) reside in this area. The population register is updated through re-enumeration rounds conducted every 3 to 4 months, and 1200 to 1500 deaths within the resident population are identified every year. More than 60% of these deaths occur outside the hospital, where the causes of death are rarely recorded. Through collaboration with the Ministry of Health (MOH) at a local level, the KHDSS started collecting verbal autopsy data in 2008 with a view to establishing the underlying causes of death for the majority who die at home. Key sensitization messages were jointly developed and passed on to the community by staff working for both the KHDSS and the KDH. VA sensitization has subsequently become a routine process at the KDH and its surrounding health facilities.

The Kilifi integrated data managing system (KIDMS) is a computer-based system that links the KHDSS, pediatric, adult, and maternity ward surveillance systems in real time through unique personal identifiers (PIDs). Deaths captured through any of these surveillance systems were recorded in a single database and classified as neonates (0 to 27 days old), children (28 days to 14 years old) or adolescents and adults (15+ years old). The system generated the corresponding VA instruments and homestead maps for field interviews. Completed VA forms were edited, and the data were entered into a computer database for subsequent coding by a physician. The main aim of the current study was to compare, at the population level, the distribution of underlying causes of adult deaths that are ascribed to a short list of 35 of the most common causes of death when using physician-certified verbal autopsy (PCVA) and the probabilistic InterVA model that are commonly used to interpret VA data with the distribution ascribed on the basis of physician diagnosis in a hospital, which we treat as our “gold standard.”

Materials and methods

Study area and population

The KHDSS, first established in October 2000, serves as a framework for population-based epidemiological studies of diseases of local importance, monitors mortality trends, and is used to evaluate the impact of interventions of national public health importance. The area was initially mapped and all homesteads plotted using Garmin eTrex Venture® hand-held geographical positioning system (GPS) units with an accuracy of three meters. The resident population was enumerated and individual details of age, sex, ethnicity, location/sub-location, and sleeping building unit (BU) of residence were recorded. Thereafter, births, deaths, in-migration and out-migration events, pregnancies, and new or demolished BUs were updated through census rounds conducted approximately three times a year. Cause of death data have been explored using the latest version of the WHO SAVVY tools since 2008. The distribution of the adult deaths included in this study, which compares closely with the overall distribution of deaths from March 2007 to June 2010, is shown in Figure 1. The KHDSS area covers almost the whole of Kilifi district, making it possible to generalize the results of this study to the community living within the district.

WHO SAVVY tools

The WHO SAVVY tools include three verbal autopsy questionnaires that are used to collect data on neonates (0 to 27 days old), children (28 days to 14 years old), and adolescents or adults (15+ years). Each questionnaire includes a short open narrative section followed by a series of closed questions. The narrative briefly explains the circumstances of death, while the closed questions provide details of specific signs, symptoms, and conditions. Introduction of the VA tools was preceded by a number of focus group discussions with community members to identify appropriate local terms for physical signs and symptoms and translate the forms into the local languages Giriama and Kiswahili. These translations were validated by back-translation by two independent teams of translators. Each interview took roughly 30 to 45 minutes to administer, 5 to 10 minutes to edit, and 5 to 10 minutes to enter into the computer.

Physician certification of VA questionnaires

A computer-based work management system (written in FileMaker Pro™ V9.0; FileMaker, USA) was developed to capture deaths from the KIDMS, calculate age, print the corresponding VA instrument and homestead map, and provide data entry and coding screens. Physicians trained in the use of the WHO 10th revision of the International Classification of Diseases (ICD-10) list [3] independently logged into the coding screen to review the VA questionnaires and determined both the immediate and the underlying cause of death. Using PID numbers for residents of KHDSS, the system compared the results of the two physicians to ascertain the cause of death in cases where there was agreement, identified disagreements for consensus, and coded the underlying cause of death according to the core three-character code, as recommended by the ICD-10 [3]. On average, each review took 15 to 20 minutes.

Figure 1 The distribution of adult deaths, March 2007-June 2010. The figure shows the distribution of deaths used in this study. The red dots represent the 145 deaths that occurred in the Kilifi district hospital, and the white dots represent the overall death distribution between March 2007 and June 2010. The KHDSS area covers almost the entire district of Kilifi, making it sensible to generalize the results for community members living in the district. (Map legend: died in hospital, yes or no; Kilifi District Hospital; health facilities; roads; DSS area; Kilifi District.)

The probabilistic InterVA model

The InterVA (Interpreting Verbal Autopsy) model is a probabilistic model based on Bayes’ theorem that can be used to determine the cause of death for each case by processing successive indicators to generate up to three likely causes of death for each case. The model was developed using an expert panel and was deliberately designed to be generic and not context dependent and to produce relatively broad cause of death categories. The development and details of the InterVA model have been described in detail previously [4,5]. The model is freely available in the public domain (http://www.interva.net/). We recategorized our data to compare with the InterVA sublist of 35 causes of death. The input data for the model include signs, symptoms, medical history, and circumstances (injury, drowning, and accident) derived from the closed questions of the VA questionnaires. Adaptations made to the data to fit the model included compiling the same VA data into an input file for the InterVA model and processing it into cause of death data. The model also expects an input of “high” or “low” to reflect the local prevalence of two specific causes that often vary by more than an order of magnitude between settings: HIV and malaria, which in this study were set to “high” and “high,” respectively. Data on some InterVA indicators were not available in the WHO verbal autopsy tool and so remained null (see Additional file 1). It is also worth noting that the InterVA batch file was incompatible with recent versions of Microsoft Office™ (i.e., 2003/2007 or above), so we had to save our batch MS Excel file to a lower version. Data were transformed using both STATA Version 11 (Timberlake, USA) and SAS® 9.2 (SAS Institute, Inc.) software.

Hospital cause of death: the gold standard

The cause of death at KDH (HCOD) was determined on the basis of high-quality clinical and laboratory data. The KEMRI-Wellcome Trust Research laboratories supporting the Kilifi District Hospital are Good Clinical Laboratory Practice (GCLP) accredited and are audited by international regulatory bodies on an annual basis. Patients admitted to KDH were examined according to a fixed protocol and samples were collected for malaria microscopy, hematology, and bacteriology. Other assays were performed as indicated by the clinical presentation of the patient. For those who died, the cause of death was determined by considering all the available evidence. The clinical data were captured online in real time using a standard questionnaire completed by the physician during the course of admission. The final diagnosis at death was selected from a modified, in-built ICD-10 list that included 590 diagnoses. For the purposes of this study, the hospital diagnosis that was based on standard guidelines (full medical history) and reflected the best judgment of the attending physician, substantiated by relevant radiological or laboratory investigations, was used as the gold standard.

Ethical approval

The study was approved by the KEMRI/Wellcome Trust Kilifi - Scientific Coordinating Committee (SCC), the KEMRI Scientific Steering Committee (SSC), and the KEMRI/National Ethical Review Committee (ERC) in Nairobi. Community sensitization was conducted both by the Ministry of Health and the local community leaders. In addition, interviewers obtained informed consent from appropriate respondents.

Data management and statistical analysis

We used HCOD as the gold standard for validating both PCVA and the InterVA model. Although the HCOD could be attributed to a maximum of two causes, we only considered the primary cause of death for the purposes of this comparative study. Where more than one cause was given, we selected the underlying cause of death (UCOD) as our unit of comparison. While the model is based on experts’ opinion, the PCVA and HCOD are based on the ICD-10 guidelines. To enable comparisons in the context of a wide range of causes of death from the three methods, we first had to recode the data (see Additional file 2). Diagnoses that were included in all three methods (such as malaria, meningitis, and tuberculosis) retained their initial codes while lower-frequency diagnoses were recoded according to the more restricted range of classifications included in the InterVA model. For example, deaths attributed to “asthma” or “bronchitis” by HCOD or PCVA were recoded as “chronic respiratory diseases,” while “rabies” and “tetanus” were recoded under “other acute infections.” Similarly, causes such as “stroke,” “hypertension,” and “all heart conditions” were recoded as “cardiovascular diseases.”


In situations in which there was no direct correlation, we had to recategorize the causes of death into broader cause groups. For instance, the InterVA model has two broad categories of bloody and nonbloody diarrhea for classifying all cases of diarrheal diseases. However, despite the lack of microbiological evidence in verbal autopsy, the PCVA coded causes such as shigellosis and gastroenteritis. Such causes were therefore recoded into one broad category of diarrhea/gastroenteritis for comparison. Another category, “other acute infections,” covered conditions with fewer symptoms and/or nonspecific criteria for arriving at a particular diagnosis, mostly termed septicemia. The model did not distinguish pneumonia from sepsis and hence categorized both as a single COD of pneumonia/sepsis, but the physicians coded them separately. Pneumonia/sepsis was retained as a broad category, and sepsis alone was recategorized as “other acute infection.” While physicians could distinguish tuberculosis (TB) from HIV using the ICD-10 list, the InterVA model assigns TB and HIV as separate entities, making direct comparisons difficult in situations where TB and HIV occur together. TB cases reported in this current study, therefore, were cases that the physicians diagnosed as TB only.

The main causes of death determined by both InterVA and PCVA were compared against the corresponding HCOD (see Figure 2). Agreement was recorded as “1” where two or three methods agreed and “0” for no agreement. Cause-specific mortality fractions (CSMF) were used to measure agreement at the population level, and a receiver operating characteristic (ROC) curve [6] was used to measure the overall diagnostic performance of the methods. Case-by-case agreement between the methods was measured by Cohen’s kappa (κ) statistic [7], sensitivity, specificity, and positive predictive values.
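The recoding and population-level comparison described above can be illustrated with a minimal Python sketch. It is not the study's code: the authors report using STATA, SAS, and R, and their full recode table is in Additional file 2, which is not reproduced here. The mapping below contains only the examples named in the text, and the function names (`recode`, `csmf`, `agreement_flags`) and the toy death lists are hypothetical.

```python
from collections import Counter

# Hypothetical recode table built only from the examples named in the text;
# the study's complete mapping (Additional file 2) is not reproduced here.
RECODE = {
    "asthma": "chronic respiratory diseases",
    "bronchitis": "chronic respiratory diseases",
    "rabies": "other acute infections",
    "tetanus": "other acute infections",
    "sepsis": "other acute infections",
    "stroke": "cardiovascular diseases",
    "hypertension": "cardiovascular diseases",
    "shigellosis": "diarrhea/gastroenteritis",
    "gastroenteritis": "diarrhea/gastroenteritis",
}


def recode(cause):
    """Map a specific diagnosis to its broader group; other causes pass through."""
    key = cause.strip().lower()
    return RECODE.get(key, key)


def csmf(causes):
    """Cause-specific mortality fractions: share of all deaths per recoded cause."""
    if not causes:
        raise ValueError("at least one death is required")
    counts = Counter(recode(c) for c in causes)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


def agreement_flags(method_a, method_b):
    """1 where two methods give the same recoded cause for a death, else 0."""
    return [int(recode(a) == recode(b)) for a, b in zip(method_a, method_b)]


# Made-up toy data for illustration only (not study data).
hcod = ["Malaria", "Stroke", "Asthma", "Tuberculosis"]
pcva = ["Malaria", "Hypertension", "Bronchitis", "Meningitis"]

print(csmf(hcod))
print(agreement_flags(hcod, pcva))
```

Recoding both methods before comparing them means that, in this toy example, "Stroke" and "Hypertension" count as agreement because both fall under cardiovascular diseases.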

Equation 1: Kappa measure of agreement

Where P(A) was the proportion of times the raters agreed, and P(E) was the proportion of times the raters were expected to agree by chance alone. Complete agreement corresponds to a value of 1; complete disagreement (i.e., purely random coincidences of rates) corresponds to a value of 0. A negative value of kappa would mean negative agreement. We used the following kappa (κ) scale to rate the strength of agreement, as described previously [8]: a κ < 0.21 was considered poor, a κ between 0.21 and 0.40 fair, a κ between 0.41 and 0.60 moderate, a κ between 0.61 and 0.80 good, and a κ > 0.80 very good.

Receiver operator characteristics (ROC) curve

The area under the receiver operator characteristics (ROC) curve was calculated to measure the overall diagnostic performance (correctly diagnosing all the diseases) for both PCVA and InterVA against HCOD. For a method to be highly sensitive and specific, the area under the curve (AUC) should be close to one. The closer the curve follows the left-hand border and the top border of the ROC space, the more accurate the method. We considered the performance of our methods to be adequate if the area under the ROC curve exceeded 0.75.

Validity measures: sensitivity and specificity

Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with their 95% confidence intervals (CI) for the top five underlying causes of death were computed for PCVA and the InterVA model against the HCOD. The formulas for this calculation were defined as:

Sensitivity = TP/(TP + FN);
Specificity = TN/(FP + TN);
PPV = TP/(TP + FP);
NPV = TN/(FN + TN)

Where: TP = true positive, FP = false positive, TN = true negative, FN = false negative. We considered validity of a method to be adequate if the sensitivity and specificity exceeded 60% and 85%, respectively. All analyses were carried out using R version 2.12.0 (http://www.r-project.org/).

Cause-specific mortality fractions

Cause-specific mortality fractions (CSMF) were determined as the proportion of all deaths that were attributable to a specific cause across the HCOD, the InterVA model, and the PCVA.
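As an illustration of the cause-specific mortality fraction defined in the Methods (the proportion of all deaths attributed to a given cause), the following minimal Python sketch tabulates CSMFs from a list of assigned causes. The function name `csmf` and the cause labels are assumptions made for this example; the study itself reports using R, and this is not the authors' code. The only figures taken from the text are the 38 of 145 deaths that the InterVA model attributed to HIV/AIDS.

```python
from collections import Counter


def csmf(assigned_causes):
    """Return {cause: fraction of all deaths attributed to that cause}."""
    n = len(assigned_causes)
    if n == 0:
        raise ValueError("at least one death is required")
    counts = Counter(assigned_causes)
    return {cause: count / n for cause, count in counts.items()}


# 38 of 145 deaths attributed to HIV/AIDS by the InterVA model (from the text);
# the remaining 107 deaths are lumped into a placeholder "Other" label here.
interva_causes = ["HIV/AIDS"] * 38 + ["Other"] * 107
fractions = csmf(interva_causes)
hiv_percent = round(100 * fractions["HIV/AIDS"], 1)  # 26.2
```

Because every death carries exactly one label in this sketch, the fractions sum to one; deaths with several assigned causes would need an explicit rule for choosing or weighting them.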

Cohen’s kappa statistic (κ)

We used Cohen’s kappa statistic (κ) to measure the level of agreement between the InterVA model or PCVA and the HCOD (the gold standard) for the underlying causes of death. The kappa measure of agreement was stated as:

$$\kappa = \frac{P(A) - P(E)}{1 - P(E)} \tag{1}$$
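The kappa formula in Equation 1 and the strength-of-agreement scale quoted from [8] can be sketched in a few lines of Python. This is a hedged illustration only: the function names and the eight example deaths are hypothetical, and the study's own estimates (including confidence intervals and the multirater kappa) were produced in R by methods not shown in the text.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equal-length sequences of cause labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of deaths")
    n = len(rater_a)
    # P(A): observed proportion of deaths on which the two raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # P(E): agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_a - p_e) / (1 - p_e)


def strength_of_agreement(kappa):
    """Map kappa to the verbal scale used in the paper (< 0.21 poor ... > 0.80 very good)."""
    if kappa < 0.21:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical labels for eight deaths (not study data).
hcod = ["HIV", "HIV", "TB", "CVD", "CVD", "Diabetes", "HIV", "Meningitis"]
va = ["HIV", "TB", "TB", "CVD", "HIV", "Diabetes", "HIV", "Other"]
example_kappa = cohen_kappa(hcod, va)  # P(A) = 0.625, P(E) = 0.21875
example_label = strength_of_agreement(example_kappa)
```

In this toy example the two raters agree on five of eight deaths, which corresponds to a kappa of 0.52 once chance agreement is removed.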

Results

The KHDSS recorded 438 adult deaths which occurred in a hospital between March 2007 and June 2010. The current study included only those deaths (145) that occurred


[Figure 2 flowchart: verbal autopsy data (15+ years) and hospital deaths; hospital ICD-10 diagnosis, physician-certified VA, and InterVA data modelling (InterVA input); eligibility: PCVA COD + HCOD + InterVA output; N = 145.]

Figure 2 Selection of adult deaths for inclusion in study conducted to validate both physician-certified verbal autopsy (PCVA) and the InterVA model against the hospital cause of death (HCOD). The figure shows the validation study design and the selection process of the adult deaths. The underlying cause of death determined by both InterVA and PCVA were compared against the corresponding HCOD.

methods, were HIV/AIDS-related, tuberculosis (pulmonary), meningitis, cardiovascular diseases, and diabetes. The InterVA model overreported tuberculosis as a cause of death compared to the other two methods, while PCVA overestimated diabetes. The CSMFs obtained using the InterVA model and PCVA were compared separately with those obtained from the HCOD (Figure 3). The CSMFs obtained were within ± 5% of those derived using the gold standard for the four most common causes of death (HIV-related, cardiovascular diseases, meningitis, and diabetes) and were within ± 8% of the gold standard value for tuberculosis (pulmonary). The InterVA model attributed 38/145 (26.2%) deaths to HIV/AIDS, whereas the physicians and the HCOD attributed 36/145 (24.8%) and 33/145 deaths (22.7%), respectively. The InterVA model, PCVA, and HCOD all estimated similar CSMFs for cardiovascular diseases. On the other hand, PCVA attributed 14 (9.6%) deaths to diabetes, while the InterVA model and HCOD attributed 6 (4.1%) deaths and 8 deaths (5.5%),

in the hospital (and their VA data coded by a physician). Deaths not meeting these criteria were dropped from the analysis. The mean age at death was 55 years (standard deviation 20 years); 81 (56%) were males and 64 (44%) were females. The 145 deaths were successfully compared with the PCVA and the InterVA model. Ninety-one cases (63%) had two medically confirmed causes of death, giving a total of 236 HCOD. In the InterVA model output, 118 cases (81%) were assigned a single cause of death, 20 cases (14%) were assigned two causes of death, 2 cases (1%) were assigned three causes of death, and 5 cases (4%) were assigned as indeterminate. When the most probable cause of death assigned by the model disagreed with the HCOD, we considered both the second and third most likely causes of death, although such cases were few (only eight cases). On the basis of PCVA, a single cause of death was assigned in 143 (99%) cases, and 2 (1%) cases were coded as indeterminate. The top five causes of death, which accounted for more than 60% of all deaths determined by the three


[Figure 3 stacked bar chart: y-axis CSMF (0% to 100%); bars for HCOD, PCVACOD, and InterVACOD; segments: HIV/AIDS related death, Cardiovascular, Diabetes, Meningitis, Tuberculosis (pulmonary), Others.]

Figure 3 Cause-specific mortality fractions for 145 adult deaths. The figure shows cause-specific mortality fractions for 145 deaths derived from hospital causes of death, verbal autopsies interpreted by physician, and by the InterVA model. The CSMFs obtained were within ± 5% of those derived using the gold standard for the four most common causes of death (HIV-related, cardiovascular diseases, meningitis, and diabetes) and were within ± 8% of the gold standard value for tuberculosis (pulmonary).

0.48, 0.54), while InterVA versus HCOD yielded a kappa (κ) value of 0.32 (95% CI: 0.30, 0.38). The overall diagnostic performance of the InterVA model and of PCVA is shown in Figures 4 and 5, respectively. The false positive rate (1 − specificity) is plotted on the x-axis and the true positive rate (sensitivity) on the y-axis. The areas under the curve (AUC) for InterVA (0.82) and PCVA (0.88) were quite good, being close to the ideal value of 1.0. The results for sensitivities, specificities, PPV, and NPV with their 95% CIs of the InterVA model and PCVA in comparison to HCOD for the five most common causes of death are presented in Table 2. The

respectively. Furthermore, the InterVA model assigned three times as many deaths to tuberculosis (pulmonary) as HCOD. The InterVA model, PCVA, and HCOD attributed 9 (6.2%), 5 (3.4%), and 7 (4.8%) deaths respectively to meningitis. The kappa (κ) indicators for method agreement are shown in Table 1. The overall multirater kappa value across all three methods was 0.41 (95% CI: 0.37, 0.48), with agreement being better for females (κ = 0.48, 95% CI: 0.44, 0.52) than for males (κ = 0.35, 95% CI: 0.32, 0.38). Agreement between each method and the gold standard was fairly good (most κ > 0.40). PCVA versus HCOD yielded a higher kappa value (κ = 0.52, 95% CI:

Table 1 Kappa (κ) statistics for agreement of the three methods among the 145 adult deaths

| Methods | κ statistic (total) (N = 145), Kappa (95% CI) | κ statistic (males) (N = 81), Kappa (95% CI) | κ statistic (females) (N = 64), Kappa (95% CI) |
| --- | --- | --- | --- |
| InterVA versus HCOD | 0.32 (0.30-0.38) | 0.27 (0.22-0.30) | 0.38 (0.32-0.41) |
| InterVA versus PCVA | 0.42 (0.37-0.48) | 0.33 (0.30-0.37) | 0.52 (0.47-0.54) |
| PCVA versus HCOD | 0.52 (0.48-0.54) | 0.47 (0.44-0.50) | 0.57 (0.54-0.60) |
| InterVA + PCVA + HCOD | 0.41 (0.37-0.48) | 0.35 (0.32-0.38) | 0.48 (0.44-0.52) |

InterVA: probabilistic InterVA model, PCVA: physician-certified verbal autopsy, HCOD: hospital cause of death as the gold standard, CI: confidence interval.

[Figure 4 plot: ROC curve; x-axis 1 − specificity (0.0 to 1.0), y-axis sensitivity (0.0 to 1.0).]

Figure 4 Receiver operator characteristic (ROC) curve for the InterVA model. The figure shows the area under the receiver operator characteristic (ROC) curve for InterVA against HCOD. The area under the curve captures the relationship between the sensitivity and specificity of the InterVA method and is therefore indicative of how the method performed with respect to HCOD. The overall diagnostic measure for InterVA model was 0.82, indicating good diagnostic performance of the method. Also, the curve follows the left-hand border and then the top border of the ROC space, indicating an acceptable level of accuracy.

HCOD to provide data on the performance of both PCVA and the InterVA model. The model is based on certainty; hence, the effect of causal relationship is difficult to address in our context. Thus, conceptual classification that reflects the real public health issues is as appropriate as is the ICD-10 coding. Our results are consistent with those of previous studies showing that the InterVA model and PCVA are valid tools to ascertain causes of death [5,16]. The CSMFs obtained were within 5% of the gold standard for four leading causes of death (HIV-related, cardiovascular diseases, meningitis, and diabetes) and were within 8% of the gold standard value for tuberculosis (pulmonary). Misclassification had a greater effect on the reported CSMF estimates (see Additional files 3 and 4). It appears that the misclassification by the model gives a different picture regarding deaths due to HIV and

observed sensitivities and specificities for both methods across the five major causes of death varied from 43% to 100% and 87% to 99%, respectively. The observed sensitivity value for meningitis for both PCVA and the InterVA model was relatively low (43%) as compared to the cut-off value of 60%.

Discussion

Although a number of previous studies have been conducted with a view to validating the use of verbal autopsy as a means of determining the cause of death in adults [9-15], to our knowledge this is the first report that has aimed to validate data collected using the new WHO international standard verbal autopsy adult questionnaire against HCOD as the gold standard. The two previous validation studies [5,16] compared the InterVA model against PCVA. We take this process a step further by validating both methods against the standard

[Figure 5 plot: ROC curve; x-axis 1 − specificity (0.0 to 1.0), y-axis sensitivity (0.0 to 1.0).]

Figure 5 Receiver operator characteristic (ROC) curve for PCVA. The figure shows the area under the receiver operator characteristic (ROC) curve for PCVA against HCOD. The area under the curve captures the relationship between the sensitivity and specificity of the PCVA method and is therefore indicative of how the method performed with respect to HCOD. The overall diagnostic measure for PCVA was 0.88, indicating good diagnostic performance of the method. Also, the curve follows the left-hand border and then the top border of the ROC space, indicating an acceptable level of accuracy.
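The area under an ROC curve such as those in Figures 4 and 5 can be approximated with the trapezoidal rule once the curve is available as a set of (1 − specificity, sensitivity) points. The Python sketch below is an assumption-laden illustration: the points are hypothetical, the function name is invented for this example, and the text does not state how the authors computed their AUC values in R. The 0.75 threshold is the adequacy criterion stated in the Methods.

```python
def auc_trapezoid(points):
    """Trapezoidal area under a curve given (false_positive_rate, true_positive_rate) points."""
    pts = sorted(points)  # order by false positive rate (x-axis)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two neighbouring points.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points, including the (0, 0) and (1, 1) end points.
roc_points = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.85), (1.0, 1.0)]
auc = auc_trapezoid(roc_points)
adequate = auc > 0.75  # adequacy criterion from the Methods
```

A curve hugging the left and top borders pushes the area toward one, whereas the diagonal from (0, 0) to (1, 1) gives 0.5, the value expected from chance alone.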

conditions together in the cardiovascular diseases category was reasonable. Despite Kilifi being one of the poorest districts in Kenya [18], cardiovascular diseases were among the five most common causes of adult death, confirming that deaths from cardiovascular diseases are not restricted to resource-rich communities. Furthermore, one death from sickle cell disease in a 28-year-old patient was correctly classified both by PCVA and by the InterVA model. Although there are other important causes of adult deaths, our hospital data had two cases of cancer (cancer of the cervix and leukemia), a case of chronic obstructive pulmonary disease (asthma), a case of ischemic heart disease/stroke (stroke cases were due to other underlying causes such as hypertension), a case of liver cirrhosis (alcoholic liver disease), a case of renal failure, and two cases of pneumonia. These frequencies were so low that a massive study would be required to

tuberculosis. However, if one considers that tuberculosis and HIV share many clinical features and can occur as a co-infection, a TB/HIV category will show a similar pattern to that derived from the HCOD and PCVA. Similarly, it was observed that for meningitis both the PCVA and the InterVA model misclassified many of the cases to the ambiguous “Others” category. PCVA performed better than the model at an individual level; however, both arrived at broad agreement in identifying cause of death at a population level. For the purpose of mortality tabulation and statistical use, selection of a single condition is required. In some instances, there may be several causes that can be attributed to a death, from which only one cause needs to be identified and selected based on the principle of preventing the primary or UCOD, had there been an effective preventive program [17]. The PCVA inferred stroke to be hypertension, and therefore merging stroke, hypertension, and all heart


Table 2 Validation results for the InterVA model and PCVA against the HCOD in diagnosing the cause of death for the five most common causes of death among 145 adults

| Causes of death | Sensitivity (%) (95% CI) | PPV (%) (95% CI) | Specificity (%) (95% CI) | NPV (%) (95% CI) |
| --- | --- | --- | --- | --- |
| InterVA | | | | |
| HIV/AIDS-related death | 70 (5-84) | 61 (43-76) | 87 (79-92) | 91 (84-92) |
| Cardiovascular | 52 (34-69) | 57 (37-75) | 88 (81-94) | 86 (78-92) |
| Tuberculosis (pulmonary) | 83 (36-100) | 28 (10-54) | 91 (85-95) | 99 (96-100) |
| Meningitis | 43 (10-82) | 33 (8-70) | 96 (91-99) | 97 (93-99) |
| Diabetes | 63 (25-92) | 83 (36-100) | 99 (96-100) | 98 (94-100) |
| PCVA | | | | |
| HIV/AIDS-related death | 88 (72-97) | 80 (64-92) | 94 (88-98) | 96 (91-99) |
| Cardiovascular | 70 (51-84) | 82 (63-94) | 96 (90-99) | 91 (85-95) |
| Tuberculosis (pulmonary) | 100 (54-100) | 55 (23-83) | 96 (92-99) | 100 (97-100) |
| Meningitis | 43 (10-82) | 60 (15-95) | 99 (95-100) | 97 (93-99) |
| Diabetes | 100 (63-100) | 57 (29-82) | 96 (91-99) | 100 (97-100) |

PCVA: physician-certified verbal autopsy; InterVA model: probabilistic model; NPV: negative predictive value; PPV: positive predictive value; CI: confidence interval. Consistent with the formulas given in the Methods, PPV is the number of positives correctly diagnosed through the InterVA model/PCVA (true positives) divided by the total number of positives assigned by the InterVA model/PCVA, with hospital diagnosis as the gold standard. NPV is the number of negatives correctly diagnosed through the InterVA model/PCVA (true negatives) divided by the total number of negatives assigned by the InterVA model/PCVA.
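The four validity measures reported in Table 2 follow directly from the formulas in the Methods (Sensitivity = TP/(TP + FN), Specificity = TN/(FP + TN), PPV = TP/(TP + FP), NPV = TN/(FN + TN)). The Python sketch below applies them to one 2 × 2 table for a single cause of death. The counts are hypothetical values chosen for illustration (they sum to 145 but are not taken from the study's data), the function names are invented for this example, and confidence intervals are not computed.

```python
def validity_measures(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from one cause's 2 x 2 table.

    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
    }


def is_adequate(measures, min_sensitivity=0.60, min_specificity=0.85):
    """Adequacy rule from the Methods: sensitivity > 60% and specificity > 85%."""
    return (measures["sensitivity"] > min_sensitivity
            and measures["specificity"] > min_specificity)


# Hypothetical counts for one cause among 145 deaths (illustration only).
example = validity_measures(tp=5, fp=1, tn=136, fn=3)
example_percent = {name: round(100 * value) for name, value in example.items()}
example_adequate = is_adequate(example)
```

With these counts the sketch gives a sensitivity of about 63% and a PPV of about 83%, which shows how a method can miss a sizeable share of true cases while still being right most of the time when it does assign the cause.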

were confusing signs or symptoms, or perhaps there were poor interviewing skills. This percentage is low, and we consider it acceptable given the obtuse nature of the VA process. Conversely, our study also had a number of limitations. First, it is likely that some causes of death are less likely to occur in a hospital than others, typically those due to accidents, violence, and suicide [21]. As a result, it could be argued that our results might not be generally applicable because of potential differences in the distribution of causes of death in the hospital compared to the community. Second, although postmortem examination is the most accurate way to determine cause of death, such data were unavailable at the Kilifi site. In the absence of such pathology reports, the hospital records were the best alternative. Third, the sample size was small; nevertheless, the overall picture of CSMF for the major causes of death in our study population was similarly determined by both methods. Finally, the absence of some variables in the WHO verbal autopsy adult tool is a factor challenging the accuracy of the InterVA model to be more realistic compared to the gold standard.

meaningfully investigate the performance of the different models for these conditions or subdivisions thereof. The kappa statistics obtained in the current study (κ = 0.32 for InterVA, κ = 0.52 for PCVA, and κ > 0.40 overall) suggest that PCVA performs better than the InterVA model. Compared to the gold standard, the diagnostic accuracy of both the InterVA model and PCVA was good. The area under the ROC curve is close to the ideal value of one for both methods, suggesting that both methods (InterVA and PCVA) are valid compared to the gold standard. The observed sensitivity values for both PCVA and the InterVA model were above 60%, apart from meningitis, which scored low sensitivity. This relatively low sensitivity is consistent with a previous study in Kilifi [19] where meningitis yielded a sensitivity of less than 50%. The observed specificity values for both PCVA and the InterVA model were good. Our study had a number of strengths. First, the HCODs were ascertained by experienced physicians with access to a range of high-quality diagnostic facilities. Second, the verbal autopsies were conducted by trained field workers using the new WHO adult verbal autopsy tool. Inadvertently, these results also validate the WHO adult questionnaire. Third, the InterVA model has been shown in several studies to be effective and was also evaluated on a preliminary basis in Vietnam [20] and Ethiopia [5] and found to be good. Overall, the InterVA model and PCVA classified only 4% and 1%, respectively, of all cases in this study as indeterminate, reflecting deaths in which either the respondent was not very familiar with the deceased’s illness, there

Conclusion

In conclusion, we have shown that both the probabilistic InterVA model and PCVA compared reasonably well with the HCOD in determining the five most common underlying causes of death in a rural community in Kilifi district in Kenya. We hope that our study, albeit small, provides new and useful data that will stimulate further definitive work on methods for interpreting VA data.


Inadvertently, this study validated the WHO international standard verbal autopsy adult questionnaire in two ways: first, in collecting VA data successfully for interpretation by PCVA and second, in providing indicators for the InterVA input whose output compared well with HCOD. This study further suggests that both the WHO adult tool and the InterVA model are feasible tools to measure cause-specific mortality, which may potentially inform both health policy and program interventions in resource-limited settings.

setting up the adult hospital surveillance and editing of the paper. ON contributed in creating hospital data on cause of death. TU contributed in creating hospital data on cause of death. SY contributed in creating hospital data on cause of death. BT helped to conceive the study, established a continuous community awareness and a mechanism for disseminating and implementing the results, and edited the paper. MA designed sensitization messages, implemented a continuous community awareness system, and edited the paper. AE was responsible for managing the adult hospital surveillance and helped in editing the paper. All authors read and approved the final version of the manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 11 February 2011 Accepted: 5 August 2011 Published: 5 August 2011

Additional material

References
1. Byass P: Who needs cause-of-death data? PLoS Medicine 2007, 4(11):e333.
2. Sample Vital Registration with Verbal Autopsy (SAVVY): Verbal autopsy Interviewer’s manual, MEASURE Evaluation. University of North Carolina; USA; [http://www.cpc.unc.edu/measure/tools/monitoring-evaluation-systems/savvy].
3. World Health Organization: International statistical classification of diseases and related health problems. ICD-10 WHO Geneva; 1993.
4. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scandinavian Journal of Public Health 2006, 34(1):26-31.
5. Fantahun M, Fottrell E, Berhane Y, Wall S, Högberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bulletin World Health Organization 2006, 84(3):204-10.
6. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiological Society of North America; 1982, 29.
7. Blackman NJM, Koval JJ: Interval estimation for Cohen’s kappa as a measure of agreement. Wiley Online Library 2000, 723-741.
8. Roberts C, McNamee R: Assessing the reliability of ordered categorical scales using kappa-type statistics. Statistical Methods in Medical Research 2005, 14(5):493-514.
9. Lulu K, Berhane Y: The use of simplified verbal autopsy in identifying causes of adult death in a predominantly rural population in Ethiopia. BMC Public Health 2005, 5:58.
10. Kahn K, Tollman SM, Garenne M, Gear JS: Validation and application of verbal autopsies in a rural area of South Africa. Tropical Medicine & International Health 2000, 5(11):824-31.
11. Kalter HD, Gray RH, Black RE, Gultiano SA: Validation of postmortem interviews to ascertain selected causes of death in children. International Journal of Epidemiology 1990, 19(2):380-6.
12. Yang G, Rao C, Ma J, Wang L, Wan X, Dubrovsky G, Lopez AD: Validation of verbal autopsy procedures for adult deaths in China. International Journal of Epidemiology 2006, 35(3):741-8.
13. Kumar R, Thakur JS, Rao BT, Singh MM, Bhatia SP: Validity of verbal autopsy in determining causes of adult deaths. Indian Journal of Public Health 2006, 50(2):90-4.
14. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson LJ, Alberti KG, Lopez AD: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Tropical Medicine & International Health 2006, 11(5):681-96.
15. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. International Journal of Epidemiology 1994, 23(2):213-22.
16. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Population Health Metrics 8:21.
17. Ndeng’e G, Opiyo C, Mistiaen JA: Geographic Dimensions of Well-being in Kenya: Where are the Poor? Central Bureau of Statistics Ministry of Planning and National Development Kenya; 2005.
18. Kimalu PK: A situational analysis of poverty in Kenya. Kenya Institute for Public Policy Research and Analysis; 2002.

Additional file 1: Indicators included in the InterVA model but missing from WHO verbal autopsy adult tool. The majority of missing indicators are disease conditions in adults and variables from the treatment section of the WHO adult questionnaire. Conversely, indicators in the model are not accounted for in the WHO data collection tool. Additional file 2: Spreadsheet showing cause of death categories assigned by the HCOD, PCVA, and InterVA model. The spreadsheet shows varying causes of death for each method. These were further categorized into broader cause groups referred to as the “condensed common list” to match each other, especially for causes without direct correlates. Diseases with fewer frequencies were also regrouped; mapping was then done and a common list was generated (collapsed COD list) for easy comparison. Additional file 3: Pattern of misclassification error: comparison of InterVA model causes of death versus the hospital cause of death. The table shows patterns of misclassification of cause of death (COD) between InterVA model versus hospital cause of death (HCOD). Misclassification was observed among all COD. Additional file 4: Pattern of misclassification error: comparison of physician-certified verbal autopsy causes of death versus the hospital cause of death. The table shows patterns of misclassification of cause of death (COD) between physician-certified verbal autopsy (PCVA) versus hospital diagnosis (HCOD). Misclassification was observed among all COD.

Acknowledgements and funding
We thank Anthony Ngatia, Rebecca Njue, Patrick Kosgei, Alexander Makazi, Christopher Nyundo, Michael Kahindi, Samwel Geji, Robert Mswia, Hamis Mponezya, the study respondents, the field workers, the MOH Kilifi, and all of the KEMRI-Wellcome Trust collaborators for their help with this study. This paper is published with permission from the Director of KEMRI. The study was funded by a grant from the USAID National M&E Support Programme (sub-grant no: 631548-10S-1524) and a fellowship awarded to TW by the Wellcome Trust, UK (076934).

Author details
1Department of Epidemiology and Demography, KEMRI-Wellcome Trust Research Programme, PO Box 230 Kilifi 80108, Kenya. 2Nuffield Department of Medicine, John Radcliffe Hospital, Oxford OX39DS, UK. 3Department of Paediatrics, John Radcliffe Hospital, Oxford OX39DS, UK. 4Kilifi District Hospital, PO Box 9 Kilifi 80108, Kenya. 5INDEPTH Network of Demographic Surveillance Sites, Accra, Ghana.

Authors’ contributions
TW conceived the study design and edited the final version of the paper. EB contributed to study design, literature review, interpretation of the results, and drafting of the paper. CN reviewed literature, analyzed and interpreted data, and drafted the paper. GM helped with verbal autopsies data coding/matching, interpretation of the results, and editing of the paper. GN helped with data management aspects and editing of the paper. LM helped in


19. Quigley MA, Armstrong Schellenberg JR, Snow RW: Algorithms for verbal autopsies: a validation study in Kenyan children. Bulletin World Health Organization 1996, 74(2):147-54.
20. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scandinavian Journal of Public Health. Supplement 2003, 62:32-7.
21. Kahn K, Tollman SM, Garenne M, Gear JS: Who dies from what? Determining cause of death in South Africa’s rural north-east. Tropical Medicine & International Health 1999, 4(6):433-41.

doi:10.1186/1478-7954-9-49
Cite this article as: Bauni et al.: Validating physician-certified verbal autopsy and probabilistic modeling (InterVA) approaches to verbal autopsy interpretation using hospital causes of adult deaths. Population Health Metrics 2011 9:49.



Lozano et al. Population Health Metrics 2011, 9:50 http://www.pophealthmetrics.com/content/9/1/50

RESEARCH

Open Access

Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards

Rafael Lozano1*, Michael K Freeman1, Spencer L James1, Benjamin Campbell1, Alan D Lopez2, Abraham D Flaxman1 and Christopher JL Murray1 for the Population Health Metrics Research Consortium (PHMRC)

Abstract

Background: InterVA is a widely disseminated tool for cause of death attribution using information from verbal autopsies. Several studies have attempted to validate the concordance and accuracy of the tool, but the main limitation of these studies is that they compare cause of death as ascertained through hospital record review or hospital discharge diagnosis with the results of InterVA. This study provides a unique opportunity to assess the performance of InterVA compared to physician-certified verbal autopsies (PCVA) and alternative automated methods for analysis.

Methods: Using clinical diagnostic gold standards to select 12,542 verbal autopsy cases, we assessed the performance of InterVA on both an individual and population level and compared the results to PCVA, conducting analyses separately for adults, children, and neonates. Following the recommendation of Murray et al., we randomly varied the cause composition over 500 test datasets to understand the performance of the tool in different settings. We also contrasted InterVA with an alternative Bayesian method, Simplified Symptom Pattern (SSP), to understand the strengths and weaknesses of the tool.

Results: Across all age groups, InterVA performs worse than PCVA, both on an individual and population level. On an individual level, InterVA achieved a chance-corrected concordance of 24.2% for adults, 24.9% for children, and 6.3% for neonates (excluding free text, considering one cause selection). On a population level, InterVA achieved a cause-specific mortality fraction accuracy of 0.546 for adults, 0.504 for children, and 0.404 for neonates. The comparison to SSP revealed four specific characteristics that lead to superior performance of SSP. Increases in chance-corrected concordance are attained by developing cause-by-cause models (2%), using all items as opposed to only the ones that mapped to InterVA items (7%), assigning probabilities to clusters of symptoms (6%), and using empirical as opposed to expert probabilities (up to 8%).

Conclusions: Given the widespread use of verbal autopsy for understanding the burden of disease and for setting health intervention priorities in areas that lack reliable vital registration systems, accurate analysis of verbal autopsies is essential. While InterVA is an affordable and available mechanism for assigning causes of death using verbal autopsies, users should be aware of its suboptimal performance relative to other methods.

Keywords: Verbal autopsy, InterVA, validation

* Correspondence: rlozano@uw.edu
1 Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA
Full list of author information is available at the end of the article

© 2011 Lozano et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


in six sites in four countries (Mexico, Tanzania, India, and the Philippines) [29]. The PHMRC study is unique both in terms of the size of the validation dataset (7,836 adult deaths, 2,075 child deaths, and 2,631 neonatal deaths) and the use of rigorously defined clinical diagnostic criteria for a death to be included in the study as a gold standard cause of death. Although the study was not originally designed to test the validity of InterVA, the study provides a unique opportunity to assess the performance of InterVA compared to PCVA and alternative automated methods for analysis.

Background

Verbal autopsy (VA) is increasingly being used in many monitoring, surveillance, and research settings [1-6]. In settings without complete vital registration and medical certification of death, VA provides one of the only methods for obtaining empirical information on cause of death patterns. The main strategy for assigning causes of death from data collected through a VA instrument is through physician-certified verbal autopsy (PCVA) [7-13]. Byass et al. proposed InterVA as an automated alternative to PCVA [14,15]. InterVA, now in edition 3.2 [16], has been applied in a number of research and demographic surveillance sites [14,17-25]. The method is based on the logic of Bayes’ theorem. According to Bayes’ theorem, prior views on the distribution of causes of death for a population are updated by each symptom response in the instrument. The probabilities of responding yes to an item conditional on the true cause of death have been developed through expert review panels. Several studies have investigated the validity of InterVA as a tool for assigning causes of death [15,17,18]. A 2003 study analyzing 189 VA interviews in Vietnam found that, when considering all three possible causes assigned by the program, InterVA achieved over 70% concordance using PCVA as a comparator [14]. In another study that used InterVA to estimate AIDS deaths from 193 VA interviews in Ethiopia, the model correctly assigned 82% of AIDS deaths using hospital data as a gold standard [17]. Lastly, a study in Kenya that examined 1,823 VA interviews found 35% agreement between InterVA and physician review cause assignments [26]. The main limitation of these studies, as noted by several of the authors, is that they compare cause of death as ascertained through hospital record review or hospital discharge diagnosis with the results of InterVA.
In low-resource and rural settings, where many of these studies have been conducted, the quality of the hospital diagnosis itself is often suspect. These studies provide information on the nominal association between hospital-assigned cause of death and InterVA, not true assessments of criterion validity where there is a gold standard cause of death. Further, comparison of InterVA with other published automated methods such as direct cause-specific mortality fraction (CSMF) estimation [27] or the Symptom Pattern Method [28] are limited by the reporting of different metrics in these studies. The Population Health Metrics Research Consortium (PHMRC) provides an opportunity to assess the criterion validity of InterVA in a large multisite study. The PHMRC verbal autopsy study has been undertaken to develop a range of new analytical methods for verbal autopsy and to test these methods using data collected

Methods

The design, implementation, and general descriptive results for the PHMRC gold standard VA validation study are described elsewhere [29]. The final study reports on 46 adult causes of death, 21 child causes of death, 10 neonatal causes of death, and stillbirths. Of note for this study, gold standard cause of death assignment was based on strict clinical diagnostic criteria defined prior to data collection - level 1 diagnostic criteria are stricter than level 2. Table 1 provides the number of adult, child, and neonatal deaths by cause (using the joint cause list described below). For the analysis in this paper, we present results pooling both level 1 and level 2 gold standard causes of death. We conduct and report on separate analyses for adult, child, and neonatal deaths. Figure 1 provides a visual representation of the overall approach of the methods.

Symptoms

InterVA version 3.2 is designed to have as input 106 items and yield predictions for 35 causes of death across all ages. The PHMRC data collection was based on a modification of the World Health Organization (WHO) instrument for VA, and Additional files 1, 2 and 3 list the PHMRC questions used to answer each InterVA item. Because InterVA does not interpret missing data, items not mapped from the PHMRC survey to the InterVA items were input as negative responses in InterVA. We extracted free text terms from open ended responses and coded them as dichotomous variables as described in the PHMRC study design paper [29]. Separate analyses were run with and without free text responses, but their inclusion had a negligible impact on the performance of the tool. In addition to the 106 symptom inputs, InterVA also uses priors for malaria and HIV/AIDS prevalence in the region of the deceased. We used regional malaria and HIV/AIDS prevalence as priors (see Additional file 4), but conducted a separate analysis in which we used the prevalence of a sample data draw as the priors. As we expected, using the regional prevalence was superior to using the draw prevalence.
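The item-mapping step described above can be illustrated with a minimal sketch. The item names, question codes, and the rule that an item is "yes" if any mapped question was answered yes are assumptions made for illustration only; the actual mappings are those in Additional files 1, 2 and 3.

```python
from typing import Dict, List, Optional


def build_interva_input(
    phmrc_record: Dict[str, Optional[bool]],
    item_map: Dict[str, List[str]],
    interva_items: List[str],
) -> Dict[str, bool]:
    """Return a yes/no answer for every InterVA item.

    Assumption: an item is True if any PHMRC question mapped to it was
    answered yes. Items with no mapped question, or with missing answers,
    become False, mirroring the choice to enter unmapped items as
    negative responses.
    """
    answers = {}
    for item in interva_items:
        questions = item_map.get(item, [])
        answers[item] = any(phmrc_record.get(q) is True for q in questions)
    return answers


# Hypothetical subset of the 106 InterVA items and hypothetical question codes.
interva_items = ["acute_cough", "fever", "drowned"]
item_map = {"acute_cough": ["phmrc_q_cough"], "fever": ["phmrc_q_fever"]}
record = {"phmrc_q_cough": True, "phmrc_q_fever": None}  # fever answer missing

example_input = build_interva_input(record, item_map, interva_items)
```

In this toy record, "drowned" has no mapped question and "fever" has a missing answer, so both are passed on as negative responses.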



Lozano et al. Population Health Metrics 2011, 9:50 http://www.pophealthmetrics.com/content/9/1/50


Table 1 Number of deaths for adults, children, and neonates by cause

| Adult causes | Deaths |
|---|---|
| Acute cardiac death | 400 |
| Chronic cardiac death | 416 |
| Chronic respiratory disease | 218 |
| Diabetes | 414 |
| Diarrhea | 228 |
| Disease of nervous system | 49 |
| Drowning | 106 |
| HIV/AIDS | 501 |
| Homicide | 167 |
| Kidney or urinary disease | 413 |
| Liver disease | 313 |
| Malaria | 100 |
| Malignancy | 1,090 |
| Maternal death | 402 |
| Other acute infection | 263 |
| Other digestive disease | 166 |
| Other injuries | 464 |
| Other noncommunicable diseases | 200 |
| Pneumonia/sepsis | 609 |
| Poisoning | 86 |
| Stroke | 630 |
| Suicide | 124 |
| Transport-related accident | 202 |
| Tuberculosis (pulmonary) | 275 |
| Total | 7,836 |

| Child causes | Deaths |
|---|---|
| Chronic cardiac death | 76 |
| Chronic respiratory disease | 12 |
| Diarrhea | 256 |
| Drowning | 83 |
| HIV/AIDS | 20 |
| Homicide | 52 |
| Malaria | 117 |
| Malignancy | 28 |
| Measles | 23 |
| Meningitis | 99 |
| Other acute infection | 111 |
| Other digestive disease | 48 |
| Other injuries | 171 |
| Other noncommunicable diseases | 182 |
| Pneumonia/sepsis | 678 |
| Poisoning | 18 |
| Transport-related accident | 92 |
| Tuberculosis (pulmonary) | 9 |
| Total | 2,075 |

| Neonate causes | Deaths |
|---|---|
| Congenital malformation | 250 |
| Meningitis | 6 |
| Perinatal asphyxia | 461 |
| Pneumonia/sepsis | 250 |
| Preterm/small baby | 662 |
| Total | 1,629 |

Cause lists

The PHMRC study included 46 causes for adults, 21 causes for children, 10 causes for neonates, and stillbirths. For each observation, InterVA predicts up to three causes of death from a list of 35 causes across all age groups. We have mapped the InterVA cause list and the PHMRC cause list into a set of mutually-exclusive, collectively-exhaustive cause categories for each age category. The details for this mapping are provided in Additional files 5, 6 and 7. The resulting joint cause lists contain 24 causes for adults, 18 causes for children, and six causes for neonates. As mentioned above, InterVA can produce up to three potential causes for each death, and in some cases assigns deaths an indeterminate cause. Table 2 shows (by age group) the fraction of deaths to which InterVA assigned exactly one, two, or three causes, and the fraction deemed indeterminate. For modules reporting on only one cause assignment, we use the first cause of death to calculate chance-corrected concordance. We have also separately computed chance-corrected concordance using one, two, or all three InterVA cause assignments. For calculating accuracy, indeterminate deaths were equally redistributed across the causes that InterVA had predicted. Redistribution of indeterminate causes across the other causes improves measured accuracy.

Multiple validation test sets

As recommended by Murray et al. for validation studies [30], we vary the cause composition of the validation dataset by creating 500 test datasets. To do this, we first sample 500 distributions of CSMFs such that the sum of the CSMFs across causes equals 1.0. This is implemented by sampling from an uninformative Dirichlet distribution. We then randomly sample gold standard deaths with replacement to generate a test dataset with the desired CSMF composition. We then compute chance-corrected concordance and CSMF accuracy for each split (explained below). Because InterVA produces the same cause assignment for any given death, the deaths were run through the InterVA interface only once, and those cause assignments were used for the validation analysis.

Figure 1 Overview of analytical process. This figure is a visual representation of the steps necessary for analysis, performed separately for each age group (adult, child, neonate). Flowchart elements:
- Original data with validated gold standard: 7836 adult, 2075 child, 1629 neonate
- Map PHMRC survey items to InterVA items
- Run all deaths through InterVA interface with region-specific prevalences
- Create age-specific data files that have the InterVA-assigned and gold standard causes
- Map InterVA causes and PHMRC causes to a merged list (adult: 24, child: 18, neonate: 6)
- Generate 500 Dirichlet-sampled cause compositions
- Sample, with replacement, using cause compositions (up to the size of the full dataset)
- 500 test draws
- Calculate accuracy and chance-corrected concordance for each draw
- Calculate accuracy and chance-corrected concordance across draws
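The construction of test datasets with varying cause composition can be sketched as follows. This is a minimal illustration rather than the study's code: causes are assumed to be coded as integers, and converting a sampled CSMF vector into per-cause death counts with a multinomial draw is an assumption of this sketch, since the text does not specify that step.

```python
import numpy as np


def make_test_dataset(gold_causes, n_causes, size, rng):
    """Draw one CSMF composition and resample deaths to match it.

    gold_causes: integer cause code (0..n_causes-1) for each gold standard death.
    Returns the sampled CSMF vector and the row indices of the resampled deaths.
    """
    gold_causes = np.asarray(gold_causes)
    # Uninformative Dirichlet: every composition that sums to 1 is equally likely.
    csmf = rng.dirichlet(np.ones(n_causes))
    # Assumption: per-cause counts come from a multinomial draw on the CSMFs.
    counts = rng.multinomial(size, csmf)
    parts = []
    for cause, k in enumerate(counts):
        pool = np.flatnonzero(gold_causes == cause)
        if k > 0 and pool.size > 0:
            # Sample deaths of this cause with replacement.
            parts.append(rng.choice(pool, size=k, replace=True))
    idx = np.concatenate(parts) if parts else np.array([], dtype=int)
    return csmf, idx


rng = np.random.default_rng(0)
gold = np.arange(200) % 5          # toy data: 200 deaths spread over 5 causes
draws = [make_test_dataset(gold, 5, 200, rng) for _ in range(500)]
```

Because the InterVA assignment for a given death never changes, each draw only needs the resampled row indices; the stored InterVA and gold standard causes can be looked up by index when computing the metrics.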

Metrics

Following the recommendations of Murray et al. [30], we assess the performance of InterVA compared to the gold standard using two types of metrics capturing the accuracy of individual death assignment and CSMF estimation. Assigning deaths to specific causes is assessed using cause-specific chance-corrected concordance and the average of cause-specific chance-corrected concordance across causes. As noted, to assess whether the second and third causes predicted for some deaths by InterVA improve performance, we also compute chance-corrected concordance incorporating the second and third cause assignments. Performance predicting CSMFs is assessed using CSMF accuracy, which is scaled from zero to one, where zero is the maximum possible error and one is no error in predicting CSMFs. The relationship between predicted CSMFs and true CSMFs across the 500 test datasets is summarized for each cause by performing a regression of true CSMFs on estimated CSMFs. Details on how to compute these metrics are provided in Murray et al. [30].

Comparison to Simplified Symptom Pattern Method

Because we document poor performance of InterVA in comparison to PCVA [31], we have also compared InterVA to the Simplified Symptom Pattern (SSP) Method [28,32]. SSP is also based on Bayes’ theorem; however, there are four key differences between InterVA and simplified SSP. First, the SSP Method develops Bayesian models for one cause compared to all other causes at a time, while InterVA considers all causes independently. Second, SSP uses the 40 most informative symptoms for each cause from the entire universe of all items in the survey, while InterVA is limited to the items that map to it (roughly one-third the number of inputs) and uses all of these symptoms (regardless of how informative they are). Third, SSP captures the interdependencies of the symptom responses, while InterVA considers each symptom individually. Finally, SSP uses empirical measurements of the probability of a symptom set conditional on the true cause captured in a training dataset, while InterVA uses expert opinion.

Using the PHMRC data, we progressively change SSP to be more like InterVA and assess its performance using chance-corrected concordance and CSMF accuracy to understand which aspects of InterVA lead to poor performance. We analyzed three progressively changing permutations of the SSP Method to identify the effect each difference between SSP and InterVA had on the performances. First, we developed an SSP model for all causes at once rather than developing a model for each cause compared to all other causes at a time. Second, we restricted the universe of items available for SSP to only those used by InterVA. Third, we force SSP to assume that each item or symptom is independent of each other, as opposed to clustering different symptoms and developing probabilities of those combinations. Further details on SSP are available in Murray et al. [32].

Table 2 Percent of deaths assigned to particular cause numbers by InterVA

| Age group | Exactly one assignment | Exactly two assignments | Exactly three assignments | Indeterminate |
|---|---|---|---|---|
| Adult | 80.3% | 16.1% | 1.9% | 1.8% |
| Child | 76.7% | 17.9% | 1.9% | 3.5% |
| Neonate | 96.8% | 2.6% | 0.0% | 0.5% |

Results

Performance assigning true cause to individual deaths

Across-cause results

Table 3 reports median chance-corrected concordances (across all causes) for one, two, and three cause assignments. The results are shown separately for all age groups, reporting on models with and without the inclusion of free text variables. Across all age groups and cause selections, the inclusion of free text variables at most increases chance-corrected concordance by 1.3%. The performance of InterVA, as measured by chance-corrected concordance, was comparable for adults and children using one cause selection (adults = 24.2%; children = 24.9%). However, the tool performed substantially worse for neonates, with a chance-corrected concordance of 6.3%. In all three age groups, consideration of the second and third cause assigned by InterVA led to lower chance-corrected concordance, compared to consideration of only the first cause. This is largely due to the fact that InterVA rarely predicts more than one cause (at most 17% of cases). Figure 2 shows the comparison overall for adults, children, and neonates to PCVA as reported by Lozano et al. [31] for the PHMRC gold standard datasets. For all three age groups, InterVA has markedly lower chance-corrected concordances. Interestingly, the performances of InterVA and PCVA follow the same pattern, doing best in children by a small margin, followed by adults, and performing less well for neonates.

Cause-specific results

Additional file 8 shows the chance-corrected concordance, by cause, for adults, children, and neonates. These figures were calculated without the use of free text variables, and only considered the first InterVA cause assignment. These tables illustrate the distribution of InterVA’s performance across causes. For both adults and children, InterVA performed quite well for transport-related deaths; the chance-corrected concordances were 85.6% for adults and 95.7% for children. InterVA also did well on some other injuries, including its high chance-corrected concordance for poisoning (58.9%) and drowning (55.8%) in children. For adults, chance-corrected concordance was higher than 50% for homicide, liver disease, and tuberculosis, with nearly 50% for malignancy and maternal deaths. For children, in addition to the aforementioned injuries, InterVA had chance-corrected concordances of close to 50% for pneumonia/sepsis and HIV/AIDS. For neonates, the only cause with a chance-corrected concordance over 50% was perinatal asphyxia (77.4%).

While InterVA performed well for some causes such as these selected injuries, there were a number of causes that InterVA struggled to predict accurately. For adults, the lowest chance-corrected concordances were for disease of the nervous system (-4.3%), and the residual category other noncommunicable diseases (-4.0%). For children, InterVA struggled to accurately assign individual deaths for a number of categories. Similarly to adults, InterVA had poor performance with residual categories such as other acute infection and other digestive disease, with chance-corrected concordances of -5.9% for both causes. Chance-corrected concordance was also low for diseases that are rare in children, such as chronic cardiac death and malignancies. For neonates, InterVA did not perform well for a series of causes. Again, we saw the lowest chance-corrected concordance for the rarest cause (meningitis = -25.0%). Congenital malformation was another neonatal cause for which InterVA performed poorly, with a chance-corrected concordance of -12.9%.

Table 3 Median chance-corrected concordance (%) across causes for one, two, and three cause assignments (95% uncertainty interval [UI])

| Age | Module | One cause | Two causes | Three causes |
|---|---|---|---|---|
| Adult | Free text | 25.2 (25.1, 25.3) | 25.1 (25.0, 25.1) | 21.7 (21.6, 21.8) |
| Adult | No free text | 24.2 (24.1, 24.3) | 24.0 (23.9, 24.1) | 20.6 (20.5, 20.7) |
| Child | Free text | 25.0 (24.7, 25.2) | 22.5 (22.3, 22.7) | 17.5 (17.3, 17.7) |
| Child | No free text | 24.9 (24.7, 25.0) | 21.4 (21.3, 21.7) | 16.2 (16.1, 16.4) |
| Neonate | Free text | 6.5 (6.2, 6.7) | -22.3 (-22.6, -22.0) | N/A |
| Neonate | No free text | 6.3 (6.1, 6.5) | -22.8 (-23.0, -22.5) | N/A |

Performance estimating CSMFs

CSMF accuracy

Table 4 reports median CSMF accuracy (across all causes) for one, two, and three cause assignments. The results are shown separately for all age groups, reporting on models with and without the inclusion of free text variables. Across all age groups and cause selections, the inclusion of free text variables at most increases accuracy by 0.016. The performance of InterVA was comparable for adults and children, with an accuracy of 0.546 for adults and 0.504 for children. However, the tool performed substantially worse for neonates, with an accuracy of 0.404. In all three age groups, consideration of the second and third cause assigned by InterVA had a negligible effect on accuracy, with a maximum difference of 0.017. While the consideration of multiple cause assignments had a detrimental effect on chance-corrected concordance, that relationship was not seen for accuracy. This implies that, at the population level, the second and third cause assignments are as accurate as the first. Figure 3 summarizes CSMF accuracy for the three age groups and provides benchmark comparisons for PCVA as reported by Lozano et al. [31] for the same PHMRC gold standard database. In all age groups, CSMF accuracy is substantially lower than that observed for PCVA. Interestingly, InterVA performs better for older age groups, while PCVA performs better for younger age groups.
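For readers unfamiliar with the two metrics reported in these results, the following sketch shows how they can be computed for a single cause assignment per death. It follows the definitions as we understand them from Murray et al. [30]; the function names and the integer coding of causes are assumptions of this sketch, and the multiple-cause variant of chance-corrected concordance is not covered.

```python
import numpy as np


def chance_corrected_concordance(true, pred, n_causes):
    """Per-cause chance-corrected concordance and its mean across causes.

    For cause j: (sensitivity_j - 1/N) / (1 - 1/N), where N is the number
    of causes. Values are negative when sensitivity falls below chance.
    """
    true, pred = np.asarray(true), np.asarray(pred)
    chance = 1.0 / n_causes
    ccc = np.full(n_causes, np.nan)
    for j in range(n_causes):
        mask = true == j
        if mask.any():
            sensitivity = np.mean(pred[mask] == j)
            ccc[j] = (sensitivity - chance) / (1.0 - chance)
    return ccc, float(np.nanmean(ccc))


def csmf_accuracy(true_csmf, pred_csmf):
    """1 minus total absolute CSMF error, scaled by the maximum possible error."""
    true_csmf = np.asarray(true_csmf, dtype=float)
    pred_csmf = np.asarray(pred_csmf, dtype=float)
    max_error = 2.0 * (1.0 - true_csmf.min())
    return float(1.0 - np.abs(true_csmf - pred_csmf).sum() / max_error)


# Toy example with three causes and six deaths (not study data).
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
per_cause, mean_ccc = chance_corrected_concordance(true, pred, 3)
true_csmf = np.bincount(true, minlength=3) / len(true)
pred_csmf = np.bincount(pred, minlength=3) / len(pred)
accuracy = csmf_accuracy(true_csmf, pred_csmf)
```

In the toy example, two of the three causes are identified half the time and one is always identified, giving a mean chance-corrected concordance of 0.5, while the CSMF accuracy is 0.75.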

Figure 2 Median chance-corrected concordance of InterVA and PCVA. This figure compares the performance of InterVA with PCVA across 500 Dirichlet draws. PCVA performs better than InterVA for all age groups. (Bar chart; vertical axis: chance-corrected concordance, 0% to 50%; horizontal axis: adult, child, neonate; series: InterVA and PCVA.)

Table 4 Median CSMF accuracy across 500 Dirichlet draws, by age group and number of cause assignments (95% UI)

| Age | Module | One cause | Two causes | Three causes |
|---|---|---|---|---|
| Adult | Free text | 0.549 (0.542, 0.557) | 0.555 (0.548, 0.563) | 0.556 (0.548, 0.564) |
| Adult | No free text | 0.546 (0.539, 0.553) | 0.554 (0.548, 0.560) | 0.555 (0.549, 0.561) |
| Child | Free text | 0.520 (0.513, 0.528) | 0.503 (0.495, 0.511) | 0.503 (0.496, 0.512) |
| Child | No free text | 0.504 (0.496, 0.514) | 0.487 (0.480, 0.494) | 0.487 (0.482, 0.496) |
| Neonate | Free text | 0.405 (0.392, 0.420) | 0.409 (0.397, 0.425) | N/A |
| Neonate | No free text | 0.404 (0.388, 0.419) | 0.407 (0.393, 0.423) | N/A |

True versus estimated CSMFs

Figure 4 shows the results of regressing the true CSMF on the estimated CSMF for four selected adult causes (Additional file 9 shows the results for all causes for adults, children, and neonates). Each element of the output has a distinct implication for the relationship between true and estimated CSMFs. The ideal slope should be 1.00, such that a unit increase in the true CSMF corresponds to an equal unit increase in the estimated CSMF. The ideal intercept value is 0.00, and deviation from this provides information regarding the performance of the tool in populations with small cause fractions for that particular disease. Finally, the root mean squared error (RMSE) gives a measure of the uncertainty in the estimated CSMFs.

The causes selected for Figure 4 were chosen to demonstrate the differential performances of InterVA across causes. Both homicide and maternal death provide examples in which near-zero intercepts, 0.014 and 0.009 respectively, indicate good performance in sample populations with small cause fractions. However, in both instances, a slope that deviates substantially from 1.00 implies that InterVA will underestimate the proportion of these causes in populations where the disease is common. The low RMSEs (≤ 0.006) indicate that the underestimation is consistent across different simulated populations, and may be amenable to a post hoc correction. Pneumonia/sepsis and HIV/AIDS provide examples in which the cause fractions are overestimated in draws with low cause fractions. With large intercepts, 0.160 and 0.082 respectively, InterVA predicts the presence of these conditions even if they are virtually absent in the population. Finally, higher RMSE values (> 0.01) suggest that correcting for this overestimation will be more difficult than correcting for the underestimation of homicide or maternal deaths.

Comparison to SSP variants

Figure 5 compares InterVA with three variants of SSP applied to the same dataset, plotting median chance-corrected concordance across causes against CSMF accuracy. Prior to modification, the SSP method had a chance-corrected concordance of 48% and an accuracy of 0.73. The first variant of SSP involved developing a model for all causes at once, rather than cause-by-cause models. This lowered chance-corrected concordance by 2% and accuracy by 0.02. The second variant further modified the methods by only using the survey questions that mapped to the InterVA survey. This lowered the chance-corrected concordance an additional 7% and lowered accuracy an additional 0.04. In addition to these changes, the third variation of SSP assumes the responses to each symptom are independent, as opposed to using clusters of symptoms that allow for correlation between items in response patterns. This change lowered the chance-corrected concordance by a further 6%, resulting in an overall chance-corrected concordance of 33% and an accuracy of 0.60. As SSP is modified to become more like InterVA, its performance both in terms of chance-corrected concordance and accuracy steadily declines.

Figure 6 shows a comparison of selected empirical probabilities of SSP to the expert probabilities of InterVA for the symptom acute cough. This graph illustrates some of the differences in the prior probabilities of selected causes, which, based on the above analysis, may account for up to 8% chance-corrected concordance and 0.05 accuracy. Of note, InterVA tends to have higher probabilities than SSP for causes that are unrelated to cough (drowning, suicide, maternal death), while SSP has a higher probability for related causes such as infections and chronic respiratory disease.
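The independence assumption imposed in the third SSP variant corresponds to a naive Bayes update, which can be sketched as below. This is a generic illustration and not the code of InterVA or SSP; the function name and the probability values are hypothetical. The table of symptom probabilities could come from expert opinion or from empirical frequencies in a training dataset, which is the distinction Figure 6 illustrates.

```python
import numpy as np


def naive_bayes_posterior(prior, p_yes_given_cause, responses):
    """Update cause priors with yes/no symptom responses.

    prior: prior probability of each cause, shape (n_causes,).
    p_yes_given_cause: P(symptom answered yes | cause), shape (n_causes, n_symptoms).
    responses: observed yes/no answers, shape (n_symptoms,).
    Symptoms are treated as conditionally independent given the cause.
    """
    prior = np.asarray(prior, dtype=float)
    p = np.asarray(p_yes_given_cause, dtype=float)
    r = np.asarray(responses, dtype=bool)
    # Likelihood of the whole response pattern is a product over symptoms.
    likelihood = np.where(r, p, 1.0 - p).prod(axis=1)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()


# Hypothetical two-cause, two-symptom example (values are not from the study).
prior = [0.5, 0.5]
p_yes = [[0.8, 0.1],   # cause 0: first symptom common
         [0.2, 0.1]]   # cause 1: first symptom uncommon
posterior = naive_bayes_posterior(prior, p_yes, [True, False])
```

With equal priors, a yes to the first symptom shifts the posterior to 0.8 for the first cause; the second symptom is equally likely under both causes and so carries no information.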

Figure 3 Median CSMF accuracy of InterVA and PCVA. This figure compares the performance of InterVA with PCVA across 500 Dirichlet draws. It shows a substantially better performance for PCVA than InterVA for all age groups. (Bar chart; vertical axis: CSMF accuracy, 0.0 to 0.8; horizontal axis: adult, child, neonate; series: InterVA and PCVA.)


Figure 4 Estimated versus true CSMFs. This figure shows scatter plots of the estimated CSMF versus the true CSMF for pneumonia/sepsis, homicide, maternal death, and HIV/AIDS across 500 Dirichlet draws. It demonstrates the performance of InterVA for four causes of death as the cause fractions vary. Each graph shows the results from a regression of true CSMF on estimated CSMF, as well as the root mean squared error. Fitted lines shown in the panels:
- Pneumonia/sepsis: Est = 0.160 + 0.165 True; RMSE = 0.019
- Homicide: Est = 0.014 + 0.533 True; RMSE = 0.006
- Maternal death: Est = 0.009 + 0.490 True; RMSE = 0.002
- HIV/AIDS: Est = 0.082 + 0.332 True; RMSE = 0.011
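A line of the form shown in the Figure 4 panels (estimated CSMF = intercept + slope × true CSMF) can be fitted across test draws with ordinary least squares, as in the sketch below. The function name is hypothetical, and defining RMSE as the root mean squared residual without a degrees-of-freedom correction is an assumption; the toy inputs are generated from the homicide line rather than taken from study data.

```python
import numpy as np


def fit_csmf_line(true_csmf, est_csmf):
    """Least-squares fit of est = intercept + slope * true, plus residual RMSE."""
    true_csmf = np.asarray(true_csmf, dtype=float)
    est_csmf = np.asarray(est_csmf, dtype=float)
    slope, intercept = np.polyfit(true_csmf, est_csmf, 1)
    residuals = est_csmf - (intercept + slope * true_csmf)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return float(intercept), float(slope), rmse


# Toy draws lying exactly on the homicide line reported in Figure 4.
true_fraction = np.linspace(0.0, 0.3, 7)
est_fraction = 0.014 + 0.533 * true_fraction
intercept, slope, rmse = fit_csmf_line(true_fraction, est_fraction)
```

A slope well below 1 with a near-zero intercept, as here, means the estimate tracks the true fraction but grows too slowly, so the cause is underestimated where it is common.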

Discussion

This assessment of the performance of InterVA compared to gold standard cause of death assignment in a large multisite study shows an overall chance-corrected concordance of 24.2%, 24.9%, and 6.3% for adults, children, and neonates, respectively. At the level of estimating CSMFs, InterVA has a CSMF accuracy of 0.546 for adults, 0.504 for children, and 0.404 for neonates. Compared to PCVA, the performance of InterVA is much lower in terms of chance-corrected concordance, and it produces substantially larger errors in estimated CSMFs [31]. The poor performance of InterVA, given some published studies, is surprising. Not all studies, however, have reported good concordance. Oti et al. [33] compared InterVA on 1,823 deaths to physician review and found a chance-corrected concordance of 31.2% (authors’ calculations), which is not much higher than reported here. One other validation study found a 33.3% chance-corrected concordance when comparing InterVA to physician review [14].

Two factors may account for the difference in the findings here compared with the more favorable studies. First, the PHMRC database is the first VA validation study where cause of death has been assigned using strict clinical diagnostic criteria and not medical record review or hospital diagnosis. The distinction is critical; in medical record review a chart may say myocardial infarction but not have documentation on how this diagnosis was made. In the PHMRC dataset, a death from myocardial infarction requires at least one of the following: cardiac perfusion scan, electrocardiogram changes, documented history of coronary artery bypass grafting or percutaneous transluminal coronary angioplasty or stenting, coronary angiography, and/or enzyme changes in the context of myocardial ischemia. Second, it is difficult to compare across previous studies because different metrics and results are reported for only one CSMF composition in the test data. Murray et al. report that findings can vary widely as a function of CSMF composition, and therefore metrics based on a single CSMF can be highly misleading [30].


different studies have shown different levels of accuracy for the program. InterVA could easily identify deaths with highly-probable symptoms such as road traffic injuries, but it struggled with less explicit causes such as infections. There also appeared to be some anomalous results from the program. For example, the program indicates that the probability of assigning drowning as a true cause is 0.99 if the respondent responded “yes” to the question “did s/he drown?” However, of the 117 adult deaths in which the respondent indicated that there was drowning, InterVA only assigned six of them “drowning” as the cause of death. We believe that this was the result of a coding error in the program. InterVA also tends to overpredict perinatal asphyxia in neonates. While we are less confident why this is, we believe that it is a notable shortcoming of the program. We hope that the cause-specific results can be used to better inform expert priors for future Bayesian methods. The analysis of InterVA compared to the other Bayesian automated approach, Simplified Symptom Pattern, also provides a clear indication of why InterVA is not working well. The analysis of SSP variants designed to approximate InterVA show that four factors contribute to better results using SSP: use of interdependencies in the symptom responses, the use of all the items in the WHO or PHMRC instrument rather than just the 106 items in InterVA, the use of empirical probabilities of symptoms conditional on the true cause rather than expert judgment, and finally the technical advantage of developing models for each cause relative to other causes rather than all causes independently [32]. Moving to empirical probabilities improved chance-corrected concordance by 4%, capturing the interdependencies of some items added another 6%, and expanding from the InterVA item list to the full item list added another 7%. 
The progressive improvement in the performance of the SSP variants provides an understanding of how the limitations of the implementation of Bayes’ theorem in InterVA contribute to its poor performance. There are several limitations of this study. First, because the InterVA and PHMRC cause lists had to be merged to a joint cause list, InterVA was essentially challenged to predict causes that it was not built to identify (such as specific types of injuries). Conversely, there are a number of causes for which InterVA may predict very well that were not included in the study (such as malnutrition in children). InterVA could in theory perform well for these causes, which would have increased its average chance-corrected concordance. Note that the cause list used for the assessment of PCVA performance was slightly longer, so the InterVA performance may have been slightly exaggerated [31]. Second, there were a number of InterVA items that were not mapped to the PHMRC survey (17 adult

60%

ChanceͲCorrected Concordance

55% 50% SSP SSP1

45% 40%

SSP2

35% SSP3

30% 25%

InterVA

20% 0.5

0.55

0.6

0.65

0.7

0.75

0.8

CSMF Accuracy SSP: Simplified Symptom Pattern Method SSP 1: SSP, but evaluates probabilities of all causes independently SSP 2: SSP 1, but uses only the symptoms that match to InterVA items SSP 3: SSP 2, but evaluates each symptom independently as opposed to in clusters InterVA: InterVA Method

Figure 5 Comparison of InterVA to variations of Simplified Symptom Pattern Method. This figure shows the performances of four permutations of SSP versus InterVA for adults, considering one cause selection (excluding free text). It demonstrates the importance of different aspects of Bayesian methods.

Reporting chance-corrected concordance and regression results of CSMF true on CSMF estimated for each cause provides a framework for analyzing the strengths and weaknesses of InterVA. Clearly, the program is currently better suited to identify certain more obvious causes than other more complex ones. The program also has differential performances based on the cause fraction of each disease. This partly explains why

Simplified Symptom Pattern Empirical Probabilities

0.4 Chronic respiratory disease

0.35

Pneumonia/Sepsis 0.3

0.25

Other acute infection Acute cardiac death

0.2 Stroke

Malaria

0.15

0.1

Maternity-related death

0.05

Suicide Accidental drowning

0

0

0.1

0.2

0.3

0.4

0.5

0.6

InterVA Expert Probabilities

Figure 6 Comparison of Simplified Symptom Pattern empirical probabilities and InterVA expert probabilities. The scatter plot compares the probabilities of InterVA versus SSP for selected causes, given the symptom acute cough. This difference of posterior probabilities is partially responsible for the superior performance of SSP.

275


Lozano et al. Population Health Metrics 2011, 9:50 http://www.pophealthmetrics.com/content/9/1/50

Page 10 of 11

to or better than PCVA, such as the Tariff Method, SSP, and machine learning [32,34,35]. Given the widespread use of VA for understanding the burden of disease and setting health intervention priorities in areas that lack reliable vital registrations systems, accurate analysis of VAs is essential. Until InterVA is substantially revised, users should carefully consider the use of alternative automated approaches for the analysis of VA data.

questions, 32 child questions, and 30 neonatal questions). Inclusion of these items would likely improve performance of the tool. Third, InterVA predicted deaths in some age groups for causes that largely belong to other age groups. For example, it predicted preterm/ small baby as a child cause and malnutrition as an adult cause. These deaths were assigned to the residual other category. This practice also may have exaggerated InterVA accuracy. The contribution of this study is the use of gold standard cases for the validation of InterVA. The aforementioned studies only provide information on the relationship between InterVA and hospital- assigned or physician-reviewed cause of death. This study provides a direct comparison of InterVA to gold standard verified causes of death. It is also important to note that this study is considering the performance of InterVA in a diverse cultural and epidemiological context. However, further analysis from each of the sites will provide specific results about the performance of InterVA in each of the countries included in the PHMRC study.

Additional material Additional file 1: Mapping between InterVA input questions and PHMRC survey questions for adults. Additional file 2: Mapping between InterVA input questions and PHMRC survey questions for children. Additional file 3: Mapping between InterVA input questions and PHMRC survey questions for neonates. Additional file 4: Regional prevalence priors used for InterVA. Additional file 5: Mapping between InterVA causes and PHMRC causes for adults. Additional file 6: Mapping between InterVA causes and PHMRC causes for children. Additional file 7: Mapping between InterVA causes and PHMRC causes for neonates.

Conclusions This study demonstrated both the strengths and weaknesses of InterVA as a method of assessing both individual-level and population-level causes of death. For the first time, the use of gold standards for validation illustrates the performance of the tool in diverse settings. To date, InterVA has proven popular with some users because it is automated and can reduce the cost of VA analysis and speed up data processing. InterVA does not use free text items and implicitly encourages users to use structured instruments that may also lead to savings and efficiencies in data processing. The relative computational simplicity of InterVA also means that it can work in a variety of settings without access to more sophisticated computational power that might be required for some empirically-derived methods. Additionally, InterVA is not linked to a specific VA instrument, which is both a strength and a weakness. The strength is that, in principle, it can be used to analyze data collected historically with different or more limited instruments. The weakness, however, is that much of the salient information collected in the WHO or PHMRC instruments are not used. Further, because it is not tied to an instrument, the InterVA items are defined in medical terms and are not actually mapped to particular questions that can be asked of households. Such ambiguity stems from the specification of the InterVA variables as medical terms rather than VA instrument items. These advantages come at a substantial decrement in performance compared to PCVA. Fortunately, other automated options for the analysis of VA data have the same advantages but have validated performance equal

Additional file 8: Chance-corrected concordance (%) for adult, child, and neonatal causes across 500 Dirichlet draws (excluding free text, one cause selection).
Additional file 9: Results of regressing true CSMFs on estimated CSMFs for adult, child, and neonatal causes (excluding free text, one cause selection).

Abbreviations
CSMF: cause-specific mortality fraction; PCVA: physician-certified verbal autopsy; PHMRC: Population Health Metrics Research Consortium; RMSE: root mean squared error; SSP: Simplified Symptom Pattern Method; VA: verbal autopsy.

Acknowledgements
This research was conducted as part of the Population Health Metrics Research Consortium: Christopher J.L. Murray, Alan D. Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D. Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Charles Atkinson for managing the PHMRC verbal autopsy database, Alireza Vahdatpour and Charles Atkinson for intellectual contributions to the analysis, and Roger Ying for conducting a literature review. This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.

Author details
1Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA. 2University of Queensland, School of Population Health, Brisbane, Australia.

Authors’ contributions RL, ADL, ADF, and CJLM designed the study. MKF and SLJ performed the statistical analyses. BC conducted the literature review and participated in the analysis. RL, MKF, and CJLM drafted the manuscript and approved the final version. RL accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors have read and approved the final manuscript.

Competing interests The authors declare that they have no competing interests.

Received: 13 April 2011 Accepted: 5 August 2011 Published: 5 August 2011

References
1. Fottrell E, Byass P: Verbal autopsy: methods in transition. Epidemiol Rev 2010, 32:38-55.
2. Polprasert W, Rao C, Adair T, Pattaraarchachai J, Porapakkham Y, Lopez AD: Cause-of-death ascertainment for deaths that occur outside hospitals in Thailand: application of verbal autopsy methods. Population Health Metrics 2010, 8:13.
3. Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84:239-245.
4. Setel PW, Whiting DR, Hemed Y, Chandramohan D, Wolfson LJ, Alberti KGMM, Lopez AD: Validity of verbal autopsy procedures for determining cause of death in Tanzania. Trop Med Int Health 2006, 11:681-696.
5. Chandramohan D, Maude GH, Rodrigues LC, Hayes RJ: Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol 1994, 23:213-222.
6. Huong DL, Minh HV, Byass P: Applying verbal autopsy to determine cause of death in rural Vietnam. Scand J Public Health Suppl 2003, 62:19-25.
7. Quigley MA, Chandramohan D, Rodrigues LC: Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. Int J Epidemiol 1999, 28:1081-1087.
8. Setel PW, Rao C, Hemed Y, Whiting DR, Yang G, Chandramohan D, Alberti KGMM, Lopez AD: Core verbal autopsy procedures with comparative validation results from two countries. PLoS Med 2006, 3:e268.
9. Kahn K, Tollman SM, Garenne M, Gear JS: Validation and application of verbal autopsies in a rural area of South Africa. Trop Med Int Health 2000, 5:824-831.
10. Coldham C, Ross D, Quigley M, Segura Z, Chandramohan D: Prospective validation of a standardized questionnaire for estimating childhood mortality and morbidity due to pneumonia and diarrhoea. Trop Med Int Health 2000, 5:134-144.
11. Thatte N, Kalter HD, Baqui AH, Williams EM, Darmstadt GL: Ascertaining causes of neonatal deaths using verbal autopsy: current methods and challenges. J Perinatol 2008, 29:187-194.
12. Maude GH, Ross DA: The effect of different sensitivity, specificity and cause-specific mortality fractions on the estimation of differences in cause-specific mortality rates in children from studies using verbal autopsies. Int J Epidemiol 1997, 26:1097-1106.
13. Boulle A, Chandramohan D, Weller P: A case study of using artificial neural networks for classifying cause of death from verbal autopsy. Int J Epidemiol 2001, 30:515-520.
14. Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62:32-37.
15. Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34:26-31.
16. InterVA: [http://www.interva.net/], Accessed: January 15, 2011.
17. Tensou B, Araya T, Telake DS, Byass P, Berhane Y, Kebebew T, Sanders EJ, Reniers G: Evaluating the InterVA model for determining AIDS mortality from verbal autopsies in the adult population of Addis Ababa. Trop Med Int Health 2010, 15:547-553.
18. Fantahun M, Fottrell E, Berhane Y, Wall S, Högberg U, Byass P: Assessing a new approach to verbal autopsy interpretation in a rural Ethiopian community: the InterVA model. Bull World Health Organ 2006, 84:204-210.
19. Fottrell E, Byass P, Ouedraogo TW, Tamini C, Gbangou A, Sombié I, Högberg U, Witten KH, Bhattacharya S, Desta T, Deganus S, Tornui J, Fitzmaurice AE, Meda N, Graham WJ: Revealing the burden of maternal mortality: a probabilistic model for determining pregnancy-related causes of death from verbal autopsies. Popul Health Metr 2007, 5:1.
20. Bell JS, Ouédraogo M, Ganaba R, Sombié I, Byass P, Baggaley RF, Filippi V, Fitzmaurice AE, Graham WJ: The epidemiology of pregnancy outcomes in rural Burkina Faso. Trop Med Int Health 2008, 13(Suppl 1):31-43.
21. Bell JS, Qomariyah SN: Immpact–tools & methods: selected findings on maternal mortality. Presented at the Impact International Symposium “Delivering Safer Motherhood: Sharing the Evidence” University of Aberdeen, United Kingdom: IMMPACT; 2007.
22. World Health Organization: Deployment at community level of artemether-lumefantrine and rapid diagnostic tests, Raya Valley, Tigray, Ethiopia. Geneva: World Health Organization; 2009.
23. Lemma H, Byass P, Desta A, Bosman A, Costanzo G, Toma L, Fottrell E, Marrast A, Ambachew Y, Getachew A, Mulure N, Morrone A, Bianchi A, Barnabas GA: Deploying artemether-lumefantrine with rapid testing in Ethiopian communities: impact on malaria morbidity, mortality and healthcare resources. Tropical Medicine & International Health 2010, 15:241-250.
24. Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7:e1000325.
25. Fottrell E, Kahn K, Ng N, Sartorius B, Huong DL, Van Minh H, Fantahun M, Byass P: Mortality measurement in transition: proof of principle for standardised multi-country comparisons. Trop Med Int Health 2010, 15:1256-1265.
26. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Popul Health Metr 2010, 8:21.
27. King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statist Sci 2008, 23:78-91.
28. Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4:e327.
29. Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9:27.
30. Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9:28.
31. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:32.
32. Murray CJL, James SL, Birnbaum JK, Freeman MK, Lozano R, Lopez AD, the Population Health Metrics Research Consortium (PHMRC): Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:30.
33. Oti SO, Kyobutungi C: Verbal autopsy interpretation: a comparative analysis of the InterVA model versus physician review in determining causes of death in the Nairobi DSS. Popul Health Metr 2010, 8:21.
34. James SL, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metr 2011, 9:31.
35. Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9:29.

doi:10.1186/1478-7954-9-50 Cite this article as: Lozano et al.: Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Population Health Metrics 2011 9:50.
