Department of Medical Sciences, Section of Scientific Research Methods
Research Methodology I Course
RESEARCH METHODOLOGY I SEMINARS
Gutiérrez Villafuerte, César
Lima, March 2018
GUIDE TO THE RESEARCH METHODOLOGY I SEMINARS
CONTENTS
Seminar topics
1. The importance of the history of science.
2. Introduction to ethics in medical research.
3. Observation of reality as a source of research problems.
4. From hypothesis to research design.
5. Two answers to the same problem?
6. Overview of the design, conduct, and reporting of research in medicine.
7. Systematic errors: measurement bias.
8. Case study: Inappropriate measurement of variables.
9. The importance of the operational definition of variables.
10. Inadequate selection of subjects as a source of systematic error.
11. The placebo effect in medical research.
12. The importance of recognizing unexpected results.
13. Generalization of research results.
14. Validity of recommendations given to patients.
RESEARCH METHODOLOGY I
SEMINAR No. 1
THE IMPORTANCE OF THE HISTORY OF SCIENCE. Casadevall A, Fang FC. (A)Historical science. Infect Immun. 2015; 83(12): 4460-4.
Questions for the reading check and group discussion guide
1. In the first paragraph, the authors state that personal histories define individuals, while communal histories define groups and nations. Do you agree with this position? Why?
2. A controversial point raised by the authors is the possibility of knowing a scientific field in depth without knowing how that knowledge was arrived at. Do you see yourself in that position in the future?
3. As you already know, the article notes that scientific communication currently follows the IMRaD format in scientific articles. The authors criticize this convention and argue that it "distorts" history. Although you have not yet read many original articles, what is your opinion of this position?
4. The article gives five reasons why scientists should take an interest in and learn about the history of science. Can you suggest an additional reason? Justify your proposal.
5. The term serendipity appears in several passages of the article. What does this concept imply for scientific research and, in particular, for knowledge of the history of science?
EDITORIAL
(A)Historical Science
Arturo Casadevall (a), Ferric C. Fang (b)
(a) Department of Molecular Microbiology & Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
(b) Departments of Laboratory Medicine and Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
In contrast to many other human endeavors, science pays little attention to its history. Fundamental scientific discoveries are often considered to be timeless and independent of how they were made. Science and the history of science are regarded as independent academic disciplines. Although most scientists are aware of great discoveries in their fields and their association with the names of individual scientists, few know the detailed stories behind the discoveries. Indeed, the history of scientific discovery is sometimes recorded only in informal accounts that may be inaccurate or biased for self-serving reasons. Scientific papers are generally written in a formulaic style that bears no relationship to the actual process of discovery. Here we examine why scientists should care more about the history of science. A better understanding of history can illuminate social influences on the scientific process, allow scientists to learn from previous errors, and provide a greater appreciation for the importance of serendipity in scientific discovery. Moreover, history can help to assign credit where it is due and call attention to evolving ethical standards in science. History can make science better.

"The history of science bores most scientists stiff."
Sir Peter Medawar (1)
One of the unique experiences of being human is to have a history. The ability to recount the past and pass it on to future generations is made possible by the symbolic language unique to our species. Most human history has been conveyed by oral narratives and legends. However, the invention of writing allowed history to acquire a new permanence. Herodotus, who lived in Greece during the 5th century B.C.E., is generally regarded as the first historian who attempted to systematically organize and analyze information. (There are others who regard Thucydides as the first true historian and Herodotus as the “first liar” for getting so many of his facts wrong [2].)

Personal histories define individuals, while communal histories define groups and nations. In some areas of human endeavor, such as law and politics, history is essential for interpreting and understanding the present, and competing versions of history are often critical points of contention. However, science is a human endeavor in which the study of its own history plays a less prominent role. This is evidenced by the scant attention paid to history during the scientific training process, the ahistorical style of most scientific literature, and the separation of science and the history of science as academic disciplines. As part of our exploration of the state of current science that includes descriptive (3), mechanistic (4), important (5), specialized (6), diseased (7), competitive (8), and field (9) science, we now examine the importance of history in the scientific process and the consequences of its neglect.

Dictionaries describe history as a chronological record of significant events, often including an explanation of their causes (10). From such a definition, the history of science would include the Copernican revolution, Newton’s Principia, the Darwin-Wallace theory of evolution, and the theory of relativity.
Major events in the history of science are widely known and well documented, although the intellectual and experimental struggle required for discovery may not be as well appreciated. For example, while all scientists are aware of the Copernican revolution and Galileo’s struggle with the Catholic Church, the scientific arguments made
in favor of a geocentric universe, such as the inability to detect stellar parallax (11), are less common knowledge. Although major scientific discoveries eventually become accepted as fact, the hard-fought struggles to obtain this understanding tend to fade with the passage of time.

Accepted manuscript posted online 14 September 2015
Citation Casadevall A, Fang FC. 2015. (A)Historical science. Infect Immun 83:4460–4464. doi:10.1128/IAI.00921-15.
Editor: A. J. Bäumler
Address correspondence to Arturo Casadevall, acasade1@jhu.edu.
Copyright © 2015, American Society for Microbiology. All Rights Reserved.
The views expressed in this Editorial do not necessarily reflect the views of the journal or of ASM.
Infection and Immunity, December 2015, Volume 83, Number 12 (iai.asm.org)

Why do most scientists ignore the history of science? Assuming that Sir Peter is correct in saying that “the history of science bores most scientists stiff,” it is perhaps not difficult to explain the limited interest that most scientists take in history. Science by its very nature seeks to push back the boundaries of the unknown; the border between the known and unknown is far more interesting to scientists than what happened in the past. Although most students in the biological sciences learn about the discoveries of Darwin, Mendel, and Watson and Crick, it is fair to say that historical training is not a major part of the undergraduate or graduate science curriculum. Very few scientific fields have an accessible historical literature to supplement scientific training. While some students may have learned additional science history from courses that consider classic papers, most learn the history of their chosen field of study from their laboratory mentor or from review articles that emphasize historical aspects of discovery. Human aspects of scientific discovery, such as scientific rivalries and their effect on science, are generally not discussed in formal articles. Rather, such information is maintained within fields by an oral tradition consisting largely of gossip, anecdote, and rumor. One can master a scientific topic without having the least idea of how the knowledge was obtained. For example, it is possible to describe the central dogma of molecular biology from transcription to translation in excruciating detail without having to mention a single scientist’s name. In this regard, science differs from politics, law, economics, or most social sciences, in which the history of events is essential for understanding the field. For example, it is impossible to understand the state of race relations in the United States without considering the history of slavery, civil war, reconstruction, segregation, and civil rights. In contrast to other intellectual pursuits, science can be viewed as being either privileged or disadvantaged because it has the luxury of neglecting its history.

The scientific literature is deliberately ahistorical. In a lecture titled “Is the Scientific Paper a Fraud?,” Medawar also noted that the format of a conventional scientific paper consisting of an introduction, description of methods, results, and discussion implies a logical inductive process that is completely alien to how most science is actually done (12). Carmody expanded upon this point by observing that research papers not only idealize the scientific process but also drain it of the passion of discovery (13). Perhaps this has always been the case. When Elie Metchnikoff described his discovery of phagocytosis in starfish larvae in a research journal (14), he drily reported the following:

The reactive phenomena ensuing on artificial injuries may be readily observed in the much larger larvae, the Bipinnaria asterigera. . . If a delicate glass tube, a rose-thorn, or a spine of a sea urchin be introduced into one of these larvae, the amoeboid cells of the mesoderm collect around the foreign body in large masses easily visible with the naked eye.

Yet, the historical recollection of the events in his biography (15) paints a much different picture: One day when the whole family had gone to a circus to see some extraordinary performing apes, I remained alone with my microscope, observing the life in the mobile cells of a transparent starfish larva, when a new thought suddenly flashed across my brain. . .I felt so excited that I began striding up and down the room and even went to the seashore in order to collect my thoughts. . .I was too excited to sleep that night in the expectation of the result of my experiment. Medawar’s criticism of the scientific literature resonates in the present day with additional profound and disturbing implications. As any working scientist knows, the process of scientific discovery is messy and often involves dead ends, chance, and being in the right place at the right time. At a minimum, the conventional format of a scientific paper distorts history by creating a narrative for scientific discovery that is different from what actually occurred. Howitt and Wilson recently revisited the question of whether writing a scientific paper in the current accepted style was itself a fraudulent act. These authors concluded that “doing science and communicating science are quite different things” and noted that little had changed since Medawar’s provocative essay (16). Perhaps it is of even greater concern that the “winner take all” reward system of science and the pressure to demonstrate novelty may create perverse incentives for authors to overemphasize the novelty of their own work and fail to appropriately cite the contributions of others or selectively cite publications that support their conclusions (17, 18). Such historical neglect, whether inadvertent or purposeful, can misrepresent and bias the scientific record.
In some respects, it is an advantage that science can convey its subject matter without having to consider history. This means that science, unlike other disciplines (or the legal system [19]), is not shackled to the misinterpretations of the past. While history demands that facts be interpreted in context, scientists are wary of interpretations that are difficult to validate or falsify. Instead, untethered scientific knowledge is independent of history and can serve as a platform for further research. Scientists do not need to consider the contentious emergence of the heliocentric theory to accurately deliver probes to Mars, Ceres, and Pluto.

However, there are significant costs when science neglects its history. The history of science is replete with instances in which facts and research were forgotten and later rediscovered. For example, the changes in cross-striated muscle during contraction were known in the 19th century but forgotten, only to be rediscovered in the mid-20th century (20). The vertical occipital fasciculus was described by the neuroanatomist Wernicke in 1881 but later disputed and forgotten until the recent work of Wandell and colleagues (21). Moreover, scientists who are concerned with only the facts and not the process miss out on the rich human drama of perseverance, serendipity, inventiveness, and conflict that characterizes the history of science. It is often such details that are most interesting to a nonspecialist, which in turn facilitates teaching and the engagement of the general public with science. The omission of the history of discovery from scientific papers may thus serve to perpetuate the barrier between scientists and the public whom they serve and depend upon for support. To neglect history and accept the scientific literature as record is in fact to embrace a false narrative. The absence of a historical perspective of science can create a disconnect between perception and reality.
In a seminal essay (22), Brush jokingly suggested that the history of science should be “X-rated” because

young and impressionable students at the start of a scientific career should be shielded from the writings of contemporary science historians. . . because of violence to the professional ideal and public image of scientists as rational, open-minded investigators, proceeding methodically, grounded incontrovertibly in the outcome of controlled experiments, and seeking objectively for the truth.

However, the serious subtext of this statement is that “the history of science may be used to challenge the supposedly truth-seeking nature of science” (23). This is a devastating criticism because it implies that scientists who ignore the discrepancies between the real and idealized views of science may also undermine their legitimacy as objective and trustworthy authorities on the realities of the natural world.

Why scientists should care about the history of science. The history of science is important because it highlights the ingenuity of earlier scientists and provides a map to connect current pathways of discovery with the past. To this, we add five reasons why scientists should pay greater attention to history.

(i) Science is influenced by historical and social factors. The great pathologist Rudolf Virchow rejected the germ theory of disease because his passionate concern for social justice led him to attribute infectious diseases to poverty rather than to microbes. He actually had a point, but this example shows how science is not a purely objective endeavor that stands apart from society but rather that science and culture profoundly influence each other.
This is most readily appreciated from a historical perspective. The historian may also be able to appreciate broad historical trends that are inapparent to a scientist. For example, the British philosopher Stephen Toulmin has written of the “Alexandrian Trap,” in which scientists in the 1st and 2nd centuries C.E. became increasingly specialized and focused on technology, losing sight of the bigger questions (24). Historians can help scientists to avoid this conceptual trap in the modern era by illuminating the grand arc of scientific discovery and the importance of basic research.

(ii) History allows scientists to learn from previous errors. Errors are an inescapable part of science (25). The history of science can help to show how investigators may be led astray and how the process of discovery can be improved. The historian James Atkinson has observed that scientists pay little attention to “the experiments that failed, the approaches that did not work out, the speculations without sound empirical support, and the metaphysical underpinnings of the work that did not appear in print” (26). However, such failures are the purview of historians, and scientists can learn a great deal from their insights.

(iii) A historical perspective provides a greater appreciation of how discoveries occur. Kuhn’s seminal work on scientific revolutions used history to understand how discoveries occur and come to be accepted (27). In fact, history is essential for understanding how science advances, but the scientific literature does a poor job of documenting critical events in the process of discovery. For example, scientific papers seldom mention the critical role of chance in discovery. As a case in point, we consider the association of Helicobacter pylori with peptic ulcer disease, a discovery that changed the treatment of this common disease and was recognized by the Nobel Prize in Physiology or Medicine in 2005.
In their landmark paper, Marshall and Warren paid tribute to the role of serendipity in a single sentence: “At first plates were discarded after 2 days, but when the first positive plate was noted after it had been left in the incubator for 6 days during the Easter holiday, cultures were done for 4 days” (28). Other than this casual reference to the religious calendar, the role of chance is not mentioned elsewhere in the paper. Marshall later acknowledged that prolonged incubation due to the holiday was a critical event leading to their landmark discovery. Decades of observations had suggested the presence of bacteria in stomach lesions, but these observations could not be validated experimentally because the slow-growing organism had not been successfully cultivated. The ability to grow H. pylori from stomach tissue allowed Marshall to establish causality in his now-famous self-experimentation that fulfilled Koch’s postulates. A greater appreciation of the role of chance and serendipity in discovery (29) could eventually result in reforms to promote transformative curiosity-driven research as opposed to an exclusive emphasis on hypothesis-driven and translational forms of research (30, 31). (iv) History can give credit where it is due. Many alternative histories of science may emerge when scientists compete for rewards such as positions, prizes, and funding. Consider the discovery of the antibiotic streptomycin. Scientific papers tell us the origin of the compound, the properties of the molecule, and the spectrum of antimicrobial activity. However, underlying these cold facts is the struggle of a junior partner, Albert Schatz, for recognition and the efforts by a senior partner, Selman Waksman, to deny him that credit (32–34). Although the discovery of streptomycin was honored with a Nobel Prize, the committee never considered the contribution of Schatz, the graduate student who
actually made the discovery while working in a basement laboratory. We have previously argued that the Nobel Prize often assigns disproportionate credit to certain individuals while neglecting the contributions of others (35), and the Schatz-Waksman controversy is but one example. As professional recognition is the currency of science, history can play an invaluable role in setting the record straight. (v) History reveals evolving ethical standards in science. The history of science is essential for teaching about ethical behavior in science. The sanitized literature of scientific discovery often fails to detail ethical considerations, and it is striking to consider how scientific ethical standards have evolved over time. History has allowed us to see how Pasteur’s human trials, the Tuskegee and Guatemalan syphilis experiments, and the unauthorized appropriation of Henrietta Lacks’ cells are now considered ethical transgressions (36–39), which underscores that the obligations of science to society must undergo continuing reevaluation to ensure that science remains a force for good in the world. How to bring more history to science. We conclude by making a few recommendations to enhance the awareness of history among scientists. (i) Recognizing science historians. The scientific culture currently rewards priority and importance in discovery (5, 40), but there is little recognition for those who chronicle and interpret the human stories behind those discoveries. Although historians of science are recognized within their own field, they are too often regarded as curiosities by scientists. Scientific recognition that science historians and journalists have a critical role in the scientific enterprise will help to elevate the value of history in science and encourage students to take an interest in these fields. (ii) Promoting history in scientific societies. 
Many scientific organizations, such as the American Society for Microbiology, contain groups that are focused on history, such as the Center for the History of Microbiology/ASM Archives (CHOMA). Such groups play a critical role in preserving the past and are largely maintained by a dedicated set of history-minded individuals. The efforts of such groups should be encouraged, supported, and made more visible. Meetings, conferences, and publications provide ample opportunities to provide historical perspectives on key scientific topics and ensure continuity between the scientific past and present. Science historians and scientists alike could benefit from greater interaction and cross-fertilization. (iii) Promoting history in scientific courses and literature. The history of science can be a powerful tool to teach and promote science. In the early 20th century, Paul De Kruif’s Microbe Hunters helped to inspire a generation of scientists to pursue problems in microbiology (41). One mechanism to enhance the appreciation of the history of science is to combine historical aspects of discovery with the didactic presentation of scientific information. For example, a course on nucleic acids could be supplemented by historical readings on the subject and include such material as Watson’s The Double Helix: a Personal Account of the Discovery of the Structure of DNA (42), Judson’s The Eighth Day of Creation: Makers of the Revolution in Biology (43), and Edwin Chargaff’s reminiscences on the critical discoveries that first elucidated DNA structure (44). The injection of history, with its inevitable human foibles and drama, can add interest to any course and help to stimulate discussions about how discoveries come about and what
constitutes ethical behavior. Similarly, journals could encourage more historical articles, perhaps pairing historians with scientists to document the process of discovery and encourage interactions between these disciplines. Placing new findings in the context of historical questions and discoveries can help make science more interesting to the general public. Nonscientists are often more engaged by the human history of discovery than by stark scientific facts. A greater emphasis on the historical process of discovery could also enliven courses, journal clubs, seminars, and scientific papers. (iv) Assuring historical accuracy in scientific publications. The scientific literature has been highly formulaic for many decades. In contrast to the papers of the early 20th century, which often provided considerable background on the problems being addressed, publications today are terse and often limited in word number and the space that they can occupy in journals. As research publications are increasingly accessible in electronic format, space limitations have become less of a concern. This should allow journals to relax restrictions on word counts that prevent historical discussions and lead to inadequate citation of the relevant literature. Given that citations are increasingly used as a measure of scientific impact, removing artificial restrictions on reference list length will help to ensure that authors are appropriately credited for their work. Perhaps some journals could introduce a small “serendipity box” where authors could tell the reader how a particular discovery came about. For example, although the role of serendipity in the discovery of phenotypic switching in Cryptococcus neoformans (45) was briefly alluded to in the paper, more could have been said. 
For that paper, the serendipity box might have stated: This project began when strange colony morphologies were observed on agar plated with a liquid culture that had been inadvertently forgotten in a walk-in refrigerator. Although contamination was initially suspected, the colonies were shown to be C. neoformans, which prompted a search for the conditions that promoted such phenomena. The precedent of phenotypic switching in Candida albicans led the authors to specifically test whether the unusual morphologies represented a similar mechanism in C. neoformans. Those few words pay tribute to the importance of serendipity and chance and provide a truthful account of how the finding came to be recognized that also acknowledges critical prior observations made with Candida albicans. This anecdote illustrates Pasteur’s quote that “chance favors the prepared mind,” since the knowledge of the phenomenon in another system encouraged pursuit of the observation. There is a strong lore in microbiology about forgotten culture plates leading to discovery. We note that culture plates kept past their time led to Nobel prizes for the discoveries of penicillin and Helicobacter pylori. Perhaps the role of serendipity is minimized in today’s literature because it is contrary to the prevailing hypothesis-driven models of discovery, and giving credit to chance takes it away from the investigators. In fact, investigators often acknowledge the role of serendipity in discovery once a finding is accepted as important and credit is assured. It is time for the scientific literature to more truthfully represent the process of discovery and to reinforce the notion that honesty is essential to the quest for truth in science.
Science is more than a disembodied collection of facts. It is a uniquely human construct, a detailed and interconnected understanding of the natural world based on innumerable observations and contributions from individuals spanning thousands of years. History can help to keep science honest, with a keen sense of where it has been and where it is going. As Darwin observed, “Great is the power of steady misrepresentation— but the history of science shows how, fortunately, this power does not endure long” (46). REFERENCES 1. Medawar PB. 1996. The strange case of the spotted mice and other classic essays on science. Oxford University Press, Oxford, United Kingdom. 2. Momigliano A. 1958. The place of Herodotus in the history of historiography. History 43:1–13. http://dx.doi.org/10.1111/j.1468-229X.1958.tb02501.x. 3. Casadevall A, Fang FC. 2008. Descriptive science. Infect Immun 76: 3835–3836. http://dx.doi.org/10.1128/IAI.00743-08. 4. Casadevall A, Fang FC. 2009. Mechanistic science. Infect Immun 77: 3517–3519. http://dx.doi.org/10.1128/IAI.00623-09. 5. Casadevall A, Fang FC. 2009. Important science—it’s all about the SPIN. Infect Immun 77:4177– 4180. http://dx.doi.org/10.1128/IAI.00757-09. 6. Casadevall A, Fang FC. 2014. Specialized science. Infect Immun 82:1355– 1360. http://dx.doi.org/10.1128/IAI.01530-13. 7. Casadevall A, Fang FC. 2014. Diseased science. Microbe Mag 9:390 –392. 8. Fang FC, Casadevall A. 2015. Competitive science: is competition ruining science? Infect Immun 83:1229 –1233. http://dx.doi.org/10.1128/IAI .02939-14. 9. Casadevall A, Fang FC. 2015. Field science—the nature and utility of scientific fields. mBio 6:e01259-15. http://dx.doi.org/10.1128/mBio .01259-15. 10. Merriam-Webster. http://www.merriam-webster.com/dictionary/history. 11. Hirshfeld AW. 2001. Parallax: the race to measure the cosmos. W.H. Freeman, New York, NY. 12. Medawar P. 1963. Is the scientific paper a fraud? Listener 70:377–378. 13. Carmody J. 2001. Celebrating science. 
Nature 412:383. http://dx.doi.org /10.1038/35086659. 14. Metchnikoff E. 1893. Lectures on the comparative pathology of inflammation, Kegan Paul. Trench, Trübner & Co., Ltd., London, United Kingdom. 15. Metchnikoff O. 1921. Life of Elie Metchnikoff, 1845–1916. Constable, London, United Kingdom. 16. Howitt SM, Wilson AN. 2014. Revisiting “Is the scientific paper a fraud?”: the way textbooks and scientific research articles are being used to teach undergraduate students could convey a misleading image of scientific research. EMBO Rep 15:481– 484. http://dx.doi.org/10.1002/embr.201338302. 17. Committee on Science Engineering and Public Policy. 1995. On being a scientist: responsible conduct of research. National Academy Press, Washington, DC. 18. Sawin VI, Robinson KA. 16 June 2015. Biased and inadequate citation of prior research in reports of cardiovascular trials is a continuing source of waste in research. J Clin Epidemiol http://dx.doi.org/10.1016/j.jclinepi .2015.03.026. 19. Cornell University Law School. Stare decisis. Cornell University Law School, Ithaca, NY. https://www.law.cornell.edu/wex/stare_decisis. 20. Galler S. 2015. Forgotten research from 19th century: science should not follow fashion. J Muscle Res Cell Motil 36:5–9. http://dx.doi.org/10.1007 /s10974-014-9399-4. 21. Yeatman JD, Weiner KS, Pestilli F, Rokem A, Mezer A, Wandell BA. 2014. The vertical occipital fasciculus: a century of controversy resolved by in vivo measurements. Proc Natl Acad Sci U S A 111:E5214 –E5223. http: //dx.doi.org/10.1073/pnas.1418503111. 22. Brush SG. 1974. Should the history of science be rated X?: the way scientists behave (according to historians) might not be a good model for students. Science 183:1164 –1172. http://dx.doi.org/10.1126/science.183.4130.1164. 23. Burian RM. 1977. More than a marriage of convenience: on the inextricability of history and philosophy of science. Philos Sci 44:1– 42. http://dx .doi.org/10.1086/288722. 24. Toulmin S. 1974. 
The Alexandrian trap. Encounter 42:61–72. 25. Casadevall A, Steen RG, Fang FC. 2014. Sources of error in the retracted scientific literature. FASEB J 28:3847–3855. http://dx.doi.org/10.1096/fj .14-256735.
Infection and Immunity
iai.asm.org
4463
Editorial
26. Atkinson JW. 1979. The importance of the history of science to the American Society of Zoologists. Am Zoologist 19:1243–1246. 27. Kuhn TA. 1962. The structure of scientific revolutions. University of Chicago Press, Chicago, IL. 28. Marshall BJ, Warren JR. 1984. Unidentified curved bacilli in the stomach of patients with gastritis and peptic ulceration. Lancet i:1311–1315. 29. Meyers MA. 1995. Glen W. Hartman lecture. Science, creativity, and serendipity. Am J Roentgenol 165:755–764. 30. Fang FC, Casadevall A. 2010. Lost in translation— basic science in the era of translational research. Infect Immun 78:563–566. http://dx.doi.org/10 .1128/IAI.01318-09. 31. Botstein D. 2012. Why we need more basic biology research, not less. Mol Biol Cell 23:4160 – 4161. http://dx.doi.org/10.1091/mbc.E12-05-0406. 32. Lawrence PA. 2002. Rank injustice. Nature 415:835– 836. http://dx.doi .org/10.1038/415835a. 33. Schatz A, Robinson KA. 1993. The true story of the discovery of streptomycin. Actinomycetes 4:27–39. 34. Pringle P. 2012. Experiment eleven: deceit and betrayal in the discovery of the cure for tuberculosis. Bloomsbury UK, London, United Kingdom. 35. Casadevall A, Fang FC. 2013. Is the Nobel Prize good for science? FASEB J 27:4682– 4690. http://dx.doi.org/10.1096/fj.13-238758. 36. Jones JH. 1981. Bad blood: the Tuskegee syphilis experiment. The Free Press, New York, NY.
37. Geison GL. 1995. The private science of Louis Pasteur. Princeton University Press, Princeton, NJ.
38. Skloot R. 2010. The immortal life of Henrietta Lacks. Broadway Books, New York, NY.
39. Rodriguez MA, Garcia R. 2013. First, do no harm: the US sexually transmitted disease experiments in Guatemala. Am J Public Health 103:2122–2126. http://dx.doi.org/10.2105/AJPH.2013.301520.
40. Casadevall A, Fang FC. 2012. Winner takes all. Sci Am 307:13. http://dx.doi.org/10.1038/scientificamerican0812-13.
41. De Kruif P. 1926. Microbe hunters. Harcourt, Brace and Co., New York, NY.
42. Watson JD. 1968. The double helix: a personal account of the discovery of the structure of DNA. Atheneum, New York, NY.
43. Judson HF. 1979. The eighth day of creation: makers of the revolution in biology. Simon and Schuster, New York, NY.
44. Chargaff E. 1974. Building the tower of babble. Nature 248:776–779. http://dx.doi.org/10.1038/248776a0.
45. Goldman DL, Fries BC, Franzot SP, Montella L, Casadevall A. 1998. Phenotypic switching in the human pathogenic fungus Cryptococcus neoformans is associated with changes in virulence and pulmonary inflammatory response in rodents. Proc Natl Acad Sci U S A 95:14967–14972. http://dx.doi.org/10.1073/pnas.95.25.14967.
46. Darwin C. 1859. On the origin of species by means of natural selection. John Murray, London, United Kingdom.
SEMINAR No. 2
INTRODUCTION TO ETHICS IN MEDICAL RESEARCH. Emanuel EJ, Wendler D, Grady C. What makes clinical research ethical? JAMA. 2000; 283(20): 2701-11.
Questions for the reading quiz and group discussion guide
1. The authors note that some of the main international normative documents on the ethics of research involving human subjects were developed, in part, as responses to specific events (such as the Tuskegee and Willowbrook cases in relation to the Belmont Report). What is your opinion of this way of proceeding, in place since the mid-20th century? (reactive vs. preventive)
2. The first ethical requirement the authors set out for all clinical research is that it have social or scientific value. What criteria would you propose for making this assessment? That is, how should one decide which studies are scientifically and/or socially relevant?
3. The article criticizes the utilitarian approach that can arise when evaluating the risk-benefit ratio in clinical research, when potential social benefits are weighed against individual risks. How do you think this situation should be approached?
4. Two cases are described in which compliance with the ethical requirements proposed by the authors is analyzed in clinical research. In this regard:
a. Can you describe a situation in which the use of placebo as a control in a clinical trial is justified?
b. If a study is carried out in a low-income country and the intervention is shown to be effective, how should marketing of and access to the drug then proceed, precisely in low-income countries?
Additional questions to be discussed in the group session
1. You work at a specialized medical facility where a clinical trial is under way in patients with a severe disease. An acquaintance of yours has that condition, and you speak with the study's principal investigator so that your acquaintance can be included in the trial's “active treatment” group. This is a hypothetical situation, but if it really arose, would you act this way?
2. An outbreak of a highly virulent disease has occurred in a city, making it necessary to identify carriers as a control measure. An outbreak investigation team is therefore assigned to carry out the study. During fieldwork, several people do not consent to take part in the study (which means not giving a blood sample to rule out the disease). Faced, on the one hand, with respect for individual decisions and, on the other, with the potential spread of the outbreak, what decision would you make in this case?
SPECIAL COMMUNICATION
What Makes Clinical Research Ethical?
Ezekiel J. Emanuel, MD, PhD
David Wendler, PhD
Christine Grady, PhD
What makes research involving human subjects ethical? Informed consent is the answer most US researchers, bioethicists, and institutional review board (IRB) members would probably offer. This response reflects the preponderance of existing guidance on the ethical conduct of research and the near obsession with autonomy in US bioethics.1-4 While informed consent is necessary in most but not all cases, in no case is it sufficient for ethical clinical research.5-8 Indeed, some of the most contentious contemporary ethical controversies in clinical research, such as clinical research in developing countries,9-13 the use of placebos,14-16 phase 1 research,17-19 protection for communities,20-24 and involvement of children,25-29 raise questions not of informed consent, but of the ethics of subject selection, appropriate risk-benefit ratios, and the value of research to society. Since obtaining informed consent does not ensure ethical research, it is imperative to have a systematic and coherent framework for evaluating clinical studies that incorporates all relevant ethical considerations. In this article, we delineate 7 requirements that provide such a framework by synthesizing traditional codes, declarations, and relevant literature on the ethics of research with human subjects. This framework should help guide the ethical development and evaluation of clinical studies by investigators, IRB members, funders, and others.
Many believe that informed consent makes clinical research ethical. However, informed consent is neither necessary nor sufficient for ethical clinical research. Drawing on the basic philosophies underlying major codes, declarations, and other documents relevant to research with human subjects, we propose 7 requirements that systematically elucidate a coherent framework for evaluating the ethics of clinical research studies: (1) value—enhancements of health or knowledge must be derived from the research; (2) scientific validity—the research must be methodologically rigorous; (3) fair subject selection—scientific objectives, not vulnerability or privilege, and the potential for and distribution of risks and benefits, should determine communities selected as study sites and the inclusion criteria for individual subjects; (4) favorable risk-benefit ratio—within the context of standard clinical practice and the research protocol, risks must be minimized, potential benefits enhanced, and the potential benefits to individuals and knowledge gained for society must outweigh the risks; (5) independent review—unaffiliated individuals must review the research and approve, amend, or terminate it; (6) informed consent—individuals should be informed about the research and provide their voluntary consent; and (7) respect for enrolled subjects—subjects should have their privacy protected, the opportunity to withdraw, and their well-being monitored. Fulfilling all 7 requirements is necessary and sufficient to make clinical research ethical. These requirements are universal, although they must be adapted to the health, economic, cultural, and technological conditions in which clinical research is conducted.
JAMA. 2000;283:2701-2711
THE 7 ETHICAL REQUIREMENTS
The overarching objective of clinical research is to develop generalizable knowledge to improve health and/or increase understanding of human biology30,31; subjects who participate are the means to securing such knowledge.32 By placing some people at risk of harm for the good of others, clinical research has the potential for exploitation of human subjects.33,34 Ethical requirements for clinical research aim to minimize the possibility of exploitation by ensuring that research subjects are not merely used but are treated with respect while they contribute to the social good.30
©2000 American Medical Association. All rights reserved.
Author Affiliations: Department of Clinical Bioethics, Warren G. Magnuson Clinical Center, National Institutes of Health, Bethesda, Md.
Corresponding Author and Reprints: Christine Grady, PhD, Warren G. Magnuson Clinical Center, Bldg 10, Room 1C118, National Institutes of Health, Bethesda, MD 20892-1156 (e-mail: cgrady@nih.gov).
For the past 50 years, the main sources of guidance on the ethical conduct of clinical research have been the Nuremberg Code,35 Declaration of Helsinki,36 Belmont Report,37 International Ethical Guidelines for Biomedical Research Involving Human Subjects,38 and similar documents (TABLE 1). However, many of these documents were written in response to specific events and to avoid future scandals.50,51 By focusing on the instigating issues, these guidelines tend to
(Reprinted) JAMA, May 24/31, 2000—Vol 283, No. 20
Downloaded from www.jama.com at Dartmouth College, on September 19, 2006
ETHICAL REQUIREMENTS FOR CLINICAL RESEARCH
emphasize certain ethical requirements while eliding others. For instance, the Nuremberg Code35 was part of the judicial decision condemning the atrocities of the Nazi physicians and so focused on the need for consent and a favorable risk-benefit ratio but makes no mention of fair subject selection or independent review. The Declaration of Helsinki36 was developed to remedy perceived lacunae in the Nuremberg Code, especially as related to physicians conducting research with patients, and so focuses on favorable risk-benefit ratio and independent review; the Declaration of Helsinki also emphasizes a distinction between therapeutic and nontherapeutic research that is rejected or not noted by other documents.30,52 The Belmont Report37 was meant to provide broad principles that could be used to generate specific rules and regulations in response to US research scandals such as Tuskegee53 and Willowbrook.54,55 It focuses on informed consent, favorable risk-benefit ratio, and the need to ensure that vulnerable populations are not targeted for risky research. The Council for International Organizations of Medical Sciences (CIOMS) guidelines38 were intended to apply the Declaration of Helsinki “in developing countries . . . [particularly for]
Table 1. Selected Guidelines on the Ethics of Biomedical Research With Human Subjects*
Guideline | Source | Year and Revisions
Fundamental
Nuremberg Code35 | Nuremberg Military Tribunal decision in United States v Brandt | 1947
Declaration of Helsinki36 | World Medical Association | 1964, 1975, 1983, 1989, 1996
Belmont Report37 | National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research | 1979
International Ethical Guidelines for Biomedical Research Involving Human Subjects38 | Council for International Organizations of Medical Sciences in collaboration with World Health Organization | Proposed in 1982; revised, 1993
Other
45 CFR 46, Common Rule8 | US Department of Health and Human Services (DHHS) and other US federal agencies | DHHS guidelines in 1981; Common Rule, 1991
Guidelines for Good Clinical Practice for Trials on Pharmaceutical Products42 | World Health Organization | 1995
Good Clinical Practice: Consolidated Guidance44 | International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use | 1996
Convention on Human Rights and Biomedicine43 | Council of Europe | 1997
Guidelines and Recommendations for European Ethics Committees45 | European Forum for Good Clinical Practice | 1997
Medical Research Council Guidelines for Good Clinical Practice in Clinical Trials46 | Medical Research Council, United Kingdom | 1998
Guidelines for the Conduct of Health Research Involving Human Subjects in Uganda47 | Uganda National Council for Science and Technology | 1998
Ethical Conduct for Research Involving Humans48 | Tri-Council Working Group, Canada | 1998
National Statement on Ethical Conduct in Research Involving Humans49 | National Health and Medical Research Council, Australia | 1999
*CFR indicates Code of Federal Regulations. More extensive lists of international guidelines on human subjects research can be found in Brody39 and Fluss.40 An extensive summary of US guidelines can be found in Sugarman et al.41
large-scale trials of vaccines and drugs.” The CIOMS guidelines lack a separate section devoted to risk-benefit ratios, although the council considers this issue in commentary on other guidelines. They also include a section on compensation for research injuries not found in other documents. Because the Advisory Committee on Human Radiation Experiments was responding to covert radiation experiments, avoiding deception was among its 6 ethical standards and rules; most other major documents do not highlight this.56 This advisory committee claims that its ethical standards are general, but acknowledges that its choices were related to the specific circumstances that occasioned the report.56 Finally, some tensions, if not outright contradictions, exist among the provisions of the various guidelines.5,19,30,51,52,57,58 Absent a universally applicable ethical framework, investigators, IRB members, funders, and others lack coherent guidance on determining whether specific clinical research protocols are ethical. There are 7 requirements that provide a systematic and coherent framework for determining whether clinical research is ethical (TABLE 2). These requirements are listed in chronological order from the conception of the research to its formulation and implementation. They are meant to guide the ethical development, implementation, and review of individual clinical protocols. These 7 requirements are intended to elucidate the ethical standards specific for clinical research and assume general ethical obligations, such as intellectual honesty and responsibility. 
While none of the traditional ethical guidelines on clinical research explicitly includes all 7 requirements, these requirements systematically elucidate the fundamental protections embedded in the basic philosophy of all these documents.30 These requirements are not limited to a specific tragedy or scandal or to the practices of researchers in 1 country; they are meant to be universal, although their application will require adaptation to particular cultures, health conditions, and economic settings. These
7 requirements can be implemented well or ineffectively. However, their systematic delineation is important and conceptually prior to the operation of an enforcement mechanism. We need to know what to enforce.
Value
To be ethical, clinical research must be valuable,4,35 meaning that it evaluates a diagnostic or therapeutic intervention that could lead to improvements in health or well-being; is a preliminary etiological, pathophysiological, or epidemiological study to develop such an intervention; or tests a hypothesis that can generate important knowledge about structure or function of human biological systems, even if that knowledge does not have immediate practical ramifications.4,30 Examples of research that would not be socially or
scientifically valuable include clinical research with nongeneralizable results, a trifling hypothesis, or substantial or total overlap with proven results.4 In addition, research with results unlikely to be disseminated or in which the intervention could never be practically implemented even if effective is not valuable.12,13,38,59 Only if society will gain knowledge, which requires sharing results, whether positive or negative, can exposing human subjects to risk in clinical research be justified. Thus, evaluation of clinical research should ensure that the results will be disseminated, although publication in peer-reviewed journals need not be the primary or only mechanism. There are 2 fundamental reasons why social, scientific, or clinical value should be an ethical requirement: responsible use of finite resources and avoidance of
exploitation.4 Research resources are limited. Even if major funding agencies could fund all applications for clinical research, doing so would divert resources from other worthy social pursuits. Beyond not wasting resources, researchers should not expose human beings to potential harms without some possible social or scientific benefit.4,30,35,38 It is possible to compare the relative value of different clinical research studies; clinical research that is likely to generate greater improvements in health or well-being given the condition being investigated, the state of scientific understanding, and the feasibility of implementing the intervention is of higher value. Comparing relative value is integral to determinations of funding priorities when allocating limited funds among alternative research proposals.60 Similarly, a comparative evaluation of value may be necessary in considering studies involving finite scientific resources such as limited biological material or the small pool of long-term human immunodeficiency virus nonprogressors.
Table 2. Seven Requirements for Determining Whether a Research Trial Is Ethical*
Requirement | Explanation | Justifying Ethical Values | Expertise for Evaluation
Social or scientific value | Evaluation of a treatment, intervention, or theory that will improve health and well-being or increase knowledge | Scarce resources and nonexploitation | Scientific knowledge; citizen’s understanding of social priorities
Scientific validity | Use of accepted scientific principles and methods, including statistical techniques, to produce reliable and valid data | Scarce resources and nonexploitation | Scientific and statistical knowledge; knowledge of condition and population to assess feasibility
Fair subject selection | Selection of subjects so that stigmatized and vulnerable individuals are not targeted for risky research and the rich and socially powerful not favored for potentially beneficial research | Justice | Scientific knowledge; ethical and legal knowledge
Favorable risk-benefit ratio | Minimization of risks; enhancement of potential benefits; risks to the subject are proportionate to the benefits to the subject and society | Nonmaleficence, beneficence, and nonexploitation | Scientific knowledge; citizen’s understanding of social values
Independent review | Review of the design of the research trial, its proposed subject population, and risk-benefit ratio by individuals unaffiliated with the research | Public accountability; minimizing influence of potential conflicts of interest | Intellectual, financial, and otherwise independent researchers; scientific and ethical knowledge
Informed consent | Provision of information to subjects about purpose of the research, its procedures, potential risks, benefits, and alternatives, so that the individual understands this information and can make a voluntary decision whether to enroll and continue to participate | Respect for subject autonomy | Scientific knowledge; ethical and legal knowledge
Respect for potential and enrolled subjects | Respect for subjects by (1) permitting withdrawal from the research; (2) protecting privacy through confidentiality; (3) informing subjects of newly discovered risks or benefits; (4) informing subjects of results of clinical research; (5) maintaining welfare of subjects | Respect for subject autonomy and welfare | Scientific knowledge; ethical and legal knowledge; knowledge of particular subject population
*Ethical requirements are listed in chronological order from conception of research to its formulation and implementation.
Scientific Validity
To be ethical, valuable research must be conducted in a methodologically rigorous manner.4 Even research asking socially valuable questions can be designed or conducted poorly and produce scientifically unreliable or invalid results.61 As the CIOMS guidelines succinctly state: “Scientifically unsound research on human subjects is ipso facto unethical in that it may expose subjects to risks or inconvenience to no purpose.”38 For a clinical research protocol to be ethical, the methods must be valid and practically feasible: the research must have a clear scientific objective; be designed using accepted principles, methods, and reliable practices; have sufficient power to definitively test the objective; and offer a plausible data analysis plan.4 In addition, it must be possible to execute the proposed study. Research that uses biased samples, questions, or statistical evaluations, that is underpowered, that neglects critical end points, or that could not possibly enroll sufficient subjects cannot generate valid scientific knowledge and is thus unethical.4,30,62 For example, research with too few subjects is not made valid by the possibility that it might be combined in a meaningful meta-analysis with other, as yet unplanned and unperformed clinical research; the ethics of a clinical research study cannot depend on the research that others might but have not yet done. Of course the development and approval of a valid method is of little use if the research is conducted in a sloppy or inaccurate manner; careless research that produces uninterpretable data is not just a waste of time and resources, it is unethical. Clinical research that compares therapies must have “an honest null hypothesis” or what Freedman called clinical equipoise.30,63 That is, there must be controversy within the scientific community about whether the new intervention is better than standard therapy, including placebo, either because most clinicians and researchers are uncertain about whether the new treatment is better, or because some believe the standard therapy is better while others believe the investigational intervention superior.63 If there exists a consensus about what is the better treatment, there is no null hypothesis, and the research is invalid. In addition, without clinical equipoise, research that compares therapies is unlikely to be of value because the research will not contribute to increasing knowledge about the best therapy, and the risk-benefit ratio is unlikely to be favorable because some of the subjects will receive inferior treatment. Importantly, a “good question” can be approached by good or bad research techniques; bad research methods do not render the question valueless. Thus, the significance of a hypothesis can and should be assessed prior to and independent of the specific research methods. Reviewers should not dismiss a proposal that uses inadequate methods without first considering whether adjustments could make the proposal scientifically valid. The justification of validity as an ethical requirement relies on the same 2 principles that apply to value: limited resources and the avoidance of exploitation.4,30 “Invalid research is unethical because it is a waste of resources as well: of the investigator, the funding agency, and anyone who attends to the research.”4 Without validity the research cannot generate the intended knowledge, cannot produce any benefit, and cannot justify exposing subjects to burdens or risks.50
Fair Subject Selection
The selection of subjects must be fair.30,37,56 Subject selection encompasses decisions about who will be included both through the development of specific inclusion and exclusion criteria and the strategy adopted for recruiting subjects, such as which communities will be study sites and
which potential groups will be approached. There are several facets to this requirement. First, fair subject selection requires that the scientific goals of the study, not vulnerability, privilege, or other factors unrelated to the purposes of the research, be the primary basis for determining the groups and individuals that will be recruited and enrolled.3,30,37 In the past, groups sometimes were enrolled, especially for research that entailed risks or offered no potential benefits, because they were “convenient” or compromised in their ability to protect themselves, even though people from less vulnerable groups could have met the scientific requirements of the study.30,37,53,54 Similarly, groups or individuals should not be excluded from the opportunity to participate in research without a good scientific reason or susceptibility to risk that justifies their exclusion.64 It is important that the results of research be generalizable to the populations that will use the intervention. Efficiency cannot override fairness in recruiting subjects.37 Fairness requires that women be included in the research, unless there is good reason, such as excessive risks, to exclude them.65-69 This does not mean that every woman must be offered the opportunity to participate in research, but it does mean that women as a class cannot be peremptorily excluded. Second, it is important to recognize that subject selection can affect the risks and benefits of the study.70 Consistent with the scientific goals, subjects should be selected to minimize risks and enhance benefits to individual subjects and society. Subjects who are eligible based on the scientific objectives of a study, but are at substantially higher risk of being harmed or experiencing more severe harm, should be excluded from participation.71 Selecting subjects to enhance benefits entails consideration of which subjects will maximize the benefit or value of the information obtained. 
If a potential drug or procedure is likely to be prescribed for women or children if proven safe and effective, then these groups should be
included in the study to learn how the drug affects them.63,66,67 Indeed, part of the rationale for recent initiatives to include more women, minorities, and children in clinical research is to maximize the benefits and value of the study by ensuring that these groups are enrolled.65-67,72,73 It is not necessary to include children in all phases of research. Instead, it may be appropriate to include them only after the safety of the drug has been assessed in adults. Additionally, fair subject selection requires that, as far as possible, groups and individuals who bear the risks and burdens of research should be in a position to enjoy its benefits,12,13,38,59,74 and those who may benefit should share some of the risks and burdens.75 Groups recruited to participate in clinical research that involves a condition to which they are susceptible or from which they suffer are usually in a position to benefit if the research provides a positive result, such as a new treatment. For instance, selection of subjects for a study to test the efficacy of an antimalarial vaccine should consider not only who will best answer the scientific question, but also whether the selected groups will receive the benefits of the vaccine, if proven effective.12,13,37,59,74,76 Groups of subjects who will predictably be excluded as beneficiaries of research results that are relevant to them typically should not assume the burdens so that others can benefit. However, this does not preclude the inclusion of subjects who are scientifically important for a study but for whom the potential products of the research may not be relevant, such as healthy control subjects. 
Fair subject selection should be guided by the scientific aims of the research and is justified by the principles that equals should be treated similarly and that both the benefits and burdens generated by social cooperation and activities such as clinical research should be distributed fairly.3,30,37,38,66,67 This does not mean that individual subjects and members of groups from which they are selected must directly benefit from each clinical research project or that people who are marginalized, stigmatized, powerless, or poor should never be included. Instead, the essence of fairness in human subjects research is that scientific goals, considered in dynamic interaction with the potential for and distribution of risks and benefits, should guide the selection of subjects.
Favorable Risk-Benefit Ratio
Clinical research involves drugs, devices, and procedures about which there is limited knowledge. As a result, research inherently entails uncertainty about the degree of risk and benefits, with earlier phase research having greater uncertainty. Clinical research can be justified only if, consistent with the scientific aims of the study and the relevant standards of clinical practice, 3 conditions are fulfilled: the potential risks to individual subjects are minimized, the potential benefits to individual subjects are enhanced, and the potential benefits to individual subjects and society are proportionate to or outweigh the risks.30,36,37 Assessment of the potential risks and benefits of clinical research by researchers and review bodies typically involves multiple steps. First, risks are identified and, within the context of good clinical practice, minimized “by using procedures which are consistent with sound research design and which do not unnecessarily expose subjects to risk, and whenever appropriate, by using procedures already being performed on the subjects for diagnostic or treatment purposes.”8 Second, potential benefits to individual subjects from the research are delineated and enhanced. Potential benefits focus on the benefits to individual subjects, such as health improvements, because the benefits to society through the generation of knowledge are assumed if the research is deemed to be of value and valid. The specification and enhancement of potential benefits to individual subjects should consider only health-related potential benefits derived from the research.77 Assessment of the research plan should determine if
changes could enhance the potential benefits for individual subjects. For example, consistent with the scientific objectives, tests and interventions should be arranged to increase benefit to subjects. However, extraneous benefits, such as payment, or adjunctive medical services, such as the possibility of receiving a hepatitis vaccine not related to the research, cannot be considered in delineating the benefits compared with the risks; otherwise, simply increasing payment or adding more unrelated services could make the benefits outweigh even the riskiest research. Furthermore, while participants in clinical research may receive some health services and benefits, the purpose of clinical research is not the provision of health services. Services directly related to clinical research are necessary to ensure scientific validity and to protect the well-being of the individual subjects. In the final step, risks and potential benefits of the clinical research interventions to individual subjects are compared. In general, the more likely and/or severe the potential risks, the greater in likelihood and/or magnitude the prospective benefits must be; conversely, research entailing potential risks that are less likely and/or of lower severity can have more uncertain and/or circumscribed potential benefits. If the potential benefits to subjects are proportional to the risks they face, as generally found when evaluating phase 2 and 3 research, then the additional social benefits of the research, assured by the fulfillment of the value and validity requirements, imply that the cumulative benefits of the research outweigh its risks.30 Obviously, the notions of “proportionality” and potential benefits “outweighing” risks are nonquantifiable.37 However, the absence of a formula to determine when the balance of risks and potential benefits is proportionate does not connote that such judgments are inherently haphazard or subjective. 
Instead, assessments of risks and potential benefits to the same individuals can appeal to explicit standards, informed by existing data on the potential types
(Reprinted) JAMA, May 24/31, 2000—Vol 283, No. 20
Downloaded from www.jama.com at Dartmouth College, on September 19, 2006
ETHICAL REQUIREMENTS FOR CLINICAL RESEARCH
of harms and benefits, their likelihood of occurring, and their long-term consequences.37 People routinely make discursively justifiable intrapersonal comparisons of risks and benefits for themselves and even for others, such as children, friends, and employees, without the aid of mathematical formulae.78 An additional evaluation is necessary for any clinical research that presents no potential benefits to individual subjects, such as phase 1 safety, pharmacokinetic, and even some epidemiology research, or when the risks outweigh the potential benefits to individual subjects.72 This determination, which Weijer79 calls a “risk-knowledge calculus,” assesses whether the societal benefits in terms of knowledge justify the excess risks to individual subjects. Determination of when potential social benefits outweigh risks to individual subjects requires interpersonal comparisons that are conceptually and practically more difficult.78 However, policymakers often are required to make these kinds of comparisons, for example when considering whether pollution and its attendant harms to some people are worth the potential benefits of higher employment and tax revenues to others. There is no settled framework for how potential social benefits should be balanced against individual risks. Indeed, the appeal to a utilitarian approach of maximization, as in cost-benefit analysis, is quite controversial both morally and because many risks and benefits of research are not readily quantifiable on commensurable scales.78-82 Nevertheless, these comparisons are made,83 and regulations mandate that investigators and IRBs make them with respect to clinical research. When research risks exceed potential medical benefits to individuals and the benefit of useful knowledge to society, the clinical research is not justifiable.
The requirement for a favorable risk-benefit ratio embodies the principles of nonmaleficence and beneficence, long recognized as fundamental values of clinical research.3,30,36,37 The principle of nonmaleficence states that one ought not to inflict harm on a person.3 This justifies the need to reasonably reduce the risks associated with research. The principle of beneficence “refers to a moral obligation to act for the benefit of others.”3 In clinical research, this translates into the need to enhance the potential benefits of the research for both individual subjects and society.3,30,37 Ensuring that the benefits outweigh the risks is required by the need to avoid the exploitation of subjects.30,37
Independent Review
Investigators inherently have multiple, legitimate interests—interests to conduct high-quality research, complete the research expeditiously, protect research subjects, obtain funding, and advance their careers. These diverse interests can generate conflicts that may unwittingly distort the judgment of even well-intentioned investigators regarding the design, conduct, and analysis of research.84-87 Wanting to complete a study quickly may lead to the use of questionable scientific methods or readily available rather than the most appropriate subjects. Independent review by individuals unaffiliated with the clinical research helps minimize the potential impact of such conflicts of interest.86,88 For some research with few or no risks, independent review may be expedited, but for much of clinical research, review should be done by a full committee of individuals with a range of expertise who have the authority to approve, amend, or terminate a study. Independent review of clinical research is also important for social accountability. Clinical research imposes risks on subjects for the benefit of society. Independent review of a study’s compliance with ethical requirements assures members of society that people who enroll in trials will be treated ethically and that some segments of society will not benefit from the misuse of other human beings. Review also assures people that if they enroll in clinical research, the trial is ethically designed and the risk-benefit ratio is favorable.
In the United States, independent evaluation of research projects occurs through multiple groups including granting agencies, local IRBs, and data and safety monitoring boards.89-91 In other countries, independent review of clinical research is conducted in other ways.
Informed Consent
Of all requirements, none has received as much explication as informed consent.2-4,6,7,19,30-32,35-38 The purpose of informed consent is 2-fold: to ensure that individuals control whether or not they enroll in clinical research and participate only when the research is consistent with their values, interests, and preferences.2,3,30-32,35,37,92-96 To provide informed consent, individuals must be accurately informed of the purpose, methods, risks, benefits, and alternatives to the research; understand this information and its bearing on their own clinical situation; and make a voluntary and uncoerced decision whether to participate.97-99 Each of these elements is necessary to ensure that individuals make rational and free determinations of whether the research trial is consonant with their interests. Informed consent embodies the need to respect persons and their autonomous decisions.2,3,97,98 To enroll individuals in clinical research without their authorization is to treat them merely as a means to purposes and ends they may not endorse and deny them the opportunity to choose what projects they will pursue. Children and adults with diminished mental capacity who are unable to make their own decisions about participating in research nonetheless have interests and values.2,3 For instance, individuals rendered unconscious due to head trauma or a stroke typically retain the interests and values they had just before the accident. Even individuals with severe Alzheimer disease retain some interests, if only those related to personal dignity and physical comfort. Showing respect for these nonautonomous persons means ensuring that research participation is consistent with their interests and values; this
usually entails empowering a proxy decision maker to determine whether to enroll the person in clinical research. In making this decision, the proxy uses the substituted judgment standard: what research decision would the subject make if he or she could.2,3,100 However, an individual’s preferences and values related to clinical research may be unknown or unknowable, or, in the case of children, the individual may not have developed mature preferences related to research. In such cases, research proxies should choose the option that is in the individual’s best medical interests. There is controversy about how much discretion proxies should have in such circumstances, especially given the inherent uncertainty of the risks and potential benefits of research participation.101-105 The National Bioethics Advisory Commission has urged that proxies should exercise “great caution” in making judgments about a subject’s best interest regarding research.103 Other groups believe that proxies should have more discretion. In emergency settings that preclude time for identifying and eliciting the consent of a proxy decision maker, research can proceed without either informed consent or permission of proxy decision makers when conducted under strict guidelines.6 Most importantly, there should be clinical equipoise, the absence of a consensus regarding the comparative merits of the interventions to be tested.63 In such a case, the subject is not worse off by enrolling.
Respect for Potential and Enrolled Subjects
Ethical requirements for clinical research do not end when individuals either sign the consent form and are enrolled or refuse enrollment.106 Individuals must continue to be treated with respect from the time they are approached, even if they refuse enrollment, throughout their participation and even after their participation ends. Respecting potential and enrolled subjects entails at least 5 different activities. First, since substantial information will be collected about potential as well as enrolled subjects, their privacy must be respected by managing the information in accordance with confidentiality rules. Second, respect includes permitting subjects to change their mind, to decide that the research does not match their interests, and to withdraw without penalty. Third, in the course of clinical research new information about the effect of the intervention or the subject’s clinical condition may be gained. Respect requires that enrolled subjects be provided with this new information. For instance, when informed consent documents are modified to include additional risks or benefits discovered in the course of research, subjects already enrolled should be informed. Fourth, the welfare of subjects should be carefully monitored throughout their research participation. If subjects experience adverse reactions, untoward events, or changes in clinical status, they should be provided with appropriate treatment and, when necessary, removed from the study. Finally, to recognize subjects’ contribution to clinical research, there should be some mechanism to inform them of what was learned from the research. For commentators used to thinking about respect in terms of privacy and confidentiality alone, these different activities may seem a haphazard agglomeration of informed consent, confidentiality, and other protections. In fact, this requirement integrates into a coherent framework actions the commonality of which often goes unrecognized. As such, it reminds investigators, subjects, IRB members, and others that respect for subjects requires the respectful treatment of individuals who choose not to enroll and the careful ongoing monitoring of those who do, in addition to ensuring the privacy and confidentiality of enrolled subjects.
This requirement emphasizes that the ethics of clinical research do not end with the signing of a consent document but encompass the actual implementation, analysis, and dissemination of research. Indeed, it suggests that although “human subjects” is the prevailing designation, the term subject may not fully reflect appropriate respect: human research participant or partner may be more appropriate terminology. Respect for potential and enrolled subjects is justified by multiple principles including beneficence, nonmaleficence, and respect for persons.3 Permitting subjects to withdraw and providing them additional information learned from the research are key aspects of respecting subject autonomy.3,37 Protecting confidentiality and monitoring wellbeing are motivated by respect for persons, beneficence, and nonmaleficence.3
ARE THESE ETHICAL REQUIREMENTS NECESSARY AND SUFFICIENT?
Value, validity, fair subject selection, favorable risk-benefit ratio, and respect for subjects embody substantive ethical values. As such, they are all necessary: clinical research that neglected or violated any of these requirements would be unethical. Conversely, independent review and informed consent are procedural requirements intended to minimize the possibility of conflict of interest, maximize the coincidence of the research with subjects’ interests, and respect their autonomy.30 However, other procedures may also achieve these results. For instance, evidence of an individual’s preferences regarding research may be obtained from a research advance directive rather than the individual’s concurrent informed consent.103 Given the existence of alternative procedures, informed consent requirements can be minimized, and, in some circumstances, consent can even be waived.7,101,103 Research on emergency life-saving interventions for subjects who are unconscious or otherwise not mentally capable of consent and for whom family or proxy consent is not immediately available may be conducted without informed consent.6,107-109 Thus, all requirements need to be satisfied, but they may have to be adjusted and balanced given the circumstances of different types of research.
As interpreted and elaborated for specific research protocols, the fulfillment of each of these 7 requirements ensures that research is socially valuable and subjects are not exploited, that subjects are treated fairly and with respect, and that their interests are protected. As a result, these requirements should be sufficient to ensure that the vast majority of clinical research is ethical.30 While it may be impossible to exclude the possibility that additional requirements are needed in rare cases, these 7 requirements are the essential ones.
UNIVERSALITY OF THE REQUIREMENTS
These 7 requirements for ethical clinical research are also universal.35-49,110 They are justified by ethical values that are widely recognized and accepted and in accordance with how reasonable people would want to be treated.110-112 Indeed, these requirements are precisely the types of considerations that would be invoked to justify clinical research if it were challenged. Like constitutional provisions and amendments, these ethical requirements are general statements of value that must be elaborated by traditions of interpretation and that require practical interpretation and specification that will inherently be context and culture dependent.110-113 For instance, while informed consent is meant to ensure that research subjects are treated with respect, what constitutes respect varies from culture to culture.110,114 In some places, it will be necessary to elicit the consent of elders before individual subjects can be approached for informed consent.115 Similarly, who is considered vulnerable for the purposes of fair subject selection criteria will vary by locale. While in the United States special efforts are necessary to ensure that racial minorities are not just targeted for research with high potential for risks,53,73 in other places fair subject selection may require special focus on religious groups.
Similarly, local traditions and economic conditions will influence when financial payments may constitute undue inducements. Also, whether research has a favorable risk-benefit ratio will depend on the underlying health risks in a society. Research that is unacceptable in one society because its risks outweigh the risks posed by the disease may have a favorable risk-benefit ratio in another society where the risks posed by the disease are significantly greater. Adapting these requirements to the identities, attachments, and cultural traditions embedded in distinct circumstances neither constitutes moral relativism nor undermines their universality110-112; doing so recognizes that while ethical requirements embody universal values, the manner of specifying these values inherently depends on the particular context.110-112
NECESSARY EXPERTISE
These ethical requirements emphasize the type of training and skills necessary for clinical investigators and those conducting independent review (Table 2). Not only must clinical investigators be skilled in the appropriate methods, statistical tests, outcome measures, and other scientific aspects of clinical trials, they must have the training to appreciate, affirm, and implement these ethical requirements, such as the capacity and sensitivity to determine appropriate subject selection criteria, evaluate risk-benefit ratios, provide information in an appropriate manner, and implement confidentiality procedures. Similarly, because independent review of clinical research must assess its value, validity, selection criteria, risk-benefit ratios, informed consent process, and procedures for monitoring enrolled subjects, the necessary skills must range from scientific to ethical to lay knowledge. Consequently, the independent ethical review of research trials should involve individuals with training in science, statistics, ethics, and law, as well as reflective citizens who understand social values, priorities, and the vulnerability and concerns of potential subjects (Table 2).
ACTUAL CASES
Considering actual cases illuminates how the requirements can guide ethical evaluation of clinical research. One persistently controversial issue is the use of placebo controls.14-16 A new class of antiemetics, serotonin antagonists, such as ondansetron hydrochloride and granisetron hydrochloride, was developed about 10 years ago. To evaluate these drugs, investigators conducted placebo-controlled trials randomizing cancer patients receiving emetogenic chemotherapy to either placebo or the serotonin antagonists.116-118 In evaluating the ethics of this clinical research, all requirements need to be fulfilled, but 3 requirements seem particularly relevant: value, scientific validity, and risk-benefit ratio. There is no doubt that the dominant antiemetic therapies of the time, such as prochlorperazine, metoclopramide hydrochloride, and high-dose corticosteroids, are effective. However, they are not completely effective, especially for strongly emetogenic chemotherapy such as platinum, and they have significant adverse effects, especially dystonic reactions. Alternative antiemetic therapies that would be more effective and have fewer adverse effects were viewed as desirable and of value. However, there was no value in knowing whether the serotonin antagonists were better than placebo in controlling emesis, since placebo was not the standard of care at the time of the research.14,63 Even if the serotonin antagonists were shown to be more effective than placebo, it would be a further issue to evaluate their effectiveness and adverse-event profile compared with the extant interventions. Thus, a placebo-controlled trial of the serotonin antagonists for chemotherapy-induced emesis does not fulfill the value requirement.
Comparative studies evaluating the difference between 2 active treatments are common in cancer therapy and valid as a study design.14-16 Some argue that active-controlled studies are scientifically more difficult to conduct than placebo-controlled trials.119 However, any ethically and scientifically valid randomized trial requires that there be an honest null hypothesis.30,63 The null hypothesis that the serotonin antagonists are equivalent to
placebo was not reasonable at the time of the clinical research.14,63 Indeed, coeval with the placebo-controlled studies were randomized controlled trials with serotonin antagonists vs active antiemetic therapy.120,121 Thus, a placebo-controlled trial was not the only scientifically valid method. Those who supported the notion of a randomized, placebo-controlled trial of serotonin antagonists argued that there was no serious risk from using a placebo because emesis is a transitory discomfort that results in no permanent disability.119,122 However, emesis is not pleasant. Indeed, the entire rationale for developing serotonin antagonists is that chemotherapy-induced emesis is a sufficiently serious health problem that development and use of effective interventions in clinical practice are justifiable and desirable.123 As one published report of a randomized placebo-controlled trial of ondansetron stated to justify the research: “Uncontrolled nausea and vomiting [from chemotherapy] frequently results in poor nutritional intake, metabolic derangements, deterioration of physical and mental condition, as well as the possible rejection of potentially beneficial treatment.
Many patients are more afraid of uncontrolled nausea and vomiting than of alopecia.”118 Furthermore, the placebo-controlled trials for antiemetics included “‘rescue’ medication if patients had persistent nausea or vomiting.”118 This indicates both that there was an alternative standard treatment for chemotherapy-induced emesis and that emesis was sufficiently harmful to require intervention.14,15,123,124 Permitting patients to vomit while being administered placebo causes them unnecessary harm.14,123,124 Thus, a placebo-controlled trial of antiemetics for chemotherapy-induced emesis does not minimize harm in the context of good clinical practices and so fails the favorable risk-benefit ratio when an available clinical intervention can partially ameliorate some of the harm.123 Importantly, the evaluation of these placebo-controlled trials of antiemetics did not need to address informed consent to determine whether they were ethical.122 Indeed, even if patients had signed an informed consent document that indicated they could be randomized to placebo and that there were alternative effective treatments, the placebo-controlled research on serotonin antagonists would still be unethical.
Another controversial issue involves research in developing countries.9-13,57,59 Recently, a rhesus rotavirus tetravalent (RRV-TV) vaccine was licensed in the United States after randomized trials in developed countries demonstrated a 49% to 68% efficacy in preventing diarrhea and up to 90% efficacy in preventing severe cases of diarrhea.125-127 However, shortly after approval, the vaccine was withdrawn from the US market because of a cluster of cases of intussusception, representing an approximately 1 in 10000 added risk of this complication.128 Should randomized controlled trials of RRV-TV vaccine proceed as planned in developing countries or wait for a new vaccine candidate to be developed? (C. Weijer, MD, PhD, written communication, March 24, 2000) In evaluating the ethics of these proposed trials, the requirements of value, scientific validity, fair subject selection, and risk-benefit ratio are particularly relevant. Despite oral rehydration therapy, more than 600 000 children in developing countries die annually from rotavirus diarrhea.129 In some countries, the death rate from rotavirus is nearly 1 in 200. Clearly, a rotavirus vaccine with even 80% efficacy that prevented more than half a million deaths would be of great value. But is research using the RRV-TV vaccine ethical when the risk of intussusception stopped its use in the United States? The RRV-TV vaccine was the first and only licensed rotavirus vaccine and has already been administered to nearly 1 million children; potential alternative rotavirus vaccines are still years away from phase 3 research.
Thus, given the potential benefit of preventing deaths from rotavirus in developing countries, a trial of RRV-TV vaccine now—even if a better vaccine becomes evaluable in a
few years—is worthwhile. There is value to the research on the vaccine for developing countries only if there is reasonable assurance children in the country would be able to obtain it if it proved effective.12,13,59 Vaccines effective in developed countries may or may not be as effective or safe in developing countries. Host, viral, and environmental factors and seasonality of the disease can alter the efficacy and safety profiles of a vaccine.130 Thus, there is good scientific rationale for determining whether the RRV-TV vaccine can achieve sufficient levels of protection against diarrhea with an acceptably low incidence of complications in children in developing countries. In this case, given the lack of an established method of preventing rotavirus infections in these countries, a placebo-controlled trial would be valid. Two factors suggest that, in the RRV-TV vaccine study, subjects in developing countries are being selected for reasons of science and not being exploited. First, the most appropriate subjects for a rotavirus vaccine trial are infants and children who have a high incidence of rotavirus infection and who experience significant morbidity and mortality from the infection. In such a population the efficacy of the vaccine would be most apparent. Second, since the RRV-TV vaccine has been withdrawn from the US market, children in developing countries are not being selected to assume risks to evaluate a vaccine that will ultimately benefit children in developed countries (Weijer, written communication). As long as the RRV-TV vaccine would be made available to the population recruited for the study if proven safe and effective, children in the developing countries are being selected appropriately.12,13,59 The final element is evaluation of the risk-benefit ratio. In the United States, the RRV-TV vaccine posed a risk of intussusception of about 1 in 10000, while rotavirus causes about 20 deaths annually or in fewer than 5 in 1 million children. 
Thus, in developed countries the risk-benefit ratio is not favorable: 1 death from rotavirus diarrhea prevented at the risk of 20 to 40 cases of intussusception. Because of underlying disease burden, the risk-benefit ratio in developing countries is much different. If rotavirus causes the death of 1 in 200 children while the RRV-TV vaccine causes intussusception in 1 in 10000 children, about 50 deaths from rotavirus diarrhea are prevented for each case of intussusception. Consequently, the risk-benefit ratio of the RRV-TV vaccine is favorable for individual subjects in developing countries while it is unfavorable for subjects in developed countries. This difference in risk-benefit ratios is a fundamental part of the justification for conducting the research on an RRV-TV vaccine in a developing country when it could not be ethically conducted in a developed country (Weijer, written communication). Obviously, to be ethical, randomized controlled trials of an RRV-TV vaccine would also have to adhere to the other requirements: independent review, informed consent, and respect for enrolled subjects.
CONCLUSION
These 7 requirements for considering the ethics of clinical research provide a systematic framework to guide researchers and IRBs in their assessments of individual clinical research protocols. Just as constitutional rulings are rarely unanimous, this framework will not necessarily engender unanimous agreement on the ethics of every clinical research study. Reasonable disagreement results from 3 sources: differences of interpretations of the requirements, of views about the need for additional requirements, and of application to specific studies. Nevertheless, this framework does provide the necessary context for review bodies to generate traditions of interpretation, understand disagreements, and highlight the kinds of considerations that must be invoked to resolve them. Like a constitution, these requirements can be reinterpreted, refined, and revised with changes in science and experience.
Yet these requirements must all be considered and met to ensure that clinical research, wherever it is practiced, is ethical.
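The risk-benefit arithmetic in the rotavirus vaccine case discussed under ACTUAL CASES can be checked with a short calculation. What follows is a minimal sketch in Python and is not part of the original article. The function name `deaths_prevented_per_complication` and the `efficacy` parameter are illustrative assumptions; the default treats the vaccine as fully effective, which appears to be the simplification behind the article's figure of about 50. The input risks (1 in 200, 1 in 10000, and 5 in 1 million) are taken from the text.

```python
from fractions import Fraction


def deaths_prevented_per_complication(death_risk, complication_risk,
                                      efficacy=Fraction(1)):
    """Deaths prevented for each vaccine-related complication.

    death_risk: per-child risk of dying from the disease (from the text).
    complication_risk: per-child risk of the complication (from the text).
    efficacy: assumed fraction of deaths the vaccine prevents
              (hypothetical parameter; defaults to full efficacy).
    """
    return death_risk * efficacy / complication_risk


intussusception_risk = Fraction(1, 10_000)

# Developing-country scenario: rotavirus death risk of 1 in 200.
developing = deaths_prevented_per_complication(Fraction(1, 200),
                                               intussusception_risk)

# Developed-country scenario: death risk of 5 in 1 million, the upper
# bound stated in the text ("fewer than 5 in 1 million").
developed = deaths_prevented_per_complication(Fraction(5, 1_000_000),
                                              intussusception_risk)

print(developing)     # deaths prevented per case of intussusception: 50
print(1 / developed)  # cases of intussusception per death prevented: 20
```

With the developed-country inputs the ratio inverts to 20 cases of intussusception per death prevented, the lower end of the article's range of 20 to 40; a death risk of 2.5 per million, which is also "fewer than 5 in 1 million", would give 40. Passing a lower `efficacy` scales the benefit side down proportionally.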
Disclaimer: The views herein are those of the authors and do not represent the views or policies of the Department of Health and Human Services or the National Institutes of Health. Acknowledgment: We thank Robert J. Levine, MD, Steven Joffe, MD, Franklin Miller, PhD, Robert Truog, MD, James Childress, PhD, Francis Crawley, PhD, and Albert Kapikian, MD, for their criticisms of the manuscript as well as Alan Sandler, DDS, Ruth Macklin, PhD, Eric Meslin, PhD, and Charles Weijer, MD, PhD, for helpful discussion and suggestions on the ideas contained in the manuscript.
REFERENCES
1. Childress J. The place of autonomy in bioethics. Hastings Cent Rep. 1984;14:12-16.
2. Dworkin G. The Theory and Practice of Autonomy. New York, NY: Cambridge University Press; 1988.
3. Beauchamp TL, Childress J. The Principles of Biomedical Ethics. New York, NY: Oxford University Press; 1996:chap 3.
4. Vanderpool HY, ed. The Ethics of Research Involving Human Subjects. Frederick, Md: University Publishing Group; 1996:45-58.
5. Freedman B. Scientific value and validity as ethical requirements for research. IRB. 1987;9:7-10.
6. Office of the Secretary. Protection of human subjects: informed consent and waiver of informed consent requirements in certain emergency research; final rules. 61 Federal Register 51498-51533 (1996).
7. Truog RD, Robinson W, Randolph A, Morris A. Is informed consent always necessary for randomized, controlled trials? N Engl J Med. 1999;340:804-807.
8. US Department of Health and Human Services. Protections of human subjects. 45 CFR §46 (1991).
9. Angell M. The ethics of clinical research in the third world. N Engl J Med. 1997;337:847-849.
10. Lurie P, Wolfe S. Unethical trials of interventions to reduce perinatal transmission of the human immunodeficiency virus in developing countries. N Engl J Med. 1997;337:853-856.
11. Varmus H, Satcher D. Ethical complexities of conducting research in developing countries. N Engl J Med. 1997;337:1003-1005.
12. Grady C. Science in the service of healing. Hastings Cent Rep. 1998;28:34-38.
13. Crouch R, Arras J. AZT trials and tribulations. Hastings Cent Rep. 1998;28:26-34.
14. Rothman KJ, Michels KB. The continuing unethical use of placebo controls. N Engl J Med. 1994;331:394-398.
15. Freedman B. Placebo-controlled trials and the logic of clinical purpose. IRB. 1990;12:1-6.
16. Weijer C. Placebo-controlled trials in schizophrenia. Schizophr Res. 1999;35:211-218.
17. Lipsett M. On the nature and ethics of phase I clinical trials of cancer chemotherapies. JAMA. 1982;248:941-942.
18. Freedman B. Cohort-specific consent. IRB. 1990;12:5-7.
19. Annas GJ. The changing landscape of human experimentation. Health Matrix. 1992;2:119-140.
20. Lehrman S. Jewish leaders seek guidelines. Nature. 1997;389:322.
21. Levine C, Dubler NN, Levine RJ. Building a new consensus. IRB. 1991;13:1-17.
22. Weijer C, Goldsand G, Emanuel EJ. Protecting communities in research. Nat Genet. 1999;23:275-280.
23. Juengst ET. Groups as gatekeepers to genomic research. Kennedy Institute J Ethics. 1998;8:183-200.
24. Weijer C. Protecting communities in research. Camb Q Healthc Ethics. 1999;8:501-513.
25. Kopelman LM, Moskop JC, eds. Children and Health Care. Dordrecht, the Netherlands: Kluwer; 1989:73-87.
26. Freedman B, Fuks A, Weijer C. In loco parentis. IRB. 1993;15:13-19.
27. Leikin S. Minors’ assent, consent, or dissent to medical research. IRB. 1993;15:1-7.
28. Grodin MA, Glantz LH, eds. Children as Research Subjects. New York, NY: Oxford University Press; 1994:81-101.
29. Committee on Drugs, American Academy of Pediatrics. Guidelines for the ethical conduct of studies to evaluate drugs in pediatric populations. Pediatrics. 1995;95:286-294.
30. Levine RJ. Ethics and Regulation of Clinical Research. 2nd ed. New Haven, Conn: Yale University Press; 1988.
31. The President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research. Summing Up. Washington, DC: US Government Printing Office; 1983.
32. Katz J. Experimentation With Human Beings. New York, NY: Russell Sage Foundation; 1972.
33. Wertheimer A. Exploitation. Princeton, NJ: Princeton University Press; 1996:chap 1.
34. DeCastro LD. Exploitation in the use of human subjects for medical experimentation. Bioethics. 1995;9:259-268.
35. The Nuremberg Code. JAMA. 1996;276:1691.
36. World Medical Association. Declaration of Helsinki. JAMA. 1997;277:925-926.
37. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report. Washington, DC: US Government Printing Office; 1979.
38. Council for International Organizations of Medical Sciences. International Ethical Guidelines for Biomedical Research Involving Human Subjects. Geneva, Switzerland: CIOMS; 1993.
39. Brody BS. The Ethics of Biomedical Research. New York, NY: Oxford University Press; 1998:chap 9.
40. Fluss S. International Guidelines on Bioethics. Geneva, Switzerland: European Forum on Good Clinical Practice/CIOMS; 1998.
41. Sugarman J, Mastroianni A, Kahn JP. Research With Human Subjects. Frederick, Md: University Publishing Group; 1998.
42. World Health Organization. Guidelines for good clinical practice for trials on pharmaceutical products. In: The Use of Essential Drugs. Appendix 3. Geneva, Switzerland: WHO; 1995.
43. Council of Europe (Directorate of Legal Affairs). Convention for the Protection of Human Rights and Dignity of the Human Being With Regard to the Application of Biology and Medicine. Strasbourg, France: Council of Europe; 1996.
44. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). Good clinical practice: consolidated guidance, 62 Federal Register 25692 (1997).
45. European Forum for Good Clinical Practice. Guidelines and Recommendations for European Ethics Committees. Leuven, Belgium: EFGCP; 1997.
46. Medical Research Council (UK). Guidelines for Good Clinical Practice in Clinical Trials. London, England: MRC; 1998.
47. Uganda National Council of Science and Technology (UNCST). Guidelines for the Conduct of Health Research Involving Human Subjects in Uganda. Kampala, Uganda: UNCST; 1998.
48. Medical Research Council of Canada, Natural Sciences and Engineering Research Council of Canada, and Social Sciences and Humanities Research Council of Canada. Tri-Council Policy Statement. Ottawa, Ontario: Public Works and Government; 1998.
49. National Health and Medical Research Council. National Statement on Ethical Conduct in Research Involving Humans. Canberra, Australia: NHMRC; 1999.
50. Levine RJ. The impact of HIV infection on society’s perception of clinical trials. Kennedy Institute J Ethics. 1994;4:93-98.
©2000 American Medical Association. All rights reserved.
Downloaded from www.jama.com at Dartmouth College, on September 19, 2006
ETHICAL REQUIREMENTS FOR CLINICAL RESEARCH 51. Vanderpool HY, ed. The Ethics of Research Involving Human Subjects. Frederick, Md: University Publishing Group; 1996:235-260. 52. Levine RJ. The need to revise the Declaration of Helsinki. N Engl J Med. 1999;341:531-534. 53. Jones J. Bad Blood. New York, NY: Free Press; 1992. 54. Rothman D, Rothman S. The Willowbrook Wars. New York, NY: Harper & Row; 1984. 55. Krugman S. The Willowbrook hepatitis studies revisited. Rev Infect Dis. 1986;8:157-162. 56. Advisory Committee on Human Radiation Experiments. The Human Radiation Experiments. New York, NY: Oxford University Press; 1996. 57. Christakis N, Panner M. Existing international ethical guidelines for human subjects. Law Med Health Care. 1991;19:214-220. 58. Lasagna L. The Helsinki declaration. J Clin Psychopharmacol. 1995;15:96-98. 59. Glantz LH, Annas GJ, Grodin MA, et al. Research in developing countries. Hastings Cent Rep. 1998;28:38-42. 60. Committee on the NIH Research Priority-Setting Process. Scientific Opportunities and Public Needs. Washington, DC: National Academy Press; 1998. 61. Rutstein DD. The ethical design of human experiments. In: Fruend PA, ed. Experimentation With Human Subjects. New York, NY: Braziller Library; 1970: 383-402. 62. The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report: Appendix. Vol 1. Washington, DC: US Government Printing Office; 1978:chap 9. 63. Freedman B. Equipoise and the ethics of clinical research. N Engl J Med. 1987;317:141-145. 64. National Institutes of Health. NIH policy and guidelines on the inclusion of children as participants in research involving human subjects. Available at: http:// grants.nih.gov/grants/guide/notice-files/not98024.html. Accessed April 28, 2000. 65. Dresser R. Wanted: single, white male for medical research. Hastings Cent Rep. 1992;22:21-29. 66. Merton V. The exclusion of pregnant, pregnable, and once pregnable (a.k.a. 
women) from biomedical research. Am J Law Med. 1993;19:369451. 67. DeBruin D. Justice and the inclusion of women in clinical studies. Kennedy Institute J Ethics. 1994;4: 117-146. 68. Mastriani AC, Faden RR, Federman DD. Women and Health Research. Washington, DC: National Academy Press; 1994. 69. Vanderpool HY, ed. The Ethics of Research Involving Human Subjects. Frederick, Md: University Publishing Group; 1996:105-126. 70. Weijer C. Evolving issues in the selection of subjects for clinical research. Camb Q Healthc Ethics. 1996; 5:334-345. 71. Weijer C, Fuks A. The duty to exclude. Clin Invest Med. 1994;17:115-122. 72. Merkatz RB, Temple R, Sobel S, et al. Women in clinical trials of new drugs. N Engl J Med. 1993;329: 292-296. 73. National Institutes of Health. NIH Guidelines for the inclusion of women and ethnic minorities in research, 59 Federal Register 14508-14513 (1994). 74. Barry M, Molyneux M. Ethical dilemmas in malaria drug and vaccine trials. J Med Ethics. 1992;18: 189-192. 75. Kahn J, Mastroianni A, Sugarman J. Beyond Consent. New York, NY: Oxford University Press; 1998. 76. Annas G, Grodin M. Human rights and maternalfetal HIV transmission prevention trials in Africa. Am J Public Health. 1998;88:560-563. 77. Freedman B, Fuks A, Weijer C. Demarcating research and treatment. Clin Res. 1992;40:653-660.
78. Anderson E. Value in Ethics and Economics. Cambridge, Mass: Harvard University Press; 1993:chap 9. 79. Weijer C. Thinking clearly about research risks. IRB. 1999;21:1-5. 80. Sen A, Williams B, eds. Utilitarianism and Beyond. Cambridge, England: Cambridge University Press; 1982. 81. MacLean D, ed. Values at Risk. Totowa, NJ: Rowman & Allanheld; 1985:31-48. 82. Gold MR, Siegel JE, Russell LB, Weinstein MC. Cost-Effectiveness in Health and Medicine. New York, NY: Oxford University Press; 1996. 83. Sen A. Choice, Welfare, and Measurement. Cambridge, Mass: Harvard University Press; 1982:264-284. 84. Relman AS. Economic incentives in clinical investigations. N Engl J Med. 1989;320:933-934. 85. Porter RJ, Malone TE. Biomedical Research. Baltimore, Md: Johns Hopkins University Press; 1992. 86. Thompson D. Understanding financial conflicts of interest. N Engl J Med. 1993;329:573-576. 87. Spece RG, Shimm DS, Buchanan AE. Conflicts of Interest in Clinical Practice and Research. New York, NY: Oxford University Press; 1996. 88. The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Institutional Review Boards. Washington, DC: US Government Printing Office; 1978. 89. Curran WJ. Government regulation of the use of human subjects in medical research. In: Freund PA, ed. Experimentation With Human Subjects. New York, NY: George Braziller; 1970:402-455. 90. Edgar H, Rothman D. The institutional review board and beyond. Milbank Q. 1995;73:489-506. 91. Moreno J, Caplan AL, Wolpe PR, et al. Updating protections for human subjects involved in research. JAMA. 1998;280:1951-1958. 92. Fried C. Medical Experimentation. New York, NY: American Elsevier Co; 1974. 93. Freedman B. A moral theory of informed consent. Hastings Cent Rep. 1975;5:32-39. 94. President’s Commission for the Study of Ethical Problems in Medicine and Biomedical Research. Making Health Care Decisions. Washington, DC: US Government Printing Office; 1982. 95. Katz J. 
Human experimentation and human rights. St Louis University Law J. 1993;38:1-54. 96. Donagan A. Informed consent in therapy and experimentation. J Med Philos. 1977;2:318-329. 97. Faden RR, Beauchamp TL. A History and Theory of Informed Consent. New York, NY: Oxford University Press; 1986:chap 5-9. 98. Applebaum PA, Lidz CW, Meisel A. Informed Consent. New York, NY: Oxford University Press; 1987. 99. Grisso R, Applebaum PS. Assessing Competence to Consent to Treatment. New York, NY: Oxford University Press; 1998. 100. Buchanan AE, Brock DW. Deciding for Others. New York, NY: Cambridge University Press; 1990: chap 2. 101. American College of Physicians. Cognitively impaired subjects. Ann Intern Med. 1989;111:843-848. 102. Dresser R. Mentally disabled research subjects. JAMA. 1996;276:67-72. 103. National Bioethics Advisory Commission. Research Involving Persons With Mental Disorders That May Affect Decisionmaking Capacity. Washington, DC: US Government Printing Office; 1998. 104. Michels R. Are research ethics bad for our mental health? N Engl J Med. 1999;340:1427-1430. 105. Capron AM. Ethical and human rights issues in research on mental disorders that may affect decisionmaking capacity. N Engl J Med. 1999;340:1430-1434. 106. Weijer C, Shapiro S, Fuks A, Glass KC, Skrutkowska M. Monitoring clinical research. CMAJ. 1995; 152:1973-1980.
©2000 American Medical Association. All rights reserved.
107. Biros MH, Lewis R, Olson C, et al. Informed consent in emergency research. JAMA. 1995;273:12831287. 108. Levine RJ. Research in emergency situations. JAMA. 1995;273:1300-1302. 109. Council on Ethical and Judicial Affairs, American Medical Association. Waiver of Informed Consent for Emergency Research. CEJA Report 1-A-7, June 1997. 110. Macklin R. Against Relativism. New York, NY: Oxford University Press; 1999. 111. Scanlon TM. What We Owe to Each Other. Cambridge, Mass: Harvard University Press; 1999: chap 1, 8. 112. Kymlicka W. Liberalism, Community and Culture. New York, NY: Oxford University Press; 1989. 113. Angell M. Ethical imperialism? N Engl J Med. 1988;319:1081-1083. 114. Levine RJ. Informed consent. Law Med Health Care. 1991;19:207-213. 115. Ijsselmuiden CB, Faden RR. Research and informed consent in Africa. N Engl J Med. 1992;326: 830-833. 116. Cubeddu LX, Hoffmann IS, Fuenmayor NT, Finn AL. Efficacy of ondansetron (GR 38032F) and the role of serotonin in cisplatin-induced nausea and vomiting. N Engl J Med. 1990;322:810-816. 117. Gandara DR, Harvey WH, Monaghan GG, et al. The delayed-emesis syndrome from cisplatin. Semin Oncol. 1992;19:67-71. 118. Beck TM, Ciociola AA, Jones SE, et al. Efficacy of oral ondansetron in the prevention of emesis in outpatients receiving cyclophosphamide-based chemotherapy. Ann Intern Med. 1993;118:407-413. 119. Temple R. Government viewpoint of clinical trials. Drug Inform J. 1982:1610-1617. 120. Marty M, Pouillart P, Scholl S, et al. Comparison of 5-hydroxytryptamine3 (serotonin) antagonist ondansetron (GR38032F) with high-dose metoclopramide in the control of cisplatin-induced emesis. N Engl J Med. 1990;322:816-821. 121. Hainsworth J, Harvey W, Pendergrass K, et al. A single-blind comparison of intravenous ondansetron, a selective serotonin antagonist, with intravenous metoclopramide in the prevention of nausea and vomiting associated with high-dose cisplatin chemotherapy. J Clin Oncol. 1991;9:721-728. 122. 
Ondansetron and cisplatin-induced nausea and vomiting. N Engl J Med. 1990;323:1486. 123. Hait WN. Ondansetron and cisplatin-induced nausea and vomiting. N Engl J Med. 1990;323:14851486. 124. Citron ML. Placebos and principles. Ann Intern Med. 1993;118:470-471. 125. Rennels MB, Glass RI, Dennehy PH, et al. Safety and efficacy of high dose rhesus-human reassortant rotavirus vaccines. Pediatrics. 1996;97:7-13. 126. Bernstein DK, Glass RI, Rodgers G, et al. Evaluation of rhesus rotavirus monovalent and tetravalent reassortment vaccines in US children. JAMA. 1995; 273:1191-1196. 127. Joensuu J, Koskenniemi E, Pang XL, Vesikari T. Randomized placebo-controlled trial of rhesushuman reassortment rotavirus vaccine for prevention of severe rotavirus gastroenteritis. Lancet. 1997; 350:1205-1209. 128. Intussusception among recipients of rotavirus vaccine—United States, 1998-1999. MMWR Morb Mortal Wkly Rep. 1999;48:577-581. 129. Bern C, Martines J, de Zoysa I, Glass RI. The magnitude of the global problem of diarrheal disease. Bull World Health Organ. 1992;70:705-714. 130. Bresee JS, Glass RI, Ivanoff B, Gentsch JR. Current status and future priorities for rotavirus vaccine development, evaluation and implementation in developing countries. Vaccine. 1999;17:2207-2222.
(Reprinted) JAMA, May 24/31, 2000—Vol 283, No. 20
Downloaded from www.jama.com at Dartmouth College, on September 19, 2006
2711
SEMINAR No. 3
OBSERVATION OF REALITY AS A SOURCE OF RESEARCH PROBLEMS. Warren JR, Marshall BJ. Unidentified curved bacilli on gastric epithelium in active chronic gastritis. Lancet. 1983; 321(8336): 1273-1275.
Questions for the reading quiz and group discussion guide
1. The article opens with the sentence "Gastric microbiology has been sadly neglected." It then states that half of the patients who undergo endoscopy and biopsy show bacterial colonisation of the stomach.
a. If this finding was so frequent, why had the scientific community in general paid no attention until then to such a common observation in medical practice? Suggest possible reasons.
b. Do you think something similar could be happening today in other areas?
2. Which methodological strategies are implicit in the second and third paragraphs of the article? (page 1273, first column)
3. Dr. Warren's text states:
“They were difficult to see with haematoxylin and eosin stain, but stained well by the Warthin-Starry silver method.”
And Dr. Marshall's text states:
“… he did not use silver stains, so, not surprisingly, he found "no structure which could reasonably be considered to be of a spirochaetal nature".”
Considering these passages, what reflection can you offer on the knowledge and mastery of the techniques used to measure a phenomenon in the process of scientific research?
4. Why were the spiral bacteria reported in human gastric tissue assumed to be commensals? What implications might this have had?
5. Which stage of the scientific method does the final part of Dr. Marshall's text correspond to?
UNIDENTIFIED CURVED BACILLI ON GASTRIC EPITHELIUM IN ACTIVE CHRONIC GASTRITIS
SIR,-Gastric microbiology has been sadly neglected. Half the patients coming to gastroscopy and biopsy show bacterial colonisation of their stomachs, a colonisation remarkable for the constancy of both the bacteria involved and the associated histological changes. During the past three years I have observed small curved and S-shaped bacilli in 135 gastric biopsy specimens. The bacteria were closely associated with the surface epithelium, both within and between the gastric pits. Distribution was continuous, patchy, or focal. They were difficult to see with haematoxylin and eosin stain, but stained well by the Warthin-Starry silver method (figure).
I have classified gastric biopsy findings according to the type of inflammation, regardless of other features, as "no inflammation", "chronic gastritis" (CG), or "active chronic gastritis" (ACG). CG shows more small round cells than normal while ACG is characterised by an increase in polymorphonuclear neutrophil leucocytes, besides the features of CG. It was unusual to find no inflammation. CG usually showed superficial oedema of the mucosa. The leucocytes in ACG were usually focal and superficial, in and near the surface epithelium. In many cases they only infiltrated the necks of occasional gastric glands. The superficial epithelium was often irregular, with reduced mucinogenesis and a cobblestone surface.
When there was no inflammation bacteria were rare. Bacteria were often found in CG, but were rarely numerous. The curved bacilli were almost always present in ACG, often in large numbers and often growing between the cells of the surface epithelium (figure). The constant morphology of these bacteria and their intimate relationship with the mucosal architecture contrasted with the heterogeneous bacteria often seen in the surface debris. There was normally a layer of mucous secretion on the surface of the mucosa. When this layer was intact, the debris was spread over it, while the curved bacilli were on the epithelium beneath, closely spread over the surface (figure).
The curved bacilli and the associated histological changes may be present in any part of the stomach, but they were seen most consistently in the gastric antrum. Inflammation, with no bacteria, occurred in mucosa near focal lesions such as carcinoma or peptic ulcer. In such cases, the leucocytes were spread through the full thickness of the nearby mucosa, in contrast to the superficial infiltration associated with the bacteria. Both the bacteria and the typical histological changes were commonly found in mucosa unaffected by the focal lesion.
The extraordinary features of these bacteria are that they are almost unknown to clinicians and pathologists alike, that they are closely associated with granulocyte infiltration, and that they are present in about half of our routine gastric biopsy specimens in numbers large enough to see on routine histology. The only other organism I have found actively growing in the stomach is Candida, sometimes seen in the floor of peptic ulcers. These bacteria were not mentioned in two major studies of gastrointestinal microbiology,¹,² possibly because of their unusual atmospheric requirements and slow growth in culture (described by Dr B. Marshall in the accompanying letter). They were mentioned in passing by Fung et al.³
How the bacteria survive is uncertain. There is a pH gradient from acid in the gastric lumen to near neutral in the mucosal vessels. The bacteria grow in close contact with the epithelium, presumably near the neutral end of this gradient, and are protected by the overlying mucus.
The identification and clinical significance of this bacterium remain uncertain. By light microscopy it resembles Campylobacter jejuni but cannot be classified by reference to Bergey's Manual of Determinative Bacteriology. The stomach must not be viewed as a sterile organ with no permanent flora. Bacteria in numbers sufficient to see by light microscopy are closely associated with an active form of gastritis, a cause of considerable morbidity (dyspeptic disease). These organisms should be recognised and their significance investigated.
Department of Pathology, Royal Perth Hospital, Perth, Western Australia 6001
J. ROBIN WARREN
Figure: Curved bacilli on gastric epithelium. Section is cut at acute angle to show bacteria on surface, forming network between epithelial cells. (Warthin-Starry silver stain; bar = 10 µm.)
1. Gray JDA, Shiner M. Influence of gastric pH on gastric and jejunal flora. Gut 1967; 8: 574-81.
2. Drasar BS, Shiner M, McLeod GM. Studies on the intestinal flora I: The bacterial flora of the gastrointestinal tract in healthy and achlorhydric persons. Gastroenterology 1969; 56: 71-79.
3. Fung WP, Papadimitriou JM, Matz LR. Endoscopic, histological and ultrastructural correlations in chronic gastritis. Am J Gastroenterol 1979; 71: 269-79.
SIR,-The above description of S-shaped spiral bacteria in the gastric antrum, by my colleague Dr J. R. Warren, raises the following questions: why have they not been seen before; are they pathogens or merely commensals in a damaged mucosa, and are they campylobacters?
In 1938 Doenges¹ found "spirochaetes" in 43% of 242 stomachs at necropsy but drew no conclusions because autolysis had rendered most of the specimens unsuitable for pathological diagnosis. Freedburg and Barron² studied 35 partial gastrectomy specimens and found "spirochaetes" in 37%, after a long search. They concluded that the bacteria colonised the tissue near benign or malignant ulcers as non-pathogenic opportunists. When Palmer³ examined 1140 gastric suction biopsy specimens he did not use silver stains, so, not surprisingly, he found "no structure which could reasonably be considered to be of a spirochaetal nature". He concluded that the gastric "spirochaetes" were oral contaminants which multiplied only in post mortem specimens or close to ulcers. Since that time, the spiral bacteria have rarely been mentioned, except as curiosities,⁴ and the subject was not reopened with the advent of gastroscopic biopsy. Silver staining is not routine for mucosal biopsy specimens, and the bacteria have been overlooked.
In other mammals spiral gastric bacteria are well known and are thought to be commensals (eg, Doenges¹ found them in all of forty-three monkeys). They usually have more than two spirals and inhabit the acid-secreting gastric fundus. In cats they even occupy the canaliculi of the oxyntic cells, suggesting tolerance to acid. The animal bacteria do not cause any inflammatory response, and no illness has ever been associated with them. Investigation of gastric bacteria in man has been hampered by the false assumption that the bacteria were the same as those in animals and would therefore be acid-tolerant inhabitants of the fundus.
Warren's bacteria are, however, shorter, with only one or two spirals and resemble campylobacters rather than spirochaetes. They live beneath the mucus of the gastric antrum well away from the acid-secreting cells.
We have cultured the bacteria from antral biopsy specimens, using Campylobacter isolation techniques. They are microaerophilic and grow on moist chocolate agar at 37°C, showing up in 3-4 days as a faint transparent layer. They are about 0.5 µm in diameter and 2.5 µm in length, appearing as short spirals with one or two wavelengths (fig 1). The bacteria have smooth coats with up to five sheathed flagellae arising from one end (fig 2). In some cells, including dividing forms, flagellae may be seen at both ends and in negative-stain preparations they have bulbous tips, presumably an artefact.
These bacteria do not fit any known species either morphologically or biochemically. Similar sheathed flagellae have been described in vibrios but micro-aerophilic vibrios have now been transferred to the family Spirillaceae genus Campylobacter.⁸ Campylobacters however, have "a single polar flagellum at one or both ends of the cell" and the campylobacter flagellum is unsheathed.⁹ Warren's bacteria may be of the genus Spirillum.
The pathogenicity of these bacteria remains unproven but their association with polymorphonuclear infiltration in the human antrum is highly suspicious. If these bacteria are truly associated with antral gastritis, as described by Warren, they may have a part to play in other poorly understood, gastritis associated diseases (ie, peptic ulcer and gastric cancer).
I thank Miss Helen Royce for microbiological assistance, Dr J. A. Armstrong for electronmicroscopy, and Dr Warren for permission to use fig 1.
Department of Gastroenterology, Royal Perth Hospital, Perth, Western Australia 6001
BARRY MARSHALL
Fig 1-Thin-section micrograph showing spiral bacteria on surface of a mucous cell in gastric biopsy specimen. (Bar = 1 µm.)
Fig 2-Negative stain micrograph of dividing bacterium from broth culture. Multiple polar flagellae have terminal bulbs. (2% phosphotungstate, pH 6.8; bar = 1 µm.) Inset: detail showing sheathed flagellum and basal disc associated with plasma membrane. (3% ammonium molybdate, pH 6.5; bar = 100 nm.)
1. Doenges JL. Spirochaetes in the gastric glands of Macacus rhesus and humans without definite history of related disease. Proc Soc Exp Med Biol 1938; 38: 536-38.
2. Freedburg AS, Barron LE. The presence of spirochaetes in human gastric mucosa. Am J Dig Dis 1940; 7: 443-45.
3. Palmer ED. Investigation of the gastric spirochaetes of the human. Gastroenterology 1954; 27: 218-20.
4. Ito S. Anatomic structure of the gastric mucosa. In: Heidel US, Cody CF, eds. Handbook of physiology, section 6: Alimentary canal, vol II: Secretion. Washington, DC: American Physiological Society, 1967: 705-41.
5. Lockard VG, Boler RK. Ultrastructure of a spiraled micro-organism in the gastric mucosa of dogs. Am J Vet Res 1970; 31: 1453-62.
6. Vial JD, Orrego H. Electron microscope observations on the fine structure of parietal cells. J Biophys Biochem Cytol 1960; 7: 367-72.
7. Glauert AM, Kerridge D, Horne RW. The fine structure and mode of attachment of the sheathed flagellum of Vibrio metchnikovii. J Cell Biol 1963; 18: 327-36.
8. Shewan JM, Veron M. Genus I vibrio. In: Buchanan RE, Gibbons NE, eds. Bergey's manual of determinative microbiology, 8th ed. Baltimore: Williams & Wilkins, 1974: 341.
9. Pead PJ. Electron microscopy of Campylobacter jejuni. J Med Microbiol 1979; 12: 383-85.
SEMINAR No. 4
FROM HYPOTHESIS TO RESEARCH DESIGN. Marshall BJ, Warren JR. Unidentified curved bacilli in the stomach of patients with gastritis and peptic ulceration. Lancet. 1984; 323(8390): 1311-5.
Questions for the reading quiz and group discussion guide
1. In the introduction to the article, Drs. Marshall and Warren describe how the adoption of flexible digestive endoscopy made it possible to study the gastric antrum through biopsies, which could not be done before. Technological advances thus improve our research by enabling better measurements. Give two examples of technological developments in medicine over the last 20 years that have broadened the study of the health-disease process.
2. The authors set out four research objectives, the first being to confirm the association between antral gastritis and the presence of bacteria (an observation they had made previously and a hypothesis they wish to corroborate). Why do you think they set the other three objectives?
3. Draw a diagram (flowchart) of the steps the authors followed in their research.
4. Write the operational definition of gastritis, taking into account three components: the values of the variable, the criteria for assigning those values, and the measurement instruments/techniques. Why are definitions of this kind important in research?
5. How would you present the results of Table IV to make them easier to understand?
6. Towards the end of the discussion, the authors state that the ulcer diathesis may be a myth. Can you name something that could currently be regarded as a myth about the causation, treatment, or prognosis of a disease? What consequences can such beliefs have? (Note: the question does not refer to classical mythology.)
7. In the conclusion of the study, the investigators state that their research cannot prove a cause-and-effect relationship. Why? How would you design a study to test a causal relationship?
Preguntas adicionales a ser discutidas en la sesión grupal 1.
Los pacientes participantes del estudio dieron su consentimiento informado para ser incluidos. ¿Qué principio de la bioética en investigación está más relacionado con este proceso?
2. In the methods section, where the questionnaire given to the patients is described, and again in the fourth paragraph of the discussion, the authors refer to "known" causes (in quotation marks). Why do you think the authors put the term "known" in quotation marks?
3. What would have happened to the culture results if some plates had not been left forgotten in the incubator over the holiday? What is your opinion of this event?
4. The endoscopy reports and the laboratory analyses were produced independently (blind). Why did the authors decide to carry out these procedures in this way?
5. Table I compares the subjects included in the study with those excluded. Is this important? Why?
6. The description of the histopathology states that silver staining was more sensitive. What does this mean?
7. The authors explain why it would not be appropriate to name the identified bacterium Campylobacter pyloridis. Why is the correct naming of infectious organisms important?
Saturday 16 June 1984

UNIDENTIFIED CURVED BACILLI IN THE STOMACH OF PATIENTS WITH GASTRITIS AND PEPTIC ULCERATION*

BARRY J. MARSHALL
J. ROBIN WARREN

Departments of Gastroenterology and Pathology, Royal Perth Hospital, Perth, Western Australia

Summary
Biopsy specimens were taken from intact areas of antral mucosa in 100 consecutive consenting patients presenting for gastroscopy. Spiral or curved bacilli were demonstrated in specimens from 58 patients. Bacilli cultured from 11 of these biopsies were gram-negative, flagellate, and microaerophilic and appeared to be a new species related to the genus Campylobacter. The bacteria were present in almost all patients with active chronic gastritis, duodenal ulcer, or gastric ulcer and thus may be an important factor in the aetiology of these diseases.

Introduction
GASTRIC spiral bacteria have been repeatedly observed, reported, and then forgotten for at least 45 years.1-3 In 1940 Freedburg and Barron stated that "spirochaetes" could be found in up to 37% of gastrectomy specimens,4 but examination of gastric suction biopsy material failed to confirm these findings.5 The advent of fibreoptic biopsy techniques permitted biopsy of the antrum, and in 1975 Steer and Colin-Jones observed gram-negative bacilli in 80% of patients with gastric ulcer. The curved bacilli they illustrated were said to be Pseudomonas, possibly a contaminant, and the bacteria were once more forgotten.

The repeated demonstration of these bacteria in inflamed gastric antral mucosa7 prompted us to do a pilot study in twenty patients. Typical curved bacilli were present in over half the biopsy specimens and the number of bacteria was closely related to the severity of the gastritis. The present study was designed to confirm the association between antral gastritis and the bacteria, to discover associated gastrointestinal diseases, to culture and identify the bacteria, and to find factors predisposing to infection.

*Based on paper read at Second International Workshop on Campylobacter Infections (Brussels, 1983).

Patients and Methods

Patients
All patients referred for gastroscopy on clinical grounds were eligible for the study which continued until there were 100 participants who gave informed consent and in whom biopsy was considered to be safe. The study was approved by our hospital's human rights committee.
Questionnaire
Where possible patients completed a clinical questionnaire designed to detect a source of infection or show any relationship with "known" causes of gastritis or Campylobacter infection, rather than give a detailed account of each patient's history. The emphasis was on animal contact, travel, diet, dental hygiene, and drugs, rather than symptoms.
Endoscopy
The gastroscopies were done by colleagues at the Royal Perth Hospital. Participants fasted for at least 4 h before endoscopy. An Olympus GIF-K fibreoptic gastroduodenoscope was used. Routine biopsies were done when indicated. For the study two extra specimens were taken from an area of intact antral mucosa, at a distance from any focal lesion such as an antral ulcer. When the mucosa appeared inflamed the specimens were taken from a red area, otherwise any part of the antrum was used. One biopsy was immediately fixed in phosphate-buffered formalin for histological examination, the other was placed in chilled anaerobic transport medium and taken to the microbiology laboratory within 1 h. In a few cases an extra specimen was taken for ultrastructural examination.

The gastroenterologist dictated his report soon after the endoscopy. We had not planned to analyse these reports so a standard terminology was not used and no special attention was paid to minor endoscopic lesions. Findings of doubtful clinical significance, such as mild endoscopic gastritis or duodenogastric bile reflux, may thus have been under-reported. (Hereafter the term "gastritis" refers to a histological grade of chronic gastritis unless stated otherwise.) Before we analysed the data, the endoscopy reports were coded for the major diagnoses.
Histopathology
Sections were stained with haematoxylin and eosin (H & E) and graded for gastritis (by J. R. W.) as 0 (normal), inflammatory cells rarely seen; 1 (normal), lymphoid cells present but within normal limits and with no other evidence of inflammation (see below); 2 (chronic), chronic gastritis; or 3 (active), active chronic gastritis.

Gradings were based solely on the type of inflammatory cells. Other types of mucosal change, such as gland atrophy or intestinal metaplasia, were noted separately, but were not used as evidence of inflammation. "Chronic gastritis" indicated inflammation with no increase in polymorphonuclear leucocytes (PMNs). There were either increased numbers of lymphoid cells or normal cell numbers with other evidence of inflammation such as oedema, congestion, or cell damage. The term "active" was used to indicate an increase in PMNs.8 The gastritis was considered active if a few PMNs infiltrated one gland neck or pit, if occasional PMNs were scattered throughout the superficial epithelium, or if there was an obvious increase in PMNs in the lamina propria.

Later, sections stained with Warthin-Starry silver stain were examined for small curved bacilli on the surface epithelium. Numbers of bacteria were graded as 0, no characteristic bacteria; 1, occasional spiral bacteria found after searching; 2, scattered bacteria in most high-power fields or occasional groups of numerous bacteria; or 3, numerous bacteria in most high-power fields.
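
The gastritis grading rules described under Histopathology can be read as a small decision procedure. The Python sketch below is only an illustration and is not part of the original study: the function name, the argument names, and the labels "rare", "normal", and "increased" are assumptions introduced here. The text does not say how to grade a biopsy with rare lymphoid cells but other evidence of inflammation; treating it as grade 2 is also an assumption.

```python
def grade_gastritis(lymphoid_cells, other_inflammation, pmn_increased):
    """Return a gastritis grade (0-3) following the rules quoted in the text.

    lymphoid_cells: "rare", "normal", or "increased" (hypothetical labels).
    other_inflammation: True if oedema, congestion, or cell damage is seen.
    pmn_increased: True if polymorphonuclear leucocytes (PMNs) are increased.
    """
    if lymphoid_cells not in ("rare", "normal", "increased"):
        raise ValueError("lymphoid_cells must be 'rare', 'normal', or 'increased'")
    if pmn_increased:
        return 3  # active chronic gastritis: any increase in PMNs
    if lymphoid_cells == "increased" or other_inflammation:
        return 2  # chronic gastritis: inflammation without increased PMNs
    if lymphoid_cells == "normal":
        return 1  # normal: lymphoid cells within normal limits
    return 0      # normal: inflammatory cells rarely seen


# Grades 0 and 1 both count as "normal" in the paper's tables.
for case in [("rare", False, False), ("normal", False, False),
             ("normal", True, False), ("increased", False, True)]:
    print(case, "->", grade_gastritis(*case))
```

Writing the rule out this way makes the three parts of an operational definition explicit: the possible values (0 to 3), the criteria that assign each value, and the inputs that the measuring technique (H & E histology) must supply.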
Microbiology
Tissue smears were Gram stained and examined for curved bacilli resembling Campylobacter. The remaining tissue was minced, plated on non-selective blood and chocolate agar, and cultured at 37°C under microaerophilic conditions as used for Campylobacter isolation.9 At first plates were discarded after 2 days but when the first positive plate was noted after it had been left in the incubator for 6 days during the Easter holiday, cultures were done for 4 days.

Analysis of Results
Questionnaires, gastroscopy reports, and histopathology and microbiology results were coded independently in separate departments. Complete results for individual patients were not known until the statistician had received all the data. The findings were tested for positive correlation with the presence of either bacteria or gastritis, by the chi-squared method. Fisher's exact test of significance was used for all the 2×2 tables in this paper.

TABLE II-ASSOCIATION OF BACTERIA WITH ENDOSCOPIC DIAGNOSES
[Table body not recoverable from the source.]
*More than one description applies to several patients (eg, 4 patients had both gastric and duodenal ulcers). †Refers to endoscopic appearance, not histological inflammation.

TABLE III-HISTOLOGICAL GRADING OF GASTRITIS AND BACTERIA
[Table body not recoverable from the source.]
*Gastritis grades 0 and 1 normal. †Case showed bacteria on Gram-stained smear.

TABLE IV-RELATION BETWEEN GASTRITIS AND BACTERIA IN PATIENTS WITHOUT PEPTIC ULCER
[Table body not recoverable from the source.]

Results
In 12 weeks 184 patients were examined by the gastroenterology unit. Of the 84 patients excluded, 5 refused consent, 4 had contraindications to biopsy, and 75 patients, mostly unbooked cases, could not be invited to participate. These patients closely matched the study group for age, sex, and incidence of peptic ulcers (table I).
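
The Analysis of Results section names Fisher's exact test for 2×2 tables. As a rough illustration of what that test computes, the sketch below implements a two-sided version using only the Python standard library. The function name is an assumption introduced here. The example counts are assembled from two figures quoted in the Discussion (bacteria present in 38/40 biopsies with PMN infiltration and in 2/31 without inflammation); this is not one of the paper's own tables, and the result is not meant to reproduce any p value the authors report.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Illustrative table built from counts quoted in the Discussion:
#                      bacteria   no bacteria
# PMN infiltration        38           2
# no inflammation          2          29
p_value = fisher_exact_two_sided(38, 2, 2, 29)
print(f"two-sided p = {p_value:.2e}")
```

An exact test of this kind suits small cell counts such as the 2s in this table, where the chi-squared approximation is less reliable.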
Questionnaires
99 patients completed the questionnaires. The only symptom which correlated with gastritis or bacteria was "burping" which was more common in patients with bacteria (p=0.03) or gastritis (p=0.007). This association remained when patients with peptic ulcer were excluded. None of the other questionnaire responses showed any relationship to the presence of gastric bacteria or gastritis.
Endoscopy
There was a very close correlation between both gastric ulcer and duodenal ulcer and the presence of the bacteria (table II). Most patients with peptic ulcer also had gastritis (29/31; p=0.0002).

TABLE I-COMPARISON OF PARTICIPANTS WITH EXCLUDED PATIENTS
[Table body not recoverable from the source.]
Histopathology
Gastritis could usually be graded with confidence at low magnification. There was some difficulty with about 25 cases where the changes were mild or the specimens were small, superficial, or distorted. To ensure that gradings were reliable, single H & E sections from the last 40 cases were examined "blind" by another pathologist who agreed with the presence or absence of gastritis in 36 cases (90%), and gave an identical grading in 32.

Gradings for bacteria by silver staining were more straightforward. The bacteria stained well and were easily differentiated from contaminant bacteria or debris. Silver staining was the most sensitive method of detecting the spiral bacteria. Silver stained sections and Gram stained smears were both done in 96 cases and spiral bacteria were seen in 56 of them; 32 with both stains, 23 with silver alone, and 1 case with the Gram stain alone.

The correlation between gastritis and bacteria, defined by Gram and/or by silver staining, was remarkable (table III). Gastritis was present in 55/57 biopsy specimens with bacteria (p=2X 10’). When the 31 patients with peptic ulcer were excluded, the correlation persisted, implying that the presence of bacteria was not secondary to an ulcer crater (table IV).
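
The statement that silver staining was the most sensitive method can be checked against the counts given in the Histopathology results (32 positive with both stains, 23 with silver alone, 1 with Gram alone). The short Python sketch below does that arithmetic; the function and variable names are assumptions introduced here. The proportions are relative detection rates among the 56 cases positive by either stain, not sensitivity against an independent reference standard, which the paper does not provide.

```python
def stain_detection(both, silver_only, gram_only):
    """Return detection counts and proportions for each stain.

    The denominator is every case positive by at least one stain.
    """
    positive_by_either = both + silver_only + gram_only
    silver = both + silver_only
    gram = both + gram_only
    return {
        "positive_by_either": positive_by_either,
        "silver": (silver, silver / positive_by_either),
        "gram": (gram, gram / positive_by_either),
    }


result = stain_detection(both=32, silver_only=23, gram_only=1)
total = result["positive_by_either"]
for stain in ("silver", "gram"):
    count, proportion = result[stain]
    print(f"{stain}: {count}/{total} = {proportion:.1%}")
# prints "silver: 55/56 = 98.2%" and "gram: 33/56 = 58.9%"
```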
Microbiology
Specimens for culture were received from 96 patients and 11 were culture positive, all being seen with Gram and silver staining also. No spiral bacteria were grown from the first 34 cases, probably because the cultures were discarded too soon.
Figure: Electron micrograph from a mucosal biopsy with active chronic gastritis. Upper: many profiles of sectioned pyloric campylobacter are located on the luminal aspect of mucus-secreting epithelial cells; plasma membranes are intact, but indented and almost devoid of microvilli (bar=1 µm). Lower: at higher magnification groups of transversely and longitudinally cut sheathed flagella are visible (arrows; bar=100 nm).
The bacteria were S-shaped or curved gram-negative rods, 3 µm × 0.5 µm, with up to 1½ wavelengths. In electron micrographs they had smooth coats and there were usually four sheathed flagella arising from one end of the cell. They grew best in a microaerophilic atmosphere at 37°C; a campylobacter gas generating kit was sufficient (Oxoid BR56). Moist chocolate or blood agar was the preferred medium. Growth was evident in 3 days as 1 mm diameter non-pigmented colonies. In artificial media the bacteria were usually larger and less curved than those seen on Gram stains of fresh tissue. They formed coccoid bodies in old cultures. The bacteria were oxidase +, catalase +, H2S +, indole -, urease -, nitrate -, and did not ferment glucose. They were sensitive to tetracycline, erythromycin, kanamycin, gentamicin and penicillin, and resistant to nalidixic acid. DNA base analysis gave a guanine + cytosine content of 36 mol%, a value in the range for campylobacters.

Sources of Bias
The patient sample was from a defined population with gastric symptoms expected to have some gastroenterological abnormality. The biopsy tissue studied was from apparently intact mucosa, ie, not the sort of specimen a pathologist usually sees. We attempted to limit bias by making the study consecutive and blind, and were partly successful. The study was not strictly consecutive since 84 patients had to be excluded. However, gastroscopy reports and laboratory investigations were completed serially and usually independently ("blind") except that clinically relevant material was sent (to J. R. W.) with study biopsies, mainly from cases of gastric ulcer. However, an independent blind assessment of gastritis in 40 cases matched the study results well.

Discussion
The spiral bacteria of the human gastric antrum have never been cultured before, and their association with active chronic gastritis has not been described. They are a new
species closely resembling campylobacters morphologically and in respect of atmospheric requirements and DNA base composition, but their flagellar morphology is not that of the genus Campylobacter.9 Campylobacters have a single unsheathed flagellum at one or both ends of the cell whereas the new organism has four sheathed flagella at one end.7,10 If it is premature to talk of "Campylobacter pyloridis"11 perhaps the name "pyloric campylobacter" will do to define the site where these organisms are commonly found and to indicate the similarity to known Campylobacter spp.

There was no well-defined clinical syndrome associated with pyloric campylobacter. Only "burping" was significantly associated. Others have described this symptom in patients with non-ulcer dyspepsia and PMN infiltration of the antrum is also common in such patients.12,13 We expected abdominal pain to correlate with pyloric campylobacter or gastritis, but it did not. Perhaps, since most patients undergoing gastroscopy have pain (75% in our study) the question "Do you have abdominal pain: yes or no?" was too general.

Much of the questionnaire was designed to select likely sources or causes of pyloric campylobacter infection. For example, bacteria might have colonised patients who already had gastritis and were taking antacids, milk, or cimetidine, thus impairing their "gastric acid barrier" and predisposing them to infection.14 Animal contact and carious teeth were also considered as sources of infection. Campylobacters are commensals of domestic and farm animals (C coli, C jejuni), and they also inhabit the human mouth (C sputorum ss sputorum).15 We found no evidence that any of these factors predisposed to the infection.

The absence of a relation between "known causes" of gastritis and the presence of histological gastritis has been noted by others. For example, analgesic abusers often have no gastritis, even when a gastric ulcer is present;16 alcohol consumption is not clearly related to gastritis;17 the quantity of bile in the stomach (duodenogastric reflux) is not obviously related to the state of gastric mucosa;18 autoimmune disease is an unlikely cause, since gastric autoantibodies are uncommon except in pernicious anaemia, where the main histological changes are in the body of the stomach, not the antrum.19 Gastric ulcer seems an unlikely primary cause of antral gastritis because the gastritis remains after successful treatment of the ulcer with cimetidine or carbenoxolone, and gastritis is just as common in patients with duodenal ulcer as with gastric ulcer.20-23 Thus, the aetiology of chronic gastritis remains uncertain.

We have found a close association between pyloric campylobacter and antral gastritis. When PMN infiltrated the mucosa the bacteria were almost always present (38/40). In the absence of inflammation they were rare (2/31), suggesting that they are not commensals. The bacteria were not cultured unless the patient had histological evidence of both gastritis and pyloric campylobacter. We know of no other disease state where, in the absence of complicating factors such as ulceration (table IV), bacteria and PMNs are so intimately related without the bacteria being pathogenic.

How does pyloric campylobacter survive?
The bacteria were usually in close contact with the mucosa, often in grooves between cells, within acinus-like infoldings of the epithelium or within the mucosal pits (figure). The surface mucus coating was superficial to the bacteria and any foreign material or organisms from the oral flora were present above the mucus, rarely mixed with it, and not beneath it: the mucus appeared to form a stable layer over the bacteria. The antrum secretes mainly mucus, and the deeper levels of the surface mucus coating are slightly alkaline.24 Thus pyloric campylobacter grows in a near-neutral environment, in close contact with the mucosa and protected from the bactericidal gastric juice. The absence of these bacteria from past reports of gastric microbiology may be because only gastric juice was cultured.25,26 Even salmonellae cannot survive the low intragastric pH for more than a few minutes.14 Where gastric biopsy material has been cultured,6,27,28 microaerophilic techniques were not used and pyloric campylobacter did not grow.

Peptic ulcer was the only endoscopic finding associated with histological gastritis and pyloric campylobacter. This was surprising since the bacteria were not prominent on gastric ulcer borders and in duodenal ulcer no correlation would be expected. Perhaps the mucus coating is deficient or unstable near ulcer borders, thus allowing damage to the bacteria as well as the mucosa. Within a few millimetres of an ulcer, both pyloric campylobacter and gastritis were usually present. Other studies have shown continuing gastritis after ulcer healing with cimetidine and we have observed the persistence of pyloric campylobacter colonisation in such patients.

The failure of the H2 receptor antagonists to prevent ulcer relapse is attributed to an underlying ulcer diathesis which is unaffected by therapy. A bacterial aetiology, with continuing gastritis, could be the explanation. The diathesis may be a myth. Of ulcer-healing agents the only one thought to improve relapse rates is tripotassium dicitratobismuthate.29 This compound is bactericidal to pyloric campylobacter and in patients treated with it the gastritis improved and the bacteria disappeared.30

The aetiology of peptic ulceration is unknown but until now a bacterial cause has not really been considered. We have found colonisation of the gastric antrum with pyloric campylobacter in over half of a series of cases at routine endoscopy. The bacteria were present almost exclusively in patients with chronic antral gastritis and were also common in those with peptic ulceration of the stomach or duodenum. Although cause-and-effect cannot be proved in a study of this kind, we believe that pyloric campylobacter is aetiologically related to chronic antral gastritis and, probably, to peptic ulceration also.

We thank Dr T. E. Waters, Dr C. R. Sanderson, and the gastroenterology unit staff for the biopsies, Miss Helen Royce and Dr D. I. Annear for the microbiological studies, Mr Peter Rogers and Dr L. Sly for supplying the G & C data, Dr J. A. Armstrong for the electron microscopy, Dr R. Glancy for reviewing slides, Miss Joan Bot for the silver stains, Mrs Rose Rendell of Raine Medical Statistics Unit UWA, and Ms Maureen Humphries, secretary, and, for travel support, Fremantle Hospital.
Correspondence should be addressed to: B. M., Department of Microbiology, Fremantle Hospital, PO Box 480, Fremantle 6160, Western Australia.

REFERENCES
1. Doenges JL. Spirochaetes in gastric glands of macacus rhesus and humans without definite history of related disease. Proc Soc Exp Biol Med 1938; 38: 536-38.
2. Ito S. Anatomic structure of the gastric mucosa. In: Heidel US, Cody CF, eds. Handbook of physiology, section 6: Alimentary canal, vol II: secretion. Washington, DC: American Physiological Society, 1967: 705-41.
3. Fung WP, Papadimitriou JM, Matz LR. Endoscopic, histological and ultrastructural correlations in chronic gastritis. Am J Gastroenterol 1979; 71: 269-79.
4. Freedburg AS, Barron LE. The presence of spirochaetes in human gastric mucosa. Am J Dig Dis 1940; 7: 443-45.
5. Palmer ED. Investigation of the gastric spirochaetes of the human. Gastroenterology 1954; 27: 218-20.
6. Steer HW, Colin-Jones DG. Mucosal changes in gastric ulceration and their response to carbenoxolone sodium. Gut 1975; 16: 590-97.
7. Warren JR, Marshall B. Unidentified curved bacilli on gastric epithelium in active chronic gastritis. Lancet 1983; i: 1273-75.
8. Whitehead R, Truelove SC, Gear MWL. The histological diagnosis of chronic gastritis in fibreoptic gastroscope biopsy specimens. J Clin Pathol 1972; 25: 1-11.
9. Kaplan RL. Campylobacter. In: Lenette E, Balows A, Hausler WJ, Truant JP, eds. Manual of clinical microbiology, 3rd ed. Washington, DC: American Society for Microbiology, 1980: 235-41.
10. Pead PJ. Electron microscopy of Campylobacter jejuni. J Med Microbiol 1979; 12: 383-85.
11. Skirrow MB. Taxonomy and biotyping. Morphological aspects. In: Pearson AD, Skirrow MB, Rowe B, Davies JR, Jones DM, eds. Campylobacter II: Proceedings of the Second International Workshop on Campylobacter Infections. London: Public Health Laboratory Service, 1983: 36.
12. Crean GP, Card WI, Beattie AD, Holden RJ, James WB, Knill-Jones RP, Lucas RW, Spiegelhalter D. Ulcer-like dyspepsia. Scand J Gastroenterol 1982; 17 (suppl 79): 9-15.
13. Greenlaw R, Sheahan DG, Deluca V, Miller D, Myerson D, Myerson P. Gastroduodenitis: a broader concept of peptic ulcer disease. Dig Dis Sci 1980; 25: 660-72.
14. Giannela RA, Broitman SA, Zamcheck N. Gastric acid barrier to ingested microorganisms in man: Studies in vivo and in vitro. Gut 1972; 13: 251-56.
15. Blaser MJ, Reller LB. Campylobacter enteritis. N Engl J Med 1981; 305: 1444-52.
16. MacDonald WC. Correlation of mucosal histology and aspirin intake in chronic gastric ulcer. Gastroenterology 1973; 65: 381-89.
17. Wolff G. Does alcohol cause chronic gastritis? Scand J Gastroenterol 1970; 5: 289-91.
18. Goldner FH, Boyce HW. Relationship of bile in the stomach to gastritis. Gastrointest Endosc 1976; 22: 197-99.
19. Whitehead R. Mucosal biopsy of the gastrointestinal tract. In: Bennington JL, ed. Major problems in pathology: Vol III, 2nd ed. Philadelphia: WB Saunders, 1979: 15.
20. Gilmore HM, Forrest JAH, Pettes MR, Logan RFA, Heading RC. Effect of short and long term cimetidine on histological duodenitis and gastritis. Gut 1978; 19: 981.
21. McIntyre RLE, Piris J, Truelove SC. Effect of cimetidine on chronic gastritis in gastric ulcer patients. Aust NZ J Med 1982; 12: 106.
22. Schrager J, Spink R, Mitra S. The antrum in patients with duodenal and gastric ulcers. Gut 1967; 8: 497-508.
23. Magnus HA. Gastritis. In: Jones FA, ed. Modern trends in gastroenterology. London: Butterworth, 1952: 323-51.
24. Allen A, Garner G. Mucus and bicarbonate secretion in the stomach and their possible role in mucosal protection. Gut 1980; 21: 249-62.
25. Draser BS, Shiner M, McLeod GM. Studies on the intestinal flora I: The bacterial flora of the gastrointestinal tract in healthy and achlorhydric persons. Gastroenterology 1969; 56: 71-79.
26. Enander LK, Nilsson F, Ryden AC, Schwan A. The aerobic and anaerobic flora of the gastric remnant more than 15 years after Billroth II resection. Scand J Gastroenterol 1982; 17: 715-20.
27. Mackay IR, Hislop IG. Chronic gastritis and gastric ulcer. Gut 1966; 7: 228-33.
28. Rollason TP, Stone J, Rhodes JM. Spiral organisms in endoscopic biopsies of the human stomach. J Clin Pathol 1984; 37: 23-26.
29. Martin DF, May SJ, Tweedle DE, Hollanders D, Ravenscroft MM, Miller JP. Difference in relapse rates of duodenal ulcer after healing with cimetidine or tripotassium di-citrato bismuthate. Lancet 1980; i: 7-10.
30. Marshall B, Hislop I, Glancy R, Armstrong J. Histological improvement of active chronic gastritis in patients treated with De-Nol. Aust NZ J Med (in press) (abstr).

AZTREONAM COMPARED WITH GENTAMICIN FOR TREATMENT OF SERIOUS URINARY TRACT INFECTIONS

FRED R. SATTLER
JAMES E. MOYER
MARGARET SCHRAMM
JEFFREY S. LOMBARD
PETER C. APPELBAUM

Division of Infectious Diseases and Epidemiology, Department of Medicine, Division of Urology, Department of Surgery, and Department of Pathology, Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA

Summary
52 patients with serious urinary tract infections were randomised to receive either aztreonam (35) or gentamicin (17). In the aztreonam group 23 had unqualified cures, 6 cures with relapse, and 6 patients cures with reinfection; the comparable numbers in the gentamicin group were 9, 1, and 4. There were no failures with aztreonam and 3 with gentamicin. The most important determinant of outcome was the presence or absence of urological abnormalities. 11 further patients, with renal failure or gentamicin-resistant isolates, treated with aztreonam were all cured. Toxic effects were limited to symptomless liver-function-test abnormalities with aztreonam, whereas deterioration in renal function occurred in 4 gentamicin-treated subjects. Urinary colonisation with group D streptococci occurred in 14 of 46 aztreonam-treated patients (1 required treatment) compared with only 1 of 17 gentamicin-treated patients. 97% of 309 consecutive gram-negative urinary isolates tested, including 50 Pseudomonas aeruginosa, were susceptible in vitro to aztreonam and 91% to gentamicin. Aztreonam may prove an effective and safe alternative to the aminoglycosides.

Introduction
AZTREONAM is the first of a class of synthetic antimicrobials called monobactams. These are monocyclic β-lactam drugs that lack the two-ring configuration of penicillin and cephalosporin molecules. In vitro aztreonam inhibits the growth of most Enterobacteriaceae, including multiply drug resistant strains of Serratia marcescens, at concentrations of 2 µg/ml or less.1-3 Moreover, more than 90% of Pseudomonas aeruginosa isolates are inhibited by concentrations of 16 µg/ml and less.1-3 In human beings, 1 or 2 g given intravenously produces serum concentrations of at least 100-200 µg/ml.4,5 In addition, the serum half-life of approximately 2 h is suitable for a dose interval of 8 or 12 h.4,5 These in-vitro and pharmacological properties suggest that aztreonam has the potential to replace the aminoglycosides for therapy of gram-negative infections. We have compared aztreonam with gentamicin in the treatment of serious urinary tract infections in patients requiring parenteral therapy.

Patients and Methods
Comparative Study
Adult patients with a presumptive diagnosis of urinary tract infection who required systemic antibiotic therapy were eligible for study. Minimum criteria for enrolment included: fever 37.8°C or signs and symptoms of urinary tract infection; >10 leucocytes per high power field of urinary sediment; and microscopic evidence of bacteriuria (1 gram-negative rod/oil immersion field in fresh uncentrifuged urine or bacteria too numerous to count per high power field in unstained sediment of fresh urine collected by clean catch or catheterisation). Patients who had had an indwelling Foley catheter or nephrostomy tube in the preceding 48 h were excluded. All patients gave informed consent.

A pharmacist assigned the enrolled patients to receive aztreonam (1 g every 8 h) or gentamicin (1 mg/kg every 8 h) in a 2:1 manner according to a table of random numbers. If bacteraemia was suspected the doses were increased to 2 g and 1.7 mg/kg, respectively. In patients with creatinine clearances below 30 ml/min, the dose of aztreonam was halved and that of gentamicin was changed according to serum drug levels. Both drugs were given by intravenous infusion over 20-30 min or intramuscularly if there was inadequate venous access; the two routes of administration give similar areas under the serum concentration curve and the proportions in the two groups receiving the drugs intramuscularly were similar. Treatment was continued for 5-10 days in uncomplicated cases. Patients with bacteraemia were treated for 10-14 days. Treatment was discontinued if pretreatment urine cultures did not grow >10^5 gram-negative organisms/ml.

Open Study
Other patients were treated with aztreonam in an open, unrandomised way if their infecting organisms were known to be resistant to gentamicin or if they had renal failure (creatinine clearance <50 ml/min). Enrolment, treatment, and follow-up evaluation were otherwise carried out as in the comparative study.

Laboratory Studies
Urine was collected for culture and sensitivity testing before treatment, after 2-4 days of treatment, and at the end. Test-of-cure cultures were obtained 5-9 days and 4-6 weeks after the last day of treatment. Blood samples for cultures were collected before treatment and, if positive, after 24-48 h of treatment, and 24 h after the completion of treatment. Haematology and chemistry test results were monitored for drug-related toxic reactions. These tests were done before the study, every 3-5 days during treatment, and on the last day of treatment.

Definitions of Outcome
Unqualified cure. Urine cultures were sterile at the completion of treatment and 5-9 days and 4-6 weeks later.
Cure with relapse. Urine cultures at the end of treatment and 5-9 days later were sterile but those at 4-6 weeks grew ≥10^5 colony-forming units/ml (cfu/ml) of the original infecting organism.
Cure with reinfection. Urine cultures at the end of treatment and 5-9 days later were sterile but those at 4-6 weeks grew ≥10^5 cfu/ml of an organism different from the original infecting isolate.
Failure. Urine culture during treatment or 5-9 days later grew 10^5 cfu/ml of the original infecting organism.
Superinfection. Urine culture during therapy or up to 5-9 days after treatment grew an organism different from the original infecting organism and corresponding gram stain of uncentrifuged urine showed 1 organism per oil immersion field.

Results

Comparative Study
From July 1, 1982, to Feb 28, 1983, 60 patients were enrolled in the comparative study. 8 patients (6 in the
SEMINARIO N°5
¿DOS RESPUESTAS PARA UN MISMO PROBLEMA? Mooney SJ, Knox J, Morabia A. The Thompson-McFadden Commission and Joseph Goldberger: Contrasting 2 Historical Investigations of Pellagra in Cotton Mill Villages in South Carolina. Am J Epidemiol. 2014; 180(3): 235–244.
Questions for the reading quiz and group discussion guide
1. What research problem did the Thompson-McFadden Commission and Joseph Goldberger study? What type of research problem was it?
2. Although they addressed the same research problem, the Thompson-McFadden Commission proposed one hypothesis and Joseph Goldberger’s research team another. Such differences in hypotheses can arise in the course of scientific research. Give another historical example from the health field in which different hypotheses were proposed for the same problem.
3. The article recounts that Goldberger carried out a study in which he induced pellagra in a group of incarcerated people. An excerpt from that study is shown below:
From an ethical standpoint, what is your opinion of this study by Goldberger?
4. Analyze the results shown in Table 3. Which causality criterion could be argued for with these data? Review the so-called “Hill’s criteria of causality” and explain your answer.
5. The authors of the article give three reasons why the Thompson-McFadden Commission reached mistaken conclusions. The first two reasons are related to systematic errors. What are these systematic errors in this case?
6. The authors mention flexibility of analytical reasoning as one of the characteristics of Goldberger’s studies. How should this characteristic be understood?
Additional questions to be discussed in the group session
1. Compare the arguments that each research group gave in favor of its hypothesis. Placing ourselves in the historical context in which these events occurred, are both sets of arguments reasonable? Why?
2. Page 237 states that Goldberger failed to transmit pellagra in an animal model using monkeys. This experiment can be said to have sought to corroborate one of Koch’s postulates on the transmission of infectious diseases. What are Koch’s postulates? Are they applicable today?
3. The article also describes what came to be known as filth parties. What was their purpose?
4. Figure 3 shows an important difference in the incidence of pellagra. What was the likely cause of this difference?
5. The authors of the article ran a series of statistical simulations based on the data from Goldberger’s investigations, applying percentages of variable misclassification to those findings. A main finding of this analysis is shown in Figure 6. What can we conclude from this graph?
6. From a public health standpoint, what was the application of the findings of Goldberger’s investigations?
American Journal of Epidemiology © The Author 2014. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Vol. 180, No. 3 DOI: 10.1093/aje/kwu134 Advance Access publication: June 24, 2014
Epidemiology in History
The Thompson-McFadden Commission and Joseph Goldberger: Contrasting 2 Historical Investigations of Pellagra in Cotton Mill Villages in South Carolina
Stephen J. Mooney*, Justin Knox, and Alfredo Morabia * Correspondence to Stephen J. Mooney, Department of Epidemiology, Mailman School of Public Health, 722 West 168th Street, Room 729, New York, NY 10032 (e-mail: sjm2186@columbia.edu).
Initially submitted November 22, 2013; accepted for publication April 29, 2014.
As pellagra reached epidemic proportions in the United States in the early 20th century, 2 teams of investigators assessed its incidence in cotton mill villages in South Carolina. The first, the Thompson-McFadden Commission, concluded that pellagra was likely infectious. The second, a Public Health Service investigation led by Joseph Goldberger, concluded that pellagra was caused by a dietary deficiency. In this paper, we recount the history of the 2 investigations and consider how the differences between the 2 studies’ designs, measurements, analyses, and interpretations led to different conclusions. Because the novel dietary assessment strategy was a key feature of the Public Health Service’s study design, we incorporated simulated measurement error in a reanalysis of the Public Health Service’s data to assess whether this specific difference affected the divergent conclusions.
epidemiology in history; measurement; multilevel epidemiology; nutrition
Pellagra
Pellagra, meaning “sour skin” in Italian, manifests through the “4 Ds”: dermatitis, diarrhea, dementia, and death. It was first formally described among Spanish agricultural workers in 1735 but remained rare in America until the early 20th century (1). Then, in the period from 1906 to 1940, it reached epidemic proportions, resulting in more than 3 million cases and 100,000 deaths (2). Cases were most predominant among the socially disadvantaged: almost always among the very poor and disproportionately among blacks and women (3, 4). As of the early 20th century, pellagra’s etiology was unknown, allowing some to believe it had an infectious origin (5). Others believed pellagra was caused by poor diet, either following Casimir Funk’s emerging theory of vitamin deficiency (6) or the longstanding belief that spoiled corn contained a toxic agent (7, 8). Both the infectious and dietary deficiency hypotheses were supported by recent discoveries. In the period following Robert Koch’s identification of Bacillus anthracis in 1875, many diseases of unproven etiology, including tuberculosis, cholera, pneumonia, scarlet fever, and diphtheria, had been linked to bacteria. Given the resemblance pellagra bore to tuberculosis (e.g., endemic in environments marked by poverty and poor sanitation and amenable to treatment with environmental change, yet prone to relapse after active treatment ended (9)), it was reasonable to expect to find another infectious cause of pellagra. Meanwhile, Christiaan Eijkman’s finding that thiamine deficiency was the sole cause of beriberi disease provided a model for disease caused exclusively by nutrient deficiency, even in a diet with adequate caloric content (10). With etiology uncertain, early investigations in the United States described the pellagra epidemic patterns. Throughout Appalachia, pellagra was prevalent in villages where residents were mainly employed by a cotton mill (11), and it was associated with seasons, poverty, poor sanitation, and, potentially, diet (12). The growing epidemic led leaders in the affected states to pressure for more intensive investigations.
The Thompson-McFadden Commission
Through the influence of George Miller, president of the New York Post-Graduate Medical School, mining magnate Robert M. Thompson and cotton merchant Henry McFadden together donated funds for a commission bearing their names to investigate pellagra. The Thompson-McFadden Commission, led by Joseph F. Siler and Philip E. Garrison, 2 officers in the Armed Services Medical Corps, and Ward J. MacNeal, a doctor affiliated with New York Post-Graduate Medical School, investigated the epidemic on a large scale.
Am J Epidemiol. 2014;180(3):235–244
236 Mooney et al.
The investigation, which lasted from 1912 to 1914, focused on the Spartanburg, South Carolina, area because of local cooperation, personal interest of South Carolina Senator Ben Tillman, and pellagra’s prevalence in the area (4, 13). An initial survey of individuals with pellagra described the local epidemic conditions (14, 15). Afterward, the Commission chose 6 cotton mill villages (Inman Mills, Whitney, Pacolet Mills, Saxon Mills, Arkwright, and Spartan Mills) to explore both dietary and infectious hypotheses (16). Every household in these villages was canvassed, with an interviewer asking 1 household resident (“usually the housewife” (16, p. 294)) for the individual age, sex, occupation, residential history, and pellagra status of each resident, as well as questions about the consumption of corn meal, grits, wheat flour, fresh meat, cured meat, lard, canned foods, milk, eggs, and butter. A resident was considered to be a case if a skin lesion was present at the time of canvass or if both the resident and his or her treating physician confirmed the prior presence of a lesion (15). Food consumption was categorized as year-round or seasonal, and as daily, habitual (less than daily but at least twice a week, on average), rare (twice a week or less), or never, for a total of 7 categories (Table 1). The Commission analyzed individual- and household-level diets as causes of pellagra, dismissing a dietary cause (Table 2). Most foods, including meat and corn, were found to have no association with pellagra. An observed negative association between milk consumption and pellagra was noted but dismissed because milk’s “use did not fully insure against the development of the disease” (16, p. 373). Two findings appeared to contradict a dietary hypothesis. First, in contrast to beriberi, pellagra was almost never found among nursing infants (17). Second, within the surveyed villages, pellagra was more common among whites than blacks, in spite of blacks’ perceived worse diet (17). The infectious hypothesis was explored using an ad hoc classification of every house in the villages as being located, as of 1912, in zone 1 (containing a pellagra case), zone 2 (adjacent to a house with a pellagra case), or zone 3 (not adjacent to a house with a pellagra case). Individual zone assignment of residents who had lived in the zone for at least 2 weeks provided zone-specific incidence rates among all villages, yielding a striking inverse gradient with distance from pellagra cases consistent across villages (Figure 1). The lower incidence of pellagra in villages with better sanitation systems was also consistent with the infectious hypothesis (17), and the Thompson-McFadden Commission ultimately concluded that an infectious etiology was more likely than a dietary one.

Table 1. Food Consumption Categories in the Thompson-McFadden Commission’s Household Canvass, South Carolina, 1912–1913

Period of        Food Consumption Category
Consumption      7 Times/week       2–6 Times/week          Fewer Than 2 Times/week    0 Times/week
Year-round       Daily              Habitually              Rarely                     Never
Seasonal         Part-time daily    Part-time habitually    Part-time rarely           Never
Goldberger and the Public Health Service investigation
As the Thompson-McFadden Commission’s work concluded, the pellagra epidemic continued, and the U.S. Congress appropriated funding for a major investigation. In 1914, Joseph Goldberger, who had distinguished himself in the Public Health Service by combating yellow fever, dengue fever, typhoid fever, measles, and typhus, was appointed to lead the investigation. Goldberger reviewed the literature and concluded that pellagra had a dietary origin because 1) outbreaks occurring in institutional settings never affected the staff of the institutions (18); 2) among poor Southern whites, outbreaks and diet followed a similar seasonal pattern
Table 2. Selected Results From the Thompson-McFadden Commission’s Investigation of Pellagra Incidence in Cotton Mill Village Households in Relation to Self-Reported Consumption of Certain Foods, South Carolina, 1912–1913

Total = Total No. of Households; No. and % = Households With Incident Case.

Timing and Frequency    Shipped Corn Meal          Fresh Meat                 Fresh Milk
of Consumption          Total    No.     %         Total    No.     %         Total    No.     %
Year-round
  Daily                 414      38      9.1       13       2       15.3      527      36      6.8
  Habitually            161      17      10.6      151      21      13.3      112      9       8
  Rarely                85       9       10.6      337      33      9.7       121      9       7.4
Seasonal
  Daily                 69       6       8.7       8        1       12.5      0        0
  Habitually            25       2       8         261      23      8.7       0        0
  Rarely                11       4       36.4      45       4       8.9       1        1       100
Never                   9        9       100       46       1       2.1       117      24      20.5
Contrasting Historical Pellagra Investigations 237
Figure 1. The Thompson-McFadden Commission’s chart of pellagra incidence rate by village and by zone, an ad hoc clustering analysis, as published in the report by Siler et al. (16). Subjects in zone 1 resided in the same household as a pellagrin (i.e., individual with pellagra), those in zone 2 resided adjacent to a pellagrin, and those in zone 3 lived farther away from any pellagrins. The Commission argued that the much higher incidence rates in zone 1 supported an infectious hypothesis. A, Arkwright, South Carolina; Av, average; I, Inman Mills, South Carolina; P, Pacolet Mills, South Carolina; SA, Saxon Mills, South Carolina; SP, Spartan Mills, South Carolina; W, Whitney, South Carolina. Reprinted from Arch Intern Med. XIV(3):293–373. Copyright ©1914 American Medical Association. All rights reserved.
(19); 3) although pellagra was associated with poverty, it rarely affected poor farmers with cattle (1, 19); and 4) pellagra’s association with diet even among individuals with a calorically adequate but monotonous diet bore similarities to beriberi and scurvy (20). Goldberger’s prior failure to transmit pellagra in monkeys may have given him a perspective the Thompson-McFadden Commission lacked (21). Working from the vitamin deficiency hypothesis, Goldberger used broad dietary changes to prevent pellagra in an orphanage and to cause and subsequently cure pellagra in a prison (22, 23). He next worked to rule out the infectious hypothesis by experimentally using nasal secretions, scabs, blood, urine, and feces from pellagrins (i.e., individuals with pellagra) to attempt to transmit pellagra to willing subjects (including himself, his wife, and his close associates) using techniques (e.g., ingestion, intramuscular injection, subcutaneous injection) that propagated other diseases; none of these attempts was successful (24). Yet despite this evidence in favor of a solely dietary hypothesis, Goldberger’s views were criticized, including by members of the Thompson-McFadden Commission, who argued that the selected populations were different than those most at risk (25), and the diet that Goldberger believed had induced pellagra in prisoners had in fact merely weakened the body’s resistance to an as yet unidentified infectious agent (1). Therefore, Goldberger, with the support of the Public Health Service, undertook to show that pellagra in the general population could also be explained by diet. He followed up on the Thompson-McFadden Commission’s study of pellagra in cotton mill villages by undertaking another study in an overlapping set of cotton mill villages. The Public Health Service gathered data at the individual, household, and village levels using a household-by-household canvass and assessed both the dietary and the sanitation-mediated infectious hypotheses in a study closely resembling the Thompson-McFadden Commission’s, yet with 2 notable differences.
The first difference between the 2 studies was the extraordinary attention given by the Public Health Service to diet and pellagra assessment, including an integration of administrative and interview data that, to our knowledge, was unique for its time. Cohort studies preceding these pellagra investigations were based either on vital statistics alone (e.g., the work by Weinberg (26)) or on medical record abstraction (e.g., the work by Lane-Claypon (27)). The case definition required clearly defined, bilaterally symmetrical dermatitis as assessed by Goldberger or Goldberger’s assistant, G.A. Wheeler, who was also a physician (Figure 2) (28). A 15-day household diet just prior to the peak pellagra season (between April 16 and June 15) was derived from food supply assessed through company store records and supplemented by housewife surveys of the quantity and value of food items obtained from other sources (e.g., produced at home, given to the household, or purchased from neighbors, farmers, or hucksters). Individual food supply was then weighted using the Atwater scale (for more detail, see Web Appendix 1 available at http://aje.oxfordjournals.org/). The second major difference was the Public Health Service’s selection of a slightly different set of cotton mill villages. Spartan Mills and Pacolet Mills, the larger villages in the Thompson-McFadden study, were replaced with Newry, Seneca, and Republic. The change in villages was likely related to a desire to compare sanitation levels in relation to pellagra incidence between villages. The Thompson-McFadden Commission had previously invoked poor sanitation as supporting an infectious hypothesis by contrasting Seneca’s
Figure 2. Reproduction of a drawing from a 1913 paper on pellagra in Africa with shading to show where lesions characteristic of pellagra were found on the body (44). Ward J. MacNeal of the Thompson-McFadden Commission included this drawing in a letter to the editor of the Journal of the American Medical Association arguing that the pellagra that Joseph Goldberger and G.A. Wheeler had induced in prison inmates did not follow the characteristic lesion pattern (45). This criticism may have led to the case definition in Goldberger and Wheeler’s subsequent investigation of pellagra in cotton mill villages in South Carolina. Reprinted from Trans Royal Soc Trop Med Hyg. 7(1):32–56. Copyright ©1913 Oxford University Press. All rights reserved.
pellagra incidence with Newry’s in an analysis focused on sanitation alone (29). Republic, like Newry, was unusual among cotton mill villages in having an improved sewage system. The Public Health Service study confirmed several previous results. The peak season for pellagra was late spring (12). Pellagra was positively associated with young age (between 2 and 10 years) regardless of sex and with female sex in adulthood (13, 30). Pellagra was much more common among poor households (12) and was associated with diet (Table 3). At the village level, pellagra was highly variable (Figure 3), but that variability was not associated with sanitation (which varied between villages but did not covary with pellagra incidence) (31, 32) or with poverty (which was roughly comparable in all villages) (33). These findings did not conclusively implicate diet as the cause of pellagra. No single food or food category was present in all nonpellagrous households and missing in all pellagrous households. Furthermore, because poverty was a cause both of poor diet and of squalid conditions permitting infectious agents to proliferate, a dietary association with pellagra did not fully distinguish between dietary and infectious
etiological hypotheses (Web Figure 1). To disentangle the effects of poverty and poor diet, Goldberger supplemented his between-household comparison with a between-village comparison in an early form of multilevel epidemiologic analysis. Having previously shown that sanitary conditions at the village level were unrelated to pellagra incidence, and that income distribution, food prices, and individuals of susceptible ages were comparable between villages, Goldberger compared a village with high incidence of pellagra, Inman Mills, to one with a single case, Newry. What differed between the 2, he argued, was food supply. Inman Mills had a company store with little fresh food and almost no nearby farms, whereas Newry had a well-stocked store and a wealth of nearby farmers selling food in town. Household-level purchase records indicated far more fresh meat and milk in Newry diets, even among the very poor. Thus, community conditions of food availability provided the best explanation for the difference in pellagra incidence between Inman Mills and Newry (33). By quantifying how the availability of food within the village shaped the specific foods consumed by individuals within households, Goldberger identified dietary
Table 3. Selected Results From the Public Health Service Commission’s Investigation of Pellagra Incidence in Cotton Mill Village Households in Relation to Supply of Certain Foods per Adult Male Equivalent as Weighted by the Atwater Scale, South Carolina, April–June 1916

Food Item and Adult Male    Total No. of    Households With Incident Case
Unit^a Supply               Households      No.     %
Corn meal, lbs^b
  <4.0                      304             20      6.6
  4.0–7.9                   260             24      9.2
  8.0–11.9                  117             13      11.1
  ≥12.0                     61              4       6.6
Fresh meat, lbs^b
  <1.0                      495             54      10.9
  1.0–1.9                   131             4       3.1
  2.0–2.9                   61              2       3.3
  3.0–3.9                   36              1       2.8
  ≥4.0                      18              0       0
Fresh milk, qts^c
  <1.0                      154             28      18
  1.0–6.9                   262             16      6.2
  7.0–12.9                  163             8       4.9
  13.0–18.9                 90              4       4.4
  ≥19.0                     58              0       0

^a The Atwater scale accounts for the effect of household size on resources available to each individual within the household. Any male over the age of 16 years is 1 adult male unit, and women and children are calculated as proportions of an adult male unit on the basis of their expected consumption.
^b 1 lb = 0.454 kg.
^c 1 qt = 0.946 L.
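The percentage column of Table 3, and the direction of each trend, can be checked with a few lines of arithmetic. The following is a minimal sketch in Python and is not part of the original article. It recomputes the proportion of households with an incident case in each supply category and fits an unweighted least-squares line across the category ranks. The names TABLE_3, proportions, and rank_slope are illustrative, and scoring the categories 0, 1, 2, ... with equal weight is an assumption made here. Under that assumption the fresh meat slope rounds to −0.022, the value quoted in the caption of Figure 4.

```python
# Counts copied from Table 3: (total households, households with incident case)
# for each supply category, ordered from lowest to highest supply.
TABLE_3 = {
    "Corn meal": [(304, 20), (260, 24), (117, 13), (61, 4)],
    "Fresh meat": [(495, 54), (131, 4), (61, 2), (36, 1), (18, 0)],
    "Fresh milk": [(154, 28), (262, 16), (163, 8), (90, 4), (58, 0)],
}


def proportions(rows):
    """Proportion of households with an incident case in each category."""
    return [cases / total for total, cases in rows]


def rank_slope(values):
    """Unweighted least-squares slope of values against ranks 0, 1, 2, ...

    Assumption: categories are treated as equally spaced and equally weighted.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


# Change in the proportion of affected households per step up in supply category.
slopes = {food: rank_slope(proportions(rows)) for food, rows in TABLE_3.items()}

for food, slope in slopes.items():
    print(f"{food}: change in proportion per category = {slope:+.4f}")
```

A negative slope means that pellagra became less common as supply increased. With these counts the slopes for fresh meat and fresh milk are negative, while the slope for corn meal is close to zero.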
deficiency patterns, the mechanisms of which were not fully explained until biochemical analysis techniques were developed. From these findings and analogy with beriberi, Goldberger wove a narrative highlighting the role of diet in explaining pellagra variation through multiple levels of analysis: age and sex were related to status at the dinner table and ability to supplement diet outside the home; income determined both the ability to purchase high-quality food and portion sizes of those foods; and village determined access to food. Furthermore, the recent emergence of pellagra could also be explained by diet; changes in the Southern economy (which included a transition from primarily growing food, which was then available in local markets, to producing cotton) had led to large-scale dietary changes (34). Overall, this narrative, coupled with prior results of dietary experiments, made a strong case for a dietary cause of pellagra.
Why the investigations reached different conclusions
Goldberger’s hypothesis was ultimately found to be correct: individuals with diets deficient in niacin (vitamin B3)
and tryptophan (which the body converts to niacin) develop pellagra (35). Yet the Thompson-McFadden Commission, a well-funded, well-intentioned, and professionally run study reached the wrong conclusions. Why did the Commission fail? We hypothesize that the failure was due to inaccurate dietary assessment, lack of between-village variability, and an interpretation of data shaped by an a priori assumption of infectious etiology. First, the Thompson-McFadden Commission’s dietary canvassing strategy was not as accurate as the Public Health Service’s use of company store records. By limiting dietary assessment to a single per-household interview and not considering household composition, the Commission could not account for seasonal variation in diet or for variations in portion size among individuals and households. Furthermore, in a context of severe poverty, some respondents may have reported on idealized diets rather than their actual diets. It is possible that, given the stigma attached to pellagra (2), those with disease may have overreported eating meat for social desirability reasons, although we have no data from which to assess this hypothesis today. Additionally, the Commission’s acceptance of patient and physician reports of pellagra may have resulted in the inclusion of cases of other diseases causing dermatitis. The most likely such disease is ariboflavinosis, a disease caused by riboflavin deficiency, which also results in skin lesions. Ariboflavinosis lesions are characteristically located in the mouth and genitals rather than on the extremities, as in pellagra (36); the Public Health Service’s case definition likely would have excluded ariboflavinosis (19). Second, the Public Health Service’s selection of villages to canvass was also superior in terms of exposure and disease variability. 
By including both Newry, with its broader food supply and improved sanitation system but no cases of pellagra, and Republic, with an improved sanitation system but prevalent pellagra, the Public Health Service was able both to examine a greater range of dietary conditions than the Thompson-McFadden Commission had been and to distinguish between the effects of improved diet and improved sanitation. Finally, the Thompson-McFadden Commission ignored the evidence for the dietary hypothesis, downplaying an observed protective association with milk because some cases of pellagra reported consuming milk. There is evidence, however, that the Commission could have interpreted its own data differently. In 1914, Edward Vedder, a nutritionist, wrote a report enumerating the resemblances between beriberi and pellagra (37). Invited by the Commission to investigate their data, Vedder noted that the age and sex patterning, the seasonality of the disease, and the lack of correlation between population density and pellagra incidence supported the dietary hypothesis. Vedder also observed that residents of Newry, whose dietary patterns the Thompson-McFadden Commission had not assessed but who were known to have a superior food environment, had little to no pellagra. He further pointed out that the Commission’s dietary survey did not capture portion size, and that diet should be considered as a whole rather than by individual foods. Finally, Vedder noted 2 flaws with arguments for the infectious hypothesis. First, contact tracing was nearly meaningless in a context in
[Figure 3 plot. y-axis: Pellagra Incidence per 1,000 Residents (0 to 70); x-axis: Village (Inman Mills, Republic, Saxon Mills, Arkwright, Whitney, Seneca, Newry, Spartan Mills, Pacolet Mills); legend: Public Health Service estimate, Thompson-McFadden Commission estimate.]
Figure 3. Pellagra incidence rates per 1,000 individuals in cotton mill villages in South Carolina in 2 surveys. The Thompson-McFadden Commission’s case ascertainment (in 1912–1913) may have included some ariboflavinosis along with pellagra, whereas the Public Health Service’s case ascertainment (in 1916) was more specific. The Public Health Service’s inclusion of villages with a broader range of pellagra rates, most notably Newry, also allowed for greater comparison of the role of context. Thompson-McFadden Commission data were extracted from Tables 53, 54, 57, and 58 in the report by Siler et al. (16); the Public Health Service data were extracted from Table I of the report by Goldberger et al. (28).
which all residents had some contact with pellagra. Second, given that hospital staff never contracted pellagra from patients after close contact, observed clustering within households was more consistent with causation due to shared diet than to shared exposure to an infectious agent (38). Unfortunately, the Thompson-McFadden Commission dismissed Vedder’s ideas, arguing that the dietary hypothesis would predict simultaneous development of pellagra within a household, whereas they had observed lags of months or years between incident cases (25). However, the Public Health Service was aware of Vedder’s work (19, 33); it may have been that Vedder’s critique of the Commission’s survey motivated the Public Health Service to use a dietary assessment strategy that accounted for portion size.
Reanalysis
The Public Health Service’s attention to assessing household diet and validating the reported cases of pellagra indicates grave concern that measurement error might blur the nature of the association between diet and pellagra (19). To understand the specific importance of measurement differences between the 2 studies, we simulated analyses the Public Health Service might have performed using methods only as accurate as those used by the Thompson-McFadden Commission. Our hypothesis was that inaccurate case definition and dietary assessment were 2 of the main factors preventing the Thompson-McFadden Commission from identifying the true cause of pellagra.
METHODS
Data collection
We were unable to retrieve original individual-level data from the Public Health Service study. Our search targeted the Goldberger archive at the University of North Carolina (C. Gray, University of North Carolina at Chapel Hill, personal communication, 2013), the Goldberger archive at Vanderbilt University (C. Ryland, Vanderbilt University Medical Center, personal communication, 2013), and the Public Health Service archive. We also made personal inquiries to the Pearl S. Buck family archive (the Public Health Service’s chief statistician, Edgar Sydenstricker, was the brother of Nobel laureate Pearl S. Buck), the descendants of G.A. Wheeler, and the Milbank Memorial Fund, where junior statistician Dorothy Wiehl, who likely compiled results into the final published tables, later worked. We therefore restricted our reanalysis to summary data extracted from papers authored by the Thompson-McFadden
[Figure 4 plot. Panels A and B; x-axis: Household Meat Supply per Adult Male Unit, lbs (<1.0, 1.0–1.9, 2.0–2.9, 3.0–3.9, ≥4.0); y-axis: % of Households With Pellagra (0.00 to 0.15).]
Figure 4. The association between household-level pellagra incidence and A) estimate of meat supply (slope = −0.022) in the Public Health Service study, South Carolina, April 1916–June 1916 and B) in 25 data sets simulated from the Public Health Service data with pellagra measured at 98.6% specificity and with 25% of household meat supply misclassified (median slope = −0.010). Each thin line represents the slope estimate from 1 simulation; the thick line represents the median estimate. 1 lb = 0.454 kg. Public Health Service data were extracted from Table IX of the report by Goldberger et al. (19).
Commission and the Public Health Service investigators. We simulated measurement error in meat assessment because milk and meat were the only 2 niacin-containing foods considered by both the Public Health Service and the Thompson-McFadden Commission. Data on the relationship of fresh meat supply to pellagra were extracted from Table IX of the Public Health Service’s diet paper (19) and from Tables 49 and 50 of the Thompson-McFadden Commission’s diet paper (16). Data on village-specific pellagra incidence were extracted from Tables I and III of the Public Health Service’s sanitation paper (31) and from Tables 53, 54, 57, and 58 of the Thompson-McFadden Commission’s diet paper (16).
Statistical analysis
Our analysis treated the Public Health Service data on meat supply and pellagra as the truth and simulated misclassification of both pellagra diagnosis and meat supply. Because the Commission’s canvassing strategy was as inclusive as that used by the Public Health Service (14), we assumed a sensitivity of 100% for pellagra diagnosis. However, because the Commission’s case definition allowed for patient or physician reports of skin lesions and made no mention of requiring bilateral symmetry (14), the Commission likely included some cases of ariboflavinosis in their pellagra group. Because pellagra’s yearly mortality rates among whites were relatively constant in the mid-1910s (39), we assumed that the total prevalence of pellagra in the 4 villages assessed by both investigations was equal in 1912 and 1916 and estimated specificity for the Commission’s diagnosis by assuming that the excess pellagra incidence observed by the Commission was due to ariboflavinosis misdiagnosed as pellagra. Further, because the Commission’s assessment of meat consumption was based on self-reports and was not strictly seasonal (16), we simulated misclassification of meat consumption ranging from 0% to 30% of individuals. We then compared the relation of partially misclassified meat supply to partially misclassified pellagra to assess how much weaker the trend of
242 Mooney et al.
Proportion of Simulations Resulting in a Suggestive Correlation
1.0
0.8
0.6
0.4 0
10 15 20 25 5 % of Households for Which Meat Supply Was Misclassified
30
Figure 5. Relation of proportion of meat supply misclassified to the proportion of simulations with a slope of 9.1 cases of pellagra per 1,000 subjects for each increase in meat category, an association the Public Health Service described as suggestive of an inverse correlation. All simulations assume 100% sensitivity and 98.6% specificity of pellagra case ascertainment, estimated by comparing the Thompson-McFadden Commission’s observed pellagra cases to those observed by the Public Health Service, which used a more rigorous case definition. Based on data collected in South Carolina in 1916.
increasing pellagra with decreasing meat supply became. We ran 1,000 simulations at each level of meat supply misclassification. We also plotted 25 of these slopes (at 25% misclassification) to illustrate the variation in slopes in these simulations. Both studies predate the use of modern statistics; investigators compared incidence rates visually by category to determine whether a significant relationship existed. To simulate this visual trend test in 1,000 simulations, we considered a regression line with a slope indicating an average change of 9.1 cases/1,000 subjects between categories to indicate a noteworthy relationship. We picked 9.1/1,000 as the change the Public Health Service considered “a suggestion of an inverse correlation” (19, p. 691). All analyses were performed using R for Windows, version 2.15.3 (R Foundation for Statistical Computing, Vienna, Austria). The code for simulations can be found in Web Appendix 2. RESULTS
Application of the pellagra incidence rate observed by the Public Health Service to the Thompson-McFadden Commission data in Arkwright, Newry, Saxon Mills, and Whitney resulted in an estimated 30 excess cases of pellagra in each village. Assuming those cases were misclassified resulted in an estimated specificity of 98.6%. Figure 4 displays the pellagra incidence rates by misclassified meat supply in the Public Health Service results and in 25 simulations, with the median slope highlighted. Simulations incorporating measurement error resulted in notably weaker associations between meat supply and pellagra. Figure 5 displays the proportion of simulations with a slope of 9.1 cases of pellagra per 1,000 subjects for each increase in meat category. At 20% of meat supply misclassified, half of the slopes were below the level the Public Health Service considered a “suggestive correlation.”
DISCUSSION
The Thompson-McFadden Commission and the Public Health Service both investigated pellagra incidence in cotton mill villages in South Carolina in the 1910s using similar survey methods, yet they came to starkly different conclusions. We believe that 3 characteristics allowed the Public Health Service to correctly determine the relation of pellagra to diet. First, a carefully planned approach to data collection and measurement avoided misdiagnosis of ariboflavinosis as pellagra, prevented inaccuracy in dietary measurement, and accounted for variation in household composition. Our simulations showed that small amounts of exposure and disease misclassification sufficed to nullify the meat consumption–pellagra association. Although we have no evidence as to the rate of misclassification in the Thompson-McFadden Commission’s data, this may explain why the Commission found no association between pellagra and the consumption of foods we now know to contain niacin. Furthermore, this may explain why Edward Vedder, making roughly similar claims to those Goldberger made later, was not widely heeded. Second, the Public Health Service’s selection of villages with a wider range of diet and pellagra incidence augmented the advantage of superior accuracy in data by presenting stronger between-village contrasts. Third, flexibility of analytical thought to understand that multiple levels of influence could be examined allowed the Public Health Service to identify the proper contrasts within the data.
A contemporary perspective also allows us to consider the Public Health Service investigation as an early example of multilevel epidemiologic analysis (40). Diet at the individual level is notoriously difficult to assess; instead, the Public Health Service leveraged contrasts in village-level food availability and in household composition to reconcile known patterns in disease incidence with a dietary hypothesis. By comparison, the Thompson-McFadden Commission collected data at multiple levels but did not leverage the multilevel potential of the data, instead using lack of between-village variation in geographical clustering of cases as an argument for an infectious hypothesis (16).
In hindsight, and using modern terminology, we can say that, for the first time that we are aware of, the Public Health Service team led by Goldberger combined 2 main features: 1) containment of misclassification using carefully collected data on exposure and outcome and 2) a multilevel analysis including individual level and several group (households and villages) levels, with the understanding that the village was what we would now term an effect modifier of the relation of poverty and pellagra (mediated by diet). Had modern statistical tools been available, the Public Health Service might have considered both the “neighborhood” (i.e., village) association with the individual-level outcome and individual-level risk factors within the same analysis. If the original Public Health Service data are ever recovered, it would be of interest to know whether modern multilevel analytical tools would provide any insight beyond the published analyses.
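The misclassification simulation summarized in the statistical analysis section can be illustrated with a short sketch. The authors' own R code is in their Web Appendix 2 and is not reproduced here; what follows is a Python approximation under stated assumptions. The household counts and true risks are hypothetical, each misclassified subject is assumed to be moved to a different meat category chosen at random, and a slope is counted as "suggestive" when incidence falls by at least 9.1 cases per 1,000 per category step. Only the 100% sensitivity, the 98.6% specificity, and the 9.1/1,000 threshold come from the article.

```python
import random

# HYPOTHETICAL inputs for illustration only; these are NOT the Public Health
# Service figures. Categories run from 0 (least meat) to 2 (most meat).
HOUSEHOLDS_PER_CATEGORY = [400, 400, 400]
TRUE_RISK = [0.060, 0.040, 0.020]  # assumed true pellagra risk per category

SENSITIVITY = 1.0    # assumed by the article's authors
SPECIFICITY = 0.986  # estimated by the article's authors
THRESHOLD = 9.1      # cases per 1,000 subjects per category step


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate_slope(p_misclassified, rng):
    """Run one simulated survey and return the slope of observed incidence
    (cases per 1,000) across the observed meat categories."""
    n_cat = len(TRUE_RISK)
    cases = [0] * n_cat
    totals = [0] * n_cat
    for true_cat, n_subjects in enumerate(HOUSEHOLDS_PER_CATEGORY):
        for _ in range(n_subjects):
            diseased = rng.random() < TRUE_RISK[true_cat]
            # Outcome misclassification: imperfect specificity adds false cases.
            if diseased:
                observed_case = rng.random() < SENSITIVITY
            else:
                observed_case = rng.random() >= SPECIFICITY
            # Exposure misclassification: move to another category at random.
            observed_cat = true_cat
            if rng.random() < p_misclassified:
                others = [c for c in range(n_cat) if c != true_cat]
                observed_cat = rng.choice(others)
            totals[observed_cat] += 1
            cases[observed_cat] += int(observed_case)
    rates = [1000 * c / t if t else 0.0 for c, t in zip(cases, totals)]
    return ols_slope(list(range(n_cat)), rates)


def proportion_suggestive(p_misclassified, n_sims=1000, seed=0):
    """Share of simulations whose inverse slope reaches the threshold."""
    rng = random.Random(seed)
    hits = sum(
        simulate_slope(p_misclassified, rng) <= -THRESHOLD
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    for pct in (0, 10, 20, 30):
        print(pct, proportion_suggestive(pct / 100, n_sims=200))
```

Because the inputs are invented, the printed proportions will not match Figure 5; the sketch only shows the mechanism by which nondifferential misclassification flattens an exposure-disease trend.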
Unfortunately, although the cotton mill village study fully convinced Goldberger of pellagra’s dietary etiology, his critics remained skeptical (1). Deciding that further epidemiologic studies would not convince them, Goldberger instead focused on finding the agent that prevented pellagra (2). Nearly 20 years after the cotton mill village studies, Elvehjem and Koehn (41) identified nicotinic acid (later renamed niacin) as the vitamin whose deficiency caused pellagra. Pellagra was fully eradicated when the niacin supplementation of flour became widespread in the early 1940s (42).
CONCLUSIONS
The Public Health Service’s dedication to collecting accurate data and its insight into the contextual shaping of risk enabled the second cotton mill village investigation to succeed where the Thompson-McFadden Commission’s did not. The Public Health Service’s investigation is an excellent example of 2 key features of effective epidemiology: careful data collection and formulation of analyses that differentiate between competing hypotheses. We believe that Goldberger deserved the Nobel Prize he might have won had he lived longer (43); his approach to studying pellagra represents a model we would do well to follow today.
ACKNOWLEDGMENTS
Author affiliations: Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York (Stephen J. Mooney, Justin Knox, Alfredo Morabia); and Barry Commoner Center for the Biology of Natural Systems, Queens College, City University of New York, New York, New York (Alfredo Morabia). This work was supported by the National Cancer Institute (grant T32 CA09529 to S.J.M.), the National Library of Medicine (grant 1G13LM010884-01A1 to A.M.), and the Columbia University EPIC Fund. We thank Chris Gray for checking the Goldberger archive at the University of North Carolina for data and Dr. Daniel Westreich for putting us in touch with Chris Gray. We thank Dr. Andrew Rundle, Dr. Sandro Galea, Dr. Sharon Schwartz, Dr. Jeffrey Haydu, and M. Katherine Mooney for their insightful comments on an earlier version of this work. Conflict of interest: none declared.
REFERENCES
1. Kraut AM. Goldberger’s War: the Life and Work of a Public Health Crusader. New York, NY: Farrar, Straus and Giroux; 2003.
2. Bollet AJ. Politics and pellagra: the epidemic of pellagra in the U.S. in the early twentieth century. Yale J Biol Med. 1992;65(3):211–221.
3. Rajakumar K. Pellagra in the United States: a historical perspective. South Med J. 2000;93(3):272–277.
4. Marks HM. Epidemiologists explain pellagra: gender, race, and political economy in the work of Edgar Sydenstricker. J Hist Med Allied Sci. 2003;58(1):34–55.
5. Sambon LW. Remarks on the geographical distribution and etiology of pellagra. Br Med J. 1905;2(2341):1272–1275.
6. Funk C. The etiology of the deficiency diseases. J State Med. 1912;20:341–368.
7. Silver DR. Corn and pellagra: a contribution to our knowledge of their relation as probable cause and effect. JAMA. 1910;LIV(6):452–453.
8. Randolph JH. Notes on pellagra and pellagrins. With report of cases. Arch Intern Med. 1909;II(6):553–568.
9. Siler JF, Garrison PE, MacNeal WJ. Further studies of the Thompson-McFadden Pellagra Commission: a summary of the second progress report. J Am Med Assoc. 1914;LXIII(13):1090–1093.
10. Lavinder CH. The salient epidemiological features of pellagra. Public Health Rep. 1911;26(39):1459–1468.
11. Grimm RM. Pellagra: a report on an epidemiologic study. Public Health Rep. 1912;27(8):255–264.
12. Grimm RM. Pellagra: A Report on its Epidemiology. Washington, DC: Government Printing Office; 1913.
13. Siler JF, Garrison PE, MacNeal WJ. Pellagra: a summary of the first progress report of the Thompson-McFadden Pellagra Commission. JAMA. 1914;LXII(1):8–12.
14. Siler JF, Garrison PE. An intensive study of the epidemiology of pellagra. Report of progress. Am J Med Sci. 1913;146(2):238–277.
15. Siler JF, Garrison PE. An intensive study of the epidemiology of pellagra. Report of progress. Am J Med Sci. 1913;146(1):42–66.
16. Siler JF, Garrison PE, MacNeal WJ. A statistical study of the relation of pellagra to use of certain foods and to location of domicile in six selected industrial communities. Arch Intern Med (Chic). 1914;XIV(3):293–373.
17. Siler JF, Garrison PE, MacNeal WJ. The relation of recurrent attacks of pellagra to race, sex and age of the patient and to treatment of the disease. Arch Intern Med (Chic). 1916;XVIII(5):652–691.
18. Goldberger J. Public Health Reports, June 26, 1914. The etiology of pellagra. The significance of certain epidemiological observations with respect thereto. Public Health Rep. 1914;29(26):1683–1686.
19. Goldberger J, Wheeler GA, Sydenstricker E. A study of the relation of diet to pellagra incidence in seven textile-mill communities of South Carolina in 1916. Public Health Rep. 1920;35(12):648–713.
20. Goldberger J, Waring CH, Willets DG, et al. The Treatment and Prevention of Pellagra. Washington, DC: Government Printing Office; 1914.
21. Anderson JF. An attempt to infect the rhesus monkey with blood and spinal fluid from pellagrins. Public Health Rep. 1911;26(26):1003–1004.
22. Goldberger J, Waring CH, Willets DG. The prevention of pellagra: a test of diet among institutional inmates. Public Health Rep. 1915;30(43):3117–3131.
23. Goldberger J, Wheeler GA. Experimental pellagra in the human subject brought about by a restricted diet. Public Health Rep. 1915;30(46):3336–3339.
24. Goldberger J. The transmissibility of pellagra: experimental attempts at transmission to the human subject. Public Health Rep. 1916;31(46):3159–3173.
25. Siler JF, Garrison PE, MacNeal WJ. Relation of pellagra to location of domicile in Spartan Mills, S. C., and the adjacent district. Arch Intern Med (Chic). 1917;XX(2):198–315.
26. Weinberg W, Von Gruber M. Die Kinder der Tuberkulösen. Leipzig, Germany: S. Hirzel; 1913.
27. Lane-Claypon JE. Report to the Local Government Board Upon the Available Data in Regard to the Value of Boiled Milk as A Food for Infants and Young Animals. London, United Kingdom: Her Majesty’s Stationery Office; 1912.
28. Goldberger J, Wheeler GA, Sydenstricker E. Pellagra incidence in relation to sex, age, season, occupation, and “disabling sickness” in seven cotton-mill villages of South Carolina during 1916. Public Health Rep. 1920;35(28):1650–1664.
29. Siler JF, Garrison PE, MacNeal WJ. The relation of methods of disposal of sewage to the spread of pellagra. Arch Intern Med (Chic). 1914;XIV(4):453–474.
30. Siler JF, Garrison PE, MacNeal WJ. Statistics of pellagra in Spartanburg County, S. C., including geographical distribution of the disease and its relation to race, age, sex and occupation. Arch Intern Med. 1915;XV(1):98–120.
31. Goldberger J, Wheeler GA, Sydenstricker E, et al. A study of the relation of factors of a sanitary character to pellagra incidence in seven cotton-mill villages of South Carolina in 1916. Public Health Rep. 1920;35(29):1701–1714.
32. Jobling JW, Petersen W. The epidemiology of pellagra in Nashville, Tennessee. J Infect Dis. 1916;18(5):501–567.
33. Goldberger J, Wheeler GA, Sydenstricker E. A study of the relation of family income and other economic factors to pellagra incidence in seven cotton-mill villages of South Carolina in 1916. Public Health Rep. 1920;35(46):2673–2714.
34. Sydenstricker E. The Prevalence of Pellagra: Its Possible Relation to the Rise in the Cost of Food. Washington, DC: Government Printing Office; 1916.
35. Sydenstricker VP. The history of pellagra, its recognition as a disorder of nutrition and its conquest. Am J Clin Nutr. 1958;6(4):409–414.
36. Carpenter KJ, Lewin WJ. A reexamination of the composition of diets associated with pellagra. J Nutr. 1985;115(5):543–552.
37. Vedder EB. Some further remarks on beri-beri. Am J Trop Dis Prev Med. 1914;1:826–847.
38. Vedder EB. Dietary deficiency as the etiological factor in pellagra. Arch Intern Med (Chic). 1916;XVIII(2):137–172.
39. Marks HM. Epidemiologists explain pellagra: gender, race, and political economy in the work of Edgar Sydenstricker. J Hist Med Allied Sci. 2003;58(1):34–55.
40. Diez-Roux AV. Bringing context back into epidemiology: variables and fallacies in multilevel analysis. Am J Public Health. 1998;88(2):216–222.
41. Elvehjem CA, Madden RJ, Strong FM, et al. Relation of nicotinic acid and nicotinic acid amide to canine black tongue. J Am Chem Soc. 1937;59(9):1767–1768.
42. Park YK, Sempos CT, Barton CN, et al. Effectiveness of food fortification in the United States: the case of pellagra. Am J Public Health. 2000;90(5):727–738.
43. Carpenter KJ. The Nobel Prize and the discovery of vitamins. http://www.nobelprize.org/nobel_prizes/themes/medicine/carpenter/index.html. Published June 22, 2004. Accessed October 22, 2013.
44. Stannus HS. Pellagra in Nyasaland. Trans R Soc Trop Med Hyg. 1913;7(1):32–56.
45. MacNeal WJ. The alleged production of pellagra by an unbalanced diet. JAMA. 1916;LXVI(13):975–977.
Am J Epidemiol. 2014;180(3):235–244
SEMINAR No. 6
OVERVIEW OF THE DESIGN, CONDUCT AND REPORTING OF RESEARCH IN MEDICINE. Clark GT, Mulligan R. Fifteen common mistakes encountered in clinical research. J Prosthodont Res. 2011; 55(1): 1-6.
Questions for the reading quiz and group discussion guide
1. In medicine, when a study is reported as a scientific article, authors usually follow the recommendations of the International Committee of Medical Journal Editors (ICMJE). These set out four main sections: introduction, methodology, results and discussion. Complete the following table, indicating the correspondence between the errors reported by the authors of the seminar reading and the sections of a research article where each point should be properly written up or reported. Each point may or may not be addressed in more than one section. Select the sections with the strongest correspondence.
Table columns: Error; Section of the research article where this topic should be addressed (Introduction, Methodology, Results, Discussion).
Table rows (errors):
1. Not examining the literature to find similar prior research.
2. Not critically reading the background literature.
3. Not specifying the inclusion and exclusion criteria.
4. Not determining or reporting the errors of the measurement methods.
5. Not specifying the statistical assumptions used in the analysis.
6. Not performing a sample size analysis before the study is carried out.
7. Not implementing adequate measures to control bias.
8. Not writing or following a detailed timeline.
9. Not achieving adequate recruitment or retention of subjects.
10. Not having a detailed, written and reviewed protocol.
11. Not examining the statistical normality of the data.
12. Not reporting missing data or lost subjects, or not performing an intention-to-treat analysis.
13. Not performing or reporting a power analysis.
14. Not reporting the weaknesses of the study.
15. Not understanding or correctly using scientific language.
2. The third error noted by the authors is the failure to specify the inclusion and exclusion criteria for study subjects. How important are these criteria for the clinical application of research findings on the diagnosis, treatment or prognosis of a disease?
3. Can poor measurement of the research variables ultimately leave a scientific study without validity? Give a historical example that illustrates this (different from those covered in previous seminars).
4. For this question, use only your own reasoning (do not consult research methodology or biostatistics textbooks). Situation 1: You consider the difference in efficacy between two treatments to be large: 80% efficacy for treatment A versus 50% efficacy for treatment B. Situation 2: You consider the difference in efficacy between two treatments to be minimal: 70% efficacy for treatment A versus 65% efficacy for treatment B. In which situation is a larger sample size required to find a significant result in a study? Why?
Additional questions to be discussed in the group session
1. What is the purpose of “double blinding”?
2. Which characteristic of scientific knowledge, as covered in the theory sessions of the course, is ensured by having a detailed and reviewed research protocol? Why is this important?
3. The last error noted by the authors is the improper use of scientific language. What repercussions does this have when research findings are to be explained to the general public?
Journal of Prosthodontic Research 55 (2011) 1–6 www.elsevier.com/locate/jpor
Review
Fifteen common mistakes encountered in clinical research
Glenn T. Clark DDS, MS (a,1,*), Roseann Mulligan DDS, MS (b,1)
(a) Orofacial Pain and Oral Medicine Center, Herman Ostrow School of Dentistry, University of Southern California, Los Angeles, CA 90089-0641, USA
(b) Community Dentistry Programs and Hospital Affairs, Herman Ostrow School of Dentistry, University of Southern California, Los Angeles, CA, USA
Received 9 August 2010; accepted 23 August 2010. Available online 20 November 2010.
Abstract
The baseline standards for minimally acceptable science are improving as the understanding of the scientific method improves. Journals publishing research papers are becoming more and more rigorous. For example, in 2001 a group of authors evaluated the quality of clinical trials in anesthesia published over a 20-year period [Pua et al., Anesthesiology 2001;95:1068–73]. The authors divided the time into 3 subgroups and analyzed and compared the quality assessment score from research papers in each group. The authors reported that the scientific quality scores increased significantly in this time, showing more randomization, sample size calculation and blinding of studies. Because every journal strives to have a high scientific impact factor, research quality is critical to this goal. This means novice researchers must study, understand and rigorously avoid the common mistakes described in this review. Failure to do so means the hundreds and hundreds of hours of effort it takes to conduct and write up a clinical trial will be for naught, in that the manuscript will be rejected or, worse yet, ignored. All scientists have a responsibility to understand research methods, conduct the best research they can and publish the honest and unbiased results.
© 2010 Japan Prosthodontic Society. Published by Elsevier Ireland. Open access under CC BY-NC-ND license.
Contents
1. Failure to carefully examine the literature for similar, prior research
2. Failure to critically assess the prior literature
3. Failure to specify the inclusion and exclusion criteria for your subjects
4. Failure to determine and report the error of your measurement methods
5. Failure to specify the exact statistical assumptions made in the analysis
6. Failure to perform sample size analysis before the study begins
7. Failure to implement adequate bias control measures
8. Failure to write and stick to a detailed time line
9. Failure to vigorously recruit and retain subjects
10. Failure to have a detailed, written and vetted protocol
11. Failure to examine for normality of the data
12. Failure to report missing data, dropped subjects and use of an intention to treat analysis
13. Failure to perform and report power calculations
14. Failure to point out the weaknesses of your own study
15. Failure to understand and use correct scientific language
References
* Corresponding author. E-mail address: gtc@usc.edu (G.T. Clark).
1 Visiting Professors, School of Dentistry, Showa University, Tokyo, Japan.
1883-1958 © 2010 Japan Prosthodontic Society. Published by Elsevier Ireland. Open access under CC BY-NC-ND license. doi:10.1016/j.jpor.2010.09.002
This review of the literature will describe the 15 common mistakes that novice researchers often make when planning, conducting and writing up a clinical research project. These mistakes are usually made during the design phase, but might also be made during the data collection, analysis or manuscript
preparation phases. In addition, hints on how to improve a research project and publication are suggested.
1. Failure to carefully examine the literature for similar, prior research
All research begins with an idea or question. What young or novice researchers often fail to appreciate is that the questions they take an interest in are likely not to be new, but are actually questions that others have thought of and frequently have made attempts to investigate in the past. The way to avoid this mistake is to assume that the question of interest has already been studied and that the first job in the research design process is to exhaustively pursue, find and then catalog what has been published. Of course, the novice researcher may have a new variation of the question, or they may be using a new methodology or examining a new population of patients, but it should always be assumed that the core question in some form is likely to have been addressed previously. It now becomes the novice investigator’s job to find that information, and consider the positive and negative outcomes of the prior studies in the new research design development.
HINT 1: When selecting and refining the exact focus of a question it is critically important for the novice to read in detail the discussion section of similar articles, for in that portion of the paper, most researchers speculate on what needs to be accomplished next in that topical area to advance the science.
2. Failure to critically assess the prior literature
Once a wise novice researcher has systematically accumulated and categorized the literature concerning the question of interest, the next step is to carefully examine the research papers related to the question of interest to find out what prior researchers felt could have been improved. One strategy to achieve this is to put together a team or group of research colleagues and select the 10–15 most important articles on a topic for the team to review. Ask each member of the team to present a critical analysis of the literature assigned, presenting both the good and bad points. Developing an individual’s critical analysis skills will aid novices greatly in designing studies that minimize error. Not only is it necessary to critically analyze the literature before designing a new research project, but it is necessary to include these critical remarks in the introductory section of the resulting final manuscript in order to justify why the study was needed and what you as a researcher did better than previous researchers.
HINT 2: There is an old adage that says: “those who forget history are doomed to repeat it” and it is applicable to research as well. Investigators who repeat work previously done and do not recognize and build on prior efforts are likely to find their work unpublishable.
3. Failure to specify the inclusion and exclusion criteria for your subjects
A common omission from many research papers is the lack of research subject specifications, namely the inclusion and exclusion criteria. Listing these criteria helps other researchers understand why current results might differ from other published studies. For example, your patient population might be younger, or it might be from a different racial group or have a different ratio of males to females than were used in other research studies. In any case, it is necessary to specify as best you can the make-up of your subjects. This includes specific criteria for exclusion if you have any. Once you have the inclusion and exclusion criteria, be sure that you actually follow these criteria in selecting subjects for your study.
HINT 3: If a novice researcher is not sure how to develop a list of inclusion and exclusion criteria for a specific research question, look at prior research and use criteria that other researchers have specified.
4. Failure to determine and report the error of your measurement methods
Very few research reports actually provide more than a single sentence saying their examiners were calibrated. They rarely specify the method of training, the standards of performance and the frequency of re-assessment of their putatively calibrated examiners. All methods need replication and every researcher who is attempting the research project needs to be able to answer the question, “What is the error of your measurement method?” Some researchers refer to prior publications when answering this question but a good researcher knows the exact error of his/her own measurement methods and the inter-examiner variation. Finding this error value involves conducting a small test-retest experiment. If, however, a researcher is using multiple examiners to help collect data, these examiners need to be calibrated to a known standard before being given the go-ahead to begin making measurements. If the research project is a long-term project, i.e. lasting for many months or years, it is critical to have examiners who are calibrated and re-calibrated periodically to an accepted standard of performance. Often extensive, complex and difficult studies fail because of a lack of attention to this small detail.
A 2001 article examined the effects of measurement error on therapeutic equivalence trials and reported that measurement errors inappropriately favor the goal of showing treatment equivalence [1]. Essentially, this article reported on how imprecise data make it difficult to tell if there are any real differences between two methods or two treatments. Such imprecision is a disadvantage if your goal is to show that a new method of treatment is better than the old method; however, if you want to show that the new method or treatment is equivalent to or as good as the old treatment, then imprecise data benefit this goal of showing equivalence or nonsuperiority. Another study in 2008 examined the frequency and characteristics of data entry errors in large clinical databases [2]. These authors reported that error rates ranged from 2.3 to 26.9%, with the errors being not just mistakes in data entry but many non-random clusters that could potentially affect the study outcome.
HINT 4: A good researcher might even make the calibration process an independent research endeavor that could result in a publication of the process in a scientific journal.
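The small test-retest experiment recommended under mistake 4 could be summarized along the following lines. This is a minimal sketch, not a method prescribed by the review: it assumes each subject is measured twice by the same examiner, uses Dahlberg's formula (the square root of the sum of squared differences divided by twice the number of subjects) as one common estimate of measurement error, and derives a repeatability coefficient from it. The readings and function names are hypothetical.

```python
import math


def dahlberg_error(first, second):
    """Method error from paired repeat measurements: sqrt(sum(d^2) / (2n))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty series of equal length")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def repeatability_coefficient(first, second):
    """Difference that two repeat readings on the same subject should
    rarely exceed (about 95% of the time): 1.96 * sqrt(2) * method error."""
    return 1.96 * math.sqrt(2) * dahlberg_error(first, second)


# HYPOTHETICAL test-retest readings in millimetres for five subjects.
visit_1 = [42.0, 38.5, 45.0, 40.0, 36.5]
visit_2 = [43.0, 38.0, 44.0, 41.0, 36.5]

if __name__ == "__main__":
    print("method error (mm):", round(dahlberg_error(visit_1, visit_2), 3))
    print("repeatability (mm):",
          round(repeatability_coefficient(visit_1, visit_2), 3))
```

A real calibration exercise would use many more subjects and, with several examiners, would also need an inter-examiner agreement measure, which this sketch does not cover.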
5. Failure to specify the exact statistical assumptions made in the analysis
Since most studies will include statistical analysis of the data, specifying the level of significance (called the alpha level) that is acceptable and the exact statistical test methods used is commonplace. However, rarely do you see the authors stating what they used as their beta value, which indicates their chance of a type II error (usually beta is 0.2 or less). The complement of beta (1 minus beta) is then converted to a percentage and reported as the power of a study (usually 80%). Novice researchers often do not state the directionality of the testing that they perform, namely whether they are using a one-tailed or two-tailed analysis. In 2007, an excellent review of the literature was published which cataloged and described 47 specific statistical mistakes that are commonly made in the medical literature [3]. These authors strongly suggested involving a statistical consultant early in a study as a way to prevent some of these common mistakes.
HINT 5: Providing statistical test assumption details gives the reader/reviewer the sense that the authors are attentive to detail and honest in describing the research process and the lack of such detail implies the opposite.
6. Failure to perform sample size analysis before the study begins
Most clinical trials that claim two methods are equivalent (or non-superior) are underpowered, which means they have too few subjects. To avoid this mistake, prior to initiation of a research project, it is important to know how many subjects are needed to achieve the minimum power level desired. There are multiple online and commercial computer-based programs that will, with minimum information, provide the user with both the power and the estimated group sample size. To achieve sample size analysis it is necessary to understand the nature of the data that is to be collected, i.e. is the data linear or non-linear. It is also necessary to have a reasonable estimate of what effect the intervention will have, called the effect size. Finally, it is essential to understand the variability of the data collected. Without knowing the variability of the data, the effect size, and the power that is expected, it is impossible to estimate sample size, but with these data sample size estimation can easily be achieved. In a 2001 paper, the topic of equivalency testing and sample size in dental clinical trials was examined [4]. Specifically, these researchers examined studies that compared the efficacy of dentures supported by 2 implants versus dentures supported by 4 implants. Such a study design is called an equivalency study. If the 2 methods are found to be equivalent, then one would logically recommend the use of the simpler and less expensive method. The authors found that underpowering a study makes it easier to find equivalency.
HINT 6: For linear data, if the standard deviation is quite a bit larger (e.g. 2–3 times larger) than the difference between the 2 treatment groups, the sample size required to show significance goes up substantially.
7. Failure to implement adequate bias control measures
The single most important mistake that clinical researchers make is the failure to implement adequate bias control measures. Bias control is what distinguishes good from bad research. Measures to control for bias include: randomization of subjects to the intervention and control conditions; measurement and analysis of subjects with the investigators blind to subject status; and having a credible control condition and verifying, at the onset and along the way, that the subject is truly blind to the group to which they were assigned. This verification is called a blinding status check. Double-blinding of researchers and subjects is desirable in a clinical trial to decrease bias. When blinding is not used, or when the subject's group status is easily detected, subjects will generally try to fulfill the perceived expectations of the researcher. The issue of expectation fulfillment was first pointed out in a study at the Hawthorne Works, an electrical equipment plant near Chicago, Illinois [5]. The experimenters varied the intensity of the electric lighting in the plant to see whether there was a cause and effect relationship between work productivity and light intensity. Fortunately they varied the lighting in both directions, increasing and decreasing the intensity. What they discovered is that whenever an experiment was being conducted, work productivity increased; thus the phrase “the Hawthorne Effect” entered our scientific lexicon. This term means that subjects are likely to perform to the investigator’s expectations if they are not blind to their status. In 2001 a study examined the influence of study size on study outcome [6]. Specifically, a meta-analysis reviewed 190 randomized trials involving 8 different therapeutic interventions and divided the studies into those with more than 1000 participants and those with fewer than 1000 participants.
The results of this analysis were that the smaller studies showed more positive therapeutic effects than the larger studies. These researchers also reported that the larger studies were systematically less likely to report a positive effect, suggesting that bias arises more easily and has a greater impact in smaller studies. They also looked at other bias control measures such as randomization and blinding, and concluded that inadequate randomization and blinding lead to exaggerated estimates of the intervention’s benefit.
HINT 7: Patients are remarkably able to detect the group to which they have been assigned even when blinding measures have been implemented; therefore good studies always perform periodic blinding checks.
8. Failure to write and stick to a detailed time line
A detailed timeline or Gantt chart is an essential feature to include in the protocol of a clinical trial. These charts can be created using a Microsoft Office Excel spreadsheet, and every step of the trial should be noted in the timeline. The problem often seen with novice researchers is that they lack experience
G.T. Clark, R. Mulligan / Journal of Prosthodontic Research 55 (2011) 1–6
and cannot estimate realistically the time needed to achieve a specific task. Nevertheless, a timeline is a critical overall feature in clinical studies, and failure to create and follow the timeline is a common mistake in clinical research.
HINT 8: Good researchers make a timeline plan that includes critical benchmarks along the way, they post it on the wall for everyone to see and they stick to it!
9. Failure to vigorously recruit and retain subjects
Clinical research implies that human subjects will be involved in the study. Subjects must be identified and recruited, and a plan for this recruitment process needs to be developed and written down. A 2009 study compared 3 methods of subject recruitment and reported that direct telephone calls to the patient by the investigator were the most effective method [7]. Failure to have a specific recruitment plan and a method for retaining subjects in the study is a common mistake. Moreover, since subject recruitment is often a major issue in research studies, there should be more than one plan for subject recruitment.
HINT 9: Well designed research often fails because of poor subject recruitment and retention procedures, so make this a priority.
10. Failure to have a detailed, written and vetted protocol
Before you begin any research project, especially clinical research, a fully developed protocol is critical. Novice researchers often begin research without completing the protocol. Moreover, in addition to writing the protocol, the researcher needs to present it to a peer group, ideally one with moderate research experience, with the request that the group provide critical comments and suggestions for improvement. There is an old saying: “luck favors the well prepared”. In the field of research, being well prepared means that a well thought out, detailed written protocol is available and consulted frequently during the conduct of the clinical research project.
Once the second phase of the research project starts, the data analysis phase, it is critical that an appropriate statistical methodology be selected and implemented to effectively analyze the data. Typically an experienced clinical researcher will consult a statistician for advice both before beginning the research and after the data has been collected. In the research phase a statistician is critical in helping to conceptualize the analytical methodology that should be used. Ideally the consultation with the statistician needs to continue as the data is being collected and prior to final analysis of the data. In many ways, the statistician serves as an outside auditor attesting to the diligence and honesty of the research process and analysis. It is not uncommon that the data that was planned to be collected, changes for pragmatic and unexpected reasons. This means the analytical plan may need to be adjusted. Although statistical software programs have improved
immensely in the last 10 years, no software program can make up for inappropriate or inexact design of a research project, so consultation with an experienced statistician is almost always a necessity. In 2001, a review paper discussed the topic of optimal clinical research design for chronic pain drug efficacy studies [8]. The authors made a list of suggestions that researchers should consider when they design and conduct such studies, and in their conclusions they strongly suggested that a biostatistician consultant be used throughout all phases of the clinical trial.
HINT 10: The adage that is applicable here is: “the devil is in the details!” This saying refers to the fact that getting a general understanding and agreement that a project will be conducted is not enough. A researcher must also achieve a thorough understanding and agreement on the specifics of the project, which must be adequately documented or it can easily fail.
11. Failure to examine for normality of the data
In the analytic phase, it is important to examine the data that has been collected to see if it is normally distributed. Normality is a concept that applies to continuous linear data and is not applicable to categorical or non-linear dichotomous data. There are statistical programs that will take a data set and examine whether it meets the standards of normality. Data that is unevenly distributed about the mean can sometimes be transformed into more evenly distributed data by using a log or log–log transformation. The advantage of transforming the data is that it allows you to continue using parametric statistical methods, as opposed to non-parametric methods. In general, parametric statistical analysis is more sensitive (i.e. has more statistical power) and is preferred over non-parametric analysis when its assumptions are met.
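As a rough illustration of the log transformation described above, the sketch below computes the skewness of a small data set before and after taking logarithms. The measurements are hypothetical values invented for this example, and skewness is used here only as a simple indicator of asymmetry; it is not the formal normality test that a statistical package would run.

```python
import math
import statistics


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (illustrative only, not study data).
raw = [1.2, 1.5, 1.7, 2.0, 2.3, 2.9, 3.8, 5.5, 9.0, 21.0]

# Natural-log transform; only valid for strictly positive values.
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)

print(f"skewness before log transform: {skew_raw:.2f}")
print(f"skewness after log transform:  {skew_log:.2f}")
```

A skewness closer to zero after the transformation suggests a more symmetric distribution, but a histogram and a formal test chosen with a statistician remain the appropriate checks.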
HINT 11: A researcher should always look at the raw data obtained from the study displayed graphically, since this reveals areas where there are problems with the data. The goal is to see if a histogram of the data demonstrates a bell-shaped curve or some other figure.
12. Failure to report missing data, dropped subjects and use of an intention to treat analysis
Statistical consultants will most likely recommend analytical methods that are consistent with an intention to treat methodology. This methodology deals with dropouts. Often novice researchers exclude dropouts from the analysis, and this can alter the conclusions of the study. Regardless of the method of analysis used, it is critical to report all dropped data, missing data, and subject dropouts in a careful and honest fashion. How the project dealt with lost or dropped data must be included in the methods section of the research report. Clinical trials that involve complicated, difficult or prolonged protocols often suffer from subject dropout. Many researchers will implement inclusion and exclusion criteria that reasonably eliminate the non-compliant patient. For example exclusion criteria might
specify that: “subjects that did not complete the health history questionnaire will be excluded from this study” or “subjects that failed to appear for more than one follow-up visit will be excluded”. Sometimes researchers will see the potential clinical subjects more than once during the pre-enrollment phase to determine their eligibility. This pre-enrollment phase is frequently referred to as the run-in phase. A run-in phase in a clinical study is an advantage in that it is easier to identify subjects who are likely to be non-compliant with the protocol and would best be excluded before enrollment. Clearly such a strategy results in fewer dropouts, which is highly desirable. Unfortunately, run-in designs with many exclusions make the results less generalizable to the real world population of subjects. Such trade-offs are often made between practicality and idealism in design. In 1998 a small study was published describing the advantages and disadvantages of a run-in phase in a research protocol [9]. The authors concluded that run-in clinical trials overestimate the benefits and underestimate the risks of treatment.
HINT 12: If you have to choose between excluding subjects and having many drop-outs, always choose excluding.
13. Failure to perform and report power calculations
Novice researchers often fail to perform a power calculation on their study. Such a calculation is critical in studies of equivalency. Small studies with low power often find no significant differences between the treatment interventions; however, if the study was inadequately powered, a type II error is more likely. A type II error is the failure to reject a null hypothesis that is actually false (a false negative result). There are multiple software programs that allow researchers to determine the power of their results. In 2001 an article examined how often underpowered reports of equivalency occurred in the surgical literature [10].
Specifically these authors looked at randomized controlled trials, where the control treatment was an active intervention, usually the standard treatment of the day. In these studies a new treatment was compared to the standard treatment and considered to be equal to the standard treatment if the results were equivalent. These researchers looked at 90 randomized controlled trials in the surgical literature and found that 39% of these reports met the standards for equivalency. The other 61% of the reports were typically underpowered and thus subject to a type II error. In 2001 another paper, examined type II error rates in the orthopedic trauma literature [11]. Similar to the results published in the prior study, 90% of this literature was underpowered with the overall power calculated for the 117 papers reviewed being 25%. The standard acceptable power in a study is 80% and therefore the authors concluded that many type II errors were likely to continue to occur in the orthopedic literature thereby affecting critical future research. Type II errors occur because there are too few subjects, but they also occur because there are too many measurements made on too few subjects. If you measure two groups of subjects twice, it is likely that some of the measurements taken on the second occasion will be different. It is also possible to show that the differences are indeed statistically different, if no downward
adjustments are made to the level of significance to compensate for the fact that there were multiple measurements. One example of spurious associations is found in the field of genetic polymorphisms. In 2007 one researcher examined why so many statistically significant associations between diseases and genetic polymorphisms are not replicated in later studies [12]. Specifically, this paper looked at 10 single nucleotide polymorphisms (SNPs) of the COMT gene that have been associated with various specific diseases. The author concluded that false positive findings are commonplace and that initial associations between genetic SNPs and diseases must be interpreted with great caution, since they are frequently not replicated. In 2006, a group of researchers conducted a meta-analysis on the topic of false positive gene associations, specifically those associated with human lymphocyte disease [13]. These researchers suggested that a median sample size of over 3500 subjects was necessary to avoid false positive results. They went on to state that collaborative studies seem a logical approach for collecting data sets of this size, since individual researchers often do not have the resources to gather them alone. A 2010 paper suggested that a statistical standard be met before initial results are accepted [14]. This paper proposed a true report probability (TRP) score based on data from multiple studies. The authors argued that the TRP formula would be straightforward and appropriate and would help distinguish spurious results from true results.
HINT 13: Remember that “associations never prove causality.” This is certainly appropriate when trying to link genetic polymorphisms and disease, so replicate, replicate, and replicate.
14. Failure to point out the weaknesses of your own study
In the last phase of a clinical trial, the results are written in a manuscript form and submitted for review.
Many novice researchers fail to point out the weaknesses of their own study in the discussion section of their manuscript. This is often a reason for rejection of the manuscript.
HINT 14: In general, hiding your mistakes or obfuscating them in the hope that no one will notice is not a good policy. Keep in mind that “honesty is the best policy” holds here as well.
15. Failure to understand and use correct scientific language
Finally, all researchers, experienced and novice alike, must use the correct scientific language when describing their results. Specifically, a single study never proves that a hypothesis is true; it can only reject the null hypothesis. While most people are not comfortable using such cautionary language, this is the correct scientific language. This understanding begins with studying a good statistical textbook which focuses on clinical research design [15]. In fact, very few research manuscripts formally state the null hypothesis in the method section, and then
formally reject or fail to reject the null hypothesis in the discussion section, but when this is done it shows a true understanding of scientific research and the limitations of the scientific method.
HINT 15: If you want to be a good researcher, you must study and understand the nuances of the language associated with the scientific process, and only by doing this will you also understand the limitations of this process.
References
[1] Kim MY, Goldberg JD. The effects of outcome misclassification and measurement error on the design and analysis of therapeutic equivalence trials. Stat Med 2000;20:2065–78.
[2] Goldberg SI, Niemierko A, Turchin A. Analysis of data errors in clinical research databases. AMIA Annu Symp Proc 2008;6:242–6.
[3] Strasak AM, Zaman Q, Pfeiffer KP, Göbel G, Ulmer H. Statistical errors in medical research - a review of common pitfalls. Swiss Med Wkly 2007;137:44–9.
[4] Burns DR, Elswick Jr RK. Equivalence testing with dental clinical trials. J Dent Res 2001;80:1513–7.
[5] Gale EAM. The Hawthorne studies - a fable for our times? Q J Med 2004;97:439–49.
[6] Gluud LL, Thorlund K, Gluud C, Woods L, Harris R, Sterne JA. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982–9.
[7] Schroy 3rd PC, Glick JT, Robinson P, Lydotes MA, Heeren TC, Prout M, et al. A cost-effectiveness analysis of subject recruitment strategies in the HIPAA era: results from a colorectal cancer screening adherence trial. Clin Trials 2009;6:597–609.
[8] Harden RN, Bruehl S. Conducting clinical trials to establish drug efficacy in chronic pain. Am J Phys Med Rehabil 2001;80:547–57.
[9] Pablos-Méndez A, Barr RG, Shea S. Run-in periods in randomized trials: implications for the application of results in clinical practice. JAMA 1998;279:222–5.
[10] Dimick JB, Diener-West M, Lipsett PA. Negative results of randomized clinical trials published in the surgical literature: equivalency or error? Arch Surg 2001;136:796–800.
[11] Lochner HV, Bhandari M, Tornetta 3rd P. Type-II error rates (beta errors) of randomized trials in orthopaedic trauma. J Bone Joint Surg Am 2001;83:1650–5.
[12] Sullivan PF. Spurious genetic associations. Biol Psychiatry 2007;61:1121–6.
[13] Ioannidis JP, Trikalinos TA, Khoury MJ. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am J Epidemiol 2006;164:609–14.
[14] Weitkunat R, Kaelin E, Vuillaume G, Kallischnigg G. Effectiveness of strategies to increase the validity of findings from association studies: size vs. replication. BMC Med Res Methodol 2010;10:47.
[15] Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research, 3rd ed. Philadelphia: Lippincott, Williams and Wilkins; 2007. p. 51–63.
SEMINAR No. 7
SYSTEMATIC ERRORS: MEASUREMENT BIAS. Althubaiti A. Information bias in health research: definition, pitfalls, and adjustment methods. J Multidiscip Healthc. 2016; 9: 211-7.
Questions for the reading quiz and group discussion guide
1. A study seeks to establish the prevalence of drug use among university students. To that end, an interview will be administered that asks about past and current use of alcoholic beverages, tobacco, marijuana, cocaine, and ecstasy.
a. What type of information bias could arise in this situation?
b. What alternatives could be proposed to improve the quality of data collection for these variables, within the objective of this study? Justify your answer.
2. A case-control study investigated the association between congenital malformations (dependent or outcome variable) and the consumption of energy drinks during pregnancy (independent variable or factor). The investigators selected a group of postpartum women whose newborns had some type of congenital malformation (case group) and another group of postpartum women whose newborns had no malformation (control group). All participants were asked about their consumption of energy drinks during pregnancy (type of drink, frequency, amount, duration of consumption).
a. What type of information bias could arise in this situation?
b. Would this bias be differential or non-differential? Why?
c. What would be the consequence of the bias in this study for the association being assessed? Justify your answer.
3. To diagnose anemia, a person's blood hemoglobin (Hb) level must be determined. In population-based work, a common technique for this determination is the use of a portable hemoglobinometer. In one study, one of these devices was purchased and it was miscalibrated (its readings were higher than the true Hb level). The investigators did not notice this and carried out the field work. The study was completed and the prevalence of anemia in the study population was reported.
a. What type of information bias could arise in this situation?
b. What effect will the situation described have had on the prevalence estimate in the study?
c. How could situations like this one be anticipated and prevented?
Additional questions to be discussed in the group session
1. The reading discusses so-called confirmation bias. Does this bias remind you of any study we have discussed in previous seminar sessions? Which one? Justify your answer.
2. On page 214 the authors mention “pilot studies”. What is a “pilot study”? What are its purposes?
3. The introduction of the article mentions that bias can be introduced intentionally by researchers. What are the ethical implications of doing so?
Journal of Multidisciplinary Healthcare
Review
Information bias in health research: definition, pitfalls, and adjustment methods
Alaa Althubaiti Department of Basic Medical Sciences, College of Medicine, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Abstract: As with other fields, medical sciences are subject to different sources of bias. While understanding sources of bias is a key element for drawing valid conclusions, bias in health research continues to be a very sensitive issue that can affect the focus and outcome of investigations. Information bias, otherwise known as misclassification, is one of the most common sources of bias that affects the validity of health research. It originates from the approach that is utilized to obtain or confirm study measurements. This paper seeks to raise awareness of information bias in observational and experimental research study designs as well as to enrich discussions concerning bias problems. Specifying the types of bias can be essential to limit its effects, and the use of adjustment methods might serve to improve clinical evaluation and health care practice.
Keywords: self-report bias, social desirability bias, recall bias, misclassification, measurement error bias, confirmation bias
Introduction
Correspondence: Alaa Althubaiti Department of Basic Medical Sciences, College of Medicine, King Saud bin Abdulaziz University for Health Sciences, Mail Code: 3127, Riyadh 11481, PO Box 3660, Saudi Arabia Tel +966 1 1429 9999 Email thubaitia@ksau-hs.edu.sa
Bias can be defined as any systematic error in the design, conduct, or analysis of a study. In health studies, bias can arise from two different sources: the approach adopted for selecting subjects for a study, or the approach adopted for collecting or measuring data from a study. These are termed, respectively, selection bias and information bias [1]. Bias can have different effects on the validity of medical research findings. In epidemiological studies, bias can lead to inaccurate estimates of association, or to over- or underestimation of risk parameters. Identifying the sources of bias and their impact on the final results is a key element in drawing valid conclusions. Information bias, otherwise known as misclassification, is one of the most common sources of bias that affects the validity of health research. It originates from the approach that is utilized to obtain or confirm study measurements. These measurements can be obtained by experimentation (eg, bioassays) or observation (eg, questionnaires or surveys). Medical practitioners are conscious of the fact that the results of their investigation can be deemed invalid if they do not account for major sources of bias. While a number of studies have discussed different types of bias [2–4], the problem of bias is still frequently ignored in practice. Often bias is unintentionally introduced into a study by researchers, making it difficult to recognize, but it can also be introduced intentionally. Thus, bias remains a very sensitive issue to address and discuss openly. The aim of this paper is to raise the awareness of three specific forms of information bias in observational and experimental medical research study designs. These are self-reporting bias, and the
Journal of Multidisciplinary Healthcare 2016:9 211–217
© 2016 Althubaiti. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).
http://dx.doi.org/10.2147/JMDH.S104807
often-marginalized measurement error bias, and confirmation bias. We present clear and simple strategies to improve the decision-making process. As will be seen, specifying the type of bias can be essential for limiting its implications. The “Self-reporting bias” section discusses the problem of bias in self-reporting data and presents two examples of self-reporting bias, social desirability bias and recall bias. The “Measurement error bias” section describes the problem of measurement error bias, while the “Confirmation bias” section discusses the problem of confirmation bias.
Self-reporting bias
Self-reporting is a common approach for gathering data in epidemiologic and medical research. This method requires participants to respond to the researcher’s questions without his/her interference. Examples of self-reporting include questionnaires, surveys, or interviews. However, relative to other sources of information, such as medical records or laboratory measurements, self-reported data are often argued to be unreliable and threatened by self-reporting bias. The issue of self-reporting bias represents a key problem in the assessment of most observational (such as cross-sectional or comparative, eg, case–control or cohort) research study designs, although it can still affect experimental studies. Nevertheless, when self-reporting data are correctly utilized, they can help to provide a wider range of responses than many other data collection instruments [5]. For example, self-reporting data can be valuable in obtaining subjects’ perspectives, views, and opinions. There are a number of aspects of bias that accompany self-reported data and these should be taken into account during the early stages of the study, particularly when designing the self-reporting instrument. Bias can arise from social desirability, recall period, sampling approach, or selective recall. Here, two examples of self-reporting bias are discussed: social desirability and recall bias.
Social desirability bias
When researchers use a survey, questionnaire, or interview to collect data, the questions asked may in practice concern private or sensitive topics, such as dietary intake, drug use, income, and violence. Thus, self-reporting data can be affected by an external bias caused by social desirability or approval, especially in cases where anonymity and confidentiality cannot be guaranteed at the time of data collection. For instance, when determining drug usage among a sample of individuals, the results could underestimate the true usage. The bias in this case can be referred to as social desirability bias.
Overcoming social desirability bias
The main strategy to prevent social desirability bias is to validate the self-reporting instrument before implementing it for data collection [6–11]. Such validation can be either internal or external. In internal validation, the responses collected from the self-reporting instrument are compared with other data collection methods, such as laboratory measurements. For example, urine, blood, and hair analysis are some of the most commonly used validation approaches for drug testing [12–14]. However, when laboratory measurements are not available or it is not possible to analyze samples in a laboratory for reasons such as cost and time, external validation is often used. There are different methods, including medical record checks or reports from family or friends, to examine externally the validity of the self-reporting instrument [12,15]. Note that several factors must be accounted for in the design and planning of the validation studies, and in some cases, this can be very challenging. For example, the characteristics of the sample enrolled in the validation study should be carefully investigated. It is important to have a random selection of individuals so that results from the validation can be generalized to any group of participants. When the sampling approach is not random and subjective, the results from the validation study can only apply to the same group of individuals, and the differences between the results from validation studies and self-reporting instruments cannot be used to adjust for differences in any group of individuals [12,16]. Hence, when choosing a predesigned and validated self-reporting instrument, information on the group of participants enrolled in the validation process should be obtained. This information should be provided as part of the research paper and, if not, further communication with the authors of the work is needed in order to obtain it.
For example, if the target of the study is to examine drug use among the general population with no specific background, then a self-reporting instrument that has been validated on a sample of the population having general characteristics should be used. In addition, combining more than one validation technique or the use of multiple data sources may increase the validity of the results. Moreover, the possible effects of social desirability on study outcomes should be identified during the design phase of the data collection method. As such, measurement scales such as the Marlowe–Crowne Social Desirability Scale [17] or the Martin–Larsen Approval Motivation score [18] would be useful to identify and measure the social desirability aspect of the self-reported information.
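The comparison of self-reported responses against laboratory measurements can be sketched numerically. The example below is a minimal illustration under stated assumptions: the ten paired observations are hypothetical, the laboratory result is treated as the reference, and the function name `validation_summary` is invented for this sketch rather than taken from the article.

```python
def validation_summary(self_report, lab_result):
    """Compare paired self-reports with a laboratory reference (True = drug use)."""
    pairs = list(zip(self_report, lab_result))
    true_pos = sum(1 for s, lab in pairs if s and lab)
    false_neg = sum(1 for s, lab in pairs if not s and lab)
    true_neg = sum(1 for s, lab in pairs if not s and not lab)
    false_pos = sum(1 for s, lab in pairs if s and not lab)
    return {
        # Share of laboratory-confirmed users who also reported use.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Share of laboratory-negative participants who reported no use.
        "specificity": true_neg / (true_neg + false_pos),
        "self_report_prevalence": (true_pos + false_pos) / len(pairs),
        "lab_prevalence": (true_pos + false_neg) / len(pairs),
    }


# Hypothetical validation sample of 10 participants (not data from any study).
self_report = [True, False, False, True, False, False, False, True, False, False]
lab_result = [True, True, False, True, True, False, False, True, False, False]

summary = validation_summary(self_report, lab_result)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this invented sample the self-reported prevalence is lower than the laboratory prevalence, which is the pattern one would expect if some participants under-report use for reasons of social desirability.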
Journal of Multidisciplinary Healthcare 2016:9
Recall bias
Occasionally, study participants can erroneously provide responses that depend on their ability to recall past events. The bias in this case can be referred to as recall bias, as it is a result of recall error. This type of bias often occurs in case–control or retrospective cohort study designs, where participants are required to evaluate exposure variables retrospectively using a self-reporting method, such as self-administered questionnaires.19–21 While the problems posed by recall bias are no less than those caused by social desirability, recall bias is more common in epidemiologic and medical research. The effect of recall bias has been investigated extensively in the literature, with particular focus on survey methods for measuring dietary or food intake.22–25 If not given proper consideration, it can lead to either underestimation or overestimation of the true effect or association. For example, a recall error in a dietary survey may result in underestimates of the association between dietary intake and disease risk.24
Overcoming recall bias
To overcome recall bias, it is important to recognize cases where recall errors are more likely to occur. Recall bias was found to be related to a number of factors, including length of the recall period (ie, short or long times of clinical assessment), characteristics of the disease under investigation (eg, acute, chronic), patient/sample characteristics (eg, age, accessibility), and study design (eg, duration of study).26–30 For example, in a case–control study, cases are often more likely to recall exposure to risk factors than healthy controls. As such, true exposure might be underreported in healthy controls and overreported in the cases. The size of the difference between the observed rates of exposure to risk factors in cases and controls will consequently be inflated, and, in turn, the observed odds ratio would also increase. Many solutions have proven to be useful for minimizing and, in some cases, eliminating recall bias. For example, to select the appropriate recall period, all the above-mentioned factors should be considered in relation to recall bias. Previous literature showed that a short recall period is preferable to a long one, particularly when asking participants about routine or frequent events. In addition, the recall period can be stratified according to participant demographics and the frequency of events they experienced. For example, when participants are expected to have a number of events to recall, they can be asked to describe a shorter period than those who would have fewer events to recall. Other methods to facilitate participants’ recall include the use of memory
aids, diaries, and interviewing of participants prior to initiating the study.31 However, when it is not possible to eliminate recall errors, it is important to obtain information on the error characteristics and distribution. Such information can be obtained from previous or pilot studies and is useful when adjusting the subsequent analyses and choosing a suitable statistical approach for data analysis. It must be borne in mind that there are fundamental differences between statistical approaches to make adjustments that address different assumptions about the errors.22,32–36 When conducting a pilot study to examine error properties, a high level of accuracy and careful planning are needed, as validation largely depends on biological testing or laboratory measurements, which, besides being costly to conduct, are often subject to measurement errors. For example, in a validation study to estimate sodium intake using a 24-hour urinary excretion method, the estimated sodium intake tended to be lower than the true amount.25 Despite these potential shortcomings, the use of biological testing or laboratory measurements is one of the most credible approaches to validate self-reported data. More information on measurement errors is provided in the next section. It is important to point out that overcoming recall bias can be difficult in practice. In particular, bias often accompanies results from case–control studies. Hence, case–control studies can be conducted in order to generate a research hypothesis, but not to evaluate prognoses or treatment effects. Finally, more research is needed to assess the impact of recall bias. Studies to evaluate the agreements between responses from self-reporting instruments and gold-standard data sources should be conducted. Such studies can provide medical researchers with information concerning the validity of the self-reporting instrument before utilizing it in a study or for a disease under investigation. 
Other demographic factors associated with recall bias can also be identified. For instance, a high agreement was found between self-reported questionnaires and medical record diagnoses of diseases such as diabetes, hypertension, myocardial infarction, and stroke but not for heart failure.37
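The inflation of the odds ratio under differential recall, described in this section, can be illustrated with a small worked example. This is a sketch with invented numbers: 200 cases and 200 controls, the same true exposure in both groups (so the true odds ratio is 1), and assumed recall probabilities of 0.95 for cases and 0.75 for controls. Only under-reporting is modeled; the function names are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio from the four cells of a case-control 2x2 table."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

def reported_table(n, truly_exposed, recall_probability):
    """Expected (reported exposed, reported unexposed) counts when only a
    fraction of the truly exposed participants recall their exposure."""
    reported_exposed = truly_exposed * recall_probability
    return reported_exposed, n - reported_exposed

n_cases = n_controls = 200
truly_exposed = 80  # assumed identical in cases and controls: no real association

true_or = odds_ratio(80, 120, 80, 120)

# Assumption: cases recall 95% of true exposures, controls only 75%.
cases = reported_table(n_cases, truly_exposed, 0.95)
controls = reported_table(n_controls, truly_exposed, 0.75)
observed_or = odds_ratio(*cases, *controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Even though exposure is unrelated to disease in this constructed example, the observed odds ratio exceeds 1 purely because controls recall less of their exposure than cases do.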
Measurement error bias
Device inaccuracy, environmental conditions in the laboratory, or self-reported measurements are all sources of errors. If these errors occur, observed measurements will differ from the actual values, and this is often referred to as measurement error, instrumental error, measurement imprecision, or measurement bias. These errors are encountered in both observational (such as cohort studies) and experimental (such
as laboratory tests) study designs. For example, in an observational study of cardiovascular disease, measurements of blood cholesterol levels (as a risk factor) often included errors. An analysis that ignores the effect of measurement error on the results can be referred to as a naïve analysis.22 Results obtained from a naïve analysis can be biased and misleading. Such results can include inconsistent (or biased) and/or inefficient estimators of regression parameters, which may yield poor inferences about confidence intervals and the hypothesis testing of parameters.22,34 However, sampling variability should not be confused with measurement error variability. Commonly used statistical methods can address the sampling variability during data analysis, but they do not account for uncertainty due to measurement error. Measurement error bias has rarely been discussed or adjusted for in the medical research literature, except in the field of forensic medicine, where forensic toxicologists undoubtedly have the most theoretical understanding of measurement bias, as it is particularly relevant for their type of research.38 Known examples of measurement error bias have also been reported for blood alcohol content analyses.38,39
Systematic and random error
Errors could occur in a random or systematic manner. When errors are systematic, the observed measurements deviate from true values in a consistent manner, that is, they are either consistently higher or lower than the true values. For example, a device could be calibrated improperly and subtract a certain amount from each measurement. By not accounting for this deviation in the measurement, the results will contain systematic errors and in this case, true measurements would be underestimated. For random errors, the deviation of the observed from true values is not consistent, causing errors to occur in an unpredictable manner. Such errors will follow a distribution, in the simplest case a Gaussian (also called normal or bell-shaped) distribution, and will have a mean and standard deviation. When the mean of the errors is zero, the measured value should be reported together with an interval around it that reflects the estimated amount of deviation from the actual value. When the target value is reported to fall within a range or interval of minimum and maximum levels, the size of the interval depends mainly on the size of measurement errors, that is, the larger the errors, the larger the uncertainty and hence the wider the intervals, which could affect the precision level. Random errors could also be proportional to the measured amount. In this case, errors can be referred to as multiplicative or non-Gaussian errors.36 These random errors occur due
to uncontrollable and possibly unknown experimental factors, such as laboratory environment conditions that affect concentrations in biological experiments. Examples of non-Gaussian errors can be found in breath alcohol measurements, in which the variability around the measurement increases with increasing alcohol concentrations.40–42
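The three error types distinguished in this section can be contrasted in a short simulation. This is a minimal sketch under assumptions that are not taken from the article: the true values, the fixed offset of 5 units, the error standard deviation of 5, and the 5% proportional error are all invented for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical "true" values of some laboratory quantity.
true_values = [rng.uniform(50.0, 150.0) for _ in range(10_000)]

# Systematic error: a miscalibrated device subtracts a fixed amount.
systematic = [x - 5.0 for x in true_values]

# Additive random (Gaussian) error: mean 0, standard deviation 5.
additive = [x + rng.gauss(0.0, 5.0) for x in true_values]

# Multiplicative random error: spread proportional to the true value.
multiplicative = [x * (1.0 + rng.gauss(0.0, 0.05)) for x in true_values]

def errors(observed):
    """Observed minus true value for every measurement."""
    return [o - t for o, t in zip(observed, true_values)]

systematic_bias = statistics.fmean(errors(systematic))  # stays at the offset
additive_bias = statistics.fmean(errors(additive))      # averages out near 0
additive_sd = statistics.stdev(errors(additive))

# For multiplicative errors the spread grows with the measured amount.
mult_errors = errors(multiplicative)
sd_low = statistics.stdev([e for e, t in zip(mult_errors, true_values) if t < 100.0])
sd_high = statistics.stdev([e for e, t in zip(mult_errors, true_values) if t >= 100.0])

print(systematic_bias, additive_bias, additive_sd, sd_low, sd_high)
```

The systematic error shifts every measurement by the same amount and does not average out, the additive random error averages out but widens the interval around each value, and the multiplicative error shows more spread at higher true values.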
Adjusting for measurement error bias
The type and distribution of measurement errors determine the type of adjusting method.34 When errors are systematic, calibration methods can be used to reduce their effects on the results. These methods are based on a reference measurement that can be obtained from a previous or pilot study, and used as the correct quantity to calibrate the study measurements. As such, simple mathematical tools can be used if the errors are estimated. The adjustment methods for systematic errors are simpler to apply than those for random errors. Significant efforts have been made to develop sophisticated statistical approaches that adjust for the effect of random measurement errors.34 Commonly available and popular statistical software packages, such as R (http://www.r-project.org) and Stata (Stata Corporation, College Station, TX, USA), include features that allow adjustments to be made for random measurement errors. Some of the bias adjustment methods include simulation–extrapolation, regression calibration, and the instrumental variable approach.34 In order to select the best adjustment approach, knowledge of the error properties is essential. For example, the size of the standard deviation and the shape of the error distribution should be identified through a previous or pilot study. Therefore, evaluation of the measuring technique is recommended to identify the error properties before starting the actual measuring procedure. Error properties should also be identified for survey measurement errors, for which methods for examining the reliability and validity of the survey, such as test–retest and record checks, can be used.
A simpler approach used by practitioners to minimize errors in epidemiologic studies is replication; in this method, replicates of the risk factor (eg, long-term average nutrients) are available and the mean of these values is calculated and used to present an approximate value relative to the actual value.43 These replicates can also be used to estimate the measurement error variance and apply an adjusted statistical approach.
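The replication idea above can be sketched as follows. This is an illustration only, under invented assumptions: a linear relation with true slope 0.5, a risk factor with standard deviation 2, two replicate measurements per subject, and additive Gaussian error with standard deviation 2. The correction shown is a simple method-of-moments attenuation correction, closely related to regression calibration in the linear case; it is not presented as the procedure of any specific software package.

```python
import random
import statistics

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed for reproducibility
n = 20_000
true_slope = 0.5

# Unobserved true risk factor and the outcome it drives.
x = [rng.gauss(10.0, 2.0) for _ in range(n)]
y = [1.0 + true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]

# Two replicate measurements of the risk factor, each with random error.
w1 = [xi + rng.gauss(0.0, 2.0) for xi in x]
w2 = [xi + rng.gauss(0.0, 2.0) for xi in x]

# Naive analysis: regress the outcome on a single error-prone measurement.
naive_single = slope(w1, y)

# Replication: use the mean of the replicates as the approximate value.
w_mean = [(a + b) / 2 for a, b in zip(w1, w2)]
naive_mean = slope(w_mean, y)

# Error variance from replicate differences: Var(W1 - W2) = 2 * error variance.
error_var = statistics.variance([a - b for a, b in zip(w1, w2)]) / 2

# Share of the variance of the replicate mean that is true signal.
var_mean = statistics.variance(w_mean)
reliability = (var_mean - error_var / 2) / var_mean

corrected = naive_mean / reliability
print(naive_single, naive_mean, corrected)
```

In this simulation the naive slope is pulled toward zero, averaging the replicates reduces the attenuation, and dividing by the estimated reliability brings the estimate back close to the slope used to generate the data.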
Confirmation bias
Placing emphasis on one hypothesis because it does not contradict investigator beliefs is called confirmation bias, otherwise known as confirmatory, ascertainment, or observer
bias. Confirmation bias is a type of psychological bias in which a decision is made according to the subject’s preconceptions, beliefs, or preferences. Such bias results from human errors, including imprecision and misconception. Confirmation bias can also emerge owing to overconfidence, which results in contradictory evidence being ignored or overlooked.44 In medicine, confirmation bias is one of the main reasons for diagnostic errors and may cause inaccurate diagnosis and improper treatment management.45–47 An understanding of how the results of a medical investigation are affected by confirmation bias is important. Many studies have demonstrated that any aspect of investigation that requires human judgment is subject to confirmation bias,48–50 which was also found to influence the inclusion and exclusion criteria of randomized controlled trial study designs.51 There are many examples of confirmation bias in the medical literature, some of which are even illustrated in DNA matching.16
Overcoming confirmation bias
Researchers have shown that not accounting for confirmation bias could affect the reliability of the investigation. Several studies in the literature also suggest a number of approaches for dealing with this type of bias. An approach that is often used is to conduct multiple and independent checks on study subjects across different laboratories or through consultation with other researchers who may have differing opinions. Through this approach, scientists can seek independent feedback and confirmation.52 The use of blinding or masking procedures, whether single- or double-blinded, is important for enhancing the reliability of scientific investigations. These approaches have proven to be very useful in clinical trials, as they protect final conclusions from confirmation
bias. Blinding may involve the participant, treating clinician, recruiter, and/or assessor. In addition, researchers should be encouraged to evaluate evidence objectively, taking into account contradictory evidence, and alter perspectives through specific education and training programs,53,54 with no overcorrection or change in the researcher’s decision making.55 However, the problem with the above suggestions is that they become ineffective if specific factors of bias are not accounted for. For example, researchers could reach conclusions in haste due to external pressure to obtain results, which can be particularly true in highly sensitive clinical trials. Bias in such cases is a very sensitive issue, as it might affect the validity of the investigation. We can, however, avoid the possibility of such bias by developing and following well-designed study protocols. Finally, in order to overcome confirmation bias and enhance the reliability of investigations, it is important to accept that bias is a part of investigations. Quantifying this inevitable bias and its potential sources must be part of well-developed conclusions.
Conclusion
Bias in epidemiologic and medical research is a major problem. Understanding the possible types of bias and how they affect research conclusions is important to ensure the validity of findings. This work discussed some of the most common types of information bias, namely self-reporting bias, measurement error bias, and confirmation bias. Approaches for overcoming bias through the use of adjustment methods were also presented. A summary of study types with common data collection methods, type of information bias, and adjusting or preventing strategies is presented in Table 1. The framework described in this work provides epidemiologists and medical researchers with useful tools to manage information bias in their scientific investigations. The consequences of ignoring this bias on the validity of the results were also described. Bias is often not accounted for in practice. Even though a number of adjustment and prevention methods to mitigate bias are available, applying them can be rather challenging due to limited time and resources. For example, measurement error bias properties might be difficult to detect, particularly if there is a lack of information about the measuring instrument. Such information can be tedious to obtain as it requires the use of validation studies and, as mentioned before, these studies can be expensive and require careful planning and management. Although conducting the usual analysis and ignoring measurement error bias may be tempting, researchers should always follow the practice of reporting any evidence of bias in their results. In order to minimize or eliminate bias, careful planning is needed in each step of the research design. For example, several rules and procedures should be followed when designing self-reporting instruments. Training of interviewers is important in minimizing this type of bias. On the other hand, the effect of measurement error can be difficult to eliminate since measuring devices and algorithms are often imperfect. A general rule is to review the level of accuracy of the measuring instrument before utilizing it for data collection. Such adjustments should greatly reduce any possible defects. Finally, confirmation bias can be eliminated from the results if investigators take into account different factors that can affect human judgment. Researchers should be familiar with sources of bias in their results, and additional effort is needed to minimize the possibility and effects of bias.

Table 1 Type of study designs, common data collection methods, type of bias, and adjusting strategies
Study design | Data collection method | Type of bias | Overcoming strategy
Observational | Self-administered questionnaire, surveys, or interviews | Social desirability | Conduct internal or external validation study; apply Marlowe–Crowne Social Desirability Scale or Martin–Larsen Approval Motivation score
Observational | Self-administered questionnaire, surveys, or interviews | Recall | Use memory aids or diaries; interview a subsample of participants prior to initiating the study (validated subsample)
Observational/experimental | Laboratory tests | Systematic errors | Conduct calibration study
Observational/experimental | Laboratory tests | Random errors | Apply statistical adjusting method (eg, simulation–extrapolation, regression calibration, Bayesian approaches); replicate measurements
Observational/experimental | Clinical examination/diagnostic tests | Confirmation | Make multiple and independent checks; introduce training and education programs
Increasing the awareness of the possible shortcomings and pitfalls of decision making that can result in bias should begin at the medical undergraduate level and students should be provided with examples to demonstrate how bias can occur. Moreover, adjusting for bias or any deficiency in the analysis is necessary when bias cannot be avoided. Finally, when presenting the results of a medical research study, it is important to recognize and acknowledge any possible source of bias.
Disclosure
The author reports no conflicts of interest in this work.
References
1. Hennekens CH, Buring JE. Epidemiology in Medicine. Boston: Little, Brown, and Company; 1987. 2. Gerhard T. Bias: considerations for research practice. Am J Health Syst Pharm. 2008;65(22):2159–2168.
3. Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg. 2010;126(2):619–625. 4. Choi BCK, Pak AWP. Bias, Overview. Chichester, UK: John Wiley and Sons, Ltd; 2005. 5. Zhu K, McKnight B, Stergachis A, Daling JR, Levine RS. Comparison of self-report data and medical records data: results from a case-control study on prostate cancer. Int J Epidemiol. 1999;28(3):409–417. 6. Magura S, Kang SY. Validity of self-reported drug use in high risk populations: a meta-analytical review. Subst Use Misuse. 1996;31(9):1131–1153. 7. Harrison L. The validity of self-reported drug use in survey research: an overview and critique of research methods. In: Harrison L, Hughes A, editors. The Validity of Self-Reported Drug Use: Improving the Accuracy of Survey Estimates. NIDA Research Monograph no 167. Rockville, MD; 1997:17–36. 8. Darke S. Self-report among injecting drug users: a review. Drug Alcohol Depend. 1998;51(3):253–263. 9. Brener ND, Billy JOG, Grady WR. Assessment of factors affecting the validity of self-reported health-risk behavior among adolescents: evidence from the scientific literature. J Adolesc Health. 2003;33(6):436–457. 10. Mills JF, Loza W, Kroner DG. Predictive validity despite social desirability: evidence for the robustness of self-report among offenders. Crim Behav Ment Health. 2006;13(2):140–150. 11. van de Mortel TF. Faking it: social desirability response bias in self-report research. Aust J Adv Nurs. 2008;25(4):40–48. 12. Harrison LD, Hughes A, National Institute on Drug Abuse, National Institutes of Health (U.S.). The Validity of Self-Reported Drug Use: Improving the Accuracy of Survey Estimates. Rockville, MD: U.S. Department of Health and Human Service, National Institutes of Health, National Institute on Drug Abuse, Division of Epidemiology and Prevention Research; 1997. 13. Schütz H, Gotta JC, Erdmann F, Risse M, Weiler G. Simultaneous screening and detection of drugs in small blood samples and bloodstains.
Forensic Sci Int. 2002;126(3):191–196. 14. Ledgerwood DM, Goldberger BA, Risk NK, Lewis CE, Kato Price R. Comparison between self-report and hair analysis of illicit drug use in a community sample of middle-aged men. Addict Behav. 2008;33(9):1131–1139. 15. Stephens R. The truthfulness of addict respondents in research projects. Int J Addict. 1972;7(3):549–558. 16. Kassin SM, Dror IE, Kukucka J. The forensic confirmation bias: Problems, perspectives, and proposed solutions. J Appl Res Mem Cogn. 2013;2(1):42–52. 17. Crowne DP, Marlowe D. A new scale of social desirability independent of psychopathology. J Consult Psychol. 1960;24:349–354. 18. Paulhus DL. Measurement and control of response bias. In: Robinson JP, Shaver PR, Wrightsman LS, editors. Measures of Personality and Social Psychological Attitudes. San Diego, CA: Academic Press; 1991. 19. Holmberg L, Ohlander EM, Byers T, et al. A search for recall bias in a case-control study of diet and breast cancer. Int J Epidemiol. 1996;25(2):235–244. 20. Neugebauer R, Ng S. Differential recall as a source of bias in epidemiologic research. J Clin Epidemiol. 1990;43(12):1337–1341. 21. Kip KE, Cohen F, Cole SR, et al; Herpetic Eye Disease Study Group. Recall bias in a prospective cohort study of acute time-varying exposures: example from the herpetic eye disease study. J Clin Epidemiol. 2001;54(5):482–487. 22. Fuller WA. Measurement Error Models. New York: John Wiley and Sons, Inc; 1987. 23. Nusser SM, Fuller WA, Guenther PM. Estimating usual dietary intake distributions: adjusting for measurement error and nonnormality in 24-hour food intake data. In: Lyberg L, Biemer P, Collins M, De Leeuw E, Dippo C, Schwarz N, Trewin D, editors. Survey Measurement and Process Quality. Hoboken, NJ: John Wiley and Sons, Inc; 1997. 24. Paeratakul S, Popkin BM, Kohlmeier L, Hertz-Picciotto I, Guo X, Edwards LJ. Measurement error in dietary data: implications for the epidemiologic study of the diet-disease relationship. Eur J Clin Nutr. 
1998;52(10):722–727.
25. Ribi CH, Zakotnik JM, Vertnik L, Vegnuti M, Cappuccio FP. Salt intake of the Slovene population assessed by 24 h urinary sodium excretion. Public Health Nutr. 2010;13(11):1803–1809. 26. Bryant HE, Visser N, Love EJ. Records, recall loss, and recall bias in pregnancy: a comparison of interview and medical records data of pregnant and postnatal women. Am J Public Health. 1989;79(1):78–80. 27. Feldman Y, Koren G, Mattice D, Shear H, Pellegrini E, MacLeod SM. Determinants of recall and recall bias in studying drug and chemical exposure in pregnancy. Teratology. 1989;40(1):37–45. 28. Coughlin SS. Recall bias in epidemiologic studies. J Clin Epidemiol. 1990;43(1):87–91. 29. Weinstock MA, Colditz GA, Willett WC, Stampfer MJ, Rosner B, Speizer FE. Recall (report) bias and reliability in the retrospective assessment of melanoma risk. Am J Epidemiol. 1991;133(3):240–245. 30. Paganini-Hill A, Chao A. Accuracy of recall of hip fracture, heart attack, and cancer: a comparison of postal survey data and medical records. Am J Epidemiol. 1993;138(2):101–106. 31. Biemer PP, Groves RM, Lyberg LE, Mathiowetz NA, Sudman S. Measurement Errors in Surveys. Hoboken, NJ: John Wiley and Sons, Inc; 1991. 32. Carroll RJ, Freedman LS, Kipnis V. Measurement error and dietary intake. Adv Exp Med Biol. 1998;445:139–145. 33. Thomson CA, Giuliano A, Rock CL, et al. Measuring dietary change in a diet intervention trial: comparing food frequency questionnaire and dietary recalls. Am J Epidemiol. 2003;157(8):754–762. 34. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models. 2nd ed. New York: Chapman and Hall; 2006. 35. Althubaiti A, Donev AN. Mixture experiments with mixing errors. J Stat Plan Infer. 2011;141(2):692–700. 36. Althubaiti A, Donev A. Non-Gaussian Berkson errors in bioassay. Stat Methods Med Res. 2016;25(1):430–445. 37. Okura Y, Urban LH, Mahoney DW, Jacobsen SJ, Rodeheffer RJ.
Agreement between self-report questionnaires and medical record data was substantial for diabetes, hypertension, myocardial infarction and stroke but not for heart failure. J Clin Epidemiol. 2004;57(10): 1096–1103. 38. Gullberg RG. Estimating the measurement uncertainty in forensic blood alcohol analysis. J Anal Toxicol. 2012;36(3):153–161. 39. Moroni R, Blomstedt P, Wilhelm L, Reinikainen T, Sippola E, Corander J. Statistical modelling of measurement errors in gas chromatographic analyses of blood alcohol content. Forensic Sci Int. 2010;202(1–3):71–74.
40. Dror IE, Charlton D. Why experts make errors. J Forensic Ident. 2006;56(4):600–616. 41. Gullberg RG. Estimating the measurement uncertainty in forensic breath-alcohol analysis. Accred Qual Assur. 2006;11(11):562–568. 42. Dror I, Rosenthal R. Meta-analytically quantifying the reliability and biasability of forensic experts. J Forensic Sci. 2008;53(4):900–903. 43. Carroll RJ. Measurement error in epidemiologic studies. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. New York: John Wiley and Sons; 1998:2491–2519. 44. Nickerson RS. Confirmation bias: A ubiquitous phenomenon in many guises. Rev Gen Psychol. 1998;2(2):175–220. 45. Berner ES, Graber ML. Overconfidence as a cause of diagnostic error in medicine. Am J Med. 2008;121(5 Suppl):S2–S23. 46. Croskerry P. The importance of cognitive errors in diagnosis and strategies to minimize them. Acad Med. 2003;78(8):775–780. 47. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. JAMA. 1995;274(8):645–651. 48. Hill C, Memon A, McGeorge P. The role of confirmation bias in suspect interviews: A systematic evaluation. Legal Criminal Psych. 2010;13(2):357–371. 49. Butt L. The forensic confirmation bias: problems, perspectives, and proposed solutions – Commentary by a forensic examiner. J Appl Res Mem Cogn. 2013;2(1):59–60. 50. Nakhaeizadeh S, Dror IE, Morgan RM. Cognitive bias in forensic anthropology: Visual assessment of skeletal remains is susceptible to confirmation bias. Sci Justice. 2014;54(3):208–214. 51. Goodyear-Smith FA, van Driel ML, Arroll B, Del Mar C. Analysis of decisions made in meta-analyses of depression screening and the risk of confirmation bias: a case study. BMC Med Res Methodol. 2012;12:76. 52. Budowle B, Bottrell MC, Bunch SG. A perspective on errors, bias, and interpretation in the forensic sciences and direction for continuing advancement. J Forensic Sci. 2009;54(4):798–809. 53. Rassin E.
Individual differences in the susceptibility to confirmation bias. Neth J Psychol. 2008;64(2):87–93. 54. Powell MB, Hughes-Scholes CH, Sharman SJ. Skill in interviewing reduces confirmation bias. J Investig Psych Offender Profil. 2012;9(2):126–134. 55. Dror IE, Busemeyer JR, Basola B. Decision making under time pressure: an independent test of sequential sampling models. Mem Cognit. 1999;27(4):713–725.
SEMINAR No. 8
CASE STUDY: INAPPROPRIATE MEASUREMENT OF VARIABLES. Lajunen HR, Keski-Rahkonen A, Pulkkinen L, Rose RJ, Rissanen A, Kaprio J. Are computer and cell phone use associated with body mass index and overweight? A population study among twin adolescents. BMC Public Health. 2007; 7: 24.
Questions for the reading quiz and group discussion guide
1. The beginning of the fourth paragraph of the introduction states:
Given the paucity of data on the effects of information and communication technology on adolescent health, we studied the relation of the use of these technologies with weight status.
With regard to the terms used in the wording, analyze the interpretation that could be given to this passage.
2. The authors propose two hypotheses. Identify the dependent and independent variable(s) in these hypotheses. Justify your answer.
3. What is the body mass index? What is the recommended procedure for measuring weight and height in order to calculate it?
4. How would you rate the validity of the procedure the authors used to obtain the body mass index? What implications does this have for the validity of the study?
5. On page 7 of the article, when discussing the weight and height measurements, the authors state:
However, our main goal was not to estimate prevalence of overweight but to study the associations between information and technology use and BMI/overweight. Only if self-report bias of weight and height differs by computer and cell phone use, would the associations observed in our study be biased.
Do you agree with the authors' assessment? Why?
6. What is your final appraisal of this study? Support your answer.
Additional questions to be discussed in the group session
1. The methods section indicates that, for the purposes of the analysis, obese adolescents were included in the overweight group because of the small number of participants with obesity. What favorable and unfavorable aspects would this decision have?
2. One of the variables considered as a potential confounder was physical activity or sport during leisure time; the final value of this variable was the average number of days per week on which the adolescent exercised or engaged in some physical activity. Analyze the way in which information on this variable was obtained. Could you be sure that this measurement was reliable?
SEMINAR No. 9
THE IMPORTANCE OF THE OPERATIONAL DEFINITION OF VARIABLES. Erkinjuntti T, Ostbye T, Steenhuis R, Hachinski V. The effect of different diagnostic criteria on the prevalence of dementia. N Engl J Med. 1997 Dec 4; 337(23): 1667-74.
Questions for the reading quiz and group discussion guide
1. Look up the definition of dementia in a medical dictionary and, after reading the article, indicate from an operational standpoint whether the measurements of dementia applied by the authors were:
a. Direct
b. Indirect unidimensional
c. Indirect multidimensional
Support your answer.
2. Draw a scheme (flowchart) of the steps followed by the authors in carrying out their research. Highlight, in this diagram, the phase that corresponds to the measurement of the variables.
3. The methods section (page 1668) states:
Dementia was rated as mild, moderate, or severe according to the guidelines of the DSM-III-R.4 The cause of dementia was classified as possible or probable Alzheimer’s disease, according to the criteria of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association14; possible or probable vascular dementia; or other causes, according to the ICD-10 criteria.3
a. Regarding the degree of dementia, which measurement scale was used?
b. Regarding the type of dementia, which measurement scale was used?
4. Review Table 1 and determine whether all the domains are necessarily part of all the diagnostic criteria for dementia studied. Could this influence the differences shown in Figure 1 and the differences in the prevalence estimates? Support your answer.
5. In the results section (page 1669), the authors mention:
The proportion of women among those who had dementia was similar to the proportion among those without dementia.
In Table 2, look at the frequencies of female sex among patients with and without dementia according to the CAMDEX criteria. Is the statement in the text consistent with the data in the table? Support your answer.
6. Table 3 presents prevalence estimates by age group for the different dementia classification criteria.
a. Is there another, more communicative way to present these data?
b. Which of Hill's criteria can be seen here?
7. Toward the end of the discussion section, the authors mention the legal, social, and economic implications of having such discordant criteria for assessing a health problem. Describe another health problem in which these impacts could also arise.
Special Article
THE EFFECT OF DIFFERENT DIAGNOSTIC CRITERIA ON THE PREVALENCE OF DEMENTIA
TIMO ERKINJUNTTI, M.D., PH.D., TRULS ØSTBYE, M.D., M.P.H., RUNA STEENHUIS, PH.D., C.PSYCH., AND VLADIMIR HACHINSKI, M.D., D.SC.(MED.)
ABSTRACT
Background There are several widely used sets of criteria for the diagnosis of dementia, but little is known about their degree of agreement and their effects on estimates of the prevalence of dementia.
Methods We examined 1879 men and women 65 years of age or older who were enrolled in the Canadian Study of Health and Aging and calculated the proportion given a diagnosis of dementia according to six commonly used classification systems: the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders (DSM), third edition (DSM-III), the third edition, revised, of the DSM (DSM-III-R), the fourth edition of the DSM (DSM-IV), the World Health Organization’s International Classification of Diseases (ICD), 9th revision (ICD-9) and 10th revision (ICD-10), and the Cambridge Examination for Mental Disorders of the Elderly (CAMDEX). The degree of concordance among classification schemes and the importance of various factors in determining diagnostic agreement or disagreement were examined.
Results The proportion of subjects with dementia varied from 3.1 percent when we used the criteria of the ICD-10 to 29.1 percent when the DSM-III criteria were used. The six classification systems identified different groups of subjects as having dementia; only 20 subjects were given a diagnosis of dementia according to all six systems. The classifications based on the various systems differed little according to the patients’ age, sex, educational level, or status with respect to institutionalization. The factors that most often caused disagreement in diagnosis between DSM-III and ICD-10 were long-term memory, executive function, social activities, and duration of symptoms.
Conclusions The commonly used criteria for diagnosis can differ by a factor of 10 in the number of subjects classified as having dementia. Such disagreement has serious implications for research and treatment, as well as for the right of many older persons to drive, make a will, and handle financial affairs. (N Engl J Med 1997;337:1667-74.)
©1997, Massachusetts Medical Society.
From the Memory Research Unit, Department of Neurology, University of Helsinki, Helsinki, Finland (T.E.); and the Departments of Epidemiology and Biostatistics (T.Ø., V.H.) and Clinical Neurological Sciences (R.S., V.H.), University of Western Ontario, and Psychological Services, University Campus, London Health Sciences Centre (R.S.) — both in London, Ont., Canada. Address reprint requests to Dr. Østbye at the Department of Epidemiology and Biostatistics, University of Western Ontario, London, ON N6A 5C1, Canada.
The New England Journal of Medicine, Volume 337, Number 23. Downloaded from nejm.org on January 3, 2017. For personal use only. No other uses without permission. Copyright © 1997 Massachusetts Medical Society. All rights reserved.
DEMENTIA is a growing medical, social, and economic problem.1 Most diagnostic classification systems consider dementia to be a single category of symptoms with many causes. The systems based on the various editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM) are commonly used in the United States and Canada, those based on the International Classification of Diseases (ICD) in continental Europe, and the Cambridge Examination for Mental Disorders of the Elderly (CAMDEX) in the United Kingdom. These diagnostic criteria include different combinations of impairment in cognitive, emotional, and social abilities and reflect an emphasis on different clinical features or on a particular cause.
The consequences of the existence of a variety of diagnostic classification schemes are poorly understood. The use of different criteria may lead to different diagnostic conclusions. In a survey of 1045 persons who were 70 years of age or older in which case identification was based on an interview, Henderson et al.2 reported that 3.2 percent were given a diagnosis of dementia when the criteria of the 10th revision of the ICD (ICD-10)3 were used, and 7.3 percent when the criteria of the third edition, revised, of the DSM (DSM-III-R)4 were used. In another study of 402 subjects 85 years of age or older, a structured clinical interview and the Mini–Mental State Examination led to a diagnosis of dementia in 28.0 percent according to the DSM-III-R criteria and in 16.0 percent according to the ICD-10 criteria.5 These differences may reflect the fact that the subjects in the former study2 were younger, only 10 percent were institutionalized, and the study was based on lay interviews, whereas in the latter study5
the subjects were older, 28 percent were institutionalized, and a more detailed clinical evaluation was done. However, both studies suggest that the use of two sets of criteria can produce quite different estimates of the prevalence of dementia.
We examined the effects of six commonly used classification schemes — those of the third edition of the DSM (DSM-III),6 the DSM-III-R,4 the fourth edition of the DSM (DSM-IV),7 the ninth edition of the ICD (ICD-9),8 the ICD-10,3 and CAMDEX9 — on the prevalence of dementia in a large and thoroughly examined population-based cohort of elderly people.
METHODS
The data we analyzed were collected as part of the Canadian Study of Health and Aging (CSHA), a national, multicenter epidemiologic study of dementia.10 The CSHA surveyed randomly selected samples of people 65 years of age or older throughout Canada. Of the 10,263 people surveyed, 9008 were living in the community and 1255 in long-term care institutions.
An extensive neurologic and neuropsychological examination was performed on all subjects in the institutions, those in the community who had cognitive impairment on screening with the Modified Mini–Mental State Examination,11 and a subsample of those in the community who did not have cognitive impairment on screening. The neuropsychological examination assessed the cognitive domains included among the various criteria for dementia: memory, abstract thinking, judgment, presence or absence of aphasia, presence or absence of apraxia, presence or absence of agnosia, and constructional abilities (particular tests used to assess each domain are described by Tuokko et al.12 and Steenhuis and Østbye13). On the basis of the neuropsychological tests, background information, and established normative information, the study neuropsychologists determined whether participants had impairment within each of the cognitive domains and made a preliminary diagnosis.
The assessments by the physicians included a mental-status evaluation (including the Modified Mini–Mental State Examination), as well as physical and neurologic examinations. A knowledgeable informant (a family member, friend, or formal care giver) supplied historical information through an interview using the questions in the CAMDEX.9 The physician made an independent preliminary diagnosis. The final diagnosis of normality or abnormality in each domain of function was based on the results of the clinical examination, the neuropsychological evaluation, and the interview with the informant.
At a clinical-consensus meeting, the subjects were classified as having no cognitive loss, cognitive loss but not dementia, or dementia. Dementia was rated as mild, moderate, or severe according to the guidelines of the DSM-III-R.4 The cause of dementia was classified as possible or probable Alzheimer’s disease, according to the criteria of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association14; possible or probable vascular dementia; or other causes, according to the ICD-10 criteria.3 For a subgroup made up of 210 participants, a second diagnosis was made, on the basis of the clinical information. The agreement between this diagnosis and the original clinical-consensus diagnosis (dementia or no dementia) had a kappa value of 0.81.10
All 1879 subjects who underwent clinical evaluation, including the neuropsychological examination, were included in the present analyses. The classification systems used for diagnosing dementia were those of the DSM-III, DSM-III-R, DSM-IV, ICD-9, ICD-10, and CAMDEX. The diagnoses according to these systems were not based on assessments by different clinicians. Rather, we first identified the domains used in the various diagnostic systems, then identified those variables from the data set (based
on a consensus of a neurologist, neuropsychologist, and nurse) that most closely corresponded to these domains. Given the amount and detail of information that had been collected, we had sufficient information to determine whether the subject had evidence of impairment in each domain. Table 1 shows graphically the sets of diagnostic criteria for dementia. The DSM-III, for example, includes difficulties with short-term or long-term memory, difficulties in one or more of the areas of abstract thinking, judgment, higher cortical functions, or personality, and impairment in work or social functioning. A decline from a level before the onset of illness is specified, with normal consciousness and an assumed organic cause. The ICD-9 and ICD-10 criteria required more factors to be present than the other systems. The contribution of the various factors to the diagnosis according to different classification schemes was calculated, and the degree of overlap among the patients identified as having dementia was examined (Fig. 1). Given the relatively large proportion of subjects without dementia by any classification in our sample, we chose as a measure of concordance the proportion of subjects in whom dementia was diagnosed according to each system divided by the total number of subjects in whom it was diagnosed by any system. The demographic characteristics of the subjects with and without dementia were compared (Tables 2 and 3). The relative frequency of moderate and severe dementia, Alzheimer’s disease, and vascular dementia and the duration of symptoms were examined. Finally, to determine the relative contribution of various factors to agreement or disagreement between DSM-IV and ICD-10, between DSM-IV and CAMDEX, and between ICD-10 and CAMDEX, multivariate logistic-regression models were fitted. The SAS software package (version 6.07; SAS Institute, Cary, N.C.) was used for data management and analysis.
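As an illustration of the concordance measure described above (subjects positive by one system divided by subjects positive by any system), the following Python sketch recomputes the percentages from counts reported in the Results section of this article. It also recomputes the simple concordance with the clinical-consensus diagnosis. This is only a minimal sketch and not the authors' SAS analysis; the function and variable names are hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


TOTAL_SUBJECTS = 1879          # subjects who underwent clinical evaluation
POSITIVE_BY_ANY_SYSTEM = 450   # denominator reported in the Results section

# Subjects on whom each system agreed with the clinical-consensus diagnosis
# (dementia or no dementia), as reported in the Results section.
agree_with_consensus = {"DSM-IV": 1489, "ICD-10": 1479, "CAMDEX": 1574}

# Subjects classified as having dementia by each system.
positives = {"ICD-10": 58, "DSM-IV": 257, "CAMDEX": 92}

# Simple concordance: agreement over all subjects (inflated by the many
# subjects without dementia under every system).
simple_concordance = {
    system: percent(n, TOTAL_SUBJECTS) for system, n in agree_with_consensus.items()
}

# Chosen measure: positives by one system over positives by any system.
share_of_any_positive = {
    system: percent(n, POSITIVE_BY_ANY_SYSTEM) for system, n in positives.items()
}

print(simple_concordance)
print(share_of_any_positive)
```

The contrast between the two dictionaries shows why the second measure was preferred: the first is dominated by subjects whom every system classifies as not having dementia.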
RESULTS
The mean age of the 1879 subjects in the sample was 80.4 years; 62.4 percent were women, 42.5 percent had more than nine years of education, and 69.7 percent lived in the community. The frequency of dementia in the CSHA cohort when diagnosed according to the criteria for dementia in the six different classification systems varied considerably (Table 1). The proportion of subjects with dementia was 29.1 percent when the DSM-III criteria were used, 17.3 percent with the DSM-III-R criteria, 13.7 percent with the DSM-IV criteria, 5.0 percent with the ICD-9 criteria, 3.1 percent with the ICD-10 criteria, and 4.9 percent with the CAMDEX. The frequency of dementia according to the CSHA clinical-consensus diagnosis was 20.9 percent. Because the prevalence in the CSHA cohort was not adjusted for oversampling of the very old, it cannot be compared directly with the figures published by the CSHA for the prevalence of dementia in Canada.10 A step-by-step evaluation of the changes in prevalence as each factor was entered into the algorithm showed, for instance, that about half as many subjects had both short- and long-term memory dysfunction as had only defective short-term memory. That is, the inclusion of long-term memory impairment as a requirement for the diagnosis of dementia had a substantial effect on prevalence and probably accounted for most of the difference between the DSM-III criteria and the later DSM-based systems
TABLE 1. CRITERIA FOR DEMENTIA IN THE CLASSIFICATION SYSTEMS.*
Columns: DSM-III, DSM-III-R, DSM-IV, ICD-9, ICD-10, CAMDEX, CLINICAL CONSENSUS
DOMAIN IN WHICH IMPAIRMENT IS REQUIRED
Memory
  Short-term memory (learning skills)
  Long-term memory
Executive function
  Abstract thinking
  Judgment
  Problem solving
Other higher cortical function
  Aphasia
  Apraxia
  Agnosia
  Constructional abilities
  Calculation
Behavioral and emotional function
  Personality
  Emotional control
  Motivation
  Social behavior
Social function
  Work
  Social activities
  Activities of daily living
  Relationships with others
Other features incorporated into criteria
  Impairment
  Progressive deterioration
  Decline from function before illness
  Duration of symptoms ≥6 mo
  Normal consciousness
  Assumed organic cause
  Mental retardation as cause
Prevalence of dementia in the CSHA sample (%): DSM-III, 29.1; DSM-III-R, 17.3; DSM-IV, 13.7; ICD-9, 5.0; ICD-10, 3.1; CAMDEX, 4.9; clinical consensus, 20.9
[The per-system requirement marks for each domain could not be recovered from the source layout.]
*The symbols used in this table are as follows: impairment in domain is always required for diagnosis; • one or more of those bracketed is required; and (•) optional, strengthens the diagnosis. CSHA denotes Canadian Study of Health and Aging.
(45.4 percent of the subjects had short-term memory problems as required by DSM-III, whereas only 23.3 percent of the subjects had both short-term and long-term memory problems, as required by the DSM-III-R and DSM-IV). The ICD-9 and ICD-10 classifications required more types of impairment and thus identified fewer subjects as having dementia. Far fewer subjects (only 5.4 percent) met the requirement for impaired executive function (abstract thinking, judgment, and problem solving) than met the less restrictive criterion of dysfunction in one of a variety of higher cortical functions that included abstract thinking and judgment (52.3 percent). The inclusion of impairment in basic activities of daily living as a criterion in ICD-10 and of evidence of progressive deterioration in CAMDEX also produced a change in the proportions in whom dementia was diagnosed. Regardless of which diagnostic scheme was used,
the subjects with dementia were slightly older than those without dementia, but the difference was not significant (Table 2). The proportion of women among those who had dementia was similar to the proportion among those without dementia. The subjects who were given a diagnosis of dementia according to the DSM-III and the DSM-IV criteria had less education than those without dementia, but the level of education was similar for the other classification systems. Not surprisingly, for all the classification systems, subjects with dementia were more often institutionalized than those without dementia. Overall, the classification systems differed little according to the subjects’ age, sex, amount of education, or living conditions.
The frequency of dementia increased with increasing age regardless of which diagnostic classification system was used (Table 3). According to the criteria of DSM-III-R, for example, the frequency of dementia was 10.5 percent among subjects 65 to 74 years old, 16.0 percent among those 75 to 84 years old, and 24.4 percent among those 85 years of age or older.
The classification systems that identified a higher percentage of the study population as having dementia (DSM-III, DSM-III-R, and DSM-IV) included more cases with mild dementia, and there was a trend toward a shorter mean duration of symptoms. Each successive revision of the DSM appeared to extend the diagnosis to fewer subjects with mild dementia. The ICD-10 and CAMDEX classifications identified the smallest proportion of mild cases. The distribution of Alzheimer’s disease and vascular dementia did not differ markedly among most of the classification systems. The exception was the CAMDEX, which assigned the diagnosis of dementia mainly to those with Alzheimer’s disease.
Figure 1 shows the relation among the cases identified by the ICD-10, DSM-IV, and CAMDEX criteria and the CSHA clinical-consensus process. There was complete overlap between the CAMDEX and clinical-consensus diagnoses, although the CAMDEX identified only a subgroup of the cases of dementia identified by the CSHA consensus process. Neither DSM-IV nor ICD-10 identified a complete subgroup of the cases identified by a different classification system. The groups of subjects identified as having dementia by the CSHA consensus process and by the DSM-IV criteria had the least overlap, with fewer than half the clinical-consensus cases identified by DSM-IV. Conversely, about a third of the DSM-IV cases were not classified as dementia by consensus.
The simple concordance between the clinical-consensus diagnosis of dementia or no dementia and the diagnosis according to the DSM-IV, ICD-10, and CAMDEX criteria were 79.2 percent (1489 of 1879), 78.7 percent (1479 of 1879), and 83.8 percent (1574 of 1879), respectively. Since most of this apparent agreement is due to the large number of our subjects who did not have dementia according to any diagnostic system, we also calculated the proportion of subjects who were classified as having dementia according to one system as a proportion of those who had dementia according to any system. The results were as follows: ICD-10, 12.9 percent (58 of 450); DSM-IV, 57.1 percent (257 of 450); and CAMDEX, 20.4 percent (92 of 450).
Factors that most often accounted for disagreement between the classification schemes were further studied by multivariate logistic-regression analysis. The factors that best predicted disagreement between DSM-IV and ICD-10 were long-term memory, executive function, presence or absence of aphasia, social activities, and duration of symptoms (Table 4). DSM-IV and CAMDEX were differentiated by the weight given to social activities and progressive deterioration, whereas long-term memory, social function, progressive deterioration, and the presence or absence of an assumed organic cause differentiated between ICD-10 and CAMDEX.

[Figure 1: overlap diagram of the groups identified by DSM-IV (n=256), clinical consensus (n=393), CAMDEX (n=91), and ICD-10 (n=58).]
Figure 1. Subjects Identified as Having Dementia According to Various Diagnostic Classification Systems. A total of 1879 subjects were evaluated. One subject, in whom dementia was diagnosed according to DSM-IV and CAMDEX, is not shown.

TABLE 2. DEMOGRAPHIC CHARACTERISTICS OF 1879 SUBJECTS IN THE CSHA COHORT WITH AND WITHOUT DEMENTIA ACCORDING TO VARIOUS CLASSIFICATION SYSTEMS.*
SYSTEM AND DIAGNOSIS   NO.    MEAN AGE (yr)   FEMALE SEX (%)   EDUCATION >9 YR (%)   LIVING IN THE COMMUNITY (%)
DSM-III
  Dementia             546    81.7            62.3             37.3                  57.9
  No dementia          1333   79.2            62.8             44.7†                 74.5‡
DSM-III-R
  Dementia             326    80.4            69.6             38.8                  52.8
  No dementia          1553   80.4            61.2             43.3                  73.2‡
DSM-IV
  Dementia             257    81.8            66.1             35.3                  50.2
  No dementia          1622   79.6            62.1             43.7§                 72.7‡
ICD-9
  Dementia             94     81.8            60.6             35.5                  46.8
  No dementia          1785   79.9            62.7             42.9                  70.9‡
ICD-10
  Dementia             58     82.4            65.5             41.4                  46.6
  No dementia          1821   79.8            62.5             42.6                  70.4‡
CAMDEX
  Dementia             92     82.9            71.7             43.3                  54.3
  No dementia          1787   79.8            62.2             42.5                  70.5‡
Clinical consensus
  Dementia             393    81.7            64.4             39.8                  62.1
  No dementia          1486   79.5            62.2             43.3                  71.7‡
*CSHA denotes the Canadian Study of Health and Aging.
†P<0.01 for the comparison between subjects with dementia and those without dementia.
‡P<0.001 for the comparison between subjects with dementia and those without dementia.
§P<0.05 for the comparison between subjects with dementia and those without dementia.

TABLE 3. PREVALENCE OF DEMENTIA IN THE CSHA COHORT AS DIAGNOSED BY VARIOUS CLASSIFICATION SYSTEMS, ACCORDING TO AGE GROUP.*
Values for each classification system are number of subjects (percent).
AGE GROUP (yr)   NO.    DSM-III      DSM-III-R    DSM-IV       ICD-9      ICD-10     CAMDEX     CLINICAL CONSENSUS
65–74            391    85 (21.7)    41 (10.5)    43 (11.0)    17 (4.3)   8 (2.0)    7 (1.8)    57 (14.6)
75–84            931    245 (26.3)   149 (16.0)   114 (12.2)   41 (4.4)   28 (3.0)   49 (5.3)   184 (19.8)
≥85              557    216 (38.8)   136 (24.4)   100 (18.0)   36 (6.5)   22 (3.9)   36 (6.5)   152 (27.3)
Total            1879   546 (29.1)   326 (17.3)   257 (13.7)   94 (5.0)   58 (3.1)   92 (4.9)   393 (20.9)
*CSHA denotes the Canadian Study of Health and Aging.

DISCUSSION
In this large, population-based study, the frequency of dementia varied dramatically when different systems of diagnostic classification were used. Although there was substantial overlap among the groups of subjects identified by the various systems
as having dementia, many individual subjects assigned a diagnosis of dementia by one classification system were not so identified by another. This finding arouses concern about the validity of comparisons among studies that use different criteria to diagnose dementia. Questions about validity also arise with regard to the ICD-based systems, which are more likely to identify advanced cases of dementia in which the diagnosis is quite apparent. Our large sample drawn from a population-based study allowed us to examine the effects of various classification systems on the frequency of the diagnosis of dementia. The CSHA is one of a few studies in which detailed and structured clinical and neuropsychological evaluations have been used to assess cognitive functions in the domains incorporated into diagnostic classifications. In our study, the frequency of dementia was 3.1 percent when the ICD-10 criteria were used, 4.9 percent with the CAMDEX, 5.0 percent with ICD-9, 13.7 percent with DSM-IV, 17.3 percent with the DSM-III-R, 20.9 percent according to the CSHA clinical-consensus method, and 29.1 percent with the DSM-III criteria. The highest prevalence (with the DSM-III criteria) was approximately 10 times the lowest (with ICD-10). Increasing frequency of dementia with increasing age and lower educational levels and the absence of difference between the sexes have been reported previously.2,15,16 Other comparative studies2,5 also found that fewer cases were identified by the ICD-10 criteria than by the DSM-III-R criteria. In our study, the rate of dementia diagnosed with the DSM-III-R criteria was roughly six times that with the ICD-10, whereas the earlier studies reported a twofold difference. This discrepancy is likely to be due to the more detailed methods used by the CSHA and the characteristics of the sample (i.e., community and institutional).
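The ratios quoted above ("approximately 10 times" and "roughly six times") can be checked directly from the reported prevalences. The sketch below is illustrative only; the dictionary and the helper function `prevalence_ratio` are hypothetical names, and the percentages are the ones given in the text.

```python
# Prevalence of dementia (%) in the CSHA sample under each set of criteria,
# as reported in the text.
prevalence_percent = {
    "ICD-10": 3.1,
    "CAMDEX": 4.9,
    "ICD-9": 5.0,
    "DSM-IV": 13.7,
    "DSM-III-R": 17.3,
    "Clinical consensus": 20.9,
    "DSM-III": 29.1,
}


def prevalence_ratio(system_a, system_b):
    """Ratio of the prevalence under system_a to the prevalence under system_b."""
    return prevalence_percent[system_a] / prevalence_percent[system_b]


highest = max(prevalence_percent, key=prevalence_percent.get)
lowest = min(prevalence_percent, key=prevalence_percent.get)

print(highest, lowest, round(prevalence_ratio(highest, lowest), 1))
print(round(prevalence_ratio("DSM-III-R", "ICD-10"), 1))
```

Under these figures the highest-to-lowest ratio is a little under 10 and the DSM-III-R to ICD-10 ratio is between five and six, consistent with the rounded wording in the text.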
TABLE 4. ODDS RATIOS FOR DISAGREEMENT BETWEEN CLASSIFICATION SYSTEMS IN MULTIVARIATE LOGISTIC-REGRESSION MODELS.*
Columns: FACTOR; DSM-IV VS. ICD-10; DSM-IV VS. CAMDEX; ICD-10 VS. CAMDEX. Values are odds ratio (95% CI).
Memory
  Long-term memory
Executive function
  Abstract thinking
  Judgment
  Problem solving
Other higher cortical function
  Aphasia
  Apraxia
  Agnosia
  Constructional abilities
Behavioral and emotional function
  Personality
  Emotional control
  Motivation
  Social behavior
Social function
  Work
  Social activities
  Activities of daily living
  Relationships with others
Other features
  Progressive deterioration
  Duration of symptoms ≥6 mo
  Assumed organic cause
64.8 (30.7–137.0)
—
8.8 (4.0–19.0)
0.9 (0.5–1.4) 1.7 (0.9–3.3) 1.5 (0.6–3.8)
1.2 (0.6–2.5) 1.6 (0.6–3.9) 2.7 (0.8–8.7)
22.1 (12.8–38.0) 1.7 (0.9–3.4) 0.4 (0.1–1.2) 1.1 (0.7–1.8)
4.0 (2.7–6.0) 1.0 (0.6–1.7) 0.9 (0.4–2.1) 1.5 (1.0–2.2)
1.1 (0.6–1.9) 1.0 (0.5–2.1) 1.6 (0.5–5.0) 1.0 (0.6–1.7)
0.7 (0.3–1.4) 0.9 (0.4–1.7) 1.2 (0.7–2.1) 1.3 (0.7–2.6)
0.7 (0.4–1.2) 0.8 (0.5–1.4) 1.3 (0.8–1.9) 1.3 (0.8–2.3)
1.0 (0.5–2.0) 0.5 (0.2–1.2) 2.3 (1.3–4.1) 2.2 (1.0–4.6)
2.0 (1.1–3.4) 11.4 (4.4–29.1) 1.2 (0.7–2.2) 1.7 (0.8–3.3)
0.8 (0.5–1.2) 20.9 (9.0–48.5) 1.6 (1.0–2.5) 2.1 (1.3–3.6)
4.5 (2.1–9.8) 7.3 (0.9–59.3) — 1.8 (0.8–3.8)
0.9 (0.5–1.7) 0.6 (0.2–1.5) 0.5 (0.1–1.8)
1.5 (0.8–2.7) 0.4 (0.2–0.8) 1.0 (0.4–2.5)
0.5 (0.3–0.8) 0.9 (0.6–1.5) 1.3 (0.6–2.6)
5.5 (3.0–10.1) — 8.9 (0.9–86.7)
*The odds ratios express the odds of disagreement between the specified systems in the diagnosis of dementia or no dementia, as compared with agreement, given the presence of the factor, with adjustment for the other factors. CI denotes confidence interval. Dashes indicate that the factor in question was always required for both classification systems and therefore did not differentiate between them.

In a study of 486 patients with stroke (mean age, 72 years), Pohjasvaara et al.17 reported a frequency of 25 percent for dementia when the DSM-III criteria were used, 20 percent with the DSM-III-R criteria, 18 percent with the DSM-IV criteria, and 6 percent with the ICD-10 criteria. In another study of stroke, Tatemichi et al.18 also found varying proportions of patients identified as having dementia according to the criteria of the DSM-III (30 percent), the National Institute of Neurological Disorders and Stroke–Association Internationale pour la Recherche et l’Enseignement en Neurosciences (27 percent),19 and Cummings and Benson (41 percent).20 In contrast, in a series of 167 selected patients with suspected dementia (mean age, 72 years), Wetterling et al.21 found that the numbers of cases diagnosed by four classification systems were similar (86 cases with DSM-IV and 85 cases with ICD-10; the other two systems were not included in our study), but the groups included different patients. In the present study, the various classification systems yielded distinct groups; only 20 subjects were given a diagnosis of dementia according to all six classification systems. Thus, the problem is not simply that some systems are more restrictive than others. Rather, the systems identify different individual
subjects as having dementia. Similarly, the work of Wetterling et al.21 yielded distinct groups of subjects, of whom only 35 percent met the criteria of all the classification systems used. The same results have been found when different clinical criteria are used for the diagnosis of vascular dementia.22 In the Canberra study,2 the kappa statistic for the agreement between the DSM-III-R and the ICD-10 criteria was only 0.48. A similar value was reported in a cohort with stroke (kappa=0.43).17 Thus, the literature indicates a lack of diagnostic agreement and only a moderate degree of concordance between the various classification systems. Based on our analysis of the algorithms used, the main factors related to the differences in the frequency of dementia in the study sample included the requirement that both short-term and long-term memory be impaired in the DSM-III-R, DSM-IV, and CAMDEX; the requirement of impairment in abstract thinking, judgment, and problem solving in ICD-9 and ICD-10; the requirement of impairment in basic activities of daily living in ICD-10; and the
requirement of evidence of progressive deterioration in CAMDEX. The multivariate regression analysis supported those observations and also identified long-term memory and executive function as important. The duration of symptoms, the presence or absence of aphasia, and social functioning also contributed. Similarly, in the series of Pohjasvaara et al.,17 the factors influencing diagnostic differences were long-term memory, executive function, and the duration of cognitive symptoms. Our findings support the need for validation of the criteria used to diagnose dementia. Inconsistency of diagnosis is not unique to dementia, however. For example, there are similar problems with diagnostic concordance and validity with the DSM and ICD criteria for substance use disorder,23 personality disorder,24 somatization disorder,25 and affective and psychotic disorders,26 with different systems emphasizing different aspects of the conditions. One possible conclusion is that universal standards are needed for diagnosis — that is, that all investigators and clinicians should use the same classification scheme and criteria. Certainly, such uniformity would make international comparisons of prevalence and incidence more meaningful. However, a more basic question relates to the validity of the criteria and the underlying theoretical and empirical foundations of diagnosis. 
In the context of cerebrovascular disease, for example, recent discussions have highlighted issues related to the validity of current criteria for the diagnosis of dementia.19,27,28 The current criteria for the diagnosis of Alzheimer’s disease are also constrained by the concept of dementia.14 New discoveries (e.g., the importance of early medial-temporal-lobe atrophy, characteristics of memory impairment, and the apolipoprotein E genotype) may soon permit the diagnosis of probable Alzheimer’s disease before the dementia is clinically obvious.29-32 All investigators should use the same minimal set of standardized, validated measures and record key demographic characteristics, so that patients can be reclassified and the findings reinterpreted in the light of emerging knowledge.27 The validity (construct, content, and criterion) of diagnostic classifications is essential for studies of the epidemiology, risk factors, prevention, and treatment of any disorder. This use of standardized measures is critical, especially at the early stages of disease, when early detection of pathologic processes is essential if any intervention is to prevent or retard cognitive impairment. One proposal is that the focus should be on the spectrum of cognitive impairment, and the label “dementia” may even be abandoned.27 From the viewpoint of research, a given person may or may not qualify for a therapeutic protocol, depending on the label applied to his or her condition. Clinically, it makes a great difference whether
a patient is labeled as having dementia. The diagnosis often sets the threshold for investigation, treatment, and prognosis. Third-party reimbursement depends on the diagnosis, as does the patient’s ability to obtain insurance. Legally, the diagnosis of dementia may deprive a person of the right to drive, manage personal affairs, and make a will.
Diagnostic methods that generate prevalence figures that vary by a factor of 10 have important implications for health care planning. It makes a substantial difference whether 3 percent or 29 percent of the population over 65 years of age has dementia; the resources needed for prevention, treatment, and long-term care differ dramatically in these two cases. Our findings, and the prospect of the early diagnosis and treatment of dementia, point to the urgency of further debate and studies to redefine and refine the characterization of categories of cognitive impairment and dementia.
The data reported here were collected as part of the CSHA, which was supported by the Seniors Independence Research Program and administered by the National Health and Research Development Program of Health and Welfare Canada (project 6606-3954-MC[S]). The study was coordinated through the University of Ottawa and the Canadian government’s Laboratory Centre for Disease Control. The work of Dr. Erkinjuntti was supported by grants from the Medical Council of the Academy of Finland, the Clinical Research Institute of Helsinki University Central Hospital, and the Finnish Alzheimer Foundation for Research. Dr. Hachinski is the recipient of the first Trillium Clinical Scientist Award of the Ministry of Health of Ontario.
REFERENCES
1. Østbye T, Crosse E. Net economic costs of dementia in Canada. Can Med Assoc J 1994;151:1457-64. [Erratum, Can Med Assoc J 1995;152:158.]
2. Henderson AS, Jorm AF, Mackinnon A, et al. A survey of dementia in the Canberra population: experience with ICD-10 and DSM-III-R criteria. Psychol Med 1994;24:473-82.
3. Mental and behavioural disorders (F00-F99). In: The international classification of diseases, 10th rev.: ICD-10. Geneva: World Health Organization, 1992:311-88.
4. Diagnostic and statistical manual of mental disorders, 3rd ed. rev.: DSM-III-R. Washington, D.C.: American Psychiatric Association, 1987.
5. Fichter MM, Meller I, Schröppel H, Steinkirchner R. Dementia and cognitive impairment in the oldest old in the community: prevalence and comorbidity. Br J Psychiatry 1995;166:621-9.
6. Diagnostic and statistical manual of mental disorders, 3rd ed.: DSM-III. Washington, D.C.: American Psychiatric Association, 1980.
7. Diagnostic and statistical manual of mental disorders, 4th ed.: DSM-IV. Washington, D.C.: American Psychiatric Association, 1994.
8. International classification of diseases: manual of the international statistical classification of diseases, injuries, and causes of death: based on recommendations of the Ninth Revision Conference, 1975, and adopted by the Twenty-ninth World Health Assembly. Geneva: World Health Organization, 1977.
9. Roth M, Tym E, Mountjoy CQ, et al. CAMDEX: a standardised instrument for the diagnosis of mental disorder in the elderly with special reference to the early detection of dementia. Br J Psychiatry 1986;149:698-709.
10. Canadian Study of Health and Aging: study methods and prevalence of dementia. Can Med Assoc J 1994;150:899-913.
11. Teng EL, Chui HC. The modified Mini–Mental State (3MS) examination. J Clin Psychiatry 1987;48:314-8.
12. Tuokko H, Kristjansson E, Miller J. The neuropsychological detection of dementia: an overview of the neuropsychological component of the Canadian Study of Health and Aging. J Clin Exp Neuropsychol 1995;17:352-73.
13. Steenhuis RE, Østbye T. Neuropsychological test performance of specific diagnostic groups in the Canadian Study of Health and Aging (CSHA). J Clin Exp Neuropsychol 1995;17:773-85.
The New England Journal of Medicine, Volume 337, Number 23, p. 1673. Downloaded from nejm.org on January 3, 2017. For personal use only. No other uses without permission. Copyright © 1997 Massachusetts Medical Society. All rights reserved.
14. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: report of NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology 1984;34:939-44.
15. Jorm AF, Korten AE, Henderson AS. The prevalence of dementia: a quantitative integration of the literature. Acta Psychiatr Scand 1987;76:465-79.
16. Fratiglioni L, Grut M, Forsell Y, et al. Prevalence of Alzheimer’s disease and other dementias in an elderly urban population: relationship with age, sex, and education. Neurology 1991;41:1886-92.
17. Pohjasvaara T, Erkinjuntti T, Vataja R, Kaste M. Dementia three months after stroke: baseline frequency and effect of different definitions for dementia in the Helsinki Aging Memory Study (SAM) cohort. Stroke 1997;28:785-92.
18. Tatemichi TK, Desmond DW, Stern Y, Sano M, Mayeux R, Andrews H. Prevalence of dementia after stroke depends on diagnostic criteria. Neurology 1992;42:Suppl 3:413. abstract.
19. Roman GC, Tatemichi TK, Erkinjuntti T, et al. Vascular dementia: diagnostic criteria for research studies: report of the NINDS-AIREN International Workshop. Neurology 1993;43:250-60.
20. Cummings JL, Benson DF, eds. Dementia: a clinical approach. 2nd ed. Stoneham, Mass.: Butterworth–Heinemann, 1992.
21. Wetterling T, Kanitz R-D, Borgis K-J. Comparison of different diagnostic criteria for vascular dementia (ADDTC, DSM-IV, ICD-10, NINDS-AIREN). Stroke 1996;27:30-6.
22. Skoog I, Nilsson L, Palmertz B, Andreasson L-A, Svanborg A. A population-based study of dementia in 85-year-olds. N Engl J Med 1993;328:153-8.
23. Schuckit MA, Hesselbrock V, Tipp J, Anthenelli R, Bucholz K, Radziminski S. A comparison of DSM-III-R, DSM-IV and ICD-10 substance use disorders diagnoses in 1922 men and women subjects in the COGA study: Collaborative Study on the Genetics of Alcoholism. Addiction 1994;89:1629-38.
24. Bronisch T, Mombour W. Comparison of a diagnostic checklist with a structured interview for the assessment of DSM-III-R and ICD-10 personality disorders. Psychopathology 1994;27:312-20.
25. Tomasson K, Kent D, Coryell W. Comparison of four diagnostic systems for the diagnosis of somatization disorder. Acta Psychiatr Scand 1993;88:311-5.
26. Hiller W, Dichtl G, Hecht H, Hundt W, von Zerssen D. Testing the comparability of psychiatric diagnoses in ICD-10 and DSM-III-R. Psychopathology 1994;27:19-28.
27. Hachinski VC. Preventable senility: a call for action against the vascular dementias. Lancet 1992;340:645-8.
28. Erkinjuntti T, Hachinski VC. Rethinking vascular dementia. Cerebrovasc Dis 1993;3:3-23.
29. Erkinjuntti T, Lee DH, Gao F, et al. Temporal lobe atrophy on magnetic resonance imaging in the diagnosis of early Alzheimer’s disease. Arch Neurol 1993;50:305-10.
30. Soininen HS, Riekkinen PJ Sr. Apolipoprotein E, memory and Alzheimer’s disease. Trends Neurosci 1996;19:224-8.
31. Saunders AM, Hulette O, Welsh-Bohmer KA, et al. Specificity, sensitivity, and predictive value of apolipoprotein-E genotyping for sporadic Alzheimer’s disease. Lancet 1996;348:90-3.
32. Fox NC, Freeborough PA, Rossor MN. Visualisation and quantification of rates of atrophy in Alzheimer’s disease. Lancet 1996;348:94-7.
The New England Journal of Medicine, December 4, 1997.
SEMINAR No. 10
INADEQUATE SELECTION OF SUBJECTS AS A SOURCE OF SYSTEMATIC ERROR. MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and cancer of the pancreas. N Engl J Med. 1981; 304(11): 630-3.
Questions for the reading quiz and group discussion guide
1. What was the main objective of the research carried out by the authors? Did this objective include the analysis of coffee and tea consumption? Reflect on this situation.
2. The methods section states that the controls were selected from among the patients seen by the same physicians who cared for the cases. Which specialties usually see patients who present with the initial symptoms of exocrine pancreatic cancer before it is diagnosed? Look up and ask about the symptoms of exocrine pancreatic cancer, and base your answer on what you find.
3. Based on your answer to the previous question, would the patients usually seen by this specialty generally have a higher or a lower coffee consumption, given their complaints and their physicians' recommendations?
4. What implications will the situation raised in question 3 have for the research? Justify your answer.
5. In the discussion, the authors refer to an article published by P. Stocks (reference 14) as an argument in support of their results. Recall the concept of ecological studies given in class, and in particular the ecological fallacy. Can the P. Stocks reference then provide arguments in favor of a causal relationship?
6. Review the abstract of the following study: Hsieh CC, MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and pancreatic cancer (Chapter 2). N Engl J Med. 1986; 315(9): 587-9. (http://agris.fao.org/agris-search/search.do?recordID=US8724694). What is your opinion of it?
Additional questions to be discussed in the group session
1. Inclusion and exclusion criteria are very important because they allow us to assess the representativeness of the study sample and thus give us grounds for judging the applicability of the results. In this regard, the authors give no further information on why some patients were excluded on grounds of ethnicity (non-white) and origin (foreign). What is your opinion on this?
2. Coffee and tea consumption was established by means of a questionnaire that asked about consumption of these products on a typical day. Ask yourself that question. Do you think it reflects your own coffee and tea consumption reliably and validly?
3. Looking at the results in Table 4, can we state that a dose-response relationship is evident? Justify your answer.
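The control-selection problem raised in questions 2 to 4 of the reading quiz can be made concrete with a small calculation. The following is a minimal sketch in Python that uses entirely hypothetical counts (they are not taken from MacMahon et al.) and an illustrative helper named `odds_ratio`. It assumes that coffee has no true effect, and then shows what happens to the odds ratio when the controls drink less coffee than the population that produced the cases.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 case-control table (cross-product ratio)."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, for illustration only (not from the 1981 article).
cases_coffee, cases_no_coffee = 80, 20

# Controls that mirror the source population: same coffee habits as the cases,
# so there is no true association.
population_like_controls = (80, 20)

# Controls recruited from a clinic whose patients have been advised to cut down
# on coffee: coffee drinkers are under-represented among them.
low_coffee_clinic_controls = (60, 40)

or_unbiased = odds_ratio(cases_coffee, cases_no_coffee, *population_like_controls)
or_biased = odds_ratio(cases_coffee, cases_no_coffee, *low_coffee_clinic_controls)

print(f"OR with population-like controls:   {or_unbiased:.2f}")
print(f"OR with low-coffee clinic controls: {or_biased:.2f}")
```

Under these assumed counts the only thing that changes between the two calculations is the coffee consumption of the controls, yet the second odds ratio moves well above 1. This is the pattern a selection bias of this kind would produce, which is why the source of the controls matters for question 4.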
The New England Journal of Medicine. Downloaded from nejm.org at UNL on February 2, 2015. For personal use only. No other uses without permission. From the NEJM Archive. Copyright © 2010 Massachusetts Medical Society. All rights reserved.
SEMINAR No. 11
THE PLACEBO EFFECT IN MEDICAL RESEARCH. Linde K, Fässler M, Meissner K. Placebo interventions, placebo effects and clinical practice. Philos Trans R Soc Lond B Biol Sci. 2011; 366(1572): 1905-12.
Questions for the reading quiz and group discussion guide
1. A mother brings her son to your office for the child's routine check-up. The clinical examination shows no signs suggestive of any health problem. The routine laboratory tests ordered before the visit show no abnormality either. The mother mentions that her son has little appetite and does not finish his lunch, and she asks you to prescribe a vitamin to “stimulate the appetite” of her son. How would you respond to the mother's request? Which of the scenarios set out by the authors of the article would apply to this situation?
2. In a proceeding before INDECOPI, a company was sanctioned for disseminating advertising that was judged to mislead consumers. Examples of this advertising include:
“Beat drug addiction and find your vocation by taking Magnesol every day (…) If you want to work and beat depression, take Magnesol every day (…) Walk without tiring, because Magnesol and you together are miracles of health (…) In an asthma crisis magnesium is injected, Magnesol”. (https://www.indecopi.gob.pe/documents/20182/143803/019-2015.pdf)
Are products of this kind consumed by the general population? What reflections can you offer on this?
3. With regard to the previous question, identify in the media some type of advertising for health interventions that could constitute placebos. Justify your assessment.
4. One topic discussed by the authors of the article is the use of antibiotics for viral respiratory infections.
a. What are the consequences of irrational use of antibiotics?
b. Is the use of these drugs justified by patients' desire to receive a prescription?
c. How large would the impact of the placebo effect be when antibiotics are given for viral infections?
5. One aspect that should draw our attention is the results of the research by Hróbjartsson and Gøtzsche, particularly reference 40 of the article. This reference is a systematic review in which the main result, after analyzing 44 studies (binary outcome variable) that included 6041 patients, was a relative risk (RR) of 0.93 (95% CI 0.88 to 0.99). The authors conclude that the placebo effect is not clinically relevant. What is your opinion on this?
6. Imagine two situations in the management of a disease. In both, the same medication will be administered, but in one the physician administers it openly and makes sure that the patient is aware of the procedure at all times. In the other, the physician works discreetly and the patient does not perceive the intervention. According to the article, in which situation would a better response be expected? Do you agree with this view?
7. Finally, what are your reflections on the placebo effect and the context effect in the conduct of clinical research? How could they affect the validity of the results?
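As background for question 5, it may help to see how a relative risk and its 95% confidence interval are obtained from a single two-group comparison. The sketch below is a minimal illustration in Python with hypothetical counts chosen so that the point estimate equals 0.93. It uses the common log-scale (Katz) approximation for the interval and an illustrative function named `relative_risk_ci`. It does not reproduce the pooled meta-analytic estimate of reference 40, which combines many trials and therefore has a narrower interval.

```python
import math


def relative_risk_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale (Katz) interval.

    z = 1.96 corresponds to an approximate 95% confidence interval.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of ln(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical single trial (not data from the review):
# unfavourable outcome in 465 of 1000 placebo patients
# and in 500 of 1000 untreated patients.
rr, lower, upper = relative_risk_ci(465, 1000, 500, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Assuming the event counted is an unfavourable outcome, an RR of 0.93 corresponds to a 7% relative reduction in risk. Whether a reduction of that size matters to patients is a clinical judgment that is separate from whether the interval excludes 1.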
Phil. Trans. R. Soc. B (2011) 366, 1905–1912 doi:10.1098/rstb.2010.0383
Review
Placebo interventions, placebo effects and clinical practice
Klaus Linde1,*, Margrit Fässler1,2 and Karin Meissner1,3
1 Institute of General Practice, Technische Universität München, 81667 Munich, Germany
2 Institute of Biomedical Ethics, University of Zurich, 8008 Zurich, Switzerland
3 Institute of Medical Psychology, Ludwig-Maximilians University, 80336 Munich, Germany
This article reviews the role of placebo interventions and placebo effects in clinical practice. We first describe the relevance of different perspectives among scientists, physicians and patients on what is considered a placebo intervention in clinical practice. We then summarize how placebo effects have been investigated in randomized controlled trials under the questionable premise that such effects are produced by placebo interventions. We further discuss why a shift of focus from the placebo intervention to the overall therapeutic context is necessary and what research methods can be used for the clinical investigation of the relevance of context effects. In the last part of the manuscript, we discuss why placebo or context effects are seen as positive in clinical practice when they are associated with active treatments, while placebo interventions pose major ethical and professional problems and have to be avoided.
Keywords: clinical practice; placebo; placebo effects; randomized controlled trials; ethics
1. INTRODUCTION
There are three major areas in which placebo interventions have an important role: (i) as control interventions in experimental studies to determine specific effects and to reduce bias by enabling blinding; (ii) as experimental interventions in placebo research to study placebo effects; (iii) as a tool in clinical practice. If one searches a major bibliographical database such as Medline for references including the word placebo, the overwhelming majority of articles identified are either placebo-controlled trials or articles referring to such trials. A small minority of articles are review articles on general or specific aspects of the placebo phenomenon, or original reports of experimental placebo research. Only a very small number of articles report empirical investigations or are essays of placebo use or placebo effects in routine practice. In this article, we first describe the relevance of different perspectives among scientists, physicians and patients on what is considered a placebo intervention in clinical practice. We then summarize how placebo effects have been investigated in clinical research under the questionable premise that such effects are produced by placebo interventions. We further discuss why a shift of focus from the placebo intervention to the overall therapeutic context is necessary and what research methods can be used for the clinical investigation of the relevance of context effects. In the last part of the manuscript, we discuss why placebo or context effects are seen as positive in clinical practice when they are associated with active treatments, while placebo interventions have to be avoided.
* Author for correspondence (klaus.linde@lrz.tum.de). One contribution of 17 to a Theme Issue ‘Placebo effects in medicine: mechanisms and clinical implications’.
2. WHAT IS A PLACEBO INTERVENTION IN CLINICAL PRACTICE?
According to the classical definition by Shapiro & Morris [1, p. 371] ‘a placebo is defined as any therapy or component of therapy used for its nonspecific, psychological, or psychophysiological effect, or that is used for its presumed specific effect, but is without specific activity for the condition being treated’. Shapiro & Morris further distinguish pure placebos, which are ‘treatments that are devoid of active, specific components’, and impure placebos, which ‘contain non-placebo components’ (p. 372). While this definition has been severely criticized on a conceptual level [2,3], it summarizes well the implicit view of placebo interventions in biomedicine. We will not discuss conceptual issues here, but we will demonstrate, using simple case scenarios of interventions that classify as placebos according to this definition, that in clinical practice it is often quite difficult to decide what should actually be considered a placebo (table 1). These difficulties arise because the perspective of the definition by Shapiro & Morris is scientific, while physicians providing an intervention and patients receiving it might hold a different view. In scenario 1, a typical pure placebo (a saline injection) is administered to a pain patient. Both the provider and the scientist ‘know’ that the intervention is a placebo. The patient is informed in a deceptive manner which makes him believe he is receiving a ‘true’ treatment. If he were to be informed correctly he would also consider the treatment a placebo.
This journal is © 2011 The Royal Society
K. Linde et al.
Review. Placebo in clinical practice
Table 1. Five clinical scenarios and related views of providers, patients and scientists whether the intervention provided is to be considered a placebo.
scenario | provider | patient | scientist
1. saline injection in a pain patient | placebo | specific therapy (deceptive information) | placebo
2. antibiotic in a patient with suspected viral infection | probably not indicated | specific therapy | placebo
3. homoeopathic remedy in a child with a cold (prescribed by a sceptic but uncertain physician) | (probably) placebo | specific therapy | placebo
4. homoeopathic remedy in a child with a cold (prescribed by a homoeopath) | specific therapy | specific therapy | placebo
5. arthroscopic débridement in a patient with osteoarthritis of the knee | specific therapy | specific therapy | placebo
According to surveys, between 17 and 80 per cent of physicians and between 51 and 100 per cent of nurses have used pure placebos intentionally at some point in their professional career [4]. However, the data also indicate that the actual frequency is rare, because pure placebos are usually applied only once or a few times to a small minority of patients. There are three basic motivational patterns for such intentional use of a pure placebo. First, the physician aims primarily to promote the patient’s wellbeing. For example, in a young patient suffering from severe headaches at risk of becoming dependent on morphine, a physician tried to reduce this risk by substituting some applications with placebo injections without informing the patient or his parents [5]. In another example, a woman with newly diagnosed advanced cancer for which a curative treatment was not possible still had great hopes of being cured. In order not to dash the patient’s hopes and making her remaining time unbearable, she received a placebo intervention described as a form of cancer treatment [6]. While in both cases the patient was informed in a deceptive manner and the physician placed the relevance of his intent to help over the patient’s autonomy and the ideal of shared decision-making, some authors believe that such placebo applications can be ethically justifiable (e.g. [7]). Physicians move in a grey area, and opinions on the acceptability of using pure placebos vary. The first is a real case in which the mother of the patient filed a professional grievance against the physician and a nurse [5]. The second case is fictive from a survey asking both physicians and patients to assess the acceptability of the placebo treatment. Sixty-three per cent of participating patients and 18 per cent of physicians found the procedure acceptable as it was likely to preserve the patient’s hope [6]. A second motivational pattern could be summarized as ‘convenience’ [8]. 
For example, several surveys have found that pure placebos are given to difficult or complaining patients, or to avoid conflicts with a patient [9–13]. While understandable to some extent in a busy routine practice, such actions seem highly problematic both on a professional and on an ethical level [8]. It is likely that in reality many intentional applications of pure placebos are owing to a mixture of both the aim to promote wellbeing and convenience.
A third pattern, which seems to have become more and more infrequent but still occurs, is the use of placebo for diagnostic purposes. In such cases placebos are given to see whether the complaints are ‘real’ or ‘simulated’ or ‘only psychological’ [4]. Such a use is not only ethically problematic, but also contrary to the evidence which clearly shows that ‘real’ complaints can react to placebo applications. Scenario 2 involves a patient with suspected viral upper respiratory tract infection who asks to receive the antibiotic that has helped so greatly in previous infections, and the physician complies. Antibiotics are potent and highly effective drugs when applied adequately but they are not indicated in viral infections. Therefore, this is considered as a classical example of an impure placebo. Obviously, the patient considers the treatment specific. The physician considers the antibiotic non-indicated, but there might be some uncertainty regarding the viral origin or a risk of bacterial super-infection. Based on general pathophysiological reasoning and clinical trial data, the scientist makes a general judgement that antibiotics do not have an effect over placebo in patients with viral infection. Surveys show that the non-indicated use of active drugs is much more frequent than the use of pure placebos [11,13,14]. Qualitative interview studies addressing the prescribing of antibiotics in uncomplicated upper respiratory tract infections have shown that physicians are aware of the problems of their behaviour in such situations, but the word placebo does not come up [15,16]. However, when asked explicitly about placebo use [14], physicians seem to accept that such prescriptions can be considered placebo therapy. The main reason for prescribing antibiotics and other unnecessary treatments is the perceived wish of or pressure from the patient [15 – 18]. 
There is some data that physicians overestimate the extent to which patients expect a prescription [19], suggesting that other, possibly subconscious, reasons might also play a role. Placebo prescription in such a situation is not a case of deception, but a conflict between the professional integrity of the physician and the patient’s wish [8]. Physicians also often raise the issue of remaining uncertainty as a justification [15,16]. For example, a bacterial origin of the infection or a bacterial super-infection cannot be ruled out. However,
one could suggest that convenience is often a more important motivation for using a non-indicated treatment than uncertainty. It has been argued that such a use of antibiotics is unethical, unprofessional and harmful [8,20]. In scenario 3, a mother firmly believing in homoeopathic remedies is seeking a paediatrician for her 2-year-old child suffering from symptoms of a common cold. Homoeopathy is a widely used alternative therapy practised both by physicians and non-medical practitioners. Its most controversial aspect is the use of remedies which are prepared in serial dilution steps with vigorous shaking in between (potentization), commonly to the extent that no molecules of the original substance remain. Homoeopaths believe that during the dilution process information passes from the diluted agent to the solvent, which, in the light of current knowledge, seems implausible. Therefore, many scientists are convinced that highly diluted homoeopathic remedies are placebos. As they often do not contain any ‘active substance’ in a chemical sense, they might even qualify as pure placebos. From such a perspective, homoeopathy could be considered a pseudo-therapy. In our scenario, history and physical examination do not provide any indication for relevant risks but the child clearly suffers from bothersome symptoms. The mother asks for a homoeopathic remedy because the child improved very quickly in a similar situation when another physician prescribed the remedy. The paediatrician is sceptical about homoeopathy but he has seen some astonishing cases, so he is not really certain. Furthermore, he considers the risk minimal. He prescribes the remedy saying that he personally is a bit sceptical about homoeopathy, but it might be worth trying, and if the symptoms deteriorate the mother should return.
Surveys have shown that the use of complementary therapies such as homoeopathy, herbal medicines or vitamins by sceptical physicians is also much more widespread than the use of pure placebos [14,21,22]. The motivational pattern for the physician is a mixture of convenience (he wants to respect the mother’s wish, and to avoid a conflict or losing a client) and uncertainty (he cannot rule out with certainty that homoeopathy is an active therapy). Scientists would clearly consider this a placebo prescription, but might have diverging views on whether the pragmatic approach of the physician is acceptable. In scenario 4, the mother and her 2-year old child visit a convinced homoeopath who prescribes a highly diluted homoeopathic remedy. Obviously, for the scientist, homoeopathy remains a placebo (or pseudotherapy). Instead, based on his daily experience, the homoeopath is convinced that the prescribed remedy is a ‘true’ active treatment. The scientific doubts of researchers not using this therapy are regularly discarded. Patients seeking a homoeopath usually believe that this is or at least could be an active therapy, although some are sceptic. Surveys on placebo use among physicians do not include questions on this type of placebo use. The reason is obvious: those using the treatment in this manner do not consider it a placebo. As they believe
to act in the best interest of their clients, neither do they have any ethical problem. Surveys on the use of often highly controversial complementary and alternative therapies show that they are highly prevalent in many countries [23]. Some scientists see it as their duty to inform society about the ‘truth’ and consider it as ethically problematic that providers use therapies they consider disproven by science [24, pp. 244 – 250]. In the last scenario 5, an orthopaedic surgeon performs an arthroscopic débridement in a patient with osteoarthritis of the knee. This is a procedure in which an endoscope is introduced into the arthritic joint. The joint is then lavaged, rough cartilage is shaved and loose debris removed. Surgeons who perform this procedure usually consider it to be an active and effective therapy. Patients are unlikely to undergo this invasive treatment unless they share this view (after probably having been informed in a way supporting this view). However, many scientists consider improvements seen after such a treatment to be a placebo effect, as a rigorous randomized trial did not find any improved outcomes over those of a sham intervention [25]. The case differs considerably from scenario 4: arthroscopic débridement clearly cannot be considered a pure placebo but is an invasive, intense intervention. Compared with homoeopathic treatment, it is associated with much greater direct risks. Contrary to homoeopaths, surgeons usually claim to practice scientific medicine based on the best available current evidence. To justify their behaviour, they therefore have to question the validity of the relevant study results, at least for the selection of patients in whom they actually perform the procedure, and claim that the way they use the intervention is clearly not a placebo. If physicians discuss the use of placebo interventions in practice, they typically think of the intentional application of pure placebos (scenario 1). 
In this classical case providers, scientists and informed patients all agree that this is a placebo intervention. In the remaining scenarios, the situation is far less clear. Most readers would probably agree that scenarios 2 and 3 can be considered examples of placebo interventions as the provider at least to some extent uses the intervention for placebo purposes. Scenario 4 is a typical example of a strong conflict between perspectives. Many readers might have problems in considering scenario 5 a good example of a placebo intervention but according to the best available evidence, the procedure meets the definition by Shapiro and Morris. The scientific perspective summarized in this definition postulates an objective knowledge on what has specific effects and what not. This knowledge is often uncertain and incomplete. Those involved directly in the clinical encounter—physicians and patients—sometimes ignore the scientific perspective. In the discussion on placebo use, these differences in perspectives are often not reflected. This leads to misunderstandings. We suggest that apart from the intentional use of pure placebos (scenario 1), the word placebo interventions should be used more carefully.
3. INVESTIGATING WHETHER PLACEBO EFFECTS ARE CLINICALLY RELEVANT. THE CONVENTIONAL APPROACH
According to Shapiro & Morris, ‘a placebo effect is defined as the psychological or psychophysiological effect produced by placebos’ [1, p. 371]. This questionable view of placebo effects has strongly influenced the approaches for quantifying such effects used in clinical research. For decades, improvements in placebo groups of randomized clinical trials have been interpreted as evidence for placebo effects. An analysis of the proportion of patients reporting satisfactory relief after receiving placebo in 15 controlled trials published by Beecher in 1955 in JAMA (The Journal of the American Medical Association) [26] is probably the most cited article in the field of placebo research. This article is the basis of a widely quoted myth that the average size of placebo effects is about 35 per cent. Beecher further claimed that the small standard error in his analysis (2.2%) indicates the constancy of the placebo effect. In 1994, a major review published in JAMA claimed even larger placebo effects based on the improvement in placebo groups [27]. A careful reanalysis of the original studies included in Beecher’s review concluded that spontaneous improvement, fluctuation of symptoms, regression to the mean, additional treatment, response biases and misquotation were plausible alternative explanations for the presumed placebo effects [28]. These reasons also explain why it is almost impossible to reliably judge in routine clinical practice whether a placebo effect has occurred. A recent analysis of trials including both a placebo and no-treatment control group also found relevant improvement in many no-treatment groups [29]. In conclusion, changes observed in patients receiving placebo over time are not a reliable way to estimate the size of placebo effects.
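Regression to the mean, one of the alternative explanations listed above, can be illustrated with a short simulation. The sketch below is a hypothetical model in Python and does not come from the article. Each simulated patient has a stable underlying symptom level, each visit adds independent measurement noise, and only patients whose screening score exceeds an assumed enrolment threshold enter the trial. No intervention of any kind is applied, and all numerical values are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so that repeated runs give the same numbers

N_PATIENTS = 10_000
ENROLMENT_THRESHOLD = 60  # hypothetical cut-off on a 0-100 symptom score

baseline_scores = []
followup_scores = []
for _ in range(N_PATIENTS):
    true_level = random.gauss(50, 10)            # stable underlying symptom level
    baseline = true_level + random.gauss(0, 10)  # noisy measurement at screening
    followup = true_level + random.gauss(0, 10)  # independent noisy measurement later
    if baseline > ENROLMENT_THRESHOLD:           # enrol only high scorers
        baseline_scores.append(baseline)
        followup_scores.append(followup)

mean_baseline = statistics.mean(baseline_scores)
mean_followup = statistics.mean(followup_scores)
print(f"enrolled patients:     {len(baseline_scores)}")
print(f"mean score at baseline: {mean_baseline:.1f}")
print(f"mean score at follow-up: {mean_followup:.1f}")
```

In this model the enrolled group scores lower at follow-up than at baseline although nothing was done to them, because selection on a high noisy measurement favours patients whose noise happened to be positive at screening. An untreated control group shows the same drift, which is why a comparison with no treatment is needed before any improvement is attributed to the placebo.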
From a methodological point of view, it seems obvious that for assessing the size of placebo effects, a no-treatment control group is crucial [30]. However, trials including both a placebo and a no-treatment group are comparably rare and widely dispersed in the medical literature. When, in 2001, the leading medical journal, the New England Journal of Medicine, published a meta-analysis of 114 such trials by Hróbjartsson & Gøtzsche [31] titled ‘Is the Placebo Powerless?’ this provoked a major debate. In the trials included in this review, placebo, on average, did not have a significant effect over no-treatment when outcomes were binary, regardless of whether these outcomes were subjective or objective. For continuous outcomes, there was a significant effect over no-treatment when the outcome was patient-reported, but not when it was an objective measure. The authors concluded that they had ‘found little evidence in general that placebo had powerful clinical effects’. This meta-analysis has been heavily criticized (e.g. [32–36]) for mixing highly heterogeneous studies with control interventions, which might not always be considered placebo, for including studies in which all study groups including the no-treatment group received basic treatment with potential impact on the outcomes measured, as well as for a variety of other reasons. Furthermore, subsequent analyses have provided evidence that a subset of studies with outcomes regulated by the autonomic nervous system is susceptible to placebo treatments ([37]; see also [38]). However, the overall conclusion that available trials including both a placebo and a no-treatment group do not provide convincing evidence for powerful placebo effects in general remains adequate.
Hróbjartsson & Gøtzsche published updated and expanded versions of their review in 2004 [39] and 2010 [40]. The current analysis now includes 202 trials. While effect sizes remained similar to those in the first analysis, effects of placebo interventions over no treatment are now statistically significant for both patient- and observer-reported continuous outcomes and for binary outcomes owing to the larger number of included trials. While the evidence that there are placebo effects is stronger now, the authors still conclude that they ‘did not find that placebo interventions have important clinical effects in general’, as the overall effect size is small and the influence of bias unclear [40].
4. PROBLEMS OF THE CONVENTIONAL APPROACH TO ASSESS THE CLINICAL RELEVANCE OF PLACEBO EFFECTS
Randomized trials including both a placebo and a no-treatment control group are clearly more appropriate for investigating the size of placebo effects than trials without a no-treatment group. But are they really providing valid evidence on the size of placebo effects in clinical practice? An important methodological problem regarding the reliability of effect estimates is that patients cannot be blinded for comparisons between placebo and no-treatment. This could result in reporting biases (at least in the case of patient-reported subjective outcomes for which the meta-analyses by Hróbjartsson & Gøtzsche provide the most consistent results), differential use of co-interventions and a variety of other biases. This implies that the effect estimates are quite uncertain.
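The earlier observation that similar effect sizes became statistically significant once more trials were included can be illustrated with fixed-effect inverse-variance pooling. This is a minimal sketch under stated assumptions: every trial is given the same hypothetical effect and the same hypothetical standard error, and none of the numbers come from the Hróbjartsson & Gøtzsche reviews.

```python
import math


def pooled_estimate(effect, se_per_trial, n_trials):
    """Fixed-effect inverse-variance pooling of n_trials identical trials.

    Returns the pooled effect, its standard error and the z statistic.
    The inputs are hypothetical; real meta-analyses pool trials with
    different effects and precisions.
    """
    weights = [1.0 / se_per_trial ** 2] * n_trials
    pooled = sum(w * effect for w in weights) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, pooled / pooled_se


# Same assumed effect per trial, different numbers of trials.
fewer_trials = pooled_estimate(effect=-0.1, se_per_trial=0.5, n_trials=50)
more_trials = pooled_estimate(effect=-0.1, se_per_trial=0.5, n_trials=200)

for label, (est, se, z) in (("50 trials", fewer_trials), ("200 trials", more_trials)):
    significant = abs(z) > 1.96  # conventional two-sided 5% threshold
    print(label, round(est, 3), round(se, 3), round(z, 2), significant)
```

The pooled effect is the same in both runs; only the standard error shrinks as trials are added, so a small effect can cross the conventional significance threshold without becoming any larger or more clinically important.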
But there is a much more fundamental question: are randomized trials including both a placebo and a no-treatment group truly a valid way to estimate the size of placebo effects in practice? The classical definition by Shapiro & Morris states that placebo effects are produced by placebos. In line with that thinking, one assumes that the difference between the placebo and the no-treatment group in randomized controlled trials (RCTs) is owing to the placebo intervention. But how should an intervention (e.g. a saline injection) produce an effect if it is objectively without a specific effect? [3] There now seems to exist a consensus among placebo researchers that what we call placebo effects is a heterogeneous class of psychobiological events attributable to the overall therapeutic context [41]. The placebo intervention by itself should not produce any effect (otherwise it would not be a true placebo); it completes a complex therapeutical situation and thus conveys meaning, influences expectations and possibly triggers conditioned responses or behaviour changes. If this hypothesis is correct, the same placebo intervention should be associated with different placebo effects depending on the context. Furthermore, very different placebos associated with very different contexts (e.g. pharmacological placebo and sham surgery) should regularly produce different placebo effects.
The context in an RCT does not reflect any of the scenarios described in the previous section of this paper. Participants in RCTs must be informed in detail about the aims and procedures in a trial (although some studies deviate from this). The motivations of physicians for delivering a placebo differ strongly from normal practice. In conclusion, the focus on the placebo intervention as the cause of placebo effects is misleading and should be replaced by a focus on the context (including the placebo intervention). However, even if this shift of focus will occur, it seems likely that owing to their specific context situation, RCTs can provide only crude estimators of the size of placebo effects (better, context effects) in routine clinical practice.
5. MOVING FROM RESEARCH ON PLACEBO EFFECTS TO RESEARCH ON CONTEXT EFFECTS
Apart from the strong evidence from experimental research supporting the contextual interpretation of placebo effects [41,42], there is also increasing evidence from clinical research. The most recent version of Hróbjartsson & Gøtzsche’s [40] review itself found that placebo effects were larger for physical placebos (compared with pharmacological or psychological placebos), in trials not informing patients that a placebo intervention was administered, and in trials with the explicit purpose of studying placebo effects [40]. Meta-analyses of changes over time in placebo groups in trials without a no-treatment control have identified a variety of context factors associated with effect size, too. For example, trials in which placebo was injected subcutaneously reported higher improvement rates than trials using oral placebos [43].
An elegant randomized trial found that a placebo acupuncture intervention was associated with significantly greater clinical effects when provided in an empathic compared with a neutral manner [44]. In principle, studies using the open–hidden paradigm could provide important information as to whether perceiving the act of treatment (be it a placebo or an active intervention) makes a difference. In such studies, an active substance and/or a placebo intervention are administered both in an overt and in a covert fashion [45]. For example, in a study by Benedetti et al. [46], patients with post-operative pain with an intravenous drip received either no treatment, an open or a hidden injection of saline (placebo) or of the cholecystokinin antagonist proglumide, which is known to potentiate analgesia induced by morphine and endorphins. Pain intensity was similar in patients receiving no treatment or a hidden injection of saline or proglumide, while it decreased after open injection of saline and even more after proglumide. These results indicate that both saline and proglumide do not have any direct (specific) analgesic effect, but that an overt injection is associated with reduced pain. The results further suggest that proglumide potentiates a placebo-activated endogenous opioid system if applied in an open manner. While the open–hidden paradigm is a fascinating approach, it is, however, not feasible for most treatments as a hidden administration is not possible.
There are a variety of further approaches to investigate the influence of context factors. If we assume that context factors can modify the clinical response to both placebo and active treatment, this can be investigated directly in randomized trials. For example, trials using a balanced placebo design investigate simultaneously the influence of a specific (e.g. drug versus placebo) and a non-specific or context factor (e.g. positive or neutral information). Such trials are infrequent; however, the available studies suggest that context factors not only have direct effects but also interact with specific effects by either increasing or decreasing the differences between active treatment and placebo [47,48].
If we assume that all healthcare interventions can be associated with context effects and that (as we hope) the majority of interventions have specific activity, the majority of context effects in clinical practice should be associated with active treatments. Obviously, context factors can be and have been investigated directly in randomized trials without manipulating the active treatment. Again a systematic review suggests that context factors matter [49], but owing to the relatively small number of studies and lack of replications the evidence base is relatively weak. There is considerable research on ‘specific’ context factors such as expectations [50,51] or empathy [52]. However, in our view this should not be called placebo research. What is often described as harnessing placebo effects might be better summarized as harnessing context effects or as attempts to create optimal healing environments [53,54].
6. BAD PLACEBO INTERVENTIONS AND GOOD CONTEXT EFFECTS
While we do not have sound evidence regarding how relevant they are in clinical practice, there is a common belief that good physicians should harness placebo and/or context effects to maximize the benefits to their patients [55–57]; however, the use of placebo interventions should be avoided unless absolutely necessary [58]. A qualitative study by Comaroff published in 1976 [17] provided interesting insights into why this is the case. For this study, the views of 51 general practitioners on placebo therapy were elicited indirectly, in the context of a more general discussion about prescribing behaviour. Practitioners were first asked to estimate the proportion of their consultations which culminated in prescribing a treatment. All participants set the proportion at 70 per cent or above. In their elaborate answers, most practitioners spontaneously stated that they did not consider all prescriptions as truly necessary and felt necessitated to provide justifications. Implicitly, the answers clearly indicated that the physicians had internalized a professional ideal, which requires that any treatment should be specific in effect and administered or prescribed only when necessary. However, this ideal conflicted with the realities of general practitioners in the real world. Seeing only unselected patients, general practitioners faced considerable uncertainty but still
needed to make decisions. Making a firm diagnosis in general practice was often impossible or unnecessary, implying that the basis for choosing a specific treatment was weak. On the other hand, physicians usually believed that patients expected a clear diagnosis and a treatment. Therefore, the general practitioners often prescribed treatment which could be considered a placebo.
If the professional imperative of specific and necessary treatment is taken seriously, giving a placebo is nothing else than a therapeutic defeat. The physician fails. Harnessing context or placebo effects is only legitimate if associated with a specific treatment. Therefore, intentional use of pure placebo is usually restricted to exceptional situations. When applying (what scientists call) impure placebos, physicians more or less use conscious rationalization strategies to cope with their dilemma. Apart from perceived demand or expectations of patients, important rationalizations are beliefs in the specific activity of the treatment provided, in spite of conflicting evidence, and arguing for avoidance of potential complications [18]. The available evidence suggests that the use of impure placebos is more frequent in primary care than in specialized care [4]. This seems plausible as diagnostic uncertainty is higher in unselected patient populations where the number of potential diagnoses is high and the prevalence of each single disease is low. However, uncertainty also occurs frequently in specialized settings.
7. SUMMARY AND CONCLUSIONS
In summary, it is often unclear in the clinical setting as to what is a placebo intervention. This does not apply to the intentional use of pure placebos, but such interventions are infrequently used (although the total number of such uses on a population level still might be a cause for concern).
Pseudo-treatments, disproven or non-indicated treatments are used much more frequently, but whether they are considered placebos is often a matter of perspective. What are summarized under the term placebo effects are highly heterogeneous phenomena related to the overall context of healthcare interventions. Calling context effects associated with the application of active interventions placebo effects leads to confusion and should be, in our opinion, avoided. We do not know how large placebo effects actually are in clinical practice, but the available evidence suggests that, on average, they are often small. Because of the professional ideal that all treatments used should be specific in action and used only when necessary, the use of placebo interventions is problematic, while harnessing context effects is clearly legitimate when the treatment is active.
There is a clear need for more research on the role of placebo interventions and the relevance of context effects in clinical practice. This research must take the perspectives of providers and patients into account. Qualitative research can provide important insights into why and how physicians use pure and impure placebos. Investigating the relevance of placebo and context effects for clinical practice will remain a challenge. Studies using the open–hidden approach or a balanced placebo design seem particularly desirable. However, the first approach is rarely possible in clinical practice and the second is expensive. As in a clinical environment many factors cannot be controlled and as effects are likely to be small to modest, such studies need large sample sizes. A promising strategy could be to integrate minor manipulations of context factors (for example, using different communication styles) into randomized trials, which are performed for other purposes. If this is done in a larger number of trials, effects could be investigated with sufficient power in meta-analyses (see also [59]). However, as the context in studies and routine practice differs, uncertainty will remain regarding the size of context effects in clinical practice.
We think that the professional imperative of specific and necessary treatment is adequate. It is an important basis for the quality and authority of medicine and other acknowledged healthcare professions. However, we also think that the amount of uncertainty in medical practice and its consequences on treatment decisions should be discussed more openly. Downplaying the degree of uncertainty and not accepting that the ideal of specific and necessary treatment often cannot be realized pushes healthcare professionals to use questionable rationalization strategies. It should also be accepted that humans very often behave irrationally and that rituals, myths, seemingly plausible explanations, etc., can strongly affect humans, sometimes even on a somatic level. If uncertainty and irrationality are accepted, we believe that there can be ethically, professionally and scientifically acceptable ways for a limited use of impure placebos (provided that they are associated with very low risks) and exceptional use of pure placebos.
REFERENCES
1 Shapiro, A. K. & Morris, L. A. 1978 The placebo effect in medical and psychological therapies. In Handbook of psychotherapy and behavior change (eds S. L. Garfield & A. E. Bergin), pp. 369–410. New York, NY: Wiley.
2 Grünbaum, A. 1986 The placebo concept in medicine and psychiatry. Psychol. Med. 16, 19–38. (doi:10.1017/S0033291700002506)
3 Moerman, D. E. & Jonas, W. B. 2002 Deconstructing the placebo effect and finding the meaning response. Ann. Intern. Med. 136, 471–476.
4 Fässler, M., Meissner, K., Schneider, A. & Linde, K. 2010 Frequency and circumstances of placebo use in clinical practice – a systematic review of empirical studies. BMC Med. 8, 15. (doi:10.1186/1741-7015-8-15)
5 Rich, B. A. 2003 A placebo for the pain: a medico-legal case analysis. Pain Med. 4, 366–372. (doi:10.1111/j.1526-4637.2003.03046.x)
6 Lynoe, N., Mattsson, B. & Sandlund, M. 1993 The attitudes of patients and physicians towards placebo treatment – a comparative study. Soc. Sci. Med. 36, 767–774. (doi:10.1016/0277-9536(93)90037-5)
7 Foddy, B. 2009 A duty to deceive: placebos in clinical practice. Am. J. Bioethics 9, 4–12. (doi:10.1080/15265160903318350)
8 Hróbjartsson, A. 2008 Clinical placebo interventions are unethical, unnecessary, and unprofessional. J. Clin. Ethics 19, 66–69.
9 Goodwin, J. S., Goodwin, J. M. & Vogel, A. V. 1979 Knowledge and use of placebos by house officers and nurses. Ann. Intern. Med. 91, 106–110.
10 Gray, G. & Flynn, P. 1981 A survey of placebo use in a general hospital. Gen. Hosp. Psychiatry 3, 199–203. (doi:10.1016/0163-8343(81)90002-5)
11 Hróbjartsson, A. & Norup, M. 2003 The use of placebo interventions in medical practice – a national questionnaire survey of Danish clinicians. Eval. Health Prof. 26, 153–165. (doi:10.1177/0163278703026002002)
12 Sherman, R. & Hickner, J. 2007 Academic physicians use placebos in clinical practice and believe in the mind–body connection. J. Gen. Intern. Med. 23, 7–10. (doi:10.1007/s11606-007-0332-z)
13 Fässler, M., Gnädinger, M., Rosemann, T. & Biller-Andorno, N. 2009 Use of placebo interventions among Swiss primary care providers. BMC Health Serv. Res. 9, 144. (doi:10.1186/1472-6963-9-144)
14 Tilburt, J. C., Emanuel, E., Kaptchuk, T. J., Curlin, F. A. & Miller, F. G. 2008 Prescribing ‘placebo treatments’: results of national survey of US internists and rheumatologists. Br. Med. J. 337, a1938. (doi:10.1136/bmj.a1938)
15 Butler, C. C., Rollnick, S., Pill, R., Maggs-Rapport, F. & Stott, N. 1998 Understanding the culture of prescribing: qualitative study of general practitioners’ and patients’ perceptions of antibiotics for sore throats. Br. Med. J. 317, 637–642.
16 Kumar, S., Little, P. & Britten, N. 2003 Why do general practitioners prescribe antibiotics for sore throat? Grounded theory interview study. Br. Med. J. 326, 138. (doi:10.1136/bmj.326.7381.138)
17 Comaroff, J. 1976 A bitter pill to swallow: placebo therapy in general practice. Sociol. Rev. 24, 79–96.
18 Schwartz, R. K., Soumerai, S. B. & Avorn, J. 1989 Physician motivations for nonscientific drug prescribing. Soc. Sci. Med. 28, 577–582. (doi:10.1016/0277-9536(89)90252-9)
19 Lado, E., Vacariza, M., Fernandez-Gonzalez, C., Gestal-Otero, J. J. & Figueras, A. 2008 Influence exerted on drug prescribing by patients’ attitudes and expectations and by doctors’ perception of such expectations: a cohort and nested case–control study. J. Eval. Clin. Pract. 14, 453–459. (doi:10.1111/j.1365-2753.2007.00901.x)
20 Miller, F. G. & Colloca, L. 2009 The legitimacy of placebo treatments in clinical practice: evidence and ethics. Am. J. Bioethics 9, 39–47. (doi:10.1080/15265160903316263)
21 Meissner, K., Höfner, L. & Linde, K. 2010 Häufigkeiten und Gründe für den Einsatz von Placebointerventionen in der allgemeinmedizinischen Praxis: Erste Ergebnisse einer Fragebogenstudie. Zeitschrift für Allgemeinmedizin 86(suppl.), 97.
22 Classen, W. & Feingold, E. 1983 Use of placebos in medical practice. Pharmacopsychiatry 18, 131–132. (doi:10.1055/s-2007-1017341)
23 Ong, C. K., Bodeker, G., Burford, G., Grundy, C. & Shein, K. 2005 WHO global atlas of traditional, complementary and alternative medicine. Kobe, Japan: World Health Organization.
24 Singh, S. & Ernst, E. 2008 Trick or treatment. Alternative medicine on trial. London, UK: Bantam Press.
25 Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., Hollinsworth, J. C., Ashton, C. M. & Wray, N. P. 2002 A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N. Engl. J. Med. 347, 81–88. (doi:10.1056/NEJMoa013259)
26 Beecher, H. K. 1955 The powerful placebo. JAMA 159, 1602–1606.
27 Turner, J. A., Deyo, R. A., Loeser, J. D., VonKorff, M. & Fordyce, W. E. 1994 The importance of placebo effects in pain treatment and research. JAMA 271, 1609–1614. (doi:10.1001/jama.271.20.1609)
28 Kienle, G. S. & Kiene, H. 1997 The powerful placebo effect: fact or fiction? J. Clin. Epidemiol. 50, 1311–1318. (doi:10.1016/S0895-4356(97)00203-5)
29 Krogsboll, L. T., Hróbjartsson, A. & Gotzsche, P. C. 2009 Spontaneous improvement in randomised clinical trials: meta-analysis of three-armed trials comparing no treatment, placebo and active intervention. BMC Med. Res. Methodol. 9, 1. (doi:10.1186/1471-2288-9-1)
30 Ernst, E. & Resch, K. L. 1995 Concept of true and perceived placebo effects. Br. Med. J. 311, 551–553.
31 Hróbjartsson, A. & Gøtzsche, P. C. 2001 Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N. Engl. J. Med. 344, 1594–1602. (doi:10.1056/NEJM200105243442106)
32 Miller, F. G. 2001 Is the placebo powerless? N. Engl. J. Med. 345, 1277.
33 Lilford, R. J. & Braunholtz, D. A. 2001 Is the placebo powerless? N. Engl. J. Med. 345, 1277–1278.
34 Spiegel, D., Kraemer, H. & Carlson, R. W. 2001 Is the placebo powerless? N. Engl. J. Med. 345, 1276.
35 McDonald, C. J. 2001 Is the placebo powerless? N. Engl. J. Med. 345, 1276–1277.
36 Wampold, B. E., Minami, T., Tierney, S. C., Baskin, T. W. & Bhati, K. S. 2005 The placebo is powerful: estimating placebo effects in medicine and psychotherapy from randomized clinical trials. J. Clin. Psychol. 61, 835–854. (doi:10.1002/jclp.20129)
37 Meissner, K., Distel, H. & Mitzdorf, U. 2007 Evidence for placebo effects on physical but not on biochemical outcome parameters: a review of clinical trials. BMC Med. 5, 3. (doi:10.1186/1741-7015-5-3)
38 Meissner, K. 2011 The placebo effect and the autonomic nervous system: evidence for an intimate relationship. Phil. Trans. R. Soc. B 366, 1808–1817. (doi:10.1098/rstb.2010.0403)
39 Hróbjartsson, A. & Gøtzsche, P. C. 2004 Placebo interventions for all clinical conditions. Cochrane Database Syst. Rev. 3, CD003974.
40 Hróbjartsson, A. & Gøtzsche, P. C. 2010 Placebo interventions for all clinical conditions. Cochrane Database Syst. Rev. 1, CD003974. (doi:10.1002/14651858.CD003974.pub3)
41 Finniss, D. G., Kaptchuk, T. J., Miller, F. & Benedetti, F. 2010 Biological, clinical, and ethical advances of placebo effects. Lancet 375, 686–695. (doi:10.1016/S0140-6736(09)61706-2)
42 Benedetti, F. 2008 Placebo effects. Understanding the mechanisms in health and disease. Oxford, UK: Oxford University Press.
43 de Craen, A. J., Tijssen, J. G., de Gans, J. & Kleijnen, J. 2000 Placebo effect in the acute treatment of migraine: subcutaneous placebos are better than oral placebos. J. Neurol. 247, 183–188. (doi:10.1007/s004150050560)
44 Kaptchuk, T. J. et al. 2008 Components of placebo effect: randomised controlled trial in patients with irritable bowel syndrome. Br. Med. J. 336, 999–1003. (doi:10.1136/bmj.39524.439618.25)
45 Finniss, D. G. & Benedetti, F. 2005 Mechanisms of the placebo response and their impact on clinical trials and clinical practice. Pain 114, 3–6. (doi:10.1016/j.pain.2004.12.012)
46 Benedetti, F., Amanzio, M. & Maggi, G. 1995 Potentiation of placebo analgesia by proglumide. Lancet 346, 1231. (doi:10.1016/S0140-6736(95)92938-X)
47 Kleijnen, J., de Craen, A. J., van Everdingen, J. & Krol, L. 1994 Placebo effect in double-blind clinical trials: a review of interactions with medications. Lancet 344, 1347–1349. (doi:10.1016/S0140-6736(94)90699-8)
48 Enck, P., Klosterhalfen, S., Weimer, K., Horing, B. & Zipfel, S. 2011 The placebo response in clinical trials: more questions than answers. Phil. Trans. R. Soc. B 366, 1889–1895. (doi:10.1098/rstb.2010.0384)
49 Di Blasi, Z., Harkness, E., Ernst, E., Georgiou, A. & Kleijnen, J. 2001 Influence of context effects on health outcomes: a systematic review. Lancet 357, 757–762. (doi:10.1016/S0140-6736(00)04169-6)
50 Mondloch, M. V., Cole, D. J. & Frank, J. W. 2001 Does how you do depend on how you think you’ll do? A systematic review of the evidence for a relation between patients’ recovery expectations and health outcomes. CMAJ 165, 174–179.
51 Crow, R., Gage, H., Hampson, S., Hart, J., Kimber, A. & Thomas, H. 1999 The role of expectancies in the placebo effect and their use in the delivery of health care: a systematic review. Health Technol. Assess. 3, 1–96.
52 Pedersen, R. 2009 Empirical research on empathy in medicine: a critical review. Patient Educ. Couns. 76, 307–322. (doi:10.1016/j.pec.2009.06.012)
53 Jonas, W. B., Chez, R. A., Duffy, B. & Strand, D. 2003 Investigating the impact of optimal healing environments. Altern. Ther. Health Med. 9, 36–40.
54 Jonas, W. B. 2011 Reframing placebo in research and practice. Phil. Trans. R. Soc. B 366, 1896–1904. (doi:10.1098/rstb.2010.0405)
55 Barrett, B., Muller, D., Rakel, D., Rabago, D., Marchand, L. & Scheder, J. 2006 Placebo, meaning, and health. Perspect. Biol. Med. 49, 178–198. (doi:10.1353/pbm.2006.0019)
56 Benedetti, F. 2002 How the doctor’s words affect the patient’s brain. Eval. Health. Prof. 25, 369–386. (doi:10.1177/0163278702238051)
57 Chaput de Saintonge, D. M. & Herxheimer, A. 1994 Harnessing placebo effects in health care. Lancet 344, 995–998. (doi:10.1016/S0140-6736(94)91647-0)
58 Moerman, D. E. 2002 The meaning response and the ethics of avoiding placebos. Eval. Health Prof. 25, 399–409. (doi:10.1177/0163278702238053)
59 Colloca, L. & Miller, F. G. 2011 Harnessing the placebo effect: the need for translational research. Phil. Trans. R. Soc. B 366, 1922–1930. (doi:10.1098/rstb.2010.0399)
SEMINAR No. 12
THE IMPORTANCE OF RECOGNIZING UNEXPECTED RESULTS. Anguera de Sojo A, Ares J, Martínez MA, Pazos J, Rodríguez S, Zato JG. Serendipity and the Discovery of DNA. Found Sci. 2014; 19: 387–401.
Questions for the reading quiz and group discussion guide
1. In the first seminar of the course we discussed the article “(A)Historical science”, in which the authors give five reasons why scientists should take an interest in and know about the history of science. After reading this seminar's article on serendipity and the discovery of DNA, how can the historical example of Johannes Friedrich Miescher's research contribute to scientists in training? (Take into account the five reasons given in “(A)Historical science”.)
2. Page 390 of the article describes the inadequate conditions under which Miescher had to work on his return to Basel. How would you have dealt with that situation? Be honest in your answer.
3. Which personal characteristics best describe Miescher's scientific work? Name the three you consider most important and justify your choice.
4. Felix Hoppe-Seyler suggested that Miescher work with leucocytes instead of lymphocytes. In light of the history you now know, what do you think of that recommendation? Is experience valuable in science?
5. The procedure that Miescher developed to isolate cell nuclei and nuclein (pages 392 and 393) involved arduous work. Is this typical of scientific work, or is it rather “the exception to the rule”?
6. The authors give a number of reasons why Miescher is not associated with DNA today. Can you give another example of an unrecognized researcher whose discoveries have been crucial to scientific knowledge?
7. The text points out qualities associated with serendipitous discoveries. Can these qualities be acquired, or are they innate to the researcher?
8. In the conclusions section (page 399) the authors state: “Serendipities, or sudden illuminations, are the result of the general schematic anticipation latent in a researchers mind…” What reflection can you offer on this statement?
Found Sci (2014) 19:387–401 DOI 10.1007/s10699-014-9348-0
Serendipity and the Discovery of DNA
Áurea Anguera de Sojo · Juan Ares · María Aurora Martínez · Juan Pazos · Santiago Rodríguez · José Gabriel Zato
Published online: 19 March 2014 © Springer Science+Business Media Dordrecht 2014
Abstract
This paper presents the manner in which the DNA, the molecule of life, was discovered. Unlike what many people, even biologists, believe, it was Johannes Friedrich Miescher who originally discovered and isolated nuclein, currently known as DNA, in 1869, 75 years before Watson and Crick unveiled its structure. Also, in this paper we show, and above all demonstrate, the serendipity of this major discovery. Like many of his contemporaries, Miescher set out to discover how cells worked by means of studying and analysing their proteins. During this arduous task, he detected an unexpected substance of unpredicted properties. This new substance precipitated when he added acid to the solution and it dissolved again when adding alkali. Unexpectedly and by a mere fluke, Miescher was the first person to obtain a DNA precipitate. The paper then presents the term serendipity and discusses how it has influenced the discovery of other important scientific milestones. Finally, we address the question of whether serendipitous discoveries can be nurtured and what role the computer could play in this process.
Keywords
Computational serendipity · DNA · Miescher · Nuclein · Serendipity
1 Introduction
Very few people, even biologists, know that it was Johannes Friedrich Miescher who originally discovered and isolated nuclein, now known as DNA, the molecule of life, in 1869, that is, 75 years before Watson and Crick unveiled its structure (Watson and Crick 1953).
Á. Anguera de Sojo · J. G. Zato Technical University of Madrid, Madrid, Spain J. Ares · S. Rodríguez (B) University of A Coruña, A Coruña, Spain e-mail: santi@udc.es M. A. Martínez · J. Pazos Madrid Open University (UDIMA), 28400 Madrid, Spain
And still fewer know that this, like many other scientific discoveries, was serendipitous, that is, made by chance when he was looking for something else.
On 26 February 1869, a young Swiss physician wrote a letter from the old university town of Tübingen to his maternal uncle, Wilhelm His, a renowned physician and chair of anatomy and physiology at the University of Basel. This letter announced a major discovery: a substance that he had found in the cellular nucleus whose chemical composition was different from other proteins and any other known compound at the time.
Strictly speaking, however, the first document written about DNA structure was a letter that one of its discoverers, Francis Crick, wrote on 19 March 1953 to his then 12-year-old son Michael, who was away at a British boarding school, weeks before the publication of the article that he had co-authored with Watson. After instructing him to attentively read and comprehend the contents of the seven-page letter (Watson and Crick 1953), he explained, as shown in Fig. 1, the composition of deoxyribonucleic acid (DNA), listed its constituent bases, plus its helical structure. The letter not only recounted the discovery but also conveyed to his beloved son, who was also interested in the world of science, how excited Crick, the father, was about the breakthrough. And, rightly so, as, further down, he wrote: “We think we have found the basic copying mechanism by which life comes from life. You can understand that we are very excited.” Incidentally, and somewhat anecdotally, the manuscript was auctioned by Christie’s for no less than four million euros.
The amazing thing about this discovery was that Miescher was unaware of the major future repercussions that his research, called upon to trigger one of the major scientific revolutions of all time, was to have.
Also, Miescher was researching the chemical composition of cells, first lymphocytes and then their close relations, leucocytes, by analysing their proteins in order to find out how cells worked when he came across a substance that exhibited unexpected properties. This substance, which he called nuclein, is now known as DNA.
2 Miescher’s Biography

To write this section we have consulted the following biographies of Miescher: (Dahm 2005, 2008b; Greenstein 1943; Wolf 2003; Lagerkvist and Brenner 1998; Meuron-Ladolt 1970). Johannes Friedrich Miescher was born into a family of scientists based in Basel, Switzerland, in 1844. His father and his maternal uncle, Wilhelm His, were renowned physicians, and both had held the chair of anatomy and physiology at the University of Basel. Scientists often visited his home, where heated and interesting debates were held, especially concerning scientific and cultural questions. This was the environment and breeding ground that opened up science, its ideas, problems, perspectives and prospects to Friedrich from a very young age. This milieu naturally led to Miescher developing a profound and early interest in the natural sciences. This interest led him to enrol at medical school in Basel at the age of 17, from where he qualified with distinction in 1867. He originally had in mind to pursue the family profession; however, an illness, typhus, which he contracted during his childhood, had left him partially deaf. This was a substantial handicap for practising medicine. Looking on the bright side, however, this was actually a great stroke of luck, as his passion for science pushed him into research. And that is where his Uncle Wilhelm’s influence came in. Wilhelm His was convinced that chemistry held the key to the outstanding questions regarding tissue development. This encouraged Miescher to study biochemistry. In spring 1868, he moved to Germany to work with two of the most renowned scientists of the time: Adolf Strecker, a specialist in organic chemistry, at whose Göttingen-based
Fig. 1 Extract from Crick’s letter to his son with a drawing of the DNA spiral model
laboratory he spent 6 months, and Felix Hoppe-Seyler, a biochemist and pioneer of the recent discipline of physiological chemistry. From 1860 to 1871, Hoppe-Seyler led one of the first biochemistry laboratories in the world. It was located inside Tübingen’s medieval castle, overlooking the old part of the town. His laboratory occupied what had been the laundry, and Miescher worked in the former castle kitchens. Before Miescher’s arrival, Hoppe-Seyler had conducted innovative research into the properties of haemoglobin, which had a major impact on later studies on the structure and operation of this and other proteins. He soon rose to international fame. Under the leadership of Hoppe-Seyler, Miescher started to research the chemical composition of cells and developed a method that led him to a capital discovery. In autumn 1869, Miescher returned to Basel for a short holiday, where he started to write his first scientific publication on the analysis of the chemical composition of leucocytes, which included the discovery of nuclein (Miescher 1871). By that time he was sure about the importance of his discovery and likened the substance to a protein. After his vacation, he returned to the laboratory, this time based at the University of Leipzig, where he started to work with Carl Ludwig researching, among other things, the pain-transmitting nerve pathways of the spinal cord. And although Miescher went about his new undertaking with his characteristic rigour and thoroughness, he was nowhere near as
enthusiastic about his work as he had been in Tübingen. In fact, he spent his first few months at Leipzig working on the draft of his first publication, which he completed just before Christmas 1869. He sent the completed manuscript to Hoppe-Seyler for approval and publication in the journal that he edited. On 23 December 1869, Miescher wrote to his father, saying, “On my table lies a sealed and addressed packet. It is my manuscript, for whose shipment I have already made all necessary arrangements. I will now send it to Hoppe-Seyler in Tübingen. So, the first step into the public is done, given that Hoppe-Seyler will not refuse it.” And although it was not rejected, it was not published until 1871 due to a series of unforeseen events and changing circumstances, including the Franco-Prussian War, which broke out in 1870. After his stay at Leipzig, Miescher was offered a position as faculty member at the University of Basel. He returned to the institution in 1870, after his scientific achievements had earned him a notable reputation as a tenacious and ingenious researcher. In 1872, at the age of 28 years, after qualifying as a university lecturer, he was offered the university’s chair of physiology, the same position that his father and uncle had held. Miescher worked hard at his new job, often to the point of exhaustion. Additionally, his passion for science gave him the itch to demonstrate that he had been awarded the chair on his own merit and not by “inheritance”. At Basel, Miescher went back to his nuclein research, which he had broken off during his stay at Leipzig. He was encouraged by the fact that Hoppe-Seyler was interested in the topic and agreed to continue research on it, provided Miescher redoubled his efforts. Miescher’s goal was to describe the features of nuclein in more detail. But his working conditions were much worse than at Tübingen, which meant that the project progressed slowly.
In a letter to a friend, he complained, “In the past 2 years, I have avidly yearned for the laboratory in Tübingen Castle again, for I had no laboratory here and was merely tolerated in a small corner of the chemistry laboratory, where I could hardly move, surrounded by students; and above all the chemistry professor conducting his research here.” He continued, “You can imagine how it must feel to be hindered in the energetic pursuit of an endeavour on account of the most miserable conditions, knowing that I may never have such a fine opportunity again.” Even so, Miescher did not give up. Moved by his uncle’s interest in developmental biology, he set out to study nuclein in eggs and sperm cells. He soon found that sperm cells, composed almost exclusively of nuclei, were a perfect source of sufficient quantities of nuclein. Basel, located on the banks of the Rhine, turned out to be a good spot for his experiments. And the migration season, when salmon swim upriver to spawn, was the optimal time for collecting raw material. Thus, salmon sperm, which he used to carry out his increasingly complex research, became his new source of nuclein as of autumn 1871. The results of this research were published in 1874 (Miescher 1874). This contact with the salmon industry steered his interests towards other more mundane fields, and he never published on nuclein again. In the mid-1870s, he studied the changes that took place in salmon anatomy during their annual migrations from the ocean to the cold waters of the Rhine where they spawned, a journey during which the fish starve. He was amazed by the fact that the fish’s sexual organs grew enormously, eventually accounting for a quarter of their total body weight and taking up the space left by their wasting muscles. By virtue of this work on salmon metabolism, the Swiss government commissioned a report on the diet of Basel prison inmates in autumn 1876.
Miescher did not like the undertaking, which lasted months, but the authorities were so favourably impressed by the results that they commissioned similar reports for other prisons. Apparently every prison wanted its own menu. And it did not stop there, as educational institutions, food associations and other nutrition-related organizations sought Miescher’s advice and opinion. He ended up loathing
this job. Inquiries into the Swiss diet, cookery books for workers, nutrient tables for the national exhibition and disagreements with the Chamer dairy gave Miescher the feeling that he was the stomach warden of three million Swiss. Years later, in 1885, Miescher took on another challenge: the foundation of Basel’s first anatomical–physiological institute, which he directed successfully and responsibly. This foundation set out to promote scientific activity. He engaged renowned and accredited technicians who developed machines and instruments for performing exceedingly accurate physiological measurements. He researched the change in blood composition at altitude and discovered that it was the concentration of CO2 and not O2 that regulated breathing. With time, his increased commitments wore him down. His obsession with his work and his perfectionism led him to cut down on hours of rest. He slept less and less, kept up with his social commitments and relations and worked without respite. Exhausted, his body weakened day by day. In early 1890, he contracted tuberculosis and had to give up his work altogether and retire to a sanatorium in Davos in the Swiss Alps. He attempted to make a comeback soon afterwards and even to return to his research on nuclein, but his health broke down completely, and he died in 1895. After his death, Wilhelm His published a collection of his research with a foreword reading, “The appreciation of Miescher and his work will not diminish; on the contrary, it will grow and his discoveries and thoughts will be seeds for a fruitful future.” Not even His could imagine how true these prophetic words in honour of his nephew would be.
3 The Discovery of Nuclein

The references that we have used for this section are (Dahm 2008a; Harbers 1969; James 1970; Maderspacher 2004; Ostrowski 1970; Portugal and Cohen 1977). When, under Hoppe-Seyler’s leadership, Miescher started to research the chemical composition of cells, he focused on lymphocytes. Lymphocytes were the simplest and most independent type of cell, from which he hoped to unravel the secrets of cellular life. However, lymphocytes were hard to purify from the lymph nodes in sufficient quantities for chemical analysis. Hoppe-Seyler, who had been looking into the makeup of blood for some time, suggested that he should use leucocytes, which were closely related to lymphocytes. To do this, Miescher isolated the raw material for his experiments, leucocytes, from the pus on surgical bandages which he collected from a nearby hospital. This was a rather uninspiring start. At the time, seeping wounds were considered to be the body’s cleansing mechanism whereby it excreted harmful substances. As antiseptics were not much used, there was no problem at all in gathering large quantities of pus-filled bandages. The first thing that Miescher had to do was to develop a method to extricate the leucocytes from the surgical material. To do this, he tested several saline solutions, always checking the results under the microscope. Once he had established the extraction protocol, he proceeded to characterize and classify the proteins and lipids that he isolated from the cells. Like many of his contemporaries, his ultimate hope and aim was to analyse cellular proteins in order to discover how cells worked, which is why Miescher went about describing and classifying proteins. His path, however, was riddled with obstacles. The diversity of cellular proteins surpassed the primitive methods and instruments of the time. Even so, he unexpectedly detected a substance that he was not actually looking for, a substance which exhibited unpredicted properties.
It precipitated when he added acid to the solution and dissolved again when an alkali was added. Without realizing it and by pure luck, Miescher had for the first time obtained a precipitate of DNA.
Table 1 DNA-related timeline

Year       Event
1865       Gregor Mendel discovers that traits are inherited according to specific laws
1866       Ernst Haeckel states that the nucleus contains the factors responsible for transmitting hereditary traits
1869       Friedrich Miescher isolates DNA, which he calls nuclein
1884–1885  Oscar Hertwig, Albrecht von Kölliker, Eduard Strasburger and August Weismann demonstrate that the cell nucleus is the basis of inheritance
1889       Richard Altmann changes the name of the nuclein molecule to nucleic acid
1928       Frederick Griffith proposes a transformation principle underlying the transmission of properties of one type of bacteria to another
1929       Phoebus Levene identifies the components of DNA
1944       Oswald T. Avery, Colin MacLeod and Maclyn McCarty demonstrate that DNA is responsible for Griffith’s transformation principle
1949–1950  Erwin Chargaff finds that the composition of DNA bases varies from one species to another but that the proportions of bases are unchanged in each species
1952       Alfred Hershey and Martha Chase use viruses to confirm that DNA constitutes genetic material
1953       Rosalind Franklin and Maurice Wilkins use X-rays to demonstrate that DNA has a regularly repeated helical structure
1953       James Watson and Francis Crick discover the molecular structure of DNA, which is a double helix
1954       Researchers continue to sequence the genome of many organisms
2001       The human genome is sequenced
Where did that substance come from? While Miescher was extracting leucocytes using acids, he observed that the prolonged exposure of cells to diluted hydrochloric acid produced a cellular residue similar to isolated nuclei. He found that those nuclei were not stained yellow by iodine, which provided unquestionable proof that no proteins were present. Slightly alkaline solutions caused the nuclei to swell but did not dissolve them. Miescher thought that the mysterious precipitate must come from the nucleus. Hardly anything was known about that organelle at the time. Although it had been discovered back in 1802, its cellular function was still a cause of speculation and controversy. In 1866, however, 3 years before Miescher’s discovery, Ernst Haeckel claimed that the nucleus contained the factors responsible for transmitting hereditary traits, whose laws Gregor Mendel had discovered. This hypothesis again raised interest in investigating the role of the nucleus. The chance finding by Miescher provided the key to gathering more information about the nature of the mysterious organelle. This discovery later gave rise to a powerful line of research, still thriving today, whose highlights are reported chronologically in Table 1. Before he could identify the nuclear precipitate, Miescher had to develop several procedures for isolating highly pure nuclei. After numerous trials and errors, he came upon an effective method. This finicky and tiresome method (Dahm 2008b) had to be carried out at low temperatures to prevent the degradation of the samples and consisted of the following steps:

1. Wash the pus-filled bandages in a diluted solution of sodium sulphate to extract the leucocytes.
2. Filter the result to remove cotton fibres, leave to stand for several hours and then examine the leucocytes under the microscope to check that they are intact.
3. Wash the cells several times with heated alcohol to break down the cells and remove most of the lipids.
4. Wash pigs’ stomachs with hydrochloric acid to extract pepsin, the enzyme that digests proteins.
5. Heat and shake the cells with pepsin repeatedly for 18–24 h to form a fine, grey, grainy sediment of isolated nuclei separated from a pale yellow liquid.
6. Stir the nuclei in ether several times to remove any trace of lipids, wash with water, stain with iodine, and examine under the microscope. If they do not stain, the proteins have been successfully removed.
7. Wash again with heated alcohol and start checks. Add diluted sodium carbonate (alkaline solution), whereby the nuclei should swell and become translucent.
8. Add hydrochloric acid to obtain an insoluble, flocculent precipitate: nuclein or DNA.

Through these experiments, Miescher demonstrated that the observed precipitate came from the nuclei, on which ground he called it nuclein, a term that is still conserved in today’s deoxyribonucleic acid. Despite nuclein’s unusual behaviour, Miescher was not absolutely convinced that it was not a protein. This led him to conduct other experiments to further examine the nature of this strange molecule. First of all, he set out to determine its elementary composition. To do this, he had to purify nuclein. To eliminate the contaminating cytoplasm, he decided to apply the method that Wilhelm Kühne had described one year before in his physiology manual. Unfortunately, pepsin was not sold commercially at the time, and he had to isolate it himself. The second, equally or more disagreeable, part of his scientific enterprise began. This was to wash pigs’ stomachs in diluted hydrochloric acid and filter the extracted contents in order to obtain a crude solution of digestive enzymes. By treating the cells with this solution, he managed to demonstrate that nuclein was not a protein, because the pepsin would have digested all the proteins.
At that time, elementary analysis was one of the few methods for identifying molecules. The procedure included heating the sample together with several chemical agents that reacted selectively with the different components. The resulting products were weighed to determine the amount of each element in the sample. Although this was an extremely laborious and slow process, “factory work” in Miescher’s words, it was successful. Miescher’s isolation method showed that nuclein behaved differently from lipids and proteins, as it was not degraded by the enzymes capable of breaking down proteins, nor could it be extracted by means of strong organic solvents. The analysis of its composition caused another surprise: apart from containing carbon, oxygen, hydrogen and nitrogen, elements that were known to abound in proteins, the molecule also contained large quantities of phosphorus but no sulphur. This was a surprising finding and a distinguishing feature of the substance, as there was no other known organic molecule containing phosphorus. This result convinced Miescher that he had discovered a new type of fundamental cellular substance.
4 Getting the Results Published: A Calvary

In autumn 1869, Miescher started to write his first scientific publication on the analysis of the chemical composition of leucocytes, which included the discovery of nuclein. Months later, he moved to Leipzig, where he completed the manuscript just before Christmas. Its off-putting, even repulsive, title, On the chemical composition of pus, obscured the crucial discovery that it reported. The body of the text did, however, underline the innovative finding
of nuclein as follows, “Wir haben vielmehr hier einen Körper sui generis, mit keiner jetzt bekannten Gruppe vergleichbar”; that is, “Rather we are dealing with a sui generis entity that is not comparable to any hitherto known group.” Hoppe-Seyler considered the manuscript rather circumspectly and guardedly, as he distrusted such innovative results. This is not a surprising attitude, taking into account that a profound debate about the existence of a molecule containing phosphate in brain tissue had been held at his laboratory not long before. Hoppe-Seyler was sceptical about a young and inexperienced scientist having discovered a new fundamental molecule. Additionally, Miescher’s manuscript was to be published in Medizinisch-chemische Untersuchungen. Hoppe-Seyler was the editor of this publication, on which ground he had to be especially demanding and critical. After checking Miescher’s results, Hoppe-Seyler was very favourably inclined towards the article, although his initial analyses of the elementary composition of nuclein differed from Miescher’s. The differences were unimportant, but they were going to delay the publication process. In the face of this delay, Hoppe-Seyler suggested that Miescher submit the manuscript to another publication. However, Miescher preferred to wait until his results had been confirmed and have them published in the journal of his former mentor. In July 1870, when the Franco-Prussian War broke out, the situation worsened and there was further delay. Miescher became increasingly concerned about the hold-up, among other things because his qualification as a university professor at the University of Basel was at stake. Moreover, he feared that others might discover nuclein and publish their discovery before his article came out. And, as is known, there are only gold medals in scientific research. Desperate because of the long delay, he wrote to Hoppe-Seyler several times to get things moving.
He even considered the option of submitting his work to another journal and asked Hoppe-Seyler to return the manuscript. Finally, after a year-long lapse, Miescher received the response to his letters. Hoppe-Seyler had confirmed his results and told him that he would try to publish the paper in the next issue of the journal. Pleased to hear that his work would soon see the light, Miescher immediately wrote back to Hoppe-Seyler noting his satisfaction and including some comments on the latest findings that he had been sent. A few weeks later he received the proofs of his first publication. These, anecdotally, were accompanied by a letter from the editor apologizing for all the typos, which were the result of the printers having found his handwriting hard to decipher. The paper was finally published in 1871. It headed the table of contents of that issue of the journal edited by Hoppe-Seyler. The publication contained another two articles on nuclein, one written by one of his disciples, which proved the presence of the molecule in the nucleated leucocytes of birds and snakes, and another by the editor, confirming Miescher’s results. In 1871, Miescher again took up his research on nuclein, which he had broken off during his stay at Leipzig. Despite the difficulties of experimenting at Basel, he returned to his nuclein research, driven by his uncle’s interest in developmental biology. During his stay at Tübingen, Miescher had, as already mentioned, isolated a great deal of the very purest nuclein. Thanks to this, he was able to carry out the thorough analyses that he had planned to undertake at Tübingen. The new observations confirmed the initial results and precisely determined nuclein’s phosphorus content. In 1874, he published his results on the presence of nuclein in vertebrate sperm. At the time, scientists were researching embryonic development and the transmission of hereditary traits.
Miescher had the answer at his fingertips and even went as far as to write, “If one wants to assume that a single substance is the specific cause of fertilization, then one should undoubtedly first and foremost consider nuclein.” However, he did not believe that a single molecule was responsible for inheritance, as he could not conceive how a single substance
could produce such a broad assortment of animals as those whose sperm he had examined. He believed that the chemical structure of the molecule occasioned such variations, but that their variability was limited: not large enough to explain the differences observed among individuals of the same species, and much less so among different species. On the contrary, he argued that mechanical stimuli caused by the movement of sperm, and other processes such as those observed during nerve and muscle fibre excitation, were responsible for developing the fertilized egg. Miescher likewise put forward a hypothesis on the transmission of hereditary information, which, although he got the details wrong, is very close to the current description of information storage in DNA. It speculated on the possibility of information being encoded in the stereochemical arrangement of carbon atoms or, alternatively, their organization inside the molecule. In the same way that a 26-letter alphabet is enough to express all the words and concepts of most languages, molecules would be composed of different stereoisomers or specific forms of their constituent atoms. In other words, if there were a great many asymmetric carbon atoms in organic macromolecules, like proteins, there would be an extraordinarily large quantity of stereoisomers. For example, a molecule with only 40 asymmetric carbon atoms could have 2^40, that is, over 10^12 stereoisomers. Miescher thought that such a figure was sufficient to encode the hereditary information of all forms of life. He later suggested that molecular errors during embryo development could be prevented by merging the information of two germ cells during fertilization. These opinions anticipated what today we know to be true: intact progenitor alleles offset the defects in an allele inherited from another progenitor.
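The stereoisomer figure in Miescher's hypothesis (2 raised to the power of 40 for a molecule with 40 asymmetric carbon atoms) can be checked with a short calculation. The sketch below is only illustrative: it assumes that each asymmetric carbon can independently adopt one of two configurations, which gives an upper bound, and the function name `max_stereoisomers` is a hypothetical helper, not something taken from the article.

```python
def max_stereoisomers(n_asymmetric_carbons: int) -> int:
    """Upper bound on the number of stereoisomers of a molecule.

    Assumption (not from the article): every asymmetric carbon can take
    one of two configurations independently of the others, so the bound
    is 2**n. Internal symmetry in a real molecule would lower the count.
    """
    if n_asymmetric_carbons < 0:
        raise ValueError("number of asymmetric carbons must be non-negative")
    return 2 ** n_asymmetric_carbons


count = max_stereoisomers(40)
print(count)           # 1099511627776
print(count > 10**12)  # True
```

The result, roughly 1.1 × 10^12, agrees with the "over 10^12" stated in the text.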
5 An Overlooked Discovery

Why is Miescher’s name not associated with DNA today? The first reason is that, unlike many diseases, species, anatomical structures, etc., molecules are not named after their discoverer. Second, Miescher was a reserved and introverted person. This meant that he moved in a very small circle and had few disciples, most of whom ended up leaving him. On top of this, he publicized and promoted his discoveries poorly. Could there be anything worse? Finally, and perhaps most importantly, irrespective of his passion for scientific research, he was an irresolute perfectionist to the point that he repeated his experiments more often than necessary. This delayed his publications and reduced his visibility enormously. Miescher was himself aware that research on nuclein was being increasingly associated with other researchers. In 1889, Richard Altmann changed the name of the molecule that Miescher had discovered to nucleic acid. This irritated Miescher beyond measure, as he had always highlighted that nuclein was an acid, but his outburst got him nowhere. However, the 75 years that elapsed between his discovery and the subsequent realization of its importance were perhaps what ended up being most decisive in Miescher’s contribution being overlooked. That was when, despite all the mysteries that the molecule still holds, DNA became the icon of modern life sciences.
6 Serendipity in the Discovery of Nuclein

In terms of what it both denotes and connotes, the word serendipity has, as shown in Table 2, a very long and interesting past. Technically speaking, it means to look for something and fortuitously, by chance, slip-up, accident or luck, end up unexpectedly finding something
better. Roberts (1989) noted that serendipitous discoverers share dominant characteristics, such as sagacity, perception (also described as awareness), curiosity, flexible thinking and intensive preparation. And van Andel, who has catalogued over 1,000 cases of serendipity and agrees with Roberts on this point, claims that there are three personal characteristics of serendipitous discoverers: sagacity, adequate preparation and curiosity (Van Andel 1994). Serendipitous discoveries include champagnization by Dom Pérignon, pasteurization by Louis Pasteur, penicillin by Alexander Fleming, X-rays by Wilhelm Röntgen, vulcanization of rubber by Charles Goodyear, quanta by Max Planck, the universal computer by Alan M. Turing and von Neumann, America by Christopher Columbus, nuclear fission by Otto Hahn, Fritz Strassmann and Lise Meitner, cosmic background radiation by Arno Penzias and Robert Wilson, radio astronomy by Jansky, vaccination by Jenner, etc. Of the many, albeit evidently equivalent, definitions of serendipity, the popular catchphrase “It’s like looking for a needle in a haystack and rolling out with the farmer’s daughter or son” is the one that we think is best, because it is illuminating, that is, plain and simple, productive, and useful for drawing analogies and modelling. Its descriptiveness is beyond question, and we say it is productive since the needle in the haystack metaphor, which, trivially, means something that is hard to find, can be apprehended and, consequently, modelled differently and thus equated to the following search and/or research scenarios (Koll 2000):

• Finding a known needle in a known haystack
• Finding a known needle in an unknown haystack
• Finding an unknown needle in an unknown haystack
• Finding any needle in a haystack
• Finding the sharpest needle in a haystack
• Finding most needles in a haystack
• Finding all the needles in a haystack
• Being able to confirm that there are no needles in a haystack
• Identifying each new needle that appears in the haystack
• Finding where the haystacks are
• Finding any needles in any haystacks, etc.
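A few of the search scenarios listed above can be written down as small functions, which makes the differences between them concrete. This is only an illustrative sketch and is not taken from Koll (2000): the haystack is assumed to be a plain list of strings, and the function names and the `is_needle` and `sharpness` callbacks are hypothetical.

```python
from typing import Callable, List, Optional

Haystack = List[str]


def find_known_needle(haystack: Haystack, needle: str) -> bool:
    """Known needle, known haystack: a plain membership test."""
    return needle in haystack


def find_any_needle(haystack: Haystack,
                    is_needle: Callable[[str], bool]) -> Optional[str]:
    """Any needle: return the first item that qualifies, or None."""
    return next((item for item in haystack if is_needle(item)), None)


def find_all_needles(haystack: Haystack,
                     is_needle: Callable[[str], bool]) -> List[str]:
    """All the needles: every qualifying item, in original order."""
    return [item for item in haystack if is_needle(item)]


def find_sharpest_needle(haystack: Haystack,
                         is_needle: Callable[[str], bool],
                         sharpness: Callable[[str], int]) -> Optional[str]:
    """Sharpest needle: the qualifying item with the highest score."""
    return max(find_all_needles(haystack, is_needle),
               key=sharpness, default=None)


def confirm_no_needles(haystack: Haystack,
                       is_needle: Callable[[str], bool]) -> bool:
    """Confirm absence: True only if no item qualifies."""
    return not any(is_needle(item) for item in haystack)


# Toy data: needles carry a made-up sharpness score after the colon.
hay = ["straw", "needle:3", "straw", "needle:7"]
looks_like_needle = lambda s: s.startswith("needle")
score = lambda s: int(s.split(":")[1])

print(find_known_needle(hay, "needle:3"))                    # True
print(find_any_needle(hay, looks_like_needle))               # needle:3
print(find_all_needles(hay, looks_like_needle))              # ['needle:3', 'needle:7']
print(find_sharpest_needle(hay, looks_like_needle, score))   # needle:7
print(confirm_no_needles(["straw"], looks_like_needle))      # True
```

Confirming absence requires inspecting the whole haystack, whereas finding any needle can stop at the first match, which is one reason the scenarios are not interchangeable.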
Now, what might look more like a play on words makes absolute sense when needle is switched for issue, question, etc., and haystack for internet. The physicist Yuval Ne’eman and his colleague Aharon Kantarovich (Kantarovich and Ne’eman 1989) and, years later and more specifically, Antonio Dias de Figueiredo and José Campos (Dias De Figueiredo and Campos 2001) expounded qualitative ways of discerning whether or not a discovery and/or experiment is serendipitous. Ne’eman and Kantarovich used the Oxford English Dictionary definition: “The faculty of making happy and unexpected discoveries by accident.” The dictionary contains, however, the following sharper definition: “looking for one thing and finding another.” This second definition refers to cases where one looks for A and finds B. In this way, scientists following a procedure to solve a problem discover that the final result provides a solution to another problem, of which they were not aware. The notion of serendipity implies that the discoverer knows that he or she has discovered B or, at least, that he or she has found something unexpected and/or relevant and meaningful. Sometimes, the scientist who made the discovery is not aware of its outstanding importance and other scientists finish the job. Accordingly, Ne’eman and Kantarovich divide serendipitous events into two classes, as follows:

1. Scientists solve and/or explain B when they intended to solve and/or explain A.
2. Scientists solve and/or explain B plus A when they intended to solve and/or explain only A.
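The two classes above can be expressed as a small decision rule. The sketch below is one possible reading under stated assumptions, not the authors' own formalization: problems are represented as sets of labels, and the names `Outcome` and `classify_discovery` are hypothetical. A result counts as serendipitous here only when something outside the intended problems was solved.

```python
from enum import Enum
from typing import Set


class Outcome(Enum):
    NOT_SERENDIPITOUS = "nothing unexpected was solved"
    CLASS_1 = "B was solved when A was intended"
    CLASS_2 = "B plus A were solved when only A was intended"


def classify_discovery(intended: Set[str], solved: Set[str]) -> Outcome:
    """Classify a research result into the two classes quoted in the text.

    Assumption: `intended` holds the problems the scientists set out to
    solve (A) and `solved` holds the problems actually solved.
    """
    unexpected = solved - intended      # the "B" part, if any
    if not unexpected:
        return Outcome.NOT_SERENDIPITOUS
    if solved & intended:               # A was solved as well as B
        return Outcome.CLASS_2
    return Outcome.CLASS_1              # only B was solved


print(classify_discovery({"A"}, {"B"}).name)       # CLASS_1
print(classify_discovery({"A"}, {"A", "B"}).name)  # CLASS_2
print(classify_discovery({"A"}, {"A"}).name)       # NOT_SERENDIPITOUS
```

The rule does not capture the further condition mentioned in the text, namely that the discoverer must recognize that something unexpected and meaningful has been found; that judgement lies outside a set comparison.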
Table 2 Timeline of the term serendipity (Merton and Barber 2004)
Year | Event
Undated | The Greeks had a god for the unknown, to whom they attributed the unexpected finding of something good. It was the Arcadia-born Hermes, son of Zeus and a lesser goddess called Maia, from whom the month of May takes its name. Hermes was also the inventor of the alphabet, music, weights and measures and fire. He was the patron of orators, travellers, tradespeople and thieves, who, as a newborn, jumped out of his cradle to steal Apollo’s oxen. Additionally, he was a psychopompos, that is, responsible for conducting lost souls to the Underworld. In his honour, the two-faced milestones, signposts placed by the ancients at crossroads and/or forks in the path, land boundaries and even in gardens and at altars are called herms, a name still in use today
504–501 BC | Heraclitus said, “Unless you expect the unexpected you will never find [truth], for it is hard to discover and hard to attain”
385 BC | Through Socrates, Plato criticized the sophists (500–300 BC) in his Meno, as follows: “I understand the point you would make, Meno. Do you see what a captious argument you are introducing that, forsooth, a man cannot inquire either about what he knows or about what he does not know? For he cannot inquire about what he knows, because he knows it, and in that case is in no need of inquiry; nor again can he inquire about what he does not know, since he does not know about what he is to inquire”
1302 | Delhi-based Amir Khusrau, the greatest Persian poet of India, published Hasht Bihisht (The Eight Paradises)
1557 | Christoforo Armeno published his book Peregrinaggio di tre Giovani Figliuoli de Re di Serendip (The Three Princes of Serendip). This is the first time the word Serendip, the mediaeval name of Ceylon, now known as Sri Lanka, appeared in print. These princes had the talent of making unexpected discoveries by sagacity. This book is a loose adaptation of Hasht Bihisht
1679 | Robert Hooke signalled the role that luck plays in scientific discovery and invention in the preface of his book
1754 | Horace Walpole coined the word serendipity, which he defined as the unexpected discovery of something one is not in quest of, in a letter to his namesake Horace Mann dated 28 January 1754
1775 | Priestley referred to luck as being the observation of events arising due to unknown causes
1833 | Lord Dover published Walpole’s meticulous correspondence, and the word serendipity appeared in print for the first time and became known to the scholarly world
1854 | In his opening address as dean of the new School of Sciences of Lille, Pasteur recalled that Örsted’s experiment suddenly opened his eyes, one might say, by chance, but reminded his audience that, in the observational sciences, chance favours only the prepared mind
1865 | Claude Bernard wrote that “Experimental ideas are very often born by accident or on the occasion of a fortuitous observation”
1875 | Antiquarian, booklover and former chemist Edward Solly used the word serendipity in Notes and Queries, the newspaper founded by John Thoms in 1857, in response to a reader and introduced the term into literary circles, to which it was confined until 1930. On 12 May of the same year, R.S. Charnock of Gray’s Inn explained, in the same medium, that Serendip was the Arabic corruption of Sinhala-dvipa (island of lions), later corrupted down to Ceylon. On 26 June, again in the same newspaper, R.C. Childers corrected Charnock and claimed that Sinhaladvipa means island of the Sinhalese people, and that Ceylon is a corruption of Sinhala only
1911 | The great physiologist Walter B. Cannon, discoverer of homeostasis, described the role that chance plays in research at a lecture to Yale Medical School graduates, titled Career of Investigator, without mentioning the word serendipity
1930 | Cannon started to use the term serendipity on a regular basis; he introduced and popularized the term in scientific and medical circles
1938 | E. McDonald used the word serendipity in the annual report on the work of the Biochemical Research Foundation of the Franklin Institute in Philadelphia
1941 | In a lecture given at Cornell Medical School, Bronson Ray described how he had recently discovered the term serendipity, and recalled that Elliot Cutler, a Harvard professor of surgery, had given a talk mentioning serendipity as a desirable quality for students
1945 | In his book The Way of an Investigator, Cannon published a chapter titled Gains of Serendipity, explaining the concept
1949 | An article titled Choice of Research Projects, published in the Journal of the Franklin Institute, 247, revealed that the word had been discovered in 1938 in a detective story!
1958 | Robert K. Merton and Elinor G. Barber produced a 338-page typed manuscript (to be revised and perhaps extended) titled The Travels and Adventures of Serendipity, A Study in Historical Semantics and the Sociology of Science. The book is a clear example of anti-serendipity, as it was not published until many years later, in the early 21st century. Merton had previously given a precise definition of the meaning and importance of serendipity in 1957
Fig. 2 Dias de Figueiredo and Campos’ serendipity equations
In the light of the above works, Dias de Figueiredo and Campos introduce a simple notation, which they somewhat imprecisely denote serendipity equations. These equations, shown in Fig. 2 with their respective examples, highlight the key differences between serendipitous and non-serendipitous situations. To do this, they use P to describe a problem; KP for the problem knowledge domain; M for the unexpected metaphor or inspiring idea; KM for the metaphor knowledge domain; S for the solution; KS for the solution knowledge domain. The four equations actually rely on a spark, the metaphor, as a means of provoking insight: metaphor, unexpected metaphor, no inspiring metaphor and ignorance as a metaphor. In the first equation, the unexpected metaphor inspires the sought after solution, and is hence classed as pseudoserendipity. In the second equation, the unexpected metaphor leads to a new problem and a new solution, apart from solving the original problem. In the third equation, pragmatism takes the place of metaphor, a problem finds an echo in another problem and thus leads to a new solution. In the fourth equation, ignorance as a metaphor leads to an incorrect description of the problem domain, which implies a new problem and then a new solution.
7 Conclusions
Two conclusions can be drawn from Miescher’s research. Following on from the above, an immediate inference is that the discovery of nuclein can be classed, according to Ne’eman and Kantarovich, as a type-2 serendipitous discovery and, according to Dias de Figueiredo and Campos, as a type-3, or laxly type-2, discovery. According to Ne’eman and Kantarovich’s classification, it can be pigeonholed as a type-2 discovery because Miescher was trying to find out how cells worked by analysing proteins. He did actually make a lot of progress in this field while he serendipitously, that is, unexpectedly, came upon DNA: two for the price of one. Applying Dias de Figueiredo and Campos’ equations, his discovery is classed as type-3 serendipity, because Hoppe-Seyler’s recommendation to switch research from lymphocytes to leucocytes can be considered to be the metaphor in this case. Now, if we were to be very strict and not consider this advice as a metaphor, then Miescher’s discovery would be equation 2, serendipity without a metaphor. In sum, Miescher’s discovery can unquestionably be classed as serendipitous because, as discussed in Sect. 3, he was researching the chemical composition and arrangement of cells, analysing their proteins, and unexpectedly found something much better: what he referred to as nuclein, now known as DNA, the molecule of life. The other conclusion is more far-reaching. Serendipities, or sudden illuminations, are the result of the general schematic anticipation latent in a researcher’s mind and triggered by a fortuitous and unexpected external event. Now, only a mind prepared by some pre-existing interest, idea, thought or experience will grab the chance or window of opportunity. This raises the following open question, which we are researching: Can serendipity be programmed, planned or, in principle, generated computationally? The definition of serendipity would appear to rule out any such possibility. Van Andel (1994), for instance, states that “pure serendipity cannot be produced by a computer”. Because of this, no computer would ever be able to pass a modern Turing test containing this question.
But it could be arranged for something unforeseen to always happen. The person, and of course the computer, experiencing this would then react autonomously to at least try to understand the unexpected observation or event. The examples of true serendipity suggest that knowledge-based systems, and particularly expert systems, can at least help, and perhaps in the future link up with, the researcher to achieve serendipitous discoveries more efficiently. Additionally, it certainly is possible for a computer searching for patterns of association or related interests to contribute something that its user would take as a coincidental discovery. Therefore, a computer could automate, speed up and provide support for the discovery of a new information item, which is the first part of the above concept of serendipity. The second part of the concept, the sagacity and the wisdom required to make the connection between items of information, continues to be dependent upon the individual person. Therefore, serendipity, defined as the aptitude for making unexpected discoveries, can be developed with computational assistance insofar as it is a precious faculty in research. But, at the present time, as Hamlet said, readiness is all; hence, no surprise element, situation, etc., should be overlooked, and whatever prompted the surprise, including anomalies, novelties and enigmas, should be taken into account. Regarding the state of readiness, that is, grasping the opportunity, the only course of action open is training. Save at a small number of elite establishments, people are taught, from primary school through to higher education, that knowledge flows from a question to an answer, from a hypothesis to a thesis. As a result, student learning is increasingly tested using multiple choice questionnaires. These questionnaires contain pre-stated questions that are followed by several likewise pre-stated answers, of which only one response is correct.
Indeed, contrary to its name, the choice is single and not multiple. This may, inadvertently, convey the idea that knowledge in scientific research grows from a correct hypothesis to a correct response. This could not be further from the truth, as there is no a priori correct question or answer in research. Indeed, there is no way of knowing whether any such question or answer really exists or whether or how it can be discovered. Additionally, scientific practice shows that the reasoning in a serendipitous observation or finding is retroductive, flowing not from the question to the answer but from a surprising event to a new problem (hypothesis). In today’s education and assessment system, however, students seldom learn to reason out an original problem from a surprising observation. What is more, serendipity is rarely used in the classroom, and abductive inference is never explained. Additionally, outside the English-speaking countries, the term serendipity, and the concept that it denotes, is not even in the dictionary. And this is another field of action where there is a lot of interesting work to be done, shifting the stress from teaching towards learning, because, ultimately, it is impossible to learn for somebody else. In this respect, today’s education system is plainly failing the subject of serendipitous learning, that is, using stories, cases and hypothesis statements to turn the application of the faculty of serendipity into a regular practice in training centres.
References
Dahm, R. (2005). Friedrich Miescher and the discovery of DNA. Developmental Biology, 278(2), 274–288.
Dahm, R. (2008a). Discovering DNA: Friedrich Miescher and the early years of nucleic acid research. Human Genetics, 385, 565–581.
Dahm, R. (2008b). El descubrimiento del ADN. Investigacion y Ciencia, 122(6), 77–85.
De Figueiredo, A. D., & Campos, J. (2001). The serendipity equations. In First workshop on creative systems, international conference of case-based reasoning (ICCBR-01), Vancouver (Vol. 30, pp. 121–124). British Columbia: Canada.
De Meuron-Ladolt, M. (1970). Johannes Friedrich Miescher: His personality & the importance of his work. Bulletin der Schweizerischen Akademie der Medizinischen Wissenschaften, 25(1–2), 9–24.
Greenstein, J. P. (1943). Friedrich Miescher, 1844–1895. The Scientific Monthly, 57(5), 523–532.
Harbers, E. (1969). On the discovery of DNA by Friedrich Miescher 100 years ago. Deutsche Medizinische Wochenschrift, 94(38), 1948–1949.
James, J. (1970). Miescher’s discoveries of 1869. A centenary of nuclear chemistry. Journal of Histochemistry and Cytochemistry, 18(3), 217–219.
Kantarovich, A., & Ne’eman, Y. (1989). Serendipity as a source of evolutionary progress in science. Studies in History and Philosophy of Science Part A, 20(4), 505–529.
Koll, M. (2000). Information retrieval. Bulletin Jasis, 26(2), http://www.asis.org/Bulletin/Jan-00/track_3.html.
Lagerkvist, U., & Brenner, S. (1998). DNA pioneers and their legacy. New Haven: Yale University Press.
Maderspacher, F. (2004). Rags before the riches: Friedrich Miescher & the discovery of DNA. Human Genetics, 14(15), R608.
Merton, R. K., & Barber, E. G. (2004). The travels and adventures of serendipity: A study in sociological semantics and the sociology of science. Princeton, NJ: Princeton University Press.
Miescher, F. (1871). Ueber die chemische zusammensetzung der eiterzellen. Medicinisch-Chemische Untersuchungen, 4, 441–460.
Miescher, F. (1874). Die spermatozoen einiger wirbeltiere. Ein beitrag zur histochemie. Verhandlungen der naturforschenden Gesellschaft in Basel, VI, 138–208.
Ostrowski, W. (1970). From nucleic acids to DNA. On the 100th anniversary of the discovery of nucleic acids by Friedrich Miescher. Postepy Biochemii, 16(4), 581–587.
Portugal, L. H., & Cohen, J. S. (1977). A century of DNA: A history of the discovery of the structure and function of the genetic substance. Cambridge, MA: MIT Press.
Roberts, R. M. (1989). Serendipity: Accidental discoveries in science. New York: Wiley.
Van Andel, P. (1994). Anatomy of the unsought finding. Serendipity: Origin, history, domains, traditions, appearances, patterns and programmability. British Journal for the Philosophy of Science, 45(2), 631–648.
Watson, J. D., & Crick, F. H. C. (1953). A structure for deoxyribose nucleic acid. Nature, 171(4356), 737–738.
Wolf, G. (2003). Friedrich Miescher: The man who discovered DNA. Chemical Heritage, 21(10–11), 37–41.

Áurea Anguera de Sojo received a B.A. Degree in Law and a B.A. Degree in Economics from the Universidad Pontificia de Comillas (ICADE) (Madrid, Spain). She obtained her Ph.D. in Computer Science from the Universidad de Coruña (Spain). She has worked at Universidad Nacional de Educación a Distancia (UNED) since 1996, teaching Human Research, Law and e-Commerce. Since 2002 she
is Associate Professor at the Informatics and Law Department of Universidad Politécnica de Madrid (UPM), teaching classes in e-Commerce and Social, Professional, Legal and Ethical Aspects of Engineering. Her research interests include information systems, information retrieval and the social, economic and legal implications of ICT.

Juan Ares received his Ph.D. in Computer Science from the University of A Coruña, A Coruña, Spain, in 1994. He is Associate Professor of the Information and Communications Technologies Department at the University of A Coruña, A Coruña, Spain. He has worked as director and consultant in several organizations, including Norcontrol Soluziona and Arthur Andersen. He has edited several books and authored numerous chapters and publications. His research interests include conceptual modelling, knowledge management and software process assessment. Dr. Ares is codirector of the Software Engineering Laboratory at the University of A Coruña and he has been invited to review papers for numerous prestigious computer science journals.

María Aurora Martínez is Associate Professor at the Madrid Open University (UDIMA), Spain. She has a Ph.D. in computer science from the University of A Coruña. She has worked on several projects in several organizations. She has published a few book chapters and papers in several journals and international conferences. Her research interests in computer science include Artificial Intelligence, knowledge management and e-learning.

Juan Pazos received the first Spanish doctorate in computer science from the Universidad Politécnica de Madrid, where he is currently Full Professor at the Department of Artificial Intelligence. He set up the first Spanish Artificial Intelligence Laboratory, and was a visiting professor at Carnegie Mellon University and Sunderland University, among others. He has been/is a member of the editorial board of the following journals: AI Magazine, Heuristics, Expert Systems with Applications and Failure and Lessons Learned in Information Technology Management, among others. He is author and co-author of 10 books on computer science and of over 100 publications. His current research is on the construction of an Information Theory that integrates Computing Science, DNA and the brain. Currently, he is Emeritus Professor at the Madrid Open University (UDIMA).

Santiago Rodríguez received his Ph.D. in Computer Science from the University of A Coruña, A Coruña, Spain, in 2002. He is Associate Professor of the Information and Communications Technologies Department at the University of A Coruña, Spain. He has been a project leader in several Spanish organizations. He has authored several book chapters and publications on software engineering. His research interests include conceptual modelling, knowledge management and e-learning. Dr. Rodríguez has been invited to review papers for numerous prestigious journals.

José Gabriel Zato was born in La Coruña, Spain, in 1944. He received the B.E. and M.E. degrees in Physics engineering from the Universidad Complutense de Madrid (UCM), Spain, in 1973. He received his Ph.D. in Physics from UCM in 1982. Since 1989 he has been professor at the School of Computer Science of the Technical University of Madrid (UPM). He is the head of the Intelligent Systems for Accessible Mobility and Communication Group, with wide experience in leading regional, national and international projects, funded through public as well as private funds. His research interests include intelligent systems, usability and accessibility, and rehabilitation technologies.
SEMINAR No. 13
THE GENERALIZATION OF RESEARCH RESULTS. Rothman KJ, Gallacher JE, Hatch EE. Why representativeness should be avoided. Int J Epidemiol. 2013; 42(4): 1012-4.
Questions for the reading quiz and group discussion guide
1. After reading the article carefully, identify one point on which you agree with the authors and one on which you disagree. Justify your answer.
2. According to the authors, what is the difference between representativeness and generalization?
3. Does statistical inference serve the same purpose as scientific inference? Support your answer.
4. How important is the strategy of comparison in research?
5. Describe the limitations of descriptive studies for investigating causal relationships in science.
Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2013; all rights reserved.
International Journal of Epidemiology 2013;42:1012–1014 doi:10.1093/ije/dys223
POINT COUNTERPOINT
Why representativeness should be avoided
Kenneth J Rothman (1,2), John EJ Gallacher (3) and Elizabeth E Hatch (1)
(1) Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA, (2) RTI Health Solutions, RTI International, Research Triangle Park, NC, USA and (3) Institute of Primary Care and Public Health, Cardiff University, Cardiff, UK
Accepted 21 November 2012
The essence of knowledge is generalisation. That rubbing wood in a certain way can produce fire is a knowledge derived by generalisation from individual experiences; the statement means that rubbing wood in this way will always produce fire. The art of discovery is therefore the art of correct generalisation. What is irrelevant, such as the particular shape or size of the piece of wood used, is to be excluded from the generalisation; what is relevant, for example, the dryness of the wood, is to be included in it. The meaning of the term relevant can thus be defined: that is relevant which must be mentioned for the generalisation to be valid. The separation of relevant from irrelevant factors is the beginning of knowledge.
Hans Reichenbach [1]

Why do so many believe that selecting representative study populations is a fundamental research aim for scientific studies? This view is widely held: representativeness is exalted along with motherhood, apple pie and statistical significance. For some researchers this goal can be so important that they would deem a study not worth undertaking if representativeness cannot be achieved. That was the case for two advisors to the U.S. National Children’s Health Study, who resigned when the study design was changed so that representativeness was threatened [2]. We admire people who take a stand for principle over expediency, but what exactly is the principle that representativeness embodies? Here we suggest that representativeness may be essential for conducting opinion polls, or for public-health applications, but it is not a reasonable aim for a scientific study.

Within most scientific disciplines, sampling representativeness is incongruous with research goals. Immunologists doing experiments with hamsters do not dwell on getting a representative sample of hamsters. To the contrary, they select hamsters that are extremely unrepresentative because they are homogeneous, having identical genes, living in identical circumstances, and fed identical diets. The immunology of these hamsters may not be identical to that of people, but the expectation is that by controlling the characteristics and environment of the hamsters, inferences can be drawn that may generalize to people.

A simple view of generalization casts it as a process of constructing a correct statement about the way nature works. That process is uncertain, along with everything else in empirical science, but it is not an extrapolation from sample to target population. When Pasteur created the experiment that refuted the theory of spontaneous generation, he used a goose-neck flask to allow air to contact his cooling broth without letting organisms settle into the broth. His concern was to control the conditions in a precise way. Similarly, when John Snow conducted his natural experiment showing that London citizens imbibing diluted sewage were at much greater risk of cholera than those consuming water piped from upriver, he was not looking for a representative sample of London citizens. Instead he was looking for people whose characteristics and living conditions were comparable except for the source of their water consumption. Generalizing his findings was predicated on understanding the phenomenon at hand. When Doll and Hill studied the mortality of male British physicians in relation to their smoking habits [3], their findings about smoking and health were considered broadly applicable despite the fact that their study population was unrepresentative of the general population of tobacco users with regard to sex, race, ethnicity, social class, nationality and many other variables.

Scientific generalization relates to the elaboration of the circumstances in which a finding applies. Newton’s laws of mechanics explain many physical phenomena, although we now know that they are not applicable on very small scales, at high speeds or in strong gravitational fields. On a more modest level, consumption of contaminated shellfish can cause hepatitis A infection, but this relation is largely nullified by consumption of beverages containing at least 10% alcohol along with the shellfish [4]. The added knowledge about the modifying effect of alcohol is part of the generalization of the relation between consumption of contaminated shellfish and the risk of infection with hepatitis A. It is not representativeness of the study subjects that enhances the generalization, it is knowledge of specific conditions and an understanding of mechanism that makes for a proper generalization.

It is true that statistical inference, the process of inferring from a sample to the source from which it was drawn, is greatly aided by having a representative sample. The mistake is to think that statistical inference is the same as scientific inference. Science works on the assumption that the laws of nature are constant, but if we conflate statistical inference with scientific inference we get the reverse principle, in which the results of a study are applicable only in circumstances just like those of the study itself, and applicable only to people who are just like those in the study population.

Indeed, representativeness can be counterproductive. Suppose a study is designed to examine the therapeutic efficacy of a drug. Consider three design alternatives: option A, enrol subjects between the ages of 40 and 49; option B, enrol the number of subjects needed from three age groups, 20-29, 40-49 and 60-69, to produce about equal numbers of outcomes in each of these age categories; or option C, enrol subjects with an age distribution that has been sampled to be representative of all patients with the problem the drug is intended to treat. Which design is best?

The first design option will greatly limit age imbalances that could confound the results, thereby enhancing the study validity. It has the drawback, however, of informing about the effect only for subjects in a narrow range of age. Can inferences be drawn for patients of other ages? The answer depends on how much is known about the mechanism of effect. If little is known, then generalizing beyond the age range of study participants may be unwarranted. In that case, the study goal might be expanded to include how the effect varies by age. To do that, we would have to choose option B or C, and control for age imbalances through matching or in the analysis.

If weighing options B and C to study how the effect varies by age, it is much better to choose option B, which allows three equally informative assessments in three distinct age ranges, rather than allowing the distribution of ages in the source population to determine the study design. The same point would apply to other potential effect-modifying variables. For example, to study how an effect varies by ethnic group or socioeconomic category, it would be preferable to choose equal numbers from the different groups, rather than select subjects in proportion to their numbers in the source population.

Clearly, representativeness does not, in and of itself, deliver valid scientific inference. If a study population is representative of some larger source population, the overall associations observed in the study population may not apply to every subgroup. The overall effect is merely an average effect that has been weighted by the distribution of people across these subgroups. Thus, if you have a sample that is representative of the sex distribution in the source population, the results do not necessarily apply either to males or to females, but only to a hypothetical person of average sex. If you want to study the extent to which an effect varies by subgroup of a third variable, you need to design the research to examine the effect by subgroups.

Representative sampling is needed to implement some study designs, such as control sampling from the source population in some case-control studies, but that sampling concern about controls is not the same as the representativeness of the study population itself.

Seeking representativeness of the study population makes sense when sampling purely for descriptive purposes. Pollsters seek representative samples of their target populations to avoid polling everyone in the study population. Similarly, public-health professionals may rely on representative samples to describe the health status of specific populations. These descriptions are sampling snapshots that make no pretence of explaining how nature works. Their utility is in their description of a specific population at a point in time.

Thus we draw a line between the scientific goal of understanding a phenomenon and the practical goal of applying that knowledge to specific populations. The first goal is not enhanced by representativeness, but rather depends more on tightly controlled comparisons drawn over a variety of relevant settings. It is the second goal, the application of science, that may require representative sampling. For example, from studies not involving representative samples, regular use of aspirin has been found to reduce the incidence of bowel cancer [5]. Given a polyp-related mechanism [6], the public-health impact of aspirin chemoprevention would likely depend on the incidence or the prevalence of colorectal polyps in the target population. Measuring that impact on a target population would involve representative sampling.

Surveys of opinions, of the prevalence of disease, of habits or of environmental exposures may be informative, but they are not science in the same way that causal studies about how nature operates are science. Polls more than a few days old may become irrelevant, even if conducted with people in the same geographical area. Consequently polls are conducted in numerous places and repeated often. Prevalence surveys may also lose validity quickly over time, depending on the stability of the condition measured, and they are seldom generalizable across populations. In contrast, a scientific finding would be expected to be repeatable.

One way to distinguish science from the kind of information that surveys produce is its overall applicability in space and time. Scientific statements ideally serve to describe nature in a way that is not limited to one time and one place. Although biological principles seem to be vastly more varied than physics, and more dependent on locally varying modifying influences, the ultimate aim of biological research on humans or other species is, like that of physics, to be able to make general statements about nature.

Paradoxical though it may seem, statistical representativeness leads to particular statements about the world, not general statements about nature. As initial steps, surveys may help to seed hypotheses and give a push toward scientific understanding, but the main road to general statements on nature is through studies that control skillfully for confounding variables and thereby advance our understanding of causal mechanisms. Representative sampling does not take us down that road.
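The article's point that an overall effect is merely a subgroup-weighted average can be made concrete with a little arithmetic. The following is a minimal Python sketch, not taken from Rothman and colleagues: the function name `overall_effect`, the age-specific risk reductions and the two population age mixes are all hypothetical values chosen only for illustration.

```python
def overall_effect(stratum_effects, stratum_counts):
    """Average of stratum-specific effects, weighted by stratum size."""
    total = sum(stratum_counts.values())
    return sum(
        stratum_effects[stratum] * stratum_counts[stratum] / total
        for stratum in stratum_effects
    )


# Hypothetical absolute risk reductions by age group (same in both populations).
effects = {"20-29": 0.02, "40-49": 0.06, "60-69": 0.10}

# Two hypothetical source populations with different age distributions.
younger_population = {"20-29": 600, "40-49": 300, "60-69": 100}
older_population = {"20-29": 100, "40-49": 300, "60-69": 600}

overall_younger = overall_effect(effects, younger_population)
overall_older = overall_effect(effects, older_population)

print(f"{overall_younger:.3f}")  # 0.040
print(f"{overall_older:.3f}")    # 0.080
```

Under these assumed numbers, two perfectly representative samples would report overall effects of 0.040 and 0.080 even though the age-specific effects are identical, and neither figure matches the effect in the youngest or the oldest group. Only the stratum-specific values, which a design such as option B is meant to estimate, carry over from one population to the other.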
Conflict of interest: None declared.
References 1 2 3
4
5
6
K.J.R. and E.E.H. were supported by grant # R01 HD060680 from the National Institute of Child Health and Human Development. J.E.J.G. was supported by funding from the UK Biobank.
Published by Oxford University Press on behalf of the International Epidemiological Association ß The Author 2013; all rights reserved.
International Journal of Epidemiology 2013;42:1014–1015 doi:10.1093/ije/dyt101
Commentary: On representativeness J Mark Elwood Department of Epidemiology and Biostatistics, School of Population Health, Tamaki Innovation Campus, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand. E-mail: mark.elwood@auckland.ac.nz
Accepted
15 January 2013
Most epidemiological studies—indeed, all the interesting ones—are designed to assess a potential causal relationship. There are often difficult choices in the selection of the subjects included in the study. Whether an intervention study, an observational cohort study or a case-control study, the selection of the subjects can influence both internal validity and external validity; and further, can modify the hypothesis being tested. Internal validity is the quality controlling whether a valid assessment of cause and effect can be made within the context of the study. External validity relates to the generalizability or application of this cause and effect assessment to other populations, and is clearly a secondary issue; if the study has very low internal validity, the conclusions are likely to be wrong, and so its generalizability is irrelevant.
With high internal validity, the valid assessment of the causal relationship may be widely generalizable, and does not require that the participants be representative of those to whom the new evidence will be applied. The value of good studies is in the fact that their results can be applied to very different populations, particularly in the future. Thus to choose the best treatments, physicians apply the results from internally valid studies, usually randomized trials, often done in different countries on patients diagnosed many years previously. We do not need to assume that the subjects involved in these earlier studies are representive, in a general way, of the new patient. Similarly we apply knowledge of genetics from fruit flies to humans, because the biological relationships are generalizable although the individuals studied are not. An epidemiological example is the UK Biobank cohort study: whereas
SEMINAR No. 14
VALIDITY OF THE RECOMMENDATIONS GIVEN TO PATIENTS. Hackshaw AK, Paul EA. Breast self-examination and death from breast cancer: a meta-analysis. Br J Cancer. 2003; 88(7): 1047-53.
Questions for the reading quiz and group discussion guide
1. Have you ever heard or read the recommendation that women should perform a breast self-examination periodically? Where did you hear or read this recommendation?
2. The authors carry out a systematic review and, as part of their analysis, apply a meta-analysis. Briefly (in no more than 100 words), state what a systematic review consists of and what a meta-analysis consists of.
3. Analyze Figure 1. Are the two results consistent with each other? (Practising breast self-examination lowers the probability of death from breast cancer, but when a tumour is found by breast self-examination that probability is not lowered.)
4. Figure 3 shows results on seeking medical care and undergoing a biopsy. What emotional, medical and economic implications would these findings have?
5. The same Figure 3 shows the results on breast cancer diagnosis and breast cancer mortality. Considering these findings, would you recommend breast self-examination as a screening method for breast cancer in women?
Additional questions to be discussed in the group session
1. The methodology describes that the authors reviewed cohort studies, case-control studies and clinical trials. What are the most important differences between these designs? Which of them would provide the strongest evidence for a causal relationship?
2. What is the name of the type of graph shown in Figures 1, 2 and 3? What information do these graphs provide?
3. Why, in this case, is the scale of the horizontal axis (abscissa) of Figures 1, 2 and 3 logarithmic?
British Journal of Cancer (2003) 88, 1047–1053 © 2003 Cancer Research UK All rights reserved 0007–0920/03 $25.00
www.bjcancer.com
Breast self-examination and death from breast cancer: a meta-analysis
AK Hackshaw*,1 and EA Paul1
1 Barts & The London School of Medicine & Dentistry, Wolfson Institute of Environmental & Preventive Medicine, Queen Mary, University of London, Charterhouse Square, London EC1M 6BQ, UK
Breast self-examination (BSE) is widely recommended for breast cancer prevention. Following recent controversy over the efficacy of mammography, it may be seen as an alternative. We present a meta-analysis of the effect of regular BSE on breast cancer mortality. From a search of the medical literature, 20 observational studies and three clinical trials were identified that reported on breast cancer death rates or rates of advanced breast cancer (a marker of death) according to BSE practice. A lower risk of mortality or advanced breast cancer was only found in studies of women with breast cancer who reported practising BSE before diagnosis (mortality: pooled relative risk 0.64, 95% CI 0.56–0.73; advanced cancer, pooled relative risk 0.60, 95% CI 0.46–0.80). The results are probably due to bias and confounding. There was no difference in death rate in studies on women who detected their cancer during an examination (pooled relative risk 0.90, 95% CI 0.72–1.12). None of the trials of BSE training (in which most women reported practising it regularly) showed lower mortality in the BSE group (pooled relative risk 1.01, 95% CI 0.92–1.12). They did show that BSE is associated with considerably more women seeking medical advice and having biopsies. Regular BSE is not an effective method of reducing breast cancer mortality.
British Journal of Cancer (2003) 88, 1047–1053. doi:10.1038/sj.bjc.6600847 www.bjcancer.com © 2003 Cancer Research UK
Keywords: breast self-examination; breast cancer; mortality; meta-analysis
*Correspondence: Dr AK Hackshaw; E-mail: a.k.hackshaw@qmul.ac.uk Received 6 November 2002; revised 16 January 2003; accepted 24 January 2003
For many years, women have been taught methods of breast self-examination (BSE) and it is recommended that they practise this regularly (Boyle et al, 1995; Shapiro et al, 1998), usually every month. There is a belief that among women who practise BSE, those who develop breast cancer are more likely to find it at an earlier stage and this is expected to lead to earlier treatment and hence decrease their risk of dying from the disease. Breast self-examination is appealing as a routine screening method because the examination has no financial cost (apart from the initial instruction sessions) and can be conducted in private.
Most studies on the effectiveness of BSE have been observational. They suggest that women who practise BSE are more likely to find their breast tumour themselves, that the tumour tends to be smaller and that these women have an increased survival (Hackshaw, 1996; International Agency for Research on Cancer (IARC), 2002). However, survival time as an outcome measure can be misleading because of lead-time bias, in which BSE only identifies cancers at an earlier stage but has no effect on prognosis. Using mortality rates instead of survival time can overcome much of this bias.
Recently, the International Agency for Research on Cancer (2002) published a review on breast-cancer screening that reported the individual results from observational studies of BSE in relation to survival and stage of cancer, and those from randomised trials and cohort studies in relation to mortality. We here, however, present a meta-analysis of BSE and breast-cancer mortality by reviewing the published evidence from both observational studies and randomised trials, including those based on women with advanced breast cancer (used as a marker of death), and pooling the results. We look at three aspects of BSE: women who practise BSE, women who find their cancer during one of their regular examinations, and women who are taught BSE and advised to practise it regularly.
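The lead-time bias mentioned in the introduction can be illustrated with a small sketch. The patient and all ages below are hypothetical and chosen only for illustration; they do not come from the paper. The point is that detecting the same cancer earlier lengthens survival counted from diagnosis even when the age at death does not change.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient whose age at death is fixed by the disease course."""
    age_at_diagnosis: float
    age_at_death: float

    def survival_after_diagnosis(self) -> float:
        # Survival time is counted from diagnosis, not from disease onset.
        return self.age_at_death - self.age_at_diagnosis


# Same woman, same age at death; only the moment of detection differs.
found_by_chance = Patient(age_at_diagnosis=62.0, age_at_death=66.0)
found_by_bse = Patient(age_at_diagnosis=60.0, age_at_death=66.0)

# Lead time: how much earlier the cancer was detected.
lead_time = found_by_chance.age_at_diagnosis - found_by_bse.age_at_diagnosis

print("Survival if found by chance:", found_by_chance.survival_after_diagnosis())
print("Survival if found by BSE:   ", found_by_bse.survival_after_diagnosis())
print("Lead time:                  ", lead_time)
```

In this sketch the whole gain in survival equals the lead time, while mortality (death at the same age) is identical in both scenarios, which is why the authors prefer mortality rates as the outcome.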
METHODS
Data sources and study selection
Studies that reported on rates of death from breast cancer or rates of advanced breast cancer (a marker of death) according to BSE practice were identified from Medline, Embase and Cancerlit (1966–2002), and included in the analysis. Keywords used were ‘breast cancer’ with ‘BSE’ or ‘self-examination’. In some studies, women were classified according to whether they practised BSE regularly or not. In other studies, women were classified according to the method of detecting the cancer: during BSE, by chance (e.g. while washing or dressing), mammography or examination by physician. Below we describe the main common features of the studies, but in the interest of brevity we do not provide further details, since these can be obtained directly from the individual published reports. We included results on mortality or, as a surrogate for death, advanced breast cancer (defined as stage III or IV, regional or distant). Analyses are presented separately for these two outcomes.
Epidemiology
Breast self-examination and death AK Hackshaw and EA Paul
The following types of studies were included in the analyses.
Studies on women newly diagnosed with breast cancer
A total of 15 studies were based only on women newly diagnosed with breast cancer (Greenwald et al, 1978; Smith et al, 1980; Feldman et al, 1981; Tamburini et al, 1981; Foster and Costanza, 1984; Owen et al, 1985; Smith and Burns, 1985; Ogawa et al, 1987; Huguley et al, 1988; Kuroishi et al, 1992; Le Geyte et al, 1992; Kurebayashi et al, 1994; Auvinen et al, 1996; McPherson et al, 1997; Koibuchi et al, 1998) and they were divided into four groups based on two different measures of outcome and two different measures of exposure:
1. Women who reported practising BSE or not and were then followed up for several years (usually about 5 years) to see who later died from breast cancer.
2. Women who reported whether they found their cancer during self-examination or by chance and were then followed up for several years to see who later died from breast cancer.
3. Women found to have advanced breast cancer at the time of initial diagnosis who reported retrospectively on whether they practised BSE or not.
4. Women found to have advanced breast cancer at the time of initial diagnosis who reported retrospectively on whether they had found their cancer during self-examination or by chance (e.g. washing and dressing).
In most studies mammography use was not stated, although it would not have been offered to many women since these studies were based on women diagnosed before the mid-1980s when such screening was not commonplace. In other studies, mammography use was low (2% of cancers detected by mammography, in Feldman et al, 1981) or similar between the BSE and non-BSE groups (Smith and Burns, 1985). In one study (Koibuchi et al, 1998), all women had a clinical examination as part of a mass-screening programme. A difference between the BSE and non-BSE group was reported with respect to mammography use in one study (18% in the BSE group compared to 7% in the non-BSE group, Huguley et al, 1988) and mass screening by clinician examination in another study (37% in the BSE group and 21% in the non-BSE group, Kuroishi et al, 1992); analyses were performed both with and without these two studies.
Cohort studies of women with and without breast cancer
The two cohort studies were from Finland (Gastrin et al, 1994) and the USA (Holmberg et al, 1997). In these, breast cancer death rates according to BSE practice were reported in populations of women followed up for over 13 years. In one study (Gastrin et al, 1994), mammography was used only as a method of further investigation after a woman found a lump by BSE.
In the other study, the follow-up period was until 1972 when mammography was not commonplace.
Case-control studies of women with and without breast cancer
There were three case-control studies, two from the USA (Newcomb et al, 1991; Muscat and Huncharek, 1992) and one from Canada that was nested within a randomised trial of mammography (Harvey et al, 1997). In each study, cases (women who had died from breast cancer or had advanced cancer) and age-matched controls (women without breast cancer) were asked about their past BSE practice. One study also further matched for screening centre and enrolment year.
Clinical trials
One trial, from the UK, was nonrandomised (UK Trial of Early Detection of Breast Cancer Group, 1999) and two, from China (Thomas et al, 1997, 2002) and Russia (Semiglazov et al, 1992, 1996, 1999), were randomised. The nonrandomised trial was based on comparing the breast cancer death rates after 16 years follow-up in two centres, in which women aged 45–64 years were invited to attend a BSE session, with the rates in four centres, in which women were not invited for either BSE training or mammography. The two randomised trials were large. The one from China (Thomas et al, 1997, 2002) was based on randomising 520 factories in Shanghai, in which all women in a particular factory were either given three sessions on how to practise BSE or they were not. In total, there were about 267 000 women aged 30–69 years. Recruitment began in 1989 and interim results were reported after 5 years. The trial in Russia involved two cities (Moscow and St Petersburg (formerly Leningrad)), but only the results from St Petersburg have been published; approximately five (Semiglazov et al, 1992), nine (Semiglazov et al, 1996) and 13 (Semiglazov et al, 1999) years after recruitment began in 1985. This trial included about 120 000 women aged 40–64 years and, similar to the one in China, randomisation was undertaken according to the place of work and BSE was taught during several sessions. Information on the following was also extracted from the reports of the two randomised trials: the number of women who sought medical advice after finding a lump, the number who had a biopsy and the number diagnosed with breast cancer. Mammography screening was not available to women in either trial.
Attendance of the BSE training sessions in the UK trial was low; only 31 and 53% of women in the two centres, respectively, accepted the invitation to be taught BSE. Attendance in the trial from China was high; 98% received baseline instruction and in one cohort with complete information on attendance (representing about half the women in the BSE group in the trial) 84% had attended all three training sessions. The reports from the Russian trial were based on women who had received training in BSE.
Definition of BSE practice
In the studies based on only women newly diagnosed with breast cancer, the definition of BSE practice varied. It was monthly (Ogawa et al, 1987; Auvinen et al, 1996), monthly or several times a year (Feldman et al, 1981; Tamburini et al, 1981; Foster and Costanza, 1984; Koibuchi et al, 1998) or at least two (Smith and Burns, 1985) or three (Smith et al, 1980) times per year. In several studies, about half or more of the women in the BSE groups had reported that they checked their breasts monthly (Feldman et al, 1981; Smith and Burns, 1985; Ogawa et al, 1987; Le Geyte et al, 1992). In the two cohort studies (Gastrin et al, 1994; Holmberg et al, 1997) women were classified as BSE practitioners if they did so monthly. In the Russian trial, 76% of women taught BSE reported practising it at least every 2 months (Semiglazov et al, 1999), and in the Chinese trial women practised BSE at least every 4–5 months during the first 4–5 years of the trial and were strongly encouraged to practise it monthly (Thomas et al, 2002).
Statistical analysis
The relative risks (or odds ratio) and 95% confidence intervals (CI) were estimated from the data in each study. They were pooled on a log scale and weighted by the inverse of the variance, with allowance for any heterogeneity (DerSimonian and Laird, 1986).
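The pooling described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each study's variance can be approximated from its reported 95% CI, and the function names are hypothetical. The inputs are the mortality results at the longest follow-up of the three trials (UK, China, Russia) as reported in Table 3.

```python
import math

Z = 1.96  # normal quantile for a two-sided 95% interval


def log_rr_and_variance(rr, lower, upper):
    """Approximate log(RR) and its variance from a reported 95% CI (an assumption)."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z)
    return math.log(rr), se ** 2


def pool_dersimonian_laird(estimates):
    """Pool (rr, lower, upper) tuples on the log scale with inverse-variance weights."""
    ys, vs = zip(*(log_rr_and_variance(*e) for e in estimates))
    w = [1 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Cochran's Q: weighted scatter of the studies around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance (DerSimonian-Laird moment estimator), floored at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "lower": math.exp(pooled - Z * se),
        "upper": math.exp(pooled + Z * se),
        "q": q,
        "tau2": tau2,
    }


# (relative risk, lower 95% limit, upper 95% limit) for the UK, China and Russia trials.
trials = [(0.99, 0.87, 1.12), (1.03, 0.81, 1.31), (1.07, 0.86, 1.34)]
result = pool_dersimonian_laird(trials)
print({name: round(value, 2) for name, value in result.items()})
```

Under these assumptions the sketch gives a pooled relative risk of about 1.01 with limits of about 0.92 and 1.12, in line with the pooled mortality estimate the paper reports for the three trials. The estimated between-study variance is zero here, so the random-effects and fixed-effect results coincide.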
RESULTS
Table 1 shows results from the observational studies based only on women with breast cancer. Figure 1 shows the individual relative risk of death from breast cancer and the pooled relative risks. Overall, there appears to be a statistically significant 36% reduction in the risk of death (relative risk 0.64, 95% CI 0.56–0.73, P<0.001) in those who practise BSE. There was no evidence of heterogeneity between the studies
Table 1 Observational studies of women with breast cancer; the number of deaths or advanced cancers and relative risk of dying from breast cancer in women who practise BSE compared to those who do not and in those who found their cancer during an examination

Women who practise BSE regularly vs women who never or rarely practise BSE
Study (first author), country | Age of women (years) | Practise BSE regularly: no. of breast cancer deaths or advanced cancers | Practise BSE regularly: no. of women with breast cancer | Never or rarely practise BSE: no. of breast cancer deaths or advanced cancers | Never or rarely practise BSE: no. of women with breast cancer | Relative risk of death or advanced cancer (95% CI)
Breast cancer death
Foster, 1984, USA | All (20–97) | 61 | 424 | 108 | 411 | 0.55 (0.40–0.75)
Foster, 1984, USA | 22–49 | 18 | 134 | 15 | 58 | 0.52 (0.26–1.03)
Foster, 1984, USA | 50–97 | 43 | 287 | 92 | 346 | 0.56 (0.39–0.81)
Huguley, 1988, USA | All (unspec.) | 327 | 1398 | 260 | 681 | 0.61 (0.52–0.72)
Le Geyte, 1992, UK | 15–59 | 60 | 226 | 130 | 390 | 0.80 (0.59–1.08)
Kurebayashi, 1994, Japan | All (unspec.) | 3 | 91 | 10 | 132 | 0.44 (0.12–1.58)
Auvinen, 1996, Finland | All (unspec.) | - | 246 | - | 104 | 0.85 (0.53–1.33)
Advanced breast cancer
Smith, 1980, USA | 30–80 | 44 | 107 | 24 | 57 | 0.98 (0.59–1.61)
Feldman, 1981, USA | All (unspec.) | 137 | 408 | 256 | 588 | 0.77 (0.63–0.95)
Tamburini, 1981, Italy | 35–64 | 34 | 170 | 90 | 330 | 0.73 (0.49–1.09)
Foster, 1984, USA | All (20–97) | 41 | 422 | 123 | 410 | 0.32 (0.23–0.46)
Smith, 1985, USA | 20–54 | 75 | 185 | 67 | 134 | 0.81 (0.58–1.13)
Ogawa, 1987, Japan | 25–77 | 3 | 30 | 20 | 116 | 0.58 (0.17–1.95)
Huguley, 1988, USA | All (unspec.) | 225 | 1396 | 246 | 680 | 0.45 (0.37–0.53)
Kurebayashi, 1994, Japan | All (unspec.) | 7 | 91 | 18 | 132 | 0.56 (0.24–1.35)
Koibuchi, 1998, Japan | All (unspec.) | 3 | 68 | 18 | 174 | 0.43 (0.13–1.45)

Cancer found during BSE vs cancer found by accident (a)
Study (first author), country | Age of women (years) | Found during BSE: no. of breast cancer deaths or advanced cancers | Found during BSE: no. of women with breast cancer | Found by accident (a): no. of breast cancer deaths or advanced cancers | Found by accident (a): no. of women with breast cancer | Relative risk of death or advanced cancer (95% CI)
Breast cancer death
Greenwald, 1978, USA | All (unspec.) | 16 | 55 | 65 | 182 | 0.81 (0.47–1.41)
Kuroishi, 1992, Japan | All (unspec.) | - | 347 | - | 1322 | 0.57 (0.33–0.99)
Auvinen, 1996, Finland | All (unspec.) | - | 34 | - | 104 | 1.06 (0.88–1.26)
McPherson, 1997, USA | 40–49 | 33 | 200 | 70 | 364 | 0.86 (0.57–1.30)
Advanced breast cancer
Greenwald, 1978, USA | All (unspec.) | 11 | 55 | 56 | 182 | 0.65 (0.34–1.24)
Owen, 1985, USA | All (unspec.) | 76 | 185 | 539 | 1168 | 0.89 (0.70–1.13)
Kuroishi, 1992, Japan | All (unspec.) | 28 | 355 | 224 | 1327 | 0.47 (0.32–0.69)

(a) In one study (Greenwald et al, 1978), 20% of women in this group practised BSE although found their cancer by chance. Italics indicate that the data were estimated from results presented in the paper.
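As an illustration of how a relative risk in Table 1 follows from the tabulated counts, the sketch below recomputes the Foster, 1984 (all ages) mortality row. The function name is hypothetical, and the log-scale (Katz) confidence interval is an assumption about method; the original studies may have used other approaches, so the interval need not match the published one exactly.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with a log-scale (Katz) 95% confidence interval."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_exposed - 1 / n_exposed
                   + 1 / events_unexposed - 1 / n_unexposed)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# Foster, 1984, all ages: 61 deaths among 424 women who practised BSE,
# 108 deaths among 411 women who never or rarely practised it.
rr, lower, upper = relative_risk(61, 424, 108, 411)
print(f"RR = {rr:.2f} ({lower:.2f} to {upper:.2f})")
```

The point estimate reproduces the tabulated value of 0.55; the interval from this approximation is close to, but slightly narrower than, the published 0.40–0.75.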
[Figure 1. Horizontal axis: relative risk of dying from breast cancer in BSE vs non-BSE groups (ticks at 0.2, 0.5, 1, 2, 3, 4, 5). Panel "Practise BSE vs do not practise BSE": All 0.64 (0.56–0.73); All (excl. Huguley) 0.69 (0.56–0.85). Panel "Cancer found by BSE vs found by chance": All 0.90 (0.72–1.12); All (excl. Kuroishi) 1.00 (0.85–1.18).]
Figure 1 Observational studies of women with breast cancer, comparing the breast cancer death rates between the BSE and non-BSE groups. A test for heterogeneity between the studies yielded a P-value of 0.41 for those studies based on women who practise BSE and a P-value of 0.26 for those based on finding cancer by BSE.
[Figure 2. Horizontal axis: relative risk of having advanced breast cancer in BSE vs non-BSE groups (ticks at 0.2, 0.5, 1, 2, 3, 4, 5). Panel "Practise BSE vs do not practise BSE": All 0.60 (0.46–0.80). Panel "Cancer found by BSE vs found by chance": All 0.66 (0.44–1.01).]
Figure 2 Observational studies of women with breast cancer, comparing the rates of advanced breast cancer between the BSE and non-BSE groups. A test for heterogeneity between the studies yielded a P-value of <0.001 for those studies based on women who practise BSE and a P-value of 0.051 for those based on finding cancer by BSE.
(P = 0.41). If the study in which some women had mammography (Huguley et al, 1988) is excluded, the estimate is not substantially different (relative risk 0.69, 95% CI 0.56–0.85, P<0.001). In those women who reported that their cancer was detected during self-examination, there was no evidence of a reduction in the risk of death compared to those who found their cancer by chance (relative risk 0.90, 95% CI 0.72–1.12, P = 0.34). Again there was no strong evidence of heterogeneity between the results (P = 0.26). If the study (Kuroishi et al, 1992) in which some women had mass screening is excluded, the estimate is not much changed; the pooled relative risk is 1.00 (95% CI 0.85–1.18, P = 0.98).
Figure 2 shows the relative risk of having advanced breast cancer in women who practise BSE compared to those who did not, among all women newly diagnosed with breast cancer. There is a 40% reduction in the risk (relative risk 0.60, 95% CI 0.46–0.80, P<0.001). Although there was evidence of heterogeneity (P<0.001), all the studies reported a reduction in risk. In women who found their cancer during an examination, there was a 34% reduction in risk (relative risk 0.66, 95% CI 0.44–1.01, P = 0.06).
The results from the cohort and case-control studies of women with and without breast cancer according to BSE practice are shown in Table 2. The two cohort studies show inconsistent results; one indicates a statistically significant 29% reduction in the risk of death associated with BSE practice (relative risk 0.71, 95% CI 0.57–0.87) and the other shows no effect at all (relative risk 1.03, 95% CI 0.95–1.12). The pooled estimate is not statistically significant (relative risk 0.87, 95% CI 0.62–1.23, P = 0.42). None of the case-control studies found statistically significant effects with only one suggesting a benefit (Harvey et al, 1997). The results for two of the case-control studies were not materially altered after adjustment for mammography use (Newcomb et al, 1991; Muscat and Huncharek, 1992).
Table 3 provides the main results from the trials of teaching BSE and Figure 3 shows the relative risks and the pooled estimates for the main outcomes. The nonrandomised trial in the UK showed no effect overall (relative risk 0.99) even after 16 years of follow-up, although there was a difference between the two BSE centres; one showing a reduction in mortality (relative risk 0.79) and the other not (relative risk 1.09), which cannot readily be explained. In the Russian trial, twice as many women in the BSE group sought medical advice compared to the non-BSE group (Figure 3), and this was consistent throughout the course of the trial (BSE vs non-BSE groups: 5.6 vs 2.8% at 5 years, 7.2 vs 3.5%
Table 2 Observational studies of women with and without breast cancer; number of deaths and relative risk of dying from breast cancer in women who practise BSE compared to those who do not

Study (first author), country | Age of women (years) | Women who practise BSE: no. of breast cancer deaths (a) | Women who practise BSE: no. of women without breast cancer | Women who do not practise BSE: no. of breast cancer deaths (a) | Women who do not practise BSE: no. of women without breast cancer | Relative risk or odds ratio of death (95% CI, if available)
Cohort studies
Gastrin, 1994, Finland | All (≥20) | 95 | 28 780 | - | - | 0.71 (0.57–0.87)
Gastrin, 1994, Finland | 20–49 | 24 | - | - | - | 0.64
Gastrin, 1994, Finland | ≥50 | 71 | - | - | - | 0.74
Holmberg, 1997, USA | All | 925 | 176 677 | 1375 | 271 179 | 1.03 (0.95–1.12)
Holmberg, 1997, USA | ≤39 | - | - | - | - | 0.95
Holmberg, 1997, USA | 40–49 | - | - | - | - | 1.07
Holmberg, 1997, USA | 50–59 | - | - | - | - | 1.03
Holmberg, 1997, USA | ≥60 | - | - | - | - | 1.02
Case-control studies
Muscat, 1992, USA | All (unspec.) | 251 | 430 | 184 | 457 | 1.45 (1.15–1.83)
Newcomb, 1991, USA | 20–80 | 168 | 344 | 41 | 89 | 1.06 (0.70–1.60)
Harvey, 1997, Canada | All (40+) | 97 | 1095 | 121 | 1091 | 0.79 (0.59–1.04)

(a) Or women with advanced breast cancer (Muscat, 1992; Newcomb, 1991). Dashes indicate that the data were not available from the published paper.
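For the case-control rows of Table 2 the measure of association is an odds ratio rather than a relative risk. The sketch below recomputes the Muscat, 1992 row from its counts. The function name is hypothetical, and the Woolf (log-scale) confidence interval is an assumption about method rather than something the paper states.

```python
import math


def odds_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% confidence interval."""
    or_ = (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)
    # Standard error of log(OR): square root of the sum of reciprocal cell counts.
    se = math.sqrt(1 / cases_exposed + 1 / controls_exposed
                   + 1 / cases_unexposed + 1 / controls_unexposed)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Muscat, 1992: 251 cases and 430 controls practised BSE;
# 184 cases and 457 controls did not.
or_, lower, upper = odds_ratio(251, 430, 184, 457)
print(f"OR = {or_:.2f} ({lower:.2f} to {upper:.2f})")
```

With these counts the sketch returns an odds ratio of about 1.45 with limits of about 1.15 and 1.83, matching the tabulated row.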
Table 3 Clinical trials of BSE; the number of biopsies, breast cancer cases and deaths and the relative risk of dying from breast cancer in women who practise BSE compared to those who do not

Study | Age of women (years) | BSE training: biopsies | BSE training: breast cancers | BSE training: deaths | BSE training: number randomised | No BSE training: biopsies | No BSE training: breast cancers | No BSE training: deaths | No BSE training: number randomised | Relative risk of death (95% CI)
Nonrandomised
UK Trial, 1999 (after 16 years) | All (45–74) | - | - | 661 | 63 373 (b) | - | - | 1312 | 127 123 (b) | 0.99 (0.87–1.12) (c)
UK Trial, 1999 (after 16 years) | 45–49 (a) | - | - | 236 | - | - | - | 511 | - | 0.94 (0.80–1.12)
UK Trial, 1999 (after 16 years) | 50–54 | - | - | 159 | - | - | - | 318 | - | 0.96 (0.78–1.18)
UK Trial, 1999 (after 16 years) | 55–59 | - | - | 189 | - | - | - | 388 | - | 0.98 (0.81–1.19)
UK Trial, 1999 (after 16 years) | 60–64 | - | - | 165 | - | - | - | 318 | - | 0.99 (0.81–1.22)
Randomised
China (after 5 years) | 30–69 | 1788 | 331 | 25 | 133 375 | 945 | 322 | 25 | 133 665 | 1.00 (0.58–1.74)
China (after 10 years) | 30–69 | 3627 | 857 | 135 | 132 979 | 2398 | 890 | 131 | 133 085 | 1.03 (0.81–1.31)
Russia (after 5 years) | 40–64 | 662 | 190 | - | 60 221 | 467 | 192 | - | 60 089 | -
Russia (after 9 years) | 40–64 | 1094 | 449 | 99 | 57 712 | 757 | 406 | 97 | 64 759 | 1.15 (0.87–1.52)
Russia (after 13 years) | 40–64 | 1138 | 493 | 157 | 57 712 | 797 | 446 | 164 | 64 759 | 1.07 (0.86–1.34)

(a) Includes women from additional cohorts. Dashes indicate that the data were not available from the published paper. Publications: China (Thomas, 1997, 2002) and Russia (Semiglazov, 1992, 1996, 1999). (b) For the UK trial, this is the number of women invited to attend BSE training or the number that were not (i.e. in the comparison centres). (c) Age adjusted.
at 9 years and 7.5 vs 3.8% at 13 years). After 10 years, the two trials (Russia and China) show that overall there were 53% more biopsies in women who were taught BSE compared to those who were not; this was highly statistically significant (relative risk of having a biopsy 1.53, 95% CI 1.44–1.63, P<0.001). The trials also suggest that at 5 years, women taught BSE were no more likely to be diagnosed with breast cancer than those not taught BSE (relative risk 1.01). After a longer follow-up (9–10 years), there is an indication from the Russian trial that more cancers were found in the BSE group (24% more women diagnosed with breast cancer), but this was not found in the trial from China.
The risk of dying from breast cancer was remarkably consistent between the three trials and over the different follow-up periods. There was no evidence of an advantage in the BSE group after any length of follow-up. The pooled relative risk was 1.01 with narrow 95% confidence limits (0.92–1.12, P = 0.79); there was no evidence of heterogeneity (P = 0.94). The results were not materially different if the nonrandomised trial from the UK was excluded, pooled relative risk 1.05 (95% CI 0.90–1.24, P = 0.54).
1052 0.2
Relative risk (95% Cl) 0.5 1 2 3 4 5
Women who found their cancer during an examination
Seeking medical advice Russia (5 years) (9 years) (13 years)
No evidence of a reduction in mortality was found in women who reported that they found their cancer during self examination (Figure 1). There was an indication that there was an effect when advanced cancer was used as the outcome measure, but the overall result was not statistically significant (Figure 2).
Undergoing a biopsy
Women who are taught BSE
China (5 years) (10 years) Russia (5 years) (9 years) (13 years) Both (5 years) Both (9 and 10 years)
1.64 (1.24− 2.18) 1.53 (1.44− 1.63)
Epidemiology
Diagnosis of breast cancer China (5 years) (10 years) Russia (5 years) (9 years) (13 years) Both (5 years) Both (9 and 10 years)
1.01 (0.90− 1.15) 1.09 (0.86− 1.38)
Death from breast cancer China (5 years) (10 years) Russia (9 years) (13 years) UK (16 years) All three (longest follow-up)
Breast self-examination as an alternative to mammography 1.01 (0.92 −1.12)
Figure 3 Trials of BSE training. The rates for specified outcomes are compared between women invited for BSE training and those who were not. A test for heterogeneity between the trials yielded a P-value of 0.94 in relation to the results on mortality.
There was little evidence that the effect of BSE varied between women in different age groups (Tables 1, 2 and 3).
DISCUSSION Women who practise BSE Only observational studies of women with breast cancer who were asked about their history of regular BSE practice consistently found a difference in breast cancer mortality associated with BSE. The studies are likely to be affected by several biases – publication bias, selection bias, recall bias, lead-time bias and length-biased sampling (there may be a larger proportion of slow-growing cancers diagnosed in women who practise BSE; slow-growing cancers tend to have better prognoses). Several studies have shown that various characteristics that are likely to be associated with dying from breast cancer were also associated with BSE practice, but analyses adjusting for the potential effect of such confounding on mortality were not reported. Women who practised BSE tended to be younger, premenopausal and of a higher socioeconomic status (Smith et al, 1980; Feldman et al, 1981; Tamburini et al, 1981; Huguley et al, 1988; Le Geyte et al, 1992; Auvinen et al, 1996). Much of the reduction in mortality observed in these studies might therefore be explained by a combination of these and other confounding factors as well as the aforementioned biases, rather than a real effect of BSE. British Journal of Cancer (2003) 88(7), 1047 – 1053
The two randomised trials of mortality are unaffected by bias and both show no effect of BSE on breast cancer mortality, after 5 or 13 years. In the Russian trial, there was an increase in breast cancer diagnoses in women taught BSE after 9 and 13 years, but this was not reflected in a decrease in mortality at either time. Both trials also show that women in the BSE group are much more likely to be referred for a biopsy. At about 10 years, the overall malignant to benign biopsy ratio was 1 : 2.3 in the BSE group and 1 : 1.3 in the non-BSE group, indicating that in women who were taught BSE, there is one extra biopsy in women without cancer for every diagnosed case of breast cancer. Despite the initial appeal of regular BSE, the evidence shows that it is likely to result in a considerable increase in women without breast cancer who have breast biopsy with its associated anxiety and counselling, but with no benefit. The two randomised and one nonrandomised trials were based on about 580 000 women and 2344 breast cancer deaths; the conclusions are therefore robust. Although the two randomised trials were based on BSE training, the negative results are also, to some extent, applicable to BSE practice since uptake was high and women reported practising BSE regularly (every 2 months in the Russian trial and every 4 – 5 months in the Chinese trial).
The results presented here on BSE may have an impact on the current debate over the use of mammography screening. Despite clear evidence to the contrary (Wald et al, 1993; Nystrom et al, 1996; IARC 2002), it has been suggested recently that mammography screening is not effective in reducing mortality for breast cancer (Olsen and Gotzsche, 2000, 2001). Breast self-examination may be considered to be an alternative. The conclusion that mammography was not worthwhile was based on only one out of the six existing randomised trials of breast-cancer mortality comparing mammography with no screening. The other five trials were rejected on the grounds of perceived differences between the screened and unscreened groups at baseline. When the one trial acceptable to the authors was combined with a trial that compared mammography with clinician examination, the reported relative risk was 1.04 (95% CI 0.84 – 1.27). As a result, there has been some confusion over whether mass mammography screening should continue. Several groups (Reply to Olsen and Gotzsche, 2000; Miettinen et al, 2002) have rejected the claim that mammography is not worthwhile with many valid criticisms of the reported analysis. Taking the evidence from all six trials, the relative risk is 0.76 (95% CI 0.67 – 0.87) in women aged X50 years (Wald et al, 1993); a statistically significant 24% reduction in breast cancer deaths. Mammography screening is recommended to women over the age of 40 years in the US and 50 – 64 years in the UK. Without it, these women currently have no other means of reducing their chance of dying from breast cancer. Breast self-examination, perhaps the only other method that could be in widespread use, is unlikely to be a worthwhile alternative, even as a method of screening to be used in between mammographic examinations. The evidence presented here shows that it is ineffective in saving lives. 
Women should, of course, still be aware of changes in their breasts and seek advice if concerned, but being taught BSE and practising it regularly is no more effective at reducing breast cancer mortality than finding the tumour by chance.
© 2003 Cancer Research UK
Breast self-examination and death AK Hackshaw and EA Paul
REFERENCES
Auvinen A, Elovainio L, Hakama M (1996) Breast self-examination and survival from breast cancer; a prospective follow-up study. Breast Cancer Res Treat 38: 161 – 168
Boyle P, Veronesi U, Tubiana M, Alexander FE, Calais da Silva F, Denis LJ, Freire JM, Hakama M, Hirsch A, Kroes R, La Vecchia C, Maisonneuve P, Martin-Moreno JM, Newton-Bishop J, Pindborg JJ, Saracci R, Scully C, Standaert B, Storm H, Blanco S, Malbois R, Bleehen N, Dictato M, Plesnicar S (1995) European School of Oncology Advisory Report to the European Commission for the “Europe against Cancer Programme”. Eur J Cancer 31A(9): 1395 – 1405
DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Controlled Clin Trials 7: 177 – 188
Feldman JG, Carter AC, Nicastri AD, Hosat ST (1981) Breast self-examination, relationship to stage of breast cancer at diagnosis. Cancer 47: 2740 – 2745
Foster RS, Costanza MC (1984) Breast self-examination practises and breast cancer survival. Cancer 53: 999 – 1005
Gastrin G, Miller AB, To T, Aronson KJ, Wall C, Hakama M, Louhivuori K, Pukkala E (1994) Incidence and mortality from breast cancer in the Mama Program for breast screening in Finland, 1973 – 1986. Cancer 73: 2168 – 2174
Greenwald P, Nasca PC, Lawrence CE, Horton J, McGarrah RP, Gabrielle T, Carlton K (1978) Estimated effect of breast self-examination and routine physician examinations on breast cancer mortality. N Engl J Med 299: 271 – 273
Hackshaw AK (1996) Screening for breast cancer in young women using breast self-examination. In Evidence-Guided Prescribing of the Pill, Hannaford PC, Webb AMC (eds). Royal College of General Practitioners. Parthenon Publishing Group, Lancs, UK
Harvey BJ, Miller AB, Baines CJ, Corey PN (1997) Effect of breast self-examination techniques on the risk of death from breast cancer. Can Med Assoc J 157: 1205 – 1212
Holmberg L, Ekbom A, Calle E, Mokdad A, Byers T (1997) Breast cancer mortality in relation to self-reported use of breast self-examination. A cohort study of 450,000 women. Breast Cancer Res Treat 43: 137 – 140
Huguley CM, Brown RL, Greenberg RS, Clark WS (1988) Breast self-examination and survival from breast cancer. Cancer 62: 1389 – 1396
International Agency for Research on Cancer (IARC) (2002) Efficacy of screening by self examination. In Handbook of Cancer Prevention. Vol 7: Breast Cancer Screening, Vainio H, Bianchini F (eds). Lyon, France: IARC
Koibuchi Y, Iino Y, Takei H, Maemura M, Horiguchi J, Yokoe T, Morishita Y (1998) The effect of mass screening by physical examination combined with regular breast self-examination on clinical stage and course of Japanese women with breast cancer. Oncol Rep 5: 151 – 155
Kurebayashi J, Shimozuma K, Sonoo H (1994) The practise of breast self-examination results in the earlier detection and better clinical course of Japanese women with breast cancer. Jpn J Surg 24: 337 – 341
Kuroishi T, Tominaga S, Ota J, Horino T, Taguchi T, Ishida T, Yokoe T, Masaru I, Ogita M, Itoh S, Abe R, Yoshida K, Morimoto T, Enomoto K, Tashiro H, Kashiki Y, Yamamoto S, Kido C, Honda K, Sasakawa M, Fukuda M, Watanabe H (1992) The effect of breast self-examination on early detection and survival. Jpn J Cancer Res 83: 344 – 350
Le Geyte M, Mant D, Vessey MP, Jones L, Yudkin P (1992) Breast self-examination and survival from breast cancer. Br J Cancer 66: 917 – 918
McPherson CP, Swenson KK, Jolitz G, Murray CL (1997) Survival of women ages 40 – 49 with breast carcinoma according to method of detection. Cancer 79: 1923 – 1932
Miettinen OS, Henschke C, Pasmantier MW, Smith JP, Libby DM, Yankelevitz DF (2002) Mammographic screening: no reliable supporting evidence? Lancet 359: 404 – 406
Muscat JE, Huncharek MS (1992) Breast self-examination and extent of disease: a population-based study. Cancer Detect Prev 15: 155 – 159
Newcomb PA, Weiss NS, Storer BE, Scholes D, Young BE, Voigt LF (1991) Breast self-examination in relation to the occurrence of advanced breast cancer. J Natl Cancer Inst 83: 260 – 265
Nystrom L, Larsson LG, Wall S (1996) An overview of the Swedish randomised mammography trials: total mortality pattern and the representivity of the study cohorts. J Med Screening 3: 85 – 87
Ogawa H, Tominaga S, Yoshida M, Kubo K, Takeuchi S (1987) Breast self-examination practise and clinical stage of breast cancer. Jpn J Cancer Res 78: 447 – 452
Olsen O, Gotzsche PC (2000) Is screening for breast cancer with mammography justifiable? Lancet 355: 129 – 134
Olsen O, Gotzsche PC (2001) Cochrane review on screening for breast cancer with mammography. Lancet 358: 1340 – 1342
Owen WL, Hoge AF, Asal NR, Anderson PS, Owen AS, Cucchiara AJ (1985) Self-examination of the breast: use and effectiveness. Southern Med J 78: 1170 – 1173
Reply to Olsen O, Gotzsche PC (various authors) (2000) Screening mammography re-evaluated. Lancet 355: 747 – 752
Semiglazov VF, Moiseyenko VM, Bavli IL, Migmanova NS, Seleznyov NK, Popova RT, Ivanova OA, Orlov AA, Chagunava OA, Barash NJ, Matitzin AN, Dyatchenko OT, Kozhevnikov SY, Alexandrova GI, Sanchakova AV, Musayev BT (1992) The role of breast self-examination in early breast cancer detection (Results of the 5-years USSR/WHO randomised trial study in Leningrad). Eur J Epidemiol 8: 498 – 502
Semiglazov VF, Moiseyenko VM, Manikhas AG, Protsenko SA, Kharikova RS, Popova RT, Migmanova NS, Orlov AA, Barash NI, Ivanova OA, Ivanov VG (1999) Interim results of a prospective randomised study of self-examination for early detection of breast cancer. Vopr Onkol 45: 265 – 271
Semiglazov VF, Moiseyenko VM, Protsenko SA, Bavli IL, Orlov AA, Ivanova OA, Barash NI, Chagunava OL, Golubeva OM, Migmanova NS, Seleznev IK, Popova RT, Diatchenko OT, Kozhevnikov SY, Aleksandrova GI, Sanchakova AV, Kharikova RS, Liubomirova NK, Ivanova GV, Azeev VF, Chuprakova IS (1996) Preliminary results of the Russia (St Petersburg)/WHO program for the evaluation of the effectiveness of breast self-examination. Vopr Onkol 42: 49 – 55
Shapiro S, Coleman EA, Broeders M, Codd M, de Konging H, Fracheboud J, Moss S, Paci E, Stachenko S, Ballard-Barbash R for the International Breast Cancer Screening Network (IBSN) and the European Network of Pilot Projects for Breast Cancer Screening (1998) Breast Cancer Screening Programmes in 22 countries: current policies administration and guidelines. Int J Epidemiology 27: 735 – 742
Smith EM, Burns TL (1985) The effects of breast self-examination in a population based cancer registry. Cancer 55: 432 – 437
Smith EM, Francis AM, Polissar L (1980) The effect of breast self-exam practise and physician examinations on extent of disease at diagnosis. Prev Med 9: 409 – 417
Tamburini M, Massara G, Bertario L, Re A, Di Pietro S (1981) Usefulness of breast self-examination for an early detection of breast cancer. Results of a study on 500 breast cancer patients and 652 controls. Tumori 67: 219 – 224
Thomas DB, Gao DL, Ray RM, Wang WW, Allison CJ, Chen FL, Porter P, Hu YW, Zhao L, Pam D, Li W, Wu C, Coriaty Z, Evans I, Lin MG, Stalsberg H, Self SG (2002) Randomized trial of breast self-examination in Shanghai: final results. J Natl Cancer Inst 94: 1445 – 1457
Thomas DB, Gao DL, Self SG, Allison CJ, Tao Y, Mahloch J, Ray R, Qin Q, Presley R, Porter P (1997) Randomised trial of breast self-examination in Shanghai: methodology and preliminary results. J Natl Cancer Inst 89: 355 – 365
UK Trial of Early Detection of Breast Cancer Group (1999) 16-year mortality from breast cancer in the UK Trial of Early Detection of Breast Cancer. Lancet 353: 1909 – 1914
Wald NJ, Chamberlain J, Hackshaw AK on behalf of the Evaluation Committee Consensus statement (1993) Report of the European Society for Mastology Breast Cancer Screening Evaluation Committee. Breast 2: 209 – 216
British Journal of Cancer (2003) 88(7), 1047 – 1053