Bias in scientific research
Houda Boulahbel. February 2011
In a commencement address at the California Institute of Technology, Richard Feynman (a physicist and Nobel Prize winner) described an experiment performed by Robert Millikan, which aimed to measure the charge of the electron:

"Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off, because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of the electron, after Millikan. If you plot them as a function of time, you find that one is a little bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher. Why didn't they discover that the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number closer to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that." [1]
This essay is not about scientific fraud or the misuse of scientific data. It is about trends that reflect the ways of science, that affect the scientific process and that ultimately shape our knowledge.

The scientific process starts from a hypothesis based on observation, prior knowledge and earlier research. Once the hypothesis is formulated, the next step is to test it with a number of experiments in order to confirm or refute it. The best outcome would be to validate one's hypothesis, to prove that one was right, to publish it, to obtain more funding and to conduct follow-up studies. This best-case scenario requires that a number of conditions be met. The hypothesis needs to be exciting enough to obtain funding in the first place, and the results must be exciting enough to later be published, ideally in a prestigious journal. In the science world, this generally means that the hypothesis (or any deviation from it) ought to give rise to positive data that validate it, because disproving (commonly referred to as negative) results are very difficult, if not impossible, to publish. Since publications are the currency by which scientists are measured, appreciated and employed, their focus shifts towards generating positive data. Thus, a bias towards publishing positive results creates a flaw in the scientific process right from its very start: the initial goal of testing a hypothesis becomes an ambition to prove it.

It then becomes easy to see how a scientist who has invested considerable time and effort in developing a project, who is completely and utterly convinced of the idea, and who absolutely must publish it to secure his or her career, can subconsciously focus on all the elements that prove the hypothesis and disregard what does not fit. This eagerness to produce positive data is troubling. Scientists can find ways to make their data fit their hypothesis. There is always scope for alternative explanations of why a result is not quite in line with expectations, just as there are always justifiable-sounding reasons for excluding such a result. Different types of graphs can be used to make data look more convincing and more publishable. Statistics can also be presented or interpreted in various ways to make them significant*. Data are not altered, but simply interpreted from a specific angle. This is analogous to a photographer playing with light and shade to produce a flattering photograph and emphasise particular features. It may, however, cause alternative (and potentially accurate) explanations to be overlooked.

As an undergraduate I was very entertained by studies on sexual selection and reproductive success. I remember reading paper after paper describing how asymmetry reflected genetic quality, and how the more symmetrical males (in whatever animal species you could imagine) were more successful at mating. In most papers the results were black and white. There seemed to be no doubt about the phenomenon. It was true. Fast forward to 2011, and an article in the New Yorker [2] uses this example to illustrate a worrying tendency: that of many exciting scientific discoveries not standing the test of time. When reports on fluctuating asymmetry were examined over time, a trend became apparent. The majority of the initial papers supported the theory of fluctuating asymmetry, but as time went on and more studies were published, fewer and fewer reports were consistent with it. Eventually, even those that were showed an increasingly weak correlation between reproductive success and symmetry [2].
This phenomenon is quite common in science. Often, when a particular idea or school of thought dominates a field, data that confirm it are seductive, while data that contradict it, or deviate too much from it, are difficult to publish. This creates a bias in the scientific literature that reinforces the idea. Paradigms can, however, shift or fall apart, leading to the emergence of many reports that reflect the shift. This is not surprising, considering that there is a large element of uncertainty in science and that scientists are human and prone to error. There is always room for new evidence and improved theories. However, the tendency to reward "positive" results and to ignore the rest seems foolish. A result that does not prove a hypothesis can provide useful knowledge in itself [3]. Science needs to become a little more democratic, allowing all findings to be expressed as long as they are the result of sound and rigorous experimental design.

* A statistical result is deemed significant when the probability of it occurring merely by chance is very small. The p-value is a calculation of this probability. The current cut-off for statistical significance is 5%, i.e. p must be lower than 0.05. Statistical analyses that achieve such a value are commonly trusted despite (sometimes) inadequate sample sizes or experimental settings. Ironically, the 5% cut-off was chosen arbitrarily by Ronald Fisher, the statistician who developed the test. He chose 5% only because it made calculations easier.
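The footnote above can be made concrete with a small simulation. The sketch below is not from the essay; it assumes Python with NumPy and SciPy, and the sample size and number of repeated experiments are arbitrary illustrative choices. It shows why the 5% cut-off is only a convention: when two groups are drawn from exactly the same distribution, so that there is no real effect at all, roughly one small experiment in twenty will still return p < 0.05 purely by chance.

```python
# A minimal sketch (illustrative assumptions, not from the essay):
# repeatedly run a small two-group experiment where the null hypothesis
# is true, and count how often p falls below the conventional 5% cut-off.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_experiments = 1000   # hypothetical repeated studies
sample_size = 10       # deliberately small, as in under-powered experiments

false_positives = 0
for _ in range(n_experiments):
    # Both groups come from the *same* distribution: no real effect exists.
    group_a = rng.normal(loc=0.0, scale=1.0, size=sample_size)
    group_b = rng.normal(loc=0.0, scale=1.0, size=sample_size)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < 0.05:  # Fisher's conventional cut-off
        false_positives += 1

print(f"'Significant' results with no real effect: {false_positives}/{n_experiments}")
# Expect roughly 50 out of 1000, i.e. about 5%, by chance alone.
```

If many such under-powered experiments are run and only the "significant" ones are written up, the published record fills with positive results that reflect chance rather than a real effect, which is the bias the essay describes.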
1. More Thoughts on the Decline Effect. The New Yorker, 3 January 2011.
2. The Truth Wears Off: Is There Something Wrong with the Scientific Method? The New Yorker, 13 December 2010.
3. The Journal of All Results is a new initiative attempting to encourage the peer-review-based publication of negative and secondary data.