A few years ago I had the privilege of working within an organization largely funded by the National Institutes of Health, supporting scientists on the leading edge of research into cellular senescence and other age-related phenomena. I learned a great deal about a wide range of subjects, and I also learned that science, even when conducted by well-meaning and highly qualified researchers, can result in garbage outputs. The two great journals Science and Nature later reached the same conclusion, noting that at least 50% of all papers published in reputable journals aren't worth the paper they're printed on.
This is largely because too many scientists are never taught, during their years of study, how to design experiments. They don't think like engineers, they lack the skills of professional statisticians, and so their study designs are flawed and their data interpretations are often wildly wrong. Thus we get junk science.
This could be remedied, of course. PhD courses could include in-depth study of the differences between good and bad experimental design. PhD courses could include in-depth study of statistics so that researchers (a) understood the techniques properly, and (b) understood when to apply which technique to what type of dataset so as to produce valid results. Sadly, there’s little sign of this happening as the typical PhD student increasingly knows more and more about less and less; this is an inevitable consequence of having to prepare a thesis on a new and unique topic in order to obtain that much-needed tertiary degree.
What this means is that the rest of us need to be extremely cautious consumers of science news. We already know to ignore the mass media reports, which too often turn the published information (“In this paper, Professor X shows that test subjects given c% of QQQ show a 1.35% reduction in pre-cancerous epithelial tissue cultured under standard conditions”) into headline-grabbing nonsense: Professor X Cures Cancer!!!
Until we’ve read the actual paper, looked at the experimental design, and analyzed the data for ourselves, we really can’t be sure whether the claims the paper is making are borne out. Of course, most of us have neither the time nor the requisite knowledge to do this consistently, especially if we’re interested in topics across multiple specialist domains. But we can apply general heuristics across a surprisingly wide set of domains in the natural sciences.
In order to provide an example of the sort of basic approach we can use, I’ll borrow a theme from another article I recently read here on Medium. In it, the writer was arguing that acupuncture is a “proven” technique that can reduce pain. Her arguments, put into summary form, were as follows: (a) the Chinese have been using it for 5,000 years so there must be something to it; (b) a recent study in a hospital reported a 35% reduction in use of traditional analgesics among patients who’d been given acupuncture and a subsequent reduction in side-effects from administration of analgesics.
Sounds legit, right?
Except it’s not. Let’s take the first notion: that because someone has been doing something for a very long time, the practice must have validity.
People have been praying for a very long time. When was the last time a double-blind placebo-controlled randomized study yielded any evidence for the efficacy of prayer in delivering tangible outcomes?
People have been rubbing creams on their genitals for a very long time in order to make them larger (men’s penises) or smaller (women’s labia). When was the last time a double-blind placebo-controlled randomized study yielded any evidence for the efficacy of any of these creams?
We can all think of many things people have been doing for a very long time that have never yielded reliable outcomes. So the first argument is entirely misguided.
The second argument at first sight seems more solid. I mean, doctors, a hospital, a numerical value for reduction in analgesic prescription, fewer side-effects…
Except the study was riddled with design flaws.
First of all, there was no control group, which means there’s no way to account for the placebo effect. We already know that the placebo effect is astonishingly powerful: medics in Vietnam who ran out of morphine discovered that they could inject an injured soldier with water and that soldier, provided he thought he was being given morphine, would experience a huge decrease in subjective pain. The placebo effect has been demonstrated in many different studies, which is why all well-designed studies take it into account by including a control group. No control group existed in the study cited by my fellow Medium scribe, so there was no way to know whether the acupuncture effect was purely placebo.
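To see why the missing control arm matters, here is a toy simulation, with entirely invented numbers, of a trial in which the treatment does nothing beyond placebo. Patients still show a large average drop in pain, and without a control group that drop is indistinguishable from a real treatment effect:

```python
import random

random.seed(42)

def simulate_trial(n, placebo_drop=2.0, true_effect=0.0):
    """Simulate pain scores (0-10 scale) before and after a sham treatment.

    Every patient improves by roughly `placebo_drop` points from the placebo
    effect alone; `true_effect` is the treatment's real added benefit
    (zero here, by construction). All numbers are hypothetical.
    """
    before = [random.uniform(6, 9) for _ in range(n)]
    after = [b - placebo_drop - true_effect + random.gauss(0, 0.5)
             for b in before]
    return before, after

before, after = simulate_trial(50)
mean_drop = sum(b - a for b, a in zip(before, after)) / len(before)
# The average reduction is about 2 points -- yet the treatment contributed
# nothing. Only comparison against a control arm could reveal that.
print(f"average pain reduction: {mean_drop:.1f} points")
```

The point of the sketch is that an uncontrolled before/after comparison cannot separate the treatment's effect from the placebo effect, because both show up as the same number.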
Second, there was no double-blinding, which means the doctors carrying out the research knew who was getting which treatment. This in turn means that “experimenter bias” can easily creep in, subtly altering the way the experiment is conducted and the way the data is subsequently analyzed. That’s why all well-designed studies are double-blind: no one in the research group knows which patients received which treatments. As that wasn’t the case in the cited study, all manner of unintentional biases will have crept in.
Third, the number of people in the study was quite small. With a small sample, the uncertainty around any estimate is large: a reported effect could easily be statistical noise, and the confidence interval around a headline figure like a 35% reduction would be very wide.
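A quick back-of-the-envelope calculation shows what small samples do to a headline number. The patient counts below are invented for illustration (the original study's size wasn't given); the formula is the standard normal-approximation confidence interval for a proportion:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical figures: 35% of patients reduced their analgesic use.
lo_small, hi_small = proportion_ci(7, 20)     # a 20-patient study
lo_large, hi_large = proportion_ci(175, 500)  # a 500-patient study
print(f"n=20:  {lo_small:.0%} to {hi_small:.0%}")   # roughly 14% to 56%
print(f"n=500: {lo_large:.0%} to {hi_large:.0%}")   # roughly 31% to 39%
```

With 20 patients, "35%" is compatible with anything from a modest effect to a dramatic one; with 500 patients the estimate becomes far more informative. That is why a small study's headline figure deserves little weight on its own.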
Finally, with a 35% reduction in the use of analgesics we would of course expect a concomitant reduction in side-effects: if you haven’t been given a drug, you can’t suffer its side-effects. So the part of the argument that says this reduction in side-effects “proves” the benefit of acupuncture is meaningless.
Just by knowing a few things about good experimental design versus bad design we can use these as heuristics for evaluating the claims made in scientific papers and, by extension, the claims made by people using these papers in support of their own positions.
While this may seem like nit-picking it’s actually essential. The sheer volume of papers being published and the impact some of them may end up having on public policy, medical treatments, and all manner of other areas means we absolutely do need to sort the wheat from the chaff. Today, we’re accepting far too much chaff as though it were wheat, and the consequences run from trivial (who cares if acupuncture works or not?) to very serious indeed (do we want to cut back on analgesics by relying on unproven therapies instead?).
The next time you read a “scientific” claim, it’s worth holding off on giving it the benefit of the doubt until you’ve had a chance to look at the paper itself. If the experimental design looks shaky (or worse, if the paper doesn’t explain the design early on) then it’s probably good to discount it and wait for a better-designed experiment to shed light on the topic. Because once we’ve formed a belief it is very, very hard for us to unlearn it. This is because learning creates new synaptic connections in the brain. Just like laying a railroad track, it’s not so difficult to lay new track across wide open land but it is rather difficult to tear it up and lay it back down in a new direction if we made a mistake first time around.
Although our brains always want to avoid ambiguity and we feel very uncomfortable in a state of not-knowing, we ought to fight the urge to believe everything we’re told and avoid jumping to conclusions. Very often a bit of patience and a bit of investigation can yield enough for us to make an educated decision about whether or not to give credence to a claim.
Which means we will be better informed, and hopefully thereby be in a position to make better decisions both large and small.