Psychology is still digesting the implications of a large study published last month, in which a team led by University of Virginia’s Brian Nosek repeated 100 psychological experiments and found that only 36% of originally “significant” (in the statistical sense) results were replicated.
Commentators are divided over how much to worry about the news. Some psychologists have suggested that the field is in “crisis,” a claim that others (such as Northeastern University psychology professor Lisa Feldman Barrett) have flatly denied.
What can we make of such divergence of opinion? Is the discipline in crisis or not? Not in the way that some seemed to suggest, but that doesn’t mean substantial changes aren’t needed.
Certainly the fact that 64% of the findings proved unstable is surprising and disconcerting. But some of the more sensational press response has been disappointing.
Over at The Guardian, a headline writer implied the study delivered a “bleak verdict on validity of psychology experiment results.” Meanwhile an article in The Independent claimed that much of “psychology research really is just psycho-babble.”
And everywhere there was the term “failure to replicate,” a subtly sinister phrasing that makes nonreplication sound necessarily like a bad thing, as though “success” in replication were the goal of science. “Psychology can’t be trusted,” runs the implicit narrative here, “the people conducting these experiments have been wasting their time.”
Reactions like this tied themselves up in a logical confusion; to believe that nonreplication demonstrated the failure of psychology is incoherent, as it entails a privileging of this latest set of results over the earlier ones. This can’t be right: it makes no sense to put stock in a new set of experimental results if you think their main lesson is to cast doubt on all experimental findings.
Experiments should be considered in the aggregate, with conclusions most safely drawn from multiple demonstrations of any given finding.
Running experiments is like flipping a coin to establish whether it is biased. Flipping it 20 times, and finding it comes up heads for 17 of them, might start to raise your suspicions. But extreme results like this are actually more likely when the number of flips is lower. You would want to try that coin many more times before feeling confident enough to wager that something funny is going on. Failure to replicate your majority of heads in a sample of 100 flips would indicate just that you hadn’t flipped the coin enough to draw a safe conclusion the first time around.
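To put rough numbers on the coin analogy, here is a minimal sketch that assumes a perfectly fair coin and independent flips. It computes the chance of seeing at least 85% heads (the same share as 17 out of 20) for a few sample sizes. The function names and the sample sizes of 10 and 100 are illustrative choices, not anything taken from the Nosek study.

```python
from math import comb


def tail_prob(n, k, p=0.5):
    """Probability of at least k heads in n independent flips with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_heads_for_share(n, num=17, den=20):
    """Smallest head count whose share of n flips is at least num/den (integer ceiling)."""
    return (n * num + den - 1) // den


# Assumption: a fair coin. Compare how likely an 85%-heads run is at each sample size.
for n in (10, 20, 100):
    k = min_heads_for_share(n)
    print(f"{n} flips: P(at least {k} heads) = {tail_prob(n, k):.3g}")
```

Under these assumptions, a fair coin yields 17 or more heads in 20 flips about 0.13% of the time, and 9 or more in 10 flips about 1% of the time, while the same lopsided share over 100 flips is far rarer still. That is the sense in which small samples throw up extreme results more readily, and why a striking result from a small study deserves another look before anyone bets on it.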
This need for aggregation is the basis of an argument advanced by Stanford’s John Ioannidis, a medical researcher who proposed 10 years ago that most published research findings (not just those in psychology) are false. Ioannidis highlights the positive side of facing up to something he and many other people have suspected for a while. He also points out that psychology is almost certainly not alone among scientific disciplines.
The fact is, psychology has long been aware that replication is a good idea. Its importance is evident in the longstanding practice of researchers creating systematic literature reviews and meta-analyses (statistical aggregations of existing published findings) to give one another broader understandings of the field. Researchers just haven’t been abiding by best practice. As psychologist Vaughan Bell pointed out, a big part of Nosek’s achievement was in the logistical challenge of getting such a huge study done with so many cooperating researchers.
This brings us to the actual nature of the crisis revealed by the Science study: what Nosek and his colleagues showed is that psychologists need to be doing more to try to replicate their work if they want a better understanding of how much of it is reliable. Unfortunately, as journalist Ed Yong pointed out in his Atlantic coverage of the Nosek study (and in a reply to Barrett’s op-ed), there are several powerful professional disincentives to actually running the same experiments again. In a nutshell, the profession rewards publications, and journals publish results that are new and counter-intuitive. The problem is compounded by the media, which tend to disseminate experimental findings as unquestionable “discoveries” or even God-given truths.
So though psychology (and very likely not only psychology) most certainly has something of a crisis on its hands, it is not a crisis of the discipline’s methodology or rules. Two of the study’s authors made some suggestions for improvement on The Conversation, including incentives for more open research practices and even obligatory openness with data and preregistration of experiments. These recommendations reiterate what methods specialists have said for years. Hopefully the discussion stirred up by Nosek and colleagues’ efforts will also inspire others.
In essence, everyone agrees that experimental coin flipping is a reasonable way to proceed. This study exposed a flaw of the discipline’s sociology, of what people actually do and why they do it. Put another way, psychologists have already developed a perfectly effective system for conducting research; the problem is that so few of them really use it.