My colleague asked me to write a summary of the current issues in psychology, for a national committee he is a member of.
I produced a 3 page document (in Swedish), with some of the main points. It was headed with this:
• Lots of published research is false.
• Incentives are not aligned with good research practices
o Publication bias
o Questionable research practices (QRPs) to increase the chance of publication
o The price is not borne by the researchers using QRPs, or by the journals
• Initiatives to handle the problem
o Several initiatives centered on replication (reproducibility project)
o Open science framework (ease of sharing materials and data)
o Pre-registration of research
o Published peer-review
o Post-publication peer-review
o Support for participating in replication/archive
o Support for participating in post-pub peer-review
o Developing better metrics to adjust problematic incentives
But, prior to that, I wrote pages and pages (and pages) to collect my thoughts. And, this is some of what I wrote.
What we have here are far too many scientists for too few slots, both at universities and in journals, and for too little funding. And, as is the case whenever there are too many for too little, a Darwinian struggle will take place with selection for the fittest – whatever the fittest may be – not necessarily good science.
How did we get here? I’m no historian, but the system of publishing, peer review and appointments was devised when scientists were fewer, as were the outlets for findings. Even then, the scientific process was likely Darwinian (borrowing from Hull’s Science as a Process), with tribe-like groups pursuing one version of truth, and other groups opposing them. The self-correction that science is famed for did not reside in the individual scientists, or even in the scientific groups, where one could stubbornly cling to phlogiston, or cladistics, or whatever theory, whether still going or dead. It worked because rival groups would obligingly set out to test the theory and rip it apart, if at all possible.
At some point, it started to grow. Historians will know better – my account is hearsay, pieced together from Meehl, the Economist, older academics, and snippets I can’t for the life of me place, so it can certainly be critiqued. There was money to be had for research that the universities could skim off to pay for other activities (per both Meehl and Stephan, and a random commenter on Stephen Hsu’s blog). There was research to be done in the hard, and semi-hard, and medical and soft sciences, and there needed to be people to people those labs and write up accounts and educate the new students. Who to select and promote? When I read accounts of American universities from the middle of the last century, the faculty seemed to be mostly home-grown. The home-grown academic still exists in Sweden. There were places to fill, and here was a good person who had just gotten a doctorate.
But, as more and more doctors were graduated, the need for selection set in (I now am recalling Malthus). Who to retain? Who to promote? Who to grant funds?
Those who are either already doing interesting science, or promise to do interesting science.
But, this is notoriously difficult to assess. Sure, you can probably assess skill or lack thereof, but is someone following a viable path to discovery, engaging in what will turn out to be a blind path, or engaging in some kind of pseudoscience? Anybody with a bit of familiarity with the science of forecasting, or with modern theory of science, knows that this is just about impossible, except in hindsight, where it seems obvious. Some of the smartest minds tried to find demarcation criteria, and instead discovered that they could not be found.
Successful science is determined historically, but prior to that it is a matter of placing bets.
An obvious way is to actually look at what a scientist is doing. Evaluate the soundness of the research, and how interesting it seems on this side of the future. This can be done, especially by close peers – this is, after all, what peer review is based on. Of course, we know that it is far from perfect. And it does not help us get around the problem with the unpredictable future.
But, hiring and promotion committees, and even some granting committees are not experts in that particular mile-deep pinhole of research, and have to find other ways of evaluating what is good, and deserving.
And, what do you do? You rely on a heuristic so common that marketers have stumbled on it unwittingly, social psychologists have thoroughly investigated its effects and side-effects, and evolutionary psychologists have dubbed it an evolved heuristic for use in social learning: looking at the decisions of authorities. We may not be able to properly assess the work, but the researcher’s peers can, so let’s see what they say.
You use peer review, or peer review by proxy – journal articles.
This is not a bad index, at first glance. We use it all the time, because none of us have the time to dig through and learn to understand everything from scratch. If you are interested in focusing on research at your institution, and want to know who might be successful in the future, look at their past record, especially the record with their peers. The already published and cited are more likely to be published and cited in the future. Hence, publish or perish emerged.
But, as I just lectured about in my evolutionary psychology group, this heuristic – the prestige index – can perhaps give rise to other derivative indices, resulting in a run-away prestige selection. Very much like the one supposedly responsible for the peacock’s tail (an example so stale that behavioral ecologists who study sexual selection will groan at it): a Fisherian run-away selection for indices of prestige.
Somewhere in here entered the ranking of journals, which also has a history that I’m only foggily aware of. The roots of the impact factor can be found in another case of scarcity – space. Library space, in this case. Librarians needed to make reasonable bets on which journals to purchase and retain, as the output steadily increased (Arbesman’s “The Half-Life of Facts” illustrates this nicely). What to do? Well, look at which journals scientists mostly read, and mostly cite. Another bit of prestige selection, perhaps, or at least a bit of frequency selection (what do they all like? Let’s keep that).
Now we have an index. But, what do we do with the index? What humans, at least those interested in rank and prestige and competition, do: we use it to measure worth. It is the beginning of the Fisherian prestige run-away selection. Instead of just looking at whether someone was published in a peer-reviewed journal, and how often, and how frequently cited, look at the rank of the journal. And, from the competing scientist’s view – if I get it into one of the journals that are more highly ranked (because more scientists read them), the more eyeballs my research gets, with potentially more citations, which raises my prestige.
Meanwhile, the universities kept churning out PhDs, and although I surmise there were more university positions for teaching the ever growing number of undergraduates, the number of faculty slots was not growing as fast (by a long shot) as the number of researchers, all of whom at some point were best in class, with top marks, and willing to work out of love or obsession with their topic, or desire for – well – prestige.
How do you distinguish yourself among peers who are all just as excellent and inventive and passionate, and highly trained as you? Well, you look for competitive advantage, the economists will say. This can start as plain competing. You publish more – your CV is simply longer. Or, maybe it is shorter, but it is in those highly ranked journals. It starts becoming important to be strategic – am I in the right niche? What are they paying for this year? Where should I place my article? And, should it really, um, be just one? How about splitting it up in two? Can we collect data faster? We’ll label that data-set later. Could we add this covariate and perhaps get the p-value down to one that ups the chances of publishing? Remove a few outliers, perhaps a few more. Check on the data, do we need to collect a few more? No?
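To make that last question concrete ("check on the data, do we need to collect a few more?"), here is a minimal simulation sketch. It is not taken from any of the sources mentioned here. It assumes there is no true effect at all, a z-test with a known standard deviation of 1, and a hypothetical researcher who checks the p-value at 20, 30, 40 and 50 participants and stops at the first p < .05. The function names and the sample sizes are illustrative assumptions.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided p-value for a z-test of mean 0, assuming a known SD of 1."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek_at, n_studies=5000, alpha=0.05, seed=1):
    """Share of simulated null studies that reach p < alpha at any peek.

    peek_at is an increasing list of sample sizes at which the
    (hypothetical) researcher looks at the data and stops if p < alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        data = []
        for n in peek_at:
            # Collect more null data (true mean 0) up to the next peek.
            while len(data) < n:
                data.append(rng.gauss(0.0, 1.0))
            if two_sided_p(data) < alpha:
                hits += 1
                break  # "significant": stop collecting and write it up
    return hits / n_studies


# One look at the planned sample size versus four looks along the way.
fixed = false_positive_rate([50])
peeking = false_positive_rate([20, 30, 40, 50])
print(f"fixed n=50: {fixed:.3f}   peeking at 20/30/40/50: {peeking:.3f}")
```

Under these assumptions the fixed-sample rate should land near the nominal 5 percent, while the peeking rate comes out clearly higher, even though no single step looks like cheating.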
Then you have the journals. One of the limiting resources, by virtue of, you know, pages (a limit that is slowly dissolving into endless cyberspace). Like that choosy peahen, wanting to select just the really good pieces of research (still just as difficult to assess, and now by peer-reviewers who are fully engaged in the competition to keep a job). There is also another piece to protect – journal ranking. And, let’s not get into financial issues – it leads too fast into the suspicion of the nefarious, and I simply don’t want to go there. I don’t think journals or editors or peer reviewers are nefarious. They are simply, like any good limiting resource (such as peahens), being choosy and discriminating in order to maintain their level of prestige, of eyeballs, of shelf-life, of downloads, and of citations, so that the scientists will keep vying for their interest.
University positions are similar (oh, the sex metaphor breaks down, unless one thinks, like Kurt Vonnegut in Slaughterhouse-Five, if I remember right, that there is a need for multiple sexes in order to reproduce). Who should they hire? To keep prestige. To keep the research going. To keep the funding coming. Funding agencies, likewise, are looking at how to select whom to fund. Good researchers are now a dime a dozen. More. They are endless. They are skilled. They are hard-working. They publish. They publish more and more and more. (As someone mentioned, the body of work Kahneman won the Nobel Prize for would not get you tenure today.)
Being a researcher is slowly becoming similar to being a musician, or an actor or poet, or any of those positions where very few are chosen from myriads of the qualified. If you do not play the game, and play it well, you are gone, and even if you do, you may still be gone. Most are gone.
The markers, which at one point were reasonable, are also changing to become not so reasonable, because they are not necessarily signs of skill. Evolution sees the emergence of gamers, of mimics and cheats, and sees no way of eliminating them. It is a way of getting competitive advantage, or defending against predation. Mimic the pattern of that bitter butterfly and escape being eaten. Grow the bands of the poisonous snake, and enjoy the protection of the danger signal, despite being harmless. Lay your eggs in the nests of other species and avoid the cost of raising offspring.
The competition, with the overproduction of scientists, is encouraging the mimics. The outright fraud is the extreme – nothing is likely to protect from that – but there are still the questionable practices, and the underpowered experiments, and the non-reproducible results in fields from cancer research to social cognition.
What does this have to do with Sweden, where there is no overproduction of PhDs? The market is global. Research is international. We cannot escape.
The problems with how science is conducted and reported and evaluated and rewarded have been raised for decades. You have the Tukeys and the Meehls, and the Cohens and the Gigerenzers publishing and critiquing and requesting that we pay attention to the problems of the research, and nothing really changes. Yes, we have added a requirement for effect sizes.
But, calls for going against the system in order to do something properly will not be heeded when they go against the practices for even staying in the system. Boom, you are selected out.
We can train the new scientists, but what will they do? They will use those evolved heuristics for social learning that we are endowed with, and look at their successful elders, and try to emulate them. The systems, as they are, are not good at distinguishing whether the signal comes from solid research or problematic research (it was never easy). They also may reward traits that are not conducive to good research, but are conducive to good signaling.
I think a lot of researchers experienced a disconnect between how statistics and methods were taught in the classroom and how they were actually practiced in the lab, and the border between good pragmatic practices and questionable ones is not a clear one.
I think researchers have been unhappy with the state of affairs for a long time, especially those who have not done well by the system of course, but I think also some of those that have. Because science is based on trust, and if you cannot trust the results, everything breaks down.
Otherwise, there would have been no Tukeys and Meehls and Cohens and Gigerenzers. Science is more interesting when it is true. And, to paraphrase Uri Simonsohn’s sentiment, one can’t bear being a part of a field of research where there is more flash than substance, because we got into this not for the fame or fortune, but because we wanted to figure out how the world works.
I don’t know if it is different this time.