Remarkably, only 12 percent of post-replication citations of non-replicable findings acknowledge the replication failure. [1]
Roommate submitted his thesis for publication and one reviewer told him "oh, you cited this result from ~30y ago but it actually has a gap in the proof that no one's figured out how to fix yet." (People learn this stuff via the number theory gossip grapevine apparently?) [2]
Google Scholar is in a great position to reduce the "replication crisis" by alerting users when a listed article is known to have a defect.
In principle, it could work like the "disputed" label on Twitter or Facebook:
Is it the best UX to show a modal window? Most likely not:
- We want to inform visitors not just about failed replications, but also about successful replications and small rectifications (like adding a missing condition to a claim or fixing a troublesome typo).
- We do not want to unnecessarily interrupt the visitor’s flow - maybe the visitor is already familiar with the issues of the article or they just don't care about them.
So what instead? The presence and overall conclusion of the replications could be represented with a double-ended bar chart sparkline, similar to how Google Translate shows the frequency of a translation pair (note the red-gray bar graph at the bottom):
When there is a lot of negative evidence, the red bar to the left of the black divider is long. When there is a lot of positive evidence, the green bar to the right of the black divider is long (not present in this case).
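To make the double-ended bar concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ReplicationLink` record, the three outcome labels, and the `evidence_bar` function are hypothetical names, and the text rendering merely stands in for the red and green bars. Rectifications are ignored by the bar in this sketch, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationLink:
    """Hypothetical record: a reader marks one article as a replication of another."""
    original_id: str
    replication_id: str
    outcome: str  # assumed labels: "failed", "succeeded", or "rectified"


def evidence_bar(links, original_id, half_width=10):
    """Render a text version of the double-ended bar for one article.

    '-' characters left of the '|' divider count failed replications,
    '+' characters right of it count successful ones. Each side is
    capped at half_width so the bar keeps a fixed total width.
    """
    relevant = [link for link in links if link.original_id == original_id]
    negative = sum(1 for link in relevant if link.outcome == "failed")
    positive = sum(1 for link in relevant if link.outcome == "succeeded")
    left = min(negative, half_width)
    right = min(positive, half_width)
    return (
        " " * (half_width - left) + "-" * left
        + "|"
        + "+" * right + " " * (half_width - right)
    )


links = [
    ReplicationLink("A", "B", "failed"),
    ReplicationLink("A", "C", "failed"),
    ReplicationLink("A", "D", "failed"),
    ReplicationLink("A", "E", "succeeded"),
    ReplicationLink("X", "Y", "succeeded"),
]
print(repr(evidence_bar(links, "A", half_width=5)))
```

A real widget would probably scale the bars instead of capping them, but the fixed width keeps the search result layout stable.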
How to get it started? Let people mark articles as a replication of other articles.
Why would people bother?
- It is a great opportunity for the authors of replication studies to piggyback (collect citations) on the original, likely popular, articles.
- After a lot of wasted time, you might find out that a claim in paper A does not hold, and that paper B has already spotted the issue. You just were not aware of paper B's existence. In your rage, you might be willing to spend a minute complaining to the world that paper A has an issue, as noted by paper B.
How to collect feedback? The "piggybacking" articles could be explicitly ranked (up-voted/down-voted) like on StackOverflow. While explicit feedback is not Google's style, it is important to realize that Google Scholar serves a niche community, and niche communities seem to benefit from explicit feedback because there isn't enough implicit signal (observe the success of StackOverflow, Reddit, Hacker News, ...). A nice side effect would be increased engagement due to the IKEA effect: people value things they have spent effort on more than things they got for free. In this case, people would value Google Scholar more because they have spent time marking articles as "rectifications" of other articles.
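As a rough illustration of how up-votes and down-votes could turn into a ranking, the sketch below orders the piggybacking articles by the lower bound of the Wilson score interval. That choice is my assumption, not something Google Scholar or StackOverflow is known to use here; it simply keeps an article with 2 up-votes and no down-votes from outranking one with 40 up-votes and 10 down-votes. The function names and the vote counts are made up.

```python
import math


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the up-vote fraction.

    z=1.96 corresponds to roughly 95% confidence. Articles with no
    votes get 0.0 so they sort last.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def rank_replications(votes):
    """Sort article ids by Wilson lower bound, best first.

    `votes` maps a (hypothetical) article id to (upvotes, downvotes).
    """
    return sorted(votes, key=lambda aid: wilson_lower_bound(*votes[aid]), reverse=True)


votes = {"paper_C": (2, 0), "paper_B": (40, 10), "paper_D": (1, 5)}
print(rank_replications(votes))
```

Sorting by plain net votes would also work as a first version; the confidence-based score mostly matters while the vote counts are still small, which is likely in a niche community.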
And what about machine learning? Of course, over time, Google would collect enough training data, explicit feedback, and implicit feedback that the pairing of articles could be predicted fairly reliably. But to get there, Google first has to get the training data.