How many times in the last year were you exposed to harmful claims related to COVID-19? False science is becoming more common and harder to detect. Can AI help us keep up with the rising volume of falsehoods on social media? Our latest research, “Identifying Pseudo-science via Deep Language Models and Computational Linguistics”, shows that it can: our models identify pseudo-scientific articles with nearly 90% accuracy. This research has been accepted to the Second International Misinformation Workshop and will be presented next month as part of the KDD 2021 conference.

What is Pseudo-science?

Pseudo-science is a collection of beliefs or practices mistakenly regarded as being based on the scientific method. It can have harmful consequences, including loss of life or health, as the COVID-19 pandemic has recently shown. Until recently, pseudo-scientific content remained largely unexplored by researchers, presumably due to the lack of publicly available labeled data. To advance research in this domain, we are publishing a new, well-balanced dataset drawn from a wide range of sources: 10,000 articles obtained from 100 pseudo-science sites and 100 pro-science sites. Below you can see the most frequent keywords in the pseudo-science and pro-science articles.

The data is available on our GitHub repository.
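The post does not say how the most frequent keywords were extracted. As a minimal sketch, assuming simple lowercase tokenization, a small hand-picked stopword list (`STOPWORDS`), and a hypothetical helper `top_keywords`, per-class keyword frequencies could be counted with Python's `collections.Counter`. The toy sentences below stand in for articles from the dataset and are not drawn from it.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption; a real pipeline would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with", "are", "be", "can", "your",
}


def top_keywords(articles: list[str], k: int = 10) -> list[tuple[str, int]]:
    """Return the k most frequent non-stopword tokens across a list of articles."""
    counts: Counter[str] = Counter()
    for text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Skip stopwords and very short tokens before counting.
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(k)


# Toy stand-ins for pseudo-science and pro-science articles (not from the dataset).
pseudo = [
    "Natural remedies boost immunity and detox your body.",
    "This detox cure boosts immunity naturally.",
]
pro = [
    "The randomized trial measured vaccine efficacy.",
    "Peer-reviewed trial data support vaccine safety.",
]
print(top_keywords(pseudo, 3))
print(top_keywords(pro, 3))
```

Counting per class, rather than over the whole corpus, is what makes the contrast between the two groups visible.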

In this work, we studied five different approaches to detecting pseudo-science articles. Here are the results of each method on the new dataset:

In summary, combining transformers with linguistic features is very effective for identifying pseudo-science. We look forward to presenting this research at KDD 2021, and we invite you to explore the new dataset and develop methods that build on the ones presented here to advance the detection of harmful pseudo-scientific content.
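The post does not detail how transformer outputs and linguistic features are combined. One common option, assumed here purely for illustration, is early fusion: concatenate a precomputed transformer embedding with a handful of hand-crafted features and train a linear classifier on the result. In this sketch, `linguistic_features`, `fuse`, `standardize`, and `train_logreg` are hypothetical helpers, the four features are our own choice, and the two-dimensional "embeddings" are placeholders for real model outputs. It is not the paper's architecture.

```python
import math
import re
from statistics import mean, pstdev


def linguistic_features(text: str) -> list[float]:
    """Hypothetical stylometric features, NOT the feature set used in the paper."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)
    n_sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    return [
        sum(len(w) for w in words) / n_words,  # average word length
        len(set(words)) / n_words,  # type-token ratio (lexical diversity)
        len(words) / n_sentences,  # average sentence length in words
        text.count("!") / n_sentences,  # exclamation marks per sentence
    ]


def fuse(embedding: list[float], text: str) -> list[float]:
    """Early fusion: concatenate a precomputed embedding with linguistic features."""
    return list(embedding) + linguistic_features(text)


def standardize(rows: list[list[float]]):
    """Z-score every column so embedding dims and linguistic features share one scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns: avoid division by zero

    def scale(row: list[float]) -> list[float]:
        return [(v - m) / s for v, m, s in zip(row, mus, sds)]

    return [scale(r) for r in rows], scale


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 300):
    """Plain logistic regression fitted with stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability that x is pseudo-science under the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: the 2-d "embeddings" stand in for real transformer outputs (assumption).
texts = ["Miracle cure! Doctors hate it!", "The trial reported a modest effect."]
embeddings = [[0.9, 0.1], [0.2, 0.8]]
labels = [1, 0]  # 1 = pseudo-science, 0 = pro-science

X, scale = standardize([fuse(e, t) for e, t in zip(embeddings, texts)])
w, b = train_logreg(X, labels)
new_article = scale(fuse([0.8, 0.3], "Detox now! Pharma hides this!"))
print(round(predict_proba(w, b, new_article), 3))
```

Standardizing before training matters here because raw features such as average sentence length sit on a much larger scale than typical embedding values and would otherwise dominate the gradient updates.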

Thank you for reading, and please check out our research here.