How can we detect fake news?

False news, or fake news, has been defined by The New York Times as a “story invented with the intention of deceiving, often with a monetary benefit as a motive”.

Its main objective is to manipulate public opinion in order to influence the sociopolitical behavior or belief systems of the masses, and it is usually driven by ideological or economic interests. Detecting fake news is increasingly a social priority, and artificial intelligence is the only tool capable of containing the flood of online hoaxes at scale.

The GPLSI (University of Alicante) and SINAI (University of Jaén) research groups work on the automatic detection of fake news. They are developing an artificial-intelligence system that, while a text is being read, automatically highlights inconsistencies and other signals warning that its information may not be reliable. They have tested the system on news about covid-19.

Defending ourselves in the post-truth era

Fake news feeds the “post-truth” era in which we live. Post-truth, chosen by Oxford Dictionaries as its word of the year for 2016, refers to a kind of distortion in which objective facts have less influence on the formation of public opinion than appeals to emotion and personal belief.

Today the term applies much more broadly to the newsmaking process, in which “alternative facts” replace real ones and feelings outweigh evidence.

The proliferation of fake news has been facilitated by the growth of personal blogs, social networks such as Twitter and Facebook, and messaging apps such as WhatsApp. Anyone can broadcast information, and checking facts is a lower priority than sharing news that might go viral.

Currently, information is mostly consumed online. Researchers at MIT have conducted a study that demonstrates the disturbing power of fake news, which spreads farther, faster, and more widely than real news.

An additional problem is that fake news is structured and worded in such a way that it is difficult to distinguish between what is true and what is false. Detecting and tackling fake news quickly and effectively is crucial, as once false information spreads and permeates society, it is difficult to refute.

False information becomes an even greater problem in emergencies such as the global covid-19 pandemic we are currently experiencing. According to the International Fact-Checking Network (IFCN), fact-checkers in its network verified more than 6,000 hoaxes circulating around the world during the pandemic.

Hoaxes are going viral on such a scale that automatic techniques are needed to detect false news before it spreads massively.

Fake News Detection

Artificial intelligence techniques in general, and natural language processing techniques in particular, play a special role in improving and accelerating the detection process. Technologies such as machine learning and deep learning make it possible to detect the features that make a piece of information unreliable, and to do so across millions of data items.

Fact-checking technologies work in different ways. Reference-based approaches look up a claim in a trusted reference source; machine-learning approaches learn, from labeled examples, statistical signals that correlate with truthfulness; and contextual approaches estimate how likely a story is to be true based on how long it survives. Ideally, all three types would be combined.
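The text does not say how the three approaches would be combined. One simple possibility, shown here only as a sketch, is a weighted average of three per-story scores in [0, 1]. The `VeracitySignals` class, the `combined_score` function and the default weights are hypothetical and would need to be tuned on labeled data.

```python
from dataclasses import dataclass


@dataclass
class VeracitySignals:
    """Hypothetical per-story scores in [0, 1], where 1 means 'looks true'."""
    reference: float   # agreement with a reference source (e.g. a knowledge base)
    learned: float     # probability of truth from a trained classifier
    contextual: float  # estimate derived from how long the story has survived


def combined_score(signals: VeracitySignals, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted average of the three signals; the default weights are arbitrary."""
    values = (signals.reference, signals.learned, signals.contextual)
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("each signal must lie in [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total


story = VeracitySignals(reference=0.1, learned=0.3, contextual=0.5)
print(round(combined_score(story), 2))  # 0.26
```

A weighted average keeps each signal interpretable; a learned combiner (for example, a classifier that takes the three scores as input) would be a natural refinement.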

Because detecting a hoax is so complex, the task is not tackled as a whole but split into smaller, related subtasks that should eventually be integrated into a single global detection system.

Errors in structure and content

We have designed a system that checks news at two levels: its structure and its content. To analyze structure, we check whether the text follows two classic rules of journalism: the 5W1H rule and the inverted pyramid (a model of how journalistic texts are structured).

The 5W1H rule states that any journalistic text must answer six questions: What, Where, When, How, Who and Why. It is an effective method that has been adopted by many media outlets.
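To make the 5W1H check concrete, here is a minimal sketch that assumes some upstream step (a tagger or a human annotator, not shown) has already labeled text spans with the question they answer. The function names and the example sentence are invented for illustration; the sketch only reports which of the six questions remain unanswered.

```python
FIVE_W_ONE_H = ("what", "where", "when", "how", "who", "why")


def missing_elements(labeled_spans):
    """Return the 5W1H questions that no non-empty span answers.

    labeled_spans: iterable of (label, text) pairs, where label is one of
    FIVE_W_ONE_H. The labels are assumed to come from a hypothetical tagger
    or from manual annotation.
    """
    answered = {label.lower() for label, text in labeled_spans if text.strip()}
    return [q for q in FIVE_W_ONE_H if q not in answered]


def coverage(labeled_spans):
    """Fraction of the six questions answered, between 0.0 and 1.0."""
    return 1 - len(missing_elements(labeled_spans)) / len(FIVE_W_ONE_H)


# Invented example: "Health officials announced new restrictions in Alicante on Monday."
spans = [
    ("who", "Health officials"),
    ("what", "announced new restrictions"),
    ("where", "in Alicante"),
    ("when", "on Monday"),
]
print(missing_elements(spans))    # ['how', 'why']
print(round(coverage(spans), 2))  # 0.67
```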

The inverted pyramid, in turn, orders information by importance, placing the most relevant facts in the first paragraph. The system detects whether the text it analyzes follows this rule; if it does not, the information it contains may be unreliable.
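The inverted-pyramid test can be approximated in the same spirit. This sketch assumes each paragraph has already been tagged with the 5W1H elements it answers, and applies a hypothetical rule: the lead paragraph must carry at least a given share (60% by default, an arbitrary choice) of all the elements the text answers.

```python
from typing import List, Set


def lead_share(paragraph_elements: List[Set[str]]) -> float:
    """Share of the text's 5W1H elements that already appear in the first paragraph.

    paragraph_elements: one set per paragraph, e.g. [{"who", "what"}, {"why"}].
    """
    answered_anywhere = set().union(*paragraph_elements)
    if not answered_anywhere:
        return 0.0
    return len(paragraph_elements[0]) / len(answered_anywhere)


def follows_inverted_pyramid(paragraph_elements: List[Set[str]], threshold: float = 0.6) -> bool:
    """Hypothetical rule: the lead must carry at least `threshold` of the key facts."""
    return lead_share(paragraph_elements) >= threshold


well_formed = [{"who", "what", "where", "when"}, {"how"}, {"why"}]
buried_lead = [{"why"}, {"how"}, {"who", "what", "where", "when"}]
print(follows_inverted_pyramid(well_formed))  # True  (4 of 6 elements in the lead)
print(follows_inverted_pyramid(buried_lead))  # False (1 of 6 elements in the lead)
```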

As for content, we split a news item into its parts (title, subtitle, etc.) and use a fact-checking system to verify the factual claims it makes against knowledge bases. We also automatically extract various linguistic features.
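The content check can be pictured as comparing extracted claims with a knowledge base. In the sketch below, claims are hand-written (subject, relation, object) triples and the knowledge base is a two-entry dictionary; a real system would need claim extraction and a curated knowledge base, neither of which is shown, and all names here are hypothetical. A claim the knowledge base says nothing about is reported as unknown rather than false.

```python
from typing import Dict, Tuple

# Toy knowledge base: (subject, relation) -> object. Illustrative entries only.
KNOWLEDGE_BASE: Dict[Tuple[str, str], str] = {
    ("covid-19", "caused_by"): "sars-cov-2",
    ("sars-cov-2", "is_a"): "virus",
}


def check_claim(subject: str, relation: str, obj: str) -> str:
    """Compare one extracted claim with the knowledge base."""
    key = (subject.lower(), relation)
    if key not in KNOWLEDGE_BASE:
        return "unknown"  # no evidence either way
    return "supported" if KNOWLEDGE_BASE[key] == obj.lower() else "refuted"


# Claims as they might be extracted from a news item (written by hand here).
claims = [
    ("SARS-CoV-2", "is_a", "exosome"),
    ("COVID-19", "caused_by", "SARS-CoV-2"),
    ("5G", "causes", "pandemic"),
]
for claim in claims:
    print(claim, "->", check_claim(*claim))
# ('SARS-CoV-2', 'is_a', 'exosome') -> refuted
# ('COVID-19', 'caused_by', 'SARS-CoV-2') -> supported
# ('5G', 'causes', 'pandemic') -> unknown
```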

How have we tested the system?

To test the effectiveness of our system, we built a dataset of covid-19 news items containing both real and fake news. The following is an example of a published news item that is false:

“Covid-19 is not a virus, it is an exosome. It is pollution that weakens the immune system, and as a consequence, people die from various causes, including seasonal flu, and all deaths are labeled coronavirus. It’s a gotcha. And it will get worse, when 5G is fully deployed on Earth and in space, billions of people will die and another pandemic will be blamed. It’s not a virus, it’s an electromagnetic weapon.”

In our research, we manually labeled a set of news items for structure, content and veracity.
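One way to store such annotations is one JSON object per news item (JSON Lines). The `AnnotatedNews` schema below is a hypothetical illustration covering the three labeled dimensions, not the format used by the authors.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AnnotatedNews:
    """One manually labeled news item (hypothetical schema)."""
    text: str
    structure: List[str] = field(default_factory=list)       # 5W1H labels found, e.g. ["who", "when"]
    follows_inverted_pyramid: bool = False
    content_claims: List[str] = field(default_factory=list)  # factual claims to verify
    veracity: str = "unknown"                                 # e.g. "true" or "false"


item = AnnotatedNews(
    text="Covid-19 is not a virus, it is an exosome.",
    structure=["what"],
    content_claims=["Covid-19 is an exosome"],
    veracity="false",
)

line = json.dumps(asdict(item))  # one line of a JSON Lines file
restored = AnnotatedNews(**json.loads(line))
print(restored == item)  # True
```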

Using machine learning and deep learning algorithms, and a relatively small input dataset (because annotation is complex), we obtained very promising results: 75% accuracy in determining the veracity of a news item from plain text extracted from the Internet. The research has recently been published in a high-impact international journal.
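The article does not name the algorithms or features used. As a stand-in, the sketch below trains a multinomial Naive Bayes classifier on bag-of-words counts and measures accuracy (correct predictions divided by total predictions). The four training and three test sentences are made up, so the resulting score says nothing about the real system; the point is the mechanics, and how a tiny dataset lets common words mislead a model.

```python
import math
from collections import Counter
from typing import List

PUNCT = ".,!?\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        def log_score(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are ignored.
            return self.log_priors[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(doc)
                if w in self.vocab
            )

        return max(self.classes, key=log_score)


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Share of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Made-up toy data, for illustration only.
train = [
    ("5G towers spread the virus, doctors hide the truth", "fake"),
    ("Miracle cure kills the virus overnight, share before they delete it", "fake"),
    ("Health ministry reports new vaccination figures", "real"),
    ("Hospital data show admissions fell last week", "real"),
]
test = [
    ("They hide the miracle cure, share now", "fake"),
    ("Ministry data show vaccination figures rose", "real"),
    ("The ministry says the virus is spreading", "real"),
]

model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
predictions = [model.predict(t) for t, _ in test]
print(predictions)                                            # ['fake', 'real', 'fake']
print(round(accuracy(predictions, [y for _, y in test]), 2))  # 0.67
```

The third test sentence is misclassified because, in this toy training set, "the" and "virus" occur only in fake items: exactly the kind of spurious correlation that a larger annotated dataset helps to reduce.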

The following image shows an example of the labeling applied to a paragraph:

[Image: example of news labelling]

Given these good results, the next step is to develop a final application that automatically marks up a news item while it is being read, flags the parts that may be false, and points to other, similar texts against which their veracity can be checked.
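The planned reading aid could work roughly as follows: split the text into sentences, flag those a detector considers suspicious, and attach the most similar reference text so the reader can check it. In this sketch, the keyword-based `is_suspicious` rule stands in for a real classifier, similarity is plain Jaccard overlap of word sets, and the reference texts, identifiers and `[!]` marker are invented.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def words(text: str) -> set:
    """Lowercased word set (letters, digits and hyphens)."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(sentence: str, references: Dict[str, str]) -> Optional[Tuple[str, float]]:
    """Return (reference id, similarity) for the closest reference, or None if nothing overlaps."""
    scored = [(ref_id, jaccard(words(sentence), words(text))) for ref_id, text in references.items()]
    best = max(scored, key=lambda pair: pair[1], default=None)
    return best if best is not None and best[1] > 0 else None


def annotate(text: str, is_suspicious: Callable[[str], bool], references: Dict[str, str]) -> List[str]:
    """Prefix suspicious sentences with [!] and name the closest reference text."""
    marked = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not is_suspicious(sentence):
            marked.append(sentence)
            continue
        match = most_similar(sentence, references)
        note = f" (compare: {match[0]})" if match else ""
        marked.append(f"[!] {sentence}{note}")
    return marked


# Invented reference texts, and a keyword rule standing in for a trained detector.
references = {
    "ref-virus": "Covid-19 is caused by the SARS-CoV-2 virus.",
    "ref-5g": "There is no evidence that 5G networks spread covid-19 or harm people.",
}


def suspicious(sentence: str) -> bool:
    return any(cue in sentence.lower() for cue in ("5g", "not a virus"))


text = "Covid-19 is not a virus. Masks are sold in pharmacies. When 5G is fully deployed, billions will die."
for line in annotate(text, suspicious, references):
    print(line)
```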

In this way, both end users and journalists could use this information to make a well-founded decision about a story's veracity.
