1. Introduction
One of the worst problems in the current information society is disinformation. It is a wide-ranging problem that refers to the inaccuracy and lack of veracity of information that seeks to deliberately deceive or misdirect [1]. This phenomenon spreads on a viral scale and can therefore result in massive confusion about the real facts. Disinformation often involves a set of contradictory information that misleads users. Being able to automatically detect contradictory information becomes essential when the amount of information is so large that it becomes unmanageable and therefore confusing [2]. Contradiction, as described in [3], occurs between two sentences A and B when there exists no situation whatsoever in which A and B are both true. Therefore, in natural language processing (NLP), the task of contradiction identification implies detecting natural language statements conveying information about events or actions that cannot simultaneously hold [4]. In the current context, the automatic detection of contradictions would contribute to detecting unreliable information, as finding contradictions between two pieces of information dealing with the same factual event would be a hint that at least one of the two pieces of news is false. A typology of English contradictions was presented in [3], where the authors identified two main categories: (1) those occurring via antonymy, negation, and date/number mismatch, which are relatively simple to detect, and (2) contradictions arising from the use of factive or modal words, structural and subtle lexical contrasts, as well as world knowledge (WK).
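The task of automatic detection of contradictory information is tackled as a classification problem [5] in which two pieces of text talk about the same fact within the same temporal frame. If we define a statement as $s = (i, f, t)$, where $i$ refers to the information provided about fact $f$ occurring at the time $t$, a pair of texts is classified into one of the following classes, where $i_1 \cong i_2$ denotes that the information is congruent and $i_1 \ncong i_2$ that it is incongruent:
Compatible information: two pieces of text, $s_1$ and $s_2$, are considered compatible if, given $s_1 = (i_1, f_1, t_1)$ and $s_2 = (i_2, f_2, t_2)$, the following statement holds true: $(f_1 = f_2) \wedge (t_1 = t_2) \wedge (i_1 \cong i_2)$
Contradictory information: two pieces of text, $s_1$ and $s_2$, are considered contradictory if, given $s_1 = (i_1, f_1, t_1)$ and $s_2 = (i_2, f_2, t_2)$, the following statement holds true: $(f_1 = f_2) \wedge (t_1 = t_2) \wedge (i_1 \ncong i_2)$
Unrelated information: two pieces of text, $s_1$ and $s_2$, are considered unrelated if, given $s_1 = (i_1, f_1, t_1)$ and $s_2 = (i_2, f_2, t_2)$, the following statement holds true: $(f_1 \neq f_2) \vee (t_1 \neq t_2)$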
Thus, a pair of news items is classified as contradictory when, given the same fact within the same time frame, the fact-related information in the two items is incongruent. The same fact may be expressed with different event mentions in the two news items.
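The three-way decision described above can be illustrated with a minimal Python sketch. It assumes that each text has already been reduced to an (information, fact, time) triple and that a caller-supplied predicate decides whether two pieces of information are congruent; the `Statement` class, the `classify_pair` function, and the toy predicate are hypothetical names introduced only for illustration and are not part of the system described in this paper. Treating any pair that differs in fact or time frame as Unrelated is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Statement:
    info: str  # i: the information provided about the fact
    fact: str  # f: an identifier of the fact (hypothetical normalization)
    time: str  # t: the time frame, e.g. an ISO date


def classify_pair(s1: Statement, s2: Statement,
                  congruent: Callable[[str, str], bool]) -> str:
    """Return 'Compatible', 'Contradiction', or 'Unrelated' for two statements."""
    # Assumption: a pair is only comparable when fact and time frame coincide.
    if s1.fact != s2.fact or s1.time != s2.time:
        return "Unrelated"
    return "Compatible" if congruent(s1.info, s2.info) else "Contradiction"


def same_text(a: str, b: str) -> bool:
    """Toy congruence test (illustrative only): identical normalized strings."""
    return a.strip().lower() == b.strip().lower()


raise_pay = Statement("salaries will rise", "public-salaries", "2021-03-19")
cut_pay = Statement("salaries will fall", "public-salaries", "2021-03-19")
new_ceo = Statement("a new CEO is appointed", "ibm-ceo", "2020-01-31")

print(classify_pair(raise_pay, cut_pay, same_text))
print(classify_pair(raise_pay, new_ceo, same_text))
```

In a real system, the congruence predicate is the hard part and would be replaced by a trained model; the sketch only makes the order of the checks explicit.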
The coronavirus crisis has heightened the need for reliable, non-contradictory information. However, different media frequently report different information about the same fact, sometimes biased towards a particular part of the political spectrum. For example, the following is a real case of contradiction between two Spanish media outlets reporting on the same topic. Both news items, taken from OkDiario and El Pais, were published on 19 March 2021:
Source “OkDiario” (https://okdiario.com/espana/estas-son-imagenes-reacciones-vacuna-astrazeneca-algunos-funcionarios-prisiones-6976588, accessed on 22 March 2021):
“Varios funcionarios han sufrido reacciones adversas tras la inoculación del fármaco que en algunos casos han precisado de atención hospitalaria por lo que piden un protocolo de seguimiento para los vacunados….Hasta ahora al menos tres policías nacionales, un guardia civil y un policía de la Ertzaintza han desarrollado trombos de gravedad tras haberse vacunado…” (“Several government employees have suffered adverse reactions after being inoculated with the vaccine, and in some cases they have required hospital care, so they are calling for a follow-up protocol for those vaccinated…So far at least three national police officers, a civil guard and an Ertzaintza police officer have developed serious thromboses after having been vaccinated…”)
Source “El Pais” (https://elpais.com/opinion/2021-03-19/confianza-en-las-vacunas.html, accessed on 22 March 2021):
“…La Agencia Europea del Medicamento ha ratificado que la vacuna de AstraZeneca es segura y eficaz y que los beneficios que aporta superan claramente a los posibles riesgos. Despeja así las dudas surgidas ante la notificación de una treintena de casos de trombosis…” (“…The European Medicines Agency has confirmed that AstraZeneca’s vaccine is safe and effective and that the benefits clearly outweigh the possible risks. This clears up the doubts that arose after the notification of some thirty cases of thrombosis…”)
These two pieces of information concerning vaccination are contradictory: while the first states that episodes of thrombosis have occurred after inoculation, the second rules out a relationship between the reported cases of thrombosis and the vaccine. This type of disinformation, caused by contradictory information in the traditional media, is potentially dangerous, as it may create a public health problem through reluctance to take up the offer of vaccination against COVID-19. Therefore, there is a need to alert users to these contradictions.
Most of the resources and systems for contradiction detection are developed in English [6,7,8,9]. However, despite the fact that Spanish is one of the most widely spoken languages in the world, there are no powerful resources for detecting contradictions directly in this language. Currently, XNLI [10] is a cross-lingual dataset, which is divided into three partitions: training, development, and test. The training set is in English, and the development and test sets are in 15 different languages. XNLI has been used to create contradiction detection systems that are trained in English and predict in other languages, obtaining good performance results. Each example in XNLI is classified as Contradiction, Entailment, or Neutral. However, to deal with contradictions it is important to consider their wide range and large variety of features [3]. Therefore, the purpose of this paper is to demonstrate that differentiating between the types of contradictions allows a more specific treatment of each, thereby enhancing the capability to detect them more broadly without requiring many previous examples. The XNLI dataset does not distinguish between different types of contradictions in its annotation, and Spanish is only available in the development and test sets, which were manually translated from English. Both of these facts may affect the performance of models created from the XNLI dataset for different languages. In this sense, the novelty of the proposed work is that it goes beyond detecting contradictions in Spanish and aims to identify what type of contradiction is present.
Furthermore, the contradiction detection system can be applied to detect different types of disinformation such as incongruent headlines or news published by different media, whether traditional or social, that seek to inform about the same fact but the information provided is inconsistent, and thereby inaccurate and unreliable.
The main contributions of this research are the following:
First, as there is a lack of Spanish resources created from scratch for this task, a new Spanish dataset is built with different types of compatible, contradictory, and unrelated information for the purpose of creating a language model that is capable of automatically detecting contradictions between two pieces of information in this language. The novelty of this dataset, and what differentiates it from others, is that in addition to marking contradictions, each contradiction receives a fine-grained annotation that differentiates between types. Specifically, four of the types of contradictions defined in [3] are covered: antonymy, negation, date/number mismatch, and structural. In addition, the dataset is based on the study of incongruent headlines in traditional media, and it contains different types of contradictions between headlines and body texts in the Spanish language.
Second, a set of experiments using a pretrained model such as BETO [11] was conducted to build the language model and validate its effectiveness.
Note that at this stage of the research, covering only four types of contradictions is a real limitation of our dataset due to the wide spectrum of contradictions existing between texts. However, it allows the structure and design of preliminary systems for detecting contradiction in Spanish. The creation of an automatic process for classifying contradictions between texts, scaling from trivial to complex cases, could contribute to the design of hybrid systems operating in human–machine environments, providing additional information to humans about the type of contradiction encountered in an automatic system, which is the future line of our research.
The rest of the paper is organized as follows. Section 2 describes the previous work and existing resources on contradiction. Section 3 presents the definition of the dataset benchmark. Section 4 describes the model, the evaluation setup used, and the experiments conducted in this research. Section 5 presents the results and discussion. Finally, our conclusions and future work are presented in Section 6.
3. ES-Contradiction: A New Spanish Contradiction Dataset
Our dataset (ES-Contradiction) is focused on contradictions that are likely to appear in traditional news items written in the Spanish language. Unlike other datasets, the dataset proposed in this work annotates contradictions by distinguishing the type of contradiction according to its specific characteristics. Thanks to this fine-grained classification, complex contradictions can be treated more precisely in the future.
In order to create the ES-Contradiction dataset, news articles from a renowned Spanish source were automatically collected, including the headline and body text. According to the journalistic structure of a news item, the headline is the title of the news article, and it provides the main idea of the story. Normally, in one sentence it summarizes the basic and essential information about the story. The main objective of the title is to attract the reader’s attention. A headline is therefore expected to be as effective as possible, without losing accuracy or becoming misleading [26]. Therefore, finding contradictions between headlines and body texts is a crucial task in the fight against the spread of disinformation.
In its current state, the dataset focuses on two domains, economics and politics, although the ultimate goal is automatic cross-domain contradiction detection.
3.1. Dataset Annotation Stages
The dataset was built in four stages, which are outlined here and detailed below: (1) extracting information from the data source, (2) modifying news headlines according to the different types of contradictions, (3) classifying the relationship between headline and body text (Compatible or Contradiction), and (4) randomly mixing headlines and body texts (Unrelated).
Extracting information from data source: The headline, body text, and date of the news item are extracted from a reliable data source. In this case, the news agency EFE was used (https://www.efe.com/efe/espana/1, accessed on 22 March 2021). The extracted news items belong to the political and economic domains, and their headlines and body texts are assumed to be compatible, although this relationship is verified in the third stage.
Modifying news headlines: The aim of this stage is to make the news headline contradictory to the body text by applying simple alterations to the headline structure. The alterations are listed below together with examples, which are given in Spanish and translated into English for clarification:
NEGATION (Con_Neg): This alteration consists of negating the headline of the news item.
- (a) Original headline: “El comité de empresa debatirá mañana la “propuesta final” de Alcoa” (“Union representatives will discuss Alcoa’s ‘final proposal’ tomorrow”)
- (b) Modified headline: “El comité de empresa no debatirá mañana la “propuesta final” de Alcoa” (“Union representatives will not discuss Alcoa’s ‘final proposal’ tomorrow”)
ANTONYM (Con_Ant): This transformation consists of replacing the verb denoting the main event of the headline with an antonym.
- (a) Original headline: “El Gobierno se compromete a subir los salarios a los empleados públicos tras los comicios” (“The Government pledges to raise public employees’ salaries after the elections”)
- (b) Modified headline: “El Gobierno se compromete a bajar los salarios a los empleados públicos tras los comicios” (“The Government pledges to cut public employees’ salaries after the elections”)
NUMERIC (Con_Num): This amendment consists of changing numbers, dates, or times appearing in the headline.
- (a) Original headline: “La economía británica ha crecido un 3% menos por el brexit, según S&P” (“UK economy has grown by 3% less due to Brexit, says S&P”)
- (b) Modified headline: “La economía británica ha crecido un 5% menos por el brexit, según S&P” (“UK economy has grown by 5% less due to Brexit, says S&P”)
STRUCTURE (Con_Str): This modification consists of changing the position of one word for another or substituting words in the sentence.
- (a) Original headline: “Arvind Krishna sustituirá a Ginni Rometty como consejero delegado de IBM” (“Arvind Krishna will replace Ginni Rometty as IBM’s CEO”)
- (b) Modified headline: “Ginni Rometty sustituirá a Arvind Krishna como consejero delegado de IBM” (“Ginni Rometty will replace Arvind Krishna as IBM’s CEO”)
These alterations change the semantic content of the headline, making it contradictory to the original headline and the body text. The annotation process was carried out by two independent annotators who were trained by an expert annotator.
Classifying the relationship between the headline and the body text: The semantic relationship between the headline and the body text was annotated in two phases. The first phase consisted of classifying the information as Compatible (compatible information) or Contradiction (contradictory information). In the second phase, in the case of Contradiction, the type of contradiction was also annotated (Negation, Antonym, Numeric, Structure). This stage involved four annotators who were trained to detect semantic relationships between pairs of texts.
Randomly mixing headlines and body texts: The news items reserved in the first stage were used to generate unrelated examples (Unrelated). Each headline was separated from its corresponding body text, and all the headlines were randomly mixed with the body texts. During the mixing process, it was verified that no headline was paired with its own body text. This step was done automatically without the intervention of the annotators.
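The fourth stage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used to build the dataset: the function name `mix_unrelated`, the `(headline, body)` tuple layout, and the fixed seed are all hypothetical. The sketch reshuffles the body indices until no headline is aligned with its own body text, which is one simple way to satisfy the verification described above.

```python
import random


def mix_unrelated(items, seed=0):
    """Pair every headline with the body text of a different news item.

    items: list of (headline, body) tuples for news items reserved for the
    Unrelated class. Returns a new list of (headline, body) pairs in which no
    headline keeps its original body.
    """
    if len(items) < 2:
        raise ValueError("at least two news items are needed to mix them")
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    order = list(range(len(items)))
    while True:
        rng.shuffle(order)
        # Accept the shuffle only if no position maps to itself.
        if all(i != j for i, j in enumerate(order)):
            break
    return [(items[i][0], items[j][1]) for i, j in enumerate(order)]


news = [(f"headline {k}", f"body {k}") for k in range(6)]
for headline, body in mix_unrelated(news):
    print(headline, "<->", body)
```

Rejection sampling is used here for clarity; a random shuffle has no fixed point roughly one time in three, so the loop usually ends after a few attempts.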
3.2. Dataset Description
The dataset consists of 7403 headline–body pairs, of which 2431 are Compatible, 2473 are Contradictory, and 2499 are Unrelated. This represents a balanced dataset with three main classes. The dataset split sizes for each annotated class are presented in Table 1. We partitioned the annotated news items into training and test sets.
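As a quick arithmetic check of the class balance, the counts reported above can be summed and converted into shares. The snippet below is only a sanity check on the published figures; the variable names are illustrative.

```python
# Class counts as reported in the dataset description.
counts = {"Compatible": 2431, "Contradiction": 2473, "Unrelated": 2499}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

print("total pairs:", total)
for label, share in shares.items():
    print(f"{label}: {share:.2f}%")

# Spread between the largest and the smallest class, in percentage points.
spread = max(shares.values()) - min(shares.values())
print(f"spread: {spread:.2f} points")
```

The three classes differ by less than one percentage point, which supports describing the dataset as balanced.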
As can be seen in Table 2, our dataset contains examples of each type of contradiction. However, it is important to clarify that there are few examples of structure contradiction, given the complexity of finding sentences that allow for this type of modification.
3.3. Dataset Validation
Due to the particularities of the dataset annotation process, it was necessary to validate the second and third stages of the process. For the second stage, a super-annotator validation was conducted, while for the third stage, an inter-annotator agreement was carried out. We randomly selected 4% of the Compatible and Contradiction pairs (n = 200) to carry out the dataset validations.
3.3.1. Super-Annotator Validation
For the second stage, it was not possible to compute an inter-annotator agreement because this stage consists of headline modifications and the possible variations are infinite. In this case, a manual review of the modified headlines was performed by the Super-Annotator to detect inconsistencies with the indications in the annotation guide. Only 2% of the analyzed examples presented inconsistencies with the annotation guide, corroborating the validity of this stage.
3.3.2. Inter-Annotator Agreement
In order to measure the quality of the third-stage annotation, the inter-annotator agreement between two annotators was computed. In cases where there was no agreement, a consensus process was carried out among the annotators. Using Cohen’s kappa [27], a value of k = 0.83 was obtained, which validates the third-stage labeling.
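For reference, Cohen’s kappa compares the observed agreement between two annotators with the agreement expected by chance from their individual label frequencies. The following minimal Python sketch implements the standard two-annotator formula; the function name and the toy labels are illustrative and are not taken from the annotation data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single, identical label
    return (observed - expected) / (1 - expected)


annotator_1 = ["Compatible", "Compatible", "Contradiction", "Contradiction"]
annotator_2 = ["Compatible", "Compatible", "Contradiction", "Compatible"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this toy example, the observed agreement is 0.75 and the chance agreement is 0.5, which gives a kappa of 0.5.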
5. Results and Discussion
This section presents the results obtained in each of the experiments described in Section 4. The values are expressed as percentages (%).
5.1. Predicting All Classes
This experiment is performed on the entire dataset to predict the three classes previously defined. The resulting system detects the Unrelated class with a high level of precision and achieves good results in the Compatible and Contradiction classes. Table 3 presents the results.
The results obtained in the Unrelated class indicate that the system detects these types of examples with an excellent $F_1$, corroborating the results obtained in the literature on this type of semantic relation between texts [30]. The other two classes have room for improvement, for instance by using external knowledge. A future line of work would consist of including resources that detect antonyms and synonyms, in line with [31], for the purpose of improving the results of the Contradiction class. Furthermore, including syntactic and semantic information could improve the detection of other more complex contradictions, such as structural ones, without the need for such large datasets.
5.2. K-Fold Cross-Validation
A k-fold cross-validation experiment aims to estimate the error and select the hyperparameters of the model [29]. This is achieved by training and testing the model with all available data for training. Table 4 shows the results of the cross-validation for each fold.
The experiment conducted with our best fine-tuned model obtains a mean accuracy of 88.94% and a standard deviation of 1.234%. Because the standard deviation is very low, the accuracy of the contradiction classification model on the test set should be close to the mean obtained in the cross-validation. Furthermore, the training and test sets of ES-Contradiction are very similar, as they were formed by splitting the original dataset.
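The cross-validation procedure and the way its per-fold accuracies are summarized can be sketched with the Python standard library alone. This is a generic illustration, not the experimental code: the function names are hypothetical, the fold accuracies passed to `summarize` are made-up values rather than those of Table 4, and the use of the sample standard deviation is an assumption, since the text does not state which variant was reported.

```python
import statistics


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    # Distribute the remainder so that fold sizes differ by at most one.
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of the per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), "train /", len(test_idx), "test")

# Hypothetical fold accuracies, for illustration only.
mean_acc, std_acc = summarize([90.0, 88.0, 89.0])
print(mean_acc, std_acc)
```

In practice, the items would be shuffled (and usually stratified by class) before the folds are formed; the contiguous split above is kept deliberately simple.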
5.3. Detecting Contradiction vs. Compatible Information
In this experiment, the Unrelated class is removed from the ES-Contradiction dataset to measure the accuracy of the approach in distinguishing between compatible and contradictory information, assuming that the information is related. The results are shown in Table 5. The approach obtains similar results in both predicted classes. This is due to the quality of the training examples and the balanced number of examples from each class in this dataset. As indicated in the discussion of the first experiment, the results could be improved by introducing external semantic information, similar to the introduction of SRL [9] and the use of WordNet relations [8], both of which improve the results of deep learning models.
5.4. Detecting Specific Types of Contradictions
This experiment aims to analyze the detection capability of the approach by contradiction types. Table 6 shows the results obtained exclusively for the detection of contradiction types.
The structural contradiction class (Con_Str) is the one that obtains the lowest accuracy and $F_1$ results. This contradiction type is considered one of the most difficult to detect compared with the other contradictions [3], which is in line with our results. In addition, the Con_Str class contains the lowest number of examples in this dataset, so the model learns more from the other, better-represented classes. It is highly likely that contradictions such as the structure contradiction need external semantic knowledge to improve detection results.
5.5. Comparison between XNLI and ES-Contradiction
In order to demonstrate the generality of our proposal, a series of experiments has been performed using the XNLI dataset and ES-Contradiction dataset in different training–test configurations.
The XNLI dataset is divided into training, development, and test set. The training set is developed in the English language. The development and test sets are in 15 languages, including Spanish.
To carry out this experiment, a machine translation of the training set into Spanish and the Spanish test set were used. Table 7 presents the results of each trained system. The best results are highlighted in italics.
The models in rows 1 and 2 are trained using the XNLI training set, the difference being that the first predicts the ES-Contradiction test set and the second predicts the XNLI test set. The prediction results are quite close for both of them, but the Contradiction class is detected with a higher accuracy and $F_1$.
Comparing rows 1 and 4, and considering our dataset as the test set with the four types of contradictions, the system trained on our dataset is, as expected, substantially better than the system trained on the XNLI training set. The result indicates that the XNLI dataset does not manage to cover all the contradictions contained in our dataset, even though it is more than 40 times the size of the training set of the ES-Contradiction dataset and is composed of examples from different genres.
The XNLI training set is exactly the same as the MultiNLI training set. It has been developed manually by parsing a sentence from a non-fiction article and creating three sentence variants: definitely correct, might be correct, and definitely incorrect [7]. The procedure for creating the training set of the MultiNLI dataset follows an annotation guide that is sufficiently general to avoid bias in the dataset. However, this lack of specificity may cause a shortage of examples of various types of contradiction, resulting in an imbalance of contradiction types.
Table 8 shows the accuracy by type of contradiction of the model trained in row 1 of Table 7.
In the prediction of the contradiction types Con_Neg, Con_Ant, and Con_Num, this model achieves good results, which are particularly strong for the class Con_Neg (88.50% accuracy). However, the results for the class Con_Str are very low (48.48% accuracy); this could be due to the lack of examples of this type in the XNLI dataset.
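Accuracy by contradiction type is obtained by grouping the test examples according to their fine-grained annotation and computing, within each group, the share of correct predictions. The sketch below illustrates this computation; the function name, the tuple layout, and the toy predictions are hypothetical and do not reproduce the values of Table 8.

```python
from collections import defaultdict


def accuracy_by_type(examples):
    """Compute accuracy (in %) per contradiction type.

    examples: iterable of (contradiction_type, gold_label, predicted_label).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ctype, gold, predicted in examples:
        totals[ctype] += 1
        hits[ctype] += int(gold == predicted)
    return {ctype: 100 * hits[ctype] / totals[ctype] for ctype in totals}


# Toy predictions, for illustration only.
toy = [
    ("Con_Neg", "Contradiction", "Contradiction"),
    ("Con_Neg", "Contradiction", "Contradiction"),
    ("Con_Str", "Contradiction", "Compatible"),
    ("Con_Str", "Contradiction", "Contradiction"),
]
print(accuracy_by_type(toy))
```

Because classes such as Con_Str contain few examples, their per-type accuracy is sensitive to a handful of predictions, which is worth keeping in mind when comparing types.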
Finally, the system trained on the ES-Contradiction dataset failed to obtain good enough results when predicting the XNLI test set, achieving only 32.28% on the contradictions of the XNLI test set. This shows the need to include new types of contradictions in the ES-Contradiction dataset, specifically those that allow the creation of robust contradiction detection systems for the real world and enable prediction with higher accuracy on the XNLI dataset.
Unlike the XNLI dataset, the ES-Contradiction dataset in its first version could not be used to create a real system of contradiction detection. However, the annotation of contradiction types has enabled us to detect which contradictions are more difficult to tackle and how models may need external knowledge to improve the results. By future inclusion of other types of contradiction in our dataset (factive, lexical, WK, and more examples of structure contradiction), we could assess what kind of knowledge is useful to include in the reference models within this task, and thereby make progress towards the creation of a powerful system for detecting contradictions.
Extending the XNLI dataset with the types of contradictions contained in the ES-Contradiction dataset is not an appropriate option, as the XNLI Spanish language training set is automatically translated, which could introduce several biases into automatic detection systems. Furthermore, the examples currently annotated in XNLI lack the fine-grained annotation of our proposal.
6. Conclusions
This work has built the ES-Contradiction dataset, a new Spanish language dataset that contains contradiction, compatible, and unrelated information. Unlike other datasets, in the ES-Contradiction dataset, contradictions are annotated with a fine-grained annotation that distinguishes the type of contradiction according to its specific characteristics. The contradictions currently covered in the dataset created are negations, antonyms, date/numerical mismatch, and structural contradictions. However, all the contradictions presented in [3] are the final goal of this research. The main purpose is to create an automatic process for classifying contradictions between texts, scaling from trivial to complex cases, and giving each contradiction a precise and customized treatment. This would avoid the need to have large datasets that contemplate a multitude of examples for each of the contradictions.
The BETO model, a transfer learning model based on BERT, is used to create our system. Five different experiments were performed with our system, indicating that it is able to detect the four types of contradictions with an $F_1$ of 92.47% and contradiction types with an $F_1$ of 88.06%. As for the detection of each specific type of contradiction, our system obtains the best results for negation contradictions (98% $F_1$), whereas the lowest results are obtained for structural contradictions (69% $F_1$), corroborating that the best results are obtained for the classes with the largest number of examples and the simplest contradictions. Our results leave a wide margin for improvement, which could be tackled by including external knowledge that improves the prediction of contradiction types.
Furthermore, to assess the generalization of the system, we compared training it on the XNLI dataset with training it on the ES-Contradiction dataset. The system trained on our dataset was not able to predict the XNLI test set with high accuracy, which indicates that in this first version it is not possible to create a powerful contradiction detection system. The negative results in the generalization tests of our corpus were expected, as it only covers four types of contradictions existing in texts. On the other hand, the system trained on the XNLI dataset managed to detect the contradictions in our dataset with high accuracy, especially the most common types of contradictions, which are also those with the largest number of examples. However, when analyzing by contradiction types, we found that the structure contradiction is not detected correctly. With this experiment, we found that the XNLI dataset, although much larger than ours, does not cover all types of contradictions, which indicates a need to deal with more complex contradictions in a more specific manner.
The results obtained show that the created Spanish contradiction dataset is a good option for generating a language model that is able to detect contradictions in the Spanish language. This language model was capable of distinguishing between the specific types of contradiction detected. In order to create a powerful contradiction detection system in Spanish, it is necessary to extend our dataset with other types of contradictions and add specific features. This will enable us to detect, with greater precision, not only structural contradictions but also other more complex contradictions that are possible in a real scenario and for which the system has not previously been trained.