Mass media not only reflect the activities of state bodies but also shape the informational context, sentiment, depth, and level of significance attributed to particular state initiatives and social events. A multifaceted and, as far as practicable, quantitative assessment of media activity is important for understanding the media’s objectivity, role, focus, and, ultimately, the quality of society’s “fourth estate”. This paper proposes a method for evaluating the media along several modalities (topics, evaluation criteria/properties, classes) that combines topic modeling of the text corpora with multiple-criteria decision making. The evaluation proceeds as follows: a topic model of the corpora is built first, and the conditional probability distributions of media over topics, properties, and classes are then calculated. Several approaches described in the paper are used to obtain weights that describe how each topic relates to each evaluation criterion/property and to each class, including manual high-level labeling, a multi-corpora approach, and an automatic approach. The proposed multi-corpora approach assesses the topical asymmetry between corpora to obtain the weights describing each topic’s relationship to a given criterion/property. These weights, combined with the topic model, can be used to evaluate each document in the corpora against each of the considered criteria and classes. The proposed method was applied to a corpus of 804,829 news publications from 40 Kazakhstani sources, published from 01 January 2018 to 31 December 2019, to classify negative information on socially significant topics. A BigARTM model with 200 topics was built, and the proposed method was applied, including filling in the analytic hierarchy process (AHP) table and carrying out all of the necessary high-level labeling procedures.
Experiments confirm that evaluating the media with a topic model of the text corpora is feasible in general: the classification task achieved an area under the receiver operating characteristic curve (ROC AUC) score of 0.81, which is comparable with results obtained for the same task with the BERT (Bidirectional Encoder Representations from Transformers) model.
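The step in which topic-to-criterion weights are combined with the topic model to score documents can be illustrated with a minimal sketch. It assumes that a document's score for a criterion is the weighted sum of its topic probabilities and that a source's score is the mean over its documents. The abstract does not give the paper's exact formulas, so the function names `score_documents` and `score_sources`, the array shapes, and the toy numbers are all hypothetical.

```python
import numpy as np


def score_documents(theta, topic_weights):
    """Score each document against each criterion.

    theta         -- (n_docs, n_topics) document-topic probabilities
    topic_weights -- (n_topics, n_criteria) topic-to-criterion weights
    Assumption: a score is the weighted sum of topic probabilities.
    """
    theta = np.asarray(theta, dtype=float)
    topic_weights = np.asarray(topic_weights, dtype=float)
    return theta @ topic_weights


def score_sources(doc_scores, source_ids, n_sources):
    """Average document scores per media source (assumed aggregation)."""
    doc_scores = np.asarray(doc_scores, dtype=float)
    source_ids = np.asarray(source_ids, dtype=int)
    sums = np.zeros((n_sources, doc_scores.shape[1]))
    # Unbuffered add, so repeated source ids accumulate correctly.
    np.add.at(sums, source_ids, doc_scores)
    counts = np.bincount(source_ids, minlength=n_sources).astype(float)
    # Sources without documents keep a score of zero.
    return sums / np.maximum(counts, 1.0)[:, None]


# Toy data: 3 documents, 3 topics, 2 criteria, 2 sources (illustrative only).
theta = [[0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
topic_weights = [[1.0, 0.0],
                 [0.0, 1.0],
                 [0.5, 0.5]]
doc_scores = score_documents(theta, topic_weights)
source_scores = score_sources(doc_scores, source_ids=[0, 0, 1], n_sources=2)
```

In this sketch the weights could come from any of the three approaches named above (manual labeling, multi-corpora, automatic); the matrix product is indifferent to how they were obtained.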
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.