Verification of the Methods of Digital Monitoring of Information Space Based on Coding Theory Tools

Shaltykova, Dina; Bakirov, Akhat; Grishina, Anastasiya; Kostsova, Mariya; Vitulyova, Yelizaveta; Suleimenov, Ibragim

doi:10.3390/computers15040260


by Dina Shaltykova ¹, Akhat Bakirov ¹,*, Anastasiya Grishina ², Mariya Kostsova ³, Yelizaveta Vitulyova ¹,⁴ and Ibragim Suleimenov

¹ National Engineering Academy of the Republic of Kazakhstan, Almaty 050010, Kazakhstan

² Sevastopol Economic-Humanitarian Institute (Branch) of Federal Autonomous Educational Institution, V.I. Vernadsky Crimean Federal University, Akademika Vernadskogo Ave, 4, 295007 Simferopol, Republic of Crimea

³ Psychology Department, Sevastopol State University, Universitetskaya Str, 33, 299053 Sevastopol, Republic of Crimea

⁴ JSC “Institute of Digital Engineering and Technology”, Almaty 050013, Kazakhstan

* Author to whom correspondence should be addressed.

Computers 2026, 15(4), 260; https://doi.org/10.3390/computers15040260

Submission received: 12 March 2026 / Revised: 16 April 2026 / Accepted: 17 April 2026 / Published: 21 April 2026


Abstract

This study examines the applicability of coding-theoretic tools to the digital monitoring of information space. The proposed approach treats response patterns to socially significant stimuli as binary sequences and interprets their analysis as a classification problem analogous to error correction in coding theory. To verify the feasibility of this framework, a model psychological test consisting of seven binary questions was analyzed using a procedure derived from the Hamming code (7,4). The method makes it possible to map the full space of observed answer combinations onto a smaller set of reference codewords and thereby identify stable response configurations. The obtained results show that the distributions produced after coding-based transformation are markedly non-uniform and contain recurrent maxima, indicating the presence of structured patterns in collective responses. It is also shown that permutations of question order substantially affect the resulting distributions and correlation indicators, which highlights both the sensitivity and the analytical potential of the proposed encoding scheme. The main contribution of the study is methodological: it demonstrates that error-correcting coding can be operationalized as a formal tool for detecting latent regularities in simplified monitoring data. At the same time, the present results should be regarded as proof of concept, since further work is required to validate the approach on larger datasets, compare it with baseline classification methods, and extend it to longer and multivalued response sequences.

Keywords: information space monitoring; coding theory; error-correcting codes; Hamming code; digital monitoring; classification; social media analytics; binary response patterns; psychological testing; information warfare

1. Introduction

Monitoring the information space is of significant interest for public administration purposes, including crisis response, real-time public opinion analysis, early detection of conflict and protest agendas, and assessment of public reaction to management decisions and government public communications [1,2]. Social media monitoring is already being considered a tool for increasing public administration responsiveness, as digital platforms allow for the rapid capture of signals that were previously detected with a significant delay or were completely lost to traditional feedback channels.

Adequate analysis of social media and other digital platforms allows for the recording of changes in public opinion significantly faster than is possible using traditional survey tools alone [3,4,5]. The literature emphasizes that social media data are highly temporally sensitive and allow for the observation of user reactions to events in near-real time, whereas traditional surveys often provide discrete snapshots and are less able to reflect the dynamics of a rapidly changing agenda. However, the most viable approach today is not to completely replace surveys with digital traces, but to use them in a complementary manner.

At the same time, such monitoring requires consideration of limitations related to data quality, platform-based content selection, and the heterogeneous representativeness of online audiences [6,7,8]. Researchers have repeatedly noted that social media does not uniformly reflect the population: user composition, engagement intensity, algorithmic content visibility, and data availability through platform interfaces can introduce systematic biases. Therefore, the results of digital platform monitoring require careful interpretation, especially when attempting to generalize the findings to broader social groups or use them as a direct proxy for public opinion as a whole.

Accordingly, contemporary research views information space monitoring as an interdisciplinary task combining event analysis, opinion mining, sentiment analysis, modeling of information flow dynamics, analysis of network content diffusion, and natural language processing tools [1,5,6,9,10,11,12,13]. Such monitoring is of particular interest today in connection with information warfare, as digital platforms have become one of the main environments for the dissemination of disinformation, propaganda, and coordinated influence campaigns. Modern conflicts are characterized by a shift in emphasis from the simple dissemination of messages to the management of perceptions, emotional reactions, and collective interpretations of events. This is why monitoring the information space is important not only as a tool for capturing content but also as a means of identifying patterns of manipulative influence, network coordination, and the targeted change of public attitudes. From this perspective, the analysis of users’ digital footprints becomes part of the broader problem area of cognitive warfare and information influence operations [14,15,16,17,18,19,20].

In this regard, methods for remote monitoring of the psychological state of users of online social networks and other Internet resources are of particular interest, since information warfare tools are aimed primarily at the psychological state of a person [21,22,23]. This emphasis on the human psyche is fundamental, since modern methods of information influence exploit the mechanisms of attention, memory, emotional evaluation, heuristics, and social cognitive biases. The literature on cognitive warfare and social engineering [18,24,25,26] emphasizes that human vulnerability is determined not only by the content of a message, but also by the characteristics of its cognitive processing, including emotional involvement, trust in the source, and the effect of repetition. Social platforms further enhance such influences through algorithmic content selection, high speed of distribution, and the possibility of microtargeting [27,28,29]. Consequently, monitoring user reactions to resonant events can be considered an indirect way of observing manifestations of cognitive and emotional vulnerability in a mass audience [15,16,17,18,20]. Such methods must be scalable to large audiences, which effectively precludes the use of classical psychological testing and similar methods. Monitoring should therefore focus on analyzing information that is already publicly available.

Various approaches are currently being proposed to address this challenge, including sentiment analysis [30,31], opinion mining [32,33,34], event detection [35,36], network interaction structure analysis [37,38], and deep learning methods for extracting features from text and multimodal content. A separate area of research is represented by digital phenotyping and mental health surveillance [39,40,41,42], which consider a user’s digital traces as a source of indicators of their psychological state. However, recent reviews have emphasized that existing solutions often focus either on recognizing emotional polarity or on predicting individual mental states, and much less often on constructing interpretable and verifiable schemes comparable in logic to classical testing. This motivates the search for formalisms that would make it possible to interpret user reactions as responses to quasi-test stimuli and then to solve the classification problem in a robust algebraic formulation [10,11,12,13,41].

In this case, the response to a test question is analogous to the reaction of an online social network user to a particular high-profile event (of a political nature, for example). Both binary logic (positive/negative reaction) and ternary logic (positive/negative reaction/no reaction) can be used here. More detailed scales using multi-valued logic, the operations of which are also often reducible to algebraic ones, can also be applied [43,44].

The fundamental difference from classical psychological testing is as follows. In classical testing, the developer selects the questions based on the essence and nature of the specific task, as well as the feasibility of subsequent verification of the proposed methodology. More precisely, the development of a set of questions is subject to validation procedures [45,46], as well as requirements for the reliability and interpretability of scales. This means that the measured construct, the scope of content coverage, the criteria for selecting items, the procedures for expert evaluation, and subsequent statistical verification are specified at the test development stage. In other words, a question in a test is not a random stimulus: it is designed to ensure reproducible discrimination between latent states or properties. In social network monitoring tasks, it is precisely this stage that lies outside the researcher’s control, which requires special methods for restoring the information content of an irregular and random sequence of “quasi-questions” [45].

Consequently, the problem arises of obtaining relevant information based on a random or pseudo-random set of questions. As noted in our recent work [47], this problem is essentially a classification problem that can be solved based on analogies with methods used in error-resistant coding theory.

This formulation of the problem is justified by the following considerations, illustrated in Figure 1 of [47], which demonstrates the existence of deep analogies between classification problems currently solved using neural networks and the error-correction methods employed in error-resistant coding. It also aligns well with the line of research in which multi-class classification problems are interpreted through error-correcting output codes [48,49]. In the classic work of Dietterich and Bakiri [50], it was shown that class coding with the introduction of redundancy increases the robustness of classification to errors in binary decision rules; this approach was subsequently developed into a wide range of ECOC methods [51]. Later work extended this logic to modern deep architectures, where code representations are used not only as a convenient decomposition of a multi-class problem, but also as a source of robustness [52,53]. Therefore, the analogy between classification and error correction problems is strictly methodological in nature and can also be productive for analyzing user reactions in a digital environment [49,50,54,55].

Figure 1 of [47] emphasizes that any classification problem in which the source data are representable as a code sequence (regardless of the processing method used) can be viewed as a surjection from a set A of code sequences of a certain length (or images represented in digital form) onto a set B, which corresponds to code sequences containing a smaller number of symbols.

Consequently, the tools already developed in the theory of error-correcting coding (such as BCH codes and similar ones [56,57]) can indeed be applied to solving classification problems.

This paper substantiates the applicability of this approach to information space monitoring problems. For this purpose, a model psychological test is used that simulates monitoring results represented as binary logic symbol sequences. We emphasize that this paper does not aim to verify a specific analog of a psychological or sociological test. We limit our objective to proving the viability of an approach using error-correcting coding methods for the subsequent development of methods for monitoring the digital space.

To demonstrate the essence of the proposed approach, this paper uses a very simple example based on a test containing seven questions. The choice of test questions is based on issues actually discussed in the Kazakhstani media in recent years, but the test itself is constructed without the methods used in classical sociological or psychological testing. This means that the test simulates a sample that might be obtained through real-world monitoring of online social networks, etc. The number of questions was chosen by analogy with the Hamming code (7,4); this number of questions allows a procedure analogous to error correction to be implemented in the simplest and most visual manner.

Among other things, this approach makes it possible to address a number of problems of classical psychological testing, discussed in particular in [58].

Specifically, psychological tests, which require the respondent to answer a certain set of questions, have been and remain one of the main means of psychological diagnostics. Currently, a large number of psychological tests have been developed for different purposes [45], for example, to optimize the choice of profession, to identify personal characteristics and character traits, to diagnose mental disorders, and to assess the level of stress and emotional state.

One of the most important problems in this area has been and remains the verification of the methodology underlying the construction of a particular test. Indeed, often the creation of a test is an act of creativity by its author, who, as a rule, is guided by heuristic considerations. Verification of the methodology is carried out at the next stage, and very often the verification of the test’s performance is based on a comparison of the results obtained with its help with the results obtained using other tests recognized by the international expert community.

This criterion does not always guarantee the validity of the methods used, many of which remain the subject of debate. Such debate has concerned widely used tests such as the Lüscher test, the Szondi test, and the Susan Dellinger psychogeometric test, which have been used for many years for various purposes.

The aforementioned discussion regarding the validity of projective psychological tests highlights a very specific problem related to the adequacy of any psychological test. Verification of new tests is typically based on comparing their results with those of previously tested analogs; however, as such discussions show, the potential of this approach remains limited. The proposed approach, based on a response classification procedure constructed by analogy with error-correcting coding methods, makes it possible, among other things, to eliminate the need for direct comparison with previously developed tests.

2. Methods

2.1. Test Used

A model test is used whose questions were selected so as to permit subsequent qualitative interpretation based on specific features of the socio-cultural code of the population of Kazakhstan. We emphasize that these questions were selected without the classical methods of psychological test construction discussed in the Introduction. Specifically, they draw on problems that have recently been actively discussed in Kazakhstani publications, including online social networks (among them the problem of the “new matriarchy”, which is directly connected with the feminist movement). In this sense, this set of questions can be regarded as a model of remotely conducted monitoring. We also emphasize that our tasks in no way included assessing the judgments expressed in the Kazakhstani media; the choice of questions was dictated solely by the fact that problems of this kind are discussed there. Respondents were asked to answer the following questions, choosing between two options, “Yes” and “No,” which allows the answers to be interpreted as values of binary logic variables. For convenience, each question in the list is marked with a capital letter (indicated in parentheses).

Do you think that a man should be the head of the family and make responsible decisions alone, regardless of changes in the current socio-economic structure? (C)
Do you share the point of view according to which there is currently a crisis in the classical monogamous family and other options should be considered? (M)
Will you follow national traditions in the event that this will bring you minor financial damage? (O)
Do you consider the desire of Kazakh women to definitely find a husband justified, despite numerous publications in the media demonstrating that a significant part of Kazakh men are not ready to bear real responsibility for the family, i.e., for people who trusted a man? (H)
Are you ready to study modern socio-political literature that reflects significant changes in the role of women and their associations in modern society? (L)
The political leadership of the Russian Federation has made a decision according to which the international LGBT movement is recognized as an extremist organization. Do you think this decision is justified? (N)
Do you consider it necessary to emphasize in public the dominant role of a man in the family, even when he is actually (perhaps, gradually) led by his wife? (Q)

The test contains seven questions. This number corresponds to the simplest Hamming code (7,4), which corrects a single error in a binary sequence of seven symbols. As will be clear from what follows, the error correction procedure makes it possible to identify correlations between groups of questions.

2.2. Test Results Processing Method

This method is based on the following algorithm, which solves the same problem as the Hamming code [59]. The algorithm under consideration differs significantly from the classical Hamming code construction algorithms presented in the literature [60,61]. We use this algorithm solely for clarity, based on the assumption that this work may be of interest not only to specialists in the field of coding theory, but also to researchers working at the intersection of information technology, psychology, etc.

The following isomorphism is used to represent binary characters:

1 \to - 1; 0 \to 1

(1)

This mapping corresponds to the transition from (0, 1) codes to (1, −1) codes, in which addition modulo 2 is replaced by elementwise multiplication of vector components. In particular, the elementwise sum modulo 2 of two binary sequences, from which the Hamming distance between them is calculated, takes the following form in this representation:

\vec{c} = \vec{a} \cdot \vec{b} = (a_{1} b_{1}, a_{2} b_{2}, a_{3} b_{3}, \dots, a_{n} b_{n})

(2)
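The correspondence between addition modulo 2 and elementwise multiplication can be checked directly. The following is a minimal sketch; the helper names `to_pm` and `hamming_distance_pm` are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def to_pm(bits):
    """Map binary symbols to the (1, -1) alphabet of formula (1): 1 -> -1, 0 -> 1."""
    return [1 - 2 * b for b in bits]


def hamming_distance_pm(a, b):
    """Hamming distance obtained from the elementwise product (2) of two (1, -1) vectors."""
    c = [x * y for x, y in zip(a, b)]  # c_i = -1 exactly where a and b differ
    return (len(c) - sum(c)) // 2


# Addition modulo 2 on bits corresponds to multiplication on the mapped symbols.
for x, y in product([0, 1], repeat=2):
    assert to_pm([x ^ y]) == [to_pm([x])[0] * to_pm([y])[0]]

# Two arbitrary 7-bit sequences that differ in the second and fourth positions.
u = [1, 0, 1, 1, 0, 0, 1]
v = [1, 1, 1, 0, 0, 0, 1]
d = hamming_distance_pm(to_pm(u), to_pm(v))
```

Each differing position contributes −1 to the product vector, so the distance is half the difference between the sequence length and the sum of its components.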

The code sequence is formed as follows. There are four initial information symbols. To these, three redundant symbols are added, which repeat symbols 1, 2 and 3 of the original information sequence when its parity is even, or the inverted values of symbols 1, 2 and 3 when its parity is odd.

(a_{1}, a_{2}, a_{3}, a_{4}) \overset{W}{\Rightarrow} (a_{1}, a_{2}, a_{3}, a_{4}, b_{1}, b_{2}, b_{3})

(3)

The parity s of the sequence is defined as the product of the values a_{1}, a_{2}, a_{3}, a_{4}, which take the values 1 or −1.

s = a_{1} a_{2} a_{3} a_{4}

(4)

That is, the rule for generating a code sequence can be written as

(b_{1}, b_{2}, b_{3}) = s (a_{1}, a_{2}, a_{3})

(5)

The presence of redundant symbols directly related to the original ones makes it possible to find and correct an error.
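The generation rule (4)–(5) can be sketched as follows. This is an illustrative sketch only; the names `encode` and `distance` are assumptions introduced here. It enumerates all 16 code sequences and computes the smallest Hamming distance between any two of them, which is the property that makes single-error correction possible.

```python
from itertools import product


def encode(a):
    """Append b_i = s * a_i (i = 1..3) with s = a1*a2*a3*a4, following (4) and (5)."""
    a1, a2, a3, a4 = a
    s = a1 * a2 * a3 * a4
    return (a1, a2, a3, a4, s * a1, s * a2, s * a3)


def distance(u, v):
    """Number of positions in which two sequences differ."""
    return sum(x != y for x, y in zip(u, v))


# All 16 code sequences over the (1, -1) alphabet.
codewords = [encode(a) for a in product([1, -1], repeat=4)]

# Smallest pairwise distance between distinct code sequences.
min_distance = min(
    distance(u, v)
    for i, u in enumerate(codewords)
    for v in codewords[i + 1:]
)
```

Here `min_distance` evaluates to 3, so any sequence with a single altered symbol remains closer to its original code sequence than to any other.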

The error correction algorithm is as follows.

Let

(A_{1}, A_{2}, A_{3}, A_{4}, B_{1}, B_{2}, B_{3})

denote the code sequence registered in practice (capital letters are used to distinguish registered values, which may differ from the original ones).

The scalar product is calculated as

W = A_{1} B_{1} + A_{2} B_{2} + A_{3} B_{3}

(6)

where A_{i}, B_{i} are the registered values.

The following options are possible: W = \pm 3 or W = \pm 1.

If W = \pm 3, then the sequence either contains no errors or contains an error in the symbol a_{4}, the only symbol that does not enter the scalar product (6).

Moreover, if the parity of the registered sequence, s = A_{1} A_{2} A_{3} A_{4}, coincides with the sign of the scalar product (6), then there is no error in the symbol a_{4}. If they do not coincide, the sign of the symbol a_{4} should be reversed.

Let us define the value

s_{0} = \operatorname{sign}(W) \, s

(7)

Then the expression for the code sequence after correcting the error takes the form

(a_{1}, a_{2}, a_{3}, a_{4}) = (A_{1}, A_{2}, A_{3}, s_{0} A_{4})

(8)

If W = \pm 1, then one of the sequences (a_{1}, a_{2}, a_{3}) or (b_{1}, b_{2}, b_{3}) contains one error; there is no error in the symbol a_{4}.

The sign of the value (7) determines which of the above sequences is correct, i.e.,

(a_{1}, a_{2}, a_{3}, a_{4}) = \begin{cases} (A_{1}, A_{2}, A_{3}, A_{4}), & s_{0} = 1 \\ s (B_{1}, B_{2}, B_{3}, A_{4}), & s_{0} = -1 \end{cases}

(9)

Note that the formulas for the sequence after error correction contain only four elements, matching the four original information symbols; the three additional symbols serve only to enable error correction.

We emphasize that the present study does not use the standard syndrome-based Hamming decoder. Rather, formulas (6)–(9) define a non-classical rule-based decoding procedure inspired by the coding-theoretic logic of redundancy reduction. Accordingly, the role of the proposed construction in this work is methodological: it provides a deterministic mapping of 128 possible 7-bit response patterns into 16 reduced 4-bit profiles, which can then be analyzed as aggregated response configurations. To ensure reproducibility, the corresponding verification code and the results of exhaustive enumeration for all 128 possible input sequences are provided in Appendix B.
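Formulas (6)–(9) can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' verification code from Appendix B: the function names are hypothetical, and the second branch of (9) is read as multiplying B_{1}, B_{2}, B_{3} by sign(W) while leaving A_{4} unchanged. That reading is an assumption, chosen because it restores the information symbols after any single-symbol error; a different reading of (9) may produce classes of unequal size.

```python
from collections import Counter
from itertools import product


def encode(a):
    """Generation rule (4)-(5): b_i = s * a_i with s = a1*a2*a3*a4."""
    a1, a2, a3, a4 = a
    s = a1 * a2 * a3 * a4
    return (a1, a2, a3, a4, s * a1, s * a2, s * a3)


def decode(r):
    """Reduce a registered 7-symbol sequence to a 4-symbol profile via (6)-(9)."""
    A1, A2, A3, A4, B1, B2, B3 = r
    W = A1 * B1 + A2 * B2 + A3 * B3        # scalar product (6); always odd
    sign_w = 1 if W > 0 else -1
    s0 = sign_w * (A1 * A2 * A3 * A4)      # formula (7)
    if abs(W) == 3:
        return (A1, A2, A3, s0 * A4)       # formula (8)
    if s0 == 1:
        return (A1, A2, A3, A4)            # first branch of (9)
    # ASSUMPTION: second branch of (9) read as sign(W) * (B1, B2, B3), A4 kept.
    return (sign_w * B1, sign_w * B2, sign_w * B3, A4)


def flip(word, i):
    """Invert the symbol at position i to model a single error."""
    return tuple(-x if j == i else x for j, x in enumerate(word))


# Exhaustive enumeration of all 128 registered sequences.
class_sizes = Counter(decode(r) for r in product([1, -1], repeat=7))

# Check that every single-symbol error is undone under the assumed reading.
single_errors_corrected = all(
    decode(flip(encode(a), i)) == a
    for a in product([1, -1], repeat=4)
    for i in range(7)
)
```

Under this assumed reading, each of the 16 profiles collects 8 of the 128 input sequences, and `single_errors_corrected` is true.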

3. Results

3.1. Rationale for the Use of Error Correction Codes for Processing Test Results

One of the main tasks that any testing solves is the classification task. Classification criteria can be established from theoretical or heuristic considerations, which often occurs in practice. This, however, does not exclude the use of techniques that make it possible to identify such criteria empirically.

The use of error-correcting codes fully corresponds to precisely this formulation of the problem.

In the present work, the reduction from 128 seven-bit response patterns to 16 four-bit profiles should not be interpreted as the classical perfect Hamming partition. Instead, it is induced by the alternative decoding rule introduced in Section 2.2. Therefore, the resulting classes are understood here as algorithmically defined aggregated response configurations rather than as standard radius-1 Hamming spheres. This interpretation is sufficient for the methodological purpose of the study, namely, to test whether a coding-inspired reduction procedure can reveal nontrivial structure in observed response patterns.

In the classical Hamming code (7,4), each codeword together with its seven single-error neighbors forms a subset of 1 + 7 sequences, so the total number of subsets K, each of which corresponds to a restored code sequence, is

K = \frac{2^{7}}{1 + 7} = \frac{2^{7}}{2^{3}} = 2^{4}

(10)

This corresponds to four binary digits and to the designation (7,4) of the Hamming code in question.

Thus, if the Hamming code (7,4)—or a procedure that solves a similar problem—is used to process test results when the test contains 7 questions, then the following situation arises. The total number of answer options is 2⁷ = 128, but this set can be divided into 2⁴ = 16 subsets, which can already be used to solve classification problems, including searching for classification criteria. This, however, does not exclude the possibility of obtaining additional information. Such possibilities are discussed below without a specific example.

It is also important that the procedure used in this work, which solves the same problem as the Hamming code (7,4), is only the simplest example of an error-correcting code; it is used here primarily for clarity. Possibilities for using other error correction codes or their analogs are discussed below.

3.2. Distribution of Respondents’ Answers

Since the goal of this study is primarily to demonstrate the viability of further developing an approach based on error-correcting coding methods, a limited sample was used. A full list of responses to the model test questions is presented in Appendix A. The survey participants were female students aged 21 to 23 years, senior students at technical universities in Almaty, Kazakhstan. This selection was made so that the respondents would be close in age group, choice of educational path, etc. The survey was conducted from 10 to 25 September 2024.

Figure 2 shows a histogram of the distribution of answers to the test questions. For convenience, questions are marked with the capital letters indicated in parentheses in the above list of questions. The y-axis shows the percentage of respondents who answered “Yes” to the corresponding test question.

The frequency of the “Yes” answer varies from 30 to 60 percent, i.e., the distribution obtained in the experiment is quite heterogeneous. This allows it to be used to identify correlations between answers to different groups of questions.

Indeed, the key operation in correcting an error with the algorithm of Section 2.2 is the calculation of the scalar product by formula (6). There is, however, a significant nuance. As follows from formulas (8) and (9), the result of error correction depends substantially on the position of the symbols in the code sequence: when they are rearranged, the result changes. Therefore, the nature of the correlations can be examined by swapping the binary symbols corresponding to the answers to specific test questions.

We will use the following approach. At the first step, the test answers are converted into numerical form, i.e., the answer “Yes” is assigned a logical 1, the answer “No” is assigned a logical 0. The sequence formed according to the following rule is used as the base one.

(C, M, O, \mathbf{H}, L, N, Q) \to (A_{1}, A_{2}, A_{3}, \mathbf{A_{4}}, B_{1}, B_{2}, B_{3})

(11)

In this entry, the position corresponding to the symbol A_{4} is highlighted in bold. In the algorithm discussed above, it occupies a special position, since it does not participate in the calculation of the scalar product (6).
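The conversion of answers into a code sequence and the dependence of the scalar product (6) on the ordering can be sketched as follows. The function names and the respondent below are hypothetical and made up for illustration; they are not taken from the survey data.

```python
BASE_ORDER = ("C", "M", "O", "H", "L", "N", "Q")  # ordering used in (11)


def to_sequence(answers, order=BASE_ORDER):
    """'Yes' -> logical 1 -> -1 and 'No' -> logical 0 -> +1, following formula (1)."""
    return tuple(-1 if answers[q] == "Yes" else 1 for q in order)


def scalar_product(seq):
    """W from formula (6); the fourth symbol (A4) does not enter the sum."""
    A1, A2, A3, _, B1, B2, B3 = seq
    return A1 * B1 + A2 * B2 + A3 * B3


# Hypothetical respondent, for illustration only (not survey data).
respondent = {
    "C": "No", "M": "Yes", "O": "Yes", "H": "No",
    "L": "Yes", "N": "No", "Q": "Yes",
}

w_base = scalar_product(to_sequence(respondent))

# Swapping H and L places question L in the A4 slot and changes the pairs in (6).
swapped_order = ("C", "M", "O", "L", "H", "N", "Q")
w_swapped = scalar_product(to_sequence(respondent, swapped_order))
```

For this made-up respondent the scalar product is −1 in the base ordering and +1 after the swap, so the same answers fall on opposite sides of the W > 0 criterion.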

Figure 3 shows histograms characterizing the correlations between the two fragments of the sequences whose symbols enter the scalar product (6). The following indicators were used.

r_{1} = 100 \frac{f_{1}}{q_{0}}

(12)

where f_{1} is the number of responses corresponding to positive values of the scalar product (6), and q_{0} is the total number of responses in the sample.

r_{2} = 100 \frac{f_{2}}{f_{1}}

(13)

where f_{2} is the number of responses corresponding to the value W = 3 of the scalar product (6).

r_{3} = 100 \frac{f_{2}}{f_{3}}

(14)

where f_{3} is the number of responses corresponding to the values W = \pm 3 of the scalar product (6).

The value r_{1} reflects the existence of a “positive” correlation between the fragments of the sequences whose symbols enter the scalar product (6), i.e., the case W > 0. In the hypothetical case where this value is 100%, the answers to the two three-question parts of the test are completely correlated with each other (allowing for the correction of one error).

The value r_{2} reflects the percentage of responses corresponding to W = 3 among all responses corresponding to W > 0. This indicator characterizes the frequency of responses for which the above correlation is most pronounced.

The value r_{3} is the ratio of the number of answers for which the correlation is maximally pronounced and “positive” to the total number of answers with the maximally pronounced correlation.
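The three indicators can be computed from a list of code sequences as sketched below. This is an illustration with hypothetical names and made-up sequences, not the survey data. The value r_{3} is computed as the share of W = 3 among all responses with W = ±3, following the verbal definition of r_{3} given above.

```python
def scalar_product(seq):
    """W from formula (6); the fourth symbol does not enter the sum."""
    A1, A2, A3, _, B1, B2, B3 = seq
    return A1 * B1 + A2 * B2 + A3 * B3


def percent(numerator, denominator):
    """Percentage, or None when the denominator is zero."""
    return 100 * numerator / denominator if denominator else None


def indicators(sequences):
    """Return (r1, r2, r3) in percent for sequences over the (1, -1) alphabet."""
    ws = [scalar_product(s) for s in sequences]
    q0 = len(ws)                          # total number of responses
    f1 = sum(w > 0 for w in ws)           # W = 1 or W = 3
    f2 = sum(w == 3 for w in ws)          # W = 3
    f3 = sum(abs(w) == 3 for w in ws)     # W = 3 or W = -3
    return percent(f1, q0), percent(f2, f1), percent(f2, f3)


# Made-up sequences for illustration only (not survey data).
sample = [
    (1, 1, 1, 1, 1, 1, 1),       # W = 3
    (1, -1, 1, -1, 1, -1, 1),    # W = 3
    (1, 1, 1, 1, -1, -1, -1),    # W = -3
    (1, 1, -1, 1, 1, 1, 1),      # W = 1
    (1, 1, 1, -1, -1, -1, 1),    # W = -1
]

r1, r2, r3 = indicators(sample)
```

Returning `None` for an empty denominator is a design choice of this sketch; it avoids a division error when no response has W > 0 or W = ±3.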

Figure 3 shows how the correlations change when the last four symbols in sequences of the form (11) are rearranged. The panels are grouped according to which answer occupies the highlighted position in the sequence (11). The ordinate axis shows the values r_{1}, r_{2}, r_{3}, expressed as percentages.

Permutations of test answers (permutations of binary symbols in the corresponding code sequence) noticeably affect the calculated values of the indicators r_{i}.
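A sweep over the orderings of the last four questions can be sketched as follows. The names are hypothetical and the three respondents are made up for illustration; the sketch only shows how an indicator such as r_{1} can be tabulated per permutation, not the values obtained in the study.

```python
from itertools import permutations

HEAD = ("C", "M", "O")        # questions kept in the first three positions
TAIL = ("H", "L", "N", "Q")   # questions whose order is permuted


def r1_for_order(responses, order):
    """Percentage of responses with W > 0 in formula (6) for a given question order."""
    positive = 0
    for answers in responses:
        seq = [-1 if answers[q] == "Yes" else 1 for q in order]
        w = seq[0] * seq[4] + seq[1] * seq[5] + seq[2] * seq[6]
        positive += w > 0
    return 100 * positive / len(responses)


def yes_set(yes):
    """Build an answer sheet from the letters of the questions answered 'Yes'."""
    return {q: ("Yes" if q in yes else "No") for q in HEAD + TAIL}


# Three made-up respondents, for illustration only (not survey data).
responses = [yes_set("MOLQ"), yes_set("CMOHLNQ"), yes_set("CMO")]

# r1 for each of the 24 orderings of the last four questions.
r1_by_order = {
    tail: r1_for_order(responses, HEAD + tail)
    for tail in permutations(TAIL)
}
```

For these made-up answers, r_{1} is one third of the sample in the base ordering (H, L, N, Q) and two thirds after swapping H and L, which mirrors the sensitivity to ordering described above.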

3.3. The Result of Applying the “Error Correction” Algorithm

Figure 4 presents examples of the application of the error-correction algorithm to the obtained test-response sequences.

To visually display the results, the following methodology was used.

As noted above, after “correcting the error,” instead of 128 options, only 16 remain.

These 16 options correspond to all possible values of the first four binary symbols A_{1}, A_{2}, A_{3}, A_{4}. Therefore, it is permissible to pass from this set of binary numbers to a decimal number using the obvious formula

N = 2^{3} x_{1} + 2^{2} x_{2} + 2^{1} x_{3} + 2^{0} x_{4}

(15)

When calculating using formula (15), the inverse of the mapping (1) is used, i.e., “−1” is put in correspondence with “1”, and “1” with “0”, which can also be expressed by the formula x_{i} = \frac{1}{2} (1 - A_{i}).
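The transition from a four-symbol profile to the decimal number N, followed by counting how often each number occurs, can be sketched as follows. The function name and the profiles are hypothetical and made up for illustration; they are not the decoded survey responses.

```python
from collections import Counter


def profile_index(profile):
    """Formula (15) with x_i = (1 - A_i) / 2, so that -1 -> 1 and +1 -> 0."""
    x = [(1 - a) // 2 for a in profile]
    return 8 * x[0] + 4 * x[1] + 2 * x[2] + x[3]


# Made-up four-symbol profiles, for illustration only (not survey data).
profiles = [
    (1, 1, 1, 1),
    (-1, -1, -1, -1),
    (-1, 1, 1, -1),
    (-1, 1, 1, -1),
]

# Number of profiles that fall on each value of N.
histogram = Counter(profile_index(p) for p in profiles)
```

The resulting counts per value of N are the kind of quantity that a histogram over the 16 profiles displays.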

Figure 4 is constructed using the program code presented in Appendix B and the dataset presented in Appendix A.

It is these numbers that are plotted along the abscissa axis in Figure 4; each marks a specific set of answers separated from the base configuration by a Hamming distance of 0 or 1. The ordinate axis shows the number of answers in the sample that correspond to a given number.

The nature of the resulting distributions significantly depends on which answer occupies the fourth position in the sequence under consideration.

It can also be seen that all the resulting distributions are significantly heterogeneous, and there are variants in which two dominant peaks appear—Figure 4b,d. In other cases, another sharp peak also appears, but it is less pronounced.

4. Discussion

4.1. Interpretation of the Obtained Results and Methodological Implications

The results obtained in this study support the principal applicability of error-correcting coding ideas to the analysis of binary-response patterns produced by a model psychological test. In the proposed framework, an individual response profile is represented as a binary sequence. The full set of observed sequences is mapped onto a smaller set of reference codewords by means of procedures analogous to error correction. From a methodological standpoint, this makes it possible to transform a heterogeneous set of raw answers into a more structured representation, which allows at least a qualitative interpretation. Such an interpretation is consistent with the general logic of the approach formulated earlier in the manuscript, where the classification problem is treated by analogy with the reconstruction of corrupted code sequences.

Another important observation concerns the effect of permuting the order of questions. As shown in the Results section, the values of the indicators r_{1}, r_{2}, and r_{3}, as well as the shapes of the resulting distributions, change noticeably when the same set of questions is arranged in different orders.

This finding has a dual interpretation. On the one hand, it confirms that the coding scheme is sensitive to the structure of the response sequence, which means that different orderings may reveal different aspects of inter-item dependence. On the other hand, such sensitivity also indicates that the inferred profiles depend not only on the content of the questions but also on the chosen encoding procedure. Therefore, the observed complementarity of different permutations should not be interpreted as purely substantive evidence; it is also a property of the formal transformation itself and must be analyzed as such. In methodological terms, the results point to the need for systematic criteria for selecting or aggregating permutations rather than relying on a single arbitrary encoding order.
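The dependence on question order can be reproduced on a toy scale. The following sketch decodes four rows of Appendix A (taken in table column order, with "Yes" mapped to 1 and "No" to 0) under two column orders. It assumes a classical nearest-codeword decoder rather than the decoder of Eqs. (6)–(9), and the names `profile_counts`, `IDENTITY`, and `SWAPPED` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple

Bits = Tuple[int, ...]


def encode(info: Bits) -> Bits:
    """Bit-level form of the Appendix B encoder."""
    x1, x2, x3, x4 = info
    return (x1, x2, x3, x4, x2 ^ x3 ^ x4, x1 ^ x3 ^ x4, x1 ^ x2 ^ x4)


CODEBOOK: Dict[Bits, Bits] = {i: encode(i) for i in product((0, 1), repeat=4)}


def nearest_profile(word: Bits) -> Bits:
    """Classical nearest-codeword decoding by exhaustive search."""
    def dist(info: Bits) -> int:
        return sum(a != b for a, b in zip(word, CODEBOOK[info]))
    return min(CODEBOOK, key=dist)


def profile_counts(rows: Sequence[Bits], order: Sequence[int]) -> Counter:
    """Histogram of decoded profiles when columns are read in `order`."""
    return Counter(nearest_profile(tuple(r[i] for i in order)) for r in rows)


# Four answer rows from Appendix A, table column order, Yes -> 1, No -> 0.
ROWS = [
    (0, 0, 1, 0, 0, 0, 0),
    (1, 1, 1, 1, 1, 1, 1),
    (0, 0, 1, 1, 0, 0, 0),
    (1, 1, 0, 1, 0, 1, 1),
]

IDENTITY = (0, 1, 2, 3, 4, 5, 6)
SWAPPED = (0, 1, 4, 5, 2, 3, 6)  # arbitrary permutation chosen for illustration

print(profile_counts(ROWS, IDENTITY))
print(profile_counts(ROWS, SWAPPED))
```

The two histograms differ even though the underlying answers are identical, which is the formal side of the sensitivity discussed above. Aggregating such histograms over many permutations would be one possible, but here untested, route toward the systematic criteria mentioned in the text.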

In this respect, the proposed approach differs from more conventional ways of analyzing questionnaire data. Unlike them, the present method operates on the level of complete response patterns and explicitly uses redundancy in the answer sequence. This feature may be valuable in situations where the analyst is interested not merely in the polarity of separate answers but in the structure of co-occurrence across the whole set of items. For such tasks, the coding-based formalism provides a compact and potentially interpretable way to identify clusters of similar response profiles. This is especially relevant when the “questions” are not produced in a controlled testing environment but arise as analogs of external stimuli in a broader information environment.

At the same time, the present findings should not be read as a direct proof that the revealed peaks correspond to uniquely defined ideological, cultural, or psychological groups. The current data support a weaker but still important conclusion: there exist stable regions in the binary response space, and the coding-based mapping helps identify them. Any further substantive interpretation of these regions should be treated as a hypothesis requiring independent corroboration.

Thus, the main contribution of the present empirical part is methodological rather than taxonomic. The study demonstrates that the analogy with error-correcting coding is not merely conceptual: it can be operationalized for real answer data and can reveal hidden structure in response patterns. At the same time, the results also make clear that the interpretation of the recovered codewords requires a careful separation between mathematically identified regularities and their substantive explanation.

4.2. Relation to Broader Coding-Theoretic Constructions

The current study was carried out for binary sequences of length 7, which naturally corresponds to one of the classical coding-theoretic settings. This choice is sufficient to demonstrate the principal feasibility of the approach, but it should be regarded only as the simplest nontrivial case. The theoretical considerations presented in the paper indicate that the same logic can be generalized to longer sequences and, accordingly, to tests containing a larger number of items.

The relevance of longer codes lies not in their formal elegance alone, but in the possibility of extending the method to richer questionnaires and, potentially, to more nuanced representations of responses. A longer sequence would permit the inclusion of more items and could therefore improve the descriptive resolution of the method, provided that the additional questions remain meaningful and not excessively redundant. Likewise, the transition from binary to multivalued logic, as a natural generalization, may allow one to move beyond simple positive/negative distinctions and encode more differentiated forms of response. Such extensions are potentially important for monitoring problems in which the absence of reaction, ambivalence, or graded agreement carries substantive information.
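To make the scaling concrete, the sketch below lists the standard parameters of Hamming codes: for r check symbols over an alphabet of size q, the length is n = (q^r − 1)/(q − 1) and the number of information symbols is k = n − r. The function name `hamming_parameters` is a hypothetical helper, and the sketch says nothing about whether such codes would be suitable for actual questionnaires.

```python
from typing import Tuple


def hamming_parameters(r: int, q: int = 2) -> Tuple[int, int]:
    """Return (n, k) of the q-ary Hamming code with r check symbols."""
    if r < 2 or q < 2:
        raise ValueError("Both r and q must be at least 2")
    n = (q ** r - 1) // (q - 1)
    return n, n - r


for r in (3, 4, 5):
    n, k = hamming_parameters(r)
    print(f"binary, r={r}: ({n},{k}) code, {2 ** k} reference profiles")

n, k = hamming_parameters(3, q=3)
print(f"ternary, r=3: ({n},{k}) code, {3 ** k} reference profiles")
```

The binary case r = 3 reproduces the (7,4) code used in this study; the next binary code would already require 15 items, and a ternary code with three check symbols would require 13 three-valued items.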

The value of the coding-theoretic perspective is twofold. First, it offers a mathematically rigorous mechanism for grouping nearby response patterns and searching for hidden regularities. Second, it provides a systematic language for discussing redundancy, correction, stability, and admissible distortions in answer sequences. These properties are difficult to formalize within many conventional frameworks but are central when one attempts to infer stable structures from noisy or partially inconsistent responses. This makes the approach promising not only for model tests, but also for broader classes of monitoring tasks in which response patterns arise under uncertainty.
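The notions of correction and admissible distortion can be sketched with textbook syndrome decoding. The parity-check columns below are derived from the bit-level form of the Appendix B encoder; the functions `syndrome` and `correct` are hypothetical names, and this is the classical single-error procedure rather than the decoder of Eqs. (6)–(9).

```python
from itertools import product
from typing import List, Tuple

Bits = Tuple[int, ...]

# Column j holds the three parity checks in which position j takes part.
H_COLUMNS: List[Tuple[int, int, int]] = [
    (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1),  # information positions 1-4
    (1, 0, 0), (0, 1, 0), (0, 0, 1),             # check positions 5-7
]


def encode(info: Bits) -> Bits:
    """Bit-level form of the Appendix B encoder."""
    x1, x2, x3, x4 = info
    return (x1, x2, x3, x4, x2 ^ x3 ^ x4, x1 ^ x3 ^ x4, x1 ^ x2 ^ x4)


def syndrome(word: Bits) -> Tuple[int, ...]:
    """Three parity checks; all zero exactly when `word` is a codeword."""
    return tuple(
        sum(col[row] * bit for col, bit in zip(H_COLUMNS, word)) % 2
        for row in range(3)
    )


def correct(word: Bits) -> Bits:
    """Flip the single position whose column equals the syndrome, if any."""
    s = syndrome(word)
    if s == (0, 0, 0):
        return tuple(word)
    fixed = list(word)
    fixed[H_COLUMNS.index(s)] ^= 1
    return tuple(fixed)


codeword = encode((0, 0, 1, 0))
distorted = codeword[:2] + (1 - codeword[2],) + codeword[3:]
print(correct(distorted) == codeword)
```

In this reading, a single inconsistent answer is an admissible distortion: it changes the syndrome but not the reference configuration to which the response is assigned.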

4.3. Limitations

Several limitations of the present study should be acknowledged explicitly.

First, the empirical part is based on a model test containing only seven binary questions. This was sufficient for demonstrating the feasibility of the approach, but it necessarily restricts the granularity of the recovered patterns. A small number of items also increases the risk that the identified configurations depend strongly on the particular formulation of the selected questions rather than reflecting more general latent structures. The manuscript itself notes that the questions were chosen primarily for qualitative interpretability and were not generated through classical psychometric procedures. This should be regarded as a feature of the pilot design, but also as a limitation with respect to generalizability.

Second, the current study does not provide external validation of the recovered classes. The coding-based procedure identifies stable response configurations, but the manuscript does not yet show whether these configurations correspond to independently measurable psychological, behavioral, or demographic characteristics.

Third, the sensitivity of the results to permutation of the questions, although informative, also indicates that the method may depend on the chosen encoding scheme. This is not a flaw unique to the present approach, but it means that robustness to encoding choices must become a central criterion in future work.

Fourth, the present study uses binary logic, which inevitably simplifies the complexity of real attitudes and reactions. Many socially significant responses are ambiguous, inconsistent, weakly expressed, or context-dependent. Their reduction to binary form may improve formal tractability, but it also risks losing meaningful variation. For this reason, the conclusions drawn here should be interpreted as pertaining to a deliberately simplified model of response behavior.

Finally, the substantive domain considered in the model test is socioculturally specific. This makes the study useful as proof of concept, but it also limits direct transferability to other populations, languages, and thematic contexts. Replication on independent samples and on question sets designed for other domains is therefore essential before broader claims can be made.

4.4. Directions for Future Research

The next step is to validate the proposed approach on larger and more diverse datasets. Such validation should not only include more respondents, but also broader sets of items, repeated measurements, and independent samples. This would make it possible to determine whether the identified response profiles are stable across time and across populations, or whether they primarily reflect local properties of the current dataset.

A second important direction is comparison with baseline methods. To assess the added value of coding-based classification, future studies should compare its results with those obtained from conventional methods. Such comparisons are necessary if the proposed method is to be positioned not merely as an elegant theoretical construction, but as a practically competitive analytical tool. In particular, it would be useful to compare the proposed framework with standard pattern-recognition and classification approaches, clustering-based methods for grouping similar response profiles, and latent-class models commonly used for categorical questionnaire-type data [62,63,64].
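One possible ingredient of such a comparison is an agreement measure between two groupings of the same respondents. The sketch below computes the Rand index, i.e., the fraction of respondent pairs on which two labelings agree about "same group" versus "different group". The choice of this index is an assumption of the sketch, not a proposal made in the article, and the label lists are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of item pairs treated consistently by labelings `a` and `b`."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("Labelings must have equal length of at least 2")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical labels: decoded profiles versus clusters from some baseline.
coding_labels = ["0000", "0000", "1111", "1111"]
baseline_labels = ["c1", "c1", "c2", "c2"]
print(rand_index(coding_labels, baseline_labels))  # 1.0
```

A value close to 1 would indicate that the coding-based classes largely coincide with those of the baseline, while lower values would point to structure captured by only one of the two methods.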

A third direction concerns the transition to longer sequences and multivalued response representations. As noted above, the coding-theoretic framework naturally allows such generalization. However, these extensions should be accompanied by explicit studies of interpretability, computational complexity, and robustness. It would also be useful to investigate whether certain classes of codes are better suited than others for recovering psychologically meaningful structures from noisy response data.

Finally, from the standpoint of applications, the most promising long-term use of the method may lie in the analysis of response analogs arising in digital environments, where “questions” are replaced by externally occurring stimuli and “answers” by observable reactions. The present work does not yet address this problem directly, but it provides a formal basis for such a transition. If combined with carefully justified procedures for constructing stimulus sets and validating inferred classes, the coding-based perspective may become a useful tool for studying large-scale patterns of reaction in complex information spaces.

5. Conclusions

This study has demonstrated the feasibility of applying error-correcting coding principles to the analysis of structured response patterns in a model task related to digital monitoring of information space. By representing binary answers as code sequences and using a transformation analogous to error correction, the proposed approach reduces the initial space of possible response combinations to a smaller set of stable configurations, making latent regularities more visible and analytically tractable.

The empirical results obtained for the seven-item model test show that the resulting distributions are distinctly non-uniform and that their shape depends on the encoding order of the questions. This finding is important for two reasons. First, it confirms that the method is sensitive to non-random structure in collective response patterns. Second, it shows that robustness to encoding choices must be treated as a central methodological issue rather than a secondary technical detail.

The main contribution of the present work is therefore methodological. The study shows that the analogy between classification and error correction is not merely conceptual but can be implemented as an operational procedure for simplified monitoring data. In this sense, coding theory provides not only a mathematical language for describing redundancy and stability in answer sequences, but also a potentially useful computational framework for identifying hidden patterns in digital-response environments.

At the same time, the present study should be regarded as proof of concept rather than a finalized monitoring system. The use of a short binary questionnaire, the absence of external validation, and the sociocultural specificity of the selected items limit the generalizability of the current findings. Future research should therefore focus on larger and more diverse datasets, comparisons with baseline statistical and machine-learning methods, and extensions to longer and multivalued code constructions. Under these conditions, the proposed approach may contribute to the development of new formal tools for digital monitoring, social media analysis, and broader computational studies of structured response behavior.

Author Contributions

Conceptualization, D.S., I.S.; methodology, I.S., A.B.; formal analysis, I.S., A.G., M.K., Y.V.; funding acquisition, I.S.; writing—original draft preparation, D.S., A.B., A.G., M.K., Y.V., I.S.; writing—review and editing, A.B., I.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan: AP26104635.

Data Availability Statement

Data is contained within the article. The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Table of Answers to Test Questions

Do you think that a man should be the head of the family and make responsible decisions alone, regardless of changes in the current socio-economic structure? (C)	Do you consider the desire of Kazakh women to definitely find a husband justified, despite numerous publications in the media demonstrating that a significant part of Kazakh men are not ready to bear real responsibility for the family, i.e., for people who trusted a man? (H)	Are you ready to study modern socio-political literature that reflects significant changes in the role of women and their associations in modern society? (L)	Do you share the point of view according to which there is currently a crisis in the classical monogamous family and other options should be considered? (M)	The political leadership of the Russian Federation has made a decision according to which the international LGBT movement is recognized as an extremist organization. Do you think this decision is justified? (N)	Will you follow national traditions in the event that this will bring you minor financial damage? (O)	Do you consider it necessary to emphasize in public the dominant role of a man in the family, even when he is actually (perhaps, gradually) led by his wife? (Q)
No	No	Yes	No	No	No	No
Yes	Yes	Yes	Yes	Yes	Yes	Yes
No	No	Yes	No	No	No	No
No	No	Yes	Yes	No	No	No
No	No	Yes	Yes	No	No	No
No	No	Yes	Yes	No	No	No
Yes	Yes	No	Yes	No	Yes	Yes
No	No	Yes	No	No	No	No
No	Yes	Yes	Yes	No	No	No
No	No	Yes	No	No	No	No
No	No	Yes	No	Yes	No	No
No	No	No	No	No	No	No
No	No	No	No	No	No	Yes
No	No	Yes	No	No	No	No
Yes	Yes	Yes	Yes	Yes	No	No
No	No	Yes	Yes	No	No	No
Yes	No	Yes	Yes	Yes	Yes	No
No	Yes	Yes	No	Yes	No	Yes
No	No	Yes	No	No	Yes	No
No	No	No	No	No	Yes	No
No	Yes	Yes	Yes	No	No	No
No	No	Yes	Yes	No	No	No
No	No	Yes	No	No	No	No
No	No	Yes	No	No	Yes	No
No	No	Yes	No	Yes	No	No
Yes	Yes	No	No	Yes	Yes	Yes
No	No	Yes	Yes	Yes	Yes	Yes
No	No	Yes	Yes	Yes	No	Yes
No	No	Yes	Yes	Yes	No	No
Yes	No	No	Yes	Yes	Yes	Yes
Yes	No	No	Yes	Yes	Yes	Yes
Yes	No	No	Yes	Yes	Yes	Yes
Yes	No	Yes	Yes	Yes	Yes	Yes
Yes	Yes	No	No	Yes	Yes	Yes
No	No	Yes	Yes	No	No	No
Yes	Yes	Yes	Yes	Yes	Yes	Yes
No	No	No	No	No	No	Yes
No	No	No	No	No	No	No
No	No	No	Yes	Yes	No	Yes
Yes	No	Yes	No	No	No	No
No	No	Yes	Yes	No	No	No
No	Yes	No	No	Yes	No	Yes
No	Yes	Yes	Yes	No	No	No
No	Yes	Yes	No	Yes	No	No
No	No	Yes	No	No	No	No
Yes	Yes	No	No	Yes	No	Yes
No	No	Yes	No	No	No	No
Yes	Yes	No	Yes	Yes	Yes	No
No	Yes	Yes	Yes	Yes	No	No
No	No	Yes	Yes	Yes	No	No
No	No	No	No	No	No	No
Yes	Yes	No	No	Yes	Yes	Yes
Yes	No	No	No	No	No	No

Appendix B

from __future__ import annotations

import argparse
import csv
import itertools
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Sequence, Tuple

try:
    import openpyxl  # type: ignore
except Exception:
    openpyxl = None

SignWord7 = Tuple[int, int, int, int, int, int, int]
SignWord4 = Tuple[int, int, int, int]

# -------------------------------
# Core coding-theory utilities
# -------------------------------

def hamming_distance(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(int(a != b) for a, b in zip(u, v))


def sign_to_bit(x: int) -> int:
    # Sign convention of the article: -1 <-> bit 1, +1 <-> bit 0.
    if x not in (-1, 1):
        raise ValueError(f"Expected ±1, got {x!r}")
    return 1 if x == -1 else 0


def bit_to_sign(x: int) -> int:
    if x not in (0, 1):
        raise ValueError(f"Expected 0/1, got {x!r}")
    return -1 if x == 1 else 1


def word_to_bit_string(word: Sequence[int]) -> str:
    return "".join(str(sign_to_bit(x)) for x in word)


def all_sign_words(n: int) -> List[Tuple[int, ...]]:
    return [tuple(word) for word in itertools.product([-1, 1], repeat=n)]


def article_encoder_formula_consistent(info: SignWord4) -> SignWord7:
    # Check symbols are B_i = s * a_i with s = a1 * a2 * a3 * a4.
    a1, a2, a3, a4 = info
    s = a1 * a2 * a3 * a4
    return (a1, a2, a3, a4, s * a1, s * a2, s * a3)


def article_decoder_exact_published(word: SignWord7) -> SignWord4:
    A1, A2, A3, A4, B1, B2, B3 = word
    W = A1 * B1 + A2 * B2 + A3 * B3
    s = A1 * A2 * A3 * A4
    sign_W = 1 if W > 0 else -1
    s0 = sign_W * s
    if abs(W) == 3:
        return (A1, A2, A3, s0 * A4)
    if abs(W) == 1:
        if s0 == 1:
            return (A1, A2, A3, A4)
        return (s * B1, s * B2, s * B3, s * A4)
    raise ValueError(f"Unexpected W={W}. For ±1 coding this should not happen.")


def nearest_codeword_decoder(word: SignWord7, codebook: Dict[SignWord4, SignWord7]) -> Tuple[SignWord4, int]:
    best_profile = None
    best_dist = None
    tie_count = 0
    for profile, codeword in codebook.items():
        d = hamming_distance(word, codeword)
        if best_dist is None or d < best_dist:
            best_profile = profile
            best_dist = d
            tie_count = 1
        elif d == best_dist:
            tie_count += 1
    if best_profile is None or best_dist is None:
        raise RuntimeError("Nearest-codeword decoder failed to initialize.")
    if tie_count != 1:
        raise RuntimeError(
            f"Nearest-codeword decoding is not unique for {word}. "
            f"Found {tie_count} codewords at distance {best_dist}."
        )
    return best_profile, best_dist


def build_codebook() -> Dict[SignWord4, SignWord7]:
    return {
        profile: article_encoder_formula_consistent(profile)
        for profile in all_sign_words(4)
    }


def summarize_class_distance_profiles(
    preimages: Dict[SignWord4, List[SignWord7]],
    codebook: Dict[SignWord4, SignWord7],
) -> Counter:
    # Count how many decoder classes share the same distance histogram.
    summary = Counter()
    for profile, words in preimages.items():
        target = codebook[profile]
        dist_counter = Counter(hamming_distance(word, target) for word in words)
        summary[tuple(sorted(dist_counter.items()))] += 1
    return summary

# -------------------------------
# File IO for answer tables
# -------------------------------

def normalize_answer(value) -> int:
    # "Yes"-like values map to -1 and "No"-like values to +1.
    if value is None:
        raise ValueError("Empty cell")
    if isinstance(value, bool):
        return -1 if value else 1
    s = str(value).strip().lower()
    mapping = {
        "yes": -1,
        "y": -1,
        "да": -1,
        "д": -1,
        "true": -1,
        "1": -1,
        "-1": -1,
        "no": 1,
        "n": 1,
        "нет": 1,
        "н": 1,
        "false": 1,
        "0": 1,
        "+1": 1,
        "1.0": -1,
        "0.0": 1,
    }
    if s not in mapping:
        raise ValueError(f"Unsupported answer value: {value!r}")
    return mapping[s]


def load_csv_table(path: Path) -> List[dict]:
    with path.open("r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        return list(reader)


def load_xlsx_table(path: Path, sheet_name: str | None) -> List[dict]:
    if openpyxl is None:
        raise RuntimeError("openpyxl is not available, so XLSX cannot be read.")
    wb = openpyxl.load_workbook(path, data_only=True)
    ws = wb[sheet_name] if sheet_name else wb.active
    rows = list(ws.iter_rows(values_only=True))
    if not rows:
        return []
    header = [str(x).strip() if x is not None else "" for x in rows[0]]
    out = []
    for row in rows[1:]:
        out.append({header[i]: row[i] if i < len(row) else None for i in range(len(header))})
    return out


def load_answer_table(path: Path, sheet_name: str | None) -> List[dict]:
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return load_csv_table(path)
    if suffix in {".xlsx", ".xlsm"}:
        return load_xlsx_table(path, sheet_name)
    raise ValueError("Supported formats: .csv, .xlsx, .xlsm")


def parse_question_order(order_str: str) -> List[str]:
    order = [x.strip() for x in order_str.split(",") if x.strip()]
    if len(order) != 7:
        raise ValueError("--question-order must contain exactly 7 comma-separated column names.")
    return order


def row_to_sign_word(row: dict, question_order: List[str]) -> SignWord7:
    try:
        return tuple(normalize_answer(row[col]) for col in question_order)  # type: ignore[return-value]
    except KeyError as e:
        raise KeyError(f"Column not found in answer table: {e.args[0]!r}") from e

# -------------------------------
# Writers
# -------------------------------

def write_exact_membership_csv(path: Path, preimages: Dict[SignWord4, List[SignWord7]], codebook: Dict[SignWord4, SignWord7]) -> None:
    rows = []
    for profile, words in sorted(preimages.items()):
        codeword = codebook[profile]
        for word in sorted(words):
            rows.append(
                {
                    "decoded_profile_sign": str(profile),
                    "decoded_profile_bits": word_to_bit_string(profile),
                    "reference_codeword_sign": str(codeword),
                    "reference_codeword_bits": word_to_bit_string(codeword),
                    "input_sign": str(word),
                    "input_bits": word_to_bit_string(word),
                    "distance_to_reference_codeword": hamming_distance(word, codeword),
                }
            )
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_comparison_csv(path: Path, codebook: Dict[SignWord4, SignWord7]) -> Dict[str, int]:
    rows = []
    mismatch_count = 0
    exact_farther_than_nearest = 0
    for word in all_sign_words(7):
        exact_profile = article_decoder_exact_published(word)
        exact_codeword = codebook[exact_profile]
        exact_dist = hamming_distance(word, exact_codeword)
        nn_profile, nn_dist = nearest_codeword_decoder(word, codebook)
        nn_codeword = codebook[nn_profile]
        mismatch = exact_profile != nn_profile
        mismatch_count += int(mismatch)
        exact_farther_than_nearest += int(exact_dist > nn_dist)
        rows.append(
            {
                "input_sign": str(word),
                "input_bits": word_to_bit_string(word),
                "exact_profile_sign": str(exact_profile),
                "exact_profile_bits": word_to_bit_string(exact_profile),
                "exact_codeword_sign": str(exact_codeword),
                "exact_codeword_bits": word_to_bit_string(exact_codeword),
                "exact_distance": exact_dist,
                "nearest_profile_sign": str(nn_profile),
                "nearest_profile_bits": word_to_bit_string(nn_profile),
                "nearest_codeword_sign": str(nn_codeword),
                "nearest_codeword_bits": word_to_bit_string(nn_codeword),
                "nearest_distance": nn_dist,
                "profiles_match": str(not mismatch),
            }
        )
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return {
        "mismatch_count": mismatch_count,
        "exact_farther_than_nearest": exact_farther_than_nearest,
    }


def write_answers_processed_csv(path: Path, processed_rows: List[dict]) -> None:
    if not processed_rows:
        return
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(processed_rows[0].keys()))
        writer.writeheader()
        writer.writerows(processed_rows)


def write_profile_summary_csv(path: Path, processed_rows: List[dict]) -> None:
    counter = Counter((row["exact_profile_bits"] for row in processed_rows))
    rows = [{"exact_profile_bits": k, "count": v} for k, v in sorted(counter.items())]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["exact_profile_bits", "count"])
        writer.writeheader()
        writer.writerows(rows)


def write_yes_frequency_csv(path: Path, raw_rows: List[dict], question_order: List[str]) -> None:
    rows = []
    total = len(raw_rows)
    for col in question_order:
        yes_count = 0
        for row in raw_rows:
            yes_count += int(normalize_answer(row[col]) == -1)
        rows.append({"question": col, "yes_count": yes_count, "yes_percent": round(100 * yes_count / total, 6) if total else 0.0})
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["question", "yes_count", "yes_percent"])
        writer.writeheader()
        writer.writerows(rows)

# -------------------------------
# Main
# -------------------------------

def main() -> None:
    parser = argparse.ArgumentParser(description="Verify the non-classical decoder and optionally process a real answer table.")
    parser.add_argument("--outdir", type=Path, default=Path.cwd(), help="Directory for outputs.")
    parser.add_argument("--answers", type=Path, help="Optional CSV/XLSX answer table with 7 columns.")
    parser.add_argument("--sheet", type=str, default=None, help="Excel sheet name. If omitted, the active sheet is used.")
    parser.add_argument(
        "--question-order",
        type=str,
        default="C,M,O,H,L,N,Q",
        help="Comma-separated list of the 7 column names in the order used to form the 7-bit word.",
    )
    args = parser.parse_args()

    outdir = args.outdir.resolve()
    outdir.mkdir(parents=True, exist_ok=True)

    codebook = build_codebook()
    all_words = all_sign_words(7)

    exact_preimages: Dict[SignWord4, List[SignWord7]] = defaultdict(list)
    for word in all_words:
        exact_preimages[article_decoder_exact_published(word)].append(word)

    exact_class_sizes = Counter(len(words) for words in exact_preimages.values())
    exact_distance_profiles = summarize_class_distance_profiles(exact_preimages, codebook)

    exact_membership_csv = outdir / "exact_decoder_membership.csv"
    comparison_csv = outdir / "comparison_exact_vs_nearest.csv"
    report_txt = outdir / "verification_report.txt"

    write_exact_membership_csv(exact_membership_csv, exact_preimages, codebook)
    comparison_stats = write_comparison_csv(comparison_csv, codebook)

    with report_txt.open("w", encoding="utf-8") as f:
        f.write("=" * 64 + "\n\n")
        f.write("A. Published decoder tested\n")
        f.write("- Encoder: formula-consistent reading of the manuscript, B = s*(a1, a2, a3).\n")
        f.write("- Decoder: Eqs. (6)–(9) exactly as printed.\n\n")
        f.write("B. Exhaustive enumeration\n")
        f.write(f"- Total 7-bit input words checked: {len(all_words)}\n")
        f.write(f"- Total decoded 4-bit profiles obtained: {len(exact_preimages)}\n")
        f.write(f"- Exact decoder class-size distribution: {dict(sorted(exact_class_sizes.items()))}\n\n")
        f.write("C. Distance structure inside exact-decoder classes\n")
        for profile_signature, count in sorted(exact_distance_profiles.items()):
            f.write(f"- {count} classes have distance profile {profile_signature}\n")
        f.write("\n")
        f.write("D. Comparison with classical nearest-codeword decoding\n")
        f.write("- Reference decoder: unique nearest-codeword decoder on the same [7,4,3] codebook.\n")
        f.write(f"- Exact-published decoder vs nearest-codeword profile mismatches: {comparison_stats['mismatch_count']} / 128\n")
        f.write(
            f"- Cases where exact-published decoder sends the input farther from the assigned codeword than the nearest-codeword decoder: {comparison_stats['exact_farther_than_nearest']} / 128\n\n"
        )
        f.write("E. Plain-language interpretation\n")
        f.write("- The published decoder is deterministic and maps all 128 inputs into 16 reduced 4-bit profiles.\n")

    if args.answers:
        question_order = parse_question_order(args.question_order)
        raw_rows = load_answer_table(args.answers.resolve(), args.sheet)
        if not raw_rows:
            raise RuntimeError("The answer table appears to be empty.")

        processed_rows = []
        for i, row in enumerate(raw_rows, start=1):
            sign_word = row_to_sign_word(row, question_order)
            exact_profile = article_decoder_exact_published(sign_word)
            exact_codeword = codebook[exact_profile]
            nn_profile, nn_dist = nearest_codeword_decoder(sign_word, codebook)
            processed = {
                "row_index": i,
                "input_bits": word_to_bit_string(sign_word),
                "input_sign": str(sign_word),
                "exact_profile_bits": word_to_bit_string(exact_profile),
                "exact_profile_sign": str(exact_profile),
                "exact_codeword_bits": word_to_bit_string(exact_codeword),
                "distance_to_exact_codeword": hamming_distance(sign_word, exact_codeword),
                "nearest_profile_bits": word_to_bit_string(nn_profile),
                "nearest_distance": nn_dist,
                "profiles_match": str(exact_profile == nn_profile),
            }
            for col in question_order:
                processed[f"raw_{col}"] = row[col]
            processed_rows.append(processed)

        write_answers_processed_csv(outdir / "answers_processed.csv", processed_rows)
        write_profile_summary_csv(outdir / "answers_profile_summary.csv", processed_rows)
        write_yes_frequency_csv(outdir / "answers_yes_frequency.csv", raw_rows, question_order)

        with report_txt.open("a", encoding="utf-8") as f:
            f.write("\nF. Real answer table processed\n")
            f.write(f"- Source file: {args.answers.resolve()}\n")
            f.write(f"- Number of answer rows: {len(raw_rows)}\n")
            f.write(f"- Question order used to form the 7-bit word: {','.join(question_order)}\n")
            prof_counter = Counter(row["exact_profile_bits"] for row in processed_rows)
            f.write(f"- Non-empty decoded profiles in the real dataset: {len(prof_counter)}\n")
            f.write(f"- Most frequent decoded profiles: {prof_counter.most_common(10)}\n")

    print(f"Wrote: {report_txt}")
    print(f"Wrote: {exact_membership_csv}")
    print(f"Wrote: {comparison_csv}")
    if args.answers:
        print(f"Wrote: {outdir / 'answers_processed.csv'}")
        print(f"Wrote: {outdir / 'answers_profile_summary.csv'}")
        print(f"Wrote: {outdir / 'answers_yes_frequency.csv'}")


if __name__ == "__main__":
    main()

Decoded results

decoded_profile_sign,decoded_profile_bits,reference_codeword_sign,reference_codeword_bits,input_sign,input_bits,distance_to_reference_codeword

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,0

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, -1, -1, -1, -1, -1, 1)”,1111110,1

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, -1, -1, -1, -1, 1, -1)”,1111101,1

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, -1, -1, -1, 1, -1, -1)”,1111011,1

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, -1, -1, 1, -1, -1, -1)”,1110111,1

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, 1, 1, -1, -1, -1, -1)”,1001111,2

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(-1, 1, 1, 1, 1, 1, 1)”,1000000,6

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(1, -1, 1, -1, -1, -1, -1)”,0101111,2

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(1, -1, 1, 1, 1, 1, 1)”,0100000,6

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(1, 1, -1, -1, -1, -1, -1)”,0011111,2

(-1, -1, -1, -1),1111,“(-1, -1, -1, -1, -1, -1, -1)”,1111111,“(1, 1, -1, 1, 1, 1, 1)”,0010000,6

(-1, -1, -1, 1),1110,“(-1, -1, -1, 1, 1, 1, 1)”,1110000,“(-1, -1, -1, -1, 1, 1, 1)”,1111000,1

(-1, -1, -1, 1),1110,“(-1, -1, -1, 1, 1, 1, 1)”,1110000,“(-1, -1, -1, 1, -1, 1, 1)”,1110100,1

(-1, -1, -1, 1),1110,“(-1, -1, -1, 1, 1, 1, 1)”,1110000,“(-1, -1, -1, 1, 1, -1, 1)”,1110010,1

(-1, -1, -1, 1),1110,“(-1, -1, -1, 1, 1, 1, 1)”,1110000,“(-1, -1, -1, 1, 1, 1, -1)”,1110001,1

(-1, -1, -1, 1),1110,“(-1, -1, -1, 1, 1, 1, 1)”,1110000,“(-1, -1, -1, 1, 1, 1, 1)”,1110000,0

(-1, -1, 1, -1),1101,“(-1, -1, 1, -1, 1, 1, -1)”,1101001,“(-1, -1, 1, -1, -1, 1, -1)”,1101101,1

(-1, -1, 1, -1),1101,“(-1, -1, 1, -1, 1, 1, -1)”,1101001,“(-1, -1, 1, -1, 1, -1, -1)”,1101011,1

(-1, -1, 1, -1),1101,“(-1, -1, 1, -1, 1, 1, -1)”,1101001,“(-1, -1, 1, -1, 1, 1, -1)”,1101001,0

(-1, -1, 1, -1),1101,“(-1, -1, 1, -1, 1, 1, -1)”,1101001,“(-1, -1, 1, -1, 1, 1, 1)”,1101000,1

(-1, -1, 1, -1),1101,“(-1, -1, 1, -1, 1, 1, -1)”,1101001,“(-1, -1, 1, 1, 1, 1, -1)”,1100001,1

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, -1, 1, -1, -1, -1, 1)”,1101110,1

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, -1, 1, 1, -1, -1, -1)”,1100111,1

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,0

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, -1, 1, 1, -1, 1, 1)”,1100100,1

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, -1, 1, 1, 1, -1, 1)”,1100010,1

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, 1, -1, -1, 1, 1, -1)”,1011001,6

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(-1, 1, -1, 1, -1, -1, 1)”,1010110,2

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(1, -1, -1, -1, 1, 1, -1)”,0111001,6

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(1, -1, -1, 1, -1, -1, 1)”,0110110,2

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(1, 1, 1, -1, 1, 1, -1)”,0001001,6

(-1, -1, 1, 1),1100,“(-1, -1, 1, 1, -1, -1, 1)”,1100110,“(1, 1, 1, 1, -1, -1, 1)”,0000110,2

(-1, 1, -1, -1),1011,“(-1, 1, -1, -1, 1, -1, 1)”,1011010,“(-1, 1, -1, -1, -1, -1, 1)”,1011110,1

(-1, 1, -1, -1),1011,“(-1, 1, -1, -1, 1, -1, 1)”,1011010,“(-1, 1, -1, -1, 1, -1, -1)”,1011011,1

(-1, 1, -1, -1),1011,“(-1, 1, -1, -1, 1, -1, 1)”,1011010,“(-1, 1, -1, -1, 1, -1, 1)”,1011010,0

(-1, 1, -1, -1),1011,“(-1, 1, -1, -1, 1, -1, 1)”,1011010,“(-1, 1, -1, -1, 1, 1, 1)”,1011000,1

(-1, 1, -1, -1),1011,“(-1, 1, -1, -1, 1, -1, 1)”,1011010,“(-1, 1, -1, 1, 1, -1, 1)”,1010010,1

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, -1, 1, -1, 1, -1, 1)”,1101010,6

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, -1, 1, 1, -1, 1, -1)”,1100101,2

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, 1, -1, -1, -1, 1, -1)”,1011101,1

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, 1, -1, 1, -1, -1, -1)”,1010111,1

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,0

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, 1, -1, 1, -1, 1, 1)”,1010100,1

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(-1, 1, -1, 1, 1, 1, -1)”,1010001,1

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(1, -1, -1, -1, 1, -1, 1)”,0111010,6

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(1, -1, -1, 1, -1, 1, -1)”,0110101,2

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(1, 1, 1, -1, 1, -1, 1)”,0001010,6

(-1, 1, -1, 1),1010,“(-1, 1, -1, 1, -1, 1, -1)”,1010101,“(1, 1, 1, 1, -1, 1, -1)”,0000101,2

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, -1, -1, -1, -1, 1, 1)”,1111100,2

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, -1, -1, 1, 1, -1, -1)”,1110011,6

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, 1, 1, -1, -1, -1, 1)”,1001110,1

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, 1, 1, -1, -1, 1, -1)”,1001101,1

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,0

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, 1, 1, -1, 1, 1, 1)”,1001000,1

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(-1, 1, 1, 1, -1, 1, 1)”,1000100,1

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(1, -1, 1, -1, -1, 1, 1)”,0101100,2

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(1, -1, 1, 1, 1, -1, -1)”,0100011,6

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(1, 1, -1, -1, -1, 1, 1)”,0011100,2

(-1, 1, 1, -1),1001,“(-1, 1, 1, -1, -1, 1, 1)”,1001100,“(1, 1, -1, 1, 1, -1, -1)”,0010011,6

(-1, 1, 1, 1),1000,“(-1, 1, 1, 1, 1, -1, -1)”,1000011,“(-1, 1, 1, -1, 1, -1, -1)”,1001011,1

(-1, 1, 1, 1),1000,“(-1, 1, 1, 1, 1, -1, -1)”,1000011,“(-1, 1, 1, 1, -1, -1, -1)”,1000111,1

(-1, 1, 1, 1),1000,“(-1, 1, 1, 1, 1, -1, -1)”,1000011,“(-1, 1, 1, 1, 1, -1, -1)”,1000011,0

(-1, 1, 1, 1),1000,“(-1, 1, 1, 1, 1, -1, -1)”,1000011,“(-1, 1, 1, 1, 1, -1, 1)”,1000010,1

(-1, 1, 1, 1),1000,“(-1, 1, 1, 1, 1, -1, -1)”,1000011,“(-1, 1, 1, 1, 1, 1, -1)”,1000001,1

(1, -1, -1, -1),0111,“(1, -1, -1, -1, -1, 1, 1)”,0111100,“(1, -1, -1, -1, -1, -1, 1)”,0111110,1

(1, -1, -1, -1),0111,“(1, -1, -1, -1, -1, 1, 1)”,0111100,“(1, -1, -1, -1, -1, 1, -1)”,0111101,1

(1, -1, -1, -1),0111,“(1, -1, -1, -1, -1, 1, 1)”,0111100,“(1, -1, -1, -1, -1, 1, 1)”,0111100,0

(1, -1, -1, -1),0111,“(1, -1, -1, -1, -1, 1, 1)”,0111100,“(1, -1, -1, -1, 1, 1, 1)”,0111000,1

(1, -1, -1, -1),0111,“(1, -1, -1, -1, -1, 1, 1)”,0111100,“(1, -1, -1, 1, -1, 1, 1)”,0110100,1

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(-1, -1, 1, -1, -1, 1, 1)”,1101100,6

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(-1, -1, 1, 1, 1, -1, -1)”,1100011,2

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(-1, 1, -1, -1, -1, 1, 1)”,1011100,6

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(-1, 1, -1, 1, 1, -1, -1)”,1010011,2

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, -1, -1, -1, 1, -1, -1)”,0111011,1

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, -1, -1, 1, -1, -1, -1)”,0110111,1

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, -1, -1, 1, 1, -1, -1)”,0110011,0

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, -1, -1, 1, 1, -1, 1)”,0110010,1

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, -1, -1, 1, 1, 1, -1)”,0110001,1

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, 1, 1, -1, -1, 1, 1)”,0001100,6

(1, -1, -1, 1),0110,“(1, -1, -1, 1, 1, -1, -1)”,0110011,“(1, 1, 1, 1, 1, -1, -1)”,0000011,2

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(-1, -1, -1, -1, 1, -1, 1)”,1111010,2

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(-1, -1, -1, 1, -1, 1, -1)”,1110101,6

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(-1, 1, 1, -1, 1, -1, 1)”,1001010,2

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(-1, 1, 1, 1, -1, 1, -1)”,1000101,6

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, -1, 1, -1, -1, -1, 1)”,0101110,1

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, -1, 1, -1, 1, -1, -1)”,0101011,1

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, -1, 1, -1, 1, -1, 1)”,0101010,0

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, -1, 1, -1, 1, 1, 1)”,0101000,1

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, -1, 1, 1, 1, -1, 1)”,0100010,1

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, 1, -1, -1, 1, -1, 1)”,0011010,2

(1, -1, 1, -1),0101,“(1, -1, 1, -1, 1, -1, 1)”,0101010,“(1, 1, -1, 1, -1, 1, -1)”,0010101,6

(1, -1, 1, 1),0100,“(1, -1, 1, 1, -1, 1, -1)”,0100101,“(1, -1, 1, -1, -1, 1, -1)”,0101101,1

(1, -1, 1, 1),0100,“(1, -1, 1, 1, -1, 1, -1)”,0100101,“(1, -1, 1, 1, -1, -1, -1)”,0100111,1

(1, -1, 1, 1),0100,“(1, -1, 1, 1, -1, 1, -1)”,0100101,“(1, -1, 1, 1, -1, 1, -1)”,0100101,0

(1, -1, 1, 1),0100,“(1, -1, 1, 1, -1, 1, -1)”,0100101,“(1, -1, 1, 1, -1, 1, 1)”,0100100,1

(1, -1, 1, 1),0100,“(1, -1, 1, 1, -1, 1, -1)”,0100101,“(1, -1, 1, 1, 1, 1, -1)”,0100001,1

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(-1, -1, -1, -1, 1, 1, -1)”,1111001,2

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(-1, -1, -1, 1, -1, -1, 1)”,1110110,6

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(-1, 1, 1, -1, 1, 1, -1)”,1001001,2

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(-1, 1, 1, 1, -1, -1, 1)”,1000110,6

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, -1, 1, -1, 1, 1, -1)”,0101001,2

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, -1, 1, 1, -1, -1, 1)”,0100110,6

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, 1, -1, -1, -1, 1, -1)”,0011101,1

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, 1, -1, -1, 1, -1, -1)”,0011011,1

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, 1, -1, -1, 1, 1, -1)”,0011001,0

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, 1, -1, -1, 1, 1, 1)”,0011000,1

(1, 1, -1, -1),0011,“(1, 1, -1, -1, 1, 1, -1)”,0011001,“(1, 1, -1, 1, 1, 1, -1)”,0010001,1

(1, 1, -1, 1),0010,“(1, 1, -1, 1, -1, -1, 1)”,0010110,“(1, 1, -1, -1, -1, -1, 1)”,0011110,1

(1, 1, -1, 1),0010,“(1, 1, -1, 1, -1, -1, 1)”,0010110,“(1, 1, -1, 1, -1, -1, -1)”,0010111,1

(1, 1, -1, 1),0010,“(1, 1, -1, 1, -1, -1, 1)”,0010110,“(1, 1, -1, 1, -1, -1, 1)”,0010110,0

(1, 1, -1, 1),0010,“(1, 1, -1, 1, -1, -1, 1)”,0010110,“(1, 1, -1, 1, -1, 1, 1)”,0010100,1

(1, 1, -1, 1),0010,“(1, 1, -1, 1, -1, -1, 1)”,0010110,“(1, 1, -1, 1, 1, -1, 1)”,0010010,1

(1, 1, 1, -1),0001,“(1, 1, 1, -1, -1, -1, -1)”,0001111,“(1, 1, 1, -1, -1, -1, -1)”,0001111,0

(1, 1, 1, -1),0001,“(1, 1, 1, -1, -1, -1, -1)”,0001111,“(1, 1, 1, -1, -1, -1, 1)”,0001110,1

(1, 1, 1, -1),0001,“(1, 1, 1, -1, -1, -1, -1)”,0001111,“(1, 1, 1, -1, -1, 1, -1)”,0001101,1

(1, 1, 1, -1),0001,“(1, 1, 1, -1, -1, -1, -1)”,0001111,“(1, 1, 1, -1, 1, -1, -1)”,0001011,1

(1, 1, 1, -1),0001,“(1, 1, 1, -1, -1, -1, -1)”,0001111,“(1, 1, 1, 1, -1, -1, -1)”,0000111,1

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(-1, -1, 1, -1, -1, -1, -1)”,1101111,6

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(-1, -1, 1, 1, 1, 1, 1)”,1100000,2

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(-1, 1, -1, -1, -1, -1, -1)”,1011111,6

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(-1, 1, -1, 1, 1, 1, 1)”,1010000,2

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, -1, -1, -1, -1, -1, -1)”,0111111,6

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, -1, -1, 1, 1, 1, 1)”,0110000,2

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, 1, 1, -1, 1, 1, 1)”,0001000,1

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, 1, 1, 1, -1, 1, 1)”,0000100,1

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, 1, 1, 1, 1, -1, 1)”,0000010,1

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, 1, 1, 1, 1, 1, -1)”,0000001,1

(1, 1, 1, 1),0000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,“(1, 1, 1, 1, 1, 1, 1)”,0000000,0
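
As an illustrative sketch (not the authors' implementation), individual rows of the table above can be cross-checked programmatically. The snippet assumes the column layout suggested by the data: a four-element information vector in ±1 form, its binary form (−1 written as 1 and +1 as 0), the corresponding seven-bit codeword, an observed seven-element sequence, and a final column that appears to be the Hamming distance between the codeword and the observed sequence. The parity rule in `encode` is inferred from the codeword column of this table rather than taken from the text, and the function names `to_bits`, `encode` and `hamming_distance` are hypothetical.

```python
def to_bits(signs):
    """Map a tuple of +1/-1 values to a bit string (-1 -> '1', +1 -> '0')."""
    return "".join("1" if s == -1 else "0" for s in signs)


def encode(info_bits):
    """Append three parity bits to four information bits.

    Assumption: the parity rule below is inferred from the codeword
    column of the table (e.g. 1000 -> 1000011, 0100 -> 0100101).
    """
    d1, d2, d3, d4 = (int(b) for b in info_bits)
    p1 = d2 ^ d3 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d1 ^ d2 ^ d4
    return info_bits + f"{p1}{p2}{p3}"


def hamming_distance(a, b):
    """Count the positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Reproduce one row: (-1, -1, 1, 1), 1100, ..., 1100110, ..., 1010110, 2
info = to_bits((-1, -1, 1, 1))
codeword = encode(info)
observed = to_bits((-1, 1, -1, 1, -1, -1, 1))
print(info, codeword, observed, hamming_distance(codeword, observed))
```

Running the same three functions over every row is a simple way to detect transcription errors in the binary and distance columns.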

References

1. Bekkers, V.; Edwards, A.; de Kool, D. Social media monitoring: Responsive governance in the shadow of surveillance? Gov. Inf. Q. 2013, 30, 335–342.
2. Smailova, A.; Kuzembayeva, A.; Utkelbay, R.; Kukeyeva, F.; Baizakova, K. Public opinion monitoring technologies: How the state uses data to make political decisions. Front. Political Sci. 2025, 7, 1680172.
3. Reveilhac, M.; Steinmetz, S.; Morselli, D. A systematic literature review of how and whether social media data can complement traditional survey data to study public opinion. Multimed. Tools Appl. 2022, 81, 10107–10142.
4. Zhang, Y.; Chen, F.; Rohe, K. Social media public opinion as flocks in a murmuration: Conceptualizing and measuring opinion expression on social media. J. Comput. Mediat. Commun. 2022, 27, zmab021.
5. Cortis, K.; Davis, B. Over a decade of social opinion mining: A systematic review. Artif. Intell. Rev. 2021, 54, 4873–4965.
6. Dong, X.; Lian, Y. A review of social media-based public opinion analyses: Challenges and recommendations. Technol. Soc. 2021, 67, 101724.
7. Blank, G.; Lutz, C. Representativeness of social media in Great Britain: Investigating Facebook, Linkedin, Twitter, Pinterest, Google+, and Instagram. Am. Behav. Sci. 2017, 61, 741–756.
8. Olteanu, A.; Castillo, C.; Diaz, F.; Kıcıman, E. Social data: Biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2019, 2, 13.
9. Fung, I.C.H.; Tse, Z.T.H.; Fu, K.W. The use of social media in public health surveillance. West. Pac. Surveill. Response J. WPSAR 2015, 6, 3.
10. Soong, H.C.; Jalil, N.B.A.; Ayyasamy, R.K.; Akbar, R. The essential of sentiment analysis and opinion mining in social media: Introduction and survey of the recent approaches and techniques. In 2019 IEEE 9th Symposium on Computer Applications & Industrial Electronics (ISCAIE); IEEE: Piscataway, NJ, USA, 2019; pp. 272–277.
11. Mredula, M.S.; Dey, N.; Rahman, M.S.; Mahmud, I.; Cho, Y.Z. A review on the trends in event detection by analyzing social media platforms’ data. Sensors 2022, 22, 4531.
12. Han, Z.; Shi, L.; Liu, L.; Jiang, L.; Fang, J.; Lin, F.; Zhang, J.; Panneerselvam, J.; Antonopoulos, N. A survey on event tracking in social media data streams. Big Data Min. Anal. 2023, 7, 217–243.
13. Rodríguez-Ibánez, M.; Casánez-Ventura, A.; Castejón-Mateos, F.; Cuenca-Jiménez, P.M. A review on sentiment analysis from social media platforms. Expert Syst. Appl. 2023, 223, 119862.
14. Bakirov, A.; Suleimenov, I. Theoretical Bases of Methods of Counteraction to Modern Forms of Information Warfare. Computers 2025, 14, 410.
15. Grahn, H.; Pamment, J. Exploitation of Psychological Processes in Information Influence Operations: Insights from Cognitive Science; Working Papers; Lund University Psychological Defence Research Institute: Helsingborg, Sweden, 2024.
16. Ferrara, E.; Cresci, S.; Luceri, L. Misinformation, manipulation, and abuse on social media in the era of COVID-19. J. Comput. Soc. Sci. 2020, 3, 271–277.
17. Miller, S. Cognitive warfare: An ethical analysis. Ethics Inf. Technol. 2023, 25, 46.
18. Deppe, C.; Schaal, G.S. Cognitive warfare: A conceptual analysis of the NATO ACT cognitive warfare exploratory concept. Front. Big Data 2024, 7, 1452129.
19. Chang, H.T.; Tsai, F.C. A systematic review of Internet public opinion manipulation. Procedia Comput. Sci. 2022, 207, 3159–3166.
20. Montañez, R.; Golob, E.; Xu, S. Human cognition through the lens of social engineering cyberattacks. Front. Psychol. 2020, 11, 1755.
21. Bolton, D. Targeting ontological security: Information warfare in the modern age. Political Psychol. 2021, 42, 127–142.
22. Vilarino del Castillo, D.; Lopez-Zafra, E. Antecedents of psychological capital at work: A systematic review of moderator–mediator effects and a new integrative proposal. Eur. Manag. Rev. 2022, 19, 154–169.
23. Dov Bachmann, S.D.; Putter, D.; Duczynski, G. Hybrid warfare and disinformation: A Ukraine war perspective. Glob. Policy 2023, 14, 858–869.
24. Fenstermacher, L.; Uzcha, D.; Larson, K.; Vitiello, C.; Shellman, S. New perspectives on cognitive warfare. In Signal Processing, Sensor/Information Fusion, and Target Recognition XXXII; SPIE: Bellingham, WA, USA, 2023; Volume 12547, pp. 172–187.
25. Salahdine, F.; Kaabouch, N. Social engineering attacks: A survey. Future Internet 2019, 11, 89.
26. Syafitri, W.; Shukur, Z.; Asma’Mokhtar, U.; Sulaiman, R.; Ibrahim, M.A. Social engineering attacks prevention: A systematic literature review. IEEE Access 2022, 10, 39325–39343.
27. Frauenstein, E.D.; Flowerday, S. Susceptibility to phishing on social network sites: A personality information processing model. Comput. Secur. 2020, 94, 101862.
28. Norris, G.; Brookes, A.; Dowell, D. The psychology of internet fraud victimisation: A systematic review. J. Police Crim. Psychol. 2019, 34, 231–245.
29. Surjatmodjo, D.; Unde, A.A.; Cangara, H.; Sonni, A.F. Information pandemic: A critical review of disinformation spread on social media and its implications for state resilience. Soc. Sci. 2024, 13, 418.
30. Tan, K.L.; Lee, C.P.; Lim, K.M. A survey of sentiment analysis: Approaches, datasets, and future research. Appl. Sci. 2023, 13, 4550.
31. Das, R.; Singh, T.D. Multimodal sentiment analysis: A survey of methods, trends, and challenges. ACM Comput. Surv. 2023, 55, 1–38.
32. Lin, B.; Cassee, N.; Serebrenik, A.; Bavota, G.; Novielli, N.; Lanza, M. Opinion mining for software development: A systematic literature review. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2022, 31, 1–41.
33. Shaik, T.; Tao, X.; Dann, C.; Xie, H.; Li, Y.; Galligan, L. Sentiment analysis and opinion mining on educational data: A survey. Nat. Lang. Process. J. 2023, 2, 100003.
34. Messaoudi, C.; Guessoum, Z.; Ben Romdhane, L. Opinion mining in online social media: A survey. Soc. Netw. Anal. Min. 2022, 12, 25.
35. Weng, J.; Lee, B.S. Event detection in Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011; Volume 5, pp. 401–408.
36. Yu, M.; Bambacus, M.; Cervone, G.; Clarke, K.; Duffy, D.; Huang, Q.; Li, J.; Li, W.; Li, Z.; Liu, Q.; et al. Spatiotemporal event detection: A review. Int. J. Digit. Earth 2020, 13, 1339–1365.
37. Kumar, P.; Sinha, A. Information diffusion modeling and analysis for socially interacting networks. Soc. Netw. Anal. Min. 2021, 11, 11.
38. Peng, H.; Zhang, J.; Huang, X.; Hao, Z.; Li, A.; Yu, Z.; Yu, P.S. Unsupervised social bot detection via structural information theory. ACM Trans. Inf. Syst. 2024, 42, 1–42.
39. Onnela, J.P. Opportunities and challenges in the collection and analysis of digital phenotyping data. Neuropsychopharmacology 2021, 46, 45–54.
40. Bufano, P.; Laurino, M.; Said, S.; Tognetti, A.; Menicucci, D. Digital phenotyping for monitoring mental disorders: Systematic review. J. Med. Internet Res. 2023, 25, e46778.
41. Skaik, R.; Inkpen, D. Using social media for mental health surveillance: A review. ACM Comput. Surv. (CSUR) 2020, 53, 1–31.
42. Colpe, L.J.; Freeman, E.J.; Strine, T.W.; Dhingra, S.; McGuire, L.C.; Elam-Evans, L.D.; Perry, G.S. Public health surveillance for mental health. Prev. Chronic Dis. 2009, 7, A17.
43. Suleimenov, I.E.; Vitulyova, Y.S.; Kabdushev, S.B.; Bakirov, A.S. Improving the efficiency of using multivalued logic tools. Sci. Rep. 2023, 13, 1108.
44. Suleimenov, I.E.; Vitulyova, Y.S.; Kabdushev, S.B.; Bakirov, A.S. Improving the efficiency of using multivalued logic tools: Application of algebraic rings. Sci. Rep. 2023, 13, 22021.
45. Fenn, J.; Tan, C.S.; George, S. Development, validation and translation of psychological tests. BJPsych Adv. 2020, 26, 306–315.
46. Stefana, A.; Damiani, S.; Granziol, U.; Provenzani, U.; Solmi, M.; Youngstrom, E.A.; Fusar-Poli, P. Psychological, psychiatric, and behavioral sciences measurement scales: Best practice guidelines for their development and validation. Front. Psychol. 2025, 15, 1494261.
47. Shaltykova, D.; Massalimova, A.; Vitulyova, Y.; Suleimenov, I. Algorithm for Obtaining Complete Irreducible Polynomials over Given Galois Field for New Method of Digital Monitoring of Information Space. Computers 2025, 14, 468.
48. Dietterich, T.G.; Bakiri, G. Error-correcting output codes: A general method for improving multiclass inductive learning programs. In The Mathematics of Generalization; CRC Press: Boca Raton, FL, USA, 2018; pp. 395–407.
49. Wang, L.N.; Wei, H.; Zheng, Y.; Dong, J.; Zhong, G. Deep error-correcting output codes. Algorithms 2023, 16, 555.
50. Dietterich, T.G.; Bakiri, G. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 1994, 2, 263–286.
51. Radeva, P.; Pujol, O.; Escalera, S. ECOC-ONE: A novel coding and decoding strategy. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; IEEE: New York, NY, USA, 2006; Volume 3, pp. 578–581.
52. Archana, R.; Jeevaraj, P.E. Deep learning models for digital image processing: A review. Artif. Intell. Rev. 2024, 57, 11.
53. Ahmed, S.F.; Bin Alam, S.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Ali, A.B.M.S.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617.
54. Escalera, S.; Pujol, O.; Radeva, P. Error-correcting ouput codes library. J. Mach. Learn. Res. 2010, 11, 661–664.
55. Yu, A.; Jing, S.; Lyu, N.; Wen, W.; Yan, Z. Error correction output codes for robust neural networks against weight-errors: A neural tangent kernel point of view. Adv. Neural Inf. Process. Syst. 2024, 37, 82493–82513.
56. Aly, S.A.; Klappenecker, A.; Sarvepalli, P.K. On quantum and classical BCH codes. IEEE Trans. Inf. Theory 2007, 53, 1183–1188.
57. Li, C.; Wu, P.; Liu, F. On two classes of primitive BCH codes and some related codes. IEEE Trans. Inf. Theory 2018, 65, 3830–3840.
58. Suleimenov, I.; Kostsova, M.; Grishina, A.; Matrassulova, D.; Vitulyova, Y. Empirical validation of the use of projective techniques in psychological testing using Galois fields. Front. Appl. Math. Stat. 2024, 10, 1455500.
59. Bakirov, A.S.; Suleimenov, I.E. On the possibility of implementing artificial intelligence systems based on error-correcting code algorithms. J. Theor. Appl. Inf. Technol. 2021, 99, 83–99.
60. Singh, A.K. Error detection and correction by Hamming code. In Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC), Jalgaon, India, 22–24 December 2016; IEEE: New York, NY, USA, 2016; pp. 35–37.
61. Hillier, C.; Balyan, V. Error Detection and Correction On-Board Nanosatellites Using Hamming Codes. J. Electr. Comput. Eng. 2019, 2019, 3905094.
62. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006; Volume 4, p. 738.
63. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009.
64. Collins, L.M.; Lanza, S.T. Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2013.

Figure 1. Mapping of set A onto set B, corresponding to the partition of set A into subsets A_i, each of which corresponds to one error-free codeword [47].

Figure 2. Frequency of “Yes” answers to the test questions (in percent); the questions are denoted by capital letters.

Figure 3. Histograms of the distributions of the values r_1, r_2, r_3 under permutations in the code sequences; (a–d) refer to the cases where position A_4 contains the questions marked with the letters H, L, Q and N, respectively.

Figure 4. Distribution of test results by code sequence numbers for various permutations of answers; (a) corresponds to the letter combination NQHL, (b) to NQLH, (c) to NHQL, (d) to NHLQ, (e) to NLQH, and (f) to NLHQ.
