Micro-Macro Modeling of Inherent Cognitive Biases in 5-Point Likert Scales: Uncovering the Non-Linearity of Critical Sample Sizes for Capturing Identical Statistical Populations

Kawahata, Yasuko

doi:10.3390/computation14050100

Open Access Article

by

Yasuko Kawahata

Faculty of Sociology, Rikkyo University, 3-34-1 Nishi-Ikebukuro, Toshima-ku, Tokyo 171-8501, Japan

Computation 2026, 14(5), 100; https://doi.org/10.3390/computation14050100

Submission received: 25 March 2026 / Revised: 21 April 2026 / Accepted: 23 April 2026 / Published: 27 April 2026

(This article belongs to the Section Computational Social Science)

Abstract

As social infrastructure intensively developed during the high economic growth period of the 1970s faces simultaneous aging, there is an urgent need to transition from conventional reactive maintenance to preventive maintenance utilizing various data (data-driven asset management). However, the greatest barrier in practice is that inspection data is unevenly distributed in analog formats such as paper and unstructured files, and relies heavily on the subjective visual evaluation of expert engineers (e.g., discrete graded evaluations from A to D). The intervention of this “Assessor Bias” makes it difficult to ensure the robustness required for direct statistical analysis. This paper serves as a bridge between this analog expert knowledge and quantitative data science. It formulates human cognitive conflicts (true state, peer pressure, avoidance of cognitive load) using the distance-decay model of the Analytic Hierarchy Process (AHP) and the Softmax function, constructing a micro-macro link model accompanied by stochastic variations. Through large-scale multi-agent simulations ($N = 10^{7}$) validating the model’s convergence, it was demonstrated that in long-tail distributions formed under peer pressure, macroscopic statistical distance metrics such as the Kullback-Leibler (KL) divergence ignore the fact that a small number of true signals are non-linearly suppressed, causing a statistical misinterpretation that “the error is within an acceptable range”. This implies that as long as macroscopic statistical indicators are over-trusted, signs of critical deterioration (minorities) will be structurally marginalized. Returning to the debate on “Homogeneity (Homogenität)” in German social statistics, this paper advocates that in order to realize objective “Micro-segmentation of Homogeneous Statistical Populations,” a paradigm shift from qualitative methods relying on human intuition to quantitative methods incorporating multi-criteria decision making is essential, rather than simply expanding the sample size.

Keywords:

5-point Likert scale; cognitive bias; micro-macro link; Monte Carlo simulation; Kullback-Leibler divergence; Finite Mixture Model; Informational Health

1. Introduction

1.1. The Challenge of Simultaneous Aging of Social Infrastructure and Assessor Bias

In Japan, much of the social capital intensively developed during the high economic growth period has passed 50 years since its construction, facing the severe issue of simultaneous aging [1]. With financial resources allocated to infrastructure management strictly limited due to a declining birthrate, an aging population, and decreasing tax revenues, transitioning to preventive asset management that maximizes effects with limited budgets has become a top national priority [1]. There is a strong demand to shift towards “data-driven management” based on the management and analysis of big data, represented by advanced information processing technologies like AI and IoT [1,2].

However, a significant barrier exists that hinders this realization in field operations such as tunnel lighting facilities. The majority of inspection data is stored in analog formats like paper media and unstructured PDFs, meaning there is a severe lack of time-series digital data sufficient for training machine learning models [1]. Furthermore, the deterioration state of facilities is evaluated on discrete scales such as A, B, C, and D through visual inspections by expert engineers; therefore, the personalization of judgment—namely, “Assessor Bias”—inevitably creeps in [3,4]. For instance, the boundary between “Rank B (minor deterioration)” and “Rank C (repair required)” strongly depends on the engineer’s tacit knowledge and experience, or cognitive conflicts such as “consideration for budgets and social demands” [5,6]. Applying existing data-driven approaches directly to raw data fraught with such uncertainty is extremely risky [1,7].

1.2. Statistical Misinterpretations and the Lack of “Homogeneity”

The biggest limitation in existing survey methods and statistical modeling is the over-reliance on the “Law of Large Numbers” and “macro statistical indicators.” In conventional paradigms, uncertainties have been treated as “random white noise,” and it has been assumed that a sample size (N) of hundreds to thousands is sufficient to represent the true population. However, in “long-tail distributions” formed in highly conformist social spaces or organizations, macro statistical indicators (such as KL divergence) treat the microscopic stochastic fluctuations of the tail as mere errors [8].

Recent literature over the past five years has extensively highlighted the limitations of these conventional assumptions. Modern psychometric studies and cognitive modeling research demonstrate that traditional Likert formats often fail to isolate genuine latent opinions from idiosyncratic response biases [9,10,11]. Recent psychometric assessments emphasize that Likert scales often suffer from a “lack of a common metric,” where different respondents interpret the intervals between discrete points inconsistently [9]. This inconsistency is further exacerbated by acquiescence bias and social desirability bias, particularly in environments where individuals feel pressured to align with perceived organizational norms [9,10].

To address this problem, it is essential to explicitly model expert “judgment variance” as stochastic fluctuations (e.g., a beta distribution bounded within the [0, 1] interval) and verify whether the model can output robust results even under such uncertainty through large-scale Monte Carlo simulations [1,2,13,14]. This means returning to the fundamental theoretical debate in German social statistics regarding the “Homogeneity Controversy,” namely, the distinction between “formal homogeneity” and “structural homogeneity” within a statistical population [12,15].

1.3. Inherent Errors Regardless of Evaluation Scale and Limits in Capturing Homogeneity

Subjective evaluation systems, such as the 5-point Likert scale, inherently contain scale-free “errors (biases),” regardless of whether the target scope is a small internal survey or a large-scale social survey. While recent encyclopedic definitions standardize the Likert scale as a symmetrical psychometric tool [16], widespread misuses—such as unwarranted interpretation of the mean and ignoring asymmetric cognitive distances—persist in practice [17]. Cognitive conflicts in the evaluator’s mind, such as “true feelings,” “vanity,” and “compromise,” universally intervene in human decision-making processes regardless of the sample size (N).

To mitigate these format biases, alternative psychometric instruments such as Best-Worst Scaling (BWS) have been proposed in recent empirical studies [9]. However, in legacy infrastructure management data where discrete Likert-type evaluations are already fixed, a post-hoc computational approach is required. The critical flaw in conventional survey practices is that they lack any step for extracting and capturing “Homogeneous statistical populations” on the premise that this error exists [12]. As summarized in Table 1, targets have been judged solely by the superficial distribution (mean and variance) of analog aggregate results [1]. Consequently, while the majority leans toward a specific answer due to social norms or peer pressure, minority answers and extreme opinions that deviate from it have been easily dismissed as mere “anomalies” or “measurement errors.”

Errors in subjective evaluations are not mere “clerical mistakes” but inevitable products strongly reflecting the environment (peer pressure, information overload) in which the evaluator is placed.
Nevertheless, conventional aggregation methods fail to theoretically distinguish the “homogeneity” behind this error, roughly grouping responses that have undergone different cognitive processes as the “same population” [12].
This constitutes a structural limitation that risks absorbing evaluation results whole and inadvertently rendering minority signals invisible.

1.4. A Warning Against Blind Faith in Evaluation Results and the “Question” of Re-Verification

The most fundamental starting point of this paper is the objective question: “Should we not be more rigorous about how we interpret the results of surveys and all subjective evaluations?” and “Can we achieve more accurate and fair decision-making by intervening with mathematical re-verification using computational resources, rather than taking the expressed results at face value?”

If human cognitive bias is unavoidable, rather than eliminating it retroactively, the process by which bias occurs should be built on a computer and subjected to stress tests (reproduction experiments). The construction of a micro-macro link model using AHP and the Softmax function, and the execution of a massive Monte Carlo simulation of 10 million people [2], represent scientific practices responding exactly to this question. Table 2 outlines how this computational approach advances modern decision-making methodologies.

Uncritical reliance on evaluation results as absolute facts carries significant risks. Data can easily mislead decision-makers unless the noise structure behind the results (such as the non-linearity of Softmax) is understood.
By conducting large-scale re-verification (Dry Run) by computers, we can uncover the mechanisms of statistical bias where existing statistical indicators (like KL divergence) falsely judge that “the error is within an acceptable range” [2,8].
Subjecting minute stochastic fluctuations in survey responses to “re-verification by computers” on mathematical models is an indispensable paradigm shift [1,2].

1.5. Approach and Contributions of This Paper

This paper elucidates how macro evaluation distributions emerge from human cognitive processes (true feelings, vanity, compromise) and how the “critical sample size” required to capture an identical statistical population changes non-linearly depending on the distribution shape [1,14].

Stochastic Modeling: Evaluator bias is mathematically formulated as stochastic noise using the distance-decay model of the Analytic Hierarchy Process (AHP) [5,6].
Simulation Engine: Non-linear transformations via the Softmax function are introduced, and macro population distributions are allowed to emerge through large-scale Monte Carlo trials ( $N = 10^{7}$ ) [2,6]. This addresses the class imbalance problem where rare degradation paths are statistically underestimated in empirical data [1].
Verification of Robustness: By evaluating errors using KL divergence, we analyze the pitfalls where macro statistical indicators suppress minority signals and present the necessity of Micro-segmentation [8,12].

The academic significance of this paper lies in driving a paradigm shift from qualitative methods relying on human intuition to quantitative methods based on logical evidence in domains where actual data is limited [1,5].

2. Materials and Methods: Mathematical Modeling of the Micro-Macro Link

To systematically present the approach, the simulation workflow is structured into four sequential phases: (1) Agent parameter initialization representing latent cognitive states, (2) Derivation of deterministic logical weights via AHP distance-decay, (3) Transformation into stochastic choice probabilities via the Softmax function, and (4) Macroscopic aggregation utilizing multinomial distributions. The codebase serving as the foundation of the simulation is publicly available as open-source [18]. Detailed information regarding the computational environment, system specifications, and reproducibility is provided in Appendix A.

2.1. Setting Evaluation Criteria and the AHP Distance-Decay Model

When respondent $k$ selects from a 5-point evaluation $S = \{a, b, c, d, e\}$, it is assumed they possess an importance vector $v^{(k)} = (v_{1}, v_{2}, v_{3})^{\top}$ for the following three criteria: $C_{1}$: True State, $C_{2}$: Social Desirability, $C_{3}$: Cognitive Ease. The elements $u_{m, i}$ of the evaluation vector $u_{m}$ for the choices in each criterion $m \in \{1, 2, 3\}$ are derived using a distance-decay model from the target choice [5].

$$u_{m, i} = \frac{\exp(-α_{m} |i - T_{m}|)}{\sum_{j = 1}^{5} \exp(-α_{m} |j - T_{m}|)} \tag{1}$$

Here, $T_{m}$ is the ideal choice index for each criterion, and $α_{m}$ is the decay parameter. By taking a linear combination of these, the final logical weight $w^{(k)}$ for agent $k$ is calculated.

$$w^{(k)} = \sum_{m = 1}^{3} v_{m} u_{m} \tag{2}$$

The parameters $α_{m}$ are set to 1.0 based on standard sensitivity analysis in AHP-related studies, representing a moderate exponential decay of preference as the distance from the ideal choice index increases. This ensures that adjacent options retain a non-negligible probability of being selected, mimicking human cognitive hesitation between similar categories.
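As a minimal sketch of Equations (1) and (2), the following Python snippet computes the distance-decay vectors and their linear combination. The function names, the importance vector, and the target indices are illustrative assumptions and are not taken from the published codebase [18].

```python
import numpy as np

def distance_decay(target, alpha=1.0, n_choices=5):
    """Eq. (1): normalized exponential decay around the ideal choice index (1-based)."""
    idx = np.arange(1, n_choices + 1)
    raw = np.exp(-alpha * np.abs(idx - target))
    return raw / raw.sum()

def logical_weight(v, targets, alphas):
    """Eq. (2): importance-weighted sum of the three criterion vectors."""
    u = np.stack([distance_decay(t, a) for t, a in zip(targets, alphas)])
    return np.asarray(v) @ u

# Hypothetical agent: true state at choice 2, social desirability at 4, cognitive ease at 3.
v = [0.5, 0.3, 0.2]
w = logical_weight(v, targets=[2, 4, 3], alphas=[1.0, 1.0, 1.0])
print(np.round(w, 3))
```

Because each $u_{m}$ sums to one, $w^{(k)}$ also sums to one whenever the importance vector does, which is what allows it to be read as a mixture of the three criteria.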

2.2. Conversion to Probability Vectors via the Softmax Function

The Softmax function is applied to the derived logical weight $w^{(k)}$ using human cognitive confidence (inverse temperature parameter) $β$ to obtain the elements $p_{i}^{(k)}$ of the final response probability vector $p^{(k)}$ [3,6].

$$p_{i}^{(k)} = \frac{\exp(β w_{i}^{(k)})}{\sum_{j = 1}^{5} \exp(β w_{j}^{(k)})} \tag{3}$$

The confidence parameter $β$ is varied between 0 and 20 to encompass the full spectrum of human decision-making states: from total stochastic noise ($β \to 0$) to deterministic, rigid adherence to logical weights ($β \to \infty$). These ranges are consistent with empirical observations in Random Utility Theory regarding the variability of assessor behavior under different levels of cognitive load.
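Equation (3) can be sketched in a few lines of Python. The weight vector below is an illustrative assumption, and the maximum is subtracted before exponentiation only for numerical stability, which leaves the probabilities unchanged.

```python
import numpy as np

def softmax_response(w, beta):
    """Eq. (3): response probabilities from logical weights w at confidence beta."""
    z = beta * np.asarray(w, dtype=float)
    z -= z.max()          # stability shift; cancels in the ratio
    e = np.exp(z)
    return e / e.sum()

w = np.array([0.10, 0.35, 0.25, 0.20, 0.10])   # illustrative weights, not from the paper
for beta in (0.0, 5.0, 20.0):
    print(beta, np.round(softmax_response(w, beta), 3))
```

At $β = 0$ the output is exactly uniform, and as $β$ grows the mass moves toward the choice with the largest logical weight.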

2.3. Introduction of Large-Scale Monte Carlo Simulation and Multinomial Distribution

Since an approach holding individual data for 100 million people ($N = 10^{8}$) becomes computationally demanding, the “Multinomial Distribution” from statistics was adopted [2]. By pre-defining the true probability vector $p$ and generating the “count of each choice” all at once using the multinomial distribution, individual responses no longer need to be stored or iterated over, and computational complexity with respect to $N$ is drastically reduced from $O(N)$ to $O(1)$ [1]. The specific computational environment and efficiency metrics validating this optimization are detailed in Appendix A.
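The aggregation step can be illustrated with NumPy's multinomial sampler. This is a sketch under the assumption that all respondents share one probability vector; the vector and the seed are illustrative, and the published codebase [18] may be organized differently.

```python
import numpy as np

rng = np.random.default_rng(seed=0)            # seed fixed only for reproducibility
p = np.array([0.05, 0.15, 0.45, 0.25, 0.10])   # illustrative response probabilities
N = 10**7

# One call returns the five category counts; no per-respondent array is stored.
counts = rng.multinomial(N, p)
q = counts / N                                 # empirical distribution
print(counts, np.round(q, 4))
```

The memory footprint is five integers regardless of $N$, which is the practical benefit of replacing agent-level records with counts.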

2.4. Capture Criterion for Identical Statistical Populations (Kullback-Leibler Divergence)

Assuming the population distribution is $P$ and the extracted empirical distribution is $Q$, the information-theoretic distance between the two is evaluated using KL divergence [8].

$$D_{KL}(P \| Q) = \sum_{i = 1}^{5} P(i) \log_{2} \frac{P(i)}{Q(i)} \tag{4}$$

The minimum $N$ where the 95th percentile value of this $D_{KL}$ falls below the threshold $ϵ = 0.001$ is defined as the “critical sample size capable of capturing the identical statistical population.”
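The capture criterion can be sketched as a search over candidate sample sizes. The coarse grid, the number of trials, and the handling of empty categories (returning infinity) are assumptions made for this illustration and are not specified in the text.

```python
import numpy as np

def kl_bits(p, q):
    """Eq. (4): D_KL(P || Q) in bits; infinite if Q misses a category that P supports."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def kl_percentile(p, n, trials=2000, pct=95, seed=0):
    """Percentile of D_KL over repeated multinomial samples of size n."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, p, size=trials)
    return float(np.percentile([kl_bits(p, c / n) for c in counts], pct))

def critical_sample_size(p, eps=1e-3, grid=(10**2, 10**3, 10**4, 10**5)):
    """Smallest N on a coarse, hypothetical grid whose 95th-percentile KL is below eps."""
    for n in grid:
        if kl_percentile(p, n) < eps:
            return n
    return None

p = np.array([0.05, 0.15, 0.45, 0.25, 0.10])   # illustrative population distribution
print(critical_sample_size(p))
```

A finer grid or a bisection search would locate the threshold more precisely; the coarse grid is used here only to keep the run short.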

3. Results

3.1. Behavior of Confidence Parameter ( $β$ ) as the Reciprocal of Error Variance

The parameter $β$ introduced into the Softmax function acts as the “reciprocal of unobservable error variance” in Random Utility Theory [6].

The critical statistical implication presented by Figure 1 is the non-linear disappearance of “Fisher Information” when inversely estimating the true weight $w$ from observation data. In a state where $β$ is extremely high (error variance asymptotes to zero), probabilities stick to 0 or 1, causing “Complete Separation,” and the gradient of the log-likelihood function disappears. Conversely, as $β \to 0$, the signal submerges into background noise, returning to a uniform distribution [4].

In practices where assessor bias intervenes, variance in evaluator judgment (fluctuations in $β$ ) must be explicitly incorporated into the model to extract true signals [1].
In long-tail distributions, a high $β$ suppresses minority signals as an over-adaptation to peer pressure, while a low $β$ causes burial in noise [2].
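The vanishing of Fisher Information at both extremes can be made explicit with a short derivation from Equation (3). This is a standard result for the multinomial logit form and is given here as a supporting sketch, assuming a single observed choice $y$ and a unique maximum of $w$.

```latex
\begin{aligned}
\log p_y &= \beta w_y - \log \sum_{j=1}^{5} \exp(\beta w_j), \\
\frac{\partial \log p_y}{\partial w_i} &= \beta\,(\delta_{iy} - p_i), \\
\mathcal{I}(w) &= \mathbb{E}_y\!\left[\nabla_w \log p_y \,(\nabla_w \log p_y)^{\top}\right]
             = \beta^{2}\left(\operatorname{diag}(p) - p\,p^{\top}\right).
\end{aligned}
```

As $β \to 0$ the prefactor $β^{2}$ drives the information to zero. As $β \to \infty$ the vector $p$ approaches a one-hot vector, so $\operatorname{diag}(p) - p\,p^{\top}$ decays exponentially in $β$ and outpaces the growth of $β^{2}$. Information about $w$ is therefore recoverable only at intermediate confidence levels.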

Figure 1. Transition of multinomial logit probability accompanying changes in the reciprocal of error variance ($β$).

3.2. Cognitive Processes as Finite Mixture Models and the Discovery of “Intra-Individual Mixture Distributions”

This study’s decision-making process also provides a critique of overconfidence in macro statistical indicators from the perspective of a “Finite Mixture Model” [14]. The logical weight $w$ is mathematically a linear combination of three different probability distributions (components) [19].

Panel 4 (Internal Conflict) in Figure 2 presents a statistical paradox where “the random variable existing within an individual’s mind can itself be bimodal.” Conventionally, when bimodality is observed in macro data, statisticians assume “the population is divided into two heterogeneous sub-populations” and attempt to segment the group [12].

However, if individual choice probabilities are already bimodal (intra-individual mixture distribution), the division appearing in macro data is not a “social division” but merely the “sum of individual hesitations.” If segmentation is performed based solely on superficial distribution shapes despite the entire population being statistically perfectly homogeneous, it will cause spurious correlations and overfitting [14]. Furthermore, the antagonistic state where mass is dispersed across multiple probabilities causes overdispersion when sampling from a multinomial distribution, making parameter identifiability methodologically challenging [7]. Table 3 contrasts the conventional statistical fallacies with the reality shown by our model.
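The indistinguishability of the two readings can be checked numerically. In the sketch below, both probability vectors and the 50/50 faction split are hypothetical values chosen so that the expected aggregate is identical in the two scenarios.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 200_000

# Scenario A (hypothetical): every respondent shares one bimodal probability vector.
p_individual = np.array([0.40, 0.08, 0.04, 0.08, 0.40])
counts_a = rng.multinomial(N, p_individual)

# Scenario B (hypothetical): two equally sized factions, each unimodal at one extreme.
p_left = np.array([0.80, 0.16, 0.04, 0.00, 0.00])
p_right = p_left[::-1]
counts_b = rng.multinomial(N // 2, p_left) + rng.multinomial(N // 2, p_right)

# Both scenarios share the same expected histogram, so the aggregate cannot separate them.
print(np.round(counts_a / N, 3))
print(np.round(counts_b / N, 3))
```

Only repeated measurements per respondent, or an auxiliary signal, could separate Scenario A from Scenario B; a single cross-sectional histogram cannot.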

Figure 2. Distortion of probability distributions accompanying changes in the mixing ratio of cognitive criteria (phase states of finite mixture distributions). The letters a–e represent the choices in the 5-point evaluation scale.

4. Computational Experiments: 3 Patterns of Experiments and Mathematical Considerations

To ensure the statistical robustness of the micro-mechanism, a series of three additional experiments was executed, manipulating the limits of the parameter space and correlations between latent variables [20].

4.1. Experiment 1: Probability Fusion at Target Proximity and Opinion Dynamics

As clearly illustrated in Figure 3, when the true state and social demand approach each other, the distribution degenerates (fuses) into an “asymmetrically skewed unimodal distribution.” This indicates the limits of identifiability (Ill-posed problem) in mixture models [19].

Dynamical System Analysis of Non-linear Transformation: When targets are adjacent ( $| T_{1} - T_{2} | = 1$ ), the weight difference is minute, and the exponential nature of the Softmax function forms a smooth gradient where probability mass is distributed to both.
Sociophysics Reinterpretation: When the social correct answer and an individual’s true feelings are close, agents blur the boundary line without clear conflict. This “silent assimilation process” demonstrates how minority opinions can be absorbed into the gravitational pull of social norms [13,21].
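The fusion effect can be reproduced with a simplified two-criterion version of the model. Dropping the third criterion, the equal importance weights, and $β = 10$ are assumptions of this sketch, not the settings used for Figure 3.

```python
import numpy as np

def decay(target, alpha=1.0):
    idx = np.arange(1, 6)
    raw = np.exp(-alpha * np.abs(idx - target))
    return raw / raw.sum()

def response(t_true, t_social, v=(0.5, 0.5), beta=10.0):
    """Two-criterion simplification: mix the decay vectors, then apply Softmax."""
    w = v[0] * decay(t_true) + v[1] * decay(t_social)
    e = np.exp(beta * (w - w.max()))
    return e / e.sum()

def n_modes(p):
    """Count strict local maxima of a 5-point probability vector."""
    padded = np.concatenate(([-np.inf], p, [-np.inf]))
    return int(np.sum((padded[1:-1] > padded[:-2]) & (padded[1:-1] > padded[2:])))

far = response(t_true=1, t_social=5)    # distant targets
near = response(t_true=3, t_social=4)   # adjacent targets, |T1 - T2| = 1
print(np.round(far, 3), n_modes(far))
print(np.round(near, 3), n_modes(near))
```

With distant targets the two components remain visible as separate peaks, whereas adjacent targets yield a single skewed peak from which the two components can no longer be read off.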

Figure 3. Asymmetric probability fusion of mixture distributions accompanying target proximity (Experiment 1). The letters a–e represent the choices in the 5-point evaluation scale.

4.2. Experiment 2: Kernel Smoothing and Asymptotics to the Dirac Delta Function

Figure 4 demonstrates the degeneration phenomenon where the distance-decay parameter $α$ functions as the reciprocal of bandwidth (precision parameter) in kernel density estimation [22].

Mathematical Consideration of Limits: In the limit of $α \to \infty$ , the entire evaluation vector converges to a discrete Kronecker delta, degenerating to a single choice. In a model using L1 norm distance, an increase in $α$ functions as a powerful sparsification operator that artificially reduces non-zero elements of the probability vector [23].
Rigidification of Social Systems: $α$ determines the “information entropy” of the entire system. An extremely large $α$ evaluates minute distances between choices as infinite differences, serving as a signal of a rigid information environment [24].
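The link between $α$ and entropy can be illustrated by evaluating Equation (1) for a single criterion across a range of decay values. The target index and the set of $α$ values are illustrative assumptions.

```python
import numpy as np

def decay(target, alpha):
    idx = np.arange(1, 6)
    raw = np.exp(-alpha * np.abs(idx - target))
    return raw / raw.sum()

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

for alpha in (0.0, 0.5, 1.0, 5.0, 50.0):
    u = decay(target=3, alpha=alpha)
    print(alpha, np.round(u, 4), round(entropy_bits(u), 4))
```

Entropy starts at $\log_{2} 5$ bits for $α = 0$ (uniform) and falls toward zero as the vector approaches a single non-zero entry.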

4.3. Experiment 3: Heteroscedasticity of Error Terms and Statistical Mechanical Inverse Temperature

As shown in Figure 5, the negative correlation between cognitive load avoidance ($v_{3}$) and confidence ($β$) is an advanced attempt to introduce heteroscedasticity into discrete choice models [25].

Mathematical Consideration of Inverse Temperature: The Softmax function is theoretically analogous to a canonical distribution with energy states and inverse temperature. The more an agent attempts to avoid cognitive load, the more the system’s physical temperature rises, causing thermal excitation to peripheral states [6].
Deterioration of Informational Health and “Cognitive Fog”: The concentration of probability in the center forms a cognitive baseline for agents who have reduced decision-making effort. The phenomenon of probability leaking to the periphery mathematically visualizes the process where random errors (Cognitive Fog) obscure the true signal [1,26].
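A minimal sketch of this coupling follows. The linear link $β = β_{max}(1 - v_{3})$, the even split of the remaining importance between the first two criteria, and the target indices are all assumptions introduced for illustration; the text does not specify the functional form used for Figure 5.

```python
import numpy as np

def decay(target, alpha=1.0):
    idx = np.arange(1, 6)
    raw = np.exp(-alpha * np.abs(idx - target))
    return raw / raw.sum()

def response(v3, beta_max=20.0):
    """Hypothetical link: confidence falls linearly as load avoidance v3 rises."""
    beta = beta_max * (1.0 - v3)
    v1 = v2 = (1.0 - v3) / 2.0           # remaining importance split evenly (assumption)
    w = v1 * decay(2) + v2 * decay(4) + v3 * decay(3)
    e = np.exp(beta * (w - w.max()))
    return e / e.sum()

for v3 in (0.1, 0.5, 0.9):
    p = response(v3)
    print(v3, np.round(p, 3), "peripheral mass:", round(float(p[0] + p[4]), 3))
```

As $v_{3}$ rises, the mode settles on the midpoint while the share of responses at the two extremes grows, because the falling $β$ flattens the distribution.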

Figure 4. Degeneration phenomenon of probability density accompanying changes in the distance-decay parameter ($α$) (Experiment 2). The letters a–e represent the choices in the 5-point evaluation scale.

Figure 5. Generation of heteroscedasticity due to negative correlation between cognitive load ($v_{3}$) and confidence ($β$) (Experiment 3). The letters a–e represent the choices in the 5-point evaluation scale.

4.4. Comparative Analysis with Baseline Aggregation Methods

To properly position the proposed AHP-Softmax micro-macro link model within the existing literature, a comparative analysis with standard baseline methods (specifically, Simple Mean Aggregation and traditional Ordered Logit Models) is necessary. Traditional methods often treat discrete Likert responses as continuous variables, outputting a singular mean that completely obscures intra-individual variance. For instance, in our “Bimodal” scenario, Simple Mean Aggregation outputs a neutral score of “3”, falsely implying a moderate consensus and failing to detect the underlying polarization. Conversely, standard Ordered Logit Models account for ordinality but assume homoscedasticity, thereby failing to capture the “Cognitive Fog” (variance inflation) demonstrated in Experiment 3. The proposed model explicitly isolates the confidence parameter ($β$) and social desirability ($v_{2}$), allowing for the mathematical recovery of minority signals that baseline models inherently suppress as statistical noise.
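The failure of Simple Mean Aggregation in a polarized case takes only a few lines to show. The response shares below are illustrative and are not the simulated values of the “Bimodal” scenario.

```python
import numpy as np

scores = np.arange(1, 6)
# Illustrative symmetric bimodal response shares (not the paper's simulated values).
bimodal = np.array([0.40, 0.08, 0.04, 0.08, 0.40])

mean = float(scores @ bimodal)
variance = float(((scores - mean) ** 2) @ bimodal)
share_at_midpoint = float(bimodal[2])

print(mean, variance, share_at_midpoint)
```

The mean lands on the midpoint even though only a small share of respondents actually chose it, so the mean alone conveys a consensus that is not present in the data.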

5. Discussion

These three computational experiments are stress tests analyzing algorithm behavior under challenging conditions: “Identifiability loss (fusion),” “Undersmoothing of information spaces (sparsification),” and “Heteroscedasticity of error terms (thermodynamic noise).” Using this validated model, we explored the “critical sample size $N^{*}$ required to capture identical statistical populations.”

The notable mathematical feature in Figure 6 is that the curves generated from different micro cognitive mechanisms (Central, Bimodal, and LongTail) overlap on an identical asymptotic trajectory with a slope of $-1$ ($O(1/N)$) on a log-log graph. Furthermore, Table 4 categorizes the physical meaning of these errors.

Figure 6. Attenuation of KL divergence against sample size for each emergent distribution (log-log scale).

This indicates that the “low-resolution sensor” of a 5-point evaluation dimensionally compresses the high-dimensional cognitive conflict space. Due to “Macroscopic Degeneracy,” macro statistical indicators such as KL divergence can cause a statistical bias, asserting that “the error is overall within an acceptable range” even when the 1% true signal at the tip of the long tail is suppressed [8,15].
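The slope of $-1$ and the overlap of the curves are consistent with a standard large-sample approximation, sketched below. It assumes that every $P(i)$ is strictly positive and that $N \cdot P(i)$ is large for all five categories, so it does not describe the regime in which a rare category is barely sampled. The symbol $δ_{i} = Q(i) - P(i)$ is introduced here for the derivation.

```latex
\begin{aligned}
D_{KL}(P \,\|\, Q) &\approx \frac{1}{2 \ln 2} \sum_{i=1}^{5} \frac{\delta_i^{2}}{P(i)}
  \qquad \text{(second-order expansion in bits, using } \textstyle\sum_i \delta_i = 0\text{)}, \\
2 N \ln 2 \cdot D_{KL}(P \,\|\, Q) &\xrightarrow{\;d\;} \chi^{2}_{4}, \\
\mathbb{E}\!\left[D_{KL}\right] &\approx \frac{4}{2 N \ln 2} \approx \frac{2.89}{N}, \qquad
D_{KL}^{(95\%)} \approx \frac{\chi^{2}_{4,\,0.95}}{2 N \ln 2} \approx \frac{9.49}{1.386\,N} \approx \frac{6.84}{N}.
\end{aligned}
```

To leading order the decay depends only on the number of categories and not on the shape of $P$, which is one way to read the overlap of the three curves. Under this approximation the 95th percentile crosses $ϵ = 0.001$ at roughly $N \approx 6.8 \times 10^{3}$, a point at which a 1% category is still represented by only a few dozen respondents.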

5.1. Non-Linearity of Critical Sample Sizes and Observation Costs

Statistically, the manifestation of sampling error differs completely depending on whether the true probability distribution is “uniform,” “bimodal,” or “long-tail” [2]. As detailed in Table 5, existing models often overlook that the “sample size required to capture identical populations” jumps non-linearly depending on the “distribution shape.”

5.2. Practical Implications: The Pitfalls of Statistical Misinterpretations Across Survey Scales

The 5-point Likert scale is frequently utilized regardless of the survey size due to its simplicity. However, it acts as a “low-resolution sensor” that compresses human cognitive conflicts into a single dimension. Consequently, researchers must remain vigilant against different “pitfalls of statistical misinterpretations” depending on the scale of the survey.

5.2.1. Pitfalls in Small-Scale Surveys (Dozens to Hundreds of Samples)

In small-scale surveys, the primary challenges lie in “random noise” and the “invisibility of minorities.”

Blind Faith in the Mean and Centralization: As identified in recent critical reviews, the midpoint of a Likert scale often functions as a “default option” for respondents who are unwilling or unable to exert the cognitive effort required for a precise evaluation [17]. This phenomenon, documented as “central tendency bias,” directly corroborates the “Cognitive Fog” observed in our Experiment 3 (Figure 5) [27]. In small samples, this state constitutes background noise, rendering the calculated mean uninformative.
Marginalization of Minority Opinions: Under environments with strong peer pressure, extreme opinions are pushed to the very end of the long tail. Capturing a 1% minority signal requires a sample size of $N \geq 10^{5}$ . Therefore, interpreting the absence of extreme scores in small surveys as “the absence of dissatisfaction” is a statistical fallacy.
The Misinterpretation of Consensus (Probability Fusion): Even when responses are concentrated at “4”, one must be cautious. As shown in Experiment 1 (Figure 3), when “true feelings” and “social demand” are close, probabilities fuse without visible conflict. Observing a 1D aggregate graph makes it difficult to distinguish whether the peak represents genuine satisfaction or a compromised “Assimilation Trap.”
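The risk of reading “no extreme scores” as “no dissatisfaction” can be quantified with a simple calculation, assuming independent random sampling and a 1% minority share. The function name is hypothetical. Note that this measures only the chance of observing no minority respondent at all, which is a much weaker requirement than estimating the tail accurately.

```python
def prob_minority_unobserved(n, share=0.01):
    """Probability that a simple random sample of size n contains no minority respondent."""
    return (1.0 - share) ** n

for n in (30, 100, 300, 1000):
    print(n, round(prob_minority_unobserved(n), 4))
```

For a few dozen respondents the minority is more likely to be absent than present, so an empty tail in a small survey carries little evidential weight.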

5.2.2. Pitfalls in Large-Scale Surveys (Thousands to Millions of Samples)

In big data analytics, the greatest risk is the assumption that massive sample sizes eliminate bias, which leads to mistaking crystallized systematic errors for “solid truth.”

Over-reliance on the Law of Large Numbers: Biases such as peer pressure and cognitive laziness are not random white noise; they are gravitational forces directing probabilities toward specific answers. Increasing N merely mathematically crystallizes the erroneous distribution into a fixed systematic error.
Structural Suppression of Minorities by Macro Indicators: When macro statistical indicators like KL divergence are employed to measure overall fitness, they can issue a false “OK signal” around N = 5000. Because macro indicators are dragged by the majority, the 1% true minority signal is suppressed and rendered invisible.
Macroscopic Degeneracy via Low-Resolution Sensors: A bimodal graph cannot mathematically distinguish between “a society truly divided into two factions” and “homogeneous individuals hesitating internally.” Segmenting populations based solely on these 1D superficial shapes leads to spurious correlations and severe overfitting [14].
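The crystallization of systematic error can be demonstrated by sampling from a biased distribution and measuring its distance from the unbiased one. Both distributions below are hypothetical values chosen for illustration.

```python
import numpy as np

def kl_bits(p, q):
    return float(np.sum(p * np.log2(p / q)))

rng = np.random.default_rng(seed=2)
p_true = np.array([0.10, 0.20, 0.40, 0.20, 0.10])     # hypothetical unbiased distribution
p_biased = np.array([0.01, 0.09, 0.30, 0.45, 0.15])   # hypothetical shift toward a norm

for n in (10**3, 10**5, 10**7):
    q = rng.multinomial(n, p_biased) / n
    print(n, round(kl_bits(p_true, q), 4))

floor = kl_bits(p_true, p_biased)   # the value the loop approaches as n grows
print("floor:", round(floor, 4))
```

The sampling component of the error shrinks with $N$, but the divergence settles at a non-zero floor set by the bias itself, which no sample size can remove.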

5.3. Post-Hoc Mathematical Remedies for Legacy Likert Data

To enhance the validity of already collected 5-point evaluation data, simple aggregations should be approached with caution. We propose four post-hoc mathematical processing approaches based on our findings:

Separating “Individual Hesitation” and “Group Division” via Hierarchical Bayesian Models: By applying latent variable models, one can statistically separate whether variance is due to differences between respondents (Between-variance) or internal cognitive conflicts (Within-variance).
Reverse-Estimating Confidence ( $β$ ) Assuming Heteroscedasticity: Rather than calculating simple means, one should apply multinomial logit models to reverse-estimate the “magnitude of error variance” for response patterns. Data groups with abnormally large variance should be down-weighted.
Reassessing Automatic Outlier Rejection: Standard data cleaning practices, such as automatically rejecting extreme answers using 3 $σ$ rules, should be critically reassessed. These rare answers may be “true signals with high confidence.”
Re-verification of Generative Processes via Simulation (Dry Run): By utilizing our AHP distance-decay and Softmax link model, researchers can run Monte Carlo reverse-explorations to expose underlying mechanisms, such as probability fusion.
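The second remedy, reverse-estimating $β$, can be sketched as a grid-search maximum likelihood fit. The sketch treats the logical weights $w$ as known, which is a strong simplifying assumption; in practice they would have to be estimated jointly or fixed from prior knowledge. The weights, sample size, and grid are illustrative.

```python
import numpy as np

def softmax(w, beta):
    z = beta * (w - w.max())
    e = np.exp(z)
    return e / e.sum()

def estimate_beta(counts, w, grid=np.linspace(0.0, 20.0, 401)):
    """Grid-search maximum likelihood for beta, treating the logical weights w as known."""
    loglik = [np.sum(counts * np.log(softmax(w, b))) for b in grid]
    return float(grid[int(np.argmax(loglik))])

rng = np.random.default_rng(seed=3)
w = np.array([0.10, 0.35, 0.25, 0.20, 0.10])   # assumed known here; unknown in practice
true_beta = 8.0
counts = rng.multinomial(5000, softmax(w, true_beta))
print(estimate_beta(counts, w))
```

Fitting this separately per respondent group yields a group-level confidence estimate, and groups with a low fitted $β$ are the candidates for down-weighting.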

Empirical comparisons between Likert scales and alternative instruments, such as Best-Worst Scaling (BWS), have shown that Likert-type formats are significantly more susceptible to extreme response bias [9]. While BWS can mitigate these issues, it is often impractical for legacy datasets. Therefore, the post-hoc mathematical disentanglement proposed here provides a critical bridge for enhancing the validity of existing ordinal data [9].

5.4. Breaking Macroscopic Degeneracy via NLP-Assisted Re-Evaluation

Incorporating open-ended text responses provides a high-dimensional secondary data source to mathematically recover the lost cognitive confidence parameter ($β$). To overcome statistical limitations, we propose an NLP-assisted re-evaluation pipeline [27]. The computational procedure is defined as follows:

Data Decomposition: The 1D discrete scores and their corresponding open-ended text responses are separated.
NLP-Based Confidence Scoring: Natural Language Processing (NLP) measures the respondent’s cognitive effort. A “Text Confidence Weight” is calculated, applying penalties to computationally detected “lazy responses”.
Score Re-evaluation and Reintegration: The original 1D score is multiplied by the Text Confidence Weight. Extreme scores mathematically vulnerable to peer pressure are given a baseline protection weight.

This data-driven pre-processing dynamically down-weights “noisy centralizations” and amplifies minority signals backed by descriptive text. The proof-of-concept Python codebase has been published as an open-source GitHub repository [27].
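The three steps above can be sketched as follows. This is an illustrative heuristic only: the lazy-response list, the token thresholds, the type-token ratio, the weight range, and the protection floor are all assumptions introduced here, and the function names `text_confidence_weight` and `reevaluate` are hypothetical rather than the interface of the published pipeline [27].

```python
import re

# Hypothetical list of low-effort answers; a real survey would need its own.
LAZY_PATTERNS = {"", "none", "nothing", "n/a", "na", "no comment", "idk"}

def text_confidence_weight(text, min_tokens=3, full_tokens=20):
    """Map a free-text answer to a weight in [0.2, 1.0] (assumed heuristic)."""
    cleaned = text.strip().lower()
    tokens = re.findall(r"\w+", cleaned)
    if cleaned in LAZY_PATTERNS or len(tokens) < min_tokens:
        return 0.2                                   # penalty for a "lazy response"
    length_score = min(len(tokens) / full_tokens, 1.0)
    diversity = len(set(tokens)) / len(tokens)       # type-token ratio
    return 0.2 + 0.8 * length_score * diversity

def reevaluate(score, text, protect_floor=0.6):
    """Return (weight, re-weighted score); extreme scores keep a baseline weight."""
    weight = text_confidence_weight(text)
    if score in (1, 5):                              # minority signal protection
        weight = max(weight, protect_floor)
    return weight, score * weight

responses = [
    (3, "n/a"),
    (3, "The lighting is acceptable in most sections but flickers near the north exit after rain."),
    (1, "bad"),
]
for score, text in responses:
    weight, reweighted = reevaluate(score, text)
    print(score, round(weight, 2), round(reweighted, 2))
```

In this sketch a midpoint answer with no supporting text is down-weighted, whereas an extreme answer never falls below the protection floor even when its text is short.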

5.5. Limitations and Future Directions: Mathematical Re-Evaluation for Pure Likert Scales

For legacy surveys that strictly employ Likert scales without open-ended fields, extracting the cognitive confidence parameter directly from text is impossible. To address this challenge, we propose a purely mathematical pattern-recognition approach [28].

Straight-Lining Penalty: Respondents selecting the same option across all questions exhibit zero variance, identifiable as “Cognitive Fog” rather than a genuine neutral stance.
Entropy-Based Noise Filtering: Responses with excessively high Shannon entropy can be penalized as thermodynamic noise.
Minority Signal Protection: An algorithm can automatically boost the baseline weight of extreme choices to prevent them from being suppressed.

This response-pattern-based pipeline for pure Likert scales is publicly available [28]. Integrating both NLP-based and mathematical re-evaluation methodologies provides a robust future direction for data-driven survey analysis.
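A minimal sketch of the three response-pattern rules is given below. The entropy threshold, the penalty and floor values, and the decision to apply minority protection only inside rows flagged by the entropy filter are assumptions made for illustration; the function names are hypothetical and do not describe the published pipeline [28].

```python
import numpy as np

def shannon_entropy(row, n_options=5):
    """Shannon entropy (nats) of how one respondent uses the options 1..n_options."""
    counts = np.bincount(row, minlength=n_options + 1)[1:]
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def answer_weights(answers, n_options=5, entropy_cut=0.95,
                   penalty=0.2, protect_floor=0.6):
    """Per-answer weights for a (respondents x questions) matrix of scores."""
    answers = np.asarray(answers)
    weights = np.ones(answers.shape, dtype=float)
    max_entropy = np.log(n_options)
    for i, row in enumerate(answers):
        if row.min() == row.max():
            # Straight-lining: zero variance across all questions.
            weights[i, :] = penalty
        elif shannon_entropy(row, n_options) > entropy_cut * max_entropy:
            # Near-uniform option usage is treated as noise ...
            weights[i, :] = penalty
            # ... but extreme choices keep a baseline protection weight.
            extreme = (row == 1) | (row == n_options)
            weights[i, extreme] = protect_floor
    return weights

answers = np.array([
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],   # straight-lining
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],   # near-uniform usage, high entropy
    [4, 4, 5, 4, 3, 4, 4, 5, 4, 4],   # structured pattern
])
weights = answer_weights(answers)
print(weights)
```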

6. Conclusions

This study revealed the structural limitations in 5-point Likert scales used in social surveys from the perspective of computational social science. By formulating human cognitive conflicts using the AHP distance-decay model and the Softmax function, and deploying large-scale simulations ( $N = 10^{7}$ ), it was proven that reliance on the “Law of Large Numbers” requires careful scrutiny [1,8].

The important academic and practical contributions are summarized as follows:

Elucidation of the Non-linear Relationship between Distribution Shape and Critical Sample Size: Through the attenuation curves of KL divergence, we demonstrated that the critical sample size required to capture identical statistical populations leaps non-linearly depending on the distribution bias [8].
Presentation of the Danger of “Overlooked Errors” and Minority Invisibility: In long-tail distributions, macro statistical indicators suppress minority true signals, causing the misinterpretation that “the error is within an acceptable range.”
Transition to Survey Design using Multi-Criteria Decision Making Models (AHP): Since collecting the sample sizes necessary to capture minority opinions in a long tail is practically demanding, an approach that returns to the generation mechanism is required. We advocated the indispensability of incorporating AHP, which separates criteria and measures confidence ( $β$ ) [5].
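The first two contributions can be illustrated with a short sketch that tracks the KL divergence and the relative error of the rarest option as the sample size grows. The two probability vectors are hypothetical shapes chosen for illustration, and the direction of the divergence (empirical distribution relative to the true one) is an assumption, since it is not restated in this section.

```python
import numpy as np

def kl_divergence(p_hat, p_true):
    """D_KL(p_hat || p_true); zero-count categories contribute 0 by convention."""
    mask = p_hat > 0
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / p_true[mask])))

# Hypothetical true distributions over the five options.
shapes = {
    "central":   np.array([0.05, 0.20, 0.50, 0.20, 0.05]),
    "long_tail": np.array([0.001, 0.009, 0.04, 0.25, 0.70]),
}

rng = np.random.default_rng(42)
sample_sizes = [10**k for k in range(2, 8)]
results = {}
for name, p_true in shapes.items():
    rare = int(np.argmin(p_true))                    # index of the rarest option
    rows = []
    for n in sample_sizes:
        p_hat = rng.multinomial(n, p_true) / n       # one aggregated draw
        rel_err = abs(p_hat[rare] - p_true[rare]) / p_true[rare]
        rows.append((n, kl_divergence(p_hat, p_true), rel_err))
    results[name] = rows
    for n, kl, rel_err in rows:
        print(f"{name:9s} N={n:>8d}  KL={kl:.2e}  rarest-option rel. error={rel_err:.2f}")
```

Under these assumptions, one would expect the KL value to look small at sample sizes where the relative error of the rarest long-tail option is still large, which is the sense in which a macro indicator can report an acceptable error while the minority signal remains poorly captured.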

Redefining the Data-Reading Paradigm: Moving Beyond Superficial Aggregation

The primary conclusion derived from our simulations is a cautionary note against an over-reliance on macro-level data aggregation. The observed distribution shapes must be evaluated carefully, considering the pathology of the information environment, such as peer pressure or cognitive fatigue.

This study proposes the following paradigm shifts:

Avoiding Overly Assertive Interpretations: Researchers must be cautious when making deterministic claims such as “X% of users are satisfied,” acknowledging that superficial distributions cannot exclude the possibility of probability fusion.
Reconsidering Careless Anomaly Truncation: Extreme minority opinions should not be automatically erased through routine data cleaning as “noise.”
Fundamental Redesign of Survey Methodologies: For critical decision-making, relying solely on simple 5-point scales should be critically re-evaluated. It is indispensable to multi-dimensionalize the survey design itself [5].
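The risk described in the second item can be made concrete with a small sketch. It applies a routine $3σ$ rule to synthetic 5-point scores drawn from a hypothetical long-tail distribution; the probability vector and sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
options = np.arange(1, 6)
p_long_tail = np.array([0.004, 0.006, 0.04, 0.25, 0.70])   # hypothetical shape
scores = rng.choice(options, size=100_000, p=p_long_tail)

# Routine cleaning: reject anything outside mean +/- 3 standard deviations.
mean, sd = scores.mean(), scores.std()
lower, upper = mean - 3 * sd, mean + 3 * sd
rejected = (scores < lower) | (scores > upper)

print(f"acceptance band: [{lower:.2f}, {upper:.2f}]")
for k in options:
    share = rejected[scores == k].mean()
    print(f"score {k}: {np.sum(scores == k):6d} responses, rejected share = {share:.0%}")
```

With this distribution the lower bound of the band lies between 2 and 3, so every response of 1 or 2 is discarded as an outlier although these responses make up the entire minority tail.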

By discarding the assumption that “massive data automatically equates to objective truth,” computational social science can help uncover the critical minority signals buried in society, thereby contributing to true “Informational Health”.

Funding

This work was supported by JSPS KAKENHI Grant Number JP24K00317. This research summarizes the mathematical methodologies, points of caution, and supplementary notes regarding data handling for the large-scale questionnaire surveys conducted under this project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Python code supporting the findings of this study is openly available on GitHub at the following repositories: https://github.com/RUDATAScience/Micro-Macro-Modeling-of-5-Point-Likert-Scales, https://github.com/RUDATAScience/Additional-Experiments, https://github.com/RUDATAScience/Likert-NLP-Reevaluation-Pipeline, and https://github.com/RUDATAScience/5-10-pure_likert_pipeline (accessed on 24 March 2026).

Acknowledgments

In writing this paper, I would like to express my heartfelt gratitude to everyone who has supported my research activities and provided both tangible and intangible assistance. I would like to express my deep respect to the various institutions that provided the valuable data forming the foundation of this research, and to the information platforms that made the publication of this archive possible under the principles of open science. Above all, compiling this vast collection of records would not have been possible without my family, who believed in my intellectual pursuits even in the most difficult circumstances, warmly supported me in my daily life, and walked alongside me. To them, I offer my deepest gratitude. In compiling this complete collection, as well as in proofreading and formatting some of the explanatory texts and verifying mathematical expressions, I have utilised generative AI technologies as auxiliary tools. However, the core datasets, analytical models, and original academic insights are all based on the author’s own knowledge and manual work.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AHP	Analytic Hierarchy Process
KL	Kullback-Leibler
NLP	Natural Language Processing

Appendix A. Computational Environment and Reproducibility

To ensure the reproducibility of the large-scale Monte Carlo simulations ( $N = 10^{7}$ ) conducted in this study, the specific computational environment, algorithm efficiency, and parameter settings are detailed in this section.

Appendix A.1. Hardware Architecture and System Specifications

The simulations were executed on a standard cloud computing environment. The detailed hardware resources are as follows:

Table A1. Hardware Specifications for Large-Scale Simulation.

Item	Specification
Architecture	x86_64 (Little Endian)
CPU	Intel(R) Xeon(R) CPU @ 2.20 GHz
Core Count	2 vCPUs (1 Socket, 2 Threads per core)
L3 Cache	55 MiB
BogoMIPS	4399.99
OS	Linux (64-bit support)

Appendix A.2. Implementation Details and Computational Efficiency

The simulation code was implemented using Python 3.x, with a strong emphasis on computational optimization using multinomial distributions.

Random Seed Specification: To ensure exact reproducibility across all stochastic processes, the pseudo-random number generator seed was explicitly fixed (e.g., numpy.random.seed(42)) prior to execution.
Algorithmic Optimization: The proposed framework avoids the $O(N)$ looping structures typically found in standard Agent-Based Models (ABM). Instead, it generates the choice counts simultaneously by sampling from a statistical Multinomial Distribution.
Computational Complexity: This optimization effectively reduces the time complexity to $O(1)$ relative to the number of agents, enabling extremely fast processing even for astronomical sample sizes.
Execution Metrics: The total execution time for all processes—including the $N = 10^{7}$ sample generation, parameter sweeps across three experiments, and the calculation of Kullback-Leibler divergence attenuation curves—was approximately 3.5 min on the specified 2-core architecture.
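The contrast between per-agent looping and a single multinomial draw, together with seed fixing, can be sketched as follows. The probability vector and function names are hypothetical, and the sketch uses NumPy's `default_rng` generator interface rather than the legacy `numpy.random.seed` call mentioned above, so it illustrates the idea and does not reproduce the published code.

```python
import numpy as np

p = np.array([0.05, 0.20, 0.50, 0.20, 0.05])   # hypothetical choice probabilities

def simulate_agents_loop(n_agents, p, rng):
    """O(N) reference: each agent draws one option, then choices are tallied."""
    counts = np.zeros(p.size, dtype=np.int64)
    for _ in range(n_agents):
        counts[rng.choice(p.size, p=p)] += 1
    return counts

def simulate_agents_multinomial(n_agents, p, rng):
    """Aggregated counts from a single multinomial draw."""
    return rng.multinomial(n_agents, p)

# Fixing the seed makes repeated runs return identical counts.
counts_a = simulate_agents_multinomial(10**7, p, np.random.default_rng(42))
counts_b = simulate_agents_multinomial(10**7, p, np.random.default_rng(42))
assert np.array_equal(counts_a, counts_b)

# Both samplers target the same distribution; compare them at a small N.
loop_counts = simulate_agents_loop(5_000, p, np.random.default_rng(0))
print("multinomial shares:", counts_a / counts_a.sum())
print("loop shares:       ", loop_counts / loop_counts.sum())
```

The loop is kept deliberately small because its run time grows with the number of agents, whereas the multinomial draw returns the five counts directly.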

Appendix A.3. Scalability for Field Applications

The high computational efficiency of this model offers significant practical advantages for infrastructure asset management.

Hardware Accessibility: The framework does not require expensive High-Performance Computing (HPC) clusters or GPU servers. It can be executed efficiently on standard commercial PCs or low-cost cloud instances.
Real-time Analysis: Field engineers can practically perform “Dry Runs” on-site to dynamically evaluate the structural bias and statistical uncertainty of freshly collected inspection data.
Parameter Robustness: Using the environment detailed in Appendix A and the identical parameters ( $α, β, v$ ), independent researchers can perfectly reproduce the mathematical results and figures presented in this paper.

Appendix A.4. Software Dependencies and Data Availability

The primary libraries and open-source repositories utilized for the execution of these simulations are listed below.

Table A2. Software Environment and Repositories.

Tool	Version/Location
Language	Python 3.8+
Library	NumPy (v1.23.5), SciPy (v1.9.3), Matplotlib (v3.6.2)
Code Base	https://github.com/RUDATAScience/Micro-Macro-Modeling-of-5-Point-Likert-Scales (accessed on 24 March 2026)
Additional Exp	https://github.com/RUDATAScience/Additional-Experiments (accessed on 24 March 2026)

References

1. Kawahata, Y.; Maeda, N.; Hatadani, S. Robustness and Convergence of a Stochastic Deterioration Prediction Model for Tunnel Lighting Facilities: A Japanese Case Study Based on Systematized Expert Knowledge and Large-Scale Monte Carlo Simulation. Appl. Sci. 2026, 16, 1994.
2. Gentle, J.E. Random Number Generation and Monte Carlo Methods; Springer: New York, NY, USA, 2003.
3. McFadden, D. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics; Zarembka, P., Ed.; Academic Press: New York, NY, USA, 1974; pp. 105–142.
4. Albert, A.; Anderson, J.A. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984, 71, 1–10.
5. Saaty, T.L. The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation; RWS Publications: Pittsburgh, PA, USA, 1990.
6. Gao, B.; Pavel, L. On the properties of the softmax function with application in game theory and reinforcement learning. arXiv 2017, arXiv:1704.00805.
7. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 1989.
8. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
9. Heo, C.Y.; Kim, B.; Park, K.; Back, R.M. A comparison of Best-Worst Scaling and Likert Scale methods on peer-to-peer accommodation attributes. J. Bus. Res. 2022, 148, 368–377.
10. Kurtuluş, K.; Basfirinci, C.; Cilingir Uk, Z. Beyond the middle: Rethinking midpoint and “no idea” options in Likert scale for consumer sensitive issues. Humanit. Soc. Sci. Commun. 2026.
11. Grimmond, J.; Brown, S.D.; Hawkins, G.E. A solution to the pervasive problem of response bias in self-reports. Proc. Natl. Acad. Sci. USA 2025, 122, e2412807122.
12. Nagaya, M. Statistical Mass and Homogeneity: The Homogeneity Controversy in German Social Statistics (1). Econ. Rev. (Keizai Ronso) 1989, 143, 344–372.
13. Galam, S. Sociophysics: A Physicist’s Modeling of Psycho-Political Phenomena; Springer: Berlin/Heidelberg, Germany, 2012.
14. McLachlan, G.J.; Peel, D. Finite Mixture Models; John Wiley & Sons: New York, NY, USA, 2000.
15. Amari, S.I. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
16. Koo, M.; Yang, S.-W. Likert-Type Scale. Encyclopedia 2025, 5, 18.
17. Darnton, A. The Likert Scale: A Review of its Use and Misuse. In Proceedings of the European Conference on Research Methodology for Business and Management Studies, Aveiro, Portugal, 2–3 June 2022.
18. Kawahata, Y. Micro-Macro-Modeling-of-5-Point-Likert-Scales. GitHub Repository. Available online: https://github.com/RUDATAScience/Micro-Macro-Modeling-of-5-Point-Likert-Scales (accessed on 25 March 2026).
19. Titterington, D.M.; Smith, A.F.M.; Makov, U.E. Statistical Analysis of Finite Mixture Distributions; Wiley: New York, NY, USA, 1985.
20. Kawahata, Y. Additional-Experiments. GitHub Repository. Available online: https://github.com/RUDATAScience/Additional-Experiments (accessed on 25 March 2026).
21. Cinelli, M.; Morales, G.D.F.; Galeazzi, A.; Quattrociocchi, W.; Starnini, M. The echo chamber effect on social media. Proc. Natl. Acad. Sci. USA 2021, 118, e2023301118.
22. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 1986.
23. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
24. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620.
25. Maddala, G.S. Limited-Dependent and Qualitative Variables in Econometrics; Cambridge University Press: Cambridge, UK, 1983.
26. Bailey, K.D. Social Entropy Theory; State University of New York Press: Albany, NY, USA, 1990.
27. Kawahata, Y. Likert-NLP-Reevaluation-Pipeline. GitHub Repository. Available online: https://github.com/RUDATAScience/Likert-NLP-Reevaluation-Pipeline (accessed on 26 March 2026).
28. Kawahata, Y. Pure-Likert-Reevaluation-Pipeline. GitHub Repository. Available online: https://github.com/RUDATAScience/5-10-pure_likert_pipeline (accessed on 26 March 2026).

Table 1. Limitations and Risks in Conventional Evaluation Systems.

Locus of Problem	Limitations and Risks in Conventional Systems
Scale Independence	Bias caused by cognitive conflict occurs universally. Increasing sample size alone does not resolve the error.
Anomaly Misidentification	By neglecting to accurately define homogeneous populations, true signals (critical deterioration) are marginalized as outliers.
Analog Constraints	Evaluations relying on 1D simple aggregations fail to measure the “invisible multidimensional structure of cognition” [1].

Table 2. Advancement of Decision-Making via Computational Resources.

Verification Approach	Advancement of Decision-Making
Eliminating Blind Faith	We scrutinize superficial aggregate results (1D outputs) and visualize the underlying “divergence between true feelings and public stance” or “cognitive fatigue.”
Introducing Re-verification	We re-verify via stochastic simulations whether the observed distribution is a “healthy aggregation of opinions” or the “suppression of minorities by peer pressure” [6].
Accurate Decision-Making	We micro-segment true “homogeneous statistical populations” from data previously discarded as anomalies [5,15], realizing data-driven, ethical decision-making.

Table 3. Statistical Fallacies Regarding the Source of Bimodality.

Analysis Dimension	Statistical Fallacies Regarding Bimodality
Conventional Approach	Macro bimodality is considered “inter-group heterogeneity (conflict of opinions),” attempting to cluster the population into two distributions.
Reality Shown by Model	Macro bimodality can also occur through the accumulation of “intra-individual mixture distributions (inner conflicts).” The entire population may be statistically perfectly homogeneous.
Statistical Consequence	Population segmentation (Micro-segmentation) based solely on superficial macro distribution shapes has a high risk of causing spurious correlations and overfitting [14].
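The second row of Table 3 can be illustrated with a short sketch comparing two generative processes that yield the same macro shape. The two component vectors and the 50/50 mixing proportion are hypothetical values chosen for illustration, not parameters from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
p_low = np.array([0.60, 0.30, 0.08, 0.015, 0.005])   # hypothetical component near option 1
p_high = p_low[::-1].copy()                           # mirrored component near option 5

# (a) Inter-group heterogeneity: two groups, each with a unimodal distribution.
counts_groups = rng.multinomial(n // 2, p_low) + rng.multinomial(n // 2, p_high)

# (b) Intra-individual mixture: one homogeneous population in which every
#     agent shares the same 50/50 mixture of both components.
p_mixture = 0.5 * p_low + 0.5 * p_high
counts_mixture = rng.multinomial(n, p_mixture)

shares_groups = counts_groups / n
shares_mixture = counts_mixture / n
print("inter-group heterogeneity:", shares_groups)
print("intra-individual mixture: ", shares_mixture)
```

Both processes produce a bimodal aggregate with a valley at the midpoint, so the macro distribution alone does not reveal whether the population is split into two groups or is statistically homogeneous.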

Table 4. Physical Meaning of Error in Low-Resolution Sensors.

Phase State	Physical Meaning of Error
Centralization	Error mainly due to thermal fluctuations around the peak (c). Low impact on macro decision-making.
Bimodal	Measurement error of valley depth. Affects the estimation of societal polarization strength.
Long Tail	Observation omission of minority states (e). A critical error that overlooks rare but significant degradation paths.

Table 5. Non-linear Differences in Critical Sample Size.

Distribution Shape	Differences in Critical Sample Size
Uniform/Central	Because there is variance across choices or a single peak, the overall distribution shape stabilizes with relatively few samples ( $N \approx 10^{3}$ to $10^{4}$ ).
Bimodal	To prove the “depth of the valley” in the middle layer, a slightly larger sample size ( $N \approx 5000$ to $10{,}000$ ) is required.
Long Tail	To prove the existence of a minority with <1% probability, substantially larger sample sizes ( $N \approx 10^{5}$ to $10^{7}$ ) are required.
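As a rough plausibility check on the order of magnitude in the Long Tail row, the following worked calculation uses only the binomial standard error of an estimated proportion. It assumes independent respondents and a hypothetical relative-error tolerance $r$; it is a back-of-the-envelope sketch and not the KL-based criterion used in the experiments.

```latex
% Standard error of an estimated proportion \hat{p} from N independent responses
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{N}}
\qquad\Longrightarrow\qquad
\frac{\mathrm{SE}(\hat{p})}{p} \le r
\iff
N \ge \frac{1-p}{p\,r^{2}}

% Worked values for r = 0.1 (10% relative error)
\begin{aligned}
p = 10^{-2}: &\quad N \ge \frac{0.99}{10^{-2}\cdot 10^{-2}} = 9.9 \times 10^{3} \\
p = 10^{-3}: &\quad N \ge \frac{0.999}{10^{-3}\cdot 10^{-2}} \approx 1.0 \times 10^{5} \\
p = 10^{-4}: &\quad N \ge \frac{0.9999}{10^{-4}\cdot 10^{-2}} \approx 1.0 \times 10^{6}
\end{aligned}
```

Because the required $N$ scales with $1/p$, each tenfold reduction in the minority share multiplies the required sample size by roughly ten, which is consistent with the jump to $10^{5}$ and beyond for minorities below 1%.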
