1. Introduction
In today’s world, the complexity of technological systems is increasing rapidly, and this trend encompasses everything from localized installations to critical infrastructure, global digital platforms and autonomous services based on artificial intelligence. Evidence supports the explosive growth in complexity. The number of connected Internet of Things (IoT) devices reached 16.6 billion by the end of 2023 and is projected to reach around 40 billion by 2030 [
1]. At the same time, the amount of data is rapidly increasing: the IDC estimates that the global data sphere will grow from 33 zettabytes (33 trillion gigabytes) in 2018 to 175 zettabytes by 2025 [
2]. About 80% of this data is unstructured, meaning knowledge is dispersed across texts, documents, and other non-formalized sources that are inaccessible to traditional analysis. Critical information is thus “hidden” outside structured databases, making it difficult for risk models to remain complete and relevant.
Not surprisingly, experts are growing louder in their warnings about the risks associated with such complexity. For example, the World Economic Forum notes that today’s risk landscape is “inherently interconnected and difficult to navigate” [
3]. Moreover, renowned cryptographer and security expert Bruce Schneier bluntly points out: “Complexity is the worst enemy of security, and our systems are getting more and more complex.” [
4]. In other words, the more intricate a system becomes, the harder it is to anticipate and prevent its vulnerabilities. In cybersecurity, for example, there is a steady increase in new vulnerabilities, with more than 26,000 CVEs disclosed in 2023—1500 more than the previous year [
5]. Leading voices in management and academia also highlight the challenge of accelerating change: half of technical knowledge becomes obsolete in less than two years [
6], and regulators and organizations have not had time to adapt. In a recent PwC survey, three quarters of risk managers (75%) admitted that they are not keeping up with the rapidly changing technology and regulatory environment [
7], despite their investments in risk management. These authoritative opinions and facts converge on one conclusion: classical approaches to risk analysis have not kept pace with the complexity and dynamics of modern systems.
The fundamental laws of systems theory support these conclusions with strict logic. Ashby’s law of requisite variety states that “the variety of the controlling system (controller) must be at least as great as the variety of the system being controlled”; otherwise, effective control is impossible [
8]. Simply put, if the environment changes faster than we can react, problems are inevitable. Mathematically, this manifests as combinatorial growth in the number of states and interactions as the system becomes more complex: adding new components leads to an exponential increase in possible configurations. Traditional “closed” risk models therefore quickly lose relevance without constant revision—with such an avalanche of options, no static list of scenarios remains complete. Formally, the control of a complex system requires equally complex models; otherwise, “fragile” zones emerge where uncertainty and knowledge gaps lead to failures [
9]. With 80% of the data being unstructured and new threats emerging every minute, without fundamentally new approaches, our analytical conclusions become obsolete before they can be realized.
The problem, then, is the following: the complexity of modern systems is growing faster than our tools for reducing it to a manageable form. The combination of components, interactions, and external relationships makes classical “closed” risk models fragile for three reasons:
Incomplete descriptions (much of the knowledge is hidden in unstructured data and operational context);
Drift of technologies, regulations, and practices (models rapidly lose validity as they change);
The “analytics → action” gap (even valid conclusions are not transformed into standardized solutions or embedded in management loops such as EAM/CMMS systems).
In practice, there is a swing between two extremes: very detailed analyses, which are expensive and poorly transferable to the changed system, and empiricism based on expert experience, which is subjective and poorly reproducible. It turns out that in increasingly complex systems, “risk analysis” either becomes disproportionately resource-intensive or produces unsustainable solutions. What is needed is a unified, reproducible, and interpretable approach to risk management that works equally well with heterogeneous data (including texts), is robust to constant environmental drift, and is embedded in the contours of management actions [
10]. Only such an approach will bridge the gap between analysis and action amid the rapidly increasing complexity of the technosphere.
Thus, the unit of risk analysis should not be a “component” or a “defect category”, but a phenomenon—a stable, cross-domain configuration of meanings, contexts, and conditions of events that can be detected in heterogeneous streams (texts of incidents and regulatory reports, operational notes, work logs, user communications, investigation narratives, and open sources) and compared with variable management objectives (reliability, safety, continuity, and compliance). Macro-level empirical evidence supports this shift: the prevalence of unstructured sources of knowledge, cross-border supply chains and services, and the accelerated drift of technologies and norms all change risk from an “internal property of the product” into a systemic phenomenon spreading through networks of interactions. At the level of principles, such as the law of requisite variety, the controllability of such an environment is achieved only when the representation space and the solution space have sufficient diversity, invariance to drift, and tolerance to heterogeneity. Consequently, we need a method that reliably translates diverse narratives into a common semantic coordinate system of phenomena, purposefully extracts from it the factors directly related to risk metrics, and, finally, returns the result to the management space in the form of reproducible, interpretable, and prioritized actions at the level of phenomena rather than disparate attributes.
This logic sets the method as a single macro-operator that transforms a multilingual and multimodal field of narratives into a practical space of management decisions. Resistance to drift and control of complexity are inherently built in. The pipeline works sequentially. First, semantic coding translates narrative data into invariant coordinates of phenomena, i.e., into a compact and causally meaningful representation. The target-matching module then adjusts this representation to the selected risk metric and generates latent factors that concentrate the part of the variability that explains the observed risk. The projection module then builds a calibrated risk prediction based on the extracted factors, while maintaining the reproducibility of the result and the transparent attribution of the contribution of each phenomenological component. Finally, the prescriptive module selects a portfolio of management interventions from the admissible set: phenomenological coordinates are shifted through an influence matrix, and the objective function minimizes risk in the post-intervention state, augmented by a regularizer that limits the complexity and cost of interventions and takes into account resource, regulatory, and linkage constraints.
This macro-operator construction imposes system requirements. The representation of phenomena must be invariant to data sources, domains, and languages. Consistency with risk metrics must close the gap between signal detection and control action. The projection part should provide drift tolerance, transparent attribution, and the full traceability of factors. The optimization of measures should remain operationalizable and take into account budget, regulations, and asset linkage topology. As a result, a continuous and verifiable chain is maintained from texts to phenomena, then to factors, then to forecast, and finally to action. At each stage, complexity is controlled and the applicability of solutions is preserved.
Together, factual trends (scale and heterogeneity of data), expert consensus (fragility of interconnected networks and safety-by-design requirements), and a priori principles (combinatorial growth of states and the law of requisite variety) narrow the space of acceptable solutions to classes that operate precisely in the phenomenological space and return controlled deformations of this space as the primary object of intervention. In other words, as long as “components” and “incident categories” remain the unit of analysis, the gap “data, model, solution” grows faster than our ability to close it. Only when observations are translated into global phenomena and mapped back into management measures does this gap become manageable and compatible with EAM/CMMS cycles. Hence, a unified, domain-agnostic approach is needed, where phenomena act as the first unit of risk accounting, and measurement and impact are standardized in the same space.
Thus, there is a clear need for an innovative approach, the novelty of which lies in the fact that it introduces and rigorously operationalizes semantic factor analysis of phenomena as a universal macro-framework for risk management in complex systems. A single invariant representation of phenomena from heterogeneous narratives is consistent with target risk metrics and, through optimization mapping, generates reproducible, interpretable, and prioritized actions at the level of phenomena, thus bridging the gap “data, model, decision”.
It has already been established that the unit of risk accounting in complex, drifting ecosystems should be neither components nor “categories of defects”, but phenomena: stable, cross-domain configurations of meanings and conditions extracted from heterogeneous narratives and correlated with management objectives. It can also be argued that only with symmetry between the representation space (phenomenological), the factor space, the risk metrics, and the space of managerial action does the gap “data, model, decision” become controllable. The formulation of the objective follows directly from this logic.
Research on extracting actionable information from unstructured text has a long history in information retrieval and statistical language processing. Early work established reproducible weighting and retrieval mechanisms that made large text collections measurable and comparable [
11,
12]. Topic models, and LDA in particular, later provided a compact probabilistic representation of documents and enabled the systematic discovery of latent themes from corpora [
13,
14]. Alongside topic modeling, the literature developed alternative semantic representations, including latent semantic models, distributional word embeddings, and modern transformer-based encoders that improve semantic coverage and the robustness of representations across contexts.
A separate line of research focuses on linking high-dimensional representations to target variables while keeping models stable and interpretable. Partial least squares provides a classic target-oriented projection that extracts components aligned with the response and mitigates multicollinearity [
15,
16]. Regularized regression methods, including ridge, lasso, and elastic net, further support generalization under high dimensionality and provide a controlled way to estimate the contribution of predictors to a target metric [
17,
18,
19,
20,
21]. These foundations are directly relevant when semantic coordinates must be related to risk indicators without losing traceability.
Prescriptive analytics extends prediction to decision making by optimizing interventions under constraints. CVaR-based formulations are widely used to focus on tail outcomes and to support risk-sensitive optimization objectives [
22,
23,
24,
25,
26]. Bayesian optimization and related black-box optimization techniques offer practical ways to search for effective decisions when evaluations are costly or uncertain [
27,
28,
29,
30,
31,
32]. In addition, graph-based modeling and regularization provide a way to encode dependencies between semantic factors and to stabilize solutions with respect to relational structure [
33,
34,
35,
36,
37].
For operational adoption, transparency is a core requirement. Modern explainability methods such as local explanations and additive feature attribution help to interpret model outputs and support auditability [
38,
39]. Broader conceptual frameworks in explainable and interpretable machine learning further systematize the taxonomy of explanations and the limits of interpretability [
40,
41]. These approaches motivate the need to keep a clear chain from semantic drivers to risk outcomes and to management actions.
In risk engineering, established work clarifies risk definitions and emphasizes the practical role of uncertainty and human factors in complex systems [
42,
43,
44]. Risk-based inspection planning and dynamic strategies show how risk indicators can be coupled to inspection decisions in engineering settings [
45,
46]. Related studies demonstrate that textual incident narratives can be mined to build reliability models and to extract safety risk factors from accident reports [
47]. Reviews on diagnostics, prognostics, and risk-based maintenance also provide decision-oriented context for maintenance planning in complex assets [
48,
49,
50,
51,
52]. At the same time, existing contributions typically solve only part of the chain, either focusing on text extraction, or on prediction, or on decision rules. This creates a gap for an integrated approach that transforms incident narratives into stable phenomena, links them to a risk metric through target-aligned factorization, and converts results into implementable portfolios of interventions. For clarity, the reviewed literature and its positioning relative to the present study are summarized in
Table 1.
The aim of this research is to develop a universal, domain-agnostic method of semantic factor analysis of phenomena, which ensures the manageability of risk under conditions of data heterogeneity and environmental drift through a standardized transition from observational narratives to prioritized management actions.
Thus, the goal of the proposed innovative method is to ensure risk manageability in conditions of increasing complexity and entropy through the transition to the phenomenological level of representation and impact. The implementation of the method leads to a stable translation of heterogeneous narratives into invariant coordinates of phenomena, purposefully coordinating them with risk metrics and returning the result to the space of managerial actions, thus systematically closing the gap “data, model, solution” in multi-sectoral and drifting environments. To concretize, the objective is revealed in three interrelated aspects:
Unifying risk representation—building a global phenomenological space invariant across sources, domains and languages, in which observations are comparable over time and across organizations;
Targeted alignment and explainability—extracting a semantic factor kernel directly paired with risk indicators, with a transparent “phenomenon, factor, risk” trace;
Operationalized management—prioritizing and optimizing actions at the phenomenon level, subject to cost and regulatory and network constraints, ensuring reproducibility, drift resistance, and seamless integration into EAM/CMMS loops.
The goal of the method is not to describe phenomena per se, but to use them as a primary, standardized unit of control to proactively reduce the risk functional in complex systems, turning semantic observations into coherent, prioritized, and transferable management decisions.
This study presents an end-to-end method that turns free-text incident notifications into decision-ready risk management outputs for complex systems under drift. The method uses phenomena as the main unit of analysis, so heterogeneous narratives become comparable across time and organizations. It links extracted phenomena to a risk indicator with target-oriented factor modeling, so that the drivers of risk stay interpretable and traceable. It then produces optimized portfolios of interventions that can be directly implemented in EAM/CMMS workflows.
2. Materials and Methods
The proposed approach is a holistic scenario-optimization semantic analysis pipeline that automatically converts textual incident data into recommendations for risk management. The pipeline covers the entire path from text processing and thematic modeling to the generation of management decisions, ensuring end-to-end traceability of the entire chain (data, then model, then decision). The input data are descriptions of emergency incidents together with a calculated risk index (a weighted indicator of consequences that aggregates the frequency, severity, and other effective characteristics of the event). The output of the pipeline is a composite factor representation of the thematic space of incidents, an interpretable regression model of the influence of the identified factors on the risk index, optimal and efficiency-ranked “levers” of intervention (proposed actions to reduce the risk, taking into account the constraints), a graph of interrelated topics for the comprehensive planning of measures, and human-readable recommendations ready for integration into asset management systems (EAM/CMMS).
First, the preprocessing of incident texts and topic modeling is performed. The text reports are brought to a uniform format (cleaning of service characters, case normalization, tokenization, and lemmatization), after which latent Dirichlet allocation (LDA) is applied to the corpus [
11,
12,
13]. LDA topic modeling turns unstructured text into compact, reproducible semantic coordinates: each document is described by a distribution over K latent topics, and each topic by a distribution over words. The result is a matrix Θ of dimension N × K (where N is the number of incidents), which serves as a semantic “passport” of all events. Θ captures which latent themes are present in the description of each incident and in what proportion. The obtained topics are still abstract, so auto-annotation of the topics is then performed using a large language model (LLM).
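As a minimal illustration of this step, the sketch below builds a Θ matrix with scikit-learn; the four-document corpus and K = 2 are toy assumptions, not the actual corpus or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for preprocessed incident narratives.
corpus = [
    "unauthorized access to controlled area during maintenance",
    "discharge of contaminants above permitted level",
    "valve component defect found during inspection",
    "unauthorized access attempt at perimeter gate",
]

X = CountVectorizer(stop_words="english").fit_transform(corpus)

K = 2  # number of latent topics (chosen via coherence checks in practice)
lda = LatentDirichletAllocation(n_components=K, random_state=0)
theta = lda.fit_transform(X)  # Theta: N x K, one topic distribution per incident
```

Each row of `theta` is the topic-share “passport” of one incident and sums to one.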
We designed the neural auto-annotation stage to be reproducible and auditable. Topic labeling was performed locally using Ollama with a fixed model version and a deterministic decoding setup. We used a constant temperature and a fixed random seed for all calls. We also fixed the remaining decoding parameters, including top-p, top-k, and repetition penalty. We constrained the maximum number of generated tokens to keep outputs short and comparable across runs. The prompt template was fixed and versioned. It received the same structured inputs for each topic, including the top words with weights and a small set of representative documents. All prompts and raw outputs were logged.
To verify reproducibility, we reran the labeling procedure multiple times under identical settings. In our experiments, the resulting topic labels were identical in more than 95% of cases. The remaining differences were typically minor and consisted of near-synonyms or small wording variations that did not change the operational meaning of the label. To keep the nomenclature stable, we applied a simple normalization step that standardized capitalization, removed redundant qualifiers, and mapped frequent synonymous variants to a single canonical label. This protocol makes the annotation step transparent and repeatable, while preserving the interpretability benefits of human-readable phenomenon names.
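The normalization step described above can be sketched as a small deterministic function; the synonym table and qualifier list below are illustrative assumptions, not the actual mapping used.

```python
# Sketch of the label-normalization protocol: collapse whitespace and case,
# strip hedging qualifiers, and map synonymous variants to one canonical
# label. SYNONYMS and QUALIFIERS are illustrative assumptions.
SYNONYMS = {
    "unauthorised access": "unauthorized access",
    "illegal access": "unauthorized access",
    "pollutant discharge": "discharge of contaminants",
}
QUALIFIERS = ("possible ", "suspected ", "reported ")

def normalize_label(raw: str) -> str:
    label = " ".join(raw.lower().split())   # collapse whitespace, lowercase
    for q in QUALIFIERS:                    # drop hedging qualifiers
        if label.startswith(q):
            label = label[len(q):]
    return SYNONYMS.get(label, label)       # map synonyms to canonical form
```

Because the function is deterministic, reruns of the labeling step yield an identical nomenclature for the same raw outputs.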
Based on the top words and characteristic n-grams of each topic, as well as a few example documents in which the topic is dominant, the LLM generates short human-readable names for each topic (e.g., “unauthorized access”, “discharge of contaminants”, “component defects”, or “radiation exposure”). This does not affect the numerical part of the model but significantly increases the interpretability of the results and unifies the terminology for subsequent use in reporting and integrations.
To increase methodological transparency and reproducibility, we follow a fixed and documented topic modeling protocol. The LDA model is trained on lemmatized tokens that pass simple document frequency filters. Extremely rare tokens and extremely frequent tokens are removed, and a domain specific stop word list is added to the standard English stop word list. This reduces noise from idiosyncratic terms, boilerplate, and regulatory phrases that carry little discriminative information. All preprocessing steps and vocabulary filters are implemented as deterministic transformations, which makes it possible to rerun the pipeline and obtain the same input to the LDA stage.
The LDA configuration is selected using a combination of quantitative and qualitative checks. We train candidate models under different numbers of topics and prior settings, and we monitor standard topic coherence scores such as C_v and UMass on held-out data. For each candidate model, we also inspect top words and representative incident reports for several topics. Models with very few topics tend to merge distinct operational scenarios into broad themes, while models with too many topics split stable phenomena into fragmented clusters. The final configuration is chosen as the one that achieves high coherence scores and remains interpretable for safety engineers in terms of stable, reusable phenomena.
To check stability, we repeat LDA training with different random seeds and on different temporal slices of the NRC Event Notifications corpus. We then compare the resulting topics by overlap of top words and by similarity of document-level topic distributions. The main operational themes, such as unauthorized access, pollutant discharge, and equipment failures, appear consistently across seeds and time windows. We log model parameters, vocabularies, and topic word distributions for each run. This allows for independent verification of the linguistic layer and supports the external replication of the semantic coordinates used in the subsequent PLS and regression stages.
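The top-word overlap comparison can be sketched as follows; the toy topic-word matrices stand in for two LDA runs with different seeds, and the Jaccard measure is one simple choice of overlap score.

```python
import numpy as np

def top_words(topic_word, vocab, n=2):
    # Set of the n highest-weight words for each topic row.
    return [{vocab[i] for i in row.argsort()[::-1][:n]} for row in topic_word]

def best_match_jaccard(topics_a, topics_b):
    # For each topic of run A, Jaccard overlap with its best match in run B.
    return [max(len(ta & tb) / len(ta | tb) for tb in topics_b) for ta in topics_a]

# Toy topic-word matrices standing in for two LDA runs (topics permuted).
vocab = ["valve", "leak", "access", "gate", "pump", "seal"]
run_a = np.array([[5.0, 4.0, 0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 5.0, 4.0, 0.0, 1.0]])
run_b = np.array([[0.0, 1.0, 5.0, 4.0, 0.0, 0.0],
                  [4.0, 5.0, 0.0, 0.0, 0.0, 1.0]])

scores = best_match_jaccard(top_words(run_a, vocab), top_words(run_b, vocab))
```

High best-match scores across seeds and time windows indicate that the same operational themes are recovered consistently.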
To relate the thematic representation to the target risk score, the topic matrix Θ is compressed into a factor space by a target-oriented PLS projection. Linear combinations of features are selected that are maximally consistent with the risk index, eliminating multicollinearity and preserving interpretability through the weights. The number of components is selected using cross-validation [
14,
15,
16].
A regularized linear regression is trained on the obtained factors, estimating the contribution of each factor to the risk index (ridge, i.e., L2 regularization) [17]. Tuning is performed on a set of metrics (R², RMSE/MAE/MAPE, and cross-validation indicators) with temporal cross-validation, which balances accuracy, generalizability, and interpretability. The coefficients in effect define the “levers” for subsequent optimization [18,19].
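A sketch of this regression layer under stated assumptions: scikit-learn’s `Ridge` with `TimeSeriesSplit` as the temporal cross-validation, RMSE as the selection metric, and synthetic factor scores in place of real PLS output.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for PLS factor scores and the risk index.
rng = np.random.default_rng(1)
F = rng.standard_normal((200, 4))
risk = F @ np.array([1.5, -0.7, 0.3, 0.0]) + 0.1 * rng.standard_normal(200)

# L2-regularized regression tuned with temporal cross-validation on RMSE.
search = GridSearchCV(
    Ridge(),
    {"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_root_mean_squared_error",
)
search.fit(F, risk)
beta = search.best_estimator_.coef_  # per-factor "levers" for the optimizer
```

Temporal splits ensure that each validation fold lies strictly after its training data, which mimics deployment under drift.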
Next, a scenario-based optimization of management actions is formulated. A vector of per-topic interventions is introduced, their effects are modeled through a matrix of elasticities, and the predicted risk is recalculated and minimized subject to constraints and costs [
20,
21,
22]. Bayesian optimization on Gaussian processes is used to find the optimal plan. The output is the optimal intervention levels and the ranking of topics by risk reduction/cost efficiency [
23,
24,
25].
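To make the scenario-optimization step concrete, the sketch below uses a simplified linear elasticity model and scipy’s SLSQP solver in place of the GP-based Bayesian optimization; all numbers (topic shares, elasticities, costs, budget) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

K = 4
theta_bar = np.array([0.4, 0.3, 0.2, 0.1])   # mean topic intensities
beta = np.array([2.0, 1.0, 0.4, 0.2])        # per-topic risk weights (from regression)
E = 0.8 * np.eye(K)                          # elasticities: u_k removes 80% of topic k
cost = np.array([1.0, 0.5, 0.5, 0.2])        # cost of full intervention per topic
budget = 1.0

def predicted_risk(u):
    # Shift the topic profile through the elasticity matrix, then re-score risk.
    return (theta_bar * (1.0 - E @ u)) @ beta

res = minimize(
    predicted_risk,
    x0=np.zeros(K),
    bounds=[(0.0, 1.0)] * K,
    constraints=[{"type": "ineq", "fun": lambda u: budget - cost @ u}],
    method="SLSQP",
)
u_opt = res.x

# Rank topics by marginal risk reduction per unit cost.
ranking = np.argsort(-(theta_bar * np.diag(E) * beta) / cost)
```

In the full method, this deterministic surrogate is replaced by Bayesian optimization over Gaussian processes when the response surface is expensive or uncertain.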
Prioritized themes are aggregated into a graph of thematic associations. Nodes are themes and edges are statistically significant associations (co-occurrence, correlations/partial correlations, and causal relationships if available). The graph is annotated with metrics from previous steps: nodes carry the contribution to risk (PLS factors) and centrality; edges carry the strength of association (correlation/MI) and, where appropriate, the effectiveness of a joint intervention. This representation provides a “management map” in which acting on high-centrality nodes has direct and indirect effects; e.g., dense subgraphs form coherent packages with the potential for multiplicative risk reduction [
26].
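A minimal numpy sketch of this graph as a thresholded correlation adjacency with degree centrality; the 0.3 threshold and synthetic Θ are assumptions, and in practice partial correlations or mutual information with significance tests would be used.

```python
import numpy as np

# Synthetic topic shares; topics 0 and 1 are made to co-occur.
rng = np.random.default_rng(2)
Theta = rng.random((300, 4))
Theta[:, 1] = 0.7 * Theta[:, 0] + 0.3 * Theta[:, 1]

C = np.corrcoef(Theta, rowvar=False)              # topic-topic correlation matrix
A = (np.abs(C) > 0.3) & ~np.eye(4, dtype=bool)    # adjacency: "significant" edges
degree_centrality = A.sum(axis=0) / (4 - 1)       # share of possible neighbors
```

High-centrality nodes in `A` are the natural anchors for coherent intervention packages, since actions on them propagate along the edges.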
Based on the graph and attributes (topic names, influence weights, links, elasticities, and constraints), the LLM generates structured recommendations (type of work, periodicity, responsible persons, resources/competences, control points, and KPIs) in a structured format ready to be loaded into EAM/CMMS. This transforms analytics into operational artifacts, bridging the “analysis, action” gap and closing the data loop.
It is important to emphasize that the proposed method is iterative and supports the full cycle “data–models–solutions–effects–data”. After the implementation of the recommended actions and the occurrence of new incidents, their results are fed back into the loop: the accumulated descriptions and updated risk indicators are fed back into the input of the pipeline. The LDA model is re-trained on the replenished corpus (with stability control to ensure that new data do not distort previously found themes), PLS components and regression coefficients are recalculated with fresh data, and elasticities and weights on the graph are updated [
27]. Periodic revisions of the model’s hyperparameters—e.g., the number of topics K or the set of intervention scenarios considered—may be conducted using regression tests on historical data to ensure that the quality of the model does not degrade over time.
To formalize the “feedback” block shown in
Figure 1 and to make the pipeline explicitly adaptive, we treat the operational outcomes of implemented interventions as first-class training signals, captured in the same EAM/CMMS loop that executes the recommendations. Concretely, each deployment round
logs an auditable tuple (Θ_t, u_t, O_{t+1}), where Θ_t is the pre-intervention document–topic matrix for the current time window, u_t is the implemented portfolio of topic-level intervention intensities, and O_{t+1} is post-intervention evidence (updated incident narratives, refreshed values of the proxy risk index y, tail indicators such as CVaR where applicable, and execution/quality logs of work orders). This makes the pipeline not only iterative in the «new data, retrain» sense, but explicitly outcome-driven. Recommendations are evaluated against realized effects, and the model is updated to better align its risk estimates and prescriptions with operational reality.
This outcome-driven update can be structured as Regression with Human Feedback (RHF), directly inspired by the alignment loop of reinforcement learning from human feedback (RLHF) used for instruction-following language models [
28]. In RLHF, a supervised model is complemented by a human preference signal and an optimization step that improves decisions with respect to this signal. In our setting, the supervised backbone is the traceable regression trained on PLS components. The factor kernel is F = ΘW, and the predicted proxy risk is ŷ = Fβ. The corresponding regression coefficient vector β_Θ = Wβ is traceable in the PLS subspace, i.e., ŷ = Θβ_Θ. The «action» is the intervention portfolio u, and the «human feedback» is the expert assessment of effectiveness together with post-intervention operational outcomes that reveal whether the prescribed interventions actually reduced risk.
Formally, the prescriptive step uses the calibrated deformation of the semantic profile under interventions, Θ(u) = Θ + ΔΘ(u), where ΔΘ(u) is induced through the elasticity structure estimated in the optimization block. The predicted post-intervention risk under the implemented portfolio is then ŷ(u) = Θ(u)Wβ, and the predicted effect is Δŷ = ŷ − ŷ(u). After execution, the operational loop provides realized values y_{t+1} and Δy = y_t − y_{t+1}, as well as an expert feedback score s for the effectiveness and appropriateness of the intervention. These signals define a scalar feedback target (utility) used for learning, U = α₁Δy + α₂s − α₃c(u), where c(u) is cost/effort and α₁, α₂, α₃ are governance-controlled weights. Thus, RHF uses the same traceable «phenomenon, factor, risk, action» chain, but closes it with an explicit «action, observed outcome, model update» channel.
The feedback loop enables two complementary and operationally realistic update modes, explicitly addressing how the PLS layers are refined. On a fixed schedule, or when drift diagnostics trigger, we retrain the semantic layer and re-estimate the target PLS projection W and the regression parameters β on an expanded or rolling-window corpus, using time-decay weights and feedback-based sample weights. To preserve interpretability and auditability, we keep W fixed for a period and update only the regression layer in factor space. In parallel, the prescriptive block is calibrated by contrasting realized and predicted effects, Δy versus Δŷ, which updates the elasticity parameters that generate ΔΘ(u) and (if used) the Bayesian optimization priors. This formalization turns the framework from a static analyzer into a self-improving tool that actively counters concept drift through outcome-driven learning.
To keep human-in-the-loop learning feasible, explicit feedback is requested only for high-impact or high-uncertainty cases, while routine updates rely on automatically captured post-intervention incident narratives and EAM/CMMS execution logs. When feedback is multi-source, we align heterogeneous ratings by modeling rater reliability and confidence and aggregating them into consistent targets, following the general approach of crowd-sourced feedback alignment studied by Wong and Tan (2025) [
29]. In addition, when absolute scoring is difficult, experts can rank alternative candidate portfolios generated by the optimizer. These pairwise preferences provide a stable supervision signal analogous to RLHF preference datasets, but applied here to intervention portfolios rather than text outputs. A simple instantiation is to fit a utility model U(u) from comparisons u_i ≻ u_j via the logistic (Bradley–Terry) likelihood P(u_i ≻ u_j) = σ(U(u_i) − U(u_j)), and then select portfolios by maximizing the resulting expected utility subject to the operational constraints already encoded in the optimization step.
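One way to instantiate such preference-based utility fitting is a Bradley–Terry model trained by gradient ascent; the linear utility form, toy portfolios, and preference set below are illustrative assumptions.

```python
import numpy as np

# Candidate portfolios and expert pairwise preferences (i preferred over j).
portfolios = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
prefs = [(0, 1), (0, 2), (1, 2)]

def fit_utility(portfolios, prefs, lr=0.5, steps=500):
    # Linear utility U(u) = w @ u fitted by gradient ascent on the
    # Bradley-Terry log-likelihood of the observed preferences.
    w = np.zeros(portfolios.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(w)
        for i, j in prefs:
            d = portfolios[i] - portfolios[j]
            p = 1.0 / (1.0 + np.exp(-(w @ d)))   # P(u_i preferred over u_j)
            grad += (1.0 - p) * d                # gradient of the log-likelihood
        w += lr * grad / len(prefs)
    return w

w = fit_utility(portfolios, prefs)
best = int(np.argmax(portfolios @ w))  # highest-utility portfolio (constraints omitted)
```

In the full pipeline, the argmax would be replaced by constrained maximization over the admissible portfolio set from the optimization step.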
This approach ensures the robustness of the method to data drift and changes in external conditions, while maintaining reproducibility and auditability. At each round, versions of LDA/PLS dictionaries and parameters, model coefficient values, optimization settings, and graph snapshots are stored [
28]. Together, the method integrates heterogeneous textual and quantitative data, provides interpretable insights at the level of themes, factors, and activities, and automates the transition from data analysis to management actions, while remaining transparent and reproducible for further analysis and for the trust of sectoral experts [
29,
30,
31,
32,
33].
Figure 1 below presents a comprehensive flowchart of the scenario-optimization method of semantic factor incident analysis. The flowchart visualizes the key steps and modules of the proposed pipeline and the interrelationships between them, demonstrating how unstructured textual data are sequentially transformed into risk factor models and management recommendations. The multi-level structure of the solution is shown, with forward links, cross-links, and feedback: from initial text processing and thematic modeling to recommendation generation and closing feedback, as well as an artifact registry for auditing and replication that feeds the effects of implemented interventions back into the data loop for adaptive model retraining.
The flowchart illustrates the main stages of the methodology and their interactions. The input data—textual incident reports and associated numerical risk indicators—are sequentially preprocessed (cleaned and normalized), after which latent themes of incident descriptions are extracted using LDA. Next, the neural network auto-annotation module assigns human-readable names to the obtained topics, increasing interpretability. In the next step, the topic space is projected into compact factors using a targeted PLS factorization aligned with the risk index; this reduces dimensionality and eliminates multicollinearity while preserving the informativeness of the features [
34,
35,
36,
37,
38]. The obtained factor coordinates are fed into a regularized regression block that estimates the quantitative impact of each factor (topic) on the integral risk [
39,
40].
The regression results are used in the Bayesian optimization module: the model selects the optimal set of management “levers” (measures), i.e., calculates what proportion of incidents for each topic should be prevented or reduced to minimize the expected risk given the constraints. At the same time, a graph of thematic links is constructed, the nodes of which correspond to the auto-annotated themes, and the edges reflect significant statistical associations between them. The nodes and links of the graph are equipped with attributes from the previous stages (impact on risk according to regression data, elasticity and cost of measures according to the data of the optimization module, and centrality measures). This graph serves as a basis for generating recommendations: the final LLM module generates preventive and corrective measures understandable to specialists, aggregating the topics into comprehensive proposals with prioritization, resources, and integration into the asset management system (EAM/CMMS) [
41,
42,
43,
44].
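A minimal sketch of the thematic link graph described above, using NetworkX; the topic names, attribute values, and edge weights are illustrative placeholders mirroring the kinds of attributes the text mentions (regression impact, cost, centrality):

```python
# Toy thematic association graph: nodes are auto-annotated topics with
# attributes from earlier stages, edges carry association strength, and
# betweenness centrality flags "bridge" topics for bundled interventions.
import networkx as nx

G = nx.Graph()
topics = {
    "Unauthorized access": {"beta": 0.42, "cost": 3.0},
    "Pollutant discharge": {"beta": 0.18, "cost": 1.5},
    "Equipment accidents": {"beta": 0.25, "cost": 2.0},
    "Miscellaneous incidents": {"beta": 0.10, "cost": 1.0},
}
for name, attrs in topics.items():
    G.add_node(name, **attrs)

# Edges: statistically significant associations (illustrative weights)
G.add_edge("Unauthorized access", "Pollutant discharge", weight=0.6)
G.add_edge("Pollutant discharge", "Miscellaneous incidents", weight=0.5)
G.add_edge("Pollutant discharge", "Equipment accidents", weight=0.4)

centrality = nx.betweenness_centrality(G)
bridge = max(centrality, key=centrality.get)  # mediating topic
```

In this toy topology the "Pollutant discharge" node sits on the shortest paths between the other topics, which is exactly the kind of mediating role the recommendation stage exploits.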
The right-hand part of
Figure 1 closes the operational loop of the method: after the recommended measures are implemented in EAM/CMMS and their effects are monitored, the resulting evidence (new incident narratives and updated indicators) is returned to the pipeline as refreshed input data. The verification/regression tests then compare predicted and observed outcomes and provide input for the retraining stage. In this retraining cycle, the semantic and predictive layers are refreshed—i.e., the LDA topic distributions are updated, the PLS components and regression coefficients are recalculated, and the intervention elasticities are re-calibrated—so that the framework adapts to evolving conditions and remains robust to data drift during continued operation [
45].
Verification of the phenomenon-centric method is performed on the open corpus of NRC Event Notifications—a consolidated array of textual descriptions of incidents with associated attributes (date/time, organization and site, jurisdiction, 10 CFR codes, accident classes, etc.) [
46,
47,
48,
49,
50]. The corpus covers a long time interval (about two and a half decades) and is formed by multiple operators and sites, which ensures both temporal variability (drift of vocabulary, reporting practices, and regulations) and structural heterogeneity (different technological contexts, safety cultures, and regulatory requirements). Incident texts are quite “deep” (containing descriptive motivations, conditions, consequences, and normative references) and semantically rich (stable collocations, domain terms, and referential attributes), which is crucial for thematization and the extraction of phenomena [
51,
52]. The corpus is representative of the global problem articulated in the introduction:
The heterogeneity of sources and domain practices mimics a real inter-organizational ecosystem;
The unstructured nature of primary descriptions verifies the method’s ability to systematically “stitch” narratives into manageable phenomenological coordinates;
The temporal scope allows for explicitly checking the resistance to drift (lexical, normative, and processual);
The presence of normative anchors (10 CFR, accident classes) creates a natural ground for proxy risk indicators and for traceable “phenomenon, factor, risk, action” linkages.
The corpus covers a wide range of risk phenomena (operational disturbances, equipment failures, procedural deviations, and communication/regulatory events), which allows the transferability of thematic coordinates between heterogeneous scenarios to be assessed. In terms of size, the corpus volume (≈ tens of thousands of records) is adequate for the typical dimensionality of the topic space and the subsequent target factorization. After document–topic thematization, the Θ matrix is projected into a compact factor kernel, Z = ΘW with K ≪ |V| (the number of topics and factors is selected by validation), which forms a favorable “observations per component” ratio. The minimum detectable linear effect (using the Fisher Z approximation for correlation) at significance level α and power 1 − β is estimated as
r_min = tanh((z_{1−α/2} + z_{1−β})/√(n − 3)),
where n is the number of observations in the training split. For n on the order of tens of thousands, r_min is on the order of ≈ 0.02, which is sufficient to detect weak but systematic factor contributions. The unbalanced nature of severe events is also important: the presence of a “thin tail” in accident rates naturally tests the tail metrics (CVaR) and the robustness of the regression on sparse extreme patterns. Together, this gives both substantive and statistical sufficiency of the corpus for validating the holistic “texts, phenomena, factors, risk, action” loop.
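Under conventional choices (α = 0.05, power 0.80), this detectability bound can be checked numerically; the sketch below assumes those standard values:

```python
# Minimum detectable correlation under the Fisher z approximation:
# r_min = tanh((z_{1-alpha/2} + z_{1-beta}) / sqrt(n - 3))
import math
from scipy.stats import norm

def r_min(n, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.tanh(z / math.sqrt(n - 3))

print(round(r_min(20_000), 4))  # ~0.02 for tens of thousands of records
```

Smaller samples raise the bound sharply (e.g., r_min ≈ 0.09 at n = 1000), which is why the corpus size matters for detecting weak factor contributions.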
In the absence of a single “true” scalar business risk index, a proxy index y^proxy is introduced for all records, aggregating normative and hard indicators (accident class, 10 CFR codes, off-site notification, length and “density” of the narrative, etc.) with monotonic normalization:
y_i^proxy = σ(Σ_j w_j x_ij), x_ij ∈ {s(class_i), 10 CFR code counts, off-site notification flag, narrative length/density, …},
where σ is the logistic normalization and s(·) is the ordinal coding of the accident class. In parallel, an auxiliary classification target for the high-risk tail is generated to align with the CVaR optimization target:
y_i^bin = 1[y_i^proxy ≥ q_{1−α}(y^proxy)],
which allows for the simultaneous evaluation of regression calibration and recognition of high-risk cases.
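The aggregation and the tail label can be sketched as follows; the field names, weights, and α are hypothetical placeholders, since the study's calibrated weights are not reproduced here:

```python
# Toy construction of the proxy risk index y_proxy and the auxiliary
# high-risk label y_bin. All weights and fields are hypothetical.
import numpy as np

def sigma(x):  # logistic normalization
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n = 1000
s_class = rng.integers(0, 4, n)   # ordinal coding s(.) of accident class
n_cfr = rng.integers(0, 3, n)     # number of cited 10 CFR codes
offsite = rng.integers(0, 2, n)   # off-site notification flag
density = rng.random(n)           # narrative "density" proxy

z = 0.8 * s_class + 0.5 * n_cfr + 1.0 * offsite + 0.3 * density
y_proxy = sigma(z - z.mean())     # centered, monotonically normalized

# Auxiliary tail label aligned with the CVaR_alpha target (alpha = 0.1)
threshold = np.quantile(y_proxy, 0.9)
y_bin = (y_proxy >= threshold).astype(int)
```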
To avoid time leakage, a chronologically meaningful split is used: a training part (early years), a validation part (an intermediate interval for the selection of hyperparameters and the number of factors), and an out-of-time (OOT) test (recent years). Within training, a rolling-origin scheme with “purging” of neighboring windows is applied, and the summary metric is averaged with weights proportional to window size:
M̄ = Σ_t |V_t| · M_t / Σ_t |V_t|,
where M_t is the metric on validation window V_t.
The set of metrics is organized into groups:
Goodness-of-fit (R², EVS, RMSE/MAE);
Scale errors and calibration (slope/shift of the regression of ŷ on y, SMAPE/MASE);
CV metrics by time windows with confidence intervals;
Stability of representations (convergence of themes, robustness of W, and rank stability of factor contributions);
Graph replicates (convergence of edges/centrality);
Tail risk CVaR_α(ŷ) and PR-AUC/AUC for the high-risk class;
Operationalizability (proportion and “footprint” of recommendations suitable for loading into EAM/CMMS).
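The chronological design described above can be sketched as follows; the window sizes, purge length, and metric values are illustrative choices, not the study's settings:

```python
# Rolling-origin splits with a purge gap between train and validation,
# plus window-size-weighted averaging of a per-window metric.
import numpy as np

def rolling_origin(n, train0=400, val=100, purge=20, step=100):
    """Yield (train_idx, val_idx) pairs with a purged gap between them."""
    end = train0
    while end + purge + val <= n:
        yield np.arange(0, end), np.arange(end + purge, end + purge + val)
        end += step

n = 1000
splits = list(rolling_origin(n))

# Weighted summary: each window's metric weighted by its validation size
metrics = [0.82, 0.85, 0.80, 0.88, 0.84][: len(splits)]
weights = [len(v) for _, v in splits][: len(metrics)]
summary = np.average(metrics, weights=weights)
```

Purging removes observations adjacent to the train/validation boundary so that autocorrelated records cannot leak information across the split.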
To address reproducibility and methodological clarity across the full “tokens to actions” chain, we explicitly evaluate stability at the factor, regression, and prescriptive levels. This is necessary because the downstream estimates depend on the PLS-based semantic kernel and on the resulting regression coefficients that drive the optimization stage.
The stability of the PLS semantic factorization is evaluated across the rolling-origin splits used in the temporal validation design. For each split, we refit the PLS projection on the training window and obtain the topic for the factor loading matrix. We then compare these loading matrices between splits after resolving the sign indeterminacy of latent components. We report similarity both at the component level and at the subspace level. Component level similarity is computed using the cosine similarity between aligned loading vectors. Subspace level similarity is computed using the principal angle or Procrustes style alignment diagnostics. In addition, we compare factor scores obtained on a fixed reference subset and report their correlation structure across splits. This directly tests whether the semantic risk factors are stable under changes in the training period.
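A minimal sketch of these loading-stability diagnostics, with synthetic stand-in matrices: sign alignment, per-component cosine similarity, and subspace (principal-angle) similarity between the loading matrices of two splits:

```python
# Stability diagnostics for PLS loadings across two splits: resolve the
# latent-sign indeterminacy, then compare components and subspaces.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
W_a = rng.normal(size=(50, 4))               # loadings, split A (topics x factors)
W_b = W_a + 0.05 * rng.normal(size=(50, 4))  # split B: slightly perturbed
W_b[:, 2] *= -1                              # simulate sign indeterminacy

def aligned_cosines(A, B):
    """Cosine similarity per component, invariant to sign flips."""
    sims = []
    for a, b in zip(A.T, B.T):
        c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        sims.append(abs(c))
    return np.array(sims)

cos_sim = aligned_cosines(W_a, W_b)  # ~1.0 for stable components
angles = subspace_angles(W_a, W_b)   # small angles => stable subspace
```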
Robustness to sampling noise is assessed via bootstrap resampling inside each training window. We repeatedly resample incident reports with replacement, refit the PLS projection and the regression model, and summarize the variability of loadings and coefficients using empirical intervals and sign consistency rates. This quantifies whether the factor layer and the estimated risk contributions remain stable when the data contain stochastic variation and sparse tail events.
The regression interpretability is supported by coefficient stability diagnostics. We treat regression coefficients as associations between latent semantic factors and the proxy risk index. We do not interpret them as causal effects. To justify their use as inputs to prescriptive optimization, we report coefficient stability under both bootstrap resampling and rolling-origin splits. We also report the stability of the rank ordering of absolute coefficient magnitudes, since this ranking is used to define intervention leverage priorities.
Prescriptive recommendations are tested for reproducibility at the level of selected levers and portfolio composition. The prescriptive module is deterministic under fixed random seeds and fixed solver settings, but multiple near-optimal portfolios can exist. Therefore, we evaluate stability by repeating the full pipeline under controlled perturbations and reporting overlap metrics for the top recommended levers. We report Jaccard overlap for the top N levers and the dispersion of portfolio cost and predicted risk reduction across repeats. This directly tests whether the recommended actions are consistent across repeated runs of the pipeline.
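The overlap metric for repeated runs reduces to a Jaccard index over the top-N lever sets; the lever names below are illustrative:

```python
# Jaccard overlap of top-N recommended levers across two pipeline runs.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

run_1 = ["unauthorized_access", "pollutant_discharge", "threats", "equipment"]
run_2 = ["unauthorized_access", "pollutant_discharge", "equipment", "radiation"]

overlap = jaccard(run_1, run_2)  # 3 shared of 5 distinct levers -> 0.6
```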
Resistance to concept drift is evaluated with explicit temporal diagnostics on the semantic, factor, and regression layers. At the semantic layer, we measure how the distribution of inferred topic mixtures changes over time windows, and we monitor the stability of topic word distributions and topic coherence in delayed periods. At the factor layer, we track the similarity of the PLS loading subspace across time windows. At the regression layer, we track coefficient and ranking stability over time. These diagnostics complement the standard predictive and calibration metrics and provide direct evidence of whether the semantic representation and the coupled risk model remain stable when the narrative distribution shifts.
Ablations are provided to isolate component contributions: BOW/TF-IDF without thematization; LDA + Ridge versus LDA + PLS + Ridge/Elastic Net; optimization with and without the graph regularization Lu; LLM annotation ON/OFF; and permutation control π(y).
Thus, the corpus used is both realistic (inter-organizational heterogeneity, normative anchors) and rich (long narratives, diversity of phenomena), and its long time span creates natural conditions for testing robustness to drift. The sample size and its structural heterogeneity provide statistical power to estimate weak but systematic effects and allow for the validation of the end-to-end loop “data, phenomena, factors, risk, action” rather than individual algorithmic modules. The described validation design provides a rigorous basis for further testing and interpretation of empirical results without going beyond the intended purpose of the study [
53].
All experiments were implemented in Python (version 3.x) using a reproducible open-source stack. Data handling and tabular transformations were carried out with pandas (v2.2) and NumPy (v1.26), while scientific utilities relied on SciPy (v1.16.2). Topic models and regression baselines were built using the scikit-learn library, complemented by gensim for exploratory topic modeling and NLTK for tokenization and lemmatization. Graph construction and analysis for the thematic association network were implemented with NetworkX, and the Bayesian optimization of intervention scenarios used Optuna (v4.6.0). Visualizations (
Figure 2,
Figure 3,
Figure 4,
Figure 5 and
Figure 6) were produced with Matplotlib version 3.10.6 and auxiliary plotting routines. The method therefore relies on standard, well-tested, open-source components, and only the authors implemented the integration logic of the end-to-end pipeline (data preprocessing, rolling-origin validation, scenario optimization, and artifact logging) as custom Python code on top of these libraries.
The full pipeline—from raw incident texts to trained topic models, PLS factorization, cross-validated regressors, and optimized intervention portfolios—was executed on a workstation-class Linux machine equipped with a multi-core Intel Xeon CPU (10 hardware cores), 64 GB of RAM, and no dedicated GPU. On this hardware, training the LDA model on the final NRC EN corpus with the selected number of topics required on the order of several minutes of wall-clock time. A complete run of the rolling-origin evaluation over all regression families and factorization variants was completed within a few hours. Training and evaluating a single configuration typically took from tens of seconds to a few minutes. Scenario optimization with Bayesian search and graph construction added comparable overhead but remained comfortably within interactive timescales, which confirms the practical feasibility of deploying the approach in operational risk management loops. Random seeds were fixed for all stochastic components to ensure the reproducibility of the reported results.
For completeness, we briefly summarize the regression models used in the study. We denote by X the matrix of predictors (in our case, these predictors are the latent PLS factors obtained from thematic coordinates), and by y the proxy risk index. All considered approaches are linear regression models that estimate an intercept term and a vector of coefficients, producing predictions as a linear combination of the predictors plus an intercept.
Ordinary Least Squares (OLS) fits the model by choosing the coefficients that minimize the overall discrepancy between the observed values of the target variable and the model predictions, measured as the sum of squared residuals. This is the standard unregularized linear regression baseline.
Ridge regression (L2 regularization) extends OLS by adding a penalty that discourages large coefficient values. Concretely, in addition to minimizing squared residuals, Ridge also minimizes the squared magnitude of the coefficient vector, scaled by a regularization strength parameter. This type of shrinkage improves stability under multicollinearity and helps prevent overfitting, which is why it is used in our best-performing configuration (PLS + SumDiff + Ridge with L/R scaling).
Lasso regression (L1 regularization) also extends OLS, but uses a different penalty: it discourages large coefficients by penalizing the sum of absolute values of the coefficients. This property tends to produce sparse solutions, meaning that some coefficients can become exactly zero. As a result, Lasso performs an implicit form of feature selection.
Elastic Net (combined L1 and L2 regularization) merges the two ideas above by using a weighted combination of the L1 and L2 penalties. A mixing parameter controls the balance between the Lasso-like sparsity effect and the Ridge-like shrinkage effect. This is often beneficial when predictors are correlated and one also wants some degree of sparsity.
In our pipeline, PLS regression is applied before these linear models in order to transform the input representation into a set of latent components that are maximally aligned with the response. The resulting PLS factors are then used as inputs to OLS, Ridge, Lasso, and Elastic Net. Model hyperparameters (regularization strength, Elastic Net mixing parameter, the number of PLS components, and the L/R scaling options) are selected using cross-validation within the rolling-origin evaluation framework described earlier in this section.
3. Results
A consolidated corpus of NRC Event Notifications was generated to test the performance of the method. The final sample consisted of 27,299 records with a predominant time span of 1993–2025. Isolated obvious date artifacts were excluded from the dynamic slices. The corpus aggregates reports from multiple operators and sites, contains rich attributes (date/time, facility and site, jurisdiction, 10 CFR regulatory codes, accident classes, etc.), and therefore combines temporal variability (lexical and regulatory drift) with the structural heterogeneity of technological contexts—exactly the kind of “stress environment” in which a phenomena-centric approach should remain robust. This composition makes the data indicative of the task: significant volume, long narratives, a wide range of phenomena (from operational and equipment failures to organizational and regulatory events), and the availability of normative anchors for building a proxy risk index and subsequent tracing from phenomenon to factor, to risk and, finally, to action.
Before analyzing the regression layer and optimization results, we checked that the thematic representation itself is meaningful and stable. The selected LDA configuration achieves topic coherence values in a range that is usually interpreted as moderate to good, and these values are higher than for alternative settings with fewer topics or without document frequency filtering. Repeated training with different random seeds produces very similar clusters of top words, and the dominant themes remain present when the model is trained on early and late subsets of the corpus. Together with ablation experiments that remove the LDA layer or replace it with a simple bag of words and TF–IDF baselines, this indicates that the phenomena used in the factor models and in the optimization are robust structures of the data rather than artifacts of a single random run.
Beyond predictive accuracy, we evaluated whether the semantic factorization, regression layer, and prescriptive outputs are stable under time splits and repeated pipeline runs. This step is critical because the proposed framework claims traceability and reproducibility across the entire chain, and because the prescriptive recommendations inherit uncertainty from upstream linguistic representations.
For the PLS-based semantic factorization, we computed stability diagnostics across rolling-origin splits. We report the similarity of topics to factor loadings after the alignment of latent components, and we report the stability of factor scores on a shared reference subset. We also report the bootstrap variability of the loading structure within training windows. These results are summarized in a dedicated stability table. They provide direct evidence that the extracted semantic risk factors are not artifacts of a single split.
For the regression layer, we report coefficient stability across time windows and bootstrap resamples. We summarize sign consistency and rank the stability of absolute coefficient magnitudes. This supports the interpretation of the reported “quantitative impacts” as stable associations between semantic factors and the proxy risk index, and it clarifies the conditions under which coefficient-based leverage ranking is meaningful for prescriptive optimization.
For the prescriptive module, we repeated complete pipeline runs under controlled seeds and solver settings and evaluated the stability of the resulting recommended lever sets. We report overlap metrics for the top recommended levers and the dispersion of predicted risk reduction and portfolio cost. This directly addresses whether the recommended actions would remain consistent when the pipeline is rerun, even when upstream components are subject to small stochastic variation.
Finally, we complement the rhetoric of drift resistance with explicit temporal diagnostics. We quantify semantic distribution shifts over time windows, and we track whether factor loadings and regression coefficients remain stable when the narrative distribution changes. Together with the rolling-origin evaluation already used for predictive metrics, these analyses provide an empirical basis for the claimed drift resilience.
The verification design targets drift robustness. A chronologically consistent split is used (training—early years; validation—an intermediate period for hyperparameter selection; out-of-time test—recent years), together with a rolling origin with purging of neighboring windows. The summary score is collected by groups of metrics reflecting different aspects of quality: goodness of fit (R², EVS, RMSE/MAE), scale errors/calibration (slope/shift, SMAPE/MASE), CV metrics (stability over time), residual diagnostics, and probabilistic and information criteria. This multi-axis control allows scenarios to be compared on the trade-off between accuracy, robustness, and calibration, rather than on a single metric.
The radar diagram above summarizes the composite scores for groups of metrics for the entire pool of scenarios (OLS/Lasso/Ridge/ElasticNet variants “as is”, their L/R versions with robust scaling, γ-regularization, and configurations with targeted PLS factorization). The L/R convolutions and PLS + regularization form the most “rounded” polygons—they fill almost the entire triangle of goodness of fit–scale errors–CV metrics, indicating both correct error scaling and stability over deferred periods. By contrast, the “Original” versions without L/R transformations and aggressive domain-free compression (PCA95) show dips on one or two axes, signaling insufficient robustness and calibration under drift, thus visually confirming the methodological hypothesis that targeted PLS compression of thematic coordinates is critical for industrially acceptable accuracy and stability.
The integral horizontal ranking quantitatively consolidates the conclusions of the radar. The group of L/R-models with γ-regularization and/or PLS-core leads. Top positions and composite scores (0–1):
Ridge Gamma L/R—0.600;
Lasso L/R—0.592;
ElasticNet L/R—0.591;
Lasso Gamma L/R—0.590;
ElasticNet Gamma L/R—0.589;
SumDiff PLS + Ridge L/R—0.582;
PLS L/R—0.571, OLS L/R—0.571, OLS Gamma L/R—0.568, and Ridge L/R—0.543.
Noticeably, the “Original” versions without L/R are inferior in terms of total quality (OLS Original—0.536, Ridge Original—0.523, Lasso Original—0.337, ElasticNet Original—0.324), while PCA95 + Ridge L/R—0.283 and “extrapolation from L/R” (Ridge EXT) turn out to be the lowest. Together with the radar, this directly points to the value of target factorization (PLS) and outlier-resistant scaling as necessary elements of the end-to-end pipeline: they are what make the balance of “accuracy–robustness–calibration” systematically achievable on a drifting sample.
Taken together, these results confirm the correctness of the methodology’s architectural decisions. The translation of texts into stable thematic coordinates (LDA, then auto-annotation), their target compression (PLS), and regularized regression on factors create a stable basis for the next steps—the scenario optimization of levers and construction of the graph of thematic relations, where management actions will be calculated over the interpreted factors and taking into account the constraints.
Neural auto-annotation is used only to assign human-readable names for the discovered topics. It does not change the underlying document–topic distributions produced by LDA. All quantitative steps in the pipeline use these numeric topic mixtures as inputs. This includes the PLS factorization, the regression models, and the scenario optimization. For this reason, predictions and recommended interventions do not depend on the wording of the labels. Labels affect interpretation and reporting, but not the computed risk factors or optimization results.
To compare scenarios, we aggregated quality into a single composite score for three groups of metrics: goodness_of_fit (approximation accuracy and error scale consistency), scale_errors/calibration, and cv_metrics (time tolerance). Each primary metric was normalized to a 0–1 scale within its group, then averaged into a group sub-score, after which the overall model score was calculated as a weighted average of the sub-scores (sum of weights = 1). This multi-axis aggregation allows the models to be ranked based on a balance of accuracy, calibration, and stability, rather than on a single metric, which is particularly important for the NRC EN drifting corpus. A detailed definition of the groups of metrics and the temporal design of the validation is given in the
Section 2, where this three-axis estimation scheme is justified.
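A minimal sketch of this three-group aggregation, with toy metric values and hypothetical group weights (the study's actual weights are defined in Section 2):

```python
# Composite score: min-max normalize each metric within its group
# (inverting "lower is better" metrics), average into group sub-scores,
# then take a weighted mean with weights summing to 1.
import numpy as np

def minmax(col, higher_is_better=True):
    col = np.asarray(col, dtype=float)
    span = col.max() - col.min()
    scaled = (col - col.min()) / span if span else np.ones_like(col)
    return scaled if higher_is_better else 1.0 - scaled

# Rows = models, columns = metrics within a group (toy numbers)
goodness = np.column_stack([minmax([0.90, 0.70, 0.50]),          # R^2
                            minmax([0.10, 0.20, 0.40], False)])  # RMSE
calibration = minmax([0.95, 0.80, 0.60]).reshape(-1, 1)          # slope ~ 1
cv_stability = minmax([0.85, 0.75, 0.40]).reshape(-1, 1)         # CV mean

subs = np.column_stack([g.mean(axis=1) for g in
                        (goodness, calibration, cv_stability)])
weights = np.array([0.4, 0.3, 0.3])                              # sum to 1
overall = subs @ weights                                         # 0..1 ranking
```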
The radar diagram of the group sub-score shows a key feature of the top configurations. Their polygons are maximal in all available vectors, i.e., simultaneously high values are achieved in all three axes. In particular, chains with PLS target factorization and robust scaling (L/R) form the tops of the diagrams, whereas the “original” variants without L/R and aggressive domain-free compression (PCA95) show dips on one or two axes, indicating worse tolerance and calibration in delayed windows. This empirically confirms the methodological requirement. PLS + regularization and L/R transformations are essential elements for robust accuracy on a heterogeneous and drifting sample.
The final weighted rating of the models is shown in the horizontal diagram. The leaders are as follows:
SumDiff PLS + Ridge (L/R)—0.938;
Ridge Gamma (L/R)—0.938;
Then, Lasso Gamma (L/R)—0.926, Lasso (L/R)—0.926, ElasticNet (L/R)—0.922, ElasticNet Gamma (L/R)—0.921;
“upper echelon” is closed by PLS (L/R)—0.876;
OLS (L/R)—0.841, Ridge (L/R)—0.828, OLS Gamma (L/R)—0.821–mid-range;
“original” without L/R is noticeably lower (OLS Original—0.801, Ridge Original—0.782, Lasso Original—0.442, ElasticNet Original—0.393);
The lowest scores are PCA95 + Ridge (L/R)—0.349 and Ridge EXT (from L/R)—0.211.
This order quantitatively captures the balance. SumDiff PLS + Ridge (L/R) maximizes the cv_metrics block, while Ridge Gamma (L/R) has a slightly higher goodness_of_fit—as a result their final scores are the same. It is the ability to hold a high level of accuracy, calibration, and portability that puts these configurations in the lead. In addition, schemes without target factorization or without robust scaling lose ground due to degradation on at least one axis.
From a practical point of view, this means that for further steps of the pipeline (estimation of factor contributions, leverage optimization, and the graph of thematic relationships), one should rely on the upper group of models. They provide the best compromise “accuracy–stability–calibration”, i.e., they minimize the risk of over-learning for the historical vocabulary and correctly capture the magnitude of errors in the transition to out-of-time periods, which is critical for operation in production risk management loops [
54,
55,
56,
57,
58,
59].
Moving from the “flat” ranking to analyzing the interactions of the pipeline components, we constructed a cube visualization of the composite score (0–1) along three orthogonal axes:
(X) regressor family (OLS, Ridge, Lasso, ElasticNet, PLS-reg, PCA95 + Ridge, Ridge EXT);
(Y) scaling/robust normalization option (Original vs. L/R);
(Z) feature factorization “layer” (Base → PLS only → PLS + SumDiff).
Analysis of the cube visualization of the composite scores shows several consistent effects and their interactions. First, the “main effect” of L/R is noticeable. The transition from Original to L/R configurations yields an increase in integral scoring in almost all model families; the magnitude of the gain varies from ≈+0.03 to ≈+0.15—from moderate for OLS to pronounced for Ridge/Lasso/ElasticNet, which is fully consistent with the “flat” ranking, where L/R versions systematically outperformed the original ones. The second baseline effect is related to the factorization layer: moving up the Base → PLS only → PLS + SumDiff axis leads to a consistent increase in quality. The typical gain at the Base → PLS transition is ≈+0.07…+0.10, and the additional PLS → PLS + SumDiff step adds another ≈+0.02…+0.06. Visually, this appears as the “warming” of colors from the lower to the upper shelf of the cube and confirms that the target PLS factorization acts not as an auxiliary, but as a key driver of portability and calibration.
Against the background of these main effects, the “PLS-layer × L/R” synergy is particularly noticeable. The maximum values of the composite score are concentrated in the “top-right” corner of the cube—the combination of L/R + PLS + SumDiff. This is where the previously identified leaders—SumDiff PLS + Ridge (L/R) and Ridge Gamma (L/R)—are located—both configurations give Overall ≈ 0.938, which shows that the best result is achieved not by choosing one “right” module, but by their coordinated combination.
Ridge, Lasso, and ElasticNet appear to be the least conflicting with PLS and benefit simultaneously from both factorization and L/R, reaching the upper quality levels. PLS-reg as a separate family is stable and strong in the PLS only layer, but more often yields to the PLS + SumDiff composition with external regularization. OLS moderately gains from L/R, but even with PLS + SumDiff remains in the middle echelon. The PCA95 + Ridge binding shows a systematic lag even in the L/R layer, indicating a deterioration of tolerance under domain-agnostic compression. Finally, Ridge EXT (from L/R) forms the worst corner of the cube (down to ≈0.21), as ablation with removal of the regressor from the L/R contour destroys the feature scale and parametric matching.
From an operational point of view, this means that the “hot” blocks of the cube coincide with the top of the flat ranking, and the L/R + PLS(+SumDiff) + regularizer bundle from the Ridge/Lasso/ElasticNet family provides a stable accuracy–calibration–tolerance trade-off on the drifting case. On the contrary, attempts to replace PLS with a domain-free PCA or to break the coherence of the pipeline lead to the structural degradation of quality.
Thus, the “cube” shows not only who is the leader, but also why. Quality is the result of a coordinated trio of solutions (robust scaling × target factorization × regularized class) rather than a single “strong” model. This is the methodological rationale for selecting the top group of configurations for the next steps, i.e., assessing the contributions of the topics, optimizing the selection of “levers”, and building the linkage graph.
By localizing the hot configurations found in the multivariate ranking at the level of internal representations, we consider the stability, calibration, and factor structure of the best model from the upper echelon (PLS + SumDiff + Ridge in the L/R loop). The distribution of the cross-sectional explained variance shows that CV-R² is consistently high (Figure 7). The median is close to unity, the interquartile range is narrow (≈0.9–1.0), and sporadic dropouts at early windows (about 0.2–0.3) reflect the expected sensitivity to local lexical and process drift in historical periods. The “predicted vs. actual” diagram demonstrates good scale calibration. The point cloud lies along the diagonal without systematic displacement. Noticeable clusters near levels 0, ≈0.5, and 1 correspond to the discrete structure of the proxy index (aggregation of normative/class indicators) and confirm that the regressor correctly reproduces both the low-risk “background” and the medium- and high-risk levels. The upper tail shows rare cases of underprediction (points under the diagonal at actual ≈1), which is natural for CVaR-sensitive settings. Extreme patterns are rare and can be compressed by regularization; they will be the targets of targeted monitoring enhancement in the operational loop. The decomposition of the absolute Ridge coefficients in the PLS component space reveals a compact “kernel” of predictive mass. The second component makes the largest contribution (|β| is maximal), followed by the first, third, fifth, and tenth components with decreasing weights. The rest constitute a “thin tail” of small but non-zero effects. This picture corresponds to the target meaning of PLS: several orthogonal factors aligned with the risk metric concentrate the main informativeness, ensuring both interpretability (via the topic–factor loading matrix) and robustness (suppression of the multicollinearity of topics).
To link the factor structure to the semantic level, we assessed the permutation importance of the Sum/Diff thematic aggregates. The gradient features “Miscellaneous Incidents” and “Threats and Incidents” dominate the top 20, significantly outperforming the rest of the pool. This is followed by “Discharge of pollutants” and, as the first summary predictor, “Unauthorized access (summary)”. The prevalence of diff coordinates indicates that the risk is sensitive to the dynamics of topics (change of frequency/content of episodes) and not only to their absolute proportions. It is the rate of change in “miscellaneous incidents”, “threats”, and “discharges” that is an early indicator of risk growth. At the same time, the high significance of the total level of “unauthorized access” is consistent with the optimization block. Even without sharp dynamics, the increased “access background” itself gives the highest marginal derivative of the target risk function, which makes it the primary point of application of interventions. This combination of “dynamics of most topics + level of key topic” explains the previously observed asymmetry in the polar leverage diagram and the bridging role of “resets” in the linkage graph. On the management map, these phenomena form a contour where changes in practices are most quickly translated into a decrease in the integral index.
Building on the identified leader (PLS + SumDiff + Ridge in the L/R-loop) and its factor structure, we formalized the management-choice setting as the optimization of an action vector a = (a_1, …, a_K), where a_k is the intensity of intervention on topic k (the averted proportion of events or reduced severity). The change in the topic profile was modeled by the deformation Θ′ = Θ − ΔΘ(a) through calibrated elasticities, after which the risk prediction was recalculated as R̂(a) = f(Θ′W), where f is the regressor trained on PLS components and W is the topic factor-loading matrix. The objective function combined the expected risk with costs and constraints and was minimized via Bayesian optimization, which allowed searching for an optimal “set of levers” under limited resources and uncertainty of estimates. The polar diagram of the factor mix shows that the largest differential contribution in modulus belongs to the topic “Unauthorized access”: “cutting” it gives the greatest marginal gain in the risk functional. For the dynamic (“diff”) coordinates—“Miscellaneous incidents”, “Threats and incidents”, and “Pollutant discharge”—the marginal effects are noticeably smaller in modulus and close to zero, indicating the rationality of combining them into package measures (the synergistic effect manifests under joint impact rather than through single levers). Thus, the optimization not only confirmed the dominance of “unauthorized access” but also established the priority of a combined strategy for the ordinary diff themes [
60]. Next, to capture the structure of inter-thematic dependencies and to identify points where bundling yields a multiplicative effect, a statistical relationship network was constructed. In addition, the permutation feature importance ranking (top 20, Sum/Diff) for the best-performing scenario is shown in
Figure 8 as a horizontal bar chart.
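The prescriptive step can be sketched as a constrained minimization of predicted risk plus intervention cost. The elasticities, cost weights, the linearized risk model, and the use of a bounded gradient-free minimizer (standing in for the Bayesian optimization used in the study) are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
K = 6
theta = rng.dirichlet(np.ones(K))       # current topic profile Theta
beta = rng.normal(size=K)               # linear stand-in for the regressor f
elasticity = np.full(K, 0.8)            # calibrated elasticities (assumed)
cost = np.linspace(0.1, 0.6, K)         # per-lever intervention costs (assumed)

def objective(a):
    # Deform the topic profile: Theta' = Theta - dTheta(a)
    theta_new = theta - elasticity * a * theta
    # Recompute predicted risk and add the cost penalty
    return float(beta @ theta_new) + float(cost @ a)

# Bounded search for the optimal "set of levers", a_k in [0, 1]
res = minimize(objective, x0=np.zeros(K), bounds=[(0.0, 1.0)] * K)
print("optimal lever intensities:", np.round(res.x, 2))
```

In the full pipeline, `objective` would call the trained PLS + Ridge regressor on the deformed profile, and a Bayesian optimizer would replace the local minimizer to handle noisy estimates.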
Nodes correspond to auto-annotated topics, edges—to significant associations (co-occurrence/partial correlations/conditional mutual information). In the obtained topology, “Pollutant discharge diff” plays a central role. It connects “Unauthorized access (total)” to “Miscellaneous incidents”, as well as to the topics “Explosives (total)” and “Radiation exposure (total)”, and is linked via shortcuts to “Equipment accidents (total)”. In the language of graph theory, the node “Discharge of pollutants” has a high mediating centrality (often lying on the shortest paths between other topics), while “Unauthorized access” has a prominent, albeit smaller, centrality. Influencing ‘access’ has the largest direct risk effect, while prioritizing ‘discharges’ has a disproportionately large indirect effect, reducing cascades leading to equipment accidents. Therefore, the optimal portfolio should combine targeted interventions on ‘access’ and harmonized measures on the sub-facet where ‘discharges’, ‘miscellaneous incidents’, and ‘radiological effects’ converge (
Figure 9).
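The mediating role of the “pollutant discharge” node can be checked with betweenness centrality. The edge list below is entered by hand to mirror the links described in the text; in the study, edges come from co-occurrence, partial-correlation, and conditional-mutual-information tests.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("pollutant_discharge_diff", "unauthorized_access_sum"),
    ("pollutant_discharge_diff", "miscellaneous_incidents"),
    ("pollutant_discharge_diff", "explosives_sum"),
    ("pollutant_discharge_diff", "radiation_exposure_sum"),
    ("pollutant_discharge_diff", "equipment_accidents_sum"),
    ("unauthorized_access_sum", "miscellaneous_incidents"),
])

# Betweenness centrality: how often a node lies on shortest paths
bc = nx.betweenness_centrality(G)
hub = max(bc, key=bc.get)
print("most mediating topic:", hub)  # the bridge node of the topology
```

A high-betweenness node is exactly the kind of “bridge” whose mitigation yields a disproportionately large indirect effect on cascades.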
The final step is to translate the quantitative results into management decisions. To make the promised end-to-end interpretability concrete, we provide a worked traceability example that follows one incident narrative through the entire pipeline (as an illustrative example, the graph-based visualization is shown in
Figure 10 below). The example shows the input text after preprocessing, the inferred topic mixture, the auto-annotation label assigned to the dominant topics, the resulting PLS factor scores, the regression-based risk estimate, and the resulting prescriptive lever selection under the stated constraints. For readability, the full step-by-step trace is placed in
Appendix B (see
Appendix A,
Table A1, for the definitions of the main mathematical notation used in this study), while the portfolio summary below reports the aggregated recommendations for the best-performing scenario.
We generated recommendations by aggregating ranked marginal effects from scenario analysis, graph attributes (centrality and bridge positions), and operational constraints/costs. The resulting specification is passed to the LLM module, which generates human-readable actions in an EAM/CMMS-compatible format (type of work, frequency, responsibilities, resources/competencies, checkpoints, and KPIs). To summarize: the first priority is to strictly limit unauthorized access (physical/logical protection, access control to hazardous materials); the next is to reduce the coupling vulnerability of the “pollutant discharge” node (continuous monitoring and response procedures); and the “miscellaneous incidents, threats/radiation” package should be addressed by joint organizational and procedural measures (standardization of reporting, training, and revision of regulations). In a deployed operational format, this leads to the following plan:
Strengthen the physical protection of NPPs, especially access control to explosives (quarterly inspections; responsible service: security);
Develop and implement a program for regular assessment of equipment vulnerabilities (annually; responsible: engineering department; channel: IAEA standard audit/peer-review) [
61,
62,
63];
Improve monitoring and control of discharges of radioactive substances (continuously; responsible: environmental service; channel: automated monitoring and external data from observation networks);
Conduct enhanced training of accident and incident response personnel (annually; responsible: training department; control: certification and training);
Implement a system for the early detection and prevention of unauthorized access (permanently; responsible: security service; channel: access logs, SIEM);
Strengthen condition monitoring of reactor components and pipelines (quarterly; responsible: engineering department; channel: NDT/diagnostics, predictive maintenance);
Maintain an explosives threat plan (annual update and exercise; responsible service: security).
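The specification handed to the LLM module for EAM/CMMS export can be sketched as a small machine-readable record per action. The field names follow the attributes listed above (type of work, frequency, responsibilities, channel); the exact schema is an illustrative assumption, not the system's actual format.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class MaintenanceAction:
    """One EAM/CMMS-compatible work package (illustrative schema)."""
    work_type: str
    frequency: str
    responsible: str
    channel: str

# Two of the plan items above, encoded as records
plan = [
    MaintenanceAction("physical access control inspection", "quarterly",
                      "security service", "access logs, SIEM"),
    MaintenanceAction("discharge monitoring review", "continuous",
                      "environmental service", "automated monitoring"),
]

# Serialized form suitable for loading into an EAM/CMMS import interface
print(json.dumps([asdict(a) for a in plan], indent=2))
```

Keeping the specification structured (rather than free text) is what makes the LLM-generated actions reproducible and auditable downstream.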
Thus, the final stage demonstrates the complete cycle inherent in the methodology: from stable semantic coordinates and target factorization to interpretable factor effects, from scenario-based optimization to graph-based bundling of measures, and finally to operationalized recommendations ready to be loaded into EAM/CMMS [
64,
65]. On the examined dataset, this pipeline showed that the “phenomenon-centered” framework not only explains risk but also transforms it into a valid management toolkit addressing predominant and bridging phenomena in a coherent network of topics [
66].
4. Discussion
Modern industrial systems are creating an increasingly complex and multifaceted technological environment in which heterogeneous data volumes are increasing dramatically. At the same time, much of the applied knowledge about accidents is contained in unstructured texts of reports and notifications, which is beyond the capabilities of traditional numerical risk analysis models. Classical methods of accident risk assessment (FMEA, event trees, Bayesian networks, etc.) face limitations [
67,
68,
69,
70,
71]. The scarcity of serious incidents and unbalanced reporting create fragile metrics, and the constant drift of technology and regulations quickly renders static algorithms obsolete. The lack of transparency of “black boxes” further complicates the adoption of their results in practice. Specialists and regulatory authorities need traceability and explainability at each step of the analysis. As a result, there is a gap between the information accumulated in texts and concrete management decisions. Analytical findings are rarely translated directly into preventive and corrective action plans or into work requests for asset management systems (EAM/CMMS) [
71,
72].
The proposed scenario-optimization method for the semantic processing of incident texts addresses these challenges. It builds a unified analysis pipeline from source texts to practical recommendations by combining qualitative information from accident reports with quantitative risk indicators. Clustering and thematic coding transform incident texts into a set of predominant themes, and partial least squares (PLS) regression with regularization generates stable latent themes and risk factors. The resulting factors are invariant descriptions of key accident causes that remain robust to noise and to changes in data distributions.
Our choice of classical LDA as the core topic modeling method is deliberate. The model produces probabilistic themes with explicit word distributions and document-level mixtures, which are easy to inspect, label, and monitor over time. This fits the requirements of regulated risk engineering environments, where expert review, audit trails, and versioning of semantic artifacts are essential. At the same time, modern approaches based on contextual embeddings can provide richer and more flexible representations of incident narratives. Examples include BERT-based topic models, neural topic models, and clustering in transformer-derived vector spaces. In this study we did not integrate these models into the main pipeline. Our focus was on clarifying the phenomenological layer and on demonstrating its integration with PLS factorization and prescriptive optimization in a fully traceable way. A systematic comparison with embedding-based topic discovery and hybrid architectures is an important direction for future work. It will show how stable the proposed phenomena remain when the underlying semantic representation changes and whether contextual models can further improve the robustness of lexical and regulatory drift.
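The inspectability that motivates the choice of LDA is easy to demonstrate: both the per-topic word distributions and the per-document mixtures are explicit arrays. The toy corpus below is an illustrative stand-in for the NRC event notifications, not the study's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in corpus (the study uses ~27,000 NRC event notifications)
docs = [
    "unauthorized access to protected area detected by security",
    "discharge of pollutants exceeded the monitoring threshold",
    "equipment failure in the pipeline caused an unplanned shutdown",
    "security reported unauthorized access attempt at the gate",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Document-topic mixtures Theta: each row is a probability distribution
theta = lda.transform(counts)
print(theta.round(2))
```

Because `lda.components_` and `theta` are plain matrices, topic labels, audit trails, and version-to-version comparisons reduce to inspecting and diffing these arrays, which is exactly the property regulated environments require.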
Beyond embedding-based neural topic models, an increasingly active line of research explores GPT-based topic modeling, where large language models are used not only for topic labeling, but for topic discovery itself. In zero-shot or weakly supervised settings, an LLM can induce topic labels, short descriptions, and representative keywords directly from batches of documents, or it can generate intermediate document-level explanations/summaries whose embeddings are subsequently clustered, resulting in hybrid LLM–embedding topic models. Recent comparative evidence suggests that these approaches can improve the human interpretability of topics, but they may also exhibit higher thematic overlap and sensitivity to prompting and model/version choices, making reproducibility control and governance particularly important. Therefore, a key direction for future work is to benchmark our LDA-based phenomenological layer against neural and GPT-based topic modeling under the same drift-aware evaluation protocol and to assess not only topic coherence, but also the downstream stability of PLS factors, risk calibration, and the robustness of the resulting prescriptive portfolios [
73,
74].
The method then performs multi-criteria scenario optimization. Several quality metrics (fit accuracy, error magnitude, and stability on delayed samples) are used to compare model configurations at once, which prevents overfitting and increases the transferability of results to new cases. The formal structure of the pipeline ensures the full traceability of all steps. Each step—from text preprocessing and topic identification to regression calibration and leverage selection—is documented and can be audited [
75,
76,
77]. Finally, the output of the pipeline is not an abstract mathematical model, but operationalized conclusions. It generates a set of concrete management actions with priorities, resources, and assigned responsibilities, ready to be integrated into EAM/CMMS and checked for compliance with regulatory requirements.
The proposed pipeline has clear practical value as a decision-support tool for organizations that already collect incident narratives and structured maintenance data. Many operators maintain shift logs, incident notifications, investigation notes, and asset records. The method uses these inputs to produce interpretable recurring patterns in the form of topics. It then links these patterns to a proxy risk index through latent factors. Finally, it generates a ranked set of mitigation actions with an estimated effect under resource and policy constraints.
In practice, the outputs can be aligned with maintenance and safety workflows. Recommended measures can be translated into work packages that specify the type of work, frequency, responsible roles, required resources, and control points. These work packages can be added to EAM or CMMS planning as preventive tasks, inspection routines, training activities, or procedural updates. This supports routine use during regular safety reviews and planning meetings. It also helps teams prioritize interventions based on expected impact and available budget.
Operational deployment also requires transparency and traceability. The approach supports expert review of topic labels and key assumptions used in optimization. It provides a clear mapping from detected phenomena to latent factors and then to the risk proxy and selected actions. This makes the method easier to justify in regulated environments and easier to audit. The pipeline can also be used in a continuous monitoring loop. After actions are implemented and new events are recorded, the same workflow can be rerun. This allows teams to check whether risk drivers have shifted and whether the selected measures delivered the expected improvement.
The validity of the approach was confirmed experimentally. During testing on the NRC Event Notifications corpus (about 27,000 incident reports), multi-criteria scenario analysis showed that configurations with PLS factorization and regularized regressors are consistently among the leaders: they simultaneously provide high forecast accuracy, correct calibration of the error bars, and stable operation on deferred data. The advantage of target-aware factorization over “coarse” dimensionality reduction that ignores the target (e.g., PCA) is noted separately: models with thematic components demonstrate more stable metrics and better transferability.
“Unauthorized access” was the dominant theme in reducing the integral risk, while “pollutant discharge” acted as a linking node connecting different groups of incidents (including “equipment accidents”). This topology reflects the multiplicative effect of integrated measures: targeting these key topics reduces not only direct risks but also the associated cascades of other incidents. In the final stage, the optimization results and the structured graph of topics are fed into a recommendation generator, which formulates human-readable and reproducible lists of measures. The resulting management guidelines proved to be reproducible: repeated runs of the pipeline on the same dataset generated a similar list of prioritized measures. Thus, the experimental validation demonstrates the full viability of the approach—from semantic text factorization to practical solutions—with transferability and repeatability of the results [
78].
The scalability and application prospects of the method are vast. Thanks to its modular architecture and ontology-based approach, the pipeline can be easily adapted to other areas of critical infrastructure. It can be applied to the oil and gas, chemical, transport, and other sectors by customizing topics and taking into account industry-specific vocabularies. Full traceability of all analysis steps facilitates automated auditing and compliance checking [
79]. The entire process—from text processing to recommendation generation—can be documented and verified by regulators and internal compliance bodies. In the future, this unity of formalization and operations will enable cross-domain, cross-sector risk management. By combining semantic information about incidents from different spheres, it is possible to identify common regularities and coordinate preventive strategies across industries [
80,
81,
82,
83]. The continuous incorporation of new incident data and feedback on implemented measures enables self-learning, bringing management closer to proactive safety monitoring and bringing analytics closer to real-world maintenance planning and inspection processes [
84,
85].