Machine Learning-Based Semantic Analysis of Scientific Publications for Knowledge Extraction in Safety-Critical Domains

Nosov, Pavlo; Melnyk, Oleksiy; Malaksiano, Mykola; Mamenko, Pavlo; Onyshko, Dmytro; Fomin, Oleksij; Píštěk, Václav; Kučera, Pavel

doi:10.3390/make7040150

by Pavlo Nosov ¹, Oleksiy Melnyk ², Mykola Malaksiano ¹, Pavlo Mamenko ³, Dmytro Onyshko ⁴, Oleksij Fomin ⁵, Václav Píštěk ⁶ and Pavel Kučera ⁶,*

¹ Department of Technical Cybernetics and Information Technology, Odesa National Maritime University, Mechnikov 34, 65029 Odessa, Ukraine
² Department of Navigation and Maritime Safety, Odesa National Maritime University, Mechnikov 34, 65029 Odessa, Ukraine
³ Department of Ship Handling, Kherson State Maritime Academy, Nezalezhnjsti 20, 73003 Kherson, Ukraine
⁴ Department of Ship Computer Systems and Networks, Kherson State Maritime Academy, Nezalezhnjsti 20, 73003 Kherson, Ukraine
⁵ Department of Cars and Carriage Facilities, National Transport University, Kyrylivska Street, 9, 04071 Kyiv, Ukraine
⁶ Institute of Automotive Engineering, Faculty of Mechanical Engineering, Brno University of Technology, Technická 2896/2, 616 69 Brno, Czech Republic
* Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2025, 7(4), 150; https://doi.org/10.3390/make7040150

Submission received: 7 October 2025 / Revised: 4 November 2025 / Accepted: 21 November 2025 / Published: 24 November 2025

Abstract

This article presents the development of a modular software suite for automated analysis of scientific publications in PDF format. The system integrates vectorization, clustering, topic modelling, dimensionality reduction, and fuzzy logic to combine both formal (vector-based) and semantic (topic-based) approaches. Interactive 3D visualization supports intuitive exploration of thematic clusters, allowing users to highlight relevant documents and adjust analytical parameters. Validation on a maritime safety case study confirmed the system’s ability to process large publication collections, identify relevant sources, and reveal underlying knowledge structures. Compared to established frameworks such as PRISMA or Scopus/WoS Analytics, the proposed tool operates directly on full-text content, provides deeper thematic classification, and does not require subscription-based databases. The study also addresses the limitations arising from data bias and reproducibility issues in the semantic interpretability of safety-critical decision-making systems. The approach offers practical value for organizations in safety-critical domains—including transportation, energy, cybersecurity, and human–machine interaction—where rapid access to thematically related research is essential.

Keywords:

intelligent data analysis; artificial intelligence (AI); AI-support; clustering; topic modelling; fuzzy logic; interactive visualization; human factor; decision-support systems; safety-critical systems; transport automation; cybersecurity; human–machine interaction; maritime sector; shipping safety; analytics; semantic classification

1. Introduction

In the digital age, the number of scientific publications across engineering, computer science, and applied domains has been rapidly growing. This expansion significantly complicates the preparation of targeted literature reviews, which now require not only identifying relevant sources but also conducting in-depth thematic analyses. The challenge is particularly acute in interdisciplinary research areas, where traditional bibliographic criteria often fail to capture the semantic overlap between different fields.

At the same time, organizations working in safety-critical sectors—such as transportation, energy, cybersecurity, and human–machine interaction—face an increasing need for continuous monitoring of scientific and technical literature to identify solutions to complex challenges. For instance, investigations in the maritime sector reveal that human error accounts for a large share of operational incidents. Similar findings can be observed in other domains, where operator behaviour, decision-making under stress, or inadequate integration of automated systems may compromise safety. Such cases highlight the urgent need for rapid access to relevant and up-to-date research, particularly studies that model operator behaviour under critical scenarios.

However, due to the lack of specialized analytical tools, this information often remains fragmented and dispersed across journals and technical reports, hidden in sources that are not structurally or thematically connected. There is therefore a clear need for a system capable of automatically analyzing large collections of academic papers and technical reports, extracting key topics, identifying interdisciplinary connections, and presenting results in a user-friendly format for researchers and decision-makers. Unlike established frameworks such as PRISMA, which rely on manual or semi-automated filtering based on formal attributes (e.g., keywords, year of publication, or source), there is increasing demand for tools that can automatically generate thematic maps from user-defined collections of PDF publications. Such tools should seamlessly integrate open-access content (e.g., ResearchGate, MDPI) without requiring costly subscriptions to commercial databases—an important advantage for graduate students and early-career researchers.

Therefore, this study proposes an intelligent analyzer of scientific publications that unifies semantic and formal approaches, integrates expert knowledge, and provides interactive visualization. The tool is designed to support researchers and organizations in safety-critical domains where timely access to relevant knowledge is essential.

2. Problem Statement, Purpose, and Objectives of the Study

Research in safety-critical domains is often characterized by substantial fragmentation, both thematically and structurally. A significant portion of relevant studies is distributed across author-managed repositories, institutional archives, and open-access platforms such as ResearchGate, which typically lack standardized thematic or methodological classifications. This dispersion limits the effectiveness of conventional bibliographic tools for systematically analyzing complex issues such as operator behaviour, risk management, and system integration in critical technical systems.

Existing frameworks for systematic reviews, including PRISMA and Scopus Analytics, also face several constraints. These include reliance on manual relevance assessments, subscription-based access to proprietary databases, and generalized data aggregation that offers limited thematic depth.

Table 1 provides a comparative overview, highlighting how the proposed analyser differs from PRISMA and Scopus/WoS-based approaches with respect to accessibility, selection methodology, depth of content analysis, visualization capabilities, and adaptability to specific research domains.

The absence of a unified thematic classification system in open-access PDF archives makes it difficult to determine the following:

The coverage of specific subtopics (e.g., telerobotics or decision-support);
The depth of research in critical areas (e.g., human–machine interaction in transport automation);
Existing gaps in addressing essential issues such as system integration or behavioural modelling.

Manual review of large publication volumes is also time-consuming, resource-intensive, and increases the risk of overlooking relevant material. The core problem, therefore, lies in the lack of an adaptive, open-access, and automated tool that allows researchers to carry out the following tasks:

Process custom collections of PDF articles independently of commercial platforms;
Perform thematic normalization that accounts for terminology and rubric variations;
Analyze category interrelations and article relevance from the perspective of different engineering and applied domains.

Addressing these challenges will enable the transition from fragmented to systematic and accelerated literature analysis, thereby expanding the toolkit available for developing evidence-based solutions in safety-critical project management.

The relevance of this study stems from the urgent need to design information-analytical tools for managing projects under conditions where human and system factors strongly influence safety outcomes. With rapid advances in automation, artificial intelligence, and decision-support systems, industries ranging from transport and energy to cybersecurity and aerospace face emerging challenges that require continuous monitoring of scientific progress and timely identification of thematically related publications.

To demonstrate the methodology’s practical potential, the approach is validated on a case study in the maritime safety domain, where the human factor remains a major source of operational risk. However, the proposed system is not limited to this application and can be readily adapted to other technical fields facing similar challenges.

The main contributions of this study are as follows:

Development of a modular, ML-based semantic analysis framework that combines vectorization, clustering, and fuzzy logic;
Integration of topic modelling and 3D visualization for comprehensive full-text analysis;
Demonstration of the framework’s applicability to safety-critical domains such as maritime safety and cybersecurity.

3. Materials and Methods

Before developing the user-level analyser of scientific publications, it was necessary to define a collection of articles that reflected the researcher’s field of interest. For validation purposes, the case study focused on project management for maritime transport safety with consideration of the operator–navigator factor. Searches were carried out in open databases such as ResearchGate, MDPI, and Google Scholar, and freely available PDF files were saved. This process produced a corpus of publications to be analyzed (Figure 1).

At this stage, the researcher typically faces several difficulties, including the language barrier and the need for a deep understanding of the context and direction of each study. As a rule, the abstract gives an idea of the direction of the publication and its main results. Still, even together with the keywords, it does not indicate the degree to which the publication belongs to several scientific rubrics at once (its fuzzy affiliation). Therefore, the task is to create and digitize a user database for future use and quick processing. For the case study, the articles selected for the literature analysis were related to maritime safety. However, the same workflow can be applied to any technical domain.

Evidence from [1] emphasizes the importance of human–machine interaction in maritime transport through incident analysis, supporting improvements in cooperation systems. In [2], challenges of HMI faced by navigators of large passenger ships are explored together with suggestions for interface design. Findings in [3] introduce the KONECT methodology, focusing on operator response times during critical events. Results of [4] point to the potential of explainable AI to increase the safety of autonomous ships. As demonstrated in [5], shore-based control centres can oversee autonomous inland waterway vessels, while Ref. [6] envisions a maritime traffic management system based on Multi-Agent Systems (MAS). A framework for training future professionals in autonomous shipping is presented in [7], and Ref. [8] reviews broader implications of autonomous ships on human–machine interactions. Analysis in [9] shows how HMI errors with MAS ships are linked to interface design. An integrated model of human–autonomous system interaction to enhance onboard safety is described in [10]. Moreover, Ref. [11] highlights opportunities for AI in sea search and rescue operations.

Approaches to anomaly detection within complex socio-technical systems are detailed in [12]. According to [13], neural networks can classify ice objects essential for Arctic navigation. Predictive maintenance as a means of ensuring reliability is emphasized in [14]. Deep learning applications to reduce accidents caused by human errors are reported in [15]. As illustrated in [16], machine learning can be combined with platforms to estimate ship drift in open waters. Contributions of [17] incorporate AR technologies to improve visibility in demanding navigation conditions. Reference [18] addresses HMI control architectures in underwater telerobotics. Cross-industry factors shaping remote ship control are investigated in [19]. The evaluation of sailors’ skills in dynamic HMI systems is presented in [20].

The significance of the human factor in collision communication is underlined in [21]. Behavioural aspects of remote-control errors are analyzed in [22]. The influence of route sharing on trust and decision-making appears in [23]. Knowledge modelling to support maritime decision-making is described in [24]. As shown in [25], captain behaviour during collision avoidance can be studied using Apriori algorithms and complex networks.

Machine learning for identifying the right timing of collision avoidance is proposed in [26]. A human–machine coordination model for collision avoidance is introduced in [27]. In [28], virtual reality is applied to test autonomous ship avoidance strategies. Risk assessment through RVM techniques is demonstrated in [29]. Findings in [30] describe a cooperative collision avoidance system integrating human and automated inputs. A hybrid approach combining manual and automatic controls is outlined in [31]. Dissertation results in [32] suggest supportive HMI functions in avoidance systems. Applications of AR in maritime cooperation are reviewed in [33]. As evidenced in [34], inverse control modelling can enhance cargo operations. Navigator action prediction for safety management is detailed in [35]. Qualification restoration for captains under risky conditions is examined in [36]. Pivot point control for solitary vessels is discussed in [37], while Ref. [38] analyses energy dissipation in unavoidable collisions.

Reference [39] describes an intelligent backup control system for ship mechanisms. External influences on seaworthiness are modelled in [40]. In [41], an expert methodology for onboard risk assessment is proposed. Risks of AIS manipulation and countermeasures are considered in [42]. Cybersecurity regulations in maritime transport are analyzed in [43]. The study in [44] examines psychological resilience in the context of excessive smartphone use, which is relevant to teamwork effectiveness. As reported in [45], coping styles are linked to anxiety. Defensive psychological mechanisms during military operations are uncovered in [46]. Prescriptive behavioural styles among activists are described in [47]. Coping strategies during martial law are analyzed in [48], while sources [49,50] provide insights into psycho-emotional stability amid social change, which is relevant to crew adaptation during crises and to major maritime safety concerns. Fundamental results on weighted classes and optimization methods in transport systems are presented in [51,52,53], highlighting both the theoretical background and applied approaches to maritime logistics.

Thus, after forming the basic catalogue of publications, we proceed to data processing. To complete the first research task, it is necessary to activate the mechanism for reading metadata of scientific publications and expert evaluations from tabular sources (Excel and CSV formats).

The procedure will include four consecutive stages:

Stage 1. Loading input data from tabular formats (Table 2)

This stage involves the processing of two data sources:

article_metadata.csv or the ArticleMetadata sheet from Art_AI.xlsx—publication metadata (ArticleID, APA, Keywords, etc.).
expert_matrix.csv or the ExpertMembership sheet—expert evaluations of each article’s correspondence to specific rubrics.

Key actions and tools:

pandas.read_excel(), pandas.read_csv()—for importing tables.
ExcelFile(…).sheet_names—for previewing available sheets in the .xlsx file.
Automatic conversion of identifiers (ArticleID) to string type (str) to avoid errors during table merging.
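The loading step of Stage 1 can be sketched as follows. This is a minimal illustration rather than the project's own script: the in-memory CSV text stands in for article_metadata.csv and expert_matrix.csv, and the long-format layout of the expert table (columns Rubric and Score) is an assumption made only for the example.

```python
import io
import pandas as pd

# Hypothetical stand-ins for article_metadata.csv and expert_matrix.csv;
# in practice a file path would be passed to pd.read_csv instead.
META_CSV = io.StringIO(
    "ArticleID,APA,Keywords\n"
    "1,Author A (2023),HMI; safety\n"
    "2,Author B (2024),collision avoidance\n"
)
EXPERT_CSV = io.StringIO(
    "ArticleID,Rubric,Score\n"
    "1,Human factors,0.7\n"
    "1,HMI,0.3\n"
    "2,Risk analysis,1.0\n"
)


def load_table(source):
    """Read a CSV table and force ArticleID to str so later merges align."""
    return pd.read_csv(source, dtype={"ArticleID": str})


df_meta = load_table(META_CSV)
df_expert = load_table(EXPERT_CSV)

# Merging on a string key avoids int/str mismatches between the two sources.
merged = df_expert.merge(df_meta, on="ArticleID", how="left")
```

Reading the identifier as a string at import time, rather than converting it afterwards, keeps leading zeros and mixed identifier formats intact.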

Stage 2. Building the generalized evaluation matrix

At this stage, an Article–Rubric Matrix is formed—a two-dimensional table where rows represent unique articles (ArticleID); columns represent rubrics; and values represent the share of expert assessment of the article’s affiliation with a given rubric (in normalized format: from 0.0 to 1.0).

Platform: Anaconda JupyterLab 3.4.4 environment; programming language—Python.

Output: A DataFrame-type matrix, used as the basis for subsequent clustering, PCA, and graph construction.
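A minimal sketch of how such an Article–Rubric Matrix could be built with pandas is given below. The column names (Rubric, Score) and the row-wise normalization to shares are assumptions for illustration; the source states only that values lie between 0.0 and 1.0.

```python
import pandas as pd

# Hypothetical long-format expert evaluations (column names are assumptions).
df_expert = pd.DataFrame({
    "ArticleID": ["1", "1", "2", "2", "3"],
    "Rubric": ["Human factors", "HMI", "HMI", "Risk analysis", "Risk analysis"],
    "Score": [3.0, 1.0, 2.0, 2.0, 5.0],
})

# Rows: articles; columns: rubrics; missing article-rubric pairs become 0.
pivot_expert = df_expert.pivot_table(
    index="ArticleID", columns="Rubric", values="Score",
    aggfunc="sum", fill_value=0.0,
)

# Normalize each row so that an article's shares lie in [0, 1] and sum to 1.
row_sums = pivot_expert.sum(axis=1)
pivot_expert = pivot_expert.div(row_sums.where(row_sums > 0, 1.0), axis=0)
```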

Stage 3. Automatic correction of duplicates and syntactic errors in rubrics

Problematic aspect: Manually entered rubrics may contain the following issues: different spelling variants (e.g., “AI/ML for analytic and prediction” and “AI/ML for analytics and prediction”); extra or missing spaces; character case mismatches; or transliteration errors.

Solution:

A reference list of valid rubric names is formed, and automatic name normalization is applied: .strip() trims surrounding spaces; .lower() converts to lowercase; and fuzzy matching finds the closest valid rubric using the difflib.get_close_matches function.

Result: A unified rubric structure is obtained for each article, which reduces the number of columns in the matrix and improves the quality of further analysis.
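The normalization described in Stage 3 might look like the sketch below. The reference list, the function name normalize_rubric, and the similarity cutoff of 0.8 are illustrative assumptions; only .strip(), .lower(), and difflib.get_close_matches are taken from the text.

```python
from difflib import get_close_matches

# Reference list of valid rubric names (illustrative subset).
VALID_RUBRICS = [
    "AI/ML for analytics and prediction",
    "Human-machine interaction",
    "Risk analysis",
]
_LOOKUP = {name.lower(): name for name in VALID_RUBRICS}


def normalize_rubric(raw, cutoff=0.8):
    """Map a manually entered rubric to the closest valid name, or None."""
    cleaned = " ".join(raw.strip().lower().split())  # trim and collapse spaces
    if cleaned in _LOOKUP:
        return _LOOKUP[cleaned]
    match = get_close_matches(cleaned, list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[match[0]] if match else None
```

Returning None for entries with no sufficiently close match leaves the decision to the expert instead of silently assigning a wrong rubric.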

Stage 4. Saving the unified results

After the matrix has been generated, it is saved for further use in .csv or .xlsx formats:

pivot_expert.to_csv(r"C:\… Art_AI\expert_matrix.csv")
df_meta.to_csv(r"C:\…Art_AI\article_metadata.csv")

These results are then used in: PCA analysis, construction of 3D article graphs by keywords, and fuzzy logic inference.

The preliminary results obtained in stages 1–4 make it possible to define the direction of further development and the expansion of the analyser’s functionality, taking into account the need for ergonomic and adaptive support for the researcher’s work.

Therefore, we now define the formal model for processing and calculating input data to expand the spectrum of analyser functions.

Data Acquisition. The framework operates exclusively on legally accessible sources, including open-access publications and user-provided PDFs with appropriate usage rights. When integrated with subscription-based databases, it accesses content via authenticated institutional APIs, ensuring full compliance with access control policies. The system strictly adheres to COPE and FAIR data ethics standards.

PDF input can be uploaded manually by the user or batch-processed through automated directory scanning. No web scraping is performed without the user’s explicit consent.

4. Formal Model for Analyzing the Core Collection of Scientific Publications

Input Data
1.1. Primary sets:
A = {a₁, a₂, …, a_n}: the set of scientific articles.
R = {r₁, r₂, …, r_m}: the set of rubrics (topics).
1.2. Rubric evaluation matrix:
$M = [m_{ij}] \in [0, 1]^{n \times m}$: normalized correspondence matrix between articles a_i and rubrics r_j, formed automatically based on TF–IDF or classification models, where m_ij is the degree of relevance of a_i to r_j.
1.3. Article metadata (based on expert evaluation) (Figure 2):
K_i: keywords;
T_i: research objective;
S_i: results;
G_i: identified gaps;
APA_i: bibliographic reference.

Figure 2. Generation of the report on expert evaluation of publications.

Vectorization and Normalization
2.1. Article rubric vector (1):

$\vec{v}_{i} = (m_{i1}, m_{i2}, \dots, m_{im}) \in \mathbb{R}^{m},$

(1)

2.2. Normalization (L2) (2):

$\vec{v}_{i} := \frac{\vec{v}_{i}}{\|\vec{v}_{i}\|_{2}}, \quad \|\vec{v}_{i}\|_{2} = \sqrt{\sum_{j=1}^{m} m_{ij}^{2}},$

(2)

At this stage, each article vector, i.e., each row of matrix M, is divided by its L2 norm. The normalization ensures scale invariance when comparing similarities. The same vectors, once centred and stacked into the feature matrix X of (8), yield the covariance matrix used for PCA in (9):

$\Omega = \frac{1}{n - 1} X^{T} X,$

(3)

To assess external validity, the selected thematic clusters were manually cross-checked against Scopus and IEEE Xplore metadata to verify topic coherence. Future work will extend the validation to domain-specific databases.
Coverage Analysis and Entropy
3.1. General article coverage (4):

$\mathrm{SCS}_{i} = \sum_{j=1}^{m} m_{ij},$

(4)

3.2. Average rubric coverage (5):

$\overline{m}_{j} = \frac{1}{n} \sum_{i=1}^{n} m_{ij},$

(5)

3.3. Semantic entropy of a rubric (according to Shannon) (6):

$p_{ij} = \frac{m_{ij}}{\sum_{l=1}^{n} m_{lj}}, \quad H(r_{j}) = -\sum_{i=1}^{n} p_{ij} \log_{2} p_{ij},$

(6)
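As a worked illustration of (4)–(6), the sketch below computes the coverage and entropy measures with NumPy on a toy matrix. The matrix values and the function name rubric_entropy are assumptions; the sums run over rubrics j in (4) and over articles i in (5) and (6).

```python
import numpy as np

# Toy article-rubric matrix M (3 articles x 3 rubrics), values in [0, 1].
M = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0],
])

scs = M.sum(axis=1)      # Eq. (4): total coverage per article
m_bar = M.mean(axis=0)   # Eq. (5): average coverage per rubric


def rubric_entropy(M):
    """Eq. (6): Shannon entropy (bits) of each rubric's spread over articles."""
    col_sums = M.sum(axis=0)
    P = M / np.where(col_sums > 0, col_sums, 1.0)
    # 0 * log2(0) is taken as 0 by convention.
    logs = np.log2(P, where=P > 0, out=np.zeros_like(P))
    return -(P * logs).sum(axis=0)


H = rubric_entropy(M)
```

In this toy case the first two rubrics are split evenly between two articles (entropy of 1 bit each), while the third is concentrated in a single article (entropy 0).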
Dimensionality Reduction (PCA) (Figure 3)
4.1. Data centring and feature matrix (7) and (8):

$\vec{v}_{i} := \vec{v}_{i} - \overline{v}, \quad \overline{v} = \frac{1}{n} \sum_{i=1}^{n} \vec{v}_{i},$

(7)

$X = [\vec{v}_{1}, \dots, \vec{v}_{n}]^{T} \in \mathbb{R}^{n \times m},$

(8)

Figure 3. Semantic clustering map illustrating topic relationships across the corpus. Each colour denotes a major thematic group (blue—cybersecurity; green—maritime safety; orange—energy systems; purple—AI safety). Node size indicates topic frequency, while edge thickness represents semantic similarity.

4.2. Covariance matrix (9):

$\Omega = \frac{1}{n - 1} X^{T} X,$

(9)

where Ω is a symmetric matrix subject to spectral decomposition (for PCA).
4.3. Component selection:
Selection of principal components based on the eigenvalues of the matrix Ω.
The first k principal components are selected to preserve more than 90% of the variance.
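Steps 4.1–4.3 can be sketched directly from (7)–(9). The function name pca_90 and the toy input are assumptions; the eigen-decomposition uses numpy.linalg.eigh because Ω is symmetric.

```python
import numpy as np


def pca_90(V, var_target=0.90):
    """Centre rows, form the covariance matrix, keep the first k components
    whose cumulative explained variance exceeds var_target."""
    X = V - V.mean(axis=0)                       # Eqs. (7)-(8)
    omega = X.T @ X / (X.shape[0] - 1)           # Eq. (9)
    eigvals, eigvecs = np.linalg.eigh(omega)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # First index whose cumulative ratio is strictly greater than the target.
    k = int(np.searchsorted(ratio, var_target, side="right")) + 1
    return X @ eigvecs[:, :k], ratio[:k]


# Toy article vectors lying almost on a single line.
V = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [3.0, 3.0, 0.1],
])
Z, ratio = pca_90(V)
```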
Article Clustering
5.1. Euclidean distance (10):

$d(a_{i}, a_{k}) = \sqrt{\sum_{j=1}^{m} (m_{ij} - m_{kj})^{2}}.$

(10)

5.2. K-Means (11):

$\min_{C_{1}, \dots, C_{K}} \sum_{k=1}^{K} \sum_{a_{i} \in C_{k}} \|\vec{v}_{i} - \vec{\mu}_{k}\|^{2},$

(11)

where μ_k is the cluster centre.
5.3. DBSCAN (12):

$N_{\varepsilon}(a_{i}) = \{a_{k} \mid d(a_{i}, a_{k}) \leq \varepsilon\}.$

(12)

Silhouette Score method for clustering quality assessment (13):

$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))},$

(13)

where a(i) is the average distance to other points in the same cluster and b(i) is the minimum average distance to points in other clusters.
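A brief sketch of how (11) and (13) can be combined to choose the number of clusters is shown below, using scikit-learn (listed in the software stack of this study). The toy vectors, the candidate values of k, and the helper name best_k are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy article vectors forming two well-separated groups (illustrative only).
V = np.array([
    [0.90, 0.10, 0.00], [0.80, 0.20, 0.00], [0.85, 0.15, 0.00],
    [0.00, 0.10, 0.90], [0.10, 0.00, 0.90], [0.05, 0.10, 0.85],
])


def best_k(V, k_values=(2, 3, 4), seed=0):
    """Fit K-Means for each k; return (k, labels, score) with the best silhouette."""
    results = []
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(V)
        results.append((silhouette_score(V, labels), k, labels))
    score, k, labels = max(results, key=lambda r: r[0])
    return k, labels, score


k, labels, score = best_k(V)
```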
Cosine Similarity Matrix
6.1. Similarity computation (14):

$\cos(\vec{v}_{i}, \vec{v}_{k}) = \frac{\vec{v}_{i} \cdot \vec{v}_{k}}{\|\vec{v}_{i}\| \cdot \|\vec{v}_{k}\|}.$

(14)

6.2. Use cases:
For building a graph G = (V, E) or for a recommendation system (nearest neighbour search).
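The similarity matrix of (14) and the nearest neighbour search mentioned above can be sketched with NumPy as follows. The function names and the toy vectors are assumptions; rows are L2-normalized as in (2), so the matrix product gives all pairwise cosines at once.

```python
import numpy as np


def cosine_similarity_matrix(V):
    """Eq. (14) for all pairs: L2-normalize rows, then take dot products."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    unit = V / np.where(norms > 0, norms, 1.0)   # all-zero rows stay zero
    return unit @ unit.T


def nearest_neighbours(S, i, k=2):
    """Indices of the k articles most similar to article i (excluding i)."""
    order = np.argsort(-S[i], kind="stable")
    return [int(j) for j in order if j != i][:k]


V = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],   # same direction as article 0, different scale
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
S = cosine_similarity_matrix(V)
```

Articles 0 and 1 differ only in scale and therefore have similarity 1, which illustrates the scale invariance obtained from the normalization.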
Detection of Semantic Gaps
7.1. GapSet (15):

$\mathrm{GapSet} = \{r_{j} \mid \overline{m}_{j} < \theta, \ H(r_{j}) > \eta\},$

(15)

where
θ ∈ [0.05, 0.15] is the threshold for weak coverage,
η ∈ [0.5, 1.5] is the threshold for informational dispersion.
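The gap criterion (15) can be illustrated with a short sketch. The rubric names and matrix values are invented for the example, and θ = 0.10 and η = 1.0 are simply picked from within the ranges stated above.

```python
import numpy as np


def gap_set(M, rubrics, theta=0.10, eta=1.0):
    """Eq. (15): rubrics with mean coverage below theta and entropy above eta."""
    m_bar = M.mean(axis=0)
    col_sums = M.sum(axis=0)
    P = M / np.where(col_sums > 0, col_sums, 1.0)
    logs = np.log2(P, where=P > 0, out=np.zeros_like(P))
    H = -(P * logs).sum(axis=0)
    return [r for r, mb, h in zip(rubrics, m_bar, H) if mb < theta and h > eta]


rubrics = ["Human factors", "Telerobotics", "Decision support"]
M = np.array([
    [0.9, 0.05, 0.2],
    [0.8, 0.05, 0.0],
    [0.7, 0.05, 0.0],
    [0.9, 0.05, 0.0],
])
gaps = gap_set(M, rubrics)
```

Here the second rubric is weakly but evenly covered by all four articles (low mean, high entropy) and is flagged, whereas the third rubric has the same low mean but is concentrated in a single article (entropy 0) and is not.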
Recommendation System
8.1. Averaged profile of nearest neighbours (16):

$\hat{v}_{i} = \frac{1}{k} \sum_{a_{j} \in N_{k}(a_{i})} \vec{v}_{j},$

(16)

8.2. Recommended rubrics (17) (Figure 4):

$\mathrm{RecRub}(a_{i}) = \{r_{j} \mid \hat{v}_{ij} > \tau, \ m_{ij} < \delta\},$

(17)

where τ is the importance threshold and δ is the absence threshold in the current profile.
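A possible reading of (16) and (17) is sketched below. The source does not state how the neighbourhood N_k is chosen, so selecting neighbours by cosine similarity is an assumption here, as are the values k = 2, τ = 0.3, δ = 0.1 and the toy data.

```python
import numpy as np


def recommend_rubrics(M, i, rubrics, k=2, tau=0.3, delta=0.1):
    """Eqs. (16)-(17): rubrics strong among the k nearest neighbours of
    article i but nearly absent from its own profile."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    unit = M / np.where(norms > 0, norms, 1.0)
    sims = unit @ unit[i]
    sims[i] = -np.inf                              # exclude the article itself
    neighbours = np.argsort(-sims, kind="stable")[:k]
    v_hat = M[neighbours].mean(axis=0)             # Eq. (16)
    return [r for j, r in enumerate(rubrics)
            if v_hat[j] > tau and M[i, j] < delta]  # Eq. (17)


rubrics = ["HMI", "Ergonomics", "Risk analysis"]
M = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.1, 0.5],
    [0.9, 0.0, 0.4],
    [0.0, 0.0, 1.0],
])
recommended = recommend_rubrics(M, 0, rubrics)
```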

The interaction effects between topic coherence and vector density across key components were further analyzed using fuzzy membership surfaces, as illustrated in Figure 4.

Figure 4. Fuzzy membership surfaces illustrating the interaction between topic coherence and vector density for PCA Component 3 (left) and PCA Component 6 (right).

5. Model Description for 3D Software Visualization of Publication Analysis Results

Text vectorization (Count Vectorizer).
We have a set of articles D = {d₁, d₂, …, d_n} and a vocabulary of terms T = {t₁, t₂, …, t_m}.
Next, we construct a document–term frequency matrix $X = [x_{ij}] \in \mathbb{N}_{0}^{n \times m}$,
where x_ij is the number of occurrences of term t_j in document d_i.
X is a sparse matrix, which is passed to the dimensionality reduction stage.
Dimensionality reduction (Truncated SVD model).
Transformation of $X \in \mathbb{R}^{n \times m}$ into a three-dimensional projection $Z \in \mathbb{R}^{n \times 3}$.
The Truncated SVD model approximates matrix X as (18)

$X \approx Z W = U \Omega V^{T},$

(18)

where $Z = X W^{T} \in \mathbb{R}^{n \times 3}$ contains the coordinates of the documents in the component space,
$W \in \mathbb{R}^{3 \times m}$ is the matrix of right singular vectors (terms in component space), with one row per component,
W^T is the transposed matrix of terms in the new basis.
Here Ω denotes the diagonal matrix of the three largest singular values (not the covariance matrix of (9)).
That is, for each document (19):

$z_{i} = x_{i} W^{T} \in \mathbb{R}^{3},$

(19)
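The relations in (18) and (19) can be checked numerically with a small sketch. It uses an exact SVD from NumPy rather than the iterative solver of a Truncated SVD implementation, and the toy count matrix and function name are assumptions; W is taken as the 3 × m matrix of leading right singular vectors.

```python
import numpy as np


def truncated_svd_3d(X, k=3):
    """Return document coordinates Z (n x k), term basis W (k x m),
    and the product U_k * Sigma_k for comparison."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:k]               # top-k right singular vectors, one row per component
    Z = X @ W.T              # Eq. (19) applied to every row of X
    return Z, W, U[:, :k] * s[:k]


# Toy document-term count matrix (4 documents x 5 terms).
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 3, 1, 0],
    [0, 1, 2, 2, 1],
], dtype=float)
Z, W, US = truncated_svd_3d(X)
```

The projection X W^T coincides with the product of the leading left singular vectors and singular values, which is why Z W reproduces the rank-3 approximation on the right-hand side of (18).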
Meaning of the Axes (Research Directions—Component k).
Top terms of each axis are defined via the weight influence W_kj, where k = 1, 2, 3. These are the absolute values of the term weights t_j in each component k (Figure 5).

Figure 5. Three-dimensional map of scientific publications generated using Truncated SVD, showing semantic grouping of the maritime safety literature (validation dataset).

Cluster assignment (DBSCAN).
Input data: document coordinates $Z = [z_{i}] \in \mathbb{R}^{n \times 3}$.
The DBSCAN algorithm classifies points based on Euclidean distance (20):

$\mathrm{dist}(z_{i}, z_{j}) = \|z_{i} - z_{j}\|_{2} = \sqrt{\sum_{k=1}^{3} (z_{ik} - z_{jk})^{2}},$

(20)

Parameters: ε = 1.5 (neighbourhood radius) and minPts = 2 (minimum number of points in a neighbourhood). A point z_i is considered a “core” point if its ε-neighbourhood contains at least minPts points, $|\{z_{j} : \|z_{i} - z_{j}\|_{2} \leq \varepsilon\}| \geq \mathrm{minPts}$. Points within ε of a core point join its cluster C; points that are neither core points nor within ε of one are considered noise (cluster = −1).
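The cluster assignment with the stated parameters can be sketched with scikit-learn's DBSCAN. The toy coordinates are invented for illustration; eps and min_samples correspond to ε and minPts in the text.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 3D coordinates: two dense groups and one isolated point.
Z = np.array([
    [0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.5],
    [10.0, 10.0, 10.0], [10.5, 10.0, 10.0],
    [50.0, 0.0, 0.0],
])

# eps and min_samples follow the values quoted in the text; note that
# scikit-learn counts the point itself towards min_samples.
labels = DBSCAN(eps=1.5, min_samples=2, metric="euclidean").fit_predict(Z)
n_clusters = len(set(labels) - {-1})
noise = np.flatnonzero(labels == -1)
```

With minPts = 2, any two documents closer than ε already form a cluster, so only fully isolated documents are labelled as noise.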
Vectorization and positioning of the selected user article.
The selected PDF article is converted to text d_user and vectorized using the previously fitted model (21):

$x_{\mathrm{user}} = \mathrm{CountVectorizer}(d_{\mathrm{user}}) \in \mathbb{R}^{m},$

(21)

Its coordinates in the component space are defined as (22)

$z_{\mathrm{user}} = x_{\mathrm{user}} \cdot W^{T} \in \mathbb{R}^{3},$

(22)
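The positioning of a user article according to (21) and (22) could be sketched as follows with scikit-learn. The short placeholder sentences stand in for text extracted from PDFs, and all variable names are assumptions; the vectorizer and the SVD model are fitted on the corpus only and then reused for the new article.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder texts standing in for the text extracted from corpus PDFs.
corpus = [
    "ship collision avoidance safety system",
    "human machine interface for ship navigation",
    "autonomous ship simulation and maneuvering tests",
    "cybersecurity risks of ais manipulation",
    "navigator behaviour under risk and safety management",
]
d_user = "inverse control model for ship loading safety"

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)         # sparse n x m count matrix

svd = TruncatedSVD(n_components=3, random_state=0)
Z = svd.fit_transform(X)                     # n x 3 document coordinates
W = svd.components_                          # 3 x m term weights

x_user = vectorizer.transform([d_user])      # Eq. (21); unseen words are dropped
z_user = svd.transform(x_user)               # Eq. (22)
z_check = x_user.toarray() @ W.T             # explicit product for comparison
```

Because the vocabulary is fixed at fitting time, terms that occur only in the user article do not contribute to its coordinates.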
Space construction.
Each point on the graph is an article: $d_{i} \to \vec{z}_{i} = (x_{i}, y_{i}, z_{i}) \in \mathbb{R}^{3}$. Colouring is based on cluster labels C_i, and the user’s article is displayed as a separate point (in red, with a rectangular label) with cluster label −1: $\vec{z}_{\mathrm{user}} = (x, y, z)$.
Axis interpretation.
Each axis is considered as a linear combination of terms (23):

$\mathrm{SVD}_{k}(d_{i}) = \sum_{j=1}^{m} x_{ij} \cdot W_{k,j}, \quad k = 1, 2, 3,$

(23)

SVD 1 (x-axis): technical terminology (systems, safety, ship, information); SVD 2 (y-axis): HMI, software, sensors; SVD 3 (z-axis): simulation, maneuvering, testing.
General view of the model:

$d_{i} \to \vec{z}_{i} = x_{i} W^{\top} \in \mathbb{R}^{3} \to \text{TruncatedSVD, DBSCAN} \to \text{semantic axis interpretation}$

Precision and recall values indicate the framework’s ability to accurately retrieve and classify semantically relevant publications. Higher precision corresponds to fewer false positives, while a balanced recall ensures comprehensive coverage of thematic content.

The comparative performance results are summarized in Table 3.

In contrast to PRISMA and WoS Analytics, which rely on metadata-based topic classification, the proposed framework achieves greater thematic coherence by analyzing full-text semantics. Quantitatively, it outperforms metadata-based tools in topic cohesion by 8–10%, while maintaining comparable processing times.

6. Development of Software Modules for Analyzing Selected Groups of Publication Collections

For the purpose of automated processing and classification of scientific publications, a modular software system was implemented, which includes the following main components:

Clustering module (KMeans method) distributes the input array of articles into a fixed number of clusters based on the KMeans algorithm. Each article is described in a three-dimensional feature space (x, y, z), and assignment to a cluster is performed by minimizing intra-cluster dispersion.
Clustering module (DBSCAN method) applies the DBSCAN algorithm, which allows clusters of arbitrary shape to be identified without the need to predefine their number. It is particularly effective for identifying dense thematic groups and detecting anomalies.
Clustering quality assessment module assesses the quality of the formed clusters using internal metrics, in particular the silhouette coefficient and the Davies–Bouldin index. This makes it possible to quantify the cohesion of clusters and their separation.
Thematic analysis module performs a frequency analysis of keywords for each cluster. Frequently used terms (with a frequency of more than 10 times) are interpreted as indicators of the cluster’s theme.
Integrated analysis and report generation module summarizes the results of clustering, distance calculation, and thematic analysis. It generates a final report that includes the Euclidean distances to the user-uploaded article, the cluster centres, and the visualization of results (Figure 6).
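The keyword frequency step of the thematic analysis module can be illustrated with a short sketch. The function name and the input layout (one keyword list and one cluster label per article) are assumptions; the threshold of more than 10 occurrences is taken from the description above.

```python
from collections import Counter


def cluster_themes(keywords_by_article, labels, min_count=10):
    """Return, per cluster, the keywords occurring more than min_count times."""
    counts = {}
    for keywords, label in zip(keywords_by_article, labels):
        counter = counts.setdefault(label, Counter())
        counter.update(k.strip().lower() for k in keywords)
    return {
        label: sorted(term for term, c in counter.items() if c > min_count)
        for label, counter in counts.items()
    }
```

For example, if 11 articles of one cluster all list "Safety" and "HMI", both terms exceed the threshold and are reported as that cluster's theme, while a cluster of three articles yields no theme terms.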

All scripts used in this study are internal to the project. Experiments were executed on Windows 10 Pro (22H2) with an 11th-Gen Intel® Core™ i3-1115G4 @ 3.0 GHz (2C/4T) and 16 GB RAM. During the representative run reported here, the end-to-end runtime for 49 PDFs was 3.8 s (≈0.06 min), which corresponds to a throughput of ≈12.9 PDFs/s; system-level memory in use peaked at ~8.7/15.7 GB (55%), while the CPU boosted to ~3.59 GHz with utilization up to ~64% (Task Manager snapshots). The software stack comprised Python 3.10 (Anaconda) with pandas, numpy, scikit-learn, sentence-transformers, matplotlib, plotly/kaleido, and openpyxl; fixed seeds (random_state) ensure deterministic runs.

The pipeline emits shareable artefacts: (i) article_similarity_report*.csv, (ii) interactive 3D HTML maps (e.g., term_similarity_3d_map.html), (iii) article_coordinates.xlsx, (iv) cluster_analysis.xlsx, and (v) static figure exports (PNG/PDF).

Minimum system requirements (validated): Windows 10/11 (x64); Intel® Core™ i3 (11th-Gen)/AMD Ryzen 3 or better; ≥8 GB RAM (16 GB recommended); ≥4 GB free disk space; no discrete GPU is required.

7. Discussion

At the final stage of this study, an additional analytical module was developed to extend the functionality of the proposed system. This module enables researchers to evaluate the proximity of individual scientific articles or entire thematic groups to a reference collection of publications. In doing so, it allows for the assessment of semantic distances between newly added articles and the established dataset along three thematic axes (x, y, z). Such an approach supports the identification of relevant works for citation, comparison, or inclusion in further investigations.

The module relies on multidimensional vector representations of textual content derived from PDF preprocessing using established natural language processing techniques (TF–IDF, Doc2Vec, BERT). Dimensionality reduction methods, including PCA, UMAP, and t-SNE, were employed to project these high-dimensional embeddings into a three-dimensional semantic space suitable for interpretation and visualization.

Several assumptions govern the construction of this 3D representation. The axes represent abstract semantic dimensions influenced by preprocessing steps such as scaling and normalization. While the z-axis typically includes both positive and negative values, the x and y coordinates were shifted to the positive domain during preprocessing. This asymmetry must be considered when interpreting spatial relationships between articles.

Clustering algorithms such as DBSCAN and KMeans were used to identify both dense groups and centroid-based clusters. Importantly, these methods operate strictly on the coordinate space without altering the underlying geometry, thus preserving the validity of spatial interpretation.
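The point that clustering only labels the coordinates, without moving them, can be shown with a dependency-free sketch of centroid-based clustering. This is plain Lloyd's algorithm with a deterministic start, written for illustration only; the coordinates are hypothetical, and the project itself reports using the KMeans and DBSCAN implementations from its scikit-learn stack.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of the members (keep old centroid if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

# Hypothetical 3D article coordinates forming two well-separated groups.
coords = [(0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (0.1, 0.2, -0.1),
          (5.0, 5.1, 0.0), (5.2, 4.9, 0.1), (4.9, 5.0, -0.1)]
snapshot = list(coords)

labels, centroids = kmeans(coords, k=2)
print(labels)           # [0, 0, 0, 1, 1, 1]
assert coords == snapshot   # the input coordinates are left unchanged
```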

The integration of this module enabled practical experiments with user-defined groups of articles. Their positioning relative to the main publication cloud revealed the semantic coherence of the core dataset and the topical alignment of new entries. Expert evaluations in the maritime field confirmed these results, thereby reinforcing the validity of the proposed approach.

Performance was assessed using both internal and external measures. Internally, KMeans applied to the 3D embeddings achieved a high silhouette score (best result: 0.953 at k = 2), indicating compact and well-separated thematic clusters. Externally, nearest-neighbour retrieval for new articles yielded Top-1 = 0.00 and Recall@3 = 1.00: the correct rubric was never the single nearest neighbour, but it consistently appeared within the top three. The rubric distribution was dominated by human factors, risk analysis, and competency modelling, followed by human–machine interaction, ergonomics, and AI/ML for analytics and prediction (Figure 7).
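To make the two retrieval measures concrete, the sketch below computes Top-1 accuracy and Recall@k from ranked neighbour rubrics. The rubric lists are invented for illustration, and the function assumes one correct rubric per query article, which is an assumption about the evaluation protocol rather than a detail given in the text.

```python
def top1_and_recall_at_k(ranked_rubrics, true_rubrics, k=3):
    """Top-1 accuracy and Recall@k over a set of query articles."""
    n = len(true_rubrics)
    top1 = sum(r[0] == t for r, t in zip(ranked_rubrics, true_rubrics)) / n
    recall = sum(t in r[:k] for r, t in zip(ranked_rubrics, true_rubrics)) / n
    return top1, recall

# Hypothetical rubrics of the three nearest neighbours, nearest first.
ranked = [
    ["HMI", "Human factors", "AI/ML"],
    ["AI/ML", "Decision support", "Human factors"],
    ["Decision support", "HMI", "AI/ML"],
]
truth = ["Human factors", "Human factors", "HMI"]

top1, recall3 = top1_and_recall_at_k(ranked, truth, k=3)
print(top1, recall3)    # 0.0 1.0
```

In this toy case the correct rubric is never ranked first but always occurs among the three nearest neighbours, which reproduces the pattern Top-1 = 0.00 with Recall@3 = 1.00.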

To begin, we analyze a single selected article, “Development of control model for loading operations on heavy lift vessels based on inverse algorithm” [34], which is related to maritime safety only implicitly and therefore requires detailed analysis by the researcher (Figure 8).

The article's coordinate x = 130.02 along SVD 1 (the x-axis, which corresponds to technical terminology) lies at a considerable distance from the cloud of the leading group of publications.

Next, we analyze the correspondence of four experimental groups simultaneously:

Group 1:

This group deals with publications [35,36].

The articles in Group 1 focus on improving navigational safety through automated analysis of navigator behaviour under risk conditions.

In article [35], a system is proposed for predicting navigator actions using data mining methods for timely intervention and accident prevention.

In article [36], a method is developed for automated restoration of navigator qualification by detecting errors in their actions under complex situations.

The common direction of both works lies in developing intelligent decision-support systems that analyze navigator behaviour and enhance the efficiency of ship handling under risk.

Group 2:

This group deals with publications [37,38,39].

The articles in Group 2 focus on applying optimal and intelligent control algorithms to automate maneuvering, predict trajectories more accurately, and respond rapidly in crisis situations, with the aim of eliminating human error, reducing maneuvering areas, and improving navigational safety.

In particular, the individual contributions are as follows:

Article [37] presents a new method for controlling the pivot point position on conventional single-screw vessels without bow thrusters, refining the “centre of mass–centre of rotation–pivot point model.”
Article [38] describes a method for automatically resetting kinetic energy in the case of an inevitable collision, based on gradient-based optimal control.
In article [39], the concept of an intelligent control system for the redundant structure of vessel executive devices is proposed. The system ensures automatic redistribution control by simultaneously modulating multiple ship actuators.

Group 3:

This group deals with publications [40,41,42,43].

All articles in Group 3 explore modelling approaches, expert evaluation methods, and cyber-regulatory strategies for identifying, forecasting, and minimizing navigational safety risks in the digital maritime environment (Figure 9).

The works in this group develop the direction of intelligent maritime safety management through systemic modelling and cybernetic analysis:

A simulation-based method using Markov processes to predict changes in the ship’s seaworthy condition under the influence of various factors [40];
Evaluation of the quality of risk analysis for shipboard operations through expert review [41];
A study on the impact of AIS manipulation on risk detection, safety, and proposed strategic countermeasures [42];
A comprehensive interdisciplinary model of cybersecurity in maritime transport, including analysis of the regulatory environment, technical solutions, global incident registries, and training initiatives [43].

Group 4:

This group deals with publications [44,45,46,47,48,49].

All articles in Group 4 are focused on studying the psychological state and stress resilience of student and youth populations—particularly maritime cadets and officers ranging from third to chief mates—under conditions of extreme stress exposure.

The studies analyze individual coping strategies and mechanisms of psychological defence, and cover the following topics:

The impact of excessive mobile device use on chronic stress [44];
The role of coping styles and self-regulation in the formation of anxiety [45];
Youth psychological defence strategies [46];
Attributional style as a defence mechanism in the behaviour of student activists [47];
Coping strategies of youth under martial law conditions [48];
Psycho-emotional stability of individuals during periods of societal transformation [49].

The developed modules are applicable across a wide range of safety-critical domains. In this study, their capabilities were demonstrated in the maritime context, where they facilitate efficient retrieval of relevant publications. Although exemplified on maritime safety, the system is general in scope and can be readily transferred to other technical or safety-related areas, such as semantic–geometric feature integration for visual environment assessment [54] or multiscale feature-fusion methods for hyperspectral image classification [55].

The experiment and the testing of the analyser’s software modules show that the articles from Group 3 lie closest to the core collection of publications. Evaluations by experts in the maritime field confirm this finding, which supports the adequacy of the proposed theoretical and practical approaches, software modules, and tools for spatial-semantic visualization.

For maritime organizations and companies in particular, the modules enable rapid search for and identification of relevant publications, and the same workflow can be offered to organizations in other safety-critical sectors.

Nevertheless, despite its high degree of adaptability and timeliness, the proposed model still has limitations in application, as the maritime case study showed. It needs to be validated empirically under extreme environmental conditions and for other classes of ships. Further studies could explore embedding this approach into shipborne decision-support systems and examine the behavioural interaction between human factors and automated modules in real-time operational navigation scenarios. While the proposed system demonstrates robust clustering and interpretability, potential biases arising from unsupervised feature learning and dataset imbalance remain an open limitation. To enhance reproducibility, all scripts and embeddings have been made available for inspection.

The automated semantic analysis of scientific texts may raise ethical concerns regarding data privacy, algorithmic bias, and explainability. In safety-critical domains, ensuring the transparency and accountability of ML-driven insights remains essential. This framework adheres to FAIR and COPE principles, thereby minimizing the risk of data misuse.

8. Conclusions

This study resulted in the development of an intelligent software system for multilevel analysis of scientific publications in PDF format. The proposed approach integrates advanced machine learning methods, clustering algorithms, topic modelling techniques, and high-dimensional data visualization. The architecture ensures full automation across all processing stages. It covers everything from reading metadata and expert evaluations to generating reports, rubric-based classification, semantic grouping, and spatial representation of results.

The inclusion of fuzzy logic enhances interpretability, as numerical outputs are translated into researcher-friendly linguistic assessments (e.g., high, medium, low similarity). In addition, the ability to generate interactive 3D visualizations provides intuitive navigation of clusters, clear identification of the studied publication, and semantic linkage to key terms. Each article is projected into a three-dimensional thematic space that enables semantically meaningful clustering and comparative analysis. Unlike existing frameworks, the proposed system combines full-text semantic analysis with expert rubric mapping, which ensures both depth and adaptability across technical domains.
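The translation of a numerical similarity into a linguistic term can be sketched with simple membership functions. The breakpoints (0.2, 0.5, 0.8), the triangular shapes, and the names `triangular` and `similarity_label` are illustrative assumptions; the membership functions actually used by the system may differ.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def similarity_label(score):
    """Map a similarity score in [0, 1] to its strongest linguistic term."""
    memberships = {
        # "low" and "high" are flat (membership 1) beyond their peaks.
        "low": 1.0 if score <= 0.2 else triangular(score, 0.0, 0.2, 0.5),
        "medium": triangular(score, 0.2, 0.5, 0.8),
        "high": 1.0 if score >= 0.8 else triangular(score, 0.5, 0.8, 1.0),
    }
    return max(memberships, key=memberships.get), memberships

for s in (0.1, 0.5, 0.7, 0.9):
    label, degrees = similarity_label(s)
    print(s, label)
```

A score of 0.7, for example, belongs partly to "medium" and partly to "high"; the sketch reports the term with the larger membership degree, while the full dictionary of degrees remains available when a graded answer is preferred.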

In summary, the developed system supports both formal, vector-based analysis and semantic interpretation, while incorporating expert rubrics and topic relevance. It enables efficient and scalable processing of large libraries of scientific papers, identification of thematic clusters, and detection of relevant sources for citation, comparison, and research planning. The tool holds practical value for organizations and researchers across multiple sectors—including transportation, energy, cybersecurity, and human–machine interaction—where rapid access to thematically related knowledge is essential for safety-critical project management. Its applicability was demonstrated in a case study on maritime safety, which confirmed the system’s effectiveness and highlighted its broader adaptability to other technical domains.

Future work will focus on extending the analyzer with real-time data integration and validation in additional safety-critical domains, such as aerospace and healthcare. Beyond its methodological contributions, the system provides tangible decision-support value: it enables faster identification of thematically related studies, reduces the risk of overlooking critical evidence, and supports practitioners in safety-critical sectors where timely access to knowledge can directly influence operational safety. Thus, the proposed solution is not only a methodological advancement in literature analysis but also a practical tool with the potential to enhance resilience and decision-making across diverse technical domains.

Author Contributions

Conceptualization, O.M. and M.M.; Methodology, P.N. and P.M.; Software, D.O.; Validation, V.P., P.K. and O.F.; Formal analysis, O.M.; Investigation, M.M.; Resources, P.N.; Data curation, D.O.; Writing—original draft preparation, O.M.; Writing—review and editing, V.P., P.K. and O.F.; Visualization, O.M.; Supervision, P.K.; Project administration, P.N.; Funding acquisition, V.P. All authors have read and agreed to the published version of the manuscript.

Funding

This publication was supported by the project “Innovative Technologies for Smart Low Emission Mobilities”, funded as project No. CZ.02.01.01/00/23_020/0008528 by Programme Johannes Amos Comenius, call Intersectoral cooperation.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors thank Brno University of Technology for support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
AIS	Automatic Identification System
APA	American Psychological Association (style)
BERT	Bidirectional Encoder Representations from Transformers
CPU	Central Processing Unit
CSV	Comma-Separated Values
DBSCAN	Density-Based Spatial Clustering of Applications with Noise
ECDIS	Electronic Chart Display and Information System
EMSA	European Maritime Safety Agency
HMI	Human–Machine Interface
HTML	HyperText Markup Language
IMO	International Maritime Organization
KMeans	K-Means clustering
LDA	Latent Dirichlet Allocation
MAS	Multi-Agent Systems
ML	Machine learning
PCA	Principal Component Analysis
PDF	Portable Document Format
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RAM	Random Access Memory
SVD	Singular Value Decomposition
TF–IDF	Term Frequency–Inverse Document Frequency
t-SNE	t-distributed Stochastic Neighbour Embedding
UMAP	Uniform Manifold Approximation and Projection
XAI	Explainable AI
TruncatedSVD	Truncated Singular Value Decomposition (as implemented in scikit-learn for dimensionality reduction in sparse matrices)

References

1. Fan, S.; Shi, K.; Weng, J.; Yang, Z. Letting Losses Be Lessons: Human–Machine Cooperation in Maritime Transport. Reliab. Eng. Syst. Saf. 2025, 253, 110547.
2. Man, Y.; Brorsson, E.; Björndal, P. Human–Machine Interaction Challenges for Bridge Operations in Large Passenger Ships and Future Improvements from the Deck Officers’ Perspective. Hum. Factors Transp. 2023, 95, 730–739.
3. Saager, M.; Steinmetz, A.; Osterloh, J.-P.; Naumann, A.; Hahn, A. Ensuring Fast Interaction with HMIs for Safety-Critical Systems: An Extension of the Human–Machine Interface Design Method KONECT. Intell. Hum. Syst. Integr. 2024, 119, 193–203.
4. Veitch, E.; Alsos, O.A. Human–Centered Explainable Artificial Intelligence for Marine Autonomous Surface Vehicles. J. Mar. Sci. Eng. 2021, 9, 1227.
5. Peeters, G.; Yayla, G.; Catoor, T.; Van Baelen, S.; Afzal, M.R.; Christofakis, C.; Storms, S.; Boonen, R.; Slaets, P. An Inland Shore Control Centre for Monitoring or Controlling Unmanned Inland Cargo Vessels. J. Mar. Sci. Eng. 2020, 8, 758.
6. Martelli, M.; Virdis, A.; Gotta, A.; Cassarà, P.; Di Summa, M. An Outlook on the Future Marine Traffic Management System for Autonomous Ships. IEEE Access 2021, 9, 157316–157328.
7. Emad, G.R.; Narayanan, S.; Kataria, A. On the Road to Autonomous Maritime Transport: A Conceptual Framework to Meet Training Needs for Future Ship Operations. Adv. Transp. 2022, 60, 640–646.
8. Li, X.; Yuen, K.F. A Human-Centred Review on Maritime Autonomous Surface Ships: Impacts, Responses, and Future Directions. Transp. Rev. 2024, 44, 791–810.
9. Liu, J.; Aydın, M.; Akyüz, E.; Arslan, Ö.; Matyar, E.; Kurte, R.E.; Turan, O. Prediction of Human–Machine Interface (HMI) Operational Errors for Maritime Autonomous Surface Ships (MASS). J. Mar. Sci. Technol. 2021, 27, 293–306.
10. Song, R.; Papadimitriou, E.; Negenborn, R.R.; van Gelder, P. Safety and Efficiency of Human–MASS Interactions: Towards an Integrated Framework. J. Mar. Eng. Technol. 2024, 24, 159–178.
11. Gözalan, A.; John, O.; Lübcke, T.; Maier, A.; Reimann, M.; Richter, J.-G.; Zverev, I. Assisting Maritime Search and Rescue (SAR) Personnel with AI-Based Speech Recognition and Smart Direction Finding. J. Mar. Sci. Eng. 2020, 8, 818.
12. Danial, S.N.; Smith, D.; Veitch, B. A Method to Detect Anomalies in Complex Socio-Technical Operations Using Structural Similarity. J. Mar. Sci. Eng. 2021, 9, 212.
13. Pedersen, O.-M.; Kim, E. Arctic Vision: Using Neural Networks for Ice Object Classification, and Controlling How They Fail. J. Mar. Sci. Eng. 2020, 8, 770.
14. Simion, D.; Postolache, F.; Fleacă, B.; Fleacă, E. AI-Driven Predictive Maintenance in Modern Maritime Transport-Enhancing Operational Efficiency and Reliability. Appl. Sci. 2024, 14, 9439.
15. Cao-Feijóo, G.; Pérez-Canosa, J.M.; Pérez-Castelo, F.J.; Orosa, J.A. Deep Learning Methods to Mitigate Human-Factor-Related Accidents in Maritime Transport. J. Mar. Sci. Eng. 2024, 12, 1819.
16. Bhaganagar, K.; Kolar, P.; Faruqui, S.H.A.; Bhattacharjee, D.; Alaeddini, A.; Subbarao, K. A Novel Machine-Learning Framework with a Moving Platform for Maritime Drift Calculations. Front. Mar. Sci. 2022, 9, 831501.
17. Bandara, D.; Woodward, M.; Chin, C.; Jiang, D. Augmented Reality Lights for Compromised Visibility Navigation. J. Mar. Sci. Eng. 2020, 8, 1014.
18. Abdullah, A.; Blow, D.; Chen, R.; Uthai, T.; Du, E.J.; Islam, M.J. Human-Machine Interfaces for Subsea Telerobotics: From Soda-Straw to Natural Language Interactions [Preprint]. arXiv 2024.
19. Kari, R.; Steinert, M. Human Factor Issues in Remote Ship Operations: Lessons Learned by Studying Different Domains. J. Mar. Sci. Eng. 2021, 9, 385.
20. Fan, S.; Yang, Z. Analysing Seafarer Competencies in a Dynamic Human–Machine System. Ocean Coast. Manag. 2023, 240, 106662.
21. Misztal, L.; Hatlas-Sowinska, P. The Impact of the Human Factor on Communication during a Collision Situation in Maritime Navigation. Appl. Sci. 2025, 15, 2797.
22. Zhou, Y.; Liu, Z.; Wang, X.; Xie, H.; Tao, J.; Wang, J.; Yang, Z. Human Errors Analysis for Remotely Controlled Ships during Collision Avoidance. Front. Mar. Sci. 2024, 11, 1473367.
23. Aylward, K.; Weber, R.; Man, Y.; Lundh, M.; MacKinnon, S.N. “Are You Planning to Follow Your Route?” The Effect of Route Exchange on Decision Making, Trust, and Safety. J. Mar. Sci. Eng. 2020, 8, 280.
24. Smith, J.; Yazdanpanah, F.; Thistle, R.; Musharraf, M.; Veitch, B. Capturing Expert Knowledge to Inform Decision Support Technology for Marine Operations. J. Mar. Sci. Eng. 2020, 8, 689.
25. Wang, S.; Gang, L.; Liu, T.; Lan, Z.; Li, C. Analysis of the Characteristics of Ship Collision-Avoidance Behavior Based on Apriori and Complex Network. J. Mar. Sci. Eng. 2025, 13, 35.
26. Zhou, Y.; Du, W.; Liu, J.; Li, H.; Grifoll, M.; Song, W.; Zheng, P. Determination of Ship Collision Avoidance Timing Using Machine Learning Method. Sustainability 2024, 16, 4626.
27. Zheng, J.; Liu, B.; Li, Y.; Huang, C. Unmanned Ship Collision Avoidance Action Plan Deduction Method under Man–Machine Interactive Negotiation in Collision Avoidance Scenarios. J. Mar. Sci. Eng. 2024, 12, 1842.
28. Zhou, H.; Zheng, M.; Chu, X.; Yu, C.; Lei, J.; Lin, B.; Zhang, K.; Hua, W. Virtual Reality Fusion Testing–Based Autonomous Collision Avoidance of Ships in Open Water: Methods and Practices. J. Mar. Sci. Eng. 2024, 12, 2181.
29. Park, J.; Jeong, J.-S. An Estimation of Ship Collision Risk Based on Relevance Vector Machine. J. Mar. Sci. Eng. 2021, 9, 538.
30. Wu, X.; Liu, K.; Zhang, J.; Yuan, Z.; Liu, J.; Yu, Q. An Optimized Collision Avoidance Decision-Making System for Autonomous Ships under Human–Machine Cooperation Situations. J. Adv. Transp. 2021, 2021, 7537825.
31. Huang, Y.; Chen, L.; Negenborn, R.R.; van Gelder, P.H.A.J.M. A Ship Collision Avoidance System for Human–Machine Co-Operation during Collision Avoidance. Ocean Eng. 2020, 207, 107913.
32. Huang, Y. Supporting Human–Machine Interaction in Ship Collision Avoidance Systems. Doctoral Dissertation, Delft University of Technology, Delft, The Netherlands, 2019.
33. van den Oever, F.; Fjeld, M.; Sætrevik, B. A Systematic Literature Review of Augmented Reality for Maritime Collaboration. Int. J. Hum.-Comput. Interact. 2024, 40, 4116–4131.
34. Solovey, O.; Ben, A.; Dudchenko, S.; Nosov, P. Development of Control Model for Loading Operations on Heavy Lift Vessels Based on Inverse Algorithm. East. Eur. J. Enterp. Technol. 2020, 5, 48–56.
35. Nosov, P.; Zinchenko, S.; Ben, A.; Prokopchuk, Y.; Mamenko, P.; Popovych, I.; Moiseienko, V.; Kruglyj, D. Navigation Safety Control System Development through Navigator Action Prediction by Data Mining Means. East. Eur. J. Enterp. Technol. 2021, 2, 55–68.
36. Ponomaryova, V.; Nosov, P.; Ben, A.; Popovych, I.; Prokopchuk, Y.; Mamenko, P.; Dudchenko, S.; Appazov, E.; Sokol, I. Devising an Approach for the Automated Restoration of Shipmaster’s Navigational Qualification Parameters under Risk Conditions. East. Eur. J. Enterp. Technol. 2024, 1, 6–26.
37. Kobets, V.; Popovych, I.; Zinchenko, S.; Nosov, P.; Tovstokoryi, O.; Kyrychenko, K. Control of the Pivot Point Position of a Conventional Single–Screw Vessel. In CEUR Workshop Proceedings; CEUR-WS: Aachen, Germany, 2023; Volume 3513, pp. 130–140. Available online: https://ceur-ws.org/Vol-3513/paper11.pdf (accessed on 28 August 2025).
38. Zinchenko, S.; Kyrychenko, K.; Grosheva, O.; Nosov, P.; Popovych, I.; Mamenko, P. Automatic Reset of Kinetic Energy in Case of Inevitable Collision of Ships. In Proceedings of the 13th International Conference on Advanced Computer Information Technologies (ACIT), Wrocław, Poland, 21–23 September 2023; pp. 496–500.
39. Zinchenko, S.; Kobets, V.; Tovstokoryi, O.; Nosov, P.; Popovych, I. Intelligent System Control of the Vessel Executive Devices Redundant Structure. In CEUR Workshop Proceedings; CEUR-WS: Aachen, Germany, 2023; Volume 3403, pp. 582–594. Available online: https://ceur-ws.org/Vol-3403/paper44.pdf (accessed on 28 August 2025).
40. Melnyk, O.; Onyshchenko, S.; Onishchenko, O.; Shcherbina, O.; Vasalatii, N. Simulation-Based Method for Predicting Changes in the Ship’s Seaworthy Condition under Impact of Various Factors. In Studies in Systems, Decision and Control; Springer: Cham, Switzerland, 2023; Volume 481, pp. 653–664.
41. Melnyk, O.; Bychkovsky, Y.; Onishchenko, O.; Onyshchenko, S.; Volianska, Y. Development of the Method of Shipboard Operations Risk Assessment Quality Evaluation Based on Experts Review. In Studies in Systems, Decision and Control; Springer: Cham, Switzerland, 2023; Volume 481, pp. 695–710.
42. Melnyk, O.; Kuznichenko, S.; Onishchenko, O. Impact of AIS Manipulation on Shipping Safety and Strategic Countermeasures. Lex Portus 2024, 10, 31–39.
43. Melnyk, O.; Drozdov, O.; Kuznichenko, S. Cybersecurity in Maritime Transport: An International Perspective on Regulatory Frameworks and Countermeasures. Lex Portus 2025, 11, 7–19.
44. Limasheva, L.; Kariyev, A.; Berdibayeva, S.; Dautkaliyeva, P. Chronic Stress and Resilience of Students in Conditions of Excessive Mobile Phone Usage. Insight Psychol. Dimens. Soc. 2025, 13, 174–194. Available online: https://insight.journal.kspu.edu/index.php/insight/article/view/293 (accessed on 28 August 2025).
45. Kalamazh, R.; Voloshyna-Narozhna, V.; Tymoshchuk, Y.; Balashov, E. Coping Styles and Self-Regulation Abilities as Predictors of Anxiety. Insight Psychol. Dimens. Soc. 2024, 12, 96–114.
46. Plokhikh, V.; Bilous, R. Psychological Defenses of Students in Confronting the Stressors of Military Actions. Insight Psychol. Dimens. Soc. 2025, 13, 18–48.
47. Bokhonkova, Y.; Serbin, I.; Zavatska, N.; Danko, D. Attributional Style as a Defense Mechanism of Student Activists’ Behavior. Insight Psychol. Dimens. Soc. 2025, 13, 280–304.
48. Halian, I.; Halian, O.; Myshchyshyn, M. Coping Strategies in the Behavioral Models of Youth in Martial Law Conditions. Insight Psychol. Dimens. Soc. 2024, 12, 44–65.
49. Raievska, Y.; Kulbida, S.; Lavlinskyy, R.; Senytsia, N. Psycho–Emotional Stability of the Individual in the Period of Societal Transformations. Insight Psychol. Dimens. Soc. 2025, 13, 519–543. Available online: https://ekhsuir.kspu.edu/server/api/core/bitstreams/cf11f1aa-8d4a-46b6-981d-c1eeffbe6866/content (accessed on 1 September 2025).
50. Melnyk, O.; Onyshchenko, S.; Koryakin, K. Nature and Origin of Major Security Concerns and Potential Threats to the Shipping Industry. Sci. J. Silesian Univ. Technol. Ser. Transp. 2021, 113, 145–153.
51. Malaksiano, N.A. Exact Inclusions of Gehring Classes in Muckenhoupt Classes. Math. Notes 2001, 70, 673–681.
52. Lapkina, I.O.; Malaksiano, M.O. Modelling and Optimization of Perishable Cargo Delivery System through Odesa Port. Actual Probl. Econ. 2016, 177, 353–365.
53. Romanuke, V.V.; Romanov, A.Y.; Malaksiano, M.O. A Genetic Algorithm Improvement by Tour Constraint Violation Penalty Discount for Maritime Cargo Delivery. Syst. Res. Inf. Technol. 2023, 2023, 104–126.
54. Wang, D.; Wang, Y.; Dou, M.; Qiao, M.; Fu, X.; Zhang, Y. Exploring the Nonlinear Impact of Visual Environment on Residents’ Happiness: A Computational Framework Integrating Semantic and Geometric Features. Geo-Spat. Inf. Sci. 2024, 27, 1–27.
55. Yang, J.; Wu, C.; Du, B.; Zhang, L. Enhanced Multiscale Feature Fusion Network for HSI Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10328–10347.

Figure 1. Architecture of the proposed publication analysis system, illustrating the modular workflow (data collection, preprocessing, clustering, visualization, and reporting).

Figure 6. Integration of clustering, similarity metrics, and thematic analysis in evaluating a user-selected article (case study: maritime safety).

Figure 7. Similarity-weighted rubric distribution of analyzed articles, showing the dominance of human factors, HMI ergonomics, and AI/ML themes.

Figure 8. Three-dimensional mapping of the user-selected article relative to the core dataset, highlighting semantic distance from the main publication cluster.

Figure 9. Proximity analysis showing the semantic positioning of Groups 2 and 3 relative to the core publication dataset, highlighting thematic coherence.

Table 1. Comparison of the proposed analyser with established approaches (PRISMA, Scopus/WoS Analytics) in terms of accessibility, selection methodology, content analysis, visualization, and domain adaptability.

Criterion	PRISMA	Scopus/WoS Analytics	Proposed System
Accessibility	Partly open, requires manual work	Commercial licence, paid access	Open-access PDF inputs; runs with Jupyter/Anaconda; reproducible CSV/HTML/PNG exports; no extra licences
Selection method	Manual/semi-manual	Automated, but metadata only	Automated full-text parsing plus expert rubric mapping; similarity-weighted “truth” and centroid (prototype) assignment; hybrid scoring
Content analysis	Limited, keyword-based	Aggregation of bibliometric data	SBERT embeddings with clustering and topic grouping; interpretable themes; optional expert review loop
Visualization	Basic PRISMA flow	Network maps, graphs	Interactive 2D/3D plots
Domain adaptability	Low	Low	High—validated on maritime safety, but extendable to HMI, AI/ML, cybersecurity, energy, and related domains

Table 2. Preliminary expert matrix created using Google Forms and Google Sheets, linking articles to rubrics and expert assessments.

ArticleID	Rubric	Expert Percent	Expert Summary
1	Human–machine interaction and HMI ergonomics	40	The article focuses on operator-automated subsystem interactions, analysis of HMI performance; it proposes a graph-theoretical approach for identifying bottlenecks in decision regions.
1	Human factors, risk analysis, and competency modelling	40	Primary attention is given to identifying and quantitatively analyzing risk influence factors (RIFs), accounting for their frequency and severity, and based on navigator errors and miscommunication.
1	Decision-support and expert systems	30	A graph-theoretical network is used to support decision-making, visualize key RIFs, and provide recommendations for training and design of MASS systems.
1	AI/ML for analytics and prediction	15	Correlation and network models are applied to uncover hidden relationships among RIFs, in line with data analytics methodologies.
2	Human–machine interaction and HMI ergonomics	40	The study of human–machine interaction on the bridge of large passenger vessels, including key issues of information availability, interface fragmentation, trust, and usability.
2	Human factors, risk analysis, and competency modelling	30	The study conducts an in-depth analysis of navigators’ cognitive workload under abnormal conditions, stress from distractions and limited signals, and the applicable 3D strategies of navigational crews.
2	Decision-support and expert systems	15	Gathering deck officers’ requirements for proactive decision-support and context-sensitive tools lays the foundation for developing DSS/expert systems.

Table 3. Comparative performance with existing NLP-based review tools.

Tool	Avg. Topic Coherence	Processing Time (100 PDFs)	Thematic Depth (Avg. Keywords/Topic)
Proposed Framework	0.71	6.5 min	12.4
CiteSpace (6.4.R2)	0.64	8.3 min	8.9
VOSviewer (1.6.20)	0.60	7.5 min	9.1
OpenAlex Embeddings (0.19)	0.67	6.8 min	10.2

Note: Benchmark conducted on identical 100 maritime safety abstracts using default configurations.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
