Search Results (38)

Search Parameters:
Keywords = common subgraph

36 pages, 2906 KB  
Review
Data Organisation for Efficient Pattern Retrieval: Indexing, Storage, and Access Structures
by Paraskevas Koukaras and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(10), 258; https://doi.org/10.3390/bdcc9100258 - 13 Oct 2025
Viewed by 1517
Abstract
The increasing scale and complexity of data mining outputs, such as frequent itemsets, association rules, sequences, and subgraphs, have made efficient pattern retrieval a critical yet underexplored challenge. This review addresses the organisation, indexing, and access strategies that enable scalable and responsive retrieval of structured patterns. We examine the underlying types of data and pattern outputs, common retrieval operations, and the variety of query types encountered in practice. Key indexing structures are surveyed, including prefix trees, inverted indices, hash-based approaches, and bitmap-based methods, each suited to different pattern representations and workloads. Storage designs are discussed with attention to metadata annotation, format choices, and redundancy mitigation. Query optimisation strategies are reviewed, emphasising index-aware traversal, caching, and ranking mechanisms. This paper also explores scalability through parallel, distributed, and streaming architectures, and surveys current systems and tools that integrate mining and retrieval capabilities. Finally, we outline pressing challenges and emerging directions, such as supporting real-time and uncertainty-aware retrieval, and enabling semantic, cross-domain pattern access. Additional frontiers include privacy-preserving indexing and secure query execution, along with integration of repositories into machine learning pipelines for hybrid symbolic–statistical workflows. We further highlight the need for dynamic repositories, probabilistic semantics, and community benchmarks to ensure that progress is measurable and reproducible across domains. This review provides a comprehensive foundation for designing next-generation pattern retrieval systems that are scalable, flexible, and tightly integrated into analytic workflows.
The analysis and roadmap offered are relevant across application areas including finance, healthcare, cybersecurity, and retail, where robust and interpretable retrieval is essential. Full article
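As a toy illustration of the inverted-index structures surveyed in this review, the sketch below indexes hypothetical frequent itemsets by item, so a containment query intersects short posting lists instead of scanning every pattern (the patterns and item names are invented for illustration):

```python
from collections import defaultdict

# Hypothetical mined frequent itemsets: pattern id -> set of items.
patterns = {
    0: {"bread", "butter"},
    1: {"bread", "milk"},
    2: {"milk", "eggs"},
}

# Inverted index: item -> ids of the patterns that contain it.
index = defaultdict(set)
for pid, items in patterns.items():
    for item in items:
        index[item].add(pid)

def patterns_containing(query):
    """Ids of patterns containing every item in `query` (posting-list intersection)."""
    postings = [index.get(item, set()) for item in query]
    return set.intersection(*postings) if postings else set()

hits = patterns_containing({"bread"})            # -> {0, 1}
misses = patterns_containing({"bread", "eggs"})  # -> set()
```

Prefix-tree and bitmap indices trade this simplicity for better compression and ordered traversal.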

25 pages, 1839 KB  
Article
Modeling the Emergence of Insight via Quantum Interference on Semantic Graphs
by Arianna Pavone and Simone Faro
Mathematics 2025, 13(19), 3171; https://doi.org/10.3390/math13193171 - 3 Oct 2025
Viewed by 447
Abstract
Creative insight is a core phenomenon of human cognition, often characterized by the sudden emergence of novel and contextually appropriate ideas. Classical models based on symbolic search or associative networks struggle to capture the non-linear, context-sensitive, and interference-driven aspects of insight. In this work, we propose a computational model of insight generation grounded in continuous-time quantum walks over weighted semantic graphs, where nodes represent conceptual units and edges encode associative relationships. By exploiting the principles of quantum superposition and interference, the model enables the probabilistic amplification of semantically distant but contextually relevant concepts, providing a plausible account of non-local transitions in thought. The model is implemented using standard Python 3.10 libraries and is available both as an interactive fully reproducible Google Colab notebook and a public repository with code and derived datasets. Comparative experiments on ConceptNet-derived subgraphs, including the Candle Problem, 20 Remote Associates Test triads, and Alternative Uses, show that, relative to classical diffusion, quantum walks concentrate more probability on correct targets (higher AUC and peaks reached earlier) and, in open-ended settings, explore more broadly and deeply (higher entropy and coverage, larger expected radius, and faster access to distant regions). These findings are robust under normalized generators and a common time normalization, align with our formal conditions for transient interference-driven amplification, and support quantum-like dynamics as a principled process model for key features of insight. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
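A minimal NumPy sketch of the continuous-time quantum walk dynamics described above, ψ(t) = e^(−iAt)ψ(0) with measurement probabilities |ψ_j(t)|²; the 4-node adjacency matrix and start node are invented stand-ins for the paper's ConceptNet-derived semantic graphs:

```python
import numpy as np

# Toy symmetric (weighted) adjacency matrix of a 4-node "semantic graph".
A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])

def ctqw_probabilities(A, t, start=0):
    """Continuous-time quantum walk psi(t) = exp(-iAt) psi(0), computed via
    the eigendecomposition of the symmetric generator A; returns the
    measurement probabilities |psi_j(t)|^2 over the nodes."""
    w, V = np.linalg.eigh(A)                   # A = V diag(w) V^T
    psi0 = np.zeros(A.shape[0], dtype=complex)
    psi0[start] = 1.0                          # walker starts on the cue node
    U = (V * np.exp(-1j * w * t)) @ V.T        # unitary exp(-iAt)
    return np.abs(U @ psi0) ** 2

p = ctqw_probabilities(A, t=1.0)  # probabilities sum to 1 (unitary evolution)
```

Interference shows up in how probability concentrates on some nodes faster than a classical diffusion on the same graph would allow.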

24 pages, 467 KB  
Article
Node Embedding and Cosine Similarity for Efficient Maximum Common Subgraph Discovery
by Stefano Quer, Thomas Madeo, Andrea Calabrese, Giovanni Squillero and Enrico Carraro
Appl. Sci. 2025, 15(16), 8920; https://doi.org/10.3390/app15168920 - 13 Aug 2025
Viewed by 1450
Abstract
Finding the maximum common induced subgraph is a fundamental problem in computer science. Proven NP-hard in the 1970s, it nowadays has countless applications that still motivate the search for efficient algorithms and practical heuristics. In this work, we extend a state-of-the-art branch-and-bound exact algorithm with techniques developed in the deep-learning domain, namely graph neural networks and node embeddings, effectively transforming an efficient yet uninformed depth-first search into an effective best-first search. The change enables the algorithm to find suitable solutions within a limited budget, pushing forward the method’s time efficiency and applicability on larger graphs. We evaluate the use of the L2 norm of the node embeddings and the Cumulative Cosine Similarity to classify the nodes of the graphs. Our experimental analysis on standard graphs compares our heuristic against the original algorithm and a recently tweaked version that exploits reinforcement learning. The results demonstrate the effectiveness and scalability of the proposed approach compared with the state-of-the-art algorithms. In particular, it yields improved results on over 90% of the larger graphs, which would otherwise be the more challenging cases in a constrained industrial scenario. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
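The embedding-guided best-first ordering described above can be sketched as follows; the random vectors are stand-ins for the GNN-derived node embeddings, and the pairing score is plain cosine similarity rather than the paper's exact heuristics:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-in node embeddings for two graphs (rows = nodes); in the paper
# these would come from a trained graph neural network.
rng = np.random.default_rng(0)
emb_g = rng.normal(size=(4, 8))   # 4 nodes, 8-dim embeddings
emb_h = rng.normal(size=(5, 8))   # 5 nodes

# Rank all candidate node pairs by similarity; a best-first branch-and-bound
# would expand the most promising (most similar) pairs first.
pairs = sorted(((cosine(emb_g[i], emb_h[j]), i, j)
                for i in range(len(emb_g)) for j in range(len(emb_h))),
               reverse=True)
best_score, gi, hj = pairs[0]
```

The exact search remains branch-and-bound; the embeddings only reorder which branches are explored first, which is what makes budget-limited runs productive.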

27 pages, 1853 KB  
Article
Heterogeneous Graph Structure Learning for Next Point-of-Interest Recommendation
by Juan Chen and Qiao Li
Algorithms 2025, 18(8), 478; https://doi.org/10.3390/a18080478 - 3 Aug 2025
Viewed by 1289
Abstract
Next Point-of-Interest (POI) recommendation is aimed at predicting users’ future visits based on their current status and historical check-in records, providing convenience to users and potential profits to businesses. The Graph Neural Network (GNN) has become a common approach for this task due to the capabilities of modeling relations between nodes in a global perspective. However, most existing studies overlook the more prevalent heterogeneous relations in real-world scenarios, and manually constructed graphs may suffer from inaccuracies. To address these limitations, we propose a model called Heterogeneous Graph Structure Learning for Next POI Recommendation (HGSL-POI), which integrates three key components: heterogeneous graph contrastive learning, graph structure learning, and sequence modeling. The model first employs meta-path-based subgraphs and the user–POI interaction graph to obtain initial representations of users and POIs. Based on these representations, it reconstructs the subgraphs through graph structure learning. Finally, based on the embeddings from the reconstructed graphs, sequence modeling incorporating graph neural networks captures users’ sequential preferences to make recommendations. Experimental results on real-world datasets demonstrate the effectiveness of the proposed model. Additional studies confirm its robustness and superior performance across diverse recommendation tasks. Full article

26 pages, 11841 KB  
Article
Automatic Extraction of Road Interchange Networks from Crowdsourced Trajectory Data: A Forward and Reverse Tracking Approach
by Fengwei Jiao, Longgang Xiang and Yuanyuan Deng
ISPRS Int. J. Geo-Inf. 2025, 14(6), 234; https://doi.org/10.3390/ijgi14060234 - 17 Jun 2025
Cited by 1 | Viewed by 1390
Abstract
The generation of road interchange networks benefits various applications, such as vehicle navigation and intelligent transportation systems. Traditional methods often focus on common road structures but fail to fully utilize long-term trajectory continuity and flow information, leading to fragmented results and misidentification of overlapping roads as intersections. To address these limitations, we propose a forward and reverse tracking method for high-accuracy road interchange network generation. First, raw crowdsourced trajectory data is preprocessed by filtering out non-interchange trajectories and removing abnormal data based on both static and dynamic characteristics of the trajectories. Next, road subgraphs are extracted by identifying potential transition nodes, which are verified using directional and distribution information. Trajectory bifurcation is then performed at these nodes. Finally, a two-stage fusion process combines forward and reverse tracking results to produce a geometrically complete and topologically accurate road interchange network. Experiments using crowdsourced trajectory data from Shenzhen demonstrated highly accurate results, with 95.26% precision in geometric road network alignment and 90.06% accuracy in representing the connectivity of road interchange structures. Compared to existing methods, our approach enhanced accuracy in spatial alignment by 13.3% and improved the correctness of structural connections by 12.1%. The approach demonstrates strong performance across different types of interchanges, including cloverleaf, turbo, and trumpet interchanges. Full article

22 pages, 5186 KB  
Article
Explicit and Implicit Feature Contrastive Learning Model for Knowledge Graph Link Prediction
by Xu Yuan, Weihe Wang, Buyun Gao, Liang Zhao, Ruixin Ma and Feng Ding
Sensors 2024, 24(22), 7353; https://doi.org/10.3390/s24227353 - 18 Nov 2024
Viewed by 2455
Abstract
Knowledge graph link prediction is crucial for constructing triples in knowledge graphs, which aim to infer whether there is a relation between the entities. Recently, graph neural networks and contrastive learning have demonstrated superior performance compared with traditional translation-based models; they successfully extracted common features through explicit linking between entities. However, the implicit associations between entities without a linking relationship are ignored, which impedes the model from capturing distant but semantically rich entities. In addition, directly applying contrastive learning based on random node dropout to link prediction tasks, or limiting it to triplet-level, leads to constrained model performance. To address these challenges, we design an implicit feature extraction module that utilizes the clustering characteristics of latent vector space to find entities with potential associations and enrich entity representations by mining similar semantic features from the conceptual level. Meanwhile, the subgraph mechanism is introduced to preserve the structural information of explicitly connected entities. Implicit semantic features and explicit structural features serve as complementary information to provide high-quality self-supervised signals. Experiments are conducted on three benchmark knowledge graph datasets. The results validate that our model outperforms the state-of-the-art baselines in link prediction tasks. Full article
(This article belongs to the Section Intelligent Sensors)

17 pages, 3833 KB  
Article
Dynamic Link Prediction in Jujube Sales Market: Innovative Application of Heterogeneous Graph Neural Networks
by Yichang Wu, Liang Heng, Fei Tan, Jingwen Yang and Li Guo
Appl. Sci. 2024, 14(20), 9333; https://doi.org/10.3390/app14209333 - 13 Oct 2024
Cited by 1 | Viewed by 1910
Abstract
Link prediction is crucial in forecasting potential distribution channels within the dynamic and heterogeneous Xinjiang jujube sales market. This study utilizes knowledge graphs to represent entities and constructs a complex network model for market analysis. Graph neural networks (GNNs) have shown excellent performance in handling graph-structured data, but they do not necessarily significantly outperform in link prediction tasks due to an overreliance on node features and a neglect of structural information. Additionally, the Xinjiang jujube dataset exhibits unique complexity, including multiple types, attributes, and relationships, distinguishing it from typical GNN datasets such as DBLP and protein-protein interaction datasets. To address these challenges, we introduce the Heterogeneous Multi-Head Attention Graph Neural Network model (HMAGNN). Our methodology involves mapping isomeric nodes to common feature space and labeling nodes using an enhanced Weisfeiler–Lehman (WL) algorithm. We then leverage HMAGNN to learn both structural and attribute features individually. Throughout our experimentation, we identify the critical influence of local subgraph structure and size on link prediction outcomes. In response, we introduce virtual nodes during the subgraph extraction process and conduct validation experiments to underscore the significance of these factors. Compared to alternative models, HMAGNN excels in capturing structural features through our labeling approach and dynamically adapts to identify the most pertinent link information using a multi-head attention mechanism. Extensive experiments on benchmark datasets consistently demonstrate that HMAGNN outperforms existing models, establishing it as a state-of-the-art solution for link prediction in the context of jujube sales market analysis. Full article

25 pages, 3004 KB  
Article
Solving Flexible Job-Shop Scheduling Problem with Heterogeneous Graph Neural Network Based on Relation and Deep Reinforcement Learning
by Hengliang Tang and Jinda Dong
Machines 2024, 12(8), 584; https://doi.org/10.3390/machines12080584 - 22 Aug 2024
Cited by 7 | Viewed by 6359
Abstract
Driven by the rise of intelligent manufacturing and Industry 4.0, the manufacturing industry faces significant challenges in adapting to flexible and efficient production methods. This study presents an innovative approach to solving the Flexible Job-Shop Scheduling Problem (FJSP) by integrating Heterogeneous Graph Neural Networks based on Relation (HGNNR) with Deep Reinforcement Learning (DRL). The proposed framework models the complex relationships in FJSP using heterogeneous graphs, where operations and machines are represented as nodes, with directed and undirected arcs indicating dependencies and compatibilities. The HGNNR framework comprises four key components: relation-specific subgraph decomposition, data preprocessing, feature extraction through graph convolution, and cross-relation feature fusion using a multi-head attention mechanism. For decision-making, we employ the Proximal Policy Optimization (PPO) algorithm, which iteratively updates policies to maximize cumulative rewards through continuous interaction with the environment. Experimental results on four public benchmark datasets demonstrate that our proposed method outperforms four state-of-the-art DRL-based techniques and three common rule-based heuristic algorithms, achieving superior scheduling efficiency and generalization capabilities. This framework offers a robust and scalable solution for complex industrial scheduling problems, enhancing production efficiency and adaptability. Full article

20 pages, 11994 KB  
Article
Mining Spatial-Temporal Frequent Patterns of Natural Disasters in China Based on Textual Records
by Aiai Han, Wen Yuan, Wu Yuan, Jianwen Zhou, Xueyan Jian, Rong Wang and Xinqi Gao
Information 2024, 15(7), 372; https://doi.org/10.3390/info15070372 - 27 Jun 2024
Cited by 2 | Viewed by 1686
Abstract
Natural disasters pose serious threats to human survival. With global warming, disaster chains related to extreme weather are becoming more common, making it increasingly urgent to understand the relationships between different types of natural disasters. However, there remains a lack of research on the frequent spatial-temporal intervals between different disaster events. In this study, we utilize textual records of natural disaster events to mine frequent spatial-temporal patterns of disasters in China. We first transform the discrete spatial-temporal disaster events into a graph structure. Due to the limit of computing power, we reduce the number of edges in the graph based on domain expertise. We then apply the GraMi frequent subgraph mining algorithm to the spatial-temporal disaster event graph, and the results reveal frequent spatial-temporal intervals between disasters and reflect the spatial-temporal changing pattern of disaster interactions. For example, the pattern of sandstorms happening after gales is mainly concentrated within 50 km and rarely happens at farther spatial distances, and the most common temporal interval is 1 day. The statistical results of this study provide data support for further understanding disaster association patterns and offer decision-making references for disaster prevention efforts. Full article

13 pages, 2270 KB  
Article
GRAAL: Graph-Based Retrieval for Collecting Related Passages across Multiple Documents
by Misael Mongiovì and Aldo Gangemi
Information 2024, 15(6), 318; https://doi.org/10.3390/info15060318 - 29 May 2024
Cited by 2 | Viewed by 1743
Abstract
Finding passages related to a sentence over a large collection of text documents is a fundamental task for claim verification and open-domain question answering. For instance, a common approach for verifying a claim is to extract short snippets of relevant text from a collection of reference documents and provide them as input to a natural language inference machine that determines whether the claim can be deduced or refuted. Available approaches struggle when several pieces of evidence from different documents need to be combined to make an inference, as individual documents often have a low relevance with the input and are therefore excluded. We propose GRAAL (GRAph-based retrievAL), a novel graph-based approach that outlines the relevant evidence as a subgraph of a large graph that summarizes the whole corpus. We assess the validity of this approach by building a large graph that represents co-occurring entity mentions on a corpus of Wikipedia pages and using this graph to identify candidate text relevant to a claim across multiple pages. Our experiments on a subset of FEVER, a popular benchmark, show that the proposed approach is effective in identifying short passages related to a claim from multiple documents. Full article
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

19 pages, 3690 KB  
Article
Embedded Complexity of Evolutionary Sequences
by Jonathan D. Phillips
Entropy 2024, 26(6), 458; https://doi.org/10.3390/e26060458 - 28 May 2024
Viewed by 1289
Abstract
Multiple pathways and outcomes are common in evolutionary sequences for biological and other environmental systems due to nonlinear complexity, historical contingency, and disturbances. From any starting point, multiple evolutionary pathways are possible. From an endpoint or observed state, multiple possibilities exist for the sequence of events that created it. However, for any observed historical sequence—e.g., ecological or soil chronosequences, stratigraphic records, or lineages—only one historical sequence actually occurred. Here, a measure of the embedded complexity of historical sequences based on algebraic graph theory is introduced. Sequences are represented as system states S(t), such that S(t − 1) ≠ S(t) ≠ S(t + 1). Each sequence of N states contains nested subgraph sequences of length 2, 3, …, N − 1. The embedded complexity index (which can also be interpreted in terms of embedded information) compares the complexity (based on the spectral radius λ1) of the entire sequence to the cumulative complexity of the constituent subsequences. The spectral radius is closely linked to graph entropy, so the index also reflects information in the sequence. The analysis is also applied to ecological state-and-transition models (STM), which represent observed transitions, along with information on their causes or triggers. As historical sequences are lengthened (by the passage of time and additional transitions or by improved resolutions or new observations of historical changes), the overall complexity asymptotically approaches λ1 = 2, while the embedded complexity increases as N2.6. Four case studies are presented, representing coastal benthic community shifts determined from biostratigraphy, ecological succession on glacial forelands, vegetation community changes in longleaf pine woodlands, and habitat changes in a delta. Full article
(This article belongs to the Special Issue Entropy and Information in Biological Systems)
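The limiting behaviour stated above is easy to check numerically: a sequence of N distinct states linked by its transitions is a path graph, whose adjacency spectral radius is 2·cos(π/(N+1)) and approaches λ1 = 2 from below as N grows (a minimal sketch, not the author's embedded-complexity code):

```python
import numpy as np

def path_spectral_radius(n):
    """Largest adjacency eigenvalue (spectral radius) of a path on n nodes;
    analytically this equals 2*cos(pi/(n+1))."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0   # consecutive states are linked
    return float(np.linalg.eigvalsh(A)[-1])

# The radius grows with sequence length but never reaches 2.
radii = {n: path_spectral_radius(n) for n in (3, 10, 50)}
```

The embedded-complexity index then compares this whole-sequence value against the cumulative radii of all nested subsequences.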

21 pages, 6639 KB  
Article
Key Vulnerable Nodes Discovery Based on Bayesian Attack Subgraphs and Improved Fuzzy C-Means Clustering
by Yuhua Xu, Yang Liu, Zhixin Sun, Yucheng Xue, Weiliang Liao, Chenlei Liu and Zhe Sun
Mathematics 2024, 12(10), 1447; https://doi.org/10.3390/math12101447 - 8 May 2024
Cited by 4 | Viewed by 1457
Abstract
The search for key vulnerable nodes in large-scale networks suffers from low efficiency, and existing methods do not consider a comprehensive enough set of factors. To improve the time and space efficiency of the search and the accuracy of its results, a key vulnerable node discovery method based on Bayesian attack subgraphs and improved fuzzy C-means clustering is proposed. Firstly, the attack graph is divided into Bayesian attack subgraphs, and the analysis results for the complete attack graph are quickly obtained by aggregating the attack path analyses of the subgraphs, improving time and space efficiency. Then, the actual threat features of the vulnerability nodes are extracted from the analysis results and combined with the threat features of the vulnerabilities themselves, as given by the common vulnerability scoring standard, to form the clustering features. Next, the optimal number of clusters is adaptively adjusted according to the variance idea, and fuzzy clustering is performed on the extracted clustering features. Finally, the key vulnerable nodes are determined by setting feature priorities. Experiments show that the proposed method improves the time and space efficiency of the analysis, and that fuzzy clustering over multiple features improves the accuracy of the results. Full article
(This article belongs to the Special Issue Fuzzy Modeling and Fuzzy Control Systems)
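Standard fuzzy C-means, the baseline the paper improves upon, can be sketched in a few lines of NumPy; the toy 2-D points stand in for the extracted clustering features, and the paper's adaptive cluster-count and feature-priority steps are omitted:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means: alternate center and membership updates.
    Returns the membership matrix U (n x c, rows sum to 1) and the centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m                                     # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))                    # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# Two well-separated toy "feature" blobs.
X = np.array([[0., 0.], [.1, 0.], [0., .1], [5., 5.], [5.1, 5.], [5., 5.1]])
U, centers = fuzzy_cmeans(X, c=2)
labels = U.argmax(axis=1)   # hard assignment from the fuzzy memberships
```

Unlike k-means, each node retains a graded membership in every cluster, which is what lets threat features with mixed evidence contribute to several candidate groupings.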

21 pages, 5185 KB  
Article
Meta-Interpretive LEarning with Reuse
by Rong Wang, Jun Sun, Cong Tian and Zhenhua Duan
Mathematics 2024, 12(6), 916; https://doi.org/10.3390/math12060916 - 20 Mar 2024
Viewed by 2006
Abstract
Inductive Logic Programming (ILP) is a research field at the intersection between machine learning and logic programming, focusing on developing a formal framework for inductively learning relational descriptions in the form of logic programs from examples and background knowledge. As an emerging method of ILP, Meta-Interpretive Learning (MIL) leverages the specialization of a set of higher-order metarules to learn logic programs. In MIL, the input includes a set of examples, background knowledge, and a set of metarules, while the output is a logic program. MIL executes a depth-first traversal search, where its program search space expands polynomially with the number of predicates in the provided background knowledge and exponentially with the number of clauses in the program, sometimes even leading to search collapse. To address this challenge, this study introduces a strategy that employs the concept of reuse, specifically through the integration of auxiliary predicates, to reduce the number of clauses in programs and improve the learning efficiency. This approach focuses on the proactive identification and reuse of common program patterns. To operationalize this strategy, we introduce MILER, a novel method integrating a predicate generator, program learner, and program evaluator. MILER leverages frequent subgraph mining techniques to detect common patterns from a limited dataset of training samples, subsequently embedding these patterns as auxiliary predicates into the background knowledge. In our experiments involving two Visual Question Answering (VQA) tasks and one program synthesis task, we assessed MILER’s approach to utilizing reusable program patterns as auxiliary predicates. The results indicate that, by incorporating these patterns, MILER identifies reusable program patterns, reduces program clauses, and directly decreases the likelihood of timeouts compared to traditional MIL. 
This leads to improved learning success rates by optimizing computational efforts. Full article

26 pages, 1692 KB  
Article
High-Risk HPV Cervical Lesion Potential Correlations Mining over Large-Scale Knowledge Graphs
by Tiehua Zhou, Pengcheng Xu, Ling Wang and Yingxuan Tang
Appl. Sci. 2024, 14(6), 2456; https://doi.org/10.3390/app14062456 - 14 Mar 2024
Cited by 2 | Viewed by 1763
Abstract
Lesion prediction, an important aspect of cancer prediction, provides a key marker for patients before lesions become cancerous. Traditional machine learning methods are gradually being applied to disease prediction based on patient vital-sign data. Accurate prediction, however, requires a large amount of high-quality data, and the difficulty of obtaining electronic medical record (EMR) data, together with its incompleteness, makes disease prediction by traditional machine learning methods difficult. Moreover, many factors contribute to the development of cervical lesions: some risk factors are directly related to it, while others are related only indirectly, and risk factors interact in the development of cervical lesions rather than acting in isolation. We therefore construct a large-scale knowledge graph based on the close relationships among risk factors reported in the literature, and mine new potential key risk factors from the common risk factors through a subgraph mining method. A lesion prediction algorithm is then proposed to predict the likelihood of lesions in patients based on the set of key risk factors. Experimental results show that the method circumvents the problem of the large number of missing values in EMR data and discovers key risk factors that are easily ignored yet have better predictive effect; it therefore achieves better accuracy in predicting cervical lesions. Full article
(This article belongs to the Special Issue State-of-the-Art of Knowledge Graphs and Their Applications)

23 pages, 1470 KB  
Article
Progressive Multiple Alignment of Graphs
by Marcos E. González Laffitte and Peter F. Stadler
Algorithms 2024, 17(3), 116; https://doi.org/10.3390/a17030116 - 11 Mar 2024
Cited by 2 | Viewed by 3567
Abstract
The comparison of multiple (labeled) graphs with unrelated vertex sets is an important task in diverse areas of applications. Conceptually, it is often closely related to multiple sequence alignments since one aims to determine a correspondence, or more precisely, a multipartite matching between the vertex sets. There, the goal is to match vertices that are similar in terms of labels and local neighborhoods. Alignments of sequences and ordered forests, however, have a second aspect that does not seem to be considered for graph comparison, namely the idea that an alignment is a superobject from which the constituent input objects can be recovered faithfully as well-defined projections. Progressive alignment algorithms are based on the idea of computing multiple alignments as a pairwise alignment of the alignments of two disjoint subsets of the input objects. Our formal framework guarantees that alignments have compositional properties that make alignments of alignments well-defined. The various similarity-based graph matching constructions do not share this property and solve substantially different optimization problems. We demonstrate that optimal multiple graph alignments can be approximated well by means of progressive alignment schemes. The solution of the pairwise alignment problem is reduced formally to computing maximal common induced subgraphs. Similar to the ambiguities arising from consecutive indels, pairwise alignments of graph alignments require the consideration of ambiguous edges that may appear between alignment columns with complementary gap patterns. We report a simple reference implementation in Python/NetworkX intended to serve as a starting point for further developments. The computational feasibility of our approach is demonstrated on test sets of small graphs that mimic, in particular, applications to molecular graphs. Full article
(This article belongs to the Special Issue Graph Algorithms and Graph Labeling)
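The pairwise core the authors reduce to, computing a maximal common induced subgraph, can be illustrated for tiny graphs with a brute-force search (an illustrative stand-in, not the paper's Python/NetworkX implementation):

```python
from itertools import combinations, permutations

def max_common_induced_subgraph(g, h):
    """Brute-force maximum common induced subgraph of two small graphs,
    each a dict node -> set of neighbours. Returns a g-to-h node mapping
    that preserves both adjacency and non-adjacency."""
    gn, hn = list(g), list(h)
    for k in range(min(len(gn), len(hn)), 0, -1):   # try largest size first
        for gs in combinations(gn, k):
            for hs in permutations(hn, k):
                if all((gs[j] in g[gs[i]]) == (hs[j] in h[hs[i]])
                       for i in range(k) for j in range(i + 1, k)):
                    return dict(zip(gs, hs))
    return {}

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
mapping = max_common_induced_subgraph(triangle, path)  # a single shared edge
```

The factorial blow-up of this enumeration is exactly why progressive schemes and the column bookkeeping the paper formalizes are needed beyond toy sizes.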
