MDPI - Publisher of Open Access Journals

24 pages, 4158 KB

Open Access Article

Federated Learning and Data Mining-Based Botnet Attack Detection Framework for Internet of Things

by Kalupahana Liyanage Kushan Sudheera, Lokuge Lehele Gedara Madhuwantha Priyashan, Oruthota Arachchige Sanduni Pavithra, Malwaththe Widanalage Tharindu Aththanayake, Piyumi Bhagya Sudasinghe, Wijethunga Gamage Chatum Aloj Sankalpa, Gammana Guruge Nadeesha Sandamali and Peter Han Joo Chong

Sensors 2026, 26(5), 1573; https://doi.org/10.3390/s26051573 - 2 Mar 2026

Viewed by 240

Abstract

Botnet attacks in Internet of Things (IoT) environments often occur as multi-stage campaigns, making early and reliable detection difficult across distributed and privacy-sensitive networks. Centralized detection approaches are often limited by heterogeneous traffic characteristics, severe data imbalance, and the need to aggregate large volumes of raw network data, raising scalability and privacy concerns. To address these challenges, this paper proposes FDA, a federated learning-based and data mining-driven framework for stage-aware botnet attack detection in IoT networks. FDA operates at network gateways, where anomalous traffic is first detected and then abstracted into compact and interpretable patterns using Frequent Itemset Mining (FIM). This pattern-based representation reduces noise and local traffic bias, enabling more robust learning across different IoT networks. Lightweight neural network models are trained locally at gateways, and a global model is learned through federated aggregation of model parameters, avoiding direct sharing of raw network data while enabling gateways to collaboratively learn evolving attack patterns across different IoT networks. Experimental results show that FDA achieves anomaly detection F1-scores above 99% across all gateways and multi-stage botnet attack classification F1-scores in the range of 48–49%, which are comparable to centralized machine-learning baselines while operating under decentralized and privacy-preserving constraints. Overall, FDA provides a practical, privacy-preserving, and effective solution for distributed botnet attack stage detection in real-world IoT deployments.
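
The federated aggregation of model parameters mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes sample-count-weighted averaging in the style of FedAvg; the abstract does not say which aggregation rule FDA uses, and the function name `federated_average` and the flat per-layer parameter lists are hypothetical.

```python
from typing import Dict, List, Sequence


def federated_average(
    local_params: Sequence[Dict[str, List[float]]],
    sample_counts: Sequence[int],
) -> Dict[str, List[float]]:
    """Average per-gateway parameters, weighted by local sample counts.

    Assumption: every gateway reports the same parameter names and shapes.
    Only parameters are exchanged; raw traffic never leaves a gateway.
    """
    if not local_params or len(local_params) != len(sample_counts):
        raise ValueError("need exactly one sample count per gateway model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Dict[str, List[float]] = {}
    for name in local_params[0]:
        acc = [0.0] * len(local_params[0][name])
        for params, n in zip(local_params, sample_counts):
            weight = n / total  # gateways with more data contribute more
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        global_params[name] = acc
    return global_params
```

With two gateways holding 1 and 3 samples, the second gateway's parameters receive a weight of 0.75 in the global model.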

(This article belongs to the Special Issue Feature Papers in Communications Section 2025–2026)

34 pages, 9590 KB

Open Access Article

Selecting Feature Subsets in Continuous Flow Network Attack Traffic Big Data Using Incremental Frequent Pattern Mining

by Sikha S. Bagui, Andrew Benyacko, Dustin Mink, Subhash C. Bagui and Arijit Bagchi

Algorithms 2025, 18(12), 795; https://doi.org/10.3390/a18120795 - 16 Dec 2025

Viewed by 424

Abstract

This work focuses on finding frequent patterns in continuous flow network traffic Big Data using incremental frequent pattern mining. A newly created Zeek Conn Log dataset labeled with the MITRE ATT&CK framework, UWF-ZeekData24, generated using the Cyber Range at The University of West Florida, was used for this study. While FP-Growth is effective for static datasets, its standard implementation does not support incremental mining, which poses challenges for applications involving continuously growing data streams, such as network traffic logs. To overcome this limitation, a staged incremental FP-Growth approach is adopted for this work. The novelty of this work is in showing how incremental FP-Growth can be used efficiently on continuous flow network traffic, or streaming network traffic data, where no rebuild is necessary when new transactions are scanned and integrated. Incremental frequent pattern mining also generates feature subsets that are useful for understanding the nature of the individual attack tactics. Hence, a detailed understanding of the features or feature subsets of the seven different MITRE ATT&CK tactics is also presented. For example, the results indicate that core behavioral rules, such as those involving TCP protocols and service associations, emerge early and remain stable throughout later increments. The incremental FP-Growth framework provides a structured lens through which network behaviors can be observed and compared over time, supporting not only classification but also investigative use cases such as anomaly tracking and technique attribution. Finally, the frequent itemsets produced by this work will be useful for intrusion detection algorithms based on machine learning and artificial intelligence.

36 pages, 2906 KB

Open Access Review

Data Organisation for Efficient Pattern Retrieval: Indexing, Storage, and Access Structures

by Paraskevas Koukaras and Christos Tjortjis

Big Data Cogn. Comput. 2025, 9(10), 258; https://doi.org/10.3390/bdcc9100258 - 13 Oct 2025

Cited by 1 | Viewed by 2439

Abstract

The increasing scale and complexity of data mining outputs, such as frequent itemsets, association rules, sequences, and subgraphs, have made efficient pattern retrieval a critical yet underexplored challenge. This review addresses the organisation, indexing, and access strategies that enable scalable and responsive retrieval of structured patterns. We examine the underlying types of data and pattern outputs, common retrieval operations, and the variety of query types encountered in practice. Key indexing structures are surveyed, including prefix trees, inverted indices, hash-based approaches, and bitmap-based methods, each suited to different pattern representations and workloads. Storage designs are discussed with attention to metadata annotation, format choices, and redundancy mitigation. Query optimisation strategies are reviewed, emphasising index-aware traversal, caching, and ranking mechanisms. This paper also explores scalability through parallel, distributed, and streaming architectures, and surveys current systems and tools that integrate mining and retrieval capabilities. Finally, we outline pressing challenges and emerging directions, such as supporting real-time and uncertainty-aware retrieval, and enabling semantic, cross-domain pattern access. Additional frontiers include privacy-preserving indexing and secure query execution, along with integration of repositories into machine learning pipelines for hybrid symbolic–statistical workflows. We further highlight the need for dynamic repositories, probabilistic semantics, and community benchmarks to ensure that progress is measurable and reproducible across domains. This review provides a comprehensive foundation for designing next-generation pattern retrieval systems that are scalable, flexible, and tightly integrated into analytic workflows. The analysis and roadmap offered are relevant across application areas including finance, healthcare, cybersecurity, and retail, where robust and interpretable retrieval is essential.

25 pages, 4023 KB

Open Access Article

Recursive Queried Frequent Patterns Algorithm: Determining Frequent Pattern Sets from Database

by Ishtiyaq Ahmad Khan, Hsin-Yuan Chen, Shamneesh Sharma and Chetan Sharma

Information 2025, 16(9), 746; https://doi.org/10.3390/info16090746 - 28 Aug 2025

Viewed by 1176

Abstract

Frequent pattern mining is a fundamental method for Data Mining, applicable in market basket analysis, recommendation systems, and academic analytics. Widely adopted and foundational algorithms such as Apriori and FP-Growth, which represent the standard approaches in frequent pattern mining, face limitations related to candidate set generation and memory usage, especially when applied to extensive relational datasets. This work presents the Recursive Queried Frequent Patterns (RQFP) algorithm, an SQL-based approach that utilizes recursive queries on relational Mining Tables to detect frequent itemsets without the need for explicit candidate generation. The algorithm was implemented using Microsoft SQL Server and demonstrated through a custom-developed C# web application interface. RQFP facilitates easy integration with database systems and enhances result interpretability. Comparative analyses against Apriori and FP-Growth on an academic dataset reveal competitive efficacy, accompanied by reduced memory requirements and enhanced clarity in pattern extraction. The paper further contextualizes RQFP using benchmark datasets from the previous literature and delineates a roadmap for future evaluations in healthcare and retail data. The existing implementation is educational, although the technique demonstrates the potential for scalable, database-native pattern mining.

(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

15 pages, 405 KB

Open Access Article

Theoretical Properties of Closed Frequent Itemsets in Frequent Pattern Mining

by Huina Zhang, Hui Li, Yumei Li, Guangqiang Teng and Xianbing Cao

Mathematics 2025, 13(11), 1709; https://doi.org/10.3390/math13111709 - 23 May 2025

Cited by 1 | Viewed by 1044

Abstract

Closed frequent itemsets (CFIs) play a crucial role in frequent pattern mining by providing a compact and complete representation of all frequent itemsets (FIs). This study systematically explores the theoretical properties of CFIs by revisiting closure operators and their fundamental definitions. A series of formal properties and rigorous proofs are presented to improve the theoretical understanding of CFIs. Furthermore, we propose confidence interval-based closed frequent itemsets (CICFIs) by integrating frequent pattern mining with probability theory. To evaluate stability, three classical confidence interval (CI) estimation methods for relative support (rsup) are introduced: the Wald CI, the Wilson CI, and the Clopper–Pearson CI. Extensive experiments on both an illustrative example and two real datasets are conducted to validate the theoretical properties. The results demonstrate that CICFIs effectively enhance the robustness and interpretability of frequent pattern mining under uncertainty. These contributions not only reinforce the solid theoretical foundation of CFIs but also provide practical insights for the development of more efficient algorithms in frequent pattern mining.
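
As an illustration of the interval estimates named in this abstract, the following Python sketch computes Wald and Wilson confidence intervals for the relative support of an itemset, treating each transaction as a Bernoulli trial. The function names and the default `z = 1.96` (roughly a 95% level) are assumptions made for this example and are not taken from the paper. The Clopper–Pearson interval is omitted because it needs a Beta quantile function, which the Python standard library does not provide.

```python
import math
from typing import Tuple


def wald_ci(count: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald interval for relative support count / n, clipped to [0, 1]."""
    p = count / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_ci(count: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for relative support count / n."""
    p = count / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

The Wald interval collapses to a single point when an itemset appears in none or all of the transactions, whereas the Wilson interval still has positive width in those cases.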

(This article belongs to the Special Issue Advances in Statistical AI and Causal Inference)

20 pages, 1498 KB

Open Access Article

Efficient Discovery of Association Rules in E-Commerce: Comparing Candidate Generation and Pattern Growth Techniques

by Ioan Daniel Hunyadi, Nicolae Constantinescu and Oana-Adriana Țicleanu

Appl. Sci. 2025, 15(10), 5498; https://doi.org/10.3390/app15105498 - 14 May 2025

Cited by 3 | Viewed by 4905

Abstract

Association rule mining plays a critical role in uncovering item correlations and hidden patterns within transactional data, particularly in e-commerce environments. Despite the widespread use of Apriori and FP-Growth algorithms, few studies offer a statistically rigorous, tool-based comparison of their performance on real-world e-commerce data. This paper addresses this gap by evaluating both algorithms in terms of execution time, memory consumption, rule generation volume, and rule strength (support, confidence, and lift). Implementations in RapidMiner and an analysis through SPSS establish statistically significant performance differences, particularly under varying support thresholds. Our findings confirm that FP-Growth consistently outperforms Apriori for large-scale datasets due to its ability to bypass candidate generation, while Apriori retains pedagogical and small-scale relevance. The study contributes practical guidance for data scientists and e-commerce practitioners choosing suitable rule-mining techniques based on their data size and performance constraints.
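
The three rule-strength measures named in this abstract (support, confidence, and lift) follow their standard textbook definitions, sketched below in Python. The function name `rule_metrics` and the toy basket data in the usage note are hypothetical and are not drawn from the study.

```python
from typing import Dict, Iterable, List, Set


def rule_metrics(
    transactions: List[Iterable[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a: Set[str] = set(antecedent)
    c: Set[str] = set(consequent)
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        raise ValueError("no transactions")

    count_a = sum(1 for t in baskets if a <= t)
    count_c = sum(1 for t in baskets if c <= t)
    count_ac = sum(1 for t in baskets if (a | c) <= t)

    support = count_ac / n                                 # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0    # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0  # P(C | A) / P(C)
    return {"support": support, "confidence": confidence, "lift": lift}
```

For the four baskets {bread, milk}, {bread, butter}, {bread, milk, butter}, and {milk}, the rule bread -> milk has support 0.5 and confidence 2/3; its lift is below 1, so bread and milk co-occur less often than independence would predict.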

21 pages, 2616 KB

Open Access Article

Association Analysis of Benzo[a]pyrene Concentration Using an Association Rule Algorithm

by Minyi Wang and Takayuki Kameda

Air 2025, 3(2), 15; https://doi.org/10.3390/air3020015 - 12 May 2025

Viewed by 1307

Abstract

Benzo[a]pyrene (B(a)P) is an important indicator of polycyclic aromatic hydrocarbon pollution that exhibits complex atmospheric dynamics influenced by meteorological factors and suspended particulate matter (SPM). Herein, the factors influencing B(a)P concentration were elucidated by analyzing the monthly environmental data for Kyoto, Japan, from 2001 to 2021 using an improved association rule algorithm. Results revealed that B(a)P concentrations were 1.3–3 times higher in cold seasons than in warm seasons and SPM concentrations were lower in cold seasons. The clustering performance was enhanced by optimizing the K-means method using the sum of squared error. The efficiency and reliability of the traditional Apriori algorithm were enhanced by restructuring its candidate itemset generation process, specifically by (1) generating C₂ exclusively from frequent itemset L₁ to avoid redundant database scans and (2) implementing the iterative pruning of nonfrequent subsets during Lₖ → Cₖ₊₁ transitions, adding the lift parameter, and eliminating invalid rules. Strong association rules revealed that B(a)P concentrations ≤ 0.185 ng/m³ were associated with specific meteorological conditions, including humidity ≤ 58%, wind speed ≥ 2 m/s, temperature ≥ 12.3 °C, and pressure ≤ 1009.2 hPa. Among these, changes in pressure had the most substantial impact on the confidence of the association rules, followed by humidity, wind speed, and temperature. Under the influence of high SPM concentrations, favorable meteorological conditions further accelerated pollutant dispersion. B(a)P concentration increased with increasing pressure, decreasing temperature, and decreasing wind speed. Principal component analysis confirmed the robustness and accuracy of our optimized association rule approach in quantifying complex, nonlinear relationships, while providing granular, interpretable insights beyond the traditional methods.
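
The candidate generation and subset pruning referred to in this abstract can be sketched with the textbook Apriori step below. This is the standard join-and-prune procedure, not the authors' improved variant, whose details the abstract does not give; the function name `generate_candidates` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set


def generate_candidates(
    frequent_k: Set[FrozenSet[str]], k: int
) -> Set[FrozenSet[str]]:
    """Build size-(k+1) candidates from frequent k-itemsets.

    Join step: merge two frequent k-itemsets that differ in one item.
    Prune step: drop a candidate if any of its k-subsets is not frequent,
    since no superset of an infrequent itemset can be frequent.
    """
    candidates: Set[FrozenSet[str]] = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue
            if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                candidates.add(union)
    return candidates
```

Because the candidates are derived only from itemsets already known to be frequent, no database scan is needed until the surviving candidates have their support counted.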

18 pages, 3640 KB

Open Access Article

Accident Factors Importance Ranking for Intelligent Energy Systems Based on a Novel Data Mining Strategy

by Rongbin Li, Jian Zhang and Fangming Deng

Energies 2025, 18(3), 716; https://doi.org/10.3390/en18030716 - 4 Feb 2025

Cited by 1 | Viewed by 1035

Abstract

As global energy networks expand and smart grid technology evolves rapidly, the volume of historical power accident data has increased dramatically, containing valuable risk information that is essential for building efficient public safety early warning systems. This paper introduces an innovative text analysis method, the Sparse Coefficient Optimized Weighted FP-Growth Algorithm (SCO-WFP), which is designed to optimize the processing of power accident-related textual data and more effectively uncover hidden patterns behind accidents. The method enhances the evaluation of sparse risk factors by preprocessing, clustering analysis, and calculating piecewise weights of power accident data. The SCO-WFP algorithm is then applied to extract frequent itemsets, revealing deep associations between accident severity and risk factors. Experimental results show that, compared to traditional methods, the SCO-WFP algorithm significantly improves both accuracy and execution speed. The findings demonstrate the method’s effectiveness in mining frequent itemsets from text semantics, facilitating a deeper understanding of the relationship between risk factors and accident severity.

(This article belongs to the Special Issue AI Facilitated Cyber–Physical Energy Systems—Planning, Operation, and Markets)

26 pages, 3562 KB

Open Access Article

A Spatial-Temporal Exploration of Coordination Failures Preceding Coal Mine Explosion Accidents in China

by Wenwen Li, Gu Du, Lu Chen, Ruochen Zhang and An Chen

Sustainability 2025, 17(1), 85; https://doi.org/10.3390/su17010085 - 26 Dec 2024

Cited by 1 | Viewed by 1894

Abstract

Coal remains a crucial component of China’s energy supply, with production exceeding half of the global output in 2023. Despite safety improvements, the fatality rate in coal mining rose significantly, underscoring ongoing safety challenges. A total of 174 coal mine explosion investigation reports from China between 2000 and 2024 were analyzed, extracting and mining text related to coordination failures. The texts were categorized by time and region, creating two temporal datasets (2000–2018 and 2019–2024) and six regional datasets (Northeast, East, Central South, Southwest, Northwest, and North China). Using frequent itemset mining and social network construction, the concept of risk propagation was applied to identify the critical paths that lead to coal mine explosions. Over time, coordination failures in China’s coal mine explosions have evolved from localized issues among a few stakeholders to complex, multi-layered challenges involving broader governmental oversight and systemic management issues. Based on regional findings, balancing the frequency and severity of penalties, ensuring meaningful safety inspections, and alleviating the policy pressure on small coal mines are key points for addressing coordination failures.

(This article belongs to the Special Issue Sustainable Risk Management)

29 pages, 1577 KB

Open Access Article

DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining

by Mohsin Shaikh, Sabina Akram, Jawad Khan, Shah Khalid and Youngmoon Lee

Mathematics 2024, 12(24), 3930; https://doi.org/10.3390/math12243930 - 13 Dec 2024

Cited by 3 | Viewed by 1555

Abstract

Traditional approaches to data mining are generally designed for small, centralized, and static datasets. However, when a dataset grows at an enormous rate, the algorithms become infeasible in terms of huge consumption of computational and I/O resources. Frequent itemset mining (FIM) is one of the key algorithms in data mining and finds applications in a variety of domains; however, traditional algorithms do face problems in efficiently processing large and dynamic datasets. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that tackles the mentioned challenges using shard-based approximation within the MapReduce framework. DIAFM minimizes the computational overhead of a program by reducing dataset scans, bypassing exact support checks, and incorporating shard-level error thresholds for an appropriate trade-off between efficiency and accuracy. Extensive experiments have demonstrated that DIAFM reduces runtime by 40–60% compared to traditional methods with losses in accuracy within 1–5%, even for datasets over 500,000 transactions. Its incremental nature ensures that new data increments are handled efficiently without needing to reprocess the entire dataset, making it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and establish it as a competitive and efficient solution for mining frequent itemsets in distributed, dynamic environments.

(This article belongs to the Special Issue Advances in Mathematical Methods for Distributed Learning and High-Dimensional Data Analysis)

26 pages, 5233 KB

Open Access Article

Prompt Update Algorithm Based on the Boolean Vector Inner Product and Ant Colony Algorithm for Fast Target Type Recognition

by Quan Zhou, Jie Shi, Qi Wang, Bin Kong, Shang Gao and Weibo Zhong

Electronics 2024, 13(21), 4243; https://doi.org/10.3390/electronics13214243 - 29 Oct 2024

Viewed by 1372

Abstract

In recent years, data mining technology has become increasingly popular, evolving into an independent discipline as research deepens. This study constructs and optimizes an association rule algorithm based on the Boolean vector (BV) inner product and ant colony optimization to enhance data mining efficiency. Frequent itemsets are extracted from the database by establishing BV and performing vector inner product operations. These frequent itemsets form the problem space for the ant colony algorithm, which generates the maximum frequent itemset. Initially, data from the total scores of players during the 2022–2024 regular season was analyzed to obtain the optimal lineup. The results obtained from the Apriori algorithm (AA) were used as a standard for comparison with the Confidence-Debiased Adversarial Fuzzy Apriori Method (CDAFAM), the AA based on deep learning (DL), and the proposed algorithm regarding their results and required time. A dataset of disease symptoms was then used to determine diseases based on symptoms, comparing accuracy and time against the original database as a standard. Finally, simulations were conducted using five batches of radar data from the observation platform to compare the time and accuracy of the four algorithms. The results indicate that both the proposed algorithm and the AA based on DL achieve approximately 10% higher accuracy compared with the traditional AA. Additionally, the proposed algorithm requires only about 25% of the time needed by the traditional AA and the AA based on DL for target recognition. Although the CDAFAM has a similar processing time to the proposed algorithm, its accuracy is lower. These findings demonstrate that the proposed algorithm significantly improves the accuracy and speed of target recognition.
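
The idea of counting support through Boolean vector inner products, mentioned in this abstract, can be sketched generically in Python. Each item is mapped to a 0/1 vector over the transactions, and the support count of an itemset is the sum of the element-wise products of its item vectors (the ordinary inner product when the itemset has two items). How the paper builds and combines its vectors is not stated in the abstract, so the helper names `item_vectors` and `support_count` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set


def item_vectors(
    transactions: List[Set[str]], items: Iterable[str]
) -> Dict[str, List[int]]:
    """Map each item to a 0/1 vector with one entry per transaction."""
    return {item: [1 if item in t else 0 for t in transactions] for item in items}


def support_count(vectors: Dict[str, List[int]], itemset: Iterable[str]) -> int:
    """Sum the element-wise products of the item vectors in the itemset."""
    chosen = [vectors[item] for item in itemset]
    if not chosen:
        raise ValueError("itemset must contain at least one item")
    # A row contributes 1 only when every chosen item occurs in that transaction.
    return sum(math.prod(row) for row in zip(*chosen))
```

Once the vectors are built, each support count is computed from them directly, without rescanning the original transactions.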

(This article belongs to the Special Issue Knowledge Representation and Reasoning in Artificial Intelligence)

22 pages, 4405 KB

Open Access Article

State Evaluation of Electrical Equipment in Substations Based on Data Mining

by Ding Dang, Yi Liu and Seon-Keun Lee

Appl. Sci. 2024, 14(16), 7348; https://doi.org/10.3390/app14167348 - 20 Aug 2024

Cited by 5 | Viewed by 1998

Abstract

This paper explores a combined data mining-based state evaluation method for electrical equipment in substations and analyzes its effectiveness and accuracy. First, a Gaussian mixture model is applied to fit all raw data of electrical equipment. The Expectation Maximization algorithm summarizes the data distribution characteristics and identifies outliers. The Apriori algorithm is then employed for data mining to derive frequent itemsets and association rules between equipment quality and measurement data. For new equipment samples, conditional probabilities of each feature are independently calculated and combined to classify and evaluate equipment quality. The results suggest that equipment reliability in smart substations can be inferred from historical and real-time operational data using improved association rule algorithms and Naive Bayes classifiers. Finally, the proposed method was applied to analyze statistical data from a 110 kV substation of a power supply company. The state prediction accuracy exceeded 95% when compared with actual equipment quality. The effectiveness evaluation metrics demonstrated that this method outperforms single-category algorithms in terms of accuracy and discrimination ability.

(This article belongs to the Special Issue Electric Power Applications II)

20 pages, 6636 KB

Open Access Article

Feature Detection Based on Imaging and Genetic Data Using Multi-Kernel Support Vector Machine–Apriori Model

by Zhixi Hu, Congye Tang, Yingxia Liang, Senhao Chang, Xinyue Ni, Shasha Xiao, Xianglian Meng, Bing He and Wenjie Liu

Mathematics 2024, 12(5), 684; https://doi.org/10.3390/math12050684 - 26 Feb 2024

Cited by 6 | Viewed by 2243

Abstract

Alzheimer’s disease (AD) is a significant neurological disorder characterized by progressive cognitive decline and memory loss. One essential task is understanding the molecular mechanisms underlying brain disorders of AD. Detecting biomarkers that contribute significantly to the classification of AD is an effective means to accomplish this essential task. However, most machine learning methods used to detect AD biomarkers require lengthy training and are unable to rapidly and effectively detect AD biomarkers. To detect biomarkers for AD accurately and efficiently, we proposed a novel approach using the Multi-Kernel Support Vector Machine (SVM) with Apriori algorithm to mine strongly associated feature sets from functional magnetic resonance imaging (fMRI) and gene expression profiles. Firstly, we downloaded the imaging data and genetic data of 121 participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and transformed gene sequences into labeled sequences by encoding the four types of bases (A, T, C, and G) into distinct labels. Subsequently, we extracted the first 130 temporal sequences of brain regions and employed Pearson correlation analysis to construct “brain region gene pairs”. The integration of these data allowed us to explore the correlations between genes and brain regions. To improve classification accuracy and feature selection, we applied the Apriori algorithm to the multi-kernel SVM, dynamically building feature combinations and continuously validating classification results. By iteratively generating frequent itemsets, we obtained important brain region gene pairs. Experimental results show the effectiveness of our proposed approach. The Multi-Kernel SVM with Apriori model achieves an accuracy of 92.9%, precision of 95%, and an F1 score of 95% in classifying brain region-gene pairs within the AD–Late mild cognitive impairment (AD-LMCI) group. The amygdala, BIN1, RPN2, and IL15 associated with AD have been identified and demonstrate potential in identifying potential pathogenic factors of AD. The selected brain regions and associated genes may serve as valuable biomarkers for early AD diagnosis and better understanding of the disease’s molecular mechanisms. The integration of fMRI and gene data using the Multi-Kernel SVM–Apriori model holds great potential for advancing our knowledge of brain function and the genetic basis of neurological disorders. This approach provides a valuable tool for neuroscientists and researchers in the field of genomics and brain imaging studies.

(This article belongs to the Section E1: Mathematics and Computer Science)

19 pages, 481 KB

Open Access Article

An Efficient Bit-Based Approach for Mining Skyline Periodic Itemset Patterns

by Yanzhi Li and Zhanshan Li

Electronics 2023, 12(23), 4874; https://doi.org/10.3390/electronics12234874 - 3 Dec 2023

Viewed by 1628

Abstract

Periodic itemset patterns (PIPs) are widely used in predicting the occurrence of periodic events. However, extensive redundancy arises due to a large number of patterns. Mining skyline periodic itemset patterns (SPIPs) can reduce the number of PIPs and guarantee the accuracy of prediction. The existing SPIP mining algorithm uses FP-Growth to generate frequent patterns (FPs) and then identifies SPIPs from the FPs. Such separate steps lead to a massive time consumption, so we propose an efficient bit-based approach named BitSPIM to mine SPIPs. The proposed method introduces efficient bitwise representations and makes full use of the data obtained in the previous steps to accelerate the identification of SPIPs. A novel cutting mechanism is applied to eliminate unnecessary steps. A series of comparative experiments were conducted on various datasets with different attributes to verify the efficiency of BitSPIM. The experiment results demonstrate that our algorithm significantly outperforms the latest SPIP mining approach.

19 pages, 6921 KB

Open Access Article

Hypergraph-Clustering Method Based on an Improved Apriori Algorithm

by Rumeng Chen, Feng Hu, Feng Wang and Libing Bai

Appl. Sci. 2023, 13(19), 10577; https://doi.org/10.3390/app131910577 - 22 Sep 2023

Cited by 5 | Viewed by 2508

Abstract

With the complexity and variability of data structures and dimensions, traditional clustering algorithms face various challenges. The integration of network science and clustering has become a popular field of exploration. One of the main challenges is how to handle large-scale and complex high-dimensional data effectively. Hypergraphs can accurately represent multidimensional heterogeneous data, making them important for improving clustering performance. In this paper, we propose a hypergraph-clustering method dubbed the “high-dimensional data clustering method” based on hypergraph partitioning using an improved Apriori algorithm (HDHPA). First, the method constructs a hypergraph based on the improved Apriori association rule algorithm, where frequent itemsets existing in high-dimensional data are treated as hyperedges. Then, different frequent itemsets are mined in parallel to obtain hyperedges with corresponding ranks, avoiding the generation of redundant rules and improving mining efficiency. Next, we use the dense subgraph partition (DSP) algorithm to divide the hypergraph into multiple subclusters. Finally, we merge the subclusters through dense sub-hypergraphs to obtain the clustering results. The advantage of this method lies in its use of the hypergraph model to discretize the association between data in space, which further enhances the effectiveness and accuracy of clustering. We comprehensively compare the proposed HDHPA method with several advanced hypergraph-clustering methods using seven different types of high-dimensional datasets and then compare their running times. The results show that the clustering evaluation index values of the HDHPA method are generally superior to all other methods. The maximum ARI value can reach 0.834, an increase of 42%, and the average running time is lower than other methods. Overall, HDHPA exhibits excellent, competitive performance on multiple real networks. The research results of this paper provide an effective solution for processing and analyzing large-scale network datasets and are also conducive to broadening the application range of clustering techniques.

Search Results (50)
