MDPI - Publisher of Open Access Journals

9 pages, 586 KB

Open Access Proceeding Paper

An Efficient Algorithm for Mining Top-k High-On-Shelf-Utility Itemsets with Positive/Negative Profits of Local/Global Minimum Count

by Ye-In Chang, Po-Chun Chuang, Yu-Hao Liao, Po-Yu Hu and Ting-Wei Chen

Eng. Proc. 2025, 108(1), 45; https://doi.org/10.3390/engproc2025108045 - 16 Sep 2025

Viewed by 409

Abstract

High-utility itemset mining (HUIM) uses a threshold value to extract high-utility itemsets (HUIs) from a transactional database. However, it is difficult to define an optimal threshold value, since it depends on the domain knowledge of the application. Top-k HUIM avoids this problem: the user defines a value k, which represents the number of HUIs to return. Moreover, some itemsets occur only in specific time intervals and can still become HUIs. Since traditional HUIM algorithms do not consider the time intervals of transactions, they cannot be used directly. Therefore, high-on-shelf-utility itemset mining (HOUIM) is used to address this problem in this study. The proportion of the utility value of the item in all of the time intervals with the itemset is used for determining whether the itemset is a HOUI or not. For top-k HOUIM, the KOSHU algorithm relies on a data structure that ignores items with negative profit when overestimating the utility of an itemset. The KOSHU algorithm needs less processing time. However, it has to scan the database twice and sort the database once. Therefore, we developed an efficient algorithm based on the TIPN table to mine top-k HOUIs. The developed data structures include the TIPN and MINC tables, the IO Bitmap, and TIUL. In the TIPN table, we recorded positive items, positive utilities, negative items, and negative counts. The MINC table stores the local/global counts of all of the items with negative profits. The algorithm scans the database only once and is more efficient than the KOSHU algorithm.

(This article belongs to the Proceedings of 2025 IEEE 5th International Conference on Electronic Communications, Internet of Things and Big Data)

12 pages, 1814 KB

Open Access Proceeding Paper

An Efficient Approach for Mining High Average-Utility Itemsets in Incremental Database

by Ye-In Chang, Chen-Chang Wu and Hsiang-En Kuo

Eng. Proc. 2025, 108(1), 32; https://doi.org/10.3390/engproc2025108032 - 5 Sep 2025

Viewed by 4961

Abstract

Traditional high-utility itemset (HUI) mining methods tend to overestimate utility for long itemsets, leading to biased results. High average-utility itemset (HAUI) mining addresses this problem by normalizing utility with itemset length. However, uniform utility thresholds fail to account for varying item importance. Recently, HAUI mining with multiple minimum utility thresholds (MMU) has been used for flexible utility evaluation. While the generalized HAUIM (GHAUIM) algorithm performs well, it requires two database scans and is limited to static datasets. Therefore, we developed a novel tree-based method that scans the database only once and improves efficiency by reducing storage and eliminating costly join operations. Additionally, pruning strategies and incremental updates were introduced to enhance scalability. The developed method outperformed GHAUIM in efficiency.
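
The normalization by itemset length mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of average utility (total utility divided by the number of items in the itemset); the data, the `profit` table, and the function names are hypothetical and are not taken from the paper.

```python
def itemset_utility(itemset, database, profit):
    """Sum quantity * unit profit over every transaction containing the itemset."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * profit[item] for item in itemset)
    return total


def average_utility(itemset, database, profit):
    """Utility normalized by itemset length (assumed definition of average utility)."""
    return itemset_utility(itemset, database, profit) / len(itemset)


# Hypothetical data: each transaction maps item -> purchased quantity.
profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"a": 1, "b": 1, "c": 1}]

print(average_utility(("a",), database, profit))
print(average_utility(("a", "b"), database, profit))
```

Dividing by the length keeps a long itemset from passing a threshold merely because it aggregates many items.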

(This article belongs to the Proceedings of 2025 IEEE 5th International Conference on Electronic Communications, Internet of Things and Big Data)

21 pages, 2616 KB

Open Access Article

Association Analysis of Benzo[a]pyrene Concentration Using an Association Rule Algorithm

by Minyi Wang and Takayuki Kameda

Air 2025, 3(2), 15; https://doi.org/10.3390/air3020015 - 12 May 2025

Viewed by 1004

Abstract

Benzo[a]pyrene (B(a)P) is an important indicator of polycyclic aromatic hydrocarbon pollution that exhibits complex atmospheric dynamics influenced by meteorological factors and suspended particulate matter (SPM). Herein, the factors influencing B(a)P concentration were elucidated by analyzing the monthly environmental data for Kyoto, Japan, from 2001 to 2021 using an improved association rule algorithm. Results revealed that B(a)P concentrations were 1.3–3 times higher in cold seasons than in warm seasons and SPM concentrations were lower in cold seasons. The clustering performance was enhanced by optimizing the K-means method using the sum of squared error. The efficiency and reliability of the traditional Apriori algorithm were enhanced by restructuring its candidate itemset generation process, specifically by (1) generating C₂ exclusively from frequent itemset L₁ to avoid redundant database scans and (2) implementing the iterative pruning of nonfrequent subsets during Lₖ → Cₖ₊₁ transitions, adding the lift parameter, and eliminating invalid rules. Strong association rules revealed that B(a)P concentrations ≤ 0.185 ng/m³ were associated with specific meteorological conditions, including humidity ≤ 58%, wind speed ≥ 2 m/s, temperature ≥ 12.3 °C, and pressure ≤ 1009.2 hPa. Among these, changes in pressure had the most substantial impact on the confidence of the association rules, followed by humidity, wind speed, and temperature. Under the influence of high SPM concentrations, favorable meteorological conditions further accelerated pollutant dispersion. B(a)P concentration increased with increasing pressure, decreasing temperature, and decreasing wind speed. Principal component analysis confirmed the robustness and accuracy of our optimized association rule approach in quantifying complex, nonlinear relationships, while providing granular, interpretable insights beyond the traditional methods.
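
The candidate generation and pruning steps named in this abstract can be sketched in their textbook form. The code below is plain Apriori with a lift measure, not the authors' improved variant; the transactions and all function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: build C_{k+1} from L_k and prune by nonfrequent subsets."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        # Join step: candidates of size k + 1 come only from frequent k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: drop any candidate that has a nonfrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c for c in candidates if support(c, transactions) >= min_support}
        k += 1
    return frequent


def lift(antecedent, consequent, frequent):
    """Lift of the rule antecedent -> consequent; values above 1 suggest positive association."""
    joint = frequent[antecedent | consequent]
    return joint / (frequent[antecedent] * frequent[consequent])


# Hypothetical transactions over three items.
transactions = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
frequent = frequent_itemsets(transactions, min_support=0.5)
rule_lift = lift(frozenset("A"), frozenset("B"), frequent)
```

Here the pair {A, B} is frequent, yet its lift is below 1, which shows why a lift filter can reject rules that pass the support threshold.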

31 pages, 2778 KB

Open Access Article

Mining High-Efficiency Itemsets with Negative Utilities

by Irfan Yildirim

Mathematics 2025, 13(4), 659; https://doi.org/10.3390/math13040659 - 17 Feb 2025

Cited by 4 | Viewed by 1219

Abstract

High-efficiency itemset mining has recently emerged as a new problem in itemset mining. An itemset is classified as a high-efficiency itemset if its utility-to-investment ratio meets or exceeds a specified efficiency threshold. The goal is to discover all high-efficiency itemsets in a given database. However, solving the problem is computationally complex, due to the large search space involved. To effectively address this problem, several algorithms have been proposed that assume that databases contain only positive utilities. However, real-world databases often contain negative utilities. When the existing algorithms are applied to such databases, they fail to discover the complete set of itemsets, due to their limitations in handling negative utilities. This study proposes a novel algorithm, MHEINU (mining high-efficiency itemset with negative utilities), designed to correctly mine a complete set of high-efficiency itemsets from databases that also contain negative utilities. MHEINU introduces two upper-bounds to efficiently and safely reduce the search space. Additionally, it features a list-based data structure to streamline the mining process and minimize costly database scans. Experimental results on various datasets containing negative utilities showed that MHEINU effectively discovered the complete set of high-efficiency itemsets, performing well in terms of runtime, number of join operations, and memory usage. Additionally, MHEINU demonstrated good scalability, making it suitable for large-scale datasets. Full article
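
The utility-to-investment ratio described in this abstract can be sketched as follows. The sketch assumes that the investment of an itemset is the sum of its items' investments; that assumption, the data, and the names are hypothetical, and the code is not MHEINU itself.

```python
def itemset_utility(itemset, database, unit_utility):
    """Total utility over transactions containing the itemset; unit utilities may be negative."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * unit_utility[item] for item in itemset)
    return total


def efficiency(itemset, database, unit_utility, investment):
    """Utility-to-investment ratio (itemset investment assumed to be the sum over its items)."""
    invested = sum(investment[item] for item in itemset)
    return itemset_utility(itemset, database, unit_utility) / invested


# Hypothetical data: item "b" has a negative unit utility (e.g. sold at a loss).
unit_utility = {"a": 4, "b": -1, "c": 2}
investment = {"a": 10, "b": 2, "c": 5}
database = [{"a": 2, "b": 1}, {"a": 1, "c": 2}, {"b": 3, "c": 1}]

for itemset in [("a",), ("a", "b"), ("b", "c")]:
    print(itemset, efficiency(itemset, database, unit_utility, investment))
```

With negative utilities, extending an itemset can lower its utility, which is why upper bounds designed for positive-only databases no longer prune safely.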

(This article belongs to the Section E1: Mathematics and Computer Science)

19 pages, 6921 KB

Open Access Article

Hypergraph-Clustering Method Based on an Improved Apriori Algorithm

by Rumeng Chen, Feng Hu, Feng Wang and Libing Bai

Appl. Sci. 2023, 13(19), 10577; https://doi.org/10.3390/app131910577 - 22 Sep 2023

Cited by 5 | Viewed by 2266

Abstract

With the complexity and variability of data structures and dimensions, traditional clustering algorithms face various challenges. The integration of network science and clustering has become a popular field of exploration. One of the main challenges is how to handle large-scale and complex high-dimensional data effectively. Hypergraphs can accurately represent multidimensional heterogeneous data, making them important for improving clustering performance. In this paper, we propose a hypergraph-clustering method dubbed the “high-dimensional data clustering method” based on hypergraph partitioning using an improved Apriori algorithm (HDHPA). First, the method constructs a hypergraph based on the improved Apriori association rule algorithm, where frequent itemsets existing in high-dimensional data are treated as hyperedges. Then, different frequent itemsets are mined in parallel to obtain hyperedges with corresponding ranks, avoiding the generation of redundant rules and improving mining efficiency. Next, we use the dense subgraph partition (DSP) algorithm to divide the hypergraph into multiple subclusters. Finally, we merge the subclusters through dense sub-hypergraphs to obtain the clustering results. The advantage of this method lies in its use of the hypergraph model to discretize the association between data in space, which further enhances the effectiveness and accuracy of clustering. We comprehensively compare the proposed HDHPA method with several advanced hypergraph-clustering methods using seven different types of high-dimensional datasets and then compare their running times. The results show that the clustering evaluation index values of the HDHPA method are generally superior to those of all other methods. The maximum ARI value can reach 0.834, an increase of 42%, and the average running time is lower than that of the other methods. Overall, HDHPA exhibits excellent performance on multiple real networks. The research results of this paper provide an effective solution for processing and analyzing large-scale network datasets and are also conducive to broadening the application range of clustering techniques.

22 pages, 2161 KB

Open Access Article

Vehicle Trajectory Prediction Method for Task Offloading in Vehicular Edge Computing

by Ruibin Yan, Yijun Gu, Zeyu Zhang and Shouzhong Jiao

Sensors 2023, 23(18), 7954; https://doi.org/10.3390/s23187954 - 18 Sep 2023

Cited by 8 | Viewed by 2631

Abstract

Real-time computation tasks in vehicular edge computing (VEC) provide convenience for vehicle users. However, the efficiency of task offloading seriously affects the quality of service (QoS). The predictive-mode task offloading is limited by computation resources, storage resources and the timeliness of vehicle trajectory data. Meanwhile, machine learning is difficult to deploy on edge servers. In this paper, we propose a vehicle trajectory prediction method based on the vehicle frequent pattern for task offloading in VEC. First, in the initialization stage, a T-pattern prediction tree (TPPT) is constructed based on the historical vehicle trajectory data. Secondly, when predicting the vehicle trajectory, the vehicle frequent itemset with the largest vehicle trajectory support is found in the vehicle frequent itemset of the TPPT. Finally, in the update stage, the TPPT is updated in real time with the predicted vehicle trajectory results. Meanwhile, based on the proposed prediction method, the strategies of task offloading and optimization algorithm are designed to minimize energy consumption with time constraints. The experiments are carried out on real-vehicle datasets and the Capital Bikeshare datasets. The results show that compared with the baseline T-pattern method, the accuracy of the prediction method is improved by more than 10% and the prediction efficiency is improved by more than 6.5 times. The vehicle trajectory prediction method based on the vehicle frequent pattern has high accuracy and prediction efficiency, which can solve the problem of vehicle trajectory prediction for task offloading. Full article

(This article belongs to the Special Issue Cloud/Edge/Fog Computing for Network and IoT)

21 pages, 3113 KB

Open Access Article

High Dimensional Data Differential Privacy Protection Publishing Method Based on Association Analysis

by Wei Shi, Xiaolei Zhang, Hao Chen and Xing Zhang

Electronics 2023, 12(13), 2779; https://doi.org/10.3390/electronics12132779 - 23 Jun 2023

Cited by 3 | Viewed by 1910

Abstract

To solve the problem of privacy disclosure when publishing high-dimensional data and to protect the privacy of frequent itemsets in association rules, a high-dimensional data publishing method based on frequent itemsets of association rules (PDP Growth) is proposed. Working in a distributed framework, the method uses rough set theory to improve the mining of association rules. It optimizes association analysis while reducing the dimensionality of high-dimensional data, eliminating more redundant attributes and obtaining more concise frequent itemsets. It then uses the exponential mechanism to protect the differential privacy of the simplest frequent itemset obtained, and protects the privacy of the frequent itemset by adding Laplace noise to its support. Theoretical analysis shows that the method satisfies the requirement of differential privacy protection. Experiments on multiple datasets show that this method can improve the efficiency of high-dimensional data mining while meeting privacy protection requirements. Finally, the association analysis results that meet the requirements are published.
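
Adding Laplace noise to a support count, as this abstract mentions, can be sketched with the standard Laplace mechanism. The sensitivity of 1, the privacy budget `epsilon`, and the function names are assumptions for illustration; this is not the PDP Growth implementation.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_support(true_support, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: perturb the count with noise of scale sensitivity / epsilon."""
    return true_support + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical support count of 100 and privacy budget epsilon = 1.0.
rng = random.Random(0)
samples = [noisy_support(100, epsilon=1.0, rng=rng) for _ in range(20000)]
mean_support = sum(samples) / len(samples)
```

A smaller `epsilon` gives a larger noise scale, so stronger privacy comes at the cost of less accurate published supports.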

29 pages, 4508 KB

Open Access Article

Organization Preference Knowledge Acquisition of Multi-Platform Aircraft Mission System Utilizing Frequent Closed Itemset Mining

by Yuqian Wu, Miao Wang, Wenkui Chu and Guoqing Wang

Aerospace 2023, 10(2), 166; https://doi.org/10.3390/aerospace10020166 - 10 Feb 2023

Cited by 2 | Viewed by 2446

Abstract

Organization preference knowledge is critical to enhancing the intelligence and efficiency of the multi-platform aircraft mission system (MPAMS), particularly the collaboration tactics of task behaviors, platform types, and mount resources. However, it is challenging to extract such knowledge concisely, which is buried in massive historical data. Therefore, this paper proposes an innovative data-driven approach via frequent closed itemset mining (FCIM) algorithm to discover valuable MPAMS organizational knowledge. The proposed approach addresses the limitations of poor effectiveness and low mining efficiency for the previously discovered knowledge. To ensure the knowledge effectiveness, this paper designs a multi-layer knowledge discovery framework from the system-of-systems perspective, allowing to discover more systematic knowledge than traditional frameworks considering an isolated layer. Additionally, the MPAMS’s contextual capability reflecting the decision motivation is integrated into the knowledge representation, making the knowledge more intelligible to decision-makers. Further, to ensure mining efficiency, the knowledge mining process is accelerated by designing an itemset storage structure and three pruning strategies for FCIM. The simulation of 1100 air-to-sea assault scenarios has provided abundant knowledge with high interpretability. The performance superiority of the proposed approach is thoroughly verified by comparative experiments. The approach provides guidance and insights for future MPAMS development and organization optimization. Full article

(This article belongs to the Collection Avionic Systems)

11 pages, 255 KB

Open Access Article

Ignoring Internal Utilities in High-Utility Itemset Mining

by Damla Oguz

Symmetry 2022, 14(11), 2339; https://doi.org/10.3390/sym14112339 - 7 Nov 2022

Cited by 3 | Viewed by 1986

Abstract

High-utility itemset mining discovers a set of items that are sold together and have utility values higher than a given minimum utility threshold. The utilities of these itemsets are calculated by considering their internal and external utility values, which correspond, respectively, to the quantity sold of each item in each transaction and profit units. Therefore, internal and external utilities have symmetric effects on deciding whether an itemset is high-utility. The symmetric contributions of both utilities cause two major related challenges. First, itemsets with low external utility values can easily exceed the minimum utility threshold if they are sold extensively. In this case, such itemsets can be found more efficiently using frequent itemset mining. Second, a large number of high-utility itemsets are generated, which can result in interesting or important high-utility itemsets that are overlooked. This study presents an asymmetric approach in which the internal utility values are ignored when finding high-utility itemsets with high external utility values. The experimental results of two real datasets reveal that the external utility values have fundamental effects on the high-utility itemsets. The results of this study also show that this effect tends to increase for high values of the minimum utility threshold. Moreover, the proposed approach reduces the execution time. Full article
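
The symmetric role of internal utility (quantity) and external utility (unit profit) described in this abstract can be sketched as follows. Treating every quantity as 1 when internal utilities are ignored is an assumption about how the asymmetric variant could be computed; the data and names are hypothetical.

```python
def itemset_utility(itemset, database, external, ignore_internal=False):
    """Sum internal * external utility over transactions containing the itemset.

    With ignore_internal=True each occurrence counts as quantity 1 (assumed variant).
    """
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            for item in itemset:
                quantity = 1 if ignore_internal else transaction[item]
                total += quantity * external[item]
    return total


# Hypothetical data: pens sell in bulk at low profit, watches rarely at high profit.
external = {"pen": 1, "watch": 50}
database = [{"pen": 10}, {"pen": 8, "watch": 1}, {"watch": 1}]

print(itemset_utility(("pen",), database, external))
print(itemset_utility(("pen",), database, external, ignore_internal=True))
print(itemset_utility(("watch",), database, external, ignore_internal=True))
```

With a minimum utility threshold of 15, for example, the pen qualifies only when quantities are counted, while the watch qualifies either way.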

(This article belongs to the Special Issue Information Technology and Its Applications 2021)

23 pages, 512 KB

Open Access Article

An Efficient Algorithm for Mining Stable Periodic High-Utility Sequential Patterns

by Shiyong Xie and Long Zhao

Symmetry 2022, 14(10), 2032; https://doi.org/10.3390/sym14102032 - 28 Sep 2022

Cited by 7 | Viewed by 2755

Abstract

Periodic high-utility sequential pattern mining (PHUSPM) is used to extract periodically occurring high-utility sequential patterns (HUSPs) from a quantitative sequence database according to a user-specified minimum utility threshold (minutil). A sequential pattern’s periodicity is determined by measuring when the frequency of its periods (the time between two consecutive happenings of the sequential pattern) exceeds a user-specified maximum periodicity threshold (maxPer). However, due to the strict judgment threshold, the traditional PHUSPM method has the problem that some useful sequential patterns are discarded and the periodic values of some sequential patterns fluctuate greatly (i.e., are unstable). In frequent itemset mining (FIM), some researchers have put forward strategies to solve these problems. Because of the symmetry of frequent itemset patterns (FIPs), these strategies cannot be directly applied to PHUSPM. In order to address these issues, this work proposes the stable periodic high-utility sequential pattern mining (SPHUSPM) algorithm. The contributions made by this paper are as follows. First, we introduce the concept of stability to overcome the abovementioned problems, mine sequential patterns with stable periodic behavior, and propose the concept of stable periodic high-utility sequential patterns (SPHUSPs) for the first time. Secondly, we design a new data structure named the PUL-list to record the periodic information of sequential patterns, thereby improving the mining efficiency. Thirdly, we propose the maximum lability pruning strategy in sequential pattern (MLPS), which can prune a large number of unstable sequential patterns in advance. To assess the algorithm’s effectiveness, we perform many experiments. It turns out that the algorithm can not only mine patterns that are ignored by traditional algorithms, but also ensure that the discovered patterns have stable periodic behavior. In addition, after using the MLPS pruning strategy, the algorithm can prune 46.5% of candidates in advance on average in six datasets. Pruning a large number of candidates in advance not only speeds up the mining process, but also greatly reduces memory usage.
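
The notion of a period (the time between two consecutive occurrences of a pattern) and the maxPer check can be sketched as follows. The sketch assumes the common convention that a pattern is periodic when its largest period does not exceed maxPer; the occurrence positions and function names are hypothetical, and the stability measure of SPHUSPM is not modeled.

```python
def periods(occurrences):
    """Gaps between consecutive occurrence positions of a pattern."""
    ordered = sorted(occurrences)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_periodic(occurrences, max_per):
    """Assumed rule: periodic if the pattern recurs and no gap exceeds max_per."""
    gaps = periods(occurrences)
    return bool(gaps) and max(gaps) <= max_per


# Hypothetical pattern observed in sequences 1, 3, 4 and 9 of the database.
occurrences = [1, 3, 4, 9]
print(periods(occurrences))
print(is_periodic(occurrences, max_per=5))
print(is_periodic(occurrences, max_per=4))
```

A single long gap is enough to reject the pattern under this strict rule, which is the behavior the abstract's stability concept is meant to soften.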

(This article belongs to the Section Computer)

22 pages, 1351 KB

Open Access Article

Mining High Utility Itemsets Based on Pattern Growth without Candidate Generation

by Yiwei Liu, Le Wang, Lin Feng and Bo Jin

Mathematics 2021, 9(1), 35; https://doi.org/10.3390/math9010035 - 25 Dec 2020

Cited by 12 | Viewed by 2804

Abstract

Mining high utility itemsets (HUIs) has been an active research topic in data mining in recent years. Existing HUI mining algorithms typically take two steps: generating candidates and identifying utility values of these candidate itemsets. The performance of these algorithms depends on the efficiency of both steps, both of which are usually time-consuming. In this study, we propose an efficient pattern-growth based HUI mining algorithm, called tail-node tree-based high-utility itemset (TNT-HUI) mining. This algorithm avoids the time-consuming candidate generation step, as well as the need of scanning the original dataset multiple times for exact utility values, as supported by a novel tree structure, named the tail-node tree (TN-Tree). The performance of TNT-HUI was evaluated in comparison with state-of-the-art benchmark methods on different datasets. Experimental results showed that TNT-HUI outperformed benchmark algorithms in both execution time and memory use by orders of magnitude. The performance gap is larger for denser datasets and lower thresholds. Full article

17 pages, 5080 KB

Open Access Article

Efficient Algorithm for Mining Non-Redundant High-Utility Association Rules

by Thang Mai, Loan T.T. Nguyen, Bay Vo, Unil Yun and Tzung-Pei Hong

Sensors 2020, 20(4), 1078; https://doi.org/10.3390/s20041078 - 17 Feb 2020

Cited by 29 | Viewed by 4597

Abstract

In business, managers may use the association information among products to define promotion and competitive strategies. The mining of high-utility association rules (HARs) from high-utility itemsets enables users to select their own weights for rules, based either on the utility or confidence values. This approach also provides more information, which can help managers to make better decisions. Some efficient methods for mining HARs have been developed in recent years. However, in some decision-support systems, users only need to mine the smallest set of HARs for efficient use. Therefore, this paper proposes a method for the efficient mining of non-redundant high-utility association rules (NR-HARs). We first build a semi-lattice of mined high-utility itemsets, and then identify closed and generator itemsets within it. Following this, an efficient algorithm is developed for generating rules from the built lattice. This new approach was verified on different types of datasets to demonstrate that it has a faster runtime and does not require more memory than existing methods. The proposed algorithm can be integrated with a variety of applications and would combine well with external systems, such as the Internet of Things (IoT) and distributed computer systems. Many companies have been applying IoT and such computing systems to their business activities, data monitoring, or decision-making. The data can be sent into the system continuously through the IoT or any other information system. Selecting an appropriate and fast approach helps management to visualize customer needs as well as make more timely decisions on business strategy.

(This article belongs to the Special Issue Security and Privacy Techniques in IoT Environment)

13 pages, 2419 KB

Open Access Article

MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support

by Chen-Shu Wang and Jui-Yen Chang

Appl. Sci. 2019, 9(10), 2075; https://doi.org/10.3390/app9102075 - 20 May 2019

Cited by 15 | Viewed by 4267

Abstract

In practice, single item support cannot comprehensively address the complexity of items in large datasets. In this study, we propose a big data analytics framework (named Multiple Item Support Frequent Patterns, MISFP-growth algorithm) that uses Hadoop-based parallel computing to achieve high-efficiency mining of itemsets with multiple item supports (MIS). The proposed architecture consists of two phases. First, in the counting support phase, a Hadoop MapReduce architecture is employed to determine the support for each item. Next, in the analytics phase, sub-transaction blocks are generated according to MIS and the MISFP-growth algorithm identifies the frequency of patterns. To facilitate decision makers in setting MIS, we also propose the concept of classification of item (COI), which classifies items of higher homogeneity into the same class, by which the items inherit class support as their item support. Three experiments were implemented to validate the proposed Hadoop-based MISFP-growth algorithm. The experimental results show approximately 38% reduction in the execution time on parallel architectures. The proposed MISFP-growth algorithm can be implemented on the distributed computing framework. Furthermore, according to the experimental results, the enhanced performance of the proposed algorithm indicates that it could have big data analytics applications. Full article

(This article belongs to the Special Issue Actionable Pattern-Driven Analytics and Prediction)

16 pages, 1056 KB

Open Access Article

Synthesizing High-Utility Patterns from Different Data Sources

by Abhinav Muley and Manish Gudadhe

Data 2018, 3(3), 32; https://doi.org/10.3390/data3030032 - 3 Sep 2018

Cited by 2 | Viewed by 3978

Abstract

In large organizations, it is often required to collect data from the different geographic branches spread over different locations. Extensive amounts of data may be gathered at the centralized location in order to generate interesting patterns via mono-mining the amassed database. However, it is feasible to mine the useful patterns at the data source itself and forward only these patterns to the centralized company, rather than the entire original database. These patterns also exist in huge numbers, and different sources calculate different utility values for each pattern. This paper proposes a weighted model for aggregating the high-utility patterns from different data sources. The procedure of pattern selection was also proposed to efficiently extract high-utility patterns in our weighted model by discarding low-utility patterns. Meanwhile, the synthesizing model yielded high-utility patterns, unlike association rule mining, in which frequent itemsets are generated by considering each item with equal utility, which is not true in real life applications such as sales transactions. Extensive experiments performed on the datasets with varied characteristics show that the proposed algorithm will be effective for mining very sparse and sparse databases with a huge number of transactions. Our proposed model also outperforms various state-of-the-art distributed models of mining in terms of running time. Full article

13 pages, 840 KB

Open Access Article

Fast Identification of High Utility Itemsets from Candidates

by Jun-Feng Qu, Mengchi Liu, Chunsheng Xin and Zhongbo Wu

Information 2018, 9(5), 119; https://doi.org/10.3390/info9050119 - 14 May 2018

Cited by 4 | Viewed by 4349

Abstract

High utility itemsets (HUIs) are sets of items with high utility, like profit, in a database. Efficient mining of high utility itemsets is an important problem in the data mining area. Many mining algorithms adopt a two-phase framework. They first generate a set of candidate itemsets by roughly overestimating the utilities of all itemsets in a database, and subsequently compute the exact utility of each candidate to identify HUIs. Therefore, the major costs in these algorithms come from candidate generation and utility computation. Previous works mainly focus on how to reduce the number of candidates, without dedicating much attention to utility computation, to the best of our knowledge. However, we find that, for a mining task, the time of utility computation in two-phase algorithms dominates the whole running time of these algorithms. Therefore, it is important to optimize utility computation. In this paper, we first give a basic algorithm for HUI identification, the core of which is a utility computation procedure. Subsequently, a novel candidate tree structure is proposed for storing candidate itemsets, and a candidate tree-based algorithm is developed for fast HUI identification, in which there is an efficient utility computation procedure. Extensive experimental results show that the candidate tree-based algorithm outperforms the basic algorithm and the performance of two-phase algorithms, integrating the candidate tree algorithm as their second step, can be significantly improved. Full article
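
The two-phase framework described in this abstract can be sketched as follows: phase one keeps itemsets whose overestimated utility reaches the threshold, and phase two computes exact utilities for those candidates. The overestimate used here is the transaction-weighted utility (TWU), a standard choice that the abstract does not name; the brute-force enumeration, the data, and all names are hypothetical, and the candidate tree of the paper is not modeled.

```python
from itertools import combinations


def transaction_utility(transaction, profit):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def twu(itemset, database, profit):
    """Overestimate: total utility of every transaction that contains the itemset."""
    return sum(transaction_utility(t, profit) for t in database
               if all(item in t for item in itemset))


def exact_utility(itemset, database, profit):
    """Exact utility: only the itemset's own items contribute."""
    return sum(sum(t[item] * profit[item] for item in itemset) for t in database
               if all(item in t for item in itemset))


def two_phase(database, profit, min_util):
    """Phase 1 filters by TWU; phase 2 verifies candidates by exact utility."""
    items = sorted({item for t in database for item in t})
    candidates = [frozenset(c)
                  for k in range(1, len(items) + 1)
                  for c in combinations(items, k)
                  if twu(c, database, profit) >= min_util]
    high_utility = {}
    for candidate in candidates:
        utility = exact_utility(candidate, database, profit)
        if utility >= min_util:
            high_utility[candidate] = utility
    return candidates, high_utility


# Hypothetical data: each transaction maps item -> purchased quantity.
profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"b": 1, "c": 1}]
candidates, high_utility = two_phase(database, profit, min_util=12)
```

In this toy run four candidates survive phase one but only two are confirmed in phase two, which illustrates why the exact utility computation over candidates is worth optimizing.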
