Synthesizing High-Utility Patterns from Different Data Sources

Abstract: In large organizations, it is often required to collect data from the different geographic branches spread over different locations. Extensive amounts of data may be gathered at the centralized location in order to generate interesting patterns via mono-mining the amassed database. However, it is feasible to mine the useful patterns at the data source itself and forward only these patterns to the centralized company, rather than the entire original database. These patterns also exist in huge numbers, and different sources calculate different utility values for each pattern. This paper proposes a weighted model for aggregating the high-utility patterns from different data sources. A pattern selection procedure is also proposed to efficiently extract high-utility patterns in our weighted model by discarding low-utility patterns. The synthesizing model yields high-utility patterns, unlike association rule mining, in which frequent itemsets are generated by considering each item with equal utility, which is not true in real-life applications such as sales transactions. Extensive experiments performed on datasets with varied characteristics show that the proposed algorithm is effective for mining very sparse and sparse databases with a huge number of transactions. Our proposed model also outperforms various state-of-the-art distributed models of mining in terms of running time.


Introduction
Large and small enterprises are facing the challenges of extracting useful information, since they are becoming massively data rich and information poor [1]. Organizations are getting larger and amassing continuously increasing amounts of data. Finding meaningful information plays a vital role in the cross-marketing strategies and decision-making processes of business organizations, especially those that deal with big data.
The world's biggest retailer, Walmart, has over 20,000 stores in 28 countries which process 2.5 petabytes of data every hour [2]. A few weeks of data contains 200 billion rows of transactional data. Data is the key for them to keep the company at the top for generating revenues. The cost of transferring the data to the central node can be cut down if they focus on mining the data at the local node itself and forwarding only the retrieved patterns, rather than a complete database. Swiggy, one of India's successful start-ups, generates terabytes of data every week from 35,000 restaurants spread across 15 cities to deliver food to the consumer's doorstep [3]. It relies on this data for the efficiency of delivery and a hassle-free experience for consumers. If the data is mined at the city-based local nodes and only the extracted patterns are forwarded, it would help them to deliver their services faster.
Traditional knowledge discovery techniques are capable of mining at a single source platform. These techniques are insufficient for mining the data of large companies scattered across multiple locations. While collecting all the data from multiple sources might gather a huge chunk of databases for centralized processing, it is unrealistic to combine the data from multiple data sources because of the size of the data to be transported and privacy-related issues. Some sources of an organization may send their extracted patterns but not their entire data set due to privacy concerns. Hence it is feasible to mine the patterns at different data sources and send only the extracted patterns, rather than the original database, to the central branch of the company. The pattern mining at each source is also important for decision support at the local level. However, the number of patterns collected from different sources may be so high that finding valid patterns in the pattern set becomes difficult for the centralized company. The proposed weighted model compresses the set of patterns and generates highly voted patterns. For convenience, the work presented in this paper focuses on post-mining, that is, gathering, analyzing and synthesizing the patterns extracted from multiple databases.
Existing parallel data mining algorithms, known as Mining Association Rules on Paralleling (MARP) [4][5][6][7][8], employ parallel machines to implement data mining algorithms. These algorithms have proved to be effective in mining very large databases. However, they have certain limitations: they don't generate local patterns at the local data source, which are very useful in real-world applications. In addition, they require massively parallel machines for computing and dedicated software for operating those machines. Some mining algorithms are sequential in nature and cannot be run on parallel machines.
Various techniques have been developed for data mining from distributed data sources [9][10][11][12][13][14][15][16]. Wu et al. [17] have already proposed a synthesis model to extract high-frequency association rules from different data sources using Frequent Itemset Mining (FIM) as their data mining technique. This model cannot be applied to High-utility Itemset Mining (HUIM) for the following reasons:

• FIM assumes that every item can appear only once in each transaction and has the same utility in terms of occurrences and unit profit.

• FIM maintains the anti-monotonicity of the support, which is not applicable to the problem of High-utility Itemset Mining (HUIM) discussed in a later section.

T. Ramkumar et al. [18] have also proposed a synthesis model along similar lines to Wu's model, but the main drawback of this model is that the transaction population of a data source in terms of the population of other data sources must be known beforehand, which is not possible due to privacy concerns of data sharing. This paper emphasizes synthesizing local high-utility patterns rather than frequent rules, to find the patterns valid throughout the organization. The model presented in this paper doesn't require the transaction population to be known in advance.
The problem statement for synthesizing the patterns from different data sources can be formulated as follows. There are n data sources present in a large organization: DS_1, DS_2, DS_3, ..., DS_n. Each site supports a set of local patterns. We are interested in: (1) mining every data source to find the local patterns supported by the source; and (2) developing an algorithm to synthesize these local patterns to find only the useful patterns (with high utility), which are valid for the whole organization, and to calculate their synthesized/global utility. It is assumed that the cleaning and pre-processing of data has been performed already. Our goal is to obtain the patterns that mono-mining the union of all data sources would produce.

Related Work
Data mining is the process of semi-automated analysis of large databases to find significant patterns and relationships which are novel, valid and previously unknown. Data mining is a component of a process called knowledge discovery from databases (KDD). The aim of data mining is to seek out rules, patterns and trends in the data and infer the associations from these patterns. In a transaction database, Frequent Itemset Mining (FIM) discovers frequent itemsets, that is, the itemsets appearing most frequently in the transaction database [19]. Market basket analysis is the most popular application of FIM. Retail managers use frequent itemsets mined from the transactions to plan store structure, offers, and classification of customers [20,21]. As discussed earlier, FIM has the following limitations: (1) it assumes that every item can appear only once in each transaction; and (2) it assumes that every item has the same utility in terms of occurrences and unit profit. In market basket analysis, a customer may buy multiple units of the same item, for example, 3 packets of almonds or 5 packets of bread. Moreover, not every item has the same unit profit; for example, selling a packet of almonds yields more profit than selling a packet of bread. FIM doesn't take into account the number of items purchased in a transaction. Thus FIM only counts the frequency of items rather than their utility or profit. As a consequence, infrequent patterns with high utility are missed and frequent patterns with low utility are generated. Frequent items may contribute only a minor portion of the total profit, whereas non-frequent items may contribute a major portion of the total profit of a business. The support and confidence framework of FIM established by Agrawal et al. [22] provides the measures to generate high-frequency rules, but high confidence may not always imply a high-profit correlation between the items. Another example is click-stream data, where a stream of web pages visited can be defined as a transaction. The time spent on a webpage contributes to its utility, in contrast to the frequency of visits counted in FIM.
To address this limitation of FIM, the concept of High-utility Itemset Mining (HUIM) [23][24][25] was defined. Unlike FIM, HUIM takes into account the quantity of an item in each transaction and its corresponding weight (e.g., profit/unit). HUIM discovers itemsets with high utility (profit). This allows items to appear more than once in a transaction. The problem of HUIM is known to be more difficult than FIM, because the downward-closure property doesn't hold for utility mining, that is, the utility of an itemset is neither anti-monotonic nor monotonic. Thus, the utility of an itemset can be higher than, equal to or lower than the utility of any subset of that itemset. The target of high-utility mining is to generate high-utility patterns which yield a major portion of the total profit. Interestingly, FIM assumes the utility of every item to be 1, that is, the quantity of each item and weight/unit are equal. Hence FIM is considered to be a special case of HUIM.

Let's consider the sample database in Table 1. It has five transactions (T_1, T_2, T_3, T_4, and T_5). Items p, r, and s appear in the transaction T_1 with internal utilities (e.g., quantity of item) 2, 3 and 5 respectively. Table 1 also lists, among others, a transaction (p, 2), (q, 3), (r, 2), (s, 7), (t, 2), (u, 6); T_4: (q, 5), (r, 4), (s, 4), (t, 2); and T_5: (q, 3), (r, 3), (t, 2), (v, 3). Table 2 shows that the external utilities (e.g., profit/unit) of the items p, r and s are 6, 2 and 3 respectively. The utility U(i, T_j) of any item i in a transaction T_j is calculated as e(i) × X(i, T_j), where e(i) denotes the external utility as per Table 2 and X(i, T_j) denotes the internal utility as per Table 1. The utility U(i, T_j) denotes the profit generated by selling item i in the transaction T_j. For example, the utility of the transaction T_1 is the sum, over all items in T_1, of internal utility × external utility, i.e., (2 × 6) + (3 × 2) + (5 × 3) = 33. An itemset Z is labelled as a high-utility itemset when its utility is more than the utility threshold minutil set by the user; otherwise it is called a low-utility itemset. HUIM discovers all the high-utility itemsets satisfying minutil.
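The utility calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, only the T_1 quantities and the unit profits of p, r and s quoted in the text are used, and the itemset utility follows the usual HUIM convention (summing over the transactions that contain every item of the itemset), which is an assumption here.

```python
# Illustrative sketch of item, transaction and itemset utility.
# Function names are hypothetical; data are the T_1 values quoted in the text.

def item_utility(item, transaction, unit_profit):
    """U(i, T) = e(i) * X(i, T): unit profit times purchased quantity."""
    return unit_profit[item] * transaction[item]

def transaction_utility(transaction, unit_profit):
    """Sum of the utilities of all items in one transaction."""
    return sum(item_utility(i, transaction, unit_profit) for i in transaction)

def itemset_utility(itemset, database, unit_profit):
    """Assumed convention: sum over transactions containing the whole itemset."""
    total = 0
    for transaction in database:
        if all(i in transaction for i in itemset):
            total += sum(item_utility(i, transaction, unit_profit) for i in itemset)
    return total

unit_profit = {"p": 6, "r": 2, "s": 3}   # external utility (profit per unit)
t1 = {"p": 2, "r": 3, "s": 5}            # internal utility (quantity) in T_1

print(transaction_utility(t1, unit_profit))  # 33
```

An itemset would then be reported as high-utility when `itemset_utility(...)` exceeds the user-defined minutil.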

Method: Proposed Synthesis Model
In this section, we propose a weighted model for synthesizing high-utility patterns forwarded by different and known sources. Let P be the set of patterns forwarded by n different data sources DS_1, DS_2, DS_3, ..., DS_n. For a pattern XY, suppose W(DS_1), W(DS_2), W(DS_3), ..., W(DS_n) are the respective weights of DS_1, DS_2, DS_3, ..., DS_n. The synthesized utility for pattern XY is calculated as:

$$\mathrm{util}(XY) = \sum_{i=1}^{n} W(DS_i) \times \mathrm{util}_i(XY)$$

where util_i(XY) is the utility of pattern XY in DS_i for i = 1, 2, 3, ..., n.
We adopt Good's idea [26], based on the weight of evidence, for allocating weights to our data sources, as was also done in Wu's model [17]. It establishes that the weight of evidence is as important as the probability itself. For convenience, we normalize the weights to the interval 0-1. The weight of each pattern is important and is based on its presence in the original databases. Therefore, we use the presence of patterns to evaluate the weight of the pattern. Figure 1 outlines the proposed method. Our weighted model is developed in the following sections with an example. A pattern selection algorithm is constructed to deal with low-utility patterns.

Allocating Weights to Patterns
The weight of each data source needs to be calculated in order to synthesize the high-utility patterns forwarded by different and known data sources. Let P be the set of patterns forwarded by n different data sources DS_1, DS_2, DS_3, ..., DS_n and P = P_1 ∪ P_2 ∪ P_3 ∪ ... ∪ P_n, where P_i is the set of patterns forwarded by DS_i. According to Good's idea of allocating weight, we take the number of occurrences of a pattern R in P to assign the weight W(R) to R. High-utility patterns have higher chances of becoming valid patterns than low-utility patterns in the combination of all the data sources. Thus, the weight of a pattern depends upon the number of data sources that support/vote for it. In reality, a business organization is interested in mining patterns voted for by most of its branches for generating maximum profit. The weight of a data source also depends upon the number of high-utility patterns it supports. The following example illustrates this idea:

• Let P_1 be the set of patterns mined from DS_1, having the patterns ABC, AD and BE with their respective utilities.

We assume that the minimum utility threshold (minutil) is set to 800,000 for the running example and that the minimum voting degree is µ = 0.4 (number of occurrences in P/total number of data sources) in the pattern selection algorithm. After selection, pattern R_1 is voted for by 2 data sources, pattern R_2 by all 3 data sources and pattern R_3 by 2 data sources; pattern R_4 is voted for by only 1 data source (1/3 < µ), so it is wiped out in the pattern selection procedure. To assign the weights to patterns, we use Good's weighted model, that is, the number of occurrences of a pattern in P is used to define the weight of the pattern. For n different data sources, we have P = P_1 ∪ P_2 ∪ P_3 ∪ ... ∪ P_n and P contains the patterns R_1, R_2, R_3, ..., R_m. Hence, the weight of any pattern R_i can be given as:

$$W(R_i) = \frac{\mathrm{Occurrence}(R_i)}{\sum_{j=1}^{m} \mathrm{Occurrence}(R_j)}$$

where j = 1, 2, 3, ..., m and Occurrence(R_i) = number of occurrences of pattern R_i in P.
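As a rough illustration of the pattern-weighting step, the sketch below turns vote counts into weights. It assumes that the weight of a pattern is its occurrence count divided by the total occurrence count of all selected patterns, so that the weights lie in the interval 0-1 and sum to 1. The function name is hypothetical; the vote counts are those of the running example.

```python
# Sketch: pattern weights from vote counts (occurrences in P).
# Assumption: W(R_i) = Occurrence(R_i) / sum of the occurrences of all patterns.

def pattern_weights(occurrence):
    """Map each pattern to its share of the total number of votes."""
    total = sum(occurrence.values())
    return {pattern: count / total for pattern, count in occurrence.items()}

# Vote counts of the selected patterns in the running example.
occurrence = {"R1": 2, "R2": 3, "R3": 2}
weights = pattern_weights(occurrence)

for pattern, weight in weights.items():
    print(pattern, round(weight, 3))
```

Under this assumption, R_2 receives the largest weight because it is voted for by all three data sources.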

Allocating Weights to Data Sources
Here, it is clearly seen that the weight of a data source is directly proportional to the weights of the patterns mined by it. The weights of the data sources are first assigned from the weights of the patterns they vote for; since the resulting values exceed the range 0-1, we normalize and reassign the weights. Data source DS_1 has the highest weight since it votes for the most high-utility patterns, and data source DS_2 has the lowest weight since it votes for the fewest. For n different data sources, we have P = P_1 ∪ P_2 ∪ P_3 ∪ ... ∪ P_n and P contains the patterns R_1, R_2, R_3, ..., R_m. Hence, the weight of any data source DS_i is defined from the patterns R_x ∈ P_i that it forwards, normalized over the patterns R_y ∈ P_j of all sources, where i = 1, 2, 3, ..., n and j = 1, 2, 3, ..., n.
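One way to realize this step is sketched below. It follows the scheme of Wu's model [17] as an assumption: the raw weight of a source is the sum of Occurrence(R) × W(R) over the patterns it votes for, and the raw weights are then divided by their total. The function name and the assignment of patterns to sources are hypothetical, chosen only to be consistent with the vote counts of the running example.

```python
# Sketch: data-source weights derived from pattern weights.
# Assumption (after Wu's model): raw weight = sum of Occurrence(R) * W(R)
# over the patterns a source votes for, then normalized to sum to 1.

def source_weights(pattern_sets, pattern_weight, occurrence):
    """Return normalized weights for each data source."""
    raw = {
        source: sum(occurrence[p] * pattern_weight[p] for p in patterns)
        for source, patterns in pattern_sets.items()
    }
    total = sum(raw.values())
    return {source: value / total for source, value in raw.items()}

occurrence = {"R1": 2, "R2": 3, "R3": 2}
votes = sum(occurrence.values())
pattern_weight = {p: c / votes for p, c in occurrence.items()}

# Hypothetical assignment of the selected patterns to the three sources.
pattern_sets = {
    "DS1": {"R1", "R2", "R3"},
    "DS2": {"R2"},
    "DS3": {"R1", "R2", "R3"},
}

weights = source_weights(pattern_sets, pattern_weight, occurrence)
for source, weight in sorted(weights.items()):
    print(source, round(weight, 3))
```

With this hypothetical assignment, DS_2 receives the lowest weight because it votes for the fewest selected patterns.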

Synthesizing the Utility of Patterns
We can synthesize the utility of patterns after all the data sources are assigned weights. In the running example, all of the selected patterns satisfy the minimum utility threshold, so they are forwarded as they are for normalizing their utility; otherwise, they would be wiped out. For n different data sources, we have P = P_1 ∪ P_2 ∪ P_3 ∪ ... ∪ P_n and P contains the patterns R_1, R_2, R_3, ..., R_m. Hence, the utility util(R_i) of any pattern R_i can be calculated as:

$$\mathrm{util}(R_i) = \sum_{j=1}^{n} W(DS_j) \times \mathrm{util}_j(R_i)$$

where util_j(R_i) is the utility of pattern R_i in DS_j.
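The weighted sum that defines the synthesized utility can be sketched as follows. The function name, the source weights and the local utilities are hypothetical, and treating a source that does not report a pattern as contributing nothing is an assumption.

```python
# Sketch: synthesized utility as a weighted sum of local utilities.
# All numbers below are hypothetical and only illustrate the arithmetic.

def synthesized_utility(local_utilities, source_weight):
    """Sum of W(DS_j) * util_j(R) over the sources that report pattern R.

    Sources that do not report the pattern contribute nothing (assumption).
    """
    return sum(source_weight[s] * u for s, u in local_utilities.items())

source_weight = {"DS1": 0.5, "DS2": 0.2, "DS3": 0.3}

# Local utilities of one pattern, reported by DS1 and DS3 only.
local_utilities = {"DS1": 100, "DS3": 200}

util = synthesized_utility(local_utilities, source_weight)
print(util)  # 0.5 * 100 + 0.3 * 200
```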

Normalizing the Utility of Patterns
The synthesized utility obtained after allocating the weights can fall in a large range, depending upon the utilities specified for the items in the transaction database. According to Good's idea, this synthesized utility can be normalized to the interval 0-1 for simplicity. To calculate the normalized utility, the maximum profit generated in that transaction database and the number of occurrences of the pattern are used. The maximum profit generated by any item(s) after mining the union of DS_1, DS_2 and DS_3 is 17,824,032. Hence, this value is used as Maximum_Profit while normalizing the utility of patterns. The normalization of the synthesized utility is demonstrated below:

• Pattern R_3: BE
Nutil(BE) = (1,021,578 × 2)/17,824,032 = 0.115 (10)

From the above results, the ranking of patterns is R_2 (Nutil = 0.293), R_1 (Nutil = 0.28) and R_3 (Nutil = 0.115). For n different data sources, we have P = P_1 ∪ P_2 ∪ P_3 ∪ ... ∪ P_n and P contains the patterns R_1, R_2, R_3, ..., R_m. Hence the normalized utility for any pattern R_i can be given as:

$$\mathrm{Nutil}(R_i) = \frac{\mathrm{util}(R_i) \times \mathrm{Occurrence}(R_i)}{\mathrm{Maximum\_Profit}}$$
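The normalization can be checked numerically with a short sketch. The function name is hypothetical; the inputs are the values quoted above for pattern BE (synthesized utility 1,021,578, voted for by 2 data sources, Maximum_Profit 17,824,032).

```python
# Sketch: normalized utility of a pattern, reproducing the BE example.

def normalized_utility(synth_utility, occurrence, maximum_profit):
    """Nutil(R) = util(R) * Occurrence(R) / Maximum_Profit."""
    return synth_utility * occurrence / maximum_profit

nutil_be = normalized_utility(1_021_578, 2, 17_824_032)
print(round(nutil_be, 3))  # 0.115
```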

Algorithm Design
When we combine all of the high-utility patterns from different data sources, we can amass a huge number of patterns overall. To deal with this problem, we first design a pattern selection algorithm that selects only those patterns occurring at least as often as specified by the user through a minimum voting degree, µ. The minimum voting degree µ is a user-specified value in the range 0-1. For example, if we want only those patterns that occur in at least 60% of the data sources, then µ is set to 0.6. This algorithm enhances our synthesis model by wiping out patterns whose occurrence is below the threshold µ. Patterns with a smaller number of occurrences are seen as noise and considered to be irrelevant in the set of all patterns. These irrelevant patterns are removed before assigning weights to data sources. The output of this algorithm is a set of filtered patterns. We design the pattern selection algorithm as Algorithm 1.

Let DS_1, DS_2, DS_3, ..., DS_n be n different data sources, generating the universal set of patterns P = P_1 ∪ P_2 ∪ P_3 ∪ ... ∪ P_n, where P_i is the set of patterns forwarded by DS_i (i = 1, 2, 3, ..., n). util_i(R_j) is the utility of pattern R_j in DS_i. minutil and Maximum_Profit are set by the user. We design the following Algorithm 2 for synthesizing patterns from different data sources:

Algorithm 2:
Input: Pattern sets P_1, P_2, P_3, ..., P_n; minimum utility threshold minutil; Maximum_Profit.
Output: Synthesized patterns with their utility.
1. combine all sets P_1, P_2, P_3, ..., P_n into P by assigning a Pattern ID to each distinct pattern;
2. call Pattern_Selection(P);
3. assign a weight W(R_i) to each pattern R_i in P;
4. assign a weight W(DS_i) to each data source DS_i;
5. for each pattern R_i in P, calculate the synthesized utility util(R_i) and the normalized utility Nutil(R_i) = util(R_i) × Occurrence(R_i)/Maximum_Profit;
6. sort all the patterns in P by their normalized utility, i.e., Nutil(R_i);
7. return all the patterns ranked by their Nutil(R_i);
end.
The above pattern synthesis algorithm synthesizes high-utility patterns ranked by their normalized utility. Step 1 combines all the sets of patterns forwarded by different data sources and assigns a unique pattern ID to each pattern. Step 2 calls the algorithm Pattern_Selection (Algorithm 1) to remove the patterns occurring below the threshold µ. Step 3 assigns the weights to all patterns. Step 4 assigns the weights to all data sources. Step 5 calculates the synthesized utility and normalized utility of the selected patterns. Steps 6 and 7 return the synthesized patterns ranked by their utility.

Data Description
The mining algorithms were taken from the SPMF Open-Source Data Mining Library [27] to mine the various datasets. We used Microsoft Excel for calculations and data visualization. All of the datasets used were taken from the same SPMF library. The transactions in the datasets already contain the internal utility and the profit per item, hence there is no need to assign any kind of utility factor to any item. We evaluate the effectiveness of the proposed synthesis model by extensively experimenting with datasets with varied characteristics.

Experimental Evaluation
The pattern ID denotes the number given to each itemset in the transaction database. The itemset shows the items appearing together in a pattern. The deviation between the proposed method and mono-mining is measured from two normalized utilities: Nutil_i^MM = Util(R_i) in the union of all the databases/Maximum_Profit of the database, and Nutil_i^SM, the synthesized utility calculated by the proposed model.
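A simple way to compare the two sets of normalized utilities is sketched below. It assumes that the average deviation is the mean absolute difference between the mono-mining value and the synthesized value over the patterns found by both methods; the exact formula used in the experiments may differ. The function name, pattern IDs and values are hypothetical.

```python
# Sketch: average deviation between mono-mining (MM) and the synthesis
# model (SM). Assumption: mean absolute difference over common patterns.

def average_deviation(nutil_mm, nutil_sm):
    """Mean of |Nutil_MM - Nutil_SM| over patterns present in both results."""
    common = nutil_mm.keys() & nutil_sm.keys()
    if not common:
        raise ValueError("no common patterns to compare")
    return sum(abs(nutil_mm[p] - nutil_sm[p]) for p in common) / len(common)

# Hypothetical normalized utilities keyed by pattern ID.
nutil_mm = {1: 0.30, 2: 0.20}
nutil_sm = {1: 0.29, 2: 0.22}

print(round(average_deviation(nutil_mm, nutil_sm), 3))
```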

Study 1
Kosarak is a very sparse dataset containing 990,000 sequences of click-stream data from a Hungarian news portal, having 41,270 distinct items. It is partitioned into 5 databases with 198,000 transactions each, representing 5 different data sources. We first performed mono-mining on the union of these 5 databases using the D2HUP algorithm [28] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 5 databases separately using the D2HUP algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 800,000, µ = 0.4 and Maximum_Profit = 17,824,032. The average deviation between the two methods was found to be 0.001. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 3.
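The partitioning used to simulate several data sources can be sketched as below. The text does not say how the split was made, so contiguous, equal-sized chunks (dropping any remainder) are an assumption, and the function name is hypothetical.

```python
# Sketch: split one transaction list into k equal-sized "data sources".
# Assumption: contiguous chunks; leftover transactions are dropped.

def partition(transactions, k):
    """Return k contiguous chunks of len(transactions) // k items each."""
    size = len(transactions) // k
    return [transactions[i * size:(i + 1) * size] for i in range(k)]

# Tiny stand-in for a transaction database (10 transaction IDs).
transactions = list(range(10))
sources = partition(transactions, 5)
print([len(s) for s in sources])  # [2, 2, 2, 2, 2]
```

Each chunk would then be mined separately, and the resulting pattern sets passed to the synthesis model.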

Study 2
Chainstore is a very sparse dataset containing 1,112,949 sequences of customer transactions from a retail store, obtained and transformed from NU-Mine Bench, having 46,086 distinct items. It is partitioned into 5 databases with 222,589 transactions each, representing 5 different data sources. We first performed mono-mining on the union of these 5 databases using the D2HUP algorithm [28] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 5 databases separately using the D2HUP algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 500,000, µ = 0.6 and Maximum_Profit = 82,362,000. The average deviation between the two methods was found to be 0.004. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 4.

Study 3
Retail is a sparse dataset containing 88,162 sequences of anonymous retail market basket data from an anonymous Belgian retail store, having 16,470 distinct items. It is partitioned into 4 databases with 22,040 transactions each, representing 4 different data sources. We first performed mono-mining on the union of these 4 databases using the EFIM algorithm [20] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 4 databases separately using the EFIM algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 20,000, µ = 0.75 and Maximum_Profit = 481,021. The average deviation between the two methods was found to be 0.025. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 5.

Study 4
BMS is a sparse dataset containing 59,601 sequences of clickstream data from an e-commerce site, having 497 distinct items. It is partitioned into 3 databases with 19,867 transactions each, representing 3 different data sources. We first performed mono-mining on the union of these 3 databases using the EFIM algorithm [20] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 3 databases separately using the EFIM algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 500,000, µ = 0.3 and Maximum_Profit = 9,449,280. The average deviation between the two methods was found to be 0.032. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 6.

Study 5
Foodmart 1 is a sparse dataset containing 4591 sequences of customer transactions from a retail store, obtained and transformed from the SQL-Server 2000, having 1559 distinct items. It is partitioned into 3 databases with 1530 transactions each, representing 3 different data sources. We first performed mono-mining on the union of these 3 databases using the EFIM algorithm [20] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 3 databases separately using the EFIM algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 9000, µ = 0.3 and Maximum_Profit = 30,240. The average deviation between the two methods was found to be 0.524. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 7.

Study 6
Foodmart 2 is a sparse dataset containing 9233 sequences of customer transactions from a retail store, obtained and transformed from the SQL-Server 2000, having 1559 distinct items. It is partitioned into 3 databases with 1530 transactions each, representing 3 different data sources. We first performed mono-mining on the union of these 3 databases using the EFIM algorithm [20] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 3 databases separately using the EFIM algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 20,000, µ = 0.6 and Maximum_Profit = 71,355. The average deviation between the two methods was found to be 0.301. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 8.

Study 7
Accident is a moderately dense dataset containing 340,183 sequences of anonymized traffic accident data, having 468 distinct items. It is partitioned into 4 databases with 85,045 transactions each, representing 4 different data sources. We first performed mono-mining on the union of these 4 databases using the EFIM algorithm [20] and calculated the normalized utility denoted by Nutil_i^MM. Then we mined these 4 databases separately using the EFIM algorithm and applied our synthesis model to calculate the normalized utility denoted by Nutil_i^SM, with minutil = 7,500,000, µ = 0.5 and Maximum_Profit = 31,171,329. The average deviation between the two methods was found to be 0.101. The patterns mined from the different databases and ranked by their synthesized utility are tabulated in Table 9.

Conclusions
This paper extends Wu's model, which is only applicable to frequent patterns, whereas our approach is applicable to frequent patterns as well as high-utility patterns in multiple databases. Experiments conducted in this study show that the results of the synthesis model and mono-mining are almost identical, hence the goal set during problem formulation has been achieved.
The proposed method is useful when data sources are widely distributed and it is not desirable to transport their whole databases to the central node. The local pattern analysis gives insights into the local behavior of nodes, whereas the global pattern analysis gives insights into the global behavior of nodes. The local analysis is useful for studying patterns at the local level and also at the global level, depending upon the weight of the local source in a global scenario. The weights of sources give the importance of local sources. However, the reliability of the weighted model proposed in this paper is an important issue to be discussed. If a given pattern occurs only in a single data source but its synthesized utility is well above the threshold, then it has to be considered a valid high-utility pattern. Arbitrarily requiring that a pattern has to occur in multiple sources is not justified. There could be interaction effects between the parameters of different sources, and they should also be considered in the equations defined.
Our proposed approach is suitable when data comes from different sources, such as sensor networks in the Internet of Things (IoT). Our approach is also beneficial when store managers are interested in high-profit items. This work can be extended further for dense datasets, as the results found for them were not very effective. Our future work will also focus on synthesizing


Figure 1. The proposed model of synthesizing patterns by weighting. DS_i: ith data source; P_i: pattern set mined from DS_i; SPS: the synthesized pattern set with normalized utility.

Algorithm 1: Pattern_Selection(P)
Input: P: set of N patterns forwarded by different data sources; n: number of different sources; µ: minimum voting degree.
Output: The filtered set of patterns P.
1. for i = 1 to N do
   a. Occurrence(R_i) = number of occurrences of R_i in P;
   b. if (Occurrence(R_i)/n < µ) then
      i. P = P − {R_i};
2. end for;
3. return P;
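Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name and the input layout (one set of patterns per data source) are assumptions, and the example pattern sets are hypothetical, chosen only to match the vote counts of the running example (R_1: 2, R_2: 3, R_3: 2, R_4: 1).

```python
# Sketch of Algorithm 1 (pattern selection by minimum voting degree).
from collections import Counter

def pattern_selection(pattern_sets, mu):
    """Keep the patterns voted for by at least a fraction mu of the sources."""
    n = len(pattern_sets)
    # Count in how many sources each pattern occurs (one vote per source).
    occurrence = Counter(p for patterns in pattern_sets for p in set(patterns))
    return {p for p, count in occurrence.items() if count / n >= mu}

# Hypothetical pattern sets forwarded by three data sources.
pattern_sets = [
    {"R1", "R2", "R3"},
    {"R2", "R4"},
    {"R1", "R2", "R3"},
]

selected = pattern_selection(pattern_sets, mu=0.4)
print(sorted(selected))  # ['R1', 'R2', 'R3']
```

Unlike the pseudocode, the sketch builds a new set instead of deleting from P while iterating, which avoids modifying a collection during traversal.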

Figure 2. Data visualization for overall results.

Figure 4. Running time on Accident dataset.

Figure 6. Running time on Retail dataset.

Table 5. Experimental results for the Retail dataset.

Table 6. Experimental results for BMS dataset.

Table 7. Experimental results for the Foodmart 1 dataset.

Table 9. Experimental results for Accident dataset.