Multi-Core Parallel Gradual Pattern Mining Based on Multi-Precision Fuzzy Orderings

Abstract: Gradual patterns aim at describing co-variations of data, such as "the higher the size, the higher the weight". In recent years, such patterns have been increasingly studied from the data mining point of view. Their extraction relies on efficient orderings that can be built among the data: for instance, when the data are ordered with respect to size, they are also ordered with respect to weight. However, in many application domains, it is hardly possible to consider that data values are crisply ordered. For gene expression, for example, it is not meaningful from the biological point of view to state that Gene 1 is more expressed than Gene 2 if their expression levels differ only at the tenth decimal place. We thus consider fuzzy orderings and the fuzzy gamma rank correlation. In this paper, we address two major problems related to this framework: (i) the high memory consumption and (ii) the precision, representation and efficient storage of the fuzzy concordance degrees versus the loss or gain of computing power. For this purpose, we consider multi-precision matrices represented using sparse matrices coupled with parallel algorithms. Experimental results show the interest of our proposal.


Introduction
In data mining, mining for frequent patterns (in this paper, the words item and pattern are considered as being synonyms) has been extensively studied during recent years. Among the patterns that can be discovered, gradual patterns aim at describing co-variations of attributes, such as the higher the size, the higher the weight. A gradual pattern such as the older, the higher the salary relies on the fact that when age increases, salary also increases, people being ranked by age and by salary. However, real-world databases may contain information that can hardly be ranked in a crisp manner. For instance, gene expression levels are measured by instruments and are imperfect. For this reason, an expression level can hardly be declared greater than another one if they differ only by a small value. We thus claim that orderings must be considered as being soft. Fuzzy orderings and fuzzy ranking make it possible to handle the vagueness, ambiguity or imprecision present in problems of deciding between fuzzy alternatives and uncertain data [1-3,12,16]. However, although fuzzy orderings and fuzzy rank correlation measures bring great benefits, they prevent us from relying on binary relations (greater than / lower than) and on binary machine representations (binary masks), which are efficient in terms of both memory consumption and computation time. The representation and efficient storage of the vagueness and imprecision of the data is indeed a complex challenge, as studied in [2]. We thus propose a framework that addresses the high memory consumption and the representation, precision and efficient storage of the fuzzy concordance degrees by using sparse matrices and high performance computing (parallel programming).
This paper is organized as follows: Section 2 reports existing work on fuzzy orderings, gradual pattern mining and parallel data mining. Section 3 presents our gradual itemset mining algorithm and our framework to address the high memory consumption and the representation, precision and efficient storage of the fuzzy concordance degrees. Experimental results are presented in Section 4. Section 5 is our conclusion.

Related Work
In this section, we recall the definition of gradual pattern in the particular context of fuzzy orderings (Section 2.1) before presenting parallel data mining (Section 2.2).

Gradual Pattern Mining and Fuzzy Orderings
Frequent gradual patterns are patterns like the older, the higher the salary. They are extracted from databases whose schema is defined over several attributes, also called items, whose domains must be equipped with a total ordering. In our framework, we consider the following definitions of gradual item, gradual itemset, concordant couple and support of a gradual itemset.
Let db be a database made up of n data records (objects), denoted by $O = \{o_1, o_2, \ldots, o_n\}$, defined over a database schema of m attributes $A = \{A_1, A_2, \ldots, A_m\}$ whose domains are equipped with an order relation. The set of objects O must also be equipped with an order relation, so that db is said to be a gradual database. Table 1 reports an example of such a database, with three attributes describing three characteristics of five fruits. These attributes are numeric and the order relation is "is lower than" over every single attribute. Regarding a set of attributes A, objects are ordered by considering that an object precedes (resp. succeeds) another one if its value is lower than (resp. greater than) the value of the second object on every attribute from A. A gradual item is defined as a pair $(A_l, v)$, with $A_l$ an attribute of db, where v is a variation that can be ascending (↑) if the attribute values increase, or descending (↓) if the attribute values decrease, i.e., $\{A_l\uparrow\}$ corresponds to $\{A_l(o_i) < A_l(o_j)\}$ and $\{A_l\downarrow\}$ corresponds to $\{A_l(o_i) > A_l(o_j)\}$, for $i = 1, 2, \ldots, n$, $j = i+1, \ldots, n$, $i \neq j$ and $l \in \{1, 2, \ldots, k\}$ [4,5].
A gradual itemset (GI) is a combination of gradual items of the form $GI = \{A_1\downarrow A_2\downarrow A_3\uparrow\}$, interpreted as {the lower $A_1$, the lower $A_2$, the higher $A_3$}. The size k of a GI is the number of gradual items it contains, with $k \in \{2, 3, 4, \ldots, m\}$, and each gradual item in a GI is unique [4,5].
For instance, (size, ↑) is a gradual item and {(size, ↑), (weight, ↑)} is a gradual itemset. A GI is an interesting pattern if support(GI) is greater than or equal to the user-defined minimal support, called the minimum threshold (minsup).
Several definitions have been proposed in order to compute the support of a gradual pattern within a database. In our approach, we opted for the gradual dependency interpretation based on induced rank correlation and on the concept of concordant couple [2,4,13,15]. A concordant couple (cc) is an index pair (i, j) whose records $(o_i, o_j)$ satisfy all the variations v expressed by the gradual items involved in a given GI of size k [2,4].
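The definitions above can be illustrated with a minimal Python sketch that builds the binary matrix of concordant couples for a gradual itemset. The records, the attribute names and the "up"/"down" encoding of the variations ↑ and ↓ are hypothetical and chosen only for illustration; they are not taken from Table 1.

```python
def concordance_matrix(records, itemset):
    """Binary matrix m where m[i][j] = 1 if (o_i, o_j) is a concordant couple.

    records: list of dicts mapping attribute name -> numeric value.
    itemset: list of (attribute, variation), variation being "up" or "down".
    """
    n = len(records)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # The couple is concordant only if every gradual item is satisfied.
            concordant = all(
                records[i][attr] < records[j][attr] if variation == "up"
                else records[i][attr] > records[j][attr]
                for attr, variation in itemset
            )
            matrix[i][j] = 1 if concordant else 0
    return matrix


# Hypothetical data: four objects described by size and weight.
records = [
    {"size": 1, "weight": 10},
    {"size": 2, "weight": 20},
    {"size": 3, "weight": 15},
    {"size": 4, "weight": 30},
]
itemset = [("size", "up"), ("weight", "up")]
matrix = concordance_matrix(records, itemset)
concordant_couples = sum(map(sum, matrix))
```

Here the couple (o_2, o_3) is not concordant because the weight decreases while the size increases; the number of concordant couples is the quantity from which the support is then derived.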
In this framework, the support of a GI is computed from its concordant couples. Given a gradual pattern and a database, the concordant couples can be represented using a binary matrix, which helps in computing the support (whatever the support computation technique), as presented in [6] and in Figure 1. In [5,7], we have shown that fuzzy orderings must be considered in order to better represent the real world, where data cannot always be crisply ordered. In this context, we consider a framework based on the principles of Kendall's tau rank correlation coefficient, Goodman and Kruskal's gamma rank correlation measure, and Bodenhofer and Klawonn's fuzzy gamma rank correlation measure, with fuzzy concordance degrees denoted by cp(i, j) and ranging from 0 to 1. As the degree is no longer binary, we have to consider an extension of the matrices, as described in [7]. Figure 2 illustrates the structure of the matrix of fuzzy concordance degrees cp(i, j) ∈ [0, 1], represented with a precision of 2 and 3 bits. In order to address the problem of the precision of the representation and the efficient storage of each concordance degree cp(i, j), we consider the storage requirements for the binary case and for the fuzzy case.
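To make the notion of a fuzzy concordance degree concrete, the following Python sketch computes degrees in [0, 1] for a single ascending attribute. It assumes a linear-ramp strict fuzzy ordering with a tolerance r, which is one common choice in the fuzzy rank correlation literature; the exact ordering used in this framework is not given in the text, and the function names and data are hypothetical.

```python
def strict_fuzzy_less(x, y, r):
    """Degree in [0, 1] to which x is strictly lower than y.

    Assumption: linear ramp of width r. Differences of at least r count
    as fully ordered, smaller differences only partially.
    """
    if r <= 0:
        raise ValueError("tolerance r must be positive")
    return min(1.0, max(0.0, (y - x) / r))


def fuzzy_concordance(column, r):
    """Matrix of degrees cp(i, j) for one ascending attribute."""
    n = len(column)
    return [
        [0.0 if i == j else strict_fuzzy_less(column[i], column[j], r)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical expression levels: the first two differ at the tenth decimal.
levels = [1.0, 1.0000000001, 1.5, 3.0]
cp = fuzzy_concordance(levels, r=1.0)
```

With this choice, two values that differ only at the tenth decimal receive a degree close to 0 rather than a crisp 1, which matches the gene expression motivation given in the introduction.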

We use a precision of one bit for the binary case, i.e., for each cp(i, j) ∈ {0, 1}, as one bit is sufficient to represent and store a value in {0, 1}. In the proposed multi-precision matrices, each value mpm(i, j) is represented by a bit-field integer value containing up to 52 bits. The double-precision floating-point values of the fuzzy concordance degrees cp(i, j) are calculated through the ratio fl(mpm(i, j))/fl(mmax), where fl(x) is the floating-point representation of the integer value x and mmax is the maximum integer value representable in mpm(i, j). For instance, the fuzzy concordance degree corresponding to the binary integer value 010 is fl(010)/fl(111) = 2.0/7.0 ≈ 0.286 (Figure 2).
In [7], we considered fuzzy orderings and the management of the fuzzy degrees using Yale matrices. However, in [7], the fuzzy degrees are represented using floating-point numbers, which are memory consuming. In this paper, we thus consider multi-precision representations and their efficient implementation using integer values represented as variable-length bit-fields.
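The decoding ratio fl(mpm(i, j))/fl(mmax) can be sketched in Python as follows. The encoding direction (rounding cp·mmax to the nearest integer) is an assumption made for this illustration, since the text only specifies how a stored integer is converted back to a degree; the function names are hypothetical.

```python
def max_code(bits):
    """Largest integer representable on `bits` bits (mmax in the text)."""
    if not 1 <= bits <= 52:
        raise ValueError("precision must be between 1 and 52 bits")
    return (1 << bits) - 1


def encode_degree(cp, bits):
    """Quantise cp in [0, 1] to an integer code (assumed: round to nearest)."""
    if not 0.0 <= cp <= 1.0:
        raise ValueError("a concordance degree must lie in [0, 1]")
    return round(cp * max_code(bits))


def decode_degree(mpm, bits):
    """Recover the degree as fl(mpm) / fl(mmax)."""
    return float(mpm) / float(max_code(bits))


# The 3-bit code 010 decodes to 2.0 / 7.0.
example = decode_degree(0b010, 3)
```

Under the rounding assumption, the quantisation error of a degree is at most 1/(2·mmax), that is about 0.071 at 3 bits and about 0.0002 at 12 bits, which is one way to reason about the precision versus memory trade-off.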

Parallel Data Mining
Parallel computing has recently received a lot of interest for use in data mining. New generations of multicore processors and GPUs provide ways to exploit parallelism to reduce execution time [14]. This makes it possible to tackle larger (Big Data) problems.
Parallel programming models are roughly divided into three categories:
• Distributed memory systems: each processor has its own system memory that cannot be accessed by other processors; shared data are usually transferred by message passing, e.g., sockets or the message passing interface (MPI);
• Shared memory systems: processors share the global memory and have direct access to the entire set of data. Here, accessing the same data simultaneously from different instruction streams requires synchronization and sequential memory operations;
• Hierarchical systems: a combination of the shared and distributed models, composed of multiprocessor nodes in which memory is shared by intra-node processors and distributed over inter-node processors.
In multicore architectures, a parallel program is executed by the processors through one or multiple control flows referred to as processes or threads [8,9].

Parallel Fuzzy Gradual Pattern Mining Based on Multi-Precision Fuzzy Orderings
In this section, we detail our approach.

Managing Multi-Precision
Concerning the implementation of the matrices of concordance degrees cp(i, j), we address two important issues: (i) the memory consumption; and (ii) the precision of the representation of the concordance degrees of each cp(i, j).
In order to reduce memory consumption, we represent and store each matrix of concordance degrees according to the Binary Fuzzy Matrix Multi-precision Format, where each cc(i, j) ∈ [0, 1] is represented with a precision of 2, 3, or more bits, up to 52.
Because we generate itemset candidates from the frequent k-itemsets, only the matrices of the (k − 1)-level frequent gradual itemsets are kept in memory while they are used to generate the matrices of the k-level gradual itemset candidates. If the support of a gradual itemset candidate $C_{k,q}$ is less than the minimum threshold, then $C_{k,q}$ is pruned and its matrix of fuzzy concordance degrees cp(i, j) is removed.
As seen in the previous section, fuzzy orderings are interesting but consume a large amount of memory when the degrees are stored as floating-point numbers.
On the other hand, binary matrices are very efficient regarding both memory and time consumption. We thus consider binary vectors in order to represent the fuzzy degrees. The size of these vectors determines the precision we manage. Each cp(i, j) ∈ [0, 1] is thus represented with a precision ranging from 1 bit (crisp case) to n bits (52 in our implementation), where n bits make it possible to represent up to $2^n$ values. Figure 3 shows the real matrix of fuzzy concordance degrees, and Figure 4 shows how its values are represented at a precision of 3 bits (see also Figure 2). In our Algorithm 1, the concept of the matrix of concordance degrees plays an important role.
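As an illustration of storing degrees as fixed-width bit-fields, the sketch below packs several b-bit codes into one Python integer and estimates the footprint of a dense n × n matrix at a given precision. This is a simplified dense layout used only for the arithmetic; the sparse storage described in this paper will have a different footprint, and all names are hypothetical.

```python
def pack(codes, bits):
    """Pack integer codes, each on `bits` bits, into a single integer."""
    word = 0
    for position, code in enumerate(codes):
        if not 0 <= code < (1 << bits):
            raise ValueError("code does not fit in the requested precision")
        word |= code << (position * bits)
    return word


def unpack(word, bits, count):
    """Extract `count` codes of `bits` bits each from a packed integer."""
    mask = (1 << bits) - 1
    return [(word >> (position * bits)) & mask for position in range(count)]


def dense_bytes(n, bits):
    """Bytes needed for a dense n x n matrix with `bits` bits per degree."""
    return (n * n * bits + 7) // 8


row = [2, 7, 0, 5]                      # four 3-bit codes
restored = unpack(pack(row, 3), 3, len(row))

two_bit = dense_bytes(3000, 2)          # 2,250,000 bytes
four_bit = dense_bytes(3000, 4)         # 4,500,000 bytes
as_double = dense_bytes(3000, 64)       # 72,000,000 bytes
```

For a single dense matrix over 3000 records, moving from 64-bit floating-point values to 2-bit codes divides the footprint by 32, and many such matrices are alive at once during level-wise candidate generation.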
Algorithm 1 (excerpt):
    Compute the matrix of concordance degrees of candidate C_{k,q} as:
        C_{k,q}.M = T-norm(I.M, J.M); /* I.M, J.M are the matrices of concordance degrees of itemsets I and J */
    Else delete(candidate and matrix);
    Delete(F_{k−1} and matrices);
    k++; q++;
until F_FGP does not grow any more;
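The candidate step of Algorithm 1 can be sketched as follows. Two assumptions are made and should be read as such: the norm is taken to be the minimum t-norm, applied element-wise to integer-coded matrices of the same precision, and the support is normalised by the n(n − 1)/2 pairs of records. Neither the choice of t-norm nor the support formula is stated in the text above, and the matrices are hypothetical.

```python
def tnorm_min(matrix_i, matrix_j):
    """Element-wise minimum t-norm of two matrices of integer codes."""
    return [
        [min(a, b) for a, b in zip(row_i, row_j)]
        for row_i, row_j in zip(matrix_i, matrix_j)
    ]


def support(matrix, bits):
    """Assumed support: sum of decoded degrees over n(n - 1)/2 pairs."""
    n = len(matrix)
    mmax = (1 << bits) - 1
    total = sum(code for row in matrix for code in row) / mmax
    return total / (n * (n - 1) / 2)


def keep_candidate(matrix, bits, minsup):
    """Pruning test: keep the candidate only if its support reaches minsup."""
    return support(matrix, bits) >= minsup


# Hypothetical 3-bit matrices of two frequent itemsets I and J.
I_M = [[0, 7, 7], [0, 0, 3], [0, 0, 0]]
J_M = [[0, 5, 2], [0, 0, 7], [0, 0, 0]]
candidate = tnorm_min(I_M, J_M)
candidate_support = support(candidate, bits=3)
```

Because the minimum is order-preserving, it can be applied directly to the integer codes without decoding them to floating-point values first, which is part of the appeal of an integer representation.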

Coupling Multi-Precision and Parallel Programming
The evaluation of the correlation and support, and the generation of gradual pattern candidates, are tasks that require large amounts of processing time and memory, as well as careful load balancing. In order to reduce memory consumption, each matrix of fuzzy concordance degrees $m_{cc(i,j)}$ is represented and stored according to the Binary Fuzzy Matrix Multi-precision Format, where each cc(i, j) ∈ [0, 1] is represented with a precision of 2, 3, or more bits, up to 52. In order to reduce processing time, we propose to parallelize the program using OpenMP, a shared-memory programming API that is well suited to multi-core architectures [9].
Figure 5 shows an overall view of the parallel version of two regions of our fuzzyMGP algorithm where, in the first region, the extraction process of gradual patterns of size k = 2 is parallelized.In the second region, we show the parallelization of the extraction cycle of gradual patterns of size k > 2.
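The first parallel region, in which candidates of size k = 2 are evaluated independently, can be sketched as below. The implementation described in this paper uses OpenMP; the sketch substitutes Python's standard `concurrent.futures` thread pool purely to show the structure (one independent task per attribute pair), and the data, names and crisp concordance test are hypothetical simplifications.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_concordant(columns, attr_a, attr_b):
    """Number of couples (i, j) where both attributes increase together."""
    xa, xb = columns[attr_a], columns[attr_b]
    n = len(xa)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if xa[i] < xa[j] and xb[i] < xb[j]
    )


def evaluate_level2(columns, workers=4):
    """Evaluate every attribute pair as an independent parallel task."""
    pairs = list(combinations(sorted(columns), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `pairs`, so results line up with them.
        counts = list(pool.map(lambda p: count_concordant(columns, *p), pairs))
    return dict(zip(pairs, counts))


# Hypothetical columns for four objects.
columns = {
    "size": [1, 2, 3, 4],
    "weight": [10, 20, 15, 30],
    "sugar": [4, 3, 2, 1],
}
level2 = evaluate_level2(columns)
```

In CPython, threads do not speed up CPU-bound loops such as this one; a process pool, or native threads as with OpenMP, would be needed for actual gains. The point of the sketch is only that the tasks share read-only data and write disjoint results, which is what makes a shared-memory model a natural fit.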

Figure 5. Parallel fuzzy-MGP algorithm. Output: set of frequent gradual patterns of level k > 2.
In the experiments reported below, we aim at studying how multi-precision impacts performance, regarding the trade-off between high precision with high memory consumption and low precision with low memory consumption. The behavior of the algorithms is studied with respect to the number of bits allocated for storing the degrees. The question raised is whether there exists a threshold beyond which allocating more memory space is useless. This threshold may depend on the database.

Databases and Computing Resources
We conducted experiments on two databases. The first one is a synthetic database generated in order to study scalability; it contains hundreds of attributes and lines and can easily be split in order to obtain several databases.
The second database, called Amadeus Exoplanete, comes from astrophysics and consists of 97,718 instances and 60 attributes [10]. In this paper, we report experimental results of parallel gradual pattern mining on three subsets of these data, of 1000, 2000, and 3000 instances with 15 attributes. The three datasets were obtained from the Amadeus Exoplanete database.
In order to demonstrate the benefit of high performance computing for fuzzy data mining, our experiments are run on an IBM supercomputer, more precisely on two servers:
• an IBM dx360 M3 server embedding computing nodes configured with 2 × 2.66 GHz six-core Intel (Westmere) processors, 24 GB of DDR3 RAM at 1066 MHz and InfiniBand (40 Gb/s), reported as Intel; and
• an IBM x3850 X5 server running 8 processors of ten Intel (Westmere) cores each, representing 80 cores at 2.26 GHz, with 1 TB of DDR3 memory (1066 MHz) and InfiniBand (40 Gb/s), reported as SMP (because of its shared memory).

Measuring Performances
In our experiments, we report the speedup of our algorithms with respect to the database size and complexity [11]. Speedup is computed in order to assess the efficiency of our solution on high performance platforms and thus its scalability for tackling very large problems.
The speedup of a parallel program expresses the relative reduction of response time that can be obtained by using a parallel execution on p processors or cores compared to the best sequential implementation of that program. The speedup Speedup(p) of a parallel program with parallel execution time T(p) is defined as

$$\mathrm{Speedup}(p) = \frac{T(1)}{T(p)}$$

where:
• p is the number of processors, cores, or threads;
• T(1) is the execution time of the sequential program (with one thread or core);
• T(p) is the execution time of the parallel program with p processors, cores, or threads.
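As a small worked example of this measure, the Python sketch below computes the speedup from measured times, together with the parallel efficiency Speedup(p)/p, a standard companion metric that is not used in this paper. The timings are hypothetical and are not experimental results.

```python
def speedup(t_sequential, t_parallel):
    """Speedup(p) = T(1) / T(p)."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, p):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(t_sequential, t_parallel) / p


# Hypothetical elapsed times in seconds, keyed by number of threads.
timings = {1: 120.0, 4: 32.0, 16: 10.0}
speedups = {p: speedup(timings[1], t) for p, t in timings.items()}
```

With these illustrative numbers, 16 threads give a speedup of 12, that is an efficiency of 0.75; an ideal (linear) speedup would be 16.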

Main Results
We first notice that computing time is impacted by the choice of the minimum threshold, but it is not noticeably affected by a small difference in precision (see Figure 6). Furthermore, such a difference has no impact at all on the measured speedups.
Figures 6-9 show that we can achieve very good accelerations on synthetic databases, even for a relatively high level of parallelization (more than 50 processing units). In particular, Intel nodes show a good speedup at small precision (Figure 10), which shows the interest of managing multi-precision in order to adapt to the computing resources (memory) that are available.
Regarding the real astrophysics database, our experiments show that on Intel nodes, 2000 lines can be managed at 4-bit precision (Figure 11), and up to 3000 lines at 2-bit precision (Figure 12), while it is impossible to manage 3000 lines at 4-bit precision due to memory consumption limits. On SMP nodes, our experiments show excellent speedup and scaleup even over large databases, without memory explosion (Figure 13).

Figure 12. Execution time and speedup related to the number of threads on Intel nodes for the real database of 3000 lines at 2-bit precision, with minimum threshold values 0.17 and 0.18.

Conclusions
In this paper, we address the extraction of gradual patterns when considering fuzzy orderings. This makes it possible to deal with imperfection in the datasets, when values can hardly be crisply ordered. For instance, this situation often occurs when considering data collected from sensors. In this case, the measurement error leads to values that can be considered as being similar even if they are not equal. The extent to which they can be considered as similar is handled by fuzzy orderings and the fuzzy gamma rank correlation, which we propose to introduce in gradual pattern mining algorithms. We show that the parallelization of such algorithms is necessary to remain scalable regarding both memory consumption and runtime. Memory consumption is indeed challenging in our framework, as introducing fuzzy ranking prevents us from using a single bit to represent the fact that one value is greater than another. We thus introduce the notion of precision and propose an efficient storage of the fuzzy concordance degrees that can be tuned (from 2 to 52 bits) in order to manage the trade-off between memory consumption and the loss or gain of computing power.

Figure 1. Illustration of the binary matrix of concordant couples and of the computation of the support.

Figure 2. Illustration of the matrix of fuzzy concordance degrees represented with a precision of 2 and 3 bits.

Figure 3. Illustration of the real matrix of fuzzy concordance degrees.

Figure 4. Illustration of the binary matrix of fuzzy concordance degrees with a precision of three bits.

Figure 6. Execution time and speedup related to the number of threads on Intel nodes for the synthetic database of 150 attributes at precisions of 6 and 9 bits.

Figure 7. Execution time and speedup related to the number of threads on Intel nodes for the synthetic database of 200 attributes at a precision of 8 bits, with minimum threshold values 0.411, 0.412 and 0.413.

Figure 13. Execution time, speedup and scaleup related to the number of threads on SMP nodes for the real database of 3000 lines at a precision of 12 bits, with minimum threshold values 0.17 and 0.18.