A Summary of F-Transform Techniques in Data Analysis

Martino, Ferdinando Di; Perfilieva, Irina; Sessa, Salvatore

doi:10.3390/electronics10151771


by Ferdinando Di Martino ¹,²,*, Irina Perfilieva ³ and Salvatore Sessa ¹,²

¹ Dipartimento di Architettura, Università degli Studi di Napoli Federico II, Via Toledo 402, 80134 Napoli, Italy

² Di Ricerca “Alberto Calza Bini”, Università degli Studi di Napoli Federico II, Via Toledo 402, 80134 Napoli, Italy

³ Institute for Research and Applications of Fuzzy Modeling, 708 00 Ostrava, Czech Republic

* Author to whom correspondence should be addressed.

Electronics 2021, 10(15), 1771; https://doi.org/10.3390/electronics10151771

Submission received: 28 June 2021 / Revised: 18 July 2021 / Accepted: 22 July 2021 / Published: 24 July 2021

(This article belongs to the Special Issue Fuzzy Systems and Data Science)


Abstract: The fuzzy transform is a technique for approximating a function of one or more variables that researchers have applied in various image and data analysis tasks. In this work we present a summary of the fuzzy transform methods proposed in recent years in different data mining disciplines, such as the detection of relationships between features, the extraction of association rules, time series analysis and data classification. After defining the fuzzy transform in one or more dimensions, and exploring the constraint of sufficient data density with respect to the fuzzy partitions, we analyze the data analysis approaches based on the fuzzy transform that were recently proposed in the literature. In particular, we examine the strategies adopted in these approaches for managing the constraint of sufficient data density and the performance results obtained, compared with those measured by adopting other methods in the literature. The last section is dedicated to final considerations and future scenarios for using the fuzzy transform for the analysis of massive and high-dimensional data.

Keywords: direct F-transform; inverse F-transform; multi-dimensional F-transform; fuzzy partition; dependency between attributes; time series; data classification

1. Introduction

Fuzzy Transform (for short, F-transform) [1,2] is a recent soft computing approximation technique, successfully used in numerous applications in image and data analysis (see, e.g., [3] for an in-depth discussion on this matter).

In particular, the properties of the F-transform in information aggregation and function approximation favor its use in many data analysis and data mining problems.

The aim of this paper is to provide an in-depth overview of soft computing data analysis techniques based on the use of the F-transform proposed in the literature.

Some variations of the basic functions used to construct the F-transform are proposed in [4], where they are given by B-spline functions, and in [5], where they are given by block pulse functions.

Recently, an extension of the basic F-transform to a higher-degree F-transform was introduced in [6] by generalizing the case of constant (zero-order) components to the case of m-order polynomial components. In [7,8] the applicability of the m-order F-transform is discussed, and an application of the first-degree F-transform to seasonal time series forecasting is presented in [9]. However, although they improve the accuracy and precision of the results compared to basic F-transforms, higher-degree F-transforms are computationally more complex to manage, which makes them unsuitable for data analysis applications, especially for datasets of high cardinality and size.

In this work we focus on the application of the basic (zero-order) F-transform in data analysis. We will discuss the techniques proposed in the literature that employ the direct and inverse zero-order F-transform in data mining problems, such as dependencies between attributes, time series analysis and data classification, analyzing their critical points and performance benefits.

F-transform techniques were initially applied in image analysis, where the constraint of sufficient density described in Section 2 is always satisfied. In data analysis, however, applying the F-transform requires managing this constraint and choosing suitable fuzzy partitions of the domains of the input variables. The partitions cannot be too fine, so that sufficient data density is guaranteed, nor too coarse, so that high performance levels are preserved.

In Section 2 we introduce the one-dimensional and multi-dimensional F-transforms, providing a summary of their characteristics. In particular, we analyze the constraint of sufficient density of the data, which is of extreme importance in the use of F-transform techniques in data analysis. Section 3 discusses the methods proposed in the literature that apply the multi-dimensional F-transform to the analysis of dependencies between attributes in the data and to the detection of association rules. Section 4 focuses on the F-transform techniques applied in time series analysis. In Section 5 a classification method based on the multi-dimensional F-transform is discussed. Final considerations are contained in Section 6. A list with descriptions of all acronyms and abbreviations in the text is given in Appendix A.

2. Preliminaries

2.1. Basic Functions

Let X = [a,b] be a closed interval in R and {x₁, x₂, …, x_n} be a set of n fixed points in [a,b] such that n ≥ 3 and a = x₁ < x₂ <…< x_n = b.

In [1,2] the following definition of fuzzy partition of X was introduced: the fuzzy sets A₁, …, A_n: [a,b] → [0,1] form a (generalized) fuzzy partition of [a,b], if for each k = 2, …, n − 1, the following constraints hold:

1.: $A_{k} (x) = 0 \; \forall x \notin (x_{k - 1}, x_{k + 1})$ (locality)
2.: $A_{k} (x) > 0 \; \forall x \in (x_{k - 1}, x_{k + 1})$ and $A_{k} (x_{k}) = 1$ (positivity)
3.: $A_{k}$ is continuous in $[x_{k - 1}, x_{k + 1}]$ (continuity)
4.: $A_{k}$ is strictly increasing in $(x_{k - 1}, x_{k})$ and strictly decreasing in $(x_{k}, x_{k + 1})$
5.: $\sum_{k = 1}^{n} A_{k} (x) = 1 \; \forall x \in [a, b]$ (Ruspini condition).

The membership functions $\{A_{1}, \dots, A_{n}\}$ are called basic functions. If the nodes x₁, ..., x_n are equidistant, the fuzzy partition $\{A_{1}, \dots, A_{n}\}$ is called an h-uniform fuzzy partition of [a,b], where h = (b − a)/(n − 1) is the distance between two consecutive nodes.

For an h-uniform fuzzy partition the following additional properties hold:

1.: $A_{k} (x_{k} - x) = A_{k} (x_{k} + x) \; \forall x \in [0, h]$
2.: $A_{k} (x) = A_{k - 1} (x - h)$ and $A_{k - 1} (x) = A_{k} (x + h) \; \forall x \in [x_{k}, x_{k + 1}]$

An h-uniform fuzzy partition can be generated (see, e.g., [2]) by an even function A₀: [−1,1] → [0,1], which is continuous, positive in (−1,1) and null on the boundaries {−1,1}. The function A₀ is called the generating function of the h-uniform fuzzy partition. The following expression represents an arbitrary basic function from an h-uniform generalized fuzzy partition:

A_{k} (x) = \begin{cases} A_{0} \left(\frac{x - x_{k}}{h}\right) & x \in [x_{k} - h, x_{k} + h] \\ 0 & \text{otherwise} \end{cases}

(1)
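As an illustration of Formula (1), the following minimal Python sketch builds the basic functions of an h-uniform fuzzy partition from a generating function. The triangular generating function A₀(t) = 1 − |t| and the names `uniform_nodes`, `triangular_generator` and `basic_function` are assumptions made here for the example; with n equidistant nodes that include both endpoints, the spacing used is (b − a)/(n − 1).

```python
def uniform_nodes(a, b, n):
    """Return n equidistant nodes x_1 = a < ... < x_n = b (n >= 3) and their spacing h."""
    h = (b - a) / (n - 1)
    return [a + k * h for k in range(n)], h


def triangular_generator(t):
    """Even generating function A_0 on [-1, 1]: positive inside, zero at -1 and 1."""
    return 1.0 - abs(t)


def basic_function(k, x, nodes, h, a0=triangular_generator):
    """Value A_k(x) of the k-th basic function (0-based k), following Formula (1)."""
    t = (x - nodes[k]) / h
    return a0(t) if abs(t) <= 1.0 else 0.0


nodes, h = uniform_nodes(0.0, 1.0, 5)
memberships = [basic_function(k, 0.3, nodes, h) for k in range(len(nodes))]
# At most two neighbouring basic functions are positive and the degrees sum to 1.
print(memberships, sum(memberships))
```

With this choice of A₀ the Ruspini condition can be checked numerically at any point of [a,b]; other even generating functions (for example, cosine-shaped ones) can be passed through the `a0` argument.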

2.2. One-Dimensional Direct and Inverse F-Transform

Let {A₁, A₂, …, A_n} be a fuzzy partition of [a,b] and f(x) be a continuous function on [a,b]. The n-tuple $[F_{1}, F_{2}, \dots, F_{n}]$ with components:

F_{k} = \frac{\int_{a}^{b} f (x) A_{k} (x) \, d x}{\int_{a}^{b} A_{k} (x) \, d x}, \quad k = 1, \dots, n

(2)

is called the fuzzy transform of f with respect to {A₁, A₂, …, A_n}. The F_k are called components of the F-transform.

If the fuzzy partition {A₁, A₂, …, A_n} is uniform with nodes x₁, x₂, …, x_n, the components are given (cf. [2], Lemma 1) by the formula:

F_{k} = \begin{cases} \frac{2}{h} \int_{x_{1}}^{x_{2}} f (x) A_{k} (x) \, d x & \text{if } k = 1 \\ \frac{1}{h} \int_{x_{k - 1}}^{x_{k + 1}} f (x) A_{k} (x) \, d x & \text{if } k = 2, \dots, n - 1 \\ \frac{2}{h} \int_{x_{n - 1}}^{x_{n}} f (x) A_{k} (x) \, d x & \text{if } k = n \end{cases}

(3)

Now we define the following function on [a,b], given by a combination of the basic functions weighted by the F-transform components:

f_{F, n} (x) = \sum_{k = 1}^{n} F_{k} A_{k} (x), \quad x \in [a, b]

(4)

It is called the inverse F-transform of f with respect to the uniform fuzzy partition {A₁, A₂, …, A_n}. An important theorem proves that the function f_F,n approximates the continuous function f on [a,b] with arbitrary precision. We state this theorem below; its proof is given in [2], Theorem 2.

Theorem 1. Let f(x) be a continuous function on [a,b]. Then, for every ε > 0, there exist an integer n(ε) and a related fuzzy partition {A₁, A₂, …, A_n(ε)} of [a,b] such that, for all x ∊ [a, b], $|f (x) - f_{F, n (ε)} (x)| < ε$.

Theorem 1 concerns the approximation of a known continuous function f, but in many cases we only know that the function f assumes determined values in a set of m points p₁, …, p_m ∊ [a,b].

We assume that the set P of these nodes is sufficiently dense with respect to the fixed fuzzy partition, i.e., for each k = 1, …, n there exists an index j ∊ {1, …, m} such that A_k(p_j) > 0. Then we can define the n-tuple [F₁, F₂, …, F_n] as the discrete F-transform of f with respect to {A₁, A₂, …, A_n }, where each F_k is given by

F_{k} = \frac{\sum_{j = 1}^{m} f (p_{j}) A_{k} (p_{j})}{\sum_{j = 1}^{m} A_{k} (p_{j})} k = 1, \dots, n

(5)

Then we define the discrete inverse F-transform of f with respect to {A₁, A₂, …, A_n} as the following function, defined at the same points p₁, ..., p_m of [a,b]:

f_{F, n} (p_{j}) = \sum_{k = 1}^{n} F_{k} A_{k} (p_{j}), \quad j = 1, \dots, m

(6)

Analogously to Theorem 1, we have the following approximation theorem (its proof is given in [2], Theorem 5).

Theorem 2. Let f(x) be a function assigned on a set P of points p₁, ..., p_m of [a,b]. Then, for every ε > 0, there exist an integer n(ε) and a related fuzzy partition {A₁, A₂, …, A_n(ε)} of [a,b] such that P is sufficiently dense with respect to {A₁, A₂, …, A_n(ε)} and, for every p_j ∊ [a, b], j = 1, …, m, $|f (p_{j}) - f_{F, n (ε)} (p_{j})| < ε$.

Theorem 2 states that the discrete inverse F-transform (6) approximates the original function f at each data point with arbitrary precision.
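The discrete direct and inverse F-transforms (5) and (6) can be sketched in a few lines of Python. The example below is only illustrative: it assumes triangular basic functions on a uniform partition, samples sin(2πx) at 101 points as hypothetical data, and uses helper names (`triangular_partition`, `direct_ft`, `inverse_ft`, `max_error`) that do not come from the cited papers.

```python
import math


def triangular_partition(a, b, n):
    """Return n triangular basic functions of a uniform fuzzy partition of [a, b]."""
    h = (b - a) / (n - 1)
    nodes = [a + k * h for k in range(n)]
    return [lambda x, c=c: max(0.0, 1.0 - abs(x - c) / h) for c in nodes]


def direct_ft(points, values, basics):
    """Discrete direct F-transform, Formula (5); fails if P is not dense enough."""
    components = []
    for a_k in basics:
        weights = [a_k(p) for p in points]
        total = sum(weights)
        if total <= 0.0:
            raise ValueError("data not sufficiently dense for this partition")
        components.append(sum(w * v for w, v in zip(weights, values)) / total)
    return components


def inverse_ft(x, components, basics):
    """Discrete inverse F-transform, Formula (6), evaluated at x."""
    return sum(f_k * a_k(x) for f_k, a_k in zip(components, basics))


# Hypothetical data: 101 samples of sin(2*pi*x) on [0, 1].
points = [i / 100 for i in range(101)]
values = [math.sin(2 * math.pi * p) for p in points]


def max_error(n):
    """Largest absolute error of the inverse F-transform over the data points."""
    basics = triangular_partition(0.0, 1.0, n)
    comps = direct_ft(points, values, basics)
    return max(abs(v - inverse_ft(p, comps, basics)) for p, v in zip(points, values))


# A finer partition (still sufficiently dense) gives a smaller error on the data.
print(max_error(5), max_error(21))
```

The comparison of the two errors is in the spirit of Theorem 2: refining the partition improves the approximation as long as the data remain sufficiently dense with respect to it.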

2.3. Multi-Dimensional Direct and Inverse F-Transform

The one-dimensional F-transform can be extended to approximate continuous functions defined in an s-dimensional domain given by the Cartesian product [a₁,b₁] × [a₂,b₂] ×… × [a_s,b_s] of s real intervals [a_i,b_i] ⊆ R (i = 1, …, s).

Let f: [a₁,b₁] × [a₂,b₂] ×… × [a_s,b_s] → R be a continuous function on the universe of discourse. Let $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, $\{A_{21}, A_{22}, \dots, A_{2 n_{2}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$ be uniform fuzzy partitions of [a₁,b₁], …, [a_s,b_s], respectively.

The F-transform of the function f with respect to $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, $\{A_{21}, A_{22}, \dots, A_{2 n_{2}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$ is given by the components

F_{k_{1} k_{2} \dots k_{s}} = \frac{\int_{a_{s}}^{b_{s}} \dots \int_{a_{2}}^{b_{2}} \int_{a_{1}}^{b_{1}} f (x_{1}, x_{2}, \dots, x_{s}) A_{1 k_{1}} (x_{1}) A_{2 k_{2}} (x_{2}) \dots A_{s k_{s}} (x_{s}) \, d x_{1} d x_{2} \dots d x_{s}}{\int_{a_{s}}^{b_{s}} \dots \int_{a_{2}}^{b_{2}} \int_{a_{1}}^{b_{1}} A_{1 k_{1}} (x_{1}) A_{2 k_{2}} (x_{2}) \dots A_{s k_{s}} (x_{s}) \, d x_{1} d x_{2} \dots d x_{s}}

(7)

The inverse F-transform of the function f with respect to $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, $\{A_{21}, A_{22}, \dots, A_{2 n_{2}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$ is the following function defined on [a₁,b₁] × [a₂,b₂] ×…× [a_s,b_s]:

f_{n_{1} n_{2} \dots n_{s}}^{F} (x_{1}, x_{2}, \dots, x_{s}) = \sum_{k_{1} = 1}^{n_{1}} \sum_{k_{2} = 1}^{n_{2}} \dots \sum_{k_{s} = 1}^{n_{s}} F_{k_{1} k_{2} \dots k_{s}} A_{1 k_{1}} (x_{1}) A_{2 k_{2}} (x_{2}) \dots A_{s k_{s}} (x_{s})

(8)

Let the function f(x₁, x₂, …, x_s) be known at N points p_j = (p_j₁, p_j₂, …, p_js) ∊ [a₁,b₁] × [a₂,b₂] ×…× [a_s,b_s], j = 1, 2, …, N.

The set P = {(p₁₁, p₁₂, …, p_1s), (p₂₁, p₂₂, …, p_2s), …, (p_N₁, p_N₂, …, p_Ns)} is called sufficiently dense with respect to the partitions $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$ if, for any combination (h₁, …, h_s) ∊ {1, …, n₁} ×…× {1, …, n_s}, there is some $p_{v} = (p_{v 1}, p_{v 2}, \dots, p_{v s})$ ∊ P, v ∊ {1, …, N}, such that $A_{1 h_{1}} (p_{v 1}) \cdot A_{2 h_{2}} (p_{v 2}) \cdot \dots \cdot A_{s h_{s}} (p_{v s}) > 0$. We can then define the (h₁, h₂, …, h_s)th component $F_{h_{1} h_{2} \dots h_{s}}$ of the direct F-transform of f with respect to the basic functions $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$ as

F_{h_{1} h_{2} \dots h_{s}} = \frac{\sum_{j = 1}^{N} f (p_{j 1}, p_{j 2}, \dots, p_{j s}) \cdot A_{1 h_{1}} (p_{j 1}) \cdot A_{2 h_{2}} (p_{j 2}) \cdot \dots \cdot A_{s h_{s}} (p_{j s})}{\sum_{j = 1}^{N} A_{1 h_{1}} (p_{j 1}) \cdot A_{2 h_{2}} (p_{j 2}) \cdot \dots \cdot A_{s h_{s}} (p_{j s})}

(9)

If the set P is sufficiently dense with respect to the fuzzy partitions, we can define the inverse multi-dimensional F-transform of f with respect to the basic functions $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, $\{A_{21}, A_{22}, \dots, A_{2 n_{2}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$ by setting, for each point p_j = (p_j₁, p_j₂, …, p_js) ∊ [a₁,b₁] × …× [a_s,b_s]:

f_{n_{1} n_{2} \dots n_{s}}^{F} (p_{j 1}, p_{j 2}, \dots, p_{j s}) = \sum_{h_{1} = 1}^{n_{1}} \sum_{h_{2} = 1}^{n_{2}} \dots \sum_{h_{s} = 1}^{n_{s}} F_{h_{1} h_{2} \dots h_{s}} \cdot A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{s h_{s}} (p_{j s})

(10)

for j = 1, …, N. The following theorem, which is an extension of Theorem 2, holds:

Theorem 3. Let $f (x_{1}, \dots, x_{s})$ be a function assigned on the set of points P = {(p₁₁, p₁₂, …, p_1s), (p₂₁, p₂₂, …, p_2s), …, (p_m1, p_m2, …, p_ms)} $\subset$ [a₁,b₁] × [a₂,b₂] × …× [a_s,b_s] and assuming values in [0,1]. Then, for every ε > 0, there exist s integers n₁(ε), …, n_s(ε) and related fuzzy partitions $\{A_{11}, A_{12}, \dots, A_{1 n_{1} (ε)}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s} (ε)}\}$ such that the set P is sufficiently dense with respect to these fuzzy partitions. Moreover, for every p_j = (p_j1, p_j2, …, p_js) $\in$ P, j = 1, …, m, the following inequality holds:

|f (p_{j 1}, p_{j 2}, \dots, p_{j s}) - f_{n_{1} (ε) n_{2} (ε) \dots n_{s} (ε)}^{F} (p_{j 1}, p_{j 2}, \dots, p_{j s})| < ε

(11)

The inverse multi-dimensional F-transform $f_{n_{1} n_{2} \dots n_{s}}^{F}$ can be used in regression analysis only if the input dataset is sufficiently dense with respect to the set of fuzzy partitions $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$, $\{A_{21}, A_{22}, \dots, A_{2 n_{2}}\}$, …, $\{A_{s 1}, A_{s 2}, \dots, A_{s n_{s}}\}$.

Figure 1 shows an example of data points that are not sufficiently dense with respect to the fuzzy partitions. Let $\{A_{11}, A_{12}, \dots, A_{1 n_{1}}\}$ be a fuzzy partition of the domain [a₁,b₁] and $\{A_{21}, A_{22}, \dots, A_{2 n_{2}}\}$ be a fuzzy partition of the domain [a₂,b₂]. The data points are shown in red. No data points are located within the subset $[a_{1 h - 1}, \dots, a_{1 h + 1}] \times [a_{2 k - 1}, \dots, a_{2 k + 1}]$, corresponding to the dark yellow area in Figure 1. Consequently, for each data point p_j = (p_j₁, p_j₂), j = 1, …, N, we have $A_{1 h} (p_{j 1}) \cdot A_{2 k} (p_{j 2}) = 0$, because at least one of the two factors is null. Hence, the data are not sufficiently dense with respect to this set of two fuzzy partitions.

Figure 2 shows two examples of fuzzy partitions that are coarser than the fuzzy partitions shown in Figure 1. In both cases the data points are sufficiently dense with respect to the fuzzy partitions.

It is necessary to set the size of the fuzzy partitions properly. Fuzzy partitions that are too fine can make the data points not sufficiently dense with respect to them; conversely, fuzzy partitions that are too coarse, while guaranteeing the sufficient density of the data points, can significantly reduce the performance of the regression analysis methods in which the inverse multi-dimensional F-transform is used as the regression function.
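The sufficient density condition can be checked directly from its definition by testing every combination of basic functions. The sketch below is a brute-force illustration under assumptions of our own: triangular basic functions, a hypothetical 5 × 5 grid of two-dimensional data points, and the helper names `triangular_partition` and `is_sufficiently_dense`.

```python
from itertools import product


def triangular_partition(a, b, n):
    """Return n triangular basic functions of a uniform fuzzy partition of [a, b]."""
    h = (b - a) / (n - 1)
    return [lambda x, c=a + k * h: max(0.0, 1.0 - abs(x - c) / h) for k in range(n)]


def is_sufficiently_dense(points, partitions):
    """True if every combination (h_1, ..., h_s) has a point with a positive product."""
    for combo in product(*partitions):
        covered = any(
            all(a_ih(p[i]) > 0.0 for i, a_ih in enumerate(combo)) for p in points
        )
        if not covered:
            return False
    return True


# Hypothetical 2-D data on a 5 x 5 grid of the unit square.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
points = [(x, y) for x in grid for y in grid]

coarse = [triangular_partition(0.0, 1.0, 3)] * 2   # 3 fuzzy sets per axis
fine = [triangular_partition(0.0, 1.0, 9)] * 2     # 9 fuzzy sets per axis

# The coarse partitions are covered by the data, the fine ones are not.
print(is_sufficiently_dense(points, coarse), is_sufficiently_dense(points, fine))
```

The number of combinations grows as n₁ · n₂ · … · n_s, so this exhaustive check is only practical for a small number of attributes; it is meant to make the definition concrete, not to suggest an efficient implementation.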

3. Multi-Dimensional F-Transform Methods to Explore Dependency and Rules in the Data

3.1. Multi-Dimensional F-Transform Techniques to Detect Dependency between Attributes in Datasets

The multi-dimensional F-transform was applied by many researchers to detect dependency among numerical features in datasets.

In [10,11] the multi-dimensional discrete F-transform is applied to find dependency between attributes in the data.

Following [10,11] a dataset with r features can be schematized as a relation with r attributes and m instances as in Table 1.

In Table 1, X₁, …, X_i, …, X_r are the attributes and O₁, …, O_j, …, O_m (m > r) are the objects in the dataset; each object O_j is given by an r-dimensional data point (p_j₁, …, p_ji, …, p_jr), where p_ji is the value assumed by O_j for the attribute X_i.

The attribute X_i is a variable assuming values in the real interval [a_i,b_i] defined by setting a_i = min{p_1i, …, p_mi} and b_i = max{p_1i, …, p_mi}.

In [10,11] the dependency among attributes is studied in the form:

X_{z} = H (X_{1}, \dots, X_{k})

(12)

where H: [a₁,b₁] × [a₂,b₂] ×…× [a_k,b_k] → [a_z,b_z] is a continuous function of k variables.

In [10] the multi-dimensional inverse F-transform was applied as a regression function to assess the functional dependency (12). The given function H(X₁, …, X_k) is known in m points P_j = (p_j₁, p_j₂, …, p_jk), j = 1, …, m, by setting H(p_j₁, p_j₂, …, p_jk) = p_jz for j = 1, 2, …, m.

For any interval [a_i,b_i], i = 1, …, k, a fuzzy partition $\{A_{i 1}, A_{i 2}, \dots, A_{i n_{i}}\}$ is created with n_i ≥ 3. If the set of m points is sufficiently dense with respect to these fuzzy partitions, we can define the multi-dimensional direct F-transform of H with (h₁, h₂, …, h_k)th components given by

F_{h_{1} h_{2} \dots h_{k}} = \frac{\sum_{j = 1}^{m} p_{j z} \cdot A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})}{\sum_{j = 1}^{m} A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})}

(13)

Using Formula (10), the inverse F-transform $H_{n_{1} n_{2} \dots n_{k}}^{F}$ of H in the point P_j is given by

H_{n_{1} n_{2} \dots n_{k}}^{F} (p_{j 1}, p_{j 2}, \dots, p_{j k}) = \sum_{h_{1} = 1}^{n_{1}} \sum_{h_{2} = 1}^{n_{2}} \dots \sum_{h_{k} = 1}^{n_{k}} F_{h_{1} h_{2} \dots h_{k}} \cdot A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})

(14)

In [10] a measure of the dependency of X_z on X₁, …, X_k evaluated by (14) is given by the statistical index of determinacy:

r_{c}^{2} = \frac{\sum_{j = 1}^{m} {(H_{n_{1} n_{2} \dots n_{k}}^{F} (p_{j 1}, p_{j 2}, \dots, p_{j k}) - {\hat{p}}_{z})}^{2}}{\sum_{j = 1}^{m} {(p_{j z} - {\hat{p}}_{z})}^{2}}

(15)

where ${\hat{p}}_{z}$ is the average of the values p_1z, p_2z, …, p_mz of the attribute X_z.

The index of determinacy $r_{c}^{2}$ ranges in the interval [0,1], where $r_{c}^{2} = 0$ means that $H_{n_{1} n_{2} \dots n_{k}}^{F}$ does not fit the data and, conversely, $r_{c}^{2} = 1$ means that $H_{n_{1} n_{2} \dots n_{k}}^{F}$ fits the data perfectly.

A variation of Formula (15), used in multiple regression analysis to take into account the number of independent variables k and the size m of the data sample, is given by (Johnson and Wichern, 1998):

{r'}_{c}^{2} = 1 - \left[(1 - r_{c}^{2}) \cdot \frac{m - 1}{m - k - 1}\right]

(16)

The function H in the point (x₁, x₂, …, x_k) is approximated by the following formula:

H_{n_{1} n_{2} \dots n_{k}}^{F} (x_{1}, x_{2}, \dots, x_{k}) = \sum_{h_{1} = 1}^{n_{1}} \sum_{h_{2} = 1}^{n_{2}} \dots \sum_{h_{k} = 1}^{n_{k}} F_{h_{1} h_{2} \dots h_{k}} \cdot A_{1 h_{1}} (x_{1}) \cdot \dots \cdot A_{k h_{k}} (x_{k})

(17)

In [10] the inverse multi-dimensional F-transform is applied to find dependency among attributes in the dataset containing economic data measured in the Czech Republic in quarters starting from 1997. The two indices of determinacy (15) and (16) are used to evaluate the existence of such dependency.
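The two indices (15) and (16) are simple to compute once the fitted values are available. The following sketch uses small hypothetical lists of observed and fitted values; the function names are chosen here for illustration only.

```python
def determinacy_index(observed, fitted):
    """Index of determinacy (15): variation of the fitted values over total variation."""
    mean = sum(observed) / len(observed)
    explained = sum((f - mean) ** 2 for f in fitted)
    total = sum((p - mean) ** 2 for p in observed)
    return explained / total


def adjusted_determinacy_index(r2, m, k):
    """Adjusted index (16) for m data points and k independent variables."""
    return 1.0 - (1.0 - r2) * (m - 1) / (m - k - 1)


observed = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical values p_jz
fitted = [1.5, 2.0, 3.0, 4.0, 4.5]     # hypothetical inverse F-transform values
r2 = determinacy_index(observed, fitted)
r2_adj = adjusted_determinacy_index(r2, m=len(observed), k=1)
print(r2, r2_adj)
```

For a fixed r²_c the adjusted index decreases as k grows, which penalizes models that use more input attributes on the same number of data points.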

The results obtained show that the inverse multi-dimensional F-transform performs well when used as a regression function for the analysis of the dependency between numerical attributes in the datasets. However, it is necessary to determine the optimal fuzzy partitions of the domains of the input attributes and to detect when the data points are not sufficiently dense. In [11] an algorithm has been proposed that finds the optimal fuzzy partitions and checks that the data points are sufficiently dense with respect to the fuzzy partitions. This algorithm is schematized in Figure 3.

To reduce the computational costs, the same number n of fuzzy sets is assigned to each of the fuzzy partitions of the input attribute domains. Initially the minimum value n = 3 is set; in each cycle the algorithm checks that the data points are sufficiently dense with respect to the fuzzy partitions and, successively, calculates the direct multi-dimensional F-transform and, for each data point, the inverse multi-dimensional F-transform, finally measuring the value of the index of determinacy. If this value exceeds a predetermined α threshold, the algorithm ends by returning the components of the direct F-transform, otherwise a successive iteration is performed in which the number n is increased by one unit. If, during an iteration, the data points are not sufficiently dense with respect to the fuzzy partition, the algorithm terminates by reporting that it has not found the dependency of X_z on the attributes X₁, ..., X_k.
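The iterative procedure described above can be sketched as follows. This is a simplified reading of the description, not the implementation of [11]: it assumes triangular basic functions, uses the index (15) as stopping criterion, adds an upper bound `n_max` as a safety guard, and runs on hypothetical one-attribute data. All function names are illustrative.

```python
from itertools import product
from math import prod


def partition(a, b, n):
    """Triangular basic functions of a uniform fuzzy partition of [a, b]."""
    h = (b - a) / (n - 1)
    return [lambda x, c=a + i * h: max(0.0, 1.0 - abs(x - c) / h) for i in range(n)]


def direct_ft(xs, zs, parts):
    """Components (13) keyed by index tuple; None if the data are not dense enough."""
    comps = {}
    for idx in product(*(range(len(p)) for p in parts)):
        w = [prod(parts[i][h](x[i]) for i, h in enumerate(idx)) for x in xs]
        if sum(w) <= 0.0:
            return None
        comps[idx] = sum(wj * z for wj, z in zip(w, zs)) / sum(w)
    return comps


def inverse_ft(x, comps, parts):
    """Inverse F-transform (14)/(17) at the point x."""
    return sum(f * prod(parts[i][h](x[i]) for i, h in enumerate(idx))
               for idx, f in comps.items())


def find_dependency(xs, zs, alpha, n_max=50):
    """Increase n from 3 until r^2 exceeds alpha or the density check fails."""
    k = len(xs[0])
    bounds = [(min(x[i] for x in xs), max(x[i] for x in xs)) for i in range(k)]
    mean = sum(zs) / len(zs)
    total = sum((z - mean) ** 2 for z in zs)
    for n in range(3, n_max + 1):
        parts = [partition(a, b, n) for a, b in bounds]
        comps = direct_ft(xs, zs, parts)
        if comps is None:
            return None                       # dependency not found
        fitted = [inverse_ft(x, comps, parts) for x in xs]
        r2 = sum((f - mean) ** 2 for f in fitted) / total
        if r2 > alpha:
            return n, r2, comps
    return None                               # guard reached (assumption of this sketch)


# Hypothetical data: z depends quadratically on a single attribute.
xs = [(i / 50,) for i in range(51)]
zs = [x[0] ** 2 for x in xs]
result = find_dependency(xs, zs, alpha=0.9)
print(None if result is None else result[:2])
```

Using the same n for every attribute keeps the search one-dimensional, at the price of n^k components per iteration; this is the trade-off that motivates the treatment of massive datasets discussed next.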

In [11] this algorithm is executed to explore the dependency between oceanographic and surface meteorological attributes of a dataset containing data measured by a series of buoys positioned throughout the equatorial Pacific Ocean and used to analyze the El Niño/Southern Oscillation (ENSO) cycles.

The application of the multi-dimensional F-transform as a machine learning regression function can become expensive for massive datasets, in which the number of data points and the number of features are high. In a recent work [12] an extension of the algorithm proposed in [11], called MFAD (Massive F-transform Attribute Dependency), is presented; it aims to find dependencies between numerical attributes in massive datasets. MFAD applies a uniform sampling algorithm to partition the dataset into subsets having the same cardinality. The F-transform attribute dependency algorithm [11] is executed on each subset, returning the multi-dimensional direct F-transform components (13) and the index of determinacy (16). Let F_q be the direct F-transform vector obtained by applying the F-transform attribute dependency algorithm to the qth subset, where q = 1, …, s.

The functional dependency of X_z on X₁, X₂, …, X_k in the form X_z = H(X₁, X₂, …, X_k) at a point (x₁, x₂, …, x_k) is evaluated by computing the following weighted average:

H^{F} (x_{1}, x_{2}, \dots, x_{k}) = \frac{\sum_{q = 1}^{s} w_{q} (x_{1}, x_{2}, \dots, x_{k}) \cdot H_{n_{q}}^{F} (x_{1}, x_{2}, \dots, x_{k})}{\sum_{q = 1}^{s} w_{q} (x_{1}, x_{2}, \dots, x_{k})}

(18)

where $H_{n_{q}}^{F} (x_{1}, x_{2}, \dots, x_{k})$, q = 1, …, s, is the value of the inverse multi-dimensional F-transform at the point (x₁, x₂, …, x_k) obtained by (17) using the qth direct F-transform F_q, and the weight $w_{q} (x_{1}, x_{2}, \dots, x_{k})$, q = 1, …, s, is given by the formula:

w_{q} (x_{1}, x_{2}, \dots, x_{k}) = \begin{cases} r_{c q}^{2} & \text{if } (x_{1}, x_{2}, \dots, x_{k}) \in D_{q} \\ 0 & \text{otherwise} \end{cases}

(19)

The closed set D_q = [a_q1,b_q1] × [a_q2,b_q2] ×…× [a_qk,b_qk] is the domain in which the qth subset is defined, and $r_{c q}^{2}$ is the index of determinacy obtained by executing the F-transform attribute dependency algorithm on the qth subset.

The greater the index of determinacy $r_{c q}^{2}$, the greater the weight of the inverse multi-dimensional F-transform $H_{n_{q}}^{F} (x_{1}, x_{2}, \dots, x_{k})$ in the approximation of the function H at the point (x₁, x₂, …, x_k). The weight (19) is null if the point (x₁, x₂, …, x_k) is outside the domain D_q.
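The weighted average (18) with the weights (19) can be illustrated with a short sketch. The per-subset models are represented here as plain dictionaries with hypothetical keys (`domain`, `r2`, `predict`), and the two toy models are invented for the example; the handling of points that lie outside every domain D_q is also an assumption of this sketch.

```python
def mfad_predict(x, models):
    """Weighted average (18) of the per-subset inverse F-transforms at the point x.

    Each model is a dict with hypothetical keys:
      'domain':  list of (a_qi, b_qi) bounds of the closed set D_q,
      'r2':      index of determinacy obtained on the q-th subset,
      'predict': callable returning the q-th inverse F-transform at x.
    """
    num = 0.0
    den = 0.0
    for model in models:
        inside = all(a <= xi <= b for xi, (a, b) in zip(x, model["domain"]))
        weight = model["r2"] if inside else 0.0      # weight (19)
        if weight > 0.0:
            num += weight * model["predict"](x)
            den += weight
    if den == 0.0:
        raise ValueError("x lies outside every subset domain")
    return num / den


# Two hypothetical one-attribute models with overlapping domains.
models = [
    {"domain": [(0.0, 1.0)], "r2": 0.9, "predict": lambda x: 2.0 * x[0]},
    {"domain": [(0.5, 2.0)], "r2": 0.6, "predict": lambda x: 2.0 * x[0] + 0.5},
]
inside_both = mfad_predict((0.75,), models)   # both models contribute
inside_one = mfad_predict((1.5,), models)     # only the second model contributes
print(inside_both, inside_one)
```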

Figure 4 gives a schematic view of the MFAD method. Each subset is treated separately by applying the F-transform attribute dependency algorithm. The regression function is the weighted average of the individual inverse F-transforms, where the weights are the values of the index of determinacy obtained for each subset.

To test the MFAD algorithm, in [12] it was applied to a large dataset given by the Italian National Statistical Institute census database, with 140 numerical features related to census characteristics measured for all 402,678 Italian census tracts. In their tests the authors execute the MFAD algorithm by varying the number s of subsets and compare the results with those obtained by applying the classical F-transform attribute dependency algorithm [11] to the entire dataset (s = 1). Table 2 shows the final index of determinacy obtained by applying MFAD to explore the dependency of X_z = Families in owned residences on the attribute X₁ = Resident population with job or capital income, setting a threshold α = 0.8.

Table 2 reports this index for different values of the parameter s. The value of the index of determinacy obtained by running the attribute dependency algorithm on the entire dataset (s = 1) is 0.881. All the values of the index of determinacy obtained by applying MFAD with different numbers of subsets (from s = 8 to s = 40) are comparable with this value.

The results of tests performed in [12] on large datasets show that the performances are comparable with the ones obtained using the well-known Support Vector Regression (SVM) and Multilayer Perceptron (MLP) regression methods.

3.2. Multi-Dimensional F-Transform Techniques for Mining Association Rules

In [10] a method based on the multi-dimensional F-transform for mining association rules in the data is proposed. The inverse multi-dimensional F-transform (14), applied to find a dependency of the attribute X_z on the attributes X₁, …, X_k in the form X_z = H(X₁, …, X_k), can be used to mine association rules.

However, unlike the functions describing dependency between attributes, mining associations are fuzzy functions that establish a correspondence between universes of fuzzy sets.

Let U₁, …, U_k be the domains of k attributes partitioned by fuzzy sets: a mining association functionally joins some fuzzy sets from partitions of U₁ … U_k with fuzzy sets over respective F-transform components.

Let $\{A_{i 1}, \dots, A_{i h_{i}}, \dots, A_{i n_{i}}\}$ be a uniform fuzzy partition of the domain of the ith attribute X_i constructed as basic functions of this domain. The fuzzy partition is obtained on the n_i nodes x_i₁, …, $x_{i n_{i}}$ in the domain U_i.

Each association is supported by two parameters, namely the degree of support r and the degree of confidence γ, defined below. In [10] the multi-dimensional F-transform is applied in order to discover association rules of the following form:

(X_{1} \text{ is } A_{1 h_{1}}) \text{ AND } (X_{2} \text{ is } A_{2 h_{2}}) \text{ AND } \dots \text{ AND } (X_{k} \text{ is } A_{k h_{k}}) \sim^{F} \text{mean} (X_{z}) \text{ is } C

(20)

where $A_{i h_{i}}$, i = 1, …, k, models the meaning of the linguistic expression “approximately $x_{h_{i}}$”. The corresponding logic clause can be read as “X_i is approximately $x_{h_{i}}$”.

The label C in the consequent is one of the following linguistic expressions characterizing the (h₁, …, h_k)th component of the F-transform: Sm (small), Me (medium), Bi (big); it is possibly combined with one of the following linguistic hedges: Ex (extremely), Si (significantly), Ve (very), empty hedge, ML (more or less), Ro (roughly), QR (quite roughly), VR (very roughly). Let O_j, j = 1, 2, …, m, be the jth data point with components (p_j₁, p_j₂, …, p_jk, p_jz).

To measure the strength of the fuzzy rule (20), in [10] a membership function of an induced fuzzy set on the set of m data points {O₁, …, O_m} is defined by considering the antecedent of the hth rule (20):

A_{h} (O_{j}) = A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})

(21)

where $A_{i h_{i}} (p_{j i})$ is the membership degree to the fuzzy set $A_{i h_{i}}$ of the ith attribute in the jth data point. The following value

r = \frac{\mathrm{card} \{O_{j} \mid A_{h} (O_{j}) > 0\}}{m}

(22)

is called the degree of support of the association rule (20). If $F_{h_{1} h_{2} \dots h_{k}}$ is the (h₁, …, h_k)th component of the direct F-transform (13) and

f_{n_{1} n_{2} \dots n_{k}}^{F} (O_{j}) = \sum_{h_{1} = 1}^{n_{1}} \sum_{h_{2} = 1}^{n_{2}} \dots \sum_{h_{k} = 1}^{n_{k}} F_{h_{1} h_{2} \dots h_{k}} \cdot A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})

(23)

is the inverse F-transform at the point (p_j₁, …, p_jk), then in (Perfilieva et al., 2008) the degree of confidence of the association rule (20) is defined as

γ = \sqrt{\frac{\sum_{j = 1}^{m} {(f_{n_{1} n_{2} \dots n_{k}}^{F} (O_{j}) - F_{h_{1} h_{2} \dots h_{k}})}^{2} \cdot A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})}{\sum_{j = 1}^{m} {(p_{j z} - F_{h_{1} h_{2} \dots h_{k}})}^{2} \cdot A_{1 h_{1}} (p_{j 1}) \cdot \dots \cdot A_{k h_{k}} (p_{j k})}}

(24)

The strength of the hth association rule is evaluated by measuring the degree of support r and the degree of confidence γ. If both parameters are greater than or equal to a degree of support threshold and a degree of confidence threshold, respectively, the association is found.
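The degree of support (22) and the degree of confidence (24) of a single rule can be computed as in the sketch below. The antecedent degrees, target values, fitted values and the two thresholds are hypothetical numbers chosen for the example, and the function names are illustrative.

```python
from math import sqrt


def support(antecedent):
    """Degree of support (22): share of data points with A_h(O_j) > 0."""
    return sum(1 for a in antecedent if a > 0.0) / len(antecedent)


def confidence(antecedent, fitted, targets, component):
    """Degree of confidence (24) of the rule tied to one F-transform component."""
    num = sum((f - component) ** 2 * a for f, a in zip(fitted, antecedent))
    den = sum((p - component) ** 2 * a for p, a in zip(targets, antecedent))
    return sqrt(num / den)


# Hypothetical values for four data points and one rule h.
antecedent = [0.0, 0.5, 1.0, 0.5]   # A_h(O_j), Formula (21)
targets = [3.0, 1.0, 2.0, 3.0]      # p_jz
fitted = [2.8, 1.5, 2.0, 2.5]       # inverse F-transform (23) at each O_j
# Component (13) of the direct F-transform for this rule.
component = sum(a * p for a, p in zip(antecedent, targets)) / sum(antecedent)

r = support(antecedent)
gamma = confidence(antecedent, fitted, targets, component)
# Hypothetical thresholds for accepting the association.
rule_found = r >= 0.5 and gamma >= 0.4
print(r, gamma, rule_found)
```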

In [10] this method is tested on a dataset of measures of air pollution produced on a road related to traffic volumes and weather conditions, collected by the Norwegian Public Roads Administration.

4. F-Transform Techniques for Time Series Analysis

Time series forecasting involves methods for fitting over historical data referring to measures of an observable series and using them to predict future observations.

A time series is given by a set of data measured at different times listed in time order. Let y be a measured parameter and y(t) the measure performed at the time t. A time series is a function y: t ∈ N → y(t)∈ R known in n regular time steps y(1), y(2), …, y(n), where y(i), i = 1, 2, …, n, is the measured value of y at the ith time step.

Time series forecasting techniques assess the value of y in the m future time steps y(n + 1), ..., y(n + m), where the value y(t + 1) at the step t + 1 is evaluated as a function of the previous p + 1 measured values y(t), y(t − 1), ..., y(t − p). Let y(t), t = 1, 2, …, T, be a time series. It can be decomposed into the following two terms:

y (t) = f (t) + r (t)

(25)

The term f(t) is a deterministic part, called the trend; the term r(t) is an additional random function, called the residuals, giving the random error with respect to the trend at the time t. A general model of a stationary time series y(t) as a linear function of the p previous measured values y(t − 1), ..., y(t − p) is the autoregressive model of order p, AR(p), given by ([13,14]):

y (t) = α_{1} y (t - 1) + \dots + α_{p} y (t - p) + ε_{t}

(26)

The p coefficients α₁, …, α_p must satisfy some constraints and the term ε_t is the statistical white noise giving the fluctuations in the observations that cannot be explained by the model.

4.1. One-Dimensional F-Transform Time Series Models

In [15,16] the one-dimensional F-transform is applied to approximate the trend f(t) in (25). Let {y(t), t = 1, 2…, T} be a time series given by a set of data y(t) measured in T regular time intervals. Let {t₁ = 1, t₂, …, t_n = T} be a set of n nodes of the interval [1,T], where 3 ≤ n ≤ T, and {A₁, ..., A_n} be the basic functions of a uniform fuzzy partition of the interval [1,T].

If the dataset given by the time series {y(t), t = 1, 2…, T} is sufficiently dense with respect to this fuzzy partition, then there exists the direct one-dimensional F-transform of f with components

F_{k} = \frac{\sum_{t = 1}^{T} y (t) A_{k} (t)}{\sum_{t = 1}^{T} A_{k} (t)} \quad k = 1, 2, \dots, n

(27)

Let P_k, k = 1, ..., n, be the subset of {1, 2, …, T} formed by the time steps t at which A_k(t) > 0, that is,

P_{k} = \{t \leq T |A_{k} (t) > 0\}

(28)

We can decompose y(t) as:

y (t) = \bigvee_{k = 1}^{n} (F_{k} + r_{t k})

(29)

where r_tk is the kth residual of y_t with respect to A_k given by

r_{t k} = \begin{cases} y_{t} - F_{k} & \text{if } t \in P_{k} \\ - \infty & \text{otherwise} \end{cases}

(30)
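The definitions (27)–(30) can be illustrated with a short Python sketch. It assumes triangular basic functions on a uniform partition of [1, T], which is one common choice and not necessarily the one adopted in [15,16]; the function names and the sample data are hypothetical.

```python
import math


def triangular(t, node, h):
    """Triangular basic function centred on `node` with half-width h."""
    return max(0.0, 1.0 - abs(t - node) / h)


def direct_ftransform(y, n):
    """Components F_1..F_n of (27) for y(1..T) on a uniform partition of [1, T]."""
    T = len(y)
    h = (T - 1) / (n - 1)
    nodes = [1 + k * h for k in range(n)]
    # weights[k][t - 1] = A_k(t)
    weights = [[triangular(t, nodes[k], h) for t in range(1, T + 1)]
               for k in range(n)]
    F = [sum(w * v for w, v in zip(weights[k], y)) / sum(weights[k])
         for k in range(n)]
    return F, weights


def residuals(y, F, weights):
    """r[t - 1][k] as in (30): y(t) - F_k on P_k, minus infinity elsewhere."""
    return [[y[t] - F[k] if weights[k][t] > 0 else -math.inf
             for k in range(len(F))]
            for t in range(len(y))]


y = [3.0, 3.4, 4.1, 3.9, 4.6, 5.2, 5.0, 5.9, 6.3, 6.1, 7.0, 7.4]
F, weights = direct_ftransform(y, n=4)
r = residuals(y, F, weights)

# P_k of (28), using 1-based time steps
P = [{t + 1 for t, w in enumerate(weights[k]) if w > 0} for k in range(len(F))]

# Decomposition (29): the maximum over k recovers y(t)
rebuilt = [max(F[k] + r[t][k] for k in range(len(F))) for t in range(len(y))]
```

Because every time step belongs to at least one P_k, the maximum in (29) is always taken over at least one finite term, so `rebuilt` reproduces the original series up to rounding.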

Based on the autoregressive model (26), in [15,16] the kth component F_k is given by a linear combination of the p previous components. The trend at the kth time step is assessed by

F_{k} = α_{1} F_{k - 1} + α_{2} F_{k - 2} + \dots + α_{p} F_{k - p} \quad k = p + 1, \dots, n

(31)

In [15,16] p is set to 3. The calculated values of the components up to F_n are used to forecast the unknown value F_n+1 as

F_{n + 1} = {\tilde{α}}_{1} F_{n} + {\tilde{α}}_{2} F_{n - 1} + {\tilde{α}}_{3} F_{n - 2}

(32)

The values $\tilde{α}_{1}, \tilde{α}_{2}, \tilde{α}_{3}$ chosen for the three coefficients $α_{1}, α_{2}, α_{3}$ minimize the absolute difference between the predicted and the calculated values of F_n. In [15] a numerical method and a Multilayer Perceptron neural network are used to find the optimal values of the coefficients α₁, α₂, α₃. In [16] a method based on fuzzy relations is proposed to find the best values of the three coefficients.
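To make the coefficient selection concrete, the sketch below searches a coarse grid of coefficient triples and keeps the one with the smallest total absolute error, then forecasts the next component by applying (31) with k = n + 1. The exhaustive grid search is a deliberately simple stand-in and is not the numerical method, the neural network of [15] or the fuzzy-relation method of [16]; summing the error over every k for which (31) is defined, as well as the grid bounds and step, are further assumptions.

```python
from itertools import product


def forecast_next(F, alpha):
    """Apply (31) with k = n + 1 and p = 3: combine the last three components."""
    a1, a2, a3 = alpha
    return a1 * F[-1] + a2 * F[-2] + a3 * F[-3]


def total_abs_error(F, alpha):
    """Sum of |F_k - predicted F_k| over every k for which (31) is defined."""
    a1, a2, a3 = alpha
    return sum(abs(F[k] - (a1 * F[k - 1] + a2 * F[k - 2] + a3 * F[k - 3]))
               for k in range(3, len(F)))


def grid_search(F, step=0.1, bound=1.0):
    """Exhaustive search over a coarse coefficient grid (illustrative only)."""
    m = round(bound / step)
    grid = [i * step for i in range(-m, m + 1)]
    return min(product(grid, repeat=3), key=lambda a: total_abs_error(F, a))


# Hypothetical components generated by the recurrence with alpha = (0.5, 0.3, 0.1)
F = [1.0, 2.0, 1.5]
for _ in range(9):
    F.append(0.5 * F[-1] + 0.3 * F[-2] + 0.1 * F[-3])

best = grid_search(F)
F_next = forecast_next(F, best)
```

A grid search scales poorly with p and with the grid resolution, which is one reason to prefer a dedicated optimizer in practice.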

In [16] the model is compared with the ARIMA autoregressive model and with other fuzzy-based time series models, using the MAPE and SMAPE indexes to measure the forecast errors; the authors report that their F-transform-based forecasting model achieves the best performance.

In [17] the one-dimensional F-transform is proposed to filter the high frequencies in the time series. A time series can be additively decomposed into three components: trend cycle, a seasonal component, and noise. The authors prove that the one-dimensional F-transform acts as a low-pass filter, removing or significantly reducing the seasonal and noise components; then, the inverse F-transform optimally approximates the trend component.
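The low-pass behaviour can be observed on a toy signal. The following sketch, which assumes triangular basic functions on a uniform partition and uses hypothetical function names, applies the direct and then the inverse one-dimensional F-transform to a purely oscillating series and to nothing else; it is an illustration, not the analysis carried out in [17].

```python
def triangular(t, node, h):
    """Triangular basic function centred on `node` with half-width h."""
    return max(0.0, 1.0 - abs(t - node) / h)


def smooth(y, n):
    """Direct then inverse one-dimensional F-transform of y(1..T)."""
    T = len(y)
    h = (T - 1) / (n - 1)
    nodes = [1 + k * h for k in range(n)]
    times = range(1, T + 1)
    F = []
    for node in nodes:
        w = [triangular(t, node, h) for t in times]
        F.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    # Inverse F-transform: sum over k of F_k * A_k(t)
    return [sum(Fk * triangular(t, node, h) for Fk, node in zip(F, nodes))
            for t in times]


# A purely high-frequency signal: -1, +1, -1, ...
oscillation = [(-1.0) ** t for t in range(1, 41)]
filtered = smooth(oscillation, n=5)

# A constant signal (pure "trend") passes through unchanged
constant = smooth([2.0] * 40, n=5)
```

The oscillation has amplitude 1, while the filtered series stays close to zero; the constant series is reproduced because the triangular basic functions sum to one at every point of the interval.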

4.2. Multi-Dimensional F-Transform Time Series Model

In [18] a time series forecasting model based on the multi-dimensional F-transform is proposed. The authors applied their method to the well-known Mackey-Glass time series generated by the differential equation:

\frac{d y}{d t} = \frac{0.2 \cdot y (t - τ)}{1 + y^{10} (t - τ)} - 0.1 \cdot y (t)

(33)
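For readers who want to reproduce a series of this kind, a minimal sketch that integrates (33) numerically is given below. The delay τ = 17, the explicit Euler scheme with step 0.1, and the constant initial history 1.2 are assumptions chosen for illustration; the source does not state the settings used in [18].

```python
def mackey_glass(n_samples, tau=17, dt=0.1, y0=1.2):
    """Integrate (33) with the explicit Euler method, one sample per unit time.

    tau, dt and the constant initial history y0 are illustrative assumptions.
    """
    steps_per_unit = round(1 / dt)
    delay = tau * steps_per_unit           # delay expressed in Euler steps
    y = [y0] * (delay + 1)                 # constant history on [-tau, 0]
    for _ in range(n_samples * steps_per_unit):
        y_tau = y[-delay - 1]              # y(t - tau)
        dy = 0.2 * y_tau / (1.0 + y_tau ** 10) - 0.1 * y[-1]
        y.append(y[-1] + dt * dy)
    # Keep the values at t = 0, 1, 2, ...
    return y[delay::steps_per_unit][:n_samples]


series = mackey_glass(200)
```

A higher-order integrator such as Runge-Kutta would give a more accurate trajectory; Euler is used here only to keep the sketch short.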

In [18] the value y(t) is approximated from the six previous values y(t − 6), y(t − 5), …, y(t − 1) by constructing a multi-dimensional F-transform that approximates the output variable y as a function of the six variables x_i = y(t − i), i = 1, …, 6.

To construct the components of the direct multi-dimensional F-transform, the N points $(x_{1}^{(j)}, x_{2}^{(j)}, \dots, x_{6}^{(j)}, y^{(j)})$, j = 1, …, N, are considered. The components are given by

F_{h_{1} h_{2} \dots h_{6}} = \frac{\sum_{j = 1}^{N} y^{(j)} \cdot A_{1 h_{1}} (x_{1}^{(j)}) \cdot \dots \cdot A_{6 h_{6}} (x_{6}^{(j)})}{\sum_{j = 1}^{N} A_{1 h_{1}} (x_{1}^{(j)}) \cdot \dots \cdot A_{6 h_{6}} (x_{6}^{(j)})}

(34)

The inverse F-transform is given by

f_{n_{1} n_{2} \dots n_{6}}^{F} (x_{1}^{(j)}, x_{2}^{(j)}, \dots, x_{6}^{(j)}) = \sum_{h_{1} = 1}^{n_{1}} \sum_{h_{2} = 1}^{n_{2}} \dots \sum_{h_{6} = 1}^{n_{6}} F_{h_{1} h_{2} \dots h_{6}} \cdot A_{1 h_{1}} (x_{1}^{(j)}) \cdot \dots \cdot A_{6 h_{6}} (x_{6}^{(j)})

(35)

To assess the value of the function y(t) at the time t from the values obtained in the six previous time steps, x_i = y(t − i), i = 1, …, 6, Formula (35) is applied, obtaining:

\tilde{y} = f_{n_{1} n_{2} \dots n_{6}}^{F} (x_{1}, x_{2}, \dots, x_{6}) = \sum_{h_{1} = 1}^{n_{1}} \sum_{h_{2} = 1}^{n_{2}} \dots \sum_{h_{6} = 1}^{n_{6}} F_{h_{1} h_{2} \dots h_{6}} \cdot A_{1 h_{1}} (x_{1}) \cdot \dots \cdot A_{6 h_{6}} (x_{6})

(36)
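A compact way to experiment with (34) and (36) is sketched below. It assumes uniform triangular partitions with the same interval [a, b] and the same number of nodes n in every dimension, and it stores only the components touched by at least one data point; these choices, the function names and the synthetic series are illustrative and are not taken from [18].

```python
import math
from itertools import product


def active_memberships(x, a, b, n):
    """Nonzero (index, A_h(x)) pairs of a uniform triangular partition of [a, b]."""
    h = (b - a) / (n - 1)
    pos = min(max((x - a) / h, 0.0), n - 1.0)   # clamp x into [a, b]
    lo = min(int(pos), n - 2)
    frac = pos - lo
    return [(i, w) for i, w in ((lo, 1.0 - frac), (lo + 1, frac)) if w > 0]


def cells(point, a, b, n):
    """Yield (multi-index, product of memberships) for every active cell."""
    per_dim = [active_memberships(x, a, b, n) for x in point]
    for combo in product(*per_dim):
        yield tuple(i for i, _ in combo), math.prod(w for _, w in combo)


def direct_ft(points, targets, a, b, n):
    """Sparse components of (34): only cells reached by some data point."""
    num, den = {}, {}
    for point, y in zip(points, targets):
        for idx, w in cells(point, a, b, n):
            num[idx] = num.get(idx, 0.0) + w * y
            den[idx] = den.get(idx, 0.0) + w
    return {idx: num[idx] / den[idx] for idx in num}


def inverse_ft(F, point, a, b, n):
    """Evaluate (36); raises KeyError if a required component is missing."""
    return sum(F[idx] * w for idx, w in cells(point, a, b, n))


def lagged(series, lags=6):
    """Inputs x_i = y(t - i), i = 1..lags, paired with the output y(t)."""
    points = [tuple(series[t - i] for i in range(1, lags + 1))
              for t in range(lags, len(series))]
    return points, series[lags:]


# Synthetic series with values inside [0, 1]
series = [0.5 + 0.4 * math.sin(0.3 * t) for t in range(120)]
points, targets = lagged(series)
F = direct_ft(points, targets, a=0.0, b=1.0, n=3)
estimate = inverse_ft(F, points[-1], a=0.0, b=1.0, n=3)
```

A missing component in `inverse_ft` signals that the data are not sufficiently dense around the queried point for the chosen partition, which is the constraint discussed throughout this review.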

In [18] the authors compare the results obtained by applying this method to the Mackey-Glass time series with those obtained by using the well-known Wang and Mendel method and a local Wavelet Neural Network with three layers, six input nodes, 10 hidden nodes and one output node. They measure the MAPE, RMSE and MADMEAN indices, showing that the multi-dimensional F-transform method has the best performance.

The multi-dimensional fuzzy transform method [18] can be generalized for any function considering a dependency on k input parameters. In [19] it is applied for forecasting problems in spatial analysis. The framework proposed in [19] is schematized in Figure 5.

The area of study is partitioned into subzones. For each subzone, a training dataset with the measures of the characteristics of the subzone in a specified period is extracted. Then, the time series corresponding to a measured characteristic f(t) from a time t = 0 to t = T is constructed, and the multi-dimensional F-transform prediction method [18] is applied to assess the value of f at the time T + Δt. The RMSE and the MADMEAN are used to evaluate the performance of the forecasting model. Finally, two thematic maps, of the predicted value of the characteristic at the time T + Δt and of the prediction error in each subzone, are produced after performing a fuzzification process. This approach is encapsulated in a Geographical Information System and is tested in [19] on the demographic balance data measured every month in the period 1 January 2003–31 October 2014 in the municipalities of the Cilento and Vallo di Diano National Park, located in the province of Salerno (Italy). The birth rate and death rate in November 2014 in each municipality are evaluated. The mean RMSE obtained is under 0.01.

4.3. F-Transform Seasonal Time Series Model

In some time series a phenomenon called seasonality is present, given by a repetitive and regular pattern of changes that repeats over S time periods. For example, in a monthly time series S = 12, in an hourly time series S = 24, and so on.

Well-known statistical models such as the Seasonal Auto Regressive Integrated Moving Average (SARIMA) models [20,21] forecast the value of the output variable at a time t as a combination of the trend with a seasonal component.

In [22] a seasonal time series forecasting method based on F-transforms, called Time Series Seasonal F-transform (TSSF), is proposed. A polynomial best fit is applied to extract the trend; the data are then de-trended by subtracting the trend from the time series, and the de-trended time series is partitioned into S subsets. The one-dimensional F-transform is applied to each subset to assess the corresponding seasonality.
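The pre-processing step can be sketched as follows. For brevity the trend is fitted with a degree-1 polynomial (a straight line) in closed form, whereas the source only states that a polynomial best fit is used; the rule that assigns time step t to season (t − 1) mod S and the function names are likewise assumptions.

```python
def linear_trend(y):
    """Least-squares line through (t, y(t)), t = 1..T; returns a callable trend(t)."""
    T = len(y)
    t_mean = (T + 1) / 2
    y_mean = sum(y) / T
    sxx = sum((t - t_mean) ** 2 for t in range(1, T + 1))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y, start=1))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return lambda t: intercept + slope * t


def seasonal_subsets(y, S):
    """De-trend y and split it into S subsets of (t, de-trended value) pairs."""
    trend = linear_trend(y)
    subsets = [[] for _ in range(S)]
    for t, v in enumerate(y, start=1):
        s = (t - 1) % S                 # assumed season assignment
        subsets[s].append((t, v - trend(t)))
    return trend, subsets


# Two "years" of monthly data lying exactly on the line 2 + 0.5 t
y = [2.0 + 0.5 * t for t in range(1, 25)]
trend, subsets = seasonal_subsets(y, S=12)
```

Each subset can then be passed to a one-dimensional F-transform, and the returned `trend` callable supplies the trend term that is added back when forecasting.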

To assess the value of the output variable y at the time t included in the sth season, with s in {1, 2, …, S}, we calculate the inverse F-transform $f_{n (s)}^{F} (t)$.

Let {(t^(1), y^(1)), (t^(2), y^(2)), …, (t^(M_s), y^(M_s))} be the de-trended sth subset with cardinality M_s, where y^(j), j = 1, …, M_s, is given by the difference between the original measure obtained at the time t^(j) and the trend calculated at that time.

Let F_h, where h = 1, 2, …, n(s), be the hth component of the one-dimensional direct F-transform calculated by using a fuzzy partition of n(s) basic functions of the domain of the sth subset. The one-dimensional inverse F-transform calculated at the time t is given by

f_{n (s)}^{F} (t) = \sum_{h = 1}^{n (s)} F_{h} \cdot A_{h} (t)

(37)

The forecasted value $\tilde{y}_{0} (t)$ of the output y₀ at the time t included in season s is

{\tilde{y}}_{0} (t) = f_{n (s)}^{F} (t) + \mathrm{trend} (t)

(38)

where the term trend(t) is the assessed value of the trend of the time series at the time t.

In the TSSF model, the technique proposed in [11] is applied to verify that each subset of data is sufficiently dense with respect to the fuzzy partition and to find the best fuzzy partition. To find the best fuzzy partition for each subset, the MADMEAN measure is calculated as

\mathrm{MADMEAN}_{s} = \frac{\sum_{j = 1}^{M_{s}} |f_{n (s)}^{F} (t^{(j)}) - y^{(j)}|}{\sum_{j = 1}^{M_{s}} y^{(j)}}

(39)

The number of fuzzy sets of the initial fuzzy partition is set to 3; then, the sufficient density of the data with respect to the fuzzy partition is verified and the direct F-transform is calculated. The inverse F-transform at each time t^(j), where j = 1, …, M_s, is calculated by Formula (37) and, finally, the MADMEAN index (39) is measured. If the MADMEAN index is less than a fixed threshold, then the process stops and the direct F-transform components are stored; otherwise, the number of fuzzy sets of the fuzzy partition n(s) is increased by one unit and the previous steps are iterated. This process is executed for each seasonal subset.
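The partition search for one seasonal subset can be sketched in a few lines of Python. The sketch assumes triangular basic functions on a uniform partition, assumes that the search stops once the MADMEAN index is small enough (below the threshold), and assumes that it also stops, keeping the last valid result, when the data are no longer sufficiently dense; it evaluates (39) literally, so the subset values must have a nonzero sum. The function names and the sample data are hypothetical.

```python
import math


def triangular(t, node, h):
    """Triangular basic function centred on `node` with half-width h."""
    return max(0.0, 1.0 - abs(t - node) / h)


def fit_partition(ts, ys, threshold, n_start=3, n_max=100):
    """Refine a uniform partition of [min(ts), max(ts)] until MADMEAN < threshold.

    Returns (n, F, madmean) for the last sufficiently dense partition tried,
    or None if even the initial partition is not sufficiently dense.
    """
    a, b = min(ts), max(ts)
    best = None
    for n in range(n_start, n_max + 1):
        h = (b - a) / (n - 1)
        nodes = [a + k * h for k in range(n)]
        den = [sum(triangular(t, node, h) for t in ts) for node in nodes]
        if min(den) <= 0.0:
            break   # some basic function covers no data point
        F = [sum(triangular(t, node, h) * y for t, y in zip(ts, ys)) / d
             for node, d in zip(nodes, den)]
        inverse = [sum(Fk * triangular(t, node, h) for Fk, node in zip(F, nodes))
                   for t in ts]
        madmean = sum(abs(f - y) for f, y in zip(inverse, ys)) / sum(ys)
        best = (n, F, madmean)
        if madmean < threshold:
            break
    return best


ts = list(range(1, 41))
ys = [10 + 5 * math.sin(t) for t in ts]
n, F, madmean = fit_partition(ts, ys, threshold=0.01)
```

For this rapidly oscillating sample the coarsest partition is not accurate enough, so the loop refines the partition before the threshold is met; a constant subset, by contrast, is accepted immediately with three basic functions.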

In Figure 6 the flow diagram of the TSSF model is shown.

In [22] comparison tests are performed between TSSF and other forecasting algorithms applied to seasonal time series. Comparisons are executed with respect to the statistical Average Seasonal Variation (avgSV) and Seasonal ARIMA models [21], the model based on the multi-dimensional F-transform (MF-tr) [18] and the soft computing forecasting models Support Vector Machine (SVM) [23] and Automatic Design of Artificial Neural Networks (ADANN) [24]. Table 3 shows the RMSE obtained applying these models on a set of 14 seasonal time series giving the daily mean temperature measured by 14 weather monitoring stations located in the province of Genova (Italy). In each experiment, the month is used as seasonality and each dataset is partitioned into twelve subsets.

The results in Table 3 show that TSSF performs better than avgSV, SARIMA and MF-tr, and comparably to SVM and ADANN. In addition, SVM and ADANN are computationally more complex to manage than TSSF. A critical point of TSSF is its inability to manage irregular time series, in which it is difficult to identify patterns in the data.

In [9] an extension of the TSSF model has been proposed, based on the use of the first-order F-transform. This model improves the performance of the TSSF model but increases its computational complexity.

5. F-Transform in Data Classification

In Section 3 we analyzed techniques that use the multi-dimensional F-transform as a regression function to explore dependency between data ([10,11]). In [25] a classification method based on the use of the multi-dimensional F-transform is proposed. The proposed algorithm, called MFC (Multi-dimensional F-transform Classification), computes the direct and inverse multi-dimensional F-transforms to classify data points.

The learning dataset is given by a set of data points, each characterized by a pair (X,Y), where X is a vector of s numerical features (X₁, …, X_s) and Y is the class feature, which has C categories labelled with the values 1, 2, …, C.

The multi-dimensional F-transform is applied to explore a relation between attributes in the form:

Y = f (x_{1}, \dots, x_{s})

(40)

where f is a discrete function f: [a₁,b₁] × [a₂,b₂] × … × [a_s,b_s] → {1, 2, ..., C}, with x_i ∈ [a_i,b_i], i = 1, …, s, and Y ∈ {1, 2, ..., C}.

MFC uses the multi-dimensional inverse F-transform to approximate the function f. To control over-fitting, the K-fold cross validation resampling algorithm is applied.

K-fold cross validation is a well-known resampling technique in which the dataset is partitioned into K subsets of equal size called folds. The classification algorithm is iterated K times. At each iteration, one fold constitutes the validation set and the union of the other K − 1 folds forms the training set, used to train the classifier. With respect to other resampling techniques, K-fold is more efficient in dealing with the over-fitting problem, as each fold is treated exactly once as a validation set.
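The resampling scheme can be sketched as follows. The helper names are hypothetical, the folds are contiguous blocks of indices without shuffling (in practice the data would normally be shuffled first), and fold sizes differ by at most one when the number of points is not a multiple of K.

```python
def k_fold_indices(n_points, K):
    """Split range(n_points) into K contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_points, K)
    folds, start = [], 0
    for k in range(K):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(n_points, K):
    """Yield (training indices, validation indices) for each of the K iterations."""
    folds = k_fold_indices(n_points, K)
    for k in range(K):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, folds[k]


splits = list(k_fold_splits(10, K=5))
```

Each index appears in exactly one validation set, so every data point is used once for validation and K − 1 times for training.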

Let P = (p₁, p₂, …, p_s) be a data point. Formally, if $F_{k}$ is the multi-dimensional direct F-transform calculated by using the kth fold and $f_{n_{1} n_{2} \dots n_{s}}^{F_{k}} (p_{1}, p_{2}, \dots, p_{s})$ is the value of the corresponding multi-dimensional inverse F-transform calculated in P, then an average of the K inverse F-transforms in the point P is calculated as

f_{n_{1} n_{2} \dots n_{s}} (p_{1}, p_{2}, \dots, p_{s}) = \frac{1}{K} \sum_{k = 1}^{K} f_{n_{1} n_{2} \dots n_{s}}^{F_{k}} (p_{1}, p_{2}, \dots, p_{s})

(41)

The point P is classified in the class labeled c*, where

c^{*} = \underset{c = 1, \dots, C}{\arg\min} \, |f_{n_{1} n_{2} \dots n_{s}} (p_{1}, p_{2}, \dots, p_{s}) - c|

(42)
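The decision rule in (41) and (42) amounts to averaging the K inverse F-transform values and rounding to the nearest class label. A minimal sketch follows; the function name, the tie-breaking rule (the smaller label wins) and the sample values are assumptions.

```python
def classify(fold_values, C):
    """Average the K inverse F-transform values (41) and pick the nearest label (42)."""
    f_mean = sum(fold_values) / len(fold_values)
    # min() returns the first minimiser, so ties go to the smaller label
    return min(range(1, C + 1), key=lambda c: abs(f_mean - c))


# Hypothetical inverse F-transform values of one point from K = 3 folds
label = classify([1.8, 2.3, 2.1], C=3)
```

Values that fall below 1 or above C are assigned to the first or last class, respectively, because those labels are the closest ones.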

To evaluate the performance of the classifier, two indices $CV_{1}^{k}$ and $CV_{2}^{k}$, k = 1, …, K, are calculated for each fold, where

- $CV_{1}^{k}$ is the percentage of all the misclassified data points in the kth training set;
- $CV_{2}^{k}$ is the percentage of all the misclassified data points in the kth validation set.

The final index giving the average of the percentage of misclassified data points in the training sets is

C V_{1} = \frac{1}{K} \sum_{k = 1}^{K} {CV}_{1}^{k}

(43)

and the final index giving the average of the percentage of misclassified data points in the validation sets is

C V_{2} = \frac{1}{K} \sum_{k = 1}^{K} {CV}_{2}^{k}

(44)

CV₁ and CV₂ are used to evaluate the performances of MFC. If CV₁ is under a fixed threshold α and CV₂ is under a fixed threshold β, then the algorithm stops, else a finer set of fuzzy partitions of the domains of the s input variables is constructed and the process is iterated.
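The stopping rule can be expressed as a short loop. In the sketch below, `evaluate_folds` is a hypothetical callback standing in for the training and evaluation of the K classifiers at a given partition size; starting from n = 3 and refining by one basic function per step are assumptions, and the numbers returned by the toy callback are made up purely to exercise the loop.

```python
def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def refine_until_accurate(evaluate_folds, alpha, beta, n_start=3, n_max=20):
    """Increase the partition size n until CV1 < alpha and CV2 < beta.

    evaluate_folds(n) is a hypothetical callback returning two lists with the
    per-fold misclassification percentages on the training and validation sets.
    """
    for n in range(n_start, n_max + 1):
        cv1_per_fold, cv2_per_fold = evaluate_folds(n)
        cv1 = average(cv1_per_fold)      # Formula (43)
        cv2 = average(cv2_per_fold)      # Formula (44)
        if cv1 < alpha and cv2 < beta:
            return n, cv1, cv2
    return None   # thresholds not reached within n_max


# Toy callback: errors shrink as the partition gets finer (made-up numbers)
def toy_evaluate(n):
    return [12.0 / n, 9.0 / n, 15.0 / n], [18.0 / n, 21.0 / n, 24.0 / n]


result = refine_until_accurate(toy_evaluate, alpha=3.0, beta=5.0)
```

Requiring both thresholds guards against a classifier that fits the training sets well but generalizes poorly to the validation sets.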

In Figure 7 we show the flow diagram of MFC.

In [25] comparison tests are performed on over 100 classification datasets extracted from the University of California, Irvine (UCI) Machine Learning Repository and from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository.

Table 4 shows the mean accuracy, precision and recall obtained by running MFC, the decision tree-based J48 [26], Multi-Layer Perceptron [27], naive Bayes [28] and Lazy K-Nearest Neighbor IBK [29].

These results show that MFC provides better classification performance than the naive Bayes and Lazy IBK algorithms, and performance comparable with that of the Decision tree J48 and Multilayer Perceptron algorithms.

A weak point of the MFC algorithm is its high computational complexity, which makes it unsuitable for managing massive and high-dimensional datasets.

The integration with data compression and feature selection approaches in the pre-processing phase can reduce these high computational costs. An approach that integrates Principal Component Analysis (PCA) feature reduction techniques with higher-degree F-transform has been proposed in [30] in image classification. A mixed model that integrates higher-degree F-transform and PCA techniques could be tested in data classification to reduce the number of features and improve the accuracy and precision of the classifier model, without significantly increasing the time consumption.

6. Conclusions

This paper presents a summary of the data analysis techniques proposed in the literature based on the use of the F-transform in one or more dimensions. We initially presented the definition of one-dimensional direct and inverse F-transform, showing how it can be used to approximate a continuous function on a real interval. We then extended this concept to the multi-dimensional F-transform, showing how it can be used in regression analysis. In particular, attention was paid to the constraint of sufficient data density with respect to fuzzy partitions, which is extremely important for the choice of the optimal cardinality of fuzzy partitions. Then, the methods proposed in the literature for the analysis of the dependency between attributes in the data and for the extraction of association rules through the use of direct and inverse multi-dimensional F-transforms were presented and analyzed. An extensive discussion was devoted to the different time series analysis techniques based on the F-transforms proposed in the literature. Finally, a classification method recently presented in the literature based on the multi-dimensional F-transform was described.

The use of F-transform-based approaches in data analysis remains an evolving research field. We foresee that new F-transform-based approaches may be presented that reduce the time consumption and computational complexity which currently prevent the application of these techniques to massive and high-dimensional data. Such a reduction would also allow the use of high-order F-transforms in data analysis, improving the performance obtained using the zero-order F-transform. In the future, hybrid strategies that combine the high-order F-transform with data size reduction could lead to an optimal trade-off between the quality of the results and the processing times.

In the future, the multidimensional zero and high-order fuzzy transform methods may be included into soft computing hybrid models for the analysis of risk prediction and damage assessment proposed in recent soft computing risk analysis and forecasting models such as damage assessment of existing buildings [31] and entity assessment of the damage that can be produced on them by seismic events [32]. Moreover, fuzzy transform methods can be applied for the solution of fuzzy differential equations [33] and fuzzy partial equations [34] in data analysis models for complex systems.

Author Contributions

The contributions of the three authors F.D.M., I.P. and S.S. are summarized below: conceptualization, F.D.M., I.P. and S.S.; methodology, F.D.M., I.P. and S.S.; software, F.D.M., I.P. and S.S.; validation, F.D.M., I.P. and S.S.; formal analysis, F.D.M., I.P. and S.S.; investigation, F.D.M., I.P. and S.S.; resources, F.D.M., I.P. and S.S.; data curation, F.D.M., I.P. and S.S.; writing—original draft preparation, F.D.M., I.P. and S.S.; writing—review and editing, F.D.M., I.P. and S.S.; visualization, F.D.M., I.P. and S.S.; supervision, F.D.M., I.P. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The work of Irina Perfilieva was partially supported by the project AI-Met4AI, CZ.02.1.01/0.0/0.0/17-049/0008414.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Table of Acronyms and Abbreviations

Table A1 lists the acronyms and abbreviations used in the text.

Table A1. Acronyms and abbreviations.

Acronym/Abbreviation	Explanation
F-transform	Fuzzy transform
Multidimensional F-transform	Multi-dimensional Fuzzy transform
MFAD	Massive F-transform Attribute Dependency method
SVM	Support Vector regression Method
MLP	MultiLayer Perceptron method
avgSV	AVeraGe Seasonal Variation model
SARIMA	Seasonal AutoRegressive Integrated Moving Average model
MF-tr	Multi-dimensional Fuzzy TRansform forecasting model
TSSF	Time Series Seasonal F-transform model
ADANN	Automatic Design of Artificial Neural Networks model
MFC	Multidimensional F-transform Classification method
UCI	University of California, Irvine
K-fold	Cross-validation K-fold resampling method applied in classification.
Naïve Bayes	Naïve Bayesian classification method
J48	Decision tree J48 classification algorithm in the Weka data mining tool.
Lazy IBK	Lazy K-Nearest Neighbor Instance-Based learning with parameter K classification method.
PCA	Principal Component Analysis.

References

1. Perfilieva, I.; Haldeeva, E. Fuzzy transformation. In Proceedings of the IFSA World Congress and 20th NAFIPS International Conference, Joint 9th, Vancouver, Canada, 25–28 July 2001; IEEE: Piscataway, NJ, USA, 2001; Volume 4, pp. 1946–1948.
2. Perfilieva, I. Fuzzy transforms: Theory and applications. Fuzzy Sets Syst. 2006, 157, 993–1023.
3. Di Martino, F.; Sessa, S. Fuzzy Transforms for Image Processing and Data Analysis. Core Concepts, Processes and Applications; Springer: Cham, Switzerland, 2020; p. 217.
4. Bede, B.; Rudas, I.J. Approximation properties of fuzzy transforms. Fuzzy Sets Syst. 2011, 180, 20–40.
5. Khastan, A. A new representation for inverse fuzzy transform and its application. Soft Comput. 2017, 21, 3503–3512.
6. Perfilieva, I.; Dankova, M.; Bede, B. Towards a higher degree f-transform. Fuzzy Sets Syst. 2020, 180, 3–19.
7. Alikhani, R.; Zeinali, M.; Bahrami, F.; Shahmorad, S.; Perfilieva, I. Trigonometric f^m-transform and its approximative properties. Soft Comput. 2017, 21, 3567–3577.
8. Zeinali, M.; Alikhani, R.; Bahrami, F.; Shahmorad, S.; Perfilieva, I. On the structural properties of f^m-transform with application. Fuzzy Sets Syst. 2018, 342, 31–52.
9. Di Martino, F.; Sessa, S. Seasonal Time Series Forecasting by F¹-Fuzzy Transform, Special Issue Intelligent Systems in Sensor Networks and Internet of Things. Axioms 2019, 19, 3611.
10. Perfilieva, I.; Novàk, V.; Dvoràk, A. Fuzzy transforms in the analysis of data. Int. J. Approx. Reason. 2008, 48, 36–46.
11. Di Martino, F.; Loia, V.; Sessa, S. Fuzzy transforms method and attribute dependency in data analysis. Inf. Sci. 2010, 180, 493–505.
12. Di Martino, F.; Sessa, S. Attribute dependency data analysis for massive datasets by fuzzy transforms. Soft Comput. 2021.
13. Wold, H. A Study in Analysis of Stationary Time Series. R. Stat. Soc. 1939, 102, 295–298.
14. Wei, W.W.S. Time Series Analysis Univariate and Multivariate Methods, 2nd ed.; Pearson Addison Wesley: Boston, MA, USA, 2006; p. 605. ISBN 0-321-32216-9.
15. Perfilieva, I.G.; Yarushkina, N.G.; Afanasieva, T.V. In Proceedings of International Conference on Fuzzy Systems, Barcelona, Spain, 18–23 July 2010.
16. Perfilieva, I.; Yarushkina, N.; Afanasieva, T.; Romanov, A. Time series analysis using soft computing methods. Int. J. Gen. Syst. 2013, 42, 687–705.
17. Novàk, V.; Perfilieva, I.; Kreinovich, V. Filtering out high frequencies in time series using F-transform. Inf. Sci. 2014, 274, 192–209.
18. Di Martino, F.; Loia, V.; Sessa, S. Fuzzy transforms method in prediction data analysis. Fuzzy Sets Syst. 2011, 180, 146–163.
19. Di Martino, F.; Sessa, S. Fuzzy transform prediction in spatial analysis and its application to demographic balance data. Soft Comput. 2017, 21, 3537–3550.
20. Ziegel, E.R.; Box, G.E.P.; Reinsel, G.C.; Jenkins, S. Time Series Analysis, Forecasting, and Control. Technometrics Taylor Fr. Milton Park 1995, 37, 238–239.
21. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 5th ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 2015; p. 712. ISBN 978-1-118-67502-1.
22. Di Martino, F.; Sessa, S. Time series seasonal analysis based on fuzzy transforms. Symmetry 2017, 9, 281.
23. Pai, P.F.; Lin, K.P.; Lin, C.S.; Chang, P.T. Time series forecasting by a seasonal support vector regression model. Exp. Syst. Appl. 2010, 37, 4261–4265.
24. Štepnicka, M.; Cortez, P.; Peralta Donate, J.; Štepnickova, L. Forecasting seasonal time series with computational intelligence: On recent methods and the potential of their combinations. Exp. Syst. Appl. 2013, 40, 1981–1992.
25. Di Martino, F.; Sessa, S. A classification algorithm based on multi-dimensional fuzzy transforms. Ambient Intell. Humaniz. Comput. 2021.
26. Bhargawa, N.; Sharma, G.; Bhargava, R.; Mathuria, M. Decision Tree Analysis on J48 Algorithm for Data Mining. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2013, 3, 1114–1119.
27. Pal, S.K.; Mitra, S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural Netw. 1992, 3, 683–697.
28. Murphy, K.P. Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning Series), 1st ed.; The MIT Press: London, UK, 2012; p. 1070. ISBN 978-0262018029.
29. Aha, D.W. (Ed.) Lazy Learning; Kluwer Academic Publishers: Norwell, MA, USA, 1997; p. 436. ISBN 978-0792345848.
30. Hurtik, P.; Molek, V.; Perfilieva, I. Novel dimensionality reduction approach for unsupervised learning on small datasets. Pattern Recognit. 2020, 103, 107291.
31. Harirchian, E.; Ehsan, S.; Hosseini, A.; Jadhav, K.; Kumari, V.; Rasulzade, S.; Işık, E.; Wasif, M.; Lahmer, T. A review on application of soft computing techniques for the rapid visual safety evaluation and damage classification of existing buildings. J. Build. Eng. 2021, 43, 102536.
32. Harirchian, E.; Lhamer, T. Developing a hierarchical type-2 fuzzy logic model to improve rapid evaluation of earthquake hazard safety of existing buildings. Structures 2020, 28, 1384–1399.
33. Georgieva, A. Application of Double Fuzzy Natural Transform for Solving Fuzzy Partial Equations, AIP Conference Proceedings; AIP Publishing: Melville, NY, USA, 2021; Volume 2333, p. 080006.
34. Mazandarani, M.; Xiu, L. A Review on Fuzzy Differential Equations. IEEE Access 2021, 9, 62195–62211.

Figure 1. Example of non sufficiently dense data points with respect to the fuzzy partitions.

Figure 2. Examples of data points sufficiently dense with respect to the fuzzy partitions.

Figure 3. Flow diagram of the algorithm proposed in [11].

Figure 4. Schema of the MFAD method proposed in [12].

Figure 5. Schema of the framework proposed in [19].

Figure 6. Flow diagram of the TSSF model in [22].

Figure 7. Flow diagram of the MFC algorithm [25].

Table 1. Schema of a relation with r attributes and m instances.

	X₁	...	X_i	...	X_r
O₁	p₁₁	.	p_1i	.	p_1r
.	.	.	.	.	.
.	.	.	.	.	.
.	.	.	.	.	.
O_j	p_j₁	.	p_ji	.	p_jr
.	.	.	.	.	.
.	.	.	.	.	.
.	.	.	.	.	.
O_m	p_m₁	.	p_mi	.	p_mr

Table 2. Values of the index of determinacy applying MFAD for different values of the parameter s [12].

s	Index of Determinacy
1	0.881
8	0.872
9	0.872
10	0.874
11	0.875
13	0.877
16	0.878
20	0.878
26	0.875
40	0.872

Table 3. RMSE obtained by six methods for the daily mean temperature at 14 stations in the province of Genova (Italy).

Station	avgSV	SARIMA	MF-tr	TSSF	SVM	ADANN
Alpe Gorreto	2.98	1.20	1.49	0.84	0.81	0.83
Campo Ligure	2.74	1.09	1.34	0.76	0.71	0.76
Barbagelata	3.25	1.30	1.57	0.89	0.84	0.90
Camogli	3.39	1.38	1.68	0.95	0.88	0.86
Campo ligure	3.02	1.20	1.49	0.83	0.77	0.79
Carlasco	2.91	1.15	1.42	0.80	0.77	0.76
Chiavari	2.78	1.12	1.39	0.78	0.73	0.77
Genova Bolzaneto	2.95	1.16	1.41	0.81	0.77	0.75
Genova Pegli	3.34	1.29	1.64	0.94	0.89	0.88
Panesi	3.20	1.29	1.56	0.87	0.84	0.83
Rapallo	2.71	1.08	1.33	0.75	0.78	0.84
Rovegno	2.94	1.18	1.45	0.82	0.82	0.80
Tigliolo	3.06	1.24	1.52	0.85	0.80	0.85
Viganego	3.17	1.28	1.57	0.88	0.82	0.83

Table 4. Mean accuracy, precision and recall with 5 classification algorithms.

Algorithm	Accuracy	Precision	Recall
MFC Classifier	98.15%	98.09%	97.36%
Decision tree J48	98.38%	98.17%	97.51%
Multilayer Perceptron	98.22%	98.23%	97.48%
Naive Bayes	96.55%	91.89%	90.65%
Lazy IBK	97.17%	93.30%	91.44%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

