## 1. Introduction

Fusion systems allow the integration of heterogeneous sensor data in databases, including knowledge rules, contextual descriptions, external knowledge bases, etc., in order to obtain a general description of real situations. The goal of this information fusion process is to make the best decisions based on a global view of those situations.

The idea behind this work is to find specific patterns in the consumption behavior of agricultural products: the relationships among the quantities of each product. Information Fusion and Artificial Intelligence (IF/AI) techniques are used to extract association rules for improving the prediction of the consumption of agricultural products. This prediction allows the establishment of better strategies to improve local operations in the network of markets in Ecuador named “CIALCO” (the Spanish acronym for Alternative Circuits of Marketing). This country is crossed by the equator and its territory extends both north and south of latitude zero. This work is centered on the provinces of Tungurahua and Chimborazo, located in the south and central region of Ecuador, where data were collected on sales of agricultural products in these marketing circuits, created to establish a direct relationship between farmers and consumers.

The final goal is to increase the incomes of the people in the Andean region of Ecuador who work as small farmers, to prevent migration from this area to larger population centers. There are several markets in the CIALCO circuit; the analysis carried out in this work is related to information on people involved in agricultural fairs. Fairs are places where farmers meet periodically to sell products to consumers and conduct their business [1].

The first source of data was provided by the Ministry of Agriculture and Livestock of Ecuador. The dataset contains the weekly performance of sales of products sold by small farmers located in the Ecuadorian provinces of Tungurahua and Chimborazo in 2014. The data for each fair are defined by the date, the place, and the volume of sales of products. An average of 300 items per week were sold. This research is based on previous work on information fusion and data mining techniques [2] and the ability to extract knowledge from the fusion of information [3]. Some preliminary results were presented by the same authors at the 2018 International Conference of Information Fusion, Cambridge, UK, on 10–13 July 2018 [4]. In this work, a theoretical approach to a fusion system is presented to integrate information from different sources (a net of soft sensors deployed in a certain region), able to improve the predictions of sales by exploiting the association relations mined in the available data, and to compare their impact with respect to other available magnitudes to fuse, such as nearby population and climatic variables. Associative relations mined from available datasets have proven to be a powerful means to discover causal relations useful for understanding the information coming from heterogeneous sources in different domains such as education [5], smart cities [6], or pervasive computing [7].

In this work, which is focused on the agricultural domain, the composition of several geographically dispersed local markets represents a global market. In each local market, the reported sales are treated as a soft sensor measuring information about sales, giving information for each product and its associative relationships with the rest. Each market generates the following observational data:

Local Market Information (LMI): geographical position, name of products, number of sellers, quantity of population that uses the local market, etc.

Sales information: for each sale and for each product, the quantity of product acquired.

The second source of data was provided by the National Institute of Meteorology and Hydrology of Ecuador, where climatic information is recorded for each geographic region based on meteorological sensor sources (temperature, precipitation, humidity, etc.). A third source of data considered in this study was provided by the National Institute of Statistics and Census of Ecuador, where information about the estimated population in each region is stored. In the end, the system is composed of a net of geographically dispersed soft sensors able to obtain data from local markets, fused with hard sensors providing climatic data for each market and also the population of cities close to the market.

The rest of this paper is organized as follows:

Section 2 presents an overview of works related to agricultural production.

Section 3 proposes a new view of the general problem as hard and soft information fusion and the algorithm to predict agricultural production using spatial information.

Section 4 presents the methodology to integrate data mining to generate relevant knowledge to improve the information fusion process.

Section 5 presents the analysis of the target scenario, and Section 6 concludes the paper.

## 3. Spatial Predictions Based on Data Fusion

In general, sensors are defined as sources of information from the environment. In real environments, sensors are composed of specific hardware and software dedicated to taking measurements of physical variables (such as location, size, temperature, pressure, etc.). In virtual environments, such as social networks, webs, etc., sensors are specific software tools that extract information from the digital world. These two different types of sensors can be labeled as hardware sensors (h) and software sensors (s). Both types of sources generate information about the environment, and this definition allows us to distinguish between real and digital environments [15]. Hard/soft data fusion has been shown to be an effective approach to improve models and understand situations in different domains. For instance, the work in [6] deals with the texts posted by persons as a distributed social sensor system, which can be correlated with physical sensor data (audio, temperature, pollution, etc.) to improve the information available for services provided in smart cities.

Therefore, a sensor can be defined as a general data source in the following way:

Besides, several sensors of diverse types could be placed together in a platform of sensors to take advantage of measuring several variables in a coordinated way. For example, a meteorological station is composed of several hardware sensors (temperature, atmospheric pressure, altitude, etc.) and may include software sensors (human observations and forecast reports), or a surveillance system could be composed of a set of hard sensors (radar, camera, and infrared camera aligned to detect any object in the search field) and may include inputs from human operators.

A sensor platform could be defined in the following way:

In general, platforms should be distributed in the environment to cover a large area, and, in many cases, coverage areas are overlapped to improve detections in the boundaries, as shown in Figure 1. The information captured by platforms is sent to a fusion centre where it is integrated using spatial and temporal information.

The fused information is used to predict future situations. This prediction depends on a model that could be based on measurements from a single source; for example, current and past positions of a target can be used to predict future positions based on the measures and the movement model. In some situations, the information from the set of sensors of the platform could be integrated, and the complementary information of each one allows the final prediction to be improved. For example, the radar position, the visual position, and the thermal position could be combined to generate a “best” position to improve the prediction of the future position, or the predicted future position based on radar could be improved considering the actual video position.

In the fusion centre, information from the network is received, and the measures of each sensor are stored to improve the final prediction:

That could be seen as a sequence of values registered in a database, defining each register as a vector with the corresponding values (time stamp, platform j, sensor i, type of sensor t, value). These vectors would be stored in a database, as shown in Table 1.
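The register structure described above can be sketched as a simple data model. A minimal illustration in Python follows; the field names and example values are assumptions for illustration, not the paper's actual database schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Measurement:
    """One register of the fusion centre database: a vector
    (time stamp, platform j, sensor i, type of sensor t, value)."""
    timestamp: str    # e.g. the week of the fair (assumed format)
    platform: int     # local market j
    sensor: int       # sensor i within the platform
    sensor_type: str  # 'soft' (sales report) or 'hard' (physical sensor)
    value: float

# A tiny in-memory "database" of registers, in the spirit of Table 1
db: List[Measurement] = [
    Measurement("2014-W01", 1, 1, "soft", 120.0),  # e.g. kg of a product sold
    Measurement("2014-W01", 1, 2, "hard", 14.5),   # e.g. temperature (°C)
]

# Retrieving all registers of one platform (local market)
platform_1 = [m for m in db if m.platform == 1]
```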

The final prediction could be based on the information of only one sensor of the platform, for example, tracking a target using the radar of one platform. In this problem, we have a primary source, which is composed of a deployment of platforms of soft sensors located in each local market. Each platform is defined by the market data: position, number of products, number of sellers, quantity of population that uses the local market, etc., and has a set of soft sensors able to measure the level of sales of each product:

Each platform is composed of two other soft sensors:

${I}_{ij}^{t}$: Information of population characteristic i from market j of type t (percentage, absolute, etc.)

Considering this spatially distributed data fusion problem, future predictions could be computed by exploiting the geo-statistical properties of the variables. The prediction of a spatially distributed variable is a specific problem considering local information. Citing [16], “in the geographical space everything is related to everything, but the closest spaces are more related to each other”. This process usually starts by defining a function taking values on a certain spatial region D. This function provides a set of random variables for x taking values in the domain D, defined as $Z=\left\{Z\left(x\right), x\in D\right\}$. The values are also random, with the expectation and variance (first and second order moments) defined as usual:

This last value assesses the scatteredness of Z(x) around the expectation, while the covariance between different points in D is defined as:

This value characterizes the interaction between $Z\left({x}_{1}\right)$ and $Z\left({x}_{2}\right)$, usually integrated in the semi variogram function, defined as:

which reflects the way in which one point has influence on another point at different distances. The relation between variogram and covariance is given by $\gamma \left(h\right)=C\left(0\right)-C\left(h\right)$.

As defined above, given a set of Z values provided by sensors deployed at n certain sites, $\left\{{x}_{1},\dots ,{x}_{n}\right\}$, the variogram can be estimated considering the separation vector h as:

In the case that $\gamma \left(h\right)$ is isotropic (identical in all directions of the space), it does not depend on the angle of vector h, but only on its magnitude, |h|.
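The experimental variogram estimator just described can be sketched numerically. Below is a minimal NumPy implementation assuming the standard estimator $\hat{\gamma}(h)=\frac{1}{2\left|N(h)\right|}\sum \left(Z(x_{\alpha})-Z(x_{\beta})\right)^{2}$ over pairs whose separation falls in a distance bin (an isotropic, binned sketch, not the paper's exact computation):

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Isotropic experimental semivariogram: for each distance bin,
    gamma(h) = mean of (Z(x_a) - Z(x_b))**2 over pairs in the bin, / 2."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # All pairwise distances and squared value differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(n, k=1)            # count each pair once
    dists, sqdiff = d[iu], sq[iu]
    gammas = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dists >= lo) & (dists < hi)
        gammas.append(sqdiff[mask].mean() / 2 if mask.any() else np.nan)
    return np.array(gammas)

# Two sites one unit apart with values 0 and 2: gamma = (0 - 2)**2 / 2 = 2
g = empirical_variogram([[0.0], [1.0]], [0.0, 2.0], bin_edges=[0.0, 1.5])
```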

In the general case, it is difficult to obtain the experimental variogram from data, given the scarce distances and directions, and usually a theoretical model must be adjusted with the available dataset. A quite generic model for the variogram considers growth from the origin until a stabilization distance, around a plateau, so that the random variables Z(x) and Z(x + h) are correlated when the length of the separation vector h is lower than a certain distance, the zone of influence, and beyond |h| = a the variogram keeps constant (the plateau). For instance, a spherical variogram of reach a and plateau C is defined as [17]:
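A sketch of the standard spherical model from the geostatistics literature, $\gamma(h)=C\left(1.5\,h/a-0.5\,(h/a)^{3}\right)$ for $h\le a$ and $\gamma(h)=C$ beyond the reach (assumed to be the model the text refers to):

```python
import numpy as np

def spherical_variogram(h, a, C):
    """Spherical model with reach a and plateau (sill) C:
    gamma(h) = C * (1.5*h/a - 0.5*(h/a)**3) for h <= a, and C for h > a."""
    h = np.asarray(h, dtype=float)
    r = np.clip(h / a, 0.0, 1.0)  # clipping gives the constant plateau beyond a
    return C * (1.5 * r - 0.5 * r ** 3)
```

The model is zero at the origin, grows with distance, and stays constant at C once |h| exceeds the reach a.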

In the isotropic case, assuming the semi variance is independent of direction, the vector h is replaced with its magnitude, ‖h‖. In this case, the variogram is computed taking pairs of data $Z\left({s}_{i}\right),Z\left({s}_{i}+h\right)$ along the available distances in interval $\overline{{h}_{j}}$, defined as:

In some cases, the experimental variogram shows changes of slope at certain distances. In these cases, the variogram model can be a sum of simple models (nested structures):

In these cases, the adjustment is not based only on experimental data, but must also consider contextual information about the region. More details appear in [18].

Kriging is a linear prediction model corresponding to the best unbiased linear estimator. There are different types of kriging depending on how the population mean is treated: ordinary and simple kriging.

In ordinary kriging, a stationary random function Z is considered:

where V is the neighborhood considered in the process. The model is defined as:

where ${x}_{0}$ is the place to predict the variable, {${x}_{\alpha}$, α = 1,…, n} are the available sites with training data, and {${\lambda}_{\alpha},\alpha =1,\dots ,n$} are the weights to be computed, together with constant a. A first constraint on the estimator is that it be unbiased (the mean value of the error must be zero):

With this constraint, the weights that minimize the variance of the estimator can be computed as:

Replacing the variogram by the covariance through the relationship $\gamma \left(h\right)=C\left(0\right)-C\left(h\right)$, the kriging prediction is obtained as follows:
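A compact numerical sketch of the ordinary kriging prediction in covariance form, solving the system $\sum_{\beta}\lambda_{\beta}C(x_{\alpha},x_{\beta})+\mu=C(x_{\alpha},x_{0})$ with $\sum_{\alpha}\lambda_{\alpha}=1$ via a Lagrange multiplier; the exponential covariance used in the example is an illustrative assumption, not a model from the paper:

```python
import numpy as np

def ordinary_kriging(coords, values, x0, cov):
    """Ordinary kriging at x0. `cov(h)` is a covariance function of distance.
    Solves the bordered system [C 1; 1^T 0][lambda; mu] = [c0; 1],
    then returns Z*(x0) = sum_a lambda_a * Z(x_a)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = cov(D)
    A[:n, n] = 1.0   # Lagrange multiplier column
    A[n, :n] = 1.0   # unbiasedness constraint: sum of weights = 1
    A[n, n] = 0.0
    b = np.empty(n + 1)
    b[:n] = cov(np.linalg.norm(coords - np.asarray(x0, dtype=float), axis=-1))
    b[n] = 1.0
    w = np.linalg.solve(A, b)
    return float(w[:n] @ values)

cov = lambda h: np.exp(-h)   # illustrative covariance model (assumption)
pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
z = [1.0, 2.0, 3.0]
# Kriging is an exact interpolator: predicting at a data site returns its value
z_hat = ordinary_kriging(pts, z, (0.0, 0.0), cov)
```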

The core of the fusion process is carried out with co-kriging, a technique using multiple spatial variables to build an extended prediction model. This multivariable kriging process takes as input a set of m available spatial variables, $\left\{{Z}_{1},\dots ,{Z}_{m}\right\}$, previously aligned with the variable to predict and collocated in the same coordinates. The prediction equation is extended for this case as follows:

It is an estimation method that minimizes the error variance by exploiting the cross-correlation between the available variables. In an analogous way, the error covariance of the prediction ${Z}_{i}{}^{\ast}\left({x}_{0}\right)$ can be expressed as a function of the own coefficients, ${\lambda}_{\alpha}$, and the coefficients for the collocated variables, $\left\{{\lambda}_{\alpha j}\right\}, j=1,\dots ,n, j\ne i$. Consequently, the expression to minimize, analogous to Equation (17), will depend both on the own variogram, ${\gamma}_{i}\left(h\right)$, and on the crossed variograms among the variables, ${\gamma}_{ij}\left(h\right)$.

Therefore, the first step of co-kriging also consists in computing the multivariable variogram in order to estimate the semi variances among all variables:

This crossed variogram between variables ${Z}_{i}$, ${Z}_{j}$ can be estimated with the available training data:

where the set N(h) is defined as $\left\{\left(\alpha ,\beta \right)\ \mathrm{such\ that}\ {x}_{\alpha}-{x}_{\beta}=h\right\}$, with variables ${Z}_{i}$ and ${Z}_{j}$ taking values at the corresponding locations ${x}_{\alpha}$ and ${x}_{\beta}$.
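The crossed variogram estimator can be sketched analogously to the single-variable case; a NumPy illustration assuming the standard estimator $\hat{\gamma}_{ij}(h)=\frac{1}{2\left|N(h)\right|}\sum_{(\alpha,\beta)\in N(h)}\left(Z_i(x_\alpha)-Z_i(x_\beta)\right)\left(Z_j(x_\alpha)-Z_j(x_\beta)\right)$ with isotropic, binned distances:

```python
import numpy as np

def cross_variogram(coords, zi, zj, bin_edges):
    """Experimental crossed semivariogram between variables Z_i and Z_j:
    for each distance bin, the mean of
    (Z_i(x_a) - Z_i(x_b)) * (Z_j(x_a) - Z_j(x_b)) over pairs in the bin, / 2."""
    coords = np.asarray(coords, dtype=float)
    zi = np.asarray(zi, dtype=float)
    zj = np.asarray(zj, dtype=float)
    n = len(zi)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    prod = (zi[:, None] - zi[None, :]) * (zj[:, None] - zj[None, :])
    iu = np.triu_indices(n, k=1)   # each pair counted once
    dists, prods = d[iu], prod[iu]
    out = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        m = (dists >= lo) & (dists < hi)
        out.append(prods[m].mean() / 2 if m.any() else np.nan)
    return np.array(out)

# Sanity check: with Z_i = Z_j it reduces to the ordinary variogram
g = cross_variogram([[0.0], [1.0]], [0.0, 2.0], [0.0, 2.0], [0.0, 1.5])
```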

In this way, Doligez et al. [19] suggest a stepwise methodology for data integration, progressively integrating data at different scales to improve the interpolation, illustrated by fusing seismic and well data. This stepwise approach can be used to integrate many data types. Once correlations among variables are analyzed, regression or geostatistical methods are used to integrate the data with the co-kriging approach. On the other hand, hybrid models mixing machine learning and geostatistics, such as Neural Network Residual Kriging/Co-kriging (NNRK/NNRCK) [20], have proven their efficiency in real-world mapping problems.

Building extended models combining individual factors to fit the structure of the data faces the complexity of dimensionality and the risk of oversimplifying the unknown causal relationships of the multiple factors. Usually, it is best to separate the spatial processes whenever possible and try to understand the causal relationships among the available variables. Therefore, a refined fusion strategy is needed to overcome the problems of dimensionality or stationarity assumed in statistical methods, or the interpretability problem of machine learning approaches. In this work, the underlying hypothesis is that the most relevant associative patterns can be mined in transactional data and then exploited to improve the efficiency of multivariable models integrating correlated variables. In general, results must be cross-validated to demonstrate the contribution of this approach. Cross-validation in geospatial data implies systematically removing points from the dataset and re-estimating the predictions based on the assessed model. This will be the means to validate and assess the contribution of the secondary information resulting from the fusion of available data.

## 4. Methodology Based on Data Mining to Improve the Fusion Process

The methodology follows three steps: Fusion, Machine Learning, and Prediction. The Fusion System integrates data available from different platforms (local markets) that contain several data sources (market sales, climatic variables, population data, etc.). As indicated, the system can learn relationships between variables that may be useful for the prediction of future sales based on information considering spatial distributions, meteorological information, etc. An overview of the general process is detailed in Figure 2 and explained below:

- (1)
Clean and Transform Information

1.1. Records with no values are deleted

1.2. Similar values on records are standardized

1.3. Units are defined for all measurements

1.4. The set of products is selected for the study

1.5. Spatial structures are generated

1.6. Final database is generated and prepared for pattern search

- (2)
Extract best association rules

2.1. The database of sold products is discretized from transactional data

2.2. Association algorithms are applied to mine the best association rules

2.3. Establish the set of products with strongest associations

- (3)
Estimate future predictions applying geostatistical fusion techniques and validate the hypothesis of strongest conditional dependencies found by data association mining

3.1. Comparison of the fusion results using the set of most associated variables

3.2. Comparison of the results using climatic floors

3.3. Comparison of the results using population

3.4. Comparison of the results using the rest of the variables

- (4)
Finally, the improvement of the proposal is analyzed by comparing predictions with a single product vs. the data fusion results (using residuals as evaluation metrics with Leave-One-Out Cross-Validation, LOOCV)
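The LOOCV evaluation in step (4) can be sketched generically: each observation is held out in turn, re-estimated from the rest, and the residuals are aggregated. The mean-of-the-rest baseline predictor below is a placeholder assumption for illustration, not the paper's kriging/co-kriging model:

```python
import numpy as np

def loocv_residuals(coords, values, predictor):
    """Leave-one-out cross-validation: remove each point in turn,
    re-estimate it from the remaining ones, and return the residuals
    (observed - predicted)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    res = []
    for k in range(len(values)):
        mask = np.arange(len(values)) != k
        pred = predictor(coords[mask], values[mask], coords[k])
        res.append(values[k] - pred)
    return np.array(res)

# Trivial baseline: predict the mean of the remaining observations
baseline = lambda c, v, x0: v.mean()
r = loocv_residuals([[0, 0], [1, 0], [0, 1]], [1.0, 2.0, 3.0], baseline)
rmse = float(np.sqrt((r ** 2).mean()))
```

Comparing this RMSE across models (single-product kriging vs. fusion-based co-kriging) is the comparison the step describes.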

In Figure 2, the global process for improving predictions is schematized. The first step is the integration of information received from knowledge sources (every local market), generating global information with every available source. From this information, the system obtains the best association rules, which are used in the next step to improve predictions of sales (this is the learned sales model that represents the relations among products). The result of the first step is the fusion of local information received from local markets to generate a global view of sales in a region. In the second step, knowledge is extracted from this global view, obtaining relevant relationships among products. The third and final step applies this knowledge to improve the future prediction of sales.

The global fusion process system needs knowledge extracted from local markets. In this sense, at the beginning of the process, it gathers representative information after applying the machine learning procedure to extract sales knowledge, and then this learned model is used to generate improved predictions by fusing observations with the learned model.

Mining association rules is a way to find causal relations among variables (in the simplest form, a rule finds a relation between two variables). These rules allow the prediction of changes in the value of one variable based on knowledge of another one. The form of an association rule is {A} ⇒ {B}; this rule means: “if A appears in the register, then B should also appear in the same register”. This kind of rule is useful to identify relationships between categorical attributes that are not explicit. As in any rule system, set A is named the antecedent of the rule and B is named the consequent. Each association rule should be evaluated to assess its quality, and evaluation is based on three common metrics: support, confidence, and lift [21]:

Support (of a rule) is evaluated as the number of instances (registers in the dataset) the rule covers, relative to the whole set of registers in the dataset:

where D is the total set of transactions.

If antecedent A and consequent B are considered, the support is the intersection set:

Confidence (of a rule) is evaluated as the number (percentage) of times that consequent B appears among the instances that are selected by the antecedent A. The meaning of this concept is the accuracy of the rule's prediction; it is defined as:

Lift (of a rule) is evaluated as the ratio of the observed support to that expected if A and B were independent:
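These three metrics can be computed directly from transactional data. A minimal sketch follows; the example baskets are hypothetical, not taken from the CIALCO dataset:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the set."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(A, B, transactions):
    """supp(A and B together) / supp(A): accuracy of the rule {A} => {B}."""
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    """Observed support relative to that expected if A and B were independent."""
    return confidence(A, B, transactions) / support(B, transactions)

# Hypothetical weekly baskets from a local fair
T = [{"potato", "onion"}, {"potato", "onion", "carrot"},
     {"potato"}, {"onion"}]
s = support({"potato", "onion"}, T)        # 2 of 4 baskets -> 0.5
c = confidence({"potato"}, {"onion"}, T)   # 0.5 / 0.75 -> 2/3
l = lift({"potato"}, {"onion"}, T)         # (2/3) / 0.75 -> ~0.89
```

A lift below 1, as here, indicates that the two items co-occur less often than independence would suggest.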

There are several algorithms that extract association rules from a database. The most representative algorithm for this task is the Apriori algorithm [22]. Explanations of this algorithm in [23,24] clarify that the Apriori algorithm finds trends using performance parameters (support, confidence, and lift) evaluated on “a priori” frequent sets (prior knowledge). The algorithm is composed of the following steps:

- (1)
Generate all item sets L with a single element; these sets are used to form new sets with two, three, or more elements. All possible pairs are taken such that their support is at least minsup

- (2)
For every frequent item set L’ found:

For each subset J of L’

Repeat **1**, including the next element into L

As explained, all item sets that satisfy a threshold of minimum support are searched. However, looking for all subsets would not be possible due to the exponential size of the search space of potential item sets to analyze. The Apriori algorithm prunes candidates with an infrequent subset before counting their supports. This is a Breadth-First Search (BFS) process; it ensures that the support values of all subsets of a candidate are known in advance. All candidates of cardinality k are counted in each scan in order to prune the branches below the support threshold, and then the search descends along the rest of the tree.
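The level-wise search with subset pruning described above can be sketched compactly. This is a didactic implementation of the Apriori idea, not an optimized production version:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise (breadth-first) frequent item set search: candidates of
    size k+1 are joined from frequent sets of size k, and any candidate
    with an infrequent subset is pruned before its support is counted."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(s):
        return sum(s <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [s for s in (frozenset([i]) for i in items) if supp(s) >= minsup]
    k = 1
    while level:
        for s in level:
            frequent[s] = supp(s)
        # Join step: merge frequent k-sets into (k+1)-candidates
        cands = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must already be frequent
        cands = [c for c in cands
                 if all(frozenset(sub) in frequent for sub in combinations(c, k))]
        level = [c for c in cands if supp(c) >= minsup]
        k += 1
    return frequent

T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(T, minsup=0.6)
```

On this example, all singletons and pairs are frequent, while {a, b, c} survives the subset prune but is discarded by the support count.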

A possible alternative approach could be to use Depth-First Search (DFS), expanding the candidate sets from the item sets of one of the nodes of the tree. Obviously, scanning the database for every node would result in tremendous overhead, so counting occurrences in a DFS mechanism is not practical. A more recent approach, called FP-growth, was introduced in [25], and was shown to be more efficient than Apriori in representative situations [26]. In a preprocessing step, FP-growth builds a condensed representation of the transaction data, called the FP-tree. FP-growth does not explore all the nodes of the tree, but directly descends to some part of the item sets in the search space and, in a second step, uses the FP-tree to derive the support values of the frequent item sets.

Besides, other recent extensions of Apriori go in the direction of temporal patterns. The basic algorithm cannot be used in many applications where patterns vary with time. In these cases, entities follow periodic patterns, such as transportation schedules, loads with time constraints, some trajectories, etc., and this kind of problem is not considered by the basic Apriori algorithm. This kind of problem should be formulated as discovering patterns from a dataset considering temporal attributes and trying to model how they vary with time. Many algorithms for finding temporal patterns in sequence databases are listed in the bibliography; these algorithms are usually based on sequence mining techniques (or frequent pattern search) combined with temporal data association.

There are some extensions of the Apriori algorithm that consider lists of objects ordered using time as items. Then, the searched result is the association of items in the form of sequences of items. Some examples of these algorithms derived from Apriori are Generalized Sequential Pattern (GSP) for spatiotemporal associations [27], Sequential PAttern Discovery using Equivalence classes (SPADE) [28], and Sequential PAttern Mining (SPAM) [29].

Other techniques extract meta-rules describing how relationships vary in time [24,30]. These techniques are also based on the Apriori mining schema, extended to consider time meta-relationships. Some studies conducted in several domains of science have tried to use rules extracted with the Apriori algorithm as a criterion to generate future estimations using associations. Typical works are [31], where the authors analyze the stock of a supermarket, and [32], where the authors predict admission decisions by students. Additionally, works such as [33] have searched for relationships between extracted rules (association rules) and other techniques such as fuzzy classification.