Data Fusion Using Improved Support Degree Function in Aquaculture Wireless Sensor Networks

Monitoring aquaculture parameters in ponds with wireless sensor networks (WSN) requires high accuracy in fault detection and high precision in error correction. However, collecting accurate data from a WSN to a server or cloud is a bottleneck: data faults in WSNs, especially in aquaculture applications, limit their further development. When a data fault occurs, a data fusion mechanism can provide corrected data to replace the abnormal values. In this paper, we propose a data fusion method that enhances data quality with a novel function, the Dynamic Time Warping time series strategy improved support degree (DTWS-ISD), which applies a Dynamic Time Warping (DTW) time series segmentation strategy to the improved support degree (ISD) function. We use the DTW distance in place of the Euclidean distance to explore the continuity and fuzziness of data streams, and adopt the time series segmentation strategy to reduce the computational dimension of the DTW algorithm. Unlike the Gauss support function, the ISD function obtains the mutual support degree of sensors without the exponent calculation. Several experiments were conducted to evaluate the accuracy and efficiency of DTWS-ISD with different performance metrics. The experimental results demonstrate that DTWS-ISD achieved better fusion precision than three existing functions in a real-world WSN water quality monitoring application.

for processing the sensor data which is incomplete or in an uncertain state by self-learning continuously. A multi-sensor data fusion technique was developed using fuzzy clustering, which builds on the ability of fuzzy sets to deal with imprecision and uncertainty [12]. Rough set theory is also suitable for dealing with uncertain or unclear data, but has been limited to attribute reduction for many years. Different applications of rough set theory in information fusion were presented in Reference [13]. DS theory has an advantage in studying nondeterministic problems in data fusion, but when dealing with conflicting data it frequently produces abnormal values. A fusion-based, uncertainty-aware sensor network deployment problem was discussed in Reference [14], where DS theory was used to define a generic evidence fusion scheme that captures several characteristics of real-world applications. Before fusing multi-sensor data, the Bayesian network algorithm requires prior knowledge and obtains the prior probability distribution to calculate the reliabilities of sensors. A novel approach to fault detection that uses the Bayesian mathematical framework to integrate micro and macro data was presented in Reference [15]. A Kalman filter can deal with redundant information, but requires prior knowledge and a model of the target. A novel multi-sensor optimal data fusion methodology based on an adaptive fading unscented Kalman filter for multi-sensor nonlinear stochastic systems was proposed in Reference [16]. The weighted fusion algorithm performs a weighting operation on data streams after calculating the weights of sensors. Two novel classes of ordered weighted gradient fusion algorithm, inspired by the human visual system, were discussed in Reference [18] to fuse multi-scale information.
The weighted fusion algorithm is used in many application fields. It does not require prior knowledge of the sensor system during fusion, and can realize high-precision information fusion with sensor data. Yager [22] proposed a power mean average operator to fuse sensor data based on the calculation of support degree. This algorithm can be applied to real-time fusion because of its high efficiency. Xiong [23] provided a new support function model for real-time data fusion based on grey correlative degree theory. Before data fusion, it required checking data consistency and applying exponential smoothing three times on the sensor nodes to improve data quality. In addition, Duan [24] adopted a regression prediction method based on a sliding window to check data consistency, and provided a weighted fusion algorithm based on an improved support degree to fuse the homogeneous data.
Other works [20,26] also used the weighted fusion algorithm with different support degree functions; weighted fusion is preferred because it is easy to compute. The computational complexity and precision of these support degree functions remain critical issues. In this paper, a weighted fusion algorithm based on the DTWS-ISD function is proposed to enhance data quality in aquaculture WSN. Enhancing data quality takes three steps: first, collect data from the monitoring system; then, check data consistency to eliminate errors; finally, apply the data fusion mechanism to generate fused data.

Overview of Data Correction
Fault detection and data fusion are the crucial steps for improving the quality of dissolved oxygen data. Because multi-sensor data are highly correlated in time and space, the consistency of historical data must be checked first. After missing data are reconstructed and outliers are detected, a new data set is obtained for data fusion.
When a sensor fails, the data fusion mechanism can provide corrected data to replace the abnormal data. Suppose Sensor 1 is the fault node. The first step is to compute the mutual support degree values of the sensors with the DTWS-ISD function; the fused result is then computed with the weighted fusion method. Each sensor's data pass through the data processing mechanism shown in Figure 1 to improve data quality. The DTW method and the time series segmentation strategy are adopted together to improve the ISD function during the mutual support degree computation. As shown in Figure 1, when Sensor 1 is the fault node, the consistency checking module yields a new dataset $X_i = [x_{i1}, x_{i2}, \ldots, x_{it}]$. The fusion module uses the fusion result of sensor nodes $X_2 = \{x_{21}, x_{22}, \ldots, x_{2t}\}$, $X_3 = \{x_{31}, x_{32}, \ldots, x_{3t}\}$ and $X_n = \{x_{n1}, x_{n2}, \ldots, x_{nt}\}$ to correct $X_1 = \{x_{11}, x_{12}, \ldots, x_{1t}\}$. Here, $x_{ik}$ represents the observed value of Sensor i at time k after data consistency checking, i = 2, 3, …, n, and k = 1, 2, …, t.
Algorithm 1 explains the detailed process of data correction. The input of the data correction algorithm is the original dissolved oxygen data $o_i$ collected from n sensors, together with the time length t used for time series segmentation. The output is $X_{Fuse}$, which is applied to correct the data stream and improve data quality. The inner loop (lines 3-7) obtains the support degree matrix $s_{ij}$, from which the sensor weights $w_j$ are computed. The support degree value is determined by the Dist value between Sensor i and Sensor j. All these calculations are influenced by $X_i$, which is obtained from data consistency checking. Thus, in the inner loop, each subsequence only needs to be processed j − 1 times when Sensor 1 is the fault node. The fused result $X_{Fuse}$, calculated in lines 8-9, replaces the erroneous data $X_1$.

Data Consistency Detection
Due to the instability and transmission errors of underwater sensors in aquaculture, data exceptions and missing data often occur. Fragmentary data from the dissolved oxygen sensors are mended by linear interpolation [27]. Equation (1) shows the calculation process.
$$o_{k+i} = o_k + \frac{i\,(o_{k+j} - o_k)}{j}, \quad 0 < i < j \tag{1}$$
where $o_k$ is the observed value of dissolved oxygen content at time k, $o_{k+j}$ is the observed value at time k + j, and $o_{k+i}$ is the missing value at time k + i.
Consistency detection is realized by the Autoregressive Integrated Moving Average Model (ARIMA) [28]. The main steps of consistency detection are as follows:
Step 1: Analyze the correlation of the dissolved oxygen time series data and test the data stability.
Step 2: Determine the autoregressive order p and the moving-average order q of ARIMA. Build the optimal ARIMA model on the basis of these parameters.
Step 3: Calculate the prediction interval (PI) to determine whether the collected data are abnormal.
Here, $o_i(t)$ is the multi-sensor data at time t, $x_i(t)$ is the value estimated by ARIMA, and C is the cost function [29]. Equation (3) calculates the PI value on the basis of the estimated value x of ARIMA, where t is the P% point of a Student's t-distribution with n − 1 degrees of freedom, n is the sample size, and s is the standard deviation of the n samples.
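The linear interpolation of Equation (1) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' implementation: the function name `fill_missing_linear` and the use of `None` to mark a missing reading are hypothetical choices made for illustration, and gaps at the very start or end of a series are left unfilled because the interpolation needs a known value on both sides.

```python
from typing import List, Optional


def fill_missing_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (marked None) by linear interpolation.

    For a gap between known readings at positions k and k + j, the value at
    k + i is o_k + i * (o_{k+j} - o_k) / j, for 0 < i < j.
    """
    filled = list(series)
    # Positions of the readings that were actually observed.
    known = [idx for idx, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left  # j in the text
        for i in range(1, span):  # i = 1 .. j - 1
            filled[left + i] = series[left] + i * (series[right] - series[left]) / span
    return filled


# Hypothetical dissolved oxygen readings (mg/L) with two missing samples.
readings = [6.0, None, None, 7.5, 7.2]
print(fill_missing_linear(readings))  # [6.0, 6.5, 7.0, 7.5, 7.2]
```

Filling the gaps before consistency detection keeps the series evenly sampled, which is what a time series model such as ARIMA expects.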

The Support Function
The weighted fusion method is one of the popular algorithms for fusing homogeneous data [30]. A support function is used to explore the correlation between sensors from the experimental dataset. Let sup(a, b) be the proximity between two elements a and b, called the support degree. It meets the following properties:
(1) $\mathrm{sup}(a, b) \in [0, 1]$;
(2) $\mathrm{sup}(a, b) = \mathrm{sup}(b, a)$;
(3) $\mathrm{sup}(a, b) \geq \mathrm{sup}(x, y)$ if $|a - b| < |x - y|$.
That is, the more similar or closer two elements are, the more they support each other. Based on these three properties, Yager proposed a binary support function [22], which is discontinuous. A common continuous form is the Gaussian support function, defined as:
$$\mathrm{sup}(a, b) = K e^{-\beta (a - b)^2}$$
where $K \in [0, 1]$ is the maximally allowable support and controls the amplitude of the function, and $\beta \geq 0$ acts as the attenuation factor. The larger β is, the more strongly differences in distance affect the support. Note that a = b gives sup(a, b) = K, and as the distance between a and b grows, sup(a, b) → 0. The Gaussian support function is symmetric and lies in the unit interval. Its calculation relies on the exponent operation.
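The behaviour of the Gaussian support function described above can be checked with a few lines of Python. The sketch assumes the usual form K·exp(−β(a − b)²) from the power average literature; the function names and the sample readings are hypothetical, and K = 1, β = 2 simply mirror the parameter values used for the characteristic curves.

```python
import math
from typing import List, Sequence


def gaussian_support(a: float, b: float, K: float = 1.0, beta: float = 2.0) -> float:
    """Gaussian support degree: K when a == b, decaying towards 0 as |a - b| grows."""
    return K * math.exp(-beta * (a - b) ** 2)


def support_matrix(values: Sequence[float], K: float = 1.0, beta: float = 2.0) -> List[List[float]]:
    """Pairwise support degrees for a list of simultaneous sensor readings."""
    return [[gaussian_support(a, b, K, beta) for b in values] for a in values]


# Hypothetical dissolved oxygen readings (mg/L) from three sensors.
readings = [7.0, 7.2, 9.0]
for row in support_matrix(readings):
    print([round(s, 4) for s in row])
```

The matrix is symmetric with K on the diagonal, and the reading that lies far from the others receives support close to zero. Every entry costs one call to `math.exp`, which is the operation the exponent-free ISD function is meant to avoid.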

Improved Support Degree
To reduce the high computational complexity of the Gaussian support degree function, a novel ISD function based on the theory of grey incidence analysis [31] is proposed in this paper. Liu [32] utilized grey incidence analysis to represent the proximity of two elements. Inspired by this idea, we constructed the ISD function by replacing the exponent operation of the Gaussian support function, which reduces the computational complexity of the support function. In the ISD function, K decides the amplitude of the function and β denotes the attenuation factor. If K is fixed, the attenuation velocity of the support degree goes up with β. The smaller the difference between two elements, the higher the support degree value. Usually, the difference of dissolved oxygen content at the same depth in an aquaculture concrete tank is lower than 2 mg/L, that is, a − b ∈ [−2, 2]. To show the difference between the support degree functions, Figure 2 compares their characteristic curves. G(a, b, 1, 2), D(a, b, 1, 2), SN(a, b, 1, 2) and ISD(a, b, 1, 2) represent the characteristic curves of these functions when K = 1 and β = 2.


Improved Support Degree Function Based on DTW Distance (DTW-ISD)
The traditional support function is widely used to measure the proximity between two elements at time t. However, the Euclidean distance $d_{ij} = |x_i - x_j|$ of two elements at time t loses the connection information of time series data. To account for the continuity of the time series, time series similarity is introduced into the ISD function.
DTW distance is a prevalent measure of the similarity between two time series that may vary in time or speed. Because DTW is robust to time warping and phase shift, it was introduced into the ISD function [33,34]. This yields the DTW-ISD support function (Equation (6)), where Dist denotes the DTW distance between two time series X and Y. Let $X = \{X_1, X_2, \ldots, X_p, \ldots, X_m\}$ have length m and $Y = \{Y_1, Y_2, \ldots, Y_q, \ldots, Y_n\}$ have length n. Before the DTW distance is calculated, the distance matrix $D_{m \times n}$ is constructed, whose element (p, q) is the squared distance between $X_p$ and $Y_q$: $d_{pq} = (X_p - Y_q)^2$ [34]. A warping path maps the elements of X and Y through the matrix with minimal cumulative distance between them. The DTW distance Dist is then calculated as Equation (7), which corresponds to the path with minimal warping cost.
(2) Continuity condition: The steps are confined to adjacent points in the distance matrix, that is, $a - a' \leq 1$ for consecutive points of the path. Therefore, the warping path can be determined using dynamic programming, with the following recurrence:
$$\mathrm{Dist}(p, q) = d_{pq} + \min\{\mathrm{Dist}(p-1, q-1),\ \mathrm{Dist}(p-1, q),\ \mathrm{Dist}(p, q-1)\}$$
where Dist(p, q) is the sum of the current cell distance $d_{pq}$ and the minimum cumulative distance from the previous elements, and Dist(X, Y) = Dist(m, n).
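The dynamic programming recurrence described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' MATLAB code: the function name `dtw_distance` is hypothetical, and the sketch returns the raw cumulative squared cost of the best path, which is an assumption since any final normalisation or square root in Equation (7) is not reproduced here.

```python
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Cumulative cost of the minimal warping path between x and y.

    The cell cost is the squared difference d_pq = (x_p - y_q)^2, and each
    cell adds the cheapest of its three predecessors (diagonal, up, left).
    """
    m, n = len(x), len(y)
    inf = float("inf")
    # acc[p][q] holds the minimal cumulative cost of aligning x[:p] with y[:q].
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for p in range(1, m + 1):
        for q in range(1, n + 1):
            d_pq = (x[p - 1] - y[q - 1]) ** 2
            acc[p][q] = d_pq + min(acc[p - 1][q - 1], acc[p - 1][q], acc[p][q - 1])
    return acc[m][n]


# A repeated sample in y is absorbed by the warping, so the cost is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
# A constant offset cannot be warped away.
print(dtw_distance([1, 2, 3], [2, 3, 4]))  # 2.0
```

The two nested loops visit every cell of the m × n matrix once, which is the O(mn) cost that motivates segmenting the series.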

ISD Function Based on DTW Distance and Time Series Segmentation
DTW has very high measurement precision, but its high computational complexity limits its application. Computing the DTW-ISD support function is time-consuming for high-dimensional data [36]. Therefore, we divide the time series into several subsequences by time series segmentation to reduce the complexity of the algorithm and increase its efficiency.
We set a fixed segment length and divide the time series into subsequences of the same length. The DTW distance algorithm is applied to each subsequence [37,38]. If the segment length is l, the time series X and Y are divided into m/l and n/l subsequences, respectively. The DTW distance of each pair of subsequences is then calculated to obtain the variable Dist(T), where $\mathrm{Dist}_{T=l}(X, Y)$ is the Dist(X, Y) in time slot T, and Dist(T) represents the similarity distance between the series X and Y.
A combination of DTW distance and the segmentation strategy (DTWS) is proposed to optimize the ISD function. DTWS computes the similarity distance, and from it the mutual support degree, on the segmented subsequences rather than on the original time series. The computational complexity of DTW is O(mn). The computational complexity of DTWS is $O(l^2)$ when m equals n; if m ≠ n, it is O(|m − n| % l × l).
Let $X_i(T)$ and $X_j(T)$ be the data collected from sensors i and j in time interval T after data consistency checking. Substituting these data into Equation (6) gives the DTWS-ISD support function, Equation (10).
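The segmentation strategy can be sketched in Python as follows. The sketch is illustrative only: the names `segment` and `segmented_dtw` are hypothetical, both series are assumed to have the same length, and the per-segment distances are returned as a list (one value per time slot T) because the way they are combined afterwards is not reproduced here. The DTW routine uses the squared cell cost and the three-predecessor recurrence.

```python
from typing import List, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Cumulative squared-cost DTW distance via dynamic programming."""
    m, n = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for p in range(1, m + 1):
        for q in range(1, n + 1):
            cost = (x[p - 1] - y[q - 1]) ** 2
            acc[p][q] = cost + min(acc[p - 1][q - 1], acc[p - 1][q], acc[p][q - 1])
    return acc[m][n]


def segment(series: Sequence[float], l: int) -> List[Sequence[float]]:
    """Split a series into consecutive subsequences of length l (last one may be shorter)."""
    return [series[start:start + l] for start in range(0, len(series), l)]


def segmented_dtw(x: Sequence[float], y: Sequence[float], l: int) -> List[float]:
    """DTW distance of each pair of aligned subsequences, one value per time slot."""
    return [dtw_distance(xs, ys) for xs, ys in zip(segment(x, l), segment(y, l))]


# Hypothetical readings from two sensors, segment length l = 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 4.0, 4.0]
print(segmented_dtw(x, y, 2))  # [0.0, 1.0]
```

Each call to `dtw_distance` now fills an l × l table instead of an m × n one, so the work per time slot no longer grows with the length of the full series.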

Data Fusion Based on DTWS-ISD Function
We use the form of Equation (10) to define the proposed support function. The mutual support degree $s_{ij}$ between the time series of Sensor i and Sensor j within time interval T is computed with this function, and the pairwise values form the h × h mutual support degree matrix, where h denotes the number of sensors. The total support degree of the other h − 1 sensors to Sensor i within time interval T is then computed from this matrix, and $w_j$ represents the weighting factor of Sensor j.
Combined with the weighted fusion strategy, the final fusion estimate is given in Equation (15).
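The weighted fusion step can be sketched in Python as follows. This is a hedged illustration, not the paper's exact formulation: the function name `fuse_with_support` is hypothetical, the support matrix is taken as a given input, and the weights are assumed to be the total support degrees normalised to sum to one, since the precise weight definition is not reproduced here.

```python
from typing import List, Sequence, Tuple


def fuse_with_support(values: Sequence[float],
                      support: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Fuse one reading per sensor using a mutual support degree matrix.

    Assumption: each weight is the sensor's total support from the other
    sensors divided by the sum of all total supports.
    """
    h = len(values)
    # Total support each sensor receives from the other h - 1 sensors.
    totals = [sum(support[i][j] for j in range(h) if j != i) for i in range(h)]
    total_sum = sum(totals)
    if total_sum == 0:
        # No sensor supports any other: fall back to a plain average.
        weights = [1.0 / h] * h
    else:
        weights = [t / total_sum for t in totals]
    fused = sum(w * v for w, v in zip(weights, values))
    return fused, weights


# Hypothetical readings (mg/L) and a hypothetical symmetric support matrix.
values = [7.0, 7.2, 9.0]
support = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
fused, weights = fuse_with_support(values, support)
print(round(fused, 3), [round(w, 3) for w in weights])
```

In this example the third sensor disagrees with the other two, so it receives the smallest weight and the fused value stays closer to the two mutually supporting readings than a plain average would.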

Data Collection
All the dissolved oxygen data are collected by a WSN monitoring system. The monitored pond is located in Changshu city, Jiangsu province. The total area of the Changshu aquaculture pond is 1.63 acres (about 110 × 60 m²). Five dissolved oxygen sensors are distributed at different locations in the aquaculture concrete tank, all deployed at a depth of 0.5 m underwater. The collected data set includes 720 data points (sampled once every 10 min) of a cleaning period from 24 May to 28 May 2017. The detailed deployment diagram is shown in Figure 3, with five sensor nodes and five aerators deployed at different locations. Aerator 5 and Aerator 4 are controlled by the monitoring data of Sensor 5 and Sensor 2, respectively. Sensor 1 controls Aerator 1, Aerator 2 and Aerator 3. The data collected by the sensor nodes are transmitted wirelessly to the sink node, which can fuse all the sensor data and send them to the server. Since all the data are stored on the server, the user makes control decisions by accessing the server. When a sensor fails, the data fusion mechanism can provide corrected data to replace the abnormal data on the server. This control strategy effectively reduces the amount of communication and improves the data quality.

The Analysis of Data Consistency Checking
ARIMA is used to detect anomalies in the dissolved oxygen data set. The anomalous data in aquaculture can be classified into two types: peak data, which occur occasionally, and continuous data, which deviate from the normal data for a period. In the experimental data set, there are 16 missing data caused by time delay or transmission error. Considering these two types of anomalous data, the confidence interval is set at 95%, and there are 16 anomalous data. Here, the detection rate (DR) is used to evaluate the performance of anomaly detection:
$$DR = \frac{TP}{TP + FN}$$
where TP is the number of true positives and FN is the number of false negatives.
The DR of ARIMA is 93.75%. The data fusion mechanism is then needed to correct the anomalous data when failures occur.
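As a quick arithmetic check, the detection rate can be computed in Python from the true positive and false negative counts. The counts used below (15 detected and 1 missed out of 16 anomalies) are an assumption chosen because they reproduce the reported 93.75%; they are not stated explicitly in the text, and the function name is hypothetical.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Detection rate DR = TP / (TP + FN)."""
    return tp / (tp + fn)


# Assumed counts: 15 of the 16 anomalous points detected, 1 missed.
print(f"{detection_rate(15, 1):.2%}")  # 93.75%
```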

Time Series Segmentation and Analysis
We evaluate the performance of the fusion algorithm with two separate metrics, Mean Absolute Error (MAE) and run time [39]. MAE is computed by Equation (17).
$$MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \tag{17}$$
where N is the total number of sample points, $y_i$ is the real data and $\hat{y}_i$ is the fusion value. We then set the segment length for each of the five days. All experiments are implemented in MATLAB and run on a PC with a 3.4 GHz Core (TM) processor, 4.0 G memory, and Microsoft Windows 7. Figures 4 and 5 show the MAE and run time for different segment lengths over the five days. As illustrated in Figure 4, the MAE is basically stable across the different segment lengths. The run time in Figure 5, on the other hand, differs clearly: it varies linearly with the segment length. Considering these two metrics together, the segment length is set to 2, with which the weighted fusion method with the DTWS-ISD support function obtains a stable MAE value in less time.
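The MAE metric can be written as a short Python function. The function name and the sample values are hypothetical and serve only to illustrate the calculation.

```python
from typing import Sequence


def mean_absolute_error(real: Sequence[float], fused: Sequence[float]) -> float:
    """Mean of |y_i - y_hat_i| over all N sample points."""
    if len(real) != len(fused) or not real:
        raise ValueError("both sequences must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(real, fused)) / len(real)


# Hypothetical real and fused dissolved oxygen values (mg/L).
real = [7.0, 7.5, 8.0]
fused = [7.1, 7.4, 8.3]
print(round(mean_absolute_error(real, fused), 4))
```

Because MAE is expressed in the same unit as the measurements, it can be compared directly with the sensor differences discussed for the support degree functions.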

The Best Proposed Function
To evaluate the DTW distance and the time series segmentation strategy in the DTWS-ISD method separately, we constructed three other functions: ISD, Cos-ISD (ISD improved by cosine similarity [40,41]) and DTW-ISD. In Cos-ISD, the cosine of the angle replaces the Euclidean distance in the ISD function. Figure 6 shows the fusion results of these functions. Here, the x coordinate represents time (720 time points over five days) and the y coordinate represents the dissolved oxygen content.
From Figure 6, the observed dissolved oxygen content follows a roughly periodic daily trend, and some points deviate slightly from the normal trend around sunrise. The trends of the ISD, Cos-ISD, DTW-ISD and DTWS-ISD functions are consistent with the real values. However, the overall fusion result of the DTWS-ISD function approximates the real values better than the other three functions, whose fusion results are close to each other in Figure 6. To compare these functions thoroughly, Table 1 lists their MAE and run time. Table 1 shows that the DTWS-ISD function has a better MAE than the other three functions and runs in a short time. The relative MAE differences between DTWS-ISD and DTW-ISD, Cos-ISD and ISD are 4.8%, 53.6% and 23.1% in the test period, respectively. The run time of DTWS-ISD is just 0.0039 s longer than that of ISD, but 2.4159 s shorter than that of DTW-ISD. That is because the proposed function captures the continuity and fuzziness of data streams and improves the accuracy, at the cost of a little extra time.
The performance of Cos-ISD is unbalanced: it has the largest MAE and the shortest run time. With the lowest accuracy among these functions, Cos-ISD is not appropriate for fusion. The results also show that the fusion precision of DTW-ISD is superior to that of ISD and Cos-ISD, so the DTW distance measure greatly enhances the accuracy of the ISD method. However, the computational complexity of the DTW distance is higher than that of the cosine angle and the Euclidean distance. DTWS-ISD performs better than DTW-ISD in both accuracy and efficiency because of the time series segmentation strategy, which reduces the complexity of DTW and thus improves the efficiency of DTWS-ISD. Considering both MAE and time, DTWS-ISD is the optimal fusion function.

Comparison with Existing Methods
In this experiment, we compare the performance of DTWS-ISD with the Gauss support degree function [22], the D function [23], and the SN function [24]. Figure 7 shows the weighted fusion results of the four functions. Here, the x coordinate represents time over the five days, and the y coordinate represents the dissolved oxygen content.
From Figure 7, the curves of the Gauss, D, SN and DTWS-ISD functions are consistent with the real values. All curves show a periodic tendency of ascending first and then descending. However, the fusion results of DTWS-ISD are closer to the real values than those of the three existing functions, because DTWS-ISD obtains better fusion results by exploring the correlation among sensors.
Meanwhile, the fusion results of these functions also fluctuate during sunrise, which occurs between 5:00 a.m. and 7:00 a.m. and changes with the seasons. Although the weighted fusion results of the Gauss, D and SN functions are close, there are still some gaps: the Gauss function is slightly closer to the real values than the D and SN functions. To verify the accuracy and efficiency of DTWS-ISD, Table 2 compares the MAE and run time of the four functions.
Table 2 shows that the MAE of DTWS-ISD is the smallest, while the other three functions are close to each other. The MAE of DTWS-ISD improves on the Gauss function by 24.07%, on the D function by 29.96% and on the SN function by 29.58%. As for the run time, the gaps among the four functions are very narrow: DTWS-ISD is 0.002 s slower than Gauss, 0.0026 s slower than D and 0.0031 s slower than SN. The results indicate that DTWS-ISD has significantly more reliable performance and higher fusion precision than the Gauss, D and SN functions. The support degree function optimized by DTW distance and the time series segmentation strategy is therefore a good choice for improving the quality of dissolved oxygen data streams.

Analysis of Correlation between Sensors' Distribution and Mutual Support Degree
In the process of multi-sensor fusion, the locations of the sensors influence the accuracy and reliability of the data. Figure 8 shows the distribution map of the five sensors in the aquaculture pond. D = {d12, d13, d14, d15} (d13 < d14 < d15, d12 = 2d0) represents the distances between Sensor 1 and the other sensors, and d0 denotes the distance between Sensor 1 and the center point O. Sensor 1 and Sensor 2 are distributed symmetrically about the point O. To analyze the correlation between the sensors' locations and the mutual support degree, Figure 9 gives the support degree of the four other sensors to Sensor 1 over five days. Here, the x coordinate denotes the sensor number, and the y coordinate denotes the sensor's support degree to Sensor 1. Combining the location features in the pond with the distances in D, the correlation can be read from Figures 8 and 9. Sensor 2 and Sensor 1 are located almost symmetrically about point O, and the dissolved oxygen distribution is also symmetrical around point O in Figure 8. From Figure 9, it is clear that Sensor 2 and Sensor 3 have greater support degrees to Sensor 1. Meanwhile, the support degrees of Sensors 3, 4, and 5 decrease as their distance to Sensor 1 increases.
However, the correlation between the sensors' support degrees and distance is not linear; it is also influenced by the location features. The closer a sensor is located to the shore or a corner of the pond, the more complex the impact is. In these positions, there are many microbes, aquatic plants, and sludge, which cause the similarity between the data of Sensor 1 and the data of Sensors 3, 4, and 5 to differ. Therefore, the support degree value largely depends on the distribution of the sensors.
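The similarity between the data of Sensor 1 and the data of the other sensors can be made concrete with a DTW distance, which is the distance measure that DTWS-ISD builds on. The Python sketch below is the textbook dynamic-programming form of DTW with an absolute-difference local cost. It does not include the time series segmentation strategy and does not reproduce the ISD mapping from distance to support degree. The function name and the sensor readings are hypothetical.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance between two series using |x - y| as the local cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical dissolved oxygen readings, for illustration only.
sensor_1 = [6.0, 6.4, 7.1, 6.8, 6.2]
others: Dict[str, List[float]] = {
    "Sensor 2": [6.1, 6.5, 7.0, 6.7, 6.2],
    "Sensor 5": [5.2, 5.9, 6.0, 6.6, 5.8],
}
for name, series in others.items():
    print(name, round(dtw_distance(sensor_1, series), 3))
```

A smaller DTW distance indicates a series that is more similar to that of Sensor 1. How that distance is turned into a support degree is defined by the ISD function and is not shown here.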

Conclusions
Multiple sensors deployed at different locations in an aquaculture pond can provide complementary information for fault detection and correction. We propose a novel improved support degree function combined with a weighted fusion method for enhancing data quality. The method comprises two techniques. One is the ISD function, inspired by the theory of grey incidence analysis, which reduces the computational complexity of the Gauss support function. The other is the DTW time series segmentation strategy, which replaces the Euclidean distance for both accuracy and efficiency. The experimental results demonstrate that the DTWS-ISD function can realize data fusion and correction efficiently. Performance analysis shows that DTWS-ISD achieves a lower MAE than its three counterparts (the Gauss, D, and SN support functions) with a comparable run time. Its effectiveness was verified in a real-world application for correcting dissolved oxygen sensor data.
Future work will focus on two aspects. One is improving the DTWS-ISD function, for which we expect to explore other algorithms that reduce the computational complexity. The other is extending the idea of weighted fusion based on the DTWS-ISD function to more fields, such as information prediction, target tracking, and data classification.