Multi-Attribute Fusion Algorithm Based on Improved Evidence Theory and Clustering

In most application scenarios of industrial control systems, the switching threshold of a device, such as a street light, is typically set to a fixed value. To meet the requirements of a smart city, the threshold must adapt to different conditions, which can be achieved by fusing the multi-attribute observations of the sensors. This paper proposes a multi-attribute fusion algorithm based on fuzzy clustering and improved evidence theory. All of the observations are clustered by fuzzy clustering, where a proper clustering method is chosen, and the improved evidence theory is used to fuse the observations. In the experiments, two-dimensional observations of the street light illumination and the ambient illumination are collected from an intelligent campus lighting system based on the narrowband Internet of Things, and the results demonstrate the effectiveness of the proposed fusion algorithm. The proposed algorithm can be applied to a variety of multi-attribute fusion scenarios.


Introduction
Multi-attribute fusion combines data from multiple attributes of many sensors in order to obtain more accurate and reliable conclusions. Many scholars have studied this issue and have applied it to numerous aspects of life. Multi-attribute information fusion was proposed in order to improve the accuracy of sensor networks. An important part of the unmanned parking-lot system designed by the authors of [1] is license plate recognition. A background fill light is installed under the camera to enhance the capture effect in a dim environment; its main role is to improve the brightness of the capture environment and to make the picture clearer. If an illuminance meter is used to measure the light source illuminance of the background fill light and the ambient illuminance many times, and these measurements are fused into a switch value, it becomes possible to determine accurately whether the background fill light needs to be turned on for a capture at a given time. In the literature [2][3][4], the intelligent road-mounting system detected and recognized the license plate, model, color, and facial image of the people in the vehicle, and used fill light technology for the camera during capture. The work of the authors of [5] showed the outstanding effect of LED (Light Emitting Diode) fill light on the growth and development of high-quality vegetable seedlings under low light conditions, different light intensities, and different photoperiods. The literature [6] presents an intelligent lighting system for an exhibition hall network, in which scene illuminance sensors track changes in sunlight illumination, a different direct light brightness is set for exhibits when visitors approach, and the illumination is set according to the objective function.
The fuzzy C-means (FCM) clustering algorithm can determine the category membership of the samples so as to group the observations automatically, and its accuracy is higher than that of the classical K-means clustering algorithm. Therefore, the FCM algorithm was selected for the experiment.
In this paper, 100 sets of observations with two attributes were tested by measuring the ambient illuminance and light illuminance of street lights, based on a smart campus lighting system. These observations were clustered by an FCM algorithm. The idea of the algorithm has been described in detail [16][17][18][19][20].
The FCM algorithm divides the 100 sets of observations $x_i$ ($i = 1, 2, \ldots, 100$) into $a$ categories. Each sample is not strictly assigned to a single category, but belongs to each category with a certain membership grade. The division is described by the membership matrix $U = [u_{ik}]_{100 \times a}$, where $u_{ik}$ is the membership grade of sample $i$ ($i = 1, 2, \ldots, 100$) in category $k$ ($k = 1, 2, \ldots, a$). $v = (v_1, v_2, \ldots, v_a)$ are the cluster centers, and $d_{ik} = \lVert x_i - v_k \rVert$ is the Euclidean distance between sample $x_i$ and cluster center $v_k$. FCM minimizes the following objective function, subject to $\sum_{k=1}^{a} u_{ik} = 1$ for every sample:
$$J(U, v) = \sum_{i=1}^{100} \sum_{k=1}^{a} u_{ik}^{h} \, d_{ik}^{2} \qquad (1)$$
where the parameter $h > 1$ is a weighting exponent that controls the fuzziness of the membership matrix $U$; the larger $h$ is, the fuzzier the partition becomes. When $h = 1$, fuzzy clustering degenerates to hard C-means (HCM) clustering.
The FCM algorithm requires the number of categories to be specified in advance. Here, a and h are initialized as a = 2, 3, 4 and h = 2, and ε = 1 × 10^−4 is the required accuracy of the cluster centers. The initial cluster centers were generated randomly. First, the membership matrix was calculated according to Equation (3). Secondly, Equation (2) was used to adjust the cluster centers and categories. Finally, the termination condition with ε = 1 × 10^−4 was used to determine whether the accuracy of the cluster centers met the requirement; if it was not met, the membership matrix and cluster centers were recalculated in the next iteration. The clustering results for these 100 samples differed with the number of categories.
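Using the Lagrange multiplier method, the necessary conditions to minimize Equation (1) are as follows:
$$v_k = \frac{\sum_{i=1}^{100} u_{ik}^{h} \, x_i}{\sum_{i=1}^{100} u_{ik}^{h}} \qquad (2)$$
$$u_{ik} = \left[ \sum_{j=1}^{a} \left( \frac{d_{ik}}{d_{ij}} \right)^{\frac{2}{h-1}} \right]^{-1} \qquad (3)$$
The specific flow chart is shown in Figure 1.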
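The iteration described above (membership update, center update, stop when the centers move by less than ε) can be sketched in a few lines. This is a minimal illustration in pure Python, not the MATLAB program used in the experiments; the function name `fcm`, the `init_centers` and `seed` parameters, and the handling of a sample that coincides with a center are assumptions made for the example.

```python
import math
import random


def fcm(points, a, h=2.0, eps=1e-4, max_iter=300, init_centers=None, seed=0):
    """Fuzzy C-means on a list of equal-length tuples.

    Returns (centers, U), where U[i][k] is the membership grade of
    sample i in category k, taken from the last iteration.
    """
    rng = random.Random(seed)
    # Random initial centers unless the caller supplies them (assumed option).
    centers = [tuple(c) for c in (init_centers or rng.sample(points, a))]
    power = 2.0 / (h - 1.0)
    dims = len(points[0])
    U = []
    for _ in range(max_iter):
        # Membership update: each row sums to 1.
        U = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            if 0.0 in d:
                # Sample sits exactly on a center: full membership there.
                hit = d.index(0.0)
                U.append([1.0 if k == hit else 0.0 for k in range(a)])
            else:
                U.append([1.0 / sum((d[k] / d[j]) ** power for j in range(a))
                          for k in range(a)])
        # Center update: mean of the samples weighted by u_ik ** h.
        new_centers = []
        for k in range(a):
            w = [row[k] ** h for row in U]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / total
                for t in range(dims)))
        # Termination: largest center displacement below eps.
        shift = max(math.dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return centers, U
```

Calling `fcm(observations, a)` for a = 2, 3, 4 with h = 2 mirrors the three clustering cases compared in the experiments; because the initial centers are random, different seeds may converge to different local optima.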

Improved Dempster-Shafer Evidence Theory
The Dempster-Shafer evidence theory (DS evidence theory) was presented by Dempster, and Shafer perfected the theory based on the research of Dempster. The DS evidence theory can describe the uncertainty and incompleteness of the evaluation information well [21].
Identification framework. Given a finite nonempty set Θ, called the identification framework, that includes N mutually exclusive elements, $\Theta = \{L_1, L_2, \cdots, L_N\}$, the set of all of the subsets of Θ is called the power set of Θ, denoted $2^{\Theta}$.
Basic probability assignment. Suppose that, in the identification framework Θ, the function m is a mapping $m: 2^{\Theta} \to [0, 1]$, and L is any subset of the identification framework. A subset L with m(L) > 0 is known as a focal element, and m(L) is regarded as the basic probability assignment (BPA; also known as the mass function), which indicates the degree of support for proposition L. The function m satisfies the following:
$$m(\emptyset) = 0, \qquad \sum_{L \subseteq \Theta} m(L) = 1$$
We used the obtained mass function to make a decision. We then used the Euclidean distance to find the data near the true value (or the true value itself), eliminated the elements that could be removed, and fused the evidence with a fusion rule based on the reliability factor.
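To make the two definitions concrete, the following sketch stores a mass function as a dictionary from subsets of Θ to masses and checks the BPA conditions. The helper names `is_valid_bpa` and `focal_elements` and the example masses are illustrative assumptions, not part of the original method.

```python
def is_valid_bpa(m, frame, tol=1e-9):
    """Check a candidate BPA: no mass on the empty set, every key is a
    subset of the frame, every mass lies in [0, 1], and the total is 1."""
    frame = frozenset(frame)
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    if any(not s <= frame or not 0.0 <= v <= 1.0 for s, v in m.items()):
        return False
    return abs(sum(m.values()) - 1.0) <= tol


def focal_elements(m):
    """Subsets that carry strictly positive mass."""
    return [s for s, v in m.items() if v > 0.0]


# Hypothetical frame with three observations and made-up masses.
frame = {"L1", "L2", "L3"}
m = {frozenset({"L1"}): 0.5, frozenset({"L2"}): 0.3, frozenset(frame): 0.2}
```

In the evidence built later in the paper, every focal element is a single observation, so a plain list of N masses is enough in practice.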

Judging the Accuracy of the Observations
All of the observations are denoted by $L_i = (x_i, y_i)$, and $d_i$ is the distance from observation $L_i$ to the other observations, computed from the distances between pairs of samples. Moreover, $d$ represents the mean distance between all of the observations, and the formulas for $d_i$ and $d$ are as follows:
Because the observations from the sensors differ, the normal observations should be distributed near the true value within the deviation range, whereas the observations with large deviations lie far from the normal observations.
Definition 1. $U_1$ is a small deviation set and $U_2 = \overline{U_1}$ is a large deviation set, where $U_1 \cup U_2 = \Theta$. $U_1$ meets the following requirements:
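One plausible reading of this step can be sketched as follows. The sketch assumes that $d_i$ is the mean Euclidean distance from $L_i$ to the other observations, that $d$ is the mean of all $d_i$, and that $U_1$ collects the observations with $d_i \le d$; these three rules are assumptions made for illustration and may differ from the exact formulas of the method.

```python
import math


def split_by_deviation(obs):
    """Split observations into small- and large-deviation index sets.

    Assumed rules (not taken from the paper's formulas):
      d_i = mean distance from observation i to the other observations
      d   = mean of all d_i
      U1  = {i : d_i <= d},  U2 = complement of U1
    """
    n = len(obs)
    d_i = [sum(math.dist(p, q) for q in obs) / (n - 1) for p in obs]
    d = sum(d_i) / n
    u1 = [i for i in range(n) if d_i[i] <= d]
    u2 = [i for i in range(n) if d_i[i] > d]
    return d_i, d, u1, u2


# Hypothetical readings: four close to (20, 20) and one outlier.
obs = [(20.0, 20.0), (20.1, 20.0), (19.9, 20.0), (20.0, 20.1), (25.0, 25.0)]
d_i, d, u1, u2 = split_by_deviation(obs)
```

With these made-up readings, the outlier at (25, 25) ends up in the large deviation set and the four clustered readings in the small deviation set.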

Observations Converted to Evidence
Converting observation L i to evidence e i is the core of evidence theory and the basis of data fusion.

Definition 2.
If, for any observation $L_i$, there exists $\Delta_i \ge 0$ such that the true value, $L_0$, lies within the neighborhood $\delta_i$ of $L_i$ (the circle centered at $L_i$ with radius $\Delta_i$), then $\delta_i$ is the scattering interval of $L_i$, and $\Delta_i$ is called the scattering radius. The size of $\Delta_i$ is determined by the deviation of the observation $L_i$ from the true value $L_0$.
If $L_i \in U_1$, then $L_i$ is a small deviation observation, its scattering radius is relatively small, and $L_i$ is located in the circle $\delta_1$ (centered at $L_i$, with a radius of $d$). Suppose that there are K observations, $X_k \in \Theta$ ($k = 1, 2, \ldots, K$), that are small deviation observations; it is considered that these K observations are close to the true value, $L_0$, with the same probability. That is, each of the K small deviation observations obtains a basic probability assignment of 1/K, and the basic probability assignments of the remaining N − K observations are 0. The mass function of the evidence, $e_i^1$, obtained from $L_i$, is as follows:
$$m_i^1(L_j) = \begin{cases} 1/K, & L_j \in \{X_1, X_2, \ldots, X_K\} \\ 0, & \text{otherwise} \end{cases}$$
If $L_i \in U_2$, then $L_i$ is a large deviation observation, which is far away from the true value, $L_0$, and its scattering radius is large. $d_{\max}$ stands for the distance between the maximum and minimum observations, and the following formula is obtained:
Taking $d_{\max}$ as the scattering radius of $L_i$, because the true value, $L_0$, must lie between the maximum and the minimum observations, all of the observations are included in $\delta_2$ (the circle centered at $L_i$ with radius $d_{\max}$), and the mass function of the evidence $e_i^2$ is as follows:
$$m_i^2(L_j) = \frac{1}{N}, \qquad j = 1, 2, \ldots, N$$
Thus, the evidence $e_i^2$ is converted from the large deviation observation $L_i$, and each observation obtains a basic probability assignment of 1/N. The process of generating N initial pieces of evidence $e_i$ ($i = 1, 2, \ldots, N$) from N observations is then complete [22,23].
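The conversion rule (1/K for the observations counted as close to a small deviation observation, 1/N for everything when the observation has a large deviation) can be illustrated as below. The sketch assumes that the K observations are those lying within distance d of $L_i$; the function name `initial_evidence` and its arguments are hypothetical.

```python
import math


def initial_evidence(obs, i, d, small_deviation):
    """Mass vector of the initial evidence generated by observation i.

    small_deviation=True : the K observations within distance d of obs[i]
                           (an assumed reading of the circle delta_1)
                           share the mass equally, 1/K each.
    small_deviation=False: all N observations receive 1/N.
    """
    n = len(obs)
    if small_deviation:
        inside = {j for j in range(n) if math.dist(obs[i], obs[j]) <= d}
        k = len(inside)  # at least 1, because obs[i] is inside its own circle
        return [1.0 / k if j in inside else 0.0 for j in range(n)]
    return [1.0 / n] * n


# Hypothetical observations: two close readings and one far away.
obs = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
e_small = initial_evidence(obs, 0, 2.0, True)    # [0.5, 0.5, 0.0]
e_large = initial_evidence(obs, 2, 2.0, False)   # 1/3 for each observation
```

Each returned vector sums to 1, so it is a valid mass function whose focal elements are single observations.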
The above initial evidence may contain focal elements from both $U_1$ and $U_2$, and the degree of deviation of each observation has not yet been considered. The initial evidence, $e_i$, is therefore corrected as follows. For all $X_1, X_2 \in U_1$, the ratio of the basic probability assignments obtained by $X_1$ and $X_2$ is as follows:
For all $X_k \in U_1$ and all $L_i \in U_2$, the ratio between $X_k$ and $L_i$ is as follows:
The above two formulas produce a set of correction coefficients $\{\omega_n\}$ ($n = 1, 2, \ldots, N$), which yield the normalized, weighted, and corrected evidence with the following equation:

Combination of Evidence
There may be a high degree of conflict among the pieces of evidence obtained by the above algorithm. In order to avoid an unreasonable weight distribution, and following the literature [10,11], the evidence fusion formula that assigns the conflicting probability to the observations is as follows:
where c is the conflict factor and $M_i(L_i)$ is the average distribution of $L_i$ over all of the evidence. The formulas are as follows:
The basic probability assignment $m(L_i)$ is the weight obtained by $L_i$, and the fusion result is as follows:
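A hedged sketch of this combination step is given below. It assumes that every focal element is a single observation, that the agreeing part of the fused mass is the product of the individual masses, that the conflict factor c is the mass left over, and that c is redistributed in proportion to the average mass of each observation; this is one common rule of this kind and is only assumed to match the formulas of [10,11]. The names `combine` and `fuse` are illustrative, and the fused value is taken as a weighted sum of scalar readings.

```python
import math


def combine(evidence):
    """Combine mass vectors over the same N singleton observations.

    Assumed rule: m(L_j) = p(L_j) + c * q(L_j), where
      p(L_j) = product of the masses given to L_j by every evidence,
      c      = 1 - sum_j p(L_j)   (conflict factor),
      q(L_j) = average mass of L_j over all evidence.
    """
    n_ev = len(evidence)
    n = len(evidence[0])
    p = [math.prod(e[j] for e in evidence) for j in range(n)]
    c = 1.0 - sum(p)
    q = [sum(e[j] for e in evidence) / n_ev for j in range(n)]
    return [p[j] + c * q[j] for j in range(n)]


def fuse(values, masses):
    """Weighted sum of scalar readings, using the fused masses as weights."""
    return sum(v * m for v, m in zip(values, masses))


# Hypothetical evidence from three observations and made-up readings in lx.
evidence = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3]]
masses = combine(evidence)
fused = fuse([19.0, 21.0, 30.0], masses)
```

Because each input vector sums to 1, the fused masses also sum to 1; the outlying third reading receives the smallest weight, so the fused value stays close to the two agreeing readings.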

Experiment Analysis
In order to verify the effectiveness of this algorithm, 100 sets of two-dimensional datasets were used for the experiments. The datasets involved the ambient illumination and street light illumination of intelligent campus lighting systems. As a result of the measurement error of the sensors and other factors, such as ambient lighting, the ambient illumination and street lights of different streets were measured to be different at the same time, but the observations were always distributed near a certain value. The reference value of the switch threshold is 20.00 LX.
The 100 sets of data were collected within 5 min of nightfall. At dusk, the brightness of the environment changes rapidly, and the transition from daylight to dark is obvious. Therefore, the ambient brightness, recorded as the x column of the two-dimensional datasets, varied from bright to dark, and the data in this column should change from large to small. The brightening of the street light source is also a gradual process: when the street light is turned on, the brightness of the light source is relatively small, and after a period of time the light source reaches its normal brightness. Thus, the change in the brightness of the light was recorded within 5 min. The light source goes from dark to light, as recorded in the y column of the two-dimensional datasets, and the data in this column should change from small to large. This is the data law of the 100 datasets.
By combining the two brightness characteristics of street lights, a street light switching threshold suitable for the specific environment will be obtained. The brightness of the single light should be different in different weather, locations, human flows, and so on. The threshold obtained by fusion is also different. This threshold helps the street light's manager to manage and control the switch of the street lights precisely. The administrator can achieve precise control of the street light switch in different weather, environments, and seasons, according to the fusion threshold [24].
In this paper, the distance between two observations was calculated as the Euclidean distance. The two-dimensional observations were clustered with a program written in MATLAB. FCM provides the clustering images and clustering conditions. The observations of each cluster are regarded as the identification framework and are then converted into evidence. By correcting and combining the evidence, we obtain the weight assignment of the observations, which is called the mass function, and the preliminary fusion value of the samples can be obtained. Finally, the data of all of the categories are weighted according to different weights. The fusion result is one-dimensional data. We compared the fusion results for two, three, and four categories, and used the result with the highest precision as the final fusion result, which is the switching threshold of the street light in the smart campus lighting system.
In Figures 2-4, the abscissa indicates the ambient illuminance, and the ordinate indicates the illuminance of the street lights. The samples with two attributes yield the corresponding cluster images under different numbers of cluster centers. Different categories of samples are represented by different shapes and colors. The red "x" indicates a cluster center. The first category of data is depicted by black circles, the second category by purple triangles, the third category by green "*", and the fourth category by blue triangles. Figure 2 shows the locations of the cluster centers and the category to which each sample belongs when the number of cluster centers is two. Figure 3 shows the same information when the number of cluster centers is three, and Figure 4 shows it when the number of cluster centers is four. For different datasets, it cannot be known in advance how many categories the dataset should be divided into in order to achieve the best fusion effect. Therefore, the DS theory was used to fuse the three clustering cases, and the fusion results were then compared. The result with the highest precision was taken as the threshold of the intelligent campus lighting system.
As can be seen from Tables 1-3, data fusion is first performed on each group of data obtained by FCM clustering, and the fusion results of the groups are then weighted and fused to obtain the final fusion result. The relative error is 2.431% with two categories, 2.003% with three categories, and 2.015% with four categories, so it is best to divide the dataset into three categories. Taking the third group of the highest-precision fusion results as an example, the fusion result is shown in Table 4.
The final fusion result of the three groups of data shown in Table 2 is 20.4006, and the relative error is 2.003%.
The fixed threshold method uses a global fixed threshold, which means that all of the datasets of a system are binarized with a unified threshold. The multi-attribute fusion algorithm proposed in this paper can fuse a threshold adaptively, and can dynamically change the threshold to reduce the error. Take the data sample of high-brightness ambient illumination as an example. The reference value of the switch threshold is 30.85 LX. Figures 5-7 show the cluster centers of the different categories. From Tables 5-7, it can be seen that when the samples are clustered into four categories, the fusion result has the highest precision, and the fusion result is 31.4590. The fixed threshold method sets all of the street lights to the same switch threshold. If the switch threshold is still set to 20.00 LX in a place where the ambient brightness is high, a large error will occur, resulting in a waste of energy.
The literature [25] presents the remediation of failed identification in product multi-source information fusion based on DS evidence theory. That work is aimed at the wear and tear, corrosion, and pollution of two-dimensional data matrix symbols caused by the complexity of the discrete manufacturing enterprise, production environment, and production process. The method first established a remedy technical framework. Secondly, it calculated the similarity measurement of the multi-attribute identification data of goods whose identification had failed. Finally, it recognized the data matrix code based on the multi-attribute fusion of DS theory. The experimental analysis covered 120 failed data matrix codes, and the matching correct rate reached up to 96%. Compared with that algorithm, the correct rate of the algorithm proposed in this paper is above 97.997%, which gives it the advantage of a higher accuracy. The proposed method is more robust for the switching threshold of the smart street light system. It can better fuse a street light switching threshold suitable for a specific environment, which provides effective support for other two-dimensional information fusion application scenarios.
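The error figures quoted in this section can be checked with a short calculation. The sketch assumes that the relative error is the absolute difference between a threshold and the reference value, divided by the reference value; this definition reproduces the reported 2.003% for the fused value 20.4006 against the 20.00 LX reference. The function name `relative_error` is illustrative.

```python
def relative_error(value, reference):
    """Relative error of value with respect to reference, in percent
    (assumed definition: |value - reference| / |reference| * 100)."""
    return abs(value - reference) / abs(reference) * 100.0


# Fused threshold against the 20.00 LX reference.
low_fused = relative_error(20.4006, 20.00)
# High-brightness sample: fused threshold against the 30.85 LX reference.
high_fused = relative_error(31.4590, 30.85)
# High-brightness sample with the fixed 20.00 LX threshold left unchanged.
high_fixed = relative_error(20.00, 30.85)

print(f"{low_fused:.3f}% {high_fused:.3f}% {high_fixed:.3f}%")
```

Under this definition, the fused threshold for the high-brightness sample is off by about 2%, whereas keeping the fixed 20.00 LX threshold would be off by roughly 35%, which is the large error that the fixed threshold method incurs.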

Conclusions
Based on the FCM clustering algorithm and improved evidence theory, a novel multi-attribute fusion algorithm is proposed. The algorithm can effectively combine the light source illumination and ambient illumination of the smart campus street light system in order to obtain the switching threshold. To the best of our knowledge, this is the first time that the FCM clustering algorithm has been combined with evidence theory for multi-attribute fusion. The effectiveness of the algorithm is tested on real-life datasets from the smart campus street light system. The effect of the algorithm is demonstrated by the change of the cluster centers, which lays a foundation for future multi-attribute fusion.
This algorithm also leaves some issues for future work. First of all, the clustering method cannot be fully applied to the data of all application scenarios. When the number of samples is large, the FCM algorithm needs the cluster centers to be defined, and it easily falls into a local optimal solution. Secondly, it remains open how to distinguish useful information from noise in the process of classifying samples. The next step is to try other algorithms (such as Bayesian estimation) to cluster the initial observations more accurately. For different application problems, how to make the algorithm adapt the cluster centers intelligently also leaves considerable room for improvement.