Monitoring Chemical Processes Using Judicious Fusion of Multi-Rate Sensor Data

With the emergence of Industry 4.0, also known as the fourth industrial revolution, an increasing number of hardware and software sensors have been implemented in chemical production processes for monitoring key variables related to product quality and process safety. The accuracy of individual sensors can be easily impaired by a variety of factors. To improve process monitoring accuracy and reliability, a sensor fusion scheme based on Bayesian inference is proposed. The proposed method is capable of combining multi-rate sensor data and eliminating the spurious signals. The efficacy of the method has been verified using a process implemented at the Dow Chemical Company. The sensor fusion approach has improved the process monitoring reliability, quantified by the rates of correctly identified impurity alarms, as compared to the case of using an individual sensor.


Introduction
Initiated by the industrial internet of things (IIoT) concept, Industry 4.0 holds huge potential in improving operational effectiveness, developing new business models, and promoting sustainability [1]. With such a promising vista, Industry 4.0 has become a top priority for many companies, research centers and universities [2]. It in return facilitates the momentum of IIoT. As the number of instruments and sensors implemented in manufacturing processes increases, the amount of data collected also increases [3,4]. When a large amount of data, also known as big data, is available, it is non-trivial to apply relevant analytics methodologies to turn these data into useful and actionable information.
In chemical processes, multiple sensors are employed to monitor key process variables, such as variables related to product quality and process safety. The sampling rates and the reliabilities of three typical sensors for monitoring chemical plants are given in Table 1. The Laboratory analyzer refers to instruments installed in a laboratory. Chemical samples are taken from the process stream (typically manually) and transferred to a lab for analysis. Because of the relatively clean environment, lab analyzers are considered to be the relatively reliable method for analyzing chemicals. However, its sampling frequencies are often too low for process monitoring and control purposes. The sample transfer process also introduces a time delay for the measurements, during which, the chemicals may degrade. Online analyzers are analytical instruments that are installed at the plant, not in a laboratory. The instrument is either in contact with the process or the sample is automatically delivered a short distance to the analyzer. They are also called pipe-centric analyzers, which clearly indicates the usual location of this type of analyzers. As the samples are taken in real-time from the process, the sampling frequency of online analyzer increases to once every several minutes. On the other hand, due to the limitation of the location, sometimes high temperature or high pressure, the online analyzer may paper is a multi-rate and judicious sensor fusion scheme based on Bayesian inference [30]. Comparing to our previous work [31], this new sensor fusion scheme is capable of dealing with multi-rate data. It can also identify and eliminate the spurious data from malfunctioning sensors. The proposed method has been examined in a chemical process utilized at Dow. The product impurity is inferred by fusing the measurements from a lab analyzer, an online analyzer and a PLS software sensor with different sampling rates. The software sensor was updated using mean and variance update algorithm to track the time-varying process dynamics. The online analyzer measurements were filtered to reduce their variability. These two sensor measurements were fused together with the lab measurements using maximize a posterior (MAP) approach [32]. It is shown that the sensor fusion approach improves the process monitoring reliability, quantified by the rates of correctly identified impurity alarm, comparing to the case of using an individual sensor.
The paper is organized as follows: the adaptive PLS soft sensor methodology based on mean and variance update is briefly introduced. Subsequently, the multi-rate judicious sensor fusion scheme derived via MAP is presented. The proposed approaches are then applied to a chemical process at Dow and its performances for monitoring the process quality variable are discussed.

Adaptive PLS Soft Sensor
In this paper, PLS soft sensors are used to model the process quality variable using frequently measure process variables. The PLS soft sensor is given as follows: where X ∈ R n×m and Y ∈ R n×p are matrices of process and quality variables, respectively. Each element in the two matrices, x ij or y ij , is mean-centered and scaled by standard deviation, i.e., x ij = (x 0,ij − µ x 0, j /σ x 0, j ) and y ij = (y 0,ij − µ y 0, j /σ y 0, j ), with x 0,ij and y 0,ij being the original process and quality values; µ x 0, j and µ y 0, j the means of the j-th process and quality variables; σ x 0, j .and σ y 0, j the standard deviations of the corresponding variables. The T ∈ R n×m and U ∈ R n×p are the scores matrices for the X and Y matrices, while P ∈ R m×A and Q ∈ R p×A are the corresponding loadings matrices. By applying the PLS method, the process variables of m dimensions are projected to a reduced space of A (<m) dimensions, in which, the process variables are represented by the score vector, t. The number of principal components A is determined through cross-validation or information criterion [33,34]. The coefficients in the PLS soft sensor are estimated by: where R is the loading weight matrix following the notation in [35]. The prediction of the quality variables, y ∈ R p , given the new set of scaled process variable x s ∈ R m is given as: where the element in vector x s is again defined as x j = (x 0, j − µ x 0, j /σ x 0, j ). The means and standard deviations, µ x , σ x , µ y , σ y , are calculated using the training data.
To track the dynamics changes of chemical process, the PLS soft sensor here is updated periodically using the mean and variance update methodology [18]. When a new measurement is available, the coefficients, β PLS will remain the same, but the means and standard deviations, µ's and σ's given in Equation (3), of the process and quality variables will be updated. Then the new product quality variable is predicted by substituting the process variables and the updated means and standard deviations, µ x,new , σ x,new , µ y,new , σ y,new , in Equation (3): To minimize the computational burden, the soft sensor is updated only when significant process changes are detected. Two Key Performance Indicators (KPI's), based on Hotelling's T 2 and residuals, are used to quantify the changes. The scores, t, representing process variables in the reduced dimension space satisfies the Hotelling's T 2 distribution, a generation of student t-distribution for multivariate analysis. Here we utilize the T 2 statistic to detect mean shifts of the process variables from PLS score vectors. The T 2 statistic is defined as: with Λ = 1 n−1 T T T defines the covariance of the scores t of the training dataset used to update the PLS sensor. The training dataset contains n samples. The t 0 is the score vector of the new process variable. The F A,n−A represents an F-distribution with A and n − A degree of freedom where A is the number of principal components. When the T 2 exceeds the upper limit, T 2 max , the means and standard deviations of the soft sensor model are updated. The T 2 max is calculated using inverse Hotelling's T 2 distribution with a pre-determined confidence level, 1 − α: In this paper, a 99.9% confidence level (α = 0.001) is employed. The second KPI considered here is the absolute residual of the soft sensor prediction, e, which is determined as: where y is the quality variable measured using lab instrument whileŷ is the corresponding value predicted by the PLS soft sensor. The upper limit of the residual is calculated in a similar manner as given in Equation (6), but assuming a normal distribution of the residuals. As the raw values of two KPIs are easily affected by high frequency measurement noises, moving median filters are applied for improved reliability. Following the approach outlined in a previous publication [18], we use the filtering window size of 336 h. If the upper limit of the T 2 -based KPI is exceeded, the update of means and standard deviations of the soft sensor will be processed, using the lab measurement (excluding the spurious ones) during the most recent K hours as y variable and corresponding process variables as x variable. Here we select K = 3000 which is the same value used in the previous paper [18]. In addition, one may consider rebuilding the soft sensor if the residual-based KPI exceeds the corresponding upper limit. This mean and variance update method works as effective as rebuilding PLS models at new operating conditions [18] when the subset of input variables affecting the output variables remains identical at the new operating conditions.

Multi-Rate Judicious Data Fusion
In general, the fused measurement has smaller variance or higher precision as compared to individual sensor measurement. However, as discussed above, the individual sensor generates inaccurate measurements occasionally, called spurious data. When those values are fused together with other accurate individual measurements, the accuracy and precision of the fused measurement could be lower than the most accurate individual measurement. Therefore, the data fusion concept [30] must be judiciously applied to identify and eliminate the spurious data.
Assume that the individual sensor measurement is normally distributed around its true value, the probability of observing a senor measurement,ŷ i,k , given the true value, y k , is defined by: where the subscript i, k represents the i-th sensor's measurement at time instant k. The ϕ i,k = 1 indicates that at the k-th time instant, the corresponding sensor is working properly. The probability that a sensor is properly functioning defined in [30] is given by: with the value parameter α i,k assumed as: where m is selected as the largest divergence allowed between the sensor measurements. When the i-th sensor reading is close to other sensors readings, the value of α 2 i,k will be larger. Therefore, the probability that this sensor is properly functioning, as given in Equation (9), will be larger. It can be also seen later in this section in Equation (16), with a larger α 2 i,k value, the corresponding sensor data will have smaller weight in the fused value comparing to the data from other more reliable sensors. In this paper, m is selected as 50% of the largest measurement at time instant k. This value is selected based on the historical data of the sensors. If the data from different sensors are close to each other, the value of m can be smaller. If the sensors readings usually have a larger deviation, then a larger m should be selected. The variance for each sensor,σ 2 i,k , is estimated as: The average value for the i-th sensor measurement, y i,k , is calculated using L most recent measurements, i.e.,ŷ i,k−L+1 ,ŷ i,k−L+2 . . .ŷ i,k . In this paper, we select a relative short 48-hour time period. That means for lab measurement, L = 4 while for online analyzer and software sensor, L = 48. The smaller window size leads to higher variance when a sensor reports an alarm. Because the proposed sensor fusion method assigns more weight on the sensor readings with smaller variance, the smaller window size will make the identification of false alarm easier, especially when there are only two sensors available.
For a properly functioning sensor, the deviations of its reading from other sensors readings should be always smaller than the allowed deviation, m. By Equation (10), the following constraint for a properly functioning sensor should be satisfied: However, when a sensor is malfunctioning, its reading will deviate significantly from other sensor readings. This will lead to a smaller α 2 i,k value and violate the above constraint. The fused value will only take the reliable measurements satisfying inequality (12) into account.
Given the measurements from the three independent sensors mentioned previously, the posterior probability of the true product quality variable is given by: for i = 1, 2, 3. The optimal estimation of the actual product quality variable value, y k , via maximize a posterior (MAP) approach [32] is given by: The estimation of the product quality variable given in Equation (14) can be interpreted as a weighted average of predictions by three sensors and is given by: The weight coefficients of the i-th sensor reading at time instant k is then calculated as: for i = 1, 2, 3. As the variance of the sensor measurement and the deviation of a sensor from other sensors data decreases, the weight coefficients increase and the corresponding sensor measurements have more contribution to the fused estimateŷ k .
The above results are based on the case that all three sensors are available. As the lab analyzer has a lower sampling frequency compared to the online analyzer and the soft sensor, in most of cases, there are only two sensors data available. In such a case, the malfunctioned sensor is not that easy to identify. Fusing a spurious sensor data to the other accurate sensor data may corrupt the accurate one. The variance of the posterior distribution for the fused value of two sensor readings can be derived from Equation (13) and given by: To ensure an improved precision or decreased variance, the posterior variance σ 2 f must be smaller than the prior variance of the two individual sensors, i.e., σ 2 f ≤ min(σ 2 i,k , σ 2 j,k ). Otherwise, the two sensors value will not be fused and the sensor measurement with smaller variance will be used.

Proposed Sensor Fusion Scheme
The proposed multi-rate and judicious sensor fusion scheme is shown in Figure 1, while the steps for executing the proposed sensor fusion method are summarized in Table 2. Three sensors, Lab Analyzer, Online Analyzer and Software sensor model are monitoring the sample variable. The online analyzer is filtered to reduce the measurement variance. The software sensor model is updated periodically to track the changes in process behaviors. When the lab measurement is available, the data from all three sensors are compared to check if any of them is spurious. The spurious data will be eliminated and the fused value is calculated using the reliable data and the above equations. When the lab measurement is not available, the spurious data is identified by comparing the data from the online analyzer and the software sensor. If either of them is spurious, the reliable data will be used as the fused value. Otherwise, the fused value is determined based on data from both sensors. Using the proposed methodology, multi-rate sensor data can be used to improve the process monitoring accuracy and reliability. The malfunctioning sensor can be identified and eliminated from the fusion process.
Sensors 2019, 19, x; doi www.mdpi.com/journal/sensors be eliminated and the fused value is calculated using the reliable data and the above equations. When the lab measurement is not available, the spurious data is identified by comparing the data from the online analyzer and the software sensor. If either of them is spurious, the reliable data will be used as the fused value. Otherwise, the fused value is determined based on data from both sensors. Using the proposed methodology, multi-rate sensor data can be used to improve the process monitoring accuracy and reliability. The malfunctioning sensor can be identified and eliminated from the fusion process.  Step 1 At time instant k, check if software sensor needs an update by calculating T 2 using Equation (5). If T 2 > T 2 max , go to Step 2, otherwise, go to Step 3.
2 Update the mean and standard deviation using past K lab measurement (excluding the spurious lab measurements) as y variable and corresponding process variables as x variable. 3 For available sensor readings, calculate α 2 i,k and σ 2 i,k using Equation (10) and Equation (11), respectively. Select the sensor readings that satisfy constraint (12) 4 Fused the selected sensor readings using Equation (14).

5
If only two sensors data available, calculate σ 2 f using Equation (17). When σ 2 f ≤ min(σ 2 i,k , σ 2 j,k ) fuse the two sensor readings. Otherwise, take the reading with smaller σ 2 as the final reading. 6 Repeat step 1-5 for time instant k + 1

Results and Discussion
The proposed sensor fusion scheme is applied to a chemical process consisting of two distillation columns at Dow. The flow diagram of the process is shown in Figure 2. The product quality variable, namely the product impurity, is measured by both online and lab analyzers at the outlet of the primary column, marked as Sampling Point. Besides the impurity, input variables such as temperature, pressure, reflux ratio, etc., are measured and interpolated hourly to develop the soft sensor model. The lab analyzer sampling the impurity in every 12 h, while the online analyzer and soft sensor data are available in every 1 h.
The product impurity is a key quality variable that must be accurately monitored. Misidentification of off-spec products will lead to significant economic loss to both the company and the consumer. On the other hand, over-reaction to false alarms, i.e., mistaken eligible product to off-spec ones, will cause unnecessary operational costs. Therefore, using sensor fusion methodology to improve the monitoring precision and reliability is essential for improving the efficiency and sustainability of manufacturing processes.
In the following sub-sections, we first discuss about the preprocessing for the data from three sensors considered in the proposed scheme: the software sensor is developed using the lab data as the output variable and is updated when the T 2 based KPI exceeds its upper limit; the online analyzer data is filtered by a moving median filter with a properly tuned window size. Then we fuse all the multi-rate data to infer the value of product impurities. Examples of handling spurious data from individual sensor, such as false impurity alarm, are discussed as well.

Adaptive PLS Soft Sensor
We first examine the mean and variance update algorithm [18] for developing an adaptive soft sensor. The original PLS soft sensor including nine process variables was developed using data from four years ago. The data and model development are not discussed here. Rather, we use the recent two years data as testing data to examine how accurately the original model without adaptation predicts the product impurity. The resulting KPIs are plotted in Figure 3. The residuals between the soft sensor predictions and the lab measurements are shown in blue dots in the upper figure. As the lab measurement is in every 12 hours, the residuals are sparse comparing to the plot given in the lower sub-figure. The calculation of is based on the frequently measured input variables, pressures, temperatures and flow rates, so it is available every hour and is plotted in continuous line in

Adaptive PLS Soft Sensor
We first examine the mean and variance update algorithm [18] for developing an adaptive soft sensor. The original PLS soft sensor including nine process variables was developed using data from four years ago. The data and model development are not discussed here. Rather, we use the recent two years data as testing data to examine how accurately the original model without adaptation predicts the product impurity. The resulting KPIs are plotted in Figure 3. The residuals between the soft sensor predictions and the lab measurements are shown in blue dots in the upper figure. As the lab measurement is in every 12 h, the residuals are sparse comparing to the T 2 plot given in the lower sub-figure. The calculation of T 2 is based on the frequently measured input variables, pressures, temperatures and flow rates, so it is available every hour and is plotted in continuous line in Figure 3. As is seen in the figure, both the residual (upper figure) and T 2 (lower figure) based KPIs exceed the corresponding upper limit (dashed lines) after 4000 h. This indicates that the soft sensor needs to be updated to ensure accurate predictions.
After the application of the mean and variance update algorithm, the T 2 -based KPI (top sub-figure in Figure 4) first exceeds its upper limit at k = 4733, while the residual-based KPI (middle figure) is still under its upper limit. The bottom plot in Figure 4 shows the prediction of the updated soft sensor (red line) after updating the soft sensor, which tracks the lab measurements (blue dots) much closer than the original soft sensor (black line). The accuracy of the original soft sensor decreases significantly over time. In total, the soft sensor updates based on the criteria outlined in the earlier section are activated six times during the period examined here. The time instances when the software sensor are updated are marked by the vertical lines. It has shown that the adaptive model accurately track the product impurity. The residual-based KPI slightly exceeds the upper limit twice. This may indicate that the accuracy of the prediction can be further improved with a rebuilt soft sensor, which is not discussed in this paper.  After the application of the mean and variance update algorithm, the -based KPI (top subfigure in Figure 4) first exceeds its upper limit at k = 4733, while the residual-based KPI (middle figure) is still under its upper limit. The bottom plot in Figure 4 shows the prediction of the updated soft sensor (red line) after updating the soft sensor, which tracks the lab measurements (blue dots) much closer than the original soft sensor (black line). The accuracy of the original soft sensor decreases significantly over time. In total, the soft sensor updates based on the criteria outlined in the earlier section are activated six times during the period examined here. The time instances when the software sensor are updated are marked by the vertical lines. It has shown that the adaptive model accurately track the product impurity. The residual-based KPI slightly exceeds the upper limit twice. This may indicate that the accuracy of the prediction can be further improved with a rebuilt soft sensor, which is not discussed in this paper.

Tuning the Moving Median Filter for the Online Analyzer
The online analyzer provides frequent measurements for the variable of interest, but with a relatively large variance. In addition to that, the online analyzer also provides spurious values during k = 15170 to 15250 due to sensor malfunctioning as shown in Figure 5. Therefore, a moving window median filter is applied to reduce the variability of the measurements. The filtered output is the median value of the past measurements. The window size should be appropriately selected to reduce the variability and preserve the sensitivity to detect the abnormal events. Here we use the lab measurements as the reference and select the optimal value of to minimize the sum of squared

Tuning the Moving Median Filter for the Online Analyzer
The online analyzer provides frequent measurements for the variable of interest, but with a relatively large variance. In addition to that, the online analyzer also provides spurious values during k = 15170 to 15250 due to sensor malfunctioning as shown in Figure 5. Therefore, a moving window median filter is applied to reduce the variability of the measurements. The filtered output is the median value of the past N measurements. The window size N should be appropriately selected to reduce the variability and preserve the sensitivity to detect the abnormal events. Here we use the lab measurements as the reference and select the optimal value of N to minimize the sum of squared error (SSE) between the lab measurements and the filtered online analyzer measurements. The filter size considered here is in the range of 1 to 80. As shown in the lower figure of Figure 5, the SSE decreases as the window size increases from 1 to 14. The SSE obtained at N = 1 is the exact SSE of the raw online analyzer measurements. It can be seen as the window size of the filter increases, SSE has been reduced, with an optimum window size of approximately 18 for this data set.

Fusion of Three Sensors
Due to the availability of historical data from the online analyzer, we utilize the data of all three sensors during the time period of k = 9700 to 15420. Each sensor reports multiple impurity alarms during this time period and the corresponding time instances are listed in column 1 of Table 3. However, only one impurity alarm at k = 11996-12001 is verified as a true one by the plant engineers, while others are confirmed as false alarms. The true alarm is denoted with "T" in column 2 of Table 3 while the false ones are marked with "F". The robustness of the sensor measurements is quantified by the rate of correctly identified alarms, including the miss of detection for true alarms and false alarm rate, defined as number of true alarms missed (false alarm reported) per year. These two indicators are given in the last two rows of Table 3. The efficacy of the sensor fusion approach will be verified if the two indicators are improved compared to individual sensors.  On the other hand, the filtered measurements should capture the events when the product impurity exceeds its upper limit at 7.5. As shown in upper figure of Figure 5, such an event (impurity alarm when the concentration is greater than 7.5) occurs at k = 11,996 to 12,001. As the filter size increases, the filtered values at these time instants decrease (not shown in this paper). Therefore, the optimal size N is determined by the trade-off between minimizing SSE and maintaining the detection of impurity alarm. Here we select a filter window size of N = 4. The resulting filtered online analyzer measurements (red) are compared with the raw measurements (grey) in the upper graph of Figure 5. It has seen that the variability of the filtered values has been significantly reduced and the impurity alarm has been detected as well.

Fusion of Three Sensors
Due to the availability of historical data from the online analyzer, we utilize the data of all three sensors during the time period of k = 9700 to 15,420. Each sensor reports multiple impurity alarms during this time period and the corresponding time instances are listed in column 1 of Table 3. However, only one impurity alarm at k = 11,996-12,001 is verified as a true one by the plant engineers, while others are confirmed as false alarms. The true alarm is denoted with "T" in column 2 of Table 3 while the false ones are marked with "F". The robustness of the sensor measurements is quantified by the rate of correctly identified alarms, including the miss of detection for true alarms and false alarm rate, defined as number of true alarms missed (false alarm reported) per year. These two indicators are given in the last two rows of Table 3. The efficacy of the sensor fusion approach will be verified if the two indicators are improved compared to individual sensors. As discussed previously, we select a window size of 48 h for estimating the measurement variance in Equation (14) and the maximal allowed deviation between sensors as 50% of the highest sensor reading. One may select other values based on the specific application. Results from fusing the adaptive soft sensor predictions, the lab analyzer measurements and the filtered online analyzer measurements are shown in the upper plot of Figure 6. As discussed previously, we select a window size of 48 hours for estimating the measurement variance in Equation (14) and the maximal allowed deviation between sensors as 50% of the highest sensor reading. One may select other values based on the specific application. Results from fusing the adaptive soft sensor predictions, the lab analyzer measurements and the filtered online analyzer measurements are shown in the upper plot of Figure 6. The measurements of the online analyzer, the lab analyzer and the soft sensor are in general close to each other. However, each of these sensors has false alarms (product impurity > 7.5). The lab measurements have two false alarms at about = 12,957 and 13,365. These false alarms may be caused by human error when recording the data into database, but it is a good test for the sensor fusion scheme as the lab measurement is usually assigned highest weight in the fusion. If the fusion scheme can detect and eliminate the spurious lab data, the efficacy and robustness of the propose The measurements of the online analyzer, the lab analyzer and the soft sensor are in general close to each other. However, each of these sensors has false alarms (product impurity > 7.5). The lab measurements have two false alarms at about k = 12,957 and 13,365. These false alarms may be caused by human error when recording the data into database, but it is a good test for the sensor fusion scheme as the lab measurement is usually assigned highest weight in the fusion. If the fusion scheme can detect and eliminate the spurious lab data, the efficacy and robustness of the propose scheme is confirmed. The prediction of the adaptive soft sensor has two false alarms at k = 12,676 and 13,910. The filtered online analyzer data contains just one false alarm at k = 15,190. The fused measurements captures the true impurity alarm at k = 11,995, and generates no false impurity alarms. In addition, the fused values in general follow the trend of the three sensors. Therefore, the fused measurements are superior to the measurements of the individual sensors. In the lower panel of Figure 6, we depicted the weight coefficients of each data source. According to the weight coefficients, the lab measurements and the PLS predictions usually have large contributions (larger weight values shown in the figure) to the fused values while the influence of the online analyzer is usually small. This is because the online analyzer measurement usually has more frequent variations compared to other sensors which results in a larger variance for the measurement. In addition, when the soft sensor becomes less accurate, the corresponding weights decrease significantly and the fused values are mainly determined by lab measurements. Similarly, when the lab analyzer offers false alarm at k = 12,957 and 13,365, the corresponding weights drop to relative low values. These observations verify the efficacy of the proposed sensor fusion methodology presented in this paper.
We summarize the capability of the individual and fused sensors for detecting impurity alarms in column 3-7 of Table 3. The symbol "" indicates the corresponding sensor reports an alarm. For example, the true alarm at k = 11,996-12,001 is detected by all sensors. In this table, we can tell that the lab analyzer, usually being considered as the most reliable process monitoring method, gives two false alarms. The adaptive soft sensor also gives two false alarms. The online analyzer reports seven false alarms in the period of k = 15,161-15,222 caused by sensor malfunction. The filtered online analyzer only provides one false alarm. The fused sensor detect the true alarm without giving any false ones. Therefore, the fused sensor has the smallest false alarm rate compared to other sensors.
To gain more insights about how the proposed sensor fusion scheme handles spurious data, we zoomed in three representative time periods, k = 11,950-12,050, k = 12,630-12,730 and k = 13,330-13,430. In the first period, all sensors detect the true alarm while in the rest two periods, software sensor and lab measurement provide false alarms, respectively. The individual sensor readings of the quality variable as well as their corresponding weights are plotted in Figure 7. The PLS sensor first provide an alarm at k = 11,995, one hour ahead of the online analyzer. In contrast, the analyzer and lab measurement obtained at k = 11,995 show only slightly increase in impurity level. As both online analyzer and soft sensor provide alarm after k = 11,996, an additional lab measurement was taken at k = 12,001, which confirms the impurity level does increase exceed the specification limit. Because the soft sensor shows much higher impurity value during the true alarm compared to other sensors, its weight in the fused value is zero. The fused value depends on online analyzer and lab measurements. The nine input variable values of soft sensor during this period are plotted in Figure 8. It has shown that the alarm by the soft sensor is caused by the increased value in Variable 1-7 during k = 11,996-12,001.
An example how the fusion scheme handling false alarm from the PLS software sensor is given in Figure 9. The false alarm by the software sensor shown in Figure 9 results from the sensor measurements of several input variables in the PLS model as shown in Figure 10. Similar to the case of true alarms, process Variable 1-7 increases when the false alarm reported by the software sensor. However, most of these high values only last for two hours. This indicates the abnormal conditions in the process didn't last long enough to generate sufficient impurities to increase the impurity concentration beyond the specification limit. In addition, the online analyzer reading did increase from 3.9 to 4.6, which confirms there was increase in impurity level. By comparing the soft sensor readings in this false alarm and in the above true alarm, we can tell that the PLS sensor provide alarms when operating conditions change, but such changes may not last long enough to generate off-spec impurity concentrations. In such a case, the PLS soft sensor will report false alarms, but the proposed sensor fusion method can recognize the inaccurate prediction from the PLS soft sensor.
Sensors 2019, 19, x; doi www.mdpi.com/journal/sensors found.. The PLS sensor first provide an alarm at k = 11,995, one hour ahead of the online analyzer. In contrast, the analyzer and lab measurement obtained at k = 11,995 show only slightly increase in impurity level. As both online analyzer and soft sensor provide alarm after k = 11,996, an additional lab measurement was taken at k = 12001, which confirms the impurity level does increase exceed the specification limit. Because the soft sensor shows much higher impurity value during the true alarm compared to other sensors, its weight in the fused value is zero. The fused value depends on online analyzer and lab measurements. The nine input variable values of soft sensor during this period are plotted in Error! Reference source not found.. It has shown that the alarm by the soft sensor is caused by the increased value in Variable 1-7 during k = 11,996 − 12,001.  An example how the fusion scheme handling false alarm from the PLS software sensor is given in Figure 9. The false alarm by the software sensor shown in Figure 9 results from the sensor measurements of several input variables in the PLS model as shown in Figure 10. Similar to the case of true alarms, process Variable 1-7 increases when the false alarm reported by the software sensor. However, most of these high values only last for two hours. This indicates the abnormal conditions  A similar case that lab analyzer gives spurious data is shown in Figure 11. Again, the software sensor in general has higher weight compared to the online analyzer. When the lab data are available, the fuse value largely depend on the lab data. However, the weight of lab data drops zero when false alarms given by the lab measurements. It has shown that during k = 13,375 -13,395, the fused value solely depends on the PLS software sensor values, despite the readings from the soft sensor and the    A similar case that lab analyzer gives spurious data is shown in Figure 11. Again, the software sensor in general has higher weight compared to the online analyzer. When the lab data are available, the fuse value largely depend on the lab data. However, the weight of lab data drops zero when false alarms given by the lab measurements. It has shown that during k = 13,375 -13,395, the fused value solely depends on the PLS software sensor values, despite the readings from the soft sensor and the We now focus on discussing how the proposed method detects spurious sensor readings or false alarms. When the software sensor shows a false alarm (the peak in the upper sub- figure), the fused value weighs the online analyzer heavily and assigns almost zero weight to the soft sensor. This is because that the variance of the software sensor significantly increases due to the rapid change of its measurement. Meanwhile, the variance of the online analyzer remains at a similar low value. As the moving window size for calculating the variance is 48 h, the spurious data will cause a high variance of the software sensor data for the next 48 h. As a result, the weight of software sensor remains low until 48 h later, when the previous spurious data point is "forgotten" by the algorithm. During the period of k = 12,630-12,650, the software sensor predictions are very close to the lab measurement, as given in the upper sub-figure, and, as shown in the lower sub-figure, the weights of software sensor data are almost 1 when there is no available lab measurement, which means the fused inference solely depends on the software sensor data. When the lab measurement is available, it can be seen that the fused value relies on 90% of the lab and 10% of the software sensor data. As the online analyzer data deviates from two other measurements, the weight of the online analyzer is almost zero during this period. As the deviation of online analyzer from two other sensors decreases, the weights of the online analyzer increase.
A similar case that lab analyzer gives spurious data is shown in Figure 11. Again, the software sensor in general has higher weight compared to the online analyzer. When the lab data are available, the fuse value largely depend on the lab data. However, the weight of lab data drops zero when false alarms given by the lab measurements. It has shown that during k = 13,375-13,395, the fused value solely depends on the PLS software sensor values, despite the readings from the soft sensor and the online analyzer are close to each other. This is because that the variance of the fused value calculated using Equation (17) would be larger than the variance of the value from the soft sensor. In such a case, the values from the online analyzer and the software sensor are not fused together. As the soft sensor and online analyzer measurements become closer afterwards, the fused values rely on both of them. online analyzer are close to each other. This is because that the variance of the fused value calculated using Equation (17) would be larger than the variance of the value from the soft sensor. In such a case, the values from the online analyzer and the software sensor are not fused together. As the soft sensor and online analyzer measurements become closer afterwards, the fused values rely on both of them.
Using the above examples, we have explained that how the proposed sensor fusion scheme identifies and eliminates the spurious data or false alarms. The proposed method may be challenged when two of the three sensors provide spurious data at the same time. However, it is not likely to happen, because the factors affecting the sensor accuracies are different. In addition, even if such case did happen, the sensor readings are expected to be obviously different. And, extra lab sample will help to confirm true product impurity value.

Conclusions
In this paper, we have proposed a sensor fusion scheme to combine multi-rate measurements from both hardware and software sensors. It is capable of identifying and eliminating the spurious data from individual sensors to produce a highly reliable soft sensor. As a result, the reliability and precision of monitored variables related to product quality and process safety will be improved.
The proposed scheme utilizes three typical sensors for monitoring a chemical process: software sensor, online analyzer, and lab analyzer. Due to the time-varying dynamics of the chemical processes, a mean and variance update based adaptation algorithm is applied to keep the soft sensor tracking the process changes. The measurements of the online analyzer are filtered using a moving Using the above examples, we have explained that how the proposed sensor fusion scheme identifies and eliminates the spurious data or false alarms. The proposed method may be challenged when two of the three sensors provide spurious data at the same time. However, it is not likely to happen, because the factors affecting the sensor accuracies are different. In addition, even if such case did happen, the sensor readings are expected to be obviously different. And, extra lab sample will help to confirm true product impurity value.

Conclusions
In this paper, we have proposed a sensor fusion scheme to combine multi-rate measurements from both hardware and software sensors. It is capable of identifying and eliminating the spurious data from individual sensors to produce a highly reliable soft sensor. As a result, the reliability and precision of monitored variables related to product quality and process safety will be improved.
The proposed scheme utilizes three typical sensors for monitoring a chemical process: software sensor, online analyzer, and lab analyzer. Due to the time-varying dynamics of the chemical processes, a mean and variance update based adaptation algorithm is applied to keep the soft sensor tracking the process changes. The measurements of the online analyzer are filtered using a moving median filter to reduce its variability. Then the measurements from the adaptive soft sensor, the online analyzer and the lab analyzer are then fused using Bayesian inference.
We have examined the sensor fusion scheme reported in this paper using a chemical process implemented at Dow for monitoring the product impurity. It has demonstrated that the fused estimation of the product impurity is more reliable and accurate than the measurement of any individual sensor. Specifically, the fused estimation follows the trends of the measurements by the individual sensors, while only reporting the true impurity alarm without any false alarms in contrast to the individual sensors.