1. Introduction
Signals contain rich characteristic information; thus, signal processing analysis is crucial in the development of natural sciences, particularly in environmental monitoring [1]. There is a relationship between the indoor carbon dioxide concentration and respiratory mucosal symptoms [2,3]; hence, it is essential to monitor and study carbon dioxide concentrations in various functional building spaces in real time. However, when there are sudden changes in background concentrations of CO2 or equipment sampling failures, the CO2-monitoring signal becomes noisy, making it difficult to distinguish important signal features from noise. Therefore, the noise cancellation of the CO2-monitoring signal is important.
There are various methods of signal denoising, and the most commonly used include smoothing, Fourier transform [4], wavelet theory [5], and Hilbert–Huang transform [6,7]. Smoothing is a simple and convenient method that denoises the signal at the expense of reduced temporal resolution [8,9]. In Fourier and wavelet transforms, the selection of the a priori basis function directly affects the results of signal noise reduction [9,10], which is a major limitation in the application of these methods. The core of the Hilbert–Huang transform is empirical mode decomposition (EMD). The EMD model identifies the intrinsic oscillatory modes in the signal, based on the local characteristic timescale of the signal, and, accordingly, decomposes the raw signal into several intrinsic mode functions (IMFs) without requiring prior knowledge of the signal [11]. Therefore, EMD is commonly adopted for the adaptive analysis of nonlinear and nonstationary signals, such as indoor CO2 concentration signals. However, EMD-based noise reduction is subject to endpoint effects and to modal aliasing, known as the mode-mixing issue [12]. In other words, EMD is sensitive to noise. Based on the framework of EMD, improved methods, such as ensemble EMD [13] and noise-assisted MEMD [14], have been proposed to solve the intermittence problem of EMD. However, these methods still have many problems, such as noise parameters that are difficult to select and the failure to separate modes.
Time-varying filtering-based empirical mode decomposition (TVF-EMD) was developed to address the shortcomings of the EMD model, with the sifting process completed by a B-spline approximation filter [12]. The three main features of TVF-EMD, compared with most existing methods, can be summarized as follows: (1) The TVF-EMD method can simultaneously address the issues of separation and intermittence [15]. (2) With a B-spline approximation filter integrated into the sifting process, TVF-EMD solves the issue of mode mixing and maintains time-varying features. (3) The enhanced stopping criterion improves the performance of the TVF-EMD model for low sampling rates. However, two significant parameters, the bandwidth threshold and the B-spline order, have direct impacts on the separation and filter performance of the time-varying filter, respectively [12]. A reasonable selection of the combination of the two parameters in advance enables the TVF-EMD model to resolve the mode-mixing problem, thereby achieving optimal noise reduction. Hence, the choice of parameters for the TVF-EMD is important.
Many optimization methods are used in hyperparameter estimation. Among the most widely used are the grid search [16,17], random search [18], genetic algorithm [19], and Bayesian optimization (BO) algorithms [20,21]. Unlike grid and random searches, the framework of BO leverages information from existing data, and the current search for optimal values is based on previous search results [22]. In addition, there are swarm intelligence optimization technologies that imitate the behaviors of various organisms, such as ant colonies [23], particle swarms [24], fish schools [25], glowworm swarms [26], and grey wolves [27], to achieve optimal parameter estimation. Zhou et al. [28] proposed a parameter-adaptive TVF-EMD feature extraction method, based on an improved GOA, for mechanical fault diagnosis. However, the genetic algorithm and the swarm intelligence optimization algorithms can be categorized as population optimization algorithms. Population optimization algorithms are not particularly suitable for model hyperparameter tuning because they require a sufficient number of initial sample points and are not particularly efficient. To accelerate computation, Bayesian hyperparameter optimization based on surrogate algorithms is widely used. The tree-structured Parzen estimator (TPE) is one of the most notable hyperparameter optimization methods [29,30]. The Parzen-based estimator can naturally handle complex search spaces and can be extended to dozens of variables, with at least a thousand observations [31]. Therefore, a BO algorithm based on a tree-structured Parzen estimator (BO-TPE) is a reasonably effective method for selecting the TVF-EMD parameters.
Among the aforementioned denoising methods, smoothing reduces the temporal resolution of the signal. The indoor CO2 concentration in buildings is a typical non-stationary signal. Signal-denoising methods such as the Fourier transform, EMD, ensemble EMD, and noise-assisted MEMD are not well suited to processing non-stationary signals. The wavelet transform is suitable for dealing with non-stationary, time-varying signals, but selecting the wavelet basis is difficult. The TVF-EMD method can overcome the shortcomings of the above methods. Combined with a hyperparameter optimization algorithm, the TVF-EMD method can be matched with appropriate parameters more quickly to achieve accurate noise reduction of the indoor CO2 concentration.
Considering this background, this study proposed an optimized TVF-EMD method, based on the BO-TPE algorithm, for noise cancellation. First, the correlation coefficient was defined as a comprehensive index for the objective function of the TVF-EMD parameter optimization. In addition, the correlation coefficient was used as the mode selection criterion for the target IMF. Second, an optimization algorithm called BO-TPE was used to search for the optimal combination of TVF-EMD parameters to match the input signal. The proposed method was suitable for removing noise from non-stationary signals, especially noise with large amplitudes and a dispersed distribution. The noise reduction results for the CO2 concentration signals from two different sets of functional building spaces demonstrated that the method was successful in canceling noise and could be effective in CO2-monitoring engineering applications.
2. Preliminaries
2.1. Time-Varying Filter Empirical Mode Decomposition
EMD can decompose a given signal $x(t)$ into a set of component signals known as IMFs and a residual $r(t)$, as shown in the following equation:
$$x(t) = \sum_{i=1}^{N} \mathrm{imf}_i(t) + r(t) \tag{1}$$
where $\mathrm{imf}_i(t)$ represents the $i$-th IMF and $N$ is the number of IMFs. The sifting process of EMD comprises two steps: (1) estimation of the “local mean” $m(t)$ and (2) recursive subtraction of $m(t)$ from the input signal until the resulting signal meets the stopping criterion.
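As an illustration of the decomposition above, the following minimal Python sketch runs a simplified sifting loop. It is not the algorithm used in this study: the envelopes are piecewise linear rather than spline based, a fixed number of sifting passes stands in for the stopping criterion, and the function names (`local_extrema`, `envelope`, `sift`, `emd`) and the synthetic test signal are assumptions made only for this example. Its purpose is to show that the extracted modes and the residual add back up to the input signal.

```python
import math


def local_extrema(x):
    """Return indices of strict interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knots (assumption: the
    first and last samples are used as additional anchor points)."""
    pts = sorted(set([0] + knots + [len(x) - 1]))
    env = [0.0] * len(x)
    for a, b in zip(pts, pts[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            env[i] = (1 - w) * x[a] + w * x[b]
    return env


def sift(x, n_sift=10):
    """Extract one mode by repeatedly subtracting the local mean m(t)."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) + len(minima) < 2:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=6):
    """Return (imfs, residual) such that x = sum(imfs) + residual."""
    imfs, residual = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 2:
            break  # residual is (nearly) monotonic: stop extracting modes
        imf = sift(residual)
        imfs.append(imf)
        residual = [r - c for r, c in zip(residual, imf)]
    return imfs, residual


# Synthetic signal: slow oscillation + fast oscillation + linear trend.
t = [i / 200 for i in range(400)]
signal = [math.sin(2 * math.pi * 5 * ti)
          + 0.5 * math.sin(2 * math.pi * 40 * ti) + ti for ti in t]
imfs, residual = emd(signal)
rebuilt = [sum(c[i] for c in imfs) + residual[i] for i in range(len(signal))]
max_err = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(len(imfs), max_err)
```

Because each mode is subtracted from the running residual, the reconstruction identity holds up to floating-point rounding regardless of how the local mean is estimated; the quality of the individual modes, however, depends strongly on the envelope and stopping rule.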
In TVF-EMD, the mono-components are replaced by local narrow-band signals to improve the performance of the EMD method. These signals have properties similar to those of the IMF but provide a meaningful Hilbert spectrum. The local instantaneous bandwidth is used to define local narrow-band signals, which requires the bandwidth to be below a given threshold value. The method first determines the local cut-off frequency and subsequently applies a time-varying filter. The TVF-EMD sifting process is performed using a time-varying filter, which primarily involves three main steps [12]:
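1. Rearrangement of the local cut-off frequency
Step 1: Locate the time instants of the maxima of $x(t)$, expressed as $\{u_i\}$, $i = 1, 2, 3, \ldots$
Step 2: Determine all intermittences, expressed as $e_j$, $j = 1, 2, 3, \ldots$, by setting a threshold value for the rate of change within a certain time span. These breaks should satisfy the following condition:
$$\frac{\max\left(\varphi'_{bis}(u_i : u_{i+1})\right) - \min\left(\varphi'_{bis}(u_i : u_{i+1})\right)}{\min\left(\varphi'_{bis}(u_i : u_{i+1})\right)} > \rho$$
where $\rho$ is the threshold on the rate of change, and $\varphi'_{bis}(t)$ is the bisection frequency.
Step 3: The condition of the rising edge of $\varphi'_{bis}(t)$ is $\varphi'_{bis}(u_{i+1}) - \varphi'_{bis}(u_i) > 0$, and $\varphi'_{bis}(u_{i+1})$ is considered the floor. Similarly, the condition of the falling edge is $\varphi'_{bis}(u_{i+1}) - \varphi'_{bis}(u_i) < 0$, and $\varphi'_{bis}(u_i)$ is considered the floor. The rest of $\varphi'_{bis}(t)$ refers to the peaks.
Step 4: By interpolating between the peaks, the final local cut-off frequency can be obtained. During the local cut-off frequency rearrangement phase, the TVF-EMD addresses the separation and intermittence problems.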
2. Sifting process for TVF-EMD
Step 1: Estimate the local cut-off frequency of the input signal.
Step 2: Filter the input signal using the time-varying filter (i.e., B-spline approximation filter) to obtain the local mean. The bandwidth threshold determines the separation effect and whether the input signal must be filtered. The B-spline order, which is independent of the cut-off frequency estimate, determines the attenuation and filtering effect of the TVF.
Step 3: Verify that the residual signal satisfies the stopping criterion indicated below:
$$\theta(t) = \frac{B_{Loughlin}(t)}{\varphi_{avg}(t)} \le \xi$$
where $B_{Loughlin}(t)$ is the Loughlin instantaneous bandwidth, $\varphi_{avg}(t)$ is the weighted average instantaneous frequency, and $\xi$ is the bandwidth threshold. The detailed calculation process of TVF-EMD was discussed in previous studies [12,32].
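The stopping check in Step 3 can be sketched in a few lines of Python. The sketch assumes that the Loughlin instantaneous bandwidth and the weighted average instantaneous frequency have already been computed sample by sample (their estimation is described in [12,32] and is not reproduced here), and that the ratio must stay below the bandwidth threshold at every sample. The function name `is_local_narrow_band`, the symbol `xi` for the bandwidth threshold, and the numbers are hypothetical.

```python
def is_local_narrow_band(bandwidth, avg_freq, xi):
    """Return (ok, theta), where theta[k] = bandwidth[k] / avg_freq[k]
    and ok is True when theta stays at or below the threshold xi."""
    if len(bandwidth) != len(avg_freq):
        raise ValueError("inputs must have the same length")
    theta = [b / f for b, f in zip(bandwidth, avg_freq)]
    return all(v <= xi for v in theta), theta


# Illustrative values only (not taken from the study).
avg_freq = [0.5, 0.5, 0.5]
ok_narrow, theta_narrow = is_local_narrow_band([0.02, 0.03, 0.025], avg_freq, xi=0.1)
ok_wide, theta_wide = is_local_narrow_band([0.02, 0.20, 0.025], avg_freq, xi=0.1)
print(ok_narrow, ok_wide)
```

A smaller threshold makes the check stricter, so sifting continues longer and the modes become narrower in bandwidth; this is why the threshold governs the separation effect.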
2.2. Bayesian Optimization
The BO-TPE works by assuming that the black-box function is sampled from a Gaussian mixture model and maintains the posterior distribution of that function while making observations. The posterior probabilities are updated using new sample points at each iteration. To prevent the results from falling into local optima, the BO algorithm adds a stochastic component that balances random exploration against exploitation of the posterior distribution.
The probabilistic surrogate model and acquisition function are the core components of the BO method, and the TPE is the probabilistic surrogate model of the study. The BO algorithm is highly efficient for hyperparameter estimation, as it exhibits excellent convergence.
The objective of Bayesian optimization is to determine the global maximum or minimum value of the objective function in the hyperparameter space. This study aimed to determine the maximum value of the true fitness objective function on a bounded set $\mathcal{X}$ in a limited number of iterations. The mathematical model of the BO algorithm is as follows:
$$x^{*} = \arg\max_{x \in \mathcal{X}} f(x)$$
where the point $x^{*}$ that maximizes the surrogate function is the proposed point for evaluating the objective function $f$.
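Let us suppose that the observation values are of the form $\{x_i, y_i\}_{i=1}^{n}$, where $y_i$ is the generalization accuracy of the algorithm under $x_i$. This can be considered the generalization accuracy of random observations $y = f(x) + \varepsilon$, where the noise introduced into the observation, $\varepsilon$, is assumed to satisfy $\varepsilon \sim \mathcal{N}(0, \sigma^{2})$. To explore $f$ in the hyperparameter space $\mathcal{X}$, the acquisition function $\alpha$ was obtained, in which the maximum of the function was calculated as
$$x_{n+1} = \arg\max_{x \in \mathcal{X}} \alpha\left(x \mid \{x_i, y_i\}_{i=1}^{n}\right)$$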
In this study, the bandwidth threshold and B-spline order were the independent variables of the BO model, and the flowchart of the Bayesian optimization is outlined in Figure 1. First, we determined whether the two parameters of the model were initialized, and if not, the initial parameters were generated randomly. If the parameters were initialized, the values were passed to the tree-structured Parzen estimator surrogate model. Subsequently, it was judged whether the acquisition function reached its maximum value for the given pair of parameters. If the maximum value of the target function was reached, the values of the two parameters were output. If not, the values of the surrogate model parameters were updated until the requirements were met.
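The loop described above can be sketched in plain Python. This is a deliberately simplified TPE-style search written for illustration, not the implementation used in this study and not the API of any particular library: observations are split into a "good" and a "bad" group, each group is modeled per dimension with a Gaussian Parzen window, and the candidate with the largest density ratio is evaluated next. The objective `toy_objective`, the search bounds, and the settings `gamma`, `n_candidates`, and the kernel width are all hypothetical stand-ins; in the study, the objective would be the fitness obtained from a TVF-EMD decomposition.

```python
import math
import random


def parzen_density(value, centers, bw):
    """Gaussian Parzen-window density estimate at `value`."""
    total = sum(math.exp(-0.5 * ((value - c) / bw) ** 2) for c in centers)
    return total / (len(centers) * bw * math.sqrt(2.0 * math.pi)) + 1e-12


def tpe_maximize(objective, bounds, n_init=10, n_iter=30, gamma=0.25,
                 n_candidates=24, seed=0):
    rng = random.Random(seed)
    trials = []
    # Random initialization of the parameter combinations.
    for _ in range(n_init):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        trials.append((x, objective(x)))
    for _ in range(n_iter):
        # Split past observations into good (top gamma fraction) and bad.
        ranked = sorted(trials, key=lambda t: t[1], reverse=True)
        n_good = max(2, int(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]
        best_x, best_score = None, -math.inf
        for _c in range(n_candidates):
            cand, score = [], 0.0
            for d, (lo, hi) in enumerate(bounds):
                bw = 0.1 * (hi - lo)  # assumed fixed kernel width
                centre = rng.choice(good)[d]
                v = min(hi, max(lo, rng.gauss(centre, bw)))
                cand.append(v)
                # Acquisition: log density ratio l(x) / g(x), summed per dimension.
                score += math.log(parzen_density(v, [g[d] for g in good], bw))
                score -= math.log(parzen_density(v, [b[d] for b in bad], bw))
            if score > best_score:
                best_x, best_score = cand, score
        trials.append((best_x, objective(best_x)))
    return max(trials, key=lambda t: t[1]), trials


def toy_objective(params):
    """Hypothetical smooth stand-in for the decomposition fitness."""
    xi, order = params[0], round(params[1])
    return -((xi - 0.3) ** 2) - ((order - 20) / 25.0) ** 2


bounds = [(0.1, 0.8), (5.0, 30.0)]  # hypothetical ranges for the two parameters
best, trials = tpe_maximize(toy_objective, bounds)
(best_xi, best_order), best_value = best
print(best_xi, round(best_order), best_value)
```

Because every new candidate is drawn near previously good points but scored against the density of poor points, the search reuses earlier results instead of sampling blindly, which is the property that distinguishes BO from grid and random search.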
3. Adopted Methodology
The proposed optimized TVF-EMD method is based on the BO-TPE algorithm. It aims to search for optimal combinations of the bandwidth threshold and B-spline order using the objective function, which determines the merits of the decomposition results. The kurtosis index (Kur) depends on the distribution density of the signal and is highly sensitive to large amplitudes with dispersed distributions. A kurtosis index value below 3 indicates that the center peak of the signal is lower and broader than that of the normal distribution, which is represented by Kur = 3. In contrast, Kur > 3 indicates that the central peak of the signal is higher and sharper. Thus, a smaller kurtosis index is required for a more sensitive identification of outliers. However, to avoid excessive noise cancellation, the correlation coefficient (CC) is used to characterize the similarities between the original and decomposed signals. Therefore, a synthetic measurement index, consisting of the kurtosis index and CC, was developed as the objective function for TVF-EMD parameter optimization. The synthetic measurement index, the correlation coefficient for kurtosis index (CCKur), was calculated as follows:
As the original BO algorithm was developed to determine the maximum value, the method used the maximum CCKur between the original signal and the modes obtained by TVF-EMD as the fitness. Therefore, the maximization of CCKur was the optimization problem expressed below:
$$\max_{\gamma} \; F(\gamma) = \mathrm{CCKur}(\gamma)$$
where $F$ is the objective function, and $\gamma$ represents the parameter combination of the TVF-EMD method to be optimized. The CC between the original signal $x$ and a mode $y$ with the same length as $x$ is described in Equation (7).
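The two ingredients of the synthetic index can be computed as in the following Python sketch. It assumes that CC is the Pearson correlation coefficient and that the kurtosis index is the non-excess (Pearson) kurtosis, for which a normal distribution gives 3, consistent with the description above. How the two quantities are combined into CCKur is not reproduced here, so the sketch stops at the individual terms; the function names are illustrative.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def kurtosis_index(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((a - m) ** 2 for a in x) / n
    m4 = sum((a - m) ** 4 for a in x) / n
    return m4 / m2 ** 2


flat = [1.0, 2.0, 3.0, 4.0, 5.0]   # evenly spread values
spiky = [0.0] * 99 + [10.0]        # one large, isolated amplitude
print(pearson_cc(flat, flat))      # perfectly correlated with itself
print(kurtosis_index(flat))        # below 3: flatter than a normal distribution
print(kurtosis_index(spiky))       # far above 3: dominated by the outlier
```

The contrast between the two kurtosis values shows why the index reacts strongly to isolated large amplitudes, while CC guards against a mode that has drifted too far from the original signal.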
To ensure the reliability of the parameter optimization, the number of modes after CO2 signal decomposition had to be at least two. After several attempts, it was discovered that the bandwidth threshold
met
, and
satisfied the requirements. The same number of modes was obtained with the typically used standard EMD, and the maximum number of modes for the TVF-EMD was set to
[
4].
The autocorrelation properties of the signal were evaluated using the Hurst exponent $H$ to distinguish noise from the most relevant modes. When $0 < H < 0.5$, the signal was anticorrelated. In contrast, $H = 0.5$ indicated white noise, and $0.5 < H < 1$ represented a positive correlation. In this study, the threshold for the Hurst index was defined as 0.5. The detailed steps of the noise cancellation methodology were as follows:
Step 1: Input the CO2 concentration signal into the TVF-EMD model. Concurrently, initialize the parameters of the BO algorithm and the parameter population, including the bandwidth threshold and B-spline order.
Step 2: Decompose the signal using the TVF-EMD model for the current combination of bandwidth threshold and B-spline order, and then use the resulting IMFs to calculate the objective function CCKur; the best fitness of each iteration of the BO algorithm is stored.
Step 3: If the stored fitness value satisfies the threshold, save the optimal bandwidth threshold and B-spline order. Otherwise, proceed to the next iteration and repeat Step 2 to update the two parameters until the maximization of the objective function meets the requirements.
Step 4: Obtain and save the maximum value of the objective function and the corresponding parameter combination of the TVF-EMD.
Step 5: Update the population with the best parameter combination.
Step 6: Use the optimized TVF-EMD with the selected combination of parameters to decompose the original CO2 signal.
Step 7: Calculate the Hurst exponent of each IMF. If it is greater than the threshold, save the IMF as a sensitive IMF.
Step 8: Sum the sensitive IMFs to obtain the reconstructed signal. The remaining, insensitive IMFs are considered to be noise.
A flowchart of the proposed BO-based TVF-EMD method for the SNC model is shown in Figure 2.
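Steps 7 and 8 can be sketched as follows. The Hurst exponent is estimated here by classical rescaled-range (R/S) analysis, which is an assumption: the estimator used in this study is not specified in this section. The window sizes, the function names `hurst_rs` and `reconstruct`, and the two toy "modes" are likewise chosen only for illustration.

```python
import math


def hurst_rs(x, sizes=(8, 16, 32, 64)):
    """Estimate the Hurst exponent as the slope of log(R/S) versus log(n)."""
    log_n, log_rs = [], []
    for n in sizes:
        ratios = []
        for start in range(0, len(x) - n + 1, n):
            window = x[start:start + n]
            mean = sum(window) / n
            cum, lo, hi = 0.0, 0.0, 0.0
            for v in window:            # range of the cumulative deviations
                cum += v - mean
                lo, hi = min(lo, cum), max(hi, cum)
            std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
            if std > 0 and hi > lo:
                ratios.append((hi - lo) / std)
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
    if len(log_n) < 2:
        raise ValueError("series too short for the chosen window sizes")
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_n, log_rs))
    den = sum((a - mx) ** 2 for a in log_n)
    return num / den


def reconstruct(imfs, threshold=0.5):
    """Sum the modes whose Hurst exponent exceeds the threshold."""
    kept = [k for k, imf in enumerate(imfs) if hurst_rs(imf) > threshold]
    length = len(imfs[0])
    signal = [sum(imfs[k][i] for k in kept) for i in range(length)]
    return signal, kept


# Toy modes: a persistent trend and a strictly alternating (anticorrelated) series.
trend = [float(i) for i in range(256)]
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(256)]
denoised, kept = reconstruct([trend, alternating])
print(kept, hurst_rs(trend), hurst_rs(alternating))
```

In this toy case the trend is retained and the alternating series is discarded as noise; with real IMFs, the estimate depends on the series length and the chosen window sizes, so values close to the threshold should be interpreted with care.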
5. Conclusions
An optimized TVF-EMD model based on a Bayesian algorithm was adopted to develop a noise cancellation model for denoising the CO2 concentration signal of a building. The Bayesian algorithm was used to optimally estimate the TVF-EMD parameters, namely the bandwidth threshold and B-spline order, and to match them adaptively to the given CO2 concentration signal. The main conclusions can be summarized as follows:
In parameter optimization, the synthetic measurement index CCKur was used as the objective function of TVF-EMD. This function could identify anomalous signals while preserving the signal profile and avoiding excessive noise reduction. In the proposed noise cancellation model, a thresholding parameter based on the Hurst exponent was introduced as a measurement index for selecting the relevant IMFs for signal reconstruction.
The hyperparameter was more important for decomposition results. The efficacy of the synthetic measurement index was verified against five optimization indices: CC, Kur, MI, EE, and EEMI on decomposed IMFs. The results demonstrated that the proposed CCKur index was sensitive to , and the selection of CCKur as a synthetic measurement index and as an objective function were reasonable and effective.
The noise reduction effect between different signal-denoising models, that is, TVF-EMD with default values, EMD, moving average method, and exponential smoothing method, was compared in terms of SNR, MSE, RSE, and NRMSE. The proposed noise cancellation model yielded the largest absolute value of SNR and the smallest MSE, RSE, and NRMSE, demonstrating the high noise reduction capability of the proposed model for CO2 concentration signals.