An Approach for Filter Divergence Suppression in a Sequential Data Assimilation System and Its Application in Short-Term Traffic Flow Forecasting

Abstract: Mathematically describing the physical process of a sequential data assimilation system perfectly is difficult and inevitably results in errors in the assimilation model. Filter divergence is a common phenomenon because of model inaccuracies and affects the quality of the assimilation results in sequential data assimilation systems. In this study, an approach based on an L1-norm constraint for filter-divergence suppression in sequential data assimilation systems was proposed. The method adjusts the weights of the state-simulated values and measurements based on new measurements using an L1-norm constraint when filter divergence is about to occur. Results for simulation data and real-world traffic flow measurements collected from a sub-area of the highway between Leeds and Sheffield, England, showed that the proposed method produced a higher assimilation accuracy than the other filter-divergence suppression methods. This indicates the effectiveness of the proposed approach based on the L1-norm constraint for filter-divergence suppression.


Introduction
Short-term traffic flow forecasting is a crucial component in many intelligent transportation systems (ITSs) [1-3]. It refers to predicting traffic flow at time scales of seconds to minutes, mainly for traffic control and guidance, which can better describe real traffic conditions [4,5]. To achieve dynamic traffic management or provide advanced traveler information, short-term traffic flow forecasting that reflects real-time local fluctuations or traffic congestion resulting from fast-changing traffic flow values is necessary [6,7]. Owing to the stochastic nature of traffic flow, robust and accurate prediction algorithms have become increasingly important [4,8-10]. Predictions are commonly made by combining measurements with models [11-13]. Data assimilation (DA) is an important method for estimating state vectors by integrating physical model information and measurements [14-17]. It fuses measurement information into the model run based on the spatial-temporal distribution of the data and the errors in the measurement and background fields [18]. Related DA methods have been applied to short-term traffic state predictions [11,12,19]. A DA system ...
Table 1. Kalman filter method adapted from [34].
Matrices Q and R are the covariance matrices of the Gaussian noise in the state and measurement equations, respectively. In the KF method, the two noise terms are assumed to be uncorrelated [30,31,34]. The K matrix in the update step of the KF method shown in Table 1 is the Kalman gain matrix, which sets the relative weight of the state forecast values and the measurements.
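The forecast and update steps referred to above can be sketched as follows. This is a minimal sketch of the textbook linear Kalman filter, not a transcription of Table 1, which is not reproduced in this text; the function name `kf_step`, the argument names, and the scalar test values are illustrative assumptions.

```python
import numpy as np

def kf_step(x_a, P_a, y, F, H, Q, R):
    """One forecast/update cycle of a linear Kalman filter (textbook form)."""
    # Forecast: propagate the previous analysis state and its covariance.
    x_f = F @ x_a
    P_f = F @ P_a @ F.T + Q
    # Gain: weight given to the measurement relative to the forecast.
    S = H @ P_f @ H.T + R
    K = P_f @ H.T @ np.linalg.inv(S)
    # Update: correct the forecast with the innovation y - H x_f.
    x_new = x_f + K @ (y - H @ x_f)
    P_new = (np.eye(len(x_a)) - K @ H) @ P_f
    return x_new, P_new, K

# Scalar illustration with assumed values: Q = 0 means the model is fully trusted.
F = H = np.array([[1.0]])
Q = np.array([[0.0]])
R = np.array([[1.0]])
x, P = np.array([0.0]), np.array([[1.0]])
gains = []
for y in ([1.0], [2.0], [3.0]):
    x, P, K = kf_step(x, P, np.array(y), F, H, Q, R)
    gains.append(float(K[0, 0]))
```

In this scalar case the gain takes the values 1/2, 1/3, 1/4 over the three steps, so each new measurement corrects the state less than the previous one did.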
In theory, the reason for the filter-divergence phenomenon in the KF method is that the K matrix becomes increasingly smaller over the assimilation evolution. This decrease results in a weak correction of the measurements on the state model and ultimately leads to a filter-divergence phenomenon [15,35]. The filter-divergence phenomenon in the KF method mainly arises from two factors. The first factor is the limitation of the computer word length, which causes errors such as rounding errors [36]. The accumulation of these errors reduces the filtering accuracy. When the accumulation of errors is significant, it causes the calculation error variance matrix to lose its positive definiteness and symmetry. Consequently, the deviation between the calculated and theoretical K matrices will increase. The second factor is a cognitive limitation, which results in improper and inaccurate descriptions of the model and its statistical noise information. Furthermore, this limitation causes mismatching between the simulated values from the state model and the measurements as the unknown model errors propagate in the covariance matrix during the KF method process [15]. In other words, as the algorithm progresses, the measurement covariance Q grows, whereas the model covariance P shrinks. Thus, the model credibility increases, and the estimated values are mainly acquired from the model's analog outputs. This study addressed solutions for suppressing the filter divergence in the KF method of the S-DA systems resulting from these cognitive limitations. Selecting a classic case as an example, that is, target tracking with uniform motion [37], it was possible to intuitively demonstrate the phenomenon of filter divergence arising from cognitive limitations. 
The assimilation models for target tracking are expressed as Equation (3), where $x_k$ is the position of the tracking target at time $k$, with $x_0 = 205$ m; $v$ is the speed, with $v = 1$ m/s; $y_k$ denotes the measurements of the target position; and $w_k$ and $\delta_k$ are the noise of the dynamic state and observation models, with $w_k \sim N(0, \Gamma_w)$ and $\delta_k \sim N(0, \Gamma_\delta)$, respectively. $\Gamma_w$ and $\Gamma_\delta$ were assumed to be 1 in this test. Owing to cognitive limitations, it is supposed that the model given by Equation (3) was mistakenly selected as Equation (4). The KF assimilation method was used to estimate the target position under the two models given by Equations (3) and (4). The errors between the predicted values and the real measurements were calculated through 1000 simulations. The results are shown in Figure 1.
As shown in Figure 1, the simulation errors acquired from the correct model in Equation (3) were far smaller than those obtained from the incorrect model in Equation (4). Filter divergence occurred in the forecasting results of the target position using the wrong forecasting model with the KF method. The main reason that filter divergence occurred in the incorrect model was that this model did not consider the speed information. As the $v \times k$ terms were missing in the incorrect model, the model errors were underestimated.
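The uniform-motion example above can be reproduced in outline as follows. Equations (3) and (4) are not reproduced in this text, so the sketch assumes that the correct model is $x_k = x_{k-1} + v + w_k$ with $y_k = x_k + \delta_k$, and that the mis-specified model drops the $v$ term. The function name `track`, the 200-step horizon, the seed, and the zero model-noise variance used in the mis-specified run are illustrative assumptions; errors here are measured against the simulated true position.

```python
import numpy as np

def track(use_velocity, q_model, steps=200, x0=205.0, v=1.0, seed=0):
    """Scalar KF tracking of a target in uniform motion (assumed model forms)."""
    rng = np.random.default_rng(seed)
    x_true, x_est, p = x0, x0, 1.0
    errors, gains = [], []
    for _ in range(steps):
        # Truth: uniform motion plus unit-variance process noise.
        x_true = x_true + v + rng.normal(0.0, 1.0)
        # Measurement with unit-variance noise (H = 1, R = 1).
        y = x_true + rng.normal(0.0, 1.0)
        # Forecast with the filter's own model, which may omit the speed term.
        x_f = x_est + (v if use_velocity else 0.0)
        p_f = p + q_model
        # Update.
        k = p_f / (p_f + 1.0)
        x_est = x_f + k * (y - x_f)
        p = (1.0 - k) * p_f
        errors.append(abs(x_est - x_true))
        gains.append(k)
    return np.array(errors), np.array(gains)

err_ok, gain_ok = track(use_velocity=True, q_model=1.0)
err_bad, gain_bad = track(use_velocity=False, q_model=0.0)
```

Under these assumptions the gain of the mis-specified run decays as 1/(k + 1), so the estimate stops following the measurements and its error keeps growing with the distance travelled, whereas the gain of the correctly specified run settles near 0.62 and its error stays small.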
Various methods have been proposed to suppress the filter divergence caused by inaccurate assimilation models. Initially, gain matrix adjustment methods were commonly used [38,39], including constant gain methods, stage-constant gain methods, and finite-lower-bound gain methods. These methods allow the filter gain matrix K to decrease only as far as a certain value, which avoids filter divergence. However, they are not precise enough, as the determination of the constant gain matrix and its limiting values always relies on experience. The limited memory method can also effectively suppress filter divergence [40]. This method reduces the influence of historical data on the gain matrix construction; however, the length of the memory is difficult to select. In addition, the covariance weighting (C-W) method [41] and the adaptive Kalman filter (A-KF) method [42-46] have commonly been used and are effective for suppressing filter divergence. The C-W method [41] suppresses the filter divergence by adding a weight or a fading factor to inflate the model error covariance matrix. This reduces the dependence of the forecasting results on inaccurate models and thereby suppresses the filter divergence. The C-W method is often used in fields that require a balance of precision and computational efficiency. The A-KF method estimates the state model error covariance matrix Q and the measurement error covariance matrix R online in real time to suppress filter divergence. It is usually used in fields in which an accurate noise model is not available but high precision is required. Both methods suppress the filter-divergence phenomenon by adjusting the model error covariance matrices in real time during the KF method process, which reduces the influence that the model error or model discrepancy term has on the filtering results through the propagation of the covariance matrix.
However, selecting a weight or inflating factor in the C-W method is difficult. Additionally, in the A-KF method, it is challenging to ensure that the calculation error covariance matrix is suitable for the dynamic noise in an actual dynamic process [47].
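The effect of a fading factor on the gain can be illustrated with a short sketch. The form used here, in which the propagated covariance is multiplied by a scalar factor of at least 1, is one common way of writing covariance inflation and is an assumption; the exact scheme of [41] is not given in this text, and the names `forecast_covariance`, `gain`, and `fading` are illustrative.

```python
import numpy as np

def forecast_covariance(P_a, F, Q, fading=1.0):
    """Forecast error covariance, inflated by a scalar fading factor >= 1."""
    return fading * (F @ P_a @ F.T) + Q

def gain(P_f, H, R):
    """Kalman gain for a given forecast covariance."""
    return P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)

# Assumed scalar values: a small analysis covariance, as found late in a diverging run.
P_a = np.array([[0.01]])
F = H = np.array([[1.0]])
Q = np.array([[0.0]])
R = np.array([[1.0]])

K_plain = gain(forecast_covariance(P_a, F, Q, fading=1.0), H, R)[0, 0]
K_faded = gain(forecast_covariance(P_a, F, Q, fading=5.0), H, R)[0, 0]
```

With these values the inflated covariance raises the gain from about 0.0099 to about 0.0476, so the measurement regains some weight. How large the factor should be is exactly the selection problem noted above.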
In this paper, an approach based on an L1-norm constraint for filter-divergence suppression in the S-DA system was proposed and applied to short-term traffic flow forecasting to verify its effectiveness. The advantages of the filter-divergence suppression method based on the L1-norm constraint are that it achieves high precision and is easy to implement. Three critical items were investigated: (i) an inaccurate S-DA system that generates filter divergence for short-term traffic flow forecasting; (ii) an approach based on the L1-norm constraint for filter-divergence suppression; and (iii) the application of the S-DA system based on the L1-norm constraint method to short-term traffic flow forecasting to verify its effectiveness compared with other methods.
The remainder of the paper is organized as follows. In Section 2, an inaccurate S-DA system that generated filter divergence for short-term traffic flow forecasting is presented. In Section 3, a filter-divergence suppression method based on the L1-norm constraint is proposed. Section 4 gives a numerical example. In Section 5, short-term traffic flow forecasting application experiments are described. Section 6 presents the results and Section 7 presents our conclusions.

Representation of Filter Divergence Phenomenon in Sequential Data Assimilation System for Short-Term Traffic Flow Forecasting
Filter divergence occurs in parametric models in the S-DA system. Recently, vector autoregressive (VAR) models have been widely applied to short-term traffic flow forecasting [48,49]. VAR models are based on the assumption that the traffic flow of the current path in time interval $[kT, (k+1)T]$, expressed as $q_{pc}(k+1)$, is related not only to its own traffic flow values in the $n$ previous time intervals, $q_{pc}(k), q_{pc}(k-1), \ldots, q_{pc}(k+1-n)$, but also to the corresponding values of its $m$ adjacent paths. $T$ is the forecasting time period, and $k = 1, 2, \ldots$. Future values can be obtained from the past values and an analysis of their correlations. In this study, the traffic flow forecast $q_{pc}(k+1)$ obtained from the VAR model described above is expressed as Equation (5) [50], where $F(k)$ denotes the traffic flow values of the current and adjacent paths at time interval $[(k-1)T, kT]$, and $f(\cdot)$ denotes a nonlinear function. Equation (5) can be rewritten in detail as Equation (6), with the accompanying definitions given in Equations (7) and (8).
ISPRS Int. J. Geo-Inf. 2020, 9, 340
The unknown parameters $[\xi_0(k), \xi_1(k), \ldots, \xi_n(k)]$ in Equation (6) are treated as state vectors, which are also shown in Equation (8), and these vectors need to be calculated in the S-DA system. In Equation (7), $\bar{q}_{pc}(k+1)$ denotes the average value of the traffic flow along the current path, calculated from historical flow measurements at time interval $[kT, (k+1)T]$ on the same day in previous consecutive weeks; $q_{pc}(k)$ represents the historical traffic flow values of the current path at time interval $[(k-1)T, kT]$, and $\bar{q}_{pc}(k)$ is the corresponding average value; $q_{pa_i}(k)$ represents the traffic flow values of adjacent paths $pa_i$ at time interval $[(k-1)T, kT]$, and $\bar{q}_{pa_i}(k)$ represents the corresponding average values. In this study, $n$ was set to 2.
The form of Equation (6) can be related to Equations (1) and (2). In the S-DA system, filter divergence will occur if the assimilation model is not correct. The essence of filter divergence is that the Kalman gain matrix K becomes increasingly smaller and even tends to zero over the assimilation evolution process. Consequently, the relative deviation between the estimated state values and the true values becomes increasingly larger. To display this phenomenon intuitively, it was assumed in this study that our understanding of the traffic flow forecasting system was incomplete and inaccurate, and the system in Equation (7) was incorrectly taken as Equation (9). As an example, the traffic flow forecasting results of path 7768 (LM838), which is a path of England's highway system, were obtained using the correct and incorrect models. Without loss of generality, the forecasting results on workday Monday and non-workday Saturday are given. Figure 2 shows that the filter-divergence phenomenon occurred on both Monday and Saturday using the incorrect model with the S-DA system based on the KF method.

Filter-Divergence Suppression Approach Based on L1-norm Constraint
Figure 3 shows the overall technical framework of the filter-divergence suppression approach based on the L1-norm constraint in the KF assimilation method of the S-DA system.
The key to suppressing filter divergence is to calculate the proper weights of the state estimation and new measurements, that is, the Kalman gain matrix K. When filter divergence occurs, the Kalman gain matrix K is too small to balance the weight between the simulated and measured values when calculating the forecasting results. This also means that the new measurements have a weak correction effect on the state model during the assimilation process. Thus, we proposed an approach that calculates a proper Kalman gain matrix K when divergence is about to occur by improving the ability of the new measured values to correct the state. Specifically, the matrix K in the proposed method is obtained by forcing the measurement value $y_{cal}$ calculated from the assimilation model to be close to the true measured value $y_{obs}$.
Thus, the objective function is the minimum distance between $y_{cal}$ and $y_{obs}$ under the L1-norm criterion. The L1-norm and L2-norm are commonly used as objective functions to be minimized [51]. However, objective functions based on the L1-norm criterion have better resistance to noise. The L1-norm constraint function for calculating the Kalman gain matrix K is given by Equation (10), where $y_{cal}$ is the estimated measurement calculated from the state information, and $y_{obs}$ is the new measurement at the current interval $[(k-1)T, kT]$. Based on Table 1 and Equations (2), (7), and (8), $y_{cal}$ can be calculated as in Equation (11), where $x_k^a$ is an expression involving the K matrix. Combining Equations (10) and (11) gives the objective function in Equation (12). To acquire the optimal gain matrix K that minimizes Equation (12), the gradient projection method [52-54] or the conjugate gradient method [55-57] can be used.
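The idea of choosing K so that the re-computed measurement approaches the observed one can be sketched as below. Equations (10) to (12) are not reproduced in this text, so the sketch assumes the standard update $x_k^a = x_k^f + K (y_{obs} - H_k x_k^f)$ and $y_{cal} = H_k x_k^a$. A simple projected subgradient loop stands in for the gradient projection method cited above; the function names, the box bounds on the gain entries, the step size, and the iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def l1_objective(K, x_f, y_obs, H):
    """L1 distance between the re-computed and the observed measurement."""
    x_a = x_f + K @ (y_obs - H @ x_f)      # analysis state for this gain
    y_cal = H @ x_a                        # measurement implied by the analysis
    return float(np.sum(np.abs(y_cal - y_obs)))

def refine_gain(K0, x_f, y_obs, H, step=0.05, iters=200, k_min=0.0, k_max=1.0):
    """Projected subgradient descent on the L1 objective, starting from K0."""
    v = y_obs - H @ x_f                    # innovation
    K, best_K = K0.copy(), K0.copy()
    best = l1_objective(K0, x_f, y_obs, H)
    for _ in range(iters):
        resid = H @ (x_f + K @ v) - y_obs
        grad = np.outer(H.T @ np.sign(resid), v)     # subgradient w.r.t. K
        K = np.clip(K - step * grad, k_min, k_max)   # projection onto the box
        val = l1_objective(K, x_f, y_obs, H)
        if val < best:                     # keep the best iterate seen so far
            best, best_K = val, K.copy()
    return best_K, best

# Assumed scalar case: a nearly vanished gain and a 10-unit innovation.
H = np.array([[1.0]])
x_f = np.array([100.0])
y_obs = np.array([110.0])
K0 = np.array([[0.01]])
start = l1_objective(K0, x_f, y_obs, H)
K_new, final = refine_gain(K0, x_f, y_obs, H)
```

In this scalar sketch the objective falls from 9.9 to 0 and the gain moves from 0.01 to the upper bound of 1, which means the estimate reproduces the measurement exactly. The bounds and the starting gain therefore decide how much of the model forecast is retained.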
To determine whether filter divergence occurs, a commonly used criterion is given by Equation (13) [58], where $v_k = y_k - H_k x_k^f$ and $r \geq 1$. Filter divergence is deemed to occur if Equation (13) is not satisfied.
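A check of this kind can be sketched as follows. Equation (13) is not reproduced in this text; the sketch assumes the widely used innovation test $v_k^T v_k \leq r \cdot \mathrm{tr}(H_k P_k^f H_k^T + R_k)$, which compares the observed innovation energy with its predicted value. The function name `is_diverging` and the numerical values are illustrative assumptions.

```python
import numpy as np

def is_diverging(y, x_f, P_f, H, R, r=1.0):
    """True when the innovation energy exceeds r times its predicted value."""
    v = y - H @ x_f                              # innovation v_k
    predicted = np.trace(H @ P_f @ H.T + R)      # predicted innovation energy
    return bool(float(v @ v) > r * predicted)

# Assumed scalar values.
H = np.array([[1.0]])
P_f = np.array([[0.5]])
R = np.array([[1.0]])
x_f = np.array([100.0])

small_innovation = is_diverging(np.array([101.0]), x_f, P_f, H, R)
large_innovation = is_diverging(np.array([110.0]), x_f, P_f, H, R)
```

Here an innovation of 1 passes the test (1 is below the predicted 1.5), while an innovation of 10 fails it (100 is above 1.5) and would trigger the suppression step. A larger r makes the test more tolerant.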

Numerical Study
To verify the effectiveness of the filter-divergence suppression approach based on the L1-norm constraint, two commonly used and effective methods for suppressing filter divergence were employed for comparison, namely the C-W and A-KF methods. The original KF method [34], C-W method [41], A-KF method [44], and proposed method based on the L1-norm constraint were applied in the numerical example of filter-divergence suppression, as shown in Equation (4) and Figure 1. As selecting a weight or inflation factor in the C-W method is difficult, in the following experiments an adaptive inflation was used to acquire the weight or fading factor to inflate the model error covariance matrix [41]. The results were then compared and analyzed.
To display the results clearly, the cumulative absolute errors average (Cum-AEA) values of each method were calculated as in Equation (14), where $\hat{x}(i)$ denotes the forecasted traffic flow value, and $x(i)$ denotes the corresponding true value. The smaller the values of Cum-AEA, the better the forecasted results. Figure 4 shows the Cum-AEA values acquired using the four methods mentioned previously.
To evaluate the assimilation forecasting results, two commonly used evaluation criteria, the root mean square error (RMSE) [1,6,59,60] and the mean absolute percentage error (MAPE) [1,6], were used and were calculated as in Equation (15), where $\hat{x}(k)$ denotes the forecasted traffic flow value, and $x(k)$ denotes the corresponding true value. The smaller the values of RMSE and MAPE, the better the forecasted results. Table 2 displays the RMSE and MAPE (%) values of the assimilation results acquired from the original KF method, the C-W method, the A-KF method, and the proposed method based on the L1-norm constraint.
As shown in Figure 4 and listed in Table 2, a filter-divergence phenomenon occurred using the KF method. The corresponding RMSE and MAPE values were 0.5647 and 31.81%, respectively. Compared with the results from the KF method, filter divergence was suppressed to various degrees under the C-W method, the A-KF method, and the proposed method based on the L1-norm constraint.
The RMSE and MAPE values acquired from the proposed method were 0.0340 and 0.12%, respectively, which were the smallest values out of all methods. Compared with the C-W and A-KF methods, the RMSE values were reduced by 7.61% and 4.49%, respectively, and the MAPE values were reduced by 25% and 14.29%, respectively. This result indicates that the proposed method based on the L1-norm constraint could suppress the filter divergence problem efficiently with the highest assimilation accuracy. To verify the applicability of the proposed method, an empirical study is presented in the next section.
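The three error measures used in this section can be computed as sketched below. RMSE and MAPE follow their standard definitions. The Cum-AEA formula is not reproduced in this text, so it is assumed here to be the running mean of the absolute errors up to each step. The function names and the sample values are illustrative.

```python
import numpy as np

def rmse(forecast, truth):
    """Root mean square error."""
    f, t = np.asarray(forecast, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((f - t) ** 2)))

def mape(forecast, truth):
    """Mean absolute percentage error, in percent (truth must be non-zero)."""
    f, t = np.asarray(forecast, float), np.asarray(truth, float)
    return float(np.mean(np.abs(f - t) / np.abs(t)) * 100.0)

def cum_aea(forecast, truth):
    """Assumed Cum-AEA: mean absolute error over steps 1..k, for every k."""
    f, t = np.asarray(forecast, float), np.asarray(truth, float)
    abs_err = np.abs(f - t)
    return np.cumsum(abs_err) / np.arange(1, len(abs_err) + 1)

# Made-up sample values, for illustration only.
forecast = [102.0, 198.0, 305.0]
truth = [100.0, 200.0, 300.0]
sample_rmse = rmse(forecast, truth)
sample_mape = mape(forecast, truth)
sample_cum_aea = cum_aea(forecast, truth)
```

For these sample values the absolute errors are 2, 2, and 5, so the RMSE is the square root of 11 and the assumed Cum-AEA series is 2, 2, 3. A Cum-AEA curve that keeps rising over time is the signature of divergence that Figure 4 is meant to show.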

Study Area and Material Description
The datasets used in this study were downloaded from the Highways England website (highwaysengland.co.uk). The data were from a sub-area of the highway between Leeds and Sheffield, England, as shown in Figure 5a. Data from Monday to Sunday for each path were collected. The time interval for the data was 15 min. The data for each path used in the S-DA system contained eight days from consecutive weeks. As the mean traffic flow values are needed in the assimilation models shown in Equations (6) and (7), the data from the first seven days were used for model construction in the S-DA system, and the data from the eighth day were employed to test the effectiveness of the proposed approach. Without loss of generality, traffic flow forecasting results from Monday to Sunday were acquired and analyzed. Furthermore, as traffic flow in the early morning and late night was low and of little concern to traffic management, the forecasting results from 6:00 a.m. to 9:00 p.m. were examined.

Test Design
In this section, the S-DA system based on the proposed L1-norm constraint-based method was applied alongside the original KF method [34], the C-W method with an adaptive inflation [41], and the A-KF method [44] to acquire the traffic flow forecasting results of the paths shown in Figure 5a. The effectiveness of the proposed method was verified by comparing the results. To illustrate the analysis in detail, the forecasting performances of the six paths shown in Figure 5b are listed first.

Results and Discussion
Traffic flow predictions on workday Monday and non-workday Sunday for path 7078 (LM862), which was part of the study area shown in Figure 5b, were computed first using the KF method, the C-W method, the A-KF method, and the proposed L1-norm constraint-based method under the incorrect model shown in Equation (9), as an example to demonstrate the performance of the proposed method in detail. The results are shown in Figures 6 and 7, respectively. They show that filter divergence occurred when only using the KF method under the incorrect model; however, the filter divergence was alleviated to different degrees after using the C-W, A-KF, and L1-norm constraint-based methods. Furthermore, the true values were added to verify the effectiveness of these three filter-divergence suppression methods. The best performance for filter-divergence suppression was obtained using the L1-norm constraint-based method on both Monday and Sunday.
To evaluate the assimilation forecasting results, the two commonly used evaluation criteria shown in Equation (15) were used. The RMSE and MAPE values for the six paths shown in Figure 5b were acquired from the S-DA system based on the KF, C-W, A-KF, and L1-norm constraint-based methods under the incorrect model and the KF method under the correct model (Equation (7)). The values are presented in Figures 8 and 9, respectively. The RMSE and MAPE values calculated from the KF method using the correct model were set as a reference to verify the effectiveness of each filter-divergence suppression method. Figure 9 showed that the MAPE values of each path acquired using the KF method under the incorrect model reached almost 100%, which indicated that filter divergence occurred.
The traffic flow forecasting performances improved to different degrees when using the C-W, A-KF, and L1-norm constraint-based methods. The RMSE and MAPE values of each path on Monday to Sunday acquired using the C-W, A-KF, and L1-norm constraint-based methods were smaller than those from the KF method under the incorrect model. Furthermore, compared with the results from the C-W and A-KF methods, the results obtained using the proposed L1-norm constraint-based method were much closer to those from the KF method under the correct model. This indicates the effectiveness of the proposed L1-norm constraint-based method.
The average RMSE and MAPE values are listed in Tables 3 and 4, respectively. Compared with the KF method under the incorrect model, the average RMSE and MAPE values decreased significantly, especially those obtained using the proposed L1-norm constraint-based method. In its best case, the L1-norm constraint-based method reduced the average RMSE by 823.3 (from 901.86 to 78.56), a relative accuracy improvement of 91.29%. The corresponding average MAPE decreased by 92.51 percentage points (from 99.75% to 7.24%).
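The improvement figures quoted above can be re-derived with a few lines of arithmetic. The sketch assumes that the relative accuracy improvement is the RMSE reduction divided by the RMSE of the KF method under the incorrect model, and that the MAPE figure is the plain difference between the two percentages. The function name `relative_improvement` is illustrative.

```python
def relative_improvement(before, after):
    """Reduction of an error measure relative to its starting value, in percent."""
    return (before - after) / before * 100.0

# Values quoted in the text for the six paths.
rmse_reduction = 901.86 - 78.56
rmse_gain = relative_improvement(901.86, 78.56)
mape_drop_points = 99.75 - 7.24
```

Under these assumptions the RMSE reduction is 823.30 and the relative improvement rounds to 91.29%, which matches the reported figures. The MAPE figure of 92.51 matches the difference between the two percentages rather than a relative reduction.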
Table 5 shows that the average RMSE and MAPE values acquired from the proposed L1-norm constraint-based method were much closer to those from the KF method under the correct model than the values from the other two filter-divergence suppression methods. The smallest average RMSE and MAPE differences were 2.07% and 0.47%, respectively. The traffic flows for all of the paths shown in Figure 5a were predicted using the S-DA system based on the KF, C-W, A-KF, and proposed L1-norm constraint-based methods. Tables 6 and 7 show the average RMSE and MAPE values, respectively, of each path from Monday to Sunday. The average MAPE values for all paths from Monday to Sunday obtained using the KF method under the incorrect model were all above 97%. This indicates that filter divergence occurred. The average MAPE values acquired from the C-W, A-KF, and L1-norm constraint-based methods decreased by different amounts. For the sake of analysis, the forecasting results for all paths on workday Monday and non-workday Saturday were taken as examples. The best filter-divergence suppression performances were obtained using the L1-norm constraint-based method. The average RMSE value from the L1-norm constraint-based method decreased by 524.82 (from 596.29 to 71.47), and the relative accuracy improved by 88.01% on Monday compared with the results from the KF method under the incorrect model. The corresponding relative accuracy improved by 90.87% on Saturday. Furthermore, the average MAPE value from the L1-norm constraint-based method decreased by 89.05 percentage points (from 98.39% to 9.34%) on Monday and by 89.64 percentage points (from 98.50% to 8.86%) on Saturday compared with those obtained using the KF method under the incorrect model. Table 8 presents the differences in the average RMSE and MAPE values of all paths for the filter-divergence suppression methods under the incorrect model and the KF method under the correct model.
The average RMSE and MAPE values acquired using the L1-norm constraint-based method were much closer to those obtained using the KF method under the correct model. On Monday (workday), the differences in the average RMSE and MAPE values between the C-W method and the KF method under the correct model were 177.69 and 24.44%, respectively. The corresponding differences between the A-KF method and the KF method under the correct model were 100.44 and 14.49%, respectively. For the proposed L1-norm constraint-based method, the differences were only 15.77 and 2.22%, respectively. Similar results were obtained for Saturday (non-workday), when the smallest average RMSE and MAPE differences, 11.77 and 2.44%, respectively, were again obtained with the L1-norm constraint-based method.

Overall, the results in Tables 3-8 suggest that the L1-norm constraint-based method outperformed the C-W and A-KF methods in suppressing filter divergence and improving the short-term traffic flow forecasting accuracy. This indicates that the proposed L1-norm constraint-based method is effective for suppressing filter divergence. Unlike the C-W method, which suppresses filter divergence by adding an inflation factor to the model covariance matrix, the proposed method directly and adaptively adjusted the gain matrix K based on the actual conditions. It therefore avoided both the difficulty of selecting a weight or inflation factor and the propagation of errors from the model or measurement error covariance into the estimate of K. Moreover, compared with the A-KF method, the proposed method was simpler and required less storage space. Because it adjusted the gain matrix directly rather than through the model error covariance matrix, it also prevented the uncertainties of the indirect operations in the A-KF method from affecting the assimilation results.
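The contrast between inflating the covariance and adjusting the gain directly can be illustrated with a scalar sketch. It assumes a directly observed state (H = 1), a multiplicative inflation factor, and a hand-picked gain; all numbers are hypothetical, and the fixed gain `k_direct` merely stands in for a directly adjusted gain and does not reproduce the paper's L1-norm rule.

```python
def kalman_gain(p_forecast, r_obs):
    """Scalar Kalman gain K = P / (P + R) for a directly observed state (H = 1)."""
    return p_forecast / (p_forecast + r_obs)

def analysis(x_forecast, y_obs, gain):
    """Blend forecast and measurement: x_a = x_f + K * (y - x_f)."""
    return x_forecast + gain * (y_obs - x_forecast)

# Illustrative numbers only (not taken from the paper's data).
x_f, y = 100.0, 500.0   # biased model forecast vs. measured flow
p_f, r = 1.0, 25.0      # overconfident forecast variance, measurement variance

k_plain = kalman_gain(p_f, r)              # filter trusts the wrong model
k_inflated = kalman_gain(100.0 * p_f, r)   # hypothetical inflation factor of 100
k_direct = 0.95                            # hypothetical directly chosen gain

for label, k in [("plain", k_plain), ("inflated", k_inflated), ("direct", k_direct)]:
    print(f"{label:9s} K = {k:.3f}  analysis = {analysis(x_f, y, k):.1f}")
```

In this toy setting, inflation affects the analysis only through its effect on K, so the result depends on choosing a suitable inflation factor, whereas setting K directly removes that intermediate step.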

Conclusions
In this study, an approach for filter-divergence suppression in an S-DA system for short-term traffic flow prediction was proposed. Based on an L1-norm criterion, the approach moves the simulated values from the state model toward the measured values when divergence is about to occur. The proposed L1-norm constraint-based method was compared with two other commonly used filter-divergence suppression methods, the C-W and A-KF methods, in short-term traffic flow forecasting. The empirical results confirmed the following.

1. The proposed approach based on the L1-norm constraint can suppress the filter-divergence phenomenon that occurs in the KF assimilation method of the S-DA system.

2. The proposed approach based on the L1-norm constraint achieved a higher assimilation accuracy when suppressing filter divergence than the other two methods.
Therefore, the proposed L1-norm constraint-based method is feasible and effective for filter-divergence suppression in an S-DA system for short-term traffic flow prediction. In future work, the proposed method will be applied to other fields to broaden its range of applications.