Short-Term Traffic Flow Forecasting via Multi-Regime Modeling and Ensemble Learning

Abstract: Short-term traffic flow forecasting is crucial for proactive traffic management and control. One key issue associated with the task is how to properly define and capture the temporal patterns of traffic flow. A feasible solution is to design a multi-regime strategy. In this paper, an effective approach to forecasting short-term traffic flow based on multi-regime modeling and ensemble learning is presented. First, to properly capture the different patterns of traffic flow dynamics, a regime identification model based on probabilistic modeling was developed. Each identified regime represents a specific traffic phase, and was used as the representative feature for the forecasting modeling. Second, a forecasting model built on an ensemble learning strategy was developed, which integrates the forecasts of multiple regression trees. The traffic flow data over 5-min intervals collected from four I-80 freeway segments, in California, USA, was used to evaluate the proposed approach. The experimental results show that the identified regimes explain the different traffic phases well, and play an important role in forecasting. Furthermore, the developed forecasting model outperformed four typical models in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE) on three traffic flow measures.


Introduction
Traffic congestion has substantial negative impacts on society, such as high travel costs, increased anxiety, and polluted air. To alleviate traffic congestion, researchers and authorities all over the world have explored a wide range of feasible solutions. Among them, the intelligent transportation system (ITS) has become the most popular and effective one. By effectively and efficiently collecting, processing, and disseminating traffic data, ITS helps traffic researchers and practitioners make reasonable and reliable decisions, and has achieved great success during the past decade [1][2][3][4][5].
Traffic flow describes the traffic conditions over certain time intervals using representative measures, for example, flow rate, speed, and density. As the values of the measures are continuous over time, traffic flow is commonly recorded and described in a time series format. Short-term traffic flow forecasting, in which the forecast horizon is no more than 30 min (e.g., 5, 10, or 15 min), is a fundamental function in ITS. One key issue in traffic flow forecasting is how to properly define and capture the temporal patterns of traffic flow [6]. To address this issue, two categories of strategies are commonly adopted. The first is a detrending strategy, which separates the trends of the traffic flow time series from the remaining fluctuations and develops distinct models to describe the trends and fluctuations, respectively. The second is a multi-regime strategy, which assumes that traffic flow dynamics contain different patterns. Based on this assumption, different regimes are first identified according to the distinct patterns. After that, the pattern of each regime is characterized and captured by a separate model. A comprehensive comparison of the two strategies was conducted by Li et al. [6]. They concluded that models based on the multi-regime strategy are capable of identifying the local trends in the vehicle count time series and describing its fluctuations more effectively. However, the overall forecasting errors of the multi-regime models were not notably reduced in their experiments because of the errors introduced by the regime identification model.
As the above analyses show, the forecasting performance of a multi-regime model is closely related to two major aspects. First, a proper regime identification model needs to be built in order to assign the correct regime to each traffic flow observation. Second, an accurate multi-regime forecasting model is required to capture the patterns of the different regimes of traffic flow. Motivated by these two requirements, this paper presents an effective approach to forecasting short-term traffic flow based on multi-regime modeling and ensemble learning, which includes two key procedures. First, to properly capture the different patterns of traffic flow dynamics, a regime identification model based on a probabilistic approach is developed. Each regime represents a specific and homogeneous traffic condition. The probabilistic approach builds on the concepts of the Markov Process and Markov Chain. Second, a forecasting model using an ensemble learning strategy is developed, which integrates the outputs of multiple individual learners to produce the final forecasts. The ensemble learning strategy can effectively improve the accuracy of the forecasting model.
The organization of this paper is as follows. The next section provides a comprehensive review of the existing studies associated with traffic flow forecasting. Subsequently, the details of the proposed approach are given. After that, the evaluated data sets are described. Experiments are then carried out to evaluate the proposed approach. Lastly, the conclusions and future work are discussed.

Literature Review
During the past decades, researchers have explored and developed a vast number of traffic flow forecasting models.
As traffic flow is recorded in a time series format, which can be effectively described and modeled by typical time series models, the autoregressive integrated moving average (ARIMA) model was applied by Ahmed and Cook [7] to establish a short-term traffic flow forecasting model. The model can accurately capture the local trends in traffic flow. Nevertheless, given the cyclical characteristic of traffic flow, the first difference of a traffic flow time series will not produce a stationary series, leading to inaccurate forecasts. In view of this, a seasonal autoregressive integrated moving average (SARIMA) model was developed by Williams et al. [8] to forecast urban freeway traffic flow. The results indicated that the SARIMA models outperform the nonparametric regression (NPR), artificial neural network (ANN), and historical average models. Xia et al. [9] designed a multistep forecaster based on a SARIMA model and an embedded adaptive Kalman filter model. The advantage of the designed model is that it is easy to implement and is computationally inexpensive. Kalman filtering is another efficient model, which uses the state of the previous moment to compute the best estimate of the state at the current moment. Some researchers extended Kalman filtering to forecast short-term traffic volume [10,11]. The ARIMA, SARIMA, and Kalman filter models are capable of accurate and timely forecasting of stable traffic flow, but none of them is able to capture the nonlinear patterns in traffic flow.
To effectively capture the nonlinear patterns in traffic flow, a series of more elaborate models were proposed. The NPR models [12,13] were presented to avoid the shortcomings of the parametric models (e.g., ARIMA and SARIMA) and to make more accurate forecasts in non-stationary traffic flow. The main advantages of the NPR models are that they do not need a priori assumptions, and that they offer high flexibility and intuitive expression [14]. It should be noted that the NPR models are sensitive to the outliers that commonly occur in traffic flow data. Recently, support vector regression (SVR) has drawn increasing attention, and has been successfully applied in short-term traffic forecasting [15,16]. It should be noted that the standard SVR model requires a blind and time-consuming search procedure to find suitable hyperparameters. To improve the computational efficiency, some more advanced SVRs have been presented [17,18]. Because of their capability to capture the uncertainty and nonlinearity in traffic flow, ANN models were designed for short-term traffic flow forecasting tasks. Although each neuron in an ANN has a very simple function, the network, composed of a large number of neurons, can possess significantly powerful functions [19,20]. In the past decade, deep learning models, in particular deep neural networks (DNN), have achieved great success in various domains. Accordingly, several researchers presented a number of traffic flow forecasting models based on DNN. Lv et al. [21] put forward a deep architecture model using autoencoders as building blocks to represent traffic flow features. Experimental results demonstrated the superiority of the DNN model. Polson and Sokolov [22] showed that deep learning architectures are able to capture the nonlinear spatiotemporal effects resulting from the transitions between free flow, breakdown, recovery, and congestion in traffic flow. More recently, Do et al. [23] introduced spatial and temporal attention mechanisms into the DNN model to capture the spatial dependencies between road segments and the temporal dependencies between time steps.
While the models mentioned above have been proven to be effective in short-term traffic flow forecasting, an issue that cannot be ignored is that their working mechanisms are not easily understood by decision-makers. That is, the forecasting results lack a convincing interpretation. Several researchers have noticed this problem, and tried to solve it by adopting some prior feature selection strategies [15,16]. Similarly, we presented a short-term traffic flow forecasting model using a data-driven feature selection strategy and bias-corrected random forests [24], which shows an excellent forecasting performance and good interpretability.
As the well-known "No Free Lunch" theorem [25] states, no single model works best for every problem. Furthermore, the above analyses indicate that each of the analyzed models has its own advantages and limitations. To compare different forecasting models from distinct viewpoints, some researchers conducted comprehensive and in-depth literature reviews on traffic flow forecasting [1,[26][27][28]. Interested readers can refer to these studies for more comparison details of various models.
A key issue in traffic flow forecasting is how to properly define and capture the temporal patterns of traffic flow. To effectively handle this issue, two categories of coping strategies are commonly adopted. The first is a detrending strategy, which is to build forecasting models by separating the trends of traffic flow time series from the remaining fluctuations, and developing distinct models to describe the trends and fluctuations, respectively. For instance, the SARIMA model assumes that there is a weekly or monthly cyclic trend in the traffic flow [29]. Chen et al. [30] studied different forecasting models built on either the original traffic flow sequence or the remaining time series. They discovered that in the latter case, a significant performance improvement can be achieved. The second is a multi-regime strategy, which assumes that different patterns are contained in traffic flow dynamics. In some relevant studies [31][32][33], researchers divided the traffic flow data into several regimes according to distinct patterns, and established separate models to characterize the traffic flow dynamics in each regime. The associated results demonstrate the potential of the multi-regime strategy. A comprehensive comparison study with the two strategies was conducted by Li et al. [6]. They indicated that the multi-regime strategy could be utilized to improve the forecasting performance when using an accurate regime identification model. With this in mind, in the subsequent sections, we will develop an effective traffic flow forecasting model based on a multi-regime strategy, which establishes the regime identification model by probabilistic modeling. In addition, an ensemble learning mechanism is introduced to further improve the performance of the developed forecasting model.

Overall Framework
The overall framework of the proposed approach is illustrated in Figure 1. As seen, the framework consists of two major procedures. In the first procedure, multiple regimes of traffic flow are identified using a probabilistic approach. Each regime characterizes a pattern that describes a homogeneous traffic condition during the study time period. The identified regimes are then used as the representative features for the forecasting modeling. In the second procedure, the training data set and test data set are separately established according to the constructed features. Subsequently, the forecasting model is built based on the training set and an ensemble learning strategy, and utilized to produce the forecasts of the test data set.

Regime Identification Based on Probabilistic Modeling
Traffic behaviors under various traffic conditions can be treated as a stochastic process [34]. In view of this, probabilistic modeling is conducted in this study to properly identify the associated regimes of traffic flow. The hidden Markov model (HMM) is one of the most powerful algorithms in probabilistic modeling [35,36], and is thus used here. An HMM describes two stochastic processes based on the concepts of the Markov Process and Markov Chain. The first stochastic process is hidden, while the second is observable. The hidden stochastic process is established to explain the observations of the observable stochastic process. In the regime identification task, the regimes are described as the hidden states in the HMM, and the hidden process depicts how the regimes transition into each other. Meanwhile, the values of the traffic flow measures are described as the observations in the HMM, and the observable process describes how the collected observations of the measures evolve over time.

Regime Identification
In the study, the regime identification model based on HMM is defined and characterized with the following elements.

(1) A set of hidden states $S = \{s_1, s_2, \ldots, s_N\}$, where each hidden state $s_j$ ($1 \le j \le N$) describes the $j$th regime of traffic flow. $N$ is the number of hidden states. Specifically, the state at time $t$ is denoted as $q_t$.
(2) A set of observation symbols, where the $k$th symbol ($1 \le k \le M$) describes the $k$th observation of a traffic flow measure. $M$ is the number of observation symbols.
(3) A state transition probability distribution $A = \{a_{ij}\}$, where $A$ describes the evolving probability distribution of the analyzed traffic flow from one of the defined regimes to another. Specifically, $a_{ij}$ describes the probability of the analyzed traffic flow evolving from regime $s_i$ to regime $s_j$.
(4) The observation symbol probability distribution $B = \{b_j(k)\}$ in state $s_j$, where $B$ describes the generation probability of the observations of the analyzed traffic flow in each regime. Specifically, $b_j(k)$ describes the generation probability of the $k$th observation of the analyzed traffic flow in regime $s_j$.
(5) The initial state distribution $\pi = \{\pi_i\}$, where $\pi$ describes the probability of the initial regime of the analyzed traffic flow. Specifically, $\pi_i$ is the probability of the initial regime being $s_i$.
Appl. Sci. 2020, 10, 356
Given an observation sequence of a traffic flow measure $O = O_1, O_2, \ldots, O_T$, the regime identification model aims to identify the corresponding regime sequence $Q = q_1, q_2, \ldots, q_T$, based on the determined values of $N$, $M$, $A$, $B$, and $\pi$.
As mentioned above, the hidden state set $S$ defines the regimes of traffic flow. Each regime describes a homogeneous traffic condition. According to the fundamental diagram in traffic flow theory and a series of relevant studies [37][38][39][40][41], five distinct regimes could be defined in practice, as shown in Figure 2. The first two regimes, Regime 1 and Regime 2, describe two kinds of free-flow traffic conditions, while the last two regimes, Regime 4 and Regime 5, represent two categories of congested traffic conditions. The third regime, Regime 3, defines a transition condition from the free-flow stage to the congested stage. Therefore, the number of hidden states in the study, $N$, is set as five.
The observation symbols in the identification model are associated with the values of the traffic flow measures. For example, the speed measure in short-term traffic flow forecasting is usually recorded every 5 min, and its value range determines the representation space of the observation symbols. However, in practice, the value range is usually large, resulting in many observation symbols and hence a high computational complexity of the model. To tackle the problem, a simple discretization strategy is employed, in which the measure values are discretized into multiple relatively large and equal numerical intervals (e.g., 0 ∼ 5, 5 ∼ 10, 10 ∼ 15). For the other two measures, a similar strategy can be adopted. Based on the strategy, $M = (o_{\max} - o_{\min}) / o_{\mathrm{int}}$, where $o_{\max}$ and $o_{\min}$ are the maximum and minimum of the measure values, respectively, and $o_{\mathrm{int}}$ is the size of the divided numerical interval.
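The discretization strategy can be illustrated with a minimal Python sketch. The function name `discretize`, the sample speeds, and the 5-unit interval are hypothetical. Rounding the symbol count up with `ceil` and placing the maximum value in the last interval are assumptions made here so that every value receives a valid symbol index; the text does not state how the boundaries are handled.

```python
import math

def discretize(values, o_int):
    """Map each measure value to the index of its equal-width interval."""
    o_min, o_max = min(values), max(values)
    # Number of observation symbols M. Rounding up is an assumption so that
    # M is an integer; max(1, ...) covers a constant series.
    m = max(1, math.ceil((o_max - o_min) / o_int))
    symbols = []
    for v in values:
        k = int((v - o_min) // o_int)
        symbols.append(min(k, m - 1))  # assumption: o_max joins the last interval
    return symbols, m

# Hypothetical 5-min speed observations
speeds = [62.0, 58.5, 41.2, 17.9, 5.0]
symbols, m = discretize(speeds, o_int=5.0)
print(symbols, m)  # [11, 10, 7, 2, 0] 12
```

A larger interval size gives fewer symbols and a smaller emission matrix, at the cost of a coarser description of the measure.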
The remaining three parameters, $A$, $B$, and $\pi$, need to be determined by training the probabilistic model using the traffic flow observations. For simplicity, we denote $\lambda = (A, B, \pi)$. To get the optimal $\lambda$ according to the given observation sequence $O = O_1, O_2, \ldots, O_T$, we defined and solved the training problem by maximizing the probability $P(O \mid \lambda)$ with an iterative procedure. A modified Baum-Welch algorithm was introduced to learn the best $\lambda$. The details of the algorithm are omitted here. Interested readers can refer to [42] for more details.
The determined $\lambda$ can be further utilized to uncover the hidden state sequence $Q$ based on the given observation sequence $O$. To this end, we implemented the Viterbi algorithm [43]. The algorithm aims to optimize the following objective:

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_{t-1}, q_t = s_i, O_1, O_2, \ldots, O_t \mid \lambda),$$

where $\delta_t(i)$ is the highest probability along a single path at time $t$. To obtain the state sequence, we needed to iteratively solve the following optimization problem by considering each $t$ and $i$:

$$\delta_{t+1}(j) = \left[\max_{1 \le i \le N} \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}).$$

The complete computing procedure of the Viterbi algorithm can be seen in the literature [44]. In summary, through the above modeling process, the regimes of traffic flow can be properly identified. The next section builds the forecasting model by using the regimes to construct the representative features and data sets for the model training.
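To make the decoding step concrete, the following is a minimal Python sketch of the standard Viterbi recursion in log space. It is not the authors' implementation: the function name `viterbi` and the two-state toy parameters (free flow versus congested, high speed versus low speed) are hypothetical and chosen only for illustration, whereas the study uses five regimes.

```python
import math

def viterbi(obs, pi, A, B):
    """Return the most probable hidden-state path for a symbol sequence.

    obs: observation symbol indices; pi[i]: initial probabilities;
    A[i][j]: transition probabilities; B[j][k]: emission probabilities.
    """
    n = len(pi)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # delta[i]: best log-probability of any path ending in state i
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            ptr.append(best_i)
            delta.append(prev[best_i] + log(A[best_i][j]) + log(B[j][o]))
        back.append(ptr)
    # Backtrack from the most probable final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy parameters (assumed): state 0 = free flow, state 1 = congested;
# symbol 0 = high speed, symbol 1 = low speed.
pi = [0.9, 0.1]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.1, 0.9]]
path = viterbi([0, 0, 1, 1, 1], pi, A, B)
print(path)  # [0, 0, 1, 1, 1]
```

Working with log-probabilities avoids numerical underflow on long sequences, which matters for a full day of 5-min observations.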

Feature Construction
A key step in forecasting modeling is to determine the representative features. For the multi-regime model, the identified regimes are treated as the most important features, and will be used in the modeling. In addition, the temporal correlations of the forecasted traffic flow measure and the interactive correlations of the multiple traffic flow measures are also two significant factors that can be used to improve the forecasting accuracy [14,24,45]. With this in mind, the time-lagged and interactive features of the traffic flow are also added to the representative feature pool. Table 1 lists the constructed features for the forecasting modeling. Note that in the study, the time interval is set as 5 min and the forecasted time is denoted as $t$.
Table 1. Constructed features for the forecasting modeling (table body not recovered; the last listed feature is occupancy at time $t - 3$).
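As an illustration of how such a feature set might be assembled, the sketch below builds one training row per forecasted time $t$ from the regimes and the measures at the three preceding intervals. The function name `build_lagged_dataset`, the column order, and the sample values are assumptions; the authoritative feature list is the one in Table 1.

```python
def build_lagged_dataset(series, regimes, target, lags=3):
    """Build (features, response) pairs for one-step-ahead forecasting.

    series:  dict mapping measure name -> list of values per 5-min interval
    regimes: list of identified regime labels, same length as each series
    target:  name of the measure to forecast at time t
    """
    names = sorted(series)
    X, y = [], []
    for t in range(lags, len(regimes)):
        row = []
        for lag in range(1, lags + 1):
            row.append(regimes[t - lag])                   # regime at t - lag
            row.extend(series[m][t - lag] for m in names)  # measures at t - lag
        X.append(row)
        y.append(series[target][t])
    return X, y

# Hypothetical observations for five consecutive intervals
series = {
    "flow": [10, 20, 30, 40, 50],
    "speed": [60, 59, 58, 40, 30],
    "occupancy": [1, 2, 3, 8, 12],
}
regimes = [1, 1, 2, 3, 4]
X, y = build_lagged_dataset(series, regimes, target="flow")
print(X[0], y[0])  # [2, 30, 3, 58, 1, 20, 2, 59, 1, 10, 1, 60] 40
```

The first three intervals produce no rows because their lagged values are not all available.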

Forecasting Modeling via Ensemble Learning
Given the historical traffic flow observations and the constructed features, the training data set and the test data set can be separately established. After that, the model to forecast the short-term traffic flow needs to be built. In this study, a forecasting model based on an ensemble learning strategy is developed using the obtained training set. Ensemble learning is a very popular and useful technique in machine learning [46], which can be utilized to improve the generalization performance of the forecasting models, and has achieved great success in various domains. The training and test procedure of the developed forecasting model is as follows:
Step 1: Randomly draw an instance set from the established training data set, denoted as $D_{NT}$.
Step 2: Use $D_{NT}$ to train an unpruned regression tree, denoted as $T_\alpha$ ($1 \le \alpha \le K$). The training procedure is as follows: for each node of the regression tree, $\phi$, randomly sample $\beta$ features from the constructed features, and use them to compute the best split that has the maximum mean decrease in impurity. The impurity of the $r$th ($1 \le r \le \beta$) sampled feature, $x_r$, associated with node $\phi$ is defined as follows:

$$\mathrm{Imp}(x_r, \phi) = \frac{1}{n_\phi} \sum_{x_r \in \phi} \left( y_r(\phi) - \bar{y}(\phi) \right)^2,$$

where $x_r \in \phi$ is the value of $x_r$ of the training samples, which falls into the range of the split at node $\phi$; $y_r(\phi)$ is the response value of the training samples associated with $x_r \in \phi$; $\bar{y}(\phi)$ is the mean of $y_r(\phi)$; and $n_\phi$ is the number of the training samples associated with $x_r \in \phi$.
In the above training procedure, if $\phi$ is a leaf node, $\bar{y}(\phi)$ is set as the forecast of the node.
Step 3: Repeat Step 1 and Step 2 $K$ times to train $K$ regression trees.
Step 4: Given an instance in the established test data set $D_E$, denoted as $I_E$ ($I_E \in D_E$), the forecasts from the $K$ regression trees are first computed. Assume the forecasts are $f_1(I_E), f_2(I_E), \ldots, f_K(I_E)$, respectively. The final forecast of the instance is calculated as the median of the forecasts of the $K$ regression trees, as follows:

$$f(I_E) = \operatorname{median}\{ f_1(I_E), f_2(I_E), \ldots, f_K(I_E) \}.$$

Note that the median rule instead of the average rule is used to combine the outputs of the regression trees, because the median rule can yield more robust forecasts in the cases where there are some outliers in the leaf nodes [47].
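The four steps can be sketched in plain Python as follows. This is a simplified illustration, not the authors' code: drawing the instance set by bootstrap sampling with replacement, scoring splits by the residual sum of squares of the two child nodes, and all function names and toy data are assumptions made here.

```python
import random
import statistics

def mean(vals):
    return sum(vals) / len(vals)

def rss(vals):
    """Residual sum of squares around the mean of a non-empty list."""
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(X, y, beta, rng):
    """Grow an unpruned regression tree; each node samples beta features."""
    if len(set(y)) == 1:
        return y[0]                                 # pure node becomes a leaf
    best = None
    for r in rng.sample(range(len(X[0])), beta):
        for thr in sorted({row[r] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[r] <= thr]
            right = [v for row, v in zip(X, y) if row[r] > thr]
            score = rss(left) + rss(right)          # lower means purer children
            if best is None or score < best[0]:
                best = (score, r, thr)
    if best is None:                                # sampled features are constant
        return mean(y)
    _, r, thr = best
    lo = [i for i, row in enumerate(X) if row[r] <= thr]
    hi = [i for i, row in enumerate(X) if row[r] > thr]
    return (r, thr,
            build_tree([X[i] for i in lo], [y[i] for i in lo], beta, rng),
            build_tree([X[i] for i in hi], [y[i] for i in hi], beta, rng))

def predict_tree(tree, row):
    while isinstance(tree, tuple):                  # internal nodes are tuples
        r, thr, left, right = tree
        tree = left if row[r] <= thr else right
    return tree

def fit_ensemble(X, y, K=25, beta=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(K):
        # Step 1 (assumed): bootstrap sample of the training set
        idx = [rng.randrange(len(y)) for _ in y]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], beta, rng))
    return trees

def predict_ensemble(trees, row):
    # Step 4: combine the K tree forecasts with the median rule
    return statistics.median(predict_tree(t, row) for t in trees)

# Toy data (assumed): the response jumps from 0 to 10 at x = 10
X = [[float(i)] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
trees = fit_ensemble(X, y)
low, high = predict_ensemble(trees, [2.0]), predict_ensemble(trees, [17.0])
```

Because the median ignores the magnitude of extreme tree outputs, a few trees with outlying leaf values do not shift the combined forecast the way they would under averaging.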

Data Description
The traffic flow data collected from four dual-loop detectors on I-80 freeway segments were used to evaluate the proposed approach. The detectors recorded the traffic flow observations of three traffic measures, namely the flow rate, speed, and occupancy, every 30 s. In the study, the lane-by-lane traffic flow observations were aggregated into 5-min intervals at each detector station in order to obtain stable traffic data. The data were acquired through the California Performance Measurement System (PeMS). Figure 3

Model Calibration
The first step of the developed regime identification models associated with the four detector stations was to determine the model parameters. To this end, the models were implemented and calibrated using the whole study data set. The determined model parameters $\pi$ and $A$ are listed in Tables 2 and 3, respectively. The statistical means of the traffic measures for each identified regime are given in Table 4.
The parameter $\pi$ describes the probability distribution of the initial state. As seen from Table 2, for each detector station, the values of four of the five states are 0 or close to 0, while the value of the other state is 1.0. This might be because the data collection at each detector station started at 00:00, when the traffic on the road is under a free-flow condition. Another significant model parameter, $A$, describes the transition probability distribution from one state to another. The numbers in the last five columns of Table 3 are the determined probabilities. Take Station 400976, for example. The probability of remaining in the same state was much higher than that of moving to a different state. This indicates that the traffic conditions usually changed only after a certain time period. For State 1, the most likely next state was State 2, meaning that the traffic at times evolved from slightly crowded conditions to completely free-flow conditions. For State 2, the transition probability to any other state was very small. Occasionally, it changed into State 1 (slightly crowded) or State 5 (very congested). The direct change from free-flow conditions to very congested conditions might be caused by traffic crashes. For State 3, the possible next states were State 5 and State 4, implying that the traffic evolved from transition conditions to more congested conditions. For State 4, a transition to any of the other four states was possible. For State 5, the traffic might change to State 2, State 3, or State 4, meaning that the congestion was gradually dissipating. For the other stations, similar patterns could be identified. However, there were some slight differences in how the different traffic states changed into each other.

Identification Results Analysis
To observe the statistical characteristics of the traffic flow associated with the different regimes, the mean of each measure at each detector station for each regime was calculated, as given in Table 4. Meanwhile, the second column of the table provides the corresponding regime in Figure 2, which is associated with each hidden state in the identified model. From the table, we can see that each regime has distinct traffic flow characteristics. Regime 1 has the minimum flow rate mean and occupancy mean, but the maximum speed mean. Compared with Regime 1, Regime 2 shows a higher flow rate mean and occupancy mean, but a lower speed mean. In Regime 3, the traffic flow shows the maximum flow rate mean and moderate occupancy and speed means. In Regime 4, the flow rate mean begins to drop, while the occupancy mean continues to increase and the speed mean continues to decrease. Finally, in Regime 5, the occupancy mean reaches its maximum and the speed mean its minimum. Meanwhile, the flow rate mean reaches a level similar to that of Regime 2. Figure 4 provides the regime identification results at each detector station. In each subfigure, the data points of each color represent an identified regime. Comparing Figure 2 with Figure 4, we can see that the proposed approach can properly identify different regimes of traffic flow.

Feature Importance Analysis
To check the importance of the constructed features, a measure named the increased node purity [48] (shown as IncNodePurity in Figure 5) was calculated for each feature. The measure computes the total decrease in node impurity from splitting on the given feature, and then averages it over all of the component regression trees. The node impurity is quantified by the residual sum of squares and is calculated only at the nodes at which that feature is used for the split. Based on the measure, the importance of each feature associated with each station was calculated and ranked. As similar results were obtained for all four stations, for illustration purposes, we only provide the results associated with Station 400081, as shown in Figure 5. From the figure, we can observe that the identified regimes at different times play an important role in forecasting. In particular, the regime at time $t - 1$ is more important than the regimes at time $t - 2$ and time $t - 3$. The most important traffic flow measure differs between forecasting tasks. For example, if the forecasted measure is the flow rate, the most important feature is the flow rate at time $t - 1$. For the other two forecasting tasks, the most important feature likewise corresponds to the forecasted traffic flow measure at time $t - 1$. Taken together, these results show that it is necessary to use the regimes as the representative features in traffic flow forecasting tasks.

Performance Measures
The first step in developing the regime identification models for the four detector stations was to determine the model parameters. To this end, the models were implemented and calibrated using the whole study data set. The determined model parameters π and A are listed in Tables 2 and 3, respectively. The statistical means of the traffic measures for each identified regime are given in Table 4.
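To evaluate the developed forecasting models, two performance measures were employed: root mean square error (RMSE) and mean absolute percentage error (MAPE). The measures are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n_s} \sum_{\gamma=1}^{n_s} \left( y_\gamma - \hat{y}_\gamma \right)^2}$$

$$\mathrm{MAPE} = \frac{1}{n_s} \sum_{\gamma=1}^{n_s} \left| \frac{y_\gamma - \hat{y}_\gamma}{y_\gamma} \right| \times 100\%$$

where $y_\gamma$ is the true value of the $\gamma$th sample of the considered traffic flow measure, $\hat{y}_\gamma$ is the forecasted value of the $\gamma$th sample of the considered traffic flow measure, and $n_s$ is the number of forecasted samples.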
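As an illustration, the two measures can be computed as in the sketch below, which assumes their usual definitions (the square root of the mean squared error, and the mean absolute error relative to the true value, in percent). The function names and the example values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over paired true and forecasted values."""
    n = len(y_true)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when a true value is zero, so zero observations
    would have to be excluded or handled separately.
    """
    n = len(y_true)
    return 100.0 * sum(abs(y - yh) / abs(y) for y, yh in zip(y_true, y_pred)) / n


# Made-up flow rates for illustration only.
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
err_rmse = rmse(observed, forecast)  # sqrt((100 + 400) / 2)
err_mape = mape(observed, forecast)  # both relative errors are 10%
```

RMSE keeps the unit of the measure, while MAPE is unit-free, which is why the two are reported together for flow rate, occupancy, and speed.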

Forecasting Results Analysis
In the study, a one-step forecasting task was carried out. The modeled traffic flow measures include flow rate, occupancy, and speed. Four forecasting models were implemented and compared: ARIMA, Regression Tree (RT), Ensemble Regression Trees (ERT), and the ensemble regression trees based on multi-regime modeling (ERT-MRM) developed in the study. As a typical time series model, ARIMA has been successfully applied in various domains [7] because of its solid theoretical foundations and good ability to capture local trends in stationary time series. It is also commonly used as a baseline in traffic flow forecasting tasks. Three parameters need to be determined in the ARIMA model: the number of autoregressive terms, p; the number of nonseasonal differences needed for stationarity, d; and the number of lagged forecast errors in the forecasting equation, q. In this paper, the three parameters were determined using the Akaike information criterion (AIC); that is, the values of p, d, and q that give the forecasting model the lowest AIC were chosen. RT is another popular model for forecasting. It has been shown to be competitive in many applications while offering good interpretability [48]. In this study, it was implemented as a pruned regression tree in order to achieve good generalization performance. ERT is an ensemble version of the RT model. As mentioned, ensemble learning can improve the accuracy and robustness of the forecasting system, so ERT was added to the comparison. The number of trees in ERT was set to 100 to balance the accuracy and efficiency of the model. The three models mentioned above do not use a multi-regime strategy, and hence the regimes were not used as representative features in their modeling. In contrast, the ERT-MRM developed in this paper was implemented using a multi-regime strategy, and its number of component trees was also set to 100 for a fair comparison.
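The ensemble idea behind ERT can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes bootstrap aggregation (each tree is trained on a sample drawn with replacement from the training set, and the forecasts are averaged), and it replaces full regression trees with depth-1 trees on a single feature to keep the example short. All function names and the toy data are hypothetical.

```python
import random


def fit_stump(x, y):
    """Fit a depth-1 regression tree on one feature by minimizing the RSS."""
    def rss(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    # Every distinct value except the largest is a candidate threshold,
    # so both children are always non-empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = rss(left) + rss(right)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # All feature values are identical: predict the mean everywhere.
        m = sum(y) / len(y)
        return (x[0], m, m)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_bagged_stumps(x, y, n_trees=100, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of (x, y)."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, xi):
    """Average the forecasts of all component stumps."""
    return sum(predict_stump(s, xi) for s in stumps) / len(stumps)


# Made-up data for illustration: the target steps up between x = 4 and x = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0]
ensemble = fit_bagged_stumps(x, y, n_trees=100, seed=0)
low = predict_bagged(ensemble, 1.0)
high = predict_bagged(ensemble, 8.0)
```

Averaging over many resampled trees smooths out the instability of any single tree, which is the property the ensemble models rely on.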
The performances of the implemented models on the traffic flow data sets at the four study stations are illustrated in Figure 6. As the figure shows, the ERT-MRM model achieved the best performance on each data set. The ARIMA and RT models show similar forecasting errors, which are both higher than those of the ERT and ERT-MRM models. This might be because the ensemble strategy used in the latter two models combines the strengths of the component tree forecasters to improve the accuracy and robustness of the forecasting system. Moreover, the ERT-MRM model performs better than the ERT model, indicating that the multi-regime strategy can improve the accuracy of forecasting models [32]. Overall, for the ERT-MRM model, the RMSEs associated with the flow rate measure are less than 14 veh/5-min, and the MAPEs are no more than 4%; the RMSEs associated with the occupancy measure are less than 0.01%, and the MAPEs are no more than 4.5%; and the RMSEs associated with the speed measure are less than 2 mi/h, and the MAPEs are no more than 1.5%.
Figure 7 compares the observed and forecasted values of one day of traffic flow randomly selected from the test data set. The figure shows that ERT-MRM provides reliable traffic flow forecasts at all four stations. As a result, the model developed in this study is well suited to proactive freeway management and control.

Conclusions
Traffic flow forecasting has been an active and important research topic during the past decades because of its key role in proactive traffic management and control. In this paper, a short-term traffic flow forecasting approach based on multi-regime modeling and ensemble learning is presented. The approach consists of two procedures. In the first procedure, multiple regimes of traffic flow were identified using a probabilistic modeling method, and then used to construct the representative features for the forecasting modeling. In the second procedure, the constructed features were used to establish the training and test data sets that are employed to train and evaluate the forecasting model. To improve the generalization performance of the model, an ensemble learning strategy from the machine learning domain was introduced. To evaluate the proposed approach, 5-min traffic flow data collected from four dual-loop detectors on I-80 freeway segments were used. The experimental results show that the identified regimes explain the different phases of traffic flow well, and play an important role in forecasting. Furthermore, the developed forecasting model outperformed four comparative models in terms of RMSE and MAPE on three traffic flow measures.
The accuracy of a forecasting model depends closely on two aspects: the data and the algorithm. To ensure the quality of the data, the representative features need to be properly determined. In the study, we developed a multi-regime modeling strategy to provide the model with good input features. In addition, the algorithm needs to be designed so that the forecasting model fits the training data well while retaining good generalization ability. To achieve this goal, a typical ensemble learning strategy was employed in the study. Together, these two strategies underpin the good performance of the proposed approach.
In the future, more traffic flow data collected from different road types will be used to evaluate the proposed approach. As the number of identified regimes could affect the performance of the forecasting model, it is necessary to examine the sensitivity of the model to this parameter and to explore how to determine it optimally. In addition, more forecasting models will be implemented and compared. Finally, more multi-regime modeling and ensemble learning strategies will be developed and integrated into the framework of the proposed approach.

Figure 1. Overall framework of the proposed approach.

Figure 2. Regimes defined in the traffic fundamental diagram.

Appl. Sci. 2020, 10, 356
the training data set D_T and add it to a new training data set D_NT. Next, return the instance to D_T. Repeat the sampling process n times and generate the final D_NT. That is, D_NT = {I_1, I_2, . . ., I_n}.

Figure 3. Locations of the study detector stations.
parameter A describes the transition probability distribution from one state to another. The digital numbers in the last five columns of Table

Figure 4. Regime identification results associated with the four study stations.

Figure 6. Performance of the comparative forecasting models associated with different stations.

Figure 7. Comparisons between the observed and forecasted values of the three traffic flow measures.

Table 1. Constructed features for forecasting modeling.

Table 2. Determined π in four regime identification models.

Table 3. Determined A in four regime identification models.

Table 4. Statistical means of traffic measures for each identified regime.