Freeway Short-Term Travel Speed Prediction Based on Data Collection Time-Horizons: A Fast Forest Quantile Regression Approach

Abstract: Short-term traffic speed prediction is vital for proactive traffic control and is one of the integral components of an intelligent transportation system (ITS). Accurate prediction of short-term travel speed has numerous applications in traffic monitoring and route planning, and helps to relieve traffic congestion. Previous studies have approached this problem using statistical and conventional artificial intelligence (AI) methods without accounting for the influence of data collection time-horizons. However, statistical methods have received widespread criticism concerning prediction accuracy, while traditional AI approaches have architectures too shallow to capture non-linear stochastic variations in traffic flow. Hence, this study explores the prediction of short-term traffic speed at multiple time-ahead intervals using data collected from loop detectors. A fast forest quantile regression (FFQR) model with hyperparameter optimization was introduced for short-term traffic speed prediction. FFQR is an ensemble machine learning model that combines several regression trees to improve speed prediction accuracy. The accuracy of short-term traffic speed prediction was compared using the FFQR model at different data collection time-horizons. Prediction results demonstrated the adequacy and robustness of the proposed approach under different scenarios. It was concluded that the prediction performance of FFQR was significantly enhanced and robust, particularly at time intervals larger than 5 min. The findings also revealed that the speed prediction error (in terms of quantile loss) ranged between 0.58 and 1.18.


Introduction
With rapid growth in car ownership, traffic congestion has become one of the most critical social concerns in urban metropolitans around the world. In addition to restraining smooth inter-city mobility, it also poses a threat to the urban economy and stable development [1][2][3][4]. Traffic congestion could be recurrent resulting from routine cyclic fluctuations in traffic, or it may be non-recurrent due to emergency incidents, special events, unforeseen bad weather conditions, and so forth. China is one of the fastest growing economies in the world, second after the US. China's transport sector has witnessed a dramatic increase during the past four decades, with the motorization rate exponentially accumulated from 1.8 million in 1980 to 340 million in 2019 [5,6]. Similarly, motor vehicle ownership in the country has increased from 1.8 per 1000 persons in 1980 to 179 in 2019 [6]. China has also become the largest car market with annual sales exceeding 24 million vehicles in 2016 [7]. But this rapid economic development has also brought severe consequences in terms of energy, environment, and social costs. According to statistics, the most prominent cities in the country are accounting for huge daily economic loss worth $1 billion, due to traffic congestion [8], which is an alarming situation. Further, traffic congestion has slowed down the average running speed in many Chinese cities. For example, in 2011, the average driving speed in Beijing was 7.5 miles per hour compared to 12.4 in Hong Kong, 15.5 in New York city, and 18 in London despite the fact, that all of these cities have car populations larger than Beijing [9]. Additionally, around 30% of Beijing's air pollution is dominated by transport emissions [9]. Thus, it is essential to identify the underlying causes, and properly tackle the issue of traffic congestion.
Accurate traffic information is of great importance for managing traffic congestion in urban areas. In addition to information about existing traffic conditions, accurate knowledge about traffic state parameters (traffic flow, density, speed) in subsequent short time intervals is vital for deciding on a potential control and management strategy. Accurate traffic prediction is an integral component of advanced travelers information system (AITS) in intelligent transportation system (ITS). It has numerous applications such as route planning, navigation, dynamic traffic assignment, congestion estimation, and other mobility services [10,11]. Among traffic state parameters, travel speed is one of the main indexes that reflect the quality of operating conditions along the highways. Travel speed directly influences the implementation of traffic management strategies like traffic control system (TCS) and traffic guidance System (TGS) [12]. The accuracy of travel speed prediction is largely influenced by available data, traditionally from loop detectors, radars, and traffic cameras fixed at some important road locations. However, with the increasing amount of available data collected from mobile services (smartphones and on-board GPS devices), probe vehicles, remote traffic microwave sensors (RTMS), and various internet of things (IOT) sensors, the challenge is no longer related to data quantity, but rather to extraction and modeling of useful information from this data [13]. With accurate travel speed prediction, travelers can make more informed decisions about trip generation and dynamic route planning. Developing traffic congestion can be mitigated collectively, and traffic conditions can become more stable. However, it is always challenging to realistically estimate short-term future travel speed conditions because of the complexity of road network, instability and stochasticity in traffic flow, and floating vehicles speed.
In previous literature, different approaches have been utilized for short-term travel speed prediction, including time series analysis methods [14,15], statistical regressions [16], artificial neural networks (NN) [17,18], and support vector regression (SVR) methods [19]. Although time series and statistical methods have good theoretical interpretations, they have been frequently questioned regarding prediction performance, while traditional machine learning methods like NN and SVR have architectures too shallow to capture latent interactions among variables, particularly for complex networks. Recently, with unprecedented opportunities for collecting detailed data, deep learning has drawn widespread research attention due to its excellent ability to extract essential data features with enhanced computational efficiency. Although the prediction accuracy of machine learning methods is relatively better, they are criticized for operating as a black box that lacks a sound theoretical basis. Regarding travel speed prediction, most of the methods in the existing literature have selected arbitrary time-horizons for data collection without accounting for the influence of different time intervals on predictive performance. It is essential to study the influence of varying data collection time-horizons on travel speed prediction. The present study attempts to fill this research gap using a novel regression approach.
Given the variability in travel speed under recurrent and non-recurrent traffic conditions, the objective of the current study is to make better speed predictions under multiple traffic data collection time-horizons, which would ultimately assist in alleviating congestion in the city of Beijing. Speed data with varying time intervals were collected from loop detectors on the 2nd Ring Road in Beijing. The specific contributions of this study are: (i) to predict short-term traffic speed using the novel FFQR approach, which, to the best of our knowledge, has not been used in traffic flow forecasting; (ii) to compare the performance of the proposed approach under varying data collection time-horizons (i.e., 5, 10, and 15 min intervals); (iii) to conduct hyperparameter optimization of the model by random grid to improve the prediction accuracy; and (iv) to demonstrate prediction accuracy for varying data collection time-horizons in terms of quantile loss. Study results indicated that the proposed approach was efficient and robust under the considered multiple time-horizons.
The remainder of this paper is structured as below. Section 2 provides a detailed review of existing literature in traffic flow forecasting in general, and particularly short-term speed prediction. Section 3 describes the study area, data used in the study, and key algorithm parameters setting. Section 4 discusses the architecture of the proposed approach in context of current study, and model performance evaluation. Section 5 presents results and discussions with reference to quantile loss associated with multiple time-ahead scenarios. Finally, Section 6 summarizes the key study conclusions, study limitations, and recommendations for potential future work.

Literature Review
Travel speed is an essential measure for estimating the quality of operating conditions in traffic networks. Accurate short-term travel speed prediction plays a vital role in proactive traffic control in ITS. Short-term traffic prediction involves precise predictions of various traffic parameters such as traffic flow, speed, density, and occupancy [20,21]. Researchers are challenged continuously by the consequences of rapid urbanization, including severe congestion and safety issues. The ideal conditions for precise prediction of traffic state is that vehicles occupy their respective lane without frequent lane change maneuvers, as sudden lane changes have been found to be associated with low prediction accuracy as well as motor vehicle crashes [22,23]. Uncertainty in travel time and speed estimation can be disastrous, leading to extreme man-hours wasted waiting in a queue, increased fuel consumption, and vehicular emissions [24][25][26]. Travel speed prediction refers to the estimation of average vehicle fleet speed in the near future (for example, 1 to 60 min) using real-time traffic data. Robust and accurate traffic state prediction has numerous applications for active traffic management, intelligent driving, high-precision navigation, route planning, and several other advanced applications [18,27]. However, realistic traffic state prediction is a challenging task due to the non-linear and stochastic nature of traffic data. Also, it is challenging to record individual vehicle speed on a busy urban route, particularly during rush hours. This issue was addressed by Anil et al., who proposed a comprehensive framework incorporating a processing module with traffic cameras [28]. The proposed architecture was found capable of tracking and estimating speed in real-time for every single vehicle in the camera frame. 
Further, most of the existing approaches for traffic state prediction rely on previous speed records, and speed is closely associated with other traffic variables such as density and traffic volume on contiguous links and road segments. These roads may not necessarily be linked to the target road, but changes in the traffic attributes of surrounding roads will affect travel speed later on [29]. However, considering too many irrelevant adjacent roads is likely to aggravate the complexity of the prediction algorithm and decrease its running performance, while considering only a few adjacent links will degrade its prediction accuracy. Hence, a sensitivity analysis is recommended to select the most relevant adjacent roads, providing a reasonable trade-off between the two.
Traffic state prediction always requires prior real-time speed information/data from devices such as loop detectors, traffic cameras, GPS navigation devices, and mobile phones. It is rather difficult and impractical to capture network-wide speed data using fixed location devices; GPS and mobile phones may serve as a suitable alternative. Mobile phone navigation devices have several advantages over the former methods such as high accuracy, reliability, optimal performance in real-time, less construction time and costs, etc. [30]. To detect the instantaneous speed of vehicles, remote traffic microwave sensor (RTMS) is another useful non-intrusive new piece of equipment. The device is installed roadside and is capable of directly recording moving or stationary vehicle speed without interrupting traffic flow. In addition to capturing speed data, RTMS can provide reliable information about traffic volume, density, and occupancy for multiple lanes simultaneously, even during adverse weather conditions [31,32]. In recent years, researchers have proposed various methods to improve the accuracy of speed estimation. For example, speed estimation results from the cellular probe system, and loop detectors were aggregated using the travel-time based method [33]. To avoid tracking each vehicle using any labeled data, velocity-based estimation approach was proposed [34]. A recent study introduced a path inference approach [35], using taxi GPS traces having low sampling frequency to accurately estimate network-wide speed on congested links.
The principal input parameters in predicting short-term travel speed are traffic flow, travel speed, and occupancy. While each of the three parameters for traffic congestion can be used, both speed and traffic flow correlate with occupancy. Furthermore, speed is more directly associated with traffic operation status. A study previously conducted found that short-term speed prediction results are significantly influenced by real-time dynamic traffic control [36]. Traffic speed is a commonly used metric to evaluate the road segment's traffic status. A wide variety of sensors, including GPS vehicles, inductive loop detectors, and mobile phones, have been continuously collecting large scale traffic data promoting the advancement of data-driven intelligent transport systems (ITS). In general, the term "short-term" relates to a prediction horizon of up to one hour. It predicts traffic conditions ahead of the present moment for a few seconds to a few hours, which is the optimal time for individual navigation and global urban traffic planning. Existing traffic state prediction methods with traffic sensor data are commonly divided into three categories: data-driven, model-driven, and data-driven streaming [37]. In recent years the analysis of road traffic data and future traffic characteristics were investigated by statistical, machine learning, and data mining techniques. Numerous methodologies have been introduced and adopted for short-term traffic prediction, and the ultimate objective remains the same: to acquire the prediction results accurately and as efficiently as possible. Predicting with machine learning models, a fine setting of parameters for any model has a significant impact on its performance, as highlighted by previous study [38].

Previous Studies
During the past two decades, numerous studies have been conducted on travel speed prediction. Researchers have considered various methods based on statistical modelling, neural networks, machine learning, big data, etc. These methods fall under two main categories, i.e., parametric and non-parametric. Recently, some studies have utilized hybrid techniques combining two or more methods to enhance prediction accuracy. Parametric methods have a fixed structure, where parameters are learned using an observed data set [39]. Parametric methods have explicit theoretical interpretations and are easily implemented. These methods require high data quality, with a stable and accurate data sequence. However, traffic data are usually unstable and stochastic, which limits the use of these methods in complex applications. Some parametric methods explored for short-term traffic flow and speed prediction are: time series models [40,41]; exponential smoothing model [42]; spectral analysis [43,44]; autoregressive integrated moving average (ARIMA) models [45][46][47]; ARIMA model with extended structures like Kohonen-ARIMA [48]; seasonal autoregressive integrated moving average (SARIMA) model [49]; and ARIMA with Kalman filter [50]. Like parametric methods, non-parametric methods assess dynamic correlation directly from training data; however, they have an enhanced adaptive learning ability and strong generalization, resulting in better prediction accuracy [51]. Some non-parametric methods used for speed prediction are: artificial neural network (ANN) model [52,53]; multi-type neural network [54]; deep convolutional neural network [55]; kernel smoothing [56]; k-nearest neighbor approach [57,58]; and support vector regression model (SVRM) [59,60]. Ma et al. suggested a long short-term memory neural network, commonly known as LSTM-NN, using remote sensor network data in the city of Beijing [32].
In another study, researchers compared LSTM with a convolutional neural-network (CNN) for network-wide travel speed prediction, and found that CNN was more robust than LSTM with a 42.91% improvement in mean square error [55].
Recently, some studies have focused on hybrid approaches in an attempt to improve prediction accuracy, considering the merits and applications associated with each prediction method. A few studies that have utilized hybrid models include: the Bayesian-neural network approach [61]; hybrid fuzzy rule-based approach [62]; state-space approach coupled with least-squares support vector machine (LS-SVM) [63]; KNN-Gaussian regression process [64]; and chaos-wavelet analysis support vector machine approach (CWSVM) [65]. Intuitively, hybrid models provide better prediction accuracy compared to single prediction models [66][67][68]. However, complex model architecture and high computational effort limit their network-wide implementation [43]. With the advent of big data and machine learning technology, different types of machine learning models are being explored in short-term travel speed prediction. Some commonly used machine learning models for travel speed prediction are: evolving fuzzy neural network (EFNN) [69]; long short-term memory networks (LSTM) [32,70]; bi-directional long short-term memory neural network (Bi LSTM-NN) [71]; and support vector regression (SVR) [59]. NN and fuzzy schemes have also been successfully used in other related disciplines such as image retrieval, feature extraction, and signal cycle length optimization [72][73][74].
It is evident from the reviewed literature that different approaches have been adopted for traffic state prediction to improve prediction accuracy. These may be grouped into three distinct categories: parametric methods, non-parametric methods, and methods based on deep learning. Parametric methods have reliable theoretical interpretations but are not considered good in terms of prediction accuracy because of the stochastic nature of traffic data, while non-parametric methods work as a black box with a weak theoretical basis. However, machine learning approaches are relatively flexible, with very few or no initial assumptions for input parameters. Machine learning methods have much deeper and more complex architectures to capture stochastic variation, thus yielding improved prediction accuracy. Further, these methods are capable of processing outliers and missing and noisy data. A comprehensive review of the existing literature also indicates that most studies selected arbitrary time-horizons for data collection without taking into account the effects of the time interval on predictive performance. It is important to study the influence of varying data collection time-horizons on travel speed prediction. Hence, this paper examines the short-term travel speed prediction performance of the novel fast forest quantile regression (FFQR) using speed data with varying data collection time-horizons from four loop detectors on a freeway segment of the 2nd Ring Road in Beijing. To the best of our knowledge, this method has not been used in previous studies for traffic forecasting.

Data Collection and Parameters Settings
Numerous powerful traffic simulation tools are available to replicate realistic field driving conditions by incorporating appropriate input parameters. Very realistic outcomes can be achieved by enabling a precise representation of geometric conditions, driver behavior, and vehicle features. A number of verifications, involving examination of the coded network, were carried out so that the coded network replicates current field conditions. The microscopic traffic simulation tool VISSIM was used to realistically simulate traffic conditions in the study area. However, before field conditions can be replicated, it is essential to calibrate the driving behavior parameters of the traffic simulator using appropriate procedures, as reported by a recent study [75]. After calibration of the default parameters and validation of the traffic simulator, multiple simulation runs were performed with different random seeds to ensure that the model worked as planned. A portion of Beijing's 2nd freeway Ring Road (shown in Figure 1) was selected to verify the performance of the proposed approach. The length of the selected segment was 1.326 km. The 2nd freeway Ring Road is approximately 33 km long and includes 37 on-ramps and 53 off-ramps. After obtaining the appropriate freeway architecture, the macroscopic characteristics (e.g., split ratio and demand flow) needed for tuning the complete microscopic simulator were identified. Using the VISSIM simulation model of the 2nd freeway Ring Road network, the traffic flow from around 06:00 a.m. to 12:00 p.m. was then simulated. The data were collected on the selected portion of the 2nd freeway Ring Road from different loop detectors at different time intervals, namely 5, 10, and 15 min. The locations of detectors 1 to 4 on the 2nd freeway Ring Road can be seen in Figure 2. Figure 3 presents the flow chart of the sequential methodology.
Sustainability 2020, 12, x FOR PEER REVIEW 6 of 19

Fast Forest Quantile Regression
There are various types of regression. Regression models aim to fit a target variable that is expressed as a numerical vector, and statisticians have developed increasingly sophisticated regression techniques. Quantile regression makes it possible to understand the distribution of the predicted value. FFQR, the tree-based quantile regression model utilized in this study, is capable of predicting non-parametric distributions. FFQR uses decision trees to implement random forest quantile regression. Random forests (RF) can help prevent the overfitting that occurs with single decision trees. In a random forest, a tree ensemble is developed using bagging to select random subsets of the training samples and features, and a decision tree is then fitted to each data subset. Unlike the random forest algorithm, which averages the output of all trees, FFQR keeps all of the predicted labels in the trees, as specified by the quantile sample count parameter. It outputs the distribution so that the user can view the quantile values for a given instance. The main strength of FFQR is that all relevant observations are stored in every leaf of every tree, not just their average as in a random forest. Instead of the conditional mean, it therefore predicts conditional quantiles for a given instance. In general, RF combines several regression trees into an ensemble to generate more accurate regressions by drawing many bootstrap samples from the original training data and fitting a tree to each sample [76].
In FFQR, the traffic prediction problem can be formalized as follows. Let X_m(t) be a vector of traffic measurements taken at time t at a point of the traffic network indexed by m. The vector X may contain a travel speed component measured by a specific loop detector indexed by m. The dataset is further divided into three sets: training, validation, and testing. The training procedure uses the first two sets, whilst the third set evaluates the predictive capacity of the trained FFQR. In the RF, the predicted outcomes Y_p for new data samples m = 1, ..., k, resulting from the predictors X_m, are modelled as a weighted average of the responses Y of the training data samples, with weights w as given in Equation (1) [77]. In particular, we consider the conditional distribution function (Equation (2)) of the response variable Y conditioned on the specific values x of the predictor variable X_m.
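The displayed forms of Equations (1) and (2) do not survive in this text. As a hedged reconstruction, assuming the standard random forest and quantile regression forest formulation of the cited references, they would read as follows. The symbols \hat{\mu}, F, the index j, the sample count n, and the indicator \mathbf{1} are assumptions introduced here; w_j(x) denotes non-negative weights that sum to one over the n training samples.

```latex
\begin{align}
\hat{\mu}(x) &= \sum_{j=1}^{n} w_j(x)\, Y_j, \tag{1}\\
F(y \mid X = x) &= P(Y \le y \mid X = x)
                 = E\left(\mathbf{1}_{\{Y \le y\}} \mid X = x\right). \tag{2}
\end{align}
```

Under this reading, Equation (1) is the usual random forest prediction of the conditional mean, and Equation (2) is the quantity that a quantile regression forest estimates in its place.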
Each RF is comprised of several trees. Every tree T is grown on a bootstrap sample of the training data by repeatedly splitting the data. Every split is made on a predictor value. Splitting continues until a partition reaches a minimum number of observations, at which point the partition becomes a terminal node. Averaging over all trees provides predictions that depend on the complete set of training data, including responses and predictors. Random forests provide an accurate and consistent estimation of the conditional mean of the response variable. FFQR is a generalization of random forests that provides a robust, non-linear, and non-parametric method to estimate conditional quantiles [78]. Whereas random forests estimate only the conditional mean, FFQR gives an estimate of the entire conditional distribution. A brief overview of Algorithm 1 is given below [78].
1. Grow k trees T(θ_t), t = 1, ..., k, keeping all observations for each leaf of each tree. θ is the random parameter vector that defines how the tree is grown (e.g., which variables are used for split points at each node).

2. For a given X = x, drop x down all trees. Measure the weight w_j(x, θ_t) of observation j ∈ {1, ..., n} for every tree.

4. Compute the estimate of the distribution function for all y from Equation (3), using the weights from Step 2.
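The weighting and distribution steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: it assumes the standard quantile regression forest estimate, in which the conditional distribution is the weighted sum of indicators 1{Y_j ≤ y} with weights w_j(x) averaged over trees. The leaf assignments and speed values are hypothetical toy data, and the function names are introduced here for illustration only.

```python
def forest_weights(train_leaves, query_leaves):
    """Average the per-tree weights w_j(x, theta_t) over all trees.

    train_leaves[t][j] is the leaf id of training sample j in tree t.
    query_leaves[t] is the leaf id that the query point x reaches in tree t.
    Within one tree, each training sample sharing the query's leaf gets
    weight 1 / (leaf size); every other sample gets weight 0.
    """
    n_trees = len(train_leaves)
    n_samples = len(train_leaves[0])
    weights = [0.0] * n_samples
    for leaves, query_leaf in zip(train_leaves, query_leaves):
        members = [j for j, leaf in enumerate(leaves) if leaf == query_leaf]
        for j in members:
            weights[j] += 1.0 / (len(members) * n_trees)
    return weights


def conditional_cdf(y, responses, weights):
    """Weighted empirical CDF: sum of w_j(x) over samples with Y_j <= y."""
    return sum(w for r, w in zip(responses, weights) if r <= y)


def conditional_quantile(gamma, responses, weights):
    """Smallest observed response whose estimated CDF reaches gamma."""
    cumulative = 0.0
    for r, w in sorted(zip(responses, weights)):
        cumulative += w
        if cumulative >= gamma - 1e-12:  # small tolerance for float rounding
            return r
    return max(responses)


# Hypothetical toy data: five training speeds (km/h) and two trees.
speeds = [42.0, 45.0, 51.0, 58.0, 60.0]
train_leaves = [[0, 0, 1, 1, 1],   # leaf ids of the five samples in tree 1
                [0, 1, 1, 2, 2]]   # leaf ids of the five samples in tree 2
query_leaves = [1, 1]              # leaves reached by the query point x

weights = forest_weights(train_leaves, query_leaves)
median_speed = conditional_quantile(0.5, speeds, weights)
upper_speed = conditional_quantile(0.95, speeds, weights)
```

Because every leaf keeps its individual observations rather than only their mean, any quantile can be read off the same set of weights without refitting the forest.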
The model depends on several parameters, which are essential for its efficacy. To obtain improved results and high accuracy, we used a random grid for hyperparameter optimization of FFQR. The ranges and best combinations of hyperparameters used for the different prediction horizons are given in Table 1. The tuned hyperparameters were obtained using 10-fold cross-validation. For each prediction horizon, the number of iterations performed was 18. FFQR with random grid was implemented in Azure Machine Learning Studio.
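A random grid search of this kind can be sketched as follows. This is an illustrative outline only, not the Azure Machine Learning Studio implementation: the candidate values in `GRID` are hypothetical (Table 1 is not reproduced in this text), `dummy_score` is a placeholder standing in for fitting FFQR on the training fold and returning the quantile loss on the validation fold, and all function names are introduced here. Only the 18 iterations and the 10 folds are taken from the description above.

```python
import itertools
import random

# Hypothetical candidate values; Table 1 of the source is not reproduced here.
GRID = {
    "num_trees": [50, 100, 200, 500],
    "num_leaves": [8, 16, 32, 64],
    "min_leaf_instances": [1, 2, 5, 10],
    "bagging_fraction": [0.6, 0.7, 0.8, 0.9],
    "feature_fraction": [0.1, 0.3, 0.6, 0.9],
}


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


def random_grid_search(grid, score_fn, n_samples, n_iter=18, k=10, seed=0):
    """Draw n_iter random grid points and keep the one with the lowest mean loss."""
    rng = random.Random(seed)
    keys = list(grid)
    all_combos = list(itertools.product(*(grid[key] for key in keys)))
    sampled = rng.sample(all_combos, min(n_iter, len(all_combos)))
    folds = k_fold_indices(n_samples, k)
    best_params, best_loss = None, float("inf")
    for combo in sampled:
        params = dict(zip(keys, combo))
        # Mean validation loss over the k folds for this parameter combination.
        loss = sum(score_fn(params, tr, va) for tr, va in folds) / k
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def dummy_score(params, train_idx, val_idx):
    """Placeholder loss; a real run would fit a model on train_idx and
    evaluate the quantile loss on val_idx."""
    return abs(params["num_leaves"] - 32) / 32 + abs(params["bagging_fraction"] - 0.9)


best_params, best_loss = random_grid_search(GRID, dummy_score, n_samples=72)
```

Sampling a fixed number of grid points keeps the cost at 18 × 10 model fits per prediction horizon instead of one fit per point of the full grid.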

Model Evaluation
Quantile loss functions are useful for predicting an interval instead of only a point. Quantile loss is also simply an extension of the mean absolute error (MAE). The performance evaluation metrics used in this study were quantile loss and root mean squared error (RMSE). In the equations for these metrics, y_i is the ground truth, y_i^p is the predicted output, γ is the selected quantile, and n is the number of observations.
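The two metrics can be written out as a short sketch. The equations themselves are not reproduced in this text, so the code assumes the standard definitions: the pinball form of the quantile loss, averaged as (1/n) Σ max(γ(y_i − y_i^p), (γ − 1)(y_i − y_i^p)), and the usual RMSE. Some definitions sum rather than average the quantile loss, so the normalization here is an assumption, as are the function names.

```python
import math


def quantile_loss(y_true, y_pred, gamma):
    """Mean pinball loss for quantile gamma (0 < gamma < 1).

    Under-prediction (y > y_pred) is weighted by gamma,
    over-prediction (y < y_pred) by (1 - gamma).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += max(gamma * diff, (gamma - 1.0) * diff)
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

With γ = 0.5 this loss equals half the MAE, which is the sense in which quantile loss extends MAE; with γ = 0.95, under-predicting by 2 costs 1.9 while over-predicting by 2 costs only 0.1.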

Quantile Loss for Different Time Horizon
The hyperparameters tuned for the mean prediction, the 0.07 quantile, and the 0.95 quantile are the number of trees, number of leaves, minimum leaf instances, bagging fraction, and feature fraction. Figure 4 shows the quantile loss of detector 4 for the 15 min prediction horizon in relation to the number of trees and the number of leaves. The quantile losses obtained for the mean prediction, 0.07 quantile, and 0.95 quantile were 0.8007, 0.471, and 0.94, respectively. Figure 5 presents the quantile loss of detector 4 for the 10 min prediction horizon in relation to the number of trees and the number of leaves; the values achieved for the mean prediction, 0.07 quantile, and 0.95 quantile were 0.68, 0.77, and 0.60, respectively. Similarly, Figure 6 indicates that the quantile losses of detector 4 for the 5 min prediction horizon for the mean prediction, 0.07 quantile, and 0.95 quantile were 1.1, 1.4, and 0.84, respectively. Figure 7a depicts the impact of minimum leaf instances, bagging fraction, split fraction, and feature fraction on the quantile loss for the 0.95 quantile, for which the quantile loss achieved was 0.47. The corresponding values of minimum leaf instances, bagging fraction, split fraction, and feature fraction were 2, 0.9, 0.6, and 0.6, respectively. Figure 7b,c show the quantile loss for the 0.07 quantile and the mean prediction, for which the values obtained were 0.94 and 0.80, respectively. For the 0.07 quantile, the values of minimum leaf instances, bagging fraction, split fraction, and feature fraction were 2, 0.6, 0.9, and 0.1, respectively. Similarly, Figure 8c shows that the values achieved for the mean prediction for minimum leaf instances, bagging fraction, split fraction, and feature fraction were 2, 0.9, 0.6, and 0.6, respectively. In Figure 7a-c, the encircled values show the best-tuned parameter values for mean prediction with the lowest quantile loss, obtained using 10-fold cross-validation.

Figure 8 shows the predicted trends for detector 4 under different time intervals during morning peak and off-peak hours. It also shows the actual speed alongside the mean prediction and the 0.07 and 0.95 quantiles for the 5, 10, and 15 min prediction horizons. The mean travel speed prediction and the 0.07 and 0.95 quantiles were close to the actual speed data in the off-peak periods (6:00 a.m. to 7:30 a.m. and 10:30 a.m. to 12:00 p.m.). In addition, the prediction accuracy for off-peak hours (normal time) is better than for peak hours, because traffic flow is more stable during normal time than during peak time.
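One way to check the peak versus off-peak difference numerically is to split the records by time of day and compare the errors. The sketch below uses the off-peak windows stated in the text and treats all remaining records as peak, which is an assumption; the function names and sample values are hypothetical.

```python
import numpy as np

# Off-peak windows from the text, in minutes after midnight:
# 6:00-7:30 a.m. and 10:30 a.m.-12:00 p.m.
OFF_PEAK_WINDOWS = [(6 * 60, 7 * 60 + 30), (10 * 60 + 30, 12 * 60)]

def is_off_peak(minute_of_day):
    """Boolean mask that is True where a timestamp lies in an off-peak window."""
    m = np.asarray(minute_of_day)
    mask = np.zeros(m.shape, dtype=bool)
    for start, end in OFF_PEAK_WINDOWS:
        mask |= (m >= start) & (m < end)
    return mask

def mean_abs_error_by_period(minute_of_day, observed, predicted):
    """Mean absolute error for off-peak records and for the remaining (peak) records."""
    err = np.abs(np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float))
    off = is_off_peak(minute_of_day)
    return {"off_peak": float(err[off].mean()), "peak": float(err[~off].mean())}

# Hypothetical records (time in minutes after midnight, speeds in km/h).
minutes = [360, 420, 480, 540, 660]
observed = [70.0, 68.0, 35.0, 30.0, 65.0]
predicted = [69.0, 67.0, 41.0, 26.0, 66.0]
print(mean_abs_error_by_period(minutes, observed, predicted))
```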

Model Performance under Different Time Intervals
The data collection time interval has an important effect on the accuracy of short-term travel speed prediction. FFQR was used to predict short-term travel speed at different time intervals (i.e., 5, 10, and 15 min). The quantile loss of the mean prediction for data collected at different time intervals from different loop detectors is shown in Figure 9. Detector 3 and detector 4 follow a similar trend: as the time-horizon increased, the quantile loss decreased. Figure 9 also shows that the proposed FFQR yielded robust travel speed predictions across time-horizons, particularly at larger time intervals, with prediction accuracy increasing as the time interval increased. As shown in Tables 2 and 3, the model performed better at larger time intervals, with lower quantile loss and RMSE across loop detectors, reflecting the characteristics of the traffic speed pattern over time. For example, the mean quantile loss and RMSE for all loop detectors at the 10 min interval are 0.941 and 11.26, respectively, whereas at the 15 min interval they are 0.93 and 10.73, respectively. Despite the complex road conditions in the empirical test, model performance remained satisfactory. The decrease in these two indicators, quantile loss and RMSE, indicates an improvement in short-term speed prediction. Further, the suggested model yielded desirable travel speed prediction results with low RMSE. These results indicate that increasing the time interval for data collection could reduce traffic uncertainty, so that the speed pattern becomes more stable and more predictable [79]. Additionally, the relationship between higher accuracy and longer data collection intervals is consistent with many other prediction approaches [80].
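The effect of the data collection interval can be explored by averaging the same raw speed series over 5, 10, and 15 min windows. The following is a minimal sketch, assuming raw detector records at 1 min resolution (the raw resolution is not stated in this section); the function name and the synthetic speeds are hypothetical.

```python
import numpy as np

def aggregate_speed(speeds, samples_per_interval):
    """Average consecutive samples into fixed-length intervals.

    Trailing samples that do not fill a complete interval are dropped.
    """
    speeds = np.asarray(speeds, dtype=float)
    n_full = (len(speeds) // samples_per_interval) * samples_per_interval
    return speeds[:n_full].reshape(-1, samples_per_interval).mean(axis=1)

# Synthetic 1 min speeds (km/h) covering 30 minutes, for illustration only.
rng = np.random.default_rng(0)
one_minute = 50.0 + rng.normal(0.0, 8.0, size=30)

for minutes in (5, 10, 15):
    series = aggregate_speed(one_minute, minutes)
    # Print the interval length, the number of aggregated points, and their spread.
    print(minutes, len(series), round(float(series.std()), 2))
```

Averaging over longer windows smooths minute-to-minute fluctuations, which is one plausible reason a longer collection interval gives a more stable speed pattern.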
Similar studies suggest that prediction accuracy is inversely related to data collection time-horizons. For example, in one study the RMSE obtained for traffic speed prediction using an Elman NN was 10.79 for 1 min and 12.92 for 4 min, which is less reliable than the prediction accuracy we obtained for different time-horizons [32]. In addition, researchers have compared various models such as SVR, ANN, Bayesian regularized neural network (BRNN), and SARIMA to forecast short-term speed and achieved prediction accuracy estimates comparable to our proposed method. In these studies, the authors demonstrated the predicted travel speed trend during off-peak and peak hours of the day and captured traffic nonlinearity in arbitrary time horizons [81][82][83].
To evaluate the predictive accuracy of the model at different time intervals, its performance metrics are also presented in Tables 2 and 3.
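The RMSE reported alongside the quantile loss can be computed per detector and then averaged across detectors, as in the sketch below. The detector names and speed values are hypothetical and are not taken from Tables 2 and 3.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted speeds."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical (observed, predicted) speeds in km/h for two detectors.
per_detector = {
    "detector_3": ([52.0, 48.0, 45.0], [50.0, 49.0, 41.0]),
    "detector_4": ([60.0, 58.0, 30.0], [57.0, 58.0, 34.0]),
}

scores = {name: rmse(obs, pred) for name, (obs, pred) in per_detector.items()}
mean_rmse = sum(scores.values()) / len(scores)  # average over detectors
print(scores, mean_rmse)
```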

Conclusions
The objective of this study was to predict short-term travel speed under different time-horizons, which is essential for travel route planning and for real-time proactive traffic control and management in ITS. A review of the existing literature revealed that previous studies have mostly focused on time-series, statistical regression, and conventional artificial intelligence techniques (such as ANN and SVM). However, the prediction accuracy of time-series methods is relatively low, whereas traditional AI approaches have too shallow an architecture to capture the non-linear, stochastic, and intricate characteristics of traffic flow. Thus, we proposed a novel FFQR model for short-term travel speed forecasting under multiple data collection time-ahead horizons. FFQR is an ensemble technique with a relatively deep architecture that combines several regression trees to yield more accurate regression estimates for the predicted variable. The proposed method was applied to loop detector data generated by a microscopic traffic simulator for a freeway segment on the 2nd Ring Road in Beijing. The results showed that the FFQR model performed well in predicting short-term travel speed, particularly at larger time-horizons. The findings also showed that the speed prediction error, quantified in terms of quantile loss, ranged on average between 0.58 and 1.18. The proposed FFQR model was also efficient in capturing the observed variations in field speed data. Prediction results demonstrated the adequacy and robustness of the proposed approach under different data collection time scenarios. Hence, travel speed prediction from the FFQR model could serve as useful guidance for policy and decision makers, particularly in the study area (city of Beijing), as well as for travelers in the city, supporting efficient operations and commuting in metropolitan areas.
Future studies should explore the influence of other important external factors, such as weather and traffic incidents, to enhance the prediction accuracy of travel speed. In addition, the current study could be extended to data collected from network-wide loop detectors or sensors, taking the spatial information of travel into account, to evaluate the adequacy of the proposed approach. Lastly, future studies could apply additional advanced optimization techniques to find more appropriate parameter combinations for the proposed model and to achieve more accurate short-term travel speed predictions.

Study Limitations
This study has a few limitations that must be acknowledged. First, the study used speed data from a single road segment; however, traffic on adjacent road segments may affect the predicted speed on the target road segment. Second, uncertainty in travel speed prediction is inevitable owing to the stochastic nature of traffic data. Third, the study used speed data from fixed-location loop detectors, which are not reliable for collecting such data network-wide. Data from GPS navigation devices and RTMS could serve as a potentially more valuable alternative for capturing instantaneous speed in a congested urban network. Finally, this research used VISSIM simulation data to demonstrate the efficacy of the proposed approach; however, the simulated urban freeway network model has its own limits.