Optimal Control Design for Traffic Flow Maximization Based on Data-Driven Modeling Method

Abstract: This paper proposes enhanced prediction and control design methods for improving traffic flow with human-driven and automated vehicles. To achieve accurate prediction over the entire time horizon, data-driven and model-based prediction methods are integrated. The goal of the integration is to accurately predict the outflow of the traffic network, which in this paper is a highway section. The proposed prediction method is used in the optimal design for calculating controlled inflows on highway ramps. The goal of the design is to reach the maximum outflow of the traffic network, even against disturbances on the uncontrolled inflows of the network. The control design leads to an optimization problem based on the min-max principle, i.e., the traffic outflow is maximized by the controlled inflows and minimized by the uncontrolled inflows. The effectiveness of the prediction and control methods is illustrated through simulation examples, which show that the traffic outflow can be maximized by the control system under various uncontrolled inflow values.


Introduction and Motivation
Providing control strategies for automated vehicles has become a subject of significant focus in the research and development centers of the vehicle industry. An important task is to design a velocity profile for vehicles in order to guarantee effective, comfortable, safe and economical traffic by exploiting vehicle dynamics and environmental circumstances, e.g., certain characteristics of fuel consumption, delivered cargo, road inclinations, speed limits, traffic flow and traffic forecast. The complexity of the control task results in various performance requirements which must be simultaneously guaranteed by the control system [1].
Existing control solutions result in speed profiles for automated vehicles, which differ from the speed selection strategy of human drivers. The reason for this is that the control systems of automated vehicles can obtain information on the road ahead, e.g., the usage of the road capacity or the upcoming downhill terrain characteristics. At the same time, most of these data are not available for human drivers. It can be shown that the speed selections of automated vehicles and human-driven vehicles are not independent of each other. Consequently, the motion of human-driven vehicles must be adapted to automated vehicles. Moreover, the ratio of automated vehicles (κ) in the entire traffic network also influences the characteristics of the traffic flow through the modification of traffic speed. Figure 1 illustrates the impact of κ on the variation in average traffic speed. In this scenario, a 20 km-long three-lane (outer, middle and inner lanes) segment of the hilly Hungarian M1 highway is modeled in the VISSIM traffic simulator. The speed limit is 130 km/h, except for the section between 6 and 8 km, where it is 90 km/h. Without automated vehicles (κ = 0%), the average traffic speed is close to the speed limit, as can be seen in Figure 1a. When there is an increased number of automated vehicles on the highway (κ = 20%), the average traffic speed varies due to the automated vehicles, whose speed profiles are also influenced by the uphill and downhill sections. The latter case results in an approximately 2% decrease in the energy consumption of the entire traffic, compared to the case without automated vehicles.
Figure 2a,b show the results of the scenario for the cases of the previous example, i.e., κ = 0% and κ = 20%, respectively. In these examples, the inflow into the highway section is $q_{in} = 3000$ veh/h. Figure 2c presents the result of the scenario in which the ratio of automated vehicles is κ = 50%. Moreover, Figure 2d presents the result of the scenario in which the inflow into the highway section is significantly increased to $q_{in} = 5000$ veh/h, with κ = 20%. The illustrations show that increasing the ratio κ and/or increasing the inflow $q_{in}$ has a significant impact on traffic flow.

Brief Literature Overview on the Related Achievements
Since automated vehicles have a significant impact on traffic flow, it is necessary to take the modeling and control design of the traffic system into consideration. A comprehensive overview of different classical control solutions for ramp metering has been proposed by [2,3]. Moreover, the vehicle-to-infrastructure (V2I) communication provides new perspectives in the control of the traffic system, because a huge amount of data about the motion of the vehicles in the traffic network can be obtained [4]. This allows a more sophisticated prediction of the upcoming traffic scenario, in which the effectiveness of the traffic control can be improved.
The exploitation of the result of the traffic flow analysis in the modeling and control of automated vehicles is one of the hot topics among researchers, as can be seen in (e.g., [5]). The most important approaches of traffic flow modeling were summarized by [6]. The analysis of the traffic flow, in which semi-automated and automated vehicles travel alongside human-driven vehicles, was proposed by [7]. Stability issues of the traffic flow of connected and automated vehicles were examined by [8]. In [9], it has been shown that automated vehicles have only slightly negative effects but significant positive effects on the traffic flow depending on the penetration rate of the automated vehicles and the traffic scenario. Interactions between automated and human-driven vehicles were presented in [10]. The authors in [11] highlighted new achievements in the construction of a macroscopic fundamental diagram for traffic flow with automated vehicles. The presented data analysis results revealed that the traditional triangular fundamental diagram structure remains applicable to describe the traffic flow characteristics of traffic with automated vehicles. For mixed traffic, a parsimonious formula was provided to estimate the fundamental diagram with measures from pure human-driven traffic and pure automated vehicle traffic. Control-oriented applications are strongly connected to the modeling of traffic flow. For example, in [12], a network-level coordination for automated vehicle control and traffic light control was presented, i.e., a distributed optimization scheme to reduce the computational complexity and to improve the effectiveness of coordination was developed. A more complex problem was that of control in mixed traffic, because the motion of automated vehicles and the motion of human-driven vehicles simultaneously impact the traffic flow. The interactions between human-driven and autonomous vehicles in optimal control synthesis for tolls were studied by [13].
Based on the huge amount of data on traffic, a novel data-driven approach for the analysis and modeling of traffic flow dynamics was proposed by [14]. Traffic flow prediction using a deep-learning algorithm was presented by [15]. In that study, a deep-learning architecture was applied, using auto-encoders as building blocks to represent traffic flow features for prediction. Similarly, Lasso regression was used for traffic flow prediction in [16]. Cell phone information-based big data analysis and control for transportation purposes was proposed by [17]. The work of [18] focused on generating models for microscopic traffic simulation built upon real-world data. The identification and prediction of traffic flow states based on a big data analysis method was presented by [19]. An increased amount of available information on traffic flow was used for training deep neural networks, which were then able to predict traffic flow in urban traffic scenarios [20]. A deep learning method using convolutional neural network and long short-term memory architectures for monitoring traffic flow in urban regions was provided by [21]. The fusion-based technique resulted in high accuracy in the evaluation through simulation scenarios.

Proposed Methodology of the Paper
The overview of the literature shows that several approaches exist for modeling traffic flow dynamics, but most of these are related to pure human-driven or automated vehicles. The modeling and analysis of mixed traffic flow is an emerging research field, in which partial solutions have been achieved. Modeling methods with classical model-based approaches do exist [11], as do those with unconventional, e.g., network-level [12] or data-driven approaches [20]. Methodologically, the classical traffic modeling methods are based on physical relationships, in which the nonlinear characteristics of the traffic flow are described. The advantage of these methods is that the traffic flow model provides theoretical fundamentals for designing a controller with guaranteed performances [22]. Nevertheless, their drawback is the increased uncertainty concerning the modeling of short-time traffic flow, due to robustness features. In the case of data-driven modeling methods, the actual measured information on the traffic flow is highly relevant, i.e., the short-time prediction of forthcoming traffic flow can be more accurate due to the actual information. However, the data-driven modeling solutions also have disadvantages. In the case of neural-network-based formulation, the evaluation of the prediction/control process is challenging, while in the case of a regression-based formulation, the accuracy of long-time prediction can be limited.
In spite of the various existing methods, it is difficult to find a systematic control-oriented modeling method which exploits the advantages of data-driven analysis. The aim of this paper is to integrate data-driven traffic flow prediction into a polynomial model-based traffic flow model and thus improve the accuracy of the prediction. The proposed prediction method is applied for control design purposes, i.e., an optimization problem for the traffic flow volume with ramp metering is formulated by the min-max principle. The resulting control system provides robustness against the disturbances of the system, i.e., the uncontrolled inflows of the traffic network. A novel contribution of this paper is that the proposed prediction model is able to handle the presence of automated vehicles in the traffic network at the level of the data-driven prediction as well as at the level of the polynomial traffic flow model.
The prediction and control process is illustrated in Figure 3. The proposed prediction method has two components, namely data-driven prediction and model-based prediction. Their outputs are the predicted flow volumes ($q_N$, $q_1$) at the end of the highway section on a horizon of length $p_{max}$. The outputs of the predictions are integrated into a final prediction $q_{out}$, which is used in the control process. The output of the control comprises the controlled inflows $r_i$ on $N$ ramps over the horizon $p_{max}$. The control inputs on the entire horizon are used in the prediction process and their values at step $k+1$ are used as control inputs of the traffic network. Different measurements on the traffic network are used as inputs for each prediction block and thus, the control loop is closed.

[Figure 3: block diagram of the prediction and control process, with blocks for data-driven prediction, model-based prediction, integration, control on ramps and the traffic network.]

The paper is organized as follows. In Section 2, the data-driven estimation of the traffic flow is presented. This is performed through a two-step analysis. First, a subset selection method is applied, which is able to prioritize the main attributes based on their relations to the measured signals. Second, a linear regression model using the least squares (LS) method is applied to derive a relationship between the attributes and the traffic flow. Section 3 proposes an enhanced traffic flow model, which results from the interconnection of the data-driven prediction and the classical traffic flow modeling. Section 4 proposes the traffic flow control. The effectiveness of the novel optimal control is demonstrated through simulation scenarios in Section 5. In this paper, the VISSIM traffic simulator was used for the modeling and simulation of the traffic network, while the WEKA data-mining software was used in the LS-based analysis [23]. Finally, Section 6 summarizes the paper with some concluding remarks.

Data-Driven Prediction of the Traffic Flow
In this section, data-driven analysis for the prediction of traffic flow dynamics is proposed. The purpose is to find a mathematical structure, which can be used to model the real-time prediction of the forthcoming traffic flow values.

Brief Overview of Data-Driven Analysis
In this paper, LS-based estimation with subset selection was used to generate prediction models for the traffic flow. In the following, the most important features, based on the work of [24], are briefly summarized.
The difficulty in data-driven estimation is that a model should be obtained from a large number of measurements. Consider a dataset with $n$ independent instances, $b$ input variables and one output variable. The instances are written in the form of an $n \times b$ design matrix $X$. Let $\zeta^*$ be the parameter vector of the true prediction model $M(\zeta^*)$. Through the application of $X$ and $\zeta^*$, the resulting output vector $y$ is determined as

$$y = X\zeta^* + \epsilon, \tag{1}$$

where $\epsilon$ is the noise vector, whose elements have normal distribution $N(0, \sigma^2)$ with variance $\sigma^2$. It is assumed that $\sigma^2$ is known or can be estimated; the estimate is denoted by $\hat{\sigma}^2$. If the number of inputs $b$ of the model increases, the number of subset models grows as $2^b$, which may lead to a computationally infeasible LS estimation task. Therefore, effective estimation requires the determination of preferences among the sets of variables in $\zeta$. The subset selection is based on the instances $X$, from which the subset models are generated. In the generation of subset models, it is necessary to reduce the complexity of the estimation problem, while the information loss from the data must be limited [25].
The goal of the subset selection is to find the attributes of $\zeta^*$ which have a significant impact on $y$. Consider instance $i$ from vector $y$ as

$$y_i = \sum_{j=1}^{b} \zeta^*_j x_{i,j} + \epsilon_i, \tag{2}$$

where the $\zeta^*_j$ values are the elements of vector $\zeta^*$. For the selection of the relevant subsets, it is necessary to analyze the sensitivity of $y_i$ with respect to $x_{i,j}$. Since the attributes are time-domain variables, the partial derivatives of (2) with respect to $x_{i,j}$ are computed for all $n$ instances using the measured signals (3). This results in $n$ derivatives for each attribute. If the deviation of the resulting partial derivatives for a given attribute is small, then the attribute has a significant impact on $y$. This attribute is incorporated in the prediction model and $\hat{\zeta}_j$ is estimated with high effectiveness. However, if the deviation of the resulting $\zeta^*_j$ values is high, then attribute $j$ may not be part of the prediction model.
The decision to incorporate attribute $j$ into the prediction model is made based on the probability density function, which is fitted in a Gaussian form:

$$f_j(\zeta^*_j) = \frac{1}{\sigma_j\sqrt{2\pi}} \exp\left(-\frac{(\zeta^*_j - \mu_j)^2}{2\sigma_j^2}\right), \tag{4}$$

where $\mu_j$ and $\sigma_j$ are the mean and the deviation of $\zeta^*_j$, respectively. The relevance of an attribute is expressed by integrating the Gaussian function and taking into account the sign of $\zeta^*_j$ together with the scaling of $\sigma_j$, which yields the measure $D_j(\zeta^*_j)$ in (5); $D_j(\zeta^*_j)$ expresses the relevance of an attribute on $y$. In the subset selection, it is necessary to compute its value for all $b$ attributes. In the selection procedure, the values of $D_j(\zeta^*_j)$ are arranged in descending order and the first $l$ attributes are selected for the prediction model. Here, $l$ is a predefined scalar, which should be chosen as small as possible so that the prediction model remains simple enough for real-time computation.
In the next step, the coefficients of the linear prediction model $M(\hat{\zeta})$ are computed. The prediction is formed as

$$\hat{y} = X_l \hat{\zeta}, \tag{6}$$

where $\hat{\zeta}$ contains the coefficients of the attributes and $X_l$ is an $n \times l$ subset of matrix $X$. The purpose of the selection of $\hat{\zeta}$ is to find the prediction model from the entire model space $\mathcal{M} = \{M(\zeta) : \zeta \in \mathbb{R}^b\}$ whose accuracy is highest on the given dataset. The accuracy is scaled by the distance $D$ between the candidate model and the true model, which is minimized:

$$\min_{\hat{\zeta}} D, \qquad D = \frac{\| y - \hat{y} \|^2}{\sigma^2}, \tag{7}$$

where $\| \cdot \|$ denotes the $L_2$ norm, $y$ is the measured output, and $\sigma^2$ is the variance of the noise $\epsilon$. The minimization in (7) can be solved as an LS optimization problem [24]. The result of the entire analysis is a linear model, which depends on the relevant attributes.
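The two-step procedure above (rank the attributes, then fit the reduced model by least squares) can be sketched as follows. This is a minimal illustration on synthetic data, not the pace regression implementation used in the paper: the relevance score is a stand-in (absolute Pearson correlation with the output) rather than the measure $D_j$, and the function name `select_and_fit` is hypothetical.

```python
import numpy as np

def select_and_fit(X, y, l):
    """Pick l attributes by a stand-in relevance score, then fit them by least squares."""
    # Stand-in relevance (assumption): absolute correlation of each column with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    # Keep the l highest-scoring attributes (indices returned in ascending order).
    selected = np.sort(np.argsort(score)[::-1][:l])
    # Least-squares fit on the reduced design matrix X_l, with an intercept column.
    X_l = np.column_stack([np.ones(len(y)), X[:, selected]])
    zeta_hat, *_ = np.linalg.lstsq(X_l, y, rcond=None)
    return selected, zeta_hat

# Synthetic data: only attributes 1 and 4 actually drive the output.
rng = np.random.default_rng(0)
n, b = 500, 8
X = rng.normal(size=(n, b))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=n)

selected, zeta_hat = select_and_fit(X, y, l=2)
print(selected, zeta_hat)
```

Keeping $l$ small trades a little accuracy for a model that is cheap to evaluate at every sampling step, which is the stated motivation for the subset selection.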

Data-Driven Prediction of Traffic Flow
The result of the data-driven analysis is applied for the prediction of the traffic flow on a highway with controlled ramp inflows. In the first step, the relevant attributes must be selected based on the analysis of $D_j(\zeta^*_j)$. In the second step, the prediction model is formed according to (6) and the outflow of the traffic system $q_N(k)$ is approximated by $\hat{q}_N(k)$ in (8), where $j$ is a design parameter which sets the number of past data. In (8), the parameters $\omega_{q_N}$, $\alpha_{i,l}$, $\beta_{i,c}$, $\gamma_{i,c}$, $\delta_c$ are the members of $\hat{\zeta}$ and the following data are incorporated in the prediction:
• $q_i(c) + r_i(c)$ are the inflows of each segment, in which $r_i$ is the inflow on the controlled ramps, considering the information of the current step $k$ and the past $j$;
• $v_i(c)$ is the average traffic speed on each segment, considering the information of the current step $k$ and the past $j$;
• $\rho_i(c)$ is the traffic density on each segment, considering the information of the current step $k$ and the past $j$;
• κ is the ratio of automated vehicles, considering the information of the current step $k$ and the past $j$.
The parameters $\omega_{q_N}$, $\alpha_{i,l}$, $\beta_{i,c}$, $\gamma_{i,c}$, $\delta_c$ are computed through the optimization (7). The result of the regression is a linear function $\hat{q}_N(k+1)$ in the form of (8), which approximates $q_N(k+1)$ as

$$q_N(k+1) = \hat{q}_N(k+1) + e_{q_N}(k+1), \tag{9}$$

where $e_{q_N}(k+1)$ is the prediction error. Since $e_{q_N}(k+1)$ is unknown, it is handled as a disturbance of the system. The linear regression form of the system with $e_{q_N}(k+1)$ can be transformed into the state-space representation (10), where $x$ represents the state vector of the system, which contains the measured actual and past values of the process. Thus, most of the components in the regression (8) are incorporated in $x$, as given in (11), where the number of states is $M$. Furthermore, the control inputs of the system are the signals with which the traffic system (8) can be influenced at step $k$, i.e., $u(k) = [\, r_1(k) \; \ldots \; r_N(k) \,]^T$.
The unknown signals of (8) at step $k$ and the prediction error at step $k+1$ are comprised in $w(k) = [\, q_0(k) \;\; e_{q_N}(k+1) \,]^T$. Thus, the control inputs of the system are the inflows on the controlled ramps, and the disturbance is the uncontrolled inflow into the traffic network. Although (10) provides accurate information on the outflow of the system, the relationship between $u(k)$, $w(k)$ and $q_N(k+1)$ is not sufficiently identified because of the state vector $x(k)$, whose elements play a significant role in the value of the outflow. Since (10) does not provide information on the impact of $u(k)$ and $w(k)$ on $x(k)$, it is necessary to expand the traffic flow model with the further relationships (12), where $x_i(k)$ represents the $i$th state ($i = 1 \ldots M$) in $x(k)$. In (12), $\delta_i$, $A_i$, $B_i$, $D_i$ are parameter vectors, which are estimated by an LS method from traffic data. The relationships (10) and (12) result in the control-oriented state-space representation of the system (13), where the vector $\omega$ and the matrices $A$, $B$, $D$ are yielded by $\omega_{q_N}$, $\omega_i$ and $A_{q_N}$, …; then (13) is reduced to (14). The resulting state-space form of the system provides an efficient representation of the traffic flow dynamics which can be used for control design purposes.
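To make the role of such a state-space model concrete, the following sketch rolls a model forward over a short horizon. It assumes an affine form $x(k+1) = \omega + A x(k) + B u(k) + D w(k)$, which is suggested by the parameter names above but is an assumption here; the matrices, dimensions and the function name `rollout` are hypothetical and are not identified from traffic data.

```python
import numpy as np

def rollout(x0, u_seq, w_seq, omega, A, B, D):
    """Iterate x(k+1) = omega + A x(k) + B u(k) + D w(k) over a horizon (assumed form)."""
    x = np.asarray(x0, dtype=float)
    states = []
    for u, w in zip(u_seq, w_seq):
        # One prediction step driven by the ramp inflow u and the disturbance w.
        x = omega + A @ x + B @ np.atleast_1d(u) + D @ np.atleast_1d(w)
        states.append(x)
    return np.array(states)

# Hypothetical two-state system with one ramp input and one disturbance channel.
omega = np.zeros(2)
A = 0.5 * np.eye(2)
B = np.array([[1.0], [0.0]])
D = np.array([[0.0], [1.0]])

states = rollout([0.0, 0.0], u_seq=[2.0, 2.0], w_seq=[4.0, 4.0],
                 omega=omega, A=A, B=B, D=D)
print(states)
```

In this form the effect of the controlled inflows and of the disturbances on every state is explicit, which is what the extension from the output equation to the full state dynamics is meant to provide.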

Illustration of the Traffic Flow Prediction
The effectiveness of the data-driven prediction of the forthcoming traffic flow is illustrated through the following simulation scenarios. In the example, a 7 km-long hilly section of the M1 highway between Budapest and Vienna with two lanes is considered. The section is divided into $N = 5$ segments and the speed control of the automated vehicles is designed according to [1]. The data were collected with the VISSIM traffic simulator. More than 200 traffic simulations were performed with various ratios κ = 0, 10, 20, …, 50% and average inflow values $q_0$ = 750, 1000, 1250, …, 5000 veh/h, while the vehicles arrive randomly into the network at all entrances.
The data-driven analysis was performed on the results of this large number of traffic simulations. In the analysis, the WEKA data-mining software was used, in which the pace regression algorithm is implemented [23]. The analysis yields a linear regression for $q_5(k)$ in the form of (8). In the analysis, $j = 1$ past data were also considered, with a 5 min backward horizon. The subset selection yields six relevant attributes, which are $v_5(k-1)$, $q_4(k)$, $\rho_3(k)$, $\rho_3(k-1)$, $\rho_5(k)$ and $\rho_5(k-1)$. Figure 4 illustrates the effectiveness of the regression analysis by comparing $q_5(k)$ and $\hat{q}_5(k)$ on the training set. The pace regression method results in 94.1% accuracy, which means that most of the $q_5(k)$ values are close to $\hat{q}_5(k)$. Time-domain simulations of the examination are shown in Figure 5. Two scenarios are illustrated, i.e., an average traffic flow $q_0(k) = 3000$ veh/h and a rush traffic flow $q_0(k) = 5000$ veh/h with congestion at the beginning of the simulation. Figure 5 illustrates that the generated prediction model is able to accurately approximate $q_5(k)$ in both scenarios.
The examples show that the proposed data-driven model can be effective for traffic flow prediction. The results of the prediction are satisfactory under both normal and congested traffic circumstances. However, the resulting prediction model (14) cannot be used alone for control purposes, because it is based on a linear formula, which is effective only for short-term prediction. For long-term prediction, the nonlinear characteristics of the traffic flow must be taken into consideration, which is achieved by the interconnection of the data-driven prediction and the conventional traffic modeling approaches.

Formulating Control-Oriented Traffic Flow Prediction Model
In this section, the control-oriented model for traffic flow dynamics is proposed, in which the data-driven prediction model and traffic modeling principles are used together.
In the formulation of traffic dynamics, the traffic network is gridded into $N$ segments. The traffic flow of each segment is represented by a dynamical equation, which is based on the law of conservation. The relationship contains the sum of inflows and outflows for a given segment $i$. The traffic density $\rho_i$ (veh/km) is expressed as

$$\rho_i(k+1) = \rho_i(k) + \frac{T}{L_i}\left[ q_{i-1}(k) - q_i(k) + r_i(k) \right], \tag{15}$$

where $k$ is the index of the discrete time step; $T$ is the discrete sample time; $L_i$ is the length of the segment; $q_{i-1}$ (veh/h) and $q_i$ (veh/h) are the traffic flows leaving segments $i-1$ and $i$, i.e., the inflow and the outflow of segment $i$; and finally, $r_i$ (veh/h) is the sum of the controlled ramp inflow. Another important relation of the traffic dynamics is the fundamental relationship, which creates a connection between the outflow $q_i(k)$, the traffic density $\rho_i(k)$ and the average traffic speed $v_i(k)$ [26]. The fundamental relationship is formed as

$$q_i(k) = \rho_i(k)\, v_i(k). \tag{16}$$

Conventionally, the fundamental relationship is derived through historic measurements and depends on several factors, e.g., [27,28]. In the traffic flow model, $v_i(k)$ and $q_i(k)$ are formed as nonlinear functions of the traffic density [29], such as $v_i(k) = F_v(\rho_i(k))$ and $q_i(k) = F_q(\rho_i(k))$, where $F_v$, $F_q$ are nonlinear functions. In the modeling of the traffic flow dynamics, the impact of κ on $v_i(k)$ can be considered in the nonlinear function $\mathcal{F}$, such as

$$q_i(k) = \mathcal{F}(\rho_i(k), \kappa). \tag{17}$$

The function $\mathcal{F}$ can be effectively formulated through polynomial relationships [30]. An example of $q_i$, which depends on $\rho_i$ and the presence of automated vehicles in the traffic, is illustrated in Figure 6. The dynamics of the traffic flow based on (15) and (17) is written as

$$\rho_i(k+1) = \rho_i(k) + \frac{T}{L_i}\left[ q_{i-1}(k) + r_i(k) - \mathcal{F}(\rho_i(k), \kappa) \right], \tag{18}$$

which contains the nonlinear characteristics of the fundamental diagram $\mathcal{F}$. The advantage of the expression (18) is that it incorporates the nonlinear behavior of the traffic flow dynamics, and thus, (18) can be effective for the long-term prediction of $q_i(k+1)$ through $\rho_i(k+1)$.
However, F is derived based on historic measurements, which means that it has an increased error in terms of short-term prediction. Consequently, (18) uses only a small number of actual data.
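One step of the density dynamics (18) can be sketched as follows. The conservation update is the standard form, while the fundamental diagram used here is purely illustrative: a second-order polynomial in the density whose free-flow speed depends on κ through an arbitrary, hypothetical coefficient. It does not represent the polynomial identified in the paper.

```python
def fundamental_diagram(rho, kappa, v_free=130.0, rho_max=150.0):
    """Hypothetical stand-in for F(rho, kappa), returning a flow in veh/h."""
    # Assumption for illustration only: kappa shifts the free-flow speed linearly.
    return (v_free - 10.0 * kappa) * rho * (1.0 - rho / rho_max)

def density_step(rho, q_in, r, kappa, T, L):
    """rho(k+1) = rho(k) + T/L * (q_in(k) + r(k) - F(rho(k), kappa))."""
    return rho + (T / L) * (q_in + r - fundamental_diagram(rho, kappa))

# One 10 s step on a 1 km segment: T is given in hours, L in km, flows in veh/h.
rho_next = density_step(rho=20.0, q_in=3000.0, r=0.0, kappa=0.0,
                        T=10.0 / 3600.0, L=1.0)
print(rho_next)
```

Because the inflow exceeds the flow that the segment releases at this density, the density grows; repeating the step is what lets the model anticipate congestion over a long horizon.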
The following model combines the data-driven prediction model and the conventional traffic model. The purpose of this solution is to eliminate the drawbacks of each method in the prediction. In the case of the conventional traffic model, the highway is formed as a queue with one segment $i = 1$, while in the case of the data-driven model, the highway is divided into $N$ sections. The prediction of the outflow of the highway is as follows:

$$q_{out}(k+1) = \lambda(1)\, q_N(k+1) + \big(1 - \lambda(1)\big)\, q_1(k+1), \tag{19}$$

where $q_N(k+1)$ is the prediction from the data-driven model and $q_1(k+1)$ is the prediction from the traffic model. The following form is applied: $q_1(k+1) = \mathcal{F}(\rho_i(k+1), \kappa(k+1))$, in which $\kappa(k+1) = \kappa(k)$. Moreover, the forgetting factor $\lambda(p) \in [0; 1]$ is introduced, which depends on the step of the prediction $p$. In the case of $p = 1$, $\lambda(p)$ has a high value, while the increase in $p$ leads to the reduction in $\lambda(p)$. Through the modification of $\lambda$, a balance of the prediction can be achieved. In a short-term prediction, $q_N$ has priority, while in a long-term prediction, $q_1$ has priority. The prediction of the outflow at $k + 2$ is formed as

$$q_{out}(k+2) = \lambda(2)\, q_N(k+2) + \big(1 - \lambda(2)\big)\, q_1(k+2). \tag{20}$$

The factors are the following. $q_1(k+2)$ is expressed by (21), in which $q_{out}(k+1)$ is also used in the prediction. $q_N(k+2)$ is expressed based on (14) as (22), where $x(k+1)$ results from (14) and $u(k) = r_1(k)$ for all $k$.
Through the proposed traffic model, the prediction of the traffic flow can be updated through the results of the data-driven analysis. Through the increase in the value p, the impact of the data-driven model through λ(p) is reduced; consequently, the emphasis of the nonlinear characteristics of the traffic flow is considered.
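The weighting of the two predictions can be sketched as a convex combination over the horizon. The combination itself is an assumption consistent with the description above (the data-driven prediction dominates for small $p$, the model-based one for large $p$); the forgetting factor follows the rule used in the simulation section, $\lambda = 1 - p/10$ for $p < 10$ and $\lambda = 0$ otherwise. The function names and the numerical values are hypothetical.

```python
def forgetting_factor(p, horizon=10):
    """lambda(p) = 1 - p/horizon for p < horizon, otherwise 0."""
    return max(0.0, 1.0 - p / horizon)

def blended_outflow(q_data, q_model):
    """Blend per-step predictions; entry p-1 of each list belongs to step k+p."""
    blended = []
    for p, (qd, qm) in enumerate(zip(q_data, q_model), start=1):
        lam = forgetting_factor(p)
        # Assumed convex combination: data-driven weight lam, model weight 1 - lam.
        blended.append(lam * qd + (1.0 - lam) * qm)
    return blended

# Constant toy predictions make the shift of weight along the horizon visible.
q_out = blended_outflow([3000.0] * 12, [2000.0] * 12)
print(q_out)
```

With these inputs the blended value starts close to the data-driven prediction and moves linearly to the model-based prediction by the tenth step, which gives a smooth transition between the two signals.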

Optimal Control Design for Traffic Flow Maximization
The design of the optimal control is based on the previously formed enhanced traffic flow model. The purpose of the control design is to guarantee the maximum outflow of the traffic network $q_N$ through the inflows of the controlled ramps $r_i$. Since the system contains disturbances, their impact on $q_N$ must be reduced. This leads to an optimization task.
In the control design of the traffic system, the following performance requirements must be guaranteed.

1. It is necessary to achieve the maximum outflow of the traffic network $q_{out}$ over the prediction horizon, as expressed by (23), where $p_{max}$ represents the length of the horizon. This performance specification is advantageous compared to the classical solution, which is based on the setting of $\rho_i$ related to the critical density $\rho_{i,crit}$, e.g., [1]. Namely, the setting of $\rho_i$ requires preliminary knowledge of the critical density of the traffic flow, which depends on several factors [31]. In the proposed control strategy, in contrast, data are obtained from the traffic system through data-driven analysis. This means that the result of the optimization can be effective without a fixed fundamental diagram.

2. In the control task, the control inputs must be as small as possible. The control input $r_i(k)$, $i = 1 \ldots N$, is a positive variable which has physical limits. Moreover, it is necessary to limit its variation to prevent rapid changes during the actuation. Therefore, two constraints on $r_i(k)$ are defined:

$$0 \le r_i(k) \le r_{i,max}, \qquad |r_i(k) - r_i(k-1)| \le \Delta r_i, \tag{24}$$

where $r_{i,max}$ is the maximum of $r_i(k)$ and $\Delta r_i$ is its maximum variation.
The traffic system also has disturbances within the inflow data $q_0(k), \ldots, q_0(k+p_{max})$, because the number of future entering vehicles is unknown, even if the routes of the automated vehicles are assumed to be known. Due to the ratio κ of automated vehicles, $q_0(k), \ldots, q_0(k+p)$ can be divided into known ($q_0^+(k+p)$) and unknown ($q_0^u(k+p)$) disturbances in the following way:

$$q_0(k+p) = q_0^+(k+p) + q_0^u(k+p). \tag{25}$$

The known disturbances can be incorporated in the traffic flow maximization procedure as constant values. Thus, only a part of the disturbances, i.e., the unknown disturbances, must be handled as a worst-case scenario. The form of the control task requires three components. First, the performance in (23) is used as the objective of the optimization. Second, the performance (24) is handled as a constraint in the control design. Third, the disturbance $q_0^u(k+p)$ in (25) must be minimized in $z_2$. Thus, the control design leads to the min-max task (26) with the constraints (27), where $q^u_{0,min}(k+p)$, $q^u_{0,max}(k+p)$ are the bounds of the unknown disturbance. The results of the optimization are the intervention signals $r_i(k), \ldots, r_i(k+p_{max})$ and the values of the unknown disturbances $q_0^u(k), \ldots, q_0^u(k+p_{max})$. The control signal at time step $k$ is $r_i(k)$. The solution of task (26) requires the joint handling of the minimization and maximization tasks. In practice, these are separated and an iterative solution is applied.
• First, the minimization task is solved for initial fixed values $q^u_{0,min}(k+p)$, $q^u_{0,max}(k+p)$. The solution is achieved by an optimization algorithm which is able to handle nonlinear constraints, e.g., [32]. This results in the values $r_i(k), \ldots, r_i(k+p_{max})$, which are used as fixed values during the solution of the maximization task in (26).
• Second, the maximization task is also solved by using the $q^u_{0,min}(k+p)$, $q^u_{0,max}(k+p)$ values. The maximization task also results in $r_i(k), \ldots$
The iteration procedure is stopped when the relative errors of the solutions $q^u_{0,min}(k+p)$, $q^u_{0,max}(k+p)$, $r_i(k), \ldots, r_i(k+p_{max})$ between the actual and the previous steps are smaller than a predefined value.
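The worst-case idea behind the min-max task can be illustrated on a toy one-step problem. The sketch below deliberately swaps the iterative scheme described above for a brute-force nested grid search: for every candidate ramp inflow it evaluates the lowest outflow that a bounded unknown inflow can cause, and then picks the ramp inflow with the best worst case. The outflow function (flow equals demand up to a capacity and drops beyond it), the capacity, the grids and all names are hypothetical.

```python
import numpy as np

def outflow(demand, capacity=4000.0, drop=2.0):
    """Toy outflow model: equals demand up to capacity, then declines (congestion)."""
    return np.where(demand <= capacity, demand, capacity - drop * (demand - capacity))

def robust_ramp_inflow(q_known, r_grid, w_grid):
    """Return the ramp inflow with the best worst-case outflow and that outflow."""
    # Rows index candidate ramp inflows r, columns index unknown inflows w.
    demand = q_known + r_grid[:, None] + w_grid[None, :]
    worst_case = outflow(demand).min(axis=1)   # the disturbance minimizes the outflow
    best = int(np.argmax(worst_case))          # the controller maximizes the worst case
    return r_grid[best], worst_case[best]

r_grid = np.linspace(0.0, 800.0, 81)   # candidate ramp inflows, veh/h
w_grid = np.linspace(0.0, 400.0, 41)   # bounded unknown inflow, veh/h
r_star, q_star = robust_ramp_inflow(3000.0, r_grid, w_grid)
print(r_star, q_star)
```

The selected ramp inflow is lower than the value that would fill the capacity exactly when the unknown inflow is zero, because admitting more would risk congestion if the unknown inflow turns out to be at its upper bound.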

Simulation Examples
In this section, the effectiveness of the proposed method is illustrated through simulation examples. In the examples, a 5 km-long highway section with three lanes was selected for illustration purposes. The ratio κ = 35% was set, meaning that the traffic flow contains a significant number of automated vehicles. Several simulations were performed in the VISSIM traffic simulator in order to determine the function $\mathcal{F}$. A selected scenario is shown in Figure 7a, which presents the variation of the traffic flow volume depending on the highway section (axis X) and on time (axis Y). Here, the values of the volume in several small sections of the highway are shown as functions of time. There is congestion at the end of the highway section at 10 min, which has a significant impact on the entire highway section. The reduced traffic flow reaches the 1 km section point at 20 min. The example shows that the dynamics of the traffic flow is close to experience in highway scenarios. In the illustration, the simulated data are synchronized, which is guaranteed by the data acquisition process in VISSIM. Figure 7b shows the derived fundamental diagram of the highway. The data for the determination of $\mathcal{F}$ were obtained from various VISSIM simulations. The volume-density pairs of all simulation scenarios in this figure were matched through their time stamps in VISSIM. $\mathcal{F}$ was approximated by a sixth-order polynomial.
The effectiveness of the outflow prediction is illustrated in Figure 8. The example presents the effectiveness of the integration of the data-driven and the classical traffic models for the prediction of the forthcoming traffic flow. Figure 8 shows the real outflow, which is approximated in three ways. First, $q_1$ illustrates the prediction made by the classical traffic model (18). In the simulation, the model uses $q_0(k)$ as new data and $\rho_1(k)$ as its state in every time step $k$.
Although the model predicts the forthcoming congestion for high p values, it has a higher prediction error at low p values. Second, the data-driven prediction is shown by q N . The prediction model (14) uses q 0 (k) as new data in each step k and the state x(k) in the computation of x(k + 1). This model is unable to predict nonlinearities due to its linear form. Consequently, q N predicts the real outflow well for small p values, while for increased p values, q N differs significantly from the real traffic flow. Third, the results of the interconnected model are shown in Figure 8. In the interconnection between q 1 and q N , the λ function is selected as λ = 1 − p/10 if p < 10, otherwise λ = 0. This selection guarantees a smooth transition between the interconnected signals. The predicted outflow approximates the real outflow over the entire prediction horizon.
The operation of the controlled system is shown in Figure 9. In the example, the 5 km-long highway section has an on-ramp at the 1 km point. The inflow on the ramp, r, can be controlled by a traffic light. The purpose of the control is to guarantee the maximum outflow at the end of the highway section through the actuation of r. The inflows represent a traffic scenario in which there is a rush hour in the middle of the simulation. The q 0 inflow is shown in Figure 9a and the inflow demand on the on-ramp, r dem , is shown in Figure 9b. The inflow r dem is limited by the computed r, which results in the real inflow r real . In the simulation, the variation limit is set to ∆r i = 100 veh/h. Another result of the min-max optimization task is q u 0 , which is illustrated in Figure 9c. The limits of q u 0 are q u 0,min = 0 veh/h and q u 0,max = 400 veh/h. The role of the computation of q u 0 is to characterize the worst-case scenario. For example, between 250 s and 1000 s, the worst case is q u 0 = q u 0,max , which promotes reduced outflow through congestion.
Although the computed q u 0 overestimates the real value of the unknown uncontrolled inflow, it guarantees the avoidance of a traffic jam. The achieved outflow of the highway is illustrated in Figure 9d. It can be seen that the control objective, i.e., the maximization of the outflow, is achieved.
Figure 10a illustrates the scenario without control, which means that the entire r dem inflow enters the highway. Due to the increasing q 0 and r dem , the traffic flow on the highway becomes oversaturated, which leads to congestion between 900 s and 1500 s. The comparison of Figures 9d and 10a shows the effectiveness of the traffic control system, i.e., the outflow is significantly higher due to the avoidance of congestion.
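The interconnection of the two predictions used for Figure 8 can be illustrated with a short Python sketch. It assumes that the interconnected prediction is a convex combination in which λ weights the data-driven prediction q N and 1 − λ weights the model-based prediction q 1 ; this assumption is consistent with q N being more accurate for small p, but the exact interconnection rule is the one defined in the prediction section of the paper. The function and argument names are hypothetical.

```python
def blend_weight(p, horizon=10):
    """Weight lambda = 1 - p/10 for p < 10, otherwise 0 (as in the example)."""
    return 1.0 - p / horizon if p < horizon else 0.0


def blended_outflow(q_data_driven, q_model_based, p, horizon=10):
    """Interconnected outflow prediction for prediction step p.

    Assumption: lambda weights the data-driven prediction (q N) and
    1 - lambda weights the model-based prediction (q 1).
    """
    lam = blend_weight(p, horizon)
    return lam * q_data_driven + (1.0 - lam) * q_model_based


# Illustrative values in veh/h (not taken from the simulations):
# at p = 5 both predictions contribute equally.
mid_horizon = blended_outflow(3000.0, 2000.0, p=5)
# at p >= 10 only the model-based prediction is used.
long_horizon = blended_outflow(3000.0, 2000.0, p=12)
```

Because λ decreases linearly to zero, the blended signal changes gradually from the data-driven prediction to the model-based one without a jump at p = 10.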
The next simulation example is shown in Figure 10b,c. In this scenario, the control strategy is modified by simplifying the optimization task (26): the outflow is maximized over the controlled ramp inflows alone, such that r i (k + p) ≤ min(r i,max , r i (k + p − 1) + ∆r i ), ∀i ∈ [1; N], p ∈ [1; p max ].
This means that the worst-case scenario is not considered during optimization. This results in increased inflow on the ramp, as can be seen in Figure 10b. In this scenario, the limitation of the inflow starts at 450 s, while in the scenario of Figure 9b, r is reduced after 250 s. Although the achieved outflow is similar until 1000 s, it is significantly reduced between 1000 s and 1500 s due to the oversaturation of the traffic flow.
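The constraint on the ramp inflow and the limitation of the demand can be made concrete with a small Python sketch. It is not the optimization itself: it only shows how a candidate ramp inflow sequence is kept within the bound r i (k + p) ≤ min(r i,max , r i (k + p − 1) + ∆r i ) and how the admitted inflow r real follows from r dem and the computed r. The function names and the numerical value of r i,max are hypothetical; ∆r i = 100 veh/h is the value used in the simulations.

```python
def clip_ramp_sequence(candidates, r_prev, r_max, delta_r=100.0):
    """Enforce r(k+p) <= min(r_max, r(k+p-1) + delta_r) along the horizon.

    candidates: desired ramp inflows r(k+1), ..., r(k+p_max) in veh/h
    r_prev:     ramp inflow applied in the previous step, r(k), in veh/h
    r_max:      upper limit of the ramp inflow (assumed value) in veh/h
    """
    clipped = []
    for candidate in candidates:
        upper_bound = min(r_max, r_prev + delta_r)
        r_prev = min(candidate, upper_bound)  # each bound uses the clipped predecessor
        clipped.append(r_prev)
    return clipped


def admitted_inflow(r_dem, r_cmd):
    """Real ramp inflow: the demand r dem limited by the computed inflow r."""
    return min(r_dem, r_cmd)


# Illustrative values in veh/h (not taken from the simulations).
limits = clip_ramp_sequence([500.0, 900.0, 650.0], r_prev=300.0, r_max=800.0)
r_real = [admitted_inflow(dem, r) for dem, r in zip([450.0, 450.0, 700.0], limits)]
```

The sketch only limits how fast the inflow may increase, as in the stated constraint; decreases are not rate-limited here.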

Conclusions
This paper proposed an effective method for the prediction and maximization of traffic outflow on a highway. The results of the simulations showed that integrating the data-driven and model-based methods improves the prediction compared to each individual method. Moreover, the control examples illustrated that maximum outflow can be achieved during the entire traffic flow process. Therefore, the proposed prediction and control design methods can be efficiently used as a theoretical control basis for highway traffic applications. Nevertheless, the proposed method requires a large amount of data, from which the data-driven model is generated. Since the data-driven and model-based prediction blocks contain a large number of parameters, the implementation of the method for different types of highway sections (e.g., varying lane numbers and section lengths) can require the recalculation of these parameters.
A future challenge is to improve the model-based prediction through learning-based features. Instead of using parameters in physics-based relationships, a possible way to obtain the traffic flow model is to tune the model parameters through a learning process. This requires enhanced learning features, with which the structure and the parameters of the model can be selected automatically, as can be seen in, e.g., [33]. With this improvement, the advantages of the model-based prediction on an increased horizon can be exploited.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.