Flow Control in Wings and Discovery of Novel Approaches via Deep Reinforcement Learning

Abstract: In this review, we summarize existing trends of flow control used to improve the aerodynamic efficiency of wings. We first discuss active methods to control turbulence, starting with flat-plate geometries and building towards the more complicated flow around wings. Then, we discuss active approaches to control separation, a crucial aspect towards achieving a high aerodynamic efficiency. Furthermore, we highlight methods relying on turbulence simulation, and discuss various levels of modeling. Finally, we thoroughly review data-driven methods and their application to flow control, and focus on deep reinforcement learning (DRL). We conclude that this methodology has the potential to discover novel control strategies in complex turbulent flows of aerodynamic relevance.


Introduction
Over the past decades, aviation has become an essential component of today's globalized world: before the current pandemic of coronavirus disease 2019 (COVID-19), over 100,000 flights took off every day, allowing the transportation of people and goods and the establishment of global commercial relations. Despite the significant impact of the pandemic on the aviation sector, a number of studies indicate that, after the pandemic, its relevance in the transportation mix will be similar to that before COVID-19 [1,2]. Aviation alone is responsible for 12% of the carbon dioxide emissions from the whole transportation sector and for 3% of the total CO2 emissions in the world [3]. In addition, fuel represents around 40% of the costs of regular airlines, corresponding to hundreds of millions of dollars spent yearly [4]. This has important implications in the context of the Sustainable Development Goals (SDGs) 9 (on sustainable infrastructure) and 11 (on sustainable cities), as well as an important impact on SDGs 3 (on health) and 13 (on climate change) from the United Nations (UN) 2030 Agenda [5][6][7]. Due to the major environmental and economic impacts associated with aviation, it is desirable to improve the aerodynamic performance of airplane wings, with the aim of reducing the fuel consumption and emissions associated with air travel.
In order to develop more efficient wings, it is necessary to reduce the losses associated with their movement within the surrounding fluid. This implies reducing the force parallel to the incoming flow (the drag), and one of the strategies to achieve such a reduction is to perform flow control. A wide range of methods aimed at controlling the flow to reduce the drag have been reported, and some have documented net-energy savings, i.e., taking into account the energy spent on the control, as documented by Fahland et al. [8]. These strategies include passive methods, such as riblets [9], which are drag-reducing surfaces proven to

Active Control of Turbulent Flows
A widely used method of predetermined active flow control is uniform blowing/suction. The first wind-tunnel experiments using the micro-blowing technique (MBT) [19] reported that it is possible to achieve a significant drag reduction with relatively moderate blowing, as well as to have net-energy savings in full-scale applications. More recent studies have confirmed this possibility, investigating the effects of MBT on more complex geometries and on adverse-pressure-gradient (APG) turbulent boundary layers (TBLs). A detailed description of the MBT is provided by Hwang [20], and Kornilov [21] discusses more recent developments, particularly regarding experimental results. On the other hand, high-fidelity numerical simulations have been used to better characterize the interaction between control and wall-bounded turbulence. One of the first numerical studies investigating TBLs with blowing and suction is that of Park and Choi [22], who employed direct numerical simulations (DNSs, in which all turbulent scales are resolved) and considered a Reynolds number based on displacement thickness $\delta^*$ and freestream velocity $U_\infty$ of $Re_{\delta^*} = 500$. Kametani and Fukagata [23] performed DNS of a zero-pressure-gradient (ZPG) TBL with blowing and suction at Reynolds numbers based on momentum thickness $Re_\theta$ between 300 and 700, with intensities up to 1% of $U_\infty$. They also analyzed the energy input associated with uniform blowing to estimate the upper bound of control efficiency, and they confirmed that it is theoretically possible to achieve net-energy savings. As expected, uniform suction has opposite effects. Other numerical simulations, more relevant to full-scale conditions, were later performed at higher Reynolds numbers. For instance, Kametani et al. [24] carried out high-resolution large-eddy simulations (LESs, where only the smallest turbulent scales are modelled) of a ZPG TBL at $Re_\theta = 2500$, considering blowing and suction with an intensity of 0.1% of the freestream velocity. These authors achieved more than 10% drag reduction despite the relatively low blowing intensity.
The numerical studies discussed above are focused on the description of the effect of blowing on a spatially developing ZPG TBL, which is an idealized study case. Firstly, more realistic scenarios exhibit more complex turbulent flows [25], such as pressure-gradient TBLs [26] and finite aerodynamic bodies [27,28]. In these cases it is not trivial to generalize the control techniques. Secondly, the skin-friction reduction is beneficial in engineering applications only if it corresponds to a reduction of total drag (which includes additional components as discussed above), and/or to an improvement of the aerodynamic efficiency (defined as the lift-to-drag ratio L/D). For these reasons, the two following experimental studies on the effects of blowing and suction in airfoils are of particular relevance. We first discuss the work by Eto et al. [29], who applied a blowing intensity of 0.14% of $U_\infty$ to the suction side of a Clark-Y airfoil at $Re_c$ = 1,600,000 (where $Re_c$ is the Reynolds number based on $U_\infty$ and wing chord $c$). They observed a local reduction in the skin friction between 20 and 40%, but they also reported an increase in the total drag. On the other hand, Kornilov et al. [30] carried out experiments on a NACA0012 airfoil at $Re_c$ = 700,000, applying blowing and suction over both sides of the airfoil. They confirmed that blowing over the suction side does not reduce the total drag, but they also observed that blowing over the pressure side and suction over the suction side have a beneficial effect, achieving a reduction in total drag of around 10%. This highlights the additional difficulty of performing control in wings, where the various contributions to the total drag are tightly coupled. One of the first high-fidelity simulations of turbulent wings with control was conducted by Vinuesa and Schlatter [31] in 2017. In that work, as well as in more recent studies [32], high-resolution LES was used to study the turbulent flow around a NACA4412 wing section, up to $Re_c$ = 400,000, where different combinations of blowing and suction rates over the suction and pressure sides were applied. Using predetermined active flow control, they achieved a maximum increase in the aerodynamic efficiency of 11% [32]. In Figure 1 we show the effect of applying uniform blowing on the suction side of the wing section, which leads to an increase in boundary-layer thickness and turbulence activity far from the wall. This produces a reduction in the wall-shear stress and an increase in the pressure drag, leading to a higher total drag; note that the opposite holds for uniform suction [32]. Another interesting numerical work is that of Albers et al. [33], who performed LES on an airfoil at $Re_c$ = 400,000 to assess the effect of transversal surface waves, reporting a drag reduction of 7.5%. As discussed below, it is possible to obtain more sophisticated and efficient control strategies by sensing the flow and exploiting all the available information on its state, i.e., by performing reactive flow control.
Figure 1. ... [34] around a NACA4412 wing section at $Re_c$ = 400,000, colored by streamwise velocity ranging from −0.2 (dark blue) to 1.7 (dark red). The yellow line indicates the extent of the control region (with uniform blowing on the suction side), whereas the red one denotes the tripping location [35,36]. Figure extracted from Ref. [32], with permission of the publisher (Springer Nature).
A widely used method of reactive control is the so-called opposition control, where suction and blowing are introduced at the wall with the aim of suppressing the sweep and ejection events in the near-wall region, so as to reduce the skin friction [14]. Essentially, the velocity imposed at the wall $v_w$ should be opposite to the wall-normal velocity $v$ at a certain sensing plane $y_s$, according to the equation:
$$v_w(x, z, t) = -\alpha \left[ v(x, y_s, z, t) - V(t) \right].$$
Note that here $x$, $y$ and $z$ denote the streamwise, wall-normal and spanwise coordinates, $t$ is the time, $\alpha$ is a positive constant and $V$ is the instantaneous wall-normal velocity averaged over the control area. Subtracting this term ensures a zero-net-mass-flux condition at the wall. It is important to note that, through this equation, the control aims at opposing the fluctuations at a certain wall-normal location, which is typically around the near-wall fluctuation peak, i.e., $y_s^+ = 15$. The superscript '+' denotes inner scaling, in terms of the viscous length $\ell^* = \nu/u_\tau$, where $\nu$ is the fluid kinematic viscosity and $u_\tau = \sqrt{\tau_w/\rho}$ is the friction velocity (with $\tau_w$ being the wall-shear stress and $\rho$ the fluid density). The constant $\alpha$ is set empirically, which means that the resulting control law is relatively simple. Despite this simplicity, Stroh et al. [15] reported drag-reduction rates of around 20% in turbulent channels and boundary layers up to $Re_\tau \approx 660$ (which is the friction Reynolds number, based on the 99% boundary-layer thickness $\delta_{99}$ and the friction velocity $u_\tau$). However, it may be possible to obtain more sophisticated control laws by formulating an optimization problem, as discussed in Section 3.
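The opposition-control law described above can be illustrated with a short numerical sketch. This is not the implementation used in the cited studies: the function name `opposition_control`, the array layout (streamwise by spanwise samples of the wall-normal velocity on the sensing plane), the uniform grid (so that the area average reduces to an arithmetic mean), the synthetic random field and the gain α = 1 are all assumptions made for illustration.

```python
import numpy as np


def opposition_control(v_sensed: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Return the wall velocity v_w = -alpha * (v - V).

    v_sensed: wall-normal velocity sampled on the sensing plane y = y_s,
              shape (nx, nz); a uniform grid is assumed, so the area
              average V is a plain arithmetic mean.
    alpha:    positive gain (set empirically according to the text).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    v_mean = v_sensed.mean()  # V: average over the control area
    return -alpha * (v_sensed - v_mean)


# Synthetic sensing-plane data standing in for a flow snapshot (assumption).
rng = np.random.default_rng(0)
v_plane = rng.normal(loc=0.05, scale=0.2, size=(32, 16))
v_wall = opposition_control(v_plane, alpha=1.0)

# Subtracting V makes the actuation average to zero over the control area.
net_mass_flux = v_wall.mean()
```

In this sketch, `net_mass_flux` vanishes up to round-off, which is the zero-net-mass-flux condition mentioned in the text, while each wall value has the opposite sign of the sensed fluctuation.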

Active Control of Separation
Several studies exploring the capabilities of active flow control (AFC) actuators in high-lift devices with massive separation (i.e., suction and blowing, sweeping jets, fluidic oscillators, plasma actuators, synthetic jets) have been reported in the literature. A brief review of the state of the art of active flow control techniques for civil aircraft was carried out by Batikh et al. [37]. Khodadoust and Washburn [38] conducted wind-tunnel measurements on a high-lift device fitted with AFC actuators. They observed that the application of a small amount of suction and blowing increased the lift performance. Kühn et al. [39] simulated a 3D high-lift wing with constant blowing using RANS. The results showed that blowing can be beneficial to suppress massive separation in the flap. Radespiel et al. [40] reviewed different techniques for AFC using constant blowing, showing that tangential blowing can be promising when increasing the lift at high angles of attack. Fricke et al. [41] used RANS to simulate AFC by means of pulsed blowing to control flow separation at the wing-engine junction. Later, Schloesser et al. [42] conducted experimental investigations in the same configuration. Their results showed that AFC successfully suppressed the flow separation with a lift increase and that the results are independent of the Reynolds and Mach numbers. Hue et al. [43] simulated constant and pulsed blowing devices using RANS and observed gains of up to 3% in the lift, together with a delay of the separation induced by the nacelle. Fluidic actuators placed on the tail of an aircraft were simulated using unsteady RANS and validated by means of experimental results by Shmilovich et al. [44]. More recently, Andino et al. [45] tested fluidic actuators in a generic tail at low speeds and demonstrated that a modest increase in the momentum coefficient can result in important increments of the side force. Whalen et al. [46] presented wind-tunnel test results of the AFC of the vertical tail of a Boeing 757 equipped with sweeping jet actuators; a significant increase in the side force at a maximum rudder deflection of 30° was observed. Other works involving the use of sweeping jets can be found in Refs. [47,48]. The effectiveness of microjets in drag reduction was experimentally studied by Aley et al. [49] in a simplified 2D wing; a significant reduction in the wake velocity deficit and, thus, in the drag was observed when using microjet actuation.
To conclude this section, we focus on the application of synthetic jets with zero net mass flux as a promising technique for the AFC of wings. In these devices, the fluid necessary to alter the boundary layer is intermittently injected through an orifice driven by the motion of a diaphragm located on a sealed cavity below the surface [50]. Indeed, synthetic jets have been shown to succeed at reducing the fuel burnt during take-off and landing operations [51]. In the context of synthetic jets for AFC, there have been significant advances in the past years in airfoils (see, for instance, Refs. [52][53][54][55]). However, whether they can be implemented on a full aircraft is still a subject of investigation. Recently, Jabbal et al. [56] analyzed different system architectures for AFC for full-scale civil aircraft in terms of efficiency, power requirements, and integration issues. They concluded that synthetic jets might be useful to control separation in short-duration operations. Shmilovich and Yadlin [57] studied different AFC strategies of a high-lift profile in the conditions of take-off and landing using RANS. Bauer et al. [58] conducted experiments on a two-element wing with unsteady AFC near the leading edge and showed that stall can be delayed. Lin et al. [59] addressed different strategies in the flap of a high-lift profile comprising steady suction, blowing, and periodic excitation of the boundary layer. Several of these AFC strategies are planned to be tested experimentally by NASA for increasing lift-to-drag ratios (L/D) in take-off configurations [60]. Although most of the numerical studies conducted so far have been performed using RANS, Jansen et al. [61] showed, by comparing experimental and numerical results, that delayed detached-eddy simulations are useful in the analysis of the effect of a synthetic jet on the flow field of a tail at $Re_c$ = 350,000. Finally, Lehmkuhl et al. [62] have studied the aerodynamic performance of active flow control on wings using synthetic jets with zero net mass flux by means of wall-modeled large-eddy simulations; see Figure 2. The performance of synthetic jets was evaluated for the high-lift configuration of the JAXA Standard Model at a realistic Reynolds number for landing, $Re_c = 1.96 \times 10^6$. The results show that, at high angles of attack, the control successfully eliminates the laminar/turbulent recirculations located downstream of the actuator, thus increasing the aerodynamic performance.

Turbulence Simulation Approaches
The use of computational fluid dynamics (CFD) for external aerodynamic applications has been a key tool for aircraft design in the modern aerospace industry [63][64][65]. CFD methodologies with an increasing functionality and performance have greatly improved our understanding and predictive capabilities of complex flows. These improvements suggest that the design of novel and highly reliable control strategies via CFD may soon be a reality. The fully virtual design of flow-control strategies is expected to limit the number of required wind-tunnel tests, reducing both the turnover time and cost of the design cycle [66,67]. However, flow predictions from the state-of-the-art CFD solvers are still unable to comply with the stringent accuracy requirements and computational efficiency demanded by the industry [68]. These limitations are imposed, largely, by the ubiquity of turbulence [69]. To tackle current challenges and encourage further advances in CFD, simulation of an aircraft configuration across the full flight envelope has been posed as one of the Grand Challenge Problems in the recent NASA CFD Vision 2030 [68].
From the early days of industrial CFD to present times, the treatment of turbulence has mostly been based on closure models for the Reynolds-averaged Navier-Stokes (RANS) equations. The approach appears in different flavors: from pure RANS solutions to hybrid methods, such as the detached-eddy simulation and its variants [70,71]. In the latter, RANS is utilized close to the wall, whereas the outer layer is modeled via eddy-resolving methodologies. Many RANS models (and their variants) have been devised to overcome the limitations of their predecessors, usually by expanding and calibrating their coefficients to account for missing physics. Despite the reliance of RANS-based approaches on tunable parameters and empirical correlations, they have dominated the CFD industry for external aerodynamic applications, including commercial aviation [72].
The sophistication of RANS closure models has increased over time [71]. However, no practical model has emerged as a competent approach across the broad range of flow regimes of interest to the industry. The latter encompass separated flows, afterbodies, mean flow three-dimensionality, shock waves, aerodynamic noise, fine-scale mixing, laminar-to-turbulent transition, etc. In these scenarios, RANS predictions tend to be inconsistent and unreliable, especially for geometries and conditions representative of the flight envelope of commercial airplanes. An example of such deficiencies is the prediction of the onset and extent of three-dimensional separated flow in wing-fuselage junctures, in which RANS-based approaches have shown poor performance [72,73]. The RANS accuracy is also known to decline in aeroacoustic noise and vibration predictions for transonic airfoils [72]. Additional CFD experience with aircraft at high angles of attack has revealed that RANS-based solvers have difficulty in predicting maximum lift and the corresponding angle of attack, along with the physical mechanisms for stall. This was highlighted in the third AIAA CFD High Lift Prediction Workshop [74], where RANS solutions exhibited a significant scatter in the lift, drag, and pitching moment near stall.
Recently, large-eddy simulation (LES) has gained momentum as a tool for both research and industrial applications. In LES, the large eddies containing most of the energy are directly resolved, whereas the dissipative effect of the small scales is accounted for by a subgrid-scale (SGS) model. Additionally, if the near-wall flow is also modeled (i.e., wall modeling) such that only the large-scale motions in the outer region of the boundary layer are resolved, the grid-point requirements for this wall-modeled LES (WMLES) scale, at most, linearly with an increasing Reynolds number [75]. The cost-efficiency of WMLES and its demonstrated predictive capabilities over the last decade make this approach a realistic contender to overcome the deficiencies of RANS-based methodologies.
Several strategies for modeling the near-wall region in LES have appeared in the literature, and comprehensive reviews can be found in Piomelli and Balaras [76], Cabot and Moin [77], Larsson et al. [78], and the most recent review by Bose and Park [79]. Most wall models utilize, as the input, the LES solution at a given location in the LES domain, and return the wall heat and momentum fluxes needed by the LES solver. Among the most widespread approaches are those computing the wall stress using either the law of the wall [80][81][82] or simplified RANS equations [83][84][85][86][87][88][89][90], whereas recent advances in wall modeling are rooted in mathematical and physical principles completely free of RANS empiricism [91][92][93].
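As an illustration of the simplest family of wall models mentioned above, those based on the law of the wall, the following sketch recovers the friction velocity and the wall stress from a single velocity sample taken in the LES domain. It is a minimal example under stated assumptions, not the formulation of any of the cited models: the logarithmic law $u/u_\tau = (1/\kappa)\ln(y u_\tau/\nu) + B$ with $\kappa = 0.41$ and $B = 5.2$ is assumed to hold at the sampling height, the function names are hypothetical, and the numerical values are arbitrary.

```python
import math


def friction_velocity_log_law(u: float, y: float, nu: float,
                              kappa: float = 0.41, B: float = 5.2,
                              tol: float = 1e-10, max_iter: int = 100) -> float:
    """Solve u/u_tau = ln(y*u_tau/nu)/kappa + B for u_tau with Newton's method.

    u:  wall-parallel velocity sampled at height y (assumed in the log layer)
    nu: kinematic viscosity
    """
    u_tau = math.sqrt(nu * u / y)  # initial guess from the linear (viscous) law
    for _ in range(max_iter):
        log_term = math.log(y * u_tau / nu) / kappa + B
        f = u_tau * log_term - u          # residual of the log law
        df = log_term + 1.0 / kappa       # d f / d u_tau
        step = f / df
        u_tau -= step
        if abs(step) < tol * u_tau:
            break
    return u_tau


def wall_shear_stress(u: float, y: float, nu: float, rho: float) -> float:
    """Wall stress tau_w = rho * u_tau**2 returned to the LES solver."""
    return rho * friction_velocity_log_law(u, y, nu) ** 2


# Arbitrary sample: air-like properties, velocity taken 1 cm above the wall.
tau_w = wall_shear_stress(u=0.74, y=0.01, nu=1e-5, rho=1.2)
```

The returned value plays the role of the momentum flux that the wall model supplies to the LES solver as a boundary condition; a wall heat flux would require an analogous temperature law, which is not sketched here.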
Advances in machine learning and data science have also incited new efforts to complement the existing turbulence-modeling approaches in the WMLES community. One of the first attempts at using supervised machine learning for WMLES can be found in Yang et al. [94], who proposed a physics-informed neural-network (PINN) model to predict the wall stress in turbulent channel flows. Recently, Lozano-Durán and Bae [95] formulated a wall model using building block units (such as turbulent channel flows, ducts, and separation bubbles), which provides a classification of the flow and confidence in the prediction. The model was validated in a realistic aircraft with trailing-edge separation.
Radhakrishnan et al. [96] formulated a wall model using gradient-boosted decision trees and predicted the wall shear stress in a turbulent channel flow and a wall-mounted bump. Along these lines, Eivazi et al. [97] have recently reported efforts towards RANS modeling by means of PINNs.
According to the NASA Vision 2030 report [68], hybrid RANS/LES and WMLES are identified as the most viable approaches for predicting realistic flows at high Reynolds numbers in external aerodynamics. As such, both hybrid RANS/LES and WMLES will be instrumental in the development of control strategies for realistic external aerodynamic applications, as shown in Figure 3.

Data-Driven Methods for Control and Deep Reinforcement Learning
As discussed above, the flow around wings is very complex, and it is difficult to devise efficient control strategies to optimize the aerodynamic efficiency, even when having access to flow information in real time, as in the case of opposition control. One approach to obtain more efficient control strategies is to formulate an optimization problem aimed at, e.g., minimizing the drag or maximizing the aerodynamic efficiency. There have been several data-driven approaches to achieve this for flow control; for instance, in the context of genetic programming [98]. Genetic programming (GP) is based on automatically choosing the terms in a symbolic equation through the evolution and selection of the best candidates, a fact that ensures the interpretability of this method [99,100], although the formula obtained can be deeply nested and complex. The GP approach has been successfully employed for the control of external flows by Li et al. [101] and Minelli et al. [102]. Another interesting data-driven approach is Bayesian regression based on Gaussian processes [103], which was employed by Morita et al. [104] in CFD optimization, and by Mahfoze et al. [105] to identify the best combination of control region length and blowing amplitude to maximize the energy savings, also including intermittent control regions. Note that these authors also took into account the data by Kornilov and Boiko [106] to formulate a more realistic estimate of the power consumption by blowing, and they reported a net-energy saving of around 5%. It is interesting to note that other data-driven methods may help to model the near-wall region and, consequently, may provide novel avenues for improved flow control [107][108][109][110][111].
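To make the idea of Bayesian regression with Gaussian processes more concrete, the following sketch fits a Gaussian-process surrogate to a handful of evaluations of an objective and uses it to suggest the next control parameter to test. It is only an illustration under stated assumptions and does not reproduce the setups of the cited works: the objective `synthetic_saving` is a made-up function standing in for an expensive CFD evaluation, the control parameter is a normalized amplitude in [0, 1], the kernel hyper-parameters are fixed by hand, and the upper-confidence-bound rule is one possible acquisition choice among many.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of 1D points."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean Gaussian process."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(x_train.size)
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(x_query, x_query))
    std = np.sqrt(np.clip(prior_var - np.sum(v**2, axis=0), 0.0, None))
    return mean, std


def synthetic_saving(amplitude):
    """Made-up objective (assumption), standing in for a costly simulation."""
    return np.sin(3.0 * amplitude) * (1.0 - amplitude)


# A few "simulations" already available at normalized amplitudes in [0, 1].
x_train = np.linspace(0.0, 1.0, 5)
y_train = synthetic_saving(x_train)

# Surrogate prediction on a fine grid and selection of the next candidate.
x_query = np.linspace(0.0, 1.0, 101)
mean, std = gp_posterior(x_train, y_train, x_query)
ucb = mean + 2.0 * std                    # favors high mean or high uncertainty
next_amplitude = x_query[np.argmax(ucb)]
```

In an actual optimization loop the objective would be evaluated at `next_amplitude`, the new sample appended to the training set, and the procedure repeated until the evaluation budget is exhausted.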
One very promising data-driven approach to flow control is deep reinforcement learning (DRL), which we will focus on in the following. In DRL, an agent (usually built based on a neural network, NN) interacts with an environment (the flow) in a closed loop. At each time $t$, the agent receives a partial observation of the environment $o_t$, which is used to choose an action $a_t$ that will influence the evolution of the environment. The agent periodically receives a reward $r_t$, which indicates the quality of those actions under a certain norm. The goal of DRL is to find an optimal decision policy $\pi$ from which the action is derived, i.e., $a_t = \pi(o_t)$, such that the cumulative reward is maximized. This process is summarized in Figure 4; the DRL algorithm learns by interacting with the environment and gathering experience [112]. A pioneering work in this direction in the context of instability control in fluid mechanics was conducted by Rabault et al. [113], who used DRL to optimize the actuation from two jets on a two-dimensional cylinder flow. This resulted in significant drag-reduction rates using synthetic control jets blowing with a very low mass-flow-rate intensity (typically a fraction of a percent of the incoming mass flow rate intersecting the cylinder). We want to highlight that DRL is a very promising approach to discover novel and potentially more efficient control strategies that go beyond classical control, since (i) it does not make any assumptions of the properties of the system, except for the ability to establish a closed-loop control and to extract a reward signal, and (ii) it takes advantage of the efficiency of NNs at representing complex, nonlinear functions following their universal approximator property [114].
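The closed loop between agent and environment can be sketched in a few lines. The example below is a toy illustration and not a flow solver: the class `ToyEnvironment` (a scalar, linearly unstable system), its reward, and the hand-written linear policy are all hypothetical stand-ins, and, unlike in the flow-control setting, the observation here coincides with the full state. No learning takes place; the sketch only shows how observations, actions and rewards are exchanged and accumulated.

```python
class ToyEnvironment:
    """Hypothetical stand-in for the flow: x_{t+1} = a*x_t + b*a_t, unstable for |a| > 1."""

    def __init__(self, a: float = 1.05, b: float = 0.5, x0: float = 1.0):
        self.a, self.b, self.x0 = a, b, x0
        self.x = x0

    def reset(self) -> float:
        self.x = self.x0
        return self.x  # initial observation o_0

    def step(self, action: float):
        self.x = self.a * self.x + self.b * action
        # Reward penalizes both the perturbation and the actuation effort.
        reward = -(self.x ** 2) - 0.01 * action ** 2
        return self.x, reward  # next observation and reward r_t


def rollout(env, policy, n_steps: int = 50) -> float:
    """Run one episode of the loop o_t -> a_t = pi(o_t) -> r_t; return the cumulative reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(obs)
        obs, reward = env.step(action)
        total_reward += reward
    return total_reward


# Two fixed policies: a simple proportional feedback and no actuation at all.
controlled = rollout(ToyEnvironment(), lambda o: -1.5 * o)
uncontrolled = rollout(ToyEnvironment(), lambda o: 0.0)
```

In a DRL setting the lambda functions would be replaced by a neural-network policy whose weights are adjusted between episodes so as to increase the cumulative reward returned by `rollout`.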
The work by Rabault et al. [113] employed the proximal policy optimization (PPO) algorithm [115], which is an actor-critic policy gradient algorithm. The PPO algorithm is simpler and faster than other similar techniques, such as the trust region policy optimization (TRPO) methods [115], and it requires relatively little hyper-parameter tuning. Furthermore, it is more suitable for continuous control than the deep Q-network (DQN) learning [116], as well as its variations [117]. A detailed overview of the PPO algorithm is beyond the scope of this review; the reader is referred to either the initial PPO paper [115] or to fluid-mechanics-focused reviews of the DRL and PPO methods [118,119]. However, the main lines of the PPO algorithm are as follows. The PPO method is episode-based, i.e., it learns from performing active control for a limited amount of time before analyzing the obtained results and continuing with the learning process in a new episode. The learning problem is aimed at iteratively training (i.e., finding the weights of) the policy network. Denoting the set of weights of the policy NN by $\Theta$, the aim is, therefore, to maximize the long-term discounted reward function $R(t) = \sum_t \gamma^t r_t$, where $\gamma$ is a discount factor (usually in the range [0.95, 0.99]), formulated as finding:
$$\Theta^* = \arg\max_{\Theta} \, \mathbb{E}\left[ \sum_t \gamma^t r_t(s_t) \,\middle|\, \pi_\Theta \right],$$
where $\pi_\Theta$ is the policy function described by the neural network with weights $\Theta$, and $s_t$ is the (hidden) state of the system. In the present context, $s_t$ would correspond to the complete flow information, whereas the limited observations $o_t$ would be obtained from sensors. This maximization problem is solved by means of a gradient descent performed on the weights $\Theta$ of the network following experimental sampling of the system through interaction with the environment.
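The discounted reward that the training seeks to maximize is straightforward to evaluate from the rewards collected during one episode. The sketch below is a minimal illustration, with hypothetical function names and an arbitrary reward sequence; it computes the discounted sum over a whole episode and, in addition, the discounted "reward-to-go" from each time step, a quantity commonly used when estimating policy gradients (the latter is included as an assumption about typical practice, not as a description of the cited implementations).

```python
import numpy as np


def discounted_return(rewards, gamma: float = 0.99) -> float:
    """Sum of gamma**t * r_t over one episode."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(rewards.size)
    return float(np.sum(discounts * rewards))


def rewards_to_go(rewards, gamma: float = 0.99) -> np.ndarray:
    """Discounted sum of the rewards collected from each step t onwards."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Arbitrary episode with a constant reward, for illustration only.
episode_rewards = [1.0, 1.0, 1.0]
total = discounted_return(episode_rewards, gamma=0.5)   # 1 + 0.5 + 0.25
per_step = rewards_to_go(episode_rewards, gamma=0.5)
```

A discount factor close to one, as in the range quoted above, weights distant rewards almost as much as immediate ones, which matters when the effect of an actuation on the drag only becomes visible after a delay.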
Following the initial work controlling the vortex shedding in a cylinder wake, a number of further refinements and applications have proven the potential of the method for the control of flow instabilities. Bucci et al. [120] successfully applied DRL to control chaotic systems, such as the Kuramoto-Sivashinsky equation. Paris et al. [121] investigated how sensors providing an overview of the state of the system to the DRL agent can be placed optimally. Beintema et al. [122] demonstrated efficient control of the Rayleigh-Bénard instability in a 2D channel. Tang et al. [123] proved through numerical simulations that DRL is able to perform robust control over a range of inflow conditions. Xu et al. [124] investigated in a simulation how small counter-rotating cylinders can be used to reduce the drag behind a cylinder, whereas Fan et al. [125] provided an experimental demonstration of the technique. Finally, Ren et al. [126] pushed the value of the Reynolds number to a weakly turbulent regime and demonstrated that DRL can control fluid motion in the turbulent regime.
While research articles have mostly focused on relatively simple flow configurations so far, as these are the easiest to tackle computationally for both the CFD and the DRL agent, there are a number of possible refinements in the use of the PPO algorithm that also make it a promising method for controlling more complex, 3D cases. Firstly, the PPO algorithm is able to sample data from several independent environments when performing learning. Therefore, one can effectively parallelize the DRL training by using many CFD simulations running in parallel, as was presented by Rabault and Kuhnle [127]. This allows us to drastically accelerate the training. In [127], speed-ups of up to 20 were reported (though the more complex the system is to control, the higher the speed-up attainable using this technique). This allows us to perform reasonably fast PPO training, even in cases when the underlying environment is difficult to speed up. For example, in the case of the CFD-based environment, this allows us to scale the training to a number of compute cores $N \times M$, where $N$ is the number of simulations run in parallel and $M$ is the maximum optimal parallelization of the CFD simulation itself. Secondly, it is possible to formulate the learning problem in such a way as to take advantage of invariants in the physical system that is undergoing control, as was illustrated by Belus et al. [128]. In their work, Belus et al. [128] formulated the learning problem as a self-collaborative interaction between several clones of the same environment, as shown in Figure 5. This, in turn, allows the DRL agent to combine the information obtained at many locations that follow the same physical rules into a single policy. Belus et al. [128] argue on theoretical grounds and show empirically that, without such a technique, performing learning on a system with $N_o$ outputs has a cost that scales as $C^{N_o}$, where $C$ is the cost of training for a single output. This becomes prohibitively expensive as the number of outputs increases. Belus et al. [128] then demonstrated empirically that, by contrast, using the approach presented in Figure 5 allows us to perform training at a constant cost, independently of the number of outputs, as long as the control law to learn is similar at all outputs. This is a critical enabling factor for the application of DRL to realistic configurations, where many similar control outputs will be distributed across the physical system to control.
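The parallel collection of experience from several independent environments can be sketched as follows. This is a simplified illustration and not the framework of Ref. [127]: the function `run_episode` is a hypothetical stand-in for one CFD episode (a noisy scalar system with a fixed feedback policy), Python threads replace what would in practice be separate simulation processes or cluster jobs, and the values of $N$ and $M$ are arbitrary.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_episode(seed: int, n_steps: int = 20):
    """Hypothetical stand-in for one CFD episode.

    Returns a list of (observation, action, reward) samples.
    """
    rng = random.Random(seed)          # independent random stream per environment
    obs = rng.uniform(-1.0, 1.0)
    trajectory = []
    for _ in range(n_steps):
        action = -0.5 * obs            # fixed policy shared by all environments
        reward = -obs ** 2
        trajectory.append((obs, action, reward))
        obs = 0.9 * obs + action + rng.gauss(0.0, 0.01)
    return trajectory


def collect_parallel(n_envs: int, n_steps: int = 20):
    """Run n_envs independent episodes concurrently and pool their samples."""
    with ThreadPoolExecutor(max_workers=n_envs) as pool:
        trajectories = list(pool.map(lambda s: run_episode(s, n_steps), range(n_envs)))
    return [sample for traj in trajectories for sample in traj]


N = 4   # simulations run in parallel (arbitrary)
M = 8   # cores used by each simulation (arbitrary)
batch = collect_parallel(N)
total_cores = N * M
```

The pooled `batch` is what a single policy update would consume; since all environments share the same policy and differ only in their random stream, adding environments increases the amount of experience per unit of wall-clock time without changing the learning problem.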

Conclusions and Outlook
Machine-learning-based control methods are an exciting set of techniques that have recently received considerable attention for performing active flow control. This spike in interest follows both increases in computational power and the development of algorithms that can learn effectively through direct interaction with black-box, complex systems. These ML methods follow a completely different approach compared with how flow control strategies have usually been designed. Instead of performing a local analysis of the flow properties by considering the flow equations and using advanced mathematical and analytical tools to find optimal perturbations, ML techniques discover a control strategy through a trial-and-error approach. There are a number of promising methods that belong to the ML family of control algorithms, including, for example, genetic programming (GP) and deep reinforcement learning (DRL). In this review, we focused on DRL methods in particular, and we discussed how recent works indicate that they can be efficiently used for controlling large, complex, non-linear systems arising from control tasks in fluid mechanics.
The efficiency of DRL has been demonstrated in a number of active flow control situations so far, and the fluid mechanics community is progressively tackling more and more complex flow configurations. In particular, recent works are pushing the use of DRL for flow control into intermediate Re values, leading to more non-linear and more complex flows, so far successfully. The next steps will be to demonstrate the DRL control of complex 3D CFD simulations and to further increase Re to reach fully turbulent conditions. While this will pose new challenges to the DRL method due to the inherent increase in complexity compared with the configurations studied so far, a number of preliminary works indicate that DRL is well adapted to controlling complex systems with a large number of control locations, and that the inherent parallelism present in the DRL experience sampling process can offer large speed-ups on complex dynamical systems.
This push to more complex systems represents not only a scientific, but also a technical endeavor. Indeed, applying DRL to 3D flow control at moderate to high Re will pose a number of technical challenges regarding the amount of CFD computational power required, the ability to handle large amounts of data, and the coupling of CFD and DRL codes that were designed independently of each other at a time when the ability to couple them was not yet foreseen. All these aspects put tough requirements on the level of both expert knowledge (few people are experts in both DRL and large-scale CFD) and general technical expertise (combining several different complex software stacks into a single system, and deploying this in HPC environments). In our opinion, these technical challenges, rather than fundamental issues, are presently the main limiting factor for applying ML control to active flow control. A possible way out of this challenge is to follow the example set by the ML community and to adopt a resolute open-source release policy of codes, scripts, tutorials, and trained networks to reduce the barrier to entry for new groups joining this research direction.
Author Contributions: All authors contributed towards the ideation, writing of the original draft, and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.