Real-Time Minimization of Mechanical Specific Energy with Multivariable Extremum Seeking

Drilling more efficiently and with less non-productive time (NPT) is one of the key enablers to reduce field development costs. In this work, we investigate the application of a data-driven optimization method called extremum seeking (ES) to achieve more efficient and safe drilling through automatic real-time minimization of the mechanical specific energy (MSE). The ES algorithm gathers information about the current downhole conditions by performing small tests with the applied weight on bit (WOB) and drill string rotational rate (RPM) while drilling and automatically implements optimization actions based on the test results. The ES method does not require an a priori model of the drilling process and can thus be applied even in instances when sufficiently accurate drilling models are not available. The proposed algorithm can handle various drilling constraints related to drilling dysfunctions and hardware limitations. The algorithm’s performance is demonstrated by simulations, where the algorithm successfully finds and maintains the optimal WOB and RPM while adhering to drilling constraints in various settings. The simulations show that the ES method is able to track changes in the optimal WOB and RPM corresponding to changes in the drilled formation. As demonstrated in the simulation scenarios, the overall improvements in rate of penetration (ROP) can be up to 20–170%, depending on the initial guess of the optimal WOB and RPM obtained from e.g., a drill-off test or a potentially inaccurate model. The presented algorithm is supplied with specific design choices and tuning considerations that facilitate its simple and efficient use in drilling applications.


Introduction
Drilling a petroleum well is a complicated process with a multitude of factors that affect the drilling efficiency. Because of the high costs associated with well construction, the industry has for more than a century sought to improve drilling performance, in particular through automation and mechanization; a process which has been traced by Eustes [1]. The current state of drilling automation mainly consists of separate functionalities that can aid the driller by performing tasks like providing envelope control [2,3], fault detection [4,5], vibration mitigation [6,7] or selection of the best suited weight on bit (WOB) and drill string rotational rate (RPM) for rate of penetration (ROP) optimization [8,9]. The focus of this study is on developing an automatic system for real-time drilling optimization that automatically seeks out and maintains the WOB and RPM resulting in optimal and safe drilling for the current downhole conditions.
To apply any automated algorithm to drill more efficiently, an objective function is needed to quantify what is meant by optimal drilling conditions. In this work, we employ the mechanical specific energy (MSE) as the objective function to be minimized. The MSE is a measure of the energy required to excavate a unit volume of rock and can be expressed as a ratio between the rate of energy usage to the rate of penetration [10], which provides a relative measure of the drilling efficiency [11]. The MSE is strongly dependent on the relationship between the ROP and the applied WOB and RPM. It is expected that for a certain region of WOB and RPM values, the bit will drill at peak efficiency [12]. Increasing the WOB or RPM inside the efficient drilling region will result in corresponding proportional gains in the ROP, while the MSE decreases or stays constant. At some threshold value, often referred to as the founder point, further increases in WOB or RPM will no longer yield a proportional response in ROP. The lower than expected response in ROP is caused by a drilling dysfunction such as vibrations, bit-or bottomhole balling, which reduces the drilling efficiency and drastically increases the MSE. The founder point can therefore be identified as the combination of WOB and RPM that corresponds to the minimum MSE. If there is no specific operating point that results in minimal MSE, but rather a range of WOB and RPM values at which the MSE is minimal and nearly constant, the founder point can be identified by increasing the WOB and RPM until the MSE starts to grow [12].
It is important to note that drilling at the founder point results in high ROP and the most energy-efficient drilling, but moderately higher ROP can in most cases be obtained by increasing the WOB and/or RPM somewhat past the point of founder. Drilling with dysfunctions can however be deleterious for the bit, downhole tools and borehole quality [12,13], which can result in equipment wear and NPT by having to pull the bit prematurely [14]. The ROP that is achieved when the MSE is at its minimal value is therefore the maximal "good ROP" that can be attained without re-engineering drilling equipment or procedures [11].
In addition to drilling dysfunctions that should be avoided, there are also process constraints that the driller or an algorithm controlling the drilling must adhere to. Drilling at the founder point might not be feasible because of process constraints such as a maximal allowable ROP related to hole cleaning, an upper limit on the WOB to prevent bit damage or top-side energy constraints. In these constrained cases, the authors consider the optimal drilling conditions to be at the smallest MSE value that can be attained without violating the process constraints.
Selecting the optimal WOB and RPM is not a trivial task. Available drilling models might not be accurate enough in predicting the relationship between the ROP and related drilling parameters [15,16]. Varying downhole conditions such as changes in pore pressure or formation properties as well as degradation of the bit teeth/cutters can alter drilling efficiency so that the combination of WOB and RPM that was optimal a short time ago might no longer be the best solution. Historically, designated testing procedures like the Drill-off test [12] or five-point test [17] have been used to empirically explore how the ROP responds to various combinations of WOB and RPM. The downside of this type of "one-time testing" is that the results are only valid for the current downhole conditions, and as soon as the conditions change, the test will have to be repeated.
An alternative to optimization based on models and on "one-time testing" are approaches employing "testing on the fly". In these approaches, the relation between the WOB and/or RPM and an objective function such as the ROP or MSE is explored by performing tests while drilling ahead and selecting more optimal WOB and RPM based on the obtained information. As the downhole conditions change, the repeated tests can identify how the WOB and RPM should be adjusted to drill more efficiently, given the new circumstances. Rommetveit et al. [18] describe an approach of making changes in the WOB and RPM to gather information on how the ROP reacts to these changes. The gathered information can then be used to generate recommendations for the driller or for closed loop control by an optimization algorithm [18]. An automated golden search algorithm that varies the WOB to identify drilling with minimal MSE has been tested on a lab-scale drilling rig [19]. Field trials of advisory systems that can suggest variations in the applied WOB and RPM to search for the drilling conditions that yield the lowest MSE have been described in [20,21]. In recent years, several authors have investigated a data-driven Energies 2021, 14, 1298 3 of 35 method called extremum seeking (ES) for drilling optimization. This method relies on continuous testing and optimization based on the test results. Banks [22] explored single variable ES to minimize the MSE with a laboratory drill rig. Aarsnes et al. [23] showed with simulations that ES can be used to seek out the optimal WOB to drill with. A method for adhering to process constraints while optimizing the applied WOB with ES has also been investigated [24]. A drilling optimization system that employs multivariable ES has been tested in the field with good results [25], although no specific details on the algorithm have been provided in that paper.
Extremum seeking is a model-free control algorithm that provides a framework for automatically conducting small tests of the current operating conditions and adapting to the results of the tests to optimize the process. ES has previously been utilized in a variety of engineering systems; an extensive list is provided by Tan et al. [26]. In the context of drilling optimization, the ES algorithm can be employed to find the combination of WOB and RPM which minimizes the MSE (or some other objective function). While drilling ahead, small periodic variations in the WOB and RPM are automatically implemented by the algorithm to test the current drilling conditions. How the MSE responds to these variations is calculated and logged from real-time measurements of the relevant drilling parameters. This generates a local linear "map" of how the MSE is related to the WOB and RPM, which is used by the ES algorithm to make small adjustments in the WOB and RPM in the direction that lowers the MSE. By iteratively performing this procedure of testing and adapting to the results, the WOB and RPM will be steered to the values which result in drilling with minimal MSE. As new tests are performed and new data is recorded, older measurements are discarded from the analysis so that the information used by the algorithm is up to date and representative of the current downhole conditions. In this way, the algorithm will be able to adapt to downhole changes like drilling into a new formation where new values of WOB and RPM might be more beneficial to drill with.
The main advantage of applying the ES method for drilling optimization is that it is model-free, and therefore requires limited a-priori knowledge about the current drilling environment to be employed. When using models to predict how to drill optimally [8,9,15,16,27], the models need to be tuned based on data that is representative of the current downhole conditions. When the conditions change, the models will no longer be valid before they are re-tuned to the new circumstances, which can limit their applicability for real-time optimization. Nevertheless, the drilling models are still a valuable tool that can be combined with data-driven approaches such as Extremum Seeking. The models can provide an initial estimate of the optimal WOB and RPM to drill with, which the ES method can use as a starting point to further improve the drilling efficiency.
In this paper, we present a multivariable ES algorithm that automatically adjusts the WOB and RPM to reach drilling with a minimal MSE value. Although an application of multivariable ES to drilling was presented in [25] with successful field trials, limited details of the algorithm were provided. The algorithm presented in our paper is given in detail with a description of specific design choices and tuning considerations that lead to its simple and efficient use for drilling applications. In addition to that, the presented algorithm can automatically handle operational constraints relevant to safe drilling. The paper details several options on how this functionality can be implemented. Finally, to test the algorithm, a new qualitative model that links the ROP, WOB, RPM and Torque as well as drilling dysfunctions is presented. Without dysfunctions, the model coincides with the drilling model developed by Detournay et al. [28]. This combined model is qualitative when it comes to modelling the dysfunction effects. Yet, it represents phenomena observed in field operations where drilling with dysfunctions result in reduced ROP and high MSE [12][13][14], and can be utilized for testing of ES algorithms as well as other data-driven (model-free) drilling optimization approaches.
The remainder of the paper is organized in the following way: in Section 2, we formulate the challenge of achieving safe and efficient drilling as an optimization problem and present models that qualitatively describe the relations between the drilling efficiency Energies 2021, 14, 1298 4 of 35 in terms of MSE, drilling dysfunctions and operational constraints. These models will be used for testing the proposed algorithm in a simulation environment. In Section 3, the multivariable extremum seeking method and different techniques for constraint handling are detailed together with practical aspects on how to apply and tune the algorithm. Section 4 presents simulation results that demonstrate the performance of the proposed algorithm and highlight its properties. Section 5 contains a discussion of the results of the study, while Section 6 presents conclusions and directions for further work.

Safe and Efficient Drilling as an Optimization Problem
The overall goal in drilling optimization (when it comes to mechanical aspects of drilling) is to ensure WOB and RPM that result in drilling that is both safe for the onsite personnel and drilling equipment (including wear minimization) and provides high efficiency. To achieve this goal, the concept of MSE can be used as a performance index to identify the most efficient drilling conditions, which will generate high ROP without exposing the bit and downhole tools to excessive vibrations. The latter can accelerate equipment wear and reduce the ROP.
Although it is theoretically possible to develop accurate models describing both the rock cutting process and various dysfunctions (e.g., using bit-rock interaction models [28] and advanced proprietary drill string models [14]), such models can be of limited value for real-time drilling optimization. They require detailed knowledge of downhole conditions like mechanical rock properties, the current bit wear state and formation characteristics such as heterogeneity, anisotropy and interbedding [14,28], parameters that change over time and are hard, if possible at all, to measure while drilling. Field experience do however show that at certain combinations of WOB and RPM, downhole vibrations that can be detrimental to the ROP and drilling equipment do occur [9,14,29]. Situations where the drilling efficiency is hampered by vibrations should therefore be accounted for in any optimization approach that attempts to seek out the optimal WOB and RPM to drill with.
To study drilling optimization in the presence of vibrational effects, we have chosen an approach which qualitatively includes vibrational dysfunctions into a drilling model for polycrystalline diamond compact (PDC) bits [28], and refer to this combined model as the extended model. The extended model accounts for vibrations by reducing the ROP and thus the drilling efficiency when drilling with combinations of WOB and RPM that places the operation in regions with expected vibrations. The extended model is qualitative when it comes to modelling the dysfunction effects. Yet, it represents phenomena observed in field operations where drilling with dysfunctions result in reduced ROP and high MSE [12][13][14]. When applying static models to replicate the bit/rock interaction, as is commonly done in the literature [28,30], the model variables such as the WOB, RPM and ROP need to be averaged over a suitable time-window for the model to be representative [28]. The same logic is applied in the extended model; it will not capture the dynamics of the dysfunction effects, but it will on average qualitatively represent drilling responses that could be seen in field operations. Because the underlying drilling model [28] in the extended model is defined for PDC bits, we focus on vibrational dysfunction effects, which tend to dominate bit dysfunction with PDC bits [12]. Yet, the extended model could be applied to qualitatively account for other types of dysfunction such as bit-and bottomhole balling as well.

Drilling Model
The drilling model developed by Detournay et al. [28] is used in this work as a base case scenario to simulate the drilling response of a PDC bit operating under ideal conditions. What is meant here by ideal conditions is that the drilling response for a given bit and formation is fully determined by the interface laws proposed by Detournay et al. [28], which define static relationships between the WOB, RPM, ROP and the bit torque (T) based on bit and formation properties. Drilling dysfunctions such as vibrations are however not covered by this drilling model and will be introduced in the next section. The Detournay Energies 2021, 14, 1298 5 of 35 model relies on the existence of three distinct drilling regimes that relate the amount of applied WOB and the resulting ROP (for a given RPM), separated into

•
Phase I drilling, where the WOB is not adequate to force the cutters to fully engage the formation, resulting in inefficient drilling. It is postulated that this inefficiency is caused by the cutters having a blunt underside, a wear flat, which supports some of the WOB and is a source of friction that does not contribute to the excavation of rock.
In phase I, drilling with higher WOB will increase the depth of cut, which translates to higher ROP. At the same time, the increased depth of cut will expose a larger area of the wear flats to contact with the formation, which in turn makes the wear flats carry more WOB. The WOB being translated partly to increased cutting action and partly as friction on the wear flats continues until a threshold WOB which marks the onset of the next drilling phase. An ideally sharp bit will in theory never drill in phase I, as it has no wear flats. • Phase II drilling, which is characterized by efficient drilling with the bit acting incrementally as an ideally sharp bit. At the onset of phase II drilling the contact forces between the wear flat and the formation are fully engaged. Further increases in WOB value will result in the rock deforming beneath the cutters without any increase in the contact area between the wear flat and formation. An increase in WOB while in phase II will be transferred solely to increasing depth of cut and correspondingly increasing ROP at peak efficiency, up to a point where a drilling dysfunction starts diminishing the efficiency of the cutting action. • Phase III drilling, where an increase in contact forces between the bit and formation results in less of the applied WOB being translated to cutting action, which leads to a reduction in depth of cut and less efficient drilling. The onset of phase III drilling is referred to as the founder point and is often considered the optimal conditions to drill at [12,31].
The relationship between the applied WOB and RPM and the resulting bit torque (T) and ROP in phase I and phase II drilling can be expressed as [28]: where the asterisk subscript signifies the transition point between phase I and phase II drilling, which is determined by bit bluntness and the formation strength. The values of ROP * and T * correspond to the ROP and torque at a weight on bit of WOB * . The parameter r is the bit radius, and c 1 , c 2 , c 3 and c 4 are model parameters dependent on bit and formation properties.
Equation (1) can be viewed as a calculated depth of cut per bit revolution, determined by the model parameters and the applied WOB, which is multiplied with the RPM to find the equivalent ROP. The torque can be observed from Equation (2) to be independent of the RPM, as is often assumed in drilling models [32]. The modelled drilling response from Equations (1) and (2) for a relatively sharp 12 1 4 " diameter PDC bit drilling through a generic formation A is shown in Figure 1, where the transition between phase I and phase II drilling occurs at a WOB value of approximately 2700 kg. As Equations (1) and (2) do not account for phase III effects, Figure 1 shows drilling at high efficiency throughout the investigated WOB and RPM interval after the onset of phase II drilling. In real world drilling operations, the ROP response to increasing WOB and RPM will at some point deviate from the ideal phase II drilling, but the ROP response in region III is not unique and depends on the loading path [28] as well as the dysfunction which causes the foundering to occur [12,31]. Region III drilling is therefore not explicitly included in the Detournay drilling model [28]. A qualitative way of including vibrational drilling dysfunctions in the model is proposed in the next section. occur [12,31]. Region III drilling is therefore not explicitly included in the Detournay drilling model [28]. A qualitative way of including vibrational drilling dysfunctions in the model is proposed in the next section.

Drilling Dysfunctions and Constraints
There are a multitude of factors that can affect the drilling efficiency. For an efficient bit drilling with the expected depth of cut, the ROP will increase linearly with applied WOB or RPM as shown in Figure 1, unless a dysfunction reduces the drilling efficiency or a constraint limits the application of additional input energy [12,31]. The factors that influence the ROP can in general be grouped into two categories [13] • Foundering effects that reduce the efficiency of energy transferal between the bit and the formation, which causes inefficient drilling. They can be caused by vibrations such as stick-slip and whirl, as well as bit or bottomhole balling. These dysfunctions will result in ROP values that are lower than what would be seen with an efficient bit for a given WOB and RPM. • Energy input limiters, which constrain the amount of energy that can be applied through the input parameters WOB and RPM when drilling. In the case when the input energy is constrained before the onset of foundering effects, the bit would still be able to drill more efficiently at higher values of WOB and/or RPM, but because of a system constraint these parameters cannot be increased. A multitude of input energy limiters have been reported in the literature, such as a maximal WOB or RPM determined by bit or bottom hole assembly (BHA) design, a maximal ROP dictated by hole cleaning or solids handling capacity on the surface, a maximal top drive torque rating or top-side vibrations [8,9,13].
The onset of foundering effects and non-bit limiters can in many cases be extended to higher values of WOB and RPM through reengineering of the drilling equipment [13], but such considerations are beyond the scope of this study. Here, we rather focus on the existence of these effects and how they can be qualitatively included in a drilling model to explore the performance of a data-driven optimization technique in drilling simulation scenarios.
Critical values of RPM and WOB that trigger the onset of whirl and stick-slip vibrations are heavily affected by bit and BHA characteristics, as well as mechanical rock properties [14]. For an appropriately designed drill string, it is expected that there is a region

Drilling Dysfunctions and Constraints
There are a multitude of factors that can affect the drilling efficiency. For an efficient bit drilling with the expected depth of cut, the ROP will increase linearly with applied WOB or RPM as shown in Figure 1, unless a dysfunction reduces the drilling efficiency or a constraint limits the application of additional input energy [12,31]. The factors that influence the ROP can in general be grouped into two categories [13] • Foundering effects that reduce the efficiency of energy transferal between the bit and the formation, which causes inefficient drilling. They can be caused by vibrations such as stick-slip and whirl, as well as bit or bottomhole balling. These dysfunctions will result in ROP values that are lower than what would be seen with an efficient bit for a given WOB and RPM. • Energy input limiters, which constrain the amount of energy that can be applied through the input parameters WOB and RPM when drilling. In the case when the input energy is constrained before the onset of foundering effects, the bit would still be able to drill more efficiently at higher values of WOB and/or RPM, but because of a system constraint these parameters cannot be increased. A multitude of input energy limiters have been reported in the literature, such as a maximal WOB or RPM determined by bit or bottom hole assembly (BHA) design, a maximal ROP dictated by hole cleaning or solids handling capacity on the surface, a maximal top drive torque rating or top-side vibrations [8,9,13].
The onset of foundering effects and non-bit limiters can in many cases be extended to higher values of WOB and RPM through reengineering of the drilling equipment [13], but such considerations are beyond the scope of this study. Here, we rather focus on the existence of these effects and how they can be qualitatively included in a drilling model to explore the performance of a data-driven optimization technique in drilling simulation scenarios.
Critical values of RPM and WOB that trigger the onset of whirl and stick-slip vibrations are heavily affected by bit and BHA characteristics, as well as mechanical rock properties [14]. For an appropriately designed drill string, it is expected that there is a region of WOB and RPM which is not notably affected by vibrations, while a combination of high WOB and low RPM can result in stick-slip vibrations, low WOB and high RPM can result in forward whirl, and a combination of high WOB and high RPM can induce backward whirl [9,14,29]. Figure 2 shows the concept of different regions in the WOB-RPM plane where the drilling process can be affected by vibrations, together with the ROP contours calculated from the Detournay model for formation A. The shaded center region in Figure 2 where one would drill with an acceptably high ROP while not being affected by the foundering effects was dubbed the optimum zone by Wu et al. [14], as it is in this region the combination(s) of WOB and RPM which results in the most efficient drilling can be found. The locations of the dysfunction regions for formation A, as seen in Figure 2, are generically placed in the WOB-RPM plane to qualitatively represent a scenario where there is an optimum zone surrounded by regions where dysfunctions will occur [14].  [9,14,29]. Figure 2 shows the concept of different regions in the WOB-RPM plane where the drilling process can be affected by vibrations, together with the ROP contours calculated from the Detournay model for formation A. The shaded center region in Figure  2 where one would drill with an acceptably high ROP while not being affected by the foundering effects was dubbed the optimum zone by Wu et al. [14], as it is in this region the combination(s) of WOB and RPM which results in the most efficient drilling can be found. The locations of the dysfunction regions for formation A, as seen in Figure 2, are generically placed in the WOB-RPM plane to qualitatively represent a scenario where there is an optimum zone surrounded by regions where dysfunctions will occur [14]. To incorporate vibrational foundering effects in the drilling model described by Equations (1) and (2) in a qualitative way, a penalty term proposed by the authors is included in the model. The penalty is formulated by defining limits in the WOB-RPM plane at which the dysfunctions start to occur, as illustrated in Figure 2. When drilling with a combination of WOB and RPM that places the operation in a region that is not affected by vibrations, the drilling response is dictated entirely by Equations (1) and (2). When drilling in the regions where vibrations are occurring, the proposed penalty term reduces the ROP calculated from Equation (1) by an amount that is dependent on the specific dysfunction and how far into the dysfunction region we are operating. This logic mimics the response seen in field operations for a bit drilling with a dysfunction; if we keep increasing the WOB and/or RPM further into the dysfunction regions, the experienced ROP will deviate further and further away from the straight-line ROP response that was expected if the bit was still drilling efficiently [12,13].
In this modified model, which we refer to as the extended model, the torque is not affected by the dysfunctions and is calculated from Equation (2) for all values of WOB and RPM. This property can be argued for from an MSE perspective. In the field, drilling with vibrational dysfunctions can reduce the drilling efficiency to the extent that the energy To incorporate vibrational foundering effects in the drilling model described by Equations (1) and (2) in a qualitative way, a penalty term proposed by the authors is included in the model. The penalty is formulated by defining limits in the WOB-RPM plane at which the dysfunctions start to occur, as illustrated in Figure 2. When drilling with a combination of WOB and RPM that places the operation in a region that is not affected by vibrations, the drilling response is dictated entirely by Equations (1) and (2). When drilling in the regions where vibrations are occurring, the proposed penalty term reduces the ROP calculated from Equation (1) by an amount that is dependent on the specific dysfunction and how far into the dysfunction region we are operating. This logic mimics the response seen in field operations for a bit drilling with a dysfunction; if we keep increasing the WOB and/or RPM further into the dysfunction regions, the experienced ROP will deviate further and further away from the straight-line ROP response that was expected if the bit was still drilling efficiently [12,13].
In this modified model, which we refer to as the extended model, the torque is not affected by the dysfunctions and is calculated from Equation (2) for all values of WOB and RPM. This property can be argued for from an MSE perspective. In the field, drilling with vibrational dysfunctions can reduce the drilling efficiency to the extent that the energy consumption at the bit is more than an order of magnitude higher than what the rock strength would indicate [33]. This implies that either the torque continues to grow with the applied WOB also in the dysfunction region while the ROP is moderately reduced, or that the torque stays constant or decreases while the ROP is severely reduced as a response to increasing WOB. The former logic is applied in the extended model. Exactly how the torque and ROP reacts to drilling with dysfunctions cannot be captured adequately by a static model like the one we are proposing, but the model will be able to qualitatively capture the expected behavior of reduced ROP and increased MSE when drilling in the dysfunction regions.
The penalty functionality is implemented by means of straight-line functions (as shown in Figure 2) that mark the onset of drilling dysfunctions, but the method we propose is generic and could be applied to other curves as well. The method is in the following explained by an example of drilling with backward whirl, but the same logic applies to the other dysfunctions as well. If we are currently drilling ahead at an RPM of 150 and a WOB of 11,500 kg, Equation (1) predicts that the resulting ROP will be approximately 45 m/h in formation A, as can be seen from the contour lines in Figure 2. A penalty for drilling in the whirl region is calculated based on how far into the dysfunction region we are operating, which can be quantified by: In Equation (3), WOB and RPM are the current operating parameters, WOB' and RPM' signifies the point on the dysfunction curve closest to the operating parameters, and WOB max and RPM max are normalizing values of 20,000 kg and 200 RPM, respectively. The normalization is performed to assign approximately equal weight to the WOB and RPM when calculating the parameter L, which is a normalized measure of how far into the dysfunction region we are operating. When drilling in regions that are not affected by the dysfunctions, the parameter L is set equal to zero. Equation (3) is used to find the magnitude of the penalty, R, from: where S is the smoothstep function, which is a clamping function that gives smooth s-shaped output values between 0 and 1. Using Equation (4) to calculate the penalty, the ROP will only be marginally reduced when drilling slightly into any of the dysfunction regions where L will take on small values, and more severely affected as L grows. The parameter m in Equation (4) is a model constant that can be used to customize how much the ROP is penalized by the different dysfunctions, so that e.g., whirl can have a stronger negative impact on the ROP than stick-slip [12]. In this work, the authors have used generic values of m = 1 to calculate the penalty in the forward and backward whirl regions, and m = 0.5 for the stick-slip region. When drilling at a point that simultaneously falls within two dysfunction regions, e.g., in the intersection between the stick-slip and backward whirl regions at an RPM value of 100 and a WOB value of 16,000 kg, the calculated penalty is the sum of the penalties incurred for drilling in both dysfunction regions. The output ROP from the extended model is calculated from: where the parameter ROP D signifies the ROP calculated from the "ideal" Detournay model in Equation (1), while R is calculated from Equations (3) and (4). From Equation (5), the penalized ROP that would be output from the model when operating at a WOB of 11,500 and an RPM of 150 is reduced from 45 to 36 m/h. Figure 3 displays how the ROP varies as a function of WOB and RPM when the proposed extended model is applied to model drilling in formation A. Figure 3a shows a drilling curve for a constant RPM value of 90, where it can be observed that WOB values above 12,900 kg correspond to drilling with dysfunction, which reduces the ROP compared to the straight-line response predicted by the Detournay model. At even higher values of WOB, the penalty is further increased and the ROP starts decreasing. In Figure 3b, it can be seen from the ROP contours produced by the extended model that drilling in the dysfunction regions reduces the ROP so that the highest ROP that can be achieved in this formation is approximately 38 m/h, which occurs in the region around a WOB value of 14,000 kg and an RPM value of 120. This maximal ROP value does however correspond to drilling somewhat into the backward whirl region (as can be seen from Figure 2), and it does not necessarily represent the optimal conditions to drill at, as will be explained in the next section.
Energies 2021, 14, 1298 9 of 35 of 90, where it can be observed that WOB values above 12,900 kg correspond to drilling with dysfunction, which reduces the ROP compared to the straight-line response predicted by the Detournay model. At even higher values of WOB, the penalty is further increased and the ROP starts decreasing. In Figure 3b, it can be seen from the ROP contours produced by the extended model that drilling in the dysfunction regions reduces the ROP so that the highest ROP that can be achieved in this formation is approximately 38 m/h, which occurs in the region around a WOB value of 14,000 kg and an RPM value of 120. This maximal ROP value does however correspond to drilling somewhat into the backward whirl region (as can be seen from Figure 2), and it does not necessarily represent the optimal conditions to drill at, as will be explained in the next section.

Mechanical Specific Energy
The concept of mechanical specific energy (MSE) was investigated by Simon [34] and Teale [10] in the sixties and has since been used for applications such as drilling optimization [11,13] and lithology identification [35]. MSE is defined as the energy required to excavate a unit volume of rock, and can be expressed as [10]: where g is the gravitational acceleration constant with a value of 9.81 m/s 2 . Equation (6) can be seen as the ratio between the energy input to the drilling process and the output ROP. This ratio will assume its minimal value when drilling at peak efficiency in the transition between phase II and phase III, with higher MSE values when drilling in phases I and III [13]. It can be noted that of the two right-hand terms in Equation (6), the rightmost term will normally be larger by a substantial margin and chiefly dictate the value of the calculated MSE [10]. To calculate an MSE value that reflects the actual energy expenditure at the bit, the downhole torque should be used when using Equation (6) [11,36]. This is because friction along the drill string will cause the surface torque to be higher than the torque on bit. When used as a trending tool, the MSE calculated from the surface torque can still be applied to identify more efficient drilling, but with the risk of possible inaccuracies in the analysis caused by fluctuations in the drill string frictional losses. The authors have assumed in this work that we have access to the downhole torque values, which could come from either measurements from a downhole tool or be calculated from the topside torque with a torque and drag model.

Mechanical Specific Energy
The concept of mechanical specific energy (MSE) was investigated by Simon [34] and Teale [10] in the sixties and has since been used for applications such as drilling optimization [11,13] and lithology identification [35]. MSE is defined as the energy required to excavate a unit volume of rock, and can be expressed as [10]: where g is the gravitational acceleration constant with a value of 9.81 m/s 2 . Equation (6) can be seen as the ratio between the energy input to the drilling process and the output ROP. This ratio will assume its minimal value when drilling at peak efficiency in the transition between phase II and phase III, with higher MSE values when drilling in phases I and III [13]. It can be noted that of the two right-hand terms in Equation (6), the rightmost term will normally be larger by a substantial margin and chiefly dictate the value of the calculated MSE [10]. To calculate an MSE value that reflects the actual energy expenditure at the bit, the downhole torque should be used when using Equation (6) [11,36]. This is because friction along the drill string will cause the surface torque to be higher than the torque on bit. When used as a trending tool, the MSE calculated from the surface torque can still be applied to identify more efficient drilling, but with the risk of possible inaccuracies in the analysis caused by fluctuations in the drill string frictional losses. The authors have assumed in this work that we have access to the downhole torque values, which could come from either measurements from a downhole tool or be calculated from the topside torque with a torque and drag model. Figure 4 illustrates how the MSE varies with WOB in formation A together with the corresponding drill-off curve. The plot is generated using the extended model detailed in Equations (1)-(5) and a constant RPM value of 90. From Figure 4, it can be seen that the minimum MSE occurs at a value of approximately 12,900 kg of WOB, at the founder point at which the ROP starts deviating from straight-line phase II drilling. Higher values of ROP can be achieved by increasing the WOB past the founder point, but this increase will come at the cost of detrimental foundering effects which can damage the downhole equipment. The minimum MSE will therefore correspond to the maximal "good ROP" that can be achieved without deleterious side-effects [11]. The shape of the ROP-WOB curve in region III will determine how rapidly the MSE increases when entering this region, but as long as the ROP deviates from the efficient phase II drilling, the MSE will increase at this point. This property makes the MSE a valuable diagnostic tool for drilling optimization; as long as the MSE shows an increasing trend in regions I and III (when moving "outward" from region II drilling in either direction), the most efficient drilling can be identified by seeking out the highest WOB that does not make the MSE increase.  Figure 4, it can be seen that the minimum MSE occurs at a value of approximately 12,900 kg of WOB, at the founder point at which the ROP starts deviating from straight-line phase II drilling. Higher values of ROP can be achieved by increasing the WOB past the founder point, but this increase will come at the cost of detrimental foundering effects which can damage the downhole equipment. The minimum MSE will therefore correspond to the maximal "good ROP" that can be achieved without deleterious side-effects [11]. The shape of the ROP-WOB curve in region III will determine how rapidly the MSE increases when entering this region, but as long as the ROP deviates from the efficient phase II drilling, the MSE will increase at this point. This property makes the MSE a valuable diagnostic tool for drilling optimization; as long as the MSE shows an increasing trend in regions I and III (when moving "outward" from region II drilling in either direction), the most efficient drilling can be identified by seeking out the highest WOB that does not make the MSE increase.  Outside of this region, where dysfunctions affect the drilling efficiency, the MSE is seen to increase. This relationship can be deduced from the rightmost term in Equation (6) under the assumption that the RPM and torque are not coupled, as is the case with Equation (2). As long as the ROP scales linearly with the RPM, the MSE ratio will remain constant. In the dysfunction regions, where the gain in ROP is less than the expected linear relationship with the RPM, the numerator in Equation (6) will grow faster than the denominator. The highest RPM that can be applied without increasing the MSE above the constant minimum value in the optimum region will therefore yield the highest dysfunction-free ROP and the most efficient drilling.  Outside of this region, where dysfunctions affect the drilling efficiency, the MSE is seen to increase. This relationship can be deduced from the rightmost term in Equation (6) under the assumption that the RPM and torque are not coupled, as is the case with Equation (2). As long as the ROP scales linearly with the RPM, the MSE ratio will remain constant. In the dysfunction regions, where the gain in ROP is less than the expected linear relationship with the RPM, the numerator in Equation (6) will grow faster than the denominator. The highest RPM that can be applied without increasing the MSE above the constant minimum value in the optimum region will therefore yield the highest dysfunction-free ROP and the most efficient drilling. A contour plot detailing how the MSE varies as a function of applied WOB and RPM is shown in Figure 6. This plot is generated using the proposed extended model, where the ROP is penalized when drilling in the three dysfunction regions (as shown in Figure  2). As can be seen in Figure 6, there is a region around the point at which the WOB value is approximately 12,900 kg and the RPM value is 90, where one would drill with the minimal MSE value of 180 MPa. This point corresponds to the top corner of the optimum zone depicted in Figure 2. Moving away from this low MSE region in any direction will increase the MSE; at first with small values and then progressively larger values as we move into the different dysfunction regions where drilling is less efficient. Comparing Figures 6 and  3b, it can also be observed that the highest possible ROP values which are found in the region around a WOB of 14,000 and an RPM of 120, correspond to drilling with a dysfunction, as is reflected by the higher MSE values around this point in Figure 6.  A contour plot detailing how the MSE varies as a function of applied WOB and RPM is shown in Figure 6. This plot is generated using the proposed extended model, where the ROP is penalized when drilling in the three dysfunction regions (as shown in Figure 2). As can be seen in Figure 6, there is a region around the point at which the WOB value is approximately 12,900 kg and the RPM value is 90, where one would drill with the minimal MSE value of 180 MPa. This point corresponds to the top corner of the optimum zone depicted in Figure 2. Moving away from this low MSE region in any direction will increase the MSE; at first with small values and then progressively larger values as we move into the different dysfunction regions where drilling is less efficient. Comparing Figures 6 and 3b, it can also be observed that the highest possible ROP values which are found in the region around a WOB of 14,000 and an RPM of 120, correspond to drilling with a dysfunction, as is reflected by the higher MSE values around this point in Figure 6. A contour plot detailing how the MSE varies as a function of applied WOB and RPM is shown in Figure 6. This plot is generated using the proposed extended model, where the ROP is penalized when drilling in the three dysfunction regions (as shown in Figure  2). As can be seen in Figure 6, there is a region around the point at which the WOB value is approximately 12,900 kg and the RPM value is 90, where one would drill with the minimal MSE value of 180 MPa. This point corresponds to the top corner of the optimum zone depicted in Figure 2. Moving away from this low MSE region in any direction will increase the MSE; at first with small values and then progressively larger values as we move into the different dysfunction regions where drilling is less efficient. Comparing Figures 6 and  3b, it can also be observed that the highest possible ROP values which are found in the region around a WOB of 14,000 and an RPM of 120, correspond to drilling with a dysfunction, as is reflected by the higher MSE values around this point in Figure 6.

Drilling Optimization with Extremum Seeking
As detailed in Section 2, accurate modelling of the drilling process, which regions will be affected by dysfunctions and which combination(s) of WOB and RPM which will yield the most efficient drilling is a challenging task. Not knowing at which point drilling dysfunctions will be induced can cause the driller to use conservative limits imposed on the WOB and RPM, which can result in sub-optimal drilling. Accurate modeling of the drilling process will often require detailed knowledge of downhole parameters which cannot be measured directly and are therefore hard to obtain in real-time operations. The situation is further complicated by changes in downhole conditions which can cause models tuned to data from before the change to no longer be valid for the current circumstances.
Employing a data-driven optimization technique like ES can be used to solve these challenges, as the method does not rely on having detailed a priori knowledge of the downhole conditions. The ES algorithm relies instead on executing small tests while drilling ahead by varying the applied WOB and RPM. Real-time measurements of how drilling parameters such as the ROP, T and calculated MSE vary when the tests are performed are recorded by the algorithm. The measured response to the tests represents the most up to date knowledge on how the drilling process reacts to changes in WOB and RPM and are automatically used by the algorithm to perform optimization actions that reduce the MSE if possible. When a change in downhole conditions occurs, such as a formation shift, this will be reflected in the measured drilling parameters and the ES algorithm will be able to adapt to the new downhole circumstances.
Using the MSE as an objective function to quantify when we are drilling efficiently can be a powerful tool for drilling optimization. If the MSE exhibits the general shape shown in Figure 6; where drilling efficiently will result in lower MSE values and drilling into the dysfunction regions will make the MSE progressively increase, the proposed ES algorithm can be used to seek out the WOB and RPM that result in drilling with minimal MSE. The only a priori information that is needed is knowing the general shape of the MSE response to drilling efficiently and inefficiently, as well as some general drilling engineering knowledge that is needed to initiate and tune the algorithm. The ES method is an iterative algorithm, which means that it needs to be initiated when drilling at some WOB and RPM and use this as a starting point from which it can perform optimization actions. This starting point can be viewed as an "initial guess" of the optimal WOB and RPM, and can be based on the drillers experience, data from an offset well or an estimate provided by a drilling model.

The Extremum Seeking Algorithm
Extremum seeking is in essence a hill climbing optimization method that is applied to a process in real-time. ES works by systematically exciting the system to gather information about the current operating conditions by varying one or several controllable input variables. Real-time and recent measurements are used to calculate an objective function that quantifies the system's reaction to the excitations. Based on how the objective function changes with the variations in the input parameters, the ES algorithm will automatically make small changes to the input variables that steers them towards the values optimizing the objective function. This happens in an iterative fashion, where new measurements are continuously included in the analysis and old measurements are discarded. The optimization method does not require a model of the system, since all adjustments are performed based on measurements of how the process performs with different values and combinations of the input variables.
In this work, we consider a multivariable ES approach in which the controllable variables we seek to manipulate to drill more optimally are the WOB and RPM. The MSE, as detailed in Equation (6), is used as an objective function to quantify what combination of WOB and RPM constitutes optimal drilling. The procedure is illustrated in Figure 7, where the left-hand plot demonstrates how the ES algorithm automatically varies the WOB and RPM to investigate the drilling response in the local region marked with green shading. The right-hand tracks show the varying input variables and the resulting MSE as functions of time. It can be observed from Figure 7 that, in this case, higher values of both WOB and RPM results in lower MSE, which would prompt the ES algorithm to slowly increase the WOB and RPM, as indicated by the dotted lines. This procedure of testing and adapting to the MSE-response is performed continuously and will over time drive the system to drill at the optimal conditions that minimize the MSE. In cases where the MSE does not change when the WOB and/or RPM are varied, this is interpreted by the proposed ES algorithm as a situation where it should increase the applied WOB and/or RPM further, as explained in Section 2. Several techniques for avoiding violation of drilling constraints are proposed and implemented in the following, to ensure that the ES algorithm will adhere to process limitations while seeking out the minimal MSE. the left-hand plot demonstrates how the ES algorithm automatically varies the WOB and RPM to investigate the drilling response in the local region marked with green shading. The right-hand tracks show the varying input variables and the resulting MSE as functions of time. It can be observed from Figure 7 that, in this case, higher values of both WOB and RPM results in lower MSE, which would prompt the ES algorithm to slowly increase the WOB and RPM, as indicated by the dotted lines. This procedure of testing and adapting to the MSE-response is performed continuously and will over time drive the system to drill at the optimal conditions that minimize the MSE. In cases where the MSE does not change when the WOB and/or RPM are varied, this is interpreted by the proposed ES algorithm as a situation where it should increase the applied WOB and/or RPM further, as explained in Section 2. Several techniques for avoiding violation of drilling constraints are proposed and implemented in the following, to ensure that the ES algorithm will adhere to process limitations while seeking out the minimal MSE. The ES algorithm can be split into three main components: • The excitation signal, which varies the input variables around a base value to investigate the current drilling conditions. • Gradient estimation, which quantifies how the process reacts to the excitation signal by estimating partial derivatives of the objective function with respect to the input variables.

•
Adaptation, which adjusts the base values of the input variables with a magnitude and direction determined by the estimated gradients, to seek out drilling conditions that result in lower MSE values.
These components are detailed in the subsequent sections. Because the measurements of drilling parameters and commands given to the control system on the rig are performed at regular intervals, discrete time notation is used. It is assumed that relevant measurements are performed at a time interval of Δ seconds, and that the top drive and autodriller can receive updated setpoints for target RPM and WOB every Δ seconds. For simplicity, Δ is set to a value of 1 s. The current timestep is denoted by t, so that a command for the coming timestep is indicated by the notation + Δ . The ES algorithm can be split into three main components:

•
The excitation signal, which varies the input variables around a base value to investigate the current drilling conditions. • Gradient estimation, which quantifies how the process reacts to the excitation signal by estimating partial derivatives of the objective function with respect to the input variables. • Adaptation, which adjusts the base values of the input variables with a magnitude and direction determined by the estimated gradients, to seek out drilling conditions that result in lower MSE values.
These components are detailed in the subsequent sections. Because the measurements of drilling parameters and commands given to the control system on the rig are performed at regular intervals, discrete time notation is used. It is assumed that relevant measurements are performed at a time interval of ∆t seconds, and that the top drive and autodriller can receive updated setpoints for target RPM and WOB every ∆t seconds. For simplicity, ∆t is set to a value of 1 s. The current timestep is denoted by t, so that a command for the coming timestep is indicated by the notation t + ∆t.

The Excitation Signal
To probe the current drilling conditions, a periodic excitation signal is continuously applied to the input variables. Assume that we are currently drilling ahead with the base values WOB and RPM as initial guesses of the optimal input variables. These initial values could be based on e.g., data from an offset well or estimates given by a drilling model. The ES algorithm dictates a periodic variation in the WOB and RPM about the base values according to: where the left-hand sides signify the WOB and RPM that will be sent to the control system on the rig as setpoints. The parameters A and P are the amplitude and period of the excitation signal, d, which is given by: Equation (8) describes a square wave, where sgn is the signum function which takes a value of 1 when the argument is positive, a value of 0 when the argument is zero and a value of −1 when the argument is negative. The applied WOB and RPM prescribed by Equations (7a) and (7b) will oscillate about the base values, WOB and RPM, with amplitudes of ± A wob kg and ± A rpm rpm, respectively. Through the information gathered from the excitation signals, the ES algorithm will adjust the base values in the direction that reduces the MSE.
The induced variations in RPM and WOB can potentially influence the measured MSE to different extents and in different directions. For the ES algorithm to be able to draw conclusions as to how the two input variables individually affect the drilling efficiency, the parameters P wob and P rpm should be designed to minimize the coupling between the MSE-responses resulting from the two signals. In this work, the periods of the excitation signals are set so that P wob = 2P rpm . This tuning is illustrated in the right-hand tracks in Figure 7, where the RPM oscillates with twice the frequency of the WOB-signal. For each half-period of the WOB fluctuations, the WOB remains relatively constant while the RPM performs a full oscillation, from which the dependency between the MSE and RPM can be deduced by the gradient estimator. The frequency of the RPM signal is an even multiple of the WOB signal frequency, causing the average RPM value during each period of the WOB oscillation to be approximately RPM. This allows for estimation of the relationship between the MSE and the varying WOB as if the RPM was held constant. The tuning of the excitation signals is further explored in Appendix A.

Gradient Estimation
To estimate a local model of the MSE as a function of the applied WOB and RPM, a lest-squares approach is used in this work. As we drill ahead, measurements of the WOB, RPM, T and ROP as well as the calculated MSE are stored in buffers containing a few minutes of the most recent data. These buffers contain a sliding window time series of data that represents the most up to date information that is available about the current drilling conditions. At each update of measurements, the newest measurements are included in the buffers, while the oldest are discarded. The buffers contain data from one period of the excitation signal with the longest period time, which in this case is P wob seconds.
The excitation signals are designed to elicit responses in MSE that can be associated with each individual signal. This allows the gradient estimation to be performed by correlating the variations in measured MSE with the applied WOB and RPM. At each new timestep, ∆t, the updated buffers are used to solve the least-squares problem: In Equation (9), a wob , a rpm and b are the slopes and intercept, respectively, of the lestsquares fit. The parameters a and b represent a linear approximation (local model) of how The gradients described by Equation (10) are based on the P wob = 2P rpm seconds of the most recent measurements and represent the current best estimate of how the MSE is related to the input variables in the local region that has been explored by the excitation signals. Because of the symmetry of the excitation signals, the average values for WOB and RPM during P WOB seconds of drilling will on average be close or equal to WOB(t) and RPM(t), respectively, which is why the gradients in Equation (10) are evaluated at this point.

Adaptation
Assuming that there is a response in the MSE to the variations in the input variables, the gradients calculated from Equations (9) and (10) determine in which direction the WOB and RPM should be adjusted to reduce the MSE. When drilling in the optimum zone, the changes in MSE resulting from variations in the WOB and RPM are expected to be small. This results in zero or near zero values for the estimated gradients. When using MSE to increase real-time performance, a negative or zero gradient value indicates that drilling is efficient and the input WOB and/or RPM should be increased until the point of foundering [12]. To include this logic in the ES algorithm, a tuning parameter, k, is subtracted from the estimated gradients. This makes the algorithm see a zero gradient as a scenario where the corresponding input should be increased.
From the estimated gradients at the current timestep, the ES algorithm prescribes updated base values for the input variables for the coming timestep from: The left-hand sides of Equation (11) denote the new base values that will be used in Equations (7a) and (7b) in the next iteration of the algorithm. It can be observed from Equations (11a) and (11b) that for each iteration, the input base values, WOB and RPM, will change incrementally from their previous values with a magnitude dictated by the rightmost terms. The magnitude of this incremental change is determined by the adaptation gain, γ, and the output of the saturation function, sat, which is given by: The use of Equation (12) in combination with Equation (11b) is illustrated in Figure 8. In Equation (12), σ is a tuning parameter that determines the width of the region where the saturation function shifts from negative to positive output values. The saturation function is used to limit the maximal step size that the ES algorithm is able to implement per iteration by using the principle of sliding mode extremum seeking control [37]. As the maximal output of Equation (12) is a value of ±1, the greatest rate of change that the algorithm can demand in the input variables is given by γ. This property makes the algorithm easier to tune from a safety standpoint, as the maximal adaptation rate is explicitly stated by the parameter γ in units of kg or rpm per second. The maximal limit on adaptation rate is useful in cases where an abrupt change in drilling conditions occurs, e.g., a formation change, as the gradients calculated by Equations (9) and (10) can be erroneous in this situation. This error would be introduced by the algorithm's assumption that any changes in the MSE can be attributed to the variations in the WOB and RPM. For a large change in MSE caused by differences in lithology, the estimated gradients could become artificially large as the algorithm relates the relatively small WOB and RPM oscillations to a large change in MSE. If the adaptation was directly proportional to the estimated gradients in this scenario (as is done in conventional ES algorithms, see e.g., Tan et al. [26]), it could cause the ES algorithm to demand large and rapid changes in the WOB and/or RPM that could steer the system away from the optimum and into the dysfunction regions. It should be noted that in a case like this, the estimated gradients would only be erroneous for a brief time window before the buffers would be filled with data representative of the new formation, which would produce more accurate gradient estimates. The downside of limiting the adaptation with Equation (12) is that in cases where the estimated gradients correctly indicate that large improvements in drilling efficiency could be achieved by adapting the inputs, the rate at which the inputs are adapted to more suitable values will be limited. Weighing faster adaptation versus more robust control is an algorithm design and tuning consideration, where the authors have opted to lean towards more robust control through the use of the saturation function.
The saturation function is illustrated in Figure 8, which exemplifies how this function is applied in Equation (11b) for RPM optimization. The example parameter values σ rpm = 2, k rpm = 1 and γ rpm = 1 are used in Figure 8. It can be seen that for a gradient value of zero, the saturation function will yield an output of −0.5, which will translate to an increase of γ rpm /2 in the base value RPM for the next timestep. When drilling in the optimum zone, the estimated gradient is expected to have a low or zero value, and the proposed ES algorithm relies on the parameter k rpm to indicate that the RPM should be increased to reach the foundering point, see Section 2.3. With this configuration, the algorithm will request increasing RPM until the estimated gradient is equal to k rpm in magnitude and the saturation function's output is zero. At some point, the ES algorithm will drive the value of RPM close to the dysfunction region. Because the MSE is expected to increase drastically when drilling dysfunctions occur [13], the gradients estimated past this point will take on relatively large, positive values. A suitably small value of k rpm will therefore provide increasing RPM values up to the limit at which foundering starts to occur. If, for some reason, drilling outside of the optimal region occurred, the large estimated gradients would make the ES algorithm adapt at its maximal rate of γ rpm rpm/s to exit the dysfunction region as quickly as possible. The same logic as described above also applies to the adaptation in WOB determined by Equation (11a).
A block diagram of the proposed ES algorithm is shown in Figure 9. A loop through this diagram represents an iteration of the ES algorithm, which is continuously repeated every ∆t seconds. Starting from the lower left corner, the updated base values and excitation signal values are combined to produce new values for the WOB and RPM, which are fed as setpoints to the control system on the rig. The resulting ROP, torque, WOB and RPM values are measured and used to calculate the current MSE value. The new measurements are subsequently included in the buffers, while the oldest measurements are discarded. The updated buffers are used to estimate the current gradient values, which are translated to updated base values that are employed in the next iteration of the algorithm.  A block diagram of the proposed ES algorithm is shown in Figure 9. A loop through this diagram represents an iteration of the ES algorithm, which is continuously repeated every Δt seconds. Starting from the lower left corner, the updated base values and excitation signal values are combined to produce new values for the WOB and RPM, which are fed as setpoints to the control system on the rig. The resulting ROP, torque, WOB and RPM values are measured and used to calculate the current MSE value. The new measurements are subsequently included in the buffers, while the oldest measurements are discarded. The updated buffers are used to estimate the current gradient values, which are translated to updated base values that are employed in the next iteration of the algorithm.

Algorithm Design Choices
The Extremum Seeking method provides a whole range of algorithms and tools suitable for various applications, starting with the fundamental ES controllers described in [26,38]. The ES algorithm presented in this paper is a result of selection various elements from this toolbox to make it robust and well suited for drilling applications.
In particular, the square wave excitation signal was chosen because this is the signal shape that, for a given amplitude, gives the maximal (output) signal power and results in faster convergence to the optimal values at least for the standard ES algorithm configurations [39]. It is also expected that square excitation waves are more suitable for realizing WOB variations with a standard autodriller functionality. We can expect that due to transients and various disturbances acting on the drill string, the actual WOB and RPM realized by the autodriller and top drive may noticeably deviate from the corresponding setpoints requested by the ES algorithm. Gradient estimation by the selected least-squares method is less affected by these deviations. In addition to this, the selected gradient estimation technique accounts for changes in WOB and RPM (caused by adaptation) and will therefore calculate more accurate gradients than the standard ES method.
Finally, we have opted to use the saturation function in the adaptation block defined in Equation (11), to limit the rate of change for WOB and RPM. This makes the algorithm more robust when experiencing sudden changes in downhole conditions, e.g., a formation shift.

Constraint Handling
The ES algorithm detailed in the previous section will automatically steer the WOB and RPM towards the optimal values that minimize the MSE. As the MSE will increase greatly when foundering occurs, the ES algorithm will inherently try and avoid these dysfunctions by seeking out dysfunction-free combinations of WOB and RPM. There are however many situations where drilling at the minimal MSE is not feasible, as the drilling process is restricted by energy input limiters, as described in Section 2.2. It is imperative that the ES algorithm does not exceed these process constraints in the search for the minimal MSE.
A distinction can be made between constraints that are known a priori in the RPM-WOB plane and constraints related to process output values that are not known in advance. In the former category, a maximal WOB associated with e.g., a buckling criterion can be implemented in the algorithm with a logic condition that would not allow for increase in the WOB past a certain point, even if the algorithm recognized potential for lower MSE at WOB which would exceed the buckling criterion. A similar logic condition could enforce e.g., a maximal RPM value related to surface vibrations. In the latter category, where e.g., a maximal ROP related to hole cleaning should not be exceeded, the combinations of WOB and RPM that produce too high ROP are not known in advance and a different approach is needed. Three techniques that can be used to ensure that the ES algorithm does not violate constraints related to process outputs while searching for the optimum drilling conditions are investigated in the following. The constraint handling techniques are generic and can be applied to different types of limitations. To demonstrate the constraint handling techniques, we apply them to maximal values imposed on the torque and the ROP that the ES algorithm must adhere to.

Modified Objective Function
A practical way of making the ES algorithm avoid e.g., ROP values above a given threshold, is through modification of the objective function. Instead of trying to minimize the MSE, an objective function on the form: is used in the ES algorithm to identify the optimal combination of WOB and RPM. A function similar to Equation (13) has previously been explored to limit drilling with stick-slip [40]. In Equation (13), max is a function which outputs the largest of the input arguments and ρ is a tuning parameter that determines how much the objective function increases when drilling with higher than allowed ROP. The modified objective function, J, will start increasing when the threshold ROP is exceeded, which will make the ES algorithm avoid higher ROP values. Different constraining parameters can be added to Equation (13) in a similar fashion as the ROP term, to penalize the presence of e.g., measured vibrations or high torque, making this constraint handling technique very versatile. A downside of this approach is that if e.g., a sudden change in drilling conditions makes the ROP increase by a some margin above the threshold, the time it takes for the ES algorithm to steer the WOB and RPM to better values is determined by the rather slow adaptation rate dictated by γ. A more prudent approach would then be to use a separate control loop with the ability to modify the applied WOB and/or RPM more rapidly.

Predictive Constraint Handling
A combination of a predictive and a reactive constraint handling technique that can be used to avoid violation of a constraint related to the torque has been tested in the case of single variable ES [24]. Here, we demonstrate that these techniques can also be applied in a multivariable ES approach. It must be noted that when using the predictive constraint handling technique, it should be combined with the reactive approach, as this ensures that the constraint handling does not make the ES algorithm "get stuck".
The predictive constraint handling method [24] relies on obtaining additional information about the downhole conditions by relating changes in measured output parameters to the known variations of the excitation signals. For this purpose, the same least-squares technique as detailed in Equation (9) can be used to estimate a gradient of how the torque relates to the WOB. This technique relies on the assumption that the torque is mainly a function of the WOB, as is commonly assumed in the literature [28,32]. Considering a sliding window time series that contains measured values of the torque and WOB for the past P wob seconds, a gradient describing the current T-WOB relationship can be estimated from: ∂T In Equations (14) and (15), α and β are the least-squares slope and intercept, respectively. As the parameter α is a linear fit to how the torque has changed recently as a function of WOB, α can be used to predict what the torque will be if the WOB is increased or lowered in the region around WOB. As long as a reasonably accurate estimate of the torque gradient can be obtained, it can be used to stop the WOB from being steered to a region where the torque is higher than allowed, assuming that we are operating at a point where the torque constraint is currently not violated.
Let T avg denote the average measured torque value for the past P wob seconds. The torque constraint which we do not want to exceed is represented by T limit . To avoid the WOB being steered to values which would cause the torque to grow past the allowable limit, the following rule is imposed on the WOB adaptation gain: The rule formulated in Equation (16) takes advantage of the fact that during one oscillation of the excitation signal, the weight on bit will take on values in the range WOB(t) ± A wob . Assuming that the adaptation gain is low, the value of WOB will be nearly constant in this time interval and the average weight on bit will be approximately equal to WOB. The average torque value, T avg , will therefore correspond to drilling with a weight on bit of WOB kg. The product A wob α(t) is a projection of how much the torque will grow if the WOB is increased by a value of A wob kg. This product is multiplied by a safety factor with a value greater than 1, which determines how far away from the torque limit we wish to stop the WOB adaptation. Using e.g., a safety factor of 2, Equation (16) will stop the WOB adaptation when WOB is 2A wob kg away from the weight on bit value which would make the torque exceed its limit. Stopping the adaptation with some margin will allow the ES algorithm to continue performing the WOB excitations without the torque limitation being violated.

Reactive Constraint Handling
There are instances where Equation (16) will not be adequate to avoid violation of the torque constraint. The torque measurements are commonly very noisy, which can cause the estimated gradient to be inaccurate. Changes in downhole conditions could affect the torque in a way that cannot be predicted by the gradient, causing the torque to exceed its maximal limit. In this case, a reactive constraint handling technique should be used in combination with the predictive method to automatically steer the WOB back to the safe region if the torque constraint is violated [24]. At each timestep a variable, e, is calculated that quantifies if the constraint is violated and in which case by how much: If the variable e takes on a value larger than 0, this indicates that the constraint is violated and the adaptation gain, γ wob is set to zero. In Equation (17), T is the measured torque value at the current timestep. If the torque measurements are very noisy, the torque used in Equation (17) should be filtered to avoid that the constraint handling reacts too aggressively as a response to noise [24]. The variable e from Equation (17) is used as the error term in a discrete proportional-integral (PI) controller which calculates a penalty, λ, from: In Equation (18), K P and K I are the proportional and integral gains, respectively. These parameters are used to tune the controller and determine how fast the WOB will be adjusted if the torque exceeds the imposed limit. If the torque constraint at some point in time is violated, the summation term, Ψ, will continue to grow and make the penalty term larger until the torque is adjusted down to acceptable levels. At the time when the torque is returned to a level below the limiting value, the summation term is reset by setting the parameter n equal to this time, t, essentially forgetting that the torque constraint has previously been violated and continuing optimization with the ES algorithm from this point on.
The WOB that is requested by the ES algorithm is adjusted based on the penalty according to: The first term on the right-hand side of Equation (20) is the WOB value calculated from Equation (11a). The variable WOB constrained is used in Equation (7a) as the base WOB value when the constraint handling is activated. As long as the torque limit has not been violated, λ will be equal to zero and the WOB will not be adjusted by Equations (17)- (20). The penalty term, λ, will grow if we are drilling with torque values above the allowable limit, which will cause the applied WOB to be reduced according to Equation (20) until the torque is within the allowable bounds and λ is reset to a zero value. Making these adjustments to the WOB with Equation (20) rather than Equation (13) allows the algorithm to reduce the applied WOB faster, which will cause the torque constraint to be violated for a shorter period of time. It can be noted that the reactive constraint handling method will not make any adjustments to the WOB until the torque limit is exceeded. For this reason, a lower value for T limit than the actual system's limit should be used in Equations (17) and (19).

Practical Requirements and Algorithm Tuning
Implementation of the proposed algorithm requires the following measurements: WOB, RPM, bit torque (either calculated or measured), and calculated ROP. These measurements are used to calculate the MSE from Equation (6). In essence, the components of the ES algorithm act like filters when calculating the gradients and performing adaptation of the WOB and RPM values. Yet, if the measurements are too noisy, appropriate filtering should be applied before using them in the algorithm.
The algorithm automatically adjusts the setpoints for the WOB and RPM, which are then used by the autodriller to control the actual WOB and by the top drive to control the actual RPM. The internal control algorithms in the autodriller and top drive must be able to realize the requested small changes in the setpoints corresponding to the excitation signals. This places a lower bound on the excitation signals' amplitudes, based on the resolution of these control systems.
There are several key parameters in the ES algorithm that need to be tuned. These parameters are the period (P) and amplitude (A) of the excitation signals (defined in Equations (7a), (7b) and (8)), as well as the adaptation rate (γ) and the tuning parameter k in Equations (11a) and (11b). These parameters should be tuned taking into account the guidelines presented in Table 1.

Simulation Results
To simulate the dynamics of a control system on the rig that receives setpoints for WOB and RPM from the ES algorithm and steers the input variables to the requested setpoints as a function of time, the following functions were used to emulate this effect: In Equations (21a) and (21b), the left-hand sides represent the current values of the WOB and RPM that are applied at the bit, while WOB SP and RPM SP are the corresponding setpoints. How quickly the control system is able to steer the WOB and RPM from their current values to new values dictated by the setpoints is determined by the time constants, τ. For small values of τ, the WOB and RPM will quickly converge to their respective setpoints. For larger values of τ, convergence to the setpoints will take longer time.
The optimization algorithm and constraint handling approaches detailed in the previous sections were investigated by using the proposed extended model detailed in Equations (1)-(5) coupled with Equations (21a) and (21b) as a drilling simulator. In the simulation scenarios, the ES algorithm provides setpoints for the WOB and RPM, which are translated to applied WOB and RPM through Equations (21a) and (21b). The model simulates the ROP and torque response to these values of WOB and RPM that could be seen in the field for a given bit and formation. The current values for the WOB, RPM, T and RPM are "measured" from the extended model and used to calculate the MSE with Equation (6). These updated measurements are read by the ES algorithm and used to perform the optimization actions described in Section 3. It must be re-emphasized that the ES algorithm only uses measurements taken from the simulated drilling process to minimize the MSE, it has neither prior knowledge about the drilling model nor the locations of the different drilling dysfunctions.
The simulations emulate drilling in two generic formations, Formation A and Formation B. Formation A is described in detail in Section 2, where the optimal point to drill at in this lithology was identified to be at a WOB of 12,900 together with an RPM value of 89.5, as this combination minimizes the MSE. Formation B represents a softer formation than Formation A, otherwise they are identical. To emulate a preference for drilling with lower WOB and higher RPM in softer rocks, the onset of dysfunctions in Formation B are slightly different than in Formation A, placing the optimal point to drill at in Formation B at a WOB of 12,000 kg and an RPM value of 109. The parameters c 1 , c 2 , c 3 and c 4 that are used by the extended model in Equations (1) and (2) are provided in Table 2. These values are generated by picking generic values for the bit and formation parameters in the ranges suggested by Detournay et al. [28], and correspond to using units of kg for the WOB, rpm for the drill string rotational rate and the bit radius given in meters in Equations (1) and (2). All the simulation scenarios are initiated by ramping up the WOB and RPM to their starting setpoints, which is an initial guess at the optimal input values, which could e.g., be based on the driller's experience, a drill-off test, data from an offset well or estimates given by a drilling model. When the WOB and RPM have reached their initial values, the ES algorithm is activated and starts testing the drilling conditions with the excitation signals described by Equations (7a), (7b) and (8). After one full period of the WOB excitation signal, P wob = 120 s, the buffers needed for the gradient estimation are filled up with the relevant measurements and the algorithm starts adapting the WOB and RPM in the direction that will reduce the MSE. The parameter values that are common in all the simulations are provided in Table 3. Table 3. Parameter values that are common for all the simulations.

Parameter
Value Units

Unconstrained Drilling Optimization
This section contains the results from two runs of simulated drilling through the homogeneous Formation A. The theoretical optimal point in this scenario is located approximately at a WOB of 12,900 kg in combination with an RPM value of 89.5. No constraints are considered in these two simulations, meaning that the ES algorithm is free to search for the drilling conditions that will minimize the MSE without any limits imposed on the torque or ROP. Figure 10 shows the WOB, RPM, MSE and ROP for Simulation 1. The orange lines in the WOB and RPM tracks marks the base values, WOB and RPM. This run was initiated with conservative values of 8000 kg WOB and 60 RPM, which resulted in drilling at a low ROP of about 11.5 m/h and a calculated MSE of approximately 184 MPa. After performing the initial variations in the input variables, the ES algorithm detects that increasing the WOB and RPM will result in more efficient drilling. Both the WOB and RPM are steadily increased by the algorithm until they converge to the region of the founder point after drilling for about 2700 s. The rest of the interval is drilled at peak efficiency, where the average ROP is 31.5 m/h, which is an increase of more than 170% from the starting point. Throughout the simulation, the MSE is only marginally reduced by the adaptation in the input variables. This is because the initial WOB and RPM from the start of the simulation resulted in dysfunction free phase II drilling, where the MSE was already close to its minimal value. The proposed ES algorithm is designed to interpret small or zero MSEgradients as a situation where the corresponding input variable should be increased, which is why the WOB and RPM was adapted to the optimum in this scenario.
The values of WOB and RPM where the ES algorithm converges to in Simulation 1 are approximately 13,000 kg and 90 rpm, respectively, which are slightly higher than the pre-calculated values of 12,900 kg and 89.5 rpm. The ES algorithm's convergence to this point is caused by the k parameters used in Equation (11). The k values dictate that a (small) positive gradient must be calculated before the algorithm stops adaptation, and the point at which this occurs is slightly into the dysfunction region. This property can also be seen from Figures 4 and 5; the MSE does not significantly grow before the WOB and/or RPM has moved slightly into the dysfunction region, which is why the ES algorithm converges to the point seen in this simulation scenario.  Figure 11 shows the calculated gradients for Simulation 1, where it can be seen from the lower track that the estimated ⁄ values show some oscillatory behavior until the MSE and RPM has converged to the optimum. This is caused by the adaptation in the WOB and RPM signals, which can sometimes interfere with accurate gradient estimation. A moving average of ⁄ is plotted in the same track, which shows that the estimated gradient on average is unaffected by the oscillations. As long as the average gradients indicate which way the ES algorithm should adapt the input variables, the algorithm will be able to steer the WOB and RPM to the optimum. Figure 11 also shows the estimated gradients converging to a value equal to the k values for the respective signals, at which point the algorithm stops adjusting the and . The results from Simulation 2 are displayed in Figure 12. This scenario is the same as Simulation 1, with the exception that the we initiate drilling at a WOB of 13,000 kg and  Figure 11 shows the calculated gradients for Simulation 1, where it can be seen from the lower track that the estimated ∂MSE/∂RPM values show some oscillatory behavior until the MSE and RPM has converged to the optimum. This is caused by the adaptation in the WOB and RPM signals, which can sometimes interfere with accurate gradient estimation. A moving average of ∂MSE/∂RPM is plotted in the same track, which shows that the estimated gradient on average is unaffected by the oscillations. As long as the average gradients indicate which way the ES algorithm should adapt the input variables, the algorithm will be able to steer the WOB and RPM to the optimum. Figure 11 also shows the estimated gradients converging to a value equal to the k values for the respective signals, at which point the algorithm stops adjusting the WOB and RPM.  The results from Simulation 2 are displayed in Figure 12. This scenario is the same as Simulation 1, with the exception that the we initiate drilling at a WOB of 13,000 kg and with an RPM value of 160, which is in the dysfunction region (see Figure 2). The initial WOB is in fact the optimal WOB value, but only when combined with the appropriate RPM. As can be seen from Figure 12, drilling commences at a high average MSE value of about 300 MPa. The ES algorithm recognizes that we are drilling with a dysfunction and reduces both the WOB and RPM for the first 1800 s to exit the dysfunction region. At this point, the average MSE has been reduced to a value of approximately 181 MPa. As soon as the optimum zone is entered and it is safe to increase the WOB again, and the algorithm spends the next 3500 s converging more slowly to the optimal point. It can be noted that the adaptation in WOB and RPM that occurred during the first 1800 s happened at the algorithms maximal rate of 2.5 kg/s and 0.02 rpm/s, respectively. This is because of the large variations in MSE seen when drilling in the dysfunction region and the correspondingly large estimated gradients, which prompts the algorithm seek out better operating conditions as quickly as it is allowed to.
It can also be observed from Figure 12 that the adaptation that happens from 1800 s and onwards only results in small enhancements in ROP and MSE, the larger gains in drilling efficiency occurred during the early adaptation when moving out of the dysfunction region. At the start of the simulation, the ROP was about 33 m/h, which has been slightly reduced when compared to the ROP at the end of the run. The MSE has however been reduced substantially, which means that the drilling has become more energy efficient and possibly less detrimental for the bit and downhole tools. with an RPM value of 160, which is in the dysfunction region (see Figure 2). The initial WOB is in fact the optimal WOB value, but only when combined with the appropriate RPM. As can be seen from Figure 12, drilling commences at a high average MSE value of about 300 MPa. The ES algorithm recognizes that we are drilling with a dysfunction and reduces both the WOB and RPM for the first 1800 s to exit the dysfunction region. At this point, the average MSE has been reduced to a value of approximately 181 MPa. As soon as the optimum zone is entered and it is safe to increase the WOB again, and the algorithm spends the next 3500 s converging more slowly to the optimal point. It can be noted that the adaptation in and that occurred during the first 1800 s happened at the algorithms maximal rate of 2.5 kg/s and 0.02 rpm/s, respectively. This is because of the large variations in MSE seen when drilling in the dysfunction region and the correspondingly large estimated gradients, which prompts the algorithm seek out better operating conditions as quickly as it is allowed to.
It can also be observed from Figure 12 that the adaptation that happens from 1800 s and onwards only results in small enhancements in ROP and MSE, the larger gains in drilling efficiency occurred during the early adaptation when moving out of the dysfunction region. At the start of the simulation, the ROP was about 33 m/h, which has been slightly reduced when compared to the ROP at the end of the run. The MSE has however been reduced substantially, which means that the drilling has become more energy efficient and possibly less detrimental for the bit and downhole tools.  The blue and orange datapoints can be viewed as the "path" that the ES algorithm took to converge to the founder point in these two scenarios. In the case of Simulation 2, it can be seen that the path taken by the algorithm is not the most efficient way to reach the founder point. It is however close to the most efficient path to exit the dysfunction region as quickly as possible. Because of this "detour", more time is spent to converge to the optimum in Simulation 2, compared to Simulation 1.  The blue and orange datapoints can be viewed as the "path" that the ES algorithm took to converge to the founder point in these two scenarios. In the case of Simulation 2, it can be seen that the path taken by the algorithm is not the most efficient way to reach the founder point. It is however close to the most efficient path to exit the dysfunction region as quickly

Constrained Drilling Optimization
In this section, the proposed constraint handling techniques are demonstrated in two simulation scenarios of drilling through the homogeneous formation A. In Simulation 3, we have imposed a maximal ROP of 20 m/h on the system through the use of Equation (13) with a value of 0.1. This means that in this scenario, we seek to minimize the modified objective function given by Equation (13) and not the MSE. This run is initiated with the same values as in Simulation 1; a WOB of 8000 kg and an RPM value of 60. The results from Simulation 3 are shown in Figure 14, where it can be seen that the ES algorithm increases the applied WOB and RPM for the first 1300 s, before the ROP reaches the limiting value and the algorithm determines that any further adaptation will cause the ROP to exceed the allowable amount. Because of this constraint, the ES algorithm converges to the minimal MSE value that it can obtain while still drilling at an ROP at or below 20 m/h, which it finds at a WOB of 10,900 and an RPM of 71. There are several operating points at which the ES algorithm could converge to in this scenario, based on the initial values for WOB and RPM. Because the MSE response in the optimal region is relatively constant, the algorithm will seek out the first combination of WOB and RPM that it can find that drills at the ROP limit. Any adaptation in the input variables beyond this operating point will cause the objective function to artificially grow through Equation (13), which will discourage any further changes to the WOB or RPM unless it significantly decreases the MSE.
In Simulation run 4, a maximal torque limit of 10,000 Nm is enforced by the predictive and reactive constraint handling approaches detailed in Equations (14) through (20). A safety factor of 2 is used in Equation (16), and the parameters KP and KI in Equation (18) have values of 0.05 kg/Nm and 0.001 kg/Nm·s, respectively. The initial setpoints for the WOB and RPM are 8000 kg and 100 rpm, respectively.

Constrained Drilling Optimization
In this section, the proposed constraint handling techniques are demonstrated in two simulation scenarios of drilling through the homogeneous formation A. In Simulation 3, we have imposed a maximal ROP of 20 m/h on the system through the use of Equation (13) with a ρ value of 0.1. This means that in this scenario, we seek to minimize the modified objective function given by Equation (13) and not the MSE. This run is initiated with the same values as in Simulation 1; a WOB of 8000 kg and an RPM value of 60. The results from Simulation 3 are shown in Figure 14, where it can be seen that the ES algorithm increases the applied WOB and RPM for the first 1300 s, before the ROP reaches the limiting value and the algorithm determines that any further adaptation will cause the ROP to exceed the allowable amount. Because of this constraint, the ES algorithm converges to the minimal MSE value that it can obtain while still drilling at an ROP at or below 20 m/h, which it finds at a WOB of 10,900 and an RPM of 71. There are several operating points at which the ES algorithm could converge to in this scenario, based on the initial values for WOB and RPM. Because the MSE response in the optimal region is relatively constant, the algorithm will seek out the first combination of WOB and RPM that it can find that drills at the ROP limit. Any adaptation in the input variables beyond this operating point will cause the objective function to artificially grow through Equation (13), which will discourage any further changes to the WOB or RPM unless it significantly decreases the MSE.
In Simulation run 4, a maximal torque limit of 10,000 Nm is enforced by the predictive and reactive constraint handling approaches detailed in Equations (14) through (20). A safety factor of 2 is used in Equation (16), and the parameters K P and K I in Equation (18) have values of 0.05 kg/Nm and 0.001 kg/Nm·s, respectively. The initial setpoints for the WOB and RPM are 8000 kg and 100 rpm, respectively.  Figure 15 displays the results from Simulation 4. In the first 1200 s, both the WOB and RPM are adapted to higher values, which results in drilling with lower MSE and higher ROP. At around 1200 s, Equation (16) predicts that the torque will surpass the allowable amount of 10,000 Nm if the WOB is increased any further. The adaptation in WOB is halted at this point and onwards, while the RPM continues to grow up to a value of approximately 111, where further increases in RPM would result in drilling in the dysfunction region. After 2000 s, an unexpected torque-increase of 1000 Nm is simulated, which makes the torque exceed its limit. This rise in torque could represent e.g., a build-up of cuttings around the BHA. As the torque is increased, the average MSE is elevated from about 180 to 200 MPa. The predictive constraint handler rapidly lowers the WOB until the torque is again within the allowable bounds, resulting in a reduction in of about 900 kg. The reduction in steers the drilling further away from the optimum, and the MSE is somewhat increased as a response. When drilling with this lower WOB, the ES algorithm detects that it is now safe to increase the RPM without encountering any dysfunctions which would increase the MSE. The RPM is seen to adapt to a value of 120, where the ROP is increased to approximately 29 m/h and we are drilling at the highest efficiency that can be obtained given the constraint on the torque.  After 2000 s, an unexpected torque-increase of 1000 Nm is simulated, which makes the torque exceed its limit. This rise in torque could represent e.g., a build-up of cuttings around the BHA. As the torque is increased, the average MSE is elevated from about 180 to 200 MPa. The predictive constraint handler rapidly lowers the WOB until the torque is again within the allowable bounds, resulting in a reduction in WOB of about 900 kg. The reduction in WOB steers the drilling further away from the optimum, and the MSE is somewhat increased as a response. When drilling with this lower WOB, the ES algorithm detects that it is now safe to increase the RPM without encountering any dysfunctions which would increase the MSE. The RPM is seen to adapt to a value of 120, where the ROP is increased to approximately 29 m/h and we are drilling at the highest efficiency that can be obtained given the constraint on the torque. Energies 2021, 14, 1298 28 of 35 Figure 15. Simulation run 4, a scenario where the torque is not allowed to exceed 10,000 Nm.

Unconstrained Drilling with Formation Shifts
The results from Simulation 5 are shown in Figure 16, where we investigate how the proposed optimization algorithm handles abrupt formation changes. This scenario can be though as a continuation of Simulations 1 or 2, where the optimum for formation A was identified by the algorithm as 13,000 kg WOB and an RPM of 90. This scenario could also represent a setting where e.g., a drill-off test has been performed in formation A, which identified the optimal WOB and RPM. We initiate drilling with these optimal input values. After drilling 5 m in Formation A, we enter an 18-m-thick layer of the softer Formation B, at approximately 580 s. The optimal point in Formation B is located at a WOB of 12,000 kg and an RPM of 109, as indicated by the dotted lines in Figure 16. As we enter the softer formation, the WOB and RPM that were optimal for formation A are no longer the optimal input values to drill with and should be adjusted to drill more efficiently. Shortly after the formation shift occurs, the ES algorithm recognizes that the downhole conditions have changed and spends the following 1200 s converging to the optimal point in Formation B, where the MSE is minimized at an average value of 149 MPa and the average ROP has increased by approximately 20% from 36 to 43 m/h. At around 2200 s of simulation time, we enter Formation A again, and the WOB and RPM are slowly adjusted back to the optimal values that the simulation started out with. It can be seen that the "path" taken by the algorithm back to the optimum in Formation A consists of first reducing the value before building it up to 13,000 kg, in the same manner as was done in Simulation 2 (see Figures 12 and 13) to quickly exit the dysfunction region.
Two important aspects of the proposed ES algorithm are shown in Figure 16. First, the advantage of continuously applying the excitations in WOB and RPM also even when we are operating at the current optimum, becomes apparent. When the drilled formation suddenly changes, the information gathered by the excitation signals is used by the ES Figure 15. Simulation run 4, a scenario where the torque is not allowed to exceed 10,000 Nm.

Unconstrained Drilling with Formation Shifts
The results from Simulation 5 are shown in Figure 16, where we investigate how the proposed optimization algorithm handles abrupt formation changes. This scenario can be though as a continuation of Simulations 1 or 2, where the optimum for formation A was identified by the algorithm as 13,000 kg WOB and an RPM of 90. This scenario could also represent a setting where e.g., a drill-off test has been performed in formation A, which identified the optimal WOB and RPM. We initiate drilling with these optimal input values. After drilling 5 m in Formation A, we enter an 18-m-thick layer of the softer Formation B, at approximately 580 s. The optimal point in Formation B is located at a WOB of 12,000 kg and an RPM of 109, as indicated by the dotted lines in Figure 16. As we enter the softer formation, the WOB and RPM that were optimal for formation A are no longer the optimal input values to drill with and should be adjusted to drill more efficiently. Shortly after the formation shift occurs, the ES algorithm recognizes that the downhole conditions have changed and spends the following 1200 s converging to the optimal point in Formation B, where the MSE is minimized at an average value of 149 MPa and the average ROP has increased by approximately 20% from 36 to 43 m/h. At around 2200 s of simulation time, we enter Formation A again, and the WOB and RPM are slowly adjusted back to the optimal values that the simulation started out with. It can be seen that the "path" taken by the algorithm back to the optimum in Formation A consists of first reducing the WOB value before building it up to 13,000 kg, in the same manner as was done in Simulation 2 (see Figures 12 and 13) to quickly exit the dysfunction region.
Two important aspects of the proposed ES algorithm are shown in Figure 16. First, the advantage of continuously applying the excitations in WOB and RPM also even when we are operating at the current optimum, becomes apparent. When the drilled formation suddenly changes, the information gathered by the excitation signals is used by the ES algorithm to rapidly recognize that the conditions have changed, and adjustments should be made in the applied WOB and RPM to drill more efficiently. These adjustments resulted in an increase in ROP of about 20%, compared to a setting where formation B had been drilled with constant WOB and RPM based on what was the optimal point when the simulation was initiated in formation A.
A second and closely related aspect is seen in the adjustments in WOB,and RPM performed by the algorithm immediately after the formation change occurs at about 580 s of simulation time. In the following we consider the WOB, but the same analysis applies to the RPM. When the drilled formation becomes softer at 580 s, the calculated MSE is reduced. The ES algorithm relates this reduction in MSE to the currently applied WOB, which at this time was in an elevated position of WOB + A wob kg. This causes the algorithm to estimate an artificially large and positive gradient for a short period of time, as higher values of WOB are related to a significant reduction in MSE. This erroneous gradient indicates that large improvements to the drilling efficiency can be made if WOB is increased. At this point, the adjustment of WOB in the wrong direction is limited by the saturation function and the adaptation gain in Equation (11), that disallows adaptation faster than 2.5 kg/s even if the estimated gradient is large. During the 50 s that the WOB is steered in the wrong direction, the WOB is only increased by about 125 kg. The requested changes in WOB would be much higher if the adaptation was directly proportional to the estimated gradient (as is usually the case in classical ES algorithms [26]). After drilling in this new formation for 50 s, the buffers used in the algorithm contain enough data sampled from the current conditions to detect that the WOB should be reduced to minimize the MSE, and the algorithm subsequently steers the WOB to the correct optimal value. The same effect as just described also occurs at the second formation shift at about 2200 s. algorithm to rapidly recognize that the conditions have changed, and adjustments should be made in the applied and to drill more efficiently. These adjustments resulted in an increase in ROP of about 20%, compared to a setting where formation B had been drilled with constant WOB and RPM based on what was the optimal point when the simulation was initiated in formation A.
A second and closely related aspect is seen in the adjustments in ,and performed by the algorithm immediately after the formation change occurs at about 580 s of simulation time. In the following we consider the WOB, but the same analysis applies to the RPM. When the drilled formation becomes softer at 580 s, the calculated MSE is reduced. The ES algorithm relates this reduction in MSE to the currently applied WOB, which at this time was in an elevated position of + kg. This causes the algorithm to estimate an artificially large and positive gradient for a short period of time, as higher values of WOB are related to a significant reduction in MSE. This erroneous gradient indicates that large improvements to the drilling efficiency can be made if is increased. At this point, the adjustment of in the wrong direction is limited by the saturation function and the adaptation gain in Equation (11), that disallows adaptation faster than 2.5 kg/s even if the estimated gradient is large. During the 50 s that the is steered in the wrong direction, the is only increased by about 125 kg. The requested changes in would be much higher if the adaptation was directly proportional to the estimated gradient (as is usually the case in classical ES algorithms [26]). After drilling in this new formation for 50 s, the buffers used in the algorithm contain enough data sampled from the current conditions to detect that the should be reduced to minimize the MSE, and the algorithm subsequently steers the to the correct optimal value. The same effect as just described also occurs at the second formation shift at about 2200 s.

Discussion of Results
The five simulation scenarios detailed in the previous section demonstrate how the proposed ES algorithm can be utilized to automatically steer the drilling process to the optimum conditions where the MSE is minimized. Since the optimization algorithm inherently requires full control of the WOB and RPM to continuously adjust these variables towards the optimal point, it is of utmost importance that the algorithm recognizes and avoids circumstances that could be damaging to the drilling equipment or could cause a contingency situation. In simulation 2 it was shown that the algorithm automatically steers away from drilling with dysfunctions, as drilling in this region resulted in high MSE values that could be reduced by regulating the WOB and RPM. In simulation runs 3 and 4, the possibly detrimental effects that we wish to avoid are not directly related to the MSE, but rather to other parameter that we wish to keep within certain limits. Simulations 3 and 4 demonstrate generic approaches to how we can avoid these types of constraints. In Simulation 4, the predictive constraint handling routine stops WOB adaptation before the constraint is violated. A separate control loop (the reactive constraint handling) is able to adjust the WOB back to the safe region when the limit is violated, much faster than if this constraint was implemented through modification of the objective function. This technique is however only applicable when the constrained output (the torque) is related to only one of the input variables, in this case the WOB. When several of the input variables are related to the output constraint, as is the case between the WOB, RPM and ROP in simulation run 3, the modified objective function described in Equation (13) is a better alternative to avoid the limit being exceeded.
The main advantage of using a data-driven algorithm like ES to optimize the drilling process is that it does not require detailed a priori knowledge of the downhole conditions or a drilling model to seek out more efficient drilling, and it can adapt to downhole changes. Both of these properties are shown in Simulation 5, where two formation shifts occurred which prompted the ES algorithm to seek out the new optimal conditions shortly after the changes happened. The ES algorithm considers a fixed window of time to perform its analysis and inherently tries to relate any change in the MSE to the applied excitations in WOB and RPM. If the MSE changes without any relation to the excitation signals, e.g., at a formation shift, the estimated gradients can become inaccurate when the data series used to estimate the gradients contain measurements from two differing downhole conditions. To avoid the algorithm having an exaggerated reaction to disturbances like this, the saturation function together with moderate values for the gain parameters, γ, is used to limit the algorithm's maximal adaptation rate regardless of the magnitude of the estimated gradients. This design encourages slow and steady adaptation towards the optimum. Faster convergence could be achieved by increasing γ, but this would also make the algorithm more susceptible to disturbances and noise.
Although the ES algorithm performs optimization actions without using a model of the system, engineering knowledge about the process is required to tune the algorithm and provide appropriate initial values for the WOB and RPM. In the simulation scenarios, the starting points were chosen to showcase different properties of the ES algorithm like constraint handling, convergence to the optimum and avoidance of drilling with dysfunctions. In simulation scenarios 1 and 5, the ES algorithm was able to increase the ROP by about 170% and 20%, respectively. How much the algorithm can improve the drilling efficiency (through higher ROP and/or lower MSE) is strongly related to how far away from the optimal WOB and RPM that the algorithm is initiated. In a field application, the initial WOB and RPM values should be a best guess of the optimal drilling conditions, which could be based on the driller's experience, a drill-off test, data from an offset well or estimates given by a drilling model.
There are both benefits and drawbacks to choosing MSE as the objective function to minimize. The MSE can be used to identify the founder point by seeking out the maximal WOB and RPM values that results in a decreasing or flat response in MSE. The expected flat region in MSE when operating in drilling region II can however pose some challenges when applying the ES algorithm. When drilling in this region, the estimated gradients will have zero or near zero values and the ES algorithm depends on the parameters k wob and k rpm to indicate if the input variables should be increased. In field applications, the calculated MSE could be susceptible to noise (especially though the torque) which could make it hard to estimate zero or close to zero valued gradients. This complication could be alleviated by increasing the amplitudes and periods of the excitation signals or considering a longer sliding window time series that encompassed several oscillations of the excitation signals, which would average out disturbances. Another possible alternative could be to use a combination of ROP and MSE as the objective function, as in [25].
There are several paths for further research that could be explored. Additional studies on a more advanced drilling simulator, field trials or lab experiments are needed to investigate how dynamic effects such as vibrations affect the performance of the proposed algorithm. Tests in the field or in the lab would also provide the opportunity to compare the ES method with other optimization methods, either data-driven or model based, in a realistic setting. Testing the algorithm in e.g., a lab setting would allow us to study if the algorithm in its current form will be able to converge to the optimum if it is initiated in a region where severe vibrations occur. The extended model used to simulate drilling in this study assumes that the MSE will keep increasing when operating further into regions where vibrations are expected to occur (see Figures 4 and 5). If this is not the case, and the MSE rather reaches some plateau value that does not change as a function of WOB and RPM in these regions, the proposed ES algorithm would not be able to find the optimum if the initial point was in the MSE plateau region. If this is the case, a different objective function, e.g., on the form of Equation (13) could be used to remedy the issue.
A second possibility for further work relies on the ES method's inherent nature of relating measurements of drilling parameters to the known variations in WOB and RPM induced by the excitation signals. If available, additional measured and/or calculated parameters such as the magnitude of different forms of vibrations could be related to the variations in WOB and RPM. Knowing how downhole vibrations vary as a function of WOB and/or RPM could be utilized for constraint handling or be displayed as useful information for the driller.

Conclusions
We have presented an algorithm based on the multivariable extremum seeking method that automatically optimizes the WOB and RPM to achieve drilling with minimal MSE, while adhering to operational constraints for safe and efficient drilling. The algorithm detailed in the paper is data-driven and does not require detailed a priori knowledge or models of the drilling process. The algorithm gathers information about the current downhole conditions by continuously performing small tests with the applied WOB and RPM while drilling and automatically implements optimization actions based on the test results. To investigate the algorithm's performance in a simulation environment, a drilling model for bit-rock interaction has been extended by the authors to qualitatively account for drilling dysfunctions. The simulations demonstrate that the proposed algorithm is able to find and maintain the WOB and RPM that result in drilling with minimal MSE, while adhering to operational constraints. The constraint handling functionality has been demonstrated with limits imposed on the ROP and torque. Yet, it is generic and can be applied to other constraining factors. The simulations show that the ES method is able to track changes in the optimal WOB and RPM corresponding to changes in the drilled formation. As demonstrated in the simulation scenarios, the overall improvements in ROP can be up to 20-170%, depending on the initial guess of the optimal WOB and RPM obtained from e.g., a drill-off test or a potentially inaccurate model. Along with the algorithm's description, we provide an explanation of specific design choices and tuning guidelines that simplify the use of the algorithm in practice.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Period Selection for the Excitation Signals
The tuning of the excitation signals is an important part of the extremum seeking algorithm. To extract a gradient of how the MSE relates to the WOB and RPM independently, setting P WOB = 2P RPM is suggested by the authors. Under some simplifying assumptions, it can be shown that this tuning of the excitation signal's periods allows for exact estimation of ∂MSE/∂WOB and ∂MSE/∂RPM without interference between the two excitation signals. Here, we investigate this property by considering the estimation of ∂MSE/∂WOB with a continuous-time, single-variable version of Equation (9), which is applied to a system where both the WOB and RPM is varied according to Equations (7a), (7b) and (8). A similar analysis can also be performed to show how the least-squares estimation of ∂MSE/∂RPM is not affected by the variations in the WOB.
Although the MSE is a non-linear function of both WOB and RPM when considering the entire span of WOB and RPM values (see Figures 4 and 5), the extremum seeking algorithm uses only a local region of this non-linear relationship when estimating gradients. The extent of this local region is determined by the amplitudes of the excitation signals. If suitable (not too large) amplitudes are used, it can be assumed that locally there is an approximately linear relationship between the MSE and the applied WOB and RPM, which is the relationship that is estimated by the least-squares gradient calculation in Equation (9). Using compact notation, let the WOB be denoted by x = x + d x and the RPM be represented by y = y + d y , as detailed in Equations (7a) and (7b). In the neighborhood of the point (x, y), the non-linear relation between the MSE, WOB and RPM can be approximately described by: where z represents the MSE and the parameters α and β take on constant values in this local region. We further assume that the adaptation in WOB and RPM is small so that x and y are approximately constant throughout the investigated time interval of P x seconds, as is common practice for average analysis of extremum seeking algorithms [26].
We consider a scenario where we are drilling ahead through a homogeneous formation and have varied the WOB (x) and RPM (y) according to Equation (8) while recording the MSE (z) for the past P x seconds. The measured drilling data is used to solve for the leastsquares slope and intercept parameters, a and b, using a continuous-time, single-variable version of Equation (9): where a represents the gradient-estimate, ∂MSE/∂WOB. Substituting in the previously defined relationships for x and y and approximating the response of the drilling system with Equation (A1) yields: Further, using Equation (8) At any point in time, t, the integral in Equation (A4) can be split into intervals in which the signum function takes on constant values of ±1. Using the relation P x = 2P y and the assumption that x and y are constant values, Equation (A4) can be expressed as: min a,b P x x 2 y + A 2 x α 2 y 2 + α 2 A 2 y − 2αay + a 2 + 2P x x(β − b)(a − αy) Taking the partial derivatives of Equation (A5) with respect to a and b and equating them to zero results in the set of equations: which has the solution a = αy and b = β. The estimated gradient, αy, corresponds to the slope of ∂MSE/∂WOB described by Equation (A1) evaluated at the average RPM value, y. This shows that in an ideal scenario where the simplifying assumptions are met, the tuning P x = 2P y allows for accurate estimation of ∂MSE/∂WOB without interference from the variations in RPM. The same analysis can be repeated for estimation of ∂MSE/∂RPM to find the expected gradient, αx, for this case. Other combinations of periods for the excitation signals can also be employed based on similar analysis, as long as one of the periods is an even multiple of the other, P wob = nP rpm or P rpm = nP wob where n is an even number larger than zero. In reality, the applied WOB and RPM will exhibit dynamics and cannot be expected to perfectly follow the square wave setpoints requested by the extremum seeking algorithm. Any deviations from the setpoints will however be dealt with by the least-squares approach to gradient estimation, which will incorporate these transient periods into the analysis. Furthermore, if the system is not currently at the optimal point, there will be adaptation in both WOB and RPM which will make the base values, x and y, change throughout the investigated time interval. The adaptation can cause some inaccuracies in the estimated gradients, but this effect can be kept to a minimum by choosing conservative values for the adaptation gains as well as through appropriate filtering of the data.