Controlling Remotely Operated Vehicles with Deterministic Artificial Intelligence

Unmanned ocean vehicles can be guided and controlled autonomously or remotely, and even remote operation can be substantially automated. Classical methods use trajectory-tracking errors in negative feedback; recently published methods are proposed here instead. Deterministic (nonstochastic) artificial intelligence (DAI) combines optimal learning with an asserted self-awareness statement in the form of the governing mathematical model (based on physics in this instantiation) to allow control that can be alternatively adaptive (i.e., capable of reacting to changing system dynamics) or learning (i.e., able to provide information about what aspects of the system dynamics have changed). In this manuscript, deterministic artificial intelligence is applied to the heading control of a simulated remotely operated underwater vehicle (ROV). Research is presented illustrating autonomous control of a Seabotix vLBV 300 remotely operated vehicle to within milli-degrees on the very first step of a shaped square-wave command, with tracking error decreasing by an additional sixty-two percent by the third step of the square-wave command.


Introduction
Unmanned vehicles are increasingly popular in society and have proven especially useful in dangerous environments. Exploring for life at the bubbling chimneys of magma-heated water in the very deep, dark depths of the ocean [1] acts as prototype practice for deep space exploration by such unmanned systems, as depicted in Figure 1. Planning to explore distant planets and moons such as Europa and Enceladus, the National Aeronautics and Space Administration (NASA) uses the BRUIE underwater rover to practice looking for life under the ice in Antarctica. The BRUIE rover is depicted in Figure 2 in an Arctic lake near Barrow, Alaska in 2015 [2]. Such dangerous environments illustrate the importance of highly capable, advanced systems automating as much activity as possible. The popularity of using such vehicles is due in part to their very long history of technological understanding and development, being governed by fundamental principles developed hundreds of years ago. Vehicle translation is governed by principles developed by Newton in 1687 [3], while rotation is governed by principles elaborated by Euler in 1776 [4]. The two natures of motion (translation and rotation) were expressed very early by Chasles in 1830 [5] as coupled and nonlinear and shortly afterwards embodied by Coriolis in a now famous theorem [6]. Hamilton introduced alternative formulation methods in 1834 using energy [7], while Lagrange introduced a presentation using energy that more closely resembled Newton's original formulation [8,9], and in the last century Kane [10-12] provided the latest parameterization of the same natural relationships, very often referred to as a Lagrangian formulation [8,9] of D'Alembert's principle, which was introduced in 1743.

Figure 1.
Deep-ocean exploration by unmanned systems [1]. Image used for educational purposes in accordance with NASA Media Usage Guidelines [22].

Figure 2.
NASA BRUIE under-surface rover designed to explore alien oceans [2]. Image used for educational purposes in accordance with NASA Media Usage Guidelines [22].
A recent lineage of adaptive controls applied to robotic systems [19,20] and spacecraft [21,23] illustrated an ability to autonomously recover from significant damage without assistance [24]. The foundation of those methods was unique manifestations of the governing differential equations of motion in a feedforward fashion as controls, augmented with classical feedback rules used to adapt the feedforward controls. Highlighting and amplifying this foundation, Ref. [25] demonstrated that feedforward controls comprising the governing differential equations of motion, driven by autonomously generated trajectory commands, were superior to linear-quadratic optimal feedback control. Later, Ref. [26] developed optimal (in a two-norm sense) feedback to realize adaptation, replacing the former adaptive methods based on classical feedback. In 2020, the combined method, newly labeled deterministic artificial intelligence [27], comprising self-awareness statements (the feedforward codification of the governing differential equations) plus optimal learning (through feedback), was applied to unmanned ocean vehicles [27] as an embodied update to classical [28] and optimal (so-called "modern") approaches [29].
Deterministic artificial intelligence [27] is thus an alternative to adaptive control [19-21,23,24] and classical control schemes [28] and is based on the observation that, if one can discern the governing mathematical equations for a system (using any of the cited approaches [3-17]), then one could use those governing equations themselves as a control law (in both feedforward [25] and feedback [26] senses), which is particularly effective if the governing mathematical relationship is stipulated by physics. This allows the design of a feedforward controller that might perfectly track a desired trajectory [30,31] simply by substituting the prescribed desired velocity and acceleration into the system's governing equations and calculating the required control forces and torques. Deterministic artificial intelligence first asserts the general structure of the system dynamics, then uses some form of learning to determine the parameters of the model [26]. Assuming the chosen model structure is sufficiently comprehensive to capture the dominant system dynamics, a deterministic artificial intelligence controller will be able to learn all of the relevant parameters [27] and adapt to any changes in the underlying system dynamics. By adding simple feedback, the controller is able to correct for disturbances or other unmodeled dynamics to track a prescribed trajectory with very little error.
While [28] developed benchmark classical methods on the Phoenix underwater vehicle, Ref. [27] developed deterministic artificial intelligence for the Aries underwater vehicle shown in Figure 3. In this manuscript, a deterministic artificial intelligence controller is newly developed and proposed as applied to yaw control of a simulated remotely operated submersible (ROV). The simulator was configured to simulate the dynamics of a Seabotix vLBV 300, with no environmental disturbances (e.g., wind, swell, currents) or measurement noise.

Figure 3.
(a) Underwater vehicle [27], originally taken from [32]; (b) Naval Postgraduate School Aries autonomous unmanned underwater vehicle [28], originally taken from [32], where the image may be distributed or copied, subject only to any indicated copyright restrictions and normally accepted procedures for properly crediting sources, as done here [33]; (c) Seabotix vLBV 300 remotely operated submersible (photo taken in the lab).

Proposed Innovations
This section succinctly and directly lists the innovations presented in this manuscript.

1.
First-time development and presentation of deterministic artificial intelligence applied to yaw control of a remotely piloted submersible (the Seabotix vLBV 300 in this case), including both self-awareness statements and a newly presented parameterization of optimal learning.
2.
First instantiation of cubic polynomial autonomous trajectory generation supplied to deterministic self-awareness statements.
3.
Illustration of milli-degree tracking accuracy on the very first heading change of an aggressive shaped square-wave command, and an additional sixty-two percent (root mean square) error reduction by the third heading change of the square wave (illustrating the efficacy of optimal learning in eliminating startup transient behavior).

Manuscript Structure
The next section describes the materials and methods used in this manuscript. The results of validating simulation experiments are provided in Section 3, followed by a brief discussion of the results and how they can be interpreted from the perspective of previous studies and of the working hypotheses. The findings and their implications are discussed in the broadest context possible, and future research directions are also highlighted.

Materials and Methods
This section builds the proposed methodology from the first principles cited in the Introduction, naturally starting with the system dynamics, which lead to the governing differential equations of motion; these are quickly re-parameterized to permit two-norm optimal feedback learning. Since utilization of the governing equations necessitates expression of a desired trajectory (and that expression is preferably autonomous and analytic), autonomous trajectory generation is introduced next, where cubic polynomials are introduced and proposed. These notions are assembled (as depicted in Figure 4) to produce control and resulting motion trajectories, with results presented in Section 3 and discussed in Section 4. The materials and methods are described in sufficient detail to allow others to replicate and build on the published results; the simulations used to manifest the mathematical developments in this section are presented in Appendix A, permitting readers to develop identical simulations themselves.

Controller Architecture
The controller operations for a given time step are as follows. First, a trajectory is generated between the current system state and the set point. This trajectory is used to determine the current desired velocity and acceleration. Based on a system dynamics model, the desired velocity and acceleration are used to calculate a control input to the system that will achieve the desired velocity and acceleration. The inputs are applied to the system, and then measurements of the system state are fed into a state estimator, and the estimated system state is used to optimally update the estimates for the parameters of the system dynamics model. Then this new system state estimate is used to generate a new trajectory, and the process repeats.
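The loop described above can be sketched end to end on a deliberately simplified plant. In the sketch below, the one-degree-of-freedom yaw model, the gains, the learning rate, and the function name run_dai_loop are all illustrative assumptions, not the authors' implementation; the point is only to show the order of operations: generate a trajectory, form the feedforward control from the asserted model, apply the input, and update the parameter estimate from the response.

```cpp
#include <cmath>
#include <utility>

// Hypothetical end-to-end sketch of the per-time-step loop: a 1-DOF yaw
// plant I * psi_ddot = tau whose inertia is unknown to the controller.
// Returns {final heading, learned inertia estimate}.
std::pair<double, double> run_dai_loop() {
    const double dt = 0.01;      // time step [s]
    const double T = 2.0;        // trajectory duration [s]
    const double I_true = 5.0;   // true inertia, hidden inside the "plant"
    double I_hat = 1.0;          // controller's model parameter estimate

    double psi = 0.0, r = 0.0;   // plant state: heading [rad], yaw rate [rad/s]
    const double psi_f = 0.5;    // commanded heading set point [rad]

    // Rest-to-rest cubic trajectory from 0 to psi_f over T seconds.
    const double a2 = 3.0 * psi_f / (T * T);
    const double a3 = -2.0 * psi_f / (T * T * T);

    for (int k = 0; k * dt < T + 3.0; ++k) {
        // 1. Trajectory: desired heading, rate, and acceleration.
        double qd, rd, rdd;
        if (k * dt < T) {
            const double t = k * dt;
            qd = a2 * t * t + a3 * t * t * t;
            rd = 2.0 * a2 * t + 3.0 * a3 * t * t;
            rdd = 2.0 * a2 + 6.0 * a3 * t;
        } else {                 // hold the set point after arrival
            qd = psi_f;
            rd = 0.0;
            rdd = 0.0;
        }

        // 2. Control: self-awareness feedforward plus light feedback.
        const double tau = I_hat * rdd + 8.0 * (rd - r) + 16.0 * (qd - psi);

        // 3. Plant response (forward-Euler integration).
        const double acc = tau / I_true;
        r += acc * dt;
        psi += r * dt;

        // 4. Learning: with a noiseless scalar plant, tau / acc recovers the
        //    true inertia, so a simple relaxation stands in for RLS here.
        if (std::fabs(acc) > 1e-3)
            I_hat += 0.1 * (tau / acc - I_hat);
    }
    return {psi, I_hat};
}
```

With the feedforward model learned, the feedback terms carry almost no load, which is the central DAI design choice: the model does the tracking, and feedback only trims errors.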

System Dynamics
The ROV yaw dynamics model asserted for this controller ignores Coriolis forces and assumes decoupled motion for calculating drag forces, taking the form of Equation (1) with variable definitions given in Table 1. In order to estimate the coefficients, Equation (1) must be rewritten in "regression form", i.e., in the form y = ΦΘ, where Φ is a matrix of known quantities and Θ is a vector of the coefficients to estimate. This can be done by parameterizing the system in accordance with Equation (2).
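As a concrete illustration of the regression form (the drag structure below is an assumed example for exposition, not necessarily the exact model of Equation (1)), a yaw model with inertia m, linear drag d_1, and quadratic drag d_2 factors as:

```latex
% Assumed illustrative yaw model: m \dot{r} + d_1 r + d_2 r\,|r| = \tau
\underbrace{\tau}_{y} \;=\;
\underbrace{\begin{bmatrix} \dot{r} & r & r\,|r| \end{bmatrix}}_{\Phi}\,
\underbrace{\begin{bmatrix} m \\ d_1 \\ d_2 \end{bmatrix}}_{\Theta}
```

Every term that multiplies an unknown coefficient moves into Φ, and the unknowns collect into Θ, so measured states and torques yield a linear estimation problem even though the dynamics are nonlinear in the states.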
Then the optimal estimate for Θ, denoted Θ̂, can be computed using recursive least squares (RLS) as follows in Equation (3) [34] (which identically forms the basis of comparative benchmark adaptive techniques [35]); readers will recognize Equation (3) as the feedback portion of the classical linear-optimal state estimator, the Kalman filter [36,37]. Thus, deterministic artificial intelligence is akin to a dual representation of a Kalman filter applied to control rather than state estimation, where the limitation of linear or linearized feedforward is released and replaced with any form of coupled, nonlinear dynamics demanded by the governing physics.
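A minimal sketch of the standard RLS recursion referenced as Equation (3) is given below for a three-parameter model; the fixed dimension, the forgetting-factor default, and the variable names are assumptions for illustration, not the paper's code.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// One recursive-least-squares update for y = phi . theta (3 parameters).
// P is the parameter covariance and lambda is the forgetting factor.
inline void rls_update(Vec3 &theta, Mat3 &P, const Vec3 &phi, double y,
                       double lambda = 1.0) {
    // P * phi
    Vec3 Pphi{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            Pphi[i] += P[i][j] * phi[j];
    // Innovation denominator: lambda + phi' P phi.
    double denom = lambda;
    for (std::size_t i = 0; i < 3; ++i) denom += phi[i] * Pphi[i];
    // Gain: K = P phi / denom.
    Vec3 K{};
    for (std::size_t i = 0; i < 3; ++i) K[i] = Pphi[i] / denom;
    // Prediction error: y - phi . theta.
    double err = y;
    for (std::size_t i = 0; i < 3; ++i) err -= phi[i] * theta[i];
    // Parameter update: theta <- theta + K * err.
    for (std::size_t i = 0; i < 3; ++i) theta[i] += K[i] * err;
    // Covariance update: P <- (P - K phi' P) / lambda (P symmetric).
    Mat3 Pnew{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            Pnew[i][j] = (P[i][j] - K[i] * Pphi[j]) / lambda;
    P = Pnew;
}
```

The gain-times-innovation structure is exactly the Kalman-filter feedback form noted above, which is why the text calls DAI a dual representation applied to control.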

State Estimation
Estimates of the system states in Φ are required in order to estimate the parameters in Θ. The simulator used to test the system provides "truth" measurements for roll, pitch, yaw, and their respective velocities, so those values are used directly. A Kalman filter is used to estimate the attitude accelerations. Since "true" measurements are used as the inputs to the Kalman filter and to the RLS estimation, this paper does not attempt to assess the performance of the controller in the presence of noisy measurements.
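The paper's filter estimates accelerations for all three attitude axes (cf. the nine-state gain construction in Appendix A); the sketch below shows the same idea for a single axis with an assumed two-state (rate, acceleration) constant-acceleration model and assumed noise covariances, not the paper's tuning.

```cpp
// Hypothetical single-axis sketch: estimate yaw acceleration from yaw-rate
// measurements with a two-state Kalman filter (state x = [rate, accel],
// constant-acceleration model). q and r_meas are assumed tuning values.
struct RateAccelKF {
    double x[2] = {0.0, 0.0};                   // state estimate [rate, accel]
    double P[2][2] = {{1.0, 0.0}, {0.0, 1.0}};  // state covariance

    void step(double rate_meas, double dt, double q = 1e-2,
              double r_meas = 1e-4) {
        // Predict: x <- F x with F = [[1, dt], [0, 1]].
        x[0] += dt * x[1];
        // P <- F P F' + Q, with process noise only on the acceleration state.
        const double p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1];
        const double p01 = P[0][1] + dt * P[1][1];
        const double p10 = P[1][0] + dt * P[1][1];
        const double p11 = P[1][1] + q;
        P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] = p11;

        // Update with the rate measurement z: H = [1, 0].
        const double s = P[0][0] + r_meas;    // innovation covariance
        const double k0 = P[0][0] / s;        // Kalman gain
        const double k1 = P[1][0] / s;
        const double innov = rate_meas - x[0];
        x[0] += k0 * innov;
        x[1] += k1 * innov;

        // Covariance update: P <- (I - K H) P.
        const double n00 = (1.0 - k0) * P[0][0];
        const double n01 = (1.0 - k0) * P[0][1];
        const double n10 = P[1][0] - k1 * P[0][0];
        const double n11 = P[1][1] - k1 * P[0][1];
        P[0][0] = n00; P[0][1] = n01; P[1][0] = n10; P[1][1] = n11;
    }
};
```

Feeding the filter a rate ramp drives the acceleration state toward the ramp's slope, which is the quantity the RLS regression needs but the simulator does not measure directly.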

Trajectory Generation
The DAI-based controller requires a trajectory to track between the current state and the desired state. The trajectory must be continuous with the current system position and velocity, and it must reach the desired position and velocity after some finite time interval; i.e., if the current position is x_0, the final position is x_f, and the desired time to complete the move is T, then the trajectory q(t) must satisfy q(t_0) = x_0, q̇(t_0) = ẋ_0, q(t_0 + T) = x_f, and q̇(t_0 + T) = ẋ_f. One way of satisfying these requirements is to choose a trajectory that is a cubic polynomial of the form q(t) = a_0 + a_1(t − t_0) + a_2(t − t_0)^2 + a_3(t − t_0)^3, where Δx = x_f − x_0. Solving for the coefficients gives a_0 = x_0, a_1 = ẋ_0, a_2 = (3Δx − (2ẋ_0 + ẋ_f)T)/T^2, and a_3 = (−2Δx + (ẋ_0 + ẋ_f)T)/T^3. If constant velocity is assumed for t < t_0 and t > t_0 + T, then the trajectory becomes a piecewise-defined function.
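A sketch of this cubic generator, solving the boundary-condition equations for the coefficients; the struct layout and method names are illustrative assumptions, not the paper's implementation.

```cpp
// Cubic trajectory q(t) interpolating position/velocity (x0, v0) at t0
// to (xf, vf) at t0 + T. Boundary conditions: q(t0) = x0, q'(t0) = v0,
// q(t0 + T) = xf, q'(t0 + T) = vf.
struct Cubic {
    double t0, a0, a1, a2, a3;

    static Cubic make(double t0, double T, double x0, double v0,
                      double xf, double vf) {
        const double dx = xf - x0;
        Cubic c;
        c.t0 = t0;
        c.a0 = x0;
        c.a1 = v0;
        c.a2 = (3.0 * dx - (2.0 * v0 + vf) * T) / (T * T);
        c.a3 = (-2.0 * dx + (v0 + vf) * T) / (T * T * T);
        return c;
    }
    double pos(double t) const {   // q(t)
        const double s = t - t0;
        return a0 + a1 * s + a2 * s * s + a3 * s * s * s;
    }
    double vel(double t) const {   // q'(t), the desired rate
        const double s = t - t0;
        return a1 + 2.0 * a2 * s + 3.0 * a3 * s * s;
    }
    double acc(double t) const {   // q''(t), the desired acceleration
        const double s = t - t0;
        return 2.0 * a2 + 6.0 * a3 * s;
    }
};
```

The same four-coefficient solution serves both the "primary" and "secondary" trajectories described in the next subsection, only with different endpoint arguments.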

Control Output
Each time the user-specified yaw set point changes, a new "primary" trajectory, q_1(t), is generated between the current vehicle state and the specified set point (with zero velocity). Then, on each time step, a "secondary" trajectory, q_2(t), is generated between the current vehicle position and the position and velocity of the primary trajectory a short time, Δt, in the future; i.e., if the current position and velocity at time t_k are x_k and ẋ_k, then the "secondary" trajectory, q_2(t), is generated between (x_k, ẋ_k) and (q_1(t_k + Δt), q̇_1(t_k + Δt)). This keeps the vehicle on the specified "primary" trajectory, driving it back to the trajectory if it is perturbed. The Δt used to generate the "secondary" trajectory is a tunable parameter; for the system simulated in this paper, a value of 0.25 s was chosen.
The velocity and acceleration trajectories calculated for q_2(t) using Equations (12) and (13), together with the estimated model parameters calculated from Equation (3), are then used to calculate the controller output for each time step. First, calculate the desired yaw velocity, r_d, from Equation (12); then calculate the desired yaw acceleration, ṙ_d, from Equation (13). Next, substitute r_d and ṙ_d for r and ṙ, respectively, in Φ to yield the desired state vector Φ_d. Then, using the most recent estimate for the model parameters, Θ̂, calculated from Equation (3), the prescribed yaw torque is Φ_d Θ̂. Finally, the yaw torque is related to the control, u_ψ, by the A_0 coefficient.
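A minimal sketch of this output calculation follows. The three-term regressor Φ_d = [ṙ_d, r_d, r_d|r_d|] is an assumed example (the paper's Φ follows Equation (2)), and τ = A_0·u is an assumed form of the torque-to-control relation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical per-step output: substitute the desired rate r_d and desired
// acceleration r_dot_d into an assumed regressor, multiply by the current
// parameter estimate Theta_hat, and scale by the estimated A0 coefficient.
double control_output(double r_d, double r_dot_d,
                      const std::array<double, 3> &theta_hat, double a0_hat) {
    const std::array<double, 3> phi_d{r_dot_d, r_d, r_d * std::fabs(r_d)};
    double tau = 0.0;                 // prescribed yaw torque: Phi_d . Theta_hat
    for (std::size_t i = 0; i < 3; ++i)
        tau += phi_d[i] * theta_hat[i];
    return tau / a0_hat;              // u_psi such that A0 * u_psi = tau
}
```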

Results
This section provides a concise and precise description of the experimental simulation results, their interpretation as well as the experimental conclusions that can be drawn.
The controller was tested by applying a series of positive then negative 30-degree step changes to the desired vehicle heading. The trajectory generation was configured to generate a trajectory that would take approximately 2.42 s to complete the 30-degree heading change. Figure 5 shows the response to these inputs and Table 2 summarizes some performance metrics. The rise time in the table is calculated as the amount of time to complete 99.9% of the thirty degree heading change (29.97 degrees). The settling time is calculated as the amount of time required after the heading change was started before the heading stayed within 0.1% (0.03 degrees) of the final value. Deterministic artificial intelligence was able to quickly converge to values that provided a stable response with good tracking of the desired trajectory. Even for the first heading change the controller was able to closely track the trajectory, with a root mean square (RMS) tracking error of only 0.043 degrees. By the third positive step the RMS error had further decreased by almost a factor of 3, to only 0.016 degrees.
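The rise-time and settling-time definitions above can be made concrete in a short routine; the sampled-response representation and names below are assumptions for illustration, not the analysis code used for Table 2.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rise time: time to first reach 99.9% of the commanded step.
// Settling time: time after which the response stays within 0.1% of the
// final value. Both follow the definitions stated in the text.
struct StepMetrics { double rise_time; double settling_time; };

StepMetrics step_metrics(const std::vector<double> &heading, double dt,
                         double start, double final_value) {
    const double step = final_value - start;
    const double rise_level = start + 0.999 * step;  // 29.97 deg for a 30 deg step
    const double band = 0.001 * std::fabs(step);     // 0.03 deg band
    StepMetrics m{-1.0, -1.0};
    for (std::size_t k = 0; k < heading.size(); ++k) {
        if (m.rise_time < 0.0 && (step > 0.0 ? heading[k] >= rise_level
                                             : heading[k] <= rise_level))
            m.rise_time = k * dt;
        // Settling: one sample after the last excursion outside the band.
        if (std::fabs(heading[k] - final_value) > band)
            m.settling_time = (k + 1) * dt;
    }
    return m;
}
```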
The simulator was configured such that there were no disturbances and such that the measurements obtained from the simulator were not corrupted by noise. This represents idealized conditions, but it serves as initial validation that the general algorithm and controller architecture are viable.

Discussion
The slight variations in response for each step, best seen in the slightly differing amounts of error in Figure 5, are likely the result of small variations in the model parameters and timing variations in the controller and/or simulator.
Previous applications of DAI to attitude control and UUVs have demonstrated that DAI can be used to achieve a high degree of trajectory tracking accuracy with minimal lag [26,27]. This paper presents a DAI-based controller that uses a different feedback mechanism and trajectory generation scheme; however, the results achieved by the DAI controller in this paper are consistent with the level of performance achieved by the previous DAI controllers. The results are also consistent with the central hypothesis behind DAI: that using a system's own dynamics as the control law can produce extremely effective trajectory tracking.

Findings and Their Implications
The controller plays a crucial part in the operation of an ROV. At its core, the controller's role is to translate the operator's instructions into actual vehicle motion, so the operator's ability to complete an objective is directly tied to the quality of the ROV's controller. The controller outlined in this paper provides a framework that presents a number of advantages to both operators and manufacturers over the classical controllers typically used for ROV control. One advantage of this controller is its high degree of tracking accuracy: the controller presented here achieved milli-degree tracking accuracy within seconds of startup. This level of accuracy makes the controller well suited to extremely sensitive tasks that require a high degree of precision, such as sampling delicate marine life or explosive ordnance disposal (EOD). The adaptive nature of the controller also makes it well suited to applications where the system dynamics may change unexpectedly. Again, EOD operations present situations where the ROV may be subject to significant damage, and the adaptation could allow recovery and mission completion after sustaining damage that would otherwise cripple the vehicle. The adaptive nature of the controller also means that it required no tuning. Tuning can be an expensive part of ROV design, often requiring significant time from trained engineers, so removing the requirement for tuning presents manufacturers with an opportunity to reduce the cost of producing an ROV.
The approach is validated to effectively control the remotely operated vehicle with increasing accuracy over time. The application of deterministic artificial intelligence to unmanned underwater vehicles (UUVs) proved effective with no design tuning required, and that feature holds for the disparate system equations of the Seabotix vLBV 300 remotely operated vehicle. The working hypothesis is supported by results that are comparable to previous studies on UUVs.

Recommendations for Future Research
This manuscript proposed a controller architecture and tested it for yaw control of an ROV under idealized conditions, namely no disturbances and perfect measurements.
This presents many avenues for future work. One area of future work involves expanding the controller testing and validation with the end goal of a control system operating in real-world open ocean conditions. The first step in this is to apply the controller to the ROV's other degrees of freedom: roll, pitch, position and depth. Then performance of these controllers will be assessed under simulated conditions more akin to a real-world operating environment. This will involve adding measurement noise and disturbances such as wind, swell or tether dynamics to the simulated test environment. After demonstrating performance under non-ideal simulated conditions the research will progress to laboratory validation on real vehicle hardware, and finally to assessing the performance of the proposed controller on a "live" ROV in open water conditions.
Another area of future work involves assessing and improving the control architecture itself. The controller's adaptive nature will be tested to determine its robustness to changes in system dynamics, for example from damage to the vehicle or changes in payloads. Alternate trajectory generation schemes will also be explored. These will focus on optimal trajectories that, for example, minimize energy consumption, and on trajectories generated under constraints, such as maximum and minimum applied force or maximum actuator slew rate, that better reflect the actual operating capabilities of the ROV. Previous work such as [26] has evaluated the efficacy of DAI-based controllers against "classic" controllers, but an important area of future research will involve comparing this DAI controller against controllers based on other forms of artificial intelligence, such as deep learning. Finally, analysis of the computational burden of this controller should be performed in order to determine minimum required system specifications.
The cited lineage (e.g., Chasles, Slotine, Fossen, Nakatani, Cooper, Smeresky, etc.) develops dynamics-based controls in order to instill inherent resilience to dynamic parameter change, where sources of such change include damage, fuel expenditure, and sudden grasping of objects by the remotely operated vehicle. Following this prequel research, the limits of efficacy of the validated approach should be investigated to ascertain the ability to autonomously reject the deleterious effects of such changes. Next, benchtop laboratory experiments should be performed to prepare for validating hardware experiments in the open ocean.

Appendix A

const double psi_q_gain = 1;
const double psi_dot_q_gain = 1;
const double psi_accel_q_gain = 1e3;

Eigen::Matrix<double, Num_states(), 1> G{
    {.5 * dt * dt * std::sqrt(psi_q_gain)},
    {dt * std::sqrt(psi_dot_q_gain)},
    {std::sqrt(psi_accel_q_gain)},
    {.5 * dt * dt * std::sqrt(psi_q_gain)},
    {dt * std::sqrt(psi_dot_q_gain)},
    {std::sqrt(psi_accel_q_gain)},
    {.5 * dt * dt * std::sqrt(psi_q_gain)},
    {dt * std::sqrt(psi_dot_q_gain)},
    {std::sqrt(psi_accel_q_gain)}};

Attitude_model::MatrixNN q = G * G.transpose();
q = q * 0;
q(0, 0) = psi_q_gain;
q(1, 1) = psi_dot_q_gain;
q(2, 2) = psi_accel_q_gain;
q(3, 3) = psi_q_gain;
q(4, 4) = psi_dot_q_gain;
q(5, 5) = psi_accel_q_gain;
q(6, 6) = psi_q_gain;
q(7, 7) = psi_dot_q_gain;
q(8, 8) = psi_accel_q_gain;