Multistage Dynamic Optimization with Different Forms of Neural-State Constraints to Avoid Many Object Collisions Based on Radar Remote Sensing

This article presents a way of helping navigators direct the movement of an object so that it safely passes other objects, using an artificial neural network and optimization methods. It is shown that the best trajectory of an object in terms of optimality and safety, from among many possible options, can be determined by dynamic programming combined with an artificial neural network, with the encountered objects represented as moving forbidden domains. The analytical considerations are illustrated with simulation studies of the developed calculation program for real navigational situations at sea. The research took into account both the number of encountered objects and the different shapes of the domains assigned to them. Finally, the optimal time of the safe object trajectory was compared against the set value of the safe passing distance of objects under given visibility conditions at sea, and the influence of the degree of discretization of the calculations, defined by the density of nodes along the route of the object, was assessed.


Introduction
The International Maritime Organization (IMO) requires objects to be fitted with an automatic radar plotting device, the Automatic Radar Plotting Aid (ARPA). This system automatically initiates and continues the tracking of detected echoes and generates alarms in hazardous situations, as presented by Graziano et al. and Huang et al. [1,2]. The ARPA computer calculates the distance and time of the closest approach of objects and compares the obtained values with the values recommended for the current situation at sea. If the calculated values violate the set limits, the dangerous-target alarm is activated. It is also possible to simulate a trial maneuver of the object, but only for one target, namely the most dangerous encountered object. This helps the navigator evaluate the effects of the planned anti-collision maneuver on an accelerated timescale. Therefore, as stated by Bist [3], the ARPA supports the work of the navigator and increases navigational safety. The main task of the ARPA is to support the preparation of maneuvering decisions by the navigator, especially in situations of concentrated object traffic in restricted waters [4][5][6].
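The alarm logic described above can be sketched as follows. This is a minimal Python illustration, not an ARPA manufacturer's implementation: the `Target` record, the function name `dangerous_targets`, and the rule for ranking the most dangerous target (least time to the closest approach, then smallest passing distance) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    dcpa_nm: float   # distance at the closest point of approach, nautical miles
    tcpa_min: float  # time to the closest point of approach, minutes (< 0: already passed)


def dangerous_targets(targets, dcpa_limit_nm, tcpa_limit_min):
    """Return the targets that violate both limits, most dangerous first."""
    flagged = [
        t for t in targets
        if t.dcpa_nm < dcpa_limit_nm and 0.0 <= t.tcpa_min <= tcpa_limit_min
    ]
    # Assumed ranking: least time to react first, then smallest passing distance.
    return sorted(flagged, key=lambda t: (t.tcpa_min, t.dcpa_nm))


targets = [
    Target("A", 0.3, 12.0),
    Target("B", 2.5, 5.0),   # passes well clear
    Target("C", 0.8, 4.0),
    Target("D", 0.2, -3.0),  # closest approach already behind
]
alarms = dangerous_targets(targets, dcpa_limit_nm=1.0, tcpa_limit_min=15.0)
print([t.name for t in alarms])  # ['C', 'A']; C would be the trial-maneuver target
```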
In practice, there are many possible safe collision-avoidance maneuvers; ideally, the navigator chooses the optimal one, which, in addition to minimizing the collision risk, results in the smallest deviation from the original path. Possibilities for supplementing the system with methods that support maneuvering decisions in uncertain navigational situations that develop quickly and involve a larger number of encountered objects are described in [7][8][9][10][11][12]. Reducing the uncertainty in assessing the real navigational situation of an object by means of an artificial neural network is shown in [13][14][15]. Lenart [16] proposed the "time for a safe distance" after the detection of dangerous objects as a potentially important parameter, accompanied by a display of possible evasive maneuvers. Borkowski [17] presented acceptable solutions for the range of course alterations in compliance with the International Regulations for Preventing Collisions at Sea (COLREGs). Liu et al. and Lisowski [18,19] proposed a different approach to preventing object collisions at sea, using game-theory methods to determine safe ship trajectories while accounting for the uncertain elements of real navigational situations at sea.
Previous works have omitted from their calculation algorithms the subjectivity of the navigator making a maneuvering decision in a collision situation; however, this subjectivity is a factor in 88% of accidents at sea. The purpose of this article is therefore to synthesize an algorithm for safe and optimal object control in situations where several objects pass by, using the Bellman dynamic-programming method together with an artificial neural network that generates subjective domains of the passing objects.

ARPA Radar Remote Anti-Collision Process Sensing
The purpose of the navigator's decision-support system is to propose a sequence of maneuvers determined by anti-collision calculations. These calculations are carried out by an algorithm that determines a safe trajectory for the object from input data describing the current navigational situation. If one of the encountered objects changes course or speed during the calculations, they must be repeated, which is why they must not take longer than a dozen or so seconds. Figure 1 shows the parameters needed for these calculations: the motion parameters of own object and of the encountered objects.
Remote Sens. 2020, 12, x FOR PEER REVIEW 2 of 13
Figure 1. Radar imaging of passing objects: X, Y, position coordinates of objects; V, ψ, speed and course of own object; Vj, ψj, speed and course of object j; Dj, Nj, distance and bearing of object j; Dj min, Tj min, distance and time of the closest passing of objects.
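For reference, the closest-passing parameters Dj min and Tj min can be obtained from the other quantities in Figure 1 with the standard relative-motion calculation sketched below. The sketch assumes straight-line motion at constant speed and course, true bearings and courses measured clockwise from north, distances in nautical miles, and speeds in knots; the function name `cpa` is hypothetical.

```python
import math


def cpa(d_nm, bearing_deg, v_kn, course_deg, vj_kn, coursej_deg):
    """Return (Dj min in nm, Tj min in minutes) for target j under straight-line motion."""

    def vec(magnitude, angle_deg):
        # Navigational angles run clockwise from north: east = sin, north = cos.
        a = math.radians(angle_deg)
        return magnitude * math.sin(a), magnitude * math.cos(a)

    rx, ry = vec(d_nm, bearing_deg)     # position of target j relative to own object
    ox, oy = vec(v_kn, course_deg)      # own velocity (V, psi)
    tx, ty = vec(vj_kn, coursej_deg)    # target velocity (Vj, psi_j)
    vx, vy = tx - ox, ty - oy           # relative velocity
    v2 = vx * vx + vy * vy
    if v2 < 1e-12:
        return d_nm, 0.0                # no relative motion: the distance never changes
    t_h = -(rx * vx + ry * vy) / v2     # hours; negative if the CPA is already behind
    d_min = math.hypot(rx + vx * t_h, ry + vy * t_h)
    return d_min, 60.0 * t_h


# Head-on: target 6 nm dead ahead on the reciprocal course, both at 10 kn.
d_min, t_min = cpa(6.0, 0.0, 10.0, 0.0, 10.0, 180.0)
print(round(d_min, 6), round(t_min, 6))  # 0.0 18.0
```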
The calculation process consists of downloading the needed data for the anti-collision calculations from ARPA, entering these data into the program that implements the selected safe object-control algorithm, and displaying the calculation results as an illustration of the designated route of the object and giving the value of its final deviation from the reference cruise route.
In addition to accessing the ARPA, the designed system includes an external microcontroller with specialized software that manages communication and performs anti-collision calculations. The integrated application that supports communication and determines the safe trajectory of the object was created in the MATLAB environment.

Dynamic Programming Algorithm with an Artificial Neural Network Procedure
The program implementing the Dynamic Programming method with Artificial Neural Network constraints (DPANN) was launched from the MATLAB command window (Figure 2). The first stage of the application's operation is the input of simulation parameters: maneuvering time, advance time, safe distance, the course deviation at which the speed is to be reduced, and the percentage by which the speed is to be reduced. These parameters can be entered manually, or their default values can be used. The default values set in the program were as follows: maneuver time = 3 min; advance time = 3 min; safe distance = 1 nautical mile; deviation from the set course at which the speed is to be reduced = 361 degrees; and percentage speed reduction = 30%. In the next step, communication with the ARPA was established, and the information needed for the anti-collision calculations was downloaded. Before the transmission started, the identifier of the communication port to be used was determined.
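The parameter set and its defaults can be captured in a small configuration object, as in the hypothetical Python sketch below. Only the numeric defaults come from the text; the class, field, and method names are illustrative, and reading the 361-degree default as effectively disabling speed reduction is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimulationParameters:
    """Hypothetical container for the DPANN simulation parameters (defaults from the text)."""
    maneuver_time_min: float = 3.0
    advance_time_min: float = 3.0
    safe_distance_nm: float = 1.0
    # Course deviation (degrees) at which speed reduction starts. No course deviation
    # can reach 361 degrees, so this default presumably keeps speed reduction off (assumption).
    speed_reduction_deviation_deg: float = 361.0
    speed_reduction_percent: float = 30.0

    def speed_after_reduction(self, speed_kn: float) -> float:
        """Speed after the configured percentage reduction."""
        return speed_kn * (1.0 - self.speed_reduction_percent / 100.0)


defaults = SimulationParameters()
# Manually entered values override individual defaults; the original stays unchanged.
restricted_visibility = replace(defaults, safe_distance_nm=2.0)
print(defaults.safe_distance_nm, restricted_visibility.safe_distance_nm)  # 1.0 2.0
```

A frozen dataclass keeps one run's parameters immutable, so a manually entered set is a new object rather than a silent change to the defaults.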
The downloaded frames were analyzed to extract the parameters needed for the anti-collision calculations, and the selected parameters were saved in the appropriate format. When the transmission of all necessary data was completed, the communication port was closed, and the program automatically proceeded to the anti-collision calculations. The last stage of the application's operation was the presentation of the calculation results, after which the application could be terminated or the whole process repeated.
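The article does not name the ARPA output format. Assuming the ARPA delivers standard NMEA 0183 tracked-target (TTM) sentences, which radar plotting aids commonly output but which the text does not confirm, the frame analysis could look like the sketch below: it verifies the XOR checksum and extracts the target's distance, bearing, speed, course, and closest-approach values. Field positions follow the usual TTM layout; the function names are hypothetical.

```python
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def parse_ttm(sentence: str) -> dict:
    """Extract anti-collision parameters from one (assumed) NMEA 0183 TTM sentence.

    Empty numeric fields, e.g. a missing CPA time, raise ValueError.
    """
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, checksum = sentence[1:].split("*", 1)
    if nmea_checksum(body) != checksum.upper():
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if not fields[0].endswith("TTM") or len(fields) < 11:
        raise ValueError("not a complete TTM sentence")
    return {
        "target_id": int(fields[1]),
        "distance": float(fields[2]),      # Dj
        "bearing_deg": float(fields[3]),   # Nj
        "bearing_ref": fields[4],          # 'T' true, 'R' relative
        "speed": float(fields[5]),         # Vj
        "course_deg": float(fields[6]),    # psi_j
        "course_ref": fields[7],
        "cpa_distance": float(fields[8]),  # Dj min
        "cpa_time_min": float(fields[9]),  # Tj min
        "units": fields[10],               # 'N' = knots and nautical miles
    }


body = "RATTM,03,5.20,045.0,T,12.0,225.0,T,0.30,18.5,N,TGT03,T,,,A"
frame = "$" + body + "*" + nmea_checksum(body)
print(parse_ttm(frame)["cpa_time_min"])  # 18.5
```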

Dynamic Programming of the Safe and Optimal Object Trajectory
In the basic problem of optimal control with discrete time k, the quality index

$$I = \sum_{k=0}^{K-1} f_0(x_k, u_k)$$

is minimized, where x and u are, respectively, the state and control of the object, which is described by the state equation

$$x_{k+1} = f(x_k, u_k), \qquad k = 0, 1, \ldots, K-1,$$

with given initial and final conditions

$$x_0 = x(t_0), \qquad x_K = x(t_K),$$

and subject to limitations on the control variables and constraints on the individual variables describing the state of the process:

$$u_k \in \mathbb{U}, \qquad x_k \in \mathbb{X}_k.$$

The essence of discrete dynamic programming is the Bellman recursive equation

$$L_k(x_k) = \min_{u_k \in \mathbb{U}} \left[ f_0(x_k, u_k) + L_{k+1}\big(f(x_k, u_k)\big) \right], \qquad L_K(x_K) = 0,$$

where L is the minimum value of the control quality index I; in the safe ship control considered in this article, it is the minimum time needed to safely reach the nearest turning point on a given cruise trajectory. This allows the optimal controls to be determined:

$$u_k^{*} = \arg\min_{u_k \in \mathbb{U}} \left[ f_0(x_k, u_k) + L_{k+1}\big(f(x_k, u_k)\big) \right].$$

Describing the hydrodynamic properties of the sea object by appropriate differential equations and transforming them yields the mathematical model of the object dynamics. In this model, u1 = αr/αmax and u2 = nr/nmax are the normalized controls, where αr is the reference rudder angle and nr is the reference screw speed; ψ is the object course; ψ̇ is the rate of turn; V is the linear speed; V̇ is the linear acceleration; (X0, Y0) is the position of the object; a1, k1, and k2 are gain coefficients; and T1, T2, and T3 are time constants of the object dynamics.
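The backward Bellman recursion can be illustrated on a toy staged problem. In the sketch below, which is a generic illustration and not the article's ship model, the state is a discrete lateral offset from the route, the control shifts it by −1, 0, or +1 per stage, a course change costs extra time, and one node is forbidden; all of these values are assumptions. The function tabulates Lk(x) from the last stage backward and stores the minimizing control for each state, i.e., the optimal controls uk*.

```python
INF = float("inf")


def backward_dp(n_stages, states, controls, step, cost, admissible, terminal):
    """Tabulate L_k(x) = min_u [cost(k, x, u) + L_{k+1}(step(x, u))] backward in k."""
    value = {x: (0.0 if x == terminal else INF) for x in states}  # L_K: fixed final state
    policy = []                                                  # policy[k][x] = u_k*(x)
    for k in reversed(range(n_stages)):
        new_value, best_u = {}, {}
        for x in states:
            best, arg = INF, None
            for u in controls:
                nxt = step(x, u)
                if nxt not in value or not admissible(k + 1, nxt):
                    continue                                     # state constraint violated
                total = cost(k, x, u) + value[nxt]
                if total < best:
                    best, arg = total, u
            new_value[x], best_u[x] = best, arg
        value = new_value
        policy.insert(0, best_u)
    return value, policy


# Toy problem (assumed): lateral offset from the route, one stage = one leg.
value, policy = backward_dp(
    n_stages=4,
    states=range(-3, 4),
    controls=(-1, 0, 1),
    step=lambda x, u: x + u,
    cost=lambda k, x, u: 1.0 + 0.5 * abs(u),          # a course change costs extra time
    admissible=lambda k, x: not (k == 2 and x == 0),  # route node forbidden at stage 2
    terminal=0,
)
x, path = 0, [0]
for k in range(4):
    x += policy[k][x]
    path.append(x)
print(value[0], path)  # 5.0 [0, -1, -1, -1, 0]
```

Following the stored controls forward from the initial state reproduces one optimal trajectory; ties between equally fast detours are broken by the order of `controls`.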
The condition of safe navigation of an object when meeting j other objects is expressed by the inequality constraint (8), in which gj is the function describing the shape of the domain of the encountered object (for example, a circle, hexagon, parabola, or ellipse), (Xj, Yj) is the position of the encountered object, and k is the time-discretization step. This constraint ensures compliance with the condition of safe object traffic control, Dj min ≥ Ds, where Ds is the safe approach distance of objects. The object domains described by Inequality (8) take different shapes, which are determined with software such as the Neural Network Toolbox of MATLAB [20][21][22][23]. To avoid a collision, there are many possible safe values of course and speed, and the optimal solution is chosen from this set [24][25][26]. The minimal-collision-risk criterion is achieved by satisfying Inequality (8), which is equivalent to the object not entering the domain of any other encountered object. Then, from the many possible safe trajectories that do not violate the domains of the encountered objects, the one that ensures the smallest deviation from the route while safely passing the objects is chosen.
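Choosing the smallest safe course alteration at constant speed can be sketched by sampling the predicted positions of all objects at discrete time steps and checking that the distance to every object stays at least Ds. In this hypothetical Python example, only circular domains of radius Ds are used; the horizon (1 h), time step (1 min), whole-degree search, and preference for starboard alterations on ties are assumptions, and positions are in nautical miles on a local east/north grid.

```python
import math


def position(x0, y0, speed_kn, course_deg, t_h):
    """Dead-reckoned position (east, north in nm) after t_h hours on a straight course."""
    a = math.radians(course_deg)
    return x0 + speed_kn * t_h * math.sin(a), y0 + speed_kn * t_h * math.cos(a)


def is_safe(course_deg, own, targets, ds_nm, horizon_h=1.0, dt_h=1 / 60):
    """True if every sampled distance to every target stays at least ds_nm."""
    for i in range(int(round(horizon_h / dt_h)) + 1):
        t = i * dt_h
        ox, oy = position(own["x"], own["y"], own["v"], course_deg, t)
        for tg in targets:
            tx, ty = position(tg["x"], tg["y"], tg["v"], tg["course"], t)
            if math.hypot(tx - ox, ty - oy) < ds_nm:
                return False
    return True


def smallest_safe_alteration(own, targets, ds_nm, max_alt_deg=90):
    """Smallest course alteration in whole degrees (+ starboard, - port), or None."""
    for alt in range(max_alt_deg + 1):
        for signed in ((0,) if alt == 0 else (alt, -alt)):  # starboard tried first
            if is_safe((own["course"] + signed) % 360, own, targets, ds_nm):
                return signed
    return None


own = {"x": 0.0, "y": 0.0, "v": 10.0, "course": 0.0}
targets = [{"x": 0.0, "y": 6.0, "v": 10.0, "course": 180.0}]  # head-on encounter
print(smallest_safe_alteration(own, targets, ds_nm=1.0))  # 20
```

For this head-on case the continuous geometry gives a passing distance of 6 sin(θ/2) nm for an alteration θ, with the closest approach always 18 min ahead, so 19 degrees gives about 0.99 nm and 20 degrees about 1.04 nm. In general, sampling can miss the exact closest approach between steps, so the time step should be small relative to the encounter.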
In practice, considering the dynamics of the object, most collision situations are avoided by changing the course of the object while maintaining a constant speed. The smallest deviation from the route during the safe passing of encountered objects is then equivalent to time-optimal control of the movement of own object, that is, to minimizing the total movement time, control purpose Function (9). Applying the principle of optimality allows calculation of the optimal strategy of the object that ensures the minimal value of Function (9). According to this principle, whatever the initial state and initial decision, the remaining decisions must constitute an optimal strategy with respect to the state resulting from the first decision; accordingly, the calculations of the optimal solution run from the last stage to the first [27]. However, because the process of safe object control is subject to the theoretical conditions of duality, the calculation of a safe object trajectory can also be carried out from the first stage to the final one [28,29].
The optimal movement time is computed stage by stage: first for the first stage, then for the first two stages, and finally, after mathematical derivation, for k stages, where the optimal time up to stage k is obtained by minimizing, over the admissible controls, the sum of the stage time Δtk and the optimal time accumulated up to stage k−1. The object-movement time in stage k is

$$\Delta t_k = \frac{\Delta L_k}{V_k}, \qquad \Delta L_k = \sqrt{\left(x_{1,k+1} - x_{1,k}\right)^2 + \left(x_{2,k+1} - x_{2,k}\right)^2},$$

where ΔLk is the distance traveled by the object from (x1,k, x2,k) to (x1,k+1, x2,k+1) and Vk is its speed in that stage. The optimal movement time over k stages depends on the location of the object at stage (k−1) and on the control at stage (k−2). Moving from the first stage to the last, Dependence (13) defines the functional equation for steering the object by rudder angle u1 and screw speed u2, while the constraints ensure that a safe approach distance is maintained. To generate recommendations for a maneuver, the safety condition, Inequality (8), is checked with priority: the nodes of the trajectory grid of own object must not lie within the state constraints represented by the domains of the encountered moving objects. A node lying inside a domain is excluded from further calculations of the optimal object trajectory. The entire DPANN calculation program for determining an object trajectory consists of two procedures: dynamic-programming optimization of the object route and an ANN that creates the domains of the encountered objects [30].
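The forward stage-by-stage recursion, with stage time ΔLk/V and exclusion of nodes lying inside a domain, can be sketched on a small grid as follows. The grid geometry, constant speed, and circular forbidden domain in the example are assumptions; `forbidden(x, y, t)` is a hypothetical callback standing in for the ANN domain procedure. The callback receives the arrival time, so moving domains can be modeled, but the sketch keeps only the earliest admissible arrival per node, which is exact only for time-invariant domains.

```python
import math


def forward_dp(stage_x, lateral_y, start_y, end_y, speed_kn, forbidden):
    """Forward DP on a node grid: t_k(y) = min over y' of [t_{k-1}(y') + dL / V]."""
    inf = float("inf")
    best = {start_y: 0.0}        # optimal time to each admissible node of the current stage
    parents = [{}]
    for k in range(1, len(stage_x)):
        x0, x1 = stage_x[k - 1], stage_x[k]
        cur, par = {}, {}
        for y1 in lateral_y:
            for y0, t0 in best.items():
                t1 = t0 + math.hypot(x1 - x0, y1 - y0) / speed_kn   # stage time dL / V
                # A node inside a domain at the arrival time is excluded.
                if t1 < cur.get(y1, inf) and not forbidden(x1, y1, t1):
                    cur[y1], par[y1] = t1, y0
        best = cur
        parents.append(par)
    if end_y not in best:
        return None, None        # no safe path through the grid
    path = [end_y]
    for k in range(len(stage_x) - 1, 0, -1):
        path.append(parents[k][path[-1]])
    return best[end_y], path[::-1]


def in_domain(x, y, t):
    """Assumed static circular domain of radius 0.5 nm centred on the route at x = 2 nm."""
    return math.hypot(x - 2.0, y) < 0.5


stages = [0.0, 1.0, 2.0, 3.0, 4.0]   # nm along the route
offsets = [-1.0, 0.0, 1.0]           # nm across the route
t_opt, path = forward_dp(stages, offsets, 0.0, 0.0, 10.0, in_domain)
print(round(t_opt * 60, 2))  # 28.97 (minutes)
```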
The optimal trajectory follows from Bellman's principle of optimality. First, all possible safe transition paths from the start point to the end point are determined; then, the trajectory that requires the shortest time for the object to cover it is selected (Figure 3).

Artificial-Neural-Network Domains of Encountered Objects
The collision-risk area within the domain of an encountered object was created by the ANN computer algorithm. For good visibility conditions at sea, in accordance with the COLREGs, a safe passing distance of Ds = 0.5-1.0 nm (nautical miles) was assumed, and the obligation to give way to objects approaching from starboard was applied. Accordingly, objects approaching from starboard were assigned domains in the form of hexagons, parabolas, or ellipses, and the remaining objects were assigned circular domains. In conditions of restricted visibility at sea, domains with radius Ds = 1.0-3.0 nm apply to all objects. The surface area of the domain generated by the neural network is a function of the speed of the encountered object and the risk of collision [31,32].
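Membership tests for the domain shapes mentioned above can be sketched in a frame attached to the encountered object. In the Python sketch below, the frame orientation along the object's course, the shape parameters, and the hexagon orientation are assumptions; the article's domains are generated by the ANN and their size depends on speed and risk, which the sketch does not model. Parabolic domains are omitted for brevity.

```python
import math


def to_target_frame(px, py, tx, ty, course_deg):
    """(ahead, starboard) coordinates of point (px, py) relative to a target at (tx, ty)."""
    a = math.radians(course_deg)
    dx, dy = px - tx, py - ty                      # east, north offsets in nm
    ahead = dx * math.sin(a) + dy * math.cos(a)
    starboard = dx * math.cos(a) - dy * math.sin(a)
    return ahead, starboard


def in_circle(ahead, starboard, radius):
    return math.hypot(ahead, starboard) < radius


def in_ellipse(ahead, starboard, semi_ahead, semi_side):
    return (ahead / semi_ahead) ** 2 + (starboard / semi_side) ** 2 < 1.0


def in_hexagon(ahead, starboard, radius):
    """Regular hexagon with circumradius `radius` and two vertices on the ahead axis."""
    u, v = abs(ahead), abs(starboard)              # symmetry: test the first quadrant only
    root3 = math.sqrt(3.0)
    return v < radius * root3 / 2 and root3 * u + v < root3 * radius


# Own position 1.5 nm ahead of and 0.2 nm to port of a target steering east (090).
a, s = to_target_frame(1.5, 0.2, 0.0, 0.0, 90.0)
print(in_circle(a, s, 1.0), in_ellipse(a, s, 2.0, 0.5), in_hexagon(a, s, 1.0))
# False True False
```

An elongated ellipse or hexagon lets the domain extend farther ahead of the target than abeam, which is one way to encode a stricter requirement for passing ahead of a give-way situation.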
The artificial neural network has an input vector x with six components and an output vector y with one component, whose value encodes the assessment of the situation: 0.1 means a safe situation; 0.3, attention; 0.5, collision risk; 0.7, a dangerous situation; and 0.9, collision. The network learning process minimizes the error between the network output and the reference output given by the navigator-teacher, where F is the form of the activation function, W = (W1, W2, W3) are the weight factors, yk is the output variable of the artificial neural network, yek is its reference output variable, Σ is the average summed learning error of the neural network with respect to the reference navigator-teacher's assessment of the navigational situation, and k is discrete time.
The components of vector xk are information coming from the ARPA, and the component of yk evaluates the collision risk in proportion to the size of the domain area assigned to the j-th encountered object. The network contains three neuron layers: nonlinear activation functions are used in the first and second layers, and a sigmoidal activation function in the third, output layer [33][34][35][36].
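A forward pass through a network with the structure described above (six inputs, two nonlinear hidden layers, and a sigmoid output) can be sketched in plain Python. The hidden-layer sizes, the choice of tanh for the hidden layers, the random initialization, and the mapping of the output to the nearest of the five assessment levels are assumptions for illustration; the article does not give these details.

```python
import math
import random

RISK_LEVELS = {0.1: "safe", 0.3: "attention", 0.5: "collision risk",
               0.7: "dangerous situation", 0.9: "collision"}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Weight matrix (n_out rows of n_in weights) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def dense(layer, x, activation):
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(net, x):
    """y = F(W, x): tanh in layers 1 and 2, sigmoid in output layer 3."""
    h1 = dense(net[0], x, math.tanh)
    h2 = dense(net[1], h1, math.tanh)
    return dense(net[2], h2, sigmoid)[0]


def risk_assessment(y):
    """Nearest of the five assessment levels used to train the network."""
    level = min(RISK_LEVELS, key=lambda lv: abs(lv - y))
    return level, RISK_LEVELS[level]


rng = random.Random(0)
net = [init_layer(6, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]
x = [0.0] * 6   # six normalized ARPA inputs (placeholder values)
y = forward(net, x)
print(y, risk_assessment(y))  # 0.5 (0.5, 'collision risk')
```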
The MATLAB Neural Network Toolbox was used to design the ANN, and an error-backpropagation algorithm with an adaptive learning rate and momentum was used to train it. Training data were prepared by simulating navigational situations and recording the corresponding expected network answers given by about 300 experienced navigators during ARPA training courses at the Officers Training Center of the Gdynia Maritime University in Poland. To ensure data accuracy, the network learning process was based on several standard scenarios of navigational situations at sea. For each situation, each navigator chose the best option in their own opinion; that is, subjectively and in accordance with good maritime practice, they chose an anti-collision maneuver changing the course and/or speed of the ship. In this way, the trained network represents the average experience of a large population of navigators.
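Training by error backpropagation with momentum and an adaptive learning rate can be sketched as below. This is a plain-Python illustration of the general technique, not the MATLAB toolbox routine: the learning-rate increase and decrease factors, the 4% tolerance for error growth, resetting the momentum after a rejected step, the network sizes, and the synthetic data standing in for the navigators' assessments are all assumptions.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(net, x):
    """Activations of all layers: tanh in the hidden layers, sigmoid in the output layer."""
    acts = [x]
    for i, (W, b) in enumerate(net):
        f = sigmoid if i == len(net) - 1 else math.tanh
        acts.append([f(sum(w * a for w, a in zip(row, acts[-1])) + bj) for row, bj in zip(W, b)])
    return acts


def mse(net, data):
    return sum((forward(net, x)[-1][0] - y) ** 2 for x, y in data) / len(data)


def gradients(net, data):
    """Batch gradient of the mean squared error by backpropagation."""
    gW = [[[0.0] * len(row) for row in W] for W, _ in net]
    gb = [[0.0] * len(b) for _, b in net]
    for x, y in data:
        acts = forward(net, x)
        out = acts[-1][0]
        delta = [2.0 * (out - y) * out * (1.0 - out) / len(data)]  # through the sigmoid
        for i in reversed(range(len(net))):
            W, _ = net[i]
            for j, dj in enumerate(delta):
                gb[i][j] += dj
                for m, a in enumerate(acts[i]):
                    gW[i][j][m] += dj * a
            if i > 0:  # propagate through tanh, whose derivative is 1 - a^2
                delta = [(1.0 - a * a) * sum(W[j][m] * delta[j] for j in range(len(delta)))
                         for m, a in enumerate(acts[i])]
    return gW, gb


def train(net, data, epochs=300, lr=0.1, momentum=0.9,
          lr_inc=1.05, lr_dec=0.7, max_err_inc=1.04):
    """Gradient descent with momentum and an adaptive learning rate; returns error history."""
    def zeros():
        return [[[0.0] * len(r) for r in W] for W, _ in net], [[0.0] * len(b) for _, b in net]

    vW, vb = zeros()
    err = mse(net, data)
    history = [err]
    for _ in range(epochs):
        gW, gb = gradients(net, data)
        nW = [[[momentum * v - lr * g for v, g in zip(vr, gr)] for vr, gr in zip(vl, gl)]
              for vl, gl in zip(vW, gW)]
        nb = [[momentum * v - lr * g for v, g in zip(vl, gl)] for vl, gl in zip(vb, gb)]
        trial = [([[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, dW)],
                  [bj + d for bj, d in zip(b, db)])
                 for (W, b), dW, db in zip(net, nW, nb)]
        trial_err = mse(trial, data)
        if not math.isfinite(trial_err) or trial_err > err * max_err_inc:
            lr *= lr_dec             # reject the step, slow down, and drop the momentum
            vW, vb = zeros()
            continue
        if trial_err < err:
            lr *= lr_inc             # the error decreased: speed up
        net, vW, vb, err = trial, nW, nb, trial_err
        history.append(err)
    return net, history


rng = random.Random(1)
sizes = [6, 6, 6, 1]
net = [([[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(m)], [0.0] * m)
       for n, m in zip(sizes[:-1], sizes[1:])]
# Synthetic stand-in for the navigators' answers: 0.9 if the first input is positive.
data = []
for _ in range(30):
    x = [rng.uniform(-1.0, 1.0) for _ in range(6)]
    data.append((x, 0.9 if x[0] > 0 else 0.1))
net, history = train(net, data)
print(f"MSE {history[0]:.3f} -> {history[-1]:.3f}")
```

Rejecting steps that raise the error by more than the tolerance keeps the growing learning rate from diverging, while accepted improvements let it grow and speed up convergence.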

Results
Simulation studies of the DPANN algorithm were carried out for three navigational situations that differed in the number of encountered objects. The situations were recorded using information from the ARPA on a research and training ship. The recorded navigational situations were safe for a safe passing distance of 0.5 nautical miles; to obtain collision situations for the computer simulation, the safe passing distance was increased to 1.0 nautical mile. In the simulation studies, the trajectory of the object was determined for good visibility at sea, so the safe distance of ships under these conditions was assumed to be Ds = 1.0 nm. The bold part of the trajectory corresponds to a maneuver that reduced the propeller speed by 25%.

Figure 6 shows the own-object trajectory and the control by means of rudder deflection and rotational speed of the propeller.

Simulation of Own-Object Steering While Passing Sixty Encountered Objects

Discussion
Traditionally, using the ARPA, the navigator can select a single anti-collision maneuver with respect to the movement of one dangerous ship, but must then check it against the other objects and possibly change it. The DPANN algorithm, in contrast, calculates the trajectory of own object that provides the smallest deviation from the set cruise route, as a sequence of successive necessary maneuvers with respect to all tracked objects.
Considering the different dynamics of the object when changing its course and speed, time-optimal control of its movement is first achieved by rudder deflection. Only when no safe course change is possible is the speed of the object reduced, by reducing the rotational speed of the propeller by 25%. Consequently, as the number of encountered objects increases, it becomes more difficult to find an optimal and safe trajectory by steering the rudder alone. The calculation time of the algorithm ranges from a few to a dozen or so seconds, depending on the number of encountered objects. However, the more objects are encountered, the more unacceptable nodes of the dynamic-programming grid are rejected by the ANN procedure, and the shorter the calculation time. This is one of the practical advantages of Bellman's principle of optimality: the more constraints there are, the smaller the search area of acceptable solutions, and the faster the optimal solution is found. Figure 7 shows the relative time needed for safe and optimal passing of encountered objects as a function of the reference safe passing distance, for the various forms of domains assigned to the encountered objects.
Figure 7. Dependence of the optimal time of own-object movement needed for safe passing of other objects on the previously adopted safe passing distance, for domain shapes: 1, circle; 2, parabola; 3, hexagon; 4, ellipse.
The ellipse-shaped domain provides the shortest time needed to safely pass the objects. Figure 8 illustrates how the relative time of safe movement of the object varies with the distance between nodes in the dynamic-programming grid. In practice, the node density of the dynamic-programming grid of the object trajectory is a compromise between calculation time and the accuracy of the determined trajectory.

Conclusions
The task of optimizing the safe control of own-object movement when passing many encountered objects was presented, which allows the following conclusions to be drawn. Using radar remote sensing to identify the movement parameters of objects enables the synthesis of an algorithm that supports the navigator in determining the safe trajectory of the object as a sequence of successive changes in its course and speed. Representing the movement of encountered objects as moving neural domains whose size depends on the distance and time of approach reflects the navigator's subjectivity in assessing collision risk. Because the artificial neural network was trained on the assessments of several hundred navigation officers, it interprets the danger domains of encountered objects within the computational algorithm as a single experienced navigator would. Analysis of the possible domain shapes indicates that they can be adapted to open or restricted waters. The node density in the dynamic-programming grid of the object trajectory is a compromise between calculation time and route accuracy.
The essence of the article is the use of an artificial neural network to map the navigator's subjectivity in assessing a collision situation; the optimization method, dynamic programming, is only a tool in the synthesis of the entire calculation algorithm.
This work did not exhaust all issues related to the safe control of the movement of objects at sea. Subsequent studies should analyze the sensitivity of safe object control to inaccurate information from navigation devices, to changes in object-dynamics parameters, and to hydrometeorological disturbances.