A Vision-Based Underwater Formation Control System Design and Implementation on Small Underwater Spherical Robots

Abstract: The ocean is a significant strategic resource, and the insufficient development and use of the ocean, as well as the increase in attention to the ocean, have led to the development of underwater robot technology. The need for in-depth marine exploration and the limitations of a single underwater robot have sparked research on underwater multi-robot systems. In the underwater environment, weak communication is caused by the shielding effect of the seawater medium, which makes it difficult for multi-robot systems to form formations. Hence, we combine the robot's vision system with the leader-follower structure to form a vision-based underwater formation method, in which the visual solution serves as the control system's feedback. Underwater formation experiments on three small underwater robot platforms show that the proposed method is effective and practicable. Furthermore, the coordination period and error of the control system are analyzed.


Introduction
The ocean covers a significant part of the earth, is rich in resources, and is a significant strategic asset of the country [1,2]. However, humans cannot carry out many tasks directly because of the complex and constantly changing underwater environment. Consequently, underwater robot technology has been developed and widely used in marine hydrology exploration, seabed resource exploration, regional search, and target detection in recent decades [3]. As understanding of the ocean deepens, new ocean exploration programs place higher requirements on underwater robots, and many of these requirements cannot be met by a single underwater robot. More specifically, the continuous working ability and equipment-carrying capacity of underwater robots are seriously limited by the current technical level. The capacity of a single robot to detect the environment is low, making it difficult to meet current requirements of high precision, high strength, and high complexity. Thus, the concept of the autonomous underwater multi-robot system is on the rise [4,5].
An underwater multi-robot system refers to a system composed of multiple underwater robots that are isomorphic or heterogeneous, in which the individuals complete specific tasks through some form of cooperation [6]. Compared with a single complex robot, underwater multi-robot systems have advantages including lower cost and design difficulty, higher efficiency, higher fault tolerance, and stronger robustness. However, information exchange is the premise for cooperative work throughout the system, and the main means of underwater communication is the underwater acoustic wave. Underwater acoustic communication suffers from high noise and long delays, which challenges the coordination of individuals in the underwater multi-robot system.
Thus, in such a weak communication environment, using the vision system to obtain information has become a widely shared approach among researchers worldwide [7]. With the vision equipment carried on the underwater robot platform, a robot can capture images of the surrounding individuals and analyze their position and attitude with its own processor, thereby achieving a pseudo-information exchange; some studies focus on the minimum data needed for information extraction [8]. At present, formation control methods for multi-robot systems mainly include those based on the leader-follower structure [9,10], virtual structures [11,12], artificial potential fields [13], behavior [14,15], and path tracking. Because points of zero potential exist in the artificial potential field method, robots can become trapped in local minima. Meanwhile, the path tracking method has certain defects in obstacle avoidance that make it ill-suited to emergency situations. Furthermore, in swarm robotics, as with an animal swarm in the wild, one of the goals is to achieve and maintain the desired pattern, and one way for the team to reach this aim is for each member to observe what its neighbors are doing [16]. The behavior-based method is therefore very specific to local behavior, making it difficult to ensure the stability of the entire formation control system [17]. The leader-follower method, by contrast, has achieved a significant theoretical formalization due to its simplicity and reliability [18]. Therefore, we adopt the leader-follower structure for the underwater formation control system.
In this paper, we propose a vision-based underwater formation control system using the leader-follower structure, and physical experiments show that the proposed method can achieve three-robot formation tasks, including the "V-shape escort" formation experiment and the "round-up hunting" formation experiment. The formation control diagram of the proposed method is introduced together with the control law. The design of the formation experiments is abstracted from actual underwater missions. Theoretically, this method can be extended to more robots and has practical value in real applications. Furthermore, the coordination period and the errors of the formation system are analyzed, and the long-term behavior of the proposed method is discussed.
The remainder of this paper is organized as follows. Firstly, related works are discussed in Section 2, and an underwater spherical robot platform is set up in Section 3. Secondly, an underwater visual positioning system is introduced in Section 4 to discuss the positioning principle and the binocular field of view, which is related to the judgment of motion. Thirdly, the multi-robot formation control system based on the visual system is designed in Section 5, and experiments are implemented to verify the effectiveness of our proposed method in Section 6. Then, the results are discussed in Section 7. Finally, we conclude our work in Section 8.

Related Works
At present, the control methods of multi-robot formation systems mainly include the methods based on leader-follower structures, virtual structures, artificial potential fields, behavior, and path following. Within these main formation frameworks, specific control strategies are implemented differently.
Chen introduced a multiple autonomous underwater vehicle (multi-AUV) control strategy, which uses a fusion control strategy with a redistribution mechanism (RM) based on virtual structure and leader-follower formation characteristics [19], where the RM is used to allocate the following tasks of nodes and plan the path according to the obstacle situation. Khoshnam studied three-dimensional platoon control, where the prescribed performance function (FPP) method was used to limit the relative distance and angles between successive pairs during their motion. A robust neural network (NN), a hyperbolic tangent function, and dynamic surface control were used together to design the controller. The system input is the relative distance and angles [20]. Gao designed formation controllers based on a finite-time observer using the time-varying ln-type barrier Lyapunov function (BLF) method. The control objective was to design a nonlinear control law that achieves the desired relative distance and angle [21].
Liang presented a finite-time velocity-observed-based adaptive output-feedback trajectory tracking formation control for underactuated unmanned underwater vehicles (UUVs) with prescribed transient performance [22]. Xiang addressed a dedicated nonlinear path-following controller built on a Lyapunov-based design and the leader-follower strategy, aimed at the control problem of inspecting underwater pipelines [23]. The output during the geometrical task was the relative coordinate with respect to the trajectory, and the output during the speed control was the relative speed. Cao introduced a leader-follower formation algorithm to improve the efficiency of target hunting, where the task was assigned based on the distance between the autonomous underwater vehicle and the target, and the individuals with the same task were formed based on the leader-follower mode [24]. The system input was the coordinate and heading angle, while the output was the three-dimensional velocity and the rotation speed around the Z axis.
He designed a decentralized adaptive formation controller where the dynamic surface control (DSC) technique is introduced to avoid the use of vehicle acceleration and NN approximators are used to estimate uncertain nonlinear dynamics [25], where the output is the position of the robot. An adaptive image-based visual servoing control strategy was proposed following Lin's prescribed performance control methodology [26], where the adaptive control law estimated the inverse height between the optical center of the camera and the single feature point attached to the leader online. Han proposed an integrated relative localization and leader-follower formation control developed by combining the proposed relative localization scheme and a complex Laplacian-based formation control scheme [27]. The output of the relative positioning system was set as the input of the formation control system, which had an output of velocity and relative distance. Gao investigated a fixed-time leader-following formation control method for a set of AUVs with event-triggered acoustic communications, where an event-triggering communication strategy was developed to govern the communications between leaders and followers [28]. Ai considered the leader-follower formation control problem for multiple quadrotors in the presence of external disturbances. An observer-based finite-time controller which aimed to reconstruct the leader's states for each follower was proposed based on an adaptive disturbance rejection approach [29].
Zhang developed a soft robotic fish swarm system with global vision positioning, where individuals can further coordinate and form a swarming system [30]. Zheng proposed an embedded architecture formation strategy for a group of turtle-inspired amphibious robots to maintain a long-distance-parameterized path based on dynamic visual servoing [31]. He proposed a path planning strategy to handle the application requirements of static and dynamic targets being rounded up with multiple robots and designed a controller based on the linear quadratic regulator method to realize static/dynamic target rounding up with multiple robots [32]. Richard presented a global alignment method to correct the dead reckoning trajectories of multiple vehicles to resemble the paths followed during the mission using the acoustic messages passed between vehicles [33]. Millan presented a control strategy for underwater formation consisting of a feedback H-2/H-infinity controller in combination with a feedforward controller based on the virtual leader approach [34]. Das proposed a new adaptive sliding mode control scheme for achieving coordinated motion control of a group of autonomous underwater vehicles with variable added mass [35]. Qi provided a distributed formation tracking controller for three-dimensional moving underactuated underwater vehicles (UUVs). The formation controller can be divided into two parts: in the first part, the condition that the formation controller must satisfy was given, and in the second part, a decentralized formation controller was proposed, and a stability analysis based on the small gain theorem was introduced [36].
It can be summarized that for the stability of the formation, feedback needs to be established to maintain the form and reduce errors. Different controller design methods have been adopted according to additional requirements in the formation control system because of the closed-loop feedback. For the design of the leader-follower formation control system, its control input often uses relative position, relative angle, planned trajectory, speed, angular velocity, etc. Compared with other structures, the leader-follower controller is more sensitive to relative position. The input of the controller of the leader-follower formation will change accordingly if the means of obtaining the relative position are different. In this paper, we will use the visual system combined with the leader-follower control strategy to form a formation control system.

Electronic System
In robot design, bionics has always been an important source of inspiration [37]. Taking a mouse living in a pipe as the prototype, we designed a bionic mouse [38,39], and we used the characteristics of the human hand and upper limbs to design complex and delicate joints [40,41]. Additionally, our underwater robots are designed with reference to amphibious turtles and jellyfish, and the details of the robot platform are introduced in our previous work [42-55]. For completeness, the development of our spherical robot platform is briefly introduced here. The laboratory has a total of three finished robots, and the current spherical robot has gone through three generations of development and improvement.
As for the structure design, the spherical robot has four water jets, which are distributed in "H" mode or "X" mode, where different distributions lead to different motion characteristics. The "X" mode can achieve motion in three directions, X, Y, and Z, and the "H" mode can achieve course angle rotation and motion along Y and Z. In this paper, we adopt the "H" mode. The new-generation robot has an improved propeller, which is shown in Figure 1b. The diameter of the new-generation robot is 350 mm, its weight is 7.74 kg, and a 3.66 kg counterweight is needed for the robot to be completely submerged and reach a suspended state. The overall structure design and the improved motor jet of the underwater spherical robot platform are shown in Figure 1, where Figure 1a shows the mechanical structure of the old-generation robot platform. The electrical structure of the robot is the basis for the robot's motion control and vision system functions, and it mainly includes the main controller, the co-processing module, the sensor module, the communication module, the drive execution system, and the power system.
In this paper, the NVIDIA Jetson TK1 development board is used as the main information processing unit of the robot; it has a quad-core ARM Cortex-A15 processor and 2 GB of memory. The co-processing module is mainly composed of an STM32F103 controller and a steering gear control board. The sensor module is mainly composed of the vision system, a MEMS sensor, and a depth sensor, where the MEMS sensor is the JY901 module, and the accuracy of its output angle is 0.01°. The drive execution system is mainly composed of water jet motors, steering gears, and electronic speed controllers. The PWM control method is used to control the angle of the steering gear. The maximum torque of the steering gear at 7.4 V is 12.9 kg·cm. The electronic speed controller is used to control the brushless water jet motor. The power system is mainly composed of the lithium battery pack and step-up and step-down modules. The power supply system of the robot is composed of five lithium batteries, each of which has a rated voltage of 7.4 V and a capacity of 6600 mAh.
The robot system isolates the strong-current and weak-current circuits, which are referred to as power electricity and control electricity, respectively. The control electricity system comprises two lithium batteries, while the power electricity system comprises three lithium batteries. The control electricity switches the power electricity supply through an optocoupler switch. In both systems, step-up and step-down modules bring the supply of each part to its rated voltage. The electrical structure design of the robot is shown in Figure 2.

Motion Control System Design
The underwater motion control of the small amphibious spherical robot adjusts the force on each leg of the robot by controlling the rotation speed of the water jet motors, with the heading angle of the robot taken as the feedback of the control system. This paper adopts the incremental PID control algorithm for its good stability. Specifically, the course control of the underwater robot is realized through the vector synthesis of the forces on each leg. The increment of the control quantity depends only on the three most recent feedback values, which reduces the cumulative error. Computing the increment requires historical data; therefore, the controllers of the steering gear and water jet motor have to store the previous control quantity. Because only an increment is output at each step, a fault has little impact on the system. The incremental PID control law is

$$\Delta u(k) = K_p\,[e(k) - e(k-1)] + K_i\,e(k) + K_d\,[e(k) - 2e(k-1) + e(k-2)] \tag{1}$$

where $K_p$ represents the proportional coefficient, $K_i$ represents the integral coefficient, $K_d$ represents the differential coefficient, $e(k)$ is the deviation sampled at time $kT$, and $T$ is the sampling period. The control quantity executed during the movement of the steering gear and water jet motor is

$$u(k) = u(k-1) + \Delta u(k) \tag{2}$$

where $\Delta u(k)$ is the increment of the control quantity, $u(k)$ is the control quantity at this time, and $u(k-1)$ is the control quantity at the previous time.
The robot measures its heading angle in real time through sensors, uses the deviation of the heading angle as the feedback of the PID controller, and changes its heading angle by controlling the rotation speed of the water jet motors. The block diagram of the heading angle control is shown in Figure 3, where the expected value is the course angle θ and the control quantity u(k) is ω.
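The incremental PID update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' firmware: the class name `IncrementalPID`, the gain values, the time step, and the toy integrator plant (heading rate equal to the commanded ω) are all hypothetical and chosen only to show how Δu(k) is formed from the last three errors and accumulated into u(k).

```python
from dataclasses import dataclass


@dataclass
class IncrementalPID:
    """Incremental PID: the output increment uses e(k), e(k-1), e(k-2)."""
    kp: float
    ki: float
    kd: float
    e_prev1: float = 0.0  # e(k-1)
    e_prev2: float = 0.0  # e(k-2)
    u_prev: float = 0.0   # u(k-1), the stored control quantity

    def step(self, setpoint: float, measurement: float) -> float:
        e = setpoint - measurement
        # Increment of the control quantity from the three latest errors.
        du = (self.kp * (e - self.e_prev1)
              + self.ki * e
              + self.kd * (e - 2.0 * self.e_prev1 + self.e_prev2))
        # Accumulate the increment onto the previous control quantity.
        u = self.u_prev + du
        self.e_prev2, self.e_prev1, self.u_prev = self.e_prev1, e, u
        return u


def simulate_heading(target_deg: float, steps: int = 300, dt: float = 0.1) -> float:
    """Toy plant (assumption): heading changes at the commanded rate omega."""
    pid = IncrementalPID(kp=1.0, ki=0.2, kd=0.0)  # illustrative gains only
    heading = 0.0
    for _ in range(steps):
        omega = pid.step(target_deg, heading)
        heading += omega * dt
    return heading


if __name__ == "__main__":
    print(simulate_heading(71.0))
```

Because only the previous control quantity and two past errors are stored, the controller state stays small, which suits a co-processor such as the one used on the robot.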

Relative Positioning Principle
To obtain the position and attitude information of the leader, a relative underwater positioning system is discussed, where a binocular camera is used to obtain the three-dimensional information.
The principle of binocular vision is parallax and triangulation: the target is observed by two cameras, and the two views are solved jointly to calculate the three-dimensional information. The matching process matches image feature points, which reduces the amount of calculation by reducing the number of pixels that must be put into correspondence. The three-dimensional position of the robot can be obtained simply by matching the pixels at its center.
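As shown in Figure 4, the left camera and the right camera form a binocular camera. The coordinate of point P under the earth coordinate system $O_w - X_w Y_w Z_w$ is $(X_w, Y_w, Z_w)$, and its coordinates under the left camera coordinate system $O_{c1} - X_{c1} Y_{c1} Z_{c1}$ and the right camera coordinate system $O_{c2} - X_{c2} Y_{c2} Z_{c2}$ are $(X_{c1}, Y_{c1}, Z_{c1})$ and $(X_{c2}, Y_{c2}, Z_{c2})$, respectively. The corresponding mapping points under the image coordinate systems of the left camera $O_1 - u_1 v_1$ and the right camera $O_2 - u_2 v_2$ are $p_1(u_1, v_1)$ and $p_2(u_2, v_2)$, respectively. According to the imaging principle of the camera, the conversion of the point P from the world coordinate system to the camera coordinate systems of the left camera and the right camera can be expressed as Equations (3) and (4),

$$\begin{bmatrix} X_{c1} \\ Y_{c1} \\ Z_{c1} \end{bmatrix} = R_1 \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + t_1 \tag{3}$$

$$\begin{bmatrix} X_{c2} \\ Y_{c2} \\ Z_{c2} \end{bmatrix} = R_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + t_2 \tag{4}$$

where $R_1$ and $t_1$ represent the rotation matrix and translation vector of the left camera, and $R_2$ and $t_2$ represent the rotation matrix and translation vector of the right camera. Comparing Equations (3) and (4) gives the relationship between the left camera and the right camera, which has a similar form, Equation (5),

$$\begin{bmatrix} X_{c2} \\ Y_{c2} \\ Z_{c2} \end{bmatrix} = R_{12} \begin{bmatrix} X_{c1} \\ Y_{c1} \\ Z_{c1} \end{bmatrix} + t_{12}, \qquad R_{12} = R_2 R_1^{-1}, \quad t_{12} = t_2 - R_2 R_1^{-1} t_1 \tag{5}$$

where $R_{12}$ and $t_{12}$ are the rotation matrix and translation vector between the left camera and the right camera, respectively. Based on the imaging model, the center of the camera's optical system is the optical center $O_C$. The optical centers of the left camera and the right camera are $O_{C1}$ and $O_{C2}$, respectively, and the components of P along the optical axes are $Z_{c1}$ and $Z_{c2}$. The relationship between the corresponding mapping points $p_1(u_1, v_1)$, $p_2(u_2, v_2)$ and the coordinate point $P(X_w, Y_w, Z_w)$ is given by Equations (6) and (7),

$$Z_{c1} \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = M_1 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \qquad M_1 = \begin{bmatrix} \alpha_{x1} & 0 & u_{01} \\ 0 & \alpha_{y1} & v_{01} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R_1 & t_1 \end{bmatrix} \tag{6}$$

$$Z_{c2} \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = M_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \qquad M_2 = \begin{bmatrix} \alpha_{x2} & 0 & u_{02} \\ 0 & \alpha_{y2} & v_{02} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R_2 & t_2 \end{bmatrix} \tag{7}$$

where $M_1$ and $M_2$ are the projection matrices of the left camera and the right camera under the world coordinate system $O_w - X_w Y_w Z_w$. $\alpha_{x1}, \alpha_{y1}, u_{01}, v_{01}$ are the internal parameters of the left camera, and $\alpha_{x2}, \alpha_{y2}, u_{02}, v_{02}$ are the internal parameters of the right camera. The internal parameters can be obtained from the binocular camera parameter manual.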
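Assume that the coordinate system $O_{c1} - X_{c1} Y_{c1} Z_{c1}$ of the left camera coincides with the world coordinate system; then the left camera has a rotation matrix $R_1 = I$ and a translation vector $t_1 = (0\ 0\ 0)^T$. The parameters given for the camera are ideal parameters, which differ from the actual camera parameters. Consequently, we have to calibrate the camera to obtain the exact parameters. Thus, the rotation matrix and translation vector of the right camera are $R_2 = R_{12}$ and $t_2 = t_{12}$. Comparing Equations (6) and (7) and eliminating $Z_{c1}$, $Z_{c2}$, we can obtain Equation (8),

$$\begin{bmatrix} u_1 m^{(1)}_{31} - m^{(1)}_{11} & u_1 m^{(1)}_{32} - m^{(1)}_{12} & u_1 m^{(1)}_{33} - m^{(1)}_{13} \\ v_1 m^{(1)}_{31} - m^{(1)}_{21} & v_1 m^{(1)}_{32} - m^{(1)}_{22} & v_1 m^{(1)}_{33} - m^{(1)}_{23} \\ u_2 m^{(2)}_{31} - m^{(2)}_{11} & u_2 m^{(2)}_{32} - m^{(2)}_{12} & u_2 m^{(2)}_{33} - m^{(2)}_{13} \\ v_2 m^{(2)}_{31} - m^{(2)}_{21} & v_2 m^{(2)}_{32} - m^{(2)}_{22} & v_2 m^{(2)}_{33} - m^{(2)}_{23} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = \begin{bmatrix} m^{(1)}_{14} - u_1 m^{(1)}_{34} \\ m^{(1)}_{24} - v_1 m^{(1)}_{34} \\ m^{(2)}_{14} - u_2 m^{(2)}_{34} \\ m^{(2)}_{24} - v_2 m^{(2)}_{34} \end{bmatrix} \tag{8}$$

where $m^{(k)}_{ij}$ is the element in row $i$ and column $j$ of $M_k$. Suppose that $K$ is the parameter matrix and $U$ is the nonhomogeneous term; then, Equation (8) can be simplified into Equation (9),

$$K \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = U \tag{9}$$

The least squares solution of Equation (9) is

$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = (K^T K)^{-1} K^T U \tag{10}$$

Thus, the target's three-dimensional coordinate $P(X_w, Y_w, Z_w)$ can be calculated, and a positioning system based on the visual system is achieved. In particular, the relative positioning method discussed in this paper works under natural light sources instead of artificial light sources, such as LEDs. If there is an additional light source emitter on the robot platform, one can refer to the positioning method with an LED light array [56,57].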
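The least squares triangulation described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: the intrinsic values, the 60 mm baseline, and all function names are hypothetical, and the 3 × 3 normal equations are solved with Cramer's rule only to keep the example free of external libraries.

```python
def projection_matrix(ax, ay, u0, v0, R, t):
    """Return the 3x4 matrix M = A [R | t] as nested lists."""
    A = [[ax, 0.0, u0], [0.0, ay, v0], [0.0, 0.0, 1.0]]
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return [[sum(A[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]


def project(M, P):
    """Project a 3D point P to pixel coordinates (u, v) with matrix M."""
    X = list(P) + [1.0]
    h = [sum(M[i][j] * X[j] for j in range(4)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve the 3x3 system A x = b with Cramer's rule."""
    d = det3(A)
    return [det3([[b[r] if j == c else A[r][j] for j in range(3)]
                  for r in range(3)]) / d for c in range(3)]


def triangulate(M1, p1, M2, p2):
    """Least squares point from two views: solve (K^T K) P = K^T U."""
    K, U = [], []
    for M, (u, v) in ((M1, p1), (M2, p2)):
        for coord, row in ((u, 0), (v, 1)):
            # One row of K and U per image coordinate, after eliminating Z_c.
            K.append([coord * M[2][j] - M[row][j] for j in range(3)])
            U.append(M[row][3] - coord * M[2][3])
    KtK = [[sum(K[r][i] * K[r][j] for r in range(4)) for j in range(3)]
           for i in range(3)]
    KtU = [sum(K[r][i] * U[r] for r in range(4)) for i in range(3)]
    return solve3(KtK, KtU)


# Hypothetical setup: identical intrinsics, left camera at the world origin,
# right camera shifted 0.06 m along X (so t_12 = (-0.06, 0, 0)).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M_LEFT = projection_matrix(500.0, 500.0, 320.0, 240.0, I3, [0.0, 0.0, 0.0])
M_RIGHT = projection_matrix(500.0, 500.0, 320.0, 240.0, I3, [-0.06, 0.0, 0.0])

if __name__ == "__main__":
    P_true = (0.1, -0.05, 1.0)
    print(triangulate(M_LEFT, project(M_LEFT, P_true),
                      M_RIGHT, project(M_RIGHT, P_true)))
```

With noise-free pixel coordinates the four equations are consistent and the least squares solution reproduces the original point; with real, noisy detections it returns the point that minimizes the algebraic residual.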

Underwater Camera Calibration
The current camera calibration methods can be divided into three categories: active vision-based, camera self-calibration, and conventional camera calibration methods. In the active vision-based method, the camera makes specific movements on a high-accuracy platform, and the camera movement parameters and the collected images are used for calibration; this method has the advantage of high robustness and the disadvantage of high calibration cost. The camera self-calibration method uses the relationship between matching points of the images collected during camera motion for calibration. It is suitable for occasions with low precision demands and has low robustness.
The classic camera calibration method uses high accuracy markers and the corresponding relationship between images and markers to perform the calibration. It is suitable for situations where the camera settings are no longer changed. Its advantages are high calibration accuracy, while its disadvantages are that it cannot be calibrated in real-time and is not suitable for scenes where calibration objects cannot be placed.
In this article, Zhang Zhengyou's calibration method is used, which is the most commonly used traditional camera calibration method. During calibration, it is necessary to collect the images of the calibration plate at different angles, extract the corners of the calibration plate image based on the Harris algorithm, and further locate the corner information at the sub-pixel level. According to the corner information obtained, the camera's parameters are calculated.

Analysis of Binocular Field of View
The robot's binocular system serves as an underwater wide-angle system for detecting the underwater environment, and waterproofing measures are applied to the shell.
Because of light refraction, the wide angle of the binocular camera becomes smaller when it observes a target underwater. Figure 5a shows the schematic light path from the maximum incident angle in the water to the edge of the binocular camera's field of view. We assume that the wide angle of the camera is θ, with γ = θ/2; the maximum incident angle of light from the water onto the waterproof glass sheet is α; and the maximum refraction angle within the flat glass sheet is β, so that the maximum incident angle from the glass into the air is β as well. The maximum refraction angle between the waterproof flat glass and the air where the camera is located is then γ. Formula (11) follows from the law of refraction,

$$n_{water} \sin\alpha = n_{glass} \sin\beta, \qquad n_{glass} \sin\beta = n_{air} \sin\gamma \tag{11}$$

where $n_{water}$, $n_{glass}$ and $n_{air}$ represent the refractive indices of water, glass and air, respectively. The glass term cancels, and the formula can be rewritten as Formula (12),

$$\alpha = \arcsin\left(\frac{n_{air} \sin\gamma}{n_{water}}\right) \tag{12}$$

The air refractive index is about 1, whereas the water refractive index is 1.333. The binocular camera used in this paper has a wide angle of 105 degrees, so γ = 52.5 degrees and therefore α = 36.5243 degrees. The camera thus has an underwater field of view of about 2α = 73.0486 degrees. Figure 5b depicts the top view of the underwater visual field of the camera. The analysis of the binocular field of view is linked to the feedback process in the design of the vision-based formation system.
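The field-of-view calculation above can be checked numerically with a short sketch. The function name `underwater_half_angle` is hypothetical; the refractive indices and the 105-degree wide angle are the values quoted in the text, and the flat glass sheet is assumed to cancel out of the two refraction steps.

```python
import math


def underwater_half_angle(half_angle_air_deg, n_water=1.333, n_air=1.0):
    """Half field of view in water for a camera behind a flat glass port.

    Applies n_water * sin(alpha) = n_air * sin(gamma); the glass index
    cancels because the sheet has parallel faces.
    """
    gamma = math.radians(half_angle_air_deg)
    return math.degrees(math.asin(n_air * math.sin(gamma) / n_water))


if __name__ == "__main__":
    alpha = underwater_half_angle(105.0 / 2.0)
    print(f"half angle: {alpha:.4f} deg, field of view: {2.0 * alpha:.4f} deg")
```

The same function can be reused to see how a different lens or medium would change the usable field of view before choosing formation spacings.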

Multi-Robot Formation Control System Design
Usually, a multi-robot system relies on information exchange at its core. Underwater communication is generally carried out through underwater acoustic communication. However, unlike terrestrial wireless electromagnetic communication, underwater acoustic communication suffers from slow propagation speed, long delays, instability, and high energy consumption. Although many multi-robot formation algorithms exist for good communication environments, it is challenging to form underwater multi-robot formations in practical applications under such weak communication conditions.

Formation Structure Design
In robot formation, most research is based on information exchange between robots or on simulation. Not only are the algorithms complex, but because of the constraints of the underwater environment, it is also difficult to exchange large-scale data wirelessly between robots in real time [58]. In this paper, we address the difficulty of formation under weak communication by means of visual sensing: we design the formation structure as a series hierarchical formation system and use the leader-follower method to control the formation. Visual sensing means exchanging information through an optical system, where the transmission is achieved through the vision system without large-scale equipment [59]. The formation is kept in regular operation in real time by controlling the relative position between each following robot and its leader.
Each robot is regarded as a rigid body in the underwater formation design of the small bionic spherical robots. The size and shape of the formation may be designed according to the tasks involved, which means that the formation determines the distance and orientation of each robot within it.
The classic leader-follower formation control strategy is mainly used in two-dimensional planes [60]. More specifically, two methods are used to achieve formation control: an angle-distance feedback control method and a distance-distance feedback control method. The angle-distance feedback control method uses the deviation of the angle and distance between the follower robot and the leader robot to drive the follower so that the formation is maintained. The distance-distance feedback control method uses the deviation of the distances between the follower and two leader robots to drive the follower so that the formation is maintained. The follower robot only needs to follow the leader, which reduces resource allocation.
In this paper, we use a cascading approach to guide the robots in the formation. Each follower has its own leader to track, and the entire multi-robot formation has only one overall leader. Figure 6 shows the flow chart of the follower's actions in the formation. In the overall process, follower robot n + 1 searches for its target and tracks the target's movement through the visual system. The binocular positioning method is adopted to calculate the target's three-dimensional coordinates, which serve as the feedback of the motion control.

Modelling and Control of the Vision-Based Formation System
As this paper adopts a vision-based leader-follower formation control model, the control model of the whole formation system can be studied through a single leader-follower pair.
To better present the scheme of the leader-follower, a kinematic equation is set up for a leader-follower pair, and a simple leader-follower configuration is displayed in Figure 7 [61,62].
In Equation (13), $(x, y, z)$ represents the position of each robot, and $\theta$ is the orientation with respect to the world coordinate system. $v$ represents the velocity of the robot, and $\omega$ represents the angular velocity of the robot. The leader robot has a configuration vector $[x_L\ y_L\ z_L\ \theta_L]^T$, while the follower robot has a configuration vector $[x_F\ y_F\ z_F\ \theta_F]^T$. Thus, the kinematic models of the leader robot and the follower robot can be obtained in the same form as Equation (13). The control inputs of the follower, which are determined from the declination distance and the declination course angle with respect to the leader, are $u_F = [v_F\ v_{Fy}\ \omega_F]^T$. $v_F$ represents the velocity of the follower on the XOZ plane, $v_{Fy}$ represents the velocity in the depth direction, and $\omega_F$ represents the angular velocity of the course angle. The two-robot system is transformed into a new set of coordinates in which the state of the leader is treated as an exogenous input. The outputs of the system are the position of the leader, because the motion of the follower inevitably changes the relative position of the leader. The kinematic model of the pair is then written in these coordinates. The kinematic model in the case of n followers can be obtained by extending Formula (13). In this case, the input vector can be rewritten as $u = [u_1\ u_2\ \ldots\ u_n]^T$.

In particular, the three-dimensional positioning of the leader robot in the follower's coordinate system is achieved by the follower robot's vision system. From the three-dimensional coordinates of the leader robot, the relative angles and distances between the follower robot and the leader robot are calculated. In three-dimensional space, this offset feedback enables the follower robot to follow the leader robot. Figure 8 shows the location information of the leader robot in the follower robot's vision coordinate system.
The point P under the world system O w − X w Y w Z w is (X w , Y w , Z w ), which is also the center of the leader robot and the coordinate under the follower's visual system , where the visual system is also the follower's left camera coordinate and can be expressed as O c1 − X c1 Y c1 Z c1 , is (X c1 , Y c1 , Z c1 ). According to that, the left camera coordinate is coincident with the world system, andthe point P under the left camera can be expressed as (X L , Y L , Z L ), which is considered to be the center of the leader robot. Thus, the distance between the follower robot and the leader The angle relationship between follower robot and the leader robot is tan θ = X L Z L , tan β = The control block diagram of the proposed algorithm for formation is as shown in Figure 9. In the three directions of the follower robot, its declination angle and offset with the leader robot are used. The PID control algorithm is used to realize the three-direction control, whose structure is similar to the structure shown in Figure 3. According to the diagram of the control system, the course control is implemented using the declination θ and the offset d x . Depth control is achieved using declination β and offset d y . Velocity control is achieved using the bias moment d z and the offset distance d l . The declination and offset of the leader robot with respect to the follower robot are α and d l . Thus, the three controllers can be expressed as By substituting Equations (15)-(17) into Formulas (1) and (2), the control law can be obtained as Equations (18)- (20), After determining the state information of the leader robot, the further problem to be studied is whether the error of the position and the angle converge to zero. If they converge to zero, it is proved that the following robot tracks the position of the leader robot in real-time. Because we use PID controllers, the whole system is stable when its open-loop transfer function is stable. 
In engineering practice, this condition generally holds. Therefore, the stability analysis is not repeated in this paper, and we consider the whole system to be sound.
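The mapping from the visual solution to the three PID loops can be illustrated with a minimal sketch. It is not the controller used in the paper: the gains, the `PID` class, the function names, and the desired distance are hypothetical, the depth declination is assumed here to be atan2(Y_L, Z_L), and the sketch feeds only θ, β and the distance error into the loops (the offsets d_x, d_y and d_z are omitted for brevity).

```python
import math
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID; gains are illustrative, not the paper's values."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and difference the error for the derivative term.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def relative_state(x_l: float, y_l: float, z_l: float):
    """Distance and declination angles of the leader in the left-camera frame."""
    d_l = math.sqrt(x_l ** 2 + y_l ** 2 + z_l ** 2)  # Euclidean distance to the leader
    theta = math.atan2(x_l, z_l)  # course declination, tan(theta) = X_L / Z_L
    beta = math.atan2(y_l, z_l)   # ASSUMPTION: depth declination taken as Y_L / Z_L
    return d_l, theta, beta


def follower_command(leader_xyz, desired_distance, pids, dt):
    """Return (v_F, v_Fy, omega_F) from one visual measurement of the leader."""
    d_l, theta, beta = relative_state(*leader_xyz)
    omega_f = pids["course"].step(theta, dt)             # turn toward the leader
    v_fy = pids["depth"].step(beta, dt)                  # match the leader's depth
    v_f = pids["speed"].step(d_l - desired_distance, dt) # close the distance gap
    return v_f, v_fy, omega_f
```

With proportional-only gains of 1 and a leader 1 m straight ahead, a desired distance of 0.5 m yields a forward command of 0.5 and zero course and depth commands, which is the behavior the three-loop structure is meant to produce.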
Additionally, the declination angle between the leader robot and the Z-axis of the follower robot's binocular camera coordinate system depends on the formation requirements, because the direction of the propulsion force of the follower robot along its main course angle is the same as that of the leader robot along its main course angle. While following, each follower robot maintains its position relative to the leader robot by adjusting its course angle and speed, so that the formation of the multi-robot system is realized. Distance and orientation are provided by the vision system's solution, and the leader robot is kept at a fixed relative position through the control of depth, course angle, and velocity. Therefore, the coordination period of the formation is determined by the visual system's frame rate and the robot control frequency.

Underwater Motion Control Experiment
To verify the stability and controllability of the new-generation robot platform, we carried out single-robot control experiments in a laboratory pool measuring 3 m × 2 m × 1 m, with a water depth of about 0.5 m. The motion control experiment was divided into two parts: the linear motion of the spherical robot and the rotation motion of the spherical robot. In the linear motion experiment, the real-time heading angle feedback from the MEMS sensor was used as the controlled quantity.

Linear Motion of a Single Robot
The spherical robot was controlled by the method described above. The heading angle setpoint was fixed at 71°. The robot started from a heading angle of 0° and turned until it moved in a straight line along the set heading, as shown in Figure 10. The rise time of the linear motion control is about 8 s.
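For readers who want to extract a rise time from a logged heading curve, the following sketch uses the common 10%-90% definition. That definition is an assumption, since the paper does not state which one it uses, and the sample log is hypothetical rather than the data behind Figure 10.

```python
def rise_time(times, headings, target, low=0.1, high=0.9):
    """Time between first reaching low*target and high*target.

    Assumes the heading starts near 0 and rises toward `target`.
    Raises StopIteration if a threshold is never reached.
    """
    t_low = next(t for t, h in zip(times, headings) if h >= low * target)
    t_high = next(t for t, h in zip(times, headings) if h >= high * target)
    return t_high - t_low


# Hypothetical log sampled once per second (seconds, degrees).
sample_times = list(range(11))
sample_headings = [0, 10, 20, 30, 40, 50, 60, 65, 70, 71, 71]
sample_rise = rise_time(sample_times, sample_headings, target=71)
```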

Rotation Motion of a Single Robot
To implement the anti-loss mechanism in the formation system, the robot needs to perform a commanded rotation. In the rotation motion control experiment, the water jet motors at the left front and right rear of the spherical robot were driven so that the robot rotated with a turning radius of approximately 0, as shown in Figure 11.

Underwater Visual System Experiment

Underwater Camera Calibration Experiment
Camera calibration is the basis for using the visual system: it provides the camera parameters needed to calculate the three-dimensional coordinates of the target. The binocular camera parameters comprise each camera's intrinsic parameter matrix and distortion coefficients, together with the rotation matrix and translation vector between the two cameras. The calibration process aims to obtain these internal and external parameters for camera 1 and camera 2.
This paper adopted Zhengyou Zhang's calibration method, the most widely used of the traditional camera calibration methods [63,64]. Zhang's method is easy to implement, has a simple calibration process, and offers high precision. To calibrate the binocular camera accurately, this paper used the MATLAB 2016 calibration toolbox. First, underwater images of the calibration board were acquired simultaneously by the left and right cameras; 103 image pairs were collected in the underwater camera calibration experiment. Figure 12a shows one pair of images from the left and right cameras. Figure 12b shows the distribution of the acquired calibration board poses. The overall mean error is 0.06 pixels. The results are listed in Table 1.
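To make the role of the calibrated parameters concrete, the sketch below triangulates a target from a matched pixel pair. It assumes the images have already been undistorted and rectified so that the two cameras differ only by a horizontal baseline; the function name and every numeric value in the usage note are hypothetical and are not taken from Table 1.

```python
def triangulate(u_left, v_left, u_right, fx, fy, cx, cy, baseline):
    """Rectified-stereo triangulation.

    Returns (X, Y, Z) in the left-camera frame, in the units of `baseline`.
    fx, fy, cx, cy are the left camera's intrinsics in pixels.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: target too far or mismatched")
    z = fx * baseline / disparity   # depth from similar triangles
    x = (u_left - cx) * z / fx      # back-project the pixel at that depth
    y = (v_left - cy) * z / fy
    return x, y, z
```

For example, with an assumed focal length of 500 px, principal point (320, 240), and a 0.06 m baseline, a target seen at column 350 in the left image and column 320 in the right image lies about 1 m ahead and 0.06 m to the side.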

Underwater Formation Experiments
To further verify the effectiveness of the proposed method, formation experiments were conducted with two or three robots. Because of the limited size of the pool, one robot remained stationary during the three-robot experiments. The formation experiments were abstracted from actual underwater formation missions, which makes them more relevant to practical applications.

[Table 1: calibration results; columns: Parameters, Camera 1, Camera 2]

Underwater Vision-Based Ranging Experiment
To facilitate distance measurement, we used a yellow submarine model with a length of 10 cm as the target for real-time positioning and distance measurement. As shown in Figure 13, the ruler distance was recorded every 10 cm during the measurement. Figure 14 compares the distance measured by the binocular camera with the distance read from the ruler. The root-mean-square error was used to quantify the difference between the distance measured by binocular vision and the distance measured by the ruler:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(d_i - \hat{d}_i\right)^2}$$
where $d_i$ and $\hat{d}_i$ are the i-th distances to the object measured by the vision system and by the ruler, respectively, and n is the number of measurements. The root-mean-square error in the underwater ranging experiment was 11.29 cm at a maximum distance of 160 cm, an approximate mean error rate of 7%. It can be concluded from Figure 13 that, during continuous positioning, the error mainly occurred in the return phase, when the distance went from large to small. We speculate that the error may be caused by the change in the moving direction of the hand-held target as it left and returned, which produced ripples in different directions whose diffusion introduced optical interference. The target may also have trembled during hand-held movement, which would introduce unexpected offsets in the other directions. However, the magnitude of this error is acceptable. During formation, its impact can be reduced by closed-loop position feedback.
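The error metric above can be computed as in the following sketch. The function name is our own, and the only figures taken from the paper are the reported 11.29 cm error and 160 cm maximum distance used to reproduce the roughly 7% error rate.

```python
import math


def rmse(measured, reference):
    """Root-mean-square error between paired distance measurements."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((m - r) ** 2 for m, r in zip(measured, reference))
    return math.sqrt(squared / len(measured))


# Error rate implied by the reported numbers: 11.29 cm over a 160 cm range.
reported_error_rate = 11.29 / 160.0  # about 0.07, i.e. roughly 7%
```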

Dynamic Straight-Line Formation Experiment
This experiment included two robots, where the leader was dynamic. The improved previous-generation underwater spherical robot was set as the leader robot, and the new-generation amphibious spherical robot was the follower robot. The leader robot performed a linear motion in a particular direction, and the follower robot navigated under the guidance of the leader robot to form the leader-follower formation structure. Figure 15a, 15b, 15c and 15d show the status of the leader and follower robots captured by the global camera above the pool at 0 s, 5 s, 10 s and 15 s, respectively. Figure 16a, 16b, 16c and 16d show the state of the leader robot tracked by the binocular camera of the follower robot at 0 s, 5 s, 10 s and 15 s, respectively. For the linear motion, Figure 17a shows the course angle of the leader robot measured by the binocular vision system of the follower robot, compared with the course angle of the follower robot measured by its MEMS sensor. Figure 17b shows the course angle of the leader robot measured by the MEMS sensor carried by the leader robot.
In Figure 17a, the leader robot's course angle obtained by the follower robot's binocular vision system stays near 90 degrees; that is, the leader robot remained within view of the follower robot's binocular vision system. To keep the leader robot in its field of view, the follower robot adjusted its own heading. The course of the leader robot measured by the follower robot changes in the opposite direction to the follower's own course, which ensures that the follower robot keeps following the leader robot. The MEMS data and the binocular data represent the follower robot's heading angle and the leader's heading angle, respectively. The opposite trends mean that the heading angles of the two robots gradually converge, which also means that the two robots are getting closer.
Figure 17. The course angle measured by the vision of the follower robot and the leader robot compared with the self course angle measured by the MEMS sensor over time. (a) Data measured by the follower; (b) data measured by the leader.
Figure 18 shows the relative distance of the leader robot and its relative three-dimensional coordinates along the respective axes, measured by the binocular vision system of the follower robot during the formation process. As can be seen in Figure 18a, the distance between the follower robot and the leader robot in the Z-axis direction gradually stabilizes. Figure 19 shows the frame rate of the follower robot's vision system. Throughout the movement, the frame rate of the binocular vision is stable at about 15 frames per second, which makes it feasible for actual underwater formation missions. It should be noted that 15 frames per second is relatively high and may cause considerable computational overhead. However, this frame rate brings benefits to the visual-feedback formation process that cannot be ignored.
One necessary condition for the proposed leader-follower formation method is that the follower can solve for the relative position coordinates, which means that the follower must be able to detect and track the leader within its field of view. As shown in Figure 6, if detection or tracking fails, the follower re-detects until the leader is observed again. If the visual frame rate is low while the leader's speed is relatively high, the leader can easily leave the follower's field of view and the target may be lost, which disrupts the formation process. Therefore, although a frame rate of 15 frames per second has a higher computational cost than a lower frame rate, the additional cost is worthwhile.
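The detect, track, and re-detect behavior and the effect of frame rate can be sketched as follows. This is only an illustration of the idea, not the logic of Figure 6: the `LeaderTracker` class, the tolerance of a few missed frames before searching again, and the relative speed in the usage note are all assumptions.

```python
from enum import Enum


class Mode(Enum):
    SEARCH = "search"  # leader not in view: e.g. rotate in place to re-detect
    TRACK = "track"    # leader in view: run the formation controller


class LeaderTracker:
    """Switches between tracking and searching based on per-frame visibility."""

    def __init__(self, max_missed_frames=3):
        # ASSUMPTION: tolerate a few dropped detections before declaring a loss.
        self.max_missed_frames = max_missed_frames
        self.missed = 0
        self.mode = Mode.SEARCH

    def update(self, leader_visible: bool) -> Mode:
        if leader_visible:
            self.missed = 0
            self.mode = Mode.TRACK
        else:
            self.missed += 1
            if self.missed > self.max_missed_frames:
                self.mode = Mode.SEARCH
        return self.mode


def displacement_per_frame(relative_speed, fps):
    """How far the leader moves relative to the follower between two frames."""
    return relative_speed / fps
```

At an assumed relative speed of 0.3 m/s, the leader shifts about 0.02 m between frames at 15 frames per second but about 0.06 m at 5 frames per second, which illustrates why a lower frame rate makes it easier for the leader to leave the field of view.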

"V-Type Escort" Formation Experiment
The "V-type escort" formation is derived from cruiser escorts. In this experiment, the previous prototype was set as the first-level leader robot and remained stationary. The improved previous-generation underwater spherical robot (No.2) and the new-generation amphibious spherical robot (No.1) served as the follower robots. The follower robots approached the leader robot at the same time, and the three robots maintained the "V-type" formation structure.
As shown in Figure 20a, 20b, 20c and 20d, the leader robot and the follower robots were captured by the global camera above the pool at 0 s, 3 s, 6 s and 9 s, respectively. Figure 23a shows the course angle of the leader robot measured by the vision system of the No.1 follower robot, compared with the follower's own heading angle measured by the MEMS sensor. Figure 23b shows the course angle of the leader robot measured by the vision system of the No.2 follower robot, compared with the follower's own course angle measured by the MEMS sensor. The MEMS sensor is affected by the magnetic field, which causes the data to change suddenly at t = 2 s; this stems from the susceptibility of the chosen JY901 IMU module. Unlike a six-axis IMU module, the JY901 is a nine-axis module: it adds magnetic field sensing along three axes, which makes it more sensitive to changes in the magnetic field and more susceptible to magnetic interference.
As shown in Figure 23b, we speculate that the water jet motor generates a disturbing magnetic field during propulsion, which interferes with the IMU data. Figure 24a
In the leader-follower formation, the No.1 follower robot moved toward the leader robot while the No.2 follower robot also moved toward the leader robot. This experiment shows that two follower robots can maintain the "V-type" formation with the leader robot.

"Round-Up Hunting" Formation Experiment
The "round-up hunting" formation is designed for an idealized underwater capture task, in which small robots keep approaching the target and block its escape route. In this experiment, the previous-generation prototype was set as the leader robot and remained static, while the new-generation amphibious spherical robot (No.1) and the improved previous-generation underwater spherical robot (No.2) served as follower robots. The follower robots approached the leader robot simultaneously, then encircled and caught it.
As shown in Figure 27a, 27b, 27c and 27d, the state of the leader robot and follower robots was captured by the global camera above the pool at 0 s, 3 s, 6 s and 9 s, respectively. Figure 28a, 28b, 28c and 28d show the state of the leader robot tracked by the binocular cameras on the No.1 follower robot at 0 s, 3 s, 6 s and 9 s, respectively. Figure 29a, 29b, 29c and 29d show the state of the leader robot tracked by the binocular cameras on the No.2 follower robot at 0 s, 3 s, 6 s and 9 s, respectively. Figure 30a shows the course angle of the leader robot measured by the vision system of the No.1 follower robot, compared with the follower's own course angle measured by the MEMS sensor. Figure 30b shows the course angle of the leader robot measured by the vision system of the No.2 follower robot, compared with the follower's own course angle measured by the MEMS sensor. Figure 31a and 31b show the distances measured by the No.1 follower robot and the No.2 follower robot, respectively. Figure 32a and 32b show the three-dimensional coordinates measured by the vision systems of the No.1 follower robot and the No.2 follower robot, respectively. Figure 33a and 33b show the frame rates of the No.1 follower robot and the No.2 follower robot, respectively. In the leader-follower formation, the No.1 follower robot moved toward the leader robot while the No.2 follower robot moved toward it simultaneously. This experiment shows that two follower robots can "round up" the leader robot at the same time.

Discussion
Because this paper adopts a cascaded leader-follower structure as the formation strategy, a formation of multiple robots is built up from leader-follower pairs, so it is necessary to study a single leader-follower pair.
Two factors mainly cause the error of the underwater robot formation system: the precision of robot motion control and the binocular vision system of the robot. Motion control accuracy is generally improved by adjusting the control frequency and the parameters of the PID controllers. In the formation experiments, the motion adjustment interval is about 50 ms. The errors caused by the binocular vision system come mainly from two sources: the binocular measurement error and the frame rate of the binocular system, which is about 15 FPS. The visual frame rate and the control frequency both depend closely on hardware conditions, and both must be weighed against the computational cost and overhead of the whole system. In theory, increasing the control frequency and the frame rate reduces the error, but from the perspective of robot manufacturing this improvement is limited, especially for small underwater robots, where hardware selection is restricted in several ways, for example in computing capacity, power, and battery capacity.
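The two timing figures quoted above can be compared directly. The sketch below converts them to common units and, as a labeled assumption rather than a formula from the paper, takes the slower of the two loops as a rough bound on the coordination period.

```python
def timing_budget(frame_rate_hz, control_interval_s):
    """Return (frame period in s, control rate in Hz, assumed coordination period in s)."""
    frame_period = 1.0 / frame_rate_hz
    control_rate = 1.0 / control_interval_s
    # ASSUMPTION: fresh visual feedback cannot arrive faster than the slower loop.
    coordination_period = max(frame_period, control_interval_s)
    return frame_period, control_rate, coordination_period


frame_period, control_rate, coordination_period = timing_budget(15.0, 0.05)
```

With the reported values, the frame period is about 66.7 ms and the control rate is 20 Hz, so under this assumption the vision system, not the motion controller, sets the pace of coordination.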
The measurement error of the binocular system stems mainly from camera calibration and from camera shake during tracking. The underwater ranging experiment shows a root-mean-square error of about 11 cm at a maximum distance of 160 cm, an approximate mean error rate of 7%.
The underwater environment differs from the terrestrial environment: light transmittance is poor, images are easily distorted, and the transparency of the water and its plankton content affect binocular measurements. However, this effect is relatively slight at short distances, such as those in the laboratory. In practical applications, once the distance between two underwater robots becomes too large, visual measurement performance drops significantly: the long distance not only degrades image quality but also undermines the similar-triangle geometry used in binocular ranging, because the two sides of the triangle become approximately parallel when the target is too far away.
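The loss of ranging accuracy at long distance follows from the standard rectified-stereo relation Z = fB/d: differentiating with respect to the disparity d gives a depth uncertainty that grows with the square of the distance. The sketch below evaluates that first-order estimate; the focal length, baseline, and one-pixel disparity error in the usage note are hypothetical, and water effects such as refraction and turbidity are ignored.

```python
def depth_error(z, focal_px, baseline, disparity_error_px=1.0):
    """First-order depth uncertainty of rectified stereo at range z.

    Derived from Z = f * B / d, so |dZ| is approximately Z**2 / (f * B) * |dd|.
    Units of the result follow the units of z and baseline.
    """
    return (z ** 2) / (focal_px * baseline) * disparity_error_px
```

With an assumed 500 px focal length and 0.06 m baseline, moving the target from 0.2 m to 2 m multiplies the depth uncertainty per pixel of disparity error by 100.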
In addition, we note that in other work that does not use additional markers such as lasers, the maximum distance for underwater visual relative positioning is beyond 3 m [65,66]. Therefore, in practical applications without auxiliary optical equipment, it is reasonable to study formations with centimeter-level positioning and fully visual feedback. In this paper, the binocular system achieves ranging and positioning from 20 cm to 200 cm in the laboratory pool environment, so underwater formations within a range of 200 cm are theoretically achievable.
Additionally, the current experiments last about 20 s, and longer experiments are needed to understand how the method performs over long distances. Compared with large-scale underwater robots, small-scale underwater robots generally work for less than one hour because of the limitations of their power supplies. Moreover, in formation tasks, potential obstacles cannot be ignored when moving long distances. Therefore, the proposed visual-feedback leader-follower formation may be challenged in long-duration, long-distance practical tasks. These challenges may come from potential movable obstacles or from the cumulative error of the MEMS module. However, the JY901 MEMS sensor chosen in this paper has high accuracy and a built-in filter module, so its accumulated error over long use is smaller than that of ordinary sensors. Therefore, the method is feasible for formation tasks of typical duration.

Conclusions
In this paper, a vision-based formation control method based on the leader-follower structure is designed. The visual system serves as the common means of positioning and measurement: the field of view of the binocular camera is analyzed, and the visual solution quantities are calculated and used as the input of the formation control system to achieve vision-based formation control. The control diagram and the control law of the formation control system are discussed. Underwater formation experiments abstracted from actual underwater missions, namely the "Dynamic Straight-Line", "V-Type Escort", and "Round-Up Hunting" formation experiments, are carried out. Furthermore, the coordination period and the error of the formation system are analyzed. The error of the underwater robot formation system is mainly caused by the robot motion control precision, evaluated in the motion control experiments, and by the measurement error of the binocular vision system, discussed in the vision-based ranging experiment. Additionally, the long-term behavior of the formation is discussed with respect to the proposed method's effectiveness and practicability.