Development and Research of a Multi-Medium Motion Capture System for Underwater Intelligent Agents

A multi-medium motion capture system based on visual detection of markers is developed and experimentally demonstrated for monitoring underwater intelligent agents such as live fish and bionic robotic fish. Considering the refraction effect between air and water, a three-dimensional (3D) reconstruction model is established, which can be used to reconstruct the 3D coordinates of underwater markers from 2D image data. Furthermore, marker matching is performed through multi-lens fusion perception prediction combined with the K-means clustering algorithm. Subsequently, in order to track occluded markers, an improved Kalman filtering algorithm based on the kinematic information of fish is proposed. Finally, the feasibility and effectiveness of the proposed system are verified through experimental results. The models and methods in this paper can provide a reference and inspiration for the measurement of underwater intelligent agents.


Introduction
As a new kind of autonomous underwater vehicle (AUV) that combines the propulsion mechanism of fish with robotics, the bionic robot-fish has been widely applied in water quality monitoring, scientific underwater exploration, oceanic supervision and fishery conservation [1][2][3][4], because of several advantages over the traditional screw-propeller AUV, such as low energy consumption, low noise, high propulsion efficiency and high mobility [5]. Nevertheless, the bionic robot-fish still lags far behind its biological prototype in speed, propulsive efficiency and maneuverability. Therefore, in order to improve the swimming performance of robotic fish, exploring the locomotion patterns of fish through movement observation is an essential task. Among the available techniques for observing fish swimming [6][7][8][9], the vision-based method is considered simple, low-cost and easily available while offering high precision [10], and has attracted extensive attention from researchers. Budick used a high-speed camera (500 fps) to photograph the swimming and turning behavior of juvenile zebrafish to study the effects of the nervous system on fish swimming [11]. Bartol used a Kodak high-speed camera (500~1000 fps) to record the movement of squid in shallow water and obtained its swimming strategies at different speeds [12]. For obtaining

Three-Dimensional Reconstruction Model
In order to study the effect of refraction on the 3D reconstruction of markers, the path of a light beam from the marker to the camera is considered: the light emitted from the marker reaches the water surface, is bent due to the different refractive indices of water and air, and then reaches the camera imaging plane through the camera lens. The refraction process follows Snell's law:

$$n_{air} \sin\theta_{air} = n_{water} \sin\theta_{water} \quad (1)$$

where $n_{air}$ and $n_{water}$ are the refractive indices of air and water, respectively, and $\theta_{air}$ and $\theta_{water}$ represent the angles of incidence and refraction, respectively. The fish generally swims near the bottom of the tank, which means that fluctuations of the water surface are small and can be ignored. Therefore, the model can be simplified into the refraction model shown in Figure 1, where $P_c$ is the position of the camera optical center, $P_m$ is the actual position of the marker, and $P_w$ and $P_p$ are the intersections of the light ray with the water surface and the imaging plane, respectively.
Through camera calibration, $P_c(x_c, y_c, z_c)$ is known. $P_p(x_p, y_p, z_p)$ can be obtained through Equation (2) from its 2D coordinates $(u_p, v_p)$ on the imaging plane and the camera parameters.
where $R_{3\times 3}$ and $t_{3\times 1}$ are, respectively, the orthogonal rotation matrix and the translation vector, both extrinsic parameters of the camera; $K$ and $\alpha$ are the intrinsic parameters of the camera. Meanwhile, the height of the water surface is known and $P_w$ lies on the extension of the line $P_c P_p$, so the following relations can be obtained from Snell's law. Because the projections of $P_c$, $P_w$, $P_m$ and $P_p$ onto the water surface are coplanar, Equation (7) is satisfied.
Combining Equations (5) and (8), and assuming that the horizontal water surface is above the object, i.e., $z_w - z_m \ge 0$, the equations in the unknown parameters $x_m$, $y_m$ and $z_m$ can be expressed in matrix form as Equation (9). In general, the 3D reconstruction of a marker requires it to be captured by at least two cameras; the position of the marker $P_m$ can therefore be obtained by simultaneously solving the $n$ instances of Equation (9), where $n$ is the number of cameras that captured the marker.
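The reconstruction step above can be sketched numerically: given each camera's optical center $P_c$ and the surface intersection $P_w$, the direction of the refracted ray follows Snell's law, and $P_m$ is the least-squares intersection of the refracted rays from all cameras. This is a minimal sketch, not the paper's implementation: it assumes a horizontal water surface at $z = 0$ with the z-axis pointing up, and refractive indices 1.0 (air) and 1.333 (water); the function names are illustrative.

```python
import numpy as np

N_AIR, N_WATER = 1.0, 1.333  # assumed refractive indices

def refract_ray(p_c, p_w):
    """Direction of the camera ray after it enters the water at p_w.

    p_c: camera optical centre (above the surface, z > 0)
    p_w: intersection of the ray with the water surface (z = 0)
    The surface normal is (0, 0, 1); Snell's law gives
    n_air * sin(theta_air) = n_water * sin(theta_water).
    """
    d = np.asarray(p_w, float) - np.asarray(p_c, float)
    d /= np.linalg.norm(d)
    n = np.array([0.0, 0.0, 1.0])            # surface normal, pointing into the air
    eta = N_AIR / N_WATER
    cos_i = -np.dot(d, n)                    # cosine of the incidence angle
    cos_t = np.sqrt(1.0 - eta**2 * (1.0 - cos_i**2))
    t = eta * d + (eta * cos_i - cos_t) * n  # refracted direction, in-plane
    return t / np.linalg.norm(t)

def reconstruct_marker(cams, surface_pts):
    """Least-squares intersection P_m of the refracted rays of n >= 2 cameras."""
    A_rows, b_rows = [], []
    for p_c, p_w in zip(cams, surface_pts):
        t = refract_ray(p_c, p_w)
        P = np.eye(3) - np.outer(t, t)       # projector orthogonal to the ray
        A_rows.append(P)
        b_rows.append(P @ np.asarray(p_w, float))
    A, b = np.vstack(A_rows), np.hstack(b_rows)
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

Each ray contributes the constraint $(I - tt^T)(X - P_w) = 0$; stacking these over all cameras and solving by least squares corresponds to solving the $n$ simultaneous instances of Equation (9).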

Markers Matching
Because of the light source, scattering and other issues, the markers captured by the cameras show a certain color deviation on screen, so using a larger number of marker colors is not necessarily better. With these considerations, three clearly distinguishable colors, red, green and blue, are selected for the markers. Nevertheless, this introduces another issue: more than one marker of the same color can appear in corresponding frames from different cameras.
In this section, we first simplify the marker matching process in the first frame according to the body structure of the fish and, based on that, propose a matching method with prediction for subsequent frames; the matched markers are then classified through the K-means clustering algorithm. Finally, marker matching is accomplished.

First Frame
Take the case of two cameras, for example: because of measurement error, the two rays from the same marker do not intersect in the water. As shown in Figure 2, the yellow line indicates the ray in the ideal case and the red line indicates the ray in actual measurement; hence the intersection point cannot be used to judge whether markers match. Although the two rays are skew, their shortest distance is small, especially compared with the shortest distance between the rays of unmatched markers. Hence, the shortest distance between rays can be used as the basis for filtering the data: if it is greater than a threshold, the markers must be unmatched. Because a marker is generally captured by more than two cameras, many measurement points are obtained near the actual position of the marker; if more than one pair of measurement points lies in the same vicinity, they must be matched. Due to the flat structure of the fish, the markers are set on both sides of the fish body and the cameras that capture the same marker are adjacent. To simplify the computation, the candidate points for 3D reconstruction can be obtained by consecutively matching pairs of adjacent cameras.
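The shortest-distance test between two camera rays can be sketched as follows: the two rays are treated as skew lines, their common perpendicular is computed in closed form, and its midpoint serves as the candidate 3D position for the matched pair. The function names and the 5 mm threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def line_distance(p1, d1, p2, d2, eps=1e-9):
    """Shortest distance between two 3D lines given points p_i and unit
    directions d_i; also returns the midpoint of the common perpendicular,
    used as a candidate 3D position for a matched marker pair."""
    n = np.cross(d1, d2)
    n_norm = np.linalg.norm(n)
    diff = np.asarray(p2, float) - np.asarray(p1, float)
    if n_norm < eps:                                  # (nearly) parallel rays
        dist = np.linalg.norm(diff - np.dot(diff, d1) * d1)
        return dist, None
    # parameters of the closest points on each line
    t1 = np.dot(np.cross(diff, d2), n) / n_norm**2
    t2 = np.dot(np.cross(diff, d1), n) / n_norm**2
    c1 = np.asarray(p1, float) + t1 * np.asarray(d1, float)
    c2 = np.asarray(p2, float) + t2 * np.asarray(d2, float)
    return np.linalg.norm(c1 - c2), 0.5 * (c1 + c2)

def is_candidate_pair(p1, d1, p2, d2, threshold=5.0):
    """Keep a marker pair from two cameras only when the shortest distance
    between their rays is below the threshold (mm, an assumed value)."""
    dist, midpoint = line_distance(p1, d1, p2, d2)
    return (dist <= threshold), midpoint
```

Running this test over every same-colored pair from two adjacent cameras yields the candidate points described above.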
The marker matching process is shown in Figure 3. Assume that the intrinsic parameters and pose of each camera and the 2D coordinates of each marker on the image are known. The flow chart demonstrates how the markers, originally in 2D coordinates, become candidate points through coordinate system conversion and data screening.

In Subsequent Frames
In general, the swimming speed of freshwater fish is within 120 cm/s, and the frame rate of the GoPro cameras used in the experiment is 120 fps, so the movement of a marker between adjacent frames is within 10 mm. Therefore, the marker in the current frame can be regarded as confined to a spherical region whose center is the marker's position in the previous frame and whose radius is 10 mm, as shown in Figure 4. Due to factors such as speed, the candidate points in the current frame are not evenly distributed in this spherical region compared with the previous frame. We therefore use Kalman filtering, which accounts for speed, acceleration and other factors, to narrow the candidate region, as shown in Figure 5.
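The gating step can be sketched in a few lines: only the candidate points inside the 10 mm sphere around the previous-frame position are kept. This is a minimal sketch; the function name is illustrative.

```python
import numpy as np

FRAME_MOVE_LIMIT = 10.0  # mm: 120 cm/s divided by 120 fps

def gate_candidates(prev_pos, candidates, radius=FRAME_MOVE_LIMIT):
    """Keep only the candidate points that lie inside the sphere centred on
    the marker's previous-frame position; the radius follows from the
    maximum swimming speed divided by the frame rate."""
    prev = np.asarray(prev_pos, float)
    return [c for c in candidates
            if np.linalg.norm(np.asarray(c, float) - prev) <= radius]
```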

Markers Classification
When the total number of markers is known, the K-means algorithm can be utilized to classify the candidate points, and then the spatial positions of the markers are finally obtained by Equation (9).
According to the characteristics of the data in this paper, we select K candidate points as the initial K-means clustering centers and calculate the distance between every candidate marker and the K centers. If candidate marker $P_i$ is closest to clustering center $C_j$, then $P_i$ is assigned to cluster $C_j$. $C_j$ is then recalculated as the arithmetic mean of all the marker points belonging to that cluster. The assignment of candidate markers to clusters is updated again, and the process repeats until the clusters no longer change. Because the data may contain wrong points, that is, non-matching points clustered among the candidates, the points within each cluster must be filtered. Calculate the distance between each point in the cluster and its cluster center: if all distances are less than 10 mm, the cluster is considered correct; if not, a wrong point is assumed to exist. In that case, remove the point farthest from the center, recalculate the center position, and repeat until all distances between the points and their cluster center are less than 10 mm.
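The clustering-and-pruning loop can be sketched as follows. This is a minimal implementation under the assumptions stated in the text: K initial centers are given, and the 10 mm pruning radius is the same bound used for gating; the function name is illustrative.

```python
import numpy as np

def kmeans_with_pruning(points, centers, radius=10.0, max_iter=100):
    """Standard K-means on candidate marker points, followed by pruning:
    any point farther than `radius` mm from its cluster centre is removed
    (farthest first) and the centre recomputed, as described above."""
    pts = np.asarray(points, dtype=float)
    ctr = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # assign every point to its nearest centre
        labels = np.argmin(
            np.linalg.norm(pts[:, None, :] - ctr[None, :, :], axis=2), axis=1)
        new_ctr = np.array([pts[labels == j].mean(axis=0)
                            if np.any(labels == j) else ctr[j]
                            for j in range(len(ctr))])
        if np.allclose(new_ctr, ctr):
            break
        ctr = new_ctr
    clusters = []
    for j in range(len(ctr)):
        members = pts[labels == j]
        c = ctr[j]
        while len(members) > 0:
            d = np.linalg.norm(members - c, axis=1)
            if d.max() <= radius:
                break
            members = np.delete(members, d.argmax(), axis=0)  # drop farthest
            if len(members) == 0:
                break
            c = members.mean(axis=0)
        clusters.append((c, members))
    return clusters
```

For subsequent frames, passing the previous-frame marker positions as `centers` realizes the initialization suggested in the next paragraph.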
It should be noted that, for the subsequent frames, taking the marker position of previous frame as the initial clustering center can improve the clustering accuracy and speed.

Marker Tracking
In a motion capture system, a marker sometimes cannot be reconstructed because it is occluded or captured by only one camera. At present, most motion capture systems solve this by employing a differential simulation method based on a human body model, in which adjacent joints are treated as rigid connections. However, this method is not suitable for fish, because the fish spine cannot be modeled as a rigid connection. According to the kinematic characteristics of fish, the body undulation model is time-dependent and its motion period cannot be measured over a short interval. Therefore, we adopt a mean-variance adaptive Kalman filter to track occluded markers. Additionally, considering that the movements of the markers attached to the fish body are consistent, the overall movement information is added to the filtering step to improve the prediction.

Mean-Variance Adaptive Kalman Filter
In this paper, the system state vector includes the position, velocity and acceleration of the marker (per axis) and is represented as

$$X = [x, \dot{x}, \ddot{x}]^T$$

The state equation can be expressed by

$$\dot{X}(t) = A X(t) + B \bar{a} + W(t)$$

where $\bar{a}$ is the mean value of the acceleration.

Assuming the sampling period is $T$, after discretization, the state equation at time $t_k$ is

$$X_k = \Phi X_{k-1} + U \bar{a} + W_k$$

where $W_k$ is the perturbing noise, which is white noise with a normal distribution, and $\Phi$ and $U$ are the state transition matrix and control vector, respectively:

$$\Phi = \begin{bmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} T^2/2 \\ T \\ 1 \end{bmatrix}$$

The observation equation is

$$Z_k = H X_k + V_k$$

where $H$ is the transfer matrix that represents the transformation relation between the state and observation vectors, and $V_k$ is the observation noise.

Hence, the prediction equations of the state and covariance are

$$\hat{X}_{k|k-1} = \Phi \hat{X}_{k-1|k-1} + U \bar{a}, \qquad P_{k|k-1} = \Phi P_{k-1|k-1} \Phi^T + Q_k$$

where $\bar{a}$ is taken as one step of the acceleration prediction. The equations of the filter gain, state update and covariance update are summarized as follows:

$$K_k = P_{k|k-1} H^T \left( H P_{k|k-1} H^T + R_k \right)^{-1}$$
$$\hat{X}_{k|k} = \hat{X}_{k|k-1} + K_k \left( Z_k - H \hat{X}_{k|k-1} \right)$$
$$P_{k|k} = (I - K_k H) P_{k|k-1}$$

where $R_k$ stands for the covariance matrix of the observation noise.
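The prediction and update cycle can be sketched per axis as follows. This is a minimal sketch, not the paper's implementation: the noise covariances `q` and `r`, and the way the mean acceleration `a_bar` is estimated, are assumptions; only the marker position is taken as observed.

```python
import numpy as np

class MeanAdaptiveKF:
    """Per-axis Kalman filter, state X = [position, velocity, acceleration],
    with the mean acceleration a_bar entering through the control vector U
    (a sketch of the filter equations above; q and r are assumed values)."""

    def __init__(self, T, q=1.0, r=1e-6):
        self.Phi = np.array([[1.0, T, T * T / 2],
                             [0.0, 1.0, T],
                             [0.0, 0.0, 1.0]])   # state transition matrix
        self.U = np.array([T * T / 2, T, 1.0])   # control vector for a_bar
        self.H = np.array([[1.0, 0.0, 0.0]])     # only position is observed
        self.Q = q * np.eye(3)                   # process noise covariance
        self.R = np.array([[r]])                 # observation noise covariance
        self.x = np.zeros(3)
        self.P = np.eye(3)

    def predict(self, a_bar=0.0):
        self.x = self.Phi @ self.x + self.U * a_bar
        self.P = self.Phi @ self.P @ self.Phi.T + self.Q
        return self.x.copy()

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R            # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # filter gain
        self.x = self.x + (K @ (np.atleast_1d(z) - self.H @ self.x)).ravel()
        self.P = (np.eye(3) - K @ self.H) @ self.P
        return self.x.copy()
```

When a marker is occluded, the `update` step is skipped and the predicted state is used as its position, which is the tracking behavior the filter is introduced for.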

Marker Tracking with Improved Kalman Filter
In order to improve the tracking precision, in this section, the kinematic model of fish is considered in the process of marker tracking and the improved adaptive Kalman Filter is proposed.

Kinematics Description of Fish
The general swimming model of freshwater fish accords with the propulsion model of the Carangidae. Take the bream as an example; its body structure is shown in Figure 6. Along the longitudinal axis of the body, the bream can be divided into three parts: the head, the trunk and the tail. When swimming, the head and trunk sway only slightly, and the propulsion movement is completed mainly by the tail. According to Videler's experiment [23], the propulsion movement consists of two parts: the body's undulation and the caudal fin's translational swing. The movement of the caudal fin is shown in Figure 7. Thus, the kinematic model can be established. If we take the longitudinal direction of the fish body as the X-axis and the lateral direction as the Y-axis, the body undulation can be described as

$$y_b(x, t) = A \sin(kx + \omega t) \quad (20)$$

where $x$ is the longitudinal displacement along the body, $A$ is the amplitude of the caudal fin movement, $y_b(x, t)$ is the lateral displacement of the body undulation at position $x$ and time $t$, and $\omega$ and $k$ are, respectively, the angular frequency and wave number of the body wave. The swing movement of the caudal fin can be described as

$$\theta(t) = \alpha \sin(\omega t + \phi), \quad x = L_b \quad (21)$$

where $L_b$, $\alpha$ and $\phi$ are, respectively, the length of the fish, the strike angle, that is, the included angle between the swing axis and the center line of the caudal fin, and the translational swing phase.
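The carangiform kinematics can be evaluated directly; the sinusoidal forms below are the common textbook versions (assumed here, since the source equations are only partially recoverable), with the function names chosen for illustration.

```python
import numpy as np

def body_wave(x, t, A, k, omega):
    """Lateral displacement y_b(x, t) of the body wave: amplitude A,
    wave number k, angular frequency omega (assumed standard form)."""
    return A * np.sin(k * x + omega * t)

def caudal_fin_angle(t, alpha, omega, phi):
    """Strike angle of the caudal fin: a sinusoidal translational swing
    with amplitude alpha and phase phi (assumed standard form)."""
    return alpha * np.sin(omega * t + phi)
```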

Improved Adaptive Kalman Filter
According to the body structure and movement model of the fish, it can be seen that if markers are attached to the fish, the swing amplitude along the fish's lateral line is the largest, the areas above and below the lateral line move with the same frequency, and the fish body is symmetrical about the dorsal fin. If two points are symmetrical, their relative position does not change. Therefore, there are constraint relationships between the positions of the markers.
The number of vertebrae of Chinese Cypriniformes is 30 to 52, with an average of 39.5 ± 4.4. Assuming that the maximal angle between two adjacent vertebrae is 2°, the two markers M and N satisfy the following relationship:

$$\left\langle M_{k-1}N_{k-1},\, M_k N_k \right\rangle \le 2^{\circ} \cdot i \cdot \frac{l}{L}$$

where $M_{k-1}N_{k-1}$ and $M_k N_k$ are, respectively, the segments connecting markers M and N in the previous and current frames, $l$ represents the actual distance between the two markers, $L$ is the fish length, and $i$ is the number of vertebrae. Meanwhile, a marker is attached to the fish body and its movement is inseparable from the whole, while the overall movement can in turn be obtained from the markers' movements. Hence, to improve the prediction, the Kalman filtering results of the markers are corrected based on the overall movement information, as shown in Figure 8. The Kalman filter can therefore be improved as

$$\hat{X}'_{k|k} = \hat{X}_{k|k} + \beta \left( \hat{M}_{k|k-1} - \hat{X}_{k|k} \right)$$

where $\hat{M}_{k|k-1}$ is the estimate of the overall movement of the fish and $\beta$ is the estimated coefficient of the overall motion. Figure 9 shows the measured spatial position of a marker in the live-fish experiment and its position after Kalman filtering, where blue indicates the measured position and green indicates the position processed with the adaptive Kalman filter.
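The two corrections above can be sketched as follows: a per-frame check that the inter-marker segment did not rotate more than the spine allows (the number of vertebrae spanned by two markers is roughly $i \cdot l / L$, each contributing at most ~2°), and a blend of the per-marker Kalman estimate with the overall-motion estimate. Both are sketches under stated assumptions; the coefficient `beta` and the blend form are illustrative.

```python
import numpy as np

def angle_constraint_ok(M_prev, N_prev, M_cur, N_cur,
                        l, L, n_vertebrae, max_per_joint_deg=2.0):
    """Check that the segment between markers M and N rotated no more
    between frames than the vertebrae it spans allow (a sketch of the
    constraint described above)."""
    v_prev = np.asarray(N_prev, float) - np.asarray(M_prev, float)
    v_cur = np.asarray(N_cur, float) - np.asarray(M_cur, float)
    cosang = np.dot(v_prev, v_cur) / (np.linalg.norm(v_prev) * np.linalg.norm(v_cur))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    limit = max_per_joint_deg * n_vertebrae * l / L
    return angle <= limit

def blend_with_overall_motion(x_filtered, overall_estimate, beta=0.3):
    """Correct a per-marker Kalman estimate toward the overall fish-motion
    estimate M_hat with weighting coefficient beta (an assumed value)."""
    return ((1.0 - beta) * np.asarray(x_filtered, float)
            + beta * np.asarray(overall_estimate, float))
```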

Error Analysis and Correction
In order to measure the accuracy of the underwater 3D reconstruction, we fix the calibration plate and establish the world coordinate system at its current position, then pour water into the tank until the water level is higher than the calibration plate. Subsequently, the corners of the calibration plate are reconstructed and their spatial positions obtained. Finally, the error of the underwater 3D reconstruction can be obtained by comparing the actual positions with the positions derived from combining different cameras.

Reconstruction Experiment
In order to capture the movement information of the fish with sufficient image clarity, a fish tank of 1 m × 0.5 m × 0.5 m with a surrounding LED strip is used, and eight numbered GoPro Hero4 cameras with an image resolution of 1920 × 1080 and a frame rate of 120 fps are fixed on an aluminum alloy bracket around it, three on each long side and one on each short side. The overall experimental platform is shown in Figure 10. It should be noted that the distribution of the eight cameras is centrosymmetric to cover a wider shooting range. For the underwater error analysis, we fixed the calibration plate at the bottom of the fish tank and shot it from the air with the fixed cameras, setting the world coordinate system with the plane of the calibration plate as Z = 0. We then poured water into the tank, measured the water depth, and shot again. After that, the calibration plate corners were reconstructed based on the underwater 3D reconstruction theory described above. Figure 11 shows the result of underwater 3D reconstruction with a water depth of 206 mm, using combinations of adjacent cameras. The green plane in the figure represents the plane of the calibration plate, and the positions of the intersection points are the actual positions of the calibration plate corners. It can be seen that the combinations with a large underwater 3D reconstruction error include cameras 4 and 8. Figure 12 shows the positions of the reconstruction points of some of the above combinations in the Y-Z plane: the reconstruction points tilt at both ends of the Y-axis, with cameras 5 and 7 arranged in the positive Y direction and cameras 1 and 3 in the negative Y direction; that is, the tilt occurs close to the cameras.
By comparing the 3D reconstruction errors in water and in air, we find that the error in water is significantly larger. The reasons are as follows: (1) The principle of underwater 3D reconstruction is to determine the intersection of the optical path with the water surface from the positions of the camera and marker, and then to solve for the position from these intersections with the least-squares method, which weakens the ability of the least-squares method to reduce the error. (2) Because of limited manufacturing accuracy and other issues, the plane of the calibration plate and the water surface can only be approximately parallel, which also introduces error. (3) The error increases as the distance between the point and the camera increases.

Normalization
In this paper, Equation (9) and the least-squares method are used to calculate the underwater 3D reconstruction position. The coefficient $\lambda$ in Equation (9) is related to the camera position and to the intersection of the optical path of the physical point with the water surface; when $y_w$ is approximately equal to $y_c$, $\lambda$ becomes very large. Since the principle of the least-squares method is to minimize the sum of squared errors, if one term is weighted too heavily, the measured point is pulled toward it. Therefore, we use a normalization method to correct Equation (9). After normalization, the error when $y_w$ is approximately equal to $y_c$ is significantly reduced. Figure 13 shows the results of 3D reconstruction before and after normalization: when $y$ is approximately 120 mm, the reconstructed point deviates from the actual position before normalization, while the deviation is corrected after normalization.

Correction Function
According to the underwater 3D reconstruction theory, it can be deduced that the error is related to the water height, the camera positions, and the marker position. Therefore, the following equation can be established to fit the Z-axis error, with coefficients of the form

$$k_i = g_{i0} + g_{i1} x_{c1} + g_{i2} y_{c1} + g_{i3} x_{c2} + g_{i4} y_{c2} + g_{i5} x_{c1}^2 + g_{i6} y_{c1}^2 + g_{i7} x_{c2}^2 + g_{i8} y_{c2}^2$$

where $[x, y, z]$ is the underwater 3D reconstruction position, $[x_{c1}, y_{c1}, z_{c1}]$ and $[x_{c2}, y_{c2}, z_{c2}]$ are the positions, in the world coordinate system, of the two cameras that constitute the reconstruction, and $g_{ij}$ is the $j$th coefficient of $k_i$. The fitting results are shown in Figure 14.
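Fitting one coefficient set $g_{i0}, \ldots, g_{i8}$ reduces to an ordinary least-squares problem over the quadratic features of the two camera positions. The sketch below shows that step only; the full correction model, which also involves the water height and the marker position, is omitted, and the function names are illustrative.

```python
import numpy as np

def feature_row(c1, c2):
    """Quadratic feature vector [1, x_c1, y_c1, x_c2, y_c2, x_c1^2, y_c1^2,
    x_c2^2, y_c2^2] for one camera pair."""
    x1, y1 = c1[0], c1[1]
    x2, y2 = c2[0], c2[1]
    return [1.0, x1, y1, x2, y2, x1 * x1, y1 * y1, x2 * x2, y2 * y2]

def fit_correction(features, z_errors):
    """Least-squares fit of the nine coefficients g_i0..g_i8 of one k_i from
    measured Z-axis errors of the calibration corners (a sketch of the
    fitting step)."""
    F = np.asarray(features, dtype=float)
    e = np.asarray(z_errors, dtype=float)
    g, *_ = np.linalg.lstsq(F, e, rcond=None)
    return g
```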

Results Verification
The verification method is as follows: the calibration plate is fixed, and the validity of the correction function parameters is verified at different water surface heights. Figure 15 shows the comparison of errors before and after correction for adjacent cameras; the error is greatly reduced.
It can be concluded that before correction the maximum error is 8.45 mm and the average error is 2.06 mm, while after correction the maximum and average errors are 2.37 mm and 0.53 mm, respectively. The error distributions are shown in Figure 16; the error is clearly reduced after correction.

Implementation Settings
In this experiment, a carp of about 0.5 kg, fitted with 2 cm Velcro strips carrying numbered markers, is used as the experimental object. The markers attached to the fish are numbered and sorted from head to tail. Figure 17 shows the pictures taken by camera 1 (left) and camera 5 (right), where markers 1 to 3 are located on the front part of the fish, 4 to 5 on the middle part, and 6 to 9 on the back part. Based on the key technologies of multi-medium motion capture, we use Visual Studio, OpenCV, OpenGL, a database and other tools to design and develop the software system, which performs camera calibration, motion capture, 3D display and other functions. The system adopts a modular design and is divided into six modules: log, auxiliary functions, calibration, correction, 3D reconstruction and 3D display. The system structure and the software interface are shown in Figures 18 and 19, respectively.

Data Analysis
After setting the world coordinate system, the 3D reconstruction experiment on live fish begins: the fish is filmed with 8 cameras simultaneously and the markers attached to the fish body are reconstructed from the image information with the homemade motion capture software. The reconstruction result is shown in Figure 20, which displays the spatial positions of the markers after 3D reconstruction from frames 0 to 700 at intervals of 100 frames; the corresponding frames from camera 6, whose viewing angle covers all markers, are chosen for comparison. From the experimental data we can obtain the motion trails of the fish, shown in Figure 21. It can be seen that the tracks of markers 1 to 3 are relatively stable, while the tracks of markers 4 to 9 swing to a certain degree, which is consistent with the swimming model of the Carangidae, in which the propulsion force is mainly produced by tail swing. Because fish swimming is a dynamic process, we take the relative movement between the markers on the fish tail and the markers on the front part of the fish as the object of study and select the motion direction of markers 1 to 3 as the motion direction of the fish. The relative motion direction of each marker can then be obtained; that is, the overall swimming is aligned with the X-axis and the position of each marker is observed relative to it. As shown in Figure 22, the position $P_0$ and the velocity direction can be determined from markers 1 to 3, and the position of each marker in the relative velocity direction is its shortest distance from the axis through $P_1$ and $P_0$. Figure 23 shows the distances between each marker and the axis; it can be seen that markers 5, 7 and 8 are similar in motion, all of them lying on the same side of the body.
Near the 50th frame, the motion direction of these three markers is opposite to that of the other four markers. This is because the axis lies in the middle of the two sets of markers: as the markers on one side move away from the axis, the markers on the other side must approach it. Meanwhile, when the fish is suspended in the water around the 500th frame, markers 5, 7 and 8 are farthest from the axis; this is because markers 1 to 3 are chosen as the reference markers, so the axis tilts toward their side and the markers on the other side end up farthest from it. Observing frames 0 to 200, it can be seen that the movement of the fish tail approximates sinusoidal motion, which also conforms to the tail swing of the carangid described by Formula (21). Because of the limitations of the experimental conditions, the system errors in our experiment are obtained by comparing the relative distances of calibrated markers arranged as in Figure 17. In order to obtain reliable results, both translational and rotational motions are analyzed, and the results are shown in Figure 24. The maximum error in translational motion is 3.84 mm, and the maximum error during rotation is 3.22 mm, which indicates that the deviation between the measured and actual values is small.
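The distance-to-axis quantity analyzed above can be computed as the perpendicular distance from a marker to the motion axis defined by the reference point and the swimming direction. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def distance_to_axis(marker, p0, direction):
    """Shortest (perpendicular) distance from a marker to the motion axis
    through the reference point p0 along the swimming direction."""
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    v = np.asarray(marker, float) - np.asarray(p0, float)
    return np.linalg.norm(v - np.dot(v, d) * d)
```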

Conclusions
In order to explore the locomotion patterns and swimming modes of fish and thereby improve the swimming performance of bionic robot-fish, a multi-medium motion capture system based on markers is developed in this paper. First, a multi-medium 3D reconstruction model is established using the least-squares method. Because multi-colored markers are used, more than one marker of the same color can appear in corresponding frames from different cameras, so the same-colored markers must be matched. We first match the markers in the first frame according to the flat structure of the fish; subsequently, under the constraints of fish speed and other factors, we match the markers in subsequent frames and complete the matching through the K-means clustering algorithm. Furthermore, because of shooting-angle limits and fish movement, some markers are occluded or captured by only one camera and cannot be reconstructed directly. To solve this, we adopt a mean-variance adaptive Kalman filtering algorithm and incorporate the overall swimming information of the fish according to its kinematic model to improve the prediction. Finally, the kinematic model of fish swimming is verified experimentally, and the feasibility and effectiveness of the proposed system are demonstrated. The models and methods of this paper can provide a reference and inspiration for the measurement of underwater objects.