Detection of the Deep-Sea Plankton Community in Marine Ecosystem with Underwater Robotic Platform

Variations in the quantity of plankton impact the entire marine ecosystem. Accurately assessing the dynamic evolution of plankton is therefore of great significance for monitoring the marine environment and global climate change. In this paper, a novel method is introduced for deep-sea plankton community detection in the marine ecosystem using an underwater robotic platform. The videos were sampled at a distance of 1.5 m from the ocean floor, with a focal length of 1.5–2.5 m. The optical flow field is used to detect the plankton community. We show that, for each moving plankton that does not overlap in space across two consecutive video frames, the time gradients at the plankton's spatial position are opposite to each other in the two consecutive optical flow fields, while the horizontal and vertical gradients have the same value and orientation. Accordingly, moving plankton can be accurately detected against the complex dynamic background of the deep-sea environment. Experimental comparison with manual ground truth fully validates the efficacy of the proposed methodology, which outperforms six state-of-the-art approaches.


Introduction
Plankton are organisms that live in oceans and fresh water [1] and play an important role in material and energy recycling within the marine food chain [2]. The study of plankton and plankton communities is indispensable for understanding marine resources and the impacts of climate change on ecosystems [3]. In addition, the number of plankton is a key indicator of carbon and energy cycling [4] and is of great significance to species diversity and ecosystem diversity [5]. From the early 19th century to date, much large-scale sensing equipment has been used to address the challenge of obtaining reliable high-resolution estimates of plankton abundance at depth [6]. Acoustic and optical techniques for the in situ observation of zooplankton are currently popular for plankton distribution assessment. Although acoustic observation has the outstanding advantage of a high observation frequency, its quantification is inaccurate, and it usually requires combination with optical image analysis or other traditional zooplankton sampling. In recent years, a series of advances have been made in computer vision [7], including hyperspectral imaging [8], principal component analysis of images [9,10], and deep learning [11][12][13] for image classification [14]. As marine plankton are small and uneven in size, they are difficult to describe quantitatively, for example with inventory and abundance statistics.
At present, many plankton detection methods have been proposed that often rely heavily on sophisticated underwater instruments. J. Craig et al. [15,16] constructed the ICDeep system, based on an Image Intensified Charge Coupled Device (ICCD) camera, to assess the quantity of low-light bioluminescent sources in the marine environment. Philips et al. [17] created a marine biological detector in which a Scientific CMOS (sCMOS) camera was used to image the organisms before conducting a statistical analysis of plankton abundance. With the development of computer vision, automatic analysis enabled by multitarget tracking was gradually applied to this field [18]. Kocak et al. [19] proposed using active contour (snake) models to segment, label, and track plankton images for classification. Luca et al. [20] presented an automatic plankton counting method that mainly used the interframe difference and the intersection of bounding boxes to perform multitarget matching. The aforementioned methods achieved some results in automatic analysis and counting. However, challenges remain due to the particular, complex forms of plankton and their passive mode of movement. Underwater plankton imaging has the capacity to detect patterns in plankton distributions that cannot be captured by sampling with nets [21]. Therefore, we consider applying machine vision technology to underwater images or videos to be a feasible method for studying plankton at present.
Underwater robots play an important role in various video surveillance tasks, including data collection. A mobile robot that can be mounted on a rotatable axis is advantageous because it provides 360° visual coverage, unlike a fixed camera installed in a predetermined direction. These mobile robots capture unprecedented shots of marine life in dangerous environments inaccessible to humans. A submersible can carry and control the underwater robot to collect deep-sea data and store the data in a computer for analysis. Some underwater robots are shown in Figure 1.
In this paper, we propose a deep-sea plankton detection method based on the Horn-Schunck (HS) optical flow [22]. The optical flow is the instantaneous velocity of the pixel movement of a moving object on the image plane. The advantage of the optical flow method is that motion vectors can be estimated accurately from the optical flow field. In this way, one can detect plankton and easily analyze its volume statistically using image processing and machine vision. Research on plankton can be divided into density, position, number, and individual and total volume, etc. When the spatial positions of a plankton do not coincide in two consecutive frames, its presence or absence is determined according to the following conditions: the time gradients at the plankton's location in two consecutive optical flow fields are opposite to each other, and the horizontal and vertical gradients at that location are equal in value and direction. Since connected components are marked as the locations of plankton, the number of connected components can be regarded as the number of plankton. Using this method, we first count the number of plankton in the video and then perform a statistical analysis. Various comparative experiments are carried out to benchmark against other methods and fully demonstrate the effectiveness of the proposed methodology.

Principle
The deep ocean floor is clear and suitable for video acquisition with active lighting. During video acquisition, the camera position and shooting angle change with the movement of the submersible, making plankton detection a moving-target detection problem under a complex, dynamic background. Two consecutive optical flow field matrices derived from three consecutive video frames are employed. For fast-moving plankton (plankton that does not overlap in space in two consecutive frames), the two consecutive optical flow values at the position of the plankton are opposite. In practice, the amount of grayscale change is often close to 0 but not exactly 0; therefore, the two consecutive optical flows are only approximately opposite to each other, and we handle this by setting two thresholds in the experiment section. We use this property to map out the location of the plankton. Figure 2 provides an overview of the proposed method, which consists of three modules. As shown in Module 1 of Figure 2, grayscale images are obtained by weighting the three channels of the input frames. In Module 2, three convolution operations, corresponding to three different convolution kernels, are performed on two consecutive frames to produce three different gradients (see Figure 3). We find that the time gradients of the two optical flow fields derived from three consecutive frames are opposite in value and direction at the positions corresponding to plankton in the middle frame. In the following, the time gradients of the two consecutive optical flow fields are denoted ∇t and ∇t′. The horizontal gradients of the two consecutive optical flow fields are equal in magnitude and direction at the positions corresponding to plankton in the middle frame.
Similarly, the vertical gradients are also equal. In the following, the horizontal gradients of the two optical flow fields are denoted ∇x and ∇x′, and the vertical gradients ∇y and ∇y′. Finally, Module 3 performs dual thresholding, which is explained separately in the discussion of parameters. Figure 3. Three convolution kernels corresponding in time and space. Two consecutive frames are stacked to form a 3D matrix; the result of each operation is the gradient of the pixel at the upper-left corner of the convolution kernel.
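To make the gradient computation concrete, the sketch below estimates the three gradients from two consecutive grayscale frames by averaging forward differences over a 2 × 2 × 2 cube, in the spirit of the classical Horn-Schunck formulation; the function name and array layout are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hs_gradients(f0, f1):
    """Estimate horizontal (gx), vertical (gy), and time (gt) gradients from
    two consecutive grayscale frames, averaging forward differences over a
    2x2x2 cubic neighborhood (illustrative sketch)."""
    f0 = f0.astype(np.float64)
    f1 = f1.astype(np.float64)
    # horizontal differences, averaged over both rows and both frames
    gx = 0.25 * ((f0[:-1, 1:] - f0[:-1, :-1]) + (f0[1:, 1:] - f0[1:, :-1]) +
                 (f1[:-1, 1:] - f1[:-1, :-1]) + (f1[1:, 1:] - f1[1:, :-1]))
    # vertical differences, averaged over both columns and both frames
    gy = 0.25 * ((f0[1:, :-1] - f0[:-1, :-1]) + (f0[1:, 1:] - f0[:-1, 1:]) +
                 (f1[1:, :-1] - f1[:-1, :-1]) + (f1[1:, 1:] - f1[:-1, 1:]))
    # temporal differences, averaged over the 2x2 spatial block
    gt = 0.25 * ((f1[:-1, :-1] - f0[:-1, :-1]) + (f1[:-1, 1:] - f0[:-1, 1:]) +
                 (f1[1:, :-1] - f0[1:, :-1]) + (f1[1:, 1:] - f0[1:, 1:]))
    return gx, gy, gt
```

Because gt is the only term that changes sign when the two input frames are swapped, the opposition of the time gradients and the equality of the spatial gradients across the two flow fields follow directly when the frames before and after the middle frame show the same background.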

Proof
In the HS optical flow method, the constraint equation of optical flow can be established as Equation (2) from the premise of the optical flow method, the invariance of gray level [22]:

∇x u + ∇y v + ∇t = 0, (2)

where u and v are the horizontal and vertical components of the optical flow. Three first-order differences are used to approximate the horizontal, vertical, and time gradients. Let the gray value at the plankton's position in the middle frame be I_{x,y,t}, where the subscripts x and y are the pixel indices and t is the time index. The position of the plankton changes with the movement of the ocean current and the camera lens. As shown in Figure 4, the plankton is small, so its position in frame t does not overlap with its position in frame t + 1. When it moves from position 1 to position 2, the gray value at position 2 in frame t − 1 is the background gray value I_{x,y,t−1}. Similarly, when the plankton moves from position 2 to position 3, the gray value at position 2 in frame t + 1 becomes the background gray value I_{x,y,t+1}. Based on the characteristics of deep-sea underwater video, the background around the plankton is invariant in time, i.e.,

I_{x,y,t−1} = I_{x,y,t+1}. (1)

The time gradients at the plankton's position in the two adjacent optical flow fields are

∇t = I_{x,y,t} − I_{x,y,t−1}, ∇t′ = I_{x,y,t+1} − I_{x,y,t}.

Based on Equation (1), the background gray values satisfy I_{x,y,t−1} = I_{x,y,t+1}, so ∇t′ = −∇t: the time gradients of the two optical flow fields derived from three consecutive frames are opposite at the positions corresponding to plankton in the middle frame.
The horizontal gradients at the plankton's location in the two optical flow fields are

∇x = ½[(I_{x+1,y,t−1} − I_{x,y,t−1}) + (I_{x+1,y,t} − I_{x,y,t})], ∇x′ = ½[(I_{x+1,y,t} − I_{x,y,t}) + (I_{x+1,y,t+1} − I_{x,y,t+1})].

In the same way, based on Equation (1), we obtain ∇x = ∇x′, i.e., the horizontal gradients of the two optical flow fields derived from three consecutive frames are equal at the positions corresponding to plankton in the middle frame. Likewise, ∇y = ∇y′.
In fact, in the proof, the time and space gradients are estimated by taking the mean over a 2 × 2 × 2 cubic neighborhood.
Then, we iterate n times for gray-gradient relaxation, setting the initial conditions u_0 = v_0 = 0:

u_{n+1} = ū_n − ∇x·∆, (7)
v_{n+1} = v̄_n − ∇y·∆, (8)
∆ = (∇x ū_n + ∇y v̄_n + ∇t) / (α² + ∇x² + ∇y²), (9)

where ū_n and v̄_n are local averages of u_n and v_n. The parameter α² reflects the smoothness constraint of the HS optical flow algorithm; ∆ is the iteration factor in the iterative process; ∇x and ∇y are the horizontal and vertical gradients; and u and v are the horizontal and vertical optical flow field matrices, respectively.
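A minimal sketch of this relaxation, assuming zero initial flow fields and a four-neighbor average for ū and v̄ (the exact averaging stencil and border handling are our assumptions):

```python
import numpy as np

def hs_iterate(gx, gy, gt, alpha=1.0, n_iter=50):
    """Horn-Schunck relaxation sketch: starting from u = v = 0, repeatedly
    update the flow from the neighborhood averages ubar, vbar."""
    u = np.zeros_like(gx)
    v = np.zeros_like(gx)
    denom = alpha ** 2 + gx ** 2 + gy ** 2      # alpha^2 enforces smoothness
    for _ in range(n_iter):
        # four-neighbor averages (periodic borders for brevity)
        ubar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        vbar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        delta = (gx * ubar + gy * vbar + gt) / denom   # iteration factor
        u = ubar - gx * delta
        v = vbar - gy * delta
    return u, v
```

With equal spatial gradients and opposite time gradients, this recurrence produces flow fields that are exact negatives of each other at every iteration, which is the property the proof establishes by induction.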
The relationships in Equations (7)-(9) form a recurrence in the number of iterations n. Substituting into Equations (7)-(9) yields the new formulas, where u_{n+1} and u_n are the horizontal optical flow fields after and before the n-th iteration, and v_{n+1} and v_n are the corresponding vertical optical flow fields. We can verify that v_1 and v′_1, the two consecutive vertical optical flow fields at the first iteration, satisfy the following: if the time gradients of the two optical flow fields are opposite, that is, ∇t′ = −∇t, then v′_1 = −v_1.
By induction, when n = k, v′_{k+1} = −v_{k+1}; that is, Equations (14) and (15) are opposite, where v_{k+1} and v′_{k+1} represent the vertical optical flow field matrices of the previous and the next flow field at the (k + 1)-th iteration, respectively.
By adding Equations (16) and (17) and substituting v′_{k+1} = −v_{k+1}, ∇x = ∇x′, and ∇y = ∇y′ into Equations (16) and (17), respectively, we obtain u′_{k+1} = −u_{k+1}. Therefore, for fast-moving plankton, the values of the vertical optical flow field matrices at the spatial position of the plankton are opposite to each other, v′ = −v, and the same applies horizontally: u′ = −u.

The Volume of Plankton
Based on the above proof, one can count the number of pixels occupied by plankton and multiply by the actual size of a pixel to obtain the area of the plankton. The image resolution is known to be height × width. According to the camera's intrinsic parameters, the actual field of view is about W m by H m. The actual area is calculated as

S = N · (W/width) · (H/height),

where N is the number of pixels and S is the corresponding actual surface area. An approximate calculation is adopted here: we first obtain the radius of a circle with the same area as the plankton and then calculate the volume of the sphere with that radius. The advantage of this method is that the 3D volume of an irregular object can be obtained from its area alone [23]. In addition, we can predict the type of plankton based on the estimated size, laying the foundation for later identification of plankton species. The volume is calculated as

V = (4/3)πr³, with r = √(S/π).

The proposed method adds its own theoretical innovation to the original optical flow method and is proved mathematically. In this way, the complexity and passive motion patterns of plankton are handled well, and accuracy improves accordingly.
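The area-to-volume estimate above can be sketched in a few lines; the function name and argument order are our own, but the arithmetic follows the pixel-area scaling and equal-area-sphere approximation described in the text.

```python
import math

def plankton_volume(n_pixels, width, height, W, H):
    """Area-to-volume sketch: each pixel covers (W/width)*(H/height) m^2;
    the plankton is approximated by a sphere whose cross-sectional area
    equals the detected area S."""
    S = n_pixels * (W / width) * (H / height)  # actual area in m^2
    r = math.sqrt(S / math.pi)                 # radius of the equal-area circle
    return (4.0 / 3.0) * math.pi * r ** 3      # sphere volume in m^3
```

A larger detected pixel count always yields a larger estimated volume, so the volume curve inherits the trend of the count curve, consistent with the experiments reported later.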

Experimental Results and Analysis
The data were provided by the China National Deep Sea Center. The data set was obtained by an underwater robotic nondestructive testing system carried by a deep-sea manned submersible. The camera's technical specifications are: resolution, 1080i HDTV; minimum illumination, 2 lux; optical zoom, 10×; digital zoom, 12×; aperture range, 3.2-32 mm; video aspect ratio, 16:9 or 4:3. In this study, three six-minute videos of a plankton community, from its appearance to its disappearance from the screen, were selected. They were obtained from a submersible on a western Pacific seamount slope at diving depths of 2741.88 m and 5555.68 m, corresponding to dives 76 and 77, respectively. These three videos were selected because plankton appeared more frequently in them. Due to the complexity of the deep-sea environment and the irregular camera movement, the background is complex and dynamic. In this case, using high-precision image processing to study the plankton community from appearance to disappearance can effectively distinguish sediment clouds from plankton communities in the images. Examples of deep-sea plankton images are shown in Figure 5, and the details of the data set, including dive number, date, dive time, longitude, latitude, and depth, are shown in Table 1.

Number and Volume of Plankton
Processing the recorded video of a complete plankton community from appearance to disappearance from the screen yields the results shown in Figure 6. Figure 6a shows the variation in the number of plankton in the three six-minute videos, and Figure 6b shows the variation in volume for the corresponding three videos. The process of plankton appearing in front of the camera and then disappearing is shown in Figure 6c,d. In the first 30 s of Figure 6c, the amount of plankton is small and the detection results are more accurate. The amount of plankton rises in the last 30 s of Figure 6c. For dense particle clouds, overlap, and hence occlusion, occurs frequently, which leads to relatively low average precision and recall rates. The actual volume curve of plankton in the video is shown in Figure 6b,d. The volume curve and the quantity curve of plankton generally follow the same trend. At the 40th second in Figure 6c, the plankton community moves away from the camera and then comes back, appearing smaller, with a smaller overall volume, due to perspective. Hence, the volume curve in Figure 6d goes down and then back up.
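The counting step, which treats each connected component of the detection mask as one plankton, can be sketched with a simple 4-connected flood fill; this is a minimal stand-in (names are ours) for the connected-component labeling the method relies on.

```python
from collections import deque

def count_plankton(mask):
    """Count 4-connected components in a binary detection mask;
    each component is treated as one plankton (illustrative sketch)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1                      # new component found
                queue = deque([(i, j)])
                seen[i][j] = True
                while queue:                    # flood-fill the component
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

Applied frame by frame, this gives the per-frame plankton counts plotted over time; feeding each component's pixel count into the area-to-volume formula gives the corresponding volume curve.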

Comparison with Six Target Detection Methods
The proposed method is compared with six state-of-the-art methods for performance evaluation. The results are shown in Figure 7, where Figure 7a shows some original images from the video, including sediment clouds, plankton, and uneven backgrounds. The Top-Hat transform [24] is used to detect the locations of plankton, as shown in Figure 7b; the weakness of this algorithm is that some plankton are missed. Figure 7c,d show the detection results of the frame difference method [25] and the motion estimation and image matching method [26], respectively. The result of the scan-line marking method [27] is shown in Figure 7e; the results of the simple block-based sum of absolute differences flow (SD) method [28] and the Lucas-Kanade (LK) optical flow method [29] are given in Figure 7f,g. The weakness of the latter three methods is that they produce a few false positives, and the methods of Figure 7c,e mistakenly detected the sediment cloud in the background. The result in Figure 7h is obtained with the proposed method. Comparing against the manual ground truth, the plankton detected by the proposed method are more consistent with the original image in Figure 7a.
We take 20 frames from the video, and the data are cleaned by manual counting to obtain the ground truth. We then compare the number of plankton, recall rate, precision rate, and F1-score of the seven methods. Using 10 frames from the first 30 s of the video, where the amount of plankton is small and the detection results are more accurate, the average precision rate is 0.901, the average recall rate is 0.955, and the F1-score is 0.927. The metrics and related symbols are given in Table 2 and Equations (21)-(23). The results are shown in Tables 3 and 4. Using 10 frames from the last 30 s of the video, the amount of plankton is high; for dense particle clouds, overlap and hence occlusion occur frequently, so the average precision and recall rates are relatively lower, i.e., 0.895 and 0.943, respectively, and the F1-score is 0.918. These results are shown in Tables 5 and 6. In addition, we randomly selected 10 frames from the video for testing; the experimental results are shown in Tables 7 and 8, and the performance of the proposed method remains very good. The best results in each category are highlighted in bold in Tables 4, 6, 8 and 9.
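The scores above follow the standard definitions of precision, recall, and F1; a small helper (names are ours) reproduces them from true-positive, false-positive, and false-negative counts.

```python
def detection_scores(tp, fp, fn):
    """Standard detection metrics: precision, recall, and their harmonic
    mean (the F1-score), from per-frame counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a precision of 0.901 and a recall of 0.955 combine to an F1-score of about 0.927, consistent with the values reported for the first 30 s.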

Discussion of Parameters
For each imaging system, there is a depth of field within which the closest and farthest objects in the field are all in focus. If the system were deployed in air, the light intensity for near-field and far-field objects would, in theory, not differ. However, when deployed in seawater, the light intensity changes as the light propagates from near field to far field because of scattering caused by seawater and the particles in it. Therefore, two situations need to be discussed in the experiments. First, 'grayscale invariance' is one of the prerequisites of the HS optical flow method, but in practice the amount of grayscale change is often close to 0 rather than exactly 0. Therefore, the threshold β1 is set to handle this situation, as shown in Equation (24).
Second, when there is no plankton and the optical flow happens to be small, the two optical flow values may not be opposite while their sum still satisfies Equation (24); the threshold β2 is set to handle this situation, as shown in Equation (25).
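Equations (24) and (25) are not reproduced in this text, so the following is only a guess at their combined effect: a pixel is kept when the two consecutive flows are approximately opposite (their sum is below β1) and the flow magnitude itself is large enough (above β2) to rule out still background. The function and its exact form are our assumptions.

```python
import numpy as np

def plankton_mask(u1, v1, u2, v2, beta1=0.2, beta2=6.0):
    """Hypothetical dual-threshold detection rule (sketch):
    beta1 bounds how far the two flows may be from exact opposites,
    beta2 rejects near-zero flows that are trivially 'opposite'."""
    approx_opposite = (np.abs(u1 + u2) < beta1) & (np.abs(v1 + v2) < beta1)
    fast_enough = np.hypot(u1, v1) > beta2
    return approx_opposite & fast_enough
```

The default values sit inside the ranges traversed in the parameter search below (β1 in 0.05-0.35, β2 in 3-9).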
The best threshold values are obtained by traversing the ranges: β1 from 0.05 to 0.35 with a step size of 0.05, and β2 from 3 to 9 with a step size of 1. The original images and all those resulting from the different thresholds are then represented as vectors. Finally, we calculate the cosine similarity between two images, i.e., the cosine of the angle between the two vectors; the larger the cosine similarity between the two vectors, the more similar the two images are. The results are shown in Table 9. As shown in Figure 8, Figure 8a is the original image, Figure 8c shows the result of using the threshold β2, and the result without the threshold β2 is shown in Figure 8b.
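The image-to-image similarity used to rank threshold settings can be sketched as follows: flatten each image to a vector and take the cosine of the angle between the two vectors (the function name is ours).

```python
import math

def image_cosine_similarity(img_a, img_b):
    """Flatten two equal-sized grayscale images into vectors and return
    the cosine of the angle between them (1.0 = identical direction)."""
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

In the parameter search, the (β1, β2) pair whose result image has the highest cosine similarity to the original is selected.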

Time Complexity Comparison
The time complexity comparison of the proposed method and the six state-of-the-art methods is provided in Table 10. We select a one-minute video of 1440 frames and measure the computation time of the different methods. Although the proposed method does not have a great advantage in terms of time complexity, it outperforms the other methods in accurate plankton detection. In terms of detection efficiency, several experimental comparisons were carried out. On the same one-minute video, the computation time and recall rate of four different sampling strategies are compared: we sample pixels at an interval of 1 pixel and sample frames from the full sequence at an interval of 1 frame. According to the results shown in Table 11, the pixel interval has a weak influence on the error of the result; the recall rate, precision rate, and F1-score remain the closest to those of the original images, and the detection efficiency is improved by greatly reducing the computation time.
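The interval-sampling strategies compared above can be sketched as follows; the function name and step values are illustrative assumptions (an interval of 1 between samples corresponds to a step of 2).

```python
import numpy as np

def subsample(frames, pixel_step=2, frame_step=2):
    """Keep every frame_step-th frame and every pixel_step-th pixel in each
    spatial dimension, trading a little accuracy for computation time."""
    return [frame[::pixel_step, ::pixel_step] for frame in frames[::frame_step]]
```

Halving the frame rate and the spatial resolution reduces the per-video pixel count by roughly a factor of eight, which is where the large reduction in computation time comes from.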

Conclusions
Detection of plankton plays an important role in the exploration and research of deep-sea areas. Variations in the quantity and spatial distribution of plankton determine the function of the entire marine ecosystem. In this paper, we introduce a method for deep-sea plankton community detection in the marine ecosystem with an underwater robotic platform. Compared with traditional methods, our method simultaneously improves the precision and recall of plankton detection. The obtained results and the proved theory provide a scientific basis for studying the material cycle and energy flow of deep-sea ecosystems. In future work, with a view to strengthening the proposed solution, we aim to improve our plankton detection approach and then conduct studies on plankton recognition and identification of species.
Author Contributions: Conceptualization, J.W. and Z.D.; methodology, J.W. and M.Y.; writing-original draft preparation, J.W., K.K. and D.W.; visualization, J.R. and K.K.; writing-review and editing J.W., K.K., Q.Z. and J.R. All authors have read and agreed to the published version of the manuscript.