Segmentation of laterally symmetric overlapping objects: application to images of collective animal behaviour

Video analysis is currently the main non-intrusive method for the study of collective behaviour. However, 3D-to-2D projection leads to overlapping of observed objects. The situation is further complicated by the absence of stable shapes for the majority of living objects. Fortunately, living objects often possess a certain symmetry, which was used as a basis for morphological fingerprinting. This technique allowed us to record forms of symmetrical objects in a pose-invariant way. When combined with image skeletonization, this gives a robust, nonlinear, optimization-free, and fast method for detection of overlapping objects, even without any rigid pattern. This novel method was verified on fish (European bass, Dicentrarchus labrax, and Tiger Barbs, Puntius tetrazona) swimming in a reasonably small tank, which forced them to exhibit a large variety of shapes. The correct number of objects was determined for 88% of overlaps and the mean Dice-Sørensen coefficient was 0.84, implying that this method is feasible in real-life applications such as toxicity testing.


Introduction
The study of collective behaviour is a challenging task for any method of individual object tracking.
Since the artificial marking of individuals, e.g. by an electronic device, can affect their behaviour 1, it is necessary to use non-intrusive methods. Due to its excellent spatial and temporal resolution, video analysis is the most prominent among these methods. Its disadvantage is that it only creates a 2D projection of observed objects that are in reality always 3D. The 2D projection of the whole space unavoidably leads to overlapping of objects in the image. Moreover, information about the shapes and textures of objects is irrecoverably lost. In tracking of individual objects, this can be crucial, especially if the density of the objects is high 2. This holds mainly for video analysis of typical collective behaviour such as fish schooling, bird flocking, or crowds of people, where the individuals overlap most of the time [3][4][5].
The most commonly used solution to the problem of multiple object tracking with machine vision systems is to ignore the overlaps. The overlaps can typically be detected from morphological parameters (e.g., area, perimeter, eccentricity, and their combinations) of the binary image mask, and the trajectories of the individuals are then reconstructed using track extrapolation 6, a particle filter 7, or, more frequently, Kalman movement prediction 8,9. These methods work well for less crowded scenes where, in addition, the mobility of the individuals is low. Solutions that are directly aimed at observing the interactions in groups usually utilize texture matching before and after collision 10 instead of movement prediction. This introduces some robustness into the approach, but important information about the movement is not utilized.
None of the methods mentioned above can track individuals in very dense scenes, where the objects overlap most of the time. For these complicated cases, rigid pattern matching can be used 11. This method requires the shape of the object to be invariant, which is rarely the case in practical applications, and thus its detection rate is low. Most of the more advanced solutions rely on complex nonlinear optimization to fit a set of ellipsoids around a fish's central line in 3D 12. This approach works for dense scenes, but the model contains many degrees of freedom, which leads to significant inaccuracies, and it requires high-resolution imaging and a high frame rate. In addition, this method is not single-image and requires knowledge of the number of objects and the previous state of the system. However, dependence on the previous state unavoidably leads to error propagation, and the number of objects is not always known, for example in cases where the observed volume contains hideouts or is not fully open.
A method which is reliable under such conditions is crucial for the study of collective behaviour, since many interesting behavioural patterns are observed mainly in complex environments with obstacles, hideouts, and inanimate models 13,14 .
In this paper, we propose, for the first time, a robust, nonlinear, optimization-free, and pose-invariant way of resolving overlaps of laterally symmetric objects. The method is truly single-image and, as verified on image sets of fish schools, tolerant of severe data corruption and low image resolution.

Material and methods
Experiment design and image preprocessing
To describe and validate the method, two species of fish were video-recorded and analyzed: the relatively large European bass (Dicentrarchus labrax) and the Tiger Barb (Puntius tetrazona), a popular aquarium fish.
Two experiments were conducted with European bass. In the first experiment, only one individual was recorded in a relatively small (d = 3 m) circular tank by an IR camera with 1280 × 1024 resolution. The scene was illuminated by an 830-nm, 60-mW IR diode placed above the tank.
The required set of individual object fingerprints was collected during this experiment. In the second experiment, 20 individuals of European bass swam together in the same tank. We note that the use of an IR camera is not essential to the method, as the same results may be obtained with an ordinary video camera. The contours of the detected individuals, as well as of the overlaps, are very noisy because the coloration of the fish texture in the IR depends on the depth at which the fish are swimming. The average length and width of the fish projected on the camera were 190 px and 110 px (3 µm²/px), respectively, in 12-bpc colour depth.
For Tiger Barb, only one experiment was conducted, with 6 individuals swimming in a small (375 × 210 mm²) tank. All sides of the aquarium were observed simultaneously using a mirror system. Data collection was done separately for each view of the aquarium. Classification of the detected objects into single fish and overlapping fish was performed using perimeter and area thresholds. The typical projected size of the fish in the bottom and top views of the tank was 50 × 30 px (5 µm²/px), in 12-bpc colour depth.
For both datasets, foreground detection was performed using the Gaussian Mixture Model 15,16 with 8 Gaussians. Two consecutive image dilations 17 with a 1 px diamond structuring element fixed the detection artifacts. From the training set, binary masks of individual European bass and of overlapping ones were manually extracted. For Tiger Barb, automatic classification was used to distinguish masks of individual fish from masks of overlaps. It was assumed that the analyzed object was a single fish if the area of the produced mask obeyed: where A is the area of the mask, A_s are the areas of all detected blobs, and σ is the standard deviation. If the mask area fell outside this range, the object was classed as overlapping individuals. The binary masks of the detected objects were used as the input data for the proposed algorithm (figure 1).
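The automatic single/overlap classification can be illustrated with a minimal Python sketch. The exact inequality is not reproduced here, so the band used below (mean of all blob areas plus or minus k standard deviations) is an assumption made purely for illustration, as are the function name `classify_blobs` and the parameter `k`.

```python
from statistics import mean, pstdev


def classify_blobs(areas, k=1.0):
    """Label each blob area as 'single' or 'overlap'.

    ASSUMPTION: a blob counts as a single fish when its area lies within
    k standard deviations of the mean area of all detected blobs.
    The band actually used in the study may differ.
    """
    centre = mean(areas)
    spread = pstdev(areas)  # population standard deviation of all blob areas
    labels = []
    for area in areas:
        inside = abs(area - centre) <= k * spread
        labels.append("single" if inside else "overlap")
    return labels
```

A blob whose area falls outside the band in either direction is labelled as an overlap, mirroring the rule that any mask outside the accepted range is treated as overlapping individuals.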

Solution search
The most accurate fits were found when we applied a simple greedy search to the known set of solutions (figure 1c). First, however, two quantities had to be determined: the unknown count of objects and the cost function of the greedy search. The unknown count of objects was resolved by introducing stop criteria. Intuitively, these criteria are related to the optimal coverage of overlaps by the reconstructed fingerprints. They require weights for the pixels of the overlapping contour. The weights can vary from 1 (absence of a reconstructed object in the vicinity of the corresponding contour pixel) to 0 (ideal coincidence). The stop criteria were defined as: where N is the count of pixels in an overlapping contour, W_n is the weight of the n-th pixel in the contour, and F is the robustness of the method. We used F = 0.22 which means the algorithm will
The excessively large distances are then eliminated from the comparison by replacing them with the reference. The normalized cross-correlation with zero lag was used as the measure of similarity.
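Returning to the stop criteria introduced earlier in this section, the following minimal Python sketch shows how such a check might be evaluated. It assumes that the search stops once the mean weight of the contour pixels falls below the robustness F; this form is an assumption for illustration and is not taken from the defining equation. The function name `should_stop` is hypothetical.

```python
def should_stop(weights, robustness=0.22):
    """Return True when the contour is considered sufficiently covered.

    ASSUMPTION: the criterion compares the mean of the per-pixel weights
    W_n (1 = uncovered, 0 = ideally covered) with the robustness F.
    """
    if not weights:
        raise ValueError("the contour must contain at least one pixel")
    return sum(weights) / len(weights) < robustness
```

With F = 0.22, a contour in which 80% of the pixels have weight 0 and the rest weight 1 would stop the search under this assumed form, whereas one with 70% coverage would not.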
To handle the broken distances, the correlation was decreased by the number of broken distances relative to the number of equidistant points. The local cost was then defined as where D is a distance measured from the central line to the overlapping contour, R_m is the reference distance, M is the number of doubled distances, and s is the forward or backward orientation of the reference distance.
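The similarity measure described above can be sketched in Python as follows. The sketch replaces excessively large distances with the reference, computes the zero-lag normalized cross-correlation, and subtracts the fraction of broken distances. Several details are assumptions: the threshold `max_ratio` for calling a distance "broken", the non-mean-centred form of the correlation, and the function name `penalised_similarity`. The conversion of this similarity into the local cost is not shown.

```python
import math


def penalised_similarity(distances, reference, max_ratio=2.0):
    """Zero-lag normalized cross-correlation with a penalty for broken distances.

    ASSUMPTIONS: a distance is 'broken' when it exceeds max_ratio times the
    reference distance, and the correlation is not mean-centred.
    """
    if not distances or len(distances) != len(reference):
        raise ValueError("profiles must be non-empty and of equal length")
    cleaned, broken = [], 0
    for d, r in zip(distances, reference):
        if d > max_ratio * r:
            cleaned.append(r)  # replace the excessive distance with the reference
            broken += 1
        else:
            cleaned.append(d)
    num = sum(c * r for c, r in zip(cleaned, reference))
    den = math.sqrt(sum(c * c for c in cleaned) * sum(r * r for r in reference))
    ncc = num / den if den > 0 else 0.0
    # decrease the correlation by the relative number of broken distances
    return ncc - broken / len(reference)
```

A profile identical to the reference scores 1, and each broken distance lowers the score by one over the number of equidistant points.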
The global and local cost functions are required simultaneously, so the resultant cost contains their product. The remaining issue in searching for solutions, the uniqueness of the solutions, can be resolved by introducing a degree of uniqueness. The degree of uniqueness is defined as the median of the weights of the points of the overlapping contour which are the closest to the solution: where arg is the index of the value. The measure of uniqueness is maximized. This corresponds to the intuitive idea that a solution is correct if it is localized near points where there are no other solutions.
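The degree of uniqueness described above can be illustrated with a short Python sketch: for every point of a candidate solution, the index of the nearest contour point is found, and the median of the corresponding weights is returned. The Euclidean metric, the representation of points as (x, y) tuples, and the function name `uniqueness` are assumptions made for illustration.

```python
import math
from statistics import median


def uniqueness(solution_pts, contour_pts, weights):
    """Median weight of the contour points nearest to each solution point.

    ASSUMPTION: 'closest' means smallest Euclidean distance.
    """
    nearest_weights = []
    for sx, sy in solution_pts:
        # index (arg) of the contour point closest to this solution point
        idx = min(
            range(len(contour_pts)),
            key=lambda i: math.hypot(contour_pts[i][0] - sx, contour_pts[i][1] - sy),
        )
        nearest_weights.append(weights[idx])
    return median(nearest_weights)
```

A value close to 1 indicates that the candidate lies along parts of the contour not yet explained by other solutions, while a value close to 0 indicates that it coincides with an already accepted solution.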
Combining the above, the total cost was defined as where global and local are global and local costs, respectively, as defined in Eqs. (4)-(5) and uniq is a degree of solution uniqueness as introduced in Eq. (6).
Searching for solutions includes three main steps: finding the solution with the minimal cost, renewing the weights, and checking the stop criteria. To determine the weights, discrepancies 13 between the solution and contour are calculated. The discrepancies are defined as the mean of the distances between the points of the solution and the nearest 3 points of the overlapping contour.
We denote this parameter as fuzziness. All weights for which the minimal distance from the corresponding point to the solution contour was less than three times the fuzziness were divided by the term 3 × fuzziness. This feedback was intended to eliminate coincident solutions.
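The three steps of the search (select the cheapest solution, renew the weights, check the stop criteria) can be sketched in Python as below. This is an illustration under stated assumptions, not the MATLAB prototype: the cost is passed in as a hypothetical callable `cost(solution, contour, weights)`, the stop check uses an assumed mean-weight form, the divisor is clamped to at least 1 so that weights never grow, and `max_objects` is an added safety limit.

```python
import math


def fuzziness(solution_pts, contour_pts):
    """Mean distance from each solution point to its 3 nearest contour points."""
    total, count = 0.0, 0
    for sx, sy in solution_pts:
        dists = sorted(math.hypot(cx - sx, cy - sy) for cx, cy in contour_pts)
        nearest = dists[:3]
        total += sum(nearest)
        count += len(nearest)
    return total / count


def renew_weights(weights, solution_pts, contour_pts):
    """Divide the weights of contour points lying within 3 x fuzziness."""
    radius = 3.0 * fuzziness(solution_pts, contour_pts)
    divisor = max(radius, 1.0)  # ASSUMPTION: clamp so that weights stay in [0, 1]
    updated = []
    for w, (cx, cy) in zip(weights, contour_pts):
        d_min = min(math.hypot(cx - sx, cy - sy) for sx, sy in solution_pts)
        updated.append(w / divisor if d_min < radius else w)
    return updated


def greedy_search(candidates, contour_pts, cost, robustness=0.22, max_objects=10):
    """Greedily pick solutions until the assumed stop criterion is met."""
    weights = [1.0] * len(contour_pts)  # 1 = contour pixel not yet explained
    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < max_objects:
        # step 1: solution with the minimal cost
        best_i = min(
            range(len(remaining)),
            key=lambda i: cost(remaining[i], contour_pts, weights),
        )
        best = remaining.pop(best_i)
        chosen.append(best)
        # step 2: renew the weights around the accepted solution
        weights = renew_weights(weights, best, contour_pts)
        # step 3: ASSUMED stop criterion, mean weight below the robustness F
        if sum(weights) / len(weights) < robustness:
            break
    return chosen, weights
```

Because the weights near an accepted solution shrink, a candidate that coincides with it becomes less attractive to any weight-aware cost in later iterations, which is the feedback effect described above.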

Results and discussion
The morphological fingerprinting technique demonstrates great robustness and stability for both testing datasets: high-resolution (in terms of pixels per object) but noisy images of European bass and low-resolution images of Tiger Barb. We present the results on the manual segmentation only for the European bass dataset, but results on the automatic segmentation of the Tiger Barb dataset are available in supplementary materials 24 and, thus, their validity may be evaluated visually (e.g., figure 4). In the case of extremely low-resolution images, the method works well mainly for dorsal and ventral views of the fish (figure 4).
The mean time of calculation per overlap was 5 s. The code is written as a prototype in MATLAB and is not fully optimized. A substantial amount of time is consumed by the self-overhead of functions, and thus the computation time can be reduced significantly.
Both data sets, all code, verification tools, and the GUI for the fingerprinting and collision solving are available in the supplementary materials 24.

Conclusions
The solving of overlaps itself is a non-trivial task, especially if objects are non-convex and have no strict shape. We present a method that combines only known morphological properties and symmetries of objects with empirical features of image skeletons. Such an approach attains a significant efficiency (hardly attainable by any model-free approach) both in quality and speed.
The method has been applied to the dorsal and ventral sides of the fish but is also expected to work, at least in part, with semi-symmetrical objects such as the lateral sides of fish. The same approach with minor changes may be applied (with even greater efficacy than in the case of fish) to other elongated organisms that frequently overlap, e.g., worms and snakes. We believe that the developed method can greatly improve existing tracking systems and will initiate the development of new ones, which will facilitate the processing of previously intractable data.