Automating Jellyfish Species Recognition through Faster Region-Based Convolution Neural Networks

In recent years, citizen science campaigns have provided a valuable platform for widespread data collection. Within the marine domain, jellyfish are among the species most commonly targeted by citizen reporting campaigns. The timely validation of submitted jellyfish reports remains challenging, given the sheer volume of reports being submitted and the relative paucity of trained staff familiar with the taxonomic identification of jellyfish. In this work, hundreds of photos that were submitted to the “Spot the Jellyfish” initiative are used to train a group of region-based, convolution neural networks. The main aim is to develop models that can classify, and distinguish between, the five most commonly recorded species of jellyfish within Maltese waters. In particular, images of Pelagia noctiluca, Cotylorhiza tuberculata, Carybdea marsupialis, Velella velella and salps were considered. The reliability of the digital architecture is quantified through the precision, recall, f1 score, and κ score metrics. Improvements gained through the application of data augmentation and transfer learning techniques are also discussed. Very promising results were obtained, which support upcoming aspirations to embed automated classification methods within online services, including smart phone apps. Such methods can reduce, and potentially eliminate, the need for human expert intervention in validating citizen science reports for the five jellyfish species in question, thus providing prompt feedback to the citizen scientist submitting the report.


Introduction
Citizen science incorporates scientific research and monitoring projects for which members of the public collect, categorise, transcribe or analyse scientific data. Citizen science has definitely come of age in recent years in generating knowledge, creating new learning opportunities and enabling civic participation [1]. The marine conservation and biodiversity monitoring potential of citizen science campaigns is increasingly being recognised [2,3], being deployed, for example, in the monitoring of invasions by marine alien species [4] and in an array of other marine applications summarised in [5]. Despite an increased uptake of marine monitoring protocols based on the citizen science rationale, a number of challenges still undermine the mainstreaming of the same protocols. Chief among these challenges is the need to validate individually submitted citizen science reports in order to avoid the contamination of the main dataset with misidentification reports. In fact, for citizen science projects to fully fulfil their potential, they need to generate good-quality datasets [6].
A decade ago, smart phone apps and the virtual/gaming approach were considered as cutting-edge, emerging technologies which held great promise for the advancement of citizen science [7]. The state-of-the-art has progressed to a state where smart phone apps are being enhanced with a new suite of sensors to enable greater citizen science functionality of the same technologies. Given the image-intensive nature of many ongoing marine citizen science campaigns, the image analyses domains of Artificial Intelligence and Machine Learning have increasingly been applied to automate the validation of the same images [8] and thus to expedite the normally lengthy process. These protocols have been incorporated within fixed-location, jellyfish bloom early-warning systems, including the JellyMonitor prototype deployed on the Scottish seabed and which makes use of both classic computer vision and deep learning neural networks to detect the occurrence of jellyfish blooms [9]. The automated processing of underwater footage, besides static imagery, has also been achieved through the same protocols. For instance, the JellyToring system automatically detects and quantifies different species of jellyfish based on a deep object detection neural network, allowing the user to automatically record jellyfish presence during long periods of time [10]. To date, our University of Malta research group has proposed an image-analysis protocol for the characterisation of Large Microplastics (LMPs) extracted from beach sediment [11].
Jellyfish swarms, or blooms, consist of large numbers of jellyfish which aggregate occasionally, mainly as a result of the direction of prevailing water currents. The occurrence of such aggregations has attracted increased attention in recent years (see, e.g., [12]) by virtue of their socio-economic impact. The "Spot the Jellyfish" initiative, run by the University of Malta and the International Ocean Institute (IOI) since June 2010, follows a citizen science approach and relies on the collaboration of the general public, mariners, divers and especially the younger generations through their teachers and parents, by recruiting their assistance in recording the presence and location of different jellyfish within Maltese waters. The reporting is done by matching a sighted jellyfish with a simple visual identification guide, giving the date and time of the sighting, and indicating the number of jellyfish individuals observed. Sightings can also be reported online, through the campaign's social media page, via SMS or email. Citizen scientists are also encouraged to supplement their report with a photo, which is used for verification purposes and for quality control of the database. All jellyfish reports within this campaign are also shared with Maltese tourism authorities for beach management purposes. Current citizen science validation protocols are labour-intensive and subject to human error. As a result, citizen scientists do not receive prompt feedback on their submitted reports, which may lead to the disengagement of the same contributors. On the positive side, current protocols are characterised by a high degree of taxonomic precision, such that the rate of misidentification is generally very low.
In this work, the hundreds of photos submitted to the "Spot the Jellyfish" initiative for the five most commonly recorded species within Maltese waters (Pelagia noctiluca, Cotylorhiza tuberculata, Carybdea marsupialis, Velella velella and salps) are used to train and evaluate the performance of a custom-designed, region-based Convolution Neural Network (R-CNN). The main aim of the developed digital architecture is to correctly identify the jellyfish species recorded in the submitted photo, thus eliminating or reducing the need for human expert intervention and providing prompt feedback to the citizen scientist. Semiquantitative estimates of jellyfish individual abundance will also be generated by the algorithm when presented with photos of jellyfish blooms. A region-based method is applied to identify and record the jellyfish sighting and the confidence level for the model's taxonomic identification output. Over the last decade, R-CNNs have been researched and successfully applied to a wide range of applications, including the detection of tumours [13], the detection of ships [14], the detection of debris on airfield pavement [15], the early warning of sharks in coastal areas through the use of drones [16], traffic sign recognition [17], the detection of three-dimensional objects during autonomous driving [18], and the classification of cracks in concrete [19].
The most basic form of R-CNN makes use of an external algorithm such as the Edge Boxes method to come up with region proposals [20]. Pixels corresponding to the identified regions are then extracted, resized and processed via a support vector machine (SVM) that is trained using CNN features [21]. The Fast R-CNN implementation makes use of the Edge Box algorithm to identify regions that might contain an object of interest [22]. Although the CNN features of the identified regions are extracted, the results coming from overlapping parts are shared. While this makes the process faster, the entire images still need to be processed. In the Faster R-CNN method, a Region Proposal Network (RPN) is embedded within the detector to eliminate the need of an external algorithm to identify important sub-areas [23]. Apart from making the implementation faster, this allows the detection of regions to be attuned to the dataset itself. RPNs are able to identify valid boxes of any size quickly, by making use of anchor boxes. In particular, processing is carried out within a set of predefined boxes, with dimensions that correspond to the objects that are to be identified. This eliminates the need to perform computations on every possible position of a sliding window. In this work, the anchor boxes allow the definition of the expected aspect ratios of image regions that show a jellyfish. The classification models then try to fit objects within the given ratios. This approach allows for a better performance to be obtained when multiple jellyfish cover each other, such as in the case of a bloom. Faster R-CNNs are designed to give quick and accurate results, even when a large number of regions are defined. Being a deep learning method, features are extracted from the provided data automatically. Throughout the supervised learning process, filters in the hidden layers are modified so as to extract important features and minimise the prediction error.
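As an illustration of the anchor box idea described above, the following Python sketch generates a small set of anchors of fixed area but different width-to-height ratios, and scores each one against a labelled box using the intersection over union (IoU). It is a minimal sketch under stated assumptions and not the MATLAB implementation used in this study: the function names, the base size of 64 pixels, the three aspect ratios and the example box are all hypothetical.

```python
from itertools import product


def make_anchors(center_x, center_y, base_size, aspect_ratios, scales=(1.0,)):
    """Return anchors as (x_min, y_min, x_max, y_max), all centred on one point.

    Each anchor has area (base_size * scale)**2; ratio is width / height.
    """
    anchors = []
    for scale, ratio in product(scales, aspect_ratios):
        area = (base_size * scale) ** 2
        width = (area * ratio) ** 0.5
        height = area / width
        anchors.append((center_x - width / 2, center_y - height / 2,
                        center_x + width / 2, center_y + height / 2))
    return anchors


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: three anchors at the centre of a 224-pixel-wide frame,
# compared with a tall, narrow labelled box (e.g., a jellyfish with trailing tentacles).
anchors = make_anchors(112, 112, base_size=64, aspect_ratios=(0.5, 1.0, 2.0))
labelled_box = (90, 60, 134, 164)
scores = [iou(anchor, labelled_box) for anchor in anchors]
best_anchor = anchors[scores.index(max(scores))]
```

In this toy case the tall anchor (ratio 0.5) overlaps the labelled box best, which is the intuition behind choosing anchor shapes that match the expected proportions of the objects in the dataset.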
All development in this study was done using MATLAB by MathWorks (https://www.mathworks.com). In particular, the Computer Vision Toolbox was used to label and augment the dataset, as well as to train the models. The Image Labeler App was used to define bounding boxes and the corresponding species types around every visible jellyfish in all images within the database. All of the annotated data and the developed code will be made available upon request. The public distribution of such files will allow for a common base of labelled images that can be used to benchmark and compare the performance of future jellyfish classification algorithms.
The ultimate objective is to embed the optimised algorithm within the "Spot the Jellyfish" campaign website as well as on a mobile phone app. This service will simplify and speed up the report validation process drastically, providing a prompt response to citizen scientists on the taxonomic identity of the recorded species, and this is the main motivation behind this exercise. When fully implemented, users would just need to capture and upload an image of the observed jellyfish. The time and geolocation information will be collected from the mobile phone and the server-based Artificial Intelligence software will complete the information by estimating the number of jellyfish individuals in the bloom, as well as suggesting the taxonomic identity of the sighted jellyfish species. All the submitted photos will be used for online learning and to continuously improve the classification model deployed by the algorithm in an adaptive approach. Once fully implemented, through embedding within a customised app, for instance, the optimised algorithm can serve as a blueprint for citizen science campaigns relying on image analyses. In fact, by facilitating the automated extraction and interpretation of information from submitted images, and thus the report validation process, the same algorithm can make a positive contribution by improving the efficacy and attractiveness of the same campaigns.

Image Dataset
The "Spot the Jellyfish" citizen science campaign provides an interactive interface that shows a map of the Maltese Islands with all the validated jellyfish sighting reports received by connecting to the central campaign database [24]. In order to train the developed classification model, all of the jellyfish photos submitted through the campaign were initially organised in different directories, according to species. Additional photos for each jellyfish species were also obtained from online sources retrieved through the Google image search feature. A mixture of different submitted jellyfish image typologies was availed of, consisting of images that show jellyfish within the water column, jellyfish beached on shores, jellyfish photographed against complex backgrounds as well as a mixture of images with and without watermarks. Figure 1 shows images of Pelagia noctiluca, Cotylorhiza tuberculata, Carybdea marsupialis, Velella velella and salps, while Table 1 highlights the number of images defined in each class. In total, 602 images were used to train the classification model, while 200 images were used for its testing. The 40 images used to test the classification accuracy of each species were chosen randomly from the available set of photos.
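The random selection of a fixed number of test images per species can be sketched as follows. This is only an illustration of the split described above, written in Python rather than in the MATLAB environment used for the study; the function name, the seed and the file names are hypothetical, and the per-species image counts in the example are placeholders rather than the values of Table 1.

```python
import random


def split_per_species(images_by_species, n_test=40, seed=0):
    """Randomly hold out n_test images per species; the rest are used for training."""
    rng = random.Random(seed)  # fixed seed so that the split is reproducible
    train, test = {}, {}
    for species, paths in images_by_species.items():
        if len(paths) < n_test:
            raise ValueError(f"not enough images for {species}")
        shuffled = list(paths)
        rng.shuffle(shuffled)
        test[species] = shuffled[:n_test]
        train[species] = shuffled[n_test:]
    return train, test


# Placeholder data: 50 hypothetical file names for each of two species.
example = {
    "Pelagia noctiluca": [f"pelagia_{i}.jpg" for i in range(50)],
    "Velella velella": [f"velella_{i}.jpg" for i in range(50)],
}
train_set, test_set = split_per_species(example, n_test=40)
```

Holding out the same number of images per species keeps the test set balanced, so that each column of a confusion matrix sums to the same value.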
The original images were of varying sizes. In the preprocessing stage, the containment area within each image that showed one or more jellyfish individuals was defined by recording the lower left coordinate, the width, and the height (in pixels) of the bounding box. During this stage, some of the images were manually cropped to remove unnecessary image background so that the species occupied around 60 percent of the image. The process is demonstrated in Figure 2. In a second processing stage, all images, together with the corresponding boundaries that specify the regions containing the jellyfish, were resized so that the maximum width was 224 pixels. These dimensions were chosen in order to apply transfer learning protocols and take advantage of models that were pretrained using the same image sizes. The rasters were downscaled using a bicubic interpolation scheme with antialiasing. Up to this stage, the image height was left to vary proportionally, so as not to distort the images.
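When an image is resized, its bounding boxes have to be scaled by the same factor so that they still enclose the jellyfish. The Python sketch below shows only this geometric step, assuming boxes stored as (x, y, width, height) in pixels; the bicubic interpolation of the pixel values themselves is left to an image library. The function name is hypothetical, and the choice to leave images narrower than 224 pixels untouched is an assumption based on the description of downscaling above.

```python
def resize_to_max_width(image_size, boxes, max_width=224):
    """Scale an image size and its boxes so that the width is at most max_width.

    image_size: (width, height) in pixels.
    boxes: list of (x, y, w, h) tuples in pixels.
    Returns the new (width, height) and the list of scaled boxes.
    """
    width, height = image_size
    # Assumption: downscaling only, so narrower images keep their original size.
    factor = min(1.0, max_width / width)
    # The height follows proportionally, which preserves the aspect ratio.
    new_size = (round(width * factor), round(height * factor))
    scaled_boxes = [tuple(value * factor for value in box) for box in boxes]
    return new_size, scaled_boxes


# Hypothetical example: an 896 x 672 photo with one labelled jellyfish.
new_size, new_boxes = resize_to_max_width((896, 672), [(100, 200, 448, 336)])
```

Because the same factor is applied to both axes and to every box coordinate, the proportion of the image occupied by each box is unchanged by the resizing.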

Classification Model
Although some of the jellyfish images extracted from the "Spot the Jellyfish" campaign database were captured by SCUBA divers, and thus show the entire jellyfish against a consistently blue background, the majority of database photos were captured from a boat or from the shore. Therefore, even though the images were all of high resolution, in most cases the jellyfish individuals only occupied a small region of the frame. As a result, highly reliable detection and classification schemes were required to correctly identify the jellyfish in the images.
Initially, conventional convolution neural networks (CNNs) were trained to classify the raw and unprocessed images into five categories for the identification of Pelagia noctiluca, Cotylorhiza tuberculata, Carybdea marsupialis, Velella velella and salps jellyfish. While good results were generally obtained, the applicability of region-based convolution neural networks (R-CNNs) was subsequently tested and found to further improve the classification method's accuracy. Such deep learning techniques improve on the CNN models by following a two-stage approach. Initially, sub-regions within the image that might contain a jellyfish are identified. Features are then extracted from the convolved areas and adopted as an input for the classifier.
In this study, five classification models were implemented to distinguish between five different jellyfish species. In all cases, transfer learning was used to improve the accuracy. Such an approach involves the use of a pre-trained CNN that can extract features from thousands of natural images coming from various categories. The developed Faster R-CNN models were then optimised to extract features that are specific to jellyfish images. The five models were based on ResNet18, ResNet50 and GoogLeNet. The ResNet18 and ResNet50 networks are 18 and 50 layers deep, respectively. These have been trained on more than a million images [25]. On the other hand, GoogLeNet is 22 layers deep [26]. Since these networks are based on 224 × 224 images, all jellyfish frames had to be resized in order to ensure consistent dimensions. The resizing of images was carried out through a bicubic interpolation scheme. Table 2 provides a summary of the salient differences between the five deployed classification models. The time required to classify the 200 test images is also listed.
Data augmentation was also applied to increase the number of training examples. Deep learning techniques such as the Faster R-CNN method often rely on big data. However, in this case, only a few examples were available. In such circumstances, the model may overfit, fitting the training examples very closely without generalising. Although very accurate results will be noted during the training process, the performance on unseen images will not be equivalent. In this study, as only 602 training images were available, augmentation was used to mitigate overfitting. As discussed by Shorten and Khoshgoftaar [27], such a process helps to create synthetic instances that increase the size of the training set. In this case, the jellyfish images were flipped vertically and horizontally and, in each case, a different feature layer was defined. For the majority of cases, the number of anchors was set to three. However, in classification Model 5, this was increased to ten.
The test images were not part of the training set. As required by the models, these were resized to 224 × 224 pixels. When more than one jellyfish was detected within the submitted image, the one with the highest percentage probability was considered. The adopted framework for the training of each classifier is graphically represented in Figure 3.
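Flipping an image for augmentation also requires the bounding boxes to be mirrored, otherwise the labels no longer match the pixels. The following Python sketch illustrates this for horizontal and vertical flips. It is a simplified illustration rather than the augmentation code used in this study: the function names are hypothetical, boxes are assumed to be stored as (x, y, width, height) with (x, y) being the corner with the smallest coordinates, and images are represented as plain nested lists of pixel values.

```python
def flip_pixels(pixels, direction):
    """Flip an image given as a list of rows of pixel values."""
    if direction == "horizontal":
        return [row[::-1] for row in pixels]  # mirror each row left to right
    if direction == "vertical":
        return pixels[::-1]  # reverse the order of the rows
    raise ValueError("direction must be 'horizontal' or 'vertical'")


def flip_boxes(boxes, image_size, direction):
    """Mirror (x, y, w, h) boxes to match a flipped image of size (width, height)."""
    width, height = image_size
    flipped = []
    for x, y, w, h in boxes:
        if direction == "horizontal":
            flipped.append((width - x - w, y, w, h))
        elif direction == "vertical":
            flipped.append((x, height - y - h, w, h))
        else:
            raise ValueError("direction must be 'horizontal' or 'vertical'")
    return flipped


# Hypothetical example: one labelled box in a 224 x 168 image.
boxes = [(25, 50, 112, 84)]
horizontal = flip_boxes(boxes, (224, 168), "horizontal")
vertical = flip_boxes(boxes, (224, 168), "vertical")
```

Each flip keeps the box width and height unchanged and only moves the box, so every original image can contribute additional labelled training instances at no annotation cost.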

Results
The confusion matrices obtained from testing all five classification models with the same set of images are presented in Table 3. Here, the columns represent the true (actual) class of the jellyfish shown in the test images, while the rows correspond to the category predicted by the classifier. A "not classified" category was added to the list of categories within the "predicted" section, to denote the images that registered a jellyfish which was not characterised by the classifier. The test set contained 40 images for each jellyfish species. For a perfect classifier, the values within the diagonal would be 40, implying that the correct species was identified every time. As demonstrated below, this was not always the case. Confusion matrices help to identify whether a classifier is behaving randomly or whether for instance, it cannot distinguish between two classes.
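The construction of such a confusion matrix can be sketched in Python as follows, with rows for the predicted category (including "not classified") and columns for the true species. The sketch assumes, in line with the test procedure, that only the highest-scoring detection in each image is kept, and that an image with no detection is counted as "not classified". The function names and the detection scores in the example are hypothetical.

```python
SPECIES = [
    "Pelagia noctiluca",
    "Cotylorhiza tuberculata",
    "Carybdea marsupialis",
    "Velella velella",
    "salps",
]
NOT_CLASSIFIED = "not classified"


def top_label(detections):
    """Return the label of the highest-scoring (label, score) detection.

    An empty list means the detector found nothing in the image.
    """
    if not detections:
        return NOT_CLASSIFIED
    return max(detections, key=lambda detection: detection[1])[0]


def confusion_matrix(true_labels, predicted_labels, classes=SPECIES):
    """Count images per (predicted row, true column) pair."""
    rows = list(classes) + [NOT_CLASSIFIED]
    matrix = {pred: {true: 0 for true in classes} for pred in rows}
    for true, pred in zip(true_labels, predicted_labels):
        matrix[pred][true] += 1
    return matrix


# Hypothetical detections for three test images.
true_labels = ["salps", "salps", "Velella velella"]
detections_per_image = [
    [("salps", 0.9), ("Pelagia noctiluca", 0.4)],  # two detections, keep the best
    [],                                            # nothing detected
    [("Velella velella", 0.8)],
]
predicted = [top_label(d) for d in detections_per_image]
matrix = confusion_matrix(true_labels, predicted)
```

Off-diagonal counts that concentrate in one cell indicate two species that the classifier tends to confuse, whereas counts in the "not classified" row indicate images in which no jellyfish was characterised.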
The precision, recall and f1 score for every class by every model were computed by Equations (1)-(3). Here, TP represents the number of true positive cases and corresponds to the number of instances for which the correct class was predicted. On the other hand, FP corresponds to the false positives and indicates the number of times other jellyfish were incorrectly assigned to the class being tested. The instances that belong to the class under test but were labelled otherwise are treated as false negatives (FN).
$$\text{precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{recall} = \frac{TP}{TP + FN} \tag{2}$$
$$f_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{3}$$
The results obtained for the trained models are presented in Table 4. The precision score gives an estimate of the fraction of the predicted jellyfish species that actually belong to the same species. The recall score indicates the fraction of the instances of a given species that were correctly classified. Classifiers with high precision and recall scores are preferred. The f1 score combines these two metrics by taking their harmonic mean. The Cohen's κ score, which measures the agreement between the actual classes and the predicted ones, was computed as shown in Equation (4). For this metric, p_o represents the actual observed agreement and is equivalent to the accuracy of the model, while p_e is the agreement expected from a random classifier.
$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{4}$$
In this case, a value of 1 denotes perfect agreement, while a value of zero indicates that the classifier performance is similar to that of a random draw.
Table 3. Confusion matrices for classification.
Clearly, the implementation of transfer learning has allowed very accurate jellyfish species classifications to be made. The models that performed best were those that used the GoogLeNet feature extraction network. While an increase in the number of anchors resulted in a slower classification process, it also yielded a higher accuracy. As expected, ResNet50 gave better results than ResNet18 and took less than 4 additional seconds to classify the 200 test images.
Figures 4-8 present a sample of the results obtained through the implementation of classification Model 5.
Table 4. Precision, recall, f1 score and κ metrics for predictions by the five classification models. Columns: Model, Species, Precision, Recall, f1 score, κ.
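The precision, recall, f1 score and Cohen's κ described in this section can be computed directly from a confusion matrix whose rows are the predicted categories and whose columns are the true classes. The Python sketch below uses the standard definitions of these metrics. The function names and the two-class toy matrix are hypothetical and do not reproduce the values of Tables 3 and 4. Two assumptions are made about the extra "not classified" row: its counts are treated as false negatives of the true class, and it contributes nothing to the chance agreement in κ because no true class corresponds to it.

```python
def per_class_metrics(matrix, classes):
    """Return {class: (precision, recall, f1)}.

    matrix[i][j] is the number of images predicted as row i whose true class is
    column j; any rows beyond len(classes) hold unclassified images.
    """
    metrics = {}
    for k, name in enumerate(classes):
        tp = matrix[k][k]
        fp = sum(matrix[k]) - tp                 # predicted as k, but another class
        fn = sum(row[k] for row in matrix) - tp  # class k, but labelled otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics


def cohen_kappa(matrix, n_classes):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n_images = sum(sum(row) for row in matrix)
    p_o = sum(matrix[k][k] for k in range(n_classes)) / n_images
    p_e = sum(
        sum(matrix[k]) * sum(row[k] for row in matrix) for k in range(n_classes)
    ) / n_images ** 2
    return (p_o - p_e) / (1 - p_e)


# Toy matrix for two classes (rows: predicted A, predicted B, not classified).
toy_matrix = [
    [9, 2],
    [1, 7],
    [0, 1],
]
toy_metrics = per_class_metrics(toy_matrix, ["A", "B"])
toy_kappa = cohen_kappa(toy_matrix, n_classes=2)
```

For this toy matrix, class A has a precision of 9/11 and a recall of 9/10, while κ is approximately 0.62, lower than the raw accuracy of 0.8 because part of the agreement would be expected by chance.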

Conclusions
The Cohen's κ output values were high, ranging between 0.917 (Model 4) and 0.994 (Model 5), suggesting that the automated identification of jellyfish through images submitted to these classification models is generally robust and reliable. In this work, the default model parameters were used. The use of hyperparameter optimisation methods that will further fine-tune the selected model is recommended for future work. This exercise is expected to further improve on the results presented here.
In terms of precision values, the five jellyfish species adopted for the current study were classified correctly in the following decreasing order: Cotylorhiza tuberculata (4/5 models reported a value of 1 for precision), Velella velella (3/5 models reported a value of 1 for precision), salps (2/5 models reported a value of 1 for precision), Pelagia noctiluca (1/5 models reported a value of 1 for precision) and Carybdea marsupialis (precision values of 1 were not reported by any model).
Corresponding results obtained by Martin-Abadal et al. [10] similarly rank Cotylorhiza tuberculata (along with Rhizostoma pulmo, which was not considered in the current study) higher than Pelagia noctiluca in terms of correct identification success. These authors attribute this result to the fact that Cotylorhiza tuberculata is a larger jellyfish whose body remains relatively unchanged whilst swimming, thus rendering it more amenable to correct identification. In Pelagia noctiluca, by contrast, the position of the tentacles relative to the main body (umbrella) changes to a greater extent with the movement of the animal, which adopts a multitude of shapes and is therefore more difficult to identify. The fact that Carybdea marsupialis was the least correctly identified jellyfish species might be tentatively attributed to the highly transparent nature of the species and to the high incidence of night-time (and, consequently, low-lit) images of the species, due to its nocturnal and phototactic habits.
The lowest precision score obtained from the five classification models was 0.814 (Pelagia noctiluca, Model 4), this being one of just two values that fell below a threshold of 0.9. Similarly, high recall and f1 score values were recorded through the implementation of the five classification models. This further substantiates the robust performance of the same classification models in correctly identifying jellyfish species from the images provided. The f1 score values obtained in the current study, ranging from 0.843 (Pelagia noctiluca within classification Model 4) to 1 (Velella velella, salps and Cotylorhiza tuberculata within classification Model 5), are comparable to the values of the same metric reported in [10], which range between 0.936 and 0.952.
It is essential for every citizen science campaign to continuously monitor and validate data quality [28]. We are confident that the results presented in this study support our upcoming aspirations to embed the image analysis protocols described herein within a smart phone app and within our campaign website to enable their broad application. This would in turn spearhead further performance testing and subsequent optimisation through the submission of an increasing number of jellyfish images and output monitoring. Additionally, the formulation of guidelines on the taking of high-quality jellyfish photos by citizen scientists, in order to ensure their applicability within image analysis protocols, is recommended.
Author Contributions: All three authors contributed significantly to the publishing of this work. A.G. drafted the manuscript, planned, and implemented the experiments. J.A. assisted in the methodology and model development, validation, as well as in the review and editing of the paper. A.D. carried out the analysis of results and helped in the preparation, writing, review, and editing of the manuscript. A.D. manages the "Spot the Jellyfish" database from which most of the images used in this work were obtained. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.