DeepOtolith v1.0: An Open-Source AI Platform for Automating Fish Age Reading from Otolith or Scale Images

Every year, marine scientists around the world read thousands of otolith or scale images to determine the age structure of commercial fish stocks. This knowledge is important for fisheries and conservation management. However, the age-reading procedure is time-consuming and costly because of the specialized expertise and labor needed to identify annual growth zones in otoliths. Effective automated systems are needed to increase throughput and reduce cost. DeepOtolith is an open-source artificial intelligence (AI) platform that addresses this issue by providing a web system with a simple interface that automatically estimates fish age by combining otolith images with convolutional neural networks (CNNs), a class of deep neural networks that has been a dominant method in computer vision tasks. Users can upload otolith image data for selected fish species, and the platform returns age estimates. The estimates for multiple images can be exported to draw conclusions or to support further age-related research. DeepOtolith currently contains classifiers/regressors for three fish species; more species will be included as related work on ageing is tested and published. Herein, the architecture and functionality of the platform are presented. Current limitations and future directions are also discussed. Overall, DeepOtolith should be considered as the first step towards building a community of marine ecologists, machine learning experts, and stakeholders that will collaborate to support the conservation of fishery resources.


Introduction
Every year, the ages of thousands of fish are determined from otoliths or scales in the framework of national data collection fishery programs, supporting several objectives, such as the estimation of parameters for the demographic and population dynamics of fish stocks and the fitting of stock assessment models and length-at-age growth curves [1,2]. Age information is manually extracted by expert readers who count daily or annual growth zones in otoliths using a microscope or high-resolution images [2,3]. However, this is a labor-intensive, time-consuming, and costly process [4]. This limits the number of fish that can be age-analyzed, and monitoring programs must, to a larger extent, rely on growth-at-age models to determine the age composition of fish populations [1]. This underlines the need for automated tools that otolith scientists can use in practice to streamline the analysis in their laboratories.
Methods to automatically read otoliths for fish age extraction have been proposed using diverse data, such as images, fish biological features (e.g., fish length, catch data, sex) and geometrical features (shape and the opaque and translucent zonation patterns) [4-6]. Although these methods have shown good predictive performance, they require additional biological and geometric information beyond otoliths and a complex preprocessing stage to extract certain features from otolith images (e.g., measurement of translucent and opaque rings) before fish age can be determined. This tends to limit their applicability as automated solutions for fish ageing.
Deep learning is a subfield of artificial intelligence (AI) that has revolutionized automation in a wide range of real-world applications related to images, text, audio, and videos [7,8]. Convolutional neural networks (CNNs) are a dominant class of deep neural networks for processing images and are widely used for, among other tasks, image classification (note: Image classification is the task of identifying the class that an image represents, e.g., predicting gender from a face image. In our case, prediction of age as a class category from an otolith image is a classification task.) [9] and image regression (note: Image regression is the task of predicting a continuous variable from an image, e.g., predicting house price from a house image. In our case, prediction of age as a numeric value from an otolith image is a regression task.) [10,11]. CNNs receive images as input, and the whole learning process is carried out within the network; they learn progressively, from simple shapes (lines, edges, etc.) in the first layers, to more detailed patterns in the next layers, and finally to classes of objects or numeric features in the final layers. This is achieved through a stack of alternately arranged convolutional, pooling, and activation layers, followed by a fully connected layer that performs the image classification or regression task (Figure A1, Appendix A). The key advantage of CNNs lies in their efficiency in capturing the spatial interaction between adjacent pixels in an image and, hence, in extracting meaningful features that can resolve the computer vision task correctly [12]. For more details about CNNs, the reader is referred to the recent review paper [9].
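To make the layer stack described above more concrete, the following is a minimal pure-Python sketch of a single convolution, activation, and pooling pass on a toy image. It is an illustration only and is not taken from the DeepOtolith code base: the function names, the toy image, and the hand-picked edge kernel are all assumptions, and a real CNN learns its kernels from data rather than having them fixed by hand.

```python
# Illustrative sketch only: one convolution -> ReLU -> max-pooling pass.
# All names and values here are hypothetical, not part of DeepOtolith.

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a grey-scale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Activation layer: negative responses are set to zero."""
    return [[max(0, value) for value in row] for row in feature_map]

def max_pool2x2(feature_map):
    """Pooling layer: keep the maximum of each non-overlapping 2x2 block."""
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    return [
        [
            max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Toy 5x5 image: two dark columns on the left, three bright columns on the right.
toy_image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds where brightness increases from left to right.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

convolved = conv2d(toy_image, edge_kernel)
features = max_pool2x2(relu(convolved))
print(convolved)
print(features)
```

The convolution responds strongly at the dark-to-bright boundary and not at all inside the uniform bright region, which is the kind of low-level edge feature the first layers of a CNN are described as learning.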
In recent years, CNNs have received increased attention for automating fish age estimation from otolith images, as shown for Greenland halibut (Reinhardtius hippoglossoides) [13,14], snapper (Pagrus auratus) and hoki (Macruronus novaezelandiae) [15], Atlantic salmon (Salmo salar) [16], and red mullet (Mullus barbatus) [17]. These studies provided evidence that deep learning could offer an automated methodology for the analysis of otolith images, although with varying levels of accuracy, and provide cost-efficient and effective support towards the sustainability and management of fishery resources.
Despite the aforementioned AI advancements in fishery science, otolith researchers may not have the computing expertise or budget to take advantage of AI tools. In addition, even when AI algorithms are employed, without a community, scientists may face obstacles in implementing their case studies, collaborating with each other, and, on the whole, contributing to the field. DeepOtolith (http://otoliths.ath.hcmr.gr/, accessed on 1 March 2022, Figure 1) brings together AI researchers, fish scientists, and software developers to bridge the gap between state-of-the-art computing techniques and otolith research, providing a simple web interface that automatically estimates fish age by combining otolith images with deep learning. The user can export age estimates from multiple images to conduct further age-related research. At present, the platform contains models for three fish species (Table 1, Figure 2). Additional species can be incorporated as related works are published in the future. This image analysis platform is a Python application (https://www.python.org, accessed in 2001) that uses Python Flask (https://flask.palletsprojects.com/en/2.1.x/, accessed in 2010) and ReactJS [18] to operate as a webserver at the front-end. The source code for DeepOtolith, as well as sample otolith/scale images for experimentation, are available at: https://github.com/dimpolitik/Deep-Otolith (accessed on 1 March 2022). Below, the platform and its structure are introduced.

Table 1 (recoverable rows). Columns: species; age range (years); region; reference.
Atlantic salmon (Salmo salar); 1-6 (river age), 1-9 (sea age); Norway; [16]
Red mullet (Mullus barbatus); 0-5+; Greece; [17]

Materials and Methods

Platform Architecture
The architecture of DeepOtolith is shown in Figure 3. The tool consists of two main components: the front-end, visible to the end user, and the back-end, where all processing takes place. On the front-end, the user can select one of the three currently available fish

Greenland Halibut (Reinhardtius hippoglossoides)
The automatic age determination of Greenland halibut otoliths was based on the work of [14], who focused on explaining the decisions of deep neural networks used for fish age prediction. The considered dataset was a subset of the one described in Moen et al. (2018), which consisted of pre-existing otolith images from the Institute of Marine Research (IMR, Bergen, Norway) collected between 2006 and 2017. For the acquisition of the images, the whole paired right and left otoliths were first collected and put into plastic trays for transportation, where salt or water was added to keep them moist until they could be frozen in the lab. This was done for preservation reasons prior to image capture. After the thawing and cleaning processes, the paired otoliths (or single otoliths if the corresponding pair was damaged or lost) were immersed in water and placed under a stereomicroscope on a white background with transmitted light so that the digital images could be taken. The images were finally imported into Photoshop and calibrated to a 10 mm scale. The resolution of the images was 2596 × 1944 pixels. In the deep learning training process, ref. [14] adopted the VGG19 CNN architecture [19]. They used 8218 training sample images of right and left otoliths. The original images were resized to fit a 224 × 224 square, as expected by the VGG19 model. The model was trained to classify ages into one of 26 categories (from 1 to 26 years), and age labels were provided by human experts based on current recommendations in the field. Using a test set of right otoliths composed of 165 samples, the trained model attained a root mean square error (RMSE) of 1.69 years between age prediction and age reading by experts. The prediction error was comparable to that of the earlier study of Moen et al. (2018) (RMSE = 1.65 years), which used a regression network based on the Inception v3 architecture [20] to automate the Greenland halibut ageing.
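As an illustration of how an RMSE such as the one reported above can be obtained from a classifier's output, the sketch below picks the most probable age class for each image and compares it with the expert reading. The probabilities and reader ages are invented toy values, the example uses four age classes instead of 26 for brevity, and the helper names are hypothetical rather than taken from the cited studies.

```python
import math

def predicted_age(probabilities, first_age=1):
    """Return the age class with the highest probability.

    probabilities[i] is the model score for age first_age + i.
    """
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return first_age + best_index

def rmse(predicted, observed):
    """Root mean square error between two equally long sequences of ages."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Invented example: three otoliths, four age classes (1 to 4 years).
toy_probabilities = [
    [0.1, 0.7, 0.1, 0.1],  # most probable class: age 2
    [0.0, 0.2, 0.3, 0.5],  # most probable class: age 4
    [0.6, 0.2, 0.1, 0.1],  # most probable class: age 1
]
reader_ages = [2, 3, 3]  # hypothetical expert readings

model_ages = [predicted_age(p) for p in toy_probabilities]
error = rmse(model_ages, reader_ages)
print(model_ages, round(error, 2))
```

Because the squared differences are averaged before the square root is taken, a single large disagreement (here, age 1 predicted for a fish read as age 3) weighs more heavily than several one-year disagreements.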

Atlantic Salmon (Salmo salar)
Vabø et al. [16] utilized an implementation of the EfficientNetB4 CNN [21], with an input image resolution of 380 × 380, to automate the age estimation of Atlantic salmon (Salmo salar) from scales. EfficientNetB4 was trained using transfer learning, with pretrained weights from ImageNet [22]. The dataset consisted of a total of 9056 high-resolution images of salmon scales sampled by the Institute of Marine Research in Bergen (IMR), Norway (from 2015 to 2018) and Rådgivende Biologer (from 2016 to 2017) in rivers along the coast of Norway. Salmon scale photos were taken using a Nikon SMZ25 stereomicroscope with a Nikon Digital Sight DS-Fi2 camera using an SHR Plan Apo 1× objective. The images were captured with a resolution of 2560 × 1920 pixels on a light gray background and postprocessed using NIS Elements D software. The images were annotated for both sea and river ages by expert readers. Two independent networks were trained, one for each task, to predict river age and sea age separately. Age prediction was treated as a regression problem, returning a decimal number that was rounded to the nearest integer age and compared with the ground truth. From the total dataset, 8286 images were annotated with sea age, and 6238 were annotated with river age. Sea ages ranged from 1 to 9 years, with 2 years being the most frequent age (50.6%), followed by age 1 and age 3. River ages ranged from 1 to 6 years, with age 3 seen most frequently (56.5%), followed by age 2.
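The rounding step described above can be sketched as follows. This is a minimal illustration with invented network outputs and reader ages, not the evaluation code of [16]. How exact ties such as 2.5 were handled in the original study is not stated, so rounding halves upwards is an assumption made here; note that Python's built-in round() would instead round 2.5 to 2.

```python
import math

def to_integer_age(value):
    """Round a continuous age estimate to the nearest integer, halves upwards.

    Tie handling is an assumption; the source only says "nearest integer".
    """
    return math.floor(value + 0.5)

def accuracy(regression_outputs, true_ages):
    """Share of images whose rounded prediction equals the annotated age."""
    rounded = [to_integer_age(v) for v in regression_outputs]
    hits = sum(r == t for r, t in zip(rounded, true_ages))
    return hits / len(true_ages)

# Invented example values (not from the salmon dataset).
toy_outputs = [1.8, 2.5, 3.2, 0.6]
toy_truth = [2, 3, 3, 2]

rounded_ages = [to_integer_age(v) for v in toy_outputs]
toy_accuracy = accuracy(toy_outputs, toy_truth)
print(rounded_ages, toy_accuracy)
```

Treating the task as regression and rounding afterwards keeps the ordering of ages in the loss, so a prediction of 2.4 for a 2-year-old fish is counted as correct while 3.6 is not.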
The prediction of sea age obtained an accuracy of 86.99%, while the predictive accuracy of river age was 63.2%. The study also included a test of the network's performance in comparison with six human readers on an additional dataset of 150 scales. This revealed that the ground truth estimates of river age by expert readers exhibited higher variance and lower levels of agreement than those of sea age, which may explain why it was more difficult for the CNN to attain high accuracy on this task [16]. Additionally, the CNN overpredicted the age of 1 year, whereas predictions were best for 2 and 3 years sea age and 3 years river age. This can be partially attributed to the imbalanced distribution of ages in the salmon imagery. Specifically, 90.2% of images had a river age of 2 or 3 years, whereas only 6% were 4-year-olds and 3% were 1-year-olds.

Greek Red Mullet (Mullus barbatus)
The automatic age estimation of the Greek red mullet (Mullus barbatus) was based on [17]. The dataset included 5027 otolith images, provided by the Hellenic Centre for Marine Research (HCMR) database, along with the age readings and fish length (body size in mm) of each individual fish. For the acquisition of the images, the whole otolith was used without any treatment; it was placed in a petri dish with the inner face looking upwards and immersed in water. The petri dish was placed under a stereoscope on a black background. Reflected cold LED light (50 W), provided by two photonic goosenecks, was adjusted to illuminate the whole surface of the otolith. Digital images were taken under a magnification of 16× with a resolution of 768 × 576 pixels. Since different lighting conditions, zoom levels, or backgrounds of the tested images may impact the performance of the CNNs, users of the webpage should follow the above protocols as far as possible to attain a more reliable age estimation. The age of red mullet in the dataset ranged from 0 to 11 years old. Due to the low number of specimens aged >5 years old (~6%), these were merged into the 5+ age group.
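The merging of older fish into a plus group can be sketched as below. The class labels follow the "Age-0" to "Age-5+" naming used for the red mullet classifier, but the function name and the list of ages are invented for illustration and do not come from the HCMR dataset.

```python
from collections import Counter

PLUS_GROUP = 5  # ages of 5 years and above share one class, as described above

def to_age_group(age, plus_group=PLUS_GROUP):
    """Map an integer age reading to its class label, merging old fish."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return f"Age-{plus_group}+" if age >= plus_group else f"Age-{age}"

# Invented age readings, for illustration only.
toy_readings = [0, 1, 1, 2, 3, 4, 5, 7, 11]

labels = [to_age_group(age) for age in toy_readings]
class_counts = Counter(labels)
print(class_counts)
```

Merging sparse old ages gives the classifier enough examples per class, at the cost of no longer distinguishing, say, a 6-year-old from an 11-year-old fish.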
The Inception v3 CNN model [20] was trained using transfer learning with otolith images of resolution 400 × 400 as input, considering fish age estimation as a multi-class classification task with six age groups (Age-0, Age-1, Age-2, Age-3, Age-4, Age-5+). The potential benefit of multitask learning was also explored to improve the network's predictive performance, with the auxiliary task being the prediction of fish size. The enhanced neural network received the otolith images as input and simultaneously predicted fish age and length. The results showed that, without multitask learning, the age of the red mullet was predicted correctly in 64.4% of cases, with better performance in the younger Age-0 and Age-1 classes (F1 score > 0.8) and moderate performance in the older age classes (F1 scores of 0.50-0.54). Multitask learning increased the share of correct age predictions to 69.2%, with an additional 28.2% being within 1 year of error; it also proved a better approach for estimating older age groups, increasing accuracy by 3-23%. Additionally, the multitask network achieved a root mean square error (RMSE) of 0.56 years between predicted and human-based age estimates. The moderate accuracy in predicting older age groups can be attributed to the objective difficulty in distinguishing the growth zones in the otoliths of older fish, as well as to the low number of older-fish otoliths in the dataset. This was verified in age-reading workshops [23,24], where age estimations of older fish showed high variability amongst expert readers.
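The three kinds of figures quoted above (share of exactly correct predictions, share within one year of error, and per-class F1 score) can be computed as in the following sketch. The age vectors are invented toy data in which the value 5 stands for the 5+ group, and the helper names are hypothetical; this is not the evaluation code of [17].

```python
def share_within(predicted, observed, tolerance):
    """Share of predictions whose absolute error is at most `tolerance` years."""
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(observed)

def f1_for_class(predicted, observed, label):
    """F1 score (harmonic mean of precision and recall) for one age class."""
    pairs = list(zip(predicted, observed))
    true_pos = sum(p == label and o == label for p, o in pairs)
    false_pos = sum(p == label and o != label for p, o in pairs)
    false_neg = sum(p != label and o == label for p, o in pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Invented toy data; 5 denotes the merged 5+ age group.
observed_ages = [0, 0, 1, 1, 2, 3, 4, 5]
predicted_ages = [0, 0, 1, 2, 2, 3, 3, 4]

exact = share_within(predicted_ages, observed_ages, tolerance=0)
within_one = share_within(predicted_ages, observed_ages, tolerance=1)
additional_within_one = within_one - exact
f1_age2 = f1_for_class(predicted_ages, observed_ages, label=2)
print(exact, additional_within_one, round(f1_age2, 3))
```

Reporting the within-one-year share alongside exact accuracy shows whether the remaining errors are near misses or large disagreements, which matters when ages are later pooled into groups.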

Results
To demonstrate the functionality of the platform, 30 images from each fish species were used, along with age estimates from human readers. As an example, the red mullet species was selected, and the images were uploaded on the platform (note: When the file size of the images is large (>2 MB), it is suggested to upload them consecutively through the "Add Files" button (Figure 4)). The platform estimates the fish age for each image separately (Figure 4). Then, age predictions can be extracted into a CSV file using the "Export to CSV" button (default name: "export.csv"; this name can be changed to a different filename) (Figure 4). The content of the CSV file, as loaded into Excel, can be found in the Supplementary Materials (Suppl_red_mullet.xls); the "Refresh page" button allows the user to restart age prediction after reaching the limit of 30 images per minute or to test a new species.
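For users with more than 30 images, the workflow above implies uploading in batches and combining several exported files afterwards. The sketch below shows one way this could be organised offline. It is an assumption-laden illustration: the file names and the column names "image" and "predicted_age" are hypothetical, and the actual layout of "export.csv" should be checked against the Supplementary Materials.

```python
import csv
import io

BATCH_LIMIT = 30  # images per minute, as stated in the text

def batches(items, size=BATCH_LIMIT):
    """Split a list of image file names into upload batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def merge_exports(csv_texts):
    """Concatenate the contents of several exported CSV files, keeping one header."""
    merged = []
    header = None
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue
        if header is None:
            header = rows[0]
            merged.append(header)
        elif rows[0] != header:
            raise ValueError("exports have different columns")
        merged.extend(rows[1:])
    return merged

# Hypothetical file names and export contents, for illustration only.
image_files = [f"otolith_{i:03d}.jpg" for i in range(70)]
upload_plan = batches(image_files)

export_a = "image,predicted_age\na.jpg,2\n"
export_b = "image,predicted_age\nb.jpg,3\n"
combined = merge_exports([export_a, export_b])
print([len(batch) for batch in upload_plan], combined)
```

In practice the CSV texts would be read from the files saved after each batch; strings are used here only to keep the example self-contained.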
Outside of the platform, the predicted age frequency of red mullet was compared with human age estimates (Figure 5a). Although the dataset is too small to draw general conclusions, we noticed that model predictions tend to underestimate the age-5 class, leading to higher occurrences of ages 3 and 4. This can be partially explained by the fact that, during the training of the CNN model for the red mullet, the age-5 class also included ages 5 to 9, due to their small representation in the dataset.
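A comparison of age frequencies like the one described above can be tabulated with a few lines of Python. The age lists below are invented and merely mimic the reported pattern (fewer model predictions in the oldest class, more at ages 3 and 4); they are not the values behind Figure 5a, and the function name is hypothetical.

```python
from collections import Counter

def age_frequency_table(human_ages, model_ages):
    """Return rows of (age, human count, model count, model minus human)."""
    human = Counter(human_ages)
    model = Counter(model_ages)
    ages = sorted(set(human) | set(model))
    return [(age, human[age], model[age], model[age] - human[age]) for age in ages]

# Invented toy data; 5 denotes the merged 5+ age group.
toy_human = [1, 2, 3, 3, 4, 5, 5, 5]
toy_model = [1, 2, 3, 3, 3, 4, 4, 5]

table = age_frequency_table(toy_human, toy_model)
for row in table:
    print(row)
```

A negative value in the last column marks an age class that the model assigns less often than the human readers do.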
The corresponding results of human against AI age estimates and the exported CSV file for Greenland halibut can be found in Figure 5b and Supplementary Materials (Suppl_Greenland_halibut.xls); for Atlantic salmon (river age), the results can be found in Figure 5c and Supplementary Materials (Suppl_Atlantic_salmon_rive_age.xls); and for Atlantic salmon (sea age), the results can be found in Figure 5d and Supplementary Materials (Suppl_Atlantic_salmon_sea_age.xls).
Figure 4. Snapshot of otolith images for the red mullet species, along with predicted probabilities for each age group. The highest probability indicates the predicted age for the specific image. The user can upload up to 30 images per minute, at once or consecutively, through the "Add Files" button and extract age predictions in a CSV file (default name: "export.csv") using the "Export to CSV" button. The "Refresh page" button allows the user to restart the age prediction process after reaching the limit of 30 images per minute or to test a new species.

Discussion
Fish ageing information is vital for extracting knowledge about the biological traits (e.g., mortality rates, when fish mature, recruitment success) and status of fish stocks [25]. Public access to cutting-edge AI solutions is the novel feature that DeepOtolith brings to the otolith community. The goal of this system is to decrease the amount of research time and effort needed to perform fish age estimation. The availability of the models can also stimulate further research, as it allows comparison with the included models/papers. The platform directly benefits otolith scientists without a computing background who seek to process their datasets for the selected species currently available on the platform. In addition, the platform may serve as a reference point from which deep learning experts and marine ecologists can communicate their suggestions to improve the existing models or their interest in developing their own model and uploading it to DeepOtolith.
The actual age of a fish lies on a continuous scale. However, expert readers count annual rings in otoliths or scales to determine fish age and hence provide human estimates as integers/classes/groups. In most fish assessments, age is also treated as an age group, not as a continuous variable. Accordingly, CNN models predict fish age as (i) a discrete integer/class/group, if the prediction is treated as a classification problem (Greenland halibut and red mullet case studies), or (ii) a continuous value, if the prediction is treated as a regression task (Atlantic salmon case study).
It should be noted that DeepOtolith, as with any platform built to save time and money, empowers the user with automated methods but does not entirely replace conventional methods of research. The reported differences in age accuracy and error estimates among the case studies may be explained by the different life spans of the studied species (Table 1), the size of the datasets, and the adopted CNNs. Moreover, in all case studies, the trained CNNs performed worse for older fish than for younger fish. These remarks imply that users should pay attention to how they use the platform: the models are species-specific and should not be applied to other species, and the uncertainty in age prediction, especially for older fish, should be taken into account.
In general, implementing a CNN algorithm in otolith imagery comes with several challenges. First, CNNs do not have an inherent level of accuracy because their accuracy is highly dependent on the data provided. Specifically, the otolith of a given species has its own distinct morphometric features (shape, surface area, diameter, anatomy), while each fish species has its own lifespan and life history, resulting in different ways that otolith growth zones are formed as the fish gets older [25]. Second, otolith datasets are often imbalanced, with fewer images for older fish since these are caught less often in overexploited stocks. Third, there is an increased difficulty, even for experienced human readers, in distinguishing the annual ring at older ages due to high uncertainty in assigning a growth zone as an age year [24]. Fourth, the age readability of the same fish species may be influenced when it is captured from different regions or analyzed by different labs. This can be attributed to several factors, such as different fish environments and catch seasons, different protocols for the conservation and preparation of the otoliths, and other imaging setup conditions (camera quality, lighting conditions, zoom levels) [26]. All these complexities tend to cause considerable difficulty in the training process and performance of CNNs, and, overall, make the development of a single generic CNN for the age determination of multiple species seem, for the moment, unattainable.
For each case study, a different imaging setup of the otoliths or scales was adopted. Since different lighting conditions, zoom levels, or backgrounds of the tested images may impact the performance of the CNNs, users of the webpage should follow the imaging protocols described for each case study as far as possible to attain a more reliable age estimation.
The present work can be expanded in several directions. First, suitable adaptations to the network architectures and training procedures of the current fish species on the platform, combined with the collection of more images, will potentially improve the performance of CNNs. Second, DeepOtolith currently supports three species, so the integration of additional species into the platform is a primary future step. Third, technical improvements in the infrastructure of the platform will allow the direct uploading of large (order of thousands) image datasets. Fourth, additional tools in the platform, such as automating the estimation of von Bertalanffy growth curves [27], performing an uncertainty analysis to quantify potential biases in age predictions, and comparing human- and model-based estimates on the platform, are worthy issues for future consideration. Finally, in future work, the platform could be further extended to enable the identification of fish species and families by otolith images, a topic that has been addressed in the past with Fourier transform and discriminant analysis methods [28], and otolith shape analysis [29].
It is also worth noting that the fish age-reading process is subject to the experience and the relative bias of a reader on different aspects, such as the identification of the first annual ring, the axis of the otolith used for the measurements, and the date of fish birth (ICES, 2012). This often results in significant differences in age estimates among readers. To overcome this issue, results among readers or among readings of the same reader are compared to understand observed variations (ICES, 2012). Besides different readings by the same or various readers, common interpretations of an image (i.e., the most frequent age interpretation) are considered decisive for the potential age group. Similarly, in the platform, the trained CNNs were configured to provide either a single age estimation for each image corresponding to the highest probability, along with ages of lower probability (Greenland halibut and red mullet case studies), or a single continuous value as an age prediction (Atlantic salmon case study).
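The idea of taking the most frequent interpretation among several readings as the decisive age can be sketched as follows. The readings are invented, the function name is hypothetical, and how ties between equally frequent ages are resolved in practice is not stated in the text, so the sketch simply returns all tied ages.

```python
from collections import Counter

def modal_age(readings):
    """Return the most frequent age(s) among readings and their agreement share."""
    if not readings:
        raise ValueError("at least one reading is required")
    counts = Counter(readings)
    top = max(counts.values())
    modes = sorted(age for age, n in counts.items() if n == top)
    return modes, top / len(readings)

# Invented readings of one otolith by five hypothetical readers.
toy_readings = [3, 3, 4, 3, 2]
modes, agreement = modal_age(toy_readings)
print(modes, agreement)
```

The agreement share plays a role loosely similar to the highest class probability returned by the classifiers: both indicate how strongly the preferred age is supported relative to the alternatives.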
AI offers an extensive toolkit that could potentially be used in otolith research. For instance, unsupervised methods [30] have been proposed to automatically group images into clusters without the need for manual annotation. As CNNs require thousands of labelled images to be fully trained before being able to generalize their learning to unseen scenarios, unsupervised learning could eliminate the time needed for annotating thousands of otolith images with human-based age estimates. Apart from CNNs, other deep learning-based methods (e.g., adversarial generative adaptation, adversarial discriminative adaptation, self-supervised adaptation) have been used to transfer the knowledge gained from predicting fish ages from otolith images in one lab to the same species in another lab without requiring extra labelling effort [26].
The development of web-based systems in marine science to support automatic systems should be considered a valid goal. In the past, the FAbOSA project (https://www.imagescience.de/old_pages/fabosa/start.htm, accessed in 2003) aimed to automate fish age estimation using otolith shape analysis, and the web-based environment AFORO (http://aforo.cmima.csic.es/upload_img_en.jsp, accessed on 13 September 2005) was designed to process otolith images for fish species identification by combining morphometric features of otoliths and signal analysis [29]. Recently, ref. [31] released Flukebook (https://www.flukebook.org/, accessed on 27 December 2021), an open-source AI platform for cetacean photo identification and detection. Finally, other platforms powered by AI for fish catch detection and optimizing farmed fish production (http://www.ai.fish/, accessed on 1 November 2019; https://xpertsea.com/valuable-insights#xpercount, accessed in 2021) can also be found on the web. On the whole, the present work should be viewed as a first step in this direction.