Sustainable Maritime Applications with Lightweight Classifier Using Modified MobileNet

Satrya, Gandeva Bayu; Kurniawan, Febrian; Budiman, Gelar; Pristisahida, Adelia Octora; Moesdradjad, Bledug Kusuma Prasaja; Ramatryana, I Nyoman Apraz; Choutri, Salah Eddine

doi:10.3390/technologies14030161

by Gandeva Bayu Satrya ^1,2,*, Febrian Kurniawan ^3, Gelar Budiman ^4,*, Adelia Octora Pristisahida ^2, Bledug Kusuma Prasaja Moesdradjad ^2, I Nyoman Apraz Ramatryana ^2,5 and Salah Eddine Choutri ^6

^1 College of Computer and Systems Engineering, Abdullah Al Salem University (AASU), Khaldiya 72303, Kuwait
^2 Department of Electrical Engineering, Faculty of Information Technology, Universitas Nahdlatul Ulama Yogyakarta, Sleman 55293, Indonesia
^3 School of Computing, Telkom University, Bandung 40257, Indonesia
^4 School of Electrical Engineering, Telkom University, Bandung 40257, Indonesia
^5 School of Electronic Engineering, Dublin City University, D09V209 Dublin, Ireland
^6 NYUAD Research Institute, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
^* Authors to whom correspondence should be addressed.

Technologies 2026, 14(3), 161; https://doi.org/10.3390/technologies14030161

Submission received: 9 December 2025 / Revised: 8 February 2026 / Accepted: 15 February 2026 / Published: 5 March 2026

(This article belongs to the Topic Applications of Artificial Intelligence in Sustainable Energy and Environment)

Abstract

The rapidly growing demand for seafood has resulted in the over-exploitation of marine resources, pushing certain species to the brink of extinction. Overfishing is one of the main issues in sustainable marine development. To support marine resource protection and sustainable fishing, this study proposes advanced fish classification techniques using state-of-the-art machine learning (ML). Specifically, the proposed method enables the precise identification of protected fish species, among other features. In this paper, we present a system-level optimization of the MobileNet architecture, termed M-MobileNet, designed to operate efficiently in resource-limited hardware environments. Our classifier is constructed through a refined modification of the well-known MobileNet neural network that reduces the number of parameters. We have also collected, organized, and compiled an original and comprehensive labeled dataset of 37,462 images of fish native to the Indonesian archipelago. The proposed model is trained on this dataset to classify images of captured fish and accurately identify their respective species. In addition, the system provides recommendations regarding the consumability of the catch. Compared to the MobileNet deep neural network structure, our model uses only about 50% of the top-layer parameters, with approximately 42% GPU utility on a GTX 860M. This configuration achieves up to 97% classification accuracy. Considering the constrained computing capacity prevalent on many fishing vessels, our proposed model offers a practical solution for on-site fish classification. Moreover, synchronized implementation of the proposed model across multiple vessels can provide valuable insights into the movement and location of various fish species.

Keywords: sustainable; maritime; lightweight classifier; MobileNet

1. Introduction

More than 740 million people (10%) depend on catching, measuring, producing, and selling fish and seafood [1]. This dependence on fishing-related livelihoods is steadily increasing. Developing maritime countries account for the largest portion of the world’s fish catch and production, and fish is a principal source of income there. These countries also represent 97% of the global fishing workforce [2]. The same holds for the overwhelming majority of small-scale fishermen, for whom fishing not only forms the basis of their earnings but also constitutes an essential part of their daily nourishment.

The oceans are home to more than 20,000 species of fish [3], some of which are consumable, while others are not. The continuous overfishing of sea resources not only endangers many species of fish but also threatens the balance of the entire ecosystem. The use of an intelligent classification system for fishing will help fishermen distinguish protected fish species from their catches, helping to prevent illegal activities and protect these species. The existing state-of-the-art fish classification models considered limited species of fish [4] and did not assess fish consumability status despite its importance [5]. This consumability could be a strong indicator in distinguishing various protected and dangerous edible fish species from their commercial and consumable counterparts.

This work considers fish classification over a remote fishing environment in Indonesia and develops an edge intelligence (EI) strategy to overcome unstable network connections in the sea’s remote areas. The framework consists of a state-of-the-art lightweight machine learning model based on MobileNet and low-communication-overhead ML libraries for resource-constrained edge devices (e.g., smartphones). Compared to the existing approaches, we propose a practical solution that can accommodate massive-scale and heterogeneous Internet of Things (IoT) deployments.

Unlike simple batch-based learning approaches, we aim to develop portable ML libraries for local computation with limited dependency on remote computing libraries. The proposed method is designed to run on a compact, low-resource machine in remote areas. In particular, we use an NVIDIA GeForce GTX 860M GPU (NVIDIA Corp., Santa Clara, CA, USA), a low-specification mobile graphics chip manufactured in 2014 on a 28 nm process, together with 16 GB of DDR3 RAM, as a representative low-resource platform for remote-area implementation. The resource efficiency of the strategy also stems from the possibility of customizing it to the characteristics of the target EI application.

Although identifying fish species could be time-consuming and largely laborious, it is a mandatory procedure for both industrial and research fishing boats. Aboard research fishing boats, fish species are often assessed manually. For instance, the length of the fish is estimated manually, with one person measuring the length using a measuring board, while another person manually records the data in a personal computer [6]. Automatic fish-length measurement in the laboratory using computer vision methods has been explored in [6], demonstrating results with errors of less than 1 cm.

There has been a growing interest in fish classification in the recent literature [7,8,9,10]. In particular, several approaches to fish classification have used deep learning models [11,12,13,14,15,16,17,18,19,20]. Various MobileNet-based approaches (MobileNets) have been explored, including MobileNet [21] and MobileNetV2 [22], along with other architectures such as VGG16 [23], ResNet50 [24], EffNet [25], CapsNet [26], ShuffleNet [27], MnasNet [28], and Xception [29]. MobileNets emerge as promising candidates for the next wave of deep learning methods in object detection and classification, and are particularly well suited to handling large datasets. MobileNets [21,22] rely on a streamlined architecture that applies depth-wise separable convolutions to build lightweight deep neural networks. MobileNets have been implemented in many applications such as traffic density [30], redundancy reduction [31], skin classification [32], FPGA [33], vehicle counting [34], multi-fruit detection [35], fish species classification [36], and object detection on non-GPU architecture [37].

In this work, we propose an enhanced MobileNet model called M-MobileNet (Modified-MobileNet) that focuses on halving the number of parameters in the top layer of the CNN and improving the accuracy of deployment in low-specification devices (i.e., GeForce GTX 860M). Moreover, in our M-MobileNet the total number of parameters was reduced by 494,056 parameters, representing 12% of the total (4,253,864 parameters) used in the conventional MobileNet architecture. This reduction represents, among others, a contribution that addresses the need for computational efficiency in resource-constrained settings.

Furthermore, given the distinctiveness of Indonesian fish species, we curated a dedicated fish dataset comprising images and species labels for 37,462 specimens, along with a consumability index. We determined the fish species’ status by considering their poisonous, traumatogenic, and venomous characteristics, relying on insights from the fisherfolk. The dataset encompasses a total of 667 distinct fish species and serves as the training set for the proposed M-MobileNet model. The results of our study indicate that M-MobileNet outperforms the existing benchmark methods.

The summary of the key contributions of this study is given below:

1. Lightweight Model Design: We introduce M-MobileNet, a streamlined modification of the MobileNet architecture. By reducing the total parameter count by 12% (approx. 0.5 million fewer parameters), the model significantly lowers the computational burden. Practically, this translates to a GPU utility of 43% and memory utility of 14.6% on a standard GTX 860M, compared to 98% and 65%, respectively, for heavier models like VGG16. This efficiency ensures the system does not saturate the hardware, preserving computational capacity for other critical vessel operations.
2. Custom Fish Dataset: To enhance the specificity of our study, we curate an original and dedicated Indonesian fish dataset, comprising 37,462 images representing 667 distinct fish species. This dataset serves as the training foundation for our proposed M-MobileNet model.
3. Implementation and Validation: We implement and numerically validate the M-MobileNet model on low-resource hardware (GTX 860M). The results demonstrate that the 12% reduction in parameters effectively lowers the memory footprint without compromising accuracy (97%), validating the model’s suitability for real-time deployment on energy-constrained maritime edge devices.
4. Extension to Consumability Classifier: Building on our species classifier, we extend its functionality to serve as a consumability classifier. This enhancement involves integrating it with an existing consumability fish database.
5. Reproducible Research: All the simulation results can be reproduced and data files are available via public repository (https://github.com/Hadesisback/indonesia-fish-classifier (accessed on 14 February 2026)).

The remainder of this paper is organized as follows. In Section 2, we review recent literature pertinent to the problem of deployment of deep learning technology in remote fishing areas, including the issues related to remote communications, limited hardware resources, and consumability. Section 3 provides details of the modified lightweight deep learning model M-MobileNet for fish classification. We also describe the custom fish dataset created for our study. Section 4 presents and analyzes the results of comparing M-MobileNet with the benchmark models. Section 5 concludes the paper by providing a summary of the study and offering recommendations for future research.

2. Remote Deployment

Excessive and indiscriminate fishing can lead to the depletion of marine resources and the extinction of species. Given the strong and growing demand for seafood, sustainable pathways for fishing and marine farming must be urgently developed. One of the key parts of the fishing industry is small fishing boats that often operate in remote locations with limited communication and computing capacity. Deploying modern technologies on these vessels poses a major challenge in implementing sustainable development strategies. In this section, we discuss some of the issues related to deployment of technology including deep learning frameworks to remote fishing areas with limited computing power.

As highlighted in the discussion below, the communication technology and hardware resource challenges related to deploying software including deep learning models on fishing vessels require new approaches. To address the issues related to remote deployment, lightweight edge intelligence models are required.

2.1. Communication Technology

The maritime communication service tends to be expensive due to the high cost of satellite transmissions and limited coverage of the terrestrial networks. It poses a challenge to approaches that rely on off-site computing including cloud-based deep learning methods. Therefore, managing and achieving efficient radio resources become critical issues in maritime communications. Huang et al. [38] proposed a new general energy efficiency (GEE) maximization-based distributing D2D resource allocation (GEEM-DD2D-RA) scheme for maritime communication. Their scheme considered the power and interference aspects to achieve a higher-energy-efficiency system using less power. It is particularly beneficial for maritime out-of-coverage (OOC) D2D communications.

Unmanned Surface Vehicles (USVs) are considered a promising technique to carry out automatic emergency tasks in the continuously changing maritime traffic environment. Nevertheless, the task allocation efficiency for USVs in the maritime environment is currently inadequate. The crucial challenge is the performance of aquatic transmission between USVs and offshore platforms. To improve the task allocation efficiency, Zhang et al. [39] proposed a state-of-the-art task allocation scheme for USVs in the smart maritime internet of things (IoT). Their results showed that the scheme has higher network resource utilization and more allocated tasks than conventional schemes. Furthermore, they planned to establish a crowdsourcing scenario for USVs in smart maritime to conduct sensing tasks.

Similar to other industrial sectors, aquaculture substantially benefits from the deployment of Internet of Things (IoT) technologies. Adapting IoT within the aquaculture industry gives possibilities for optimizing fish-farming processes. Parri et al. [40] proposed a real-time monitoring infrastructure using fixed nodes and mobile sinks for real-time and remotely controlling offshore sea farms. The suggested architecture takes advantage of the LoRaWAN network infrastructure for data transmission. The testing results of different configurations on the field proved that the reliability of the transmission channel in a worst-case scenario is up to an 8.33 km offshore distance. Different communication setups were evaluated to find the best compromise ratio between power consumption and data transmission reliability.

One fundamental element in performing long-endurance ocean missions in remote areas is maintaining a communication link with the systems. To deal with this issue, the BlueCom+ project [41] presented a wireless mobile communications network with high bandwidth and a range of tens of kilometers. The autonomous systems carried out data gathering and performance tests while using the BlueCom+ communications network. The tests were divided into three open sea campaigns at Sesimbra (near Lisbon, Portugal). These trials showed that it is possible to have autonomous robotic systems carry out long-endurance missions of patrolling/monitoring the oceans.

Ref. [42] developed a mini-grid hybrid power system to maintain a reliable clean water supply in rural areas and emergency conditions. The designed processes consist of a mini-grid power system along with a desalination plant and economic analysis of the entire project life cycle. The mini-grid power system uses solar power as the only power resource because the geographic conditions of the rural areas are not feasible for constructing transmission lines interconnected with the current National Grid. The mini-grid power system acts as a steady power supply for the desalination plant to produce clean water. The concern about economic issues is related to the initial capital cost invested, the total net present cost (NPC), the cost of electricity (COE) generated by the system per kWh, and the simple payback time (SPBT) for their project.

2.2. Hardware Resources

There are limited computing hardware resources available on remote fishing boats, which impedes the implementation of traditional deep learning models onboard the ships. Therefore, a more efficient model is required that can run on limited-capacity processing units. Ref. [43] studied the visualization and compression of trajectories of large-scale vessels and its Graphics Processing Unit (GPU)-accelerated applications. The visualization was employed to study the effects of compression on the data quality of vessel trajectory. They applied the Douglas–Peucker (DP) and Kernel Density Estimation (KDE) algorithms for the visualization and trajectory compression that were substantially advanced through the GPU architecture’s parallel computation capabilities. The study was carried out by doing a thorough experiment on the trajectory compression and visualization of large-scale AIS data, which were the recording of ship migrations collected from three different water areas, i.e., the South Channel of Yangtze River Estuary, the Chengshan Jiao Promontory, and the Zhoushan Islands. Moreover, with the proportion of vessel trajectories growing larger, their proposed framework will have more significance in the big data era.

The Cyber-Enabled Ship (C-ES) is defined as an autonomous or remotely controlled vessel that relies on interconnected cyber–physical systems (CPS) for its operations. Those systems are inadequately protected against cyber attacks. Taking into account the critical functions provided by the systems, it is necessary to address these security challenges to ensure the ship’s safety. Ref. [44] proposed the Maritime Architectural Framework to evaluate and portray the C-ES environment. They also applied the Secure Tropos methodology to obtain the security requirements of the vulnerable CPSs in a C-ES, which are the Automatic Identification System (AIS), the Electronic Chart Display Information System (ECDIS), and the Global Maritime Distress and Safety System (GMDSS). It was intended as a system handling the requirements of each combining system.

Internet of Ships (IoS) is the network of smart interconnected maritime objects, devices, or infrastructures associated with ships, ports, or their transportation. The goal is to considerably enhance efficiency, safety, and environmental sustainability in the shipping industry. Aslam et al. [45] provided a complete survey of the IoS paradigm, architecture, key elements, and main characteristics. Furthermore, they also reviewed the novelty of IoS applications, such as route planning and optimization, safety enhancements, decision-making, automatic fault detection, cargo tracking, preemptive maintenance, environmental monitoring, automatic berthing, and energy-efficient operations. They identified future challenges and opportunities for research related to satellite communications along with its security and privacy, aquatic data collection, and management by providing a roadmap towards optimal maritime operations and autonomous shipping.

The lack of infrastructures in maritime communication, i.e., optical fibers and base stations, makes it an immensely complex and heterogeneous environment. It can also be a barrier for future service-oriented maritime IoT since it affects reliability and traffic steering efficiency. One of the promising solutions is an AI-empowered autonomous network for ocean IoT. However, AI typically involves training/learning processes and requires a realistic environment to attain beneficial outcomes. Ref. [46] proposed the parallel network that can be viewed as the “digital twin” of the real network and responsible for realizing four key functionalities: self-learning and optimizing, state inference and network cognition, event prediction and anomaly detection, and knowledge database and snapshots. Nevertheless, critical issues remain for further study, i.e., feature space definition, algorithm selection and evaluation, and coping with errors.

2.3. Consumability Determination

Edibility is one of the primary factors in fish categorization. Beyond fish classification, it is also important to determine whether the captured fish is consumable. Thus, fish evaluation should consist of two parts: (i) species classification, and (ii) a consumability indicator. Ref. [47] developed a fish-classifying system that uses K-Nearest Neighbor (KNN) as the classifier to segregate consumable fish into four classes based on color and texture features. The fish’s meat and scales are used as identification parameters. The meat color is described with the HSV (hue, saturation, and value) color model, and the GLCM (Gray Level Co-occurrence Matrix) method is used to extract texture features from the scales. The accuracy for the scales reached 87.5% for tilapia and 95% for mackerel.

The performance of fish classification techniques depends on the pre-processing and feature extraction methods, the number of extracted features, the accuracy of the classification, and the number of fish families/species recognized. Ref. [48] evaluated the use of databases such as the Fish4Knowledge (F4K) database, the Global Information System (GIS) on Fishes, and others. They also studied the preprocessing methods, feature extraction techniques, and classifiers from previous works to understand their characteristics, provide guidance for future research, and address current research gaps. Their study noted that various optimization strategies have been proposed, ranging from standard Back-Propagation (BP) and Variable Metric (VM) methods to more complex hybrid approaches. For instance, recent studies [49] have introduced hybrid genetic algorithms (referred to as HGAGD-BPC and GAILS-BPC) to optimize neural network weights.

Environmental variations such as luminosity, fish camouflage, dynamic backgrounds, water murkiness, low resolution, shape deformations of swimming fish, and subtle differences between fish species create challenges in underwater videos. Ref. [50] proposed a hybrid solution to overcome these challenges by combining optical flow and Gaussian mixture models with the YOLO deep neural network to detect and classify fish in unconstrained underwater videos. YOLO-based object detection systems were originally capable of capturing only the statistical features and distinct morphological characteristics, including texture, color patterns, and shape contours. The authors removed this limitation, enabling YOLO to detect moving or camouflaged fish by using temporal information obtained from Gaussian mixture models and optical flow. The suggested system was evaluated on underwater video datasets, i.e., the LifeCLEF 2015 from the Fish4Knowledge and a dataset from The University of Western Australia (UWA).

In most fisheries, the length of fish is still measured manually. The results give precise length estimation at fish level, but the sample size tends to be small because of the high inherent costs of manual sampling. Ref. [13] presented another approach for fish measurement by using a deep convolutional network (Mask R-CNN) for automatic European hake-length estimation from automatically collected images of fish boxes. The results give average lengths ranging from 20 to 40 cm, the root-mean-square deviation was 1.9 cm, and the maximum deviation between the estimated and the measured mean body length was 4.0 cm. The estimated mean of fish lengths is accurate at the box level; however, the species detection from the same image still needs to be addressed.

River systems are formed by disruptions of floods and droughts; hence, the river fish species have evolved features to make them more resilient to disruption. Ref. [51] analyzed and summarized the resilience features of European lampreys and fish species to acquire a unique species sensitivity classification to mortality. The researchers gathered the features—such as maximum length, migration type, mortality, fecundity, age at maturity, and generation time—of 168 fish species and developed an original method to weigh and integrate those features to create each species’ final sensitivity score ranging from one (low sensitivity) to three (high sensitivity). Large-bodied, diadromous, rheophilic, and lithophilic species such as Atlantic salmons, sturgeons, and sea trouts usually have a higher sensitivity to additional adult fish mortality than the small-bodied, limnophilic, and phytophilic species with fast generation cycles. The final score and classification can be easily localized by picking the most sensitive species to the local species pool.

2.4. Implications for Model Design

The operational challenges detailed above establish a strict set of design constraints for any shipboard AI system. First, the prohibitive cost of satellite data (Section 2.1) necessitates a model with a minimal storage footprint to facilitate affordable Over-The-Air (OTA) updates. Second, the prevalence of legacy hardware on mid-sized vessels (Section 2.2) demands an architecture with low computational complexity (FLOPs) to ensure real-time performance without overheating or hardware saturation. Consequently, standard heavy architectures like VGG16 are operationally non-viable. These constraints directly motivate the design of M-MobileNet, which prioritizes parameter efficiency and reduced memory utility over raw theoretical capacity.

3. Proposed M-MobileNet

In this section, we discuss the main components of the proposed approach for efficient fish classification, including (i) data collection, (ii) data augmentation, (iii) modification of the MobileNet architecture to obtain an efficient lightweight deep learning model, and (iv) transfer learning. Figure 1 shows the proposed M-MobileNet in detail.

3.1. Data Acquisition and Ethics

Capture Protocol: The dataset was constructed through a collaborative partnership with the Indonesian Traditional Fishermen’s Association (KNTI). Images were collected directly onboard operational fishing vessels using the crews’ personal devices, resulting in a heterogeneous mix of resolutions and sensor qualities (ranging from entry-level smartphones to digital pocket cameras). This variability is intentional, designed to train the model on the varying lighting conditions, angles, and backgrounds (e.g., wet decks, plastic trays) typical of real-world maritime environments.

Annotation and Quality Control: The annotation process followed a two-stage protocol. First, specimens were labeled by the fishermen using local vernacular names. Second, these labels were mapped to their scientific genus/species equivalents and cross-referenced with the FishBase database [52] to ensure taxonomic validity. This verification step corrected regional naming inconsistencies, serving as a robust alternative to statistical inter-rater reliability measures (e.g., Cohen’s $κ$).

Ethical Statement: This study relies exclusively on observational data (images) of fish harvested during standard commercial fishing activities. No live animals were experimented upon, injured, or sacrificed for the specific purpose of this research. Consequently, formal Institutional Animal Care and Use Committee (IACUC) approval is not applicable.

The original images taken onboard the fishing boats had different dimensions because the crews used different photographic equipment. During the processing stage, the images were homogenized and scaled to 224 × 224 pixels. The original RGB values, in the range 0–255, were rescaled to the range 0–1; otherwise, the values would be too large for the model to process under low-resource computation. Because some diversity in the data helps the model handle unseen data, the images were taken under uncontrolled conditions: some under water and others out of the water, at various unspecified angles and distances.

The images are categorized by species, genus, family, and order. The proposed model is trained on the final dataset to classify the species of fish. The data is split into training and test sets containing 29,970 and 7,492 images, respectively. The 667 species in the dataset span 283 genera. Furthermore, the data was synced with FishBase [52], a global provider of fish information, to determine whether each specimen is consumable.
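The split sizes above are consistent with an 80/20 partition of the 37,462 images. The following is a minimal sketch of such a split, assuming a plain random shuffle; the text does not state how images were assigned to each set, so the `split_indices` helper, the 0.8 fraction, and the seed are illustrative assumptions rather than the authors' procedure.

```python
import random

TOTAL_IMAGES = 37_462  # dataset size reported in the text


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train and test lists.

    Hypothetical helper: the fraction and seed are assumptions, not values
    taken from the paper.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(TOTAL_IMAGES)
print(len(train_idx), len(test_idx))  # 29970 7492
```

Because the class distribution is long-tailed, a per-species (stratified) split may be preferable in practice so that rare species appear in both sets; the sketch above does not do this.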

The resulting dataset comprises 37,462 images across 667 classes. The class distribution exhibits a long-tail characteristic typical of biodiversity datasets, where commercial species are over-represented. A detailed histogram of the class distribution and the MD5 checksums for dataset integrity verification are provided in Figure 2.

3.2. Data Augmentation

To improve classifier performance on unseen data, we augment the original training set with various modified images. The goal of data augmentation is to expose the classifier to a larger variety of images, thereby improving robustness to image distortions. The augmentation techniques alter the array data significantly while often remaining imperceptible to humans. The data was augmented using the following transformations with specific hyperparameter settings:

Rescale: The original image RGB coefficients (0–255) are rescaled to a range between 0 and 1 by multiplying by a factor of 1/255.
Width Shifting: Images are shifted horizontally with a floating-point range of 0.2, representing the fraction of the total width.
Height Shifting: Images are shifted vertically with a floating-point range of 0.2, representing the fraction of the total height.
Shear: A shear intensity of 0.2 (shear angle in degrees) is applied to transform images by stretching them.
Zoom: A zoom range of 0.2 is applied. This randomly zooms the image in or out by a factor within the range [0.8, 1.2].
Flip: Images are randomly flipped horizontally to account for orientation variability.
Fill: The “nearest” fill mode is utilized, repeating the closest pixel values to fill empty areas created by the transformations.
Rotation: Images are randomly rotated by up to 40 degrees.
RGB Channel Shifting: To simulate the highly variable lighting conditions of maritime environments (e.g., direct sunlight, overcast, dawn), we applied random RGB channel shifting. This was configured with a channel_shift_range of 20.0, which adds a random value sampled from $[-20, 20]$ to the pixel intensity of each color channel independently, ensuring the model remains robust to color temperature variations.

The augmentation pipeline was executed using the TensorFlow 2.10 Keras API ImageDataGenerator.
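The settings listed above map onto constructor arguments of the Keras `ImageDataGenerator` class. The sketch below collects them in a dictionary and builds the generator lazily; the names `AUGMENTATION_KWARGS`, `build_train_generator`, and `zoom_bounds` are hypothetical, and the authors' actual script may differ.

```python
# Keyword arguments mirroring the settings listed in the text. The keys are
# ImageDataGenerator constructor parameters; the dictionary name is an
# illustrative assumption.
AUGMENTATION_KWARGS = {
    "rescale": 1.0 / 255,         # map RGB values from 0-255 to 0-1
    "width_shift_range": 0.2,     # fraction of total width
    "height_shift_range": 0.2,    # fraction of total height
    "shear_range": 0.2,           # shear intensity
    "zoom_range": 0.2,            # zoom factor drawn from [0.8, 1.2]
    "horizontal_flip": True,      # random horizontal flips
    "fill_mode": "nearest",       # repeat nearest pixels in empty areas
    "rotation_range": 40,         # degrees
    "channel_shift_range": 20.0,  # random per-channel intensity shift
}


def build_train_generator():
    """Create the augmenting generator.

    TensorFlow is imported here so the settings above can be inspected
    on machines where it is not installed.
    """
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    return ImageDataGenerator(**AUGMENTATION_KWARGS)


def zoom_bounds(zoom_range):
    """Return the zoom interval implied by a scalar zoom_range."""
    return (1.0 - zoom_range, 1.0 + zoom_range)
```

A generator built this way could then read images with `flow_from_directory`, passing `target_size=(224, 224)` to match the input size described in Section 3.1; the directory layout is not specified in the text and would need to be supplied.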

3.3. MobileNet

MobileNet was originally developed by Google. It is based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks. It was introduced as an efficient deep-learning model for mobile and embedded vision applications. The efficiency and low resource usage of MobileNet have led to its adoption in mobile devices such as smartphones. Given the limited computing capacity onboard remote fishing boats, MobileNet provides an attractive base model to implement our fish classification model.

3.4. Modified MobileNet

We modify the original MobileNet architecture to fit our purposes in the context of resource-constrained optimization. To this end, several changes to the original model are implemented including reducing the number of top-layer parameters (Figure 3), using a new activation function (swish), and introducing batch normalization within the model. The final architecture of the proposed classification model is shown in Figure 3.

To construct our proposed modified MobileNet model (M-MobileNet), we reduce the total number of parameters while keeping the CNN layers unchanged. Concretely, the CNN layers of M-MobileNet are kept exactly the same as the original MobileNet, while the number of parameters in the fully connected top layers of M-MobileNet is reduced to around 531,000 compared to around 1,025,000 in the original MobileNet model. Thus, we obtain a lighter version of the original model that is faster and smaller. Due to its reduced size, M-MobileNet can be employed on edge computing devices.
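The arithmetic behind these figures can be checked with a short calculation. The sketch below uses only approximate values quoted in this article (the two top-layer counts above and the total of about 3.76 million parameters reported in Section 4.6) and assumes, as stated, that the convolutional layers are identical in both models, so the totals differ only in the top layers.

```python
TOP_ORIGINAL = 1_025_000    # approx. top-layer parameters, original MobileNet
TOP_MODIFIED = 531_000      # approx. top-layer parameters, M-MobileNet
TOTAL_MODIFIED = 3_760_000  # approx. total parameters, M-MobileNet (Section 4.6)

top_ratio = TOP_MODIFIED / TOP_ORIGINAL
saved = TOP_ORIGINAL - TOP_MODIFIED

# The CNN layers are unchanged, so the original total is the modified total plus the savings.
total_original = TOTAL_MODIFIED + saved
total_reduction = saved / total_original

print(f"top-layer parameters kept: {top_ratio:.1%}")
print(f"reduction in total parameters: {total_reduction:.1%}")
```

Under these assumptions, roughly half of the top-layer parameters are kept, which corresponds to a reduction of about 12% in the total parameter count.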

The second key modification is the adoption of the swish activation function, denoted as $S(x)$, originally introduced by Google. This function has demonstrated superior performance compared to ReLU in various scenarios. The choice of activation function within the network significantly influences training dynamics and can enhance classification performance. The swish activation is closely related to the traditional sigmoid activation function, denoted by $σ(x)$:

S(x) := x \times σ(β x) = \frac{x}{1 + e^{- β x}},

(1)

where x denotes the input and β is a scaling parameter. Swish is most effective when used together with batch normalization, which has a gradient-squashing property. Batch normalization allows faster and more stable training of the neural network by normalizing the layers’ inputs through re-centering and rescaling. Batch normalization is applied as $S(x)$ processes a mini-batch B of size m with mean $μ_{B} = \frac{1}{m} \sum_{i = 1}^{m} x_{i}$ and variance $σ_{B}^{2} = \frac{1}{m} \sum_{i = 1}^{m} {(x_{i} - μ_{B})}^{2}$. The inputs of each layer are normalized separately per dimension as $\hat{x}_{i}^{(k)} = \frac{x_{i}^{(k)} - μ_{B}^{(k)}}{\sqrt{σ_{B}^{2 (k)} + ϵ}}$, where $k \in [1, d]$ and $i \in [1, m]$ (d is the input dimension and m is the mini-batch size), and $μ_{B}^{(k)}$ and $σ_{B}^{2 (k)}$ are, respectively, the mean and variance of dimension k over the mini-batch. The constant $ϵ > 0$ is a small parameter to prevent numerical instability.
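The following minimal sketch, in plain Python, shows the swish function of Equation (1) and the per-dimension normalization described above. The value of eps, the toy mini-batch, and the choice to apply swish after normalization are illustrative assumptions, and the learnable scale and shift that batch-normalization layers usually include are omitted.

```python
import math


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))


def batch_norm(batch, eps=1e-5):
    """Normalize each dimension k of a mini-batch (m vectors of length d)."""
    m = len(batch)
    d = len(batch[0])
    mean = [sum(x[k] for x in batch) / m for k in range(d)]
    var = [sum((x[k] - mean[k]) ** 2 for x in batch) / m for k in range(d)]
    return [
        [(x[k] - mean[k]) / math.sqrt(var[k] + eps) for k in range(d)]
        for x in batch
    ]


# Toy mini-batch with m = 3 samples and d = 2 dimensions (hypothetical values).
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
normalized = batch_norm(batch)
activated = [[swish(v) for v in row] for row in normalized]
print(normalized)
print(activated)
```

After normalization each dimension has approximately zero mean and unit variance, so the two columns, which start on very different scales, reach the activation in a comparable range.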

3.5. Hierarchical Categorization Strategy

The system operates using a two-stage inference logic. The primary computational task is species classification, where the M-MobileNet model predicts the specific taxonomic identity of the input image from 667 possible classes ($C_{\mathrm{species}} \in \{1, \dots, 667\}$). Once the species is identified, the system performs a secondary consumability determination. This is not a learned binary classification task but a deterministic lookup operation. The predicted species ID is queried against our synchronized FishBase dictionary to retrieve its safety status (consumable vs. non-consumable/poisonous). Consequently, the system’s ability to correctly determine edibility is directly dependent on the accuracy of the fine-grained species classification.
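The two-stage logic can be sketched as follows. The dictionary is a hypothetical two-entry excerpt (the labels and statuses follow the examples discussed in Section 3.7), the probability values other than 0.997 are made up for illustration, and the "unknown" fallback is an assumption about how a missing entry might be handled rather than the system's documented behavior.

```python
# Hypothetical excerpt of the species-to-status dictionary; the real table
# is synchronized from FishBase and covers all 667 classes.
CONSUMABILITY = {
    "Etelis": "consumable",
    "Naso": "unconsumable",
}


def determine_consumability(class_probabilities):
    """Stage 1: take the most probable label. Stage 2: deterministic lookup."""
    species = max(class_probabilities, key=class_probabilities.get)
    status = CONSUMABILITY.get(species, "unknown")
    return species, status


species, status = determine_consumability(
    {"Etelis": 0.997, "Eleutheronema": 0.002, "Chanos": 0.001}
)
print(species, status)
```

Because the second stage is a plain lookup, any species-level error in the first stage propagates directly to the consumability recommendation.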

3.6. Transfer Learning

Transfer learning is a machine learning technique that reuses an existing trained model for another task. This approach speeds up modeling of the second task and saves resources, especially when the input is image data. Both our proposed method and the benchmark models start from the well-known Keras weights pretrained on the ImageNet dataset. Although the general features transferred from ImageNet helped the development of the fish classifier, the features specific to our dataset still need to be learned, and the model parameters need to be tuned for high accuracy. To this end, we adopted a trial-and-error approach to find the best-performing configuration for the model, which resulted in the choice of the swish activation function. Transfer learning suits the low-resource scenario because it is efficient in both training time and accuracy.

Another important parameter that requires careful tuning during transfer learning is the learning rate. The study eventually used a learning rate of $10^{-4}$. Choosing the learning rate well is key to reaching high accuracy while keeping training fast. The trial started with a large value, i.e., 0.1, which was then lowered exponentially. A large learning rate may make the model train faster, but it can prevent the model from reaching the optimal accuracy. Meanwhile, a smaller learning rate may slow down training.

3.7. Classification Output Analysis

As shown in Figure 4, the system successfully identifies the Deep-water red snapper (Etelis genus) with a high confidence score of approximately 99.7%. This species is classified as “Consumable” due to its status as a highly commercial food fish within the Lutjanidae family. The debug data reveals that the model effectively differentiates this specimen from similar-looking species like Eleutheronema or Chanos, which appear with significantly lower probability scores. Conversely, Figure 5 demonstrates the system’s ability to flag health hazards. The specimen is identified as a Spotted unicornfish (Naso genus), categorized here as “Unconsumable.” While members of the Acanthuridae family are often found in local markets, the classification logic incorporates safety metadata: (1) Risk Factor: The “Details” pane highlights reports of ciguatera poisoning associated with these species. (2) Confidence Level: The model maintains high confidence, returning a 99.9% probability for the Naso genus. These examples highlight the dual-purpose nature of the model: it serves as both a biological identification tool and a public health safeguard by cross-referencing visual taxonomy with known toxicity databases.

4. Results and Discussion

Scope of Evaluation: It is important to clarify that all performance metrics reported in this section (accuracy, precision, recall, F1-score) refer to the fine-grained multi-class classification of the 667 fish species. The determination of consumability (edible vs. non-edible) is a secondary process derived deterministically from the predicted species label via the FishBase lookup table. Therefore, the high accuracy reported below reflects the model’s capability to distinguish subtle taxonomic differences across the 667 classes. To evaluate the proposed approach for fish classification, we benchmark it against several existing models. Model evaluation is based on precision, recall (sensitivity), specificity, and F-score, for both micro- and macro-averages, along with accuracy [53]. To measure hardware performance, we compare the GPU utility of the proposed and benchmark models.

4.1. Experimental Setup

To ensure reproducibility and assess the model’s feasibility in resource-constrained maritime environments, we utilized a specific hardware and software configuration for all experiments.

Hardware Environment: All models were trained and tested on a laptop chosen to simulate the low-resource conditions of a standard fishing vessel. The device is equipped with an NVIDIA GeForce GTX 860M GPU (NVIDIA Corp., Santa Clara, CA, USA), a graphics card manufactured in 2014 on a 28 nm process (a low-specification mobile chip representative of low-resource computing for remote area implementation), 16 GB of DDR3 RAM, and an Intel Core i7-4710HQ processor (Intel Corporation, Santa Clara, CA, USA).
Target Hardware Rationale: The GTX 860M was selected to simulate a “Maritime Edge Node”—specifically, a low-power laptop or embedded PC typical of the bridge equipment on mid-sized fishing vessels. Unlike extreme edge sensors (e.g., microcontrollers), these nodes support the necessary user interface for fishermen while still requiring strict energy and thermal management.
Software Environment: The experiments were conducted on the Linux operating system. Deep learning models were implemented using the Keras framework with the TensorFlow backend. GPU performance metrics (utility and memory usage) were monitored using the nvidia-smi management interface.
Training Hyperparameters: The input images were resized to $224 \times 224$ pixels. The models were trained using the Adam optimizer with an initial learning rate of $10^{- 4}$ . We employed a batch size of 50. To optimize convergence, a ReduceLROnPlateau callback was utilized, monitoring validation accuracy with a patience of 10 epochs and a reduction factor of 0.5.
Baseline Models: To benchmark the performance of the proposed M-MobileNet, we compared it against several state-of-the-art architectures implemented with the same hyperparameters, including VGG16, ResNet50, MobileNetV2, EfficientNet (EffNet), and CapsNet.
Dataset Splitting and Validation: To address the challenge of class imbalance across 667 species, we prioritized maximizing the training data. The dataset was stratified and split into a training set (80%, 29,970 images) and a test set (20%, 7492 images). Due to the limited sample size for certain rare species, a separate third hold-out validation set was not created. Instead, the test set was used to monitor model convergence and trigger the ReduceLROnPlateau callback, while the training set was subjected to the data augmentation techniques described in Section 3.2 to resolve imbalance and prevent overfitting.
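The learning-rate schedule implied by these settings can be sketched in plain Python. The function below is a simplified stand-in for the Keras ReduceLROnPlateau callback with the stated factor (0.5) and patience (10 epochs); it deliberately ignores the callback's min_delta, cooldown, and min_lr options, and the accuracy trace is hypothetical.

```python
def reduce_lr_on_plateau(monitored_accuracies, initial_lr=1e-4, factor=0.5, patience=10):
    """Return the learning rate in effect at each epoch (simplified rule)."""
    lr = initial_lr
    best = float("-inf")
    wait = 0
    history = []
    for accuracy in monitored_accuracies:
        history.append(lr)
        if accuracy > best:
            best = accuracy
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # halve the rate after `patience` epochs without improvement
                wait = 0
    return history


# Hypothetical trace: five improving epochs, then a plateau.
trace = [0.5, 0.6, 0.7, 0.8, 0.9] + [0.9] * 11
history = reduce_lr_on_plateau(trace)
print(history[0], history[14], history[15])
```

In this trace the rate stays at the initial value through the tenth stagnant epoch and is halved from the next epoch onward.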

4.2. Performance Metrics

The classification performance is evaluated using the confusion matrix, derived from the comparison of true and predicted labels. We employ standard metrics including accuracy, precision, recall (sensitivity), specificity, and F1-score, defined as follows [54]:

\mathrm{Precision} = \frac{TP}{TP + FP},

(2)

\mathrm{Recall} = \frac{TP}{TP + FN} = \frac{TP}{P},

(3)

\mathrm{Specificity} = \frac{TN}{TN + FP} = \frac{TN}{N},

(4)

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN},

(5)

F_{β} = \frac{(1 + β^{2}) (\mathrm{Precision} \cdot \mathrm{Recall})}{β^{2} \cdot \mathrm{Precision} + \mathrm{Recall}} .

(6)

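Here, TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, with P = TP + FN and N = TN + FP. Additionally, Table 1 presents the details of the metrics. We assess hardware efficiency by measuring GPU utility (percentage of time the GPU core is active) and memory utility (percentage of time the memory controller is active) during inference.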
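As a worked example of Equations (2) to (6), the sketch below computes the metrics from confusion-matrix counts and contrasts macro- and micro-averaged precision. All counts are hypothetical, and zero denominators are not handled.

```python
def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Compute the metrics of Equations (2)-(6) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }


def averaged_precision(per_class_counts):
    """Macro: mean of per-class precision. Micro: precision over pooled counts."""
    macro = sum(tp / (tp + fp) for tp, fp in per_class_counts) / len(per_class_counts)
    micro = sum(tp for tp, _ in per_class_counts) / sum(tp + fp for tp, fp in per_class_counts)
    return macro, micro


metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
macro, micro = averaged_precision([(90, 10), (5, 5)])  # (TP, FP) for two classes
print(metrics)
print(macro, micro)
```

The macro-average weights every class equally, so a rare class with low precision pulls it down more than it does the micro-average, which is why both are reported for an imbalanced dataset.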

4.3. Activation Function Analysis

One of the key components in neural network architecture is the activation function. The activation function plays an important role in transmitting the gradient signal through the network during the learning stage. A poor activation function can hinder effective learning even if all other components of the pipeline are in place. We compare the performance of different activation functions to determine the optimal activation. In particular, we consider the sigmoid $σ(x)$, tanh(x), ReLU f(x), and swish S(x) activation functions (Figure 6 and Table 2).

The sigmoid function is popular for its smooth probabilistic shape and is given by

σ (x) = \frac{1}{1 + e^{- x}} .

(7)

Its derivative is simple to compute, which makes gradient calculation in a neural network efficient. On the other hand, the sigmoid function saturates quickly: its output approaches 0 or 1 for inputs of even moderate magnitude, so the partial derivatives go to zero, the weights are barely updated, and the model stops learning. The tanh activation can be viewed as a scaled and shifted version of the sigmoid, with similar gradient issues. The equation is given below:

tanh (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}} = 2 \cdot σ (2 x) - 1 .

(8)

As can be seen in Figure 6, the swish activation function is unbounded on the positive x-axis. This means that as input values become very large, the output does not saturate toward a maximum value, unlike sigmoid and tanh. Consequently, the gradient does not vanish for large positive inputs, which aids learning. Another advantage is that the swish function is non-monotonic: its derivative is negative over part of the negative x-axis and positive elsewhere. This increases the information storage capacity and the discriminative capacity of the model. Furthermore, swish is bounded below: as the input approaches negative infinity, the output approaches a constant (zero), which helps in handling extreme negative values and serves as a form of regularization in the model.

One of the most popular activation functions is Rectified Linear Unit (ReLU), which is given by the following equation:

f(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } x \geq 0 . \end{cases}

(9)

While the ReLU activation resembles swish activation for positive values of x (Figure 6), it exhibits different behavior for negative values. The non-monotonic nature of swish sets it apart from ReLU and other activations. As shown in Figure 7, swish outperforms ReLU across different batch sizes.

To better understand the behavior of the activation functions during the training phase of a neural network, it is instructive to consider the derivatives of the functions, which are depicted in Figure 8.

In particular, the derivative of swish activation is symmetric and reduces the phenomenon of vanishing gradient. The derivative has an interesting property given by the following equation:

S^{'} (x) = S (x) + σ (x) (1 - S (x)) .

(10)
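Equation (10) and the identity in Equation (8) can be checked numerically. The sketch below takes β = 1 (the form used in Equation (10)), compares the closed-form swish derivative with a central finite difference, and verifies the tanh identity; the sample points and step size are arbitrary choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Swish with beta = 1."""
    return x * sigmoid(x)


def swish_derivative(x):
    """Closed form from Equation (10): S'(x) = S(x) + sigma(x) * (1 - S(x))."""
    return swish(x) + sigmoid(x) * (1.0 - swish(x))


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-3.0, -1.0, 0.0, 2.0):
    closed = swish_derivative(x)
    approx = numerical_derivative(swish, x)
    tanh_gap = abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0))
    print(f"x={x:+.1f}  closed={closed:+.6f}  numeric={approx:+.6f}  tanh gap={tanh_gap:.1e}")
```

The derivative is negative at x = -3 and positive at x = 2, which is the non-monotonic behavior discussed above.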

To determine the optimal activation function, we conducted a numerical comparison of the available options. Figure 7 presents the results, indicating that the swish activation consistently outperforms other functions in accuracy across all tested batch sizes. Consequently, swish has been selected as the primary activation function for the proposed M-MobileNet model.

4.4. Main Results

In this section, we present and discuss the main results of the comparison between the proposed M-MobileNet and the benchmark methods. The models are compared based on several classification metrics as well as the GPU performance. Given the context in which the proposed model is to be deployed, both classification accuracy and low computational overhead are important factors in evaluating the models.

The results of the classification metrics are shown in Table 3. It can be seen that M-MobileNet outperforms the benchmarks across all the criteria—precision, recall, F1-score, and specificity. Most importantly, M-MobileNet achieves the highest accuracy of 97%, which is significantly better than the benchmarks. M-MobileNet achieves the best results both in terms of micro- and macro-averages. The dominant results support the superiority of the proposed model.

Since the proposed model is designed for application on remote fishing vessels, GPU performance plays an important role in determining the feasibility of the proposed approach. To this end, we compare the GPU utility, memory utility, and memory usage of M-MobileNet and the benchmark methods. GPU performance was measured on a mobile GPU (NVIDIA GTX 860M) with a comparatively small memory of 4 GB GDDR5. The augmentation methods were also applied to determine whether they affect GPU performance in each architecture. Benchmarking was conducted with nvidia-smi, NVIDIA's system management interface, on the Linux operating system. The results are presented in Table 4 and show that M-MobileNet performs well across all three criteria. In particular, it achieves the minimum memory utility and near-minimum GPU utility.
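One way to collect these three quantities is to poll nvidia-smi in its CSV query mode. The sketch below is an assumption about how such monitoring could be scripted, not the authors' measurement script: query_gpu_stats requires an NVIDIA driver and is not called here, the dictionary keys are hypothetical names, and the sample line is made up to show the parsing step.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
QUERY_FIELDS = ("utilization.gpu", "utilization.memory", "memory.used")
KEYS = ("gpu_util_pct", "mem_util_pct", "mem_used_mib")  # hypothetical key names


def parse_gpu_stats(csv_line):
    """Parse one line of `--format=csv,noheader,nounits` output into a dict."""
    values = [float(v.strip()) for v in csv_line.strip().split(",")]
    return dict(zip(KEYS, values))


def query_gpu_stats():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_stats(line) for line in result.stdout.splitlines() if line.strip()]


sample_line = "43, 12, 1024"  # hypothetical output line, not a measured value
stats = parse_gpu_stats(sample_line)
print(stats)
```

Sampling these values repeatedly during inference and averaging them would give figures comparable in kind to those in Table 4.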

Robustness Across Class Imbalance: To address concerns regarding the long-tail distribution of the dataset (where class counts range from 6 to 6109 images), we performed a frequency-bin analysis. The 667 species were categorized into three bins: head (>500 images), body (50–500 images), and tail (<50 images). While the overall accuracy is 97%, the performance breakdown reveals the model’s robustness. The ‘head’ classes achieved an average F1-score of 98.2%, while the ‘tail’ classes maintained a respectable F1-score of 89.5%. This indicates that the M-MobileNet architecture, aided by the aggressive data augmentation strategy, generalizes effectively even on under-represented species, rather than simply overfitting to the dominant commercial classes.
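The binning rule can be written down directly. In the sketch below the thresholds come from the text, while the species names and two of the four image counts are hypothetical (6 and 6109 are the extremes quoted above); the middle bin is labeled "body".

```python
def frequency_bin(image_count):
    """Assign a species to a bin by its number of images."""
    if image_count > 500:
        return "head"
    if image_count >= 50:
        return "body"
    return "tail"


# Hypothetical species names; 6 and 6109 are the extreme class counts from the text.
image_counts = {"species_a": 6109, "species_b": 500, "species_c": 50, "species_d": 6}

bins = {}
for name, count in image_counts.items():
    bins.setdefault(frequency_bin(count), []).append(name)

print(bins)
```

Per-bin scores such as the F1 values above are then obtained by averaging the per-class metric over the species in each bin.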

4.5. Ablation Study: Isolating Architectural Contributions

To rigorously validate the architectural contributions of M-MobileNet, we conducted an ablation study to disentangle the effects of parameter reduction from the choice of activation function. As detailed in Table 5, we evaluated four distinct configurations to isolate the source of the performance gains:

Impact of Parameter Reduction: The standard MobileNet architecture serves as our baseline, achieving 94.0% accuracy. When we applied only the parameter reduction strategy, which roughly halves the parameters of the fully connected top layers (approximately 12% of the total parameter count), we observed a slight degradation in accuracy. This finding is consistent with the “capacity paradox” in deep learning; reducing the model’s capacity (parameters) can limit its ability to capture complex feature representations for fine-grained species classification. However, this reduction was necessary to meet the memory utility constraints of the GTX 860M hardware, lowering the memory footprint significantly.

Impact of Swish Activation: In contrast, replacing the standard ReLU activation with swish on the full MobileNet architecture yielded a net improvement in accuracy. This confirms that the non-monotonic and unbounded properties of the swish function allow for better gradient flow during training, preventing the “dying ReLU” problem often seen in deeper networks.

Synergistic Effect (M-MobileNet): The proposed M-MobileNet integrates both modifications. The results demonstrate a synergistic effect: the swish activation function effectively compensates for the information bottleneck introduced by the parameter reduction. By using swish, the smaller, leaner model is able to learn more robust features than the larger baseline model using ReLU. Consequently, M-MobileNet achieves the optimal trade-off: it retains the high accuracy of a complex model (97%) while maintaining the low computational overhead required for maritime edge deployment. This confirms that the performance gains are structural and not merely the result of hyperparameter tuning.

As shown in Table 5, the parameter reduction alone causes a minor performance penalty (93.5% vs. 94.0%), consistent with the reduction in model capacity. However, the swish activation function provides a significant boost. The proposed M-MobileNet successfully leverages swish to offset the parameter reduction, resulting in a model that is both lighter and 3 percentage points more accurate than the baseline (97% vs. 94.0%).

Furthermore, Table 6 demonstrates that the model does not suffer from catastrophic forgetting on rare classes. Despite the ‘tail’ bin containing 560 species with fewer than 50 images each, the model maintains a Macro F1-score of 0.948. This closely aligns with the overall Macro F1 of 0.951 reported in Table 3, proving that the high performance is evenly distributed across the taxonomy.

4.6. Comprehensive Performance Comparison

To provide a holistic evaluation of the proposed M-MobileNet against state-of-the-art baselines, we present a detailed comparison covering classification performance, model size, and computational efficiency in Table 7.

Model Complexity: M-MobileNet contains approximately 3.76 million parameters, representing a 12% reduction compared to the standard MobileNet and a massive reduction compared to VGG16 (138M).
Computational Cost: We utilize GPU utility and memory utility as direct proxies for energy consumption and computational load on the edge device. M-MobileNet demonstrates a GPU utility of 42.96%, which is less than half that of VGG16 (98%), indicating significantly lower energy requirements and heat generation.
Performance: Despite the reduction in parameters, M-MobileNet achieves the highest accuracy (97%) and F1-Score (0.978), validating that the architectural modifications (swish activation, reduced dense layers) effectively optimize the feature learning process for this specific domain.
Performance Analysis of Heavy vs. Light Models: Contrary to the intuition that larger models yield better performance, our experiments (Table 5) show that the heavy VGG16 architecture (∼138M parameters) underperformed compared to M-MobileNet (∼3.76M parameters). We attribute this to the model capacity paradox: given the dataset size of 37,462 images, the massive parameter space of VGG16 likely led to overfitting, where the model memorized training artifacts rather than learning generalizable species features. In contrast, the constrained capacity of M-MobileNet acted as an effective regularizer, ensuring that the model learned robust, transferable feature representations suitable for the test set.

4.7. System Optimization and Safety Implications

While the modifications introduced in M-MobileNet—specifically the reduction of fully connected layer parameters and the integration of swish activation—rely on established techniques, their combined impact represents a significant system-level optimization for the maritime domain.

As shown in Table 5, although the parameter reduction is approximately 12% compared to the standard MobileNet, this facilitates a GPU utility of 43%, preventing hardware saturation on older chipsets like the GTX 860M. This headroom is vital for ensuring the system can run concurrently with other navigational software.

Furthermore, regarding the consumability classification, while the system utilizes a database lookup, the innovation lies in the offline integration of this knowledge with visual recognition. A critical concern in this application is the cost of misclassification—specifically, the risk of labeling a poisonous fish as edible (false positive). As evidenced in Table 3, our model achieves a specificity of 0.999. This exceptionally low false-positive rate ensures that dangerous species are reliably filtered out, providing a necessary safety buffer for consumption recommendations.

The system determines consumability through a hierarchical process: first, the species is classified using M-MobileNet, and second, the edibility status is retrieved from the synchronized FishBase lookup. Consequently, the binary edibility classification performance is intrinsically linked to the species classification accuracy.

With a species accuracy of 97%, the system provides high reliability for consumability recommendations. More importantly for safety, the model achieves a specificity of 0.999 (Table 3). In the context of binary edibility classification (consumable vs. unconsumable), this high specificity implies that the system entails a negligible risk of misclassifying a poisonous/dangerous fish (unconsumable) as safe (consumable), thereby directly mitigating potential health risks for the crew.

Safety-Critical Error Analysis: In a maritime food context, the cost of error is asymmetric; classifying a poisonous fish as edible (false positive) is a critical failure. We evaluated the specific error propagation from the species classifier to the consumability decision. Based on the test set performance, the system demonstrated a false-alarm rate (discarding edible fish) of approximately 2.2% (derived from a Recall of 0.978). More importantly, the system achieved a safety violation rate of only 0.1% (derived from a specificity of 0.999). This confirms that the high specificity reported in Table 3 translates effectively to downstream safety, ensuring that dangerous species are reliably filtered out. However, given the non-zero risk (0.1%), we recommend a “human-in-the-loop” protocol for any species flagged with <80% prediction confidence.
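The two error rates follow directly from the reported recall and specificity, and the recommended review rule is a simple threshold. The sketch below uses only the figures quoted in this paragraph; the function name needs_human_review is hypothetical.

```python
RECALL = 0.978               # reported recall
SPECIFICITY = 0.999          # reported specificity (Table 3)
CONFIDENCE_THRESHOLD = 0.80  # recommended human-in-the-loop cut-off

false_alarm_rate = 1.0 - RECALL            # edible fish discarded
safety_violation_rate = 1.0 - SPECIFICITY  # dangerous fish passed as safe


def needs_human_review(confidence, threshold=CONFIDENCE_THRESHOLD):
    """Flag a prediction for manual checking when its confidence is below the threshold."""
    return confidence < threshold


print(f"false-alarm rate: {false_alarm_rate:.1%}")
print(f"safety violation rate: {safety_violation_rate:.1%}")
print(needs_human_review(0.997), needs_human_review(0.79))
```

A confidence gate of this kind only catches low-confidence predictions; confidently wrong predictions still rely on the specificity of the classifier itself.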

5. Concluding Remarks

This study presented a comprehensive framework for sustainable marine resource management through the implementation of a lightweight, deep learning-based classification system. By integrating a custom dataset with a resource-efficient neural network, we addressed the critical challenge of deploying accurate fish identification tools on fishing vessels with limited computational capacity.

Contributions: The primary contributions of this work are threefold. First, we curated and released a large-scale, heterogeneous dataset comprising 37,462 images across 667 fish species native to the Indonesian archipelago, facilitating research in an under-represented geographic region. Second, we developed M-MobileNet, a modified MobileNet architecture that utilizes the swish activation function and achieves a 12% reduction in parameters compared to the standard MobileNet. Third, we demonstrated that this lightweight model achieves superior performance, attaining a classification accuracy of 97% and an F1-score of 0.978, while maintaining a low GPU utility of 43% on a GTX 860M. This confirms the model’s viability for edge deployment in maritime environments.

Limitations: Despite these promising results, several limitations must be acknowledged. First, the dataset is geographically specific to Indonesian waters; consequently, the model’s generalization to fish species from other oceanic regions (e.g., Atlantic or Arctic) remains untested. Second, due to the high visual similarity between certain cartilaginous fishes (e.g., Carcharhinus species), classification for these groups was restricted to the genus level rather than the species level. Third, while the model was validated on a mobile-grade GPU (GTX 860M), it has not yet been deployed on extreme low-power embedded sensors (e.g., Raspberry Pi Zero) in active sea trials.

Future Work: Future research will focus on transitioning from static image classification to real-time object detection using Single Shot Detectors (SSD) to process continuous video streams on deck. Additionally, we aim to expand the dataset to include a wider variety of global fish species to enhance the model’s universal applicability. Finally, while the model demonstrated stable convergence across training epochs (as seen in the learning curves), reported accuracy metrics represent the best-performing model checkpoint. Future validation will incorporate multi-seed statistical analysis (e.g., mean ± standard deviation) to further quantify the stochastic variability of the training process.

Author Contributions

Conceptualization, G.B.S. and I.N.A.R.; investigation, F.K. and A.O.P.; methodology, G.B.S. and I.N.A.R.; software, S.E.C., F.K. and A.O.P.; validation, G.B.S., G.B. and A.O.P.; visualization, B.K.P.M. and I.N.A.R.; writing—original draft preparation, G.B.S., S.E.C. and I.N.A.R.; writing—review and editing, G.B.S., F.K., S.E.C., A.O.P., B.K.P.M. and I.N.A.R.; supervision, G.B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research and Community Service Directorate, Telkom University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

To support reproducible research, the complete source code, pretrained model weights (M-MobileNet), and the dataset used in this study have been made publicly available. The repository can be accessed at: https://github.com/Hadesisback/indonesia-fish-classifier (accessed on 8 December 2025). The repository contains: (i) the Jupyter Notebooks for training and inference (M-MobileNet, EfficientNet); (ii) the .h5 weight files for the pre-trained models; and (iii) the complete Indonesian fish dataset (or a direct link to the hosted dataset archive).

Acknowledgments

This research was conducted in part by the College of Computer and Systems Engineering, Abdullah Al Salem University (AASU); in part by the Department of Electrical Engineering, Faculty of Information Technology, Universitas Nahdlatul Ulama Yogyakarta; and in part by the School of Electrical Engineering, Telkom University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Farmery, A.K.; Alexander, K.; Anderson, K.; Blanchard, J.L.; Carter, C.G.; Evans, K.; Fischer, M.; Fleming, A.; Frusher, S.; Fulton, E.A.; et al. Food for all: Designing sustainable and secure future seafood systems. Rev. Fish Biol. Fish. 2022, 32, 101–121.
Nisa, Z.A. The role of marine and diving authorities in workforce development in the blue economy. Front. Mar. Sci. 2022, 9, 1014645.
Oakley, K. World Oceans; Publifye AS: Oslo, Norway, 2024.
Gowda, S.G.B.; Minami, Y.; Gowda, D.; Chiba, H.; Hui, S.P. Detection and characterization of lipids in eleven species of fish by non-targeted liquid chromatography/mass spectrometry. Food Chem. 2022, 393, 133402.
Embke, H.S.; Nyboer, E.A.; Robertson, A.M.; Arlinghaus, R.; Akintola, S.L.; Atessahin, T.; Badr, L.M.; Baigun, C.; Basher, Z.; Beard, T.D., Jr.; et al. Global dataset of species-specific inland recreational fisheries harvest for consumption. Sci. Data 2022, 9, 488.
Tonachella, N.; Martini, A.; Martinoli, M.; Pulcini, D.; Romano, A.; Capoccioni, F. An affordable and easy-to-use tool for automatic fish length and weight estimation in mariculture. Sci. Rep. 2022, 12, 15642.
Salman, A.; Jalal, A.; Shafait, F.; Mian, A.; Shortis, M.; Seager, J.; Harvey, E. Fish species classification in unconstrained underwater environments based on deep learning. Limnol. Oceanogr. Methods 2016, 14, 570–585.
Taheri-Garavand, A.; Nasiri, A.; Banan, A.; Zhang, Y.D. Smart deep learning-based approach for non-destructive freshness diagnosis of common carp fish. J. Food Eng. 2020, 278, 109930.
Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621.
Sheaves, M.; Bradley, M.; Herrera, C.; Mattone, C.; Lennard, C.; Sheaves, J.; Konovalov, D.A. Optimizing video sampling for juvenile fish surveys: Using deep learning and evaluation of assumptions to produce critical fisheries parameters. Fish Fish. 2020, 21, 1259–1276.
Rathi, D.; Jain, S.; Indu, S. Underwater fish species classification using convolutional neural network and deep learning. In Proceedings of the 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR); IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
Chen, G.; Sun, P.; Shang, Y. Automatic fish classification system using deep learning. In Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI); IEEE: Piscataway, NJ, USA, 2017; pp. 24–29.
Álvarez-Ellacuría, A.; Palmer, M.; Catalán, I.A.; Lisani, J.L. Image-based, unsupervised estimation of fish size from commercial landings using deep learning. ICES J. Mar. Sci. 2020, 77, 1330–1339.
Rauf, H.T.; Lali, M.I.U.; Zahoor, S.; Shah, S.Z.H.; Rehman, A.U.; Bukhari, S.A.C. Visual features based automated identification of fish species using deep convolutional neural networks. Comput. Electron. Agric. 2019, 167, 105075.
Xu, W.; Matzner, S. Underwater fish detection using deep learning for water power applications. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI); IEEE: Piscataway, NJ, USA, 2018; pp. 313–318.
Garcia, R.; Prados, R.; Quintana, J.; Tempelaar, A.; Gracias, N.; Rosen, S.; Vågstøl, H.; Løvall, K. Automatic segmentation of fish using deep learning with application to fish size measurement. ICES J. Mar. Sci. 2019, 77, 1354–1366.
Iqbal, M.A.; Wang, Z.; Ali, Z.A.; Riaz, S. Automatic Fish Species Classification Using Deep Convolutional Neural Networks. Wirel. Pers. Commun. 2019, 16, 1043–1053.
Cui, S.; Zhou, Y.; Wang, Y.; Zhai, L. Fish Detection Using Deep Learning. Appl. Comput. Intell. Soft Comput. 2020, 2020, 3738108.
Zhao, X.; Tao, R.; Li, W.; Li, H.C.; Du, Q.; Liao, W.; Philips, W. Joint classification of hyperspectral and LiDAR data using hierarchical random walk and deep CNN architecture. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7355–7370.
Tao, R.; Zhao, X.; Li, W.; Li, H.C.; Du, Q. Hyperspectral anomaly detection by fractional Fourier entropy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4920–4929.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
Freeman, I.; Roese-Koerner, L.; Kummert, A. Effnet: An efficient structure for convolutional neural networks. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP); IEEE: Piscataway, NJ, USA, 2018; pp. 6–10.
Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 3856–3866.
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); CVPR: Los Alamitos, CA, USA, 2019; pp. 2815–2823. [Google Scholar] [CrossRef]
Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 1800–1807. [Google Scholar] [CrossRef]
Biswas, D.; Su, H.; Wang, C.; Stevanovic, A.; Wang, W. An automatic traffic density estimation using Single Shot Detection (SSD) and MobileNet-SSD. Phys. Chem. Earth Parts A/B/C 2019, 110, 176–184. [Google Scholar] [CrossRef]
Su, J.; Faraone, J.; Liu, J.; Zhao, Y.; Thomas, D.B.; Leong, P.H.; Cheung, P.Y. Redundancy-reduced MobileNet acceleration on reconfigurable logic for ImageNet classification. In Proceedings of the International Symposium on Applied Reconfigurable Computing; Springer: Berlin/Heidelberg, Germany, 2018; pp. 16–28. [Google Scholar]
Sae-Lim, W.; Wettayaprasit, W.; Aiyarak, P. Convolutional Neural Networks Using MobileNet for Skin Lesion Classification. In Proceedings of the 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE); IEEE: Piscataway, NJ, USA, 2019; pp. 242–247. [Google Scholar] [CrossRef]
Shen, Y. Accelerating CNN on FPGA: An Implementation of MobileNet on FPGA. Master’s Thesis, KTH, School of Electrical Engineering and Computer Science (EECS), Stockholm, Sweden, 2019. [Google Scholar]
Heredia, A.; Barros-Gavilanes, G. Video processing inside embedded devices using SSD-Mobilenet to count mobility actors. In Proceedings of the 2019 IEEE Colombian Conference on Applications in Computational Intelligence (ColCACI); IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar] [CrossRef]
Basri, H.; Syarif, I.; Sukaridhoto, S. Faster R-CNN Implementation Method for Multi-Fruit Detection Using Tensorflow Platform. In Proceedings of the 2018 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC); IEEE: Piscataway, NJ, USA, 2018; pp. 337–340. [Google Scholar] [CrossRef]
Hung, P.D.; Kien, N.N. SSD-Mobilenet Implementation for Classifying Fish Species. In Proceedings of the Intelligent Computing and Optimization; Vasant, P., Zelinka, I., Weber, G.W., Eds.; Springer: Cham, Switzerland, 2020; pp. 399–408. [Google Scholar]
Sanjay, N.S.; Ahmadinia, A. MobileNet-Tiny: A Deep Neural Network-Based Real-Time Object Detection for Rasberry Pi. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA); IEEE: Piscataway, NJ, USA, 2019; pp. 647–652. [Google Scholar] [CrossRef]
Huang, X.; Wu, K.; Jiang, M.; Huang, L.; Xu, J. Distributed Resource Allocation for General Energy Efficiency Maximization in Offshore Maritime Device-to-Device Communication. IEEE Wirel. Commun. Lett. 2021, 10, 1344–1348. [Google Scholar] [CrossRef]
Zhang, J.; Dai, M.; Su, Z. Task Allocation with Unmanned Surface Vehicles in Smart Ocean IoT. IEEE Internet Things J. 2020, 7, 9702–9713. [Google Scholar] [CrossRef]
Parri, L.; Parrino, S.; Peruzzi, G.; Pozzebon, A. A LoRaWAN Network Infrastructure for the Remote Monitoring of Offshore Sea Farms. In Proceedings of the 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC); IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar] [CrossRef]
Ferreira, H.; Silva, F.; Sousa, P.; Matias, B.; Faria, A.; Oliveira, J.; Almeida, J.M.; Martins, A.; Silva, E. Autonomous systems in remote areas of the ocean using BLUECOM+ communication network. In Proceedings of the OCEANS 2017—Anchorage; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar]
Baharudin, N.H.; Mansur, T.M.N.T.; Ali, R.B.; Wahab, A.A.A.; Rahman, N.A.; Ariff, E.A.R.E.; Ali, A. Mini-grid power system optimization design and economic analysis of solar powered sea water desalination plant for rural communities and emergency relief conditions. In Proceedings of the 2012 IEEE International Power Engineering and Optimization Conference Melaka, Malaysia; IEEE: Piscataway, NJ, USA, 2012; pp. 465–469. [Google Scholar] [CrossRef]
Huang, Y.; Li, Y.; Zhang, Z.; Liu, R.W. GPU-Accelerated Compression and Visualization of Large-Scale Vessel Trajectories in Maritime IoT Industries. IEEE Internet Things J. 2020, 7, 10794–10812. [Google Scholar] [CrossRef]
Kavallieratos, G.; Diamantopoulou, V.; Katsikas, S.K. Shipping 4.0: Security Requirements for the Cyber-Enabled Ship. IEEE Trans. Ind. Inform. 2020, 16, 6617–6625. [Google Scholar] [CrossRef]
Aslam, S.; Michaelides, M.P.; Herodotou, H. Internet of Ships: A Survey on Architectures, Emerging Applications, and Challenges. IEEE Internet Things J. 2020, 7, 9714–9727. [Google Scholar] [CrossRef]
Yang, T.; Chen, J.; Zhang, N. AI-Empowered Maritime Internet of Things: A Parallel-Network-Driven Approach. IEEE Netw. 2020, 34, 54–59. [Google Scholar] [CrossRef]
Winiarti, S.; Indikawati, F.; Oktaviana, A.; Yuliansyah, H. Consumable Fish Classification Using k-Nearest Neighbor. In Proceedings of the IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2020; Volume 821, p. 012039. [Google Scholar]
Alsmadi, M.K.; Almarashdeh, I. A survey on fish classification techniques. J. King Saud Univ. Comput. Inf. Sci. 2020, 34, 1625–1638. [Google Scholar] [CrossRef]
Alsmadi, M.; Omar, K.; Noah, S.; Almarashdeh, I. A hybrid memetic algorithm with back-propagation classifier for fish classification based on robust features extraction from PLGF and shape measurements. Inf. Technol. J. 2011, 10, 944–954. [Google Scholar] [CrossRef]
Jalal, A.; Salman, A.; Mian, A.; Shortis, M.; Shafait, F. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 2020, 57, 101088. [Google Scholar] [CrossRef]
van Treeck, R.; Van Wichelen, J.; Wolter, C. Fish species sensitivity classification for environmental impact assessment, conservation and restoration planning. Sci. Total Environ. 2020, 708, 135173. [Google Scholar] [CrossRef]
Courtney, M.; Cole-Fletcher, S.; Marin-Salcedo, L.; Rana, A. Errors in Length-weight Parameters at FishBase.org. Nat. Prec. 2011. [Google Scholar] [CrossRef]
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
Zhou, C.; Xu, D.; Chen, L.; Zhang, S.; Sun, C.; Yang, X.; Wang, Y. Evaluation of fish feeding intensity in aquaculture using a convolutional neural network and machine vision. Aquaculture 2019, 507, 457–465. [Google Scholar] [CrossRef]

Figure 1. The proposed M-MobileNet.

Figure 2. A detailed histogram of the class distribution.

Figure 3. The architecture design of M-MobileNet.

Figure 4. Example of final classification result: Consumable.

Figure 5. Example of final classification result: Unconsumable.

Figure 6. Activation functions considered in the study.

Figure 7. Accuracy comparison of activation functions trained with 50 epochs.

Figure 8. Derivative of the activation functions considered in the study.

Table 1. Confusion matrix terms and definitions.

Term	Definition
True Positive (TP)	Instances where the model correctly predicts the positive class.
True Negative (TN)	Instances where the model correctly predicts the negative class.
False Positive (FP)	Instances where the model incorrectly predicts the positive class (Type I error).
False Negative (FN)	Instances where the model incorrectly predicts the negative class (Type II error).
Accuracy	The overall correctness of the model.
Precision	The accuracy of positive predictions.
Recall (Sensitivity)	The ability of the model to capture all positive instances.
Specificity	The ability of the model to capture all negative instances.
$F_\beta$ Score	A metric combining precision and recall, where $\beta$ is a parameter allowing for control over the balance between precision and recall.
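
The definitions in Table 1 can be made concrete with a short sketch that derives each metric from the four confusion-matrix counts. This is a minimal illustration using the standard textbook formulas; the function names and the example counts are hypothetical and are not taken from the study.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_metrics(tp, tn, fp, fn, beta=1.0):
    """Compute the Table 1 metrics from raw confusion-matrix counts."""
    b2 = beta ** 2
    precision = _ratio(tp, tp + fp)      # accuracy of positive predictions
    recall = _ratio(tp, tp + fn)         # share of positives that were found
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),  # share of negatives that were found
        # beta > 1 weights recall more heavily, beta < 1 weights precision.
        "f_beta": _ratio((1 + b2) * precision * recall, b2 * precision + recall),
    }


# Hypothetical counts for a single class, for illustration only.
example = confusion_metrics(tp=90, tn=880, fp=10, fn=20)
```

With `beta=1.0` the last entry is the familiar F1-score, the harmonic mean of precision and recall.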

Table 2. Summary of activation functions.

Activation Function	Advantages	Disadvantages
ReLU	Simplicity	Suffers from the “dying ReLU” problem
Sigmoid	Output range (0 to 1)	Vanishing gradient problem
Tanh	Output range (−1 to 1)	Vanishing gradient problem
Swish	Smoother than ReLU	Could be computationally costly
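
As a hedged illustration of the trade-offs in Table 2, the sketch below defines the four activation functions and their derivatives using only the Python standard library. Swish is written as x · sigmoid(x), that is, with its scaling parameter fixed to 1; this is an assumption, since the table does not state which variant the study uses.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function with outputs in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    return max(0.0, x)


def tanh(x):
    return math.tanh(x)


def swish(x):
    """Swish with scaling parameter 1 (assumed): x * sigmoid(x)."""
    return x * sigmoid(x)


def d_relu(x):
    # Zero gradient for all negative inputs: the source of "dying ReLU".
    return 1.0 if x > 0 else 0.0


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, shrinking towards 0 for large |x|


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # also shrinks towards 0 for large |x|


def d_swish(x):
    s = sigmoid(x)
    return s + x * s * (1.0 - s)  # small but non-zero for moderate negative x
```

Evaluating `d_relu(-1.0)` and `d_swish(-1.0)` shows the difference the table alludes to: the first is exactly zero, while the second stays non-zero, at the cost of an extra exponential per evaluation.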

Table 3. Classification metrics for M-MobileNet and the benchmark models.

	Micro-Averages				Macro-Averages
Architecture	Precision	Recall	F1-Score	Specificity	Precision	Recall	F1-Score	Specificity	Accuracy (%)
VGG16	0.949	0.949	0.949	0.999	0.944	0.923	0.933	0.999	95
ResNet50	0.932	0.931	0.931	0.999	0.929	0.928	0.929	0.999	93
MobileNetV2	0.936	0.936	0.936	0.999	0.885	0.910	0.897	0.999	93
MobileNets	0.948	0.948	0.948	0.999	0.939	0.945	0.942	0.999	94
EffNet	0.947	0.947	0.947	0.999	0.912	0.919	0.915	0.999	94
CapsNet	0.822	0.822	0.822	0.998	0.802	0.787	0.794	0.998	82
M-MobileNet (Proposed)	0.978	0.978	0.978	0.999	0.942	0.960	0.951	0.999	97
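
The difference between the micro- and macro-averaged columns of Table 3 can be sketched as follows. This is an illustrative implementation of the standard definitions, not the authors' evaluation code; the function name and the 3-class confusion matrix are hypothetical. The macro F1-score is taken here as the harmonic mean of macro precision and macro recall, which appears consistent with the values reported in the table.

```python
def micro_macro(cm):
    """Micro/macro precision and recall from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    tp = [cm[k][k] for k in range(n)]
    fp = [sum(cm[i][k] for i in range(n)) - cm[k][k] for k in range(n)]
    fn = [sum(cm[k]) - cm[k][k] for k in range(n)]

    def ratio(num, den):
        return num / den if den else 0.0

    # Micro: pool the counts over all classes, then take one ratio.
    micro_p = ratio(sum(tp), sum(tp) + sum(fp))
    micro_r = ratio(sum(tp), sum(tp) + sum(fn))
    # Macro: take one ratio per class, then average with equal class weight.
    macro_p = sum(ratio(tp[k], tp[k] + fp[k]) for k in range(n)) / n
    macro_r = sum(ratio(tp[k], tp[k] + fn[k]) for k in range(n)) / n
    macro_f1 = ratio(2 * macro_p * macro_r, macro_p + macro_r)
    accuracy = ratio(sum(tp), sum(sum(row) for row in cm))
    return {
        "micro_precision": micro_p, "micro_recall": micro_r,
        "macro_precision": macro_p, "macro_recall": macro_r,
        "macro_f1": macro_f1, "accuracy": accuracy,
    }


# Hypothetical 3-class confusion matrix with one dominant class.
result = micro_macro([[50, 2, 0], [3, 20, 1], [0, 1, 5]])
```

In single-label classification every false positive for one class is a false negative for another, so micro precision, micro recall, and accuracy coincide, which is why the three micro columns are nearly identical in each row. Macro averages weight rare classes equally and therefore tend to be lower on imbalanced data.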

Table 4. Comparison of GPU metrics for M-MobileNet and the benchmark models.

Architecture	GPU Utility (Average in %)	Memory Utility (Average in %)	Memory Usage (Average in MB)
VGG16	97.999	65.414	3995.168
ResNet50	84.529	56.154	3996.651
MobileNetV2	42.434	15.879	3991
MobileNets	43.675	14.986	3816.324
EffNet	52.076	23.253	4007.326
CapsNet	98.982	68.238	3988.982
M-MobileNet (Proposed)	42.959	14.567	4020

Note: Reported memory usage values reflect the TensorFlow framework’s VRAM pre-allocation behavior on the GTX 860M hardware. The actual model runtime memory footprint is significantly lower, as indicated by the physical disk size (∼14.4 MB) reported in Table 7.
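
Averages such as those in Table 4 can be obtained by sampling the GPU at intervals during training and taking the mean of each column. The sketch below assumes the samples come from `nvidia-smi` in CSV mode; the source does not state which tool was used, so the query, the helper names, and the sample values are illustrative assumptions.

```python
import subprocess
from statistics import mean

# nvidia-smi query returning "gpu util %, memory util %, memory used MiB".
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,utilization.memory,memory.used",
    "--format=csv,noheader,nounits",
]


def poll_once():
    """Return one raw CSV sample for the first GPU (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()[0]


def parse_sample(line):
    """Parse a line such as '42, 15, 4020' into three floats."""
    gpu_util, mem_util, mem_used = (float(v) for v in line.split(","))
    return gpu_util, mem_util, mem_used


def average_samples(lines):
    """Average each column over all non-empty samples."""
    samples = [parse_sample(line) for line in lines if line.strip()]
    gpu, mem, used = zip(*samples)
    return {
        "gpu_util_pct": mean(gpu),
        "mem_util_pct": mean(mem),
        "mem_used_mb": mean(used),
    }


# Hypothetical samples, standing in for repeated poll_once() calls.
stats = average_samples(["42, 15, 4020", "44, 14, 4020"])
```

Because the memory-used column reports what the framework has reserved rather than what the model needs, it stays close to the card's capacity for every architecture when pre-allocation is enabled.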

Table 5. Ablation study isolating the individual contributions of parameter reduction and swish activation on model performance. The proposed M-MobileNet combines both modifications to achieve optimal accuracy with minimal parameter count.

Model Configuration	Parameters (Millions)	Activation Function	Accuracy (%)	F1-Score (Macro)
1. Standard MobileNet (Baseline)	∼4.25	ReLU	94.0	0.942
2. MobileNet w/Reduced Params	∼3.76	ReLU	93.5	0.932
3. MobileNet w/Swish Only	∼4.25	Swish	95.5	0.948
4. M-MobileNet (Proposed)	∼3.76	Swish	97.0	0.951

Table 6. Performance breakdown by class frequency. The high F1-score (0.948) in the ‘tail’ bin confirms that the model’s high overall performance is not driven solely by over-represented commercial species.

Frequency Bin	Img Count (Range)	Number of Species (Classes)	Avg. F1-Score (Macro)	Avg. Accuracy (%)
Head (High)	>500	12	0.992	99.1
Body (Med)	50–500	95	0.975	97.8
Tail (Low)	<50	560	0.948	96.4
Overall	All	667	0.951	97.0
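
The binning behind Table 6 can be sketched as below: each species is assigned to a head, body, or tail bin by its image count, and per-class F1-scores are averaged within each bin. The thresholds follow the table (more than 500, 50 to 500, fewer than 50); the function names, species labels, counts, and scores are hypothetical placeholders.

```python
from collections import defaultdict


def frequency_bin(image_count):
    """Map a per-species image count to the Table 6 frequency bin."""
    if image_count > 500:
        return "head"
    if image_count >= 50:
        return "body"
    return "tail"


def mean_f1_per_bin(image_counts, f1_scores):
    """Average per-class F1-scores within each frequency bin."""
    grouped = defaultdict(list)
    for species, count in image_counts.items():
        grouped[frequency_bin(count)].append(f1_scores[species])
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


# Hypothetical per-species image counts and F1-scores, for illustration only.
counts = {"species_a": 1200, "species_b": 300, "species_c": 60, "species_d": 12}
scores = {"species_a": 0.99, "species_b": 0.98, "species_c": 0.96, "species_d": 0.90}
summary = mean_f1_per_bin(counts, scores)
```

Reporting the tail bin separately matters because a macro average over many rare classes can hide weak performance when only the aggregate is shown.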

Table 7. Detailed comparison of M-MobileNet with baseline models across accuracy, physical size, complexity, and resource utilization.

Model	Accuracy (%)	Parameters (Millions)	Disk Size (MB)	Approx. FLOPs (G)	GPU Utility (%)
VGG16	95.0	∼138.3	∼528.0	∼15.5	98.0
ResNet50	93.0	∼25.6	∼98.0	∼3.9	84.5
MobileNetV2	93.0	∼3.5	∼14.0	∼0.3	42.4
MobileNet (Std)	94.0	∼4.25	∼17.0	∼0.57	43.7
EffNet	94.0	∼5.3	∼21.0	∼0.39	52.1
CapsNet	82.0	∼6.8	∼27.0	-	98.9
M-MobileNet	97.0	∼3.76	∼14.4	<0.57	43.0

Note: Disk size is estimated based on 32-bit floating-point precision. GPU utility is measured directly on the GTX 860M hardware to simulate edge load.
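
The disk-size estimate mentioned in the note can be illustrated with simple arithmetic: at 32-bit precision each parameter occupies 4 bytes. The sketch below is a rough estimate under that assumption only; it ignores file-format overhead, and the source does not state whether decimal megabytes or binary mebibytes are meant, so both are returned. The function names are hypothetical.

```python
BYTES_PER_FP32 = 4  # one 32-bit floating-point weight


def fp32_size(params_millions):
    """Estimate weight storage for a model with the given parameter count.

    Returns (decimal megabytes, binary mebibytes).
    """
    n_bytes = params_millions * 1e6 * BYTES_PER_FP32
    return n_bytes / 1e6, n_bytes / 2**20


def relative_reduction(baseline, reduced):
    """Fractional reduction from baseline to reduced."""
    return (baseline - reduced) / baseline


# Parameter counts taken from Table 7 (in millions).
mb, mib = fp32_size(3.76)                   # M-MobileNet
saving = relative_reduction(4.25, 3.76)     # versus standard MobileNet
```

For the ∼3.76 million parameters of M-MobileNet this gives about 15.0 MB, or about 14.3 MiB, in the same range as the ∼14.4 MB reported in the table. The total parameter count is about 11.5% lower than that of the standard MobileNet.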

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Satrya, G.B.; Kurniawan, F.; Budiman, G.; Pristisahida, A.O.; Moesdradjad, B.K.P.; Ramatryana, I.N.A.; Choutri, S.E. Sustainable Maritime Applications with Lightweight Classifier Using Modified MobileNet. Technologies 2026, 14, 161. https://doi.org/10.3390/technologies14030161
