1.1. Context and Motivation
Since ancient times, the maritime domain has been vital as a hub for communication, travel, and trade, driving cultural exchange, economic growth, and the global movement of goods and people [1]. Today, it plays an even more crucial role in global trade, security, and environmental sustainability. Over 80% of global trade by volume is now transported via maritime routes, supported by a fleet of over 50,000 merchant vessels. In 2022 alone, container carriers earned an estimated $296.3 billion, a 38% increase from 2021. In 2023, maritime trade rebounded with a 2.4% increase, reaching approximately 12.3 billion tonnes, while the global fleet grew by 3.4%, totalling around 2.4 billion deadweight tons of carrying capacity [2].
Despite these benefits, the scale of maritime activity brings major challenges. According to the 2022 United Nations Conference on Trade and Development (UNCTAD) report, the number of container ship port calls surged, particularly in Asia, which handled 63% of global container trade, creating renewed congestion. By mid-2024, average waiting times at key ports had nearly doubled, with some, like Singapore, seeing waits approach 10 days [2]. Between 2014 and 2023, European waters saw 26,595 maritime incidents, averaging about 2,660 annually, primarily involving cargo and passenger ships. These led to hundreds of injuries and over 600 pollution events per year. Similarly, Canada averaged 289 maritime accidents annually from 2010 to 2019, many causing fatalities and environmental damage [3,4].
To address these and other challenges, public and private maritime entities are advancing the digital transformation of the sector. This transformation includes adopting advanced technologies such as artificial intelligence (AI), the Internet of Things (IoT), Blockchain, and Big Data analytics [5,6]. These innovations aim to improve operational efficiency, security, and environmental compliance by applying modern solutions to long-standing challenges. Data is central to this transformation. A key example is Automatic Identification System (AIS) data, collected in real time via ground stations, vessel transmitters, and satellites. It enables the continuous exchange of vessel status and location, improving maritime safety [7]. UNCTAD’s 2024 report prioritised the use of real-time digital platforms based on AIS data [2]. However, working with AIS data is challenging due to its massive and continuous volumes, which makes processing and storage complex and costly [7]. Privacy and copyright-related challenges are also critical when using AIS data. Often owned by entities unwilling to share it for business or confidentiality reasons, AIS data remains largely inaccessible, hindering efforts to tackle maritime challenges [8].
An effective approach should simultaneously limit the amount of AIS data to be collected, processed, and stored for digital applications, such as AI-based solutions, while preserving data privacy. This means avoiding the explicit sharing of critical vessel information like location, status, or characteristics. Since most maritime challenges occur in known spatial areas (e.g., ports, canals) and during specific times (e.g., seasonal peaks, rush hours) when vessel traffic density (VTD) is high, the solution should identify global VTD hotspots. Digital tools could then be applied selectively over limited spatial and temporal ranges, filtering only the necessary AIS data. Additionally, the approach must incorporate privacy-preserving strategies to avoid exposing proprietary or sensitive information.
This paper proposes a privacy-preserving, performance-driven approach for Vessel Traffic Density (VTD) prediction that offers a global view of traffic patterns across wide spatiotemporal intervals. It preserves data privacy for both data owners and individual vessels while reducing high-performance hardware requirements by distributing model training across multiple local deployments. The approach integrates proprietary AIS datasets with standardized VTD calculation methods, particularly the European Maritime Observation and Data Network’s (EMODNet) VTD calculation method [9], Machine Learning (ML), Deep Learning (DL) [10] and Federated Learning (FL) approaches [11] to generate accurate VTD forecasts while safeguarding AIS data privacy.
1.3. Proposed Solution
Predicting VTD using AIS data is essential for addressing maritime challenges. Accurate predictions enhance safety by identifying collision hotspots or helping vessels avoid congestion. Port authorities can improve logistical planning, while environmental agencies can monitor high-traffic zones to manage pollution. Security forces can target surveillance in high-risk areas to strengthen maritime security. AIS datasets are characterized by their large volume and privacy limitations. Nearly every vessel transmits AIS data at short intervals, typically tens to hundreds of seconds, making AIS a true Big Data source with significant processing and analysis challenges. Additionally, AIS enables near real-time vessel tracking, raising serious privacy concerns. These datasets are often proprietary, owned by public or private entities that collect data from specific fleets or geographic regions.
To tackle these limitations, the proposed solution combines several strategies. To manage the Big Data nature of AIS, the approach trains VTD predictive models on vessel traffic densities instead of raw AIS records. While density calculation is somewhat processing-intensive, it significantly reduces data volumes and simplifies model training. Both density calculation and model training occur in a distributed environment, where each local data provider trains a model on its own hardware using a smaller local dataset. These models are then aggregated via an FL architecture: a central server receives model weights from clients and federates them into a global model. This global model helps identify high-density traffic regions, which are more likely to experience collisions, delays, or bottlenecks. Recognizing these areas enables targeted training of other predictive models, such as collision detection or avoidance systems, using only data from dense traffic regions and periods, reducing the need to train models on global-scale datasets.
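The federation step described above can be illustrated with a minimal sketch. It assumes the server combines client weights by a sample-count-weighted average (in the style of federated averaging); the text does not state which aggregation rule is used, so this rule, the `federate` function, and the flat `Weights` layout are illustrative assumptions rather than the paper's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: layer name -> flattened list of parameters.
Weights = Dict[str, List[float]]


def federate(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine client weights into global weights.

    Assumption: a weighted average, where each client's contribution is
    proportional to the number of training samples it reports.
    """
    if not client_updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    reference = client_updates[0][0]
    global_weights: Weights = {}
    for layer, ref_values in reference.items():
        averaged = [0.0] * len(ref_values)
        for weights, n_samples in client_updates:
            values = weights[layer]
            if len(values) != len(averaged):
                raise ValueError(f"shape mismatch in layer {layer!r}")
            share = n_samples / total_samples
            for i, value in enumerate(values):
                averaged[i] += value * share
        global_weights[layer] = averaged
    return global_weights
```

Only the weights and a sample count leave each provider in this sketch; neither raw AIS records nor density maps are sent to the server.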
To protect AIS data privacy, the approach applies the following privacy-enhancing methods:
EMODNet’s density calculation method is used to transform AIS data into grid-based density maps, representing vessel traffic flows and densities while anonymizing individual vessel tracks and routes.
Density maps are then used to train local prediction models for each AIS dataset. Since the models reflect the underlying data without directly exposing it, they can be shared without disclosing explicit AIS information.
FL is applied to generate a global VTD prediction model by aggregating local models, without accessing proprietary or sensitive AIS data.
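The first of these methods, turning AIS positions into a grid-based density map, can be sketched in simplified form. The example below is not the EMODNet procedure itself [9]; it only counts position reports per latitude/longitude cell, and the `density_map` function and the `cell_deg` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter
from math import floor
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (latitude index, longitude index)


def density_map(
    positions: Iterable[Tuple[float, float]],
    cell_deg: float = 0.1,
) -> Dict[Cell, int]:
    """Count AIS position reports per grid cell.

    Simplifying assumption: density is a plain count of reports per cell.
    The input carries no vessel identifier, so the output contains only
    aggregated counts and no individual track.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    counts: Counter = Counter()
    for lat, lon in positions:
        # Skip reports with out-of-range coordinates.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[cell] += 1
    return dict(counts)
```

Because only per-cell counts are retained, a map of this kind is far smaller than the raw AIS stream and no longer exposes which vessel was where, which is the property the list above relies on.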
The proposed solution implements a lifelong learning cycle. Figure 1a shows five hypothetical AIS data providers, each continuously supplying AIS data from different regions. In the first cycle, each provider uses recent data (e.g., from the previous month) to calculate vessel traffic densities and train local prediction models. Figure 1b describes the steps of the lifelong learning cycle in a flowchart. Local model weights are sent to a central server and federated into a global model. The resulting global model weights are then sent back to the providers to update the local models. In subsequent cycles, providers continue updating local models with new data and density calculations, sending updated weights to the central server, which refines the global model and redistributes the new federated weights. The integration of EMODNet’s VTD method, CNN-based local training, and model federation via FL represents a novel and disruptive approach in the current literature.
Redistributing the global weights in this way keeps all local models synchronized with the global model and allows the most accurate one to be selected. The choice between local and global model weights is based on accuracy metrics: if a local model achieves better accuracy than the global model, its weights are retained; otherwise, the global model’s weights are adopted for predicting local vessel traffic densities. This selection scheme also applies to global model versions. If a newer version outperforms its best predecessor in accuracy, its weights are sent back to the providers. If the predecessor performs better (for instance, because the newer model overfits), its weights are used instead. This ensures that providers always have access to the most accurate density predictions.
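The selection scheme can be summarized in a short sketch. The `ModelVersion` record and both function names are hypothetical, accuracy is assumed to be a single score where higher is better, and the handling of ties (keeping the global model or the predecessor) is an assumption, since the text only covers the strictly better cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    name: str
    accuracy: float  # assumed: a single score, higher is better


def select_for_provider(local: ModelVersion, global_model: ModelVersion) -> ModelVersion:
    """Keep the local weights only if they beat the global model."""
    # Assumption: on a tie, the global model's weights are adopted.
    return local if local.accuracy > global_model.accuracy else global_model


def select_global(
    best_previous: Optional[ModelVersion],
    candidate: ModelVersion,
) -> ModelVersion:
    """Choose which global version is sent back to the providers."""
    # Assumption: on a tie, the best predecessor is kept.
    if best_previous is None or candidate.accuracy > best_previous.accuracy:
        return candidate
    return best_previous
```

For example, a newly federated version that scores below its best predecessor is not distributed; the providers continue to receive the predecessor's weights.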
Section 1 outlined the research context and key challenges addressed and introduced the proposed solution, along with a review of the relevant state-of-the-art. Section 2 details the materials and methods, including AIS datasets, development steps, and methods for VTD calculation, local model training, and global model federation. Section 3 presents the evaluation results of model training and federation. Finally, Section 4 discusses the outcomes, explores future research directions, and concludes the study.