1. Introduction
The International Maritime Organization (IMO) introduced the Maritime Service Portfolio (MSP) and the Maritime Single Window (MSW) to enhance maritime transportation safety and efficiency through digitalization. The MSP defines 16 maritime services, including maritime safety information services, while the MSW provides a unified digital platform where ships can submit and process all necessary information for port arrival, stay, and departure through automated data exchange without human intervention [1]. The development of Maritime Autonomous Surface Ships (MASS) and operations with reduced crew complements require automated processing capabilities that can operate without continuous human oversight, making the MSW’s human-intervention-free capabilities essential for supporting these autonomous maritime systems. Therefore, maritime safety information services need to be integrated into the MSW’s automated data exchange framework to ensure comprehensive operational support [2]. NAVTEX, which provides navigational warnings, meteorological forecasts, and search and rescue coordination messages that directly impact vessel safety and route planning decisions, is a critical component requiring integration. The IMO requires commercial vessels to install NAVTEX systems as a mandatory component of the Global Maritime Distress and Safety System (GMDSS) [3]. NAVTEX operates through medium frequency radio transmission, with each message following a standardized structure that ensures consistent formatting of critical safety information [4].
However, NAVTEX systems face transmission challenges that compromise automated integration requirements. NAVTEX incorporates forward error correction (FEC) mechanisms to detect and correct transmission errors [5]. When errors exceed the system’s correction capability, affected characters become unrecoverable and are marked with asterisk characters (‘*’) in received messages. These corrupted positions traditionally require manual interpretation by navigation officers, who must deduce the intended content from surrounding context based on their experience with regional terminology. IMO standards specify that NAVTEX messages with error rates exceeding 4% should be discarded to prevent unreliable information transmission, resulting in loss of safety information [6,7]. This manual interpretation process is fundamentally incompatible with the MSW’s automated requirements, which depend on reliable, human-intervention-free data exchange.
To address this incompatibility and support automated MSW integration, this paper introduces masked language modeling (MLM) using Transformer encoder architectures for automated NAVTEX message restoration. MLM employs bidirectional context processing to predict missing tokens, utilizing both preceding and following context simultaneously [8,9]. This approach treats asterisk characters as masked tokens, enabling automatic restoration of corrupted content without human intervention. We evaluated MLM restoration performance against statistical methods across character error rates ranging from 1% to 33% using 69,658 NAVTEX messages. The evaluation framework assesses computational efficiency and restoration rates to determine MLM viability as a software-based enhancement for existing NAVTEX receiver infrastructure, supporting automated MSW integration requirements.
2. Background
2.1. Navigational Telex and Built-In Error Restoration
NAVTEX messages begin with a four-character header “ZCZC”, include a four-character message identifier and message body, and end with a four-character footer “NNNN” [4]. ITU-R Recommendation M.476-5 requires NAVTEX receivers to display asterisk symbols for corrupted characters that cannot be recovered [5]. The ETSI EN 300065 standard and the IMO MSC.148(77) performance standard specify that messages are correctly received when character error rates remain below 4%, while printing or displaying shall be inhibited when error rates exceed 33% for more than 5 s [6,7]. NAVTEX systems use FEC techniques based on the SITOR-B protocol specified in ITU-R Recommendation M.476-5 [5]. The FEC mechanism transmits each character twice at a 280-millisecond interval and compares both transmissions to detect and correct single-bit errors [5]. The receiver compares corresponding bits from both transmissions and selects the correct character when one transmission contains errors while the other remains intact.
However, the FEC mechanism cannot restore characters when transmission errors affect both transmissions. Figure 1 illustrates a failure case where interference corrupts both the initial transmission and the 280-millisecond retransmission of character ‘E’. When both received signals are corrupted, the receiver cannot determine the correct character through comparison. This results in restoration failure, with the detected but unrecoverable character marked by an asterisk as specified in ITU-R M.476-5 [5].
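The selection logic in this failure case can be sketched as follows. This is a minimal illustration under simplifying assumptions: the validity check (exactly four of seven bits set) stands in for the receiver's error detection, and `TOY_CODEBOOK`, `is_valid_codeword`, and `fec_select` are hypothetical names with made-up codewords, not the real character table.

```python
# Toy codebook for illustration only; these are NOT the real CCIR 476 codewords.
TOY_CODEBOOK = {0b0001111: "E", 0b0010111: "A", 0b0011011: "T"}


def is_valid_codeword(word: int) -> bool:
    """Assumed check: a 7-bit word is plausible only if exactly four bits are set."""
    return 0 <= word < 128 and bin(word).count("1") == 4


def fec_select(first: int, repeat: int) -> str:
    """Return the character of the first copy that passes the check, else '*'."""
    for word in (first, repeat):
        if is_valid_codeword(word) and word in TOY_CODEBOOK:
            return TOY_CODEBOOK[word]
    return "*"  # both copies corrupted: unrecoverable character


clean_e = 0b0001111
print(fec_select(clean_e, clean_e))      # both copies intact
print(fec_select(0b0001110, clean_e))    # first copy corrupted, repeat intact
print(fec_select(0b0001110, 0b1001111))  # both copies corrupted
```

When both copies pass the check but disagree, this sketch simply prefers the first one; a real receiver may resolve that case differently.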
SITOR-B FEC fails when multi-bit errors occur in both transmissions, when signal fading affects both time-diverse transmissions, or when continuous interference persists longer than the 280-millisecond diversity interval. When FEC detects errors but cannot restore characters, these positions are marked with asterisk symbols according to ITU-R specifications [5]. The 4% and 33% error-rate thresholds described above ensure that only reliable maritime safety information reaches navigation officers while preventing display of potentially misleading corrupted messages.
Figure 2 shows a corrupted NAVTEX navigational warning message where transmission errors replaced multiple characters with asterisks, compromising the readability and usability of the maritime safety information (MSI).
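As a rough illustration of how the 4% and 33% thresholds relate to asterisk counts, the sketch below computes a per-message character error rate. It is a simplification: the rate is taken over the whole message string, the 5 s duration condition is ignored, and the function names and the "degraded" label are assumptions introduced here.

```python
def character_error_rate(message: str) -> float:
    """Fraction of characters in the message that are asterisks."""
    if not message:
        return 0.0
    return message.count("*") / len(message)


def classify_reception(message: str, ok_below: float = 0.04,
                       inhibit_above: float = 0.33) -> str:
    """Map the error rate onto the two thresholds quoted in Section 2.1."""
    rate = character_error_rate(message)
    if rate < ok_below:
        return "correctly received"
    if rate > inhibit_above:
        return "inhibit output"
    return "degraded"  # hypothetical label for the range in between


print(classify_reception("GALE WARNING"))
print(classify_reception("G*LE W*RN*NG"))
print(classify_reception("****"))
```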
2.2. Statistical Methods for Text Restoration
Text restoration has been addressed through computational methods with distinct theoretical foundations. This section reviews statistical methods applied to character-level text restoration tasks, providing conceptual background for comparative evaluation in NAVTEX message restoration.
The dictionary-matching method is a pattern-matching approach that uses pre-built vocabularies to identify and replace corrupted words [10]. Such methods are commonly applied to various text corruption scenarios, including optical character recognition (OCR) errors, typographical mistakes, and transmission-induced character substitutions. This discussion focuses on the restoration process and excludes the error detection phase that identifies which words require correction. The method compares corrupted words against reference vocabularies to identify the most probable intended words, combining similarity constraints and frequency data in two main steps. First, similarity matching evaluates all vocabulary candidates using edit distance calculations to identify words with minimal character-level differences from the corrupted input, typically selecting candidates below a predefined distance threshold [11]. Second, when multiple candidates satisfy the similarity criteria, frequency-based selection determines the most probable restoration using statistical word frequency information derived from training corpora.
Figure 3 illustrates this restoration process using a typical typo correction example, showing how corrupted input “mta” undergoes evaluation through edit distance calculation and frequency-based selection to produce the restoration output “mat”.
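The two steps can be sketched in a few lines. This is an illustrative sketch, not the implementation evaluated in this paper: the vocabulary, its frequencies, and the distance threshold of 2 are made-up values, and plain Levenshtein distance is assumed as the edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def restore_word(word: str, vocab_freq: dict, max_distance: int = 2) -> str:
    """Step 1: keep candidates within the threshold. Step 2: pick the most frequent."""
    candidates = [w for w in vocab_freq if edit_distance(word, w) <= max_distance]
    if not candidates:
        return word  # nothing close enough; leave the input unchanged
    return max(candidates, key=vocab_freq.get)


# Hypothetical vocabulary with made-up frequencies.
VOCAB = {"mat": 120, "meta": 40, "map": 60, "ship": 300}
print(restore_word("mta", VOCAB))
```

Note that "ship" has the highest frequency but is filtered out by the distance threshold before the frequency step is reached.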
N-gram methods provide probabilistic frameworks for predicting characters based on preceding character sequences [12]. These methods analyze statistical patterns in training data to determine character co-occurrence frequencies [13]. The fundamental principle involves calculating the probability of a character occurring given specific preceding characters, as expressed in Equation (1).
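Written out under assumed notation, where c_i is the character at position i, n is the model order, and C(·) counts occurrences of a character sequence in the training data, the conditional probability takes the standard maximum-likelihood form:

```latex
P(c_i \mid c_{i-n+1}, \dots, c_{i-1})
  = \frac{C(c_{i-n+1}, \dots, c_{i-1}, c_i)}{C(c_{i-n+1}, \dots, c_{i-1})}
\tag{1}
```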
In Equation (1), the numerator represents the frequency of the complete character sequence in training data, while the denominator represents the frequency of the preceding context. For character restoration, each corrupted position is examined and probabilities are calculated for every candidate replacement character. The character with the maximum probability is selected as the restoration candidate. Standard implementations include the tri-gram (n = 3) model that considers two preceding characters, the four-gram (n = 4) model that considers three preceding characters, and the five-gram (n = 5) model that considers four preceding characters, with longer contexts providing additional information while requiring larger training data [14].
Both methods exhibit limitations. The dictionary-matching model depends on vocabulary coverage and encounters difficulties with Out-of-Vocabulary (OOV) words, proper nouns, and technical terminology not represented in training data [15]. Its word-level scope constrains the ability to utilize semantic context beyond individual word boundaries. N-gram methods are limited by fixed context windows and dependence on observed training patterns [16]. Character sequences not encountered during training have no reliable probability estimate, while fixed context windows prevent utilization of long-range dependencies and semantic relationships that extend beyond immediate character neighborhoods. The trade-off between context window size and statistical reliability constitutes an important consideration, as larger n-values can capture more complex dependencies but encounter data sparsity problems in specialized domains [17].
2.3. Masked Language Modeling and Transformer Encoder
MLM is a natural language processing (NLP) training method that enables models to learn contextual representations by predicting masked tokens in input sequences [8]. Unlike traditional autoregressive language models that predict tokens sequentially, MLM uses both preceding and following context to make predictions, providing bidirectional processing of textual relationships [18]. This bidirectional capability makes MLM applicable to text restoration tasks where the surrounding context provides information for predicting missing elements [9].
MLM operates by randomly masking a percentage of input tokens during training, typically 15% of the total tokens, and training the model to predict the original tokens at these masked positions.
Figure 4 illustrates the standard MLM process, where tokens in a sentence are randomly replaced with <MASK> tokens, and the model learns to predict the original words based on the bidirectional context [
8]. This training method enables models to develop understanding of linguistic patterns, semantic relationships, and contextual dependencies within text sequences. MLM’s self-supervised characteristic enables learning from unlabeled text data, making it effective for text completion, error correction, and information extraction tasks.
The Transformer encoder architecture provides the foundation for modern MLM implementations [8]. As shown in Figure 5, the encoder processes input sequences through parallel computation rather than sequential processing, handling all positions within a message at once [19]. This architecture captures long-range dependencies and complex contextual relationships that are important for accurate character restoration in structured maritime messages.
The self-attention mechanism forms the core component enabling bidirectional context processing [19]. This mechanism allows each position to attend to all other positions simultaneously, computing attention weights that determine the relevance of different parts of the sequence for making predictions at each position. The mathematical foundation is expressed in Equation (2).
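In the standard scaled dot-product formulation, which the surrounding description appears to follow, the attention output is computed as:

```latex
\mathrm{Attention}(Q, K, V)
  = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\tag{2}
```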
Q, K, and V represent the query, key, and value matrices, respectively. The variable d_k denotes the dimension of the key vectors. The scaling factor 1/√d_k prevents the dot products from becoming too large, which could lead to extremely small gradients during training.
For NAVTEX restoration, this mechanism considers both local character patterns and global structural patterns. The bidirectional processing capability allows restoration decisions to incorporate context from multiple directions, including message headers, coordinate patterns, and maritime terminology that may appear at distant positions within the sequence. This contextual understanding may provide advantages over statistical methods that rely on limited local context windows.
2.4. Literature Review
Prior research on NAVTEX systems focused on communication error mitigation, data classification, and transmission management improvements. However, research specifically addressing character-level restoration of corrupted NAVTEX messages using advanced natural language processing techniques remains limited, particularly with recent transformer-based and diffusion model approaches.
NAVTEX research has concentrated on signal quality enhancement at the transmission stage. Yao et al. proposed a transmitter integrating NAVTEX and weather facsimile charts with improved signal quality through filter applications; however, this work did not address reception-stage error restoration [20]. In subsequent studies, maritime communication research has begun incorporating NLP techniques for NAVTEX message analysis. Sun et al. proposed a modified TF-IDF method for NAVTEX alert classification and subsequently applied deep learning-based classification models to achieve high accuracy in processing navigational alerts and weather information [21,22]. Lee et al. advanced this method by using Bi-LSTM CRF models to classify words in NAVTEX messages by semantic units, establishing foundations for automated interpretation in ECDIS and autonomous vessel environments [23].
Recent advances in transformer-based error correction provide highly applicable methodologies for maritime text restoration. Park et al. published the Multiple-Masks Error Correction Code Transformer (MM-ECCT), which employs parallel masked self-attention blocks with different mask matrices to learn diverse relationships among codeword bits [
24]. The U-Shaped Error Correction Code Transformer (VU-ECCT) by Nguyen et al. combines U-Net inspired architectures with variational autoencoders for enhanced error correction through residual information and mutual information extraction [
25]. These approaches demonstrate significant improvements in error correction capabilities that could be adapted for NAVTEX message restoration. Text restoration for incomplete or corrupted sequences has seen substantial progress. Bakare et al. demonstrated BERT-base, BERT-large, and RoBERTa models for restoring punctuation in social media texts, achieving effective restoration of improperly punctuated text [
26]. This approach to handling incomplete text formatting has direct applications for maritime communications, where messages often lack proper formatting due to transmission constraints. The DiffusER by Reid et al. parameterizes generation steps as text editing operations including insertion, deletion, and editing rather than traditional noise, enabling conditioning on prototypes and iterative revision particularly useful for correcting corrupted messages [
27].
Text restoration using MLM method has demonstrated effectiveness across various domains beyond maritime communications. Assael et al. developed Ithaca, a Transformer-based model for restoring missing characters in ancient Greek inscriptions, achieving 62% accuracy in text restoration and improving historians’ reconstruction capabilities when used collaboratively [
28]. Similarly, Lazar et al. applied MLM to complete missing text in Akkadian cuneiform tablets, achieving 89% Hit@5 accuracy despite limited training data by leveraging models trained on multiple languages [
29]. OCR error correction represents another relevant application area for MLM-based restoration. Kundaikar et al. demonstrated that MLM using BERT could effectively correct Hindi OCR errors by leveraging contextual information, improving word accuracy from 89% to 92.6% [
30]. This contextual approach to character-level error correction shares conceptual similarities with NAVTEX message restoration challenges, where surrounding maritime terminology and message structure provide valuable context for predicting corrupted characters. Miller et al. explored the integration of Large Language Models (LLMs) into maritime safety operations, demonstrating their potential for enhancing multilingual communication among international crews, automating reporting and documentation, and supporting real-time risk assessment and decision-making processes. Their comprehensive review highlighted the transformative potential of AI-based approaches for improving maritime safety through enhanced communication, reduced human error, and streamlined compliance processes [
31].
Although progress has been made in related domains, research on automatically restoring character-level errors in NAVTEX messages using MLM methods or recent transformer architectures is lacking. While existing studies have made contributions to NAVTEX message classification and general text restoration tasks, they have not specifically addressed the challenge of restoring asterisk characters that result from FEC mechanism failures in maritime communication systems. Recent advances in error correction transformers and diffusion-based text restoration models from 2022 to 2024 provide promising methodologies that have not yet been applied to NAVTEX restoration. This research gap motivates our study, which evaluates MLM for NAVTEX character restoration by treating asterisk characters as masked tokens. By comparing MLM performance with statistical language models including n-gram models and a dictionary-matching model under various error rate conditions, this paper investigates whether a restoration method using MLM provides advantages for enhancing maritime communication reliability where FEC mechanisms prove insufficient.
3. Methodology
This paper evaluates MLM effectiveness for NAVTEX message character restoration through an experimental framework. The experimental procedure is organized into four phases, as illustrated in Figure 6. Phase 1, Data Collection and Preprocessing, involves automated NAVTEX message collection, data cleansing operations, and character masking to simulate transmission errors at error rates ranging from 1% to 33%. The collected data undergoes filtering and preprocessing operations to establish a standardized dataset, followed by dataset splitting into training, validation, and testing portions. Phase 2, Model Architecture Design, covers the architectural development of all comparative models, including the dictionary-matching model, n-gram models, and the Transformer encoder-based masked language model. Phase 3, Model Training, applies consistent training procedures across the different restoration methods. Phase 4, Performance Evaluation, measures restoration rates across different error rates and character types for all methods and evaluates computational efficiency through inference time analysis.
3.1. Data Collection and Preprocessing
To develop the NAVTEX message error restoration model, we collected actual NAVTEX message data spanning 10 years, from 2014 to 2023, using web crawling methods. The crawler targeted NAVTEX message archive sites, accessing directories by year and automatically collecting message files in text format. Through this process, we initially acquired 769,089 text files containing MSI, including navigational warnings, weather alerts, and search and rescue communications transmitted from broadcasting stations in European regions.
To maintain data consistency, we implemented a filtering process. First, we verified that message bodies started with “ZCZC” and ended with “NNNN” to select only messages conforming to the standard NAVTEX format. To guarantee original messages without pre-existing errors, we selected only data showing a zero error rate in the metadata. Messages already containing asterisk characters were excluded to avoid confusion with our masking process. Duplicate content was removed, preserving only messages with unique content.
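The filtering rules above can be expressed as a short routine. This is a sketch only: the record layout (a dict with hypothetical `body` and `error_rate` keys) is an assumption, since the archive's actual metadata format is not described here.

```python
def filter_messages(records):
    """Keep unique, error-free messages that follow the ZCZC ... NNNN framing."""
    seen = set()
    kept = []
    for rec in records:
        body = rec["body"].strip()
        if not (body.startswith("ZCZC") and body.endswith("NNNN")):
            continue  # not in standard NAVTEX format
        if rec.get("error_rate") != 0:
            continue  # metadata missing or reports reception errors
        if "*" in body:
            continue  # already contains corrupted characters
        if body in seen:
            continue  # duplicate content
        seen.add(body)
        kept.append(body)
    return kept


# Hypothetical records for illustration.
records = [
    {"body": "ZCZC GA01 GALE WARNING NNNN", "error_rate": 0.0},
    {"body": "ZCZC GA01 GALE WARNING NNNN", "error_rate": 0.0},  # duplicate
    {"body": "ZCZC GA02 G*LE WARNING NNNN", "error_rate": 0.0},  # asterisk
    {"body": "ZCZC GA03 FOG PATCHES NNNN", "error_rate": 2.5},   # nonzero rate
    {"body": "GA04 NO FRAMING", "error_rate": 0.0},              # bad framing
]
print(filter_messages(records))
```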
Given the computational constraints of Transformer models and limitations of VRAM in our experimental system, we retained only data with message lengths suitable for sequence modeling. This length limitation was established to ensure efficient processing while maintaining complete message context. These filtering steps produced a final dataset of 69,658 NAVTEX messages.
To prepare training examples, we randomly masked characters at predefined error rates following the Missing Completely At Random (MCAR) assumption, where each character has equal probability of being corrupted [
32]. According to Little and Rubin’s classification of missing data mechanisms, MCAR occurs when the probability of being missing is the same for all cases, implying that causes of the missing data are unrelated to the data values [
33]. We generated training examples across seven error rate conditions: 1%, 5%, 10%, 15%, 20%, 25% and 33%. The masking process selected characters uniformly at random within each message, replacing them with asterisk symbols to simulate FEC restoration failures. While real maritime radio transmissions often exhibit non-random error patterns such as burst errors from interference or frequency-selective fading, which may follow Missing At Random (MAR) or Missing Not At Random (MNAR) mechanisms, accurate modeling of these complex error patterns requires detailed channel characteristics and interference statistics that were not available in our archived message dataset. Therefore, we adopted MCAR as a standard first approach following established practices in missing data literature.
The training data follows a normal distribution centered at 15% error rate, adhering to standard MLM practices established in BERT and similar models, while including diverse error rates to enhance model robustness [
8]. Data with 15% error rate constituted approximately 40% of the total training data, and data at 20% error rate accounted for about 25%, with decreasing proportions for other rates. ETSI standards specify output inhibition when error rates exceed 33% for more than 5 s during transmission. We used 33% as the upper evaluation boundary to assess restoration rate under severe corruption conditions [
5]. We configured the testing and validation datasets with uniform distributions across all error rates.
Table 1 shows the error rate distribution and file counts of train, test, and validation data.
The masking implementation used a fixed random seed of 42 for all experiments to ensure reproducibility. Character selection employed uniform random sampling without replacement. The masking symbol was the asterisk character as specified by ITU-R standards [4]. Each message contained at least one masked character. All printable NAVTEX characters including letters, digits, spaces, and special symbols were eligible for masking.
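A minimal sketch of this masking step is given below. The seed, the asterisk symbol, sampling without replacement, and the at-least-one-mask rule come from the description above; the function name `mask_message` and the rounding of the mask count are assumptions.

```python
import random


def mask_message(message: str, error_rate: float, rng: random.Random):
    """Replace a uniformly sampled set of positions with '*' (MCAR masking)."""
    if not message:
        raise ValueError("cannot mask an empty message")
    # At least one masked character per message; rounding is an assumption.
    n_mask = min(len(message), max(1, round(len(message) * error_rate)))
    positions = rng.sample(range(len(message)), n_mask)  # without replacement
    chars = list(message)
    for pos in positions:
        chars[pos] = "*"
    return "".join(chars), sorted(positions)


rng = random.Random(42)  # fixed seed for reproducibility
masked, positions = mask_message("ZCZC GA01 GALE WARNING NNNN", 0.15, rng)
print(masked, positions)
```

Because the cleaned messages contain no asterisks, every asterisk in the output corresponds to exactly one sampled position.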
3.2. Model Architecture Design
This section details the architectural design of statistical methods and the proposed MLM method for NAVTEX message restoration, establishing their applicability to the specific characteristics of maritime communication errors.
The dictionary-matching model can be applied to NAVTEX message restoration with domain-specific considerations. NAVTEX transmission errors that exceed FEC capabilities result in character substitution with asterisk characters rather than insertion or deletion, ensuring that original and corrupted words maintain identical length. This length preservation property enables two simplifications relative to general NLP applications. First, length-based filtering eliminates vocabulary entries with different character counts, which improves computational efficiency and reliability for NAVTEX restoration. Second, since corrupted characters are marked with asterisk characters at known positions, direct pattern matching can be applied instead of edit distance calculations. Unlike general NLP applications where edit distance is necessary to handle insertions, deletions, and substitutions, NAVTEX corruption involves only asterisk substitutions at specific positions, making pattern matching more appropriate. For example, a corrupted word “W*ND” can be matched against vocabulary entries of identical length by checking whether the non-asterisk characters correspond exactly to those of candidate words. Combined with frequency-based selection among the matching candidates, the method follows a three-step approach that exploits the length preservation and substitution pattern properties of NAVTEX character errors.
Figure 7 illustrates this restoration process using a NAVTEX weather warning example, showing how corrupted input “W*ND” undergoes evaluation through length-based filtering, pattern matching against uncorrupted positions, and frequency-based selection to produce the restoration output “WIND”.
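The three steps for the “W*ND” example can be sketched as follows. The vocabulary and its frequencies are made-up values for illustration, and the function names are hypothetical.

```python
def matches(pattern: str, word: str) -> bool:
    """Length filter plus position-wise match on the non-asterisk characters."""
    return len(pattern) == len(word) and all(
        p == "*" or p == w for p, w in zip(pattern, word)
    )


def restore_by_pattern(pattern: str, vocab_freq: dict) -> str:
    """Return the most frequent vocabulary word consistent with the pattern."""
    candidates = [w for w in vocab_freq if matches(pattern, w)]
    if not candidates:
        return pattern  # out-of-vocabulary: leave the asterisks in place
    return max(candidates, key=vocab_freq.get)


# Hypothetical vocabulary with made-up frequencies.
VOCAB = {"WIND": 900, "WAND": 3, "WINDS": 200, "WEST": 500}
print(restore_by_pattern("W*ND", VOCAB))
```

Here "WINDS" is removed by the length filter, "WEST" by the pattern match, and "WIND" is preferred over "WAND" by frequency.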
N-gram models for NAVTEX restoration operate through character-level statistical analysis of maritime message patterns. For computational efficiency, the implementation employs tri-gram, four-gram, and five-gram models, which balance sufficient contextual information for character prediction with manageable training and inference requirements. This paper did not implement models beyond five-gram due to data sparsity issues and increased computational overhead that would compromise real-time processing capabilities. During restoration, the n-gram methods process asterisk characters by extracting preceding context according to the model order: tri-gram models consider two preceding characters, four-gram models examine three preceding characters, and five-gram models analyze four preceding characters. For each corrupted character, the model evaluates conditional probabilities for all possible substitution characters from the NAVTEX character set based on the available preceding context, selecting the character with the highest conditional probability as the restoration candidate. When statistical evidence is insufficient, the implementation incorporates fallback rules specific to maritime message formatting, such as inserting spaces between alphabetic and numeric character transitions that frequently occur in coordinate representations and timestamp formats.
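A character-level tri-gram restorer along these lines might look like the sketch below. The training texts are toy examples, and the fallback is only one possible reading of the rule described above: when the context is unseen and the asterisk sits between a letter and a digit, a space is inserted.

```python
from collections import Counter, defaultdict


def train(texts, n=3):
    """Count which character follows each (n-1)-character context."""
    counts = defaultdict(Counter)
    for text in texts:
        for i in range(n - 1, len(text)):
            counts[text[i - n + 1:i]][text[i]] += 1
    return counts


def is_alnum_transition(left: str, right: str) -> bool:
    """True when the neighbours are a letter and a digit, in either order."""
    return (left.isalpha() and right.isdigit()) or (left.isdigit() and right.isalpha())


def restore(message: str, counts, n=3) -> str:
    """Fill asterisks left to right using the preceding n-1 characters."""
    chars = list(message)
    for i, ch in enumerate(chars):
        if ch != "*":
            continue
        context = "".join(chars[max(0, i - n + 1):i])
        following = counts.get(context)
        if following:
            chars[i] = following.most_common(1)[0][0]  # highest-count character
        elif 0 < i < len(chars) - 1 and is_alnum_transition(chars[i - 1], chars[i + 1]):
            chars[i] = " "  # assumed formatting fallback
    return "".join(chars)


model = train(["GALE WARNING", "WIND NW 7"])
print(restore("GA*E WARNING", model))
print(restore("KT*15", model))
```

Positions with neither statistical evidence nor a matching fallback rule are left as asterisks in this sketch.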
The equivalence between asterisk characters in corrupted NAVTEX messages and masked tokens in MLM frameworks enables direct application of Transformer-based restoration methods. In NAVTEX systems, characters that cannot be restored by FEC mechanisms are marked with asterisk characters, creating a direct correspondence with the <MASK> tokens used in standard MLM training. This structural similarity allows the MLM framework to be applied to NAVTEX restoration without requiring specialized architectural modifications.
Figure 8 shows the complete pipeline where corrupted NAVTEX messages undergo tokenization, self-attention processing, and character prediction to restore corrupted positions.
The tokenizer converts input NAVTEX messages into subwords and their token sequences, mapping each character including asterisk characters to numerical identifiers. Special tokens <BOS> and <EOS> are added to mark sequence boundaries, ensuring proper sequence delimitation for Transformer processing. The tokenization process treats asterisk characters as special tokens while maintaining their positional information for subsequent restoration. The embedding layer combines token embeddings with positional encodings to create input representations. Each token receives both its semantic representation and positional information, allowing the model to process both character identity and sequence position. This combined representation enables contextual understanding, as the relative positions of characters influence restoration decisions in structured NAVTEX messages.
The Transformer encoder processes embedding vectors through stacked Transformer encoder layers containing multi-head attention and feed-forward networks with residual connections and layer normalization. The bidirectional processing capability allows the model to consider both preceding and following context when generating representations for each position. During attention computation, each position in the sequence can attend to all other positions, enabling the model to capture long-range dependencies and contextual relationships. For NAVTEX messages, this capability enables restoration decisions to incorporate distant context such as message headers, coordinate patterns, or standardized maritime terminology that appear elsewhere in the message.
The projection layer receives the encoder outputs and selectively processes only the hidden representations corresponding to masked positions marked with asterisk characters. This layer includes layer normalization, linear transformation, GELU activation, dropout regularization, and final linear projection. The layer maps the encoder outputs to probability distributions over the complete NAVTEX character set, which encompasses exactly 53 printable characters including uppercase letters, digits, spaces, and special symbols defined in the CCIR-476 character set used for NAVTEX messages [
4]. The restoration process applies argmax selection to identify the most probable character for each masked position. The predicted character indices are then mapped back to actual characters using the character vocabulary, and the original asterisk characters are replaced with the predicted characters to generate the restored message.
3.3. Model Training and Validation
This section describes the training procedures and validation methods applied to all comparative methods, ensuring consistent experimental conditions for performance evaluation across different restoration methods.
Training procedures were applied across all comparative methods to ensure fair evaluation conditions. The dictionary-matching model constructs domain-specific vocabularies from the NAVTEX training data, recording word frequencies for statistical selection during restoration. The n-gram models are prepared by counting character sequence frequencies in the training data to build conditional probability tables. Character-level n-gram statistics are extracted using sliding window methods across all training messages, with frequency counts normalized to produce conditional probability distributions for character prediction. MLM employs the standard masked language modeling objective where the model learns to predict original characters from asterisk characters by minimizing cross-entropy loss. To improve generalization and prevent overconfidence in predictions, the implementation employed Label Smoothing Cross Entropy with a smoothing parameter of 0.1, which distributes probability mass from the correct character across the remaining character set [
34,
35].
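The smoothed loss for a single masked position can be written out directly. The sketch follows the description above, in which the smoothing mass is spread over the remaining characters only; some library implementations spread it over all classes instead. The function name and the use of plain log-probabilities as input are assumptions.

```python
import math


def label_smoothing_ce(log_probs, target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a smoothed target distribution.

    The correct class gets weight 1 - eps; every other class gets eps / (K - 1).
    """
    k = len(log_probs)
    if k < 2:
        raise ValueError("need at least two classes")
    off_weight = eps / (k - 1)
    loss = 0.0
    for idx, lp in enumerate(log_probs):
        weight = 1.0 - eps if idx == target else off_weight
        loss -= weight * lp
    return loss


# Toy example with a 4-character set instead of the full NAVTEX character set.
probs = [0.7, 0.1, 0.1, 0.1]
log_probs = [math.log(p) for p in probs]
print(label_smoothing_ce(log_probs, target=0))
print(label_smoothing_ce(log_probs, target=0, eps=0.0))  # plain cross-entropy
```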
For MLM implementation, the Transformer encoder architecture was implemented with the x-transformers library version 2.4.1, utilizing unigram tokenizer from the SentencePiece library for NAVTEX message tokenization [
36]. The model was configured with manually selected hyperparameters: model dimension of 512, 5 attention heads, 6 encoder layers, learning rate of 5 × 10
−4, batch size of 128, and 15 training epochs. The maximum sequence length was set to 1024 characters due to computational constraints. Training employs Adaptive Moment Estimation with Weight Decay optimizer with learning rate scheduling [
37,
38]. All comparative methods employ identical training data and preprocessing procedures to maintain experimental consistency. Uniform validation procedures were applied to ensure fair comparison across all methods. All comparative methods process identical corrupted test messages generated through the masking procedure described in
Section 3.1, ensuring that performance differences reflect algorithmic capabilities rather than dataset variations. The validation framework employs five independent experimental runs including character masking, data splitting and training with different random seeds to ensure statistical reliability of results. Performance metrics are calculated consistently across all methods using identical evaluation criteria and data processing procedures. For safety-critical maritime applications, MLM restoration incorporates confidence-based filtering using softmax output probabilities. While recent research by Zhang et al. highlights calibration challenges in deep neural network predictions [
39], this study implements a simplified approach with a fixed confidence threshold of 0.7 to ensure reproducibility and practical applicability.
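The confidence-based filtering step can be illustrated with a short sketch. It assumes the model emits one logit per candidate character for each masked position; the toy vocabulary and the function names are hypothetical, and only the 0.7 threshold comes from the text.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def filter_prediction(logits, vocab, threshold=0.7, mask="*"):
    """Return (character, confidence) for one masked position.

    The top prediction is accepted only when its softmax probability
    reaches the threshold. Otherwise the asterisk is kept so that the
    position can be flagged for manual verification.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return vocab[best], probs[best]
    return mask, probs[best]
```

A position with one dominant logit is restored, while a position with several near-equal candidates stays masked. Raising the threshold trades restoration coverage for reliability.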
The computational environment consisted of an Intel® Xeon® Gold 5218R processor and an NVIDIA A100 graphics processing unit with 80 GB of memory, running Ubuntu 20.04. The deep learning model was developed and executed with CUDA 11.8, cuDNN 8.7.0, and PyTorch 1.9.0, while the statistical methods were implemented in Python 3.10 with libraries including Pandas 2.2.3, NumPy 2.2.6, and Matplotlib 3.10.3. All methods were run in this same environment to keep the computational comparison valid.
4. Results and Analysis
This section presents evaluation results comparing MLM with statistical methods for NAVTEX message restoration. The analysis examines performance across varying error rates and character types to assess the practical applicability of each restoration method.
4.1. Evaluation Metrics
To evaluate restoration of corrupted NAVTEX messages, all methods are compared under identical experimental conditions. The evaluation framework covers restoration rate, computational efficiency, and character-type-specific performance across error rates ranging from 1% to 33% character corruption.
The primary evaluation metric is the restoration rate, which measures the percentage of correctly restored characters among all corrupted positions marked with asterisk characters. For MLM with confidence filtering implemented as described in Section 3.3, the restoration rate $R$ is calculated using Equation (3):
$$R = \frac{N_{\text{correct}}}{N_{\text{corrupted}}} \times 100\% \tag{3}$$
where $N_{\text{correct}}$ represents the number of correctly restored characters with confidence ≥ 0.7 and $N_{\text{corrupted}}$ represents the total number of corrupted characters requiring restoration.
The error rate $E$ for each message is defined as the proportion of corrupted characters relative to the total message length, calculated using Equation (4):
$$E = \frac{N_{\text{asterisk}}}{N_{\text{total}}} \times 100\% \tag{4}$$
where $N_{\text{asterisk}}$ represents the number of asterisk characters in the corrupted message and $N_{\text{total}}$ represents the total number of characters in the original message. This metric quantifies the severity of corruption in each test case.
To assess compliance with IMO MSC.148(77) standards, we calculate the residual error rate, which represents the remaining error rate after applying restoration methods [7]. The residual error rate $E_{\text{res}}$ is computed using Equation (5), where the initial error rate is reduced by the proportion of successfully restored characters:
$$E_{\text{res}} = E \times \left(1 - \frac{R}{100\%}\right) \tag{5}$$
Here $E$ is the initial error rate and $R$ is the restoration rate, both expressed as percentages. This metric evaluates whether restored messages meet the regulatory requirement of maintaining error rates below 4% for operational use.
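The three metrics can be computed as in the following sketch. It assumes the original, corrupted, and restored messages have equal length, that the original contains no asterisks, and that the residual error rate is the initial error rate scaled by the unrestored fraction. The function names are illustrative.

```python
def restoration_rate(original, corrupted, restored, mask="*"):
    """Percentage of corrupted positions restored to the original character.

    Positions left as an asterisk (for example after confidence
    filtering) count as not restored.
    """
    positions = [i for i, ch in enumerate(corrupted) if ch == mask]
    if not positions:
        return 0.0
    correct = sum(1 for i in positions if restored[i] == original[i])
    return 100.0 * correct / len(positions)

def error_rate(original, corrupted, mask="*"):
    """Percentage of characters in the message that are corrupted."""
    return 100.0 * corrupted.count(mask) / len(original)

def residual_error_rate(initial_error_rate, rest_rate):
    """Error rate remaining after restoration, in percent."""
    return initial_error_rate * (1.0 - rest_rate / 100.0)

def is_compliant(residual, limit=4.0):
    """Check the residual error rate against the 4% limit."""
    return residual < limit
```

With the MLM restoration rates reported in Section 4.3, a 25% initial error rate with 84.7% restoration leaves a residual of about 3.8%, which is below the 4% limit. A 33% initial error rate with 82.0% restoration leaves about 5.9%, which is above it.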
To evaluate performance across the character types commonly found in NAVTEX messages, restoration rates are calculated separately for five categories: overall performance, alphabetic characters, numeric digits, space characters, and special symbols. This categorization enables detailed analysis of model performance across character types that may differ in restoration difficulty. For overall performance comparison, the Area Under the Curve (AUC) metric is calculated from the error rate versus restoration rate graph using the trapezoidal rule for numerical integration. Higher AUC values indicate better overall performance and robustness to varying error rates. Computational efficiency is measured through inference time evaluation, recording the average time required to process individual NAVTEX messages. This metric assesses the practical applicability of different restoration methods for real-time maritime communication systems where timely message delivery is important for navigation safety.
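The AUC computation can be sketched with the trapezoidal rule as below. The grid of error rates used for integration in the experiments is not restated here, so the function simply takes whatever (error rate, restoration rate) points it is given. The function name is illustrative.

```python
def trapezoid_auc(error_rates, restoration_rates):
    """Area under the restoration-rate curve via the trapezoidal rule.

    Both inputs are in percent. Points are sorted by error rate first,
    so the caller does not need to supply them in order.
    """
    points = sorted(zip(error_rates, restoration_rates))
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

The resulting value depends on the span and spacing of the error-rate grid, so AUC values are only comparable between methods evaluated on the same grid.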
4.2. Experimental Results
Table 2 presents performance statistics and computational efficiency metrics for all evaluated methods across the complete range of error rate levels. Results are means over five independent runs with different random seeds. MLM achieved the highest mean restoration rate of 85.4% with a standard deviation of 0.3%. Five-gram obtained the second-highest mean restoration rate at 64.0% with a standard deviation of 0.2%. Four-gram and the dictionary-matching model achieved similar mean restoration rates of 59.1% and 58.6%, with standard deviations of 0.2% and 0.1%, respectively. Tri-gram recorded the lowest mean restoration rate at 44.4% with a standard deviation of 0.1%. Standard deviations of at most 0.3% across runs indicate that the results are reproducible and that each method performs consistently.
For overall robustness evaluation, MLM recorded the highest AUC value of 2889, followed by five-gram with 2007, four-gram with 1862, dictionary-matching with 1832, and tri-gram with 1408. Computational efficiency analysis revealed four-gram as the fastest method with 4.17 ms inference time per message, followed by five-gram at 5.20 ms and tri-gram at 5.63 ms. MLM processed messages at 19.30 ms per message, approximately 3.4 to 4.6 times slower than the n-gram models. Dictionary-matching exhibited the longest processing time at 334.77 ms per message.
The experimental results suggest that MLM is the most effective of the evaluated software-based post-processing methods for NAVTEX message restoration, maintaining restoration rates above 82% even at 33% error rate. Statistical methods demonstrated varying capabilities, with five-gram achieving moderate performance and tri-gram providing consistent but limited restoration. The computational overhead of MLM remains practical for real-time maritime communication systems: at 19.30 ms per message, sequential processing corresponds to approximately 52 messages per second. MLM’s mean restoration rate exceeded that of the best-performing statistical method (five-gram) by 21.4%p and that of the lowest-performing method (tri-gram) by 41.0%p. Statistical significance analysis using McNemar’s test on 311,051 character-level predictions confirmed statistically significant performance differences between MLM and all baseline methods with Bonferroni-corrected significance levels (p < 0.001 for all comparisons). Effect size analysis indicated substantial practical significance with odds ratios ranging from 10.4 to 122.4 and Cohen’s g values exceeding 0.8 for all comparisons. These results demonstrate both statistical significance and meaningful performance differences for maritime safety applications.
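McNemar’s test compares two methods on the same characters using only the discordant pairs, that is, the positions where exactly one method is correct. The sketch below shows the chi-square form without continuity correction. Whether the analysis in this study applied a correction is not stated, so that choice, the function name, and the example counts are assumptions.

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square test on discordant pair counts.

    b: positions where method A is correct and method B is wrong.
    c: positions where method A is wrong and method B is correct.
    Returns (chi2, p_value, odds_ratio).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs to compare")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = b / c if c else float("inf")
    return chi2, p_value, odds_ratio

# Hypothetical counts for one MLM-versus-baseline comparison.
chi2, p_value, odds_ratio = mcnemar(100, 10)
# Bonferroni correction for four baseline comparisons.
significant = p_value < 0.001 / 4
```

Positions where both methods are right, or both are wrong, do not enter the statistic, which is why the test suits paired character-level predictions.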
4.3. Performance Across Different Error Rates
Figure 9 illustrates restoration rate performance across different error rates from 1% to 33%, evaluating each method’s robustness under varying error rates. The results demonstrate distinct performance patterns and degradation characteristics for each restoration method.
MLM with confidence threshold of 0.7 demonstrates consistent performance across all error rate levels, maintaining restoration rates above 82% at the 33% error rate. Restoration rates varied only moderately with error rate: 86.0% at 1%, 84.6% at 5%, 86.0% at 10%, 86.5% at 15%, 86.7% at 20%, 84.7% at 25%, and 82.0% at 33%. These variations may be attributed to the confidence threshold filtering mechanism, where the proportion of high-confidence predictions varies with error rate conditions, and to differences in training data volume across error rate levels, as shown in Table 1. This pattern indicates that MLM’s bidirectional context processing capability enables character restoration under high-error-rate conditions while maintaining prediction reliability through confidence filtering.
Five-gram exhibited the most variable performance among all methods, degrading steadily as error rates increased. Starting at 77.55% restoration rate at 1% error, five-gram’s performance declined to 76.83% at 5%, 59.56% at 20%, and 42.33% at 33% error rate. The method was sensitive to error rates above 15%, where restoration rates fell below 70%, indicating limitations in handling heavily corrupted messages. Four-gram and the dictionary-matching model followed similar trajectories. Four-gram achieved 68.65% restoration rate at 1% error, declining to 60.69% at 15%, 51.72% at 25%, and 43.91% at 33% error rate. Dictionary-matching recorded 72.0% at 1% error, decreasing to 58.89% at 15%, 49.69% at 25%, and 42.24% at 33% error rate. Between the 1% and 33% error rates, these two methods lost roughly 25 and 30 percentage points, respectively. Tri-gram was the most stable of the statistical methods, losing about 13 percentage points over the same range, but had the lowest restoration rates at every error rate level: 49.49% at 1% error, 44.45% at 15%, 41.18% at 25%, and 36.54% at 33% error rate.
The performance analysis reveals that MLM maintains restoration capability across the complete range of error rate levels, with advantages at higher error rates where statistical methods show performance deterioration. The advantage becomes more pronounced as error rates exceed 15%, where MLM maintains restoration rates above 82% while statistical methods fall below 70%.
To evaluate regulatory compliance, we assess residual error rates after restoration processing according to IMO MSC.148(77) standards, which require maintaining error rates below 4% for operational message acceptance [
7].
Table 3 presents residual error rate calculations and compliance assessment across all tested methods and error conditions.
MLM maintains regulatory compliance across error rates from 5% to 25%, demonstrating the method’s capability to restore corrupted messages to acceptable operational standards under moderate to high error conditions. Statistical methods exceed the 4% compliance threshold beginning at 10% initial error rate, indicating limited applicability for message restoration under degraded communication conditions. The analysis shows that MLM enables message retention and operational use across a broader range of transmission quality conditions compared to conventional statistical restoration approaches. At 33% initial error rate, all restoration methods fail to meet regulatory requirements, confirming the established transmission quality limits for maritime safety communications.
4.4. Character-Type-Specific Performance
Character type affects restoration rates in NAVTEX messages. Character-type-specific analysis provides insights into the relative difficulty of restoring different categories of corrupted characters and reveals the strengths and limitations of each restoration method.
Figure 10 illustrates restoration rates categorized by character type across all tested methods, demonstrating distinct performance patterns for alphabetic characters, numeric digits, space characters, and special symbols.
MLM with confidence threshold of 0.7 demonstrates strong performance across all character categories. MLM achieved 91.1% alphabetic character restoration, outperforming statistical methods by leveraging contextual information from maritime terminology. The bidirectional attention mechanism enables the model to consider both local character patterns within words and broader semantic context across message sections, facilitating accurate prediction of alphabetic characters in corrupted positions.
Space character restoration represents the highest-performing category for MLM, achieving 95.4% restoration rate. This high performance stems from the predictable positioning of spaces in structured NAVTEX messages, where spaces consistently separate standardized elements such as message headers, coordinates, and time stamps. The Transformer encoder’s attention mechanism effectively identifies these structural patterns, enabling reliable space restoration even under high-error-rate conditions. Numeric character restoration presents the most challenging task across the tested methods. MLM achieved 44.6% restoration rate for numeric digits, indicating significant difficulty in predicting corrupted numbers. This limitation reflects the inherent challenge of numeric restoration, where contextual clues may be insufficient to determine specific digit values. Geographic coordinates, frequencies, time stamps, and other numeric data in NAVTEX messages often lack sufficient contextual constraints to enable accurate prediction through surrounding text alone. Special symbol restoration showed strong performance with MLM achieving 90.2% restoration rate. Special symbols in NAVTEX messages include punctuation marks, degree symbols, and formatting characters that serve structural functions within message content. The high restoration rate reflects the contextual predictability of these symbols, where structural patterns provide clear cues for accurate restoration.
Statistical methods demonstrated varying capabilities across character types. Dictionary-matching achieved the highest alphabetic restoration rate among statistical methods at 65.7%, followed by five-gram at 55.3%, four-gram at 53.4%, and tri-gram at 39.5%. For numeric restoration, statistical methods showed uniformly low performance, with dictionary-matching at 29.1%, five-gram at 27.8%, four-gram at 24.9%, and tri-gram at 18.6%. These results indicate that statistical methods face limitations when predicting numeric content that lacks strong local character dependencies. The dictionary-matching method achieved a 0% restoration rate for space characters because of its word-level processing. The method operates by matching complete words against vocabulary entries, which inherently excludes space characters from its restoration scope. This limitation highlights the word-boundary constraints of dictionary-based methods and their inability to address character-level restoration tasks that extend beyond lexical units.
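The word-boundary constraint can be seen in a minimal sketch of dictionary matching. It assumes whitespace tokenization, a wildcard match of each asterisk against any single character, and selection of the most frequent matching vocabulary word. These details and the function names are illustrative and may differ from the implementation evaluated here.

```python
import re
from collections import Counter

def build_vocab(messages):
    """Count word frequencies in the training messages."""
    return Counter(word for msg in messages for word in msg.split())

def restore_word(word, vocab, mask="*"):
    """Replace a corrupted word with its most frequent vocabulary match."""
    if mask not in word:
        return word
    # Each asterisk matches exactly one arbitrary character.
    pattern = re.compile(
        "".join("." if ch == mask else re.escape(ch) for ch in word)
    )
    candidates = [w for w in vocab if pattern.fullmatch(w)]
    if not candidates:
        return word  # No match: leave the word corrupted.
    return max(candidates, key=vocab.__getitem__)

def restore_message(message, vocab, mask="*"):
    """Restore a message word by word, splitting on single spaces."""
    return " ".join(restore_word(w, vocab, mask) for w in message.split(" "))
```

When a space itself is corrupted, as in `GALE*WARNING`, the two neighboring words merge into one token that matches no vocabulary entry, so the space cannot be recovered.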
The character-type analysis reveals differences in restoration difficulty across different categories of NAVTEX message content. Alphabetic characters benefit from semantic context and linguistic patterns that enable effective restoration through both statistical and deep learning methods. Space characters demonstrate high predictability due to structural message formatting. Numeric characters present significant challenges for all methods due to limited contextual constraints for specific digit prediction. Special symbols show strong restoration performance depending on their structural function within message content.
4.5. Computational Efficiency
Analysis reveals computational trade-offs between restoration rate and processing speed, with inference times compared across all methods in Table 4. N-gram models show the fastest processing, with four-gram achieving the shortest inference time at 4.17 ms per message. Five-gram and tri-gram follow with 5.20 ms and 5.63 ms, respectively, making them 3.7 times and 3.4 times faster than MLM. MLM shows moderate computational overhead at 19.30 ms per message, approximately 3.4 to 4.6 times slower than the n-gram models, but remains practically feasible for real-time maritime communication systems. This processing time corresponds to approximately 52 messages/s, which supports real-time restoration requirements. Dictionary-matching exhibits the highest computational cost at 334.77 ms per message, 17.3 times slower than MLM. This overhead stems from vocabulary search operations and pattern matching computations required for word-level restoration, making it less suitable for time-critical maritime communication scenarios.
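The throughput and speed ratios quoted above follow directly from the per-message inference times. The short sketch below reproduces the arithmetic, assuming messages are processed sequentially, one at a time. The helper names are illustrative.

```python
# Mean inference time per message in milliseconds, as reported above.
inference_ms = {
    "tri-gram": 5.63,
    "four-gram": 4.17,
    "five-gram": 5.20,
    "MLM": 19.30,
    "dictionary-matching": 334.77,
}

def throughput(ms_per_message):
    """Messages per second under sequential processing."""
    return 1000.0 / ms_per_message

def slowdown(method, reference):
    """How many times slower `method` is than `reference`."""
    return inference_ms[method] / inference_ms[reference]

mlm_throughput = throughput(inference_ms["MLM"])           # about 51.8
mlm_vs_four_gram = slowdown("MLM", "four-gram")            # about 4.6
dictionary_vs_mlm = slowdown("dictionary-matching", "MLM") # about 17.3
```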
5. Discussion and Limitations
The MLM method with confidence threshold of 0.7 shows potential to restore messages that might otherwise be discarded. It maintained restoration rates above 82% across error rates from 1% to 33%, including rates above the 4% limit at which current systems would typically discard transmissions. Specifically, the model achieved 86.5% restoration rate at 15% error rate and maintained over 84% restoration rate across most error conditions. This capability could reduce information loss during adverse maritime communication conditions when safety information transmission is compromised [
5]. Furthermore, regulatory compliance analysis demonstrates that MLM maintains residual error rates below the IMO MSC.148(77) standard of 4% for initial corruption levels up to 25%, while statistical methods exceed this threshold beginning at 10% initial error rate, indicating MLM’s enhanced capability to restore messages to operational standards under degraded communication conditions [
7]. The automated restoration capability addresses a critical requirement for MSW implementation, where maritime safety information services must be processed through automated data exchange without manual intervention. Traditional manual interpretation of corrupted NAVTEX messages creates bottlenecks incompatible with MSW requirements for reliable digital data exchange [
2]. The experimental results show that MLM provides improvements over statistical methods for text restoration.
Figure 11 illustrates MLM’s restoration capability, where a 33% corrupted Norwegian weather bulletin showing extensive asterisk characters is reconstructed into a readable message achieving 82% restoration rate, showing the model’s ability to handle complex maritime terminology and technical specifications.
However, several limitations have been identified that warrant future research attention. The main limitation concerns numeric character restoration, with MLM achieving only 44.6% restoration rate compared to 91.1% for alphabetic characters. This performance gap represents a significant concern for maritime safety applications where numeric information constitutes essential navigational data including coordinates, frequencies, and timing information. Errors in coordinate data could lead to incorrect position reporting, while incorrect frequencies could prevent vessels from accessing critical communication channels, and timing errors could compromise search and rescue operations. The low numeric restoration rate means that more than half (55.4%) of corrupted numeric characters are not correctly restored, potentially rendering safety-critical numeric information unreliable for operational use.
While MLM achieved 85.4% overall restoration rate, the disproportionate weakness in numeric character prediction presents challenges for practical deployment in maritime safety applications. In safety-critical maritime communications, restoration errors in coordinates, frequencies, or timing data could have operational implications that extend beyond simple communication failures. The implemented confidence-based filtering approach addresses general prediction uncertainty by accepting only high-confidence predictions and flagging uncertain restorations for manual verification, but it does not specifically address the systematic weakness in numeric character restoration. The current encoder-only architecture also presents limitations for errors that are not corrected by FEC mechanisms, requiring sequence-to-sequence methods for message reconstruction. Regional focus in the training dataset represents another limitation. The collected NAVTEX messages originated from European maritime regions, potentially limiting effectiveness for other geographical areas.
Figure 12 illustrates this limitation, where the model fails to restore the Japanese place name “OKINAWA SHIMA” but reconstructs global maritime terminology like “EAST” and “SOUTHEASTWARD”.
Also, while MLM’s processing time of 19.30 ms per message is approximately 3.4 to 4.6 times slower than the n-gram models, this processing speed appears practical for maritime communication systems, corresponding to approximately 52 messages per second. Furthermore, model distillation and pruning techniques could potentially reduce this computational overhead for deployment in more resource-constrained environments.
Future research should explore character-type-specific architectures for improved numeric restoration and confidence calibration techniques such as those proposed by Zhang et al. to improve the reliability of confidence-based filtering mechanisms for maritime safety applications [
39]. Hyperparameter optimization for MLM training could also be explored to increase the restoration rate. Transfer learning approaches for adapting models to non-European maritime regions represent another research direction, where regional adaptation could be achieved through fine-tuning existing models on local NAVTEX data or generating synthetic training datasets for non-European regions using large language models to create geographically diverse maritime terminology and place names. Additionally, extending this approach to multilingual NAVTEX systems represents a significant research opportunity, as many coastal regions operate national NAVTEX services in local languages alongside the international English-language service [
4]. Multilingual models could enable automated restoration across different language-specific NAVTEX transmissions, requiring specialized tokenization and language-specific maritime terminology handling. Furthermore, investigations of how training data distribution and error rate patterns influence restoration rate could optimize model training strategies for specific operational scenarios.
Error pattern modeling represents another important research direction. While this paper adopted MCAR assumptions for character corruption simulation, real maritime radio transmissions often exhibit more complex error patterns including burst errors caused by interference, where consecutive characters may be corrupted simultaneously, and frequency-selective fading that may make certain characters more susceptible to damage. Future studies should investigate MAR and MNAR mechanisms to better model realistic transmission error patterns. The confidence threshold of 0.7 applied in this paper requires further optimization research, as threshold selection significantly impacts the trade-off between restoration coverage and prediction reliability for maritime safety applications.
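The difference between independent (MCAR) corruption and burst corruption can be made concrete with a small simulation sketch. The exact masking procedure of Section 3.1 is not reproduced here: the independent per-character draw, the fixed burst length, and the function names are assumptions for illustration.

```python
import random

def mask_mcar(message, error_rate, rng, mask="*"):
    """Corrupt each character independently with probability error_rate."""
    return "".join(mask if rng.random() < error_rate else ch for ch in message)

def mask_burst(message, error_rate, burst_len, rng, mask="*"):
    """Corrupt the same share of characters, but in consecutive runs.

    Bursts of up to burst_len characters start at random positions
    until round(error_rate * len(message)) characters are masked.
    Assumes the input message contains no asterisks of its own.
    """
    chars = list(message)
    target = min(round(error_rate * len(chars)), len(chars))
    masked = 0
    while masked < target:
        start = rng.randrange(len(chars))
        for i in range(start, min(start + burst_len, len(chars))):
            if masked < target and chars[i] != mask:
                chars[i] = mask
                masked += 1
    return "".join(chars)

rng = random.Random(42)  # Fixed seed for reproducibility.
text = "GALE WARNING NORTH SEA WIND FORCE 8 EXPECTED"
scattered = mask_mcar(text, 0.2, rng)
bursty = mask_burst(text, 0.2, 4, rng)
```

At the same overall error rate, burst corruption removes whole words or number groups, which leaves less local context around each masked position than scattered corruption does.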
Validation using actual maritime communication data with real interference patterns would provide a more realistic performance assessment for practical deployment. Hybrid methods combining MLM’s contextual understanding with the computational efficiency of statistical methods could provide practical deployment solutions, while techniques such as model distillation, pruning, and quantization could be considered for computational efficiency in resource-constrained environments.
6. Conclusions
We evaluated the effectiveness of MLM for NAVTEX message restoration through comparison with statistical methods including dictionary-matching model, tri-gram, four-gram, and five-gram models across error rates ranging from 1% to 33% character corruption. The research addresses limitations in maritime communication systems where NAVTEX’s FEC mechanisms cannot restore corrupted characters, which are marked with asterisk characters in received messages. Current ETSI standards require discarding messages with error rates exceeding 4%, resulting in loss of safety information. This safety information loss compromises automated data exchange requirements essential for MSW implementation, where maritime safety information services must be processed without manual intervention [
2]. Experimental results show that MLM with confidence threshold of 0.7 achieved 85.4% mean restoration rate compared to statistical methods which achieved 44.4–64.0% restoration rates. MLM maintained restoration rates above 82% at 33% error rates while statistical methods fell below 45%, enabling restoration of messages that would otherwise be discarded under ETSI standards. Regulatory compliance analysis demonstrates that MLM maintains residual error rates below IMO MSC.148(77) requirements of 4% for initial corruption levels up to 25%, while statistical methods exceed this threshold at 10% or higher error rates, indicating enhanced capability to restore messages to operational standards.
Our contribution applies the Transformer encoder architecture to NAVTEX restoration and demonstrates direct MLM application by treating asterisk characters as masked tokens. This establishes the technical feasibility of adapting existing NLP methodologies to maritime communication problems without requiring specialized architectural modifications. The bidirectional context processing capability allows the model to consider both preceding and following context when making restoration decisions, unlike statistical methods that rely on limited local context windows. This automated processing capability supports reliable machine-to-machine data exchange and reduces dependency on manual interpretation by navigation officers.
Results suggest potential applications for maritime communication operations. The demonstrated restoration capabilities may support operational requirements for MASS and operations with reduced crew complement, where automated processing could become beneficial. The method enables restoration of messages containing navigational warnings, meteorological forecasts, and search and rescue coordination that would otherwise be discarded due to corruption, potentially supporting information availability during periods when communication quality is degraded.
Several limitations merit consideration for future research. Numeric character restoration presents the most significant concern for applications involving coordinates, frequencies, and timing information that are important for maritime safety, with MLM achieving only 44.6% restoration rate compared to 91.1% for alphabetic characters. This performance gap represents a critical challenge for practical deployment, as errors in coordinate data could lead to incorrect position reporting, while incorrect frequencies could prevent vessels from accessing critical communication channels, and timing errors could compromise search and rescue operations. The training data focused on European maritime regions may limit effectiveness for other geographical areas, as evidenced by restoration failures with non-European place names. Error pattern modeling constitutes another research area, as real maritime radio transmissions often exhibit burst errors caused by interference where consecutive characters may be corrupted simultaneously, requiring investigation of MAR and MNAR mechanisms beyond the MCAR assumptions adopted in this paper. The confidence threshold of 0.7 requires optimization research to balance restoration coverage with prediction reliability. Future research should explore character-type-specific architectures for improved numeric restoration, data augmentation techniques for geographic diversity, confidence calibration methods, and validation using actual maritime communication data with real interference patterns.