Article

Speech Recognition-Based Analysis of Vessel Traffic Service (VTS) Communications for Estimating Advisory Timing

by Sang-Lok Yoo 1, Kwang-Il Kim 2 and Cho-Young Jung 3,*

1 Department of Artificial Intelligence, Jeonju University, Jeonju 55069, Republic of Korea
2 Department of Marine Industry and Maritime Police, Jeju National University, Jeju 63243, Republic of Korea
3 Department of Public Service in Ocean and Fisheries, Kunsan National University, Gunsan 54150, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(22), 11968; https://doi.org/10.3390/app152211968
Submission received: 10 October 2025 / Revised: 2 November 2025 / Accepted: 6 November 2025 / Published: 11 November 2025
(This article belongs to the Special Issue Risk and Safety of Maritime Transportation)

Abstract

Vessel Traffic Service systems play a critical role in maritime safety by providing timely advisories to vessels in congested waterways. However, the optimal timing for VTS operator interventions has remained largely unstudied, relying primarily on subjective operator experience rather than empirical evidence. This study presents the first large-scale empirical analysis of VTS operator intervention timing using automated speech recognition technology applied to actual maritime communication data. VHF radio communications were collected from five major VTS centers in Korea over nine months, comprising 171,175 communication files with a total duration of 334.2 h. The recorded communications were transcribed using the Whisper speech-to-text model and processed through natural language processing techniques to extract encounter situations and advisory distances. A tokenization and keyword framework was developed to handle Maritime English and local-language communications, normalize textual numerical expressions, and facilitate cross-site analysis. Results reveal that VTS operator intervention timing varies by encounter type. In head-on and crossing encounters, advisories are provided at mean distances of 3.1 nm and 2.8 nm, respectively. These quantitative benchmarks provide an empirical foundation for developing standardized VTS operational guidelines and decision support systems, ultimately enhancing maritime safety and operational consistency across jurisdictions.

1. Introduction

Maritime traffic safety has become increasingly critical as global shipping volumes continue to grow and waterways become more congested. Maritime transport carries more than 90% of world trade, and the global fleet comprises well over 50,000 merchant ships operating internationally [1,2]. This substantial increase in maritime traffic, particularly in major ports and coastal areas, has raised significant concerns about vessel collision risks and navigational safety. Empirical studies consistently indicate that human factors contribute to a large proportion of maritime accidents, with communication breakdowns and suboptimal decision-making repeatedly identified as major causal factors [3,4,5].
Vessel Traffic Service (VTS) systems play a pivotal role in managing maritime traffic and ensuring safe navigation by monitoring vessel positions, speeds, headings, and intentions. Academic treatments of VTS, which reflect the SOLAS V/12 regulation and subsequent guideline developments, consistently emphasize that VTS is instituted to enhance the safety and efficiency of navigation, protect life at sea, and safeguard the marine environment [6,7]. The primary function of VTS operators is to provide timely information, navigational assistance, and traffic organization services to vessels within designated areas. However, determining the optimal timing for VTS advisories remains a critical, unresolved challenge. Prior VTS research demonstrates that operators work in a complex and time-sensitive environment where delayed advisories may allow risks to escalate, while premature interventions can impose unnecessary workload on communication channels and bridge teams [8].
Recent advances in maritime safety research increasingly leverage big data analytics and artificial intelligence methods applied to Automatic Identification System data. AIS has become foundational for traffic pattern analysis, collision risk assessment, and route optimization [9]. Studies have reconstructed maritime traffic networks using evolutionary algorithms [10] and employed deep learning methods for intelligent route extraction [11]. Unsupervised analysis of historical AIS data has revealed recurrent behaviors and spatiotemporal regularities in vessel movements [12]. Building on these data-driven approaches, researchers have developed sophisticated algorithms for collision avoidance and route planning using ant colony optimization [13] and hybrid GA/PSO methods [14]. Additional methodologies include CUSUM-based maneuver detection in maritime observation data [15] and automatic routing analysis from massive AIS trajectories [16].
While these AIS-based studies have substantially advanced understanding of vessel movement patterns and algorithmic collision avoidance, they primarily focus on vessel behavior analysis without incorporating VTS operator communication practices. The temporal dynamics of VTS interventions, particularly when operators should issue advisories, have received limited scholarly attention within real-time operational contexts. Addressing this limitation is crucial because VTS operators serve as the critical human element in maritime traffic management, bridging the gap between vessel position monitoring and timely safety advisories through direct communication.
This gap necessitates integrating vessel movement analysis with communication content analysis. The application of natural language processing (NLP) and speech recognition technologies in maritime communication represents an emerging research area with significant potential for this integration. Traditional maritime communication relies heavily on very high frequency (VHF) radio exchanges between vessels and VTS operators, conducted both in standardized Maritime English and in local languages [17]. Recent developments in automatic speech recognition (ASR) and natural language understanding have shown promise for analyzing maritime communications [18,19]. However, research specifically addressing the analysis of VTS communication content to determine optimal intervention timing remains notably scarce, representing a critical research opportunity at the intersection of vessel traffic monitoring and operator decision-making.
Despite the critical importance of VTS advisory timing, research in this area has been limited and predominantly qualitative in nature. Kim [20] conducted a survey-based study of VTS operators across major Korean ports to determine safe distances between vessels and obstacles. The study reported that the safe control distance for Jindo Coastal VTS was 3.6 nautical miles (nm), while Busan Port VTS was 1.75 nm. However, as acknowledged by the author, these findings were based solely on subjective assessments through questionnaires, lacking objective quantitative validation. Park and Park [21] advanced this research by analyzing actual VTS communication data from Busan Port over a three-day period, examining 60 communication cases. Their analysis revealed that VTS operators typically provided information or advisory guidance at distances of 1.65 nm for crossing situations and 1.64 nm for head-on encounters. While this study introduced a novel methodology for determining VTS intervention timing through empirical data analysis, its scope was limited to a single port and a relatively short observation period, necessitating further comprehensive research across multiple VTS centers and extended timeframes.
Despite the critical role of timely VTS operator interventions in maritime safety, several fundamental research gaps persist in the existing literature, significantly constraining the development of evidence-based operational standards. First, while previous research has examined VTS operations through surveys and case studies at individual facilities, large-scale empirical studies analyzing actual VTS communication data across multiple operational centers with differing traffic characteristics and geographical complexity are notably lacking. This absence of comparative cross-jurisdictional research is particularly problematic because VTS centers operate under substantially different conditions, including varying traffic densities, navigational hazards, and geographical constraints. Without such comparative evidence, it remains unclear how intervention practices differ across centers, which fundamentally impedes efforts toward developing coherent, internationally harmonized VTS operational guidelines. Second, the systematic integration of speech recognition and maritime domain knowledge for the analysis of routine VTS communications remains underexplored. Developing speech-recognition systems tailored to VTS radiotelephony poses technical challenges distinct from general-purpose applications and, accordingly, necessitates the development of a domain-specific ASR model tightly integrated with a domain-informed NLP layer capable of interpreting the standardized phraseology, syntax, and operational semantics of maritime operations. Third, and most critically for operational standardization, systematic quantitative analysis distinguishing between different encounter situations and vessel roles, specifically stand-on versus give-way vessels based on the International Regulations for Preventing Collisions at Sea (COLREG), is notably absent from existing literature [22].
Previous studies have either examined VTS interventions without differentiating encounter types or have focused narrowly on specific scenarios without conducting comparative analysis across situation categories. This gap presents a significant impediment to developing nuanced operational guidelines because COLREG assigns fundamentally different collision avoidance responsibilities to stand-on and give-way vessels in crossing situations. Without empirical evidence demonstrating how intervention distances vary by encounter type and vessel role in actual operational practice, the formulation of standardized guidelines that appropriately account for situational complexity remains fundamentally constrained.
Therefore, this study aims to address these gaps by analyzing VTS operator intervention timing using large-scale speech communication data collected from five major VTS centers in Korea. The collected audio data were transcribed and processed using natural language processing techniques, enabling statistical analysis of information provision distances across different encounter situations. Specifically, this research seeks to determine the typical distances at which VTS operators provide advisory information across different encounter situations and operational areas, thereby establishing evidence-based criteria for standardized VTS operational guidelines. By employing speech recognition technology combined with maritime domain expertise, this study contributes to the development of standardized and objective criteria for VTS information provision timing, ultimately enhancing maritime safety and operational efficiency.
The quantitative intervention timing benchmarks established through this research will enable several key practical applications that address current operational challenges. First, these benchmarks will inform the development of evidence-based VTS training curricula that can teach operators specific intervention distances for different encounter types rather than relying solely on subjective experience-based judgment. This standardization can improve consistency across newly qualified operators. Second, the findings will support the creation of standardized operational guidelines that maritime authorities can adopt to ensure consistent service delivery across different VTS centers and jurisdictions, reducing potential confusion for vessel operators who interact with multiple VTS areas. Third, the empirical distance thresholds will provide calibration parameters for automated decision support systems, enabling such systems to alert operators at appropriate times or validate operator intervention decisions against evidence-based standards. These applications directly address current challenges in VTS operations where intervention timing remains largely subjective and unstandardized, potentially contributing to inconsistent safety outcomes across different operational contexts.
The remainder of this paper is organized as follows. Section 2 describes the data collection, speech recognition, and natural language processing methods applied to the audio data. Section 3 presents the results. Section 4 discusses the findings, and Section 5 concludes this work and suggests future research directions.

2. Materials and Methods

2.1. Data Collection and Study Design

This study employed a large-scale observational approach to analyze VTS operator communication practices across multiple operational centers in the Republic of Korea. Data collection was conducted in collaboration with Future Ocean Information Technology Co., Ltd. (Singapore), utilizing specialized recording equipment installed at five major VTS centers: Busan Port VTS, Ulsan Port VTS, Yeosu Port VTS, Jeju Port VTS, and Jindo Coastal VTS. VHF radio communication recording devices were installed at each VTS operational center to capture real-time voice communications between VTS operators and vessels. The recording system was designed to automatically capture all VHF radio exchanges on designated maritime communication channels, ensuring comprehensive data coverage across different traffic scenarios and operational conditions. Figure 1 shows the geographic distribution of these five VTS centers, which span the southern coastal waters of the Republic of Korea.

2.2. Speech Recognition and Transcription

The audio data were processed through a comprehensive transcription workflow designed to ensure high accuracy in converting spoken communications to text format. For the speech-recognition component, several candidate architectures were systematically reviewed prior to selecting the final model. The shortlist included widely used open-source systems, namely DeepSpeech 2 [23], wav2vec 2.0 [24], and Whisper [25]. These systems were compared with respect to robustness under VHF-like noise and overlapping speech, multilingual coverage, requirements for domain-specific fine-tuning, and practical deployment constraints. The selection criteria were established based on domain-specific requirements of maritime VTS communications, including transcription accuracy under challenging acoustic conditions characteristic of VHF radio transmissions, multilingual processing capability for handling both standardized Maritime English and Korean language communications within the same operational context, deployment flexibility enabling processing of large archival datasets without continuous internet connectivity requirements, and computational feasibility for batch processing of extensive audio corpora.
Following systematic evaluation on representative samples of VTS communications, the Whisper model developed by OpenAI was selected as the most suitable speech-to-text system [25]. Whisper is a transformer-based encoder–decoder model trained on 680,000 h of diverse multilingual and multidomain audio data collected from internet sources. Several technical characteristics rendered Whisper particularly suitable for the maritime VTS application domain. First, the model’s extensive pretraining on acoustically diverse data provides inherent robustness to the signal quality variations, background noise, and transmission artifacts that characterize VHF maritime radio communications. Second, Whisper’s multilingual training enables seamless processing of code-switching between English and Korean without requiring manual language specification or separate processing pipelines for each language, a critical capability given that VTS operators frequently alternate between Maritime English standard phraseology and local-language communications. Third, as an open-source model enabling local deployment, Whisper addresses data security and confidentiality considerations associated with transmitting potentially sensitive maritime traffic coordination information to third-party cloud infrastructure.
We adopted a two-stage quality-assurance workflow for transcription. First, automatic transcription was performed with the Whisper-Large model, selected after validation for its strong performance on mixed-language VTS audio. Owing to its large-scale multilingual pretraining, the model delivered reliable results across both standardized Maritime English and local-language exchanges without domain-specific fine-tuning. When the automatic output contained partial or incomplete utterances because of degraded audio, background noise, or signal dropout, maritime-domain experts screened the affected segments to determine whether sufficient contextual information remained for dependable situation classification and distance extraction. Utterances judged insufficiently complete were excluded from quantitative analyses to preserve data quality.
Second, three maritime-navigation experts with professional VTS experience verified all transcripts. This expert validation ensured the faithful rendering of technical terminology, vessel names, positional references, and distance expressions that are critical to downstream analysis, and corrected any misinterpretations of specialized maritime vocabulary. Discrepancies identified during review were systematically logged. Analysis revealed that most errors involved proper nouns, particularly ship names, and numerals affected by poor audio quality. These observations informed targeted post-processing rules that improved numerical-expression normalization.
To evaluate the transcription accuracy of the Whisper model on maritime VHF communications, a representative sample of audio files was manually labeled and compared against the automated transcriptions. Performance was assessed using Word Error Rate for Maritime English communications and Character Error Rate for Korean language communications. These metrics were selected based on the structural characteristics of each language. Word Error Rate serves as the standard evaluation metric for English speech recognition due to the clear word boundaries established by spaces in English text, enabling straightforward word-level comparison between reference and hypothesis transcriptions. Conversely, Character Error Rate provides a more appropriate measure for Korean language evaluation because Korean exhibits agglutinative morphology where grammatical elements attach to word stems, and character-level assessment better captures transcription accuracy in languages where word segmentation may be ambiguous or where morphological complexity affects word boundary determination. The evaluation revealed that the Whisper model achieved a Word Error Rate of 9.0% for Maritime English radio exchanges and a Character Error Rate of 5.5% for Korean communications. These error rates fall within acceptable ranges for speech recognition systems operating on challenging audio conditions characteristic of VHF radio transmissions, including background noise and signal interference. The performance metrics confirm that the transcribed text constitutes a reliable data source for the subsequent natural language processing-based situation analysis and distance extraction procedures.
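Both metrics reduce to the same computation: the Levenshtein edit distance between reference and hypothesis (substitutions S + deletions D + insertions I) divided by the reference length N, i.e. (S + D + I) / N, taken over words for WER and over characters for CER. The sketch below illustrates this; the function names are illustrative, the paper does not state which implementation it used, and removing whitespace before computing CER is an assumption, since CER conventions vary.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences; each edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if r == h)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated words (assumes a non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate; whitespace is removed first (an assumed convention)."""
    ref_chars = list("".join(reference.split()))
    hyp_chars = list("".join(hypothesis.split()))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, `wer("ahead of you distance three miles", "ahead of you distance tree miles")` counts one substitution over six reference words, giving 1/6, while `cer("선박 전방", "선박 정방")` counts one substituted syllable out of four, giving 0.25.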

2.3. Natural Language Processing and Situation Classification

The transcribed communication data were processed using natural language processing techniques to identify and extract specific encounter situations and associated distance information. Figure 2 presents the comprehensive flowchart for the situation extraction methodology. We used different tokenization methods for English and Korean to account for their unique linguistic structures. For Maritime English communications, word-based tokenization was employed, leveraging the natural word boundaries established by spaces in English text. For Korean language communications, character-based tokenization was implemented to address the agglutinative morphological structure of Korean, where grammatical particles and verbal endings are affixed to word stems.
The tokenization process can be formally represented as $T_E = \{w_1, w_2, \ldots, w_n\}$ for English, where each $w_i$ represents an individual word, and $T_K = \{c_1, c_2, \ldots, c_m\}$ for Korean, where each $c_j$ represents a character-level token. For the purposes of situation classification, $T$ denotes the general set of tokens extracted from a transcript, which corresponds to either $T_E$ for English transcripts or $T_K$ for Korean transcripts depending on the source language.
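The two tokenization schemes can be sketched minimally as follows. The sketch assumes simple lowercasing and punctuation stripping for English (the paper does not detail its preprocessing beyond word- and character-level splitting) and uses plain Python rather than the NLTK-based modules the study reports. `tokenize_english` yields the word tokens of $T_E$ and `tokenize_korean` yields the character tokens of $T_K$ with whitespace discarded. Both function names are hypothetical.

```python
import re


def tokenize_english(transcript: str) -> list:
    """Word-level tokens (T_E): lowercase, drop punctuation except decimal points."""
    cleaned = re.sub(r"[^\w\s.]", " ", transcript.lower())
    # Strip sentence-final periods while keeping decimals such as "1.5"
    return [tok.strip(".") for tok in cleaned.split() if tok.strip(".")]


def tokenize_korean(transcript: str) -> list:
    """Character-level tokens (T_K): every non-whitespace character is a token."""
    return [ch for ch in transcript if not ch.isspace()]
```

Calling `set()` on either result gives the token set $T$ used for classification. Character tokens avoid having to segment Korean words whose stems carry attached particles and endings, at the cost of making multi-character keywords span several tokens.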
Following the tokenization process, all identified communications underwent manual verification to ensure contextual appropriateness. This verification process confirmed that extracted instances genuinely represented VTS advisory communications regarding encounter situations, eliminating inaccurate matches that might arise from keyword matching in unrelated contexts.
Distance information was extracted through a process that identified numerical expressions and their associated unit indicators. The extraction procedure involved converting spoken numerical expressions to standardized numerical format. Text-to-number conversion was implemented to transform both written and spoken numerical expressions into consistent decimal notation. The conversion algorithm processed various linguistic formats of numerical expressions and standardized them to numerical values with appropriate precision.
To ensure computational consistency, all extracted distances were standardized to nautical miles as the uniform unit of measurement. This conversion was not performed manually but by a systematic text-to-number conversion module, a rule-based system built on regular expressions that parses the wide range of numerical formats observed in VTS communications, with specific patterns for digit-based, word-based, and hybrid expressions. The patterns rely on a few standard regular-expression metacharacters: \b is a word-boundary anchor that restricts matching to the beginning or end of complete words, \d matches any digit from 0 through 9, and \s matches a whitespace character such as a space or tab. The quantifier + matches one or more occurrences of the preceding element, the quantifier ? makes the preceding element optional (zero or one occurrence), and (?:...) groups elements without capturing them. For instance, explicit numerical terms such as 3 miles were captured by the pattern \b(\d+(?:\.\d+)?)\s+miles?\b, which matches a word boundary, one or more digits, an optional decimal part (a period followed by one or more digits), one or more whitespace characters, and finally mile or miles at a word boundary, where the ? after the s makes the plural optional. This pattern standardized such expressions to 3.0 nm. More complex verbal expressions, such as one decimal five nautical mile, were parsed by rules that map combinations of number-words and decimal keywords to their numerical counterparts, yielding 1.5 nm.
Furthermore, common fractional terminology, such as half a mile or half nautical mile, was identified by dedicated patterns such as \bhalf(?:\s+an?)?\s+mile\b, where the pattern matches the word half at a word boundary, optionally followed by whitespace and the article a or an, with the plus sign requiring at least one whitespace character, the question mark making the n in an optional, and the outer question mark making the entire article group optional, then one or more whitespace characters where the plus sign again requires at least one space, and the unit term mile at a word boundary. These expressions were converted to the decimal equivalent of 0.5 nm.
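The following sketch applies the two regular expressions given in the text, reproduced verbatim, to extract distances in nautical miles from an English utterance. The number-word decimal rule and the half nautical mile pattern are hypothetical stand-ins for rules the text describes but does not list, and Korean distance expressions are not handled. Matches are returned in the order in which they appear.

```python
import re

# Patterns quoted in the text (used verbatim)
DIGIT_MILES = re.compile(r"\b(\d+(?:\.\d+)?)\s+miles?\b", re.IGNORECASE)
HALF_MILE = re.compile(r"\bhalf(?:\s+an?)?\s+mile\b", re.IGNORECASE)

# Hypothetical extra rules, not taken from the paper
HALF_NAUTICAL_MILE = re.compile(r"\bhalf\s+nautical\s+mile\b", re.IGNORECASE)
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
_WORDS = "|".join(NUMBER_WORDS)
WORD_DECIMAL_MILES = re.compile(
    rf"\b({_WORDS})\s+(?:decimal|point)\s+({_WORDS})\s+(?:nautical\s+)?miles?\b",
    re.IGNORECASE,
)


def extract_distances_nm(utterance: str) -> list:
    """Return all distances (in nm) found in an utterance, in order of appearance."""
    found = []  # (character offset, distance in nm)
    for m in DIGIT_MILES.finditer(utterance):
        found.append((m.start(), float(m.group(1))))
    for m in WORD_DECIMAL_MILES.finditer(utterance):
        whole = NUMBER_WORDS[m.group(1).lower()]
        decimal_digit = NUMBER_WORDS[m.group(2).lower()]
        found.append((m.start(), float(f"{whole}.{decimal_digit}")))  # "one decimal five" -> 1.5
    for pattern in (HALF_MILE, HALF_NAUTICAL_MILE):
        for m in pattern.finditer(utterance):
            found.append((m.start(), 0.5))
    return [nm for _, nm in sorted(found)]
```

As written, the digit pattern from the text does not match 3 nautical miles, because it requires mile(s) to follow the number directly. A fuller rule set would presumably add an optional nautical token, as the hypothetical word-decimal rule above does.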
Subsequently, an equally critical standardization step involved contextual filtering to ensure that the extracted distances represented the VTS operator’s own intervention. Because the study’s objective is to measure operator-initiated advisories, a protocol was enforced to identify and filter out distances reported by third parties. In particular, any utterance that did not contain a callsign phrase explicitly referencing VTS was programmatically filtered out. This heuristic rests on the assumption that communications lacking a VTS identifier in the callsign likely represent inter-vessel communications or vessel reports rather than direct, operator-initiated advisories. This standardization process ensured consistency across all extracted distance measurements regardless of the original communication format or language, enabling reliable statistical analysis of VTS intervention distances. The combined approach to numerical-expression conversion and unit standardization enabled quantitative analysis of maritime communication patterns and spatial relationships between vessels.
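A sketch of the callsign filter follows. It assumes that a VTS callsign appears as a word followed by VTS (for example, Busan VTS). The study does not give its exact matching rule, so the pattern and function names are hypothetical. Like the heuristic described above, this check also keeps vessel calls addressed to a VTS center, so it approximates operator initiation rather than guaranteeing it.

```python
import re

# Hypothetical callsign rule: a word (Latin or Hangul) followed by "VTS",
# e.g. "Busan VTS" or "부산 VTS". \w is Unicode-aware for str patterns.
VTS_CALLSIGN = re.compile(r"\b\w+\s+VTS\b", re.IGNORECASE)


def is_operator_advisory(utterance: str) -> bool:
    """True if the utterance contains a callsign phrase referencing VTS."""
    return VTS_CALLSIGN.search(utterance) is not None


def keep_operator_advisories(utterances: list) -> list:
    """Drop utterances without a VTS callsign (likely inter-vessel exchanges or vessel reports)."""
    return [u for u in utterances if is_operator_advisory(u)]
```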
Encounter situations were classified through predefined keyword patterns specific to each encounter scenario. The classification was designed to identify three primary maritime encounter situations as defined by the COLREG. This approach leverages the structured nature of professional maritime communications, where vessel traffic services personnel employ standardized terminology to describe spatial relationships and navigational scenarios.
Let K represent the universal set of directional and positional keywords extracted from maritime communication transcripts. The classification system defines four distinct subsets corresponding to the primary encounter scenarios and distance measurements identified in vessel traffic management operations. Each subset contains lexical indicators that maritime professionals consistently employ to describe specific navigational contexts and spatial relationships between vessels.
Head-on encounters are characterized by vessels approaching from directly ahead along reciprocal or nearly reciprocal courses. The classification algorithm identifies these situations through the co-occurrence of directional keywords indicating forward approach vectors and distance measurement terms that provide spatial context. The English directional indicators include “ahead” and “head,” while distance measurements encompass “mile” and “nautical mile.” Korean maritime communications employ functionally equivalent terminology to convey identical spatial relationships. Professional maritime communications typically phrase a head-on situation as, for example, “Ahead of you, distance X nautical miles.”
Crossing encounters involve vessels approaching from lateral quadrants, creating potential collision scenarios where one vessel crosses the intended path of another. The classification system distinguishes between port and starboard crossing situations based on the relative bearing of the approaching vessel. Port-side encounters are identified through keywords such as “port bow” and “port beam,” indicating vessels approaching from the port forward quadrant. Starboard-side encounters utilize corresponding terminology including “starboard bow” and “starboard beam” to denote approaches from the starboard forward quadrant. Korean maritime terminology provides equivalent expressions for these positional relationships.
Overtaking situations occur when faster vessels approach from astern, requiring specific navigational protocols to ensure safe passage. These scenarios are identified through keywords indicating aft relative positions, with both English and Korean maritime communications employing standardized terminology for vessels approaching from behind the reference vessel.
Since VTS operators inconsistently use the terms “mile(s)” or “nautical mile(s)” without distinguishing singular and plural, the keyword was standardized to the singular form “mile.”
The formal keyword sets for encounter classification are defined as follows:
$$K_{\text{head\_on}} = \{\text{ahead}, \text{head}\}$$
$$K_{\text{crossing}} = \{\text{port bow}, \text{port beam}, \ldots, \text{starboard beam}, \text{starboard side}\}$$
$$K_{\text{overtaking}} = \{\text{astern}, \text{behind}, \text{overtaking}, \text{overtake}, \text{stern}\}$$
$$K_{\text{distance}} = \{\text{mile}\}$$
The encounter classification algorithm examines intersections between predefined keyword sets and token sets extracted from communication transcripts. For any given vessel traffic services communication transcript, the situation classification function S(T) determines encounter type by evaluating the presence of relevant keyword combinations within the extracted token set T.
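$$S(T) = \begin{cases} \text{head-on}, & \text{if } T \cap K_{\text{head\_on}} \neq \emptyset \wedge T \cap K_{\text{distance}} \neq \emptyset \\ \text{crossing}, & \text{else if } T \cap K_{\text{crossing}} \neq \emptyset \wedge T \cap K_{\text{distance}} \neq \emptyset \\ \text{overtaking}, & \text{else if } T \cap K_{\text{overtaking}} \neq \emptyset \wedge T \cap K_{\text{distance}} \neq \emptyset \\ \text{undefined}, & \text{otherwise} \end{cases}$$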
In this formulation, S(T) denotes the situation classification based on the token set T extracted from a transcript. The sets $K_{\text{head\_on}}$, $K_{\text{crossing}}$, and $K_{\text{overtaking}}$ represent the predefined keyword collections for head-on, crossing, and overtaking situations, respectively, and $K_{\text{distance}}$ contains the distance keyword. The intersection operation identifies whether any keywords from a given set are present in the transcript. A situation is assigned only when T shares at least one keyword with the corresponding situation set and at least one keyword with $K_{\text{distance}}$. The conditions are evaluated in order, so the first satisfied condition determines the classification, and transcripts meeting none of them are labeled undefined.
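The piecewise classifier S(T) can be sketched in Python as follows. English tokens are single words, but several crossing keywords are two-word phrases, so the hypothetical `token_set` helper adds adjacent-word bigrams to the unigram set, which lets plain set intersection find them. The helper also maps miles to mile, in line with the singular standardization described earlier. The crossing set contains the members listed in the keyword definitions plus starboard bow, which the prose names as a crossing keyword; Korean keywords and any other elided members are omitted.

```python
import re

# Keyword sets from the text; "starboard bow" is taken from the prose
# description of crossing keywords. Korean equivalents are omitted.
K_HEAD_ON = {"ahead", "head"}
K_CROSSING = {"port bow", "port beam", "starboard bow", "starboard beam", "starboard side"}
K_OVERTAKING = {"astern", "behind", "overtaking", "overtake", "stern"}
K_DISTANCE = {"mile"}


def token_set(transcript: str) -> set:
    """Hypothetical English token set T: unigrams plus adjacent-word bigrams."""
    words = re.sub(r"[^\w\s]", " ", transcript.lower()).split()
    # Singular standardization: treat "miles" as "mile"
    words = ["mile" if w == "miles" else w for w in words]
    bigrams = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return set(words) | bigrams


def classify(tokens: set) -> str:
    """S(T): the first condition that holds wins; every branch needs a distance keyword."""
    has_distance = bool(tokens & K_DISTANCE)
    if has_distance and tokens & K_HEAD_ON:
        return "head-on"
    if has_distance and tokens & K_CROSSING:
        return "crossing"
    if has_distance and tokens & K_OVERTAKING:
        return "overtaking"
    return "undefined"
```

For example, `classify(token_set("Ahead of you, distance 3 nautical miles"))` returns `"head-on"`, whereas a transcript mentioning a port bow target without any distance is left undefined. Because head-on is tested first, a transcript that contains both "ahead" and a crossing keyword is labeled head-on, exactly as the ordered cases of S(T) prescribe.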

3. Results

3.1. Extracted Transcripts for Analysis

The study collected data from 7 August 2022 through 15 May 2023. The corpus comprises 171,175 distinct radio-communication files with a cumulative transcribed duration of 334.2 h. The primary computational environment consisted of Intel Xeon E5-2650 CPUs, 128 GB of RAM, and an NVIDIA RTX 3090 GPU with 24 GB of VRAM to support compute-intensive automatic speech recognition (ASR) inference and downstream natural language processing (NLP) operations. The entire analytical workflow was implemented in Python 3.9. The NLP pipeline employed custom Python modules built on the Natural Language Toolkit (NLTK) for text processing, supplemented by regular-expression matching to extract structured information, including encounter-type labels and advisory distances, from the transcribed communications. All processed data were organized in a PostgreSQL relational database to facilitate efficient querying and cross-center analysis.
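The sketch below illustrates the kind of cross-center query such a relational store supports. It uses Python's built-in sqlite3 module as an in-memory stand-in for PostgreSQL. The table name, columns, and sample rows are hypothetical and do not reflect the study's schema or data.

```python
import sqlite3


def build_store(rows):
    """In-memory stand-in for the PostgreSQL store (schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE advisories (
               center      TEXT NOT NULL,  -- e.g. 'Busan Port VTS'
               encounter   TEXT NOT NULL,  -- label produced by the classifier
               distance_nm REAL NOT NULL   -- normalized advisory distance
           )"""
    )
    conn.executemany("INSERT INTO advisories VALUES (?, ?, ?)", rows)
    return conn


def mean_distance_by_center(conn):
    """Cross-center query: mean advisory distance and count per center and encounter type."""
    cur = conn.execute(
        """SELECT center, encounter, AVG(distance_nm), COUNT(*)
           FROM advisories
           GROUP BY center, encounter
           ORDER BY center, encounter"""
    )
    return cur.fetchall()
```

With toy rows such as `("Busan Port VTS", "crossing", 2.0)` and `("Busan Port VTS", "crossing", 3.0)`, the query returns one row per center and encounter type, here `("Busan Port VTS", "crossing", 2.5, 2)`. Keeping per-advisory rows rather than precomputed averages allows the same table to serve per-center, per-encounter, and pooled statistics.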
Table 1 presents representative Maritime English and Korean utterances extracted from the labeled corpus for head-on, crossing, and overtaking encounters. In crossing encounters, stand-on denotes advisories that VTS operators provide to the vessel on the starboard side, that is, the vessel that has the other vessel on its port side, whereas give-way denotes advisories to the vessel on the port side, which has the other vessel on its starboard side. Each sentence contains situation-specific keywords, shown in boldface, together with the distance to the relevant target. Taken together, these examples show that the dataset captures the keywords and distance expressions central to vessel-to-vessel and VTS–vessel communications in safety-critical navigation.
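The Korean examples in Table 1 read distances as Sino-Korean digits (이 = 2, 삼점 육 = 3.6). A minimal sketch of that text-to-number step follows; it assumes digit-by-digit reading with 점 as the decimal point and the unit 마일, and it does not handle positional numerals such as 십 (ten). The names SINO_KOREAN, KO_DISTANCE_RE, and korean_distances are hypothetical.

```python
import re

# Sino-Korean digits as read digit by digit; "점" is the decimal point.
SINO_KOREAN = {
    "영": "0", "공": "0", "일": "1", "이": "2", "삼": "3", "사": "4",
    "오": "5", "육": "6", "칠": "7", "팔": "8", "구": "9", "점": ".",
}
# A run of numeral syllables that starts a word and is followed by "마일" (mile).
KO_DISTANCE_RE = re.compile(r"(?<!\S)((?:[영공일이삼사오육칠팔구점]\s*)+)마일")


def korean_distances(transcript):
    """Return distances (nm) spoken as Sino-Korean digits before '마일'."""
    values = []
    for match in KO_DISTANCE_RE.finditer(transcript):
        digits = "".join(SINO_KOREAN[ch] for ch in match.group(1) if not ch.isspace())
        values.append(float(digits))
    return values


print(korean_distances("좌현 선수 삼점 육 마일에 출항선이 있습니다."))  # [3.6]
```

The lookbehind `(?<!\S)` requires the numeral run to begin a word, so a syllable such as the subject particle 이 attached to a preceding noun is not misread as the digit 2.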
In overtaking encounters, vessel-to-vessel exchanges occur more frequently than VTS–vessel advisories. The extracted dataset contained only seven overtaking encounters, the smallest of the three categories analyzed. In these exchanges, navigators consistently omitted explicit distance references. The mean distance recorded for the seven overtaking situations was 1.1 nm, well within the range of clear visual observation under normal visibility, which suggests that navigators consider precise distance statements unnecessary when visual confirmation is readily available. Because seven cases are too few to support statistically meaningful conclusions or comparisons with the other encounter types, overtaking situations were excluded from the statistical analysis.
Figure 3 shows the distribution of utterance durations. Most individual radiotelephony transmissions last between 5 and 20 s, with a mean of 10.7 s. Standardized phraseology and operating procedures appear to yield concise, consistently timed transmissions across the dataset, in line with the operational demands of VTS radiotelephony: brevity minimizes channel occupancy while preserving the clarity required for time-critical collision avoidance advisories.
Table 2 presents the distribution of labeled VTS advisory transmissions across the five operational areas, representing the final analytical dataset of 865 communications that satisfied all filtering criteria described in Section 2.3. The Labeling Data Total Duration column gives the cumulative hours of voice communications collected at each VTS; the total across all areas is 334.2 h. The number of VTS advisory transmissions varies substantially by location, reflecting differences in traffic volume, geographical complexity, and encounter frequency across jurisdictions. Busan Port VTS contributed 24 communications from 19.1 h of recordings, Ulsan Port VTS 233 from 91.4 h, Yeosu Port VTS 27 from 37.5 h, Jeju Port VTS 146 from 110.9 h, and Jindo Coastal VTS 435 from 75.3 h. These differences in extraction rates indicate that the density of encounter-situation advisories varies considerably across operational contexts: the coastal VTS (Jindo) shows a higher frequency of collision avoidance communications than the port-based VTS centers, which handle proportionally more routine traffic management exchanges.
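The extraction rates discussed above can be made explicit by dividing each area's advisory count in Table 2 by its labeled duration. The calculation below uses only the Table 2 figures; the rate metric (advisories per hour of labeled audio) is an illustrative choice rather than one the study defines.

```python
# Table 2: area -> (labeled duration in hours, number of VTS advisory transmissions)
TABLE_2 = {
    "Busan Port VTS": (19.1, 24),
    "Ulsan Port VTS": (91.4, 233),
    "Yeosu Port VTS": (37.5, 27),
    "Jeju Port VTS": (110.9, 146),
    "Jindo Coastal VTS": (75.3, 435),
}


def advisories_per_hour(table):
    """Advisory transmissions per hour of labeled audio, by VTS area."""
    return {area: count / hours for area, (hours, count) in table.items()}


rates = advisories_per_hour(TABLE_2)
for area, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{area:<18} {rate:.2f} advisories/h")

total_hours = sum(hours for hours, _ in TABLE_2.values())
total_count = sum(count for _, count in TABLE_2.values())
print(f"All areas          {total_count / total_hours:.2f} advisories/h")
```

This gives roughly 5.8 advisories per hour for Jindo, 2.5 for Ulsan, 1.3 for Jeju and Busan, and 0.7 for Yeosu, which quantifies the contrast between the coastal and port VTS areas.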

3.2. Distance Analysis by Encounter

Table 3 summarizes the number of labeled sentences and the mean advisory distance in nautical miles by VTS area (Busan, Ulsan, Yeosu, and Jeju Port VTS and Jindo Coastal VTS) and by encounter type. Encounters are classified as head-on or crossing, with crossing further split into stand-on vessel and give-way vessel.
Sample sizes and encounter-type frequencies vary across VTS jurisdictions. Ulsan Port VTS recorded a markedly higher proportion of crossing-situation advisories, with 180 crossing cases compared with 53 head-on encounters, a ratio of approximately 3.4:1. Conversely, Jindo Coastal VTS showed the inverse pattern, with 269 head-on encounters versus 166 crossing situations, a ratio of approximately 1.6:1 in favor of head-on advisories. These disparities likely reflect the distinct traffic patterns and geographical characteristics of each VTS operational area.
Figure 4 displays the mean advisory distances for head-on and crossing encounters in each VTS area. White numerals inside the bars indicate the mean distance in nautical miles, and the black error bars denote 95% confidence intervals (95% CI), calculated using Equation (3).
$$
\mathrm{CI} = \bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} \qquad (3)
$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size for each category.
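Equation (3) can be checked against the summary statistics in Table 4. The sketch below is a direct transcription of the formula; the helper name ci95 is illustrative. With the rounded Table 4 values it gives 95% half-widths of about 0.17 nm (head-on) and 0.14 nm (crossing); for comparison, the standard error s/√n alone is about 0.09 nm and 0.07 nm.

```python
import math


def ci95(mean, sd, n, z=1.96):
    """95% confidence interval from Equation (3): mean ± z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width


# Table 4 summary statistics (rounded to one decimal in the source).
for label, mean, sd, n in [("Head-on", 3.1, 1.8, 410), ("Crossing", 2.8, 1.5, 455)]:
    low, high, hw = ci95(mean, sd, n)
    se = sd / math.sqrt(n)
    print(f"{label:<9} CI [{low:.2f}, {high:.2f}] nm, half-width {hw:.2f}, SE {se:.2f}")
```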
The superscripted values with the plus-minus symbol above each bar represent the confidence interval half-width. Head-on advisories cluster between 2.3 and 3.3 nm across all sites, with Busan at 2.3 nm, Ulsan at 3.1 nm, Yeosu at 2.3 nm, Jeju at 2.6 nm, and Jindo at 3.3 nm. For crossing encounters, mean advisory distances differed by VTS area. In Busan, stand-on advisories averaged 2.8 nm and give-way 1.7 nm. The corresponding means for stand-on and give-way vessels were 2.7 nm and 3.0 nm in Ulsan, 3.2 nm and 2.9 nm in Yeosu, 2.3 nm and 2.8 nm in Jeju, and 2.8 nm and 2.9 nm in Jindo. Collectively, these results indicate site-specific variation, with earlier advisories to the give-way vessel in Ulsan, Jeju, and Jindo, whereas Busan exhibits the reverse ordering and Yeosu shows a modest reversal.
Figure 5 presents the mean advisory distance at each VTS, pooling head-on and crossing situations, with white numerals indicating mean values and black error bars representing 95% confidence intervals. Advisories at Busan Port VTS were issued at a mean distance of 2.5 nm, whereas Jindo Coastal VTS showed a greater mean distance of 3.2 nm. This suggests that the coastal VTS tends to intervene earlier than the port VTS centers, although Jindo is the only coastal site in the dataset. Across all VTS areas, mean advisory distances ranged from 2.5 to 3.2 nm. The wider confidence intervals for Busan and Yeosu VTS reflect their small sample sizes (24 and 27 advisories, respectively).
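The pooled per-area means in Figure 5 can be approximately reconstructed from Table 3 as count-weighted averages of the head-on, stand-on, and give-way subgroup means. This is an illustrative cross-check, not the authors' computation. Because each input mean is rounded to 0.1 nm, every reconstructed value lies within 0.05 nm of the exact pooled mean; Busan, for example, comes out at about 2.47 nm, consistent with the reported 2.5 nm.

```python
# Table 3: area -> [(n, mean nm)] for head-on, crossing (stand-on), crossing (give-way)
TABLE_3 = {
    "Busan Port VTS": [(7, 2.3), (13, 2.8), (4, 1.7)],
    "Ulsan Port VTS": [(53, 3.1), (79, 2.7), (101, 3.0)],
    "Yeosu Port VTS": [(15, 2.3), (6, 3.2), (6, 2.9)],
    "Jeju Port VTS": [(66, 2.6), (35, 2.3), (45, 2.8)],
    "Jindo Coastal VTS": [(269, 3.3), (78, 2.8), (88, 2.9)],
}


def pooled_mean(rows):
    """Count-weighted mean of subgroup means, plus the total count."""
    total_n = sum(n for n, _ in rows)
    return sum(n * mean for n, mean in rows) / total_n, total_n


pooled = {area: pooled_mean(rows) for area, rows in TABLE_3.items()}
for area, (mean, n) in pooled.items():
    print(f"{area:<18} n={n:<4d} pooled mean ≈ {mean:.2f} nm")
```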
Figure 6 shows the mean VTS advisory distance by encounter type across all five operational areas. Head-on encounters have a larger mean advisory distance (3.1 nm) than crossing encounters (2.8 nm). The error bars quantify uncertainty around the means, with half-widths of 0.09 nm for head-on and 0.07 nm for crossing situations, indicating that both means are estimated precisely. On average, VTS operators issued head-on advisories about 0.3 nm (roughly 550 m) farther out than crossing advisories.
Homogeneity of variance, a standard assumption of Student's t-test, was assessed using Levene's test. The test was significant (p < 0.05), indicating unequal variances between the two groups, so we used Welch's t-test, which does not assume equal variances. As reported in Table 4, Welch's t-test compared head-on encounter advisory distances (n = 410; mean = 3.1 nm; SD = 1.8 nm) with crossing encounter advisory distances (n = 455; mean = 2.8 nm; SD = 1.5 nm). The mean difference of 0.3 nm (≈550 m) was statistically significant, t(803.6) = 2.8, p = 0.0048.
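Welch's statistic and degrees of freedom can be recomputed from summary statistics alone. The sketch below implements the standard Welch formulas; the p-value uses a normal approximation to avoid a SciPy dependency, which is an assumption rather than the study's procedure. Because the Table 4 means and standard deviations are rounded to one decimal, the result (t ≈ 2.6, df ≈ 799, p ≈ 0.008) only approximates the reported t(803.6) = 2.8, p = 0.0048, which presumably reflect the unrounded data.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value under a standard-normal approximation (reasonable for large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


t, df = welch_t(3.1, 1.8, 410, 2.8, 1.5, 455)  # Table 4, rounded values
print(f"t = {t:.2f}, df = {df:.1f}, p ≈ {two_sided_p_normal(t):.4f}")
```

Where SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same Welch test with an exact t-distribution p-value. Levene's test, by contrast, needs the raw observations and cannot be reproduced from these summaries.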

4. Discussion

The findings provide quantitative support for standardizing VTS operational procedures with intervention timing that varies by encounter type. Head-on encounters received advisories at greater distances (mean 3.1 nm) than crossing situations (mean 2.8 nm), a difference of approximately 550 m. This pattern is consistent with the higher risk inherent in head-on encounters, where the combined speeds of two opposing vessels produce high closing speeds and call for earlier intervention to leave sufficient time for collision avoidance maneuvers.
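To make the closing-speed argument concrete, the sketch below computes how long a given range takes to close when two vessels are on a collision course, so that their relative velocity points along the line of sight. The 12-knot speeds and the 90° crossing angle are hypothetical, not values from the study. At equal range, the head-on closing speed (24 knots) exceeds the right-angle crossing closing speed (about 17 knots), so a head-on range closes in about 29% less time.

```python
import math


def relative_speed(speed_a, speed_b, angle_deg):
    """Magnitude (knots) of B's velocity relative to A when their courses differ by angle_deg."""
    theta = math.radians(angle_deg)
    return math.sqrt(speed_a ** 2 + speed_b ** 2 - 2 * speed_a * speed_b * math.cos(theta))


def minutes_to_close(range_nm, speed_a, speed_b, angle_deg):
    """Minutes until the range closes, assuming a collision course (steady bearing)."""
    return 60.0 * range_nm / relative_speed(speed_a, speed_b, angle_deg)


# Hypothetical example: both vessels at 12 knots.
head_on = minutes_to_close(3.1, 12, 12, 180)  # reciprocal courses
crossing = minutes_to_close(2.8, 12, 12, 90)  # courses crossing at right angles
print(f"head-on at 3.1 nm: {head_on:.1f} min; crossing at 2.8 nm: {crossing:.1f} min")
```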
Our findings partially corroborate the survey-based study by Kim (2013), which reported safe control distances of 3.6 nm for Jindo Coastal VTS and 1.75 nm for Busan Port VTS [20]. Although our measured head-on advisory distances differ (3.3 nm for Jindo and 2.3 nm for Busan), the relative pattern, in which the coastal VTS uses greater intervention distances than the port VTS, remains consistent. The discrepancies likely arise from methodological differences between subjective operator assessments and objective communication analysis.
The differences in VTS operator intervention timing across the five centers reflect not only variation in encounter types but also contextual factors that shape jurisdiction-specific decision-making. Two such factors should be considered when interpreting the results. First, traffic intensity differs markedly among the study sites. Busan Port VTS, Yeosu Port VTS, and Ulsan Port VTS routinely manage high vessel volumes and complex multi-vessel interactions. In such high-density port environments, operators may need to intervene more frequently and at shorter distances, because the benefit of early warnings must be balanced against the practical constraint of coordinating numerous simultaneous movements. Second, geography conditions intervention opportunities and strategies. Jindo Coastal VTS operates in relatively open coastal waters with longer lines of sight and fewer navigational hazards, allowing operators to issue strategic advisories at greater ranges to vessels on long approaches. By contrast, Busan Port VTS, Yeosu Port VTS, and Ulsan Port VTS work within confined harbor approaches characterized by breakwaters, traffic separation schemes, and complex port geometries that create natural chokepoints where traffic converges, often requiring more tactical, short-range interventions when immediate maneuvering decisions are needed.
The quantitative advisory distances established in this study provide an empirical foundation for developing standardized VTS operational guidelines. The treatment of stand-on versus give-way vessels in crossing situations warrants particular attention. In most areas, give-way vessels receive earlier advisories, whereas Busan, and to a lesser extent Yeosu, show the reverse pattern. This inconsistency across VTS areas suggests a lack of standardization that could confuse navigators operating across multiple jurisdictions.
Our study has some limitations. First, the small number of overtaking situations (seven cases) prevented meaningful statistical analysis of this encounter type. Second, the study focused exclusively on Korean VTS centers, which limits generalizability to international contexts where communication patterns and operational practices may differ. Future research should extend this methodology to VTS centers in other countries to enable cross-cultural comparison of intervention practices. Integration with AIS data would allow communication patterns to be correlated with actual vessel movements, testing whether earlier advisories lead to better collision avoidance outcomes. Specifically, future studies could combine timestamped VHF communication records with synchronized AIS trajectory data to assess the effectiveness of operator interventions quantitatively, by measuring vessel response times, course alterations, and closest points of approach following advisories. Such integrated analysis would provide empirical validation of optimal intervention distances for different encounter scenarios.
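The closest point of approach proposed above as an outcome measure can be computed from two synchronized AIS fixes with the standard straight-line CPA/TCPA formulas. The sketch below is a generic illustration, not a method from this study: it assumes both vessels hold course and speed, that positions have already been projected to a flat local x/y frame in nautical miles (reasonable only over short ranges), and that velocities are in knots. The function name cpa_tcpa is hypothetical.

```python
import math


def cpa_tcpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Closest point of approach for straight-line motion.

    Positions in nm (local flat x/y frame), velocities in knots (nm/h).
    Returns (dcpa_nm, tcpa_hours); tcpa < 0 means the CPA is already past.
    """
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0.0:  # no relative motion: the range stays constant
        return math.hypot(rx, ry), 0.0
    tcpa = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa


# Hypothetical fixes: own ship northbound at 12 kn, target 3.1 nm ahead and
# 0.5 nm to starboard, southbound at 12 kn.
dcpa, tcpa = cpa_tcpa((0.0, 0.0), (0.0, 12.0), (0.5, 3.1), (0.0, -12.0))
print(f"DCPA {dcpa:.2f} nm in {tcpa * 60:.2f} min")
```

In a post-advisory analysis, the same calculation would be repeated on fixes taken after the advisory timestamp to see how course alterations changed the predicted CPA.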

5. Conclusions

This study presents the first large-scale, multi-center empirical analysis of VTS operator intervention timing derived from automatic speech recognition (ASR) applied to real-world maritime VHF radiotelephony. We integrate ASR with domain-specific maritime NLP to infer encounter type and extract advisory distances directly from transcripts spanning five heterogeneous VTS jurisdictions. A tokenization-and-keyword framework handles Maritime English and local-language usage, normalizes textual numerals to numeric values, and standardizes units, enabling consistent cross-site statistical analysis.
These results fill a critical quantitative gap between AIS-based risk detection and human advisory action. By demonstrating that head-on conflicts are typically addressed earlier than crossings and by establishing typical advisory distances, the study supplies data-driven baselines for evaluating advisory timeliness and designing decision support thresholds. Specifically, the findings suggest that VTS operators should consider initiating advisory calls at approximately 3.1 nm for head-on encounters and approximately 2.8 nm for crossing situations, subject to prevailing conditions. Adopting these benchmarks and the associated extraction pipeline is expected to reduce late interventions in rapidly closing encounters, supporting larger navigational safety margins and more predictable service for mariners.
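As a rough illustration of how these benchmarks could seed a decision support threshold, the sketch below flags the first moment a tracked range closes to the suggested distance for its encounter type. The function name, the input format, and the track values are hypothetical; the benchmarks are observed means rather than validated safety limits, and an operational system would combine them with prevailing conditions and collision-risk measures such as CPA and TCPA.

```python
# Mean advisory distances reported in this study, used here as illustrative triggers (nm).
BENCHMARK_NM = {"head-on": 3.1, "crossing": 2.8}


def first_advisory_time(encounter_type, track, benchmarks=BENCHMARK_NM):
    """Return the first time at which the range has closed to the benchmark, else None.

    track: chronological (time_min, range_nm) pairs for one encounter.
    """
    threshold = benchmarks[encounter_type]
    for time_min, range_nm in track:
        if range_nm <= threshold:
            return time_min
    return None


track = [(0, 4.0), (1, 3.5), (2, 3.0), (3, 2.5)]  # hypothetical closing encounter
print(first_advisory_time("head-on", track), first_advisory_time("crossing", track))  # 2 3
```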

Author Contributions

Conceptualization, S.-L.Y. and K.-I.K.; methodology, S.-L.Y.; software, S.-L.Y.; validation, S.-L.Y., C.-Y.J. and K.-I.K.; formal analysis, S.-L.Y. and C.-Y.J.; investigation, S.-L.Y. and C.-Y.J.; resources, K.-I.K.; data curation, C.-Y.J. and S.-L.Y.; writing—original draft preparation, S.-L.Y.; writing—review and editing, C.-Y.J. and K.-I.K.; visualization, S.-L.Y.; supervision, K.-I.K.; project administration, S.-L.Y. and K.-I.K.; funding acquisition, K.-I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Research Grant of Jeonju University in 2025. This work was also supported by two projects funded by the Korea Coast Guard: “Development of voice recognition technology for ship wireless communication equipment to respond to illegal foreign ship” (RS-2025-02219912) and “Next-Generation Digital VTS International Standard Service and Equipment Development” (RS-2025-14322995).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study uses audio recordings, ASR transcripts, and derived analysis datasets. Aggregated analysis tables and the analysis scripts/model configurations are available from the corresponding author upon reasonable request. Access to the underlying materials may be subject to restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ducruet, C. The geography of maritime networks: A critical review. J. Transp. Geogr. 2020, 88, 102824.
2. Sharma, A.; Kim, T.E. Exploring technical and non-technical competencies of navigators for autonomous shipping. Mar. Policy Manage. 2022, 49, 831–849.
3. Hetherington, C.; Flin, R.; Mearns, K. Safety in shipping: The human element. J. Saf. Res. 2006, 37, 401–411.
4. Wróbel, K. Searching for the origins of the myth: 80% human error impact on maritime safety. Reliab. Eng. Syst. Saf. 2021, 216, 107942.
5. Sánchez-Beaskoetxea, J.; Basterretxea-Iribar, I.; Sotés, I.; Maruri Machado, M.M. Human error in marine accidents: Is the crew normally to blame? Mar. Transp. Res. 2021, 2, 100016.
6. Hughes, T. When is a VTS not a VTS? J. Navig. 2009, 62, 439–442.
7. Moreno, F.C.; Roca Gonzalez, J.; Suardíaz Muro, J.; García Maza, J.A. Relationship between human factors and a safe performance of vessel traffic service operators: A systematic qualitative-based review in maritime safety. Saf. Sci. 2022, 155, 105892.
8. Praetorius, G.; Hollnagel, E.; Dahlman, J. Modelling Vessel Traffic Service to understand resilience in everyday operations. Reliab. Eng. Syst. Saf. 2015, 141, 10–21.
9. Yang, D.; Wu, L.; Wang, S.; Jia, H.; Li, K.X. How big data enriches maritime research—A critical review of Automatic Identification System (AIS) data applications. Transp. Rev. 2019, 39, 755–773.
10. Filipiak, D.; Węcel, K.; Stróżyna, M.; Michalak, M.; Abramowicz, W. Extracting maritime traffic networks from AIS data using evolutionary algorithm. Bus. Inf. Syst. Eng. 2020, 62, 435–450.
11. Yan, Z.; Xiao, Y.; Cheng, L.; He, R.; Ruan, X.; Zhou, X.; Li, M.; Bin, R. Exploring AIS data for intelligent maritime routes extraction. Appl. Ocean Res. 2020, 101, 102271.
12. Forti, N.; Millefiori, L.M.; Braca, P. Unsupervised extraction of maritime patterns of life from Automatic Identification System data. In Proceedings of the OCEANS 2019—Marseille, Marseille, France, 17–20 June 2019; pp. 1–5.
13. Tsou, M.C.; Hsueh, C.K. The study of ship collision avoidance route planning by ant colony algorithm. J. Mar. Sci. Technol. 2010, 18, 746–756.
14. Liu, Z.; Liu, J.; Zhou, F.; Liu, R.W.; Xiong, N. A robust GA/PSO-hybrid algorithm in intelligent shipping route planning systems for maritime traffic networks. J. Internet Technol. 2018, 19, 1635–1644.
15. Lamm, A.; Hahn, A. Detecting maneuvers in maritime observation data with CUSUM. In Proceedings of the 2017 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Bilbao, Spain, 18 December 2017; pp. 122–127.
16. Zhang, S.K.; Shi, G.Y.; Liu, Z.J.; Zhao, Z.W.; Wu, Z.L. Data-driven based automatic maritime routing from massive AIS trajectories in the face of disparity. Ocean Eng. 2018, 155, 240–250.
17. International Maritime Organization. IMO Model Course 3.17. In Maritime English; International Maritime Organization: London, UK, 2010.
18. Wang, Y.; Zhang, J.; Liu, R.W. Deep learning for automatic speech recognition in maritime VHF communication. J. Mar. Sci. Eng. 2021, 9, 411.
19. Chen, P.; Huang, Y.; Mou, J.; van Gelder, P.H.A.J.M. Ship collision candidate detection method: A velocity obstacle approach. Ocean Eng. 2018, 170, 186–198.
20. Kim, J.S. A basic study on the VTS operator’s minimum safe distance. J. Korean Soc. Mar. Environ. Saf. 2013, 19, 476–482.
21. Park, S.W.; Park, Y.S. A Basic Study on Development of VTS Control Guideline based on ship’s operator’s Consciousness. J. Navig. Port Res. 2016, 40, 105–111.
22. International Maritime Organization. Convention on the International Regulations for Preventing Collisions at Sea (COLREG), 1972; IMO: London, UK, 1972.
23. Amodei, D.; Ananthanarayanan, S.; Anubhai, R.; Bai, J.; Battenberg, E.; Case, C.; Zhu, Z. Deep speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 173–182.
24. Baevski, A.; Zhou, Y.; Mohamed, A.; Auli, M. wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 2020, 33, 12449–12460.
25. Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust speech recognition via large-scale weak supervision. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 28492–28518.
Figure 1. Geographic distribution of VTS study sites in the Republic of Korea.
Figure 2. Keyword-driven pipeline for situation classification (head-on, crossing, overtaking) from VHF radio transcripts.
Figure 3. Distribution of utterance duration in transcribed VTS radiotelephony across the five operational areas. Bars show 1 s bins, and numbers indicate bin counts.
Figure 4. Bar graph and error bars of advisory distance for head-on and crossing situations (stand-on and give-way) by VTS area.
Figure 5. Bar graph and error bars of advisory distance by VTS area, pooling head-on and crossing situations.
Figure 6. Mean advisory distance by encounter type, averaged across the five VTS areas.
Table 1. Sample labeled sentences from VHF radio transcripts and text-to-number conversion by encounter type.
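| Encounter Type | Sample Sentence | Text-to-Number Conversion |
|---|---|---|
| Head-on | Ahead of you two nautical mile. | 2.0 nm |
| Head-on | 전방 이 마일 (Ahead, two miles.) | 2.0 nm |
| Crossing (stand-on) | On your port bow distance three point six mile outbound vessel | 3.6 nm |
| Crossing (stand-on) | 좌현 선수 삼점 육 마일에 출항선이 있습니다. (There is an outbound vessel three point six miles off your port bow.) | 3.6 nm |
| Crossing (give-way) | Inbound vessel hanyu dream your starboard bow two mile pass port to port. | 2.0 nm |
| Crossing (give-way) | 우현 선수 이 마일에 있는 입항선과 좌현대 좌현 하십시오. (Pass port to port with the inbound vessel two miles off your starboard bow.) | 2.0 nm |
| Overtaking | Astern from you, I will overtake on your port side. | n/a |
| Overtaking | 귀선 선미에서 좌현으로 추월하겠습니다. (I will overtake from your stern on your port side.) | n/a |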
Note. Non-English (Korean) terms are retained to preserve the original VHF utterances. Boldface denotes situation-specific keywords.
Table 2. Labeled VHF radiotelephony by VTS area showing total duration and number of sentences.
| VTS Area | Labeling Data Total Duration | Number of VTS Advisory Transmissions |
|---|---|---|
| Busan Port VTS | 19.1 h | 24 |
| Ulsan Port VTS | 91.4 h | 233 |
| Yeosu Port VTS | 37.5 h | 27 |
| Jeju Port VTS | 110.9 h | 146 |
| Jindo Coastal VTS | 75.3 h | 435 |
| Total | 334.2 h | 865 |
Table 3. Labeled sentence counts and average distance (nm) at the time of VTS advisories, by VTS area and encounter type.
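| VTS Area | Encounter Type | Number of VTS Advisory Transmissions | Mean Advisory Distance (nm) |
|---|---|---|---|
| Busan Port VTS | Head-on | 7 | 2.3 |
| Busan Port VTS | Crossing (stand-on vessel) | 13 | 2.8 |
| Busan Port VTS | Crossing (give-way vessel) | 4 | 1.7 |
| Ulsan Port VTS | Head-on | 53 | 3.1 |
| Ulsan Port VTS | Crossing (stand-on vessel) | 79 | 2.7 |
| Ulsan Port VTS | Crossing (give-way vessel) | 101 | 3.0 |
| Yeosu Port VTS | Head-on | 15 | 2.3 |
| Yeosu Port VTS | Crossing (stand-on vessel) | 6 | 3.2 |
| Yeosu Port VTS | Crossing (give-way vessel) | 6 | 2.9 |
| Jeju Port VTS | Head-on | 66 | 2.6 |
| Jeju Port VTS | Crossing (stand-on vessel) | 35 | 2.3 |
| Jeju Port VTS | Crossing (give-way vessel) | 45 | 2.8 |
| Jindo Coastal VTS | Head-on | 269 | 3.3 |
| Jindo Coastal VTS | Crossing (stand-on vessel) | 78 | 2.8 |
| Jindo Coastal VTS | Crossing (give-way vessel) | 88 | 2.9 |
| Total | Head-on | 410 | 3.1 |
| Total | Crossing (stand-on vessel) | 211 | 2.7 |
| Total | Crossing (give-way vessel) | 244 | 2.9 |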
Table 4. Results of Welch’s t-test for advisory distances by encounter type.
| Encounter Type | n | Mean (nm) | SD (nm) | t | df | p |
|---|---|---|---|---|---|---|
| Head-on | 410 | 3.1 | 1.8 | 2.8 | 803.6 | 0.0048 |
| Crossing | 455 | 2.8 | 1.5 | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yoo, S.-L.; Kim, K.-I.; Jung, C.-Y. Speech Recognition-Based Analysis of Vessel Traffic Service (VTS) Communications for Estimating Advisory Timing. Appl. Sci. 2025, 15, 11968. https://doi.org/10.3390/app152211968


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
