1. Introduction
The continuous escalation of global economic uncertainty, coupled with frequent geopolitical conflicts and public events, has intensified volatility in the shipping market, a core pillar of international trade. Fluctuations in freight rates directly affect transportation costs, supply chain stability, and investment decisions across global commodity markets. As a widely recognized barometer of dry bulk shipping activity, the Baltic Dry Index (BDI) reflects the balance between shipping demand and fleet capacity and is therefore closely monitored by market participants, policymakers, and researchers. Improving the understanding and forecasting of BDI dynamics is of substantial practical and academic importance, particularly in periods of heightened uncertainty.
Forecasting the Baltic Dry Index (BDI) has long been challenged by pronounced nonlinear behavior, time-varying dynamics, and strong sensitivity to external shocks. Early studies typically relied on econometric and volatility-oriented frameworks that model BDI dynamics primarily from historical freight rates and related structured market variables, aiming to capture persistence and cyclical behavior in shipping markets (e.g., Chen et al. [1]; Abakah et al. [2]). However, empirical evidence increasingly suggests that such approaches struggle to maintain predictive accuracy when market conditions deviate sharply from historical patterns, particularly under major economic events and policy-induced disruptions [1,3]. This limitation has motivated a growing body of research seeking more adaptive and information-rich forecasting paradigms.
To overcome these limitations, subsequent studies have progressively extended traditional frameworks by enhancing feature representation and information sources. Wu et al. [4] combined signal decomposition with probabilistic modeling to alleviate structural instability and forecast uncertainty in BDI series, while Zhang et al. [5] introduced a dynamic fluctuation network coupled with artificial intelligence techniques to explicitly capture evolving volatility structures, particularly under extreme market conditions. Complementary evidence indicates that incorporating external market information further improves predictive performance. Su et al. [6] demonstrated that commodity futures data significantly enhance BDI forecasting accuracy through a CNN–BiLSTM ensemble, and Bouri et al. [7] showed that low-frequency climate indicators such as ENSO provide incremental predictive power when integrated into mixed-frequency forecasting models.
Methodologically, decomposition-based forecasting frameworks have gained prominence as an effective means of handling multiscale and nonstationary characteristics in freight-related time series. Variational mode decomposition and related techniques have been widely integrated with deep learning models to enhance prediction stability and robustness [8,9]. Extensions of this paradigm to multi-step forecasting further demonstrate that decomposition-assisted deep learning frameworks outperform conventional approaches when prediction horizons are extended, with similar long-horizon forecasting architectures also reported in other domains [10,11,12]. Similar conclusions have been reported in other high-volatility domains, where adaptive decomposition and denoising strategies improve forecasting reliability under noisy conditions [13]. These studies collectively indicate that decomposition is not merely a preprocessing step but a key component for constructing multiscale representations suitable for long-horizon prediction.
In parallel, recent research has explored the intrinsic complexity and predictability limits of time series to support complexity-aware modeling strategies. Related evidence from financial markets further suggests that explicitly modeling dynamic dependence structures through network-based and copula-based frameworks can substantially improve volatility forecasting performance in complex systems [14]. Entropy-based analyses have been used to quantify the upper bounds of short-term predictability in complex systems [15], while hybrid frameworks combining optimized decomposition with deep learning and uncertainty quantification demonstrate improved robustness for highly complex sequences [16]. From a broader perspective, comprehensive reviews of shipping economic forecasting highlight a clear methodological evolution toward hybrid, AI-driven, and information-enriched models, while also noting the continued dominance of structured historical variables in most existing frameworks [17].
Beyond structured quantitative data, the price dynamics and forecasting literature provides compelling evidence that incorporating unstructured textual information into predictive models yields forward-looking insights unavailable from historical prices alone [18]. In particular, market sentiment extracted from text has been shown to contain predictive information for asset prices and trading behavior [19], motivating the construction of sentiment indices as forecasting inputs. Recent studies also emphasize the value of integrating sentiment indicators, non-traditional signals, and behavioral factors with machine learning to improve predictive accuracy in financial markets [20]. Subsequent research has shown that sentiment effects are inherently dynamic and scale-dependent, with time–frequency analyses revealing heterogeneous impacts across different horizons [21,22]. Advances in modeling frameworks further integrate sentiment into ensemble learning and data-driven prediction systems, improving both accuracy and interpretability under volatile market conditions [23,24]. More recently, domain-adapted language models, such as fine-tuned BERT variants, have been shown to substantially enhance the quality and predictive value of sentiment indices compared with traditional lexicon-based approaches [25].
Building on these methodological advances, sentiment analysis has been increasingly applied to shipping and freight markets. Bai et al. [26] constructed a shipping sentiment index from news coverage and provided early empirical evidence of its predictive relevance for dry bulk freight rates. This line of research was extended by Sui et al. [27], who employed market-specific language models and demonstrated that dry bulk market sentiment Granger-causes the BDI, outperforming conventional sentiment measures. Further studies revealed nonlinear and state-dependent relationships between shipping sentiment and freight rates, particularly in iron ore transportation markets [28]. In related work, textual sentiment has also been integrated with system dynamics and supply chain models to enhance freight rate forecasting under disruption scenarios [29].
Taken together, existing studies demonstrate substantial progress in modeling the nonlinear and multiscale characteristics of the Baltic Dry Index and in incorporating sentiment-based information into freight market forecasting. However, most existing approaches remain predominantly centered on historical freight time-series data and structured explanatory variables. While effective in capturing past market dynamics, such frameworks often exhibit lagged responses to rapidly evolving market conditions and provide limited insight into shifts in market sentiment. Moreover, sentiment information is typically introduced in a homogeneous manner, with insufficient consideration of how its predictive role may vary across different levels of time-series complexity or forecasting horizons. These limitations underscore the need for more sensitive, forward-looking, and complexity-aware analytical frameworks for BDI prediction.
In response to these limitations, this study integrates sentiment data with multiscale decomposition and complexity-aware modeling to develop a sentiment-based forecasting framework for the BDI. Specifically, building on the existing literature, it constructs a shipping market sentiment index from news data and incorporates a sentiment gating mechanism that explicitly accounts for heterogeneity in time-series complexity. This design aims to enhance the sensitivity and robustness of multi-step BDI predictions under volatile market conditions.
2. Methods
2.1. Research Approach
This study aims to improve the prediction accuracy of shipping freight rate indices by combining the complementary information in structured freight rate data and unstructured textual data within a single technical framework. The research proceeds in two phases: (1) data preprocessing and feature engineering, which covers the construction of the sentiment index, the decomposition and reconstruction of the BDI series, and complexity assessment; and (2) prediction model development, which involves building separate models for low- and high-complexity components, integrating sentiment features via a gating mechanism, and validating practical predictive performance through performance comparison and case analysis.
Figure 1 illustrates the technical framework and complete implementation process of this study.
2.2. Data Collection and Preprocessing
This study constructs and analyzes two datasets: historical observations of the BDI and a corpus of shipping-industry-related news headlines.
The BDI serves as a benchmark indicator of conditions in the global dry bulk shipping market. The index is published by the Baltic Exchange in London and is released on all working days throughout the year. Movements in the index encapsulate changes in freight rates and transportation demand for key bulk commodities, including iron ore, coal, and grain, and thus provide a timely reflection of shipping market cycles, supply chain dynamics, and underlying macroeconomic activity. Given its established role in both academic research and industry practice, the BDI is employed as a proxy for fundamental market conditions in the shipping sector. The sample covers the period from November 2019 to January 2025 and consists of 1888 daily observations, capturing multiple phases of market expansion and contraction. Due to the inherent gaps in the BDI release schedule, linear interpolation is applied to fill in the missing values, ensuring a continuous and coherent time series for analysis.
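The gap-filling step can be illustrated with a minimal Python sketch. It assumes that non-publication days are marked as `None` in a daily list and that gaps at the very start or end of the sample, which have only one observed neighbour, are filled by holding the nearest value; the function name `interpolate_linear` and the index values are hypothetical and are not taken from the study's data.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    # Assumption: edge gaps have a single neighbour, so hold that value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: move along the straight line joining both neighbours.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical index values with a two-day gap (e.g., a weekend).
bdi = [1500.0, None, None, 1530.0, 1520.0]
filled_bdi = interpolate_linear(bdi)
```

Linear interpolation keeps the series continuous without introducing new extremes, but it also smooths the filled days, which is worth keeping in mind when volatility is estimated from the completed series.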
Shipping sentiment is quantified using news headline data drawn from industry-specific media sources. News headlines are concise linguistic representations of underlying events and typically exhibit a clear evaluative orientation. Relative to full-text articles, headlines more directly convey sentiment while avoiding the semantic ambiguity and contextual dependency inherent in longer narratives, thereby enhancing their suitability for natural language processing applications.
Headline data are collected from several authoritative and highly specialized shipping information platforms, including China Shipping Network, China Water Transport Network, Port Information Network, and Logistics Baba. The news corpus is classified into three thematic categories: maritime transportation, port operations, and logistics systems. A multi-stage screening procedure is implemented, combining keyword-based semantic filtering with manual verification, to exclude non-informative content such as personal interviews and promotional materials. Following data cleaning and preprocessing, the final dataset comprises 53,650 news headlines published between 4 November 2019 and 3 January 2025, providing a robust textual foundation for the construction of the cumulative shipping sentiment index.
2.3. Construction of the Cumulative Shipping Sentiment Index
2.3.1. Sentiment Quantification Model
(1) Manual Annotation Procedure
Shipping-related news headlines contain extensive industry-specific terminology and nuanced semantics, rendering generic sentiment analysis tools inadequate for extracting market-relevant signals. Accordingly, a three-category sentiment scheme (positive, negative, and neutral) is constructed, where positive sentiment reflects favorable market information (e.g., freight rate increases or supportive policies), negative sentiment captures adverse shocks (e.g., freight rate declines or supply chain disruptions), and neutral sentiment corresponds to informational content without a clear emotional orientation. Manual labeling is performed by three annotators with expertise in shipping economics using a two-stage cross-labeling and adjudication protocol, preceded by annotator calibration. Two annotators independently assign labels, with disagreements resolved by a third reviewer. Inter-annotator reliability, assessed using Cohen’s Kappa coefficient, equals 0.84 in a pilot sample and remains above 0.80 for the full dataset, indicating robust labeling consistency.
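For reference, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance from their individual label frequencies. The sketch below is a minimal illustration; the function name `cohens_kappa` and the six example labels are hypothetical and do not reproduce the study's annotation data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on six headlines.
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neg", "neu", "neg", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on five of six items, yet the coefficient is noticeably lower than the raw agreement rate because part of that agreement is attributable to chance.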
(2) Sentiment Quantification Based on a RoBERTa Pretrained Model
The sentiment quantification model is constructed using manually annotated shipping news headlines and a RoBERTa pretrained language model, whose core architecture is based on a Transformer encoder composed of stacked self-attention and feed-forward layers. Given an input sequence $X \in \mathbb{R}^{n \times d}$, where $n$ denotes the sequence length and $d$ the embedding dimension, linear projections are applied to generate the query, key, and value matrices $Q$, $K$, and $V$. Self-attention weights are computed via scaled dot-product attention, as defined in Equation (1), where $W^Q$, $W^K$, and $W^V \in \mathbb{R}^{d \times d_k}$ are learnable parameter matrices and $d_k$ denotes the feature dimension.
$$Q = XW^Q,\quad K = XW^K,\quad V = XW^V,\qquad \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{1}$$
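To capture semantic information from multiple representation subspaces, multi-head attention is employed by concatenating the outputs of parallel attention heads and applying a linear transformation, as specified in Equations (2) and (3), where $h$ is the number of heads, $W_i^Q$, $W_i^K$, and $W_i^V$ are the projection matrices of head $i$, and $W^O$ denotes the output projection matrix.
$$\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O \tag{2}$$
$$\mathrm{head}_i = \mathrm{Attention}\!\left(XW_i^Q,\; XW_i^K,\; XW_i^V\right) \tag{3}$$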
The attention outputs are subsequently processed by position-wise feed-forward networks with GELU activation, combined with residual connections and layer normalization, forming a deep contextual representation framework.
During pretraining, contextual semantic features are learned through a masked language modeling objective, formalized in Equation (4), where $M$ denotes the set of masked positions, $x_i$ the corresponding ground-truth tokens, and $\tilde{X}$ the masked input sequence.
$$\mathcal{L}_{\mathrm{MLM}} = -\sum_{i \in M} \log P\!\left(x_i \mid \tilde{X}\right) \tag{4}$$
For the downstream shipping news sentiment classification task, the RoBERTa model is fine-tuned on the labeled dataset. The data are stratified and split into training, validation, and test sets following an 80–10–10 ratio. Headlines are tokenized using the RoBERTa tokenizer with a maximum sequence length of 128; sequences exceeding this length are truncated, while shorter sequences are padded with [PAD] tokens to ensure batch-wise input consistency. On top of the pretrained encoder, two fully connected layers are added, together with a Dropout layer to mitigate overfitting. Sentiment labels are discretized into negative, neutral, and positive classes, and supervised learning is conducted using a cross-entropy loss function. Model training is implemented in a GPU environment using the roberta-base-chinese pretrained model from the Hugging Face Transformers library, optimized with the AdamW optimizer and augmented by early stopping and learning-rate scheduling strategies. In the inference stage, predicted class probabilities are transformed into a continuous, bounded sentiment polarity score using a linear weighting scheme, as defined in Equation (5).
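The conversion from class probabilities to a continuous score can be sketched as follows. The class weights of -1, 0, and +1, which place the score in [-1, 1], are an assumption made for illustration and are not confirmed as the weights in Equation (5); the function name `polarity_score` and the example probabilities are likewise hypothetical.

```python
from typing import Tuple


def polarity_score(
    p_neg: float,
    p_neu: float,
    p_pos: float,
    weights: Tuple[float, float, float] = (-1.0, 0.0, 1.0),  # assumed weights
) -> float:
    """Linearly weight class probabilities into one polarity score."""
    if abs(p_neg + p_neu + p_pos - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    w_neg, w_neu, w_pos = weights
    return w_neg * p_neg + w_neu * p_neu + w_pos * p_pos


# Hypothetical softmax output for one headline.
score = polarity_score(p_neg=0.1, p_neu=0.2, p_pos=0.7)
```

A linear weighting of this kind preserves the classifier's uncertainty: a headline predicted positive with low confidence contributes a smaller score than one predicted positive with high confidence.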
(3) Performance Comparison and Evaluation of Sentiment Quantification Models
Considering the characteristics of shipping news, where positive sentiment headlines are more frequent and neutral headlines are relatively sparse, the dataset exhibits class imbalance. To address this, Weighted-F1 is primarily used as the evaluation metric, as it accounts for class imbalance by weighting each class’s performance according to its frequency. Macro-F1 and Accuracy are also reported to provide a comprehensive evaluation of the model’s performance.
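The difference between the two F1 aggregates can be made concrete with a small sketch. Macro-F1 averages per-class F1 scores equally, whereas Weighted-F1 weights each class by its number of true instances. The function name `f1_summary` and the ten example labels are hypothetical and only mimic an imbalanced three-class setting.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def f1_summary(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[Dict[str, float], float, float]:
    """Return per-class F1, Macro-F1, and support-weighted F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted


# Hypothetical imbalanced sample: 6 positive, 3 negative, 1 neutral.
y_true = ["pos"] * 6 + ["neg"] * 3 + ["neu"]
y_pred = ["pos"] * 5 + ["neg"] + ["neg"] * 2 + ["pos"] + ["pos"]
per_class, macro_f1, weighted_f1 = f1_summary(y_true, y_pred)
```

Here the single misclassified neutral headline pulls Macro-F1 down sharply while Weighted-F1 changes little, which is why reporting both gives a fuller picture under class imbalance.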
A set of benchmark models is employed for comparative analysis, encompassing both traditional text modeling approaches that rely on static word embeddings or local sequence features and pretrained BERT-type models that learn dynamic contextual representations from large-scale corpora. All models are trained and evaluated under identical experimental settings, including the same text preprocessing pipeline, training data, and hyperparameter configurations, to ensure result comparability.
As reported in Table 1, the Word2Vec-MLP and FastText models exhibit comparable but relatively limited performance, reflecting the inability of static word embeddings to capture contextual semantics and domain-specific polysemy prevalent in shipping news. Models incorporating sequential modeling mechanisms, such as TextCNN and BiLSTM, achieve notable performance improvements; however, TextCNN is constrained to local feature extraction, while BiLSTM remains limited in modeling long-range dependencies. Consequently, both models underperform relative to pretrained language models. Leveraging bidirectional Transformer-based contextual encoding, the BERT model delivers substantial gains in both classification accuracy and overall fit. RoBERTa further strengthens semantic representation learning and consistently outperforms all benchmark models across evaluation metrics, confirming its superiority for sentiment classification in shipping-related news. Notably, RoBERTa effectively addresses class imbalance while preserving strong overall classification performance.
2.3.2. Cumulative Sentiment Index
Market sentiment in the shipping industry does not arise from isolated news items but reflects the cumulative influence of multiple events over time. To capture this dynamic process, we construct a daily Cumulative Sentiment Index (CSI) that aggregates previously released news while allowing their impact to decay gradually.
Each news headline is evaluated using the sentiment classification model described in Section 2.3.1, which generates an individual sentiment score $s_i$. The CSI aggregates these scores across time using an event-smoothing structure. Intuitively, newly released information exerts a stronger influence on market sentiment, and its effect weakens as market attention shifts to subsequent developments.
Formally, the daily value of the cumulative sentiment index, $\mathrm{CSI}_t$, is defined in Equation (6), where the summation is taken over all news items $i$ whose publication date falls within the influence window, $t$ denotes the target date, $s_i$ is the model-derived sentiment score of news item $i$, $t_i$ is its publication date, $T$ determines the length of the post-publication influence window, and $\lambda$ governs the rate at which sentiment impact decays over time. In this study, $T$ and $\lambda$ are held at fixed values when constructing the CSI. The sensitivity of forecasting performance to these parameter choices is examined in the subsequent analysis.
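The aggregation logic can be sketched in a few lines of Python. The exponential form of the decay, the inclusive window condition, and the parameter values used below (a 7-day window and a decay rate of 0.5) are assumptions made purely for illustration; they are not the study's specification, and the function name `cumulative_sentiment` and the example headlines are hypothetical.

```python
import math
from datetime import date
from typing import Iterable, Tuple


def cumulative_sentiment(
    news: Iterable[Tuple[date, float]],
    target: date,
    window_days: int,
    decay: float,
) -> float:
    """Sum decayed sentiment scores of news published within the window."""
    total = 0.0
    for published, score in news:
        age = (target - published).days
        # Only items already published and still inside the window count.
        if 0 <= age <= window_days:
            total += score * math.exp(-decay * age)  # assumed decay form
    return total


# Hypothetical (publication date, sentiment score) pairs.
news_items = [
    (date(2023, 10, 19), -0.8),  # published on the target date
    (date(2023, 10, 17), 0.5),   # two days old, partially decayed
    (date(2023, 10, 1), 0.9),    # outside the window, ignored
    (date(2023, 10, 20), 0.7),   # not yet published, ignored
]
csi_value = cumulative_sentiment(
    news_items, date(2023, 10, 19), window_days=7, decay=0.5
)
```

Restricting the sum to items with a non-negative age ensures that the index for a given day never uses headlines published after that day, which matters when the index is later used as a forecasting input.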
The resulting shipping news sentiment index is illustrated in Figure 2. Key event windows are highlighted, revealing pronounced sentiment fluctuations around major shocks. The index exhibits a sharp decline during the outbreak of the COVID-19 pandemic, a short-term surge surrounding the Suez Canal blockage, sustained volatility during the Red Sea crisis, and a gradual recovery during periods of policy adjustment. These patterns are consistent with the documented market impacts of these events, thereby supporting the index’s ability to sensitively capture dynamics in shipping market sentiment.
2.4. Data Decomposition and Reconstruction
To isolate the intrinsic fluctuation structure of the BDI time series, a decomposition–reconstruction framework based on variational mode decomposition (VMD) and fuzzy C-means (FCM) clustering is implemented. VMD is formulated as a constrained variational optimization problem in which the observed signal $f(t)$ is decomposed into $K$ band-limited modal components $u_k(t)$ with associated center frequencies $\omega_k$, satisfying the reconstruction constraint:
$$f(t) = \sum_{k=1}^{K} u_k(t) + r(t) \tag{7}$$
where $r(t)$ denotes the residual component. The decomposition minimizes the aggregate bandwidth of all modes, defined as:
$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \tag{8}$$
The constrained problem is solved via an augmented Lagrangian formulation using the alternating direction method of multipliers. Modal components and center frequencies are updated iteratively as specified in Equations (9) and (10) until the convergence criterion in Equation (11) is satisfied.
Following VMD, the resulting modes are aggregated using fuzzy C-means clustering. A feature set $X = \{x_1, \ldots, x_K\}$ is constructed based on each mode’s center frequency, mean amplitude, and fluctuation intensity. The two key parameters, the number of clusters $C$ and the fuzziness coefficient $m$, are predefined according to standard clustering criteria. Cluster memberships and centers are obtained by minimizing the objective function:
$$J_m = \sum_{i=1}^{K} \sum_{j=1}^{C} \mu_{ij}^{m}\, \lVert x_i - c_j \rVert^2 \tag{12}$$
where $\mu_{ij}$ is the membership of mode $i$ in cluster $j$, with $\sum_{j=1}^{C} \mu_{ij} = 1$, and $c_j$ is the center of cluster $j$.
The iteration terminates when the error falls below $10^{-5}$, after which modes assigned to the same cluster are superposed to obtain reconstructed modal components, yielding a parsimonious representation of the underlying dynamics.
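The clustering step can be illustrated with a compact fuzzy C-means sketch that alternates the standard membership and center updates. Several choices here are assumptions for illustration: the stopping error is measured as the largest center shift between iterations, the initial centers are supplied by the caller, and the features are left unscaled. The function name `fuzzy_c_means` and the four feature vectors (center frequency, mean amplitude, fluctuation intensity) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fuzzy_c_means(
    points: Sequence[Sequence[float]],
    init_centers: Sequence[Sequence[float]],
    m: float = 2.0,
    tol: float = 1e-5,
    max_iter: int = 100,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and center updates until centers stabilise."""
    centers = [list(c) for c in init_centers]
    power = 2.0 / (m - 1.0)
    memberships: List[List[float]] = []
    for _ in range(max_iter):
        memberships = []
        for x in points:
            d = [math.dist(x, c) for c in centers]
            zeros = [j for j, dj in enumerate(d) if dj == 0.0]
            if zeros:  # point sits exactly on a center
                row = [1.0 / len(zeros) if j in zeros else 0.0
                       for j in range(len(centers))]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
            memberships.append(row)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in memberships]
            total = sum(w)
            new_centers.append([
                sum(wi * x[k] for wi, x in zip(w, points)) / total
                for k in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # assumed error measure: largest center shift
            break
    return memberships, centers


# Hypothetical mode features: (center frequency, mean amplitude, intensity).
modes = [(0.01, 120.0, 15.0), (0.02, 110.0, 18.0),
         (0.30, 12.0, 40.0), (0.35, 10.0, 45.0)]
memberships, centers = fuzzy_c_means(modes, [modes[0], modes[2]])
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Modes sharing a label would then be summed to form one reconstructed component. In practice the features would normally be standardised first, since otherwise the dimension with the largest numerical range dominates the distances.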
2.5. Sequence Complexity Measurement
Following decomposition, the freight index series is represented by reconstructed modal components with heterogeneous dynamic characteristics, ranging from highly regular and quasi-deterministic patterns to irregular and strongly stochastic fluctuations. Applying a uniform forecasting strategy across all components may therefore be suboptimal. To quantify this heterogeneity, sample entropy (SampEn) is employed to measure the complexity of each reconstructed modal component and to support complexity-adaptive modeling.
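For a time series $\{x(1), x(2), \ldots, x(N)\}$, vectors of embedding dimension $m$ are constructed as:
$$X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)], \quad i = 1, \ldots, N-m \tag{13}$$
and the distance between vectors $X_m(i)$ and $X_m(j)$ is evaluated. Vector pairs with distances smaller than a tolerance $r$ are regarded as similar, and the proportion of similar pairs is defined as:
$$B_i^m(r) = \frac{1}{N-m-1} \sum_{j=1,\, j \neq i}^{N-m} \mathbb{1}\!\left(d\left[X_m(i), X_m(j)\right] < r\right) \tag{14}$$
where $\mathbb{1}(\cdot)$ denotes the indicator function and $d[\cdot,\cdot]$ is the distance metric. Averaging over all vectors yields:
$$B^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} B_i^m(r) \tag{15}$$
Increasing the embedding dimension to $m+1$ and repeating the procedure gives $B^{m+1}(r)$, and sample entropy is defined as:
$$\mathrm{SampEn}(m, r, N) = -\ln \frac{B^{m+1}(r)}{B^m(r)} \tag{16}$$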
Lower SampEn values indicate stronger regularity and lower forecasting difficulty, whereas higher values reflect greater complexity and randomness.
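A minimal sketch of the SampEn computation is given below. It assumes the Chebyshev (maximum-coordinate) distance, a tolerance equal to 0.2 times the population standard deviation, and an embedding dimension of 2, which are common conventions but are not confirmed as the settings of this study. The function name `sample_entropy` and both test series are hypothetical.

```python
import math
import statistics
from typing import Sequence


def sample_entropy(series: Sequence[float], m: int = 2,
                   r_factor: float = 0.2) -> float:
    """Sample entropy with Chebyshev distance and tolerance r_factor * std."""
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the embedding dimension")
    r = r_factor * statistics.pstdev(series)

    def similar_pairs(dim: int) -> int:
        # Use N - m templates for both dimensions so counts are comparable.
        templates = [series[i:i + dim] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches excluded
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = similar_pairs(m)
    a = similar_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matches: entropy is undefined, reported as inf
    return math.log(b / a)  # equals -ln(a / b)


# A perfectly regular alternating series.
regular = [0.0, 1.0] * 30

# A chaotic series from the logistic map x -> 4x(1 - x).
chaotic, x = [], 0.3
for _ in range(200):
    chaotic.append(x)
    x = 4.0 * x * (1.0 - x)

se_regular = sample_entropy(regular)
se_chaotic = sample_entropy(chaotic)
```

The alternating series yields zero entropy because every pair of templates that matches at length m also matches at length m + 1, whereas the chaotic series loses matches when the template is extended and therefore receives a strictly positive value.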
To classify reconstructed modal components by complexity, a relative threshold $\theta$ is introduced to avoid scale dependence. The threshold is defined in terms of $\mathrm{SampEn}_{\max}$ and $\mathrm{SampEn}_{\min}$, which denote the maximum and minimum SampEn values across all reconstructed modal components. Components with SampEn exceeding $\theta$ are classified as high-complexity, and the remainder as low-complexity, providing a parsimonious and comparable basis for subsequent differentiated modeling.
2.6. Dual-Sentiment Gated Forecasting Model
A dual-sentiment gated forecasting model is constructed within a unified multi-step framework to evaluate the predictive contribution of sentiment information. Multi-step forecasting is implemented using an encoder–decoder architecture to generate $H$-step-ahead predictions.
The model takes two types of inputs: the low- and high-complexity component sequences and the cumulative shipping sentiment index (CSI). The two component groups are processed through separate encoding channels to capture heterogeneous dynamic patterns. A sentiment gating mechanism is introduced to modulate the encoded representations. The gating signal is constructed from a sentiment window of length $L$, $\{\mathrm{CSI}_{t-L+1}, \ldots, \mathrm{CSI}_t\}$, which contains only observations up to the forecasting origin $t$. It is transformed into a weight vector in [0, 1] and applied to the encoded representations prior to decoding.
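The gating operation can be sketched without any deep learning library. Following the architecture description, the sentiment window is averaged, passed through a fully connected layer with a sigmoid activation, and multiplied element-wise with the encoded representation. The weights, biases, window values, and the two-dimensional encoding below are hypothetical placeholders rather than trained parameters, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def sentiment_gate(csi_window: Sequence[float], weights: Sequence[float],
                   biases: Sequence[float]) -> List[float]:
    """Average-pool the CSI window, then map it to sigmoid gate values."""
    pooled = sum(csi_window) / len(csi_window)
    return [1.0 / (1.0 + math.exp(-(w * pooled + b)))
            for w, b in zip(weights, biases)]


def apply_gate(encoded: Sequence[float], gate: Sequence[float]) -> List[float]:
    """Scale each encoded feature by its gate value in (0, 1)."""
    return [h * g for h, g in zip(encoded, gate)]


# Hypothetical sentiment window ending at the forecasting origin.
window = [-0.4, -0.6, -0.8]
gate = sentiment_gate(window, weights=[2.0, -2.0], biases=[0.0, 0.0])
gated = apply_gate([2.0, -4.0], gate)

# With zero weights and biases every gate equals 0.5 (a neutral gate).
neutral = sentiment_gate(window, weights=[0.0, 0.0], biases=[0.0, 0.0])
```

Because each gate value lies strictly between 0 and 1, the mechanism can attenuate or largely pass through individual encoded features depending on the prevailing sentiment, but it cannot flip their sign.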
Given the pronounced non-stationarity and heavy-tailed characteristics of the BDI series, the model is trained to predict the first-order difference in BDI rather than its level. The model forecasts the first-order differences over the next $H$ horizons. Let $y_t$ denote the observed level at the forecasting origin $t$. Future BDI levels are reconstructed by cumulatively summing the predicted differences and adding them to this observed initial value. All performance evaluations are conducted on the reconstructed BDI levels. Multi-step forecasting errors are evaluated in a horizon-wise manner, with the evaluation metrics, including RMSE and MAE, computed separately for each forecasting horizon, and their arithmetic mean across horizons reported as the final performance metric.
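Level reconstruction and horizon-wise scoring can be sketched as follows. The function names `reconstruct_levels` and `horizonwise_rmse` and all numerical values are hypothetical; only RMSE is shown, and other metrics would be averaged across horizons in the same way.

```python
import math
from itertools import accumulate
from typing import List, Sequence, Tuple


def reconstruct_levels(last_level: float,
                       predicted_diffs: Sequence[float]) -> List[float]:
    """Add the running sum of predicted differences to the last level."""
    return [last_level + c for c in accumulate(predicted_diffs)]


def horizonwise_rmse(
    actual: Sequence[Sequence[float]],
    forecast: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """RMSE per horizon across origins, plus the mean over horizons."""
    horizons = len(actual[0])
    per_horizon = []
    for h in range(horizons):
        sq_errors = [(a[h] - f[h]) ** 2 for a, f in zip(actual, forecast)]
        per_horizon.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return per_horizon, sum(per_horizon) / horizons


# Hypothetical observed level at the origin and three predicted differences.
levels = reconstruct_levels(1500.0, [10.0, -5.0, 20.0])

# Hypothetical two-step paths for two forecasting origins.
rmse_by_h, mean_rmse = horizonwise_rmse(
    actual=[[1.0, 2.0], [3.0, 4.0]],
    forecast=[[1.0, 4.0], [3.0, 2.0]],
)
```

Because the levels are built by cumulative summation, an error in an early predicted difference carries forward to every later horizon, which is one reason errors are reported per horizon rather than pooled.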
The data are partitioned chronologically into a training-and-tuning sample (the first 80% of observations) and an independent test set (the remaining 20%). Model training and hyperparameter tuning are conducted exclusively within the first 80% under an expanding-window walk-forward scheme. In each rolling iteration, the training set consists of all historical observations available up to the current forecasting origin, while a validation subset extracted from the tail of the training window is used for early stopping. The immediately following forecasting window is used for internal performance comparison. After training is completed, the model is fitted using the entire 80% training sample and evaluated on the independent test set. In the final test period, forecasts are generated using a rolling-origin scheme with a one-step shift, where each day serves as a forecasting origin and an $H$-step-ahead sequence is produced.
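The splitting scheme can be sketched as an index generator. The initial training length of 1000 observations, the validation tail of 100 observations, and the 14-day step between origins are assumptions chosen only to make the example concrete; the function name `expanding_window_splits` is hypothetical.

```python
from typing import Iterator, Tuple


def expanding_window_splits(
    n_obs: int, initial_train: int, horizon: int, step: int, val_size: int
) -> Iterator[Tuple[range, range, range]]:
    """Yield (fit, validation, evaluation) index ranges per origin."""
    origin = initial_train
    while origin + horizon <= n_obs:
        fit_idx = range(0, origin - val_size)        # expanding history
        val_idx = range(origin - val_size, origin)   # tail for early stopping
        eval_idx = range(origin, origin + horizon)   # next forecasting window
        yield fit_idx, val_idx, eval_idx
        origin += step


n_total = 1888                  # daily observations reported for the sample
n_dev = int(0.8 * n_total)      # first 80%: training and tuning
test_idx = range(n_dev, n_total)  # remaining 20%: held-out test set

splits = list(expanding_window_splits(
    n_dev, initial_train=1000, horizon=14, step=14, val_size=100
))
first_fit, first_val, first_eval = splits[0]
```

Every range ends before the next one begins, so no observation at or after a forecasting origin is ever used to fit or tune the model that forecasts from that origin, and none of the walk-forward windows reach into the test indices.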
The architecture, illustrated in Figure 3, implements the empirical configuration adopted in this study. The final model employs a BiGRU encoder for the low-complexity components and a BiLSTM encoder for the high-complexity components, each with 64 hidden units. The historical input window length and the forecasting horizon are both set to 14 time steps. The CSI window is first aggregated via global average pooling and then mapped to a 64-dimensional sigmoid-activated gating vector through a fully connected layer. The resulting weights are applied element-wise to the encoded representations. The decoder consists of a single-layer GRU with 128 hidden units, followed by a TimeDistributed dense layer that produces the 14-step output sequence. Model training employs a weighted mean squared error loss with exponentially decaying horizon weights and is optimized using the Adam algorithm (learning rate = 0.001, batch size = 16). Training runs for up to 50 epochs, with early stopping and ReduceLROnPlateau scheduling applied to enhance convergence and generalization.
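The horizon-weighted loss can be illustrated with a short sketch. The geometric decay factor of 0.9 and the normalisation of the weights to sum to one are assumptions, since the decay rate is not reported here; the function names `horizon_weights` and `weighted_mse` are hypothetical.

```python
from typing import List, Sequence


def horizon_weights(horizon: int, decay: float) -> List[float]:
    """Exponentially decaying weights over horizons, normalised to sum to 1."""
    raw = [decay ** h for h in range(horizon)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mse(actual: Sequence[float], predicted: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Squared errors per horizon, combined with the horizon weights."""
    return sum(w * (a - p) ** 2
               for w, a, p in zip(weights, actual, predicted))


# Weights for a 14-step horizon with an assumed decay factor of 0.9.
weights_14 = horizon_weights(14, decay=0.9)

# Small worked case: two horizons, decay 0.5, error only at the second step.
loss = weighted_mse([1.0, 2.0], [1.0, 4.0], horizon_weights(2, decay=0.5))
```

Placing larger weights on near horizons makes training prioritise the steps that are forecast most reliably, while distant horizons still contribute to the gradient.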
4. Case-Based Discussion
To better understand the mechanisms underlying the observed performance gains, it is informative to examine the model’s behavior during an abrupt exogenous shock. The Red Sea crisis in late 2023 was one such shock to the global dry bulk shipping market, marked by severe security risks, forced vessel rerouting around the Cape of Good Hope, and an abrupt contraction in effective transport capacity. Such conditions led to heightened uncertainty and rapidly deteriorating market expectations, creating a setting in which freight rate dynamics departed sharply from historical patterns. During this period, shipping-related news coverage intensified and was accompanied by a pronounced decline in the sentiment index, reflecting a rapid reassessment of market conditions by industry participants.
Starting from the outbreak of the crisis on 19 October 2023, multi-step forecasts generated by the proposed model exhibit close alignment with observed BDI movements, as illustrated in Figure 6. The model successfully captures both the downward trend and the increased volatility observed in the early stage of the disruption. Compared with approaches that rely exclusively on past freight rates, the inclusion of sentiment information improves responsiveness to abrupt market deterioration, suggesting that textual sentiment contains forward-looking signals that precede their full incorporation into price-based indicators.
The empirical behavior observed during the crisis highlights the functional role of the sentiment gating mechanism. Under conditions of structural disruption, historical freight dynamics provide limited guidance due to rapid shifts in supply–demand balance and elevated uncertainty. By dynamically modulating the influence of sentiment signals, the gating mechanism allows negative sentiment to exert stronger influence on high-complexity components, which are more sensitive to irregular shocks and short-term disturbances. This adaptive adjustment mitigates the risk of mechanical trend extrapolation and enhances the model’s ability to accommodate regime changes.
Beyond its methodological implications, the Red Sea crisis also illustrates how sentiment-enhanced forecasting models can inform decision-making under extreme market stress. In this context, the model consistently signaled downward pressure on the BDI, providing early indications of deteriorating market conditions. From a short-term perspective, such signals may assist shipping firms in evaluating tactical responses aimed at mitigating immediate operational risks, including reassessing routing options, strengthening customer communication, and improving cost control through more flexible capacity scheduling and fuel management. From a medium- to long-term perspective, the persistence of negative sentiment and sustained market stress suggests the need for more strategic adjustments if disruptions to key shipping lanes are expected to continue; sentiment-informed forecasts may therefore support decisions related to capacity restructuring, long-term contract negotiation, and strategic repositioning under multiple contingency scenarios. Rather than prescribing specific actions, these results demonstrate how timely anticipation of adverse freight dynamics can enhance both operational preparedness and longer-horizon strategic alignment under abrupt and recurring disruptions.