Article

Temporal Dynamics in Short Text Classification: Enhancing Semantic Understanding Through Time-Aware Model

by Khaled Abdalgader 1,*, Atheer A. Matroud 2 and Ghaleb Al-Doboni 1

1 School of Engineering and Computing, American University of Ras Al Khaimah, Ras Al Khaimah P.O. Box 10021, United Arab Emirates
2 Cyber Security Program, De Montfort University-Dubai, Dubai P.O. Box 294345, United Arab Emirates
* Author to whom correspondence should be addressed.
Information 2025, 16(3), 214; https://doi.org/10.3390/info16030214
Submission received: 21 January 2025 / Revised: 1 March 2025 / Accepted: 4 March 2025 / Published: 10 March 2025
(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)

Abstract

Traditional text classification models predominantly rely on static text representations, failing to capture temporal variations in language usage and evolving semantic meanings. This limitation reduces their ability to accurately classify time-sensitive texts, where understanding context, detecting trends, and addressing semantic shifts over time are critical. This paper introduces a novel time-aware short text classification model that incorporates temporal information, enabling it to track and adapt to evolving language semantics. The proposed model enhances contextual understanding by leveraging timestamps and significantly improves classification accuracy, particularly for time-sensitive applications such as news topic classification. The model employs a hybrid architecture combining Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks, enriched with attention mechanisms to capture both local and global dependencies. To further refine semantic representation and mitigate the effects of semantic drift, the model fine-tunes GloVe embeddings and employs synonym-based data augmentation. The proposed approach is evaluated on three benchmark dynamic datasets, achieving superior performance with classification accuracy reaching 92% for the first two datasets and 85% for the third. Furthermore, the model is applied to a cross-field categorization and trend analysis task, demonstrating its capability to capture temporal patterns and perform detailed trend analysis of domain-agnostic textual content. These results underscore the potential of the proposed framework to provide deeper insights into the evolving nature of language and its impact on short-text classification. This work advances natural language processing by offering a comprehensive time-aware classification framework that addresses the challenges of temporal dynamics in language semantics.

1. Introduction

Short text classification is a critical task in natural language processing (NLP), with applications ranging from sentiment analysis [1,2,3] and news categorization [4,5,6] to decision support systems [7,8]. Traditional short-text classification models, however, rely on static text representations, which ignore the temporal aspect of language evolution [9,10,11,12,13,14]. Over time, words and phrases change in meaning or importance due to cultural, social, and technological shifts. For example, the word “cloud” once predominantly referred to a meteorological phenomenon, but in recent years, it has become more closely associated with cloud computing technology. Ignoring these shifts can lead to inaccurate classifications in time-sensitive applications, such as trend analysis [15,16], public opinion tracking [17], and historical text classification [18]. Time-aware text classification, however, addresses this limitation by incorporating temporal information into the classification process [19,20,21,22,23], enabling models to account for the evolving nature of language [21,24,25,26,27].
Traditional models, such as Naive Bayes [28,29], Support Vector Machines (SVMs) [30,31,32], Convolutional Neural Networks (CNNs) [33,34,35,36], and Recurrent Neural Networks (RNNs) [37,38], have achieved significant success in text classification tasks by focusing on static content [39]. These models typically use static word embeddings like Word2Vec [39] or GloVe [40], which represent words as fixed vectors, regardless of the context or time period in which they appear. While adequate for many NLP tasks, these static embeddings do not capture semantic drift—changes in word meaning over time—or the evolving contexts in which words are used [41,42,43]. This limitation becomes especially apparent in domains where terminology shifts rapidly, such as technology, politics, and economics [44].
To address these limitations, we propose a time-aware sentence (i.e., short-text) classification model that integrates temporal data to evolve with language. By leveraging timestamps and time-aware word embeddings, our model captures the temporal dynamics of language, ensuring more accurate and contextually relevant classifications. This approach significantly enhances the model’s adaptability and performance in tasks where language evolution plays a key role, such as analyzing news trends or shifts in public sentiment.
One of the core challenges addressed by our model is semantic drift, the gradual change in word meanings over time, which was addressed in [41,42,43,44]. For instance, the word “viral” once referred exclusively to biological infections, but today, it is commonly associated with the rapid spread of media on the internet (e.g., “viral videos”). Static models trained on older datasets may misclassify contemporary texts by failing to account for such shifts in meaning. Our model integrates temporal information, allowing it to track these changes and adjust its predictions accordingly. This is particularly useful in time-sensitive tasks like news classification or social media analysis, where word meanings can evolve rapidly.
Contextual relevance is another critical factor in temporal-aware sentence classification. The meaning of a text can vary depending on the period in which it was written. For example, terms like “housing market crash” may carry different implications during an economic recession than during a period of economic stability. Without temporal awareness, models may fail to recognize these contextual shifts, leading to misclassification. By incorporating timestamps into our model, we ensure that it distinguishes between historical and contemporary contexts, offering more accurate classifications based on the time period of the text.
Moreover, traditional models quickly become outdated as language evolves [45,46,47]. This is particularly problematic in fast-changing fields like technology, where new terms emerge frequently. For example, words like “blockchain” and “artificial intelligence” have gained prominence in recent years, whereas they were virtually unknown a decade ago. Our model addresses this issue by dynamically updating its understanding of word meanings through the fine-tuning of pre-trained embeddings and synonym-based data augmentation. This dynamic learning process ensures the model remains relevant and accurate, even as language evolves.
Recent advances in NLP, such as BERT [45] and XLNet [47], have revolutionized the field by introducing contextualized embeddings, allowing for a better understanding of word meaning in context. However, these models do not account for temporal shifts in language, limiting their effectiveness in time-sensitive tasks. While they excel at providing context-sensitive word representations, they fail to incorporate the temporal dimension, leaving a gap in their ability to adapt to evolving language. Our model addresses this gap by explicitly integrating temporal data, allowing for a more accurate and adaptable understanding of language over time.
This paper introduces several key contributions to advance time-aware sentence classification, with its primary contribution being a hybrid approach that incorporates temporal-aware embeddings. The proposed model seamlessly integrates temporal dynamics into a hybrid Bidirectional Long Short-Term Memory (BiLSTM) [48,49] and Convolutional Neural Network (CNN) [34,35] architecture. The novelty of this approach lies in its combination of Temporal-Aware Word Embeddings, which enable the model to capture semantic changes over time, and attention mechanisms that enhance focus on the most relevant portions of the text. The detailed contributions of this study are as follows:
  • First, we introduce the Temporal-Aware Word Embeddings model, which integrates temporal information directly into the word embedding process. Unlike previous approaches that treated temporal information as supplementary, we embed temporal data directly into the word embedding process.
  • Second, our model employs a hybrid CNN-BiLSTM architecture with an attention mechanism, combining the strengths of CNNs for local feature extraction and BiLSTMs for capturing long-range dependencies. This hybrid architecture enhances the model’s flexibility, while the attention mechanism allows the model to focus on the most relevant parts of each sentence, improving both interpretability and classification accuracy.
  • Third, our model operates within a dynamic learning process, which fine-tunes pre-trained GloVe embeddings and employs synonym-based data augmentation to adapt to evolving language contexts. This ensures the model remains relevant even in rapidly changing environments, offering improved performance in long-term applications.
  • We also employ a comprehensive evaluation framework, including metrics such as accuracy, F1 score [50], Cohen’s Kappa [51], Matthews Correlation Coefficient (MCC) [52], Mean Squared Error (MSE), and Mean Absolute Error (MAE) [53]. These metrics, combined with time-specific evaluations, provide a thorough analysis of the model’s ability to handle semantic drift and adapt to temporal changes in language, validating its effectiveness in time-sensitive classification tasks.
  • Finally, we report results obtained by applying the proposed classification model (i.e., in vivo evaluation) to a domain-agnostic (i.e., different-fields) categorization and trend analysis task. The results demonstrate that incorporating temporal information within the embedding process leads to favorable performance on time-sensitive text classification tasks.
The remainder of the paper is organized in the following manner. Section 2 describes relevant work in text classification methods and relevant language models. Section 3 presents preliminaries to the sentence classification task, and Section 4 describes the proposed time-aware model. Empirical results are presented in Section 5, and Section 6 concludes the paper.

2. Related Work

2.1. Traditional Approaches to Sentence Classification

Sentence classification is a fundamental task in natural language processing, underpinning various applications, including sentiment analysis [1,2,3], topic categorization [54,55,56], and spam detection [57,58]. Early approaches to text classification primarily relied on statistical methods like Naive Bayes [28,29] and Support Vector Machines (SVMs) [30,31,32], utilizing bag-of-words and n-gram representations to classify sentences based on word frequency and occurrence patterns [39]. These models, while effective in capturing basic patterns in text, suffered from a key limitation: they treated sentences as unordered sets of words, thus failing to capture contextual relationships between words. Consequently, they were not well-suited to tasks that required an understanding of word order or syntactic structure.
The development of word embeddings, such as Word2Vec [39] and GloVe [40], transformed the field by mapping words into continuous vector spaces, capturing semantic relationships, and allowing models to represent words with more meaningful features [59]. These embeddings significantly improved the performance of text classification models. However, they remained context-free—each word had a single, fixed representation, irrespective of the sentence or time period in which it appeared [60]. As a result, they struggled with tasks requiring a nuanced understanding of word meaning in context, such as sarcasm detection or sentiment analysis.
The advent of deep learning, however, marked a turning point in sentence classification. Convolutional Neural Networks (CNNs) [33,34,35,36], originally developed for image processing, were adapted to text by treating sequences of words as visual patterns. Kim (2014) [61] demonstrated that CNNs could effectively capture local dependencies in text, such as n-grams, making them highly effective for classification tasks. CNNs’ ability to detect local features in textual data enabled them to outperform traditional statistical models in a range of sentence classification tasks [55].
Meanwhile, Recurrent Neural Networks (RNNs) [37,38] and their variants, such as Long Short-Term Memory (LSTM) [62] networks, were designed to handle sequential data. These models were adept at capturing long-term dependencies in text by retaining information over time. Bidirectional LSTMs (BiLSTMs) [48,49] enhanced LSTMs by processing input text in both forward and backward directions, offering a deeper contextual understanding of words. Graves et al. (2013) [63] showed that BiLSTMs outperformed CNNs on tasks requiring understanding sentence structure and sequence order, such as language modeling and named entity recognition.
Despite the advancements made by CNNs and BiLSTMs, these techniques encounter challenges in capturing long-range dependencies and handling text that requires deeper context. The introduction of the attention mechanism by Bahdanau et al. (2014) [64] addressed this limitation. Attention mechanisms allowed models to selectively focus on the most relevant parts of the input when making predictions, improving performance in tasks like machine translation [65], text summarization [66], and sentence classification [67].
Although these models were effective in capturing short-term and local dependencies, they fell short in accounting for language evolution over time. By treating language as static, they overlooked the temporal dynamics that can influence word meaning and usage in various contexts.

2.2. Contextualized Embeddings: BERT, GPT, and XLNet

The development of contextualized embeddings, epitomized by models like BERT (Bidirectional Encoder Representations from Transformers) [45] and GPT (Generative Pre-trained Transformer) [46], marked a significant leap forward in NLP. Unlike static embeddings, these models generated context-sensitive embeddings that varied depending on the surrounding words. BERT’s bidirectional architecture enabled it to simultaneously capture both left and right contexts, leading to improved performance in various tasks, including question answering, text classification, and natural language inference.
However, despite their contextual sensitivity, models like BERT, RoBERTa [68], and XLNet [47] did not incorporate an explicit approach to modeling the temporal dynamics of language. They treated text as temporally static, failing to account for word meanings and connotations changing over time. This limitation becomes especially significant in applications where understanding language evolution is essential, such as analyzing decades of news articles or monitoring shifts in public sentiment on social media platforms.
For example, while BERT can provide a highly contextualized representation of the word “Apple” based on its immediate context (technology vs. fruit), it cannot account for the temporal evolution of the term. Whether the text was written in the 1980s, when Apple was an emerging technology company, or in the 2020s, when it became a dominant player in the global market, BERT would use the same underlying pre-trained embedding. This limitation in capturing temporal context led to the exploration of time-aware models that integrate temporal information into word representations.

2.3. Time-Aware Models: Capturing Language Evolution

Recent research has increasingly recognized the need to incorporate temporal information into NLP models to capture the evolving nature of language. Language is not static—words shift in meaning, usage, and significance over time, reflecting broader cultural, societal, and technological changes. To address these dynamics, several approaches have been developed to incorporate temporal aspects directly into the training and representation of language models [21,24,25,26,27,41,42,43,44,69].
Yao et al. (2018) [70] introduced Dynamic Word Embeddings, a model designed to capture shifts in word meaning across different time periods. By embedding temporal metadata into the training process, the model was able to produce word representations that evolve with time. For instance, the word “cloud” could be represented differently in datasets from the 1990s compared to more recent contexts where “cloud” refers to cloud computing. This capability allowed the model to distinguish between different meanings of the same word depending on the historical context.
Similarly, Bamler and Mandt (2017) [71] proposed Dynamic Embeddings that explicitly modeled semantic drift by adjusting word representations over time. This approach addressed one of the critical limitations of static embeddings like Word2Vec or GloVe: their inability to reflect how word meanings change. The dynamic embeddings model accounted for the evolving context in which words are used, making it particularly effective for tasks like historical text analysis or trend detection. For example, the term “virus” primarily referred to biological infections many years ago, but recently, its meaning expanded to include references to viral social media content and digital computer viruses.
More recently, several studies have focused on integrating temporal dynamics into more complex architectures like CNNs, LSTMs, and transformers [72,73], addressing a key gap in earlier models [21,24,25,26,27]. Gururangan et al. (2020) [74] introduced domain-adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT) approaches, which adapt models like BERT to specific domains or tasks, including temporally sensitive contexts. Although these methods do not explicitly track temporal changes, they provide a pathway for better domain-specific language modeling. However, they still lack the ability to capture fine-grained temporal shifts in language.
Building on these efforts, Wang et al. (2023) [75] presented BiTimeBERT, a model designed to capture temporal changes in language by modifying BERT’s architecture to include temporal embeddings. This approach incorporates time as an explicit input to BERT, allowing it to generate temporally contextualized word embeddings. BiTimeBERT demonstrated significant improvements in tasks such as sentiment analysis, where the meaning of words like “stock” or “inflation” shifts depending on the time period. The model’s integration of temporal information allows it to adapt to changing word meanings, leading to more accurate classifications.
Similarly, Gong et al. (2020) [76] introduced a framework for enriching word embeddings with temporal and spatial information. This approach explicitly models semantic drift by conditioning word representations on time and location, enabling a better understanding of evolving word usage. The embeddings preserve the geometric consistency of semantically stable words while also capturing different degrees of meaning change across conditions. For example, the model could distinguish the usage of the word “cloud” as a meteorological term in older texts and its association with cloud computing in modern contexts. This method highlights the importance of incorporating temporal and spatial metadata into word representations to reflect linguistic and sociocultural dynamics over time.
Another recent contribution was by Tang et al. (2023) [77], who proposed a template-based temporal adaptation method for pre-trained masked language models (MLMs) to generate dynamic contextualized word embeddings (DCWEs). By fine-tuning pre-trained MLMs using time-sensitive templates, the model adapts to changing word meanings across two time periods. The approach reduces perplexity and outperforms state-of-the-art models such as TemporalBERT, showcasing its efficacy in capturing temporal variations in word semantics. For instance, terms like “mask” shifted their associations from “hide” (2010) to “vaccine” (2020), demonstrating the ability to adapt to emerging contexts.
Despite these advances, most time-aware models are still limited by treating temporal information as a supplementary feature rather than a core component of the model architecture. While models like BiTimeBERT and dynamic embeddings have made strides in capturing temporal shifts in word usage, they often rely on static embeddings augmented with external temporal metadata rather than integrating temporal awareness directly into the learning process. These systems fail to adapt to evolving linguistic contexts and struggle to address long-term semantic drift, particularly in complex architectures like CNNs, LSTMs, and transformers.
Our proposed approach addresses these limitations by fully embedding temporal information into the model architecture at its core. Unlike existing methods, our model integrates temporal-aware word embeddings directly into the training process, ensuring that temporal metadata is inherently learned alongside word representations. Specifically, our hybrid CNN-BiLSTM architecture dynamically uses pre-trained GloVe embeddings to reflect temporal variations in language use. This integration allows the model to adapt in real time to semantic shifts, capturing both content and context while preserving temporal dynamics.
Additionally, our proposed model employs attention mechanisms to selectively focus on the most relevant portions of text, further enhancing its ability to align temporal information with semantic meaning. By incorporating a dynamic learning process with synonym-based data augmentation, the model ensures robust generalization and continual adaptation to emerging trends and evolving language. This integrated approach not only mitigates the limitations of treating temporal data as a supplementary feature but also ensures that the model dynamically adjusts to linguistic shifts, delivering more accurate and contextually relevant predictions.
As a result, our model demonstrates significant improvements in time-sensitive tasks such as trend analysis, news categorization, and historical text classification. By embedding temporal metadata as a core learning component, this approach addresses gaps in existing models, enabling deeper insights into the interplay between language evolution and text semantics. This novel approach ensures that the model remains relevant and effective as the language evolves.

3. Preliminaries and Technical Details

This section introduces the foundational concepts, key terminologies, and formal notations used throughout our proposed time-aware sentence classification model. By establishing a clear mathematical framework, we provide the necessary background to understand the temporal dynamics incorporated into the model. We first describe the static sentence classification process, followed by the temporal extensions that form the basis of our approach.

3.1. Static Sentence Classification

In traditional sentence classification, the task is to predict the label y ∈ Y for a given input sentence s ∈ S, where Y is the set of possible labels (e.g., news categories) and S is the set of sentences in a given dataset D. Each sentence s consists of a sequence of words s = [w1, w2, …, wn], where wi ∈ V represents the i-th word from a vocabulary V.
The classification function (i.e., classifier) f : S → Y typically relies on a feature representation h(s), which is obtained by transforming the sentence into a continuous vector space through word embeddings such as Word2Vec or GloVe. The task is to learn a function f, often represented by a neural network such as a CNN or RNN, that maps the feature representation of the sentence h(s) to the corresponding class y.
Formally, the classifier can be expressed as follows:
$\hat{y} = \arg\max_{y \in Y} P(y \mid h(s))$
where $P(y \mid h(s))$ is the conditional probability of the label y given the sentence embedding h(s). The goal is to minimize the classification error by learning an optimal mapping from sentences to labels.
However, traditional methods assume that word meanings remain constant, leading to limitations when applied to text spanning multiple time periods, where word meanings may shift. This static approach does not account for semantic drift, which is crucial for handling time-sensitive classification tasks.

3.2. Problem Formalism: Time-Aware Sentence Classification

The core challenge in time-aware sentence classification is to extend the static model by incorporating temporal information to capture the evolution of word meanings over time. In this regard, we introduce temporal dependencies into the classification process, allowing the model to dynamically adjust its predictions based on the time context in which a sentence was written.
Let t ∈ T denote a timestamp, where T represents the set of all possible time periods. The input sentence is now represented as st = (s, t), where s is the sequence of words and t is the timestamp associated with the sentence. The task is to predict the label y ∈ Y = {y1, y2, …, yτ} based not only on the content of the sentence but also on the time period t.
Formally, the time-aware classification problem can be expressed as finding a function f : S × T → Y, which maps the sentence and its timestamp to a label:
$\hat{y} = \arg\max_{y \in Y} P(y \mid h(s_t)),$
where h(st) is the time-aware sentence embedding that incorporates both the semantic content of the sentence and its temporal context.
In contrast to static embeddings, we aim to compute a time-dependent representation h(st), where the embedding of each word wi in s evolves over time. This allows us to capture the semantic drift that occurs when word meanings change across different time periods.

3.3. Temporal-Aware Word Embeddings

Traditional word embeddings, such as Word2Vec and GloVe, treat word representations as static. However, words can change their meaning over time. To address this, we introduce Temporal-Aware Word Embeddings, which incorporate the timestamp t ∈ T, where T is the set of possible time periods. Let wi be a word at time t; the corresponding embedding is represented as follows:
$e(w_i, t) = f_{\text{embed}}(w_i, t),$
where $f_{\text{embed}} : V \times T \rightarrow \mathbb{R}^d$ maps the word and its timestamp into a d-dimensional embedding space. These embeddings evolve with time, capturing the semantic drift of words. The sentence embedding at time t, denoted h(st), is then computed by aggregating the dynamic word embeddings; the aggregation techniques are detailed in Section 4.2.

3.4. Time-Aware Sentence Classification with Hybrid Architecture

In our model, we integrate both Convolutional Neural Networks (CNNs) and Bidirectional LSTMs (BiLSTMs) with attention mechanisms to process the time-aware sentence embeddings. This hybrid architecture leverages CNNs to capture local dependencies within sentences and BiLSTMs to capture long-range dependencies across time-aware embeddings. Additionally, the attention mechanism focuses on the most relevant parts of the sentence, further enhancing the model’s ability to adapt to temporal changes.
The classification model is formalized as follows:
$h(s_t) = \mathrm{Attention}(\mathrm{BiLSTM}(\mathrm{CNN}(e(w_1, t), \ldots, e(w_n, t)))),$
where the attention mechanism is used to selectively weight important time-dependent word embeddings e(wi, t) for the final sentence representation. The classification function is then computed by a fully connected layer and a softmax operation to predict the probability distribution over the possible labels:
$P(y \mid h(s_t)) = \mathrm{softmax}(W h(s_t) + b),$
where W and b are learned parameters of the model.

3.5. Loss Function

The model is trained by minimizing the cross-entropy loss, which is defined as follows:
$\mathcal{L} = -\sum_{j=1}^{N} \sum_{y \in Y} y_j \log P(y \mid h(s_{t_j})),$
where N is the number of training examples, $y_j$ is the true label for the j-th sentence, and $P(y \mid h(s_{t_j}))$ is the predicted probability of label y given the sentence $s_{t_j}$.
This temporal-aware loss ensures that the model optimizes both the content and the temporal context, adapting to the dynamic evolution of language.

4. Proposed Approach

This section presents the technical details and the underlying rationale for the time-aware sentence classification model, including data preprocessing and augmentation strategies, dynamic embedding, the proposed hybrid architecture, attention mechanisms, ensemble learning, and computational complexity analysis. Figure 1 shows the general architecture of the proposed time-aware sentence classification model.

4.1. Data Processing and Augmentation

Before feeding sentences into the model, text preprocessing ensures that raw text is converted into a clean, usable format. Each sentence s undergoes the following steps:
  • Lowercasing: All words are converted to lowercase.
  • Tokenization: Sentences are split into tokens wiV, where V is the vocabulary.
  • Stopwords Removal: Words from a predefined stopword list are removed.
  • Lemmatization: Words are reduced to their base forms using a lemmatizer, ensuring that inflected forms are handled correctly.
Let s = [w1, w2, …, wn] represent a sentence with n words. After processing, each word wi is ready to be mapped to its corresponding embedding e(wi, t), which includes the temporal context. To improve the model’s robustness, we employ synonym-based data augmentation. For each sentence s, we replace a random selection of words wi with their synonyms with a probability paugment. This process can be formalized as follows:
$\tilde{w}_i = \begin{cases} \mathrm{synonym}(w_i) & \text{with probability } p_{\text{augment}} \\ w_i & \text{otherwise} \end{cases}$
This ensures variability in the training data, helping the model generalize better. The augmented sentences $\tilde{s}$ are concatenated with the original data to create a more diverse dataset.
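As a concrete illustration, the sketch below implements the preprocessing steps and synonym-based augmentation described above in Python, assuming NLTK’s WordNet as the synonym source; the default probability of 0.2 and the helper names (preprocess, synonym_augment) are illustrative, not our exact implementation.

```python
import random

from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Requires: nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet")
STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(sentence: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords/punctuation, and lemmatize a sentence."""
    tokens = word_tokenize(sentence.lower())
    return [LEMMATIZER.lemmatize(w) for w in tokens
            if w.isalpha() and w not in STOPWORDS]

def synonym_augment(tokens: list[str], p_augment: float = 0.2) -> list[str]:
    """Replace each token with a random WordNet synonym with probability p_augment."""
    augmented = []
    for w in tokens:
        if random.random() < p_augment:
            synonyms = {l.name().replace("_", " ")
                        for syn in wordnet.synsets(w) for l in syn.lemmas()} - {w}
            augmented.append(random.choice(sorted(synonyms)) if synonyms else w)
        else:
            augmented.append(w)
    return augmented
```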

4.2. Dynamic Embedding

A central component of our model is the integration of Temporal-Aware Word Embeddings, which is designed to address the limitations of traditional static embeddings by incorporating temporal dependencies. Given a word wiV, the embedding e(wi, t) is dynamic, based on the corresponding timestamp t. This allows the model to reflect the temporal nuances of word meanings. Figure 2 shows the general process of temporal-aware word embedding.
Mathematically, the embedding e(wi, t) is a time-varying function:
$e(w_i, t) = f_{\text{embed}}(w_i, t),$
where $f_{\text{embed}} : V \times T \rightarrow \mathbb{R}^d$ maps each word and its corresponding timestamp to a d-dimensional vector space. This mapping function adjusts the embedding of each word based on the specific time period, effectively capturing semantic drift. To construct the sentence embedding h(st) at the corresponding time t, we explore three different temporal embedding aggregation techniques:
Aggregate-based Embedding Fusion (ABEF): The aggregate function computes a single embedding vector for each word by averaging its temporal embeddings across all available timestamps. Given a word wi, the aggregate embedding e(wi)agg is calculated by averaging its embeddings over all timestamps:
$e(w_i)_{\text{agg}} = \frac{1}{T} \sum_{t=1}^{T} e(w_i, t)$
The sentence embedding h(st) is then constructed by aggregating these time-independent word embeddings:
$h(s_t) = \mathrm{Aggregate}(e(w_1)_{\text{agg}}, e(w_2)_{\text{agg}}, \ldots, e(w_n)_{\text{agg}}),$
where the aggregation function captures forward and backward dependencies within the sentence.
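A minimal NumPy sketch of this aggregate-based fusion is shown below, assuming each word’s temporal embeddings are stored as an array of shape (T, d) and using mean pooling as a stand-in for the aggregation function; the helper names are illustrative.

```python
import numpy as np

def aggregate_embedding(temporal_vectors: np.ndarray) -> np.ndarray:
    """Average a word's embeddings over all timestamps: (T, d) -> (d,)."""
    return temporal_vectors.mean(axis=0)

def sentence_embedding_abef(word_temporal_vectors: list[np.ndarray]) -> np.ndarray:
    """Mean-pool the time-averaged word vectors to obtain the sentence embedding h(s_t)."""
    return np.stack([aggregate_embedding(v) for v in word_temporal_vectors]).mean(axis=0)
```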
Time-weighted Embedding Fusion (TAEF): In this technique, recent meanings are prioritized by weighting embeddings according to a temporal decay parameter λ. The time-weighted embedding for a word wi is computed as follows:
$e(w_i, t)_{\text{tw}} = \sum_{t=1}^{T} \alpha_t \, e(w_i, t), \quad \text{where } \alpha_t = e^{-\lambda (T - t)}.$
Here, αt decays exponentially, emphasizing recent embeddings. The sentence embedding h(st) is derived by aggregating the weighted word embeddings:
$h(s_t) = \mathrm{Aggregate}(e(w_1, t)_{\text{tw}}, e(w_2, t)_{\text{tw}}, \ldots, e(w_n, t)_{\text{tw}}).$
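The time-weighted fusion can be sketched as follows, again over a (T, d) array per word; normalising the decay weights so they sum to one is an added assumption for numerical stability, not part of the formula above.

```python
import numpy as np

def time_weighted_embedding(temporal_vectors: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Exponentially decayed combination of a word's embeddings over timestamps t = 1..T."""
    T = temporal_vectors.shape[0]
    t = np.arange(1, T + 1)
    alpha = np.exp(-lam * (T - t))   # alpha_t = exp(-lambda * (T - t)), largest for recent t
    alpha = alpha / alpha.sum()      # normalisation is an added assumption
    return (alpha[:, None] * temporal_vectors).sum(axis=0)
```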
Self-Attention with Temporal Positional Encoding (SATPE): This method enhances temporal awareness by incorporating positional encoding into the embeddings. For each word embedding e(wi, t) at a corresponding timestamp t, a temporal positional encoding Pt is added:
$e(w_i, t)_{\text{temp}} = e(w_i, t) + P_t$
The temporal sentence embedding is obtained by applying multi-head self-attention to these temporally enriched embeddings. For each attention head, the weighted embedding is computed as follows:
$A(e(w_i, t)_{\text{temp}}) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V$
where Q, K, and V represent the query, key, and value matrices, and $d_k$ represents the dimensionality of the key and query vectors within the self-attention mechanism. The sentence embedding h(st) is then constructed by aggregating these attention-weighted, temporally enriched word embeddings:
$h(s_t) = \mathrm{Aggregate}(A(e(w_1, t)_{\text{temp}}), A(e(w_2, t)_{\text{temp}}), \ldots, A(e(w_n, t)_{\text{temp}}))$
In this way, h(st) encodes the entire sentence’s content, adjusted by the temporal position of each word, allowing the model to capture both semantic relationships and time-based word evolution within the sentence.
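A single-head NumPy sketch of this scheme is given below, assuming sinusoidal temporal positional encodings, identity query/key/value projections, and mean pooling as the final aggregation; a full implementation would use learned projections and multiple heads.

```python
import numpy as np

def temporal_positional_encoding(t: int, d: int) -> np.ndarray:
    """Sinusoidal encoding of an integer timestamp index t into a d-dimensional vector P_t."""
    i = np.arange(d)
    angles = t / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def satpe_sentence_embedding(E: np.ndarray, timestamps: np.ndarray) -> np.ndarray:
    """Self-attention over temporally enriched embeddings; E has shape (n, d)."""
    d = E.shape[1]
    X = E + np.stack([temporal_positional_encoding(t, d) for t in timestamps])
    Q, K, V = X, X, X                                # identity projections (sketch only)
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return (weights @ V).mean(axis=0)                # aggregate to the sentence embedding h(s_t)
```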

4.3. Model Architecture

The architecture of our model combines the strengths of CNNs and BiLSTMs. CNNs capture local dependencies, while BiLSTMs capture long-term dependencies in the sentence sequence.
Convolutional Neural Networks (CNNs): The convolutional layers extract local features from the input sentence embeddings. Given the sentence embedding matrix $h(s_t) \in \mathbb{R}^{n \times d}$, where each row corresponds to a word embedding, the CNN applies filters to capture local n-grams. The output of the convolutional layer is:
$C(s_t) = \mathrm{CNN}(h(s_t))$
where $C(s_t)$ represents the extracted local features. These features are then passed through a max-pooling layer to retain the most salient information.
Bidirectional Long Short-Term Memory Networks (BiLSTMs): The BiLSTM layer processes the sentence embeddings to capture both forward and backward dependencies. For a sequence h(st), the BiLSTM computes:
$h_{\mathrm{BiLSTM}} = \mathrm{BiLSTM}(C(s_t)),$
where $h_{\mathrm{BiLSTM}}$ is the output representation that captures both the past and future context for each word in the sentence.
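For concreteness, a TensorFlow/Keras sketch of this CNN-BiLSTM stack is shown below; the layer sizes mirror the hyperparameters reported in Section 5.2, the embedding matrix would be initialised from the fine-tuned temporal GloVe vectors, and the attention pooling here is a simplified stand-in for the mechanism detailed in Section 4.4. All names, padding, and pooling choices are illustrative assumptions rather than our exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_hybrid_model(vocab_size: int, num_classes: int,
                       embed_dim: int = 100, max_len: int = 100) -> tf.keras.Model:
    """CNN (local n-grams) -> BiLSTM (long-range context) -> attention pooling -> softmax."""
    inputs = layers.Input(shape=(max_len,))
    x = layers.Embedding(vocab_size, embed_dim)(inputs)   # init from temporal GloVe in practice
    x = layers.Conv1D(128, kernel_size=5, activation="relu", padding="same")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    # Simplified additive attention pooling over time steps (see Section 4.4).
    scores = layers.Dense(1, activation="tanh")(x)
    weights = layers.Softmax(axis=1)(scores)
    x = layers.Lambda(lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([x, weights])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```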

4.4. Attention Mechanism

To further improve the model’s ability to focus on relevant information, we incorporate an attention mechanism. The attention mechanism allows the model to assign different weights to different words based on their relevance to the classification task. The attention score for each word is computed as follows:
$\alpha_i = \frac{\exp(v^{T} \tanh(W_h h_i))}{\sum_j \exp(v^{T} \tanh(W_h h_j))},$
where $h_i$ is the hidden state for the i-th word, and v and $W_h$ are learnable parameters. The final sentence representation is then computed as a weighted sum of the hidden states:
$h_{\mathrm{attn}} = \sum_i \alpha_i h_i$
This mechanism ensures that the model attends to the most important parts of the sentence when making predictions.
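In isolation, the attention computation above amounts to the following NumPy sketch, where the learned parameters $W_h$ and v are passed in as plain arrays (names and shapes are illustrative):

```python
import numpy as np

def additive_attention(H: np.ndarray, W_h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """h_attn = sum_i alpha_i * h_i, with alpha_i proportional to exp(v^T tanh(W_h h_i)).

    H: (n, h) hidden states, W_h: (h, h) projection, v: (h,) scoring vector.
    """
    scores = np.tanh(H @ W_h.T) @ v   # one score per word
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()              # softmax over the n words
    return alpha @ H                  # weighted sum of hidden states
```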

4.5. Ensemble Learning

To enhance robustness and accuracy, we incorporate ensemble learning by training two models: a CNN-based model and a BiLSTM-based model. The predictions from each model are combined using an averaging strategy. Given two models f1 and f2, the final prediction y ^ is computed as follows:
$P(\hat{y} \mid h(s_t)) = \frac{1}{2}\left( P(y \mid f_1(h(s_t))) + P(y \mid f_2(h(s_t))) \right)$
This approach leverages the complementary strengths of the two models to improve overall performance and stability.
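Given the per-class probability outputs of the two models, the averaging step reduces to a few lines; the sketch below assumes both arrays have shape (num_samples, num_classes).

```python
import numpy as np

def ensemble_predict(p_cnn: np.ndarray, p_bilstm: np.ndarray) -> np.ndarray:
    """Average the two models' class probabilities and return the arg-max label per sample."""
    p_avg = 0.5 * (p_cnn + p_bilstm)   # P(y | h(s_t)) = (P_1 + P_2) / 2
    return p_avg.argmax(axis=-1)
```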

4.6. Computation Complexity Analysis

The computational complexity of our model is driven by the following factors:
Convolutional Layers: The complexity of the CNN layer is O(n ⋅ k ⋅ d ⋅ f), where n is the sentence length, k is the kernel size, d is the embedding dimension, and f is the number of filters.
BiLSTM Layers: The BiLSTM layer has a complexity of O(n · h²), where n is the sentence length and h is the number of hidden units.
Attention Mechanism: The attention mechanism adds a complexity of O(n · d) for computing the attention scores and the weighted sum of the hidden states.
In total, the model’s complexity scales with O(n · (h² + k · d · f)), making it efficient for large-scale datasets while maintaining the flexibility to capture local and global features.

5. Experiments and Results

In this section, we evaluate the effectiveness of the proposed method on three freely available benchmark datasets: Ireland-news-headlines (https://www.kaggle.com/therohk/ireland-historical-news, accessed on 10 February 2025), Sentiment140 [78] and 20 News Groups (https://www.kaggle.com/crawford/20-newsgroups, accessed on 10 February 2025) datasets. We first describe the benchmark datasets, experiment setup, and evaluation criteria. Then, we discuss the obtained results and compare the performance of the proposed model with that of other existing approaches. In addition, as an end-to-end evaluation, we apply the proposed sentence classification model to a news categorization and trend analysis task across different fields.

5.1. Benchmark Datasets

To evaluate our proposed Time-Aware Sentence Classification Model, we tested our method on three distinct datasets: the Ireland-news-headlines dataset, the Sentiment140 dataset and 20 News Groups dataset. These datasets offer diverse temporal and contextual information, allowing us to assess the robustness and adaptability of our model across different language dynamics and classification tasks. Figure 3 shows the distribution of categories (labels) in the selected benchmark datasets.
Ireland-news-headlines Dataset: This dataset consists of over 1.6 million headlines from the Irish Times, spanning multiple decades. Each headline is labeled with a category, such as news, culture, opinion, business, sport, and lifestyle, and is associated with a timestamp in DD, MM, YY format. The category distribution is imbalanced, with the majority of headlines falling under news, followed by sport and business, while categories such as culture, opinion, and lifestyle are less frequent as can be seen from Figure 3a. This dataset provides a unique opportunity to capture temporal shifts in language usage and track semantic evolution over time.
Sentiment140 Dataset: This dataset also contains 1.6 million tweets, each labeled as positive or negative based on sentiment and associated with a timestamp. The distribution is approximately balanced, with similar proportions of positive and negative sentiments, as can be seen in Figure 3b. The Sentiment140 dataset was created by gathering tweets containing emoticons, which were then used as noisy labels. This dataset provides a shorter text format with temporal data, enabling us to test the model’s effectiveness in sentiment analysis over time.
20 News Groups Dataset: This dataset comprises 18,000+ news articles (1995–2000) across 20 domain-agnostic topics (see Figure 3c), including politics, sports, business, and technology. It serves as a benchmark for multi-class text classification and evaluates the model’s ability to generalize across varied textual domains while incorporating temporal context.
The Ireland-news-headlines dataset captures the temporal dynamics of news articles, making it suitable for tasks that require understanding the evolution of language in long-form content. In contrast, the Sentiment140 dataset allows for the analysis of short, sentiment-driven texts, enabling the model to handle both sentiment analysis and time-sensitive classification tasks across different domains. Additionally, the 20 News Groups dataset provides a diverse range of topics across multiple categories, making it ideal for evaluating the model’s ability to classify multi-topic documents (i.e., domain-agnostic) over time while handling semantic variations in different domains. Table 1 shows the summarized key statistics for the three datasets, including the number of instances, time span, average text length, and category distribution.

5.2. Experiment Setup

In the experiment setup, we explored three different embedding techniques to evaluate the impact of temporal information on classification performance:
Aggregated Temporal Embeddings: This method computes the word embeddings for each word in a sentence and aggregates them through averaging to form a sentence representation, taking into account the timestamp of each word.
Time-Weighted Embedding Fusion: In this approach, embeddings are adjusted by applying time-based weights to enhance the influence of recent or temporally relevant information.
Self-Attention with Temporal Positional Encoding: In this approach, self-attention mechanisms incorporate temporal positional encodings, allowing the model to assign different levels of importance to words based on their position and time context.
We conducted our experiments using the following hyperparameters:
  • Embedding dimension: 100.
  • Maximum sequence length: 100 tokens.
  • BiLSTM units: 128.
  • Attention dropout: 0.3.
  • Conv1D filters: 128 with kernel size 5.
  • Dense layers: Two layers with 256 and 128 units, respectively, followed by dropout layers.
  • Learning rate: 0.00005 with early stopping and learning rate reduction strategies.
Each model variant was trained using categorical cross-entropy as the loss function and the Adam optimizer [79], with early stopping and learning rate scheduling to prevent overfitting and optimize training convergence.
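A training-setup sketch consistent with these hyperparameters is shown below, reusing the build_hybrid_model sketch from Section 4.3; the vocabulary size, patience values, batch size, and epoch count are assumptions, and the data tensors are placeholders.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

model = build_hybrid_model(vocab_size=50_000, num_classes=6)   # vocabulary size is an assumption
model.compile(optimizer=Adam(learning_rate=5e-5),              # learning rate 0.00005
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                     epochs=30, batch_size=128, callbacks=callbacks)
```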

5.3. Evaluation Criteria

To rigorously evaluate the model performance, we employed a comprehensive set of metrics, each providing unique insights into different aspects of classification:
Accuracy [50]: Measures the overall proportion of correctly classified samples:
$\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$
Cohen’s Kappa Score (CK) [51]: Assesses the agreement between predicted and true labels, accounting for chance:
$\kappa = \frac{p_o - p_e}{1 - p_e}$
where po is the observed agreement and pe is the expected agreement by chance.
F1 Score (Macro and Micro) [50]: The F1 score provides a balanced evaluation of precision and recall, offering a comprehensive measure of classification performance. Macro F1 calculates the mean F1 score across all classes, treating each class equally, while Micro F1 aggregates the contributions of all classes to compute an average score. The F1 score for each class c is defined as follows:
$(F_1)_c = \frac{2 \times \mathrm{Precision}_c \times \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}$
Matthews Correlation Coefficient (MCC) [52]: Provides a balanced measure even when classes are imbalanced:
$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
where TP, TN, FP, and FN are true positives, true negatives, false positives, and false negatives, respectively.
Mean Squared Error (MSE) and Mean Absolute Error (MAE) [53]: Used for error-based evaluation, capturing the magnitude of classification errors:
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \quad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$
Precision-Recall and ROC Curves [50]: These curves, along with their AUC values, help in understanding the model’s performance in distinguishing between classes.
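These metrics map directly onto scikit-learn, as in the sketch below; y_true and y_pred are integer label vectors, y_prob is the per-class probability matrix, and treating labels as numeric for MSE/MAE mirrors the error-based evaluation above.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             matthews_corrcoef, mean_absolute_error,
                             mean_squared_error, roc_auc_score)

def evaluate(y_true, y_pred, y_prob):
    """Collect the Section 5.3 metrics for one model run."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        "f1_micro": f1_score(y_true, y_pred, average="micro"),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "mse": mean_squared_error(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "roc_auc_ovr": roc_auc_score(y_true, y_prob, multi_class="ovr"),
    }
```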

5.4. Results on Benchmark Datasets

The proposed time-aware short text classification model was evaluated on three benchmark datasets: the Ireland-news-headlines, Sentiment140, and 20 News Groups datasets. Three dynamic embedding techniques—Aggregate-based Embedding Fusion, Time-weighted Embedding Fusion, and Self-Attention with Temporal Positional Encoding—were employed to assess the model’s robustness in tracking semantic drift and handling time-sensitive tasks. The results, as shown in Table 2, demonstrate that the proposed approach achieves consistently high performance across all datasets, with accuracy reaching up to 92% on the Ireland-news-headlines and Sentiment140 datasets and 85% on the 20 News Groups dataset in the best-performing configurations.
For the Ireland-news-headlines dataset, the model achieves its highest performance with Self-Attention with Temporal Positional Encoding and Aggregate-based Embedding Fusion, achieving an accuracy of 92%. This result underscores the effectiveness of integrating temporal information with attention mechanisms and demonstrates the model’s ability to capture both local and global dependencies. The corresponding F1 Macro and Micro scores, at 90% and 92% respectively, further validate the model’s reliability in handling class imbalances while ensuring consistent performance. Metrics such as Cohen’s Kappa and Matthews Correlation Coefficient (MCC), both at 0.89, confirm the robustness of the model’s predictions. On the other hand, the Time-weighted Embedding Fusion technique performs slightly lower, achieving 91% accuracy, which suggests that its effectiveness may depend on the specific characteristics of the dataset.
On the Sentiment140 dataset, the Time-weighted Embedding Fusion technique outperforms the other approaches, achieving an accuracy of 92%, alongside F1 Macro and Micro scores of 91% and 92%, respectively. This result highlights the importance of time-weighted embeddings in sentiment analysis tasks, where language usage and tone evolve rapidly. While Self-Attention with Temporal Positional Encoding demonstrates slightly lower accuracy (89%), its consistent performance underscores its adaptability to noisy, real-world datasets. In comparison, the Aggregate-based Embedding Fusion approach achieves an accuracy of 87%, showing its effectiveness in more straightforward time-aware tasks but its limitations in more dynamic language contexts.
For the 20 News Groups dataset, the model exhibits a wider performance gap across different techniques. The Time-weighted Embedding Fusion method achieves the highest accuracy of 85%, highlighting its ability to capture temporal dependencies in multi-class text classification. Self-Attention with Temporal Positional Encoding follows closely with 82% accuracy, demonstrating its robustness in handling longer text structures and diverse topics. However, the Aggregate-based Embedding Fusion approach performs significantly lower (73% accuracy), indicating its limitations in adapting to the highly varied textual content present in this dataset. The observed trends suggest that time-aware representations are crucial for effectively classifying diverse topics while maintaining temporal consistency.
The learning curves presented in Figure 4 further illustrate the stability and efficiency of the proposed model during training. For the Ireland-news-headlines dataset, the accuracy curve shows a steady increase over epochs, converging at approximately 92% (Figure 4a), while the loss decreases consistently (Figure 4b). The alignment between training and validation losses demonstrates that the model effectively learns without overfitting. Similarly, for the Sentiment140 dataset, the accuracy curve exhibits a smooth upward trend, reaching around 92% (Figure 4c). In contrast, the loss curve (Figure 4d) decreases despite minor fluctuations, which can be attributed to the noisy nature of the sentiment-based text. For the 20 News Groups dataset, however, the learning curves exhibit a different pattern due to the greater textual diversity across multiple topics. The accuracy curve steadily improves, converging at 85% (Figure 4e) with Time-weighted Embedding Fusion, while the loss curve (Figure 4f) follows a consistent downward trajectory. However, compared to the other datasets, training progresses more gradually, indicating that the model requires more epochs to effectively generalize across the dataset’s longer and more varied textual content. The stability of the validation loss further supports the model’s ability to capture temporal dependencies while handling diverse topics.
Figure 5 presents the ROC curves and precision-recall curves, providing deeper insights into the model’s classification performance. For the Ireland-news-headlines dataset, the ROC curve indicates near-perfect classification with an AUC approaching 1.0, validating the model’s ability to distinguish between classes effectively. The precision-recall curve reinforces this result, showing a strong balance between precision and recall, particularly in handling time-sensitive categories. For the Sentiment140 dataset, while the ROC curve remains strong, the AUC is slightly lower due to the inherent complexity and variability of sentiment data. Nevertheless, the precision-recall curve demonstrates that the model maintains reliable predictions despite the noisy nature of the sentiment-labeled text. For the 20 News Groups dataset, the ROC and precision-recall curves reveal the model’s ability to generalize across diverse topics. The ROC curve shows an AUC close to 1.0 for most categories, indicating high separability. However, certain topics exhibit slight variations, suggesting that some categories have overlapping textual patterns. The precision-recall curve displays variability across different topics, with some classes maintaining high precision, while others show a gradual decline in recall, reflecting the challenges posed by long-form, multi-domain text classification. These results highlight the effectiveness of the proposed model in handling diverse textual structures and evolving semantic patterns over time.
The correlation matrices in Figure 6 highlight the alignment between predicted and true classes. For the Ireland-news-headlines dataset (Figure 6a), the matrix shows clear diagonal dominance, indicating that the model consistently predicts the correct class with high confidence. Misclassifications are minimal, reflecting the model’s precision in handling evolving semantic meanings. For the Sentiment140 dataset (Figure 6b), the correlation matrix also exhibits strong diagonal dominance, with only minor misclassifications observed. These errors occur primarily between semantically similar classes, which is expected in sentiment classification tasks due to subtle variations in word tone and usage. For the 20 News Groups dataset (Figure 6c), the matrix illustrates the model’s effectiveness in multi-class classification across diverse topics. While most categories exhibit strong diagonal patterns, indicating high classification accuracy, some categories, particularly those related to technology, religion, and politics, show slight misclassifications. This suggests some semantic overlap among these topics, where textual similarities may lead to ambiguities in classification. Despite this, the model demonstrates robust generalization across multiple domains, reinforcing its ability to adapt to heterogeneous text categories with temporal variations.
The true versus predicted class distribution in Figure 7 further validates the model’s ability to preserve class balance while maintaining high classification accuracy. For the Ireland-news-headlines dataset (Figure 7a), the predicted class distributions closely align with the true labels, demonstrating the model’s robustness in tracking evolving language semantics and ensuring accurate classification across diverse news categories. For the Sentiment140 dataset (Figure 7b), the distribution remains highly consistent, with minimal deviation between true and predicted classes. While minor discrepancies exist, particularly in sentiment polarity shifts, these are expected due to the dataset’s complex and evolving sentiment expressions. For the 20 News Groups dataset (Figure 7c), the model effectively captures multi-topic classification, as indicated by the similarity between true and predicted class distributions. However, slight variations occur in certain categories such as rec.autos and soc.religion.christian, suggesting that some topics exhibit semantic overlap that challenges precise categorization. Despite these differences, the overall distribution confirms the model’s generalization capability across diverse textual domains while maintaining high classification reliability.
The above results confirm the effectiveness and robustness of the proposed time-aware short-text classification model across all used benchmark datasets in capturing semantic drift and evolving language usage. The integration of Self-Attention with Temporal Positional Encoding and Time-weighted Embedding Fusion proves particularly effective, achieving high accuracy across news categorization, sentiment analysis, and multi-topic classification. The model’s strong performance across accuracy, F1 scores, Cohen’s Kappa, MCC, and AUC highlights its reliability and adaptability to time-sensitive classification tasks, outperforming static approaches.

5.5. Results on a Larger Temporal Gap

To evaluate the model’s ability to handle long-term semantic drift, we conducted an experiment on the Ireland-news-headlines dataset, using 1996–2008 as the training period and 2009–2012 as the test period. This setup introduced a significant temporal gap, allowing us to assess the model’s generalization performance across evolving language patterns.
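The chronological split for this experiment can be reproduced with a few lines of pandas, as in the sketch below; the file name, column name, and date format are assumptions about the public Kaggle dump rather than a description of our exact pipeline.

```python
import pandas as pd

df = pd.read_csv("ireland-news-headlines.csv")   # assumed file name
# Assumed column "publish_date" holding dates in YYYYMMDD form.
year = pd.to_datetime(df["publish_date"], format="%Y%m%d", errors="coerce").dt.year
train_df = df[year.between(1996, 2008)]          # training period
test_df = df[year.between(2009, 2012)]           # held-out, temporally disjoint test period
```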
Among the dynamic embedding techniques, Time-weighted Embedding Fusion achieved the best performance, with an accuracy of 91%, F1 Macro of 89%, and Cohen’s Kappa of 0.87, as shown in Table 3. These results confirm that the model effectively generalizes across different time periods despite evolving linguistic trends. The MCC score of 0.87 further reinforces its consistency and reliability in maintaining robust classification decisions even in the presence of significant language shifts.
The Mean Squared Error (MSE) of 0.92 and Mean Absolute Error (MAE) of 0.27 indicate that while the classification model remains highly effective, there is a slight increase in error metrics compared to shorter temporal gaps, which is expected due to the natural evolution of language. The learning curves (Figure 8a,b) demonstrate a stable convergence pattern, with the model gradually improving its accuracy while reducing loss over epochs. Furthermore, the ROC curves (Figure 8c) indicate strong classification performance across all categories, with an AUC exceeding 0.96 for each class, demonstrating a well-calibrated predictive capability.
The precision-recall curves (Figure 8d) further support the effectiveness of the model in distinguishing between different categories, with high AUC values ensuring that the model maintains strong precision-recall trade-offs. The ability to retain high classification accuracy, even when encountering extended time intervals, highlights the effectiveness of the time-weighted embedding approach in mitigating the effects of semantic drift. By dynamically adjusting word importance over time and incorporating both recent and historical language patterns, this method enhances the model’s adaptability to evolving textual distributions.

5.6. Comparison with Well-Known Temporal Models

The comparison between our Time-Aware Sentence Classification Model and various time-aware and traditional classification models [25,80], as presented in Table 4, highlights the effectiveness of our approach in handling temporal text classification. The results demonstrate that our model, particularly when utilizing dynamic word embeddings, achieves state-of-the-art accuracy across multiple datasets, outperforming several established temporal models.
Among the time-aware classification models, RoBERTa with date as text achieved 87.84% on Ireland-news-headlines and 91.13% on Sentiment140, leveraging date metadata as textual input to enhance its temporal awareness. Other RoBERTa-based approaches, such as RoBERTa with added embeddings and RoBERTa with stacked embeddings, also showed competitive performance, demonstrating that incorporating time-related embeddings contributes to classification accuracy [25]. Similarly, Text GloVe with Triples (86.9%) and Text BERT with Triples (77.7%) on 20 News Groups further emphasize the importance of encoding temporal aspects in text representations [80].
In contrast, traditional/static classification models exhibited inferior performance due to their lack of temporal modeling. RoBERTa without date saw a notable drop in accuracy (82.35% on Ireland-news-headlines, 89.27% on Sentiment140), reinforcing the necessity of temporal integration. The Naïve Bayes classifier (85.00% on Sentiment140) and the most frequent class baseline (51.10% on Ireland-news-headlines, 49.88% on Sentiment140) also underperformed, confirming that static models struggle with evolving semantics. While the Hybrid Stacked Ensemble Model achieved 99.00% on Sentiment140, its performance on the other datasets was not reported, limiting its comparative insight [25].
Our proposed Time-Aware Sentence Classification Model consistently outperformed these baselines, achieving 92.00% accuracy on both Ireland-news-headlines and Sentiment140 and 85.00% on 20 News Group when using dynamic word embeddings. This confirms the superiority of time-aware embeddings in learning evolving semantics over time. In contrast, when using traditional/static embeddings, our model’s accuracy was 85.6% on Ireland-news-headlines, 82.1% on Sentiment140, and 77.0% on 20 News Group, further validating that temporal dynamics play a crucial role in classification accuracy. The key points of comparison between our proposed model and the well-known approaches are as follows:
  • Temporal Awareness Significantly Improves Classification: The substantial gap between RoBERTa without date and time-aware models emphasizes the necessity of incorporating temporal information into text classification.
  • Dynamic Word Embeddings Enhance Performance: Our model’s superior accuracy with dynamic embeddings confirms that capturing semantic drift over time is essential for effective classification.
  • Generalizability Across Datasets: Unlike certain models, such as Hybrid Stacked Ensemble, which perform well on Sentiment140 but lack evaluations on multiple datasets, our approach maintains high accuracy across diverse text domains.
  • Comparison with Well-Known Work: Our model consistently outperforms existing temporal classification approaches, demonstrating the effectiveness of direct time-aware embedding integration rather than treating time as an external input feature.

5.7. Application to News Categorization and Trend Analysis

To demonstrate the effectiveness of the proposed time-aware sentence classification model for time-sensitive tasks, we applied it to a compiled news dataset specifically designed for categorization and trend analysis, as presented in Table 5. The dataset consists of 25 samples distributed evenly across five categories: News, Opinion, Sport, Business, and Culture. Each sample includes a timestamp and a corresponding headline, reflecting real-world events and trends between January 2020 and July 2020. The headlines span diverse themes, such as economic stimulus measures, climate policies, ethical debates surrounding AI, remote work trends, championship victories, stock market developments, and the digital transformation of cultural events during the pandemic.
The model (with Time-weighted Embedding Fusion) successfully classified the dataset into its respective categories, achieving exceptional performance across all evaluation metrics. As shown in Figure 9a, the per-class precision, recall, and F1-scores consistently reached near-perfect values of 1.0, indicating the model’s ability to effectively differentiate between categories. This strong performance demonstrates the model’s capacity to capture both content and temporal nuances, enabling accurate classifications even in time-sensitive contexts. The alignment between true and predicted labels, illustrated in Figure 9b, further highlights the reliability of the model, with no significant misclassifications observed.
The confusion matrix presented in Figure 10a confirms these findings, as all predictions align perfectly with the true labels, underscoring the model’s robustness in distinguishing semantically similar categories such as News and Business or Opinion and Culture. Additionally, Figure 10b showcases the prediction probability distribution, revealing high confidence scores for correct predictions and minimal uncertainty across all classes. This result emphasizes the model’s ability to generalize effectively, even with a relatively small dataset.
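The per-class figures and the confusion matrix can be reproduced with scikit-learn as in the hedged sketch below; the probability array is a stand-in for the trained model’s softmax output over the five Table 5 categories, and the gold labels here simply encode the balanced 25-sample design.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

categories = ["News", "Opinion", "Sport", "Business", "Culture"]

# Stand-ins for the model's softmax output and the gold labels of the 25 headlines.
probs = np.random.dirichlet(np.ones(len(categories)), size=25)
y_true = np.repeat(np.arange(len(categories)), 5)    # 5 headlines per category
y_pred = probs.argmax(axis=1)

print(classification_report(y_true, y_pred,
                            labels=list(range(len(categories))),
                            target_names=categories, zero_division=0))
print(confusion_matrix(y_true, y_pred, labels=list(range(len(categories)))))
confidence = probs.max(axis=1)   # used for the prediction-probability distribution plot
```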
The learning curve in Figure 11a illustrates a steady and consistent improvement in both training and validation accuracy over the epochs, converging at near-optimal levels without signs of overfitting. Simultaneously, the loss curve in Figure 11b shows a smooth and continuous decline in training and validation loss, further validating the model’s stability and capacity to generalize well to unseen data.
The results also highlight the model’s ability to adapt to evolving semantics and capture temporal dynamics. For instance, terms like “remote work” and “AI adoption” reflect societal shifts during the pandemic, enabling the model to distinguish these trends accurately within relevant categories such as Opinion or Business. This capability makes the model particularly suitable for downstream applications, including tracking trends in public sentiment, identifying business developments, and analyzing cultural adaptations over time.
In summary, the application of the proposed time-aware sentence classification model to this dataset demonstrates its effectiveness in handling time-sensitive text classification tasks. The model achieves near-perfect accuracy and exhibits robust generalization, successfully integrating temporal dynamics into its predictions. These results validate the model’s utility for real-world applications such as news categorization, trend analysis, and historical text classification, where understanding the temporal evolution of language is essential for capturing meaningful insights.

6. Conclusions

This paper introduced a novel time-aware sentence classification model that incorporates temporal dynamics to address the limitations of traditional static models in natural language processing. By explicitly integrating temporal information into the word embedding process, the proposed model effectively tracks and adapts to evolving language semantics over time. Leveraging a hybrid architecture that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks, enriched with attention mechanisms, the model captures both local and global dependencies while maintaining focus on the most relevant parts of the text.
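The sketch below illustrates this hybrid architecture in PyTorch, with an additive attention layer over the BiLSTM states; the layer sizes, kernel width, and attention form are assumptions for exposition, not the exact configuration reported in the paper.

```python
import torch
import torch.nn as nn

class TimeAwareCNNBiLSTM(nn.Module):
    """Illustrative sketch: embeddings -> Conv1D -> BiLSTM -> additive attention -> classifier."""

    def __init__(self, vocab_size, embed_dim=100, conv_channels=128,
                 lstm_hidden=64, num_classes=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)   # initialized from fine-tuned GloVe in practice
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * lstm_hidden, 1)               # additive attention scores
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)   # local n-gram features
        h, _ = self.bilstm(x)                     # (batch, seq_len, 2 * lstm_hidden)
        weights = torch.softmax(self.attn(h), dim=1)    # attention over time steps
        context = (weights * h).sum(dim=1)        # attention-weighted sentence vector
        return self.classifier(context)

# Quick shape check on a batch of four 15-token headlines.
logits = TimeAwareCNNBiLSTM(vocab_size=20000)(torch.randint(0, 20000, (4, 15)))
```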
The proposed approach was validated through rigorous experimentation on benchmark datasets and applied to real-world tasks such as news categorization and trend analysis. The results demonstrated superior performance compared to well-known temporal and static models, achieving high accuracy, precision, and F1 scores. Specifically, the model achieved a classification accuracy of 92% in benchmark experiments, significantly outperforming traditional baselines and comparable time-aware approaches. The model’s ability to capture temporal semantics was further highlighted in its application to news categorization, where it accurately classified headlines across multiple time-sensitive categories and provided meaningful insights into evolving trends.
The key contributions of this study include the introduction of Temporal-Aware Word Embeddings, the design of a hybrid CNN-BiLSTM architecture, and a dynamic learning process that integrates fine-tuned pre-trained embeddings and synonym-based data augmentation. These innovations ensure that the model adapts to semantic drift and maintains relevance as language evolves, making it suitable for various time-sensitive NLP tasks.
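As an illustration of the synonym-based augmentation step, the sketch below replaces a few tokens per headline with WordNet synonyms via NLTK; the choice of WordNet and the replacement budget are assumptions, since this section does not prescribe a specific synonym source.

```python
import random
from nltk.corpus import wordnet as wn   # requires a one-time nltk.download("wordnet")

def augment_with_synonyms(sentence: str, n_replacements: int = 2, seed: int = 0) -> str:
    """Return a copy of the sentence with up to n_replacements words swapped for synonyms."""
    random.seed(seed)
    tokens = sentence.split()
    positions = list(range(len(tokens)))
    random.shuffle(positions)
    replaced = 0
    for i in positions:
        if replaced >= n_replacements:
            break
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wn.synsets(tokens[i])
                    for lemma in synset.lemmas()
                    if lemma.name().lower() != tokens[i].lower()}
        if synonyms:
            tokens[i] = random.choice(sorted(synonyms))
            replaced += 1
    return " ".join(tokens)

print(augment_with_synonyms("Government announces economic stimulus package"))
```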
By addressing the challenges of temporal dynamics and semantic evolution, this work advances the field of NLP and provides a comprehensive framework for time-aware sentence classification. The proposed model opens new avenues for applications in trend analysis, historical text classification, and sentiment tracking, where understanding the temporal evolution of language is critical. Future work will focus on expanding the model to larger datasets, incorporating domain-specific temporal features, and exploring its applicability to multilingual and cross-lingual contexts. In conclusion, the proposed time-aware sentence classification model effectively bridges the gap between static text representations and evolving language semantics. It offers a robust and adaptable solution for time-sensitive NLP tasks, paving the way for deeper insights into the dynamic nature of language and its impact on modern text classification challenges.

Author Contributions

K.A.: Conceptualization; K.A. and G.A.-D.: Data curation; K.A.: Formal analysis; Funding acquisition; Investigation; Methodology; K.A. and A.A.M.: Project administration; Resources; Software; K.A., A.A.M. and G.A.-D.: Supervision; Validation; Visualization; Roles/Writing—original draft; Writing—review & editing. All authors have read and agreed to the published version of the manuscript.

Funding

The research leading to these results has received funding from the Mohammed Bin Rashid Smart Learning Program, UAE, under the Funding Agreement No: MBRSLP/02/23.

Data Availability Statement

The data used in this study are publicly available.

Acknowledgments

We are grateful to the American University of Ras Al Khaimah (AURAK) for its valuable support.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Liu, B. Sentiment Analysis and Opinion Mining; Springer Nature: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  2. Mayur, W.; Annavarapu, C.S.R.; Chaitanya, K. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 2022, 55, 5731–5780. [Google Scholar]
  3. Khaled, A.; Aysha, A.S. Experimental results on customer reviews using lexicon-based word polarity identification method. IEEE Access 2020, 8, 179955–179969. [Google Scholar]
  4. Bogery, R.; Nora, A.B.; Nida, A.; Nada, A.; Yara, A.H.; Irfan, U.K. Automatic Semantic Categorization of News Headlines using Ensemble Machine Learning: A Comparative Study. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 689–696. [Google Scholar] [CrossRef]
  5. Agarwal, J.; Sharon, C.; Aditya, P.; Anand, K.M.; Guru, P. Machine Learning Application for News Text Classification. In Proceedings of the 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 19–20 January 2023; pp. 463–466. [Google Scholar]
  6. Sukhramani, K.; Harshika, K.; Akhtar, R.; Abhishek, J. Binary Classification of News Articles using Deep Learning. In Proceedings of the IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 24–25 February 2024; pp. 1–9. [Google Scholar]
  7. Hiremath, B.N.; Malini, M.P. Enhancing Optimized Personalized Therapy in Clinical Decision Support System using Natural Language Processing. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 2840–2848. [Google Scholar] [CrossRef]
  8. Eguia, H.; Carlos, L.S.-B.; Franco, V.; Fernando, A.-L.; Francesc, S.-R. Clinical Decision Support and Natural Language Processing in Medicine: Systematic Literature Review. J. Med. Internet Res. 2024, 26, 39348889. [Google Scholar] [CrossRef]
  9. Palanivinayagam, A.; Claude, Z.E.-B.; Robertas, D. Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review. Algorithms 2023, 16, 236. [Google Scholar] [CrossRef]
  10. Sun, X.; Xiaoya, L.; Jiwei, L.; Fei, W.; Shangwei, G.; Tianwei, Z.; Guoyin, W. Text classification via large language models. arXiv 2023, arXiv:2305.08377. [Google Scholar]
  11. Duarte, J.M.; Berton, L. A review of semi-supervised learning for text classification. Artif. Intell. Rev. 2023, 56, 9401–9469. [Google Scholar] [CrossRef]
  12. Zhao, H.; Haihua, C.; Thomas, A.R.; Yunhe, F.; Debjani, S.; Hong-Jun, Y. Improving Text Classification with Large Language Model-Based Data Augmentation. Electronics 2024, 13, 2535. [Google Scholar] [CrossRef]
  13. Kamal, T.; Paul, D.Y.; Chan, Y.; Dirar, H.; Aya, T. A comprehensive survey of text classification techniques and their research applications: Observational and experimental insights. Comput. Sci. Rev. 2024, 54, 100664. [Google Scholar]
  14. Wu, Y.; Jun, W. A survey of text classification based on pre-trained language model. Neurocomputing 2025, 616, 128921. [Google Scholar] [CrossRef]
  15. Hyojung, K.; Minjung, P. Discovering fashion industry trends in the online news by applying text mining and time series regression analysis. Heliyon 2023, 9, 2405–8440. [Google Scholar]
  16. Hardin, G. Disinformation; Misinformation, and Fake News: The Latest Trends and Issues in Research. In Encyclopedia of Libraries, Librarianship, and Information Science, 1st ed.; Academic Press: Cambridge, MA, USA, 2025; pp. 519–530. [Google Scholar]
  17. Hu, T.; Siqin, W.; Wei, L.; Mengxi, Z.; Xiao, H.; Yingwei, Y.; Regina, L. Revealing public opinion towards COVID-19 vaccines with Twitter data in the United States: Spatiotemporal perspective. J. Med. Internet Res. 2021, 23, e30854. [Google Scholar] [CrossRef] [PubMed]
  18. Sun, G.; Cheng, Y.; Dong, F.; Wang, L.; Zhao, D.; Zhang, Z.; Tong, X. Multi-Label Text Classification model integrating Label Attention and Historical Attention. Knowl.-Based Syst. 2024, 296, 111878. [Google Scholar] [CrossRef]
  19. Ren, H.; Wang, H.; Zhao, Y.; Ren, Y. Time-Aware Language Modeling for Historical Text Dating. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-2023), Singapore, 6–10 December 2023. [Google Scholar]
  20. Khan, S.U.R.; Muhammd, A.I.; Muhammad, A.; Muhammad, A.I. Temporal specificity-based text classification for information retrieval. Turk. J. Electr. Eng. Comput. Sci. 2018, 26, 2916–2927. [Google Scholar] [CrossRef]
  21. He, Y.; Li, J.; Song, Y.; He, M.; Peng, H. Time-evolving Text Classification with Deep Neural Networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  22. Rabab, A.; Elena, K.; Arkaitz, Z. Building for tomorrow: Assessing the temporal persistence of text classifiers. Inf. Process. Manag. 2023, 60, 103200. [Google Scholar]
  23. Salles, T.; Leonardo, R.; Gisele, L.P.; Fernando, M.; Wagner, M.J.; Marcos, G. Temporally-aware algorithms for document classification. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland, 19–23 July 2010. [Google Scholar]
  24. Raneen, Y.; Abdul, H.; Zahra, A. MTS2Graph: Interpretable multivariate time series classification with temporal evolving graphs. Pattern Recognit. 2025, 152, 110486. [Google Scholar]
  25. Pokrywka, J.; Filip, G. Temporal language modeling for short text document classification with transformers. In Proceedings of the 2022 17th Conference on Computer Science and Intelligence Systems (FedCSIS), Sofia, Bulgaria, 4–7 September 2022; pp. 121–128. [Google Scholar]
  26. Santosh, T.; Vuong, T.; Grabmair, M. Time-aware Incremental Training for Temporal Generalization of Legal Classification Tasks. arXiv 2024, arXiv:2405.14211. [Google Scholar]
  27. Chen, X.; Qiu, P.; Zhu, W.; Li, H.; Wang, H.; Sotiras, A.; Wang, Y.; Razi, A. TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning. arXiv 2024, arXiv:2405.03140. [Google Scholar]
  28. Frank, E.; Bouckaert, R. Naive Bayes for Text Classification with Unbalanced Classes. In Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases, Berlin, Germany, 18–22 September 2006; Volume 4213. [Google Scholar]
  29. Jiang, L.; Li, C.; Wang, S.; Zhang, L. Deep feature weighting for naive Bayes and its application to text classification. Eng. Appl. Artif. Intell. 2016, 52, 26–39. [Google Scholar] [CrossRef]
  30. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  31. Ramesh, B.; Sathiaseelan, J.G.R. An Advanced Multi Class Instance Selection based Support Vector Machine for Text Classification. Procedia Comput. Sci. 2015, 57, 1124–1130. [Google Scholar] [CrossRef]
  32. Parashjyoti, B.; Deepak, G.; Barenya, B.H. ConCave-Convex procedure for support vector machines with Huber loss for text classification. Comput. Electr. Eng. 2025, 122, 109925. [Google Scholar]
  33. Wu, J. Introduction to Convolutional Neural Networks; National Key Lab for Novel Software Technology, Nanjing University: Nanjing, China, 2017; Volume 5, p. 495. [Google Scholar]
  34. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017. [Google Scholar]
  35. Zhao, Z.; Wu, Y. Attention-Based Convolutional Neural Networks for Sentence Classification. In Proceedings of the Interspeech, San Francisco, CA, USA, 8–12 September 2016; Volume 8, pp. 705–709. [Google Scholar]
  36. Vieira, J.; Paulo, A.; Raimundo, S.M. An analysis of convolutional neural networks for sentence classification. In Proceedings of the 2017 XLIII Latin American Computer Conference (CLEI), Córdoba, Argentina, 4–8 September 2017. [Google Scholar]
  37. Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
  38. Tarwani, K.M.; Swathi, E. Survey on recurrent neural network in natural language processing. Int. J. Eng. Trends Technol. 2016, 48, 301–304. [Google Scholar] [CrossRef]
  39. Tomas, M.; Kai, C.; Greg, C.; Jeffrey, D. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  40. Pennington, J.; Richard, S.; Christopher, D.M. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
  41. Giulianelli, M.; Marco, D.T.; Raquel, F. Analysing lexical semantic change with contextualised word representations. arXiv 2020, arXiv:2004.14118. [Google Scholar]
  42. Bhuwan, D.; Jeremy, R.C.; Julian, M. Time-Aware Language Models as Temporal Knowledge Bases. Trans. Assoc. Comput. Linguist. 2022, 10, 257–273. [Google Scholar]
  43. Rosin, G.D.; Ido, G.; Kira, R. Time masking for temporal language models. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Tempe, AZ, USA, 21–25 February 2022; pp. 833–841. [Google Scholar]
  44. Rosin, G.D.; Kira, R. Temporal attention for language models. arXiv 2022, arXiv:2202.02093. [Google Scholar]
  45. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL 2019, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  46. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://openai.com/research/language-unsupervised (accessed on 10 February 2025).
  47. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 5754–5764. [Google Scholar]
  48. Schuster, M.; Kuldip, K.P. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  49. Grisoni, F.; Michael, M.; Robin, L.; Gisbert, S. Bidirectional molecule generation with recurrent neural networks. J. Chem. Inf. Model. 2020, 60, 1175–1183. [Google Scholar] [CrossRef]
  50. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar]
  51. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
  52. Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (BBA)-Protein Struct. 1975, 405, 442–451. [Google Scholar] [CrossRef]
  53. Willmott, C.J.; Kenji, M. Advantages of the Mean Absolute Error (MAE) Over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  54. Sheng, L.; Lizhen, X. Topic Classification Based on Improved Word Embedding. In Proceedings of the 14th Web Information Systems and Applications Conference (WISA), Liuzhou, China, 11–12 November 2017; pp. 117–121. [Google Scholar]
  55. Luo, W. Research and Implementation of Text Topic Classification Based on Text CNN. In Proceedings of the 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 20–22 May 2022; pp. 1152–1155. [Google Scholar]
  56. Ding, H.; Yang, J.; Deng, Y.; Zhang, H.; Roth, D. Towards Open-Domain Topic Classification. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, Seattle, WA, USA, 10–15 July 2022; pp. 90–98. [Google Scholar]
  57. Ghazala, N.; Muhammad, M.K.; Muhammad, Y.; Bushra, Z.; Muhammad, K.H. Email spam detection by deep learning models using novel feature selection technique and BERT. Egypt. Inform. J. 2024, 26, 100473. [Google Scholar]
  58. Maged, N.; Faisal, S.; Aminu, D.; Abdulaziz, A.; Mohammed, A.-S. Topic-aware neural attention network for malicious social media spam detection. Alex. Eng. J. 2025, 111, 540–554. [Google Scholar]
  59. Mariano, M.; Fernando, D.; Fernando, T.; Ana, M.; Evangelos, M. Detecting ongoing events using contextual word and sentence embeddings. Expert Syst. Appl. 2022, 209, 118257. [Google Scholar]
  60. Elena, M.; Eva, W. Temporal construal in sentence comprehension depends on linguistically encoded event structure. Cognition 2025, 254, 105975. [Google Scholar]
  61. Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the EMNLP, Doha, Qatar, 25–29 October 2014; pp. 1746–1751. [Google Scholar]
  62. Nguyen, A.; Srijeet, C.; Sven, W.; Leo, S.; Martin, M.; Bjoern, E. Time matters: Time-aware lstms for predictive business process monitoring. In Proceedings of the Process Mining Workshops: ICPM 2020 International Workshops, Padua, Italy, 5–8 October 2020; pp. 112–123. [Google Scholar]
  63. Graves, A.; Navdeep, J.; Abdel-rahman, M. Hybrid speech recognition with deep bidirectional LSTM. In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, 8–12 December 2013; pp. 273–278. [Google Scholar]
  64. Bahdanau, D. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  65. Raffel, C.; Noam, S.; Adam, R.; Katherine, L.; Sharan, N.; Michael, M.; Yanqi, Z.; Wei, L.; Peter, J.L. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  66. Luo, M.; Xue, B.; Niu, B. A comprehensive survey for automatic text summarization: Techniques, approaches and perspectives. Neurocomputing 2024, 603, 128280. [Google Scholar] [CrossRef]
  67. Huang, Y.; Bai, X.; Liu, Q.; Peng, H.; Yang, Q.; Wang, J. Sentence-level sentiment classification based on multi-attention bidirectional gated spiking neural P systems. Appl. Soft Comput. 2024, 152, 111231. [Google Scholar] [CrossRef]
  68. Liu, Y. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  69. Röttger, P.; Janet, B.P. Temporal adaptation of BERT and performance on downstream document classification: Insights from social media. arXiv 2021, arXiv:2104.08116. [Google Scholar]
  70. Yao, Z.; Yifan, S.; Weicong, D.; Nikhil, R.; Hui, X. Dynamic word embeddings for evolving semantic discovery. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Los Angeles, CA, USA, 5–9 February 2018; pp. 673–681. [Google Scholar]
  71. Bamler, R.; Stephan, M. Dynamic word embeddings. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 380–389. [Google Scholar]
  72. Turner, R.E. An introduction to transformers. arXiv 2023, arXiv:2304.10557. [Google Scholar]
  73. Reimers, N. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084. [Google Scholar]
  74. Gururangan, S.; Ana, M.; Swabha, S.; Kyle, L.; Iz, B.; Doug, D.; Noah, A.S. Don’t stop pretraining: Adapt language models to domains and tasks. arXiv 2020, arXiv:2004.10964. [Google Scholar]
  75. Wang, J.; Jatowt, A.; Yoshikawa, M.; Cai, Y. BiTimeBERT: Extending Pre-Trained Language Representations with Bi-Temporal Information. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), Taipei, Taiwan, 23–27 July 2023. [Google Scholar]
  76. Gong, H.; Suma, B.; Pramod, V. Enriching word embeddings with temporal and spatial information. arXiv 2020, arXiv:2010.00761. [Google Scholar]
  77. Tang, X.; Yi, Z.; Danushka, B. Learning dynamic contextualised word embeddings via template-based temporal adaptation. arXiv 2023, arXiv:2208.10734. [Google Scholar]
  78. Go, A.; Richa, B.; Lei, H. Twitter Sentiment Classification Using Distant Supervision; CS224N Project Report; Stanford University: Stanford, CA, USA, 2009; Volume 1. [Google Scholar]
  79. Diederik, P.K.; Jimmy, B. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  80. Nagumothu, D.; Peter, W.E.; Bahadorreza, O.; Reda, B.M. Linked Data Triples Enhance Document Relevance Classification. Appl. Sci. 2021, 11, 6636. [Google Scholar] [CrossRef]
Figure 1. Time-aware sentence classification model.
Figure 2. Temporal-Aware Word Embeddings.
Figure 3. Benchmark Datasets Distribution: (a) Ireland-news-headlines, (b) Sentiment140, (c) 20 News Groups.
Figure 4. Performance (Learning Curves) of the proposed model on three datasets: (a) accuracy for Ireland-news-headlines, (b) loss for Ireland-news-headlines, (c) accuracy for Sentiment140, (d) loss for Sentiment140, (e) accuracy for 20 News Groups, and (f) loss for 20 News Groups.
Figure 5. Performance (ROC and Precision-Recall Curves) of the proposed model on three datasets: (a) ROC for Ireland-news-headlines, (b) precision-recall for Ireland-news-headlines, (c) ROC for Sentiment140, (d) precision-recall for Sentiment140, (e) ROC for 20 News Groups, and (f) precision-recall for 20 News Groups.
Figure 6. Performance (Correlation matrices) of the proposed model on three datasets: (a) Ireland-news-headlines, (b) Sentiment140, and (c) 20 News Groups.
Figure 7. Performance (True vs. Predicted) of the proposed model on three datasets: (a) Ireland-news-headlines, (b) Sentiment140, and (c) 20 News Groups.
Figure 8. Performance (Learning Curves (a,b), ROC Curves (c), and Precision-Recall Curves (d)) of the proposed model on the Ireland-news-headlines dataset with a larger temporal gap.
Figure 9. Performance of the proposed model on the compiled news categorization dataset: (a) per-class precision, recall, and F1-score, and (b) True vs. Predicted results.
Figure 10. Performance of the proposed model on the compiled news categorization dataset: (a) confusion matrix, and (b) prediction probability distribution by class.
Figure 11. Performance of the proposed model on the compiled news categorization dataset: (a) learning curve, and (b) loss curve.
Table 1. Statistical Information for Benchmark Datasets.

| Dataset | Sentences | Time Span | Avg. Sentence Length (Tokens/Words) | Labels |
|---|---|---|---|---|
| Ireland-news-headlines | 1,610,000 | 1996–2021 | 15 | news, culture, opinion, business, sport, lifestyle |
| Sentiment140 | 1,600,000 | 2009–2020 | 13 | Positive, Negative |
| 20 News Groups | 18,846 | 1995–2000 | 150 | 20 domain-agnostic topic categories |
Table 2. Performance results (%) of the proposed time-aware classification model on the used datasets.

| Dynamic Embedding Techniques | Acc | CK | F1 Macro | F1 Micro | MCC | MSE | MAE |
|---|---|---|---|---|---|---|---|
| Ireland-news-headlines dataset | | | | | | | |
| Aggregate-based Embedding Fusion | 0.92 | 0.89 | 0.90 | 0.92 | 0.89 | 0.75 | 0.23 |
| Time-weighted Embedding Fusion | 0.91 | 0.88 | 0.89 | 0.91 | 0.88 | 0.83 | 0.25 |
| Self-Attention with Temporal Positional Encoding | 0.92 | 0.89 | 0.90 | 0.92 | 0.89 | 0.75 | 0.23 |
| Sentiment140 dataset | | | | | | | |
| Aggregate-based Embedding Fusion | 0.87 | 0.74 | 0.87 | 0.87 | 0.74 | 0.13 | 0.13 |
| Time-weighted Embedding Fusion | 0.92 | 0.81 | 0.91 | 0.92 | 0.80 | 0.09 | 0.09 |
| Self-Attention with Temporal Positional Encoding | 0.89 | 0.78 | 0.87 | 0.89 | 0.78 | 0.11 | 0.11 |
| 20 News Group dataset | | | | | | | |
| Aggregate-based Embedding Fusion | 0.73 | 0.71 | 0.71 | 0.73 | 0.72 | 11.5 | 1.30 |
| Time-weighted Embedding Fusion | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 | 3.2 | 0.80 |
| Self-Attention with Temporal Positional Encoding | 0.82 | 0.81 | 0.82 | 0.82 | 0.82 | 6.3 | 0.97 |
Table 3. Performance results (%) of the proposed time-aware classification model with a larger temporal gap on the Ireland-news-headlines dataset.

| Dynamic Embedding Techniques | Acc | CK | F1 Macro | F1 Micro | MCC | MSE | MAE |
|---|---|---|---|---|---|---|---|
| Time-weighted Embedding Fusion | 0.91 | 0.87 | 0.89 | 0.91 | 0.87 | 0.92 | 0.27 |
Table 4. Performance results (% accuracy) of proposed time-aware classification model and baselines.

| Approaches | System | Accuracy (%) | Datasets |
|---|---|---|---|
| Time-Aware Classification | RoBERTa with date as text [25] | 87.84 | Ireland-news-headlines |
| | | 91.13 | Sentiment140 |
| | RoBERTa with added embeddings [25] | 86.82 | Ireland-news-headlines |
| | | 91.04 | Sentiment140 |
| | RoBERTa with stacked embeddings [25] | 87.65 | Ireland-news-headlines |
| | | 91.1 | Sentiment140 |
| | Text GloVe with Triples [80] | 86.9 | 20 News Group |
| | Text BERT with Triples [80] | 77.7 | 20 News Group |
| Traditional/Static Classification | RoBERTa without date [25] | 82.35 | Ireland-news-headlines |
| | | 89.27 | Sentiment140 |
| | Most frequent class baseline [25] | 51.10 | Ireland-news-headlines |
| | | 49.88 | Sentiment140 |
| | Naive Bayes Classifier [25] | 85.00 | Sentiment140 |
| | Hybrid Stacked Ensemble Model [25] | 99.00 | Sentiment140 |
| Proposed Classification Model | Time-Aware Sentence Classification Model with traditional/static word embeddings | 85.6 | Ireland-news-headlines |
| | | 82.1 | Sentiment140 |
| | | 77.00 | 20 News Group |
| | Time-Aware Sentence Classification Model with dynamic word embeddings | 92.00 | Ireland-news-headlines |
| | | 92.00 | Sentiment140 |
| | | 85.00 | 20 News Group |
Table 5. A compiled dataset for news categorization and trend analysis task.

| Timestamp | News Headlines | Category |
|---|---|---|
| 1 January 2020 | Government announces economic stimulus package. | News |
| 8 January 2020 | World leaders meet to discuss global climate goals. | News |
| 15 January 2020 | New policies aim to stabilize economic growth. | News |
| 22 January 2020 | Climate crisis takes center stage at global summit. | News |
| 29 January 2020 | Global literacy programs gain global traction. | News |
| 1 June 2020 | Remote work should remain post-pandemic. | Opinion |
| 10 June 2020 | Is technology making us more disconnected from reality? | Opinion |
| 17 June 2020 | The future of work: Balancing remote and in-office setups. | Opinion |
| 24 June 2020 | Ethical concerns in AI dominate opinion columns. | Opinion |
| 1 July 2020 | Healthcare debates spark discussions on innovation. | Opinion |
| 5 February 2020 | Football team wins championship after dramatic final. | Sport |
| 12 February 2020 | Olympics postponed amid global uncertainty. | Sport |
| 19 February 2020 | Local soccer team secures thrilling playoff victory. | Sport |
| 26 February 2020 | World championships highlight resilience amid challenges. | Sport |
| 4 March 2020 | Global sports events thrive with new safety measures. | Sport |
| 11 March 2020 | Tech industry sees surge in remote collaboration tools. | Business |
| 18 March 2020 | Stock market rallies as companies report strong earnings. | Business |
| 25 March 2020 | E-commerce giants report record profits in Q2 earnings. | Business |
| 1 April 2020 | Tech startups lead innovation in green energy solutions. | Business |
| 8 April 2020 | Small businesses adopt AI for competitive edge. | Business |
| 22 April 2020 | Art exhibition explores human connection in isolation. | Culture |
| 29 April 2020 | Digital concerts gain popularity among younger audiences. | Culture |
| 6 May 2020 | Local artists adapt to virtual mediums during pandemic. | Culture |
| 13 May 2020 | Virtual reality performances redefine cultural experiences. | Culture |
| 20 May 2020 | Museum curates digital exhibitions for global audiences. | Culture |