1. Introduction
Deep learning has rapidly transformed a wide range of real-world applications, delivering demonstrable gains in industries such as telecommunications, finance, healthcare, natural language processing (NLP), and intelligent robotics. Large language models (LLMs), particularly Bidirectional Encoder Representations from Transformers (BERT), are now crucial tools in NLP, enabling context-aware and semantically rich text processing [1,2,3]. Traditional sentiment analysis approaches often struggle to interpret nuanced language, which leads to misclassification. To address such limitations, models like BERT adopt deep bidirectional architectures, offering improved accuracy over conventional statistical or rule-based methods [4,5]. These developments underscore the growing need for advanced text-mining solutions capable of handling large-scale, user-generated datasets such as customer reviews and online feedback [6]. From an applied perspective, robust sentiment classification of technical review data is increasingly important for engineering monitoring, product improvement, customer feedback analysis, and industrial decision-support systems. This practical relevance aligns with the growing use of applied artificial intelligence methods for domain-specific text analytics in real-world engineering environments.
As digital platforms generate increasing volumes of user content, the need for more adaptable sentiment analysis tools continues to grow. Existing techniques often fail to capture complex linguistic structures, implicit sentiment cues, or dynamic sentiment patterns in specialized text. In this context, we propose a hybrid framework that combines fractional derivative-based feature extraction, transformer-based models, and ensemble learning. Specifically, we extend our previous BERT + Logistic Regression (BERTLR) model by incorporating a RoBERTa-based variant (RoBERTaLR) and a unified ensemble using soft voting. This framework combines the semantic richness of LLMs with the mathematical depth of fractional calculus, offering improved expressiveness for classification tasks [7,8]. However, limited attention has been given to integrating fractional mathematical modeling with transformer-based embeddings for sentiment classification in technical and domain-specific settings, where sentiment is often expressed through sparse, context-dependent, and terminology-rich language.
Although transformer models such as BERT and RoBERTa excel in modeling contextual semantics via self-attention, they may not fully capture long-memory effects or non-linear lexical variations inherent in domain-specific language. This limitation is particularly evident when analyzing sentiment-rich engineering feedback, short technical reviews, or ambiguous expressions in which polarity depends not only on semantics but also on subtle term interactions. Fractional calculus offers a rigorous mathematical tool to address this gap by modeling memory-dependent behaviors and capturing long-range dependencies. By transforming conventional TF and TF-IDF features using fractional derivatives, we enrich the representation space before classification. This dual-channel approach, combining deep contextual embeddings with fractional statistical features, enables the model to preserve semantic information while also incorporating non-local lexical dynamics that are often overlooked in standard transformer-only sentiment pipelines.
Ensemble learning has been shown to be an effective approach for increasing prediction robustness by aggregating multiple models. Methods such as bagging, Random Forest, and boosting (including AdaBoost) improve sentiment classification accuracy by reducing variance or bias and, in turn, generalization error [9,10]. Nevertheless, simply combining several existing components does not automatically yield a meaningful methodological contribution. The key challenge is to design a fusion mechanism in which the components provide complementary information rather than redundant predictions. Building on this perspective, our proposed system employs both hard and soft voting mechanisms to unify the outputs from BERTLR and RoBERTaLR, while the fractional derivative-enhanced channel introduces an additional feature representation that complements transformer embeddings instead of merely duplicating their role. In this sense, the proposed framework is not merely an ensemble of classifiers but a structured hybrid representation model for engineering sentiment analysis.
Sentiment analysis is an important technique for organizations seeking actionable insights from consumer feedback. Machine learning (ML) and deep learning (DL) models have emerged as effective methods for automated sentiment recognition and trend forecasting. However, identifying the most effective approach remains challenging due to the diversity of available algorithms, each with varying levels of interpretability and performance across domains [11,12]. Comparative research highlights the need for hybrid methods that can adapt to dataset-specific nuances while balancing generalization and accuracy [7,11,13]. In engineering review analysis, these challenges are further amplified by highly specific terminology, irregular sentiment markers, and domain-dependent expressions, motivating models that combine semantic understanding with mathematically grounded feature enrichment. As shown in Figure 1, the proposed sentiment classification framework integrates BERTLR and RoBERTaLR models.
Recent advances in hybrid modeling, which combine machine learning and deep learning approaches, have produced promising results in sentiment classification. Support vector machines (SVM), random forests (RF), artificial neural networks (ANN), and gradient-boosted regression trees (GBRT) have proven effective in high-dimensional textual domains [9,11,14]. Integrating fractional derivative-based transformations into these frameworks adds granularity to feature extraction and can further improve classification accuracy.
In addition, recent research has explored sentiment analysis within engineering design and product development contexts, where fine-grained sentiment information is used to guide optimization and redesign processes. For example, ontology-based sentiment analysis combined with evolutionary design strategies has been applied to extract aesthetic and functional preferences from user feedback, enabling intelligent product redesign [15]. While such approaches demonstrate the importance of domain-specific sentiment modeling in engineering applications, they primarily rely on structured semantic representations and optimization-driven frameworks, without explicitly capturing memory-dependent lexical interactions.
Despite this promise, prior studies have rarely examined whether fractional-order feature modeling can provide a complementary signal to transformer embeddings in a unified and interpretable sentiment analysis framework. This unresolved issue forms the central motivation of the present work. Accordingly, this study is positioned not only as a methodological contribution in hybrid AI-based sentiment modeling but also as an applied framework for engineering-oriented text intelligence and decision support.
In this study, we analyze and compare the performance of different ML and DL models on a public engineering review dataset, with the extended ensemble, which includes both BERT and RoBERTa models, serving as the primary architecture. We evaluate classification performance using common metrics including precision, recall, F1-score, and accuracy. The key contributions of this study are summarized below:
A hybrid semantic–fractional framework that combines BERT and RoBERTa embeddings with logistic regression and fractional derivative-based feature transformation, enabling the integration of contextual semantic information and memory-aware statistical text representations.
A mathematically motivated feature enrichment strategy in which fractional derivative-enhanced TF-IDF is used to model non-local and long-range lexical variations that are not explicitly captured by standard TF-based or embedding-only sentiment pipelines.
An extended ensemble design that includes BERTLR, RoBERTaLR, and soft-voting integration, allowing us to examine both individual and complementary contributions of transformer-based variants within a unified framework.
Empirical validation across multiple engineering sentiment datasets showing that fractional derivative-enhanced TF-IDF consistently outperforms standard TF representations and provides a useful complementary signal to transformer embeddings.
An interpretable evaluation setting for industrial-scale opinion mining and decision support, supported by comparative experiments, ablation-style analysis, and explainability-oriented assessment of the proposed framework.
This paper is organized as follows. Section 2 examines the current state of the art. Section 3 outlines the methodology and implementation details. Section 4 analyses the experimental results. Section 5 concludes with future directions.
2. Related Work
The growing complexity of user-generated content and large-scale online feedback has motivated the development of advanced classification frameworks capable of extracting reliable and actionable information from text. A recent study introduced an ensemble detection architecture utilizing a voting classifier that incorporates 11 machine learning algorithms, such as Naïve Bayes and K-Nearest Neighbours (K-NN). Following cross-validation, the three most effective classifiers were integrated to create a robust ensemble model. The framework attained a classification accuracy of 94.5% and demonstrated robust performance as indicated by ROC curves, recall, and F1-score [15]. Although originally developed for fake news detection, this study demonstrates the broader effectiveness of ensemble-based classification for complex text mining and AI-driven decision-support systems.
Misinformation and deceptive user content continue to pose significant challenges across online platforms, with broad socio-political and economic implications. Ensuring the credibility of digital content has therefore become essential, while traditional classifiers often struggle to adapt to evolving text patterns [16]. In sentiment analysis, particularly for customer reviews, this challenge is equally important because unreliable or noisy language can distort sentiment models and reduce classification quality. Basic bag-of-words (BoW) models provide sparse representations and often fail to capture semantic richness, even when extended with neural embeddings [17]. This limitation has encouraged the design of more expressive feature extraction and model fusion strategies.
To address these shortcomings, multi-view learning frameworks that combine bag-of-n-grams with parallel convolutional neural networks (CNNs) have been developed. These models incorporate embedding layers with small-kernel convolutions, improving their ability to capture local semantic dependencies and increasing robustness in text classification tasks. In the context of fraudulent review detection, the integration of textual and behavioral features has produced F1-scores of up to 92% on datasets such as Yelp Filtered Reviews [18]. These findings suggest that hybrid models can benefit from combining heterogeneous signals rather than relying on a single feature family.
Cognitive computing and computational modeling are increasingly applied in sentiment-driven applications, particularly in domains such as finance, telecommunications, and business intelligence. The quality and structure of textual data play a central role in model performance. One study found that high readability and concise review length significantly improved the accuracy of models such as SRN, CNN, and LSTM across benchmark sentiment datasets [19]. Regression-based analyses further confirmed that readability-related variables correlate with improved classification outcomes, while time-sensitive sentiment evaluation strategies often outperform simple preprocessing-only pipelines [20]. These studies indicate that sentiment classification benefits not only from stronger classifiers but also from richer and more informative representations of text.
Beyond binary polarity prediction, recent work has focused on improving the granularity and robustness of sentiment classification through hybrid modeling. To enhance classification effectiveness, ensemble systems have been developed by combining deep learning and classical machine learning [21]. Among ensemble strategies, boosting-based approaches have often outperformed bagging, with optimized combinations yielding superior results [22]. Similarly, hybrid models that fuse machine learning and deep learning components, particularly those incorporating CNNs, have demonstrated robust performance in sentiment and misinformation detection tasks. These results support the view that ensemble learning can improve sentiment analysis when the constituent models contribute complementary information.
Recent ensemble classifiers have combined multiple base learners, including SVC, Random Forest, Decision Tree, and MLP, within unified architectures. The addition of deep learning components has further strengthened model resilience on large-scale datasets, improving precision, recall, and F1-score [23]. However, the dynamic and informal nature of user-generated language remains a major challenge. Research targeting platforms such as Twitter and Facebook emphasizes the need for architectures that can better manage linguistic variability, implicit sentiment, and domain-dependent expressions [24]. This challenge becomes even more pronounced in engineering review corpora, where sentiment is often embedded in technical terminology and short context-dependent statements.
Advanced feature extraction techniques, including TF-IDF, BoW, Word2Vec, and BERT embeddings, remain central to modern sentiment classification. When combined with classifiers such as LR, NB, SVM, and XGBoost, or with hybrid neural models, these representations can substantially improve predictive performance. Comparative investigations involving TF-IDF, LSA, and LDA found that TF-IDF is particularly effective for large datasets, whereas smaller datasets require additional semantic modeling to achieve competitive performance [25]. Ensemble deep learning frameworks have also outperformed traditional pipelines in multiple settings. For instance, random forests achieved 87.1% accuracy, while SVM classifiers showed stronger precision and F1-score in some tasks, highlighting the advantages of hybrid AI approaches [26]. Nevertheless, most of these studies still rely on standard lexical or embedding-based representations and do not explicitly model non-local memory effects in text.
Modern sentiment analysis increasingly underpins automated decision-making across industries. Traditional methods often suffer from manual feature engineering bias and limited adaptability, whereas the integration of ML and DL techniques offers scalable alternatives for customer feedback analytics and business intelligence systems [27]. In addition, computational intelligence approaches, including machine learning-enabled liquid democracy models, illustrate the broader potential of AI-assisted decision frameworks in adaptive classification settings [28]. These developments reinforce the importance of sentiment models that are not only accurate but also sufficiently flexible and interpretable for real-world deployment.
With the emergence of LLMs, BERT has reshaped NLP tasks through pre-trained contextual embeddings and task-specific fine-tuning. Recent studies have further improved sentiment classification by leveraging sentence pairing and transformer-based contextualization [29]. Compared with earlier shallow or CNN-based sentiment models, transformer architectures provide substantially richer semantic representations. However, transformer embeddings alone may not fully characterize non-linear lexical variation or long-memory effects in domain-specific review text. Our approach therefore combines deep learning and classical machine learning within a hybrid ensemble architecture and introduces a soft-voting integration strategy to enhance sentiment classification performance [30]. In parallel, fractional calculus, as an extension of classical differentiation, has recently been explored for AI-based modeling. Numerical computation of fractional derivatives has shown promise in engineering applications, especially for extracting richer and more non-local features in AI-based text classification settings [31,32].
Despite these advances, three important gaps remain. First, prior sentiment analysis studies rarely integrate transformer-based contextual embeddings with fractional-order lexical feature transformation in a unified framework. Second, many hybrid sentiment models combine components heuristically without clearly exploiting complementary semantic and memory-aware statistical representations. Third, limited work has examined this problem in technical or engineering review domains, where sentiment is often expressed through specialized terminology, implicit evaluation, and context-sensitive wording. These gaps motivate the proposed framework, which integrates BERT/RoBERTa representations with fractional derivative-enhanced TF-IDF features to construct a more expressive and interpretable sentiment classification pipeline.
3. Materials and Methods
The dataset was analyzed and classified using machine learning (ML) and deep learning (DL) approaches and evaluated with key performance metrics: precision, recall, F1-score, and accuracy. A comparative analysis was conducted between individual ML/DL models and ensemble learning classifiers, incorporating fractional derivative-based feature extraction along with hard and soft voting strategies. In addition, further experiments were conducted using BERT and RoBERTa individually, as well as a BERT–RoBERTa ensemble, to improve the robustness of sentiment prediction.
Various classification and regression models from ML and DL domains were explored to identify optimal methods for sentiment prediction. This section provides both a theoretical and technical overview of the research methodology, detailing the analytical framework used to process Google Play Store reviews.
The dataset was originally collected through an application programming interface (API), enabling programmatic extraction of user-generated reviews and ratings. Because of the time elapsed since collection, the exact API version is no longer available. The collected data were subsequently preprocessed and analyzed using Python (version 3.9).
3.1. Dataset
The study utilizes a large-scale dataset of user reviews and ratings from multiple applications across diverse categories, as depicted in Figure 2. The dataset comprises 322,528 reviews spanning six distinct categories (Action, Casual, Entertainment, Music & Audio, Photography, and Card), providing a broad representation of user sentiment across application domains. The data were systematically processed and visualized to enhance interpretability, facilitating a deeper understanding of sentiment trends and linguistic patterns.
Figure 3 illustrates the rating system employed to assess the performance of the selected classifiers. To ensure data integrity and compliance, the dataset sources align with publicly available and ethically obtained user-generated content. Although the dataset is derived from Google Play Store reviews rather than a traditional engineering benchmark, it is suitable for evaluating engineering-oriented sentiment analysis because many application reviews contain functional, technical, and performance-related feedback. Users frequently comment on reliability, usability, efficiency, interface behavior, compatibility, and system responsiveness, which are closely related to engineering and product quality assessment. Therefore, the dataset provides a realistic applied setting for studying sentiment classification in technically oriented review text. In this context, the term “engineering sentiment analysis” is used in an applied sense, referring to sentiment evaluation of technically oriented user feedback rather than strictly domain-specific engineering corpora.
3.2. Methodology
This study proposes a hybrid sentiment classification methodology that integrates logistic regression (LR) with transformer-based language representations and fractional-order feature modeling. In particular, Bidirectional Encoder Representations from Transformers (BERT) and its optimized variant RoBERTa are used to capture contextual semantic representations of review text, while fractional derivative-enhanced lexical features are employed to model statistical and memory-dependent patterns in textual data. Unlike conventional sentiment pipelines that rely solely on transformer embeddings or traditional TF-IDF representations, the proposed framework combines semantic embeddings and fractional-order feature transformations within a unified ensemble architecture. This design allows the model to leverage both contextual language understanding and mathematically enriched lexical representations.
To evaluate the effectiveness of this approach, several classifiers and feature extraction techniques were examined. The evaluated models include Support Vector Machine Classifier (SVMC), Decision Tree Classifier (DTC), Random Forest Classifier (RFC), Naïve Bayes Classifier (NBC), Logistic Regression Classifier (LRC), AdaBoost Classifier (ABC), and transformer-based architectures such as BERT. In addition to these individual models, the proposed framework introduces extended variants based on BERTLR and RoBERTaLR as well as a hybrid ensemble combining both representations. These models were evaluated using both term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features, together with ensemble decision mechanisms including soft voting and hard voting. The overall workflow of the proposed classification framework is illustrated in Figure 4.
The first phase of this study involved preprocessing the dataset. Application reviews were collected from the Google Play Store using API-based access and publicly available data sources. The collected reviews were then preprocessed to clean and normalize the text: unwanted characters were removed, leading spaces were trimmed, excessive spaces were eliminated, and the text was converted to lowercase. The following figure illustrates the classification mechanism for Google Play Store reviews.
To further improve the quality of textual representations, stop-word removal and stemming techniques were applied during preprocessing. After refining the text, a Bag-of-Words (BoW) representation was initially constructed, followed by term frequency (TF) analysis using Python (version 3.9). The widely used term frequency-inverse document frequency (TF-IDF) approach was then employed to produce more informative lexical representations. TF-IDF provides a weighting mechanism that highlights informative terms while reducing the impact of common or non-discriminative words.
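The normalization steps described above (character removal, whitespace trimming and collapsing, lowercasing, stop-word removal, and stemming) can be sketched in standard-library Python as follows. This is an illustrative sketch, not the study's code: the tiny `STOP_WORDS` set and the suffix-stripping `toy_stem` function are hypothetical stand-ins for a full stop-word list and a real stemmer such as Porter's, and the character filter keeps only ASCII letters and digits, an assumption suited to English reviews.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only; a real
# pipeline would use a full English list (e.g., from NLTK or scikit-learn).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "or", "to", "of", "this", "i"}

# Crude suffix stripper standing in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review: str) -> list[str]:
    """Normalize one review and return its stemmed, stop-word-free tokens."""
    text = review.lower()                     # convert to lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remove unwanted characters
    text = re.sub(r"\s+", " ", text).strip()  # collapse excess spaces, trim ends
    tokens = [t for t in text.split(" ") if t and t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]
```

For example, `preprocess("  The app CRASHES when loading!!  ")` yields `["app", "crash", "when", "load"]`.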
Figure 4 illustrates the TF-IDF-based representation used prior to the feature extraction stage.
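The TF (bag-of-words counts) and TF-IDF representations can be computed as sketched below. The sketch uses the smoothed IDF and L2 row normalization that scikit-learn's `TfidfVectorizer` applies by default; whether the study used these exact settings is not stated, so treat them as assumptions. The function names are illustrative.

```python
import math
from collections import Counter


def term_frequencies(docs: list[list[str]]) -> list[Counter]:
    """Raw term counts per document (the TF / bag-of-words representation)."""
    return [Counter(doc) for doc in docs]


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """TF-IDF with smoothed IDF and L2 row normalization.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, mirroring the defaults of
    scikit-learn's TfidfVectorizer (smooth_idf=True, norm="l2").
    """
    n = len(docs)
    vocab = sorted({t for doc in docs for t in doc})
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for counts in term_frequencies(docs):
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by 0
        rows.append([v / norm for v in row])
    return vocab, rows
```

Rarer terms receive larger weights: in `tfidf([["app", "crash"], ["app", "great"]])`, the shared term `app` gets an IDF of 1.0, while `crash` and `great` each get ln(3/2) + 1 ≈ 1.41.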
In addition to conventional TF and TF-IDF representations, this study incorporates fractional derivative-based transformations to enrich lexical features. Fractional calculus extends classical differentiation to non-integer orders and has been shown to capture long-memory effects and non-local relationships in data representations. By applying fractional derivatives to TF and TF-IDF representations, the proposed framework enhances the sensitivity of feature vectors to subtle lexical variations and term dependencies. This transformation produces a richer representation space that complements the contextual semantic embeddings produced by transformer models.
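Algorithm 1 names Grünwald–Letnikov (GL) differencing as the discrete fractional operator. A minimal sketch of truncated GL differencing applied to a single TF or TF-IDF vector follows. It assumes a unit step (h = 1) and that the operator runs along the term axis of each document vector in vocabulary order; the source does not specify the axis, and `window` is a hypothetical truncation parameter.

```python
from typing import Optional


def gl_weights(alpha: float, n_terms: int) -> list[float]:
    """Grünwald–Letnikov coefficients w_k = (-1)^k * binom(alpha, k), k = 0..n_terms-1."""
    w = [1.0]
    for k in range(1, n_terms):
        w.append(w[-1] * (k - 1 - alpha) / k)  # w_k = w_{k-1} * (k - 1 - alpha) / k
    return w


def gl_fractional_diff(x: list[float], alpha: float, window: Optional[int] = None) -> list[float]:
    """Truncated GL fractional difference of a 1-D feature vector with step h = 1.

    y[i] = sum_{k=0}^{min(i, window - 1)} w_k * x[i - k]
    """
    window = len(x) if window is None else window
    w = gl_weights(alpha, window)
    return [sum(w[k] * x[i - k] for k in range(min(i + 1, window))) for i in range(len(x))]
```

For alpha = 1 the weights reduce to (1, -1, 0, ...), an ordinary first difference, and for alpha = 0 the transform is the identity. Fractional orders in between add a slowly decaying tail of earlier entries (for alpha = 0.5: 1, -0.5, -0.125, -0.0625, ...), which is the long-memory effect referred to above.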
Ensemble learning, a powerful machine learning strategy, was employed to improve predictive performance by integrating multiple classifiers. Instead of relying on a single model, the ensemble framework aggregates the predictions of multiple models that capture different aspects of the input data. In the proposed framework, transformer-based models capture semantic context, while fractional feature transformations enhance lexical representation. Their integration within an ensemble architecture enables complementary learning signals to contribute to the final prediction.
Two ensemble decision strategies were implemented:
Hard voting: Each classifier in the ensemble casts a vote for a class, and the class receiving the majority of votes is selected as the final prediction.
Soft voting: Each classifier generates a probability distribution over classes, and the class with the highest summed (equivalently, averaged) probability across classifiers is selected as the final prediction.
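The two rules can disagree, as the following standard-library sketch illustrates. The function names are hypothetical, and ties are broken toward the smallest class index, an assumption the text does not specify.

```python
from collections import Counter


def hard_vote(labels: list[int]) -> int:
    """Majority vote over predicted class labels (ties go to the smallest label)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def soft_vote(probas: list[list[float]]) -> int:
    """Average the class-probability vectors and return the argmax class."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])  # first max on ties
```

With three binary classifiers whose class-1 probabilities are 0.55, 0.60, and 0.05, hard voting returns class 1 (two votes to one), whereas soft voting returns class 0 because the average class-1 probability is only 0.40. Soft voting thus lets a single highly confident model outweigh two marginal ones.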
By combining fractional derivative-based lexical representations with transformer-based contextual embeddings, the proposed BERTLR framework provides a hybrid semantic–statistical learning architecture. The fractional feature transformation acts as a complementary signal to the contextual embeddings learned by BERT and RoBERTa, enabling the system to capture both semantic meaning and long-range lexical patterns within review text. This integration allows the framework to better interpret subtle sentiment cues often present in technical review datasets.
The ensemble learning framework was implemented in Python using widely adopted machine learning libraries including scikit-learn, TensorFlow, and the Hugging Face Transformers library. Logistic regression (LR), BERT, and RoBERTa models were integrated within both hard and soft voting ensemble structures. Fractional derivative-based feature extraction was applied during the preprocessing stage to enrich the textual feature space prior to classification. Model performance was evaluated using standard metrics including accuracy, precision, recall, and F1-score, enabling a comprehensive comparison between individual classifiers and the proposed hybrid ensemble framework.
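For reference, the four metrics can be computed from confusion-matrix counts as sketched below. This assumes binary sentiment labels with class 1 treated as positive; the text does not state the label scheme or, for more than two classes, the averaging method (e.g., macro or weighted), so `binary_metrics` is an illustrative helper rather than the study's evaluation code.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}
```

For example, with true labels `[1, 1, 1, 0]` and predictions `[1, 1, 0, 0]`, precision is 1.0, recall is 2/3, F1 is 0.8, and accuracy is 0.75.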
3.3. Ensemble Learning
Ensemble learning is a robust technique that improves machine learning performance by aggregating predictions from multiple models. In traditional sentiment classification pipelines, ensemble models typically combine several classifiers to improve predictive stability and reduce variance. However, such ensembles often rely on homogeneous representations, where each model processes the same underlying feature space. In contrast, the proposed BERTLR framework introduces a hybrid representation strategy in which complementary feature families are integrated within the ensemble architecture. Specifically, transformer-based contextual embeddings capture semantic meaning in review text, while fractional derivative-enhanced lexical features introduce a mathematically enriched representation capable of modeling long-range dependencies and non-local lexical relationships.
The proposed BERTLR method therefore extends conventional ensemble learning by integrating fractional calculus-based feature transformation with transformer embeddings. Instead of simply combining multiple classifiers, the framework fuses heterogeneous representations that encode different linguistic properties. Transformer embeddings capture contextual semantic information, whereas fractional-order transformations enrich TF-IDF features by modeling memory-aware lexical relationships. This dual-channel representation allows the ensemble model to exploit complementary signals that are not captured when using transformer embeddings or statistical features independently.
Mathematical Formulation: The fractional derivative, denoted as $D^{\alpha}$, is a generalization of the ordinary derivative to non-integer orders. It allows long-term dependencies and subtle variations in review data to be modeled. The fractional derivative-based enhancement applied to BERTLR is described by the following formula:
where:
$\alpha$ is the order of the derivative (non-integer).
$\Gamma(\cdot)$ is the Gamma function, which generalizes the factorial function.
$f(t)$ represents the feature extracted from the review data.
$\tau$ is the integration variable.
This formulation enables fractional calculus to capture long-memory characteristics that classical integer-order differentiation cannot represent: an integer-order derivative depends only on the local behavior of the feature signal near $t$, whereas the fractional derivative aggregates the entire history $f(\tau)$, $0 \le \tau \le t$, through a power-law kernel. In the context of text classification, fractional operators introduce a transformation that emphasizes subtle term variations and contextual dependencies within TF-IDF representations. As a result, the feature space becomes more expressive, allowing the classification model to better discriminate sentiment signals embedded in domain-specific language.
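For reference, the two standard integral definitions built from an order $\alpha$, the Gamma function $\Gamma$, a feature signal $f(t)$, and an integration variable $\tau$ are the Riemann–Liouville and Caputo derivatives, given here for $0 < \alpha < 1$ with lower terminal 0. The Grünwald–Letnikov limit form underlies the GL differencing named in Algorithm 1, and fixing the step at $h = 1$ yields the discrete weights used in practice. These are textbook definitions; which variant the framework adopts is not specified in this section.

$$D^{\alpha} f(t) = \frac{1}{\Gamma(1-\alpha)} \frac{d}{dt} \int_{0}^{t} \frac{f(\tau)}{(t-\tau)^{\alpha}} \, d\tau \qquad \text{(Riemann–Liouville)}$$

$${}^{C}D^{\alpha} f(t) = \frac{1}{\Gamma(1-\alpha)} \int_{0}^{t} \frac{f'(\tau)}{(t-\tau)^{\alpha}} \, d\tau \qquad \text{(Caputo)}$$

$$D^{\alpha} f(t) = \lim_{h \to 0} h^{-\alpha} \sum_{k=0}^{\lfloor t/h \rfloor} (-1)^{k} \binom{\alpha}{k} f(t - kh) \qquad \text{(Grünwald–Letnikov)}$$

$$y_i = \sum_{k=0}^{i} w_k \, x_{i-k}, \qquad w_k = (-1)^{k} \binom{\alpha}{k}, \quad w_0 = 1, \quad w_k = w_{k-1} \, \frac{k - 1 - \alpha}{k}$$

The kernel $(t-\tau)^{-\alpha}$ decays as a power law instead of vanishing outside a neighborhood of $t$, which is the formal source of the long-memory behavior; the discrete weights $w_k$ inherit this slow decay.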
Algorithm 1 summarises the full BERTLR pipeline, including preprocessing, fractional GL differencing, BERT CLS embeddings, concatenation, LR training, and optional RoBERTa ensemble voting.
Integration into BERTLR: The BERTLR model employs fractional derivative transformations on TF-IDF features to enhance the model’s capacity to capture nuanced patterns in review data. This transformation enriches the lexical feature representation before it is combined with contextual embeddings generated by transformer models. The integration of these two feature channels enables the system to simultaneously model semantic context and memory-aware lexical dynamics. Consequently, the ensemble does not simply aggregate predictions from multiple models but integrates complementary feature representations that contribute to improved discrimination performance.
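Algorithm 1 BERTLR pipeline: preprocessing, fractional features, embeddings, and ensemble prediction
Require: corpus $\mathcal{D}$; FD order $\alpha$
1: for each review $d$ in $\mathcal{D}$ do
2:   $d \leftarrow$ lowercase, de-punctuate, stopword remove, stem/lemmatize
3: end for
4: $X \leftarrow$ TF-IDF($\mathcal{D}$)
5: $X_{\alpha} \leftarrow \mathrm{GL}_{\alpha}(X)$ ▹ Grünwald–Letnikov fractional differencing
6: $E_{B} \leftarrow$ BERT-CLS($\mathcal{D}$)
7: $E_{R} \leftarrow$ RoBERTa-CLS($\mathcal{D}$) ▹ optional
8: $Z_{B} \leftarrow [X_{\alpha} \,\|\, E_{B}]$ ▹ concatenate
9: Train LR on $Z_{B}$ to obtain $p_{B}$
10: if ensemble then
11:   $Z_{R} \leftarrow [X_{\alpha} \,\|\, E_{R}]$
12:   Train LR on $Z_{R}$ to obtain $p_{R}$
13:   $\hat{y} \leftarrow$ soft/hard voting of $p_{B}, p_{R}$
14: else
15:   $\hat{y} \leftarrow \arg\max p_{B}$
16: end if
17: return predictions $\hat{y}$ and metrics (Accuracy, Precision, Recall, F1)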
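The BERTLR pipeline of Algorithm 1 can be sketched end to end in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the TF-IDF matrix and the BERT/RoBERTa CLS embeddings are assumed to be precomputed (random stand-ins in the check below), logistic regression is fitted with plain gradient descent rather than scikit-learn, GL differencing runs along the feature axis, the two models are soft-voted with equal weights, and all function names (`gl_diff_rows`, `train_lr`, `predict_proba`, `bertlr_sketch`) are hypothetical.

```python
import numpy as np


def gl_diff_rows(X: np.ndarray, alpha: float) -> np.ndarray:
    """GL fractional difference (step h = 1) along the feature axis of every row."""
    n = X.shape[1]
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - alpha) / k    # w_k = (-1)^k * binom(alpha, k)
    out = np.zeros(X.shape, dtype=float)
    for k in range(n):
        out[:, k:] += w[k] * X[:, : n - k]        # adds w_k * x_{j-k} to column j
    return out


def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))  # clip avoids overflow


def _with_bias(Z: np.ndarray) -> np.ndarray:
    return np.hstack([Z, np.ones((Z.shape[0], 1))])


def train_lr(Z: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 500,
             l2: float = 1e-3) -> np.ndarray:
    """Binary logistic regression fitted by full-batch gradient descent."""
    Zb = _with_bias(Z)
    theta = np.zeros(Zb.shape[1])
    for _ in range(epochs):
        # Mean log-loss gradient plus an L2 penalty (also on the bias, for brevity).
        grad = Zb.T @ (_sigmoid(Zb @ theta) - y) / len(y) + l2 * theta
        theta -= lr * grad
    return theta


def predict_proba(theta: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Return an (n_samples, 2) array of class probabilities."""
    p1 = _sigmoid(_with_bias(Z) @ theta)
    return np.column_stack([1.0 - p1, p1])


def bertlr_sketch(X_tfidf, E_bert, y, alpha=0.5, E_roberta=None):
    """Fractional TF-IDF + embedding concatenation, LR, optional soft-voting ensemble.

    For brevity, predictions are made on the training matrices themselves.
    """
    X_alpha = gl_diff_rows(X_tfidf, alpha)              # fractional lexical channel
    Z_b = np.hstack([X_alpha, E_bert])                  # concatenate with BERT features
    p_b = predict_proba(train_lr(Z_b, y), Z_b)
    if E_roberta is None:
        return p_b.argmax(axis=1)                       # single BERTLR model
    Z_r = np.hstack([X_alpha, E_roberta])               # RoBERTaLR channel
    p_r = predict_proba(train_lr(Z_r, y), Z_r)
    return ((p_b + p_r) / 2.0).argmax(axis=1)           # soft voting: mean probabilities
```

Replacing the final averaging step with a majority vote over the two argmax labels gives the hard-voting variant. With only two models, however, hard voting needs an explicit tie-breaking rule whenever they disagree, which is one reason soft voting is the more natural choice for a two-member ensemble.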
Experimental evaluations demonstrate the effectiveness of the proposed BERTLR framework and highlight the benefits of combining fractional-order feature transformations with transformer embeddings in sentiment classification tasks. The hybrid architecture allows the ensemble to exploit complementary signals from contextual semantic representations and mathematically enriched lexical features.
In ensemble learning, the final prediction is usually derived by aggregating the outputs of multiple models. The most common approaches for this aggregation are voting and averaging. The mathematical formulation for integrating machine learning and deep learning algorithms into a voting-based ensemble classifier of $N$ models is expressed as follows:
$$\hat{y} = \sum_{i=1}^{N} w_i \, y_i$$
Each model $i$ is assigned a weight $w_i$ (where $\sum_{i=1}^{N} w_i = 1$), and $y_i$ denotes the output prediction of the corresponding model. These weights determine each model's contribution to the overall prediction. Their values can be derived through methods such as cross-validation, boosting, and averaging.
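A minimal sketch of the weighted voting described above follows, assuming that each model output is a class-probability vector (the soft-voting case) and that the weight of a two-model ensemble is chosen by a simple grid search on a held-out validation split, which is only one of the options the text mentions. `weighted_soft_vote` and `pick_weight` are hypothetical names.

```python
def weighted_soft_vote(probas: list[list[float]], weights: list[float]) -> list[float]:
    """Combine class-probability vectors as y_hat = sum_i w_i * y_i with sum_i w_i = 1."""
    total = sum(weights)
    norm = [w / total for w in weights]  # rescale so the weights sum to 1
    n_classes = len(probas[0])
    return [sum(w * p[c] for w, p in zip(norm, probas)) for c in range(n_classes)]


def pick_weight(val_probas_a, val_probas_b, y_val, grid=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Grid-search the weight w of model A (model B gets 1 - w) by validation accuracy."""

    def accuracy(w: float) -> float:
        hits = 0
        for pa, pb, y in zip(val_probas_a, val_probas_b, y_val):
            combined = weighted_soft_vote([pa, pb], [w, 1.0 - w])
            predicted = max(range(len(combined)), key=combined.__getitem__)
            hits += int(predicted == y)
        return hits / len(y_val)

    return max(grid, key=accuracy)  # the first grid value wins ties
```

Equal weights recover plain soft voting, so the grid search can only match or improve on it on the validation split; whether that gain carries over to test data depends on the size of the validation set.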
In the BERTLR method, fractional derivative-based enhancement was incorporated into the ensemble learning framework to improve the discrimination of reviews. To define the fractional derivative-based enhancement specific to BERTLR, let $X$ denote the input review data and $\tilde{X}$ the fractional derivative-based enhancement applied to $X$. The enhancement equation in BERTLR can then be written as:
$$\tilde{X} = D^{\alpha} X$$
The operator $D^{\alpha}$ denotes the fractional derivative of order $\alpha$. Unlike traditional derivatives, it extends differentiation to non-integer orders, enabling the modeling of more complex behaviors. By applying this enhancement to the input data, BERTLR captures intricate patterns and lexical dependencies that are not easily detected by conventional feature extraction methods.
The transformed data, $\tilde{X}$, are then used within the ensemble learning framework, where predictions from multiple models are aggregated using voting-based strategies. This process allows the proposed framework to combine contextual semantic embeddings from transformer models with fractional-order lexical features, producing a richer and more expressive representation for sentiment classification.
The exact form and implementation of the fractional derivative operator $D^{\alpha}$ may vary depending on the characteristics of the dataset and the sentiment classification task. Specifying how the operator is selected and parameterized (for example, the order $\alpha$ and the discretization scheme) is therefore important for the interpretability and reproducibility of the proposed framework.
3.4. Voting Techniques
A hybrid classification method leverages several ML and DL algorithms, which are evaluated on key metrics such as accuracy, precision, recall, and F1-score. Various voting strategies were employed during this phase. After statistical data analysis and testing, the model with the highest accuracy was chosen as the final output: the accuracy of each classifier was assessed individually, the performances were compared, and the BERTLR model was selected based on its superior accuracy. Its two constituent algorithms, BERT and LR, were then merged using a hybrid approach to maximize the overall accuracy. This study focused on two voting strategies: soft voting and hard voting.
The following subsections describe each voting technique.
3.4.1. Soft Voting
The soft-voting classifier categorises input data based on the class probabilities predicted by all of the member classifiers. Soft voting is only possible if every classifier can estimate the probabilities of the outcomes. It averages the probabilities calculated by the classification models and assigns the class with the highest average probability. Whether a classifier supports soft voting can be confirmed by checking whether it provides a probability-prediction method (for example, predict_proba in scikit-learn estimators).
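The soft-voting rule and the probability-method check described above can be sketched with duck typing: a model qualifies only if it exposes a `predict_proba` method. The two stand-in model classes and their fixed probabilities are hypothetical placeholders for fitted classifiers, used only to keep the example self-contained.

```python
from typing import List, Sequence


class FixedProbaModel:
    """Hypothetical fitted model that returns pre-computed class probabilities."""

    def __init__(self, probas: List[List[float]]):
        self._probas = probas

    def predict_proba(self, X: Sequence[str]) -> List[List[float]]:
        return self._probas[: len(X)]


class LabelOnlyModel:
    """Hypothetical model that outputs hard labels only (no predict_proba)."""

    def predict(self, X: Sequence[str]) -> List[int]:
        return [1 for _ in X]


def soft_vote(models: Sequence[object], X: Sequence[str]) -> List[int]:
    """Average the class probabilities of all models and return the argmax class per sample."""
    for model in models:
        # Soft voting is only possible when every member can estimate probabilities.
        if not hasattr(model, "predict_proba"):
            raise TypeError(f"{type(model).__name__} has no predict_proba; use hard voting instead")
    probas = [model.predict_proba(X) for model in models]
    labels = []
    for i in range(len(X)):
        n_classes = len(probas[0][i])
        avg = [sum(p[i][c] for p in probas) / len(probas) for c in range(n_classes)]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


reviews = ["review 1", "review 2"]
model_a = FixedProbaModel([[0.9, 0.1], [0.40, 0.60]])
model_b = FixedProbaModel([[0.6, 0.4], [0.45, 0.55]])
model_c = FixedProbaModel([[0.2, 0.8], [0.70, 0.30]])
print(soft_vote([model_a, model_b, model_c], reviews))  # prints [0, 0]
```

For the second review the individual labels are 1, 1 and 0, so a plain majority vote would return class 1, whereas the averaged probabilities favour class 0: soft voting takes each model's confidence into account rather than counting labels.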
3.4.2. Hard Voting
The hard-voting classifier assigns input data to the mode of the labels predicted by the different classifiers; in other words, the final prediction is chosen by a simple majority vote among the group. Majority voting is processed differently depending on whether the classifier weights are equal or unequal; with equal weights, the mode of the predicted labels is used. For example, consider three classifiers, clf1, clf2, and clf3, whose predictions for a given record are [1, 1, 0]. With equal weights, the mode of [1, 1, 0] is 1, so the record is assigned to class 1 [1,26,33].
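The hard-voting rule, including the clf1/clf2/clf3 example above, can be sketched as follows. For unequal weights the sketch assumes that each predicted label accumulates the weights of the classifiers that voted for it, and it breaks ties in favour of the smallest label; both conventions are assumptions made for illustration, and `hard_vote` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Optional, Sequence


def hard_vote(labels: Sequence[int], weights: Optional[Sequence[float]] = None) -> int:
    """Weighted majority vote over predicted class labels.

    With equal weights the result is the mode of `labels`; ties go to the smallest label.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    if len(weights) != len(labels):
        raise ValueError("exactly one weight per classifier is required")
    tally = defaultdict(float)
    for label, weight in zip(labels, weights):
        tally[label] += weight
    best = max(tally.values())
    return min(label for label, score in tally.items() if score == best)


# clf1, clf2 and clf3 predict [1, 1, 0] for one record, as in the example above.
print(hard_vote([1, 1, 0]))                           # equal weights: prints 1
print(hard_vote([1, 1, 0], weights=[1.0, 1.0, 3.0]))  # hypothetical weights: prints 0
```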
3.5. Experimental Configurations
To validate the contribution of fractional derivative-based feature transformation and the role of transformer-based hybrid integration, we evaluated five configurations of the proposed framework. These configurations were designed not only to compare alternative models, but also to isolate the contribution of each major component in the pipeline. In particular, they allow us to examine whether the observed performance gains arise from transformer embeddings alone, fractional-order feature enrichment alone, or the structured combination of both within the proposed ensemble framework.
BERTLR (Proposed): The primary model, which combines BERT embeddings with fractional derivative-enhanced TF-IDF features and Logistic Regression. This configuration represents the main hybrid semantic–fractional architecture proposed in this study.
RoBERTaLR: An extension that replaces BERT embeddings with RoBERTa embeddings while retaining the same fractional derivative-enhanced TF-IDF features and Logistic Regression classifier. This configuration is used to evaluate whether the proposed fractional feature integration remains effective across alternative transformer backbones.
BERT-NoFD (Ablation): A baseline setup that removes the fractional derivative transformation and retains only standard TF-IDF features together with BERT embeddings. This configuration is intended to isolate the contribution of fractional-order feature enhancement in the proposed BERT-based framework.
RoBERTa-NoFD (Optional Ablation): A variant that uses RoBERTa embeddings without fractional enhancement. This configuration provides an additional ablation setting to examine whether the benefit of fractional transformation is consistent when the contextual encoder is changed.
Hybrid Ensemble (Voting): A combined configuration that integrates BERTLR and RoBERTaLR predictions through both soft and hard voting strategies. This setting evaluates whether the two transformer-based variants provide complementary predictive information that can be further exploited through ensemble decision fusion.
These configurations enable a structured comparative analysis in which the role of fractional derivatives, transformer backbones, and voting-based integration can be examined independently and jointly. As a result, the experimental design supports both performance comparison and methodological validation, providing clearer evidence for the contribution of the proposed hybrid sentiment classification framework.
3.6. Feature Extraction Methods
Feature extraction is one of the most important processes in DL/ML-based classification because the quality of the learned representation strongly influences downstream performance. In the present study, the collected Google Play Store review data were processed to construct both conventional lexical features and fractional-order enhanced feature channels. Rather than treating fractional derivatives as an auxiliary post-processing step, we position them as a core mechanism for enriching text representations before classification. The motivation is that conventional TF and TF-IDF statistics capture local term salience, whereas fractional-order transformations can additionally encode long-range and memory-dependent lexical interactions. This makes the resulting representation more expressive for sentiment classification in technical and domain-specific review text.
3.6.1. Extended Fractional Operators (Caputo vs. Grünwald–Letnikov)
We complement the Grünwald–Letnikov (GL) fractional differencing used to construct the TF-IDF channels with a theoretical comparison to the Caputo operator. The GL form defines a discrete fractional difference via binomially weighted historical terms:
$$\Delta^{d} x_n = \sum_{k=0}^{n-1} (-1)^{k}\binom{d}{k}\, x_{n-k}, \qquad n = 1,\dots,N,$$
which we implement with causal convolution and padding for stability.
To make the transformation process explicit, let the TF-IDF representation of a review be denoted by the feature vector $\mathbf{x} = (x_1, x_2, \dots, x_N)$, where each $x_n$ is the TF-IDF weight of the $n$th feature. The Grünwald–Letnikov fractional transformation of order $d$ is then computed at each position along the ordered feature sequence as a weighted sum of the current and preceding feature values. Specifically, for each position $n$, the transformed feature is obtained as
$$\tilde{x}_n = \sum_{k=0}^{n-1} g_k^{(d)}\, x_{n-k}, \qquad g_k^{(d)} = (-1)^{k}\binom{d}{k},$$
where $\tilde{x}_n$ denotes the GL-transformed TF-IDF coefficient. In practice, the transformation is carried out through the following steps: (1) compute the conventional TF-IDF vector $\mathbf{x}$ for each review; (2) choose the fractional order $d$; (3) compute the Grünwald–Letnikov coefficients $g_k^{(d)}$; (4) apply the weighted historical accumulation in the above equation to obtain the transformed vector $\tilde{\mathbf{x}} = (\tilde{x}_1, \dots, \tilde{x}_N)$; and (5) use $\tilde{\mathbf{x}}$ as the fractional lexical feature channel for fusion with transformer embeddings. In this way, each transformed coefficient depends not only on its original TF-IDF value but also on a weighted history of preceding feature components, which introduces the desired memory-aware behavior into the lexical representation.
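The five steps above can be sketched in a few lines of Python. The sketch assumes that the TF-IDF vector is ordered by vocabulary index, uses a unit step (no $h^{-d}$ scaling), and treats positions before the first feature as zero (causal zero padding); the helper names `gl_coefficients` and `gl_transform` are hypothetical. The coefficients $g_k^{(d)} = (-1)^k\binom{d}{k}$ are generated with the standard recursion $g_0 = 1$, $g_k = g_{k-1}\,(1 - (d+1)/k)$, and the code uses 0-based indices where the equation uses 1-based ones.

```python
from typing import List


def gl_coefficients(d: float, n_terms: int) -> List[float]:
    """Grünwald–Letnikov weights g_k = (-1)^k * binom(d, k) for k = 0..n_terms-1.

    Uses the recursion g_0 = 1, g_k = g_{k-1} * (1 - (d + 1) / k).
    """
    coeffs = [1.0]
    for k in range(1, n_terms):
        coeffs.append(coeffs[-1] * (1.0 - (d + 1.0) / k))
    return coeffs[:n_terms]


def gl_transform(x: List[float], d: float) -> List[float]:
    """Causal GL fractional difference of an ordered feature vector.

    With 0-based indices, x_tilde[n] = sum_{k=0}^{n} g_k * x[n - k]; features before
    position 0 are treated as zero (causal zero padding).
    """
    g = gl_coefficients(d, len(x))
    return [sum(g[k] * x[n - k] for k in range(n + 1)) for n in range(len(x))]


# Hypothetical TF-IDF vector of one review, ordered by vocabulary index.
tfidf = [0.0, 0.52, 0.0, 0.31, 0.77]
print(gl_coefficients(0.5, 4))      # prints [1.0, -0.5, -0.125, -0.0625]
print(gl_transform(tfidf, d=0.5))
```

Two sanity checks follow from the recursion: $d = 0$ leaves the vector unchanged, and $d = 1$ reduces to the ordinary first difference $x_n - x_{n-1}$. For long vocabularies the quadratic loop can be replaced by `numpy.convolve(x, g)[:len(x)]`, which computes the same causal convolution.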
The notion of memory in fractional calculus provides a natural analogy to long-range dependencies in textual data. In conventional TF-IDF representations, each feature is treated independently, and the contribution of a term is determined only by its local occurrence statistics. In contrast, the fractional Grünwald–Letnikov operator introduces a history-dependent transformation in which each feature coefficient is influenced by preceding feature values through a weighted accumulation process across the feature sequence.
From a linguistic perspective, sentiment expressed in text often depends on contextual interactions between words that may not be adjacent in the feature space. For example, the polarity of a term can be influenced by earlier descriptive or modifying words in the review. By incorporating fractional-order differencing, the transformed feature representation implicitly captures such non-local dependencies, allowing the model to encode memory-aware lexical relationships. This establishes a direct correspondence between the mathematical memory effect of fractional operators and the modeling of long-range dependencies in sentiment analysis.
As a result, the fractional transformation enhances the discriminative power of TF-IDF features by incorporating contextual influence across feature components, thereby improving sentiment classification performance compared to standard TF-IDF representations that rely solely on local term statistics.
In contrast, the Caputo derivative acts on continuously differentiable signals and is defined (for $0 < d < 1$) as
$${}^{C}D^{d} f(t) = \frac{1}{\Gamma(1-d)} \int_{0}^{t} \frac{f'(\tau)}{(t-\tau)^{d}}\, d\tau,$$
emphasizing recent gradients through a power-law memory kernel. While GL is natural for discrete text features, both forms share the key property of long-range memory controlled by the fractional order $d$. Empirically, varying $d$ yields monotonic improvements up to … and then saturates, suggesting that moderate memory depth best complements contextual embeddings. This supports our design choice and provides a principled knob to trade off smoothness and sensitivity in the fractional TF-IDF channels.
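To illustrate how the discrete GL form relates to the Caputo definition, the following sketch evaluates both for the test function $f(t) = t$ with $0 < d < 1$. Because $f(0) = 0$, the Caputo derivative coincides with the Riemann–Liouville one, whose closed form here is $t^{1-d}/\Gamma(2-d)$, and the first-order GL sum $h^{-d}\sum_{k} g_k^{(d)} f(t-kh)$ should approach it as the step $h$ shrinks. The test function, the order $d = 0.5$, and the step sizes are arbitrary choices for illustration.

```python
import math
from typing import Callable


def gl_derivative(f: Callable[[float], float], t: float, d: float, h: float) -> float:
    """First-order Grünwald–Letnikov approximation of the order-d derivative at t (lower terminal 0)."""
    n_steps = int(round(t / h))
    g = 1.0                      # g_0
    total = f(t)                 # k = 0 term
    for k in range(1, n_steps + 1):
        g *= 1.0 - (d + 1.0) / k            # g_k = (-1)^k * binom(d, k)
        total += g * f(t - k * h)
    return total / h ** d


def caputo_of_identity(t: float, d: float) -> float:
    """Closed-form Caputo derivative of f(t) = t for 0 < d < 1: t^(1 - d) / Gamma(2 - d)."""
    return t ** (1.0 - d) / math.gamma(2.0 - d)


d, t = 0.5, 1.0
exact = caputo_of_identity(t, d)
for h in (1e-1, 1e-2, 1e-3):
    approx = gl_derivative(lambda s: s, t, d, h)
    print(f"h={h:g}  GL={approx:.6f}  Caputo={exact:.6f}  |error|={abs(approx - exact):.2e}")
```

The error shrinks roughly in proportion to $h$, consistent with the first-order accuracy of the GL scheme; in the text-feature setting the step is fixed at one feature position, so the GL sum is used directly as a discrete operator rather than as an approximation.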
The inclusion of both operators is important because it clarifies the theoretical basis of the proposed feature transformation. The Caputo form provides an interpretable continuous-time reference for memory-aware differentiation, whereas the GL form is more suitable for discrete text representations and direct implementation on token-weight vectors. In this work, GL is adopted as the operational mechanism because TF-IDF features are discrete and finite-dimensional. Therefore, the proposed methodology does not simply borrow fractional calculus conceptually; it selects a specific fractional operator whose discrete structure aligns naturally with text-based feature engineering.
3.6.2. Fractional Derivatives: An Overview and Their Application in Large Language Models
Fractional derivatives provide a sophisticated method for analyzing the rate of change in functions that may not exhibit typical linear behavior. Unlike traditional derivatives, which measure changes at specific integer orders (such as first- or second-order derivatives), fractional derivatives extend this concept to non-integer orders such as 0.5 or 1.5. This extended range allows a more detailed examination of systems that do not follow simple, predictable patterns, offering a useful mathematical tool to characterize processes that deviate from standard local behavior.
The primary advantage of fractional derivatives is their capacity to describe systems with memory and hereditary traits. In simpler terms, fractional derivatives allow both the current and past states of a system to influence the present representation. For example, in physics, fractional derivatives can describe anomalous diffusion, where future behavior depends not only on the present state but also on a history of previous states. In engineering, they are useful for modeling materials with memory, such as viscoelastic materials, where stress depends on both current and past strains [34].
In computational disciplines, particularly in artificial intelligence, fractional derivatives help model data with long-term dependencies. This is crucial for sequential and language-related data, where the significance of a term or phrase often depends on previous lexical context. Unlike traditional algorithms that treat each feature independently, fractional derivatives enable the model to encode dependencies across multiple components of the representation. This property makes them attractive for NLP settings in which sentiment is influenced by non-local contextual interactions. Accordingly, fractional derivatives can improve predictive performance and make feature representations more sensitive to subtle textual cues [31,35].
The integration of fractional derivatives allows the model to account for long-term dependencies within text data. This is especially useful for sentiment analysis tasks in which previous contexts influence the meaning of words and sentences. Such enrichment helps the model classify text more accurately by providing richer feature representations, which are essential for predicting sentiment in nuanced reviews. In this study, we employ fractional derivatives to strengthen the feature extraction stage of our hybrid framework, which includes BERT and RoBERTa. This integration supports a more detailed analysis of sentiment in text data and captures subtle linguistic characteristics that might be overlooked by purely embedding-based or purely lexical models.
Subsequent to preprocessing, the corpus was partitioned into two subsets with a 3:1 ratio, with one subset designated for training and the other for testing. The methodology employed for feature extraction is illustrated in Figure 4, which shows the use of extraction techniques such as TF and TF-IDF on both the training and testing datasets. The models were trained on the training subset, and their classification performance was assessed on the test data.
TF-IDF, commonly utilised in information retrieval (IR) and summarisation tasks, evaluates the importance of terms based on their frequency within a document and across the corpus. The TF and IDF components are essential in constructing TF-IDF representations. The IDF component assigns higher weights to rare terms across the dataset, thereby increasing their contribution to the overall representation.
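When calculating the IDF, the following formula is applied:
$$\mathrm{IDF}(t) = \log\frac{N}{\mathrm{df}_t},$$
where $N$ denotes the total number of documents and $\mathrm{df}_t$ represents the document frequency of term $t$. The term frequency $\mathrm{TF}(t,d)$ is defined as the frequency of term $t$ in document $d$. Consequently, the total weight of a token in a document using TF-IDF is given by:
$$\mathrm{TF\text{-}IDF}(t,d) = \mathrm{TF}(t,d) \times \log\frac{N}{\mathrm{df}_t}.$$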
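The TF and IDF definitions above can be checked on a toy corpus. This sketch assumes raw counts for TF and the natural logarithm for IDF, $\mathrm{IDF}(t) = \ln(N/\mathrm{df}_t)$. Library implementations often differ: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed IDF and L2-normalises each row by default, so its numbers will not match this sketch exactly. The reviews and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF(t, d) = TF(t, d) * ln(N / df_t), with TF as the raw count of t in d."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency df_t
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)                                      # term frequency in this document
        weights.append({term: count * idf[term] for term, count in tf.items()})
    return weights


# Three hypothetical, already-tokenised reviews.
reviews = [["great", "app", "great"], ["app", "crashes"], ["great", "sound"]]
for review, w in zip(reviews, tf_idf(reviews)):
    print(review, {t: round(v, 3) for t, v in w.items()})
```

A term that occurs in every document receives an IDF of zero, while rarer terms such as "crashes" receive larger weights, which is the behaviour the IDF component is meant to produce.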
Within the proposed framework, this conventional TF-IDF representation is not used as the final lexical descriptor. Instead, it serves as the input to a fractional-order transformation stage, where the feature vector is enriched through GL-based differencing. This design is important because it links the theoretical discussion of fractional operators directly to the actual model pipeline: TF-IDF provides a weighted lexical structure, and the fractional transformation reshapes that structure to encode non-local and memory-aware dependencies before fusion with transformer embeddings.
3.6.3. Fractional Derivative Feature
A fractional calculus-based feature extraction technique was developed to mitigate the adverse effects encountered during conventional feature processing. The derivative yields a more refined feature representation by characterizing the rate of change across features. In particular, the fractional derivative is employed to reduce these adverse effects and to enhance the weighted numerical values generated by the TF-IDF or TF feature engineering methods. Integrating this technique with TF-IDF and TF improves the predictive accuracy of the proposed BERTLR model [36].
Integrating fractional derivatives into feature extraction facilitates the identification of intricate, non-linear patterns in text data. Applied together with models such as BERTLR and its RoBERTa-based variants, this enhances the model's capacity to discern complex connections between terms, resulting in improved sentiment comprehension. More importantly, the role of the fractional operator in our framework is not merely to perturb the input features, but to produce a transformed lexical channel that is complementary to transformer embeddings. While BERT and RoBERTa capture contextual semantics through self-attention, the fractional feature mechanism emphasizes graded lexical interactions and memory-aware term behavior that are not explicitly modeled by embedding-only architectures.
This approach allows a richer, non-linear understanding of the data. When applied to feature extraction in our sentiment analysis model, fractional derivatives facilitate the identification of complex relationships and patterns in text that would otherwise be missed. The refined representation of features through fractional derivatives significantly improves the performance of the BERTLR and RoBERTa-based models, as it allows the model to process nuanced and context-dependent features more effectively. Consequently, the proposed framework should be interpreted as a structured hybrid representation model, in which fractional-order lexical transformation and contextual semantic embedding play different but complementary roles.
By applying the fractional derivative (FD) approach, a new set of features can be derived. The FD of a function $f(x)$ of order $k$, where $k$ need not be an integer but can be any real number, is defined as follows:
Fractional derivatives used with text-based feature extraction techniques have been shown to improve classification model performance. By capturing the non-integer order dynamics within the text data, the fractional derivative-based mechanism offers a deeper and more refined understanding of the relationships between the features. This approach provides a more comprehensive representation of text data when combined with traditional techniques such as TF and TF-IDF. This fusion has been shown to significantly boost the prediction accuracy of classification models, outperforming the results of traditional methods alone. The enhanced representation of text data enables classification models to grasp the underlying relationships more effectively, leading to more accurate decision-making and improved outcomes.
In the context of the proposed BERTLR framework, this fractional derivative feature channel is central to the methodological contribution of the study. It provides the mathematical mechanism that distinguishes the framework from standard transformer-only sentiment classifiers and from conventional lexical-statistical pipelines. By explicitly connecting fractional-order feature modeling with transformer-based semantic embeddings, the proposed method offers a principled and interpretable pathway for hybrid sentiment analysis in technical domains.
3.7. Experimental Setup
The experimental setup was designed to ensure reproducibility and fair comparison across models. The dataset was divided into training and testing subsets using a standard split ratio (80% training and 20% testing), ensuring balanced representation across all categories. Stratified sampling was applied to preserve class distribution.
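The stratified split described above can be sketched without any library: indices are grouped by class, shuffled with a fixed seed, and the same fraction of each class is sent to the test set. The function name, seed, and label vector are hypothetical; with scikit-learn the equivalent call is `train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)`. Setting `test_fraction=0.25` would instead give the 3:1 ratio mentioned in Section 3.6.2.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)          # fixed seed for reproducibility
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical label vector: 50 negative and 30 positive reviews.
labels = [0] * 50 + [1] * 30
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))   # prints 64 16
```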
For transformer-based components, pre-trained BERT and RoBERTa models were employed as feature extractors without fine-tuning. The embeddings generated by these models were combined with fractional derivative-enhanced TF-IDF features within a soft-voting ensemble framework.
The fractional order $d$ was empirically selected from the range …, with optimal performance observed near …. For the LSTM baseline, standard training procedures were followed, including the Adam optimizer, ReLU activation, and a dropout rate of 0.5 to prevent overfitting.
All experiments were conducted using Python-based deep learning libraries (e.g., TensorFlow/Keras), and model training was performed on a standard computational environment with GPU acceleration.
3.8. Classifiers Used for Review Classification
In this section, various ML and DL algorithms employed in the study are described.
3.8.1. Support Vector Machine
The SVM model represents instances as points in space, mapped so that instances of separate categories are divided by a gap that is as wide as possible. Support vector machines (SVMs) are widely utilised as dependable and scalable supervised machine learning methods for regression and classification, although they are more typically used for classification. First introduced in the 1960s, they were refined and improved in the 1990s. Compared with other machine learning algorithms, SVMs have a distinctive implementation. Owing to their ability to handle both continuous and categorical instances, they have become common in recent years. Consequently, SVM classifiers achieve high precision and can handle high-dimensional spaces [37]. Because the decision function uses only a subset of the training points (the support vectors), SVM classifiers are also memory-efficient. An SVM separates classes by finding a hyperplane in the feature space; in general, a good separation is achieved by the hyperplane with the largest distance (margin) to the nearest training points of any class, because a larger margin usually yields a lower generalization error. New examples are then mapped into the same space and assigned to a category according to the side of the hyperplane on which they fall. More generally, an SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks such as outlier detection. In the kernel formulation, a hyperplane in this higher-dimensional space is defined as the set of points whose dot product with a fixed vector in that space is constant [38].
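The maximum-margin idea can be illustrated with a linear SVM trained by stochastic subgradient descent on the regularised hinge loss $\frac{\lambda}{2}\lVert\mathbf{w}\rVert^2 + \max\bigl(0,\, 1 - y(\mathbf{w}^{\top}\mathbf{x} + b)\bigr)$. This is a simplified sketch on a toy, linearly separable dataset with labels in $\{-1, +1\}$; it is not the solver used in the study (library SVMs typically solve the dual problem and support kernels), and the learning rate, $\lambda$, and epoch count are arbitrary.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                     lr: float = 0.01, epochs: int = 200) -> Tuple[List[float], float]:
    """Subgradient descent on the L2-regularised hinge loss; labels must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regulariser always shrinks w; the hinge term only acts on margin violations.
            w = [(1 - lr * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign a class according to the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
norm_w = sum(wj * wj for wj in w) ** 0.5
print([predict(w, b, x) for x in X], "margin width:", 2 / norm_w)
```

The margin width $2/\lVert\mathbf{w}\rVert$ is the quantity the regulariser implicitly maximises: shrinking $\mathbf{w}$ widens the margin, while the hinge term keeps every training point on the correct side with functional margin close to at least one.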
3.8.2. Random Forest Classifier
Random Forest (RF) is an ensemble learning technique for estimation. In ensemble learning, multiple models are combined and their outputs aggregated; the random forest, for example, combines many decision trees. The random forest algorithm can be used for both estimation and classification tasks. In [39], for example, the outputs of the local outlier factor and random forest algorithms were analyzed to determine the fraud percentage in a dataset.
Random forest is a supervised learning technique that can also be used for regression, although its major application is in classification problems. Its performance generally improves as the number of trees increases. The technique generates numerous decision trees from data samples and then aggregates their predictions to obtain the most effective outcome. Unlike a single decision tree, the random forest strategy reduces the risk of overfitting by aggregating the results of numerous trees, which lowers the variance and improves flexibility and reliability. Unlike other models that may require feature scaling or large datasets, the random forest algorithm delivers high precision even when working with smaller datasets or incomplete data, maintaining its robustness and accuracy [40].
Random forest was employed to predict the outcomes based on an ensemble of tree-based algorithms. Each tree relied on a distinct random subset of features and independently computed its own estimate. Random forest training helps to reduce the overfitting that is common in decision trees: whereas a typical decision tree splits on the best feature among all available variables, a random forest selects predictors from a random subset of features at each node, which improves model resilience [41].
3.8.3. Logistic Regression Algorithm
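Logistic regression is a popular machine learning algorithm that is well suited to binary classification applications. It expresses the possible outcomes of a trial as probabilities. As a type of regression, logistic regression applies the logistic (sigmoid) function to a linear combination of the input features, which makes it useful when the dependent variable is categorical. The overall structure of the logistic regression model is specified as follows:
$$P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x})}},$$
where $\mathbf{x}$ is the input feature vector, $\beta_0$ is the intercept, and $\boldsymbol{\beta}$ is the vector of model coefficients.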
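The model equation can be made concrete with a minimal from-scratch sketch: a numerically stable sigmoid, the probability $P(y = 1 \mid \mathbf{x}) = \sigma(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x})$, and unregularised batch gradient descent on the mean log-loss. This is an illustration only; a library implementation such as scikit-learn's `LogisticRegression` applies L2 regularisation by default and uses dedicated solvers. The toy data, learning rate, and epoch count are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x: Sequence[float], beta0: float, beta: Sequence[float]) -> float:
    """P(y = 1 | x) = sigmoid(beta0 + beta . x)."""
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1,
                 epochs: int = 500) -> Tuple[float, List[float]]:
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    beta0, beta = 0.0, [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = predict_proba(xi, beta0, beta) - yi   # derivative of the log-loss w.r.t. the logit
            grad0 += err
            for j, v in enumerate(xi):
                grad[j] += err * v
        beta0 -= lr * grad0 / n
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta0, beta


# Toy one-feature data set: negative reviews at x < 0, positive reviews at x > 0.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
beta0, beta = fit_logistic(X, y)
print(beta0, beta, predict_proba([1.5], beta0, beta))
```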
3.8.4. Decision Tree Classifier
Decision trees classify instances by sorting them based on feature values. Each node in a decision tree represents a feature of the instance to be classified, and each branch represents a value that the node can assume. Quinlan's ID3 algorithm, a classic decision-tree technique, has been extended in several ways, as reviewed by Kotsiantis. The Q-decision tree method was used in this study, and bagging is another algorithm applied to decision trees. CHAID (CHi-squared Automatic Interaction Detection) [42] is a further decision-tree technique that selects splits using chi-square tests.
3.8.5. Long Short-Term Memory
Long short-term memory (LSTM) is an advanced recurrent neural network (RNN) architecture that is widely used in deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections, which allow it to process entire data sequences rather than only single data points. LSTM is particularly effective for tasks such as network traffic monitoring or intrusion detection, where traditional recurrent neural networks, hidden Markov models, and other sequence-learning techniques struggle with long or variable gaps between relevant events. An LSTM unit consists of a memory cell and input, output, and forget gates that regulate the flow of information into and out of the cell. This structure allows LSTMs to classify, process, and forecast time-series data in which there may be delays of unknown duration between significant events. The LSTM design was developed to overcome the limitations of standard RNNs, notably the vanishing gradient problem, giving it a competitive edge in time-series prediction tasks compared to other AI techniques [43].
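The gating mechanism described above can be sketched as a single NumPy time step using the standard LSTM cell equations: the input, forget, and output gates $i_t$, $f_t$, $o_t$ are sigmoids, the candidate $\tilde{c}_t$ is a tanh, $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$, and $h_t = o_t \odot \tanh(c_t)$. The stacked weight layout, sizes, and random initialisation are assumptions for illustration, not the configuration of the LSTM baseline; deep learning libraries such as Keras provide this cell as a built-in layer (`tf.keras.layers.LSTM`) with their own weight layout.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4H, H + D) and b has shape (4H,), stacking the parameters of the
    input gate, forget gate, output gate and candidate cell update (in that order,
    an arbitrary convention for this sketch).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate values for the cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


rng = np.random.default_rng(0)
D, H, T = 3, 4, 5                  # input size, hidden size, sequence length (arbitrary)
W = rng.normal(scale=0.5, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(T, D)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

Saturating the forget gate near 1 and the input gate near 0 carries the cell state through a step unchanged, which is the mechanism that lets information survive long delays between significant events.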
3.8.6. Bidirectional Encoder Representations from Transformers (BERT)
BERT, a sophisticated language representation model developed by Google, is specifically designed for pre-training in natural language processing (NLP). Using a transformer architecture, BERT captures the context of words in both directions, enhancing its understanding of linguistic relationships. It was proposed by Jacob Devlin and his team at Google and introduced in 2018. Since 2019, Google has integrated BERT into its search algorithm to improve the interpretation of user queries, thereby enhancing the relevance and accuracy of search results.
3.8.7. AdaBoost
AdaBoost, short for Adaptive Boosting, is a meta-algorithm developed by Freund and Schapire that is commonly used for binary classification. It improves the performance of learning algorithms by assigning higher weights to misclassified instances, enabling subsequent models to concentrate on these errors. The final output is derived as a weighted sum of the predictions of multiple learners. AdaBoost's adaptive nature allows it to iteratively refine the model by learning from earlier mistakes, and in certain cases it is more resistant to overfitting than other learning methods. Although the individual learners may be weak, AdaBoost produces a robust overall model as long as each learner performs better than random guessing [44].
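The reweighting idea described above can be sketched for one boosting round of discrete AdaBoost with labels in $\{-1, +1\}$: the weighted error $\varepsilon$ gives the learner a vote weight $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, each sample weight is multiplied by $e^{-\alpha y h(x)}$ and renormalised, and the final output is the sign of the $\alpha$-weighted sum of learner predictions. The weak learner's predictions below are hypothetical; in practice they would come from, for example, a decision stump.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(weights: Sequence[float], y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, List[float]]:
    """One round of discrete AdaBoost for labels in {-1, +1}.

    Returns the weak learner's vote weight alpha and the renormalised sample weights.
    """
    total = sum(weights)
    eps = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    eps = min(max(eps, 1e-12), 1 - 1e-12)         # avoid log(0) for perfect or useless learners
    alpha = 0.5 * math.log((1 - eps) / eps)       # positive only if the learner beats chance
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


def adaboost_predict(alphas: Sequence[float], learner_preds: Sequence[int]) -> int:
    """Final AdaBoost output: sign of the alpha-weighted sum of weak-learner predictions."""
    score = sum(a * p for a, p in zip(alphas, learner_preds))
    return 1 if score >= 0 else -1


y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]           # a hypothetical weak learner that misclassifies the last sample
alpha, new_weights = adaboost_round([0.25] * 4, y_true, y_pred)
print(round(alpha, 4), [round(w, 4) for w in new_weights])
```

With uniform starting weights and one error out of four, $\varepsilon = 0.25$ and $\alpha = \tfrac{1}{2}\ln 3$; the misclassified sample's weight rises to 0.5 while the others fall to 1/6, so the next learner concentrates on the earlier mistake.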
3.8.8. Proposed Hybrid Classifier (BERT + LR)
A hybrid voting classifier, also known as the ensemble learning approach, is shown in Figure 5. It integrates several ML and DL models to obtain the final classification results. In this study, the BERT and LR predictive algorithms were integrated to solve the prediction challenge. The ensemble learning technique uses the training data to train each model in the ensemble. Once the training phase was complete, the test data were fed to each model, which predicted a class label for every test sample; in this way, the models were trained and evaluated on real-world data. In the subsequent phase, a voting mechanism was applied to generate a prediction for each individual sample. Either hard or soft voting can be used, depending on the situation. In hard voting, the ensemble assigns a class label to a sample based on the majority vote. For example, if four of seven models classify a sample as belonging to Class C1 and the other three classify it as belonging to Class C2, the sample is assigned to Class C1 because the majority of votes selected that class.
Compared to hard voting, soft voting averages the predicted class probabilities of all models and assigns the sample to the class with the highest average probability.
3.9. Additional Experimental Variants with RoBERTa and Ablation
To further examine the robustness and originality of the proposed framework, additional configurations were analysed, as previously outlined in Section 3.5. These comprise:
RoBERTa-FD (RoBERTaLR in Section 3.5): Replaces BERT with RoBERTa embeddings while keeping the fractional derivative-enhanced features and Logistic Regression classifier.
BERT-NoFD and RoBERTa-NoFD: Ablation models that exclude the fractional derivative feature enhancement to assess its individual contribution.
Ensemble Voting: Combines BERT-FD (i.e., BERTLR) and RoBERTa-FD forecasts with soft and hard voting strategies to improve overall performance.
The conventional metrics (Accuracy, Precision, Recall, F1-score) were employed to evaluate these experiments, which were executed using the same training pipeline. The Results and Discussion section contains comprehensive comparative results.
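3.10. Evaluation Metrics
To evaluate and compare the ML/DL models, we relied on a set of performance metrics, as detailed below [45,46,47]. Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Accuracy is defined as the proportion of correctly predicted samples among all samples:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is defined as the proportion of correctly predicted positive samples among all predicted positive samples:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall is defined as the proportion of correctly predicted positive samples among all actual positive samples:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1-score, which is the harmonic mean of precision and recall, is defined as:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$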
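The four metrics can be computed directly from the confusion counts, as in the following sketch for a single positive class. Returning 0.0 when a denominator is zero is a convention assumed here; for binary labels, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` should give the same values. The eight labels are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class, from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0   # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight reviews (1 = positive sentiment).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Here TP = 3, FP = 2, FN = 1, and TN = 2, giving an accuracy of 0.625, a precision of 0.6, a recall of 0.75, and an F1-score of about 0.667.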
4. Results and Discussion
This section evaluates the predictive performance of the proposed framework and compares it with multiple machine learning and deep learning baselines under different preprocessing and feature extraction settings. The objective is not only to report classification accuracy, but also to clarify how the proposed semantic–fractional representation affects performance across engineering review categories. In particular, the experiments are designed to examine the effect of conventional lexical representations (TF and TF-IDF), fractional derivative-enhanced features, and transformer-based hybrid integration. The discussion therefore focuses on both comparative performance and the methodological role of the proposed feature fusion strategy.
4.1. Results with Pre-Processing
This study used a combination of ML and DL classifiers, each with various hyperparameters. These parameters were determined through empirical analysis to achieve high classification performance. In the conventional ML setting, classifiers such as DTC, RFC, NBC, and LRC were evaluated with standard lexical features, whereas the transformer-based configurations incorporated contextual semantic representations through BERT. The purpose of this comparison is to establish a meaningful baseline before introducing fractional feature enhancement and hybrid ensemble integration.
Table 1 presents the category-wise accuracy outcomes for all classifiers when TF-IDF is applied. Compared with simpler statistical classifiers, transformer-based models benefit from richer contextual representations, while the proposed BERTLR model further improves performance by combining such semantic embeddings with enhanced lexical signals.
As shown in Figure 6, DTC exhibited the lowest accuracy when combined with the TF-IDF feature extraction technique, particularly in the Action and Casual categories. This indicates that shallow tree-based models are less effective in exploiting high-dimensional sparse textual representations. In contrast, both BERT and the proposed BERTLR model consistently achieved high accuracy across all categories, demonstrating the value of contextual semantic information in engineering sentiment classification. More importantly, BERTLR systematically outperformed standalone BERT, suggesting that the gains do not arise solely from transformer embeddings, but also from the additional contribution of enhanced lexical representations. The performance of RFC and SVM also varied across categories, especially in Music & Audio, indicating that certain domains present more difficult feature interactions. Overall, these results show that while TF-IDF provides a stronger lexical baseline than TF, its full advantage is realized when it is combined with more expressive representation learning and hybrid classification strategies.
Table 2 summarizes the classification accuracies of various machine learning and deep learning classifiers when the TF feature extraction approach is used. The results indicate that LRC and BERT remain relatively competitive under TF-based representation, while BERTLR again achieves the strongest overall performance. This comparison is important because it shows that the proposed framework does not depend exclusively on TF-IDF superiority; rather, it remains strong even when the initial lexical representation is simpler. At the same time, the gap between TF and TF-IDF settings reveals that richer lexical weighting remains beneficial before hybrid semantic integration.
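The difference between the two lexical weightings can be illustrated with a minimal, self-contained sketch. It assumes whitespace tokenization and the unsmoothed idf(t) = log(N / df(t)); the toy reviews are hypothetical, and libraries such as scikit-learn apply a smoothed idf by default, so their exact values differ. A term that occurs in every review (here "the") keeps its raw count under TF but receives zero weight under TF-IDF, while a rare term such as "useless" is up-weighted.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Raw term counts per document (the TF representation)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tf_idf(docs):
    """TF weighted by idf(t) = log(N / df(t)); terms present in every document get weight 0."""
    tfs = term_frequencies(docs)
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each term once per document.
    df = Counter(term for tf in tfs for term in tf)
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{term: count * idf[term] for term, count in tf.items()} for tf in tfs]


# Hypothetical app reviews used only for illustration.
reviews = [
    "the app is excellent",
    "the app crashes and is useless",
    "the camera is smooth and excellent",
]
tf = term_frequencies(reviews)
weights = tf_idf(reviews)
print(tf[0]["the"], weights[0]["the"])  # 1 0.0
print(round(weights[1]["useless"], 3))  # 1.099, i.e. log(3)
```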
As shown in Figure 7, LRC performed strongly across categories under TF-based representation, especially in Entertainment and Photography. However, the proposed BERTLR model remained consistently stronger overall, suggesting that the hybrid semantic–fractional architecture benefits not only from improved lexical representation but also from the complementary contribution of transformer-based semantics. Although the numerical difference between TF and TF-IDF is smaller for some conventional models, the proposed framework preserves a consistent advantage under both representations. This stability is important because it indicates that the model improvement is not tied to a single handcrafted representation, but rather emerges from the structured integration of contextual and fractional feature channels.
Table 3 summarizes the precision, recall, and F1-scores for the different review categories investigated in this study. Under TF representation, the Logistic Regression Classifier (LRC) outperformed several classical baselines. Nevertheless, the hybrid BERTLR framework achieved stronger and more balanced performance across the categories, indicating that integrating contextual semantic embeddings with enhanced lexical features yields a more robust classification mechanism than conventional single-representation models.
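Because Table 3 reports precision, recall, and F1 rather than accuracy alone, a short worked example of how these metrics follow from prediction counts may help. The sketch assumes binary labels with 1 as the positive class; the label vectors are hypothetical. With 2 true positives, 1 false positive, and 2 false negatives, precision is 2/3, recall is 0.5, and F1 is their harmonic mean, 4/7 ≈ 0.571, which shows how F1 penalizes an imbalance between the two.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 computed from raw prediction counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and predictions (1 = positive review).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```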
The Logistic Regression Classifier (LRC) models the probability of a binary outcome as a logistic function of a weighted combination of input features [48]. With TF-IDF features, it outperformed the other conventional classifiers in accuracy and also achieved comparatively strong recall and F1 values. While BERT performed well on its own, the proposed BERTLR model surpassed all other classifiers in precision, recall, and F1-score.
Table 4 therefore provides more than a simple accuracy comparison; it shows that the proposed hybrid model produces stronger and more balanced classification quality across multiple evaluation metrics.
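For reference, the logistic regression layer can be sketched from scratch. This is a minimal illustration, not the configuration used in the experiments: it assumes batch gradient descent on the L2-regularized log-loss, and the learning rate, epoch count, regularization strength, and two-cluster toy data are hypothetical choices. In practice a library solver such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Batch gradient descent on the mean log-loss plus an L2 penalty on the weights."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)  # predicted P(y = 1 | x)
        error = p - y           # gradient of the log-loss with respect to the logit
        w -= lr * (X.T @ error / n_samples + l2 * w)
        b -= lr * error.mean()
    return w, b


def predict_proba(X, w, b):
    return sigmoid(X @ w + b)


# Hypothetical two-cluster data standing in for feature vectors of negative/positive reviews.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)), rng.normal(1.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = fit_logistic_regression(X, y)
accuracy = ((predict_proba(X, w, b) >= 0.5) == y).mean()
print(accuracy)
```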
Table 5 summarizes the performance metrics of classifiers utilizing the TF feature extraction method together with fractional derivative enhancement. The results show that while several conventional classifiers benefit from fractional augmentation, the strongest gains remain associated with the proposed BERTLR configuration. This suggests that fractional-order features are most effective when they are used as a complementary lexical signal alongside contextual semantic embeddings rather than as an isolated enhancement to shallow classifiers.
Table 6 compares the accuracies of the classifiers under different feature extraction methods. The results demonstrate that TF features generally lead to weaker performance than TF-IDF-based representations, and that the proposed BERTLR framework achieves the highest accuracy under TF-IDF with fractional enhancement. Overall, these observations confirm that the proposed method benefits from both the lexical discriminability of TF-IDF and the additional representational richness introduced by fractional-order transformation.
Figure 8 compares the average accuracy of all classifiers using TF and TF-IDF feature extraction techniques. The experimental results show only a marginal variation in performance between the two methods for some conventional classifiers, whereas TF-IDF consistently outperforms TF in terms of accuracy, precision, and related metrics across the stronger models. This finding emphasizes the effectiveness of TF-IDF in capturing more informative features from the dataset and further supports the use of TF-IDF as the base lexical representation for fractional enhancement. More importantly, the comparative analysis indicates that the proposed hybrid framework preserves a consistent advantage over alternative classifiers across both representation settings.
4.2. Results with Fractional Derivative
Table 5 presents the classification accuracy achieved when fractional derivative enhancement is applied together with the TF feature extraction method. These results are important because they isolate the effect of fractional-order lexical enrichment under a relatively simple base representation. Although several conventional classifiers benefit from the fractional transformation, the gains are not uniform across models. In particular, classifiers such as RFC and DTC remain comparatively limited, suggesting that fractional enhancement alone is not sufficient to overcome the inherent representational constraints of shallow classifiers. By contrast, stronger improvements are observed for models that can better exploit enriched features, especially the proposed BERTLR configuration.
The results further indicate that the value of fractional derivatives is not merely to increase numerical feature complexity, but to reshape the lexical feature space in a way that becomes more informative for downstream classification. Under TF-based representation, the proposed BERTLR model consistently achieves the highest category-wise accuracy, showing that fractional-order enhancement provides a useful complementary signal even when the underlying lexical weighting is relatively simple. This observation supports the argument that the proposed framework benefits from the interaction between semantic embeddings and fractional lexical dynamics rather than from ensemble aggregation alone.
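The Grünwald–Letnikov (GL) operator used by the framework weights each feature together with a decaying history of the preceding features. The sketch below is a minimal illustration under stated assumptions: the operator is applied along the vocabulary axis of a single TF-IDF vector with unit step h = 1, the history is truncated at the first feature, and the order alpha and the input vector are hypothetical, since the values used in the experiments are not given in this section. The coefficients follow the recursion w_0 = 1, w_k = w_{k-1}(1 - (alpha + 1)/k), which equals (-1)^k binom(alpha, k); alpha = 1 recovers the ordinary first difference and alpha = 0 the identity.

```python
import numpy as np


def gl_weights(alpha, n):
    """First n Grünwald–Letnikov coefficients w_k = (-1)^k * binom(alpha, k)."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w


def gl_fractional_difference(x, alpha, h=1.0):
    """Apply the GL operator along the feature axis of a 1-D vector (assumed unit step)."""
    x = np.asarray(x, dtype=float)
    w = gl_weights(alpha, len(x))
    # y_j = h^(-alpha) * sum_{k=0}^{j} w_k * x_{j-k}; history before feature 0 is treated as absent.
    return np.convolve(x, w)[: len(x)] / h**alpha


tfidf_row = np.array([0.0, 0.4, 0.4, 0.0, 0.9])    # hypothetical TF-IDF vector
print(gl_weights(0.5, 4))                          # values: 1, -0.5, -0.125, -0.0625
print(gl_fractional_difference(tfidf_row, 1.0))    # ordinary first differences
print(gl_fractional_difference(tfidf_row, 0.5))    # fractional (memory-weighted) variant
```

Intermediate orders such as alpha = 0.5 interpolate between keeping the raw weights and taking first differences, which is one way to read the "memory-aware" description of the enhanced lexical channel.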
As shown in Figure 9, RFC obtained the lowest accuracy among the classifiers under TF features with fractional enhancement, indicating its limited ability to exploit this enriched representation. In contrast, the stronger performance of BERTLR demonstrates that the benefit of fractional derivatives is most pronounced when the enhanced lexical channel is integrated with contextual semantic representations. This finding is significant because it suggests that fractional-order features are not universally advantageous across all classifiers but become substantially more effective when incorporated into a structured hybrid architecture.
To further examine the effect of fractional derivatives under a stronger lexical representation, Table 6 summarizes the corresponding category-wise accuracies when TF-IDF is used instead of TF. The comparison between TF and TF-IDF is particularly important because it clarifies whether fractional-order enhancement remains beneficial when applied to a more informative base representation. The results show that TF-IDF consistently outperforms TF, indicating that fractional derivatives are most effective when they operate on a lexically discriminative feature space.
Under TF-IDF with fractional enhancement, the proposed BERTLR model again achieves the strongest performance across all six categories, reaching the highest values in Action, Casual, Entertainment, Music & Audio, Photography, and Card. This is an important result because it shows that the improvement is not confined to a single category or a single feature setting. Instead, the gains are systematic, which strengthens the argument that fractional-order transformation contributes meaningful additional information when fused with transformer-based embeddings.
Another important observation is that the performance gap between conventional classifiers and BERTLR becomes more pronounced in the TF-IDF setting than in the TF setting. This suggests that fractional derivatives do not act as a generic performance booster for all models. Rather, their strongest effect emerges when they are integrated into a hybrid architecture that can jointly exploit contextual semantics and enriched lexical statistics. In this sense, the proposed model benefits from a principled interaction between transformer embeddings and fractional feature engineering, which helps explain why its gains remain consistent across categories.
Overall, the results in this subsection indicate that fractional derivative enhancement plays a meaningful methodological role in the proposed framework. The strongest performance is obtained not by voting alone and not by lexical transformation alone, but by the combination of fractional-order feature enrichment with transformer-based semantic modeling. This provides empirical support for the central hypothesis of this study: that memory-aware lexical features can act as a complementary signal to contextual embeddings in engineering sentiment classification.
4.3. Comparison to State-of-the-Art Methods
In this work, we assess the performance of the proposed BERTLR framework with fractional derivative-enhanced features and compare it with a range of previously reported machine learning and deep learning approaches used in text classification and sentiment analysis. The comparative results in Table 7 are intended to position the proposed framework relative to representative methods in the literature, while acknowledging that differences in datasets, class distributions, and problem settings can affect direct numerical comparability. Therefore, this comparison is interpreted as a contextual benchmark rather than as a strict one-to-one ranking.
The proposed BERTLR framework achieves strong performance in comparison with multiple baseline and hybrid methods reported in the literature. For example, LSTM-based models [49] reported an accuracy of 85%, while Particle Swarm Optimization combined with LinearSVC [50] achieved a precision of 63.6% and recall of 55.0%. Convolutional neural networks (CNNs) [51] achieved high precision and recall on a substantially larger review corpus. In this context, the proposed BERTLR framework demonstrates competitive performance, achieving up to 91% accuracy on the engineering review classification task. More importantly, the proposed model combines strong predictive performance with a structured hybrid representation strategy based on contextual transformer embeddings and fractional-order lexical feature enrichment.
Unlike many existing methods that rely either on shallow lexical representations or on end-to-end neural architectures alone, the proposed framework explicitly integrates two complementary information sources: contextual semantic embeddings and fractional derivative-enhanced TF-IDF features. This distinction is important because the model does not seek to replace transformer-based understanding with handcrafted representations but rather to introduce a mathematically motivated lexical signal that complements the semantic information learned by BERT and RoBERTa. As a result, the observed gains should be interpreted not simply as an ensemble effect, but as evidence that the hybrid semantic–fractional representation is useful for engineering sentiment analysis.
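How the two information sources can be joined into a single input for the logistic regression layer is sketched below. The per-block L2 normalization is an assumption added so that neither channel dominates by scale, and the inputs are random stand-ins: 768 matches the hidden size of bert-base CLS vectors, while the 1,000-term vocabulary is hypothetical.

```python
import numpy as np


def l2_normalize(block, eps=1e-12):
    """Scale each row to unit L2 norm (assumed preprocessing, not confirmed by the text)."""
    norms = np.linalg.norm(block, axis=1, keepdims=True)
    return block / np.maximum(norms, eps)


def fuse_features(cls_embeddings, fractional_tfidf):
    """Concatenate the contextual and fractional lexical channels, one row per review."""
    if cls_embeddings.shape[0] != fractional_tfidf.shape[0]:
        raise ValueError("both blocks must describe the same reviews")
    return np.hstack([l2_normalize(cls_embeddings), l2_normalize(fractional_tfidf)])


rng = np.random.default_rng(42)
cls = rng.normal(size=(4, 768))     # stand-in for BERT CLS vectors
frac = rng.random(size=(4, 1000))   # stand-in for fractional TF-IDF features
X = fuse_features(cls, frac)
print(X.shape)  # (4, 1768)
```

The fused matrix `X` is what a linear classifier would then be trained on; keeping the two blocks side by side, rather than summing them, is also what allows per-channel attribution later.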
As shown in Figure 10, LRC demonstrated strong performance among the conventional classifiers when TF-IDF feature extraction was used. However, the proposed BERTLR framework consistently exceeded the performance of such single-representation models. This observation is important because it shows that strong TF-IDF-based lexical classification alone is not sufficient to match the hybrid representation obtained through the integration of contextual embeddings and fractional-order feature enhancement. In other words, the improvement achieved by BERTLR cannot be attributed only to the use of TF-IDF or only to the use of logistic regression; rather, it emerges from the structured interaction between semantic and fractional lexical representations.
A deep learning method, specifically LSTM, was also used to assess accuracy on the Google Play Store dataset. The architecture of the LSTM network used in this study is depicted in Figure 11, where an embedding layer is placed between the input and LSTM layers to map input word indices to dense word embeddings. A rectified linear unit (ReLU) activation function was selected owing to its effective performance with text data [49, 58]. A dropout layer with a rate of 0.5 was employed for regularization, and a sigmoid function in the final layer produces the class probability [59]. The Adam optimizer was used because it has been shown to perform better with noisy data [60]. LSTM achieved an accuracy of approximately 0.85, confirming that sequence-based neural models are effective for sentiment analysis; however, the proposed BERTLR framework remains superior for the present task.
It is important to note that, in this study, transformer models such as BERT and RoBERTa are employed as feature extractors rather than fully fine-tuned end-to-end classifiers. While fine-tuned transformer models can provide strong baseline performance, the objective of this work is to investigate the complementary role of fractional derivative-enhanced lexical features when combined with contextual embeddings within a hybrid framework. The use of pre-trained embeddings allows a controlled comparison between conventional lexical features and fractional-order transformations under a unified classification setting.
A direct comparison with fully fine-tuned transformer models constitutes an important direction for future work. Such experiments would enable further evaluation of whether fractional-order feature modeling provides additional benefits beyond end-to-end transformer architectures.
The performance improvements of the proposed BERTLR framework can be attributed to the complementary interaction between contextual semantic embeddings and fractional lexical feature enhancement. While conventional models rely either on local lexical statistics (e.g., TF-IDF) or sequential modeling (e.g., LSTM), the hybrid architecture combines semantic understanding with memory-aware feature transformation, enabling richer representation of sentiment signals.
However, it is also observed that the performance gains over strong baselines such as BERT and LSTM are relatively moderate in certain categories. This can be explained by the fact that transformer-based embeddings already capture substantial contextual information, leaving limited room for additional improvement. In such cases, the contribution of fractional-order features acts as a complementary refinement rather than a dominant factor. These observations highlight that the effectiveness of the proposed hybrid approach depends on the complexity and variability of the underlying dataset.
Several studies have reported mixed results when deep learning is applied to smaller datasets. Wang et al. [61] noted that deep learning-based approaches can perform poorly on small datasets. However, Zampieri et al. [50] examined offensive language detection on social media using SVM, CNN, and bidirectional LSTM (BiLSTM), and found that BiLSTM and CNN outperformed the classical SVM on relatively small datasets. These findings suggest that the suitability of an architecture depends strongly on dataset characteristics [62]. In the present work, the proposed hybrid semantic–fractional architecture appears to be particularly well suited to engineering review data, where both contextual meaning and lexical nuance contribute to sentiment expression.
Table 7 presents a comparative analysis of various machine learning and deep learning methodologies applied to classification tasks across different datasets. The literature covers models ranging from optimization-enhanced classifiers such as Particle Swarm Optimization (PSO) combined with LinearSVC [50] to more complex neural network models such as CNNs [51], hyperparameter-optimized machine learning models [52], SVM-based systems [53], big data deep learning approaches [54], gradient boosting sentiment models [55], logistic regression baselines [56], and ensemble methods based on random subspace learning [57]. Taken together, these results show that no single model is universally dominant across all tasks and datasets. This reinforces the importance of designing representations that are well matched to the target domain.
From this perspective, the contribution of the proposed framework lies not only in its accuracy but also in its representation strategy. By introducing a fractional derivative-based enhancement layer into a transformer-driven sentiment analysis pipeline, the proposed method provides a more expressive hybrid architecture for engineering review classification. The results therefore support the view that the combination of contextual transformer embeddings and memory-aware fractional lexical features offers a meaningful methodological advantage for this domain, beyond the effect of conventional model aggregation alone.
4.4. Discussion on Limitations
While the hybrid BERTLR model combined with fractional derivatives has demonstrated strong performance, it is important to acknowledge certain limitations. For instance, the model may struggle with reviews that contain very ambiguous or contradictory sentiments, where the context is unclear or overly complex. In such cases, the model may fail to accurately classify sentiments, which could lead to misinterpretation of user opinions. Future work could focus on developing advanced techniques for handling such ambiguous sentiment reviews, perhaps by integrating context-aware mechanisms or fine-tuning the model to better detect conflicting sentiments.
Additionally, the model’s reliance on large labeled datasets means that its performance might degrade when applied to smaller, less diverse datasets. This limitation would be most pronounced when only limited training data is available, where it could lead to overfitting or inaccurate results. Future research could address this issue by exploring transfer learning approaches or semi-supervised learning to improve the model’s performance on smaller datasets.
Although the BERTLR model performs well across the evaluated review categories, future work should investigate its adaptability to niche domains, such as specialized product categories or industry-specific reviews. Such exploration may reveal whether domain-specific training data can further improve classification accuracy and generalization.
Moreover, the computational cost associated with deep learning models—particularly when combined with fractional derivative-based feature extraction—can present challenges for real-time deployment or use in resource-constrained environments. Future research could explore architectural optimizations or lightweight model variants to enhance computational efficiency, making the framework more suitable for deployment on edge devices or in latency-sensitive systems.
Implementation Details: All experiments were conducted on a system equipped with an Intel Core i7 processor, 32 GB RAM, and an NVIDIA RTX GPU. The hybrid model was implemented in Python 3.9 using the TensorFlow and scikit-learn libraries.
4.5. Additional Results from RoBERTa, Ablation Studies, and Ensemble Voting
To evaluate the robustness of the proposed hybrid sentiment analysis framework and the contribution of its components, we conducted additional experiments using the RoBERTa transformer model, fractional derivative (FD) feature variants, and ensemble voting.
Table 8 presents a comparative analysis between BERTLR (our proposed model) and RoBERTa variants with and without fractional features, along with a combined ensemble configuration using soft voting. These extended variants help clarify the individual and joint contributions of transformer embeddings and fractional calculus.
As shown in Table 8, the proposed BERTLR model with fractional derivative features consistently achieved the highest accuracy across all categories, confirming its strength in engineering sentiment classification. The RoBERTa + FD variant also performed strongly, though slightly below BERTLR. Meanwhile, the ensemble model using soft voting provided balanced and reliable results, benefiting from the complementary characteristics of both transformer-based architectures. These findings reaffirm the contribution of fractional feature augmentation and model integration strategies.
Figure 12 illustrates the accuracy distribution across different model configurations. Notably, BERTLR emerged as the best-performing individual model, while the RoBERTa + FD and ensemble methods demonstrated competitive accuracy, further validating the robustness and flexibility of the proposed framework for sentiment classification in engineering domains.
Table 9 presents the ablation study. The results show that FD features alone fail to provide useful predictive power, while BERT+LR achieves strong performance. The proposed BERTLR (BERT + FD + LR) maintains high accuracy with balanced precision, recall, and F1, confirming the complementary role of fractional derivatives.
4.6. Explainability and Interpretation
To increase transparency, we analyzed the proposed BERTLR classifier using SHAP (SHapley Additive exPlanations). We employed a linear SHAP explainer with a background subset of the training instances to estimate feature contributions for the combined representation (BERT CLS embeddings concatenated with fractional-differenced TF-IDF features).
Figure 13 reports the global importance ranking: the model relies on a small subset of latent embedding dimensions together with several fractional TF-IDF channels, indicating that the fractional operator contributes non-redundant information beyond the contextual embeddings. For interpretability at the lexical level, we additionally trained a TF-IDF + LR surrogate and computed word-level SHAP values (Figure 14), which identify sentiment-bearing tokens as the most influential (e.g., “excellent”, “crash”, “useless”, “smooth”). These analyses support the validity of the hybrid design and clarify why the combined features improve predictive performance.
5. Conclusions
This study presented a hybrid AI framework that integrates Bidirectional Encoder Representations from Transformers (BERT) with Logistic Regression (LR), augmented by fractional derivative-based feature engineering for enhanced sentiment classification. By combining deep contextual embeddings with mathematically enriched lexical transformations, the proposed framework provides a hybrid semantic–fractional representation for domain-specific engineering sentiment analysis. This design goes beyond the use of conventional ensemble aggregation by explicitly integrating contextual semantic information with memory-aware fractional-order feature modeling.
Empirical evaluations demonstrated that TF-IDF consistently outperformed TF, while the introduction of fractional derivative-based features further improved representational granularity and predictive performance. The proposed BERTLR framework achieved accuracy levels of up to 0.91 across the evaluated categories, indicating that fractional-order lexical enhancement can provide a meaningful complementary signal to transformer-based embeddings. In addition, SHAP-based explainability analyses indicated that fractional TF-IDF features contribute non-redundant information alongside contextual embeddings, helping to address the interpretability limitations often associated with deep language models.
By quantifying the relative contribution of individual features, the SHAP values further indicate that the model draws on both semantic context and memory-aware lexical patterns when making predictions. This supports the effectiveness of the proposed hybrid architecture, in which fractional feature modeling and transformer embeddings together yield more informative sentiment classification decisions.
Additional experiments using RoBERTa embeddings, with and without fractional derivative augmentation, further supported the generality of the proposed approach. Although RoBERTa+FD achieved competitive performance, BERTLR remained the strongest individual model. The ensemble configuration offered stable and balanced performance, suggesting that the proposed semantic–fractional integration is robust across transformer backbones. An LSTM baseline was also explored, but at approximately 0.85 accuracy it did not outperform the proposed framework. Nevertheless, prior work [58] has demonstrated that bidirectional LSTM and convolutional neural network (CNN) models can outperform traditional ML classifiers in related sentiment classification settings. This contrast highlights that classification performance depends not only on model complexity but also on the suitability of the representation strategy for the target domain.
Overall, this work demonstrates the value of integrating fractional calculus with large language models for sentiment analysis in technical domains. The main contribution of the study lies in showing that fractional-order lexical modeling can be used as a principled and interpretable complement to transformer-based semantic embeddings, thereby improving sentiment discrimination in engineering review text. In addition to its methodological contribution, the framework has practical relevance for applied engineering analytics, where interpretable sentiment modeling can support product assessment, service optimization, and data-driven industrial decision-making. The methodology is extensible to other domains where nuanced text interpretation is critical, offering a transparent and mathematically motivated framework for future AI-driven decision-support systems.
However, transferring the proposed fractional feature extraction mechanism to domains with substantially different vocabulary distributions may present challenges. In particular, domain shifts involving highly specialized terminology, sparse lexical patterns, or different semantic structures may affect the stability and effectiveness of fractional-order transformations. The weighting behavior of the Grünwald–Letnikov operator depends on the structure and ordering of features, which may differ significantly across domains. Therefore, domain adaptation strategies, parameter tuning of the fractional order, or integration with domain-specific embeddings may be required to maintain performance when applying the framework beyond engineering review data.
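The dependence of the GL operator on feature ordering can be made concrete with a small sketch. Assuming the operator runs along the vocabulary axis with unit step (a minimal implementation, not necessarily the authors'), listing the same TF-IDF values in a different vocabulary order changes the resulting fractional feature for each term, because each output mixes in the values of whichever terms precede it. The vector, permutation, and alpha = 0.5 are hypothetical.

```python
import numpy as np


def gl_fractional_difference(x, alpha):
    """GL fractional difference along a 1-D feature vector with unit step."""
    x = np.asarray(x, dtype=float)
    w = np.empty(len(x))
    w[0] = 1.0
    for k in range(1, len(x)):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)  # w_k = (-1)^k * binom(alpha, k)
    return np.convolve(x, w)[: len(x)]


x = np.array([0.9, 0.0, 0.4, 0.1])   # hypothetical TF-IDF row in vocabulary order A
perm = np.array([2, 0, 3, 1])        # the same four terms listed in vocabulary order B
y_a = gl_fractional_difference(x, 0.5)
y_b = gl_fractional_difference(x[perm], 0.5)
# Re-align order-A outputs to order B: the per-term fractional features still differ.
print(np.allclose(y_a[perm], y_b))  # False
```

A fixed, shared vocabulary ordering (or an ordering chosen deliberately, for example by frequency) is therefore part of the feature definition whenever the framework is moved to a new domain.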
Future Work
Future research may address ambiguous or mixed-sentiment inputs by incorporating attention mechanisms or reinforcement learning to better capture contextual nuances. Extending the framework to multilingual datasets and domain-specific corpora—such as financial or clinical texts—would further validate its generalizability.
Transfer learning using larger multilingual models and semi-supervised learning on limited labeled data may enhance cross-lingual performance. The RoBERTa experiments suggest that exploring additional transformer variants, such as XLNet or DeBERTa, could yield further insights into adaptability. Deeper ablations, including varying the order of the fractional derivative or integrating domain-specific embeddings, may also advance the theoretical understanding of feature transformation benefits.
Finally, future work should optimize computational efficiency through pruning, quantization, or knowledge distillation, enabling lightweight deployment on edge devices. Real-world applications such as social media monitoring, medical sentiment analysis, and customer feedback prediction stand to benefit from the proposed hybrid framework, particularly in contexts where accuracy, interpretability, and efficiency are equally essential.