Article

Investigation of Exponent-Free LSTM Cells for Virtual Sensing Applications

by Mindaugas Jankauskas *, Andrius Katkevičius and Artūras Serackis
Department of Electronic Systems, Vilnius Gediminas Technical University, Plytines g. 25, LT-10105 Vilnius, Lithuania
* Author to whom correspondence should be addressed.
Electronics 2026, 15(3), 576; https://doi.org/10.3390/electronics15030576
Submission received: 25 November 2025 / Revised: 24 January 2026 / Accepted: 27 January 2026 / Published: 28 January 2026
(This article belongs to the Special Issue IoT-Enabled Smart Devices and Systems in Smart Environments)

Abstract

In this study, we investigate how computationally simplified activation functions affect predictive performance, inference latency, and energy usage in long short-term memory (LSTM)-based temperature prediction for wind turbine generator bearings. We tested three different types of LSTM cells, along with bidirectional LSTM (biLSTM) networks, to determine their effectiveness in modeling dynamic changes in gearbox bearing temperatures. We compared several activation-function variants, focusing on those that are either computationally simple or known to perform well in deep recurrent networks. The results show that the best-performing architectures achieved root mean squared errors (RMSEs) between 0.0798 and 0.0822, corresponding to coefficients of determination in the range $R^2$ = 0.84–0.85. When applied across five turbines, the best-performing architectures (peephole and bidirectional) achieved root mean squared errors of 0.0898, 0.0882, and 0.042. The best activation-function-enhanced variant (the peephole) improved accuracy by approximately 3% while maintaining low model complexity. These findings offer a practical and efficient solution for embedded predictive maintenance systems, delivering high accuracy without incurring the computational cost of deeper or bidirectional architectures.

1. Introduction

Modern wind power generators are increasingly equipped with sophisticated monitoring systems to ensure reliable operation, optimize maintenance cycles, and reduce unexpected downtime. Among the many components that require continuous supervision, gearbox bearings remain one of the most failure-prone elements due to variable mechanical loads, changing weather conditions, and long-term fatigue processes. Direct measurement of bearing temperatures—an important indicator of health and lubrication quality—is technically feasible but not always cost-effective or practical in distributed wind farms. As a result, the concept of virtual sensing, where machine learning models infer quantities that are difficult or expensive to measure directly, has gained significant attention in both industrial and academic communities.
Virtual sensors rely on correlated, easily measurable signals and predict target variables that serve as critical indicators of machine condition. In wind energy applications, parameters such as generator rotation speed, rotor speed, wind speed, relative and absolute wind direction, and active/reactive power contain rich dynamic information about thermal and mechanical states within the drivetrain. Recent studies demonstrate that data-driven approaches, especially recurrent neural networks, can effectively model these temporal dependencies. In our research, various long short-term memory-based architectures achieved high-accuracy prediction of gearbox bearing temperature, confirming their suitability as the core predictive mechanism for a virtual sensor. Such models enable early detection of emerging anomalies (deviations between the predicted and actual temperature dynamic changes) that can indicate abnormal thermal behavior, alerting operators before severe degradation occurs.
However, while LSTM networks offer excellent predictive accuracy, their computational complexity poses challenges for deployment on embedded hardware typically used in industrial monitoring systems. Standard LSTM-based architectures rely heavily on nonlinear activation functions such as sigmoid and hyperbolic tangent, both of which require exponential computations. These operations are costly in terms of processing cycles, power consumption, and latency—factors that become critical when virtual sensors must operate continuously on resource-constrained edge devices. In order to expand our anomaly detection approach for predictive maintenance on various alternative devices that require low-cost and low-power-consumption solutions, we investigate the redesign of recurrent neural network-based models to be more hardware-efficient without sacrificing predictive performance.
This study focuses on the replacement of exponential-based activations with computationally simpler alternatives, such as piecewise-linear or hard-saturating functions. These substitutes preserve the gating behavior essential for LSTM dynamics while significantly reducing arithmetic complexity, enabling faster inference and lower energy usage on microcontrollers, FPGAs, or other embedded platforms. Unlike approximation methods that rely on lookup tables or Taylor series expansions to estimate standard functions [1], our approach replaces the mathematical form of the activation function itself. This ensures numerical stability and precise gating behavior without the memory overhead of lookup tables.
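As an illustration of this idea (a sketch, not code from the paper), the exponential-based logistic gate and its common piecewise-linear "hard" counterparts can be written in a few lines of NumPy. The particular definitions hard_sigmoid(x) = clip((x + 3)/6, 0, 1) and hard_tanh(x) = clip(x, -1, 1) are widely used conventions, assumed here rather than taken from this study:

```python
import numpy as np

def sigmoid(x):
    # Standard logistic gate: one exponential per element.
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # Exponent-free, hard-saturating replacement: clip((x + 3) / 6, 0, 1).
    # Only an add, a multiply, and a clamp -- cheap on MCUs and FPGAs.
    return np.clip((x + 3.0) / 6.0, 0.0, 1.0)

def hard_tanh(x):
    # Piecewise-linear replacement for tanh: identity clipped to [-1, 1].
    return np.clip(x, -1.0, 1.0)

x = np.linspace(-6.0, 6.0, 121)
gap = np.max(np.abs(sigmoid(x) - hard_sigmoid(x)))
print(gap)  # worst-case gap stays below 0.1 on this grid
```

Both replacements saturate exactly like their smooth counterparts, which is what preserves the gating behavior discussed above.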
The main contributions of this paper are summarized as follows:
  • We propose a comprehensive evaluation of exponent-free activation functions specifically for wind turbine virtual sensing, demonstrating that they can match or exceed the accuracy of standard activations.
  • We provide a rigorous feature selection analysis, reducing the input space from 57 to 9 key variables using Mutual Information, ensuring an optimal balance between information density and model size.
  • We benchmark the proposed architectures against mainstream lightweight models (TinyLSTM), proving that the optimized activation functions offer a complementary path to efficiency alongside architectural compression.

2. State of the Art

Existing research on wind turbine condition monitoring shows that virtual sensing and data-driven temperature prediction can provide reliable indicators of drivetrain health. Several works demonstrate that gearbox and bearing temperatures can be predicted from Supervisory Control and Data Acquisition (SCADA) variables such as wind speed, rotor speed, generator power, and wind direction, making these measurements suitable for predictive maintenance. Yan et al. proposed a hybrid ensemble-based model combining variational mode decomposition, stacked autoencoders, and GMDH to predict gearbox bearing temperatures and showed that accurate temperature prediction improves early fault detection capabilities in wind turbines [2]. Similarly, Qian et al. applied long short-term memory (LSTM) neural networks to model SCADA time series, using the prediction error between measured and estimated values as an anomaly indicator for proactive condition monitoring [3]. In the domain of virtual sensing, Azzam et al. demonstrated that neural-network-based load reconstruction can replace physical load sensors in wind turbine gearboxes, providing strong evidence that virtual sensors can effectively map measurable parameters to internal drivetrain states [4]. Our previous work confirmed that prediction residuals from machine-learning models can serve as reliable markers of emerging gearbox anomalies [5].
In parallel to application-driven virtual sensing research, other research papers focus on deep learning architectures for temperature prediction and anomaly detection in wind turbine components. Hybrid convolutional–recurrent models (e.g., CNN-LSTM) have been shown to outperform traditional statistical approaches for thermal forecasting tasks. Other studies explored gated recurrent units, deep belief networks, and decomposition-based forecasting methods, consistently reporting improvements in modeling the nonlinear temporal patterns of drivetrain temperatures [2]. These results reinforce the suitability of recurrent neural networks, particularly BiLSTM architectures, to predict dynamic changes in gearbox bearing temperatures.
Although these works achieve high predictive accuracy, they generally assume deployment on high-performance servers and pay limited attention to computational constraints of embedded devices. Research focusing on efficient LSTM inference typically explores hardware accelerators or numerical precision optimizations rather than modifying the activation functions themselves. Seidel et al. provided insight into the capability of deep neural networks to perform complex tasks while employing alternative activation functions. They documented how these networks can be utilized not only for enhanced accuracy but also for competitive processing speeds, showcasing implications for real-world applications where energy costs are a major concern [6]. Extended research on classification and optimization tasks was presented by Ali et al. [7]. The authors implemented LSTM networks with 26 alternative activation functions and evaluated their performance on two real datasets (Japanese Vowels and Weather Reports). The study includes extensive simulations and empirical comparisons, demonstrating how different activation functions affect the accuracy of the LSTM classifier in multiple architectures and optimizers [7]. Rybalkin et al. designed optimized architectures for one-dimensional and multidimensional LSTM and BiLSTM accelerators, improving parallelism, yet still relying on conventional sigmoid and hyperbolic tangent functions [8]. Silfa et al. proposed dynamic precision selection for LSTM inference and demonstrated notable reductions in runtime and energy consumption without harming model accuracy [9]. Although such methods reduce computational load, they do not remove or simplify the exponential-based gating nonlinearities that dominate LSTM inference cost in microcontrollers and low-power edge devices.
Chong et al. introduced hardware-efficient approximations of sigmoid and tanh activations using shared piecewise polynomial approximations and lookup tables, allowing lower power consumption in LSTM accelerators [1]. Timmons and Rice systematically evaluated fast approximations to sigmoid and tanh and reported up to 37% reductions in training and inference time on CPU-based LSTM workloads [10]. Another category of work proposes entirely alternative activation functions. Parisi et al. introduced hyper-sinh activation and demonstrated competitive performance with standard tanh-based architectures [11]. Joseph and Bindiya proposed a combinational-logic-based activation unit using a hyperbolic sine function to replace nonlinearities in LSTM gating, achieving hardware efficiency gains while maintaining predictive accuracy [12].
Ding et al. emphasized the necessity of lightweight architectures, mentioning the utilization of GELU and ReLU6 activation functions in their modified MobileViT model aimed at low-energy applications in embedded systems [13]. A review by Vallés-Pérez et al. empirically evaluated newer activation functions such as S-ReLU and Mish, which are noted for their lower computational costs. They concluded that these functions maintain the desired properties of nonlinearity, continuity, and differentiability while potentially benefiting energy-sensitive systems [14]. Zhang et al. highlighted how fine-tuning these elements could significantly enhance the processing abilities of LSTM models while taking into account their implementation in energy-limited devices [15]. These findings support the notion that the choice of activation function can significantly impact both computational speed and energy use.
In the present work, we evaluate how computationally simplified activation functions affect predictive performance, inference latency, and energy usage in LSTM-based temperature prediction for gearbox bearing monitoring. The results enable the development of real-time virtual sensors deployable in edge-integrated systems.

3. Materials and Methods

In our investigation, we focused on modeling dynamic changes in gearbox bearing temperature values using deep learning techniques. We tested three different types of long short-term memory (LSTM) cells, along with bidirectional LSTM networks, to determine their effectiveness in capturing temporal dependencies. To further optimize the computational load of these models, we experimented with alternative activation functions that do not rely on exponentiation, as traditional functions like the sigmoid and tanh can be computationally expensive. By doing so, we aimed to enhance the efficiency of our models without compromising their ability to accurately predict temperature variations in the gearbox bearings.

3.1. LSTM Variants and Bidirectional LSTM

This section provides the mathematical formulations of three widely used LSTM variants: the Vanilla LSTM, the LSTM with forget gate, and the peephole LSTM, followed by the formulation of the bidirectional LSTM. Different LSTM architectures have been successfully applied to a wide range of sensor signal processing tasks due to their ability to capture temporal dependencies, nonlinear dynamics, and long-range correlations.
The Vanilla LSTM architecture introduces three gates—input, forget, and output—which regulate information flow and prevent vanishing gradients [16]. The mathematical expression of this LSTM cell is given by
\[
\begin{aligned}
i_t &= \sigma\bigl(W_i x_t + U_i h_{t-1} + b_i\bigr),\\
f_t &= \sigma\bigl(W_f x_t + U_f h_{t-1} + b_f\bigr),\\
o_t &= \sigma\bigl(W_o x_t + U_o h_{t-1} + b_o\bigr),\\
\tilde{c}_t &= \tanh\bigl(W_c x_t + U_c h_{t-1} + b_c\bigr),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\]
The design enables the network to selectively retain or discard information across long temporal intervals. Vanilla LSTM remains the most widely used due to its balance between modeling capacity and general applicability. Vanilla LSTMs have been widely used in inertial measurement unit (IMU) processing, including human activity recognition and gait modeling, where they outperform traditional HMM-based and feature-engineering methods [17,18,19,20,21]. In biomedical sensor analysis, LSTMs have achieved state-of-the-art results in ECG arrhythmia detection and physiological signal modeling due to their robust temporal memory [22,23,24,25].
The forget gate was introduced by Gers, Schmidhuber, and Cummins to improve the ability of LSTM networks to reset their internal state and thus better model non-stationary or drifting temporal processes [26]. The mathematical formulation of the LSTM cell with the forget gate is
\[
\begin{aligned}
i_t &= \sigma\bigl(W_i x_t + U_i h_{t-1} + b_i\bigr),\\
f_t &= \sigma\bigl(W_f x_t + U_f h_{t-1} + b_f\bigr),\\
\tilde{c}_t &= \tanh\bigl(W_c x_t + U_c h_{t-1} + b_c\bigr),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
o_t &= \sigma\bigl(W_o x_t + U_o h_{t-1} + b_o\bigr),\\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
\]
where $i_t$ is the input gate, $f_t$ is the forget gate, $o_t$ is the output gate, $\tilde{c}_t$ is the candidate cell state, $c_t$ is the updated memory cell, and $h_t$ is the hidden state. The forget gate allows the network to control how much of the previous memory $c_{t-1}$ should be retained at each time step.
This formulation enables the model to handle time-varying, noisy, and non-stationary sensor signals more effectively than earlier RNNs, particularly when abrupt state resets or drift adaptation are required [27,28].
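A minimal NumPy sketch of one forget-gate LSTM time step, following the equations above; the parameter names and the 9-input/32-hidden-unit shapes (mirroring the configuration described in Section 3.3) are illustrative, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forget_step(x_t, h_prev, c_prev, p):
    """One time step of the forget-gate LSTM described above.
    p maps names (Wi, Ui, bi, ...) to the gate weights and biases."""
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])        # input gate
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])        # forget gate
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # candidate state
    c = f * c_prev + i * c_tilde                                   # memory update
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])        # output gate
    h = o * np.tanh(c)                                             # hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 9, 32  # nine SCADA features, 32 hidden units (Section 3.3)
p = {f"W{g}": 0.1 * rng.standard_normal((n_hid, n_in)) for g in "ifco"}
p.update({f"U{g}": 0.1 * rng.standard_normal((n_hid, n_hid)) for g in "ifco"})
p.update({f"b{g}": np.zeros(n_hid) for g in "ifco"})
h, c = lstm_forget_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), p)
print(h.shape, c.shape)  # (32,) (32,)
```

Because $h_t = o_t \odot \tanh(c_t)$ with $o_t \in (0,1)$, the hidden state is always bounded in $(-1, 1)$.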
The peephole LSTM variant enables the gates to access the internal cell state [29], improving temporal precision, especially for tasks requiring accurate timing. The mathematical representation is given below:
\[
\begin{aligned}
i_t &= \sigma\bigl(W_i x_t + U_i h_{t-1} + V_i c_{t-1} + b_i\bigr),\\
f_t &= \sigma\bigl(W_f x_t + U_f h_{t-1} + V_f c_{t-1} + b_f\bigr),\\
\tilde{c}_t &= \tanh\bigl(W_c x_t + U_c h_{t-1} + b_c\bigr),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
o_t &= \sigma\bigl(W_o x_t + U_o h_{t-1} + V_o c_t + b_o\bigr),\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\]
Through the connections $V_i$, $V_f$, and $V_o$, the gates are directly influenced by the internal memory. This mechanism showed better performance in applications such as phoneme boundary detection, ECG waveform segmentation, and periodic signal analysis.
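The peephole cell differs from the forget-gate cell only in the $V_i c_{t-1}$, $V_f c_{t-1}$, and $V_o c_t$ terms. A hedged NumPy sketch follows; the peepholes are implemented here as element-wise (diagonal) weights, a common simplification and an assumption of this illustration rather than the paper's exact implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x_t, h_prev, c_prev, p):
    """One peephole LSTM step: the input and forget gates also see the
    previous cell state, and the output gate sees the updated cell state.
    Peepholes (vi, vf, vo) are element-wise weights here."""
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["vi"] * c_prev + p["bi"])
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["vf"] * c_prev + p["bf"])
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])
    c = f * c_prev + i * c_tilde
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["vo"] * c + p["bo"])
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 9, 32
p = {f"W{g}": 0.1 * rng.standard_normal((n_hid, n_in)) for g in "ifco"}
p.update({f"U{g}": 0.1 * rng.standard_normal((n_hid, n_hid)) for g in "ifco"})
p.update({f"b{g}": np.zeros(n_hid) for g in "ifco"})
p.update({f"v{g}": 0.1 * rng.standard_normal(n_hid) for g in "ifo"})
h, c = peephole_lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), p)
```

The extra multiply-accumulate per gate is the source of the higher training cost reported for the peephole variant in Section 4.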
The bidirectional LSTM processes the sequence in both forward and backward directions, thus incorporating the past and future context [30]. Let the forward and backward hidden states be
\[
\overrightarrow{h}_t = \mathrm{LSTM}_f\bigl(x_t, \overrightarrow{h}_{t-1}, \overrightarrow{c}_{t-1}\bigr), \qquad
\overleftarrow{h}_t = \mathrm{LSTM}_b\bigl(x_t, \overleftarrow{h}_{t+1}, \overleftarrow{c}_{t+1}\bigr).
\]
The final output at time $t$ is the concatenation of both states,
\[
h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t.
\]
BiLSTMs are advantageous when full-sequence context is required, such as in natural language processing, offline speech processing, or offline video analysis. BiLSTMs have been particularly successful in sensor-based offline sequence analysis, such as hand gesture recognition by EMG [31], classification of the sleep stage from multimodal physiological recordings [32], and fault diagnosis [33,34].
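The two-pass structure can be sketched as below (a minimal NumPy illustration under the forget-gate cell, not the paper's implementation): run one LSTM left-to-right, one right-to-left, realign the backward outputs, and concatenate per time step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h, c, p):
    # Minimal forget-gate LSTM step (Section 3.1).
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h + p["bi"])
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h + p["bf"])
    c = f * c + i * np.tanh(p["Wc"] @ x_t + p["Uc"] @ h + p["bc"])
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h + p["bo"])
    return o * np.tanh(c), c

def bilstm(seq, p_fwd, p_bwd, n_hid):
    """Forward and backward passes over seq, concatenated per time step."""
    h_f = c_f = h_b = c_b = np.zeros(n_hid)
    fwd, bwd = [], []
    for x_t in seq:                 # forward pass: t = 0 .. T-1
        h_f, c_f = lstm_step(x_t, h_f, c_f, p_fwd)
        fwd.append(h_f)
    for x_t in reversed(seq):       # backward pass: t = T-1 .. 0
        h_b, c_b = lstm_step(x_t, h_b, c_b, p_bwd)
        bwd.append(h_b)
    bwd = bwd[::-1]                 # realign with forward time order
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

def make_params(n_in, n_hid, rng):
    p = {f"W{g}": 0.1 * rng.standard_normal((n_hid, n_in)) for g in "ifco"}
    p.update({f"U{g}": 0.1 * rng.standard_normal((n_hid, n_hid)) for g in "ifco"})
    p.update({f"b{g}": np.zeros(n_hid) for g in "ifco"})
    return p

rng = np.random.default_rng(0)
n_in, n_hid = 9, 8
seq = [rng.standard_normal(n_in) for _ in range(5)]
out = bilstm(seq, make_params(n_in, n_hid, rng), make_params(n_in, n_hid, rng), n_hid)
print(len(out), out[0].shape)  # 5 (16,)
```

The concatenated output has dimension $2 \times$ the hidden size, which is why bidirectional models roughly double the downstream regression head's input width.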

3.2. Alternative Activation Functions

In this study, we compared several activation functions, focusing on variants that are either computationally simple or known to give good performance in deep recurrent networks.
The hyperbolic tangent is a classical, smooth, bounded activation that maps inputs to the interval $(-1, 1)$,
\[
f_{\tanh}(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.
\]
In conventional LSTM and BiLSTM cells, tanh ( · ) is often used both for state updates and output activations.
Leaky ReLU is a modification of the standard ReLU that allows a small, non-zero slope for negative inputs in order to avoid “dead” neurons. For a fixed leak coefficient $\alpha = 0.125$, it is defined as
\[
f_{\mathrm{LReLU}}(x) =
\begin{cases}
x, & x \ge 0,\\
\alpha x, & x < 0,
\end{cases}
\qquad \alpha = 0.125.
\]
Hard-Swish is a piecewise-linear approximation of the Swish activation that is cheaper to evaluate in hardware. It replaces the smooth sigmoid with a clipped linear ramp,
\[
f_{\mathrm{HSwish}}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6},
\]
where
\[
\mathrm{ReLU6}(x) = \min\bigl(\max(x, 0),\, 6\bigr).
\]
Equivalently, in explicit piecewise form,
\[
f_{\mathrm{HSwish}}(x) =
\begin{cases}
0, & x \le -3,\\
\dfrac{x(x + 3)}{6}, & -3 < x < 3,\\
x, & x \ge 3.
\end{cases}
\]
In the quantized Hard-Swish variant, the continuous Hard-Swish output is followed by numeric quantization to a low-precision format (e.g., FP8) in order to reduce memory bandwidth and arithmetic cost. Let $Q(\cdot)$ denote the quantization operator that maps real values to a finite set of representable levels. The activation can then be written as
\[
f_{\mathrm{QHSwish}}(x) = Q\bigl(f_{\mathrm{HSwish}}(x)\bigr),
\]
where $f_{\mathrm{HSwish}}(x)$ is given above and $Q(\cdot)$ is implemented as rounding to the nearest representable value in the chosen low-precision format.
Mish is a smooth, non-monotonic activation that combines the input with the hyperbolic tangent of the softplus function. It is defined as
\[
f_{\mathrm{Mish}}(x) = x \cdot \tanh\bigl(\mathrm{softplus}(x)\bigr),
\]
where
\[
\mathrm{softplus}(x) = \ln\bigl(1 + e^{x}\bigr).
\]
Swish is a smooth activation that multiplies the input by a sigmoid gate. For the commonly used case $\beta = 1$ it is given by
\[
f_{\mathrm{Swish}}(x) = x \cdot \sigma(x),
\]
where
\[
\sigma(x) = \frac{1}{1 + e^{-x}}
\]
is the logistic sigmoid. More generally, a parameterized form $f(x) = x \cdot \sigma(\beta x)$ with learnable $\beta$ can be used.
The Gaussian Error Linear Unit (GELU) weights the input by the value of the Gaussian cumulative distribution function. An efficient tanh-based approximation widely used in practice is
\[
f_{\mathrm{GELU}}(x) \approx \frac{1}{2}\, x \left(1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\bigl(x + 0.044715\, x^{3}\bigr)\right)\right).
\]
This approximation avoids direct evaluation of the error function while closely matching the exact GELU.
The Exponential Linear Unit (ELU) uses an exponential branch for negative inputs to push mean activations closer to zero. For $\alpha = 1$, it is defined as
\[
f_{\mathrm{ELU}}(x) =
\begin{cases}
x, & x \ge 0,\\
\alpha\,(e^{x} - 1), & x < 0,
\end{cases}
\qquad \alpha = 1.
\]
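For reference, the exponent-free candidates above reduce to a few lines of NumPy. The implementations below follow the definitions given in this section ($\alpha = 0.125$ for Leaky ReLU); the reference Swish is included only to show the size of the Hard-Swish approximation gap:

```python
import numpy as np

def leaky_relu(x, alpha=0.125):
    # Exponent-free; alpha = 0.125 = 2**-3 is a single shift in fixed point.
    return np.where(x >= 0, x, alpha * x)

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    # Piecewise-linear Swish approximation: x * ReLU6(x + 3) / 6.
    return x * relu6(x + 3.0) / 6.0

def swish(x):
    # Reference Swish (beta = 1); needs an exponential, shown for comparison.
    return x / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 121)
print(np.max(np.abs(hard_swish(x) - swish(x))))  # Hard-Swish approximation gap
```

All three exponent-free functions need only comparisons, additions, and multiplications, which is what makes them attractive on microcontrollers and FPGAs.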

3.3. Experimental Setup

The experiments were performed using operational data from five wind turbines (T01, T06, T07, T09, and T11) from the EDP Open Data Challenge. The turbines are Vestas V90 2 MW onshore units located in a variable terrain environment. The Supervisory Control and Data Acquisition (SCADA) system records operational signals at high frequency, which are then averaged over 10 min intervals. This 10 min sampling rate is the industry standard for SCADA-based condition monitoring. The dataset covers a period of two years (2014–2015), providing a sufficient seasonal variety to test model robustness against environmental fluctuations [35]. The objective was to model and forecast the generator bearing temperature (Gen_Bear_Temp_Avg) using multivariate time-series inputs. For each turbine, nine features were selected based on domain knowledge and their relevance to the thermal and mechanical dynamics of the drivetrain. The dataset was split into training and testing subsets using a chronological hold-out strategy to preserve temporal consistency. All models were trained using identical preprocessing steps, scaling procedures, input window size, and training hyperparameters to ensure comparability. Key dataset attributes are summarized in Table 1.
In this study, predictive models were trained using a set of nine physically significant features selected from a larger set of 57 turbine operational variables. The selection of these nine features was not arbitrary but based on a rigorous Feature Importance Analysis. We utilized Mutual Information (MI) and Spearman Correlation coefficients to rank the predictive power of all 57 available SCADA variables. The selected breakdown (1 environmental, 3 mechanical, 5 electrical) captures the primary thermodynamic drivers while excluding redundant or non-informative signals, which would otherwise increase model size and risk overfitting. This feature selection methodology and its impact on predictive maintenance performance were examined in detail in our previous work [5]. In the present study, the same validated feature set is intentionally retained in order to isolate and analyze the effects of LSTM architecture and activation function design.
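The ranking step can be sketched on synthetic data as below; the feature names and data here are invented for illustration, whereas the paper applies Mutual Information and Spearman correlation to the 57 real SCADA variables. Spearman correlation is simply the Pearson correlation of the rank-transformed signals:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    (argsort-of-argsort ranking; no tie handling, which is adequate
    for continuous SCADA signals)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(1)
n = 500
target = rng.standard_normal(n)  # stand-in for Gen_Bear_Temp_Avg
features = {
    "loaded_power": target + 0.3 * rng.standard_normal(n),  # informative
    "pure_noise": rng.standard_normal(n),                   # uninformative
}
ranking = sorted(features, key=lambda k: abs(spearman(features[k], target)),
                 reverse=True)
print(ranking)  # the informative feature ranks first
```

Keeping only the top-ranked variables is what shrinks the input space from 57 to 9 without discarding the dominant thermodynamic drivers.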
The inputs chosen represent three fundamental domains that influence the temperature of the generator bearing: electrical loading, mechanical drivetrain dynamics, and environmental operating conditions. Electrical variables included the total active power and the total reactive power generated by the turbine, the active power produced by the generator, the electrical power delivered to the grid, and the phase currents measured in individual electrical phases. These parameters reflect the instantaneous electrical load, torque demand, and power conversion efficiency of the generator, all of which directly affect the thermal stress on the bearing. Mechanical behavior was captured through the average rotational speed of the generator shaft, which influences frictional heating and mechanical load transfer. The environmental conditions were represented by the average speed of the wind and the temperature of the ambient air, both of which play a key role in the aerodynamic loading, the cooling efficiency, and the overall thermal equilibrium of the turbine. In some turbines, the average pitch angle of the blade was also selected, indicating that turbine control actions can shape the thermal dynamics.
The overarching goal of the investigation was twofold. First, we aimed to evaluate how different LSTM variants perform when applied to SCADA-based thermal modeling of wind turbine generator bearings. This comparison provides insight into the relative importance of architectural modifications such as peephole connections and bidirectional temporal context. Second, after identifying the most robust architecture in multiple turbines, we performed a deeper analysis focusing on alternative activation functions within the LSTM cell. This second stage allowed us to study whether the internal nonlinear transformation within the peephole LSTM can be further optimized to improve the sensitivity and stability of temperature prediction.
The experimental design therefore consisted of two sequential phases. In the first phase, independently for all five turbines, the following three recurrent neural network architectures were evaluated:
  • LSTM with forget gate.
  • Peephole LSTM.
  • Bidirectional LSTM.
This phase established a baseline comparison of prediction accuracy, robustness, and computational efficiency for the three structurally different LSTM models. Based on the results, the peephole LSTM was selected for the second phase due to its consistently strong and stable performance across all turbines.
In the second phase, a detailed activation-function study was carried out using the peephole LSTM applied to a single turbine dataset. Here, multiple nonlinear activation functions were substituted into the cell state update equation, including traditional functions such as tanh and ReLU, as well as more recent nonlinearities such as GELU, Mish, SiLU, Softplus, Snake, Smish, and Logish. The goal of this phase was to determine whether activation-level modifications could enhance the model’s ability to capture subtle thermal variations and nonlinear dependencies in the generator bearing temperature dynamics. The performance of each activation variant was compared with the baseline peephole LSTM and against the other two LSTM architectures of the first phase.
All recurrent neural network models evaluated in this study were implemented using a unified architecture configuration to ensure a fair comparison between the different LSTM variants. Each model operated on input sequences of length 60, corresponding to a one-hour temporal window composed of past operational measurements. Recurrent layers consisted of a single hidden layer with 32 units, which provided a balance between model capacity and computational efficiency for turbine-level forecasting tasks. For all models, the output layer consisted of a fully connected regression head mapping the hidden state to the target generator bearing temperature.
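The 60-step input windows can be constructed as below (a sketch with synthetic stand-in data; the shapes follow the configuration described above, sequence length 60 with nine features):

```python
import numpy as np

def make_windows(X, y, seq_len=60):
    """Build supervised pairs from a (T, n_features) SCADA matrix and a
    (T,) target: inputs[k] = seq_len past rows, targets[k] = next value."""
    n = len(y) - seq_len
    inputs = np.stack([X[k:k + seq_len] for k in range(n)])
    targets = y[seq_len:]
    return inputs, targets

T, n_feat = 200, 9  # nine selected SCADA features (synthetic stand-in data)
rng = np.random.default_rng(2)
X = rng.standard_normal((T, n_feat))
y = X[:, 0]  # stand-in for Gen_Bear_Temp_Avg
inputs, targets = make_windows(X, y)
print(inputs.shape, targets.shape)  # (140, 60, 9) (140,)
```

Each input sample therefore carries one hour of 10 min (here, one-step) history, and the regression head predicts the value immediately following the window.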
Each model was trained independently for each turbine, and performance was assessed using MSE, MAE, RMSE, the coefficient of determination ($R^2$), and the mean absolute percentage error (MAPE). Training time was also recorded to provide insight into computational efficiency. Training was performed for 10 epochs using a batch size of 64 and the Adam optimizer with a learning rate of 0.001. Training was restricted to 10 epochs based on a preliminary convergence study: analysis of the loss curves for Turbine T01 demonstrated that the model reaches a reliable loss plateau between epochs 6 and 8. Extended training beyond 10 epochs yielded negligible performance gains on the training set while causing the validation loss to stabilize or slightly increase, indicating the onset of overfitting. Thus, 10 epochs provide the optimal balance between learning sufficiency and generalization.
These hyperparameters were found to offer reliable convergence across all units and were chosen based on previous experience with turbine SCADA data. During training, a maximum of 62,208 samples per turbine were used to ensure uniformity in the volume of data processed by each model.
The experimental workflow is summarized as follows for completeness and reproducibility. The dataset comprises multivariate SCADA time series from five wind turbines, with the generator bearing temperature as the prediction objective. The input window length, network depth, number of hidden units, data preprocessing steps, and training hyperparameters are identical for every model. Standard regression measures (MSE, MAE, RMSE, $R^2$, and MAPE) calculated on temporally held-out test data are used to assess model performance. This cohesive configuration guarantees that reported differences in performance can be attributed only to architectural and activation-function changes, not to variations in data processing or training methodology.
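For reproducibility, the regression measures listed above can be computed with a straightforward sketch (MAPE assumes non-zero targets):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE, R^2 and MAPE for a 1-D regression task.
    MAPE assumes y_true contains no zeros."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 * float(np.mean(np.abs(err / y_true)))
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

print(regression_metrics(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))
```

Note that $R^2$ is computed against the variance of the held-out test targets, so a constant predictor at the test mean scores exactly zero.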

4. Results and Discussion

4.1. Horizontal Comparison with Lightweight Baseline

To further assess the embedded feasibility of the proposed approach, a horizontal comparison was conducted against a widely adopted lightweight recurrent architecture, namely TinyLSTM. This comparison aims to contextualize the proposed model within the landscape of resource-efficient time-series predictors commonly considered for edge and embedded deployments.
As summarized in Table 2, the mainstream TinyLSTM configuration (hidden size = 16) exhibits substantially lower computational complexity and inference latency, achieving approximately 40× faster inference than the proposed peephole LSTM baseline. This behavior is consistent with its reduced parameter count and smaller memory footprint.
While TinyLSTM demonstrates superior efficiency and strong predictive performance, this comparison highlights an important design insight: lightweight recurrent architectures are well suited for SCADA-based thermal prediction tasks. The primary contribution of this work is therefore not to replace such architectures, but to enhance them.
Specifically, the proposed activation function optimizations are architecture-agnostic and can be directly integrated into lightweight models such as TinyLSTM by replacing conventional exponential-based nonlinearities (e.g., tanh) with computationally optimized alternatives such as Logish. This provides a mathematically grounded pathway to further reduce inference cost while preserving or improving predictive accuracy, without increasing architectural complexity.

4.2. Deployment-Oriented Quantization (INT8) Benchmark

To further validate the feasibility of deploying the proposed models on resource-constrained embedded systems, we conducted a quantization benchmark using PyTorch 1.9 dynamic quantization. This process converts the model weights from 32-bit floating point (FP32) to 8-bit integers (INT8), a standard technique for reducing memory footprint and energy consumption on edge devices (e.g., Cortex-M microcontrollers).
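PyTorch's dynamic quantization performs this conversion automatically; the arithmetic behind the weight part can be sketched as symmetric per-tensor INT8 quantization. The NumPy sketch below is an illustration only, not the torch.quantization API, and real dynamic quantization additionally quantizes activations at runtime:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = rng.standard_normal((32, 32)).astype(np.float32)  # e.g., an LSTM weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q.nbytes / w.nbytes)               # 0.25 -- 4x smaller weight storage
print(float(np.max(np.abs(w - w_hat))))  # reconstruction error bounded by s / 2
```

The 4x storage reduction applies to the weights alone; the smaller 35–40% whole-model figure reported below also counts unquantized buffers and metadata.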
We benchmarked the best-performing proposed variant (peephole LSTM with Logish activation) and the baseline TinyLSTM model. Table 3 presents the comparison of model size, inference latency, and predictive accuracy ($R^2$) before and after quantization. The results demonstrate that INT8 quantization reduces the model size by approximately 35–40% while maintaining practically identical predictive accuracy ($R^2$ variance < 0.001).
Figure 1 visualizes the trade-off between latency and accuracy. Quantized models (red markers) significantly reduce size (annotated) while maintaining the error position of their FP32 counterparts (horizontal alignment), confirming robust embedded feasibility.
These findings confirm that the proposed architectures are robust to quantization, making them highly suitable for deployment on low-power hardware where memory is the primary constraint.

4.3. Performance Comparison of LSTM Variants

Table 4 summarizes the results obtained for the three LSTM variants across all turbines in the first phase of the experimental investigation. In most turbines, the peephole LSTM achieved the lowest RMSE and the highest $R^2$, indicating superior predictive accuracy for the generator bearing temperature.
Bidirectional LSTM achieved the best performance for turbines T09 and T11, whereas the LSTM with a forget gate consistently performed worse than the other two architectures. For turbines T06 and T07, the peephole LSTM demonstrated clear improvements over the other variants, reducing RMSE by up to 6–8% relative to the LSTM with a forget gate. Training time differed substantially between model classes: peephole LSTMs required around 340 s on average due to the additional state connections, while LSTMs with forget gates and bidirectional LSTMs completed training within approximately 5–6 s, highlighting a trade-off between accuracy and computational cost.
To evaluate global performance patterns, the results were averaged across all turbines. The peephole LSTM demonstrated the best overall predictive accuracy (average RMSE of 0.08187), outperforming both the LSTM with forget gate (average RMSE 0.08474) and the bidirectional LSTM (average RMSE 0.08340; see Figure 2).
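These averages can be reproduced directly from the per-turbine RMSE values reported in Table 4:

```python
# Per-turbine RMSE values (T09, T07, T11, T06, T01) from Table 4.
rmse = {
    "LSTM-FG":  [0.088571, 0.080676, 0.086141, 0.086307, 0.081981],
    "Peephole": [0.081737, 0.079842, 0.086640, 0.081139, 0.079994],
    "BiLSTM":   [0.081369, 0.083956, 0.082178, 0.087124, 0.082360],
}

avg = {name: sum(values) / len(values) for name, values in rmse.items()}
for name, a in sorted(avg.items(), key=lambda kv: kv[1]):
    print(f"{name}: {a:.5f}")
# The peephole variant yields the lowest mean RMSE (≈0.08187).
```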
Although the bidirectional LSTM achieved the best accuracy for turbines T09 and T11, its advantage was not consistent across all units. The performance gains of the peephole LSTM can be attributed to its access to the cell state through peephole connections, which improves the modeling of subtle thermal dynamics and time-dependent interactions within the drivetrain. The LSTM with a forget gate performed consistently worse, suggesting that additional structural enhancements (peepholes or bidirectional context) are beneficial for this prediction task.
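The peephole mechanism referenced above can be illustrated with a single-unit forward step in plain Python. The scalar weights here are arbitrary placeholders for the sketch (the trained models use learned 32-unit layers); what matters is that the input, forget, and output gates each receive a direct connection to the cell state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def peephole_step(x, h, c, p):
    """One step of a single-unit peephole LSTM. The w_ci, w_cf, and
    w_co entries are the peephole connections that let the gates
    inspect the cell state c directly."""
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h + p["w_ci"] * c + p["b_i"])
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h + p["w_cf"] * c + p["b_f"])
    g = math.tanh(p["w_xg"] * x + p["w_hg"] * h + p["b_g"])
    c_new = f * c + i * g
    # The output gate peeks at the *updated* cell state (Gers et al., 2003).
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h + p["w_co"] * c_new + p["b_o"])
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Illustrative parameters only: every weight and bias set to 0.5.
params = {k: 0.5 for k in
          ["w_xi", "w_hi", "w_ci", "b_i", "w_xf", "w_hf", "w_cf", "b_f",
           "w_xg", "w_hg", "b_g", "w_xo", "w_ho", "w_co", "b_o"]}
h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.2]:          # a short toy input sequence
    h, c = peephole_step(x, h, c, params)
print(f"h = {h:.4f}, c = {c:.4f}")
```

Relative to the forget-gate cell, the only extra cost is three diagonal (element-wise) connections per unit, which is what makes the accuracy gain relatively cheap in parameters, though not in our observed training time.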
Despite its superior accuracy, the peephole LSTM required substantially higher computational time (approximately 340 s per turbine) due to the additional gate-to-state connections. In contrast, LSTM and BiLSTM models required only 5–6 s to train. These results highlight a clear trade-off between computational cost and predictive accuracy that should be considered when deploying models in real-time or embedded monitoring systems. The results show that while both the peephole LSTM and bidirectional LSTM outperform the LSTM with a forget gate, the peephole LSTM provides the most robust accuracy across turbines, making it the preferred architecture for generator bearing temperature forecasting in this context.
During the second phase of the investigation, on a single-turbine dataset, the baseline peephole LSTM achieved a root mean squared error of approximately 0.0821 and a coefficient of determination R² ≈ 0.8415, improving over the standard LSTM with forget gate (RMSE ≈ 0.0867, R² ≈ 0.8232). The bidirectional LSTM baseline yielded a slightly lower RMSE of about 0.0806 and R² ≈ 0.8472 (see Table 5).
The experimental investigation showed that most alternative activation functions produced RMSE values in a relatively narrow band around the baseline peephole LSTM, typically between 0.080 and 0.086 (see Table 5). Among them, the Logish-based peephole LSTM achieved the best performance, with RMSE ≈ 0.0799 and R² ≈ 0.8498, an improvement of roughly 3% in RMSE over the baseline peephole LSTM and about 8% relative to the standard LSTM with forget gate. Other strong performers included the leaky ReLU, Snake, and Softplus variants, which yielded RMSE values close to the bidirectional LSTM baseline while maintaining the same unidirectional peephole architecture. In contrast, some activations, such as Mish and ELU, produced noticeably higher errors (e.g., Mish with RMSE ≈ 0.0901 and R² ≈ 0.8090), indicating that smoother or more complex nonlinearities do not automatically translate into better predictive accuracy for this thermal forecasting task.
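For reference, the compared activations have simple closed forms. The sketch below uses their commonly published definitions, which we assume correspond to the implementations evaluated here; all are drop-in replacements for the cell nonlinearity and differ mainly in cost (exponentials and logarithms versus simple comparisons) and smoothness around zero.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

def logish(x):               # x * ln(1 + sigmoid(x))
    return x * math.log1p(sigmoid(x))

def mish(x):                 # x * tanh(softplus(x))
    return x * math.tanh(softplus(x))

def elu(x, alpha=1.0):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x

def snake(x, a=1.0):         # x + sin^2(a*x) / a  (Ziyin et al.)
    return x + math.sin(a * x) ** 2 / a

for f in (logish, mish, elu, leaky_relu, snake):
    print(f"{f.__name__:>10s}(0.5) = {f(0.5):.4f}")
```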
A special configuration combining a peephole LSTM with a modified Snake activation (“Special: Peephole LSTM (Snake)”) led to a substantial performance degradation (RMSE > 1.0 and a strongly negative R², suggesting numerical instability or a poor interaction between the specific parameterization and the time-series scaling). Since this outlier heavily distorts the visual scale, it is excluded from the comparative bar chart. Focusing on the remaining configurations, the results indicate that careful selection of activation functions within an already well-performing peephole LSTM can yield modest but consistent improvements in modeling generator bearing temperature changes, without altering the overall network architecture or input feature set.
Figure 3 presents a bar graph of RMSE values for the baseline models and all improved peephole LSTM variants, excluding the worst special Snake configuration to preserve a meaningful vertical scale.

5. Conclusions

The experimental results demonstrate that recurrent neural networks are capable of modeling generator bearing temperature dynamics with high accuracy. When applied across five turbines, the best-performing architectures (peephole LSTM and bidirectional LSTM) achieved root mean squared errors between 0.0798 and 0.0822, corresponding to coefficients of determination in the range R² = 0.84–0.85. These values indicate that recurrent neural models can reliably capture more than 84% of the variance in bearing temperature and follow short-term thermal changes with an average absolute error below 0.05 °C.
Investigating alternative activation functions within the peephole LSTM cell revealed that the choice of internal nonlinearity has a noticeable effect on accuracy, and that smoother or more complex transitions do not by themselves guarantee better performance. For example, Mish and ELU produced RMSE values of 0.0901 and 0.0848, respectively, roughly 6–13% worse than the best-performing configuration (Logish, RMSE = 0.0799). In contrast, smooth exponential-based activation functions such as Logish, Softplus, and SiLU maintained or improved accuracy, confirming that the internal nonlinearity plays a significant role in stabilizing temperature dynamics modeling.
Rather than proposing a new lightweight architecture, this work demonstrates that activation-function optimization provides an orthogonal and complementary pathway to efficiency, applicable to both standard and compressed recurrent models.
Regarding suitability for predictive maintenance on embedded platforms, the results suggest a favorable trade-off. The standard peephole LSTM, which is computationally lightweight and requires only a single 32-unit recurrent layer, achieved RMSE values competitive with or superior to those of the more computationally intensive bidirectional LSTM. The best activation-enhanced peephole LSTM variant improved accuracy by approximately 3% while maintaining low model complexity. These findings indicate that modified recurrent neural network cells, especially peephole LSTM variants with carefully chosen activation functions, offer a practical and efficient solution for embedded predictive maintenance systems, providing high accuracy without incurring the computational cost of deeper or bidirectional architectures.

Author Contributions

Conceptualization, A.S. and A.K.; methodology, A.S.; software, M.J.; validation, M.J., A.K. and A.S.; formal analysis, A.K.; investigation, M.J.; resources, M.J.; data curation, M.J.; writing—original draft preparation, A.S.; writing—review and editing, A.K.; visualization, M.J.; supervision, A.S.; project administration, A.S.; funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used in this research was originally published as part of a public challenge on wind turbine failure detection organized by Energias de Portugal (EDP) and was publicly available on the challenge website at the time of access. That website is no longer online, and the dataset is no longer publicly available. Because of the challenge rules and restrictions on data redistribution, the authors cannot provide copies of the dataset. Interested researchers may contact EDP directly to inquire about possible access.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LSTM: Long Short-Term Memory
LSTM-FG: Long Short-Term Memory with Forget Gate
BiLSTM: Bidirectional Long Short-Term Memory
SCADA: Supervisory Control and Data Acquisition
FPGA: Field-Programmable Gate Array
GMDH: Group Method of Data Handling
CNN-LSTM: Convolutional Neural Network–Long Short-Term Memory
ECG: Electrocardiography
EMG: Electromyography
GELU: Gaussian Error Linear Unit
MSE: Mean Squared Error
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error

Figure 1. Latency vs. RMSE scatter plot for FP32 and INT8 quantized models. Annotated values indicate model size (KB). The minimal vertical shift between blue (FP32) and red (INT8) markers confirms that quantization does not degrade predictive accuracy (RMSE stable).
Figure 2. RMSE comparison of LSTM variants across turbines for generator bearing temperature prediction.
Figure 3. RMSE comparison for baseline models and peephole LSTM variants with different activation functions on turbine T01. The worst special configuration (peephole LSTM with misconfigured Snake activation) is excluded to preserve scale readability.
Table 1. Summary of the EDP wind turbine dataset and experimental parameters.
| Attribute | Description |
|---|---|
| Turbine Model | Vestas V90 (2 MW, Onshore) |
| Number of Turbines | 5 (T01, T06, T07, T09, T11) |
| Time Period | 2014–2015 (2 Years) |
| Sampling Rate | 10 min SCADA averages |
| Total Features Available | 57 |
| Features Selected | 9 (via Mutual Information & Correlation) |
| Target Variable | Generator Bearing Temperature (Gen_Bear_Temp_Avg) |
| Training Samples | Max 62,208 per turbine |
| Split Strategy | Chronological Hold-out |
Table 2. Computational efficiency comparison: proposed approach vs. lightweight baseline (proxy metrics).
| Model Architecture | Hidden Size | Params | Size (KB) | FLOPs/Sample | RMSE | R² | CPU Latency (ms) |
|---|---|---|---|---|---|---|---|
| Proposed Peephole Baseline | 32 | 5665 | 22.13 | ≈645 K | 0.0821 | 0.8415 | 11.70 |
| Mainstream TinyLSTM | 16 | 1745 | 6.82 | ≈200 K | 0.0553 | 0.9265 | 0.45 |
Table 3. Impact of INT8 quantization on model size and accuracy (embedded proxy).
| Model | Format | Size (KB) | Accuracy (R²) | Latency (ms) * |
|---|---|---|---|---|
| Peephole (Logish) | FP32 | 27.82 | 0.8271 | 13.89 |
| Peephole (Logish) | INT8 | 18.43 | 0.8270 | 106.67 |
| TinyLSTM | FP32 | 9.38 | 0.8164 | 0.60 |
| TinyLSTM | INT8 | 5.81 | 0.8160 | 8.24 |
* The increased INT8 latency observed on desktop CPUs is a known artifact of software-based quantization and tensor format conversion. On embedded microcontrollers and DSPs with native INT8 support (e.g., ARM Cortex-M with CMSIS-NN), INT8 inference typically achieves 2×–4× speedups and reduced energy consumption compared to FP32.
Table 4. Performance of LSTM variants for generator bearing temperature prediction.
| Turbine | Model | MSE | MAE | RMSE | R² | MAPE (%) |
|---|---|---|---|---|---|---|
| T09 | LSTM-FG | 0.007845 | 0.049178 | 0.088571 | 0.815423 | 17.33 |
| T09 | Peephole | 0.006681 | 0.042991 | 0.081737 | 0.842807 | 14.81 |
| T09 | BiLSTM | 0.006621 | 0.044040 | 0.081369 | 0.844219 | 15.40 |
| T07 | LSTM-FG | 0.006509 | 0.044625 | 0.080676 | 0.846860 | 15.72 |
| T07 | Peephole | 0.006375 | 0.045438 | 0.079842 | 0.850010 | 15.25 |
| T07 | BiLSTM | 0.007049 | 0.044685 | 0.083956 | 0.834157 | 15.72 |
| T11 | LSTM-FG | 0.007420 | 0.046163 | 0.086141 | 0.825410 | 16.33 |
| T11 | Peephole | 0.007507 | 0.047833 | 0.086640 | 0.823381 | 16.30 |
| T11 | BiLSTM | 0.006753 | 0.045652 | 0.082178 | 0.841106 | 16.12 |
| T06 | LSTM-FG | 0.007449 | 0.046807 | 0.086307 | 0.824737 | 16.18 |
| T06 | Peephole | 0.006583 | 0.043307 | 0.081139 | 0.845100 | 14.98 |
| T06 | BiLSTM | 0.007591 | 0.046228 | 0.087124 | 0.821403 | 16.19 |
| T01 | LSTM-FG | 0.006721 | 0.044640 | 0.081981 | 0.841867 | 15.48 |
| T01 | Peephole | 0.006399 | 0.046954 | 0.079994 | 0.849438 | 16.03 |
| T01 | BiLSTM | 0.006783 | 0.044977 | 0.082360 | 0.840400 | 15.77 |
Table 5. Performance comparison of peephole LSTM with alternative activation functions and baseline models (T01).
| Model/Activation | RMSE | MSE | MAE | R² | MAPE (%) |
|---|---|---|---|---|---|
| LSTM with Forget Gate | 0.08668 | 0.00745 | 0.04681 | 0.82474 | 16.18 |
| Baseline Peephole LSTM | 0.08208 | 0.00674 | 0.04569 | 0.84151 | 16.12 |
| Baseline BiLSTM | 0.08058 | 0.00649 | 0.04463 | 0.84686 | 15.72 |
| Peephole LSTM (TANH) | 0.08200 | 0.00673 | 0.04560 | 0.84166 | 15.94 |
| Peephole LSTM (ReLU) | 0.08146 | 0.00663 | 0.04490 | 0.84408 | 15.82 |
| Peephole LSTM (LeakyReLU) | 0.08063 | 0.00651 | 0.04447 | 0.84660 | 15.59 |
| Peephole LSTM (ELU) | 0.08478 | 0.00719 | 0.04709 | 0.83143 | 16.65 |
| Peephole LSTM (Softplus) | 0.08194 | 0.00671 | 0.04558 | 0.84193 | 15.97 |
| Peephole LSTM (GELU) | 0.08369 | 0.00701 | 0.04648 | 0.83518 | 16.32 |
| Peephole LSTM (Swish) | 0.08576 | 0.00736 | 0.04737 | 0.82769 | 16.79 |
| Peephole LSTM (Mish) | 0.09009 | 0.00811 | 0.04948 | 0.80904 | 17.39 |
| Peephole LSTM (SiLU) | 0.08269 | 0.00684 | 0.04609 | 0.83929 | 16.08 |
| Peephole LSTM (Snake) | 0.08071 | 0.00652 | 0.04451 | 0.84633 | 15.61 |
| Peephole LSTM (Smish) | 0.08319 | 0.00692 | 0.04622 | 0.83718 | 16.15 |
| Peephole LSTM (Logish) | 0.07990 | 0.00638 | 0.04417 | 0.84976 | 15.43 |