Article

mTanh: A Low-Cost Inkjet-Printed Vanishing Gradient Tolerant Activation Function †

by
Shahrin Akter
* and
Mohammad Rafiqul Haider
Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, USA
*
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
J. Low Power Electron. Appl. 2025, 15(2), 27; https://doi.org/10.3390/jlpea15020027
Submission received: 24 March 2025 / Revised: 23 April 2025 / Accepted: 30 April 2025 / Published: 2 May 2025

Abstract

Inkjet-printed circuits on flexible substrates are rapidly emerging as a key technology in flexible electronics, driven by their minimal fabrication process, cost-effectiveness, and environmental sustainability. Recent advancements in inkjet-printed devices and circuits have broadened their applications in both sensing and computing. Building on this progress, this work develops a nonlinear computational element, coined mTanh, to serve as an activation function in neural networks. Activation functions are essential in neural networks as they introduce nonlinearity, enabling machine learning models to capture complex patterns. However, widely used functions such as Tanh and sigmoid often suffer from the vanishing gradient problem, limiting the depth of neural networks. To address this, alternative functions like ReLU and Leaky ReLU have been explored, yet these also introduce challenges such as the dying ReLU issue, bias shifting, and noise sensitivity. The proposed mTanh activation function effectively mitigates the vanishing gradient problem, allowing for the development of deeper neural network architectures without compromising training efficiency. This study demonstrates the feasibility of mTanh as an activation function by integrating it into an Echo State Network to predict the Mackey–Glass time series signal. The results show that mTanh performs comparably to Tanh, ReLU, and Leaky ReLU in this task. Additionally, the vanishing gradient resistance of the mTanh function was evaluated by implementing it in a deep multi-layer perceptron model for Fashion MNIST image classification. The study indicates that mTanh enables the addition of 3–5 extra layers compared to Tanh and sigmoid, while exhibiting vanishing gradient resistance similar to ReLU. These results highlight the potential of mTanh as a promising activation function for deep learning models, particularly in flexible electronics applications.


1. Introduction

In recent decades, silicon-based electronics have achieved unparalleled performance and integration density compared to alternative electronic technologies. While silicon electronics excel in delivering low-power, high-performance solutions, they are now facing fundamental limitations in further miniaturization, as dictated by Moore’s Law [1]. Additionally, the rigid nature of silicon substrates restricts their adaptability for emerging applications that demand flexibility. Simultaneously, the form factor of electronic devices is evolving from rigid structures to flexible, stretchable, and bendable electronics, particularly for sensing applications designed to interface with biological systems and the natural environment [2,3,4,5]. A key enabler of this new wave of flexible electronics is inkjet-printing technology, which is gaining traction due to its minimal fabrication complexity and low cost. The simplicity of inkjet-printing facilitates rapid design iterations, making it highly suitable for prototyping, while its ease of disposal enhances environmental sustainability. Despite these advantages, inkjet-printed (IJP) electronics face several major challenges, including high operating voltage, lower performance, and limited lifespan. In contrast, silicon-based electronics continue to deliver unmatched performance, reliability, and efficiency, albeit constrained by their rigid substrates and the costly, complex fabrication process. This introduction of IJP electronics at the peak of silicon’s dominance is reminiscent of the transition from vacuum tubes to transistors. Initially, transistors struggled to compete with well-established vacuum tubes, but relentless research and development eventually made them the superior technology. Likewise, continuous advancements in flexible electronics could position them as a viable alternative to silicon in the future, marking a significant shift in the evolution of electronic devices. High flexibility and comfort make IJP electronics an ideal platform for the development of personalized sensors and wearable devices [6,7,8,9]. Despite their advantages, IJP sensors often exhibit lower performance compared to traditional silicon-based electronic sensors. To address this limitation, researchers have increasingly incorporated artificial intelligence (AI) techniques to enhance the functionality of IJP sensors, making them more compatible with conventional silicon-based counterparts [4,10,11,12,13]. Beyond sensor applications, IJP electronics have also been explored for computational devices, including transistors [14,15,16] and memristors for neuromorphic computing [17,18,19,20,21]. Notably, in [22], researchers implemented an artificial neuron using inkjet printing technology, incorporating a hyperbolic sine activation function. However, a significant drawback of this approach was the high operating voltage required for functionality—one of the common limitations of IJP electronic devices.
As an extension of our previous work [23], this work develops an IJP device that operates at a low voltage and exhibits reverse-path I-V characteristics that closely resemble the traditional Tanh function, with additional pinch-off behavior. This device, named mTanh, serves as a custom activation function for neural networks, and its feasibility is demonstrated by predicting the Mackey–Glass time series signal with an Echo State Network. Furthermore, mTanh effectively mitigates the vanishing gradient problem in deep neural networks, enabling classification of the Fashion-MNIST dataset with a deep Multilayer Perceptron (MLP) without compromising model accuracy and allowing several more hidden layers than the Tanh and sigmoid activation functions.
The structure of this paper is organized as follows: Section 2 introduces fundamental concepts related to inkjet-printing technology, activation functions, and the vanishing gradient problem of neural networks. Section 3 presents the development of the IJP activation function, mTanh. The experimental setup and results are detailed in Section 4 and Section 5, respectively. Finally, the conclusion is outlined in Section 6.

2. Preliminaries

This work focuses on developing an IJP nonlinear element that functions as an activation function in neural networks. This activation function is designed to mitigate the vanishing gradient problem commonly encountered in deep neural networks, thereby enhancing their performance. In the preliminary section, the principles of inkjet-printing technology will be introduced, followed by an explanation of the role and significance of activation functions in neural networks. Additionally, the vanishing gradient problem, a major challenge in deep learning, will be discussed in detail.

2.1. Inkjet-Printing Technology

Inkjet-printing is a versatile fabrication technique where liquid material or ink is ejected from a print head in the form of droplets through a nozzle. The deposited material, typically in the form of a chemical solution, solidifies on the substrate after jetting. The resolution of inkjet-printing is influenced by factors such as nozzle spacing, print head design, and the firing frequency of the ink jets. The primary mechanisms used for ink ejection include piezoelectric, thermal, and electrohydrodynamic methods. The inkjet-printing process generally consists of three fundamental steps:
1. pattern design—creating the desired pattern using an editing tool;
2. printing—using an inkjet printer to deposit the ink onto the substrate;
3. curing—solidifying or treating the printed element to ensure stability.
Additional processing steps can be incorporated based on design requirements. These may include spin coating for improved uniformity, plasma oxidation for enhanced ink adhesion, or sonication for better ink processing. Furthermore, a variety of materials, such as silver nanoparticles, hexagonal boron nitride, graphene, and molybdenum disulfide, can be used as inks depending on the intended application.
Inkjet-printing offers several advantages, including minimal fabrication steps, low manufacturing costs, flexibility in substrate selection, and maskless processing, making it a potential competitor of the traditional semiconductor industry. Its cost-effectiveness and streamlined process enable rapid prototyping, multiple design iterations, and large-scale production of printed elements. These benefits position inkjet-printing as a competitive alternative to traditional silicon-based semiconductor manufacturing, particularly for sensors and electronic devices.

2.2. Activation Function

Neural networks are designed to learn the complex relationships within a dataset. The weights and biases in a neural network establish a linear relationship within the data. However, to capture complex patterns and non-linearity, activation functions are essential. These functions enable artificial neurons to transform inputs nonlinearly, allowing the network to learn intricate relationships that a single neuron cannot achieve on its own. Commonly used activation functions include the hyperbolic tangent (Tanh), rectified linear unit (ReLU), sigmoid, and softmax, as shown in Figure 1. In addition to these, various modifications of activation functions are employed for specific tasks in neural networks. The ReLU function outputs the input value directly when it is positive but returns zero for negative inputs. A variant, Leaky ReLU, introduces a small, non-zero slope for negative values, preventing neurons from becoming inactive. The sigmoid function maps inputs to a range between 0 and 1, making it useful for probability-based tasks. Similarly, Tanh outputs values between −1 and 1, offering a centered activation. However, a limitation of both sigmoid and Tanh is their gradual transition from high to low values, which may obscure subtle variations in input data. Other functions, such as the hyperbolic sine function, exhibit similar nonlinear behavior and are used in specific applications to enhance learning dynamics. By incorporating these activation functions, neural networks gain the ability to model highly complex relationships in data, making them powerful tools for machine learning and artificial intelligence.

2.3. Vanishing Gradient Problem

In supervised learning models, the primary objective of the backpropagation algorithm is to optimize the model by minimizing the cost function, typically achieved using the gradient descent method. During the weight and bias update process, one of the key factors is the partial gradient of the cost function. When multiple gradient values are significantly less than one, they lead to a substantial reduction in the weight update magnitude. This phenomenon, known as the vanishing gradient problem [24,25], poses a major challenge in training deep neural networks, as it hinders effective learning in the deeper layers.
Two of the most commonly used activation functions in neural networks are Tanh and sigmoid. Both functions perform effectively in shallow neural networks due to their symmetry around zero, which helps in balanced gradient flow. However, in deep neural networks, their effectiveness is significantly reduced due to the vanishing gradient problem. The Tanh function maps large-scale inputs into the range [−1,1], while the sigmoid function does so in the range [0,1], providing a nonlinear and noise-robust representation. However, when the input falls into the saturation region, both functions yield very small gradient values close to zero. This results in slower updates of weights and biases during training.
The Tanh function and its derivative are represented by Equations (1) and (2), respectively, with their corresponding waveforms illustrated in Figure 2. As observed in Figure 2, the derivative of Tanh attains its maximum value of 1 at x = 0, but diminishes towards zero beyond the range [−3,3].
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (1)$$
$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x) \qquad (2)$$
Similarly, the sigmoid function and its derivative are given by Equations (3) and (4), respectively. From Figure 2, it can be noted that the sigmoid derivative reaches its peak at x = 0, but declines to near zero beyond the range [−5,5]. This leads to the vanishing gradient problem, where the weight updates become increasingly smaller as the network depth increases. Consequently, backpropagation learning slows down exponentially, affecting both weight and bias updates. In extreme cases, weight updates may cease entirely, limiting the learning capability of very deep models and ultimately restricting neural network performance.
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$
$$\frac{d}{dx}\sigma(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \qquad (4)$$
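To make the effect concrete, the short sketch below (a minimal illustration, not taken from the paper) raises a single sigmoid or Tanh derivative factor to the power of the network depth, mimicking how backpropagation multiplies one such factor per layer, and shows how quickly the product shrinks once the pre-activation sits in the saturation region.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # Equation (4)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2   # Equation (2)

x = 2.5  # a moderately saturated pre-activation
for depth in (1, 5, 10, 15):
    # Backpropagation multiplies one derivative factor per layer; the same
    # factor is reused at every layer here to isolate the shrinking effect.
    print(f"depth {depth:2d}: "
          f"sigmoid factor^depth = {d_sigmoid(x) ** depth:.2e}, "
          f"tanh factor^depth = {d_tanh(x) ** depth:.2e}")
```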
Several approaches have been proposed to address the vanishing gradient problem. In 2006, researchers introduced the Rectified Linear Unit (ReLU) [26] as a solution, defining it with an output of zero for negative input values and a slope of one for positive inputs. However, ReLU suffers from the “dying ReLU” problem, where a neuron consistently outputs zero for all inputs during training. This issue arises because the gradient of ReLU is zero for all negative inputs, making it difficult for the affected neurons to recover. If a neuron continuously receives negative gradients, it effectively becomes inactive, or “dead”, during training.
To mitigate this issue, researchers introduced the Leaky ReLU [27], which incorporates a small nonzero slope for negative input values. While this modification prevents neurons from becoming completely inactive, it introduces the problem of non-uniform gradient distribution for negative inputs. A very small slope in the negative region can still lead to the vanishing gradient problem. Moreover, both ReLU and Leaky ReLU are susceptible to the “exploding gradient problem” [28], where gradients become unbounded for large positive input values, leading to instability during training.
To address these challenges, this paper presents a novel IJP nonlinear element that functions as an activation function, effectively mitigating the vanishing gradient problem.

3. Inkjet-Printed Activation Function

This study presents a nonlinear computational element fabricated using inkjet-printing technology. This element is suitable for use as an activation function in neural network models, and the proposed activation function effectively mitigates the vanishing gradient problem. In this section, the nonlinear element is presented along with its fabrication process and I-V characteristics.

3.1. System Architecture

At the center of the non-linear element, there are 13 square blocks, each with 1.5 mm sides, arranged alongside 8 triangular blocks of the same side length along two of its sides. The dimensions and shapes of these square blocks can be modified as needed. All these shapes are evenly spaced, maintaining a 0.2 mm gap between them—a constraint influenced by the printer’s resolution limit due to the ink bleed effect in printing. Extending outward from the central region of the non-linear element, eight connectors are integrated, each incorporating a square block measuring 5 mm in length to facilitate electrical connections. Figure 3 provides a detailed representation of the specific dimensions and structural layout of the non-linear computational element.

3.2. Fabrication Process

The component was initially designed using Microsoft Publisher and subsequently transferred onto a 135 μm thick polyethylene terephthalate (PET) film via a drop-on-demand piezoelectric office-quality inkjet printer employing silver nanoparticle ink. The silver nanoparticle ink used in this study serves as a conductor for the non-linear computing element and is formulated in a noncombustible, eco-friendly aqueous solution. Notably, it is non-toxic to humans and does not require sintering during fabrication. The nanoparticles have an approximate diameter of 20 nm, and the ink’s viscosity is optimized for seamless deposition through inkjet printer nozzles. Following the printing process, the component underwent a curing phase on a hotplate for a specific duration. Subsequently, a hexagonal boron nitride (hBN) layer, serving as a dielectric, was applied at the component’s junction and subjected to further curing using a hot air gun. The deposition of hBN was meticulously executed using a precision pipette; however, minor variations in thickness and uniformity were observed, likely due to manual handling. This material functions as an insulating substrate, which is essential for subsequent fabrication steps. The final stage involved making a precise cut at the junction of the element. A significant challenge in the curing step of inkjet-printing technology is the coffee-ring effect, wherein nanoparticles accumulate at the edges of the dried ink, leading to non-uniform deposition. This inconsistency can impact the reliability and repeatability of the fabricated samples. To mitigate this issue, a well was etched into the hBN layer across the gap.
The flowchart in Figure 3 illustrates the sequential steps of the inkjet-printing process and provides images of the component at various stages of fabrication.

3.3. Characterization

The I-V characteristics of the non-linear computational element were evaluated using a Keithley 2604B Dual Channel Source Measure Unit. The unit performed a voltage sweep from −5 to +5 V, followed by a return sweep to −5 V, with the cycle being repeated twice to ensure the stability of the current response and to examine potential hysteresis effects. The resulting I-V curve for the IJP non-linear computational element is illustrated in Figure 4. The curve distinctly reveals that, within a narrow voltage range of [−1,1] V, the element exhibits a current transition spanning [−0.8,0.8] nA. Notably, the reverse path of the hysteresis loop closely resembles the hyperbolic tangent function, displaying two pinch-off points during the transition. These pinch-offs are a consistent feature of the I-V characteristics, remaining fixed across multiple hysteresis loops, and are attributed to charge trapping in the hBN layer. Recent studies suggest that the observed nonlinearity in the I-V response arises from hopping conduction in the silver nanoparticle ink. This unique electrical behavior underscores the element’s potential application in analog neural networks, as highlighted in the existing literature [29].

3.4. Curve Fitting

The observed backward I-V curve of the element could potentially serve as an activation function for neural networks, offering an alternative to the widely used hyperbolic tangent function. The unique pinch-off characteristics observed during the transition could yield enhanced sensitivity in that region. Figure 4 presents the I-V characteristics of the non-linear element alongside the optimally fitted curve for the proposed custom activation function. To fit the custom activation function to the I-V characteristics of the non-linear element, the mathematical Tanh function is modified with additional constants and coefficients that control its slope and position. In addition, the pinch-offs of the I-V curve are replicated in the custom activation function with Gaussian functions, where the position and spread of each pinch-off are controlled by the mean and variance of the corresponding Gaussian function, and the height of the pinch-off is set by an additional factor in the Gaussian function. The custom activation function is defined by (5).
$$f(x) = t(x) \times g_1(x) \times g_2(x) \qquad (5)$$
where $t(x)$ is the modified Tanh function, and $g_1(x)$ and $g_2(x)$ are the Gaussian functions that replicate the pinch-offs in the activation function.
$$t(x) = a\,\tanh(bx + p) + dx + c \qquad (6)$$
$$g_1(x) = h_1\, e^{-\frac{(x - \mu_1)^2}{2\sigma_1^2}} + m \qquad (7)$$
$$g_2(x) = h_2\, e^{-\frac{(x - \mu_2)^2}{2\sigma_2^2}} + m \qquad (8)$$
Here, the constants a, b, c, d, and p control the position and slope of the Tanh component. The means $\mu_1$ and $\mu_2$ of the Gaussian functions set the positions of the pinch-offs, whereas the variances $\sigma_1$ and $\sigma_2$ control the widths of the pinch-offs, and $h_1$, $h_2$, and m control their height and position.
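A minimal NumPy sketch of Equations (5)–(8) is given below. The coefficient values are illustrative placeholders, not the fitted values (which are not listed in the text); in practice they would be replaced by the constants obtained from the curve fit shown in Figure 4.

```python
import numpy as np

# Illustrative (hypothetical) coefficients; the actual values come from
# fitting Equations (5)-(8) to the measured reverse-path I-V curve.
a, b, c, d, p = 1.0, 1.2, 0.0, 0.02, 0.0
h1, mu1, s1 = -0.4, -0.5, 0.15
h2, mu2, s2 = -0.4, 0.5, 0.15
m = 1.0

def t(x):      # modified Tanh term, Eq. (6)
    return a * np.tanh(b * x + p) + d * x + c

def g1(x):     # first Gaussian pinch-off term, Eq. (7)
    return h1 * np.exp(-(x - mu1) ** 2 / (2 * s1 ** 2)) + m

def g2(x):     # second Gaussian pinch-off term, Eq. (8)
    return h2 * np.exp(-(x - mu2) ** 2 / (2 * s2 ** 2)) + m

def mtanh(x):  # custom activation function, Eq. (5)
    return t(x) * g1(x) * g2(x)
```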

4. Experiment with Neural Networks

In this section, the nonlinear computational element is implemented as a customized activation function, named mTanh, in an Echo State Network to evaluate its feasibility for neural network applications. Additionally, it is integrated into a deep neural network to assess its resistance to the vanishing gradient problem.

4.1. Neural Network Feasibility Study

To check the functionality of the IJP nonlinear element as a customized activation function in the neural network, this nonlinear element is used in an Echo State Network (ESN) as an activation function to predict the Mackey–Glass time series signal from the benchmark time series dataset.

4.1.1. Echo State Network

An ESN is a type of recurrent neural network based on reservoir computing [30,31,32,33,34,35]. An ESN has three layers: an input layer, a reservoir, and an output layer, also known as the readout layer. The input layer and the reservoir transform the data into a high-dimensional space. The weights and biases of the input layer and the reservoir are selected randomly in the initial phase of training and remain constant throughout training. The only trainable layer in an ESN is the output layer, which uses simple learning algorithms such as linear regression. The reservoir and output states are updated according to Equations (9) and (10):
$$x(n+1) = f\bigl(W_{in}\, u(n+1) + W_{res}\, x(n)\bigr) \qquad (9)$$
$$y(n+1) = W_{o}\, x(n+1) \qquad (10)$$
where u(n), x(n), and y(n) are the input, reservoir, and output state vectors, respectively; $W_{in}$, $W_{res}$, and $W_{o}$ are the input, reservoir, and output weight matrices, respectively; and f(·) is the activation function used to introduce nonlinearity in the ESN. ESNs are gaining popularity because of their simple learning process. An ESN predictive model is set up with the help of the PyRCN library [36]. Initially, the baseline hyperparameters of the ESN are established along with the integration of the custom activation function. Prior to its application as an activation function, the I-V characteristics are normalized.
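The reservoir update of Equations (9) and (10) can be written directly in NumPy. The sketch below is a bare-bones illustration rather than the PyRCN configuration used in the paper: it omits leakage, washout, and scaling options, and it assumes a ridge-regression readout as one example of the "simple learning algorithm" mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200

# Input and reservoir weights are drawn once and then kept fixed.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))  # set the spectral radius

def run_reservoir(u_seq, f=np.tanh):
    """Collect reservoir states for an input sequence of shape (T, n_in), Eq. (9)."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = f(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Fit the only trainable weights, the readout W_o of Eq. (10), by ridge regression."""
    A = states.T @ states + ridge * np.eye(n_res)
    return np.linalg.solve(A, states.T @ targets)

# Prediction then follows Eq. (10): y = states @ W_o.
```

Passing the normalized mTanh curve as f(·) in place of np.tanh reproduces the role the custom activation plays in the reservoir.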

4.1.2. Hyperparameter Optimization

Hyperparameter optimization is a crucial process for identifying the most effective combination of hyperparameters to achieve optimal model performance on a given dataset. The two most commonly used strategies for hyperparameter optimization are grid search and random search. In grid search, the hyperparameter space is discretized into a finite number of equally partitioned subregions. The search process systematically evaluates all possible combinations within this grid to determine the most effective hyperparameter values for the model. In contrast, random search selects hyperparameter values randomly from a predefined grid. The model is trained using these randomly chosen combinations, and the most effective set of hyperparameters is identified based on performance metrics. The effectiveness of these optimization methods depends on several factors, including the breadth of the hyperparameter space, model complexity, and available computational resources. The ESN model begins with a predefined set of parameters, including the activation function, hidden layer size, learning rate, and bias scaling. A systematic search is then conducted in three stages to optimize these parameters. In the first stage, the spectral radius and input scaling are explored within a range from 0 to 1.5. The second stage involves a grid search for the optimal leakage parameter within the range from 0 to 1. Finally, in the third stage, a grid search is performed to determine the best bias scaling value within the range from 0 to 1.5. The optimized ESN hyperparameters are presented in Figure 5.
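A hedged sketch of the three-stage search described above is shown below. The helper `build_and_score` is hypothetical: it is assumed to train the ESN with the supplied hyperparameters (falling back to the baseline values for any that are omitted) and return a validation score, since the exact PyRCN calls are not given in the text.

```python
import itertools
import numpy as np

def staged_grid_search(build_and_score):
    """Three-stage grid search following the staged procedure described above."""
    best = {}
    # Stage 1: spectral radius and input scaling, each swept over [0, 1.5].
    grid = np.linspace(0.0, 1.5, 16)
    best['spectral_radius'], best['input_scaling'] = max(
        itertools.product(grid, grid),
        key=lambda pair: build_and_score(spectral_radius=pair[0],
                                         input_scaling=pair[1]))
    # Stage 2: leakage swept over [0, 1] with the stage-1 values fixed.
    best['leakage'] = max(
        np.linspace(0.0, 1.0, 11),
        key=lambda leak: build_and_score(leakage=leak, **best))
    # Stage 3: bias scaling swept over [0, 1.5] with the earlier values fixed.
    best['bias_scaling'] = max(
        np.linspace(0.0, 1.5, 16),
        key=lambda bias: build_and_score(bias_scaling=bias, **best))
    return best
```

Optimizing the parameters in stages keeps the number of trained models linear in the grid sizes instead of exponential, at the cost of ignoring interactions between stages.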

4.1.3. Dataset

To evaluate the feasibility of an IJP nonlinear element as an activation function, an Echo State Network (ESN) is employed to predict the Mackey–Glass time series signal [37]. This signal serves as a benchmark in machine learning and is generated using the Mackey–Glass equation, a nonlinear time-delay equation. The ESN is trained on 990 time frames and tested on 90 time frames. Each time frame comprises 10 values, with the ESN model predicting the 11th value in the time series.
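For reference, the Mackey–Glass series can be generated by integrating the delay differential equation dx/dt = βx(t−τ)/(1 + x(t−τ)ⁿ) − γx(t). The sketch below uses the commonly cited parameters (β = 0.2, γ = 0.1, n = 10, τ = 17) and a simple Euler scheme; these settings are an assumption, since the text does not list the generation parameters, and the windowing mirrors the 10-input/1-output framing described above.

```python
import numpy as np

def mackey_glass(length, tau=17, beta=0.2, gamma=0.1, n=10, dt=1.0, x0=1.2):
    """Generate a Mackey-Glass series with a simple Euler integration scheme."""
    history = int(tau / dt)
    x = np.full(length + history, x0)
    for t in range(history, length + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau ** n) - gamma * x[t])
    return x[history:]

series = mackey_glass(1200)
# Sliding windows of 10 samples predict the 11th, matching the ESN setup above.
X = np.array([series[i:i + 10] for i in range(len(series) - 10)])
y = series[10:]
```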

4.2. Vanishing Gradient Resistance

As is evident from the discussion in Section 2.3, the main cause of the vanishing gradient problem is that the derivative of the activation function approaches zero in its saturation region. The custom activation function mTanh is given by (5), and its derivative can be expressed by the following equation:
$$f'(x) = t'(x)\, g_1(x)\, g_2(x) + t(x)\, g_1'(x)\, g_2(x) + t(x)\, g_1(x)\, g_2'(x) \qquad (11)$$
where $t'(x)$ is the derivative of the modified Tanh function, and $g_1'(x)$ and $g_2'(x)$ are the derivatives of the Gaussian functions used to replicate the pinch-offs in the activation function.
$$t'(x) = a\,b\,\bigl(1 - \tanh^{2}(bx + p)\bigr) + d \qquad (12)$$
$$g_1'(x) = -\frac{x - \mu_1}{\sigma_1^{2}}\, g_1(x) \qquad (13)$$
$$g_2'(x) = -\frac{x - \mu_2}{\sigma_2^{2}}\, g_2(x) \qquad (14)$$
Figure 6 shows the customized activation function mTanh along with its derivative, as obtained from (11). From this figure, it is evident that, in the saturation region of this customized activation function, the derivative remains nonzero, which helps mTanh resist the vanishing gradient problem.
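A quick numerical check of this claim, reusing the hypothetical mtanh sketch from Section 3.4 with its illustrative coefficients: the derivative is approximated by finite differences rather than by Equations (11)–(14), and it stays visibly nonzero in the saturation region where the Tanh derivative has already collapsed.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)
y = mtanh(x)            # hypothetical implementation from the Section 3.4 sketch
dy = np.gradient(y, x)  # finite-difference stand-in for Eq. (11)

# Compare with Tanh in the saturation region |x| > 3.
sat = np.abs(x) > 3.0
print("mean |d mTanh/dx| in saturation:", np.abs(dy[sat]).mean())
print("mean |d tanh/dx|  in saturation:", np.abs(1.0 - np.tanh(x[sat]) ** 2).mean())
```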

4.2.1. Multi-Layer Perceptron (MLP)

To evaluate resistance to the vanishing gradient problem, the mTanh activation function is employed in a deep neural network and compared with conventional activation functions such as Tanh, Sigmoid, and ReLU. An MLP, a widely used feedforward, backpropagation-based, fully connected deep neural network, is utilized for this study. An MLP is structured in layers and is well-suited for nonlinear classification tasks.
An MLP consists of three primary layers: an input layer, a hidden layer, and an output layer. The input layer is responsible for feeding the dataset into the network, with the number of neurons depending on the number of input features. The hidden layer comprises multiple layers of neurons situated between the input and output layers. Each neuron in the hidden layer receives an input from every neuron in the preceding layer and passes its output to all neurons in the next layer. The output layer contains neurons that generate the final output, with the number of neurons determined by the nature of the classification task.
In this study, an MLP model is designed to classify an image dataset with 10 classes. The input layer flattens the images to make them suitable for MLP processing. Each hidden layer consists of 10 neurons, while the output layer comprises 10 neurons to match the number of classes in the dataset. The number of hidden layers is varied from 1 to 13 to examine the resistance to the vanishing gradient problem. The weights and biases of the hidden and output layers are initialized using the Xavier initializer to enhance model efficiency. The Softmax activation function is used in the output layer, while mTanh and other conventional activation functions are applied in the hidden layers to compare performance. Additional hyperparameters of the MLP model are summarized in Table 1.
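A hedged Keras sketch of this MLP is shown below. The framework choice is an assumption (the text does not name one), and `mtanh_tf` refers to a hypothetical TensorFlow re-implementation of the earlier mTanh sketch, written with differentiable TensorFlow operations so gradients can flow through it. The hyperparameters follow Table 1.

```python
import tensorflow as tf

def build_mlp(n_hidden_layers, activation, n_classes=10):
    """MLP as described above: Flatten -> n hidden Dense(10) layers -> Dense(10) softmax."""
    model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28))])
    for _ in range(n_hidden_layers):
        model.add(tf.keras.layers.Dense(
            10, activation=activation,
            kernel_initializer='glorot_uniform'))   # Xavier initialization
    model.add(tf.keras.layers.Dense(
        n_classes, activation='softmax',
        kernel_initializer='glorot_uniform'))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Depth is swept from 1 to 13 hidden layers, e.g. build_mlp(8, 'tanh')
# versus build_mlp(8, mtanh_tf).
```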

4.2.2. Dataset

To evaluate the vanishing gradient resistance, a two-dimensional signal in the form of an image dataset is used in the MLP. In this study, the publicly available Fashion MNIST dataset is utilized. It consists of 60,000 training images and 10,000 testing images, all of which are grayscale images with a resolution of 28 × 28 pixels. The dataset contains 10 classes of fashion apparel.

5. Results and Discussion

For the first experiment, an Echo State Network (ESN) was designed to assess the feasibility of mTanh as an activation function in neural networks by predicting the Mackey–Glass time series signal. Figure 7 shows the original signal along with the predicted signal from the ESN designed with mTanh. To evaluate the regression performance of the ESN, the following two performance metrics were used:
Mean Squared Error (MSE): This is a widely used metric for evaluating how well a model predicts continuous values by computing the average of the squared differences between actual target values and predicted values generated by the network. A lower MSE value indicates better model performance, as it signifies that the predictions are closer to the actual values.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where
  • n is the total number of data points;
  • $y_i$ represents the actual observed value;
  • $\hat{y}_i$ represents the predicted value.
R-Squared Score: Also known as the coefficient of determination, this metric measures how well the model fits the data. It ranges from 0 to 1, with higher values indicating a better fit.
$$R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}$$
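Both metrics are available in scikit-learn; a short example is shown below, with placeholder arrays standing in for the actual and ESN-predicted values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Placeholder arrays standing in for the actual and predicted Mackey-Glass values.
y_true = np.array([0.90, 1.10, 1.20, 1.00])
y_pred = np.array([0.88, 1.12, 1.18, 1.02])

print("MSE:", mean_squared_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))
```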
To compare the performance of mTanh with other commonly used activation functions, such as ReLU, Leaky ReLU, and Tanh, a comparative performance analysis was conducted. In this analysis, all hyperparameters of the ESN were kept constant, and only the reservoir activation function was varied among mTanh, Tanh, ReLU, and Leaky ReLU.
In Table 2, the performance comparison is presented. From the table, it is evident that ReLU performs best in predicting the Mackey–Glass time series signal. Although mTanh does not yield the best results for this prediction task, it demonstrates comparable performance with other activation functions. This suggests that mTanh is a feasible activation function for neural networks.
For the second experiment, the vanishing gradient resistance property of mTanh was evaluated in a deep neural network. A generalized MLP was used to design the network, where the number of hidden layers was gradually increased to deepen the model. The test accuracy was then measured for MLPs with different numbers of hidden layers. A similar experiment was conducted using other activation functions, including Tanh, Sigmoid, and ReLU, for comparison. The results of this experiment are presented in Figure 8.
From this figure, it is evident that, after Layer 8, the Sigmoid function begins to exhibit the vanishing gradient problem (VGP), while for Tanh, the VGP starts after Layer 11. However, mTanh maintains a consistent test accuracy up to Layer 13, demonstrating behavior similar to ReLU, which also shows resilience to the VGP. Although the overall accuracy of mTanh is lower compared to other activation functions, its ability to resist VGP is comparable to ReLU. This property of mTanh enables the MLP model to accommodate five additional layers compared to Sigmoid and three additional layers compared to Tanh, making it a promising activation function for deeper networks.

6. Conclusions

This work presents the design of an IJP device exhibiting nonlinear I-V characteristics within an operating voltage range from −5 to 5 V. The reverse path of this nonlinear I-V curve closely resembles the shape of the hyperbolic tangent function, which is widely used as an activation function in neural networks. Based on this reverse path, the mTanh activation function is formulated, making it a feasible activation function for neural networks while also demonstrating resistance to the vanishing gradient problem in deep neural networks. To evaluate its effectiveness as an activation function, mTanh was implemented in an Echo State Network to predict the Mackey–Glass time series signal. Its performance was then compared with conventional activation functions, including Tanh, ReLU, and Leaky ReLU. Although mTanh did not achieve the highest accuracy in predicting the Mackey–Glass time series signal, it demonstrated comparable performance to other activation functions. Furthermore, mTanh’s resistance to the vanishing gradient problem was examined through a Fashion MNIST image classification task using an MLP model. Its performance was compared with Tanh, Sigmoid, and ReLU activation functions. Notably, mTanh enables the addition of 3–5 extra hidden layers compared to Tanh and sigmoid without compromising model accuracy. While its accuracy is lower than that of other activation functions, mTanh exhibits vanishing gradient resistance similar to ReLU in deep neural networks. Additionally, the low operating voltage of mTanh makes it a promising candidate for the flexible electronics implementation of neural networks.

Author Contributions

Conceptualization, S.A. and M.R.H.; methodology, S.A. and M.R.H.; software, S.A.; validation, S.A. and M.R.H.; formal analysis, S.A.; investigation, S.A.; resources, S.A. and M.R.H.; data curation, S.A.; writing—original draft preparation, S.A.; writing—review and editing, S.A. and M.R.H.; supervision, M.R.H.; project administration, M.R.H.; funding acquisition, M.R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the USA National Science Foundation (NSF) under Grant no. ECCS-2430440. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Data Availability Statement

Data are available upon request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
VGP   Vanishing Gradient Problem
MLP   Multi-Layer Perceptron
ESN   Echo State Network
ReLU  Rectified Linear Unit
hBN   Hexagonal Boron Nitride
PET   Polyethylene Terephthalate
IJP   Inkjet-Printed

References

  1. IEEE. Executive Summary. In Proceedings of the IEEE International Roadmap for Devices and Systems; IEEE: New York, NY, USA, 2022. [Google Scholar] [CrossRef]
  2. Gu, C.; Jia, A.B.; Zhang, Y.M.; Zhang, S.X. Emerging Electrochromic Materials and Devices for Future Displays. Chem. Rev. 2022, 122, 14679–14721. [Google Scholar] [CrossRef]
  3. Katiyar, A.K.; Hoang, A.T.; Xu, D.; Hong, J.; Kim, B.J.; Ji, S.; Ahn, J.H. 2D Materials in Flexible Electronics: Recent Advances and Future Prospectives. Chem. Rev. 2024, 124, 318–419. [Google Scholar] [CrossRef]
  4. Sun, T.; Feng, B.; Huo, J.; Xiao, Y.; Wang, W.; Peng, J.; Li, Z.; Du, C.; Wang, W.; Zou, G.; et al. Artificial Intelligence Meets Flexible Sensors: Emerging Smart Flexible Sensing Systems Driven by Machine Learning and Artificial Synapses. Nano-Micro Lett. 2024, 16, 14. [Google Scholar] [CrossRef] [PubMed]
  5. Torres, S.G. Inkjet Printing Next-Generation Flexible Devices: Memristors, Photodetectors and Perovskite LEDs. Ph.D. Thesis, Universitat de Barcelona, Departament d’Enginyeria Electrònica i Biomèdica, Barcelona, Spain, 2024. [Google Scholar]
  6. Dankoco, M.; Tesfay, G.; Bènevent, E.; Bendahan, M. Temperature sensor realized by inkjet printing process on flexible substrate. Mater. Sci. Eng. B 2016, 205, 1–5. [Google Scholar] [CrossRef]
  7. Wang, C.T.; Huang, K.Y.; Lin, D.T.; Liao, W.C.; Lin, H.W.; Hu, Y.C. A flexible proximity sensor fully fabricated by inkjet printing. Sensors 2010, 10, 5054–5062. [Google Scholar] [CrossRef]
  8. Kim, K.; Jung, M.; Kim, B.; Kim, J.; Shin, K.; Kwon, O.S.; Jeon, S. Low-voltage, high-sensitivity and high-reliability bimodal sensor array with fully inkjet-printed flexible conducting electrode for low power consumption electronic skin. Nano Energy 2017, 41, 301–307. [Google Scholar] [CrossRef]
  9. Abdolmaleki, H.; Haugen, A.B.; Merhi, Y.; Nygaard, J.V.; Agarwala, S. Inkjet-printed flexible piezoelectric sensor for self-powered biomedical monitoring. Mater. Today Electron. 2023, 5, 100056. [Google Scholar] [CrossRef]
  10. Akter, S.; Haider, M.R. Impact Localization in Inkjet-Printed Tactile Grid Sensor with Echo State Network. In Proceedings of the 2024 IEEE 67th International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, MA, USA, 11–14 August 2024; pp. 1070–1074. [Google Scholar]
  11. Gao, J.; Zhu, X.; Fu, Z.; Zhang, W.; Sun, D.; Gu, W. A grid-less flexible tactile sensing system based on deep neural network for two-point localization and shape recognition. IEEE Sens. J. 2024, 24, 18259–18266. [Google Scholar] [CrossRef]
  12. Hou, Y.; Wang, L.; Sun, R.; Zhang, Y.; Gu, M.; Zhu, Y.; Tong, Y.; Liu, X.; Wang, Z.; Xia, J.; et al. Crack-across-pore enabled high-performance flexible pressure sensors for deep neural network enhanced sensing and human action recognition. ACS Nano 2022, 16, 8358–8369. [Google Scholar] [CrossRef]
  13. Akter, S.; Islam, S.; Haider, M.; Opu, M.; Gardner, S.; Pullano, S. A Low-Cost Flexible Inkjet-Printed Echo State Network for Impact Localization. In Proceedings of the 2024 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Eindhoven, The Netherlands, 26–28 June 2024; pp. 1–5. [Google Scholar]
  14. Conti, S.; Lai, S.; Cosseddu, P.; Bonfiglio, A. An Inkjet-Printed, Ultralow Voltage, Flexible Organic Field Effect Transistor. Adv. Mater. Technol. 2017, 2, 1600212. [Google Scholar] [CrossRef]
  15. Molina-Lopez, F.; Gao, T.Z.; Kraft, U.; Zhu, C.; Öhlund, T.; Pfattner, R.; Feig, V.R.; Kim, Y.; Wang, S.; Yun, Y.; et al. Inkjet-printed stretchable and low voltage synaptic transistor array. Nat. Commun. 2019, 10, 2676. [Google Scholar] [CrossRef] [PubMed]
  16. Grubb, M.P.; Subbaraman, H.; Park, S.; Akinwande, D.; Chen, R.T. Inkjet Printing of High Performance Transistors with Micron Order Chemically Set Gaps. Sci. Rep. 2017, 7, 1202. [Google Scholar] [CrossRef] [PubMed]
  17. Adry, T.Z.; Akter, S.; Eliza, S.; Gardner, S.D.; Haider, M.R. An Inkjet-Printed Flexible Memristor Device for Echo State Networks. In Proceedings of the 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Knoxville, TN, USA, 1–3 July 2024; pp. 740–744. [Google Scholar]
  18. Zhu, K.; Vescio, G.; Gonzalez-Torres, S.; Lopez-Vidrier, J.; Frieiro, J.L.; Pazos, S.; Jing, X.; Gao, X.; Wang, S.D.; Ascorbe-Muruzabal, J.; et al. Inkjet-printed h-BN memristors for hardware security. Nanoscale 2023, 15, 9985–9992. [Google Scholar] [CrossRef] [PubMed]
  19. Franco, M.; Kiazadeh, A.; Deuermeier, J.; Lanceros-Méndez, S.; Martins, R.; Carlos, E. Inkjet printed IGZO memristors with volatile and non-volatile switching. Sci. Rep. 2024, 14, 7469. [Google Scholar] [CrossRef]
  20. Franco, M.; Kiazadeh, A.; Martins, R.; Lanceros-Méndez, S.; Carlos, E. Printed Memristors: An Overview of Ink, Materials, Deposition Techniques, and Applications. Adv. Electron. Mater. 2024, 10, 2400212. [Google Scholar] [CrossRef]
  21. Hu, H.; Scholz, A.; Liu, Y.; Tang, Y.; Marques, G.C.; Aghassi-Hagmann, J. A Fully Inkjet-Printed Unipolar Metal Oxide Memristor for Nonvolatile Memory in Printed Electronics. IEEE Trans. Electron. Dev. 2023, 70, 3051–3056. [Google Scholar] [CrossRef]
  22. Gardner, S.D.; Haider, M.R. An inkjet-printed artificial neuron for physical reservoir computing. IEEE J. Flex. Electron. 2022, 1, 185–193. [Google Scholar] [CrossRef]
  23. Akter, S.; Haider, M.R. A Low-Cost Minimally-Processed Inkjet-Printed Nonlinear Element for Reservoir Computing. In Proceedings of the 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Knoxville, TN, USA, 1–3 July 2024; pp. 463–468. [Google Scholar]
  24. Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 1998, 6, 107–116. [Google Scholar] [CrossRef]
  25. Philipp, G.; Song, D.; Carbonell, J.G. The exploding gradient problem demystified-definition, prevalence, impact, origin, tradeoffs, and solutions. arXiv 2018, arXiv:1712.05577v4. [Google Scholar]
  26. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
  27. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML, Atlanta, GA, USA, 17–19 June 2013; Volume 30, p. 3. [Google Scholar]
  28. Pascanu, R. Understanding the exploding gradient problem. arXiv 2012, arXiv:1211.5063. [Google Scholar]
  29. Chen, T.; van Gelder, J.; van de Ven, B.; Amitonov, S.V.; De Wilde, B.; Ruiz Euler, H.C.; Broersma, H.; Bobbert, P.A.; Zwanenburg, F.A.; van der Wiel, W.G. Classification with a disordered dopant-atom network in silicon. Nature 2020, 577, 341–345. [Google Scholar] [CrossRef]
  30. Jaeger, H. The “echo state” approach to analysing and training recurrent neural networks-with an erratum note. Bonn Ger. Ger. Natl. Res. Cent. Inf. Technol. Gmd Tech. Rep. 2001, 148, 13. [Google Scholar]
  31. Gauthier, D.J.; Bollt, E.; Griffith, A.; Barbosa, W.A.S. Next generation reservoir computing. Nat. Commun. 2021, 12, 5564. [Google Scholar] [CrossRef] [PubMed]
  32. Kong, L.W.; Brewer, G.A.; Lai, Y.C. Reservoir-computing based associative memory and itinerancy for complex dynamical attractors. Nat. Commun. 2024, 15, 4840. [Google Scholar] [CrossRef]
  33. Sun, J.; Li, L.; Peng, H. An image classification method based on Echo State Network. In Proceedings of the 2021 International Conference on Neuromorphic Computing (ICNC), Wuhan, China, 15–17 October 2021. [Google Scholar] [CrossRef]
  34. Wang, S.; Li, Y.; Wang, D.; Zhang, W.; Chen, X.; Dong, D.; Wang, S.; Zhang, X.; Lin, P.; Gallicchio, C.; et al. Echo state graph neural networks with analogue random resistive memory arrays. Nat. Mach. Intell. 2023, 5, 104–113. [Google Scholar] [CrossRef]
  35. Zhong, Y.; Tang, J.; Li, X.; Liang, X.; Liu, Z.; Li, Y.; Yao, P.; Hao, Z.; Gao, B.; Qian, H.; et al. Memristor-based fully analog reservoir computing system for power-efficient real-time spatiotemporal signal processing. Nat. Electron. 2022, 5, 1–10. [Google Scholar] [CrossRef]
  36. Steiner, P.; Jalalvand, A.; Stone, S.; Birkholz, P. PyRCN: A toolbox for exploration and application of Reservoir Computing Networks. Eng. Appl. Artif. Intell. 2022, 113, 104964. [Google Scholar] [CrossRef]
  37. Glass, L.; Mackey, M. Mackey-glass equation. Scholarpedia 2010, 5, 6908. [Google Scholar] [CrossRef]
Figure 1. Most commonly used activation functions including the hyperbolic tangent, sigmoid, ReLU, and Leaky ReLU.
Figure 2. Activation functions and their derivatives.
Figure 3. The fabrication process of the IJP nonlinear element is shown in a flowchart along with (a) all relevant dimensions and microscopic images of the nonlinear element; (b) the ink bleed effect causing uneven edges of the nonlinear element; (c) micro-level hollows created by the printer’s resolution; and (d) a sharp cut with a coffee-ring effect.
Figure 4. I-V characteristics of the non-linear computation element and the best-fitted activation function. A voltage sweep of two cycles confirms the hysteresis properties of the element.
Figure 5. Echo state network architecture along with all the optimized hyperparameters in every layer.
Figure 6. Customized activation function mTanh along with its derivative.
Figure 7. Original and predicted signal for the Mackey–Glass time series signal from the ESN designed with the mTanh activation function.
Figure 8. Vanishing gradient resistance checked by adding multiple layers to a multi-layer perceptron.
Table 1. Hyperparameters for the MLP.
Optimizer: Adam
Loss: Categorical cross-entropy
Learning rate: 0.005
Batch size: 10
Validation split: 0.2
Table 2. Regression performance comparison of activation functions.
Activation Function | R² Score | Mean Squared Error
ReLU | 0.993 | 1.4 × 10⁻³
Leaky ReLU | 0.995 | 1.02 × 10⁻³
Tanh | 0.989 | 2.52 × 10⁻³
mTanh | 0.986 | 3.2 × 10⁻³