cBP-Tnet: Continuous Blood Pressure Estimation Using Multi-Task Transformer Network with Automatic Photoplethysmogram Feature Extraction

Pimentel, Angelino A.; Huang, Ji-Jer; See, Aaron Raymond A.

doi:10.3390/app15147824


by Angelino A. Pimentel ^1,2,†, Ji-Jer Huang ^1,*,† and Aaron Raymond A. See ^3,*,†

¹ Department of Electrical Engineering, Southern Taiwan University of Science and Technology (STUST), Tainan City 710301, Taiwan

² Department of Electronics Engineering, Saint Mary’s University (SMU), Bayombong 3700, Nueva Vizcaya, Philippines

³ Department of Electronics Engineering, National Chin-Yi University of Technology (NCUT), Taichung City 411030, Taiwan

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2025, 15(14), 7824; https://doi.org/10.3390/app15147824 (registering DOI)

Submission received: 12 June 2025 / Revised: 25 June 2025 / Accepted: 8 July 2025 / Published: 12 July 2025


Abstract

Traditional cuff-based blood pressure (BP) monitoring methods provide only intermittent readings, while invasive alternatives pose clinical risks. Recent studies have demonstrated the feasibility of estimating continuous, non-invasive, cuff-less BP using photoplethysmogram (PPG) signals alone. However, existing approaches rely on complex manual feature engineering and/or multiple model architectures, resulting in a large number of training epochs and limited performance. This research proposes cBP-Tnet, an efficient single-channel, single-model multi-task Transformer network designed for automatic PPG feature extraction. cBP-Tnet employs specialized hyperparameters, integrates adaptive Kalman filtering, outlier elimination, signal synchronization, and data augmentation, and leverages multi-head self-attention and multi-task learning strategies to identify subtle and shared waveform patterns associated with systolic blood pressure (SBP) and diastolic blood pressure (DBP). We used the MIMIC-II public dataset (500 patients with 202,956 samples) for experimentation. Results showed mean absolute errors of 4.32 mmHg for SBP and 2.18 mmHg for DBP. For the first time, both SBP and DBP meet the Association for the Advancement of Medical Instrumentation’s international standard (<5 mmHg, >85 subjects). Furthermore, the network reduces the number of training epochs by 13.67% compared to other deep learning methods. These results establish cBP-Tnet’s potential for integration into wearable and home-based healthcare devices for continuous, non-invasive, cuff-less blood pressure monitoring.

Keywords:

non-invasive continuous cuff-less blood pressure estimation; photoplethysmogram; multi-task learning transformer; automatic PPG feature extraction; deep learning

1. Introduction

High blood pressure (hypertension) is a prominent cause of death and disability worldwide. Between 1990 and 2019, the number of people with hypertension (blood pressure of ≥140 mmHg systolic or ≥90 mmHg diastolic) or on medication increased from 650 million to 1.3 billion [1]. Unfortunately, most patients with hypertension are unaware of their condition, although it silently affects their internal organs (e.g., brain, eyes, kidneys, and viscera), which is why it is known as a silent killer [2]. Therefore, accurate, continuous, beat-to-beat blood pressure estimation is crucial for preventing heart disease and improving human health.

Traditional blood pressure measurement comprises cuff-based readings and invasive continuous monitoring. Cuff-based measurements can be heavily influenced by factors such as operator skill, cuff size, measurement setup, and human error [3]. Furthermore, they provide only a single reading at a specific moment, which makes them poorly suited to tracking blood pressure fluctuations over time. Blood pressure (BP) can fluctuate significantly due to various factors, such as diet, exercise, mental state, and stress [4]. Continuous blood pressure monitoring, on the other hand, allows a more accurate assessment of a patient’s blood pressure state, including nocturnal changes and variations during exercise. This allows doctors to assess patients’ conditions more accurately and offer more personalized treatment. It also enables the early detection of BP changes and timely intervention to limit the progression of hypertension, which is particularly crucial for preventing cardiovascular and cerebrovascular events and lowering disease risks [5]. Nonetheless, current continuous blood pressure monitoring techniques usually require invasive devices, such as intravascular catheters, which are prone to infection and difficult to handle [6]. These limitations of traditional techniques in practical applications call for a non-invasive, continuous, and accurate method.

More researchers are investigating continuous blood pressure estimation techniques based on biosignals such as photoplethysmograms (PPGs), electrocardiograms (ECGs), and ballistocardiograms (BCGs) as a result of recent developments in sensor technology, computer science, and artificial intelligence. Compared to traditional blood pressure measurement techniques, these systems, which use Pulse Transit Time (PTT), Pulse Wave Velocity (PWV), and other manual feature extraction techniques, offer various benefits, including non-invasiveness, real-time continuous monitoring, and user-friendliness [7,8,9]. Studies that combine ECG and PPG data have made substantial gains, demonstrating remarkable precision in blood pressure measurements [10]. However, there are drawbacks to monitoring ECG signals: prolonged use of patch electrodes on the skin can limit airflow and cause discomfort for users [11]. Moreover, blood pressure measurement systems that rely on multiple biological signals, including PPG, BCG, and ECG, encounter challenges with data synchronization, information fusion, complex implementation, increased development costs, and limited noise resistance [12]. In comparison, because the PPG waveform is normally recorded at the finger, it provides more consistent estimates than multi-channel signal acquisition, making it well suited to long blood pressure estimation trials. The PPG signal is a simple and inexpensive way to depict volumetric changes in blood flow with each heartbeat. It is measured with an oximeter that illuminates the skin, and the reflected light is directly related to variations in blood volume. The PPG signal is a viable option for blood pressure estimation in resource-limited environments owing to its favorable inference-to-efficiency ratio [13]. Although systems based solely on a single PPG signal provide a more straightforward method by avoiding the intricacies and problems linked with several sensors, machine learning and some deep learning methods rely on sophisticated manual feature extraction [14,15], which requires professional knowledge and experience. A fully automatic PPG feature extraction technique based on deep learning is therefore needed to further simplify data collection and processing and to eliminate manual feature extraction that depends on expert domain knowledge [5,16,17].

This research describes a unique approach for continuous blood pressure measurement using an efficient single-model cBP-Tnet multi-task learning Transformer network with automated PPG feature extraction. cBP-Tnet enhances the accuracy and robustness of non-invasive continuous blood pressure estimation using large-scale Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) public datasets [18,19]. The proposed cBP-Tnet method for the cuff-less estimation of blood pressure has the potential to be integrated and enable non-invasive continuous monitoring via home and mobile healthcare devices. The key contributions of this work are summarized below:

a. To date, cBP-Tnet is the only deep learning method with automatic photoplethysmogram feature extraction [5,16,17] for which both systolic (MAE: 4.32 mmHg) and diastolic (MAE: 2.18 mmHg) blood pressure estimates are acceptable according to the Association for the Advancement of Medical Instrumentation (AAMI) international standard (<5 mmHg, >85 subjects) [20].
b. cBP-Tnet has an efficient single-channel, single-model design, whereas recent deep learning methods for continuous non-invasive blood pressure monitoring [5,16,17] are hybrid and/or complex in design and need multiple models to operate.
c. cBP-Tnet requires 13.67% fewer training epochs and produces better, AAMI-acceptable results compared to a recent study [5] in the field.

The rest of this work is organized as follows: Section 2 reviews current AI methods for continuous non-invasive blood pressure measurement with a single-channel PPG. Section 3 covers the pipeline and the proposed cBP-Tnet single-channel, single-model deep learning architecture for blood pressure estimation. Section 4 presents the experimental findings and compares them to the AAMI international standard and other recent publications, and Section 5 concludes the research.

2. Related Works

Existing AI methods for continuous cuff-less non-invasive blood pressure measurement with a single-channel PPG fall into two types: classical machine learning and deep learning methods. Classical machine learning approaches manually identify features from raw PPG data and then fit a regression model to estimate diastolic and systolic blood pressure [15]. For example, one of the first studies of BP measurement using PPG signals only was undertaken by Teng and Zhang in 2003 [21], who investigated the association between Arterial Blood Pressure (ABP) and specific PPG waveform characteristics. Data were obtained from 15 healthy young subjects using specialized equipment in a well-controlled environment with no movement, complete silence, and constant temperature. Correlation analysis was used for feature selection, and the relationship was modeled with linear regression. In another work, Kurylyak et al. in 2013 [22] extracted 21 features from more than 15,000 cardiac training samples in the MIMIC database. These were used as input features for an ANN that estimated blood pressure from PPG data, a strategy that proved better than linear regression. Liu et al. in 2017 [23] combined 14 features from the second derivative of PPG (SDPPG) with the 21 features from [22] and used an SVM as the BP estimator. Compared to the 21 features and ANN of [22], their strategy improved BP estimation accuracy by 40%. A fundamental issue for systems that combine manual features with classical machine learning is that they need high-quality PPG waveforms to detect feature points reliably. However, PPG waveforms captured by wearable devices frequently contain strong motion artifacts and noise, making it hard to detect crucial feature points such as troughs, inflection points, peaks, and dicrotic notches. As a result, the feature series built from these points contains significant errors. Furthermore, generating and selecting manual features requires professional expertise and experience, as well as repeated attempts to identify the ideal feature set, which limits practical applications.

With the improvement in processing capacity, the focus has shifted to automated feature extraction from PPG waveforms using deep learning models [15]. Deep learning models, unlike classical machine learning models, excel at learning features directly from raw data, removing the requirement for manual feature engineering and capitalizing on the latent information contained in the original signals. Slapničar et al. in 2019 [16] proposed a residual neural network (ResNet) model that used a recurrent (RNN) and convolutional neural network (CNN) to interpret the frequency- and time-domain details of PPG signals and then combined the extracted features to estimate SBP and DBP. Their analyses of 510 participants from the MIMIC III database yielded an MAE of 6.88 mmHg and 9.43 mmHg for DBP and SBP measurements, respectively. However, their ResNet model needs a very long training schedule of 10,000 epochs, making it prone to overfitting. Additionally, both DBP and SBP failed to meet the accuracy requirement of the AAMI international standard that the MAE must be less than 5 mmHg [20]. In another study, Rong and Li in 2021 [17] introduced a multi-type feature fusion (MTFF) neural network model for blood pressure (BP) estimation using PPG only. The model consists of two CNNs that learn the morphological and frequency spectrum aspects of the PPG signal, as well as a bidirectional long short-term memory (BiLSTM) network that learns its temporal features. The MAE of the MTFF-ANN was 5.59 mmHg for SBP and 3.36 mmHg for DBP. However, MTFF-ANN involves a complex feature extraction process and a small sample size of only 11,546 samples, which is not enough for deep learning methodologies; thus, it cannot be generalized. Furthermore, SBP failed to meet the accuracy requirement of the AAMI international standard. Recently, Dai et al. in 2024 [5] proposed a continuous blood pressure estimation model that joins the convolutional block attention module (CBAM) with the temporal convolutional network (TCN). The MAE of SBP and DBP estimation using their TCN-CBAM algorithm is 5.35 mmHg and 2.12 mmHg, respectively, outperforming previous deep learning models based on convolutional and recurrent neural network architectures. However, although TCN-CBAM is more efficient than [16], it still needs a long training schedule of 1500 epochs. It also has a complex, hybrid deep learning architecture that uses two models. In addition, SBP still failed to meet the accuracy requirement of the AAMI international standard.

Thus, none of the three current deep learning models with automatic PPG feature extraction (ResNet, MTFF-ANN, and TCN-CBAM) met the AAMI accuracy criteria for both SBP and DBP estimation [20]. Moreover, previous studies relied on complex automatic feature extraction [17], hybrid models [5], and long training times [16], limiting the potential of the resulting AI models to be integrated into home and mobile healthcare devices for non-invasive continuous monitoring. This study explores Transformer architectures, which have recently proven to be extremely effective for sequential data modeling by leveraging self-attention mechanisms to capture both local and global dependencies without the vanishing gradient issues of RNNs or the limited receptive fields of CNNs [24]. Because Transformers can capture both local waveform features and global temporal context, they are well suited to continuous, non-invasive blood pressure estimation based on sequential PPG data.

3. Materials and Methods

The cBP-Tnet pipeline, as shown in Figure 1, employs a multi-task learning Transformer-based architecture for continuous non-invasive blood pressure estimation. Transformers, which were originally developed for natural language processing, have proven useful for a variety of sequence modeling challenges due to their ability to capture long-range relationships through self-attention [24]. In this research, the input sequences are derived from a physiological signal, the PPG. In classical machine learning contexts, a model is typically trained to perform a single prediction task. For example, one may use a regression model to estimate only SBP without DBP, or vice versa. However, blood pressure measurement tasks are closely related physiologically. Systolic blood pressure (SBP) and diastolic blood pressure (DBP) are not independent; they result from the same underlying circulatory dynamics. Multi-task learning (MTL) takes advantage of this relatedness [25]. By training a single model to predict both SBP and DBP simultaneously, the researchers aim to improve model generalization by sharing representations and constraints between tasks. Moreover, they aim to remove the need for individual models, potentially increasing estimation accuracy for both SBP and DBP. cBP-Tnet further uses adaptive Kalman filters to preprocess 202,956 synchronized PPG/ABP samples from 500 randomly selected subjects in the Multi-parameter Intelligent Monitoring in Intensive Care II (MIMIC II) database [26], and it automatically extracts features from raw and derived PPG signals (PPG′ and PPG″), reducing the need for manual feature engineering. Outliers were removed, and the data were stratified for training (80%), validation (10%), and testing (10%). Signal augmentation and normalization improve robustness, while the multi-task learning Transformer model incorporates temporal relationships to predict both systolic and diastolic blood pressure simultaneously. The pipeline was tested against an international clinical benchmark, the Association for the Advancement of Medical Instrumentation (AAMI) standard (<5 mmHg, >85 subjects), confirming accurate, real-time, non-invasive blood pressure measurement and its potential to be integrated into home and mobile healthcare devices for non-invasive continuous monitoring.

Additionally, the pseudocode below, in Table 1, further explains the cBP-Tnet workflow implemented in JupyterLab (Version: 4.0.11). Initially, random seeds are fixed and PPG/ABP CSV data are loaded, followed by signal smoothing with an adaptive Kalman filter. Next, filtered PPG/ABP pairs are synchronized and PPG peaks are detected. For each detected beat, a fixed-length PPG segment plus its first and second derivatives are concatenated into a feature vector. The corresponding SBP and DBP targets are obtained as the maximum and minimum of the aligned ABP segment. Out-of-range samples (SBP ∉ [80, 180] mmHg or DBP ∉ [60, 130] mmHg) are deleted, and the remaining data are stratified and divided into training, validation, and test sets. Features are then standardized, PyTorch (Version: 2.5.1) DataLoaders are created, and a multi-task Transformer model is trained with a OneCycleLR scheduler and L1 loss; the best weights are saved. Finally, the model is evaluated on the validation and test sets by calculating MAE, correlation coefficient, AAMI, and BHS metrics, and it is then converted to TorchScript for edge deployment, with test set MAE comparisons being reported.

3.1. MIMIC II Dataset Loading and Preprocessing

The PhysioNet Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) II Waveform database [26] contains recordings of numerous physiological signals and parameters from ICU patients. The data were gathered at Boston’s Beth Israel Deaconess Medical Center (BIDMC) over a seven-year period beginning in 2001. MIMIC II version 2.6 had roughly 33,000 patients, with 25,000 adults (age > 15 years at time of last hospitalization) and 8000 newborns (age ≤ 1 month at time of initial admission) [27]. The researchers experimented on a subset of Kachuee et al.’s MIMIC II-derived cleaned cuff-less blood pressure estimation dataset [19]. From the cleaned dataset, the researchers loaded 202,956 PPG/ABP samples from 500 randomly selected patient records at 125 Hz. Before training the multi-task Transformer, the raw signals (PPG and ABP) were filtered with an adaptive Kalman filter [28] to decrease noise and then aligned in time using cross-correlation.

The adaptive Kalman filter starts with a state estimate of 0, an error covariance of 1, a process variance of 1 × 10⁻⁵, and a measurement variance of 1 × 10⁻². At each time point, the algorithm first projects the state forward and multiplies the covariance by the current process variance. It then calculates the Kalman gain, which is the ratio of the projected covariance to the sum of the projected covariance and the measurement variance, and it applies that gain to incorporate the new measurement into the state estimate, updating the covariance accordingly. The filtered output for the sample is the updated state. Finally, the squared difference between the measurement and the prior prediction is used to adapt both variances by exponential smoothing: each is multiplied by 0.99 and then incremented by 0.01 times the squared residual, allowing the filter to gradually adjust its confidence in the model versus the measurements.

Since PPG and ABP signals may not be perfectly aligned in time, cross-correlation was used to determine the lag between them. The synchronization function first converts each signal to a one-dimensional NumPy array and subtracts its mean before computing the full-length cross-correlation using np.correlate(…, mode='full'). It calculates the lag by finding the index of the maximum correlation and subtracting (len(PPG) − 1). If the lag is positive, it discards that many initial samples from the ABP signal and truncates the PPG signal to the same remaining length. If the lag is negative, it discards |lag| samples from the start of the PPG signal and truncates the ABP signal accordingly. If the lag is zero, both signals remain unchanged. Finally, it returns a pair of synchronized, equal-length signal segments. These preprocessing stages produce cleaner, better-aligned inputs, allowing the multi-task Transformer to focus on relevant patterns rather than being confused by noise or misalignment. We synchronized the signals for model training since the model attempts to map the non-invasive PPG to a continuous estimation of BP that resembles the ABP waveform [15].
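The filtering and synchronization steps described above can be sketched in NumPy as follows. This is a minimal illustration and not the authors’ code: the function names, the argument order passed to np.correlate, and the final length guard are assumptions, and the prediction step uses the textbook additive covariance update (P + Q), which may differ from the authors’ implementation.

```python
import numpy as np


def adaptive_kalman_filter(signal, q=1e-5, r=1e-2, alpha=0.99):
    """Scalar adaptive Kalman filter using the initial values given in the text."""
    x, p = 0.0, 1.0  # initial state estimate and error covariance
    out = np.empty(len(signal), dtype=float)
    for i, z in enumerate(np.asarray(signal, dtype=float)):
        # Predict with a constant-state model.
        # ASSUMPTION: standard additive form P + Q for the projected covariance.
        x_pred, p_pred = x, p + q
        # Gain = projected covariance / (projected covariance + measurement variance).
        k = p_pred / (p_pred + r)
        residual = z - x_pred
        x = x_pred + k * residual
        p = (1.0 - k) * p_pred
        out[i] = x
        # Adapt both variances by exponential smoothing of the squared residual.
        q = alpha * q + (1.0 - alpha) * residual**2
        r = alpha * r + (1.0 - alpha) * residual**2
    return out


def synchronize(ppg, abp):
    """Align two signals using the lag that maximizes their cross-correlation."""
    ppg = np.asarray(ppg, dtype=float).ravel()
    abp = np.asarray(abp, dtype=float).ravel()
    # ASSUMPTION: ABP is the first argument, so a positive lag means ABP trails PPG.
    corr = np.correlate(abp - abp.mean(), ppg - ppg.mean(), mode="full")
    lag = int(np.argmax(corr)) - (len(ppg) - 1)
    if lag > 0:    # drop the first `lag` ABP samples
        abp = abp[lag:]
    elif lag < 0:  # drop the first |lag| PPG samples
        ppg = ppg[-lag:]
    n = min(len(ppg), len(abp))  # truncate both to the same remaining length
    return ppg[:n], abp[:n]
```

In a pipeline like the one described, each signal would be passed through `adaptive_kalman_filter` first and the filtered pair then handed to `synchronize`.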

3.2. Automatic Photoplethysmogram Feature Extraction

cBP-Tnet automatically extracts features from 2 s segments of the raw PPG signal and its first and second derivatives (PPG′ and PPG″). The derivatives highlight physiologically significant events (e.g., systolic and diastolic peaks), which are closely linked with SBP and DBP. They emphasize slope changes and inflection points, and they have been shown empirically to improve model performance [5,16]. This reduces the need for handcrafted feature engineering, which is time-consuming and restricted to specific use cases [5,16,17]. Instead, the model learns directly from minimally processed inputs, drawing on the data-driven capabilities of multi-task learning Transformer networks:

\mathrm{PPG}'(t) = \frac{d\,\mathrm{PPG}}{dt},

(1)

\mathrm{PPG}''(t) = \frac{d^{2}\,\mathrm{PPG}}{dt^{2}}.

(2)

The combination of [PPG, PPG′, PPG″] generates a rich feature space that captures waveform morphology. Each segment of the PPG signal (and of its derivatives) was truncated or zero-padded to a fixed length L [29] for uniformity, corresponding to 2 s intervals at 125 Hz (250 samples each):

\mathrm{fixed\_signal}(t) = \begin{cases} \mathrm{signal}(t), & t < L, \\ 0, & t \geq L. \end{cases}

(3)

After synchronization, each segment had a total of 750 input features, since the raw PPG, PPG′, and PPG″ each contribute 250 samples (2 s at 125 Hz), giving 3 × 250 = 750. Simultaneously, SBP and DBP were extracted from ABP. By automatically extracting essential features directly from the PPG signal, the model minimizes domain-specific bias and supports scalability over a wide range of patient populations and situations.
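As a rough illustration of Equations (1)–(3), the sketch below builds one 750-element feature vector and its SBP/DBP targets from a single segment. The helper names are hypothetical, and the use of np.gradient (central finite differences) to approximate the derivatives is an assumption; the text does not state which numerical derivative was used.

```python
import numpy as np

FS = 125       # sampling rate in Hz (from the text)
SEG_LEN = 250  # 2 s at 125 Hz


def fix_length(x, length=SEG_LEN):
    """Truncate or zero-pad a 1-D array to `length` samples, as in Equation (3)."""
    x = np.asarray(x, dtype=float)[:length]
    return np.pad(x, (0, length - len(x)))


def beat_features(ppg_segment):
    """Concatenate [PPG, PPG', PPG''] into one 750-element feature vector."""
    ppg = np.asarray(ppg_segment, dtype=float)
    # ASSUMPTION: derivatives approximated by finite differences, in units per second.
    d1 = np.gradient(ppg, 1.0 / FS)
    d2 = np.gradient(d1, 1.0 / FS)
    return np.concatenate([fix_length(ppg), fix_length(d1), fix_length(d2)])


def bp_targets(abp_segment):
    """SBP and DBP as the maximum and minimum of the aligned ABP segment."""
    abp = np.asarray(abp_segment, dtype=float)
    return abp.max(), abp.min()
```

Differentiating before padding avoids an artificial jump in the derivative at the boundary between the signal and the zero padding.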

3.3. PPG/ABP Data Filtering and Splitting

To provide robustness, the physiological signals were filtered and split. Outlier elimination protects the model from being influenced by very high and atypical SBP/DBP values; samples with SBP ≥ 180 mmHg, SBP < 80 mmHg, DBP ≥ 130 mmHg, or DBP ≤ 60 mmHg were removed [14]. After filtering, the dataset is stratified (8:1:1 ratio) into training (97,761), validation (12,220), and test (12,221) sets so that the distribution of the SBP and DBP targets is preserved across all subsets. Because SBP and DBP are continuous variables, they are first binned (N = 10) into discrete categories, and a stratified splitting method (scikit-learn’s train_test_split with the stratify option) then divides the dataset into subsets with similar SBP and DBP distributions [30]. This limits the possibility of overfitting and guarantees a fair and thorough performance evaluation. It also ensures that model evaluation represents real-world clinical situations and that performance indicators are not distorted by a disproportionate representation of specific blood pressure ranges.
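One way to realize the outlier filter and the binned 8:1:1 stratified split is sketched below. The function name and the random seed are hypothetical. Binning SBP alone with quantile edges is a simplifying assumption made here so that every bin has enough members for stratification; the text does not specify how SBP and DBP bins were combined.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, sbp, dbp, n_bins=10, seed=42):
    """Remove out-of-range targets, then split 8:1:1 stratified on binned SBP."""
    # Outlier limits from the text (mmHg): keep 80 <= SBP < 180 and 60 < DBP < 130.
    keep = (sbp >= 80) & (sbp < 180) & (dbp > 60) & (dbp < 130)
    X, sbp, dbp = X[keep], sbp[keep], dbp[keep]
    y = np.column_stack([sbp, dbp])

    # ASSUMPTION: quantile bin edges on SBP only (N = 10 bins, labels 0..9).
    edges = np.quantile(sbp, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(sbp, edges)

    # 80% training, 20% held out; the held-out part is then halved.
    X_tr, X_tmp, y_tr, y_tmp, _, b_tmp = train_test_split(
        X, y, bins, test_size=0.2, stratify=bins, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=b_tmp, random_state=seed)
    return (X_tr, y_tr), (X_val, y_val), (X_te, y_te)
```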

3.4. Signal Normalization and Augmentation

Signal normalization and augmentation are important processes in the cBP-Tnet pipeline. Normalization assigns a consistent scale to all 750 raw input PPG features. This ensures that the model does not become biased toward any particular amplitude range, allowing it to learn patterns rather than exact values. The input features are subsequently standardized. Because physiological data are frequently limited, several augmentation approaches are used to improve model resilience. During training, each raw PPG feature vector undergoes four random augmentations: with 50% probability, we add zero-mean Gaussian noise (std = 0.05); with 50% probability, we apply a uniform scaling factor drawn from [0.9, 1.1] (scale_factor = 0.1); with 30% probability, we shift (“pan”) the signal by an integer between −10 and +10 samples (max_shift = 10), padding or truncating as needed; and with 30% probability, we zero out a contiguous block of 10 samples (mask_size = 10). Each augmentation is applied independently per sample, so multiple transforms can be combined on the same signal. By incorporating modifications such as noise addition, scaling, shifting, and masking, the model learns to be insensitive to small changes in signal amplitude, baseline shifts, and temporal distortions [31]. These strategies simulate real-world variability, such as differences in sensor placement and patient conditions, hence improving model robustness.
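The four augmentations can be sketched as a single NumPy function using the probabilities and parameter names quoted in the text. The function name `augment`, the use of zero padding for the shift, and the choice of a uniformly random mask position are assumptions made for this illustration.

```python
import numpy as np


def augment(x, rng, noise_std=0.05, scale_factor=0.1, max_shift=10, mask_size=10):
    """Apply each of the four random augmentations independently to one vector."""
    x = np.array(x, dtype=float)  # work on a copy so the input stays untouched
    if rng.random() < 0.5:  # additive zero-mean Gaussian noise
        x = x + rng.normal(0.0, noise_std, size=x.shape)
    if rng.random() < 0.5:  # amplitude scaling drawn from [0.9, 1.1]
        x = x * rng.uniform(1.0 - scale_factor, 1.0 + scale_factor)
    if rng.random() < 0.3:  # temporal shift by -10..+10 samples, zero-padded
        shift = int(rng.integers(-max_shift, max_shift + 1))
        shifted = np.zeros_like(x)
        if shift > 0:
            shifted[shift:] = x[:-shift]
        elif shift < 0:
            shifted[:shift] = x[-shift:]
        else:
            shifted = x
        x = shifted
    if rng.random() < 0.3:  # zero out a contiguous block of `mask_size` samples
        start = int(rng.integers(0, len(x) - mask_size + 1))
        x[start:start + mask_size] = 0.0
    return x
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the augmentation reproducible across runs.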

3.5. Proposed cBP-Tnet Multi-Task Transformer Model Training

The pipeline’s main component is the proposed cBP-Tnet model, a single-channel, single-model deep learning architecture that makes use of a multi-task learning Transformer network. Figure 2 below displays a high-level architectural and training overview of cBP-Tnet, a multi-task Transformer-based neural network designed to predict blood pressure from PPG data with automatic feature extraction. Transformers excel at processing sequential data because of their self-attention mechanisms, which allow the model to dynamically weigh different parts of the input signal [24]. This can help capture subtle waveform patterns associated with blood pressure variations. The multi-task configuration (predicting both SBP and DBP) increases overall model performance: by learning both tasks concurrently, the model benefits from shared underlying representations in the PPG waveform, resulting in better generalization. The training procedure uses OneCycleLR learning rate scheduling starting from a learning rate of 0.001, which stabilizes and accelerates convergence [32], and it saves the model with the lowest combined validation L1 loss to prevent overfitting [15]. The model was trained for an optimal 1295 epochs with a combined SBP and DBP L1 loss function, providing task balance and smooth gradient updates with a gradient clipping value of 4.0 [33]. With the OneCycleLR scheduler, training begins with a gradual increase (“warm-up”) of the learning rate and then smoothly anneals it toward zero by the end of training, allowing the optimizer to avoid shallow local minima early on, accelerate convergence, and frequently reach lower validation loss and better generalization. Meanwhile, gradient clipping caps the gradient norm (4.0 in this case), preventing any single batch from producing excessively large updates that could destabilize training; this keeps weight updates stable, reduces oscillations, and protects against exploding gradients, particularly in deep or recurrent architectures. Together, these strategies result in faster, more stable training dynamics and higher final accuracy on both the validation and test sets.
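The training strategy above can be illustrated with a short PyTorch loop. This is a sketch under stated assumptions: the small `nn.Sequential` network is a hypothetical stand-in for cBP-Tnet, the data are synthetic, the Adam optimizer is an assumption (the optimizer is not named in this section), and 0.001 is interpreted here as the `max_lr` of OneCycleLR.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in network: 750 input features, two outputs (SBP, DBP).
model = nn.Sequential(nn.Linear(750, 128), nn.ReLU(), nn.Linear(128, 2))

epochs, steps_per_epoch = 3, 4  # tiny values for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ASSUMPTION: Adam
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=steps_per_epoch)
l1 = nn.L1Loss()


def combined_l1(pred, target):
    """Sum of the SBP and DBP L1 losses."""
    return l1(pred[:, 0], target[:, 0]) + l1(pred[:, 1], target[:, 1])


x_val, y_val = torch.randn(32, 750), torch.randn(32, 2)  # synthetic validation set
best_val, best_state = float("inf"), None

for epoch in range(epochs):
    model.train()
    for _ in range(steps_per_epoch):
        x, y = torch.randn(32, 750), torch.randn(32, 2)  # synthetic batch of 32
        loss = combined_l1(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        # Cap the global gradient norm at 4.0 before the weight update.
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=4.0)
        optimizer.step()
        scheduler.step()  # OneCycleLR is stepped once per batch
    model.eval()
    with torch.no_grad():
        val = combined_l1(model(x_val), y_val).item()
    if val < best_val:  # keep the weights with the lowest validation loss
        best_val = val
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```

Keeping a copy of the best `state_dict` is what allows training to run for a fixed budget of epochs while still returning the checkpoint from the best epoch.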

The optimal number of 1295 epochs was found while training the model for 1500 epochs (the number used by the most recent prior work, the TCN-CBAM method [5]). We coded cBP-Tnet to store the best model during training and output it after 1500 epochs. The 1295 epochs required are 13.67% fewer than the 1500 epochs of the TCN-CBAM method [5], the latest deep learning method in the field. This figure follows from the standard percent change formula: taking the old value as x₀ = 1500 (TCN-CBAM) and the new value as x₁ = 1295 (cBP-Tnet), (x₁ − x₀)/x₀ × 100% = (1295 − 1500)/1500 × 100% ≈ −13.67%.

On the left of Figure 2, the model starts with preprocessed input data streams that comprise not just the raw PPG waveform but also its first and second derivatives. These three inputs, each with a 2 s duration of 250 samples at 125 Hz, were combined to generate a concatenated input of shape (32, 750), that is, a batch of 32 instances, each with 750 features from the three PPG-related signals. This preprocessing step was intended to capture both the core form of the PPG pulse wave and the subtle inflections shown by its derivatives. The input data were then projected into a compact, learned feature space using a linear projection layer, which converts the raw signals into a 128-dimensional embedding.

3.5.1. Input Projection and Positional Encoding

Given an input sequence of features X ∈ R^{N × d_in}, where N is the sequence length (250 PPG samples) and d_in is the concatenated feature dimension [PPG, PPG′, PPG″], the input is first projected linearly:

X_{proj} = X W_{in},

(4)

where W_in ∈ R^{d_in × d_model} projects the input to the model dimension d_model (d_model = 128). The output shape was (32, 128). A learnable positional encoding P ∈ R^{N × d_model} is added to inject sequence ordering because Transformers are inherently order-agnostic [24]:

Z^{(0)} = X_{proj} + P.

(5)

The learnable positional encoding output shape was (32, N, 128). This enables the model to distinguish the temporal order of samples, which is crucial for the time-series data in this study. This stage also converts the raw physiological waveforms into a representation that is better suited to the Transformer’s self-attention mechanism. Next, in each multi-task Transformer encoder layer, a multi-head self-attention module (with four heads) and a feed-forward network (with a large inner dimension of 2048) jointly model the PPG signal’s complex temporal dependencies and non-linear interactions.
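Equations (4) and (5) amount to one matrix product and one broadcast addition, as the NumPy sketch below shows. It assumes a layout with N = 250 time steps and d_in = 3 channels per step, which is one reading of the definitions above; the weights are random placeholders for parameters that would be learned in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, N, d_in, d_model = 32, 250, 3, 128  # ASSUMPTION: d_in = 3 channels per step

X = rng.standard_normal((batch, N, d_in))           # [PPG, PPG', PPG''] per time step
W_in = rng.standard_normal((d_in, d_model)) * 0.02  # placeholder for learned weights
P = rng.standard_normal((N, d_model)) * 0.02        # placeholder positional encoding

X_proj = X @ W_in  # Equation (4): shape (32, 250, 128)
Z0 = X_proj + P    # Equation (5): P is broadcast over the batch axis
```

Because P has one row per position, two identical samples at different time steps receive different embeddings, which is what lets the attention layers use temporal order.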

3.5.2. Multi-Head Scaled Dot-Product Attention

Each multi-task Transformer layer uses multi-head self-attention with h_heads = 4 heads. For each attention head h, the input Z^{(l − 1)} (from the previous layer or the input projection for the first layer) is linearly projected into queries (Q), keys (K), and values (V):

Q = Z^{(l - 1)} W_{Q}^{(h)},

(6)

K = Z^{(l - 1)} W_{K}^{(h)},

(7)

V = Z^{(l - 1)} W_{V}^{(h)},

(8)

where

W_{Q}^{(h)}, W_{K}^{(h)}, W_{V}^{(h)} \in R^{d_{model} \times d_{k}}

and

d_{k} = \frac{d_{model}}{h_{heads}}

. The scaled dot-product attention for one head is computed as follows:

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V .

(9)

For

h_{heads}

heads, multi-head attention is

H = [h e a d_{1}; h e a d_{2}; \dots; h e a d_{h_{heads}}] W_{O}

(10)

where

[;]

denotes concatenation and

W_{o} \in R^{(h_{heads} \cdot d_{k}) \times d_{model}}

[24]. This attention technique enables the model to pay attention to several parts of the input at the same time.
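Equations (6) to (10) can be sketched for a single sequence in a few lines of NumPy. The dimensions follow the text ($d_{model} = 128$, four heads, so $d_k = 32$); the random weights and function names are placeholders chosen for illustration, and the sketch omits dropout and batching.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(z, w_q, w_k, w_v, w_o):
    """z: (N, d_model); w_q, w_k, w_v: (heads, d_model, d_k); w_o: (heads*d_k, d_model)."""
    d_k = w_q.shape[-1]
    heads = []
    for wq, wk, wv in zip(w_q, w_k, w_v):
        q, k, v = z @ wq, z @ wk, z @ wv         # Eqs. (6)-(8)
        scores = q @ k.T / np.sqrt(d_k)          # (N, N) scaled dot products
        heads.append(softmax(scores) @ v)        # Eq. (9)
    return np.concatenate(heads, axis=-1) @ w_o  # Eq. (10)


rng = np.random.default_rng(0)
N, D_MODEL, HEADS = 250, 128, 4
D_K = D_MODEL // HEADS  # 32

z = rng.standard_normal((N, D_MODEL))
w_q, w_k, w_v = (rng.standard_normal((HEADS, D_MODEL, D_K)) * 0.05 for _ in range(3))
w_o = rng.standard_normal((HEADS * D_K, D_MODEL)) * 0.05

out = multi_head_attention(z, w_q, w_k, w_v, w_o)
print(out.shape)  # (250, 128)
```

Each row of the softmax output sums to one, so every output position is a weighted average of the value vectors at all 250 positions.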

3.5.3. Residual Connections and Layer Normalization

To aid training and stabilize gradients, each sub-layer is wrapped with residual (skip) connections and layer normalization (LN):

$$Z^{(l)} = \mathrm{LN}\left(Z^{(l-1)} + \mathrm{MultiHeadAttention}(Z^{(l-1)})\right), \qquad (11)$$

$$Z^{(l)} = \mathrm{LN}\left(Z^{(l)} + \mathrm{FFN}(Z^{(l)})\right). \qquad (12)$$

In Equation (12), $Z^{(l)}$ on the right-hand side is the attention sub-layer output from Equation (11). The residual paths help the model train by mitigating vanishing and exploding gradients. Dropout ($P_{drop} = 0.05$) and layer normalization further stabilize training and reduce overfitting.

3.5.4. Position-Wise Feed-Forward Network (FFN)

Each multi-task Transformer encoder layer applies the same feed-forward network (inner dimension $d_{ff} = 2048$) to each position in the sequence. Given the input $X$, we have the following:

$$\mathrm{FFN}(X) = \max(0, X W_{1} + b_{1}) W_{2} + b_{2}, \qquad (13)$$

where $W_{1} \in \mathbb{R}^{d_{model} \times d_{ff}}$, $b_{1} \in \mathbb{R}^{d_{ff}}$, $W_{2} \in \mathbb{R}^{d_{ff} \times d_{model}}$, and $b_{2} \in \mathbb{R}^{d_{model}}$. Here, $d_{ff}$ is typically larger than $d_{model}$ to allow for more expressive transformations [24].
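The residual, normalization, and feed-forward steps of Equations (11) to (13) can be combined into one encoder-layer sketch. This is a simplified version under stated assumptions: layer normalization has no learned scale or shift, dropout is omitted, and the attention sub-layer is passed in as a function (an identity placeholder is used so the example runs on its own).

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each position over the feature axis (no learned scale/shift here)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def ffn(x, w1, b1, w2, b2):
    """Eq. (13): ReLU(x W1 + b1) W2 + b2, applied independently at each position."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2


def encoder_layer(z_prev, attention, w1, b1, w2, b2):
    """Post-norm layer: Eq. (11) followed by Eq. (12). Dropout is omitted."""
    z = layer_norm(z_prev + attention(z_prev))     # Eq. (11)
    return layer_norm(z + ffn(z, w1, b1, w2, b2))  # Eq. (12)


rng = np.random.default_rng(0)
N, D_MODEL, D_FF = 250, 128, 2048

w1 = rng.standard_normal((D_MODEL, D_FF)) * 0.02
b1 = np.zeros(D_FF)
w2 = rng.standard_normal((D_FF, D_MODEL)) * 0.02
b2 = np.zeros(D_MODEL)


def identity_attention(z):
    """Placeholder for the attention sub-layer so the sketch is self-contained."""
    return z


z_in = rng.standard_normal((N, D_MODEL))
z_out = encoder_layer(z_in, identity_attention, w1, b1, w2, b2)
print(z_out.shape)  # (250, 128)
```

The layer preserves the $(N, d_{model})$ shape, which is what allows eight such layers to be stacked.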

3.5.5. Global Max Pooling Layer

After the input passes through the $N$ multi-task Transformer encoder layers (×8), giving an output shape of (32, N, 128), a global max pooling operation [5] reduces the sequence-wide representation to a single 128-dimensional summary vector. This vector is intended to capture the most salient properties of the whole input waveform, such as its peaks, dicrotic notches, troughs, and inflection points. Pooling produces a fixed-length vector:

$$z = \max_{1 \leq i \leq N} Z_{i}^{(N)}, \qquad (14)$$

where the maximum is taken element-wise over the sequence positions $i$ (in the subscript, $N$ is the sequence length; the superscript $(N)$ marks the output of the last encoder layer). This reduces the output to shape (32, 128).
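A small NumPy sketch of Equation (14) is given below, with a toy case that can be checked by hand. The function name is illustrative, and $N = 250$ is assumed for the shape check.

```python
import numpy as np


def global_max_pool(z_seq):
    """Eq. (14): element-wise maximum over the sequence axis.

    z_seq has shape (batch, N, d_model); the result has shape (batch, d_model).
    """
    return z_seq.max(axis=1)


# Tiny hand-checkable case: 1 instance, 3 positions, 2 features.
toy = np.array([[[1.0, 5.0],
                 [4.0, 2.0],
                 [3.0, 3.0]]])
print(global_max_pool(toy))  # [[4. 5.]]

# Shapes quoted in the text: (32, N, 128) -> (32, 128).
pooled = global_max_pool(np.zeros((32, 250, 128)))
print(pooled.shape)  # (32, 128)
```

In the toy case the two pooled features come from different positions (the second and the first), which shows that the maximum is taken per feature rather than by selecting one position.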

3.5.6. Multi-Task Learning Output Layer

This latent vector $z$ now encodes the learned representation of the input signal segment. For multi-task learning (MTL), a single linear layer outputs both SBP and DBP simultaneously [34]:

$$\hat{y} = [\hat{y}_{SBP}, \hat{y}_{DBP}] = z W_{out} + b_{out} \in \mathbb{R}^{2}, \qquad (15)$$

where $W_{out} \in \mathbb{R}^{d_{model} \times 2}$ and $b_{out} \in \mathbb{R}^{2}$. The training objective for MTL is to minimize the combined $L_{1}$ loss (MAE) of SBP and DBP. Given the predictions $\hat{y}_{SBP}, \hat{y}_{DBP}$ and ground-truth values $y_{SBP}, y_{DBP}$, we define the loss as:

$$L = \mathrm{MAE}(\hat{y}_{SBP}, y_{SBP}) + \mathrm{MAE}(\hat{y}_{DBP}, y_{DBP}), \qquad (16)$$

where

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} |y_{i} - \hat{y}_{i}|. \qquad (17)$$

Here $N$ is the number of samples being averaged.

Minimizing the combined $L_{1}$ loss lets the cBP-Tnet model improve on both tasks at the same time, balancing learning and encouraging a latent representation that predicts both SBP and DBP. Training used the OneCycleLR learning rate schedule [35] at a learning rate of $0.001$ together with the $L_{1}$ loss; this schedule first increases and then decreases the learning rate to help converge to better minima. The best model, selected by the best combined validation $L_{1}$ loss, was reached at 1295 epochs, which is 13.67% fewer than the current state-of-the-art method [5]. Gradient clipping was also applied to stabilize training by preventing gradients from exploding:

$$\| \nabla_{\theta} L \|_{2} \leq \tau, \qquad (18)$$

where $\theta$ denotes the model parameters and $\tau$ is a predefined threshold ($\tau = 4.0$). The final output shape was $(32, 2)$. The code was made available at https://github.com/apimentel-ECE/cBP-Tnet.git (accessed on 19 December 2024).
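To make Equations (16) to (18) concrete, the sketch below computes the combined $L_1$ loss on two made-up samples and rescales a set of gradients so that their joint $L_2$ norm stays within $\tau = 4.0$. Global-norm rescaling is one common way of enforcing Equation (18); it is assumed here for illustration, and the function names and numbers are hypothetical rather than taken from the released code.

```python
import numpy as np


def mae(y_true, y_pred):
    """Eq. (17): mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def multitask_l1_loss(y_hat, y):
    """Eq. (16): MAE on SBP (column 0) plus MAE on DBP (column 1)."""
    return mae(y[:, 0], y_hat[:, 0]) + mae(y[:, 1], y_hat[:, 1])


def clip_by_global_norm(grads, tau=4.0):
    """Rescale gradients so their joint L2 norm does not exceed tau (Eq. (18))."""
    total = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = min(1.0, tau / (total + 1e-6))
    return [g * scale for g in grads], total


# Made-up targets and predictions in mmHg: columns are [SBP, DBP].
y = np.array([[120.0, 80.0], [130.0, 85.0]])
y_hat = np.array([[124.0, 79.0], [127.0, 87.0]])
loss = multitask_l1_loss(y_hat, y)
print(loss)  # 5.0  (SBP MAE 3.5 + DBP MAE 1.5)

# Made-up gradients with joint norm sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, tau=4.0)
norm_after = float(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))
print(norm_before, round(norm_after, 4))  # 13.0 4.0
```

Rescaling all gradients by the same factor keeps the update direction unchanged and only shortens the step.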

4. cBP-Tnet Experimental Results and Discussions

This section presents Leave-One-Subject-Out (LOSO) experiments, hyperparameter tuning/analysis, a comparative analysis with related works, and model evaluation and visualization, in order to quantify the incremental effect of each factor added to the model and to further optimize model performance.

4.1. Leave-One-Subject-Out (LOSO) Experiments

Table 2 depicts the incremental improvement of the proposed cBP-Tnet for estimating blood pressure from raw PPG signals automatically. In our Leave-One-Subject-Out-style study, “subject” refers to each successive experimental condition. “Leaving one subject out” refers to removing or adding one signal-processing component at a time (raw PPG, PPG′, PPG″, Kalman filtering, outlier removal, synchronization, augmentation) and measuring the MAE using the same LOSO cross-validation scheme. Framing each pipeline option as a “subject” allows us to isolate the performance increase generated by each new factor, allowing us to determine which processing steps improve accuracy the most and thus direct future model modification.

Each row illustrates a different model and preprocessing pipeline, beginning with a simple baseline (just the raw PPG waveform) and gradually introducing additional signal derivatives, filtering algorithms, data synchronization, and augmentation mechanisms. The values shown are the mean absolute errors (MAEs) in mmHg for both systolic and diastolic blood pressure using a Leave-One-Subject-Out (LOSO) scheme, in which each individual in the dataset is removed from the training set in turn to assess how well the model generalizes to previously unseen subjects. The validation loss MAE was used initially in the base model, while the extended (final) model used the testing loss MAE. Under the AAMI international standard, the MAE must be less than 5 mmHg when there are more than 85 subjects [20].

With only the raw PPG signal, the model reaches baseline errors of 5.72 mmHg for systolic pressure and 3.09 mmHg for diastolic pressure. The errors generally decrease as more signal-processing steps are added in the following rows, starting with the first and second derivatives of the PPG waveform (PPG′ and PPG″). Adding the first derivative (PPG′) to the raw signal reduced the SBP MAE from 5.72 mmHg to 5.08 mmHg (11.24% improvement) and the DBP MAE from 3.09 mmHg to 2.79 mmHg (9.55% improvement), suggesting that slope information, which reflects arterial compliance and pulse transit dynamics, enhances blood pressure estimates. Incorporating the second derivative (PPG″) produced additional reductions to 5.00 mmHg SBP MAE (1.56% further gain) and 2.75 mmHg DBP MAE (1.43% further gain), suggesting that curvature and inflection points (e.g., the dicrotic notch) capture reflected wave timing and vascular stiffness. These gains indicate that derivative features reveal rate-of-change and curvature characteristics of the PPG waveform that are otherwise obscured, helping the model capture more subtle blood pressure information.

Finally, data augmentation artificially increases the diversity and quantity of training samples, enhancing model generalization. The MAEs decrease as the pipeline comes to include not only the original signal and its derivatives but also adaptive filtering, outlier treatment, synchronization, and augmentation, reaching 4.71 mmHg (SBP) and 2.34 mmHg (DBP) in the final row of Table 2. These improvements show that blood pressure prediction from PPG signals is sensitive to noise, alignment, and data quality, and that systematic signal processing can considerably improve the reliability of non-invasive blood pressure predictions.

4.2. Hyperparameter Tuning/Analysis

Table 3 summarizes hyperparameter tuning experiments performed on the proposed cBP-Tnet model to improve its ability to estimate systolic and diastolic blood pressure (SBP and DBP) from raw PPG input data. Each row represents a change in one or more model parameters, including the model dimensionality ($d_{model}$), the number of heads ($h$) in the multi-head attention mechanism, the number of encoder blocks ($N$), the dropout probability ($P_{drop}$), and the gradient clipping threshold (grad_clip). Along with these parameter settings, the table shows the corresponding mean absolute error (MAE) values for SBP and DBP, which indicate how well the model predicts blood pressure relative to the measured reference values; the lower the MAE, the better. In the “Base” configuration, where $d_{model} = 128$, $h = 4$, $N = 8$, $P_{drop} = 0.05$, and grad_clip $= 4.0$, the model originally trained for 150 epochs (10% of the previous state-of-the-art one [5]) produces MAEs of 4.68 mmHg for SBP and 2.36 mmHg for DBP.

Rows (A) through (E) report systematic trials in which key parameters are changed. In row (A), the model dimensionality is either reduced to 64 or increased to 256. A smaller dimensionality significantly increases error (SBP MAE $\sim 7.25$ mmHg), while a larger dimensionality (256) performs better than 64 (SBP MAE $\sim 5.03$ mmHg) but does not surpass the baseline. Similarly, in row (B), decreasing the number of attention heads from four to two degrades predictions, whereas increasing to eight heads brings the SBP MAE close to the baseline value but with no improvement over it. Part (C) adjusts the number of encoder layers ($N$), comparing 6 and 10 layers to the original 8. Changing the depth results in minor error variations but no significant improvement in performance (SBP MAE $\sim 4.76$–$4.77$ mmHg and DBP MAE $\sim 2.37$–$2.42$ mmHg). Part (D) tests various dropout probabilities. A 0% dropout rate results in performance comparable to the baseline (SBP MAE $\sim 4.74$ mmHg), indicating that a moderate dropout value may provide stability but is not necessary. A greater dropout rate ($0.10$) somewhat decreases performance, implying that excessive dropout may degrade model accuracy.

Finally, part (E) examines the impact of gradient clipping. Eliminating gradient clipping (grad_clip $= 0.0$) results in a catastrophic failure (SBP MAE rises to 145.43 mmHg), indicating that uncontrolled gradients significantly disrupt training. In contrast, a high gradient clip value (grad_clip $= 8.0$) returns performance to the baseline, demonstrating the necessity of managing gradient explosions. Following these trials, the final optimal configuration was identified as $d_{model} = 128$, $h = 4$, $N = 8$, $P_{drop} = 0.05$, and grad_clip $= 4.0$. Training the model up to the optimal 1295 epochs (based on the best combined SBP/DBP validation $L_{1}$ loss) yielded a noticeable improvement in accuracy, reducing the SBP MAE to 4.31 mmHg and the DBP MAE to 2.18 mmHg. This shows that careful and methodical tuning of network architecture and training parameters is necessary to achieve robust performance in blood pressure prediction tasks, and that certain settings, particularly controlling model size, maintaining moderate dropout, and preventing gradient explosion, are key to achieving high-quality, stable results.

4.3. Comparison Against Related Deep Learning Methods to Estimate Blood Pressure with Automatic Photoplethysmogram Feature Extraction

Table 4 summarizes the progress of research methods and performance benchmarks for non-invasive continuous blood pressure estimation using PPG signals with automatic feature extraction. The top row, representing Slapničar et al. [16] (2019), uses a ResNet-based architecture on 510 participants but performs poorly, with MAEs well above the AAMI threshold of 5 mmHg for both systolic and diastolic blood pressure. In addition, the model reportedly requires lengthy training (up to 10,000 epochs), indicating problems with training efficiency and scalability. Rong and Li [17] (2021) apply a multi-type feature fusion (MTFF-ANN) technique to a small dataset of only 11,546 samples, yielding a better but still insufficient systolic MAE. While the diastolic estimate meets the AAMI criterion, the complex automatic feature extraction process and the small sample size call into question the model’s applicability to larger populations and other settings. Dai et al. [5] (2024) combine temporal convolutional networks with a CBAM attention mechanism on a much larger dataset of 270,488 samples to achieve further improvements. Although the diastolic MAE meets the AAMI standard, the systolic estimate does not, and the model takes at least 1500 epochs to converge. This complexity and training burden indicate a capable but not fully efficient system.

In contrast, the proposed cBP-Tnet model, which uses a multi-task learning Transformer-based network, outperforms these recent studies. Based on records from 500 randomly selected patients (202,956 samples), it achieves lower MAEs for both systolic blood pressure (4.32 mmHg) and diastolic blood pressure (2.18 mmHg), meeting the AAMI requirements. This strategy also reduces training to 1295 epochs, a 13.67% reduction relative to its latest competitor [5], while employing a simpler single-model deep learning framework. This not only simplifies model design but may also reduce computational and infrastructure demands. The proposed cBP-Tnet model demonstrates that systematic improvements in model architectures, sample sizes, and training protocols can close the gap to the clinical standard, paving the way for more practical and widespread continuous non-invasive blood pressure monitoring in home and mobile healthcare devices.
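The 13.67% figure can be reproduced with the standard percent-change formula applied to the epoch counts quoted in this section. The short sketch below assumes 1500 epochs as the reference value for [5] (the text gives "at least 1500"); the function name is illustrative.

```python
def percent_reduction(old, new):
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


epochs_reference = 1500  # epochs reported for the model in [5]
epochs_cbp_tnet = 1295   # epoch at which the best cBP-Tnet model was saved

reduction = percent_reduction(epochs_reference, epochs_cbp_tnet)
print(f"{reduction:.2f}%")  # 13.67%
```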

4.4. cBP-Tnet Evaluation and Visualization

Figure 3 and Figure 4 show how well the cBP-Tnet model predicts systolic and diastolic blood pressure values from the input signals. In Figure 3, the training loss (blue line) begins at a very high value and drops rapidly in the initial epochs, indicating that the model is learning the underlying patterns in the data. As training progresses, the loss decreases more gradually and eventually stabilizes at a lower value. The validation loss (yellow line), while initially higher, follows a general downward trend, suggesting that the model generalizes to previously unseen data rather than overfitting. By 1295 epochs, the training and validation losses appear to have stabilized; this is the epoch at which the best model was saved, based on the best combined validation L1 loss.

The model’s performance is further illustrated by the predicted versus actual SBP and DBP plots in Figure 4. The predicted SBP (dashed red line) closely follows the actual SBP (solid blue line) across the sample series, suggesting a high capacity to track the dynamic changes of systolic pressure. DBP fluctuates to a greater extent, and its peaks and troughs do not always coincide, but the predicted values (green line) still follow the general trend of the actual data (yellow line). This shows that the model has learned the most common patterns in DBP; however, it may be more susceptible to noise or to more complex fluctuations in diastolic pressure.

The predicted vs. actual scatter plots for the test set, for systolic blood pressure (SBP) in Figure 5 and diastolic blood pressure (DBP) in Figure 6, show how the model’s predicted values compare to the actual measurements. Each point represents a single test case, with the horizontal axis showing the actual measured blood pressure and the vertical axis showing the model’s predicted value for that same instance. These plots demonstrate the performance of the model across different blood pressure ranges. A strong linear trend, as seen in both graphs, implies that the model captures a significant portion of the underlying pattern: correlation coefficients of approximately 0.885 for SBP and 0.874 for DBP indicate a high degree of linear relationship. The red regression lines fitted to the data points, with slopes of approximately 0.87 for SBP and 0.84 for DBP and moderate intercept values, show that predictions generally increase in tandem with actual values.

However, the appearance of substantial scatter around these regression lines indicates that, while the model’s predictions are frequently accurate, they are not flawless. Some examples fall well above or below the line, indicating over- or under-prediction. Nevertheless, these plots provide a visual assessment of prediction accuracy, demonstrating that the model performs well for a substantial portion of the data but has room for improvement in terms of precision and consistency.
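The correlation coefficient and regression line used to read such scatter plots can be computed as in the sketch below. The data are synthetic: the slope of 0.87 echoes the reported SBP slope, while the intercept, noise level, and sample size are hypothetical values chosen only so the example runs, so the printed numbers are not the paper's results.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


rng = np.random.default_rng(0)
actual = rng.uniform(80, 180, size=500)  # synthetic reference SBP (mmHg)
predicted = 0.87 * actual + 16.0 + rng.normal(0, 8, size=500)  # synthetic predictions

r = pearson_r(actual, predicted)
slope, intercept = fit_line(actual, predicted)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.1f}")
```

A slope below 1 with a positive intercept means that high pressures tend to be under-predicted and low pressures over-predicted, which is one way to interpret the regression lines in Figures 5 and 6.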

Beyond its empirical performance, cBP-Tnet contributes to the theoretical understanding of how Transformer-based architectures can internalize the hemodynamic information encoded in raw PPG waveforms. The results suggest that multi-head self-attention can learn slope, curvature, and reflection-timing cues that were previously difficult to engineer manually. By framing SBP and DBP estimation as a multi-task problem, the model makes use of shared vascular compliance patterns while preserving their separate physiological bases, resulting in more robust representations and better generalization. The combination of adaptive Kalman filtering and targeted data augmentation improves learning stability by denoising and expanding the signal distribution, strengthening the model’s resilience to real-world artifacts. Practically, cBP-Tnet’s single-channel, end-to-end design reduces development complexity and cuts the number of training epochs by 13.67%, lowering the barrier to deployment on resource-constrained wearables and home monitoring hubs. Meeting the AAMI standard on a large, diverse ICU dataset supports its suitability for clinical translation, paving the way for continuous, cuff-free blood pressure monitoring in telemedicine, chronic care management, and consumer health applications.

5. Conclusions

To date, cBP-Tnet is the only deep learning method with automatic photoplethysmogram feature extraction [5,16,17] for which both systolic (MAE: 4.32 mmHg) and diastolic (MAE: 2.18 mmHg) blood pressure estimates are acceptable according to the Association for the Advancement of Medical Instrumentation (AAMI) international standard (<5 mmHg, >85 subjects) [20]. cBP-Tnet is a single-channel, single-model design, in contrast to recent deep learning methods for continuous non-invasive blood pressure monitoring [5,16,17], which were hybrid and/or complex in design and needed multiple models to operate. cBP-Tnet requires 13.67% fewer training epochs than the most recent comparable study [5] while producing better, AAMI-compliant results. This establishes cBP-Tnet’s potential for integration into wearable and home-based healthcare devices, paving the way for more accessible, dependable continuous non-invasive blood pressure monitoring.

Author Contributions

Conceptualization, J.-J.H. and A.R.A.S.; methodology, A.A.P., J.-J.H. and A.R.A.S.; software, A.A.P.; validation, A.A.P., J.-J.H. and A.R.A.S.; formal analysis, A.A.P., J.-J.H. and A.R.A.S.; investigation, A.A.P., J.-J.H. and A.R.A.S.; resources, J.-J.H. and A.R.A.S.; data curation, A.A.P.; writing—original draft preparation, A.A.P.; writing—review and editing, J.-J.H. and A.R.A.S.; visualization, A.A.P.; supervision, J.-J.H. and A.R.A.S.; project administration, J.-J.H.; funding acquisition, J.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the financial support from the National Science and Technology Council (NSTC), Taiwan, under Grant No. NSTC 113-2221-E-218-008-. This work was also supported by the Center for Intelligent Healthcare through the Higher Education Sprout Project, funded by the Ministry of Education, Taiwan.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The public dataset and code used in this study are available at https://github.com/apimentel-ECE/cBP-Tnet.git (accessed on 19 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MIMIC-II	Multi-parameter Intelligent Monitoring in Intensive Care II
TCN-CBAM	Temporal Convolutional Network-Convolutional Block Attention Module
MTFF-ANN	Multi-type Feature Fusion Artificial Neural Network
BiLSTM	Bidirectional Long Short-Term Memory
AAMI	Association for the Advancement of Medical Instrumentation
ResNet	Residual Neural Network
mmHg	Millimeter of Mercury
LOSO	Leave One Subject Out
ECG	Electrocardiogram
BCG	Ballistocardiogram
PPG	Photoplethysmogram
MTL	Multi-Task Learning
PTT	Pulse Transit Time
PWV	Pulse Wave Velocity
ABP	Arterial Blood Pressure
SBP	Systolic Blood Pressure
DBP	Diastolic Blood Pressure
CNN	Convolutional Neural Network
RNN	Recurrent Neural Network
MAE	Mean Absolute Error
AI	Artificial Intelligence
BP	Blood Pressure
r	Pearson’s Correlation Coefficient

References

1. Zhou, B.; Carrillo-Larco, R.M.; Danaei, G.; Riley, L.M.; Paciorek, C.J.; Stevens, G.A.; Gregg, E.W.; Bennett, J.E.; Solomon, B.; Singleton, R.K.; et al. Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: A pooled analysis of 1201 population-representative studies with 104 million participants. Lancet 2021, 398, 957–980.
2. World Health Organization. Global Report on Hypertension: The Race Against a Silent Killer. 2023. Available online: https://www.who.int/publications/i/item/9789240081062 (accessed on 19 December 2024).
3. Picone, D.S.; Schultz, M.G.; Otahal, P.; Aakhus, S.; Al-Jumaily, A.M.; Black, J.A.; Bos, W.J.; Chambers, J.B.; Chen, C.H.; Cheng, H.M.; et al. Accuracy of cuff-measured blood pressure: Systematic reviews and meta-analyses. J. Am. Coll. Cardiol. 2017, 70, 572–586.
4. Kachuee, M.; Kiani, M.M.; Mohammadzade, H.; Shabany, M. Cuff-less high-accuracy calibration-free blood pressure estimation using pulse transit time. In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 1006–1009.
5. Dai, D.; Ji, Z.; Wang, H. Non-invasive continuous blood pressure estimation from single-channel PPG based on a temporal convolutional network integrated with an attention mechanism. Appl. Sci. 2024, 14, 6061.
6. Corazza, I.; Zecchi, M.; Corsini, A.; Marcelli, E.; Cercenelli, L. Technologies for hemodynamic measurements: Past, present and future. In Advances in Cardiovascular Technology; Elsevier: Amsterdam, The Netherlands, 2022; pp. 515–566.
7. Huang, J.J.; Yu, S.I.; Syu, H.Y.; See, A.R. The non-contact heart rate measurement system for monitoring HRV. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; pp. 3238–3241.
8. Huang, J.J.; Huang, Y.M.; See, A.R. Studying peripheral vascular pulse wave velocity using bio-impedance plethysmography and regression analysis. ECTI Trans. Comput. Inf. Technol. (ECTI-CIT) 2017, 11, 63–70.
9. Huang, J.J.; Syu, H.Y.; Cai, Z.L.; See, A.R. Development of a long term dynamic blood pressure monitoring system using cuff-less method and pulse transit time. Measurement 2018, 124, 309–317.
10. Huang, B.; Chen, W.; Lin, C.L.; Juang, C.F.; Wang, J. MLP-BP: A novel framework for cuffless blood pressure measurement with PPG and ECG signals based on MLP-Mixer neural networks. Biomed. Signal Process. Control 2022, 73, 103404.
11. Vidhya, C.; Maithani, Y.; Singh, J.P. Recent advances and challenges in textile electrodes for wearable biopotential signal monitoring: A comprehensive review. Biosensors 2023, 13, 679.
12. Rastegar, S.; GholamHosseini, H.; Lowe, A. Non-invasive continuous blood pressure monitoring systems: Current and proposed technology issues and challenges. Phys. Eng. Sci. Med. 2020, 43, 11–28.
13. Vardhan, K.R.; Vedanth, S.; Poojah, G.; Abhishek, K.; Kumar, M.N.; Vijayaraghavan, V. BP-Net: Efficient deep learning for continuous arterial blood pressure estimation using photoplethysmogram. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 1495–1500.
14. Kachuee, M.; Kiani, M.M.; Mohammadzade, H.; Shabany, M. Cuffless blood pressure estimation algorithms for continuous health-care monitoring. IEEE Trans. Biomed. Eng. 2016, 64, 859–869.
15. González, S.; Hsieh, W.T.; Chen, T.P.C. A benchmark for machine-learning based non-invasive blood pressure estimation using photoplethysmogram. Sci. Data 2023, 10, 149.
16. Slapničar, G.; Mlakar, N.; Luštrek, M. Blood pressure estimation from photoplethysmogram using a spectro-temporal deep neural network. Sensors 2019, 19, 3420.
17. Rong, M.; Li, K. A multi-type features fusion neural network for blood pressure prediction based on photoplethysmography. Biomed. Signal Process. Control 2021, 68, 102772.
18. Lee, J.; Scott, D.J.; Villarroel, M.; Clifford, G.D.; Saeed, M.; Mark, R.G. Open-access MIMIC-II database for intensive care research. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 8315–8318.
19. UCI Machine Learning Repository. Cuff-Less Blood Pressure Estimation. 2015. Available online: https://archive.ics.uci.edu/dataset/340/cuff+less+blood+pressure+estimation (accessed on 21 December 2023).
20. White, W.B.; Berson, A.S.; Robbins, C.; Jamieson, M.J.; Prisant, L.M.; Roccella, E.; Sheps, S.G. National standard for measurement of resting and ambulatory blood pressures with automated sphygmomanometers. Hypertension 1993, 21, 504–509.
21. Teng, X.; Zhang, Y. Continuous and noninvasive estimation of arterial blood pressure using a photoplethysmographic approach. In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No. 03CH37439), Cancun, Mexico, 17–21 September 2003; Volume 4, pp. 3153–3156.
22. Kurylyak, Y.; Lamonaca, F.; Grimaldi, D. A Neural Network-based method for continuous blood pressure estimation from a PPG signal. In Proceedings of the 2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Minneapolis, MN, USA, 6–9 May 2013; pp. 280–283.
23. Liu, M.; Po, L.M.; Fu, H. Cuffless blood pressure estimation based on photoplethysmography signal and its second derivative. Int. J. Comput. Theory Eng. 2017, 9, 202.
24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
25. Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75.
26. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220.
27. Clifford, G.D.; Scott, D.J.; Villarroel, M. User Guide and Documentation for the MIMIC II Database; MIMIC-II Database Version 2.6; The Laboratory for Computational Physiology: Cambridge, MA, USA, 2009.
28. Mehra, R. On the identification of variances and adaptive Kalman filtering. IEEE Trans. Autom. Control 1970, 15, 175–184.
29. Mousavi, S.S.; Firouzmand, M.; Charmi, M.; Hemmati, M.; Moghadam, M.; Ghorbani, Y. Blood pressure estimation from appropriate and inappropriate PPG signals using A whole-based method. Biomed. Signal Process. Control 2019, 47, 196–206.
30. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
31. Wen, Q.; Sun, L.; Yang, F.; Song, X.; Gao, J.; Wang, X.; Xu, H. Time Series Data Augmentation for Deep Learning: A Survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–27 August 2021; pp. 4653–4660.
32. Smith, L.N. A disciplined approach to neural network hyper-parameters: Part 1 -- Learning rate, batch size, momentum, and weight decay. arXiv 2018, arXiv:1803.09820.
33. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the ICML’13: Proceedings of the 30th International Conference on International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
34. Liu, P.; Qiu, X.; Huang, X. Adversarial Multi-task Learning for Text Classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017.
35. Smith, L.N. Cyclical Learning Rates for Training Neural Networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017.

Figure 1. Pipeline of the proposed novel cBP-Tnet efficient single-channel model deep learning method for continuous non-invasive blood pressure monitoring with a photoplethysmogram.

Figure 2. Architecture of proposed novel “cBP-Tnet” efficient single-channel model deep learning method for continuous non-invasive blood pressure monitoring with photoplethysmogram.

Figure 3. cBP-Tnet AI model training and validation loss plot for SBP and DBP—optimal epochs: 1295.

Figure 4. cBP-Tnet AI model SBP (top) and DBP (bottom) prediction vs. actual (test) plots.

Figure 5. cBP-Tnet AI model SBP (test) predicted vs. actual plot, r = 0.885.

Figure 6. cBP-Tnet AI model DBP (test) predicted vs. actual plot, r = 0.874.

Table 1. Pseudocode of the cBP-Tnet signal processing and training pipeline.

Step	Description
1	Set random seeds.
2	Load PPG and ABP CSV files.
3	For each PPG/ABP signal, apply adaptive Kalman filter.
4	For each filtered PPG/ABP pair:
	Synchronize signals;
	Detect PPG peaks.
5	For each beat:
	Extract fixed-length PPG segment and derivatives → feature;
	Compute SBP = max(ABP segment), DBP = min(ABP segment).
6	Filter samples by SBP $\in [80, 180]$ mmHg and DBP $\in [60, 130]$ mmHg.
7	Stratified split into train/validation/test sets.
8	Standardize features.
9	Build DataLoaders (augment training only).
10	Initialize MultitaskTransformerModel.
11	Train with OneCycleLR and L1 loss, saving best weights.
12	Load best model weights.
13	Predict on validation/test sets → compute MAE, correlation, AAMI, BHS.
14	Script model to TorchScript; predict on test set; compare MAE.
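
Steps 5 and 6 of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `find_peaks_simple` and `beats_to_samples`, the peak-centered window, and the `seg_len` and `min_distance` parameters are hypothetical. Only the max/min labeling rule and the SBP/DBP ranges are taken from the table.

```python
from typing import List, Tuple

SBP_RANGE = (80.0, 180.0)  # mmHg, from step 6 of Table 1
DBP_RANGE = (60.0, 130.0)  # mmHg, from step 6 of Table 1


def diff(x: List[float]) -> List[float]:
    """First-order finite difference, zero-padded to keep the length."""
    return [0.0] + [b - a for a, b in zip(x, x[1:])]


def find_peaks_simple(x: List[float], min_distance: int) -> List[int]:
    """Hypothetical stand-in detector: local maxima that are at least
    min_distance samples apart (not the authors' peak detector)."""
    peaks: List[int] = []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def beats_to_samples(
    ppg: List[float], abp: List[float], seg_len: int, min_distance: int
) -> List[Tuple[Tuple[List[float], List[float], List[float]], float, float]]:
    """Return ((PPG, PPG', PPG''), SBP, DBP) for every beat kept by the filter."""
    samples = []
    half = seg_len // 2
    for p in find_peaks_simple(ppg, min_distance):
        start, end = p - half, p - half + seg_len
        if start < 0 or end > len(ppg):
            continue  # skip beats whose window leaves the record
        seg = ppg[start:end]
        d1 = diff(seg)   # first derivative (PPG')
        d2 = diff(d1)    # second derivative (PPG'')
        # Step 5: labels are the extrema of the synchronized ABP window.
        sbp, dbp = max(abp[start:end]), min(abp[start:end])
        # Step 6: discard physiologically implausible beats.
        if SBP_RANGE[0] <= sbp <= SBP_RANGE[1] and DBP_RANGE[0] <= dbp <= DBP_RANGE[1]:
            samples.append(((seg, d1, d2), sbp, dbp))
    return samples
```

The sketch assumes the PPG and ABP arrays are already filtered and synchronized (steps 3 and 4), so that the same index window can be applied to both signals.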

Table 2. Leave-One-Subject-Out (LOSO) experiments (2 s window raw signal, base model).

LOSO Experiments	Systolic Blood Pressure (MAE, mmHg)	Diastolic Blood Pressure (MAE, mmHg)
cBP-Tnet (raw PPG only)	5.72	3.09
cBP-Tnet (raw PPG + PPG′)	5.08 (▾11.24%)	2.79 (▾9.55%)
cBP-Tnet (raw PPG + PPG′ + PPG″)	5.00 (▾1.56%)	2.75 (▾1.43%)
cBP-Tnet (raw PPG + PPG′ + PPG″ + Adaptive Kalman Filter)	4.95 (▾1.06%)	2.80 (▾1.56%)
cBP-Tnet (raw PPG + PPG′ + PPG″ + Adaptive Kalman Filter + SBP/DBP Outlier Removal)	4.81 (▾2.69%)	2.38 (▾14.84%)
cBP-Tnet (raw PPG + PPG′ + PPG″ + Adaptive Kalman Filter + SBP/DBP Outlier Removal + Signal Synchronization)	4.80 (▾0.29%)	2.35 (▾1.30%)
cBP-Tnet (raw PPG + PPG′ + PPG″ + Adaptive Kalman Filter + SBP/DBP Outlier Removal + Signal Synchronization + Data Augmentation)	4.71 (▾1.85%)	2.34 (▾0.43%)

Note: Bold entries highlight the newly added signal-processing component; italic entries denote the full end-to-end model; ▾ indicates the reduction in MAE relative to the preceding row.
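
The row-to-row percentages in Table 2 are relative changes in MAE with respect to the preceding row. A minimal sketch of that arithmetic is given below; the function name `relative_reduction` is hypothetical. Values recomputed from the rounded table entries need not match the printed percentages exactly, presumably because the printed ones were derived from unrounded MAEs (an assumption). A negative result would indicate an increase in MAE.

```python
def relative_reduction(prev_mae: float, new_mae: float) -> float:
    """Percentage reduction in MAE relative to the preceding row.

    Positive values mean the error decreased; negative values mean it grew.
    """
    return 100.0 * (prev_mae - new_mae) / prev_mae


# SBP MAE column of Table 2 (mmHg), top to bottom.
sbp_mae = [5.72, 5.08, 5.00, 4.95, 4.81, 4.80, 4.71]

for prev, new in zip(sbp_mae, sbp_mae[1:]):
    print(f"{prev:.2f} -> {new:.2f}: {relative_reduction(prev, new):+.2f}%")
```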

Table 3. cBP-Tnet AI model hyperparameter tuning/analysis.

cBP-Tnet Hyperparameter Tuning/Analysis	$d_{model}$	h	N	$P_{drop}$	Grad Clip	SBP MAE (mmHg)	DBP MAE (mmHg)
cBP-Tnet (Base) Model	128	4	8	0.05	4.0	4.71	2.34
(A)	64					7.25	3.72
(A)	256					5.03	2.55
(B)		2				6.02	3.15
(B)		8				4.75	2.36
(C)			6			4.77	2.42
(C)			10			4.76	2.37
(D)				0.00		4.74	2.37
(D)				0.10		4.87	2.46
(E)					0.0	145.43	69.72
(E)					8.0	4.75	2.36
cBP-Tnet (Extended) Model	128	4	8	0.05	4.0	4.32	2.18

Note: Bold font indicates the base model configuration and its resulting MAE; bold-italic font denotes the extended model configuration. Experiments (A)–(E) refer to the per-parameter sweeps described in the text.
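
The sweep layout of Table 3, in which one hyperparameter is varied while the others stay at their base values, can be expressed as a small configuration sketch. The class name `Config`, its field names, and the helper `one_at_a_time` are hypothetical; only the numeric values are taken from the table.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Config:
    """Base cBP-Tnet hyperparameters as listed in Table 3."""
    d_model: int = 128      # embedding width
    n_heads: int = 4        # attention heads (h)
    n_layers: int = 8       # encoder layers (N)
    p_drop: float = 0.05    # dropout probability
    grad_clip: float = 4.0  # gradient clipping threshold


# Experiments (A)-(E): alternative values tried for each parameter.
SWEEPS: Dict[str, List[float]] = {
    "d_model": [64, 256],
    "n_heads": [2, 8],
    "n_layers": [6, 10],
    "p_drop": [0.00, 0.10],
    "grad_clip": [0.0, 8.0],
}


def one_at_a_time(base: Config, sweeps: Dict[str, List[float]]) -> List[Config]:
    """Vary a single field per run, keeping every other field at its base value."""
    return [
        replace(base, **{name: value})
        for name, values in sweeps.items()
        for value in values
    ]


runs = one_at_a_time(Config(), SWEEPS)
```

Each of the ten resulting configurations corresponds to one sweep row of Table 3. In all of them `d_model` remains divisible by `n_heads`, which multi-head attention requires.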

Table 4. Related deep learning-based methods for estimating blood pressure with automatic photoplethysmogram feature extraction.

Related Deep Learning Methods	SBP (MAE, mmHg)	DBP (MAE, mmHg)
ResNet	9.43 (r = N/A)	6.88 (r = N/A)
MTFF-ANN	5.59 (r = 0.92)	3.36 (r = 0.86)
TCN-CBAM	5.35 (r = 0.80)	2.12 (r = 0.60)
cBP-Tnet	4.32 (r = 0.89)	2.18 (r = 0.87)

Note: Boldface and italics denote the proposed cBP-Tnet method.
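
The quantities compared in Table 4 (MAE and Pearson correlation r), together with the AAMI and BHS checks named in step 13 of Table 1, can be computed as in the sketch below. The function name `evaluate` and the returned dictionary layout are hypothetical. The thresholds are the commonly cited ones (AAMI: mean error within ±5 mmHg and standard deviation within 8 mmHg; BHS: cumulative percentage of absolute errors within 5, 10, and 15 mmHg) and are stated here as an assumption about how the criteria were applied.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def evaluate(y_true: List[float], y_pred: List[float]) -> Dict[str, object]:
    """MAE, Pearson r, AAMI mean error/SD, and BHS cumulative percentages."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]

    # Pearson correlation between reference and predicted pressures.
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)

    me, sd = mean(errors), stdev(errors)
    return {
        "mae": mean(abs_err),
        "r": r,
        "me": me,
        "sd": sd,
        "aami_pass": abs(me) <= 5.0 and sd <= 8.0,
        # Percentage of samples with absolute error within 5/10/15 mmHg.
        "bhs": {k: 100.0 * sum(a <= k for a in abs_err) / len(abs_err)
                for k in (5, 10, 15)},
    }


# Toy values in mmHg, for illustration only (not data from the study).
result = evaluate([100, 110, 120, 130], [102, 108, 123, 129])
```

Under the usual BHS grading, grade A corresponds to at least 60%, 85%, and 95% of samples within 5, 10, and 15 mmHg, respectively.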

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pimentel, A.A.; Huang, J.-J.; See, A.R.A. cBP-Tnet: Continuous Blood Pressure Estimation Using Multi-Task Transformer Network with Automatic Photoplethysmogram Feature Extraction. Appl. Sci. 2025, 15, 7824. https://doi.org/10.3390/app15147824

