BioDiffusion: A Versatile Diffusion Model for Biomedical Signal Synthesis

Machine learning tasks involving biomedical signals frequently grapple with issues such as limited data availability, imbalanced datasets, labeling complexities, and the interference of measurement noise. These challenges often hinder the optimal training of machine learning algorithms. Addressing these concerns, we introduce BioDiffusion, a diffusion-based probabilistic model optimized for the synthesis of multivariate biomedical signals. BioDiffusion demonstrates excellence in producing high-fidelity, non-stationary, multivariate signals for a range of tasks including unconditional, label-conditional, and signal-conditional generation. Leveraging these synthesized signals offers a notable solution to the aforementioned challenges. Our research encompasses both qualitative and quantitative assessments of the synthesized data quality, underscoring its capacity to bolster accuracy in machine learning tasks tied to biomedical signals. Furthermore, when juxtaposed with current leading time-series generative models, empirical evidence suggests that BioDiffusion outperforms them in biomedical signal generation quality.


Introduction
The significance of biomedical signal processing is continually emphasized by its role across various ubiquitous computing applications. The quest for accurate, dependable data has driven innovations that lead to improved assistive technologies and deeper insights into diagnostics, patient monitoring, and therapeutics. Electrocardiograms (ECGs), electroencephalograms (EEGs), and data from human activity sensors represent a treasure trove of information. Their analysis has ushered in transformative breakthroughs, but not without associated challenges.
One major hurdle faced by biomedical signal processing is the set of intricacies that arise from limited dataset size, imbalanced datasets, artificial noise, and anomalies. These factors can critically compromise the performance of machine learning models, necessitating the development of innovative solutions. Historically, approaches such as data augmentation, data resampling, and statistical analyses have been employed to mitigate these challenges. Yet the inherently non-stationary and multivariate characteristics of biomedical signals add another layer of complexity. Encouragingly, recent research highlights an uptick in leveraging deep learning to enhance the preprocessing of biomedical signals [9,20,21].
Deep learning, though powerful, is often constrained by the nuances of biomedical datasets. Recognizing these challenges, our study introduces BioDiffusion, a diffusion-based probabilistic model tailored for biomedical signal generation. Designed to handle a wide range of generation tasks, BioDiffusion serves as a holistic solution to biomedical signal synthesis challenges. From expanding training dataset sizes to anomaly removal and super-resolution, the model's adaptability offers a promising avenue for more efficient and precise analysis techniques in clinical applications.
Inspired by the Stable Diffusion model's prowess in image synthesis [19], we have adapted the BioDiffusion model to the unique traits of biomedical signals. To evaluate the model, we conduct a multi-faceted assessment employing visual similarity comparisons, dimension reduction techniques such as UMAP [13], and similarity scores such as wavelet coherence. Additionally, our research examines the synthetic signals' potential in machine learning model training, juxtaposing synthetic signals against real signals. Through rigorous benchmarking against contemporary time-series synthesis models, our findings illuminate BioDiffusion's superior proficiency in generating high-fidelity biomedical signals. The implications are significant: the model presents a stride toward enhancing diagnostics, patient monitoring, and biomedical research.

Main Contributions:
• Presentation of the BioDiffusion model, our innovative diffusion-based probabilistic approach tailored to address the complexities inherent in biomedical signal generation.
• Demonstration of our model's versatility in handling diverse generation tasks, presenting a unified solution to biomedical signal synthesis.
• Comprehensive evaluation of the BioDiffusion model through both qualitative and quantitative metrics, underscoring its effectiveness and precision.
• Comparative analysis highlighting the superior capability of BioDiffusion in biomedical signal synthesis relative to existing state-of-the-art models.
The remainder of this paper is structured as follows: Section 2 reviews pertinent work related to signal synthesis. Section 3 elucidates the BioDiffusion methodology and its specific adaptations for biomedical signals. Section 4 presents our experimental framework, datasets, evaluation metrics, and a comparative analysis with other models, underlining BioDiffusion's standout performance. Finally, Section 5 concludes the paper, summarizing the salient points and suggesting avenues for future exploration.

Related Work
This section reviews the pertinent literature in the realms of generative models and diffusion models for signal synthesis. Our objective is to offer a comprehensive perspective on their evolution, strengths, and constraints, especially in the context of time-series signal synthesis.

Generative Models in Signal Synthesis
Generative models aim to discern the inherent structure of data, enabling the generation of new samples mirroring the original data. Several paradigmatic approaches within generative models for time-series synthesis include:
• Generative Adversarial Networks (GANs): Comprising two adversarial networks, the generator and the discriminator, GANs train the generator to improve its synthetic samples until they deceive the discriminator. Their prowess extends to various data types, including time-series signals.
• Autoregressive Models: These models generate samples sequentially, with each new element contingent on prior elements. WaveNet by Aaron van den Oord et al. [18] exemplifies this, producing raw audio waveforms using dilated causal convolutions to capture long-range temporal relationships. Although proficient at modeling temporal dynamics, their sequential nature can be computationally slow and may falter in grasping extended dependencies.
• Other generative paradigms, such as Normalizing Flows, Restricted Boltzmann Machines (RBMs), and Non-negative Matrix Factorization (NMF), have also been explored. However, their efficacy diminishes with multidimensional non-stationary time-series signals.

Diffusion Models for Time-series Synthesis
Diffusion models harness latent variables to understand a dataset by modeling data point propagation through latent space. They function by adding Gaussian noise to training data (forward diffusion) and subsequently reversing this process (reverse diffusion) to retrieve the data [27]. Their prowess has been manifested in diverse arenas such as image synthesis and molecule design [26].
Several prominent studies of diffusion models include:
• Yang L. et al.'s comprehensive survey of deep learning-based diffusion models and their applicability to time-series tasks [27].
• Garnier O. et al. augmenting diffusion models for infinite-dimensional spaces, targeting audio signals and time series [6].
• Kong et al.'s exploration into audio synthesis through diffusion models [10] and Tashiro et al.'s venture into time-series imputation [25].
• Alcaraz et al.'s pursuit of time-series forecasting using diffusion models [1].
While these studies accentuate the capabilities of generative and diffusion models for time-series synthesis, challenges remain in scalability, stability, and fidelity, especially for intricate biomedical signals. Our proposed BioDiffusion model is an endeavor to surmount these challenges, deriving inspiration from prior work while innovating for greater versatility and efficacy in biomedical signal synthesis. The following section elucidates the methodology underlying BioDiffusion, illustrating its potential for biomedical signal synthesis.

Diffusion Probabilistic Models
This section provides an overview of diffusion models, their extension to conditional data generation, and associated neural architectures.
Diffusion models [23,8] consist of a forward process that iteratively degrades data $x_0 \sim q(x_0)$ by adding Gaussian noise over $T$ iterations:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}).$$

The reverse process incrementally restores the noise-corrupted data:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\big).$$

The forward-process hyperparameters $\beta_t$ are set such that $x_T$ approximates a standard normal distribution. The reverse process optimizes the evidence lower bound (ELBO) [5], with the loss given by

$$L = \mathbb{E}_{t, x_0, \epsilon}\Big[\big\| \epsilon - \epsilon_\theta(x_t, t) \big\|^2\Big], \quad \text{where } x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}).$$

Following prior work [23,8], the reverse-process parameters are

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon_\theta(x_t, t) \right), \qquad \sigma_t^2 = \tilde\beta_t,$$

with $\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, and $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t$. Improved sample quality is achieved by optimizing modified losses resembling denoising score matching over multiple noise levels [24,8].
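As a concrete illustration, the closed-form forward process above can be sketched in a few lines of numpy. The cosine schedule matches the "Noise schedule: cosine" entry in the appendix's training details; the function names and toy signal are illustrative, not the paper's implementation.

```python
import numpy as np

def cosine_beta_schedule(T, s=0.008):
    """Cosine noise schedule: betas derived from a squared-cosine alpha_bar curve."""
    steps = np.arange(T + 1)
    f = np.cos((steps / T + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, 0.999)

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

T = 1000
betas = cosine_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 8 * np.pi, 512))[None, :]  # toy one-channel "signal"
x_mid, _ = q_sample(x0, 500, alpha_bar, rng)          # partially corrupted
x_T, _ = q_sample(x0, T - 1, alpha_bar, rng)          # approximately N(0, I)
```

By the final step the signal statistics are those of a standard normal, which is exactly the condition the text places on the $\beta_t$ schedule.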
A critical aspect of diffusion models is the extension to conditional data generation, wherein both the data $x_0$ and a set of conditions $c$ are incorporated. The conditions can be any additional information or constraints provided externally, influencing the generative process. By assimilating $c$, the reverse process becomes

$$p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t, c),\ \sigma_t^2 \mathbf{I}\big).$$

Intuitively, $c$ offers an avenue to guide the generative model, providing a degree of control over the outputs. This inclusion makes diffusion models versatile, catering to scenarios like content-specific image generation or style-conditioned audio synthesis.
For the architecture, we employ a feed-forward neural network with distinct input layers for the data, the conditions $c$, and the time step. In line with the approach in [12], our model leverages multi-scale structures through convolutional layers, enabling the extraction of hierarchical information. The training strategy employs early stopping based on the validation-set ELBO to prevent overfitting.
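The distinct input layer for the time step is commonly realized as a sinusoidal embedding before being fed to each block. The sketch below shows one standard form; the dimension and frequency base are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion step t into a dim-vector.
    Half the dimensions carry sines, half carry cosines, over a geometric
    range of frequencies (base 10000 is a conventional choice)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

emb = timestep_embedding(250, 64)  # embedding for diffusion step t = 250
```

The same embedding mechanism can carry the label condition, which is how the later sections describe enriching each residual block with both the label and the timestep.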

Unconditional Diffusion Models
The unconditional diffusion model employs a Markov chain-based generation process, converting data iteratively between its original form and noise. This transformation is portrayed in Figure 1.
Backward Process: Initiating this process, signals derived from Gaussian noise serve as inputs at diffusion step T. As the model retraces the steps back to 0, it methodically diminishes the noise; each step t consumes the output of step t + 1 as its input. A crucial aspect during this phase is the evaluation of the KL divergence between signals at corresponding steps of the forward and backward processes, with the objective of minimizing this divergence. When the backward process culminates at step 0, the generated signals should closely mirror the original ones.
Signal Generation: Post-training, the model accepts random Gaussian noise and, by invoking the backward process, crafts synthetic signals. The procedure is dubbed "unconditional" because no stipulations are placed on the signal generated from the noise. This design empowers the diffusion model to assimilate the dataset's entire distribution, endowing it with the capability to produce, potentially, any signal within the dataset's feature space.
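The backward pass just described amounts to ancestral sampling. A minimal numpy sketch follows, with a hypothetical `eps_model` callable standing in for the trained noise-prediction U-Net; the short linear schedule is only for the demo.

```python
import numpy as np

def p_sample_loop(eps_model, shape, betas, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and apply the reverse
    transitions down to t = 0. eps_model(x, t) plays the role of the
    trained network predicting the noise component of x at step t."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # x_T: pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)
        # posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # final step is deterministic
    return x

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 50)  # short linear schedule for the demo
# untrained stand-in "network" that predicts zero noise everywhere
sample = p_sample_loop(lambda x, t: np.zeros_like(x), (1, 128), betas, rng)
```

With a trained network in place of the zero-noise stand-in, each call to the loop yields one synthetic signal drawn from the learned distribution.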

Label Conditional Diffusion Models
Label-conditional diffusion models extend the framework of their unconditional counterparts by integrating scalar labels with each input datum.This inclusion of labels not only shapes the diffusion process but also allows for more targeted synthesis of signals, as elaborated below.
Forward Process with Labels: In this process, as depicted in Fig. 2, original signals are systematically associated with their corresponding labels. Within the U-Net architecture (detailed in Section 3.4), each residual block is enriched with both the scalar label and the current diffusion timestep via an embedding technique.
Backward Process with Labels: Here, the diffusion model ingests noise drawn from a normal distribution, in tandem with a designated label. As the model progresses through the diffusion steps, there is a persistent focus on quantifying and minimizing the KL divergence between the signals emerging from the forward and backward processes.
Synthetic Signal Generation: The culmination of this methodology is a trained diffusion model with dual capabilities: it is not only attuned to the holistic data distribution of the dataset but is also adept at crafting synthetic signals pertinent to a delineated class.
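The appendix's training details list a "conditional drop prob", which is consistent with classifier-free guidance training. Under that assumption, label-conditional sampling can blend the conditional and unconditional noise predictions as sketched below; the guidance weight `w` is a hypothetical knob, not a parameter reported in the paper.

```python
import numpy as np

def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free-guidance-style combination of a label-conditional and
    an unconditional noise prediction. w = 0 recovers the plain conditional
    prediction; larger w pushes samples harder toward the label."""
    return (1.0 + w) * eps_cond - w * eps_uncond

rng = np.random.default_rng(2)
e_cond, e_uncond = rng.standard_normal((2, 128))  # stand-ins for network outputs
e = guided_eps(e_cond, e_uncond, 0.0)             # falls back to e_cond
```

During training, dropping the label with the listed probability teaches one network to serve as both `eps_cond` and `eps_uncond`, so a single model supports this blended sampling.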

Signal Conditional Diffusion Models
Signal-conditional diffusion models, visualized in Fig. 3, introduce a nuanced methodology in which signal conditions play a pivotal role exclusively during the backward diffusion phase, differentiating them from label-conditional approaches.
Forward Process: The forward diffusion process in the case of signal conditioning is the same as the original, unconditional diffusion.

Backward Diffusion with Signal Conditioning:
For the backward phase, a perturbed signal forms the conditional input; it could stem from an original signal sample tainted by noise or artifacts, or even be a downsampled version matching the original signal's dimensions. This conditional signal is amalgamated with noise drawn from a normal distribution, after which a convolutional layer refines the combination to align with the original signal's structure. The remainder of the backward process strives to remove the noise and produce a clean signal resembling the original signal it was seeded with.
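A minimal sketch of the conditioning step just described, under the assumption that the perturbed signal and fresh noise are stacked on the channel axis and then mixed back to the original channel count by a 1x1 convolution (here a plain weight matrix; all layer sizes are illustrative):

```python
import numpy as np

def make_conditional_input(cond_signal, rng):
    """Combine the perturbed conditioning signal with Gaussian noise and map
    the stack back to the original channel layout with 1x1-conv-style
    weights, as the backward phase description requires."""
    c, length = cond_signal.shape
    noise = rng.standard_normal((c, length))
    stacked = np.concatenate([cond_signal, noise], axis=0)  # (2c, length)
    w = rng.standard_normal((c, 2 * c)) / np.sqrt(2 * c)    # 1x1 conv weights
    return w @ stacked                                      # back to (c, length)

rng = np.random.default_rng(3)
clean = np.sin(np.linspace(0, 4 * np.pi, 256))[None, :]
noisy_cond = clean + 0.3 * rng.standard_normal(clean.shape)  # artifact-laden copy
x_in = make_conditional_input(noisy_cond, rng)
```

In the full model these weights are learned jointly with the denoiser, so the network decides how much of the condition versus the noise to carry forward.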

U-Net Architecture
The U-Net architecture (see Figure 4) is used in each diffusion process pipeline. We modify the model depicted in [22] to fit time-series signals instead of N x N images. The signals at time step x_t are concatenated with the time-step embeddings and other conditioning inputs.

Experimental Results
This section delves into the methodologies employed by our diffusion models in the synthesis of biomedical signals. We partition the approach into three categories: unconditional, label-conditional, and signal-conditional diffusion. Our qualitative and quantitative evaluations underscore the efficacy of the generated synthetic data. We also benchmark our findings against state-of-the-art methods, underscoring the advantages of our model and pinpointing areas ripe for further refinement. We aim to demonstrate that diffusion models are promising candidates for crafting high-caliber biomedical signals, with the potential to transform a variety of biomedical arenas.

Datasets
Our model's performance is gauged across three datasets: a simulated one, the UniMiB human activity recognition (HAR) dataset [14], and the MIT-BIH Arrhythmia Database [16,7]. For an in-depth description of the datasets, please refer to the Appendix.

Visualization of Raw Signals
To assess the fidelity of synthetic signals visually, we present a comparative plot between several real and synthetic signals.For continuity, discrete signal values at each sampling interval are interconnected.

Projection Through Dimension Reduction
For each class in every dataset, an unconditional diffusion model is trained. The UMAP projection of synthetic signals in relation to the original ones for select data classes is depicted in Figure 6. Extended visualizations are available in the provided source-code repository. On inspection, it becomes evident that even for signals of considerable length (e.g., 512 timesteps), our diffusion model adeptly recognizes and replicates intricate signal patterns. Moreover, the synthetic signals span the entire feature spectrum inhabited by the genuine signals.
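The projection-based inspection can be sketched as follows. The paper uses UMAP (via umap-learn); to keep this snippet dependency-free, a numpy PCA projection stands in for it, and the check itself (do the real and synthetic point clouds overlap?) is the same.

```python
import numpy as np

def project_2d(real, fake):
    """Jointly project real and synthetic signal windows to 2-D via PCA
    (top-2 right singular vectors of the centered data matrix). This is a
    stand-in for the UMAP projection used in the paper."""
    X = np.concatenate([real, fake])
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[:2].T
    return proj[:len(real)], proj[len(real):]

rng = np.random.default_rng(7)
t = np.linspace(0, 2 * np.pi, 100)
real = np.sin(t)[None, :] + 0.1 * rng.standard_normal((50, 100))
fake = np.sin(t)[None, :] + 0.1 * rng.standard_normal((50, 100))
p_real, p_fake = project_2d(real, fake)
```

When the generator matches the data distribution, the two projected clouds overlap, which is the visual criterion Figure 6 applies.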

Similarity Metrics and Results
The label-conditional diffusion model not only emulates the synthetic signal generation prowess of the unconditional model but also provides guided synthesis tailored to specific classes. A further advantage is efficiency: a single label-conditional diffusion model suffices for a multi-class dataset, in contrast to the separate model the unconditional counterpart requires for each class. Intriguingly, for sparsely represented data classes, the label-conditional model can outperform the unconditional one. This edge is attributed to its capacity to generalize patterns across the dataset and apply this knowledge to class-specific synthesis.
To underscore the fidelity of signals generated by our diffusion models, we calculate similarity scores across diverse signal classes. The results are cataloged in Table 1. Our BioDiffusion model's outputs closely align with real signals, surpassing the fidelity of other cutting-edge techniques.
Quantitative Metrics:
• Wavelet Coherence: A measure of common oscillations between two time series across specific frequencies over time. Values range between 0 and 1, with 1 denoting perfect coherence. Owing to its proficiency with non-stationary signals, it is a vital tool for evaluating evolving spectral content [11].
• Discriminative Score: A quantitative metric to juxtapose sequences from real and generated datasets. It employs a 2-layer LSTM classifier trained to separate the two datasets; the classification error on a held-out test set offers an objective similarity evaluation [28].
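A dependency-free sketch of the discriminative score follows. The paper trains a 2-layer LSTM classifier; here a logistic regression on flattened windows stands in for it so the snippet runs without a deep learning framework, but the protocol (train a real-vs-synthetic classifier, report how far held-out accuracy is from chance) is the same.

```python
import numpy as np

def discriminative_score(real, fake, rng, epochs=200, lr=0.1):
    """Train a classifier to tell real from synthetic windows and report
    |accuracy - 0.5| on held-out data (0 = indistinguishable). A logistic
    regression trained by gradient descent stands in for the LSTM."""
    X = np.concatenate([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    perm = rng.permutation(len(X))
    X, y = X[perm], y[perm]
    split = len(X) // 2
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        z = np.clip(Xtr @ w + b, -30, 30)     # clip to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        g = p - ytr                           # gradient of the logistic loss
        w -= lr * Xtr.T @ g / len(ytr)
        b -= lr * g.mean()
    acc = (((Xte @ w + b) > 0) == (yte > 0.5)).mean()
    return abs(acc - 0.5)

rng = np.random.default_rng(6)
real = rng.standard_normal((200, 32))
fake_same = rng.standard_normal((200, 32))    # same distribution as real
fake_shift = fake_same + 3.0                  # clearly different distribution
score_same = discriminative_score(real, fake_same, rng)
score_shift = discriminative_score(real, fake_shift, rng)
```

Matching distributions score near 0, while a distribution shift pushes the score toward the 0.5 ceiling, mirroring how Table 1 ranks the generators.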
Baseline Techniques:
• C-RNN-GAN: A pioneering GAN-based solution for sequential data synthesis using a two-layer LSTM for both generator and discriminator [15].
• RCWGAN: An enhanced version of C-RNN-GAN with conditional data input for controlled generation [2].
• TimeGAN: A groundbreaking GAN framework that harnesses a latent space for time-series synthesis, augmented with both supervised and unsupervised losses [28].
• SigCWGAN: Enhances the GAN process with conditional data and the Wasserstein loss for stable training [17].
• TTS-CGAN: An iterative version of TTS-GAN introducing a label-conditional transformer GAN.

A notable application of BioDiffusion lies in the generation of individualized signals. Scarcity of data samples from individual subjects can be a bottleneck for certain machine learning applications. Our approach allows a diffusion model to be trained on diverse signals and then fine-tuned using signals from an individual subject. This method generates synthetic signals that retain the unique patterns of the subject, enabling the expansion of subject-specific datasets.
For more visual examples of the output of BioDiffusion in upsampling and personalized signal generation, please see the Appendix.

Conclusion
In conclusion, the proposed BioDiffusion model is a novel and versatile probabilistic model specifically designed for generating synthetic biomedical signals.
Our model offers a comprehensive solution for various generation tasks, including unconditional, label-conditional, and signal-conditional generation, which makes it a valuable tool for biomedical signal synthesis. We evaluated the quality of the generated signals using qualitative and quantitative assessments and demonstrated the effectiveness and accuracy of the BioDiffusion model in producing high-quality synthetic biomedical signals. Compared with state-of-the-art time-series synthesis models, our BioDiffusion model consistently outperforms its counterparts, showcasing its superiority and robustness in biomedical signal generation. The model's versatility and adaptability have the potential to contribute significantly to the advancement of biomedical signal processing techniques, opening up new possibilities for improved research outcomes and clinical applications.

B Training details
We train an unconditional diffusion model per class per dataset.The training details are as follows.

Architecture
• Base channels: 64

D.1 Signal Denoising
We selected three types of noise that are frequently involved in time-series signal collection:
• Thermal noise, also known as white noise, is random electrical noise that occurs in electronic circuits. It arises from the thermal agitation of electrons, which produces fluctuations in voltage or current that are independent of the signal being measured.
• Electrode contact noise, also known as low-frequency drift, arises in electronic measurements from changes in the electrical characteristics of the contact between the electrode and the surface being measured, which can cause the baseline signal to fluctuate over time.
• Motion artifacts, also known as random spikes, are unwanted signals that can occur in physiological or biological measurements due to movement or other physical disturbances, which can cause sudden, brief spikes in the recorded signal that are not related to the underlying biological activity being measured.
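Under simple modeling assumptions (i.i.d. Gaussian samples for thermal noise, a single slow sinusoid for electrode-contact drift, and sparse random impulses for motion artifacts), the three noise types can be simulated as follows; the amplitudes and counts are illustrative.

```python
import numpy as np

def thermal_noise(n, sigma, rng):
    """White (thermal) noise: i.i.d. Gaussian fluctuations."""
    return sigma * rng.standard_normal(n)

def baseline_drift(n, amp, rng):
    """Electrode contact noise modeled as a slow sinusoidal baseline wander
    (a single-sinusoid simplification of low-frequency drift)."""
    phase = rng.uniform(0, 2 * np.pi)
    return amp * np.sin(2 * np.pi * np.arange(n) / n + phase)

def motion_artifacts(n, n_spikes, amp, rng):
    """Motion artifacts as sudden brief spikes at random positions and signs."""
    out = np.zeros(n)
    idx = rng.choice(n, size=n_spikes, replace=False)
    out[idx] = amp * rng.choice([-1.0, 1.0], size=n_spikes)
    return out

rng = np.random.default_rng(4)
n = 512
corrupted = (np.sin(np.linspace(0, 6 * np.pi, n))   # clean toy signal
             + thermal_noise(n, 0.05, rng)
             + baseline_drift(n, 0.5, rng)
             + motion_artifacts(n, 5, 2.0, rng))
```

Signals corrupted this way serve as the distorted conditional inputs for the signal-conditional denoising experiments.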

D.2 Signal Imputation
Signal imputation is another task that BioDiffusion can handle. Oftentimes, collected signals contain missing values, and BioDiffusion can be used to fill in those blanks. Fig. 14 shows a few examples of signal imputation: the first row shows the original signals; the second row shows the same signals with some randomly missing values (set to zero), which we use as the signal-condition input to the diffusion model; and the third row shows the reconstructed signals. The synthetic signals fill in the blanks and are very similar to the original signals.
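Preparing the conditioning input for imputation, i.e. the "values set to zero" step, can be sketched as below; the missing fraction is an illustrative parameter.

```python
import numpy as np

def mask_signal(signal, missing_frac, rng):
    """Zero out a random fraction of samples, producing the masked signal that
    serves as the conditional input, plus the keep-mask for inspection."""
    mask = rng.random(signal.shape) >= missing_frac  # True = value kept
    return signal * mask, mask

rng = np.random.default_rng(5)
x = np.cos(np.linspace(0, 4 * np.pi, 300))
x_masked, mask = mask_signal(x, 0.2, rng)  # roughly 20% of samples blanked
```

The diffusion model then receives `x_masked` as the condition and is trained to reconstruct the full signal, filling the zeroed gaps.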

D.3 Signal Super-resolution
Biomedical signals of identical type can have distinct sampling rates because different equipment is used for collection. This necessitates signal downsampling or upsampling to match the sampling rates when these signals are used together. However, conventional upsampling methods, such as Hamming windows, linear/cubic interpolation, and zero-padding followed by low-pass filtering, may fall short in capturing intricate relationships among signal features, restricting their capacity to generate high-quality, realistic upsampled signals. A potential solution lies in deep learning-based super-resolution techniques. Our signal-conditional diffusion model, trained for signal upsampling, creates high-resolution signals that closely resemble their original counterparts. This is illustrated in Fig. 15, where the model-generated signal exhibits features more akin to the original signal than the downsampled version does.
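For reference, the classical interpolation baseline the text compares against can be sketched with `np.interp`; this is the kind of upsampling the learned model is meant to improve upon.

```python
import numpy as np

def upsample_linear(signal, factor):
    """Linear-interpolation upsampling: resample the signal onto a grid with
    `factor` times as many points over the same time span."""
    n = len(signal)
    old_t = np.arange(n)
    new_t = np.linspace(0, n - 1, n * factor)
    return np.interp(new_t, old_t, signal)

low_res = np.sin(np.linspace(0, 2 * np.pi, 64))
high_res = upsample_linear(low_res, 4)  # 64 samples -> 256 samples
```

Linear interpolation preserves endpoints and coarse shape but cannot recover fine structure lost at the lower rate, which is the shortcoming motivating the diffusion-based approach.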

D.4 Individual signal generation
One of the challenges that hinders certain machine learning applications on biomedical signals is insufficient data samples from each individual subject. To address this issue, signal-conditional diffusion models can be utilized. Initially, a diffusion model is trained on a specific type of signal from numerous subjects. Afterwards, a small number of signals from a single subject are used as conditional inputs, enabling the diffusion model to generate a multitude of synthetic signals that incorporate that subject's distinctive data patterns. The resulting synthetic signals can expand the dataset of an individual subject and facilitate the development of machine learning applications tailored to that particular subject.
arXiv:2401.10282v2 [eess.SP] 27 Jan 2024

Figure 1: Unconditional diffusion process. Forward process: starting with the original signal, it is incrementally perturbed with Gaussian noise over a series of diffusion steps spanning [0, T]. By the end of step T, the resulting signal retains the dimensions of the original, but its data values adopt a normal distribution.

Figure 2: Label conditional diffusion process.

Figure 5 illustrates a set of both real and synthetic signals derived from three distinct datasets. An immediate examination reveals the capability of our diffusion model in crafting synthetic signals that closely mirror the genuine signals.

Figure 4: The U-Net architecture for signals, with skip connections.

Figure 5: Raw signals comparison. The left column shows real raw signals; the right column shows synthetic raw signals.

Figure 6: UMAP projection of real and synthetic data for three classes of three datasets. Each red dot represents one original signal after dimension reduction; each blue dot represents one synthetic signal. The graphs show that the synthetic signals have a distribution similar to that of the real signals in the 2-D UMAP projections.

• Residual blocks groups: 8
• Attention heads: 4

Training
• Optimizer: Adam
• Batch size: 32
• Learning rate: 3e-4
• Noise schedule: cosine
• Loss: l1

We train a label-conditional diffusion model per dataset. Each signal sample is paired with a scalar label. The training details are as follows.

Architecture
• Number of classes: 5 (Simulated and MITBIH datasets); 9 (UniMiB dataset)
• Residual blocks groups: 8
• Attention heads: 4
• Conditional drop prob: 0.5

Training
• Optimizer: Adam
• Batch size: 32
• Learning rate: 3e-4
• Epochs: 100
• Hardware: NVIDIA GeForce
• Diffusion timesteps: 1000
• Noise schedule: cosine
• Loss: l1

A signal-conditional model is trained on a specific class of data, and synthetic signals are generated using distorted signals as conditional inputs. The distorted signals provided to the model were not present in the training set, with the aim of assessing the model's ability to restore them to their original form. We present several possible implementations of the model; these examples are not exhaustive, and the model is capable of other implementations as well.

Architecture
• Base channels: 64
• Channel multipliers: 1, 2, 4, 8
• Residual blocks groups: 8
• Attention heads: 2

Training
• Optimizer: Adam
• Batch size: 32
• Learning rate: 1e-4

Figure 9: Real and synthetic signal UMAP projection on selected classes of the simulated dataset.

Figure 10: Real and synthetic signal UMAP projection on selected classes of the UniMiB dataset.

Figure 11: Real and synthetic signal UMAP projection on selected classes of the MITBIH ECG dataset.

Figure 12: Three types of noise involved in biomedical signals. First row: thermal noise. Second row: motion artifacts. Third row: electrode noise.

Figure 14: Example signal imputation results. First row: real signals. Second row: signals with random blanks. Third row: imputed signals using BioDiffusion.

Table 1: Comparison scores of real and synthetic data generated by different state-of-the-art time-series generation models.
The signals synthesized by BioDiffusion not only enhanced the training set but also significantly bolstered the F1-score for the detection of minority classes. In contrast, signals synthesized by models like RCWGAN and C-RNN-GAN led the downstream classifier to a biased classification, predominantly towards the majority class (Non-Ectopic Beats), effectively nullifying the F1-score for other classes. It is