Article

Early Detection of Cardiovascular Disease Using Laser-Induced Breakdown Spectroscopy Combined with Machine Learning

1
Department of Physics, The University of Faisalabad, Faisalabad 38000, Pakistan
2
Department of Physics, University of Agriculture Faisalabad, Faisalabad 38000, Pakistan
3
Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford OX3 7LD, UK
*
Author to whom correspondence should be addressed.
Photonics 2025, 12(9), 849; https://doi.org/10.3390/photonics12090849 (registering DOI)
Submission received: 21 July 2025 / Revised: 9 August 2025 / Accepted: 22 August 2025 / Published: 25 August 2025
(This article belongs to the Special Issue Advanced Optical Measurement Spectroscopy and Imaging Technologies)

Abstract

Cardiovascular disease (CVD) is a term used for disorders affecting the heart and blood vessels. Globally, it is the most common cause of death. The main purpose of this study was the rapid detection of CVD, which is essential for effective treatment and prevention. Early detection may lower the risk of myocardial infarction (MI) and reduce the death rate in CVD patients. Laser-induced breakdown spectroscopy (LIBS) is a non-invasive technique that requires little sample preparation and can support early detection of CVD. LIBS was used to investigate the variation in emission intensities of calcium (Ca), nitrogen (N), sodium (Na), carbon (C) and the CN band in the spectra of healthy individuals and CVD patients. Machine learning algorithms were applied to the LIBS spectral data to classify CVD and healthy individuals and to determine the validation accuracy. Several models achieved 100% accuracy in the given configuration. The narrow neural network achieved 100% accuracy on both the validation and test datasets in a short duration of 10.008 s. This preliminary study suggests that LIBS combined with machine learning may provide a method complementary to existing analytical techniques for early detection of CVD.

1. Introduction

Cardiovascular disease (CVD) is a group of diseases that affect the heart and blood vessels and is one of the main causes of death worldwide. Recent studies by the World Health Organization (WHO) indicate that CVD remains the leading cause of death globally, with an estimated 23.6 million people expected to die annually by 2030. CVD accounts for almost 18 million deaths each year, about 32% of global mortality. Among these fatalities, heart attacks and strokes together represent more than 80%, with around one-third occurring in individuals under the age of 70 [1]. Abnormalities in the normal circulation of blood through the heart can lead to numerous severe cardiac conditions. These health issues are classified as CVD and represent some of the most critical medical concerns. CVD includes conditions that affect the heart, the blood vessels throughout the body and the vascular system of the brain. It imposes a major financial burden due to treatment costs and early deaths. Early detection can stop the progression of the disease and decrease the death rate [2]. Risk factors are classified as modifiable, such as high cholesterol level, high blood pressure, obesity and smoking, and non-modifiable, such as age, genetics and gender. CVD progression can be controlled once these factors are diagnosed. Numerous patients are symptomless until later stages; therefore, warning signs of a heart attack, such as chest pain, breathlessness, fatigue and dizziness, are not identified early enough for timely medication [1,3].
Early detection is important because it enables medication and other beneficial strategies that decrease the risk of CVD. If CVD is not detected early, it progresses silently, for example through plaque buildup or a blood clot, until the blood flow to the heart muscle is cut off, which results in a heart attack [4]. There are four stages of heart disease: stage 0, stage A, stage B, and stage C/D. At stage 0, there are no heart failure risk factors, no symptoms of CVD, and no abnormal function. At stage A, at least one of the following medical risk factors is present (hypertension, diabetes mellitus, obesity, metabolic syndrome, atherosclerotic disease, acute coronary syndrome). At stage B, there are risk factors and at least one functional abnormality (valvular heart disease, abnormality in regional wall motion), but no symptoms of heart failure. At stage C/D, functional abnormalities are accompanied by prior or current symptoms of heart failure [5].
The diagnostic techniques for CVD include the electrocardiogram (ECG) [6], echocardiography [7], blood tests to check cholesterol level [8], magnetic resonance imaging (MRI) [9] and computed tomography (CT) [10]. These techniques may detect the disease only after it has progressed to an advanced stage, missing the initial changes that occur at an early stage. Other challenges include high costs, limited accessibility and a lack of sensitivity to minor abnormalities. These limitations underscore the urgent need for more accurate, accessible, and early-stage diagnostic solutions for effective CVD prevention. The indicators of CVD include high cholesterol levels, hypertension and triglyceride levels. Hypertension is a cause of coronary heart disease and needs to be detected early and accurately to prevent CVD [11].
Laser-induced breakdown spectroscopy (LIBS) is a powerful elemental analysis method that generates emission spectra from a laser-induced plasma for chemical analysis. It stands out for its minimal sample preparation, its ability to perform real-time and multi-element detection and its capability for remote analysis. These features make LIBS suitable for a wide range of applications, from underwater exploration to space missions such as analyzing the surface of Mars [12]. During the LIBS process, a high-energy laser pulse strikes the sample’s surface and generates a plasma by exciting and ionizing the atoms. The plasma emits light whose spectrum contains the characteristic lines of the elements in the sample, and this spectrum is recorded by a spectrometer [13].
Several other spectroscopic techniques, such as Raman spectroscopy [14], Fourier transform-infrared spectroscopy (FTIR) [15], nuclear magnetic resonance (NMR) spectroscopy [16], and near-infrared (NIR) spectroscopy [17], have also been used for the detection of CVD. These techniques encounter several limitations: they are highly sensitive to molecular changes, are expensive, require large sample volumes and careful sample preparation, and involve complex equipment, which complicates the analysis. LIBS can differentiate between normal and malignant cells in different types of cancer, such as colon, liver, breast, lung, gastric and blood cancer. It is used to detect trace elements such as Na, C, Ca, N, iron (Fe), magnesium (Mg) and potassium (K), which are present in different types of cancer. Blood, serum and tissue samples were used for LIBS elemental analysis; blood and serum required a small sample of about 50 microliters, and tissue required a sample size of about 5–15 micrometers. LIBS is capable of in situ detection and is used for micro- and multi-elemental analysis. It is simple and able to analyze solids, liquids and gases [18]. In LIBS analysis, different substrates have been used, such as filter paper, boric acid, glass slides, and aluminum foil. Boric acid is used to make pellets, which help to decrease spectral interference.
LIBS integrated with machine learning gives more satisfactory results. Machine learning is an advanced data processing technique within artificial intelligence that is used to extract the significant distinguishing features from a sample. Machine learning algorithms such as support vector machine (SVM), principal component analysis (PCA), neural networks (NN), k-nearest neighbor (k-NN), linear discriminant analysis (LDA), decision trees (DT) and ensemble classifiers (ECS) have been used to compare whole blood samples for breast cancer diagnosis [19]. Machine learning can reduce a large dataset to a smaller set of informative features, which can support decisions for early diagnosis of CVD. Machine learning algorithms are used to identify patterns and features in the analyzed dataset [20]. They are also used to summarize the data, improve the signal-to-noise ratio (SNR) and perform feature selection [21].
Machine learning is categorized into supervised and unsupervised learning. Supervised learning is used with clearly labeled datasets, often small ones, and relies on algorithms such as decision trees, support vector machines, neural networks, and linear discriminant analysis for tasks such as classification or regression. On the other hand, unsupervised learning works with large and unlabeled datasets and is used to discover patterns or groupings in data. PCA is a dimensionality reduction technique that falls under unsupervised learning, as it identifies structure in data without labels [22]. In a review, Khan et al. surveyed LIBS for various cancer types. In blood cancers such as lymphoma and multiple myeloma, LIBS combined with chemometric models such as k-NN and random subspace LDA achieved remarkably high performance, with accuracies above 99.7%, sensitivities above 99.6%, and specificities above 99.7%. For melanoma, both tissue and pellet samples showed excellent classification power, with sensitivity reaching up to 99.4% and specificity up to 100%, while formalin-fixed paraffin-embedded (FFPE) tissue samples achieved 100% sensitivity, specificity, and accuracy using artificial neural networks. In nasopharyngeal carcinoma (NPC), serum-based LIBS coupled with a random forest-extreme learning machine model yielded 98.33% accuracy, 99.02% sensitivity, and 97.75% specificity. Cervical cancer tissue analysis using PCA-SVM improved classification accuracy to 94.44%. Lung cancer detection also showed strong results, with an SVM and a random forest model reaching 98.9% accuracy and 99.3% sensitivity, while a deep learning approach using 1D ResNet achieved 91.1% accuracy with a high AUC of 0.99. Additionally, a bagging-voting fusion model applied to serum samples for liver, lung, and esophageal cancers reported an overall accuracy of 92.53% and a recall of 92.92%.
These results highlight LIBS as a highly accurate and reliable technique for multi-cancer detection, particularly when integrated with advanced machine learning models [18,19].
LIBS data are used to detect biomarkers such as Ca, Na, Mg, K, N, C and the CN band; abnormally high or low intensities can indicate disease at an early stage. For example, high Ca intensities have been linked to calcification of aortic valvular interstitial cells (VICs) [23]. There is significant progress towards precision medicine all over the world; LIBS combined with machine learning has produced strong results in medical science by enabling accurate and earlier-stage prediction of CVD risk [24].
This preliminary research focused on early detection of CVD by analyzing elements including Ca, Na, C, N and CN-band in samples from both CVD patients and healthy individuals. Various machine learning algorithms such as SVM, PCA, NN, k-NN, LDA, DT and ECS were applied to classify the samples based on their LIBS spectral data. The narrow neural network achieved 100% accuracy on both the validation and test datasets in just 10.008 s. This demonstrates that LIBS combined with machine learning could provide a valuable complementary approach to existing analytical methods for the early detection of CVD.

2. Materials and Methods

2.1. LIBS System

A Q-switched Nd:YAG laser, model Q smart 850 (Quantel, France), operating at a repetition rate of 10 Hz, a wavelength of 532 nm, a pulse energy of 250 mJ and a pulse duration of about 5 ns, was used. A biconvex lens of focal length 100 mm focused the laser beam to generate a plasma on the blood samples. The plasma emission in the wavelength range of 190 nm to 770 nm was captured by a charge-coupled device (CCD) based optical fiber spectrometer (AvaSpec 2048, Avantes, Netherlands) with a spectral resolution of 0.05 nm. A delay generator (Stanford model 535DG, Stanford, CA, USA) was used to synchronize the trigger pulse. The spectrometer triggering delay was set to 1 µs, and the CCD integration time was set to 2 ms. To avoid crater formation, which distorts the LIBS signal, a two-dimensional stage was used to precisely adjust the position of the laser spot on the samples. The experiment was conducted under ambient atmospheric conditions. The experimental setup of LIBS is shown in Figure 1.
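As a rough worked estimate from the laser parameters quoted above, the peak and average optical power can be computed as follows. The symbols are introduced here for illustration, and the peak value assumes an idealized rectangular pulse, so it should be read as an order-of-magnitude figure rather than a measured quantity.

```latex
P_{\text{peak}} \approx \frac{E_{\text{pulse}}}{\tau}
  = \frac{0.250\ \text{J}}{5 \times 10^{-9}\ \text{s}}
  = 5 \times 10^{7}\ \text{W} = 50\ \text{MW},
\qquad
P_{\text{avg}} = E_{\text{pulse}} \, f_{\text{rep}}
  = 0.250\ \text{J} \times 10\ \text{Hz} = 2.5\ \text{W}
```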

2.2. Samples Collection and Preparation

In this research, 30 whole blood samples from CVD patients and 30 whole blood samples from healthy individuals were collected. Each sample was taken from a different individual and examined at the Faisalabad Institute of Cardiology (FIC), Pakistan. All the diseased samples came from patients with myocardial infarction (stage C and D), as diagnosed by the pathology department. The clinical procedure for healthy and CVD samples was approved by the Research Ethical Committee of FIC Hospital. Blood samples were drawn from a vein on the back of the hand near the wrist and stored in ethylenediaminetetraacetic acid (EDTA) tubes to avoid coagulation. The samples were refrigerated at 4 °C and analyzed using LIBS within 42 h. Prior to LIBS examination, to enhance the spectral signal, the liquid blood samples were deposited on a boric acid pellet and dried to a solid. A micropipette was used to dispense 30 µL of the blood sample onto a pellet. Each pellet, approximately 40 mm in diameter, was made by compressing 99.7% pure boric acid powder at 20 MPa using a hydraulic press. The blood sample was air-dried for a minimum of 2 h. A total of 50 spectra were recorded for each sample, each at a new position. Figure 2a shows the blood samples stored in EDTA tubes, Figure 2b shows blood deposited on a boric acid pellet before the laser shots, and Figure 2c shows the boric acid pellet after the laser shots.

3. Data Classification Methods

3.1. Data Preprocessing

For each CVD and healthy sample, 50 raw spectra were recorded, giving a total of 1500 spectra from the CVD samples and another 1500 from the healthy blood samples. To minimize the effect of shot-to-shot laser pulse energy variations and sample inhomogeneity, every 10 consecutive raw spectra were averaged, resulting in 150 averaged spectra for each class (healthy and CVD). Each averaged spectrum was then normalized by its total spectral intensity, calculated by integrating (summing) the intensity over the entire spectral range. The flow chart of data processing for classification between CVD and healthy samples is shown in Figure 3. The process started with raw spectral data from the CVD and healthy datasets. Feature selection was based on the elements present in the sample: the high-intensity emission peaks of these elements were taken as features before applying the classification algorithms. The division of samples and their spectra is shown in Table 1.
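The averaging and normalization steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' processing code: the function names, the 2048-channel spectrum length and the random intensities are assumptions, and the integral over the spectral range is approximated by a plain sum over channels.

```python
import random

def average_blocks(spectra, block_size=10):
    """Average every `block_size` consecutive spectra channel by channel."""
    averaged = []
    for start in range(0, len(spectra) - block_size + 1, block_size):
        block = spectra[start:start + block_size]
        averaged.append([sum(channel) / block_size for channel in zip(*block)])
    return averaged

def normalize_total_intensity(spectrum):
    """Divide each channel by the summed intensity of the whole spectrum."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("cannot normalize a spectrum with zero total intensity")
    return [value / total for value in spectrum]

# Synthetic stand-in data: 50 raw spectra for one sample, 2048 channels each
# (the channel count and intensity range are assumptions for illustration).
rng = random.Random(0)
raw_spectra = [[rng.uniform(100.0, 1000.0) for _ in range(2048)] for _ in range(50)]

averaged_spectra = average_blocks(raw_spectra, block_size=10)
normalized_spectra = [normalize_total_intensity(s) for s in averaged_spectra]

print(len(averaged_spectra), "averaged spectra for this sample")
```

With 50 raw spectra per sample and blocks of 10, each sample yields 5 averaged spectra, so 30 samples per class give the 150 averaged spectra per class reported in the text.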

3.2. Elemental Analysis of LIBS

The significant emission lines of boron (B) and carbon (C) can be seen in the averaged spectra of the empty substrate. Here, boric acid was used as a substrate due to its minimal elemental composition, making it easier to accurately detect the elements in the blood sample. Idrees et al. also used boric acid for comparison of whole blood and serum samples of breast cancer. The major emission lines of Na, Ca, N, C and CN-band were present in the LIBS spectra of breast cancer whole blood and serum samples. However, the boron (B) and carbon (C) lines were not included in the analysis because they also appeared in the LIBS spectra of the empty boric acid pellet, indicating they were from the substrate, not the sample [19]. CVD and healthy samples have different spectra from the substrate. Elements with different intensities were identified by using the NIST database from the averaged normalized spectra of CVD and healthy ones. To verify the presence of emission lines for each element in the sample, at least two or three strong spectral lines were cross-checked.
Interference from the surrounding air also causes fluctuations in the spectral signal intensity of the samples. Variations in sample composition, laser energy, plasma temperature and detector response can likewise lead to inconsistent emission intensities. The LIBS spectra were therefore normalized to reduce spectral fluctuations caused by matrix effects and experimental variations, thereby improving the reliability of the analysis. In addition, feature lines were selected as the inputs of the classifiers, which weakened the influence of fluctuations within each class. Normalized LIBS spectra provide more accurate and consistent data for effective discrimination and classification of different samples. The elements detected in the spectra of both CVD and healthy samples were mainly C, Ca, N and Na, together with the CN band. The concentration of these elements was greater in CVD patients than in healthy individuals. Figure 4 shows the LIBS spectra with several atomic emission lines of (a) CVD and (b) healthy samples without normalization, the normalized spectra of (c) CVD and (d) healthy samples, (e) the offset spectra of CVD and healthy samples, and (f) the spectrum of the empty substrate.
In this research, 11 emission lines of C, Ca, N, Na and the CN band were selected for the discrimination analysis between healthy and CVD samples. Emissions indicating higher concentrations of Na, N, Ca, C and the CN band were observed in the LIBS spectra of CVD patients compared with healthy individuals. LIBS was used in the diagnosis of CVD to analyze elements such as C, Ca, Na, N and the CN band, which are closely associated with heart disease. High calcium levels in blood samples have been shown to indicate atherosclerosis, a condition that serves as a precursor to coronary artery disease (CAD). A higher sodium concentration increases the risk of CVD. By detecting these elements, LIBS can be a decisive technique for early detection of CVD [25]. The main components of a biological cell are C, Na, Ca, and N. Na is an important biomarker that rises because of CVD. It plays a major role in physiological regulation. The risk of CVD rose by up to 6% for every 1 g increase in dietary sodium intake [26]. The CN band is a spectral band emitted by the diatomic molecule CN (cyanogen). It is a distinguishing feature of LIBS spectra, particularly when investigating biological materials.
In the literature, the abundance of C, N, and the CN band in diseased samples is reported to be significantly higher than in healthy ones. The emission intensities obtained with the LIBS technique are therefore important for differentiation. Moreover, the higher intensities of carbon are noticeable not only in the atomic lines but also in simple molecular emissions such as the CN band [27]. Davari et al. (2018) classified valvular interstitial cells (VICs) using LIBS and showed that an excess of calcium affects the heart interstitial cells. This disease was characterized by osteogenic variation of VICs. Calcific aortic valve disease (CAVD) affects 25% of people over 65 years old and 4% of people over 80 years old. Late-stage CAVD often needs surgery, such as valve replacement [23]. Table 2 shows the details of the selected emission lines identified using the NIST database. Emission lines of Ca, C, N, Na and the CN band were observed in both types of samples. A noticeable decrease in the concentration of Na and the other observed elements was seen in the spectra of healthy samples. CVD samples have stronger emission lines than healthy samples.

4. Results

4.1. Discrimination Analysis

Principal component analysis (PCA) is an unsupervised machine learning method that was applied to the spectral data to reduce its dimensionality and to discriminate between healthy and diseased samples. PCA clustering results for healthy and CVD blood samples are shown in Figure 5. The analysis included 150 averaged spectra for each class (healthy and CVD). Most of the data variation was captured by the first three principal components (PC1, PC2 and PC3): PC1 accounted for 58.41% of the variance, PC2 for 22.93%, and PC3 for 14.66%, together representing about 96% of the total variance. The data points of the CVD samples were more spread out in the principal component space, whereas those of the healthy samples were more compact, which is attributed to the absence of serious disease. Still, there was a visible overlap between the healthy and CVD clusters, which makes it difficult to separate them using PCA alone.
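As a quick arithmetic check of the variance figures quoted above, the cumulative explained variance can be accumulated component by component. The variable names are illustrative; only the three percentages come from the text.

```python
# Explained-variance percentages reported in the text for PC1, PC2 and PC3.
explained_variance_percent = [58.41, 22.93, 14.66]

cumulative_percent = []
running_total = 0.0
for value in explained_variance_percent:
    running_total += value
    # Round to two decimals to suppress floating-point noise in the sum.
    cumulative_percent.append(round(running_total, 2))

print(cumulative_percent)
```

The last entry is the share of total variance retained when the spectra are projected onto the first three components.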

4.2. Multivariate Data Analysis

4.2.1. Machine Learning Methods

Machine learning models were applied for the classification between healthy and CVD samples. For the validation dataset, 20 healthy and 20 CVD samples were selected, giving 100 averaged spectra for each class. Similarly, 10 healthy and 10 CVD samples, with 50 averaged spectra for each class, were selected for the test dataset. Each sample and all of its spectra were assigned to either the validation set or the test set, so there was no overlap between them. The distribution of healthy and CVD samples, with their number of samples and spectra for the test and validation sets, is shown in Table 3.
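A sample-level split of this kind can be sketched as below. The sample identifiers, the random seed and the helper name `split_by_sample` are hypothetical; the sketch only illustrates how assigning whole samples (rather than individual spectra) to a set keeps the validation and test data disjoint.

```python
import random

def split_by_sample(sample_ids, n_validation, seed=0):
    """Shuffle sample IDs and split them so no sample lands in both sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_validation], ids[n_validation:]

SPECTRA_PER_SAMPLE = 5  # 50 raw spectra per sample averaged in blocks of 10

dataset = {}
for label in ("healthy", "cvd"):
    sample_ids = [f"{label}_{i:02d}" for i in range(1, 31)]  # hypothetical IDs
    validation_ids, test_ids = split_by_sample(sample_ids, n_validation=20)
    dataset[label] = {
        # Each entry is (sample ID, index of the averaged spectrum in that sample).
        "validation": [(sid, k) for sid in validation_ids for k in range(SPECTRA_PER_SAMPLE)],
        "test": [(sid, k) for sid in test_ids for k in range(SPECTRA_PER_SAMPLE)],
    }

for label, parts in dataset.items():
    print(label, len(parts["validation"]), len(parts["test"]))
```

Splitting by sample rather than by spectrum matters because averaged spectra from the same blood sample are strongly correlated; mixing them across sets would inflate the test accuracy.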
Supervised machine learning algorithms were used to improve the discrimination between healthy and CVD samples using LIBS data. SVM, Decision Trees (DT), Discriminant Analysis (DA), Logistic Regression (LR), Naïve Bayes (NB), k-Nearest Neighbors (k-NN), Ensemble Classifiers (EC), and Neural Networks (NN) were included for classification analysis. A lot of research has already been conducted for the discrimination analysis by using these algorithms.
A decision tree (DT) is a machine learning algorithm used for both regression and classification. It consists of decision nodes, chance nodes and end nodes, connected by branches that represent paths or decision rules. The tested variants were the fine tree, medium tree and coarse tree. DT makes logical approximations and is easy to use. The DT was trained on data with known class labels so that it could accurately predict the outcomes of new instances [28]. Discriminant analysis (DA) is another supervised machine learning method used to separate observations into different classes. DA assumes that the features of each class follow a multivariate normal distribution with class-specific parameters. DA has two types: linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). In LDA, all the classes share the same covariance but have different mean values. QDA allows both the mean values and the covariance to differ for each class, which gives more flexibility and allows more complex data to be handled. The model estimates the probability that the input data belongs to each predefined class and assigns it to the class with the highest predicted probability [29]. For the discrimination of CVD and healthy samples, both LDA and QDA were examined. Selected features of the data, called feature lines, were used in the discriminant analysis, which reduced the number of variables in the dataset and helped to avoid overfitting.
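The difference between LDA and QDA can be made concrete with the standard textbook discriminant functions. The notation below is an assumption introduced for illustration and does not appear in the original study: x is the feature vector, and class k has mean vector mu_k, prior probability pi_k and covariance Sigma_k (QDA) or the shared covariance Sigma (LDA). A spectrum is assigned to the class with the largest discriminant value.

```latex
\delta_k^{\text{LDA}}(x) = x^{\top} \Sigma^{-1} \mu_k
  - \tfrac{1}{2}\, \mu_k^{\top} \Sigma^{-1} \mu_k + \log \pi_k

\delta_k^{\text{QDA}}(x) = -\tfrac{1}{2} \log \lvert \Sigma_k \rvert
  - \tfrac{1}{2}\, (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) + \log \pi_k
```

The LDA function is linear in x, which gives a linear decision boundary, while the class-specific covariance in QDA keeps the quadratic term and gives a curved boundary.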
Logistic regression is a predictive model that describes the relationship between a binary outcome variable and multiple independent variables. Common strategies for obtaining accurate results include using a large sample size, removing collinear variables and combining different variables. To reduce collinearity in a large dataset, one of each pair of highly correlated variables is dropped; otherwise, the collinearity can impact the overall performance of the model. This model is also effective when the data come from different classes with similar variations [30]. Logistic regression based on a linear combination of features was employed to distinguish between CVD and healthy samples.
Naïve Bayes is a simple probabilistic classifier based on Bayes’ theorem that assumes the features are conditionally independent of each other given the class. These models are efficient because the number of parameters increases only linearly with the number of features in the dataset. It is among the simplest classifiers and is useful in the validation process. Cross-validation divided the data into a training and a test data set [31].
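To illustrate the independence assumption, a Gaussian naïve Bayes classifier can be sketched in a few lines. This is a generic illustration, not the MATLAB model used in the study; the function names and the two-feature toy data (standing in for normalized line intensities) are hypothetical.

```python
import math

def fit_gaussian_nb(features, labels):
    """Estimate per-class priors, feature means and feature variances."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [x for x, y in zip(features, labels) if y == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids division by zero
            for col, m in zip(zip(*rows), means)
        ]
        model[cls] = (n / len(labels), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the largest log-posterior, treating features as independent."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, m, var in zip(x, means, variances):
            # Per-feature Gaussian log-likelihoods simply add up under independence.
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Hypothetical toy data: two features per spectrum.
toy_features = [[0.10, 0.20], [0.12, 0.22], [0.11, 0.19],
                [0.30, 0.40], [0.32, 0.41], [0.29, 0.38]]
toy_labels = ["healthy"] * 3 + ["cvd"] * 3

nb_model = fit_gaussian_nb(toy_features, toy_labels)
print(predict_gaussian_nb(nb_model, [0.31, 0.39]))
```

Each class stores one mean and one variance per feature, which is why the parameter count grows only linearly with the number of features.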
SVM is used for classification and regression models. It is flexible for both linear and nonlinear data. SVM models use different kernel functions such as linear, quadratic, cubic, and Gaussian with fine, medium, and coarse scales. It helps to handle different levels of data complexity by adjusting its decision boundary to optimally separate the classes. While linear kernels are suitable for linearly separable data, Gaussian and polynomial kernels are more effective for nonlinear classification tasks [32].
k-NN is a supervised learning method used in pattern recognition and classification. To classify a new data point x, the algorithm measures the distance between x and every point in the labeled dataset, selects the k points with the shortest distance to x, and assigns x to the class that appears most frequently among those k neighbors [33]. The k-NN variants used in this study were fine, medium, cosine, cubic, weighted and coarse k-NN.
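The majority-vote rule described above can be sketched as follows. The Euclidean distance, the value k = 3 and the toy points are assumptions chosen for illustration; the variants listed in the text differ in the distance metric, the number of neighbors and the vote weighting.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, x, k=3):
    """Assign x to the most frequent class among its k nearest labeled points."""
    distances = sorted(
        (math.dist(point, x), label) for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per spectrum.
train_points = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19),
                (0.30, 0.40), (0.32, 0.41), (0.29, 0.38)]
train_labels = ["healthy"] * 3 + ["cvd"] * 3

print(knn_predict(train_points, train_labels, (0.31, 0.39), k=3))
```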
An ensemble classifier is a model that combines the results of multiple classifiers: each classifier in the group works on the same task, and their outputs are combined to make a final decision. By working together, these models give more accurate results [34]. Ensembles are of two types: homogeneous, in which all the models are based on the same algorithm (for example, bagged trees, boosted trees and subspace trees), and heterogeneous, in which different machine learning models are combined (for example, decision trees, SVM or neural networks). The main purpose of combining individual classifiers in an ensemble is to attain high precision [35].
An artificial neural network (ANN) is a computational model inspired by the structure and functions of biological neural networks. It is the basis of deep learning and is primarily used to solve complex signal processing and pattern recognition problems. An ANN consists of three types of interconnected layers: the input layer, one or more hidden layers and the output layer. The input layer includes artificial neurons that receive data and transmit it to the hidden layer, where it is processed through weighted connections with bias terms. The network learns by adjusting the weights of its connections, which allows it to uncover patterns and trends too complex for human analysis. This ability to learn from data and generalize makes ANNs powerful tools for tasks such as classification, prediction and decision making across different industries [36].
Although there were differences in the spectral line intensities between healthy and CVD samples, it was not easy to classify them by direct observation or using PCA. To improve the performance of LIBS to discriminate between healthy and CVD samples, machine learning models were integrated for the classification.

4.2.2. Comparison of Classification Results

The classification models were validated and tested using the high-intensity features of the normalized spectra of healthy and CVD blood samples. To minimize the risk of overfitting, a 10-fold cross-validation method was used. Model parameters were optimized using a grid search based on the validation dataset. Table 4 shows, for each model, the validation accuracy with its validation time and the testing accuracy with its operating time. MATLAB 2024 was used to build and test the classification models. The PC used for the calculations had an 11th Gen Intel (R) Core (TM) i7-11370H processor at 3.30 GHz and 477 GB of storage. These hardware characteristics influence the execution times reported for the different models. Parallel computing on multiple cores is typically used to significantly reduce computation time, especially when handling very large datasets, training deep learning models or performing extensive hyperparameter tuning. However, in this research, the dataset and model complexity did not necessitate such resources, and the obtained performance was satisfactory without a graphics processing unit (GPU) or multi-core parallelization. Nevertheless, GPU or parallel computing could be incorporated to further optimize performance and is considered a direction for future work.
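The mechanics of 10-fold cross-validation can be sketched as below. This is a generic index-splitting illustration, not the MATLAB routine used in the study: the function name, the seed and the figure of 200 spectra (100 healthy plus 100 CVD validation spectra) are assumptions, and the sketch shuffles individual spectra without grouping them by blood sample.

```python
import random

def k_fold_indices(n_items, n_folds=10, seed=0):
    """Yield (training, held_out) index lists, one pair per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out

splits = list(k_fold_indices(200, n_folds=10))
for fold_number, (training, held_out) in enumerate(splits, start=1):
    print(fold_number, len(training), len(held_out))
```

Every spectrum is held out exactly once, so the accuracy averaged over the ten folds uses all the data for evaluation without ever scoring a model on spectra it was trained on.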
Precision was calculated for each classification model as the percentage of correctly predicted positive spectra out of all spectra predicted as positive (true positives plus false positives). All models were validated, and their accuracies computed, in MATLAB. When the classification models were evaluated on the validation and test datasets, most algorithms achieved prediction accuracies ranging from 50.0% to 100.0%, with different execution times. Four models (quadratic discriminant, Gaussian naïve Bayes, narrow NN and trilayer NN) achieved the highest accuracy of 100% on the validation data. The test data showed even better results: twelve models achieved 100% accuracy, namely fine, medium and coarse tree, quadratic discriminant, Gaussian and kernel naïve Bayes, bagged trees, subspace k-NN, narrow NN, wide NN, bilayer NN and trilayer NN. Four classification models achieved 100% accuracy on both the validation and test data: quadratic discriminant in 20.168 s, Gaussian naïve Bayes in 14.622 s, narrow NN in 10.008 s and trilayer NN in 10.602 s. From these models, the narrow NN was selected because it had the shortest execution time. For practical use, testing accuracy is crucial for assessing model reliability and consistency. The high accuracies on both the validation and test datasets indicate strong model generalization. Ideally, the most effective model maintains high testing accuracy while balancing performance across both the validation and test datasets. These outcomes also reflect the spectral differences between healthy and CVD samples, as well as the variation among spectra within the same category of blood samples. However, other models such as LDA, LR, k-NN and SVM showed low testing accuracy.
The confusion matrix compares the true and predicted classes. Correct predictions are located along the diagonal of the matrix, while incorrect predictions are located off the diagonal [37]. In machine learning, the ROC curve is a graphical plot that shows the performance of a binary classifier as its discrimination threshold is varied. It plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, which shows how well the model can discriminate between the different classes. The area under the curve (AUC) is a common metric to evaluate the overall performance of the classifier [38].
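A ROC curve and its AUC can be computed from classifier scores as sketched below. The scores and labels are made-up examples (1 for CVD, 0 for healthy), and the function names are illustrative; the AUC is obtained with the trapezoidal rule over the (FPR, TPR) points.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores: higher means "more likely CVD".
scores = [0.95, 0.90, 0.80, 0.70, 0.30, 0.20, 0.10, 0.05]
separable_labels = [1, 1, 1, 1, 0, 0, 0, 0]    # classes perfectly ranked
overlapping_labels = [1, 1, 1, 0, 1, 0, 0, 0]  # one healthy spectrum outranks one CVD spectrum

auc_separable = auc_trapezoid(roc_points(scores, separable_labels))
auc_overlapping = auc_trapezoid(roc_points(scores, overlapping_labels))
print(auc_separable, auc_overlapping)
```

An AUC of 1 arises only when every positive score exceeds every negative score; a single misordered pair already lowers it, as the second example shows.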
In Figure 6a, for the validation dataset, 100.0% of the CVD samples were correctly predicted as true positives and 0% were misclassified as false negatives. Similarly, 100.0% of the healthy samples were correctly predicted as true negatives and 0% were misclassified as false positives. Sensitivity, also called the true positive rate, measures how well diseased samples are identified; in this case, it is 100.0%. Specificity, also called the true negative rate, measures how well healthy samples are identified; in this case, it is also 100.0%. These results suggest that the narrow NN model performs very well, identifying CVD and healthy cases with no false positives or false negatives.
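The quantities discussed above follow directly from the four cells of a binary confusion matrix. The sketch below is a minimal illustration: the validation counts assume 100 CVD and 100 healthy spectra (Table 3), all classified correctly (Figure 6a), while the second call uses hypothetical counts to show how a single error changes the metrics. The function name `binary_metrics` is an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard metrics from the cells of a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # correct positives among predicted positives
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Validation set: 100 CVD and 100 healthy spectra, none misclassified
validation = binary_metrics(tp=100, fn=0, tn=100, fp=0)

# Hypothetical case for comparison: one CVD spectrum predicted as healthy
one_miss = binary_metrics(tp=99, fn=1, tn=100, fp=0)

print(validation)
print(one_miss)
```

A missed CVD spectrum is a false negative, so it lowers sensitivity and accuracy but leaves specificity and precision unchanged.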
Figure 6b shows the receiver operating characteristic (ROC) curve for the validation set. It plots sensitivity (true positive rate) on the y-axis against 1 - specificity (false positive rate) on the x-axis. The curve represents excellent performance. The area under the curve (AUC) is 1.00, which corresponds to perfect separation of the two classes. The operating point lies at the top left of the plot, indicating that both CVD and healthy samples were classified correctly.
In Figure 7a, for the testing set, 100.0% of the CVD samples were correctly predicted as true positives and 0% were misclassified as false negatives. Similarly, 100.0% of the healthy samples were correctly predicted as true negatives and 0% were misclassified as false positives. This results in a sensitivity of 100.0% and a specificity of 100.0%. These high values indicate a balanced and strong classification performance, with no misclassification. In Figure 7b, the ROC curve demonstrates model performance. The curve indicates strong discrimination between CVD and healthy samples. The area under the curve (AUC) is 1.00, suggesting excellent discrimination ability.
The emission spectra used in the current study were relatively simple compared to more complex biomedical datasets. However, we intentionally applied a wide range of machine learning algorithms, as shown in Table 4, to evaluate their performance not only on this initial binary classification task (presence or absence of heart disease) but also to identify models with strong generalization capabilities for future work. The long-term objective of this research is to move from binary classification to a multi-class classification framework, which will allow more detailed preliminary diagnoses of different types or stages of heart disease. The current study serves as a foundational step toward that goal.

5. Discussion

It was observed that the concentration of elements contributed significantly to achieving accurate discrimination results. The highest validation and test accuracies were obtained by the narrow neural network in a short duration of 10.008 s, which indicated good generalization and a low risk of overfitting. It also showed fewer errors, such as outliers and misclassifications, making it more reliable. The best-performing models effectively differentiated CVD patients from healthy individuals, with strong classification ability and optimal ROC performance.
In previous studies, Davari et al. cultured porcine valvular interstitial cells (VICs) in osteogenic media to induce calcification and showed that LIBS could detect calcium deposits as early as 2 days into differentiation, with a detection limit of ~0.17 ± 0.04 µg, five times more sensitive than conventional assays [23]. Similarly, Srinidi et al. used tissue-mimicking phantoms representing calcified and non-calcified atherosclerotic plaques; LIBS identified distinct calcium peaks at 400 nm, 590 nm and 780 nm in the calcified phantoms, alongside a significant drop in optical transmittance, validating its potential for plaque characterization [25]. In contrast, our study focused on human samples from both healthy individuals and CVD patients. LIBS combined with machine learning analyzed elemental biomarkers such as calcium, sodium, nitrogen and CN bands, which highlighted the potential of LIBS as a rapid, non-invasive and highly accurate diagnostic tool for real-world CVD detection. The novelty of this work lies in the use of elemental analysis for CVD detection, which remains a largely unexplored area in current research. Previous studies focused on imaging and biochemical methods. To our knowledge, this is the first study to combine LIBS with machine learning for the early diagnosis of CVD, and it also shows the potential of LIBS to differentiate between healthy and diseased blood samples based on their elemental emission signatures. A larger number of human samples, including early-stage cases, should be included to make this technique more reliable and applicable.
The major challenges in applying LIBS to biomedical diagnosis lie in the variability of biological sample surfaces, particularly their degree of homogeneity and their sensitivity. These surface characteristics directly influence laser energy coupling efficiency and plasma formation dynamics, which are critical for achieving reliable spectral emission. Additionally, system-related factors such as laser pulse fluctuations and detector response variability contribute to spectral inconsistencies. Together, these factors affect the reproducibility and accuracy of LIBS data, posing a challenge for its clinical translation.
Our findings suggested that the combination of LIBS and ML is not only effective for CVD detection but could also be extended to other systemic diseases, marking a significant advancement in non-invasive diagnostic technologies.

6. Conclusions

In this preliminary study, LIBS analysis of healthy and cardiovascular disease (CVD) samples revealed the presence of C, Ca, Na, N and the CN band. LIBS combined with machine learning addresses a significant research gap by performing elemental analysis of CVD blood samples, a combination that has not previously been explored for CVD. The results showed that the CVD samples have stronger elemental emission lines than the healthy ones. Multivariate statistical analysis with PCA showed little distinction between healthy and CVD samples. Among the classification models, several achieved 100% accuracy, and the narrow neural network achieved 100% accuracy on both the validation and test data in 10.008 s. These results indicate that the proposed technique is a reliable, fast, effective and non-invasive method for the rapid detection of CVD. Further research with a larger number of samples is needed to establish the potential of this technique for real-time medical diagnosis in clinical applications. The integration of LIBS with machine learning not only improves detection sensitivity but also contributes to the advancement of cost-effective diagnostic tools.

Author Contributions

Conceptualization, A.H.; methodology, A.G.; software, S.S.; validation, A.H., F.A. and R.N.; formal analysis, A.H. and R.N.; investigation, F.A.; resources, B.S.I.; data curation, Y.J. and R.N.; writing—original draft preparation, A.H.; writing—review and editing, G.T. and B.S.I.; supervision, B.S.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| CVD | Cardiovascular disease |
| LIBS | Laser-induced breakdown spectroscopy |
| Nd:YAG | Neodymium-doped yttrium aluminum garnet |
| WHO | World Health Organization |
| ECG | Electrocardiogram |
| MRI | Magnetic resonance imaging |
| ML | Machine learning |
| PCA | Principal component analysis |
| NN | Neural networks |
| SVM | Support vector machine |
| ECS | Ensemble classifiers |
| CNN | Convolutional neural networks |
| k-NN | k-nearest neighbor |
| LDA | Linear discriminant analysis |
| DT | Decision tree |
| CAD | Coronary artery disease |
| DVT | Deep venous thrombosis |
| SNR | Signal-to-noise ratio |
| VICs | Valvular interstitial cells |

References

  1. Baghdadi, N.A.; Farghaly Abdelaliem, S.M.; Malki, A.; Gad, I.; Ewis, A.; Atlam, E. Advanced machine learning techniques for cardiovascular disease early detection and diagnosis. J. Big Data 2023, 10, 144.
  2. Azmi, J.; Arif, M.; Nafis, M.T.; Alam, M.A.; Tanweer, S.; Wang, G. A systematic review on machine learning approaches for cardiovascular disease prediction using medical big data. Med. Eng. Phys. 2022, 105, 103825.
  3. Kumar, P.M.; Lokesh, S.; Varatharajan, R.; Babu, G.C.; Parthasarathy, P. Cloud and IoT based disease prediction and diagnosis system for healthcare using Fuzzy neural classifier. Future Gener. Comput. Syst. 2018, 86, 527–534.
  4. Paudel, P.; Karna, S.K.; Saud, R.; Regmi, L.; Thapa, T.B.; Bhandari, M. Unveiling key predictors for early heart attack detection using machine learning and explainable AI technique with LIME. In Proceedings of the 10th International Conference on Networking, Systems and Security, Dhaka, Bangladesh, 21 December 2023; pp. 69–78.
  5. Gidding, S.S.; Lloyd-Jones, D.; Lima, J.; Ambale-Venkatesh, B.; Shah, S.J.; Shah, R.; Lewis, C.E.; Jacobs, D.R., Jr.; Allen, N.B. Prevalence of American heart association heart failure stages in black and white young and middle-aged adults: The CARDIA Study. Circ. Heart Fail. 2019, 12, e005730.
  6. Moreno-Sánchez, P.A.; García-Isla, G.; Corino, V.D.; Vehkaoja, A.; Brukamp, K.; Van Gils, M.; Mainardi, L. ECG-based data-driven solutions for diagnosis and prognosis of cardiovascular diseases: A systematic review. Comput. Biol. Med. 2024, 172, 108235.
  7. Shruthi, T.; Srimathi, P.; Janani, S. Cardiac Disease Diagnosis from Echocardiogram using Random Forest Classifier. In Proceedings of the 2024 5th International Conference on Data Intelligence and Cognitive Informatics (ICDICI), Tirunelveli, India, 18 November 2024; pp. 362–368.
  8. Curici, A.; Popescu, M.R.; Pîrvuleț, V.A.; Marinescu, G.I.; Ionescu, A.C. A Cross-Sectional Study on the Role of a Lab Test Screening Program in Defining Cardiovascular Disease Risk Prevalence. J. Pers. Med. 2024, 14, 284.
  9. Matusik, P.S.; Mikrut, K.; Bryll, A.; Popiela, T.J.; Matusik, P.T. Cardiac magnetic resonance imaging in diagnostics and cardiovascular risk assessment. Diagnostics 2025, 15, 178.
  10. Pontone, G.; Rossi, A.; Guglielmo, M.; Dweck, M.R.; Gaemperli, O.; Nieman, K.; Pugliese, F.; Maurovich-Horvat, P.; Gimelli, A.; Cosyns, B.; et al. Clinical applications of cardiac computed tomography: A consensus paper of the European Association of Cardiovascular Imaging-Part II. Eur. Heart J.-Cardiovasc. Imaging 2022, 23, e136–e161.
  11. Pal, M.; Parija, S.; Panda, G.; Dhama, K.; Mohapatra, R.K. Risk prediction of cardiovascular disease using machine learning classifiers. Open Med. 2022, 17, 1100–1113.
  12. Wang, Z. New Trend: Application of Laser-Induced Breakdown Spectroscopy with Machine Learning. Chemosensors 2025, 13, 5.
  13. Sabri, A.S.; Rasid, H.M.; Abd Rashid, R.; Karim, U.K.; Subri, M.S.; Halim, M.I. A Review of Laser-Induced Breakdown Spectroscopy (LIBS) for Document Examination: Fundamentals, Mechanism, and Application. Curr. Appl. Sci. Technol. 2024, 24, e0257203.
  14. Chaichi, A.; Prasad, A.; Gartia, M.R. Raman spectroscopy and microscopy applications in cardiovascular diseases: From molecules to organs. Biosensors 2018, 8, 107.
  15. Mamarelis, I.; Anastassopoulou, J.; Mylonas, E.; Mamareli, V.; Spiliopoulos, K.; Theophanides, T. FT-IR spectroscopic study of atheromatic plaque formation and the risk of COVID-19 mortality of patients with cardiovascular diseases. A mathematical simulation approach. Atherosclerosis 2022, 355, 49.
  16. Yabsley, W.; Homer-Vanniasinkam, S.; Fisher, J. Nuclear Magnetic Resonance Spectroscopy in the Detection and Characterisation of Cardiovascular Disease: Key Studies. Int. Sch. Res. Not. 2012, 2012, 784073.
  17. Kuku, K.O.; Singh, M.; Ozaki, Y.; Dan, K.; Chezar-Azerrad, C.; Waksman, R.; Garcia-Garcia, H.M. Near-infrared spectroscopy intravascular ultrasound imaging: State of the art. Front. Cardiovasc. Med. 2020, 7, 107.
  18. Khan, M.N.; Wang, Q.; Idrees, B.S.; Xiangli, W.; Teng, G.; Cui, X.; Zhao, Z.; Wei, K.; Abrar, M. A review on laser-induced breakdown spectroscopy in different cancers diagnosis and classification. Front. Phys. 2022, 10, 821057.
  19. Idrees, B.S.; Teng, G.; Israr, A.; Zaib, H.; Jamil, Y.; Bilal, M.; Bashir, S.; Khan, M.N.; Wang, Q. Comparison of whole blood and serum samples of breast cancer based on laser-induced breakdown spectroscopy with machine learning. Biomed. Opt. Express 2023, 14, 2492–2509.
  20. Kanawade, S.; Doke, P.; Hajare, D.; Udhan, S.; Patil, B.; Patil, S. Machine intelligence based early prediction methods for Cardiovascular Diseases (CVDs). J. Electr. Syst. 2024, 20.
  21. Gárate-Escamila, A.K.; El Hassani, A.H.; Andrès, E. Classification models for heart disease prediction using feature selection and PCA. Inform. Med. Unlocked 2020, 19, 100330.
  22. Mahesh, B. Machine Learning Algorithms - A Review. Int. J. Sci. Res. (IJSR) 2019, 9.
  23. Davari, S.A.; Masjedi, S.; Ferdous, Z.; Mukherjee, D. In-vitro analysis of early calcification in aortic valvular interstitial cells using Laser-Induced Breakdown Spectroscopy (LIBS). J. Biophotonics 2018, 11, e201600288.
  24. Li, J.; Pan, X.; Guo, L.; Chen, Y. Cancer diagnosis based on laser-induced breakdown spectroscopy with bagging-voting fusion model. Med. Eng. Phys. 2024, 132, 104207.
  25. Srinidi, S.; Lohitaa, J.; Mercy, A.; John, P.; Mohan, B.M.; Vasa, N.J. Detection of Atherosclerotic Plaque using Laser-Induced Breakdown Spectroscopy. In Proceedings of the 2023 International Conference on Bio Signals, Images, and Instrumentation (ICBSII), Chennai, India, 16–17 March 2023; pp. 1–4.
  26. Wang, Y.J.; Yeh, T.L.; Shih, M.C.; Tu, Y.K.; Chien, K.L. Dietary sodium intake and risk of cardiovascular disease: A systematic review and dose-response meta-analysis. Nutrients 2020, 12, 2934.
  27. Al-Salihi, M.; Yi, R.; Wang, S.; Wu, Q.; Lin, F.; Qu, J.; Liu, L. Quantitative laser-induced breakdown spectroscopy for discriminating neoplastic tissues from non-neoplastic ones. Opt. Express 2021, 29, 4159–4173.
  28. Ashour, A.S.; Hawas, A.R.; Guo, Y. Comparative study of multiclass classification methods on light microscopic images for hepatic schistosomiasis fibrosis diagnosis. Health Inf. Sci. Syst. 2018, 6, 7.
  29. Tharwat, A. Linear vs. quadratic discriminant analysis classifier: A tutorial. Int. J. Appl. Pattern Recognit. 2016, 3, 145–180.
  30. Ciu, T.; Oetama, R.S. Logistic regression prediction model for cardiovascular disease. Int. J. New Media Technol. 2020, 7, 33–38.
  31. Dulhare, U.N. Prediction system for heart disease using Naive Bayes and particle swarm optimization. Biomed. Res. 2018, 29, 2646–2649.
  32. Duraisamy, B.; Sunku, R.; Selvaraj, K.; Pilla, V.V.; Sanikala, M. Heart disease prediction using support vector machine. Multidiscip. Sci. J. 2024, 6.
  33. Assegie, T.A. Heart disease prediction model with k-nearest neighbor algorithm. Int. J. Inform. Commun. Technol. (IJ-ICT) 2021, 10, 225.
  34. Chu, Y.; Chen, F.; Sheng, Z.; Zhang, D.; Zhang, S.; Wang, W.; Jin, H.; Qi, J.; Guo, L. Blood cancer diagnosis using ensemble learning based on a random subspace method in laser-induced breakdown spectroscopy. Biomed. Opt. Express 2020, 11, 4191–4202.
  35. Shorewala, V. Early detection of coronary heart disease using ensemble techniques. Inform. Med. Unlocked 2021, 26, 100655.
  36. Qamar, R.; Zardari, B.A. Artificial neural networks: An overview. Mesopotamian J. Comput. Sci. 2023, 2023, 124–133.
  37. Sathyanarayanan, S.; Tantri, B.R. Confusion matrix-based performance evaluation metrics. Afr. J. Biomed. Res. 2024, 27, 4023–4031.
  38. Feng, K.; Hong, H.; Tang, K.; Wang, J. Decision making with machine learning and ROC curves. arXiv 2019, arXiv:1905.02810.
Figure 1. Experimental setup of LIBS.
Figure 2. (a) samples stored in EDTA tubes and (b) blood on a boric acid pellet before ablation and (c) boric acid pellet after ablation.
Figure 3. Flow chart of data analysis.
Figure 4. LIBS spectra showing several atomic emission lines: non-normalized spectra of (a) CVD and (b) healthy samples; normalized spectra of (c) CVD and (d) healthy samples; (e) offset spectra of CVD and healthy samples; and (f) the empty substrate.
Figure 5. PCA clustering analysis of healthy and CVD samples.
Figure 6. (a) Confusion matrix and (b) ROC curve of Narrow NN formed by validation model for classification of CVD and healthy samples.
Figure 7. (a) Confusion matrix and (b) ROC curve of the Narrow NN formed by the testing model to differentiate CVD and healthy samples.
Table 1. Division of samples and their spectra.

| Sample Type | Number of Samples | Total Number of Raw Spectra | Total Number of Averaged Spectra |
|---|---|---|---|
| CVD | 30 | 1500 | 150 |
| Healthy | 30 | 1500 | 150 |
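Table 1 implies a tenfold reduction from raw to averaged spectra (1500 to 150 per class). The sketch below shows one simple way such a reduction could be done, by averaging consecutive blocks of 10 spectra channel by channel. The block-wise grouping, the function name `average_in_blocks` and the synthetic four-channel spectra are all assumptions made for illustration; the source does not state how the spectra were grouped.

```python
def average_in_blocks(spectra, block_size=10):
    """Average consecutive groups of `block_size` spectra, channel by channel."""
    if len(spectra) % block_size != 0:
        raise ValueError("number of spectra must be a multiple of block_size")
    averaged = []
    for start in range(0, len(spectra), block_size):
        block = spectra[start:start + block_size]
        n_channels = len(block[0])
        averaged.append(
            [sum(s[c] for s in block) / block_size for c in range(n_channels)]
        )
    return averaged


# Synthetic stand-in for 1500 raw spectra with 4 intensity channels each
raw = [[float(i + c) for c in range(4)] for i in range(1500)]
avg = average_in_blocks(raw, block_size=10)
print(len(raw), len(avg))
```

Averaging of this kind is commonly used to reduce shot-to-shot fluctuations before classification.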
Table 2. Emission lines of identified elements.

| Elements | Wavelength (nm) |
|---|---|
| C | 247.8 |
| CN | 385.9, 386.1, 387.2, 388.2 |
| Ca | 393.4, 396.8, 422.7 |
| N | 500.5 |
| Na | 588.9, 589.5 |
Table 3. The distribution of healthy and CVD samples with their number of samples and spectra for the test and validation set.

| Data Separation | Quantity | Healthy | CVD |
|---|---|---|---|
| Validation set | No. of samples | 20 | 20 |
| Validation set | No. of spectra | 100 | 100 |
| Testing set | No. of samples | 10 | 10 |
| Testing set | No. of spectra | 50 | 50 |
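The counts in Table 3 are consistent with a split made at the sample level, with five averaged spectra per sample (20 samples giving 100 spectra, 10 samples giving 50). A minimal sketch of such a split for one class is given below. The random assignment, the seed and the function name `split_by_sample` are assumptions; the source does not describe how samples were assigned to each set.

```python
import random


def split_by_sample(sample_ids, n_test, seed=0):
    """Assign whole samples to the validation or testing set.

    sample_ids: one entry per spectrum, naming the sample it came from.
    Returns the spectrum indices of the validation set and of the testing set.
    """
    ids = sorted(set(sample_ids))
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    rng.shuffle(ids)
    test_ids = set(ids[:n_test])
    val_idx = [i for i, s in enumerate(sample_ids) if s not in test_ids]
    test_idx = [i for i, s in enumerate(sample_ids) if s in test_ids]
    return val_idx, test_idx


# One class: 30 samples x 5 averaged spectra = 150 spectra
sample_ids = [s for s in range(30) for _ in range(5)]
val_idx, test_idx = split_by_sample(sample_ids, n_test=10)
print(len(val_idx), len(test_idx))
```

Keeping all spectra from one sample in the same set prevents spectra of a single individual from appearing in both the validation and testing data.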
Table 4. Classification results of validation, test accuracy and operating time.

| Algorithm | Algorithm Type | Validation Accuracy (%) | Testing Accuracy (%) | Validation Time (s) |
|---|---|---|---|---|
| Decision trees | Fine Tree | 99.0 | 100.0 | 26.427 |
| Decision trees | Medium Tree | 99.0 | 100.0 | 23.603 |
| Decision trees | Coarse Tree | 99.0 | 100.0 | 21.991 |
| Discrimination analysis | Linear Discrimination | 73.0 | 66.0 | 21.453 |
| Discrimination analysis | Quadratic Discrimination | 100.0 | 100.0 | 20.168 |
| Binary GLM logistic regression | Regression | 83.0 | 69.0 | 18.972 |
| Efficient logistic regression | Efficient Logistic Regression | 64.5 | 58.0 | 17.291 |
| Efficient linear SVM | Efficient Linear SVM | 63.0 | 58.0 | 15.862 |
| Naïve Bayes | Gaussian Naïve Bayes | 100.0 | 100.0 | 14.622 |
| Naïve Bayes | Kernel Naïve Bayes | 98.0 | 100.0 | 13.413 |
| SVM | Linear SVM | 78.0 | 66.0 | 20.572 |
| SVM | Quadratic SVM | 97.5 | 97.0 | 21.745 |
| SVM | Cubic SVM | 91.0 | 98.0 | 8.2034 |
| SVM | Fine Gaussian | 93.0 | 92.0 | 2.9623 |
| SVM | Medium Gaussian | 68.5 | 69.0 | 2.5884 |
| SVM | Coarse Gaussian | 52.0 | 63.0 | 4.6541 |
| k-NN | Fine k-NN | 98.5 | 99.0 | 3.3484 |
| k-NN | Medium k-NN | 97.5 | 96.0 | 3.7223 |
| k-NN | Coarse k-NN | 50.0 | 50.0 | 3.2015 |
| k-NN | Cosine k-NN | 98.0 | 99.0 | 2.2403 |
| k-NN | Cubic k-NN | 97.5 | 97.0 | 3.4522 |
| k-NN | Weighted k-NN | 98.0 | 96.0 | 2.5152 |
| Ensemble classifier | Boosted Trees | 50.0 | 50.0 | 6.5405 |
| Ensemble classifier | Bagged Trees | 99.0 | 100.0 | 13.693 |
| Ensemble classifier | Subspace Discriminant | 72.5 | 66.0 | 17.012 |
| Ensemble classifier | Subspace k-NN | 99.5 | 100.0 | 19.389 |
| Ensemble classifier | RUS Boosted Trees | 50.0 | 50.0 | 4.3633 |
| Neural network | Narrow Neural Network | 100.0 | 100.0 | 10.008 |
| Neural network | Medium Neural Network | 99.5 | 99.0 | 9.7316 |
| Neural network | Wide Neural Network | 99.5 | 100.0 | 7.7047 |
| Neural network | Bilayer Neural Network | 99.5 | 100.0 | 11.297 |
| Neural network | Trilayer Neural Network | 100.0 | 100.0 | 10.602 |
| Kernel | SVM Kernel | 95.5 | 95.0 | 11.324 |
| Kernel | Logistic Regression Kernel | 94.0 | 93.0 | 10.622 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hameed, A.; Idrees, B.S.; Nawaz, R.; Azam, F.; Sabir, S.; Gulzar, A.; Jamil, Y.; Teng, G. Early Detection of Cardiovascular Disease Using Laser-Induced Breakdown Spectroscopy Combined with Machine Learning. Photonics 2025, 12, 849. https://doi.org/10.3390/photonics12090849

