Multilevel Pain Assessment with Functional Near-Infrared Spectroscopy: Evaluating ΔHBO2 and ΔHHB Measures for Comprehensive Analysis

Assessing pain in non-verbal patients is challenging, often depending on clinical judgment, which can be unreliable due to fluctuations in vital signs caused by underlying medical conditions. To date, there is a notable absence of objective diagnostic tests to aid healthcare practitioners in pain assessment, which especially affects critically ill patients and those with advanced dementia. Neurophysiological information, e.g., functional near-infrared spectroscopy (fNIRS) or electroencephalogram (EEG), unveils the brain's active regions and patterns, revealing the neural mechanisms behind the experience and processing of pain. This study focuses on assessing pain via the analysis of fNIRS signals combined with machine learning, utilising multiple fNIRS measures including oxygenated haemoglobin (ΔHBO2) and deoxygenated haemoglobin (ΔHHB). Initially, a channel selection process filters out channels that are highly contaminated with high-frequency and high-amplitude artifacts from the 24-channel fNIRS data. The remaining channels are then preprocessed by applying a low-pass filter and common average referencing to remove cardio-respiratory artifacts and common gain noise, respectively. Subsequently, the preprocessed channels are averaged to create a single time series vector for both ΔHBO2 and ΔHHB measures. From each measure, ten statistical features are extracted and fusion occurs at the feature level, resulting in a fused feature vector. The most relevant features, selected using the Minimum Redundancy Maximum Relevance method, are passed to a Support Vector Machine classifier. Using leave-one-subject-out cross-validation, the system achieved an accuracy of 68.51% ± 9.02% in a multi-class task (No Pain, Low Pain, and High Pain) using a fusion of ΔHBO2 and ΔHHB. These two measures collectively demonstrated superior performance compared to when they were used independently. This study contributes to the pursuit of an objective pain assessment and proposes a potential biomarker for human pain using fNIRS.


Introduction
Pain, despite its unpleasantness, acts as an essential biomarker in our bodies, alerting us to potential health issues, injuries, or emotional stress. Pain can be localised to a particular region, like an injury, but it can also be more widespread, as seen in many illnesses [1]. Pain is a significant issue in society as it poses a substantial public health challenge, impacts the quality of life of sufferers, and places a burden on the economy [2,3]. The economic impacts of pain are drastic, imposing a financial burden exceeding AUD 73 billion annually, including AUD 48.3 billion in lost productivity, in Australia alone [4,5]. Furthermore, pain disrupts day-to-day routines and significantly diminishes overall quality of life. For instance, low back pain is the leading cause of disability in the world, with over 600 million people living with pain [6]. Therefore, the assessment and management of pain is essential for a wide range of clinical disorders and treatments, and its early diagnosis plays a vital role in mitigating the risk of its progression into chronic conditions or contributing to depression or anxiety [7].
Pain is a subjective experience and its measurement is difficult. In clinical practice, two primary subjective methods are used for pain assessment: self-reports and clinical judgment [8]. The commonly accepted method to assess pain is self-report. Self-reporting techniques aim to gauge a patient's pain using verbal or numerical self-assessment tools, including methods such as visual analogue scales, verbal descriptor scales, numerical rating scales, or the McGill Pain Questionnaire [9,10]. When self-reports are not accessible or may be unreliable, clinical observations can serve as a supplementary or alternative method. Clinical judgment for pain assessment relies on examining and understanding the nature, intensity, and context of the patient's pain experience based on observations [7]. Despite their convenience and utility, subjective reports come with various limitations such as inconsistent measurement scales and variations in how pain is understood by medical professionals and patients. Furthermore, these methods cannot be effectively employed in cases involving children or patients with neurological disorders.
To address these limitations, researchers have turned to the analysis of the neurological aspects of pain using objective methods such as neuroimaging [11]. For instance, Wager et al. [12] developed a system that employs machine learning to analyse data obtained from functional magnetic resonance imaging (fMRI). Their work demonstrated the potential to identify a consistent neurological signature of pain at the individual level. While fMRI-based objective assessments of pain have made significant progress in understanding the brain's pain mechanisms, the size and cost of MRI scanners and other conventional neuroimaging tools (such as positron emission tomography) make them impractical for routine clinical use [13]. This limitation has increased the interest in portable neuroimaging devices that offer similar technical advantages to fMRI. One such technology is functional near-infrared spectroscopy (fNIRS), which measures changes in the concentrations of oxygenated haemoglobin (∆HBO2) and deoxygenated haemoglobin (∆HHB), similar to the blood oxygen level-dependent signal in fMRI. fNIRS is capable of non-invasive measurement of near-infrared light absorption within the range of 700 to 1000 nm through the skull [14]. In contrast to traditional MRI scanners, the portability and compatibility of fNIRS with ferromagnetic and electrical components provide researchers with the option to monitor and study functional brain activity in clinical settings [15,16].
Machine learning has played a pivotal role in neuroimaging-based methods for the study of pain [17,18]. It helps us to better understand pain by uncovering patterns within clinical and experimental data [19]. Machine learning methods can learn to map features to known classes, enabling them to predict a pain phenotype class from a complex set of features. For instance, Brown et al. [20], in an fMRI study, employed the Support Vector Machine (SVM) algorithm to distinguish between painful and non-painful experimental stimuli, achieving an accuracy of 81%. In an EEG study, Gram et al. [21] examined individuals who had received either morphine or a placebo following cold pressor test stimulation. They used the SVM algorithm to classify responders, achieving an accuracy of 71.9%. This classification was based on wavelet coefficients derived from each EEG band. These studies have shown the potential of neuroimaging and machine learning in the identification of pain.
In pain research using fNIRS, machine learning has proven to be effective for the detection and prediction of pain [22]. Pourshoghi et al. [22] trained an SVM classifier on B-spline coefficients obtained from functional data analysis and achieved a classification accuracy of 94% in distinguishing between low-pain and high-pain fNIRS signals. Fernandez et al. [23] reported an accuracy of 94.17% in classifying four types of pain from fNIRS data using a Gaussian SVM. Zeng et al. [24] investigated chronic pain's impact on brain function using fNIRS. Machine learning achieved high accuracy in identifying chronic pain patients based on resting-state fNIRS data, suggesting the potential for using functional connectivity features as neural markers for chronic pain diagnosis. Despite the promising results obtained by these studies, research in this field remains limited.
This study employs an approach for pain assessment that leverages the analysis of fNIRS signals in combination with machine learning techniques. This approach utilises fNIRS measurements of ∆HBO2 and ∆HHB to provide a comprehensive evaluation of pain levels. While the literature emphasises ∆HBO2 as a more promising fNIRS measure [25,26], recent studies, as highlighted by Ho et al. [27], indicate that both measures exhibit high accuracy in classification tasks. Therefore, in this study, both ∆HBO2 and ∆HHB measures have been taken into account. First, the pain information of 30 healthy subjects was collected using quantitative sensory testing (QST). Then, we performed a channel selection process to remove faulty channels from the analysis. Subsequently, ten statistical features were extracted from each measure and reduced by feature selection. Finally, we utilised well-known classifiers to identify pain levels using this reduced feature set. This study makes the following contributions: (1) proposing an fNIRS channel selection strategy for rejecting noisy channels based on high-frequency and high-amplitude artifacts; (2) presenting a group of possible features from fNIRS signals for the assessment of pain; (3) identifying that ∆HBO2 is better at detecting high pain intensity and ∆HHB is better at detecting low pain intensity; and (4) proposing the combination of ∆HBO2 and ∆HHB as a possible biomarker of human pain. This study contributes to the field of pain assessment and offers new avenues for understanding and quantifying pain in a more precise and objective manner.

Materials and Methods
Figure 1 presents the core system block diagram of the proposed fNIRS-based pain assessment system. The system integrates attributes from both ∆HBO2 and ∆HHB to assess the pain level. Further elaboration on the materials and methodology is provided in the following subsections.

Experimental Protocol
In this study, 30 healthy individuals (7 females and 23 males) aged 19 to 52 years (31.7 ± 8.7 years) participated. None had unstable medical conditions, chronic pain, or recent medication usage prior to testing. Participants received detailed explanations and provided written informed consent before the start of the experiments. The research, involving human participants, received ethical approval from the University of Canberra's Human Research Ethics Committee (reference number 11837).
The data collection procedure took place at the Human-Machine Interface Laboratory at the University of Canberra, Australia. Participants were seated comfortably with both arms resting on the table. Electrodes from a transcutaneous electrical nerve stimulation (TENS) machine (Medihightec Medical Co., Ltd., Taipei City, Taiwan) were placed on the participants' inner forearm and the back of their right hand. The experimental process consisted of two phases: an initial assessment of individual pain perceptions using the QST protocol, which determined pain thresholds and tolerances, followed by the pain stimulation phase. We defined the pain threshold (low pain) as the lowest stimulus intensity at which stimulation became painful, and pain tolerance (high pain) as the highest intensity of pain the participant could endure before reaching a point of intolerable discomfort. In the pain stimulation phase, fNIRS data were acquired and a 60 s baseline recording was obtained before the start of the experiment. A counterbalanced approach was employed, alternating between low and high stimulus intensity and between forearm and hand stimulation. Six 10 s stimulus repetitions were recorded for each type of stimulus, each followed by a 40 s rest interval. Figure 2 presents a schematic representation of the stimulation and perception of pain.
Changes in ∆HBO2 and ∆HHB concentration (µmol/L) were measured using a wireless, continuous wave fNIRS device (Artinis Medical Systems, Gelderland, the Netherlands). The fNIRS system includes 24 channels covering the prefrontal cortex (PFC). Optodes (10 sources and 8 detectors) are separated by 35 mm and placed on the frontal lobe (Figure 3). The sources emitted near-infrared light at wavelengths of 760 and 840 nm, and the signals were sampled at 50 Hz. Figure 4 displays the raw fNIRS channels (∆HBO2) recorded over a 5 min duration while a subject experienced varying pain intensities.
Figure 3 legend. Red: sources; blue: detectors; yellow: channels. Specifically, the optodes Tx1, Tx2, Tx7, Tx9, Rx3, and Rx7 were positioned at the following locations on the standard 10-20 EEG system: Tx1 at F8; Tx2 at Fp2; Tx7 at Fp1; Tx9 at F7; Rx3 at F4; and Rx7 at F3.

Channel Selection
Processing the fNIRS data from the 24 channels shown in Figure 4 raised some specific challenges. Two of the optodes, related to channels 19 and 23, were found to be malfunctioning, so these channels were excluded from the analysis to preserve the integrity of the data. Additionally, among the remaining 22 channels, certain channels exhibited high-amplitude, sharp peaks resembling square wave artifacts. These patterns suggest that the channels were significantly contaminated by movement artifacts or other non-neural artifacts. To address this issue, a preliminary step systematically identified unreliable channels to be excluded from further processing. This selection was accomplished by thresholding the relative range (RR) operator. Relative range (Equation (1)) is defined as the ratio of the range of the derivative of an fNIRS channel to the range of the raw channel, as follows:
$$RR = \frac{\max(\dot{x}_{ch}) - \min(\dot{x}_{ch})}{\max(x_{ch}) - \min(x_{ch})} \qquad (1)$$
where $\dot{x}_{ch}$ is the derivative of an fNIRS channel $x_{ch}$, which represents the rate of change in the signal. In fNIRS signals, the derivative highlights regions where the signal changes rapidly, which may correspond to high-amplitude peaks (i.e., spikes) within a channel. As a result, high RR values indicate the presence of these high-amplitude sharp peaks. Experimental findings revealed that channels with an RR exceeding 0.1 (10%) are typically contaminated by these artifacts. Channels above this threshold were therefore excluded, and only the remaining channels were retained for subsequent processing. The raw fNIRS channels (n = 17) for the ∆HBO2 measurement retained by the channel selection algorithm are shown in Figure 5. After this step, the data from three subjects were excluded from further processing because the algorithm removed over 70% of their channels. For the remaining 27 subjects, the number of retained channels after the selection process ranged from 16 to 22.
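The relative range criterion can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the derivative is approximated by the unscaled sample-to-sample difference, and the function names (`relative_range`, `select_channels`) are hypothetical. The 0.1 threshold is the one stated in the text, but its effect depends on how the derivative is scaled.

```python
def first_difference(x):
    """Approximate the derivative by the sample-to-sample difference (assumption)."""
    return [b - a for a, b in zip(x, x[1:])]


def value_range(x):
    """Range of a sequence: maximum minus minimum."""
    return max(x) - min(x)


def relative_range(channel):
    """RR = range(derivative of channel) / range(raw channel), as in Equation (1)."""
    raw_range = value_range(channel)
    if raw_range == 0:
        return 0.0  # flat channel: treated as spike-free (assumed convention)
    return value_range(first_difference(channel)) / raw_range


def select_channels(channels, threshold=0.1):
    """Return the indices of channels whose RR does not exceed the threshold."""
    return [i for i, ch in enumerate(channels) if relative_range(ch) <= threshold]


# Toy data: a smooth ramp and the same ramp with one large spike.
smooth = [k / 100 for k in range(101)]
spiky = list(smooth)
spiky[50] += 5.0

kept = select_channels([smooth, spiky])  # only the smooth channel is kept
```

A single isolated spike produces one large positive and one large negative difference, so the range of the derivative becomes comparable to (or larger than) the range of the raw signal, which is what pushes RR above the threshold.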

Dataset Organisation
After completing the data collection and channel selection process, all recorded data were segmented into 10 s intervals for each class. This resulted in six observations for the baseline class per subject, 12 observations for the low pain class per subject, and 12 observations for the high pain class per subject. To address the class imbalance, six additional observations from the rest periods of each subject, prior to the pain stimulation, were included in the baseline class. Consequently, the dataset consisted of a total of 972 observations. Each subject contributed 12 observations for each class, resulting in a cumulative total of 36 observations per subject. The dataset included 324 observations for each of the Baseline (B), Low Pain (LP), and High Pain (HP) classes.
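The segmentation and the resulting dataset size can be checked with a short sketch. It assumes the 50 Hz sampling rate stated in the experimental protocol, so each 10 s observation holds 500 samples; the `segment` helper and the onset times are hypothetical and only illustrate the windowing.

```python
FS_HZ = 50      # sampling rate from the experimental protocol
WINDOW_S = 10   # observation length in seconds


def segment(signal, onsets_s, fs=FS_HZ, window_s=WINDOW_S):
    """Cut fixed-length windows starting at the given onset times (in seconds)."""
    n = int(fs * window_s)
    return [signal[int(t * fs):int(t * fs) + n] for t in onsets_s]


# Hypothetical 60 s recording with two stimulus onsets at 0 s and 50 s.
recording = list(range(60 * FS_HZ))
windows = segment(recording, [0, 50])

# Dataset size: 27 retained subjects, 12 observations per class, 3 classes.
subjects, per_class, n_classes = 27, 12, 3
total_observations = subjects * per_class * n_classes
observations_per_class = subjects * per_class
```

With these numbers, each subject contributes 36 observations, giving 972 observations overall and 324 per class, which matches the counts reported above.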

Signal Processing: Filtration and Averaging
To suppress the noise and pulsation in fNIRS data (∆HBO2 and ∆HHB), as shown in Figure 6, each available fNIRS channel was passed through a 4th-order Butterworth infinite impulse response low-pass filter with a cut-off frequency of 0.16 Hz [23]. During fNIRS data acquisition, various common noise sources can affect the measurements, including changes in blood flow unrelated to neural activity, motion artifacts, and systemic physiological changes such as heart rate and respiration [23]. Common Average Referencing (CAR) [28] calculates the average of all available channels across the scalp for each measure (∆HBO2 and ∆HHB). This average is then subtracted, for each measure, from the signal of each individual channel, which removes the common noise components shared by all channels. Equation (2) shows the channel-averaging scheme:
$$\bar{h}[k] = \frac{1}{M}\sum_{j=1}^{M} H_{j}[k] \qquad (2)$$
where $\bar{h}$ is the average of fNIRS measure $H$ (∆HBO2 or ∆HHB), $M$ is the total number of channels for each participant, $k$ is the discrete time at which the signal is recorded, and $j$ is the channel number. The preprocessed version of both fNIRS measures, i.e., ∆HBO2 and ∆HHB for various experimental conditions, is displayed in Figure 7.
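The referencing step can be illustrated as follows. This is a sketch under simple assumptions: channels are equal-length lists of samples for one measure, the low-pass filtering is taken to have been applied already and is not shown, and the function names are hypothetical.

```python
def common_average(channels):
    """Channel average at each time index k: (1/M) * sum over j of H_j[k]."""
    m = len(channels)
    n = len(channels[0])
    return [sum(ch[k] for ch in channels) / m for k in range(n)]


def common_average_reference(channels):
    """Subtract the common average from every channel, sample by sample."""
    avg = common_average(channels)
    return [[v - a for v, a in zip(ch, avg)] for ch in channels]


# Toy example: two channels sharing an offset of 2.0.
channels = [[1.0, 2.0, 3.0],
            [3.0, 2.0, 1.0]]
average = common_average(channels)
referenced = common_average_reference(channels)
```

Here the average is 2.0 at every sample, so the referenced channels become `[-1.0, 0.0, 1.0]` and `[1.0, 0.0, -1.0]`: the component shared by both channels is removed and only the channel-specific variation remains.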

Feature Extraction
The ∆HBO2 and ∆HHB signals display distinct characteristics associated with the pain intensities. Amplitude, as an indicator of pain intensity, increases with more painful stimuli, signifying higher neural activity and oxygen demand. Variation in these signals highlights the dynamic nature of pain experiences, showcasing rapid and substantial fluctuations over time. Complexity in ∆HBO2 and ∆HHB responses uncovers the intricate interactions between brain regions and physiological systems involved in pain processing [29]. The dynamics of ∆HBO2 and ∆HHB responses reveal the timing of pain intensity, from pain onset to apex, and then to recovery. Moreover, the stability of these signals distinguishes sustained pain from transient changes, providing insights into the persistence of pain perception. To extract the fNIRS signal information related to intensity, dynamics, stability, complexity, and variation-like characteristics [30], we have chosen the following features [31,32]: Log Energy, Crest Factor, Shape Factor, Impulse Factor, Margin Factor, Mobility, Complexity, Mean Absolute Deviation of First Difference, Range, and Variation in First Difference, as defined in Table 1. These features are extracted from both ∆HBO2 and ∆HHB signals and fused at the feature level to create a fused feature vector.
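To make the feature set concrete, the sketch below computes nine of the ten features using commonly used textbook definitions. These definitions are assumptions rather than the exact formulas of Table 1: log energy is taken as the sum of log squared samples, Mobility and Complexity follow the Hjorth parameters, and the Margin Factor is omitted because its definition varies between sources. All function and key names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def diff(x):
    """First difference, used as the discrete derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def mobility(x):
    """Hjorth mobility: sqrt(var(derivative) / var(signal))."""
    return math.sqrt(pvariance(diff(x)) / pvariance(x))


def extract_features(h):
    """Statistical features of one preprocessed signal h (assumed definitions)."""
    d = diff(h)
    peak = max(abs(v) for v in h)
    rms = math.sqrt(fmean([v * v for v in h]))
    abs_mean = fmean([abs(v) for v in h])
    d_mean = fmean(d)
    return {
        "log_energy": sum(math.log(v * v) for v in h if v != 0),
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
        "mobility": mobility(h),
        "complexity": mobility(d) / mobility(h),
        "mad_first_difference": fmean([abs(v - d_mean) for v in d]),
        "range": max(h) - min(h),
        "var_first_difference": pvariance(d),
    }


def fuse(features_hbo2, features_hhb):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(features_hbo2.values()) + list(features_hhb.values())


example = extract_features([1.0, -1.0, 1.0, -1.0])
fused = fuse(example, example)
```

Feature-level fusion here is plain concatenation, so with all ten features per measure the fused vector would have 20 entries; this sketch yields 18 because one feature is left out.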

Feature Selection
Feature selection is crucial for improving model efficiency by focusing on important features, reducing dimensionality, and ultimately improving the overall performance in machine learning tasks. In this work, the Minimum Redundancy Maximum Relevance (MRMR) algorithm [33] is utilised. MRMR identifies the most informative features for a given task by considering both their relevance to the target variable and their redundancy with respect to each other. It evaluates the mutual information between features and the target, ranking them by relevance while also measuring the redundancy between features. The algorithm then selects features that balance relevance against redundancy, resulting in a subset that can improve model performance with fewer features.
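The greedy selection idea behind MRMR can be sketched as below. This is an illustrative variant, not necessarily the one used in the study: it scores candidates by relevance minus mean redundancy (the difference form; a quotient form also exists), estimates mutual information from equal-width bins, and uses hypothetical function names and toy data.

```python
import math
from collections import Counter


def discretise(values, bins=4):
    """Equal-width binning so mutual information can be estimated by counting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def mrmr(columns, labels, k):
    """Greedily pick k feature indices maximising relevance minus redundancy."""
    cols = [discretise(c) for c in columns]
    relevance = [mutual_information(c, labels) for c in cols]
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < k:
        best, best_score = None, -math.inf
        for i in range(len(cols)):
            if i in selected:
                continue
            redundancy = sum(mutual_information(cols[i], cols[j])
                             for j in selected) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Toy data: features 0 and 1 are identical; feature 2 carries new information.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
f0 = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
f1 = list(f0)
f2 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
chosen = mrmr([f0, f1, f2], labels, k=2)
```

In the toy data all three features are equally relevant, but the duplicate of the first pick is fully redundant, so the second pick is the feature that adds new information.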
Table 1. Details of statistical features used in this study. The feature vector $F$ comprises all ten features, with $h$ as the preprocessed signal (∆HBO2 or ∆HHB), $\dot{h}$ as the derivative of $h$, and $\bar{\dot{h}}$ as the mean of $\dot{h}$. $h_{peak}$, $h_{rms}$, and $h_{am}$ denote the peak, root mean square, and absolute mean of the input signal $h$, respectively, while var(·) represents the variance.

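Features (definitions given as equations in the original table): Log Energy; Crest Factor; Shape Factor; Impulse Factor; Margin Factor; Mobility; Complexity; Mean Absolute Deviation of First Difference; Range; Variation in First Derivative.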

Classification
For pain level assessment, the classification task was to distinguish between three classes: Baseline (B), Low Pain (LP), and High Pain (HP). To achieve this, we employed a reduced feature set consisting of statistical features extracted from both ∆HBO2 and ∆HHB signals. We utilised well-known classifiers, namely Discriminant Analysis (Disc) [34], K-Nearest Neighbour (KNN) [35], and Support Vector Machine (SVM) [36], to identify pain levels using the feature set. The classifiers were tuned using Bayesian hyperparameter optimisation [37] with the 'expected improvement per second plus' acquisition function, run for 50 iterations. We identified the hyperparameters for each classification algorithm that minimised the 10-fold cross-validation loss across the entire dataset [38].
The classification performance was evaluated using a leave-one-subject-out cross-validation (LOSOCV) approach [39]. In LOSOCV, the model's effectiveness is assessed by withholding one individual's data from the dataset for testing, while the data from the remaining participants undergo 10-fold cross-validation. This process is repeated for each subject in the dataset, ensuring that each subject serves as the test set exactly once. The performance metrics obtained in each iteration, namely accuracy (Acc), sensitivity (Sen), specificity (Spec), and F1 score (F1), were averaged to provide a comprehensive assessment of the model's overall performance. Additionally, we systematically searched for the best-performing model by varying the number of features retained according to their MRMR rank. Together, feature engineering, hyperparameter optimisation, and classification form the pipeline used to decode pain levels from fNIRS signals.
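The subject-wise splitting can be sketched as follows. It is a minimal illustration with hypothetical names: observations are identified by index, each index carries a subject label, and the inner 10-fold cross-validation on the training subjects is not shown. Using the population standard deviation for the spread across subjects is an assumption.

```python
from statistics import fmean, pstdev


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def summarise(per_subject_scores):
    """Mean and spread of a metric across held-out subjects."""
    return fmean(per_subject_scores), pstdev(per_subject_scores)


# Layout from the dataset description: 27 subjects with 36 observations each.
subject_ids = [s for s in range(27) for _ in range(36)]
folds = list(leave_one_subject_out(subject_ids))
```

Because every observation of the held-out subject is kept out of training, the test score reflects performance on an unseen person rather than on unseen windows from a familiar person. With this layout each of the 27 folds tests on 36 observations and trains on the remaining 936.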

Statistical Analysis
The extracted features were also analysed statistically to identify significant differences across the experimental conditions for ∆HBO2 and ∆HHB independently. This analysis helps validate our hypothesis that the extracted features carry pain-related information from the experimental conditions. First, the data were examined for normality and homogeneity using the Kolmogorov-Smirnov tests. For the ten features extracted from the ∆HBO2 and ∆HHB measurements for the classification of the pain level, differences were analysed using Analysis of Variance (ANOVA). A post hoc Bonferroni test was carried out for multiple comparisons. The significance level was set to p < 0.05. All statistical analyses were performed using SPSS version 29.
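For readers unfamiliar with the test, the one-way ANOVA F statistic and the Bonferroni adjustment can be sketched directly. This is a generic illustration with toy numbers and hypothetical function names, not a reproduction of the SPSS analysis; computing the p-value from the F distribution is left out.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy feature values for three conditions (e.g., B, LP, HP).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
adjusted = bonferroni([0.01, 0.02, 0.5])
```

The F statistic is the ratio of between-group to within-group mean squares, so a large value indicates that the condition means differ by more than the within-condition scatter would explain. In the toy example the result is F = 13 with (2, 6) degrees of freedom.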

Results
In this section, the outcomes of the proposed multi-class fNIRS-based pain assessment system are presented. The results of the system are demonstrated via the independent utilisation of ∆HBO2 and ∆HHB signals, along with the combined haemoglobin measures. Ten features are extracted from each measure and are passed to the three classifiers (Disc, KNN, and SVM). In the case of ∆HBO2 + ∆HHB, the features from each measure are fused before the classification stage, resulting in a total of 20 features. The selection of classifiers for each experiment was made following extensive hyperparameter tuning, as detailed in Table 2.
Activation levels of fNIRS using both ∆HBO2 and ∆HHB measurements for different experimental conditions are presented in Figure 8. As shown, the highest activation in the prefrontal cortex for ∆HBO2 (first row) is recorded for HA (High Arm pain), while LA (Low Arm pain) exhibits the lowest concentration level compared to other conditions, with an activation level very similar to the baseline. Similar to ∆HBO2, the most elevated activation in ∆HHB measures is observed in the HA condition. However, other conditions do not exhibit a significant increase.

Classification Results
The results in terms of performance metrics for each measure are presented in Table 3. For the ∆HBO2 measure, the SVM classifier outperforms Disc and KNN, achieving the highest accuracy of 64.67%. It exhibits high sensitivity (92.85%) and specificity (97.22%), underlining its ability to identify pain instances while keeping false positives low. For the ∆HHB measure, the SVM classifier again performs best, with the highest accuracy of 62.28%. It maintains high sensitivity (92.87%) and specificity (97.07%). The KNN classifier exhibits an accuracy of 41.83%, whereas the Disc classifier displays an accuracy of 50.94%. The combined ∆HBO2 + ∆HHB measure, when paired with the SVM classifier, outperforms other classification algorithms with an accuracy of 66.55%. Sensitivity (93.8%) and specificity (96.14%) remain high, highlighting the SVM's effectiveness in pain assessment. The F1 Score of 96.98% emphasises the balanced performance. On the other hand, the KNN classifier, with an accuracy of 40.19%, shows lower performance, and the Disc classifier, with an accuracy of 56.23%, exhibits moderate performance. The SVM classifier consistently achieves the highest accuracy, sensitivity, specificity, and F1 Score, with the ∆HBO2 + ∆HHB measure performing the best among all measures.
Following the acquisition of reference values using the full feature set (see Table 3), the feature set underwent a feature selection process using MRMR to minimise redundancy and enhance the discriminative power. Table 4 presents the performance metrics for each measure after applying MRMR. The results provide insights into how feature selection impacts the performance of pain assessment models. In the ∆HBO2 measure, the feature selection process has notably influenced the performance of different classifiers. The SVM classifier, with nine selected features, achieves the highest accuracy of 65.71%, with improved sensitivity (93.18%) and a specificity of 95.99%. The KNN classifier, with seven selected features, exhibits enhanced accuracy at 44.22%, although it still falls behind SVM.
In the ∆HHB measure, feature selection has similarly enhanced the performance of the classifiers. The SVM classifier, with nine selected features, maintains its position as the top-performing classifier with an accuracy of 63.42%, along with improved sensitivity (94.44%) and specificity (97.22%). The combined ∆HBO2 + ∆HHB measure benefits from feature selection, particularly in the SVM classifier with 15 selected features. It achieves the highest accuracy at 68.51%, emphasising the significance of using both haemoglobin measures. Sensitivity (94.7%) and the F1 Score also improved, while specificity (94.29%) was lower than with the full feature set (96.14%). The KNN classifier, with 18 selected features, shows an accuracy of 40.8%. These findings emphasise the role of both measures, particularly in the combined (∆HBO2 + ∆HHB) measure, where the SVM classifier emerges as the best choice for well-balanced pain assessment.
Table 5 lists the features corresponding to the optimal results for each measure. In the approach using a fusion of haemoglobin measures (∆HBO2 + ∆HHB), among the fifteen selected features, nine belong to ∆HBO2, highlighting its greater contribution compared to the six features from ∆HHB.
In pain assessment, class-wise performance is also crucial because it enables the accurate identification of different pain levels, helping clinicians tailor treatments to individual pain experiences and needs. Analysing the class-wise performance of each measure, as depicted in Figure 9, highlights the superior effectiveness of the SVM classifier in classifying instances of Baseline (B), Low Pain (LP), and High Pain (HP) compared to Disc and KNN. Notably, the ∆HBO2 measure achieves higher classification accuracy for High Pain (HP) instances, while the ∆HHB measure excels in classifying Low Pain (LP) cases. Identifying the absence of pain (B) is equally important in pain assessment, and here the ∆HHB measure proves better at predicting pain-free observations than the ∆HBO2 measure. The fusion of both ∆HBO2 and ∆HHB effectively integrates this information, yielding improved results for both LP and HP classes. In summary, the fusion of both fNIRS measures enhances class-wise accuracies in pain assessment, contributing to a more comprehensive and precise evaluation of pain perception.

Statistical Analyses
The results of the comparison of the statistically significant ∆HBO2 features across the experimental conditions are provided in Table 6. Among the ten ∆HBO2 features, Log Energy, Crest Factor, Shape Factor, and Range differed significantly between experimental conditions (F(2,972) = 3.078, p = 0.046; F(2,972) = 3.264, p = 0.039; F(2,972) = 3.466, p = 0.032; and F(2,972) = 10.179, p < 0.001, respectively). For the ∆HHB measure, three features, Log Energy, Margin Factor, and Range, differed significantly between pain levels (F(2,972) = 3.127, p = 0.044; F(2,972) = 4.134, p = 0.016; and F(2,972) = 4.558, p = 0.011, respectively). The results of the post hoc test comparing pain levels for the statistically significant features of the ∆HHB measure are provided in Table 7.

Discussions
To the best of the authors' knowledge, this is the first study that deals with the objective assessment of pain via fNIRS within a comprehensive exploration of ∆HBO2 and ∆HHB measures. The findings reveal an association between pain intensities and distinct statistical patterns in haemoglobin concentrations. Considering the overall system accuracy, the ∆HBO2 measure demonstrated better performance than the ∆HHB measure in the multiclass scenario used in this study. However, when examining accuracies for specific classes, ∆HBO2 excels in identifying High Pain signals, while ∆HHB demonstrates better accuracy for Low Pain observations. Comparing both fNIRS measures, the fusion of ∆HBO2 and ∆HHB at the feature level emerges as an effective method for categorising the three pain intensities in our experimental conditions.
Based on the classification results, the SVM classification algorithm is the most effective when used with the selected statistical features across all the measures in pain assessment. Both fNIRS measures are informative for evaluating pain, with ∆HBO2 demonstrating slightly higher accuracy than ∆HHB when used independently. However, the best results are obtained when combining ∆HBO2 and ∆HHB, suggesting that a combination of these two measures offers the best performance for pain assessment in our experimental conditions. While ∆HBO2 provides insights into the oxygenated haemoglobin concentration, which can indicate changes in blood flow and tissue activity, ∆HHB reveals deoxygenated haemoglobin levels reflecting variations in tissue oxygen consumption. By integrating these two measures, a more holistic understanding of the physiological responses to pain is achieved. This combined approach allows for a more robust assessment as it captures both the supply and demand aspects of oxygen delivery, thus enhancing the ability to detect and interpret changes in pain perception.
Existing studies on pain assessment using neuroimaging methods have primarily focused on binary classifications, mainly distinguishing between pain and no pain. However, the development of approaches capable of distinguishing various signatures of pain has been neglected so far. This limitation is significant given the diverse origins (e.g., peripheral, emotional, and phantom pain), varying intensities, and durations of pain experienced in the human body. Different types of pain are carried to the central nervous system by different sensory receptors, responding to various stimuli associated with pain, such as temperature, chemical, or pressure [15]. Hence, there is a need for machine learning models that can effectively differentiate between multiple pain signatures at varying intensities, offering greater relevance for real-world scenarios. Our study addresses this gap by focusing on multilevel pain classification, considering pain originating from different locations of the body, specifically the hand and arm. This is particularly important for patients who are unable to communicate verbally, such as elderly people recovering from a stroke or with advanced dementia, and when the source of pain is not readily apparent.
In examining activation levels across different pain conditions, our focus on ∆HBO2 and ∆HHB measures provides valuable insights into the neural responses associated with pain perception. As depicted in Figure 8, the most pronounced increase in ∆HBO2 levels occurs in response to High Arm (HA) pain, emphasising the sensitivity of this measure to high pain intensities. Similarly, increased activation is evident in High Hand (HH) pain compared to Low Hand (LH) and Low Arm (LA) pain. These outcomes are consistent with prior research, reinforcing the notion that elevated pain levels are associated with a more pronounced neural response in fNIRS studies investigating ∆HBO2 [22,40]. Activation increased significantly relative to the baseline in every condition except Low Arm (LA) pain. This could, in part, account for the lower accuracy observed in identifying the low pain (LP) class using ∆HBO2 across all three classifiers. Activation levels in ∆HHB data exhibited minimal fluctuations across conditions, with the exception of the HA condition, where the highest activation, akin to ∆HBO2, was observed. These findings indicate that ∆HBO2 is a more reliable measure for pain assessment than ∆HHB when each is used independently. However, when combined in a feature fusion scheme, they collectively obtained better accuracy than when used independently.
While our proposed system demonstrated better performance in identifying different pain levels, it has some limitations. First, the channel selection algorithm employed in our study served the purpose of rejecting channels saturated with artifacts and noise. However, it may automatically discard channels containing valuable pain-related information. To address this, a more advanced preprocessing algorithm should be considered, capable of mitigating noise in unreliable channels without outright rejection. This would ensure that potentially relevant information is retained in the dataset for more comprehensive pain assessment. Second, in our preprocessing stage we averaged all channels to generate a single time series vector. This approach, while simplifying the data, has the drawback of suppressing information inherent in individual channels. In our future work, we will conduct the analysis by defining specific regions of interest based on functional areas of the brain, which can provide insights into the localised functions and responses to pain associated with different brain regions. Finally, our investigation of fNIRS data primarily focused on the time domain, emphasising the extraction and assessment of simple statistical features. By exclusively focusing on the time domain, we may have overlooked valuable information present in other domains. To broaden the scope of the analysis, we should consider additional domains, such as the frequency or cepstral domains, throughout the stages of preprocessing, feature extraction, and evaluation.
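The difference between averaging all channels and averaging within regions of interest can be sketched as follows. This is only an illustration under stated assumptions: the data array is synthetic, and the region names and channel groupings in `roi_map` are hypothetical and do not correspond to the optode layout or any functional parcellation used in this study.

```python
import numpy as np

def roi_average(data, roi_map):
    """Average channels within each region of interest.

    data    : array of shape (n_channels, n_samples)
    roi_map : dict mapping a region name to a list of channel indices
    Returns a dict mapping each region name to a 1-D time series.
    """
    return {name: data[idx].mean(axis=0) for name, idx in roi_map.items()}

# Synthetic 24-channel recording with 4 samples per channel.
data = np.arange(24 * 4, dtype=float).reshape(24, 4)

# Hypothetical grouping of channels into two regions.
roi_map = {"roi_a": [0, 1, 2], "roi_b": [3, 4, 5]}

# Global averaging collapses every channel into one series ...
global_mean = data.mean(axis=0)

# ... whereas region-wise averaging keeps one series per region.
roi_means = roi_average(data, roi_map)
```

Region-wise averaging yields one time series per region rather than a single vector, so features extracted afterwards would retain some spatial information at the cost of a longer feature vector.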

Conclusions
In this study, we introduced a multilevel pain intensity assessment using fNIRS data, compiling a novel dataset from healthy individuals experiencing varying induced pain levels in distinct body locations. Analysing the ∆HBO2 and ∆HHB measures, we found that ∆HBO2 outperformed ∆HHB overall, with the two measures performing best on the high and low pain classes, respectively. Combining both measures improved the performance, demonstrating the potential of fNIRS for multilevel pain assessment. The system achieved 68.51% ± 9.02% accuracy, 94.7% ± 5.77% sensitivity, and 94.29% ± 4.92% specificity in classifying no pain, low pain, and high pain observations. Future research aims to explore integrating fNIRS with other sensor modalities, analysing pain-related information in different fNIRS domains, and effectively pinpointing the site of pain.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Figure 1. System block diagram of the proposed fNIRS-based pain assessment system.

Figure 2. Schematic representation of the experimental procedure.

Figure 4. Twenty-two-channel fNIRS raw data (measuring changes in ∆HBO2, excluding two faulty channels) with annotated and highlighted durations for different conditions: B (Baseline), LA (Low Arm Pain), HA (High Arm Pain), HH (High Hand Pain), and LH (Low Hand Pain). The gray background in the figure represents the duration of each experiment phase: Baseline: 60 s; LA, LH, HA, and HH: 10 s each.

Figure 5. Raw fNIRS channels (measuring changes in ∆HBO2) selected by the proposed channel selection algorithm featuring the relative range (RR). The intervals for various pain conditions are highlighted and annotated as B (Baseline), LA (Low Arm Pain), HA (High Arm Pain), HH (High Hand Pain), and LH (Low Hand Pain). The gray background in the figure represents the duration of each experiment phase: Baseline: 60 s; LA, LH, HA, and HH: 10 s each.

Figure 7. Preprocessed 10-second data segments for the Baseline (B), Low Pain (LP), and High Pain (HP) classes, displayed for ∆HBO2 (left) and ∆HHB (right). The processing pipeline encompasses low-pass filtering, Common Average Referencing (CAR) for each filtered channel, and a final step of averaging across all channels, yielding a single consolidated vector.

Figure 8. Haemodynamic changes shown using fNIRS for ∆HBO2 (first row) and ∆HHB (second row) measures: (a) Baseline, (b) HH (High Hand Pain), (c) LH (Low Hand Pain), (d) HA (High Arm Pain), and (e) LA (Low Arm Pain). The color bar signifies the change in concentration of ∆HBO2 and ∆HHB (∆µmol). These values are averages across all subjects for each respective channel.

Figure 9. Class-wise accuracy (%) assessment of different measures using the Disc, KNN, and SVM classifiers, shown as confusion charts.

Table 2. Optimised hyperparameters for different classification algorithms via Bayesian Optimisation in the context of distinguishing between Baseline (B), Low Pain (LP), and High Pain (HP).

Table 3. System performance metrics (Acc: Accuracy, Sen: Sensitivity, Spec: Specificity, and F1 Score) for different classification algorithms (Disc, KNN, and SVM) across various measures, with each measure having a different feature vector length denoted by #.

Table 4. System performance metrics (Acc: Accuracy, Sen: Sensitivity, Spec: Specificity, and F1 Score) with MRMR-based selected features for different classification algorithms (Disc, KNN, and SVM) applied to each measure, with the feature vector length denoted by #.

Table 5. List of selected features for each measure, with # indicating the number of features.

Table 6. Post hoc test results for different levels of pain in various features of ∆HBO2 (only comparisons with significant (p ≤ 0.05) values are reported).

Table 7. Post hoc test results for different levels of pain in various features of ∆HHB (only comparisons with significant p-values are reported).
*: the mean difference is significant at the 0.05 level.