Convolutional Neural Networks for Differential Diagnosis of Raynaud’s Phenomenon Based on Hands Thermal Patterns

: Raynaud’s phenomenon (RP) is a microvessels’ disorder resulting in transient ischemia. It can be either primary or secondary to connective tissue diseases, such as systemic sclerosis. The differentiation between primary and secondary to systemic sclerosis is of paramount importance to set the proper therapeutic strategy. Thus far, thermal infrared imaging has been employed to accomplish this task by monitoring the ﬁnger temperature response to a controlled cold challenge. A completely automated methodology based on deep convolutional neural network is here introduced with the purpose of being able to differentiate systemic sclerosis from primary RP patients by relying uniquely on thermal images of the hands acquired at rest. The classiﬁcation performance of such a method was compared to that of a three-dimensional convolutional neural network model implemented to classify thermal images of the hands recorded during rewarming from a cold challenge. No signiﬁcant differences were found between the two procedures, thus ensuring the possibility to avoid the cold challenge. Moreover, the convolutional neural network models were compared with standard feature-based approaches and showed higher performances, thus overcoming the limitations related to the feature extraction (e.g., biases introduced by the operator). Such automated procedures can constitute promising tools for large scale screening of primary RP and secondary to systemic sclerosis in clinical practice.


Introduction
Raynaud's phenomenon (RP) is a common vascular disorder consisting of recurrent, long-lasting, and episodic vasospasm of the fingers and toes often manifesting as discoloration and pain [1]. RP is typically induced by cold exposure and emotional stress [2]. It affects approximately 5-10% of the population (prevalently females) [3], with a worldwide distribution, although its prevalence is elevated in cold climates where the risk of exposure to low ambient temperatures is greater [4]. RP is classified as primary RP (PRP) if there is no known underlying illness and secondary when associated with a disorder detected upon assessment [5]. The distinction is important because prognosis, severity, and treatment can all be affected [6]. Secondary RP can be associated with many systemic rheumatic diseases. The most frequent association is with systemic sclerosis (SSc) [7].
SSc is a complex autoimmune connective tissue disease that is characterized by progressive generalized obliterative vasculopathy and widespread aberrant tissue fibrosis [8]. Although SSc is a heterogeneous disease, RP occurs in most patients, affecting ∼96% of them [9]. RP is considered the most common and one of the earliest symptoms of this disease [10]. SSc-associated RP typically has a lag period that can last several years before additional SSc organ-specific disease manifestations emerge [10,11]. Whereas in primary or 'idiopathic' RP, tissue ischemia is transient or reversible, in secondary RP persistent tissue ischemia can occur, resulting in digital ulceration and/or gangrene [12].
Frequently in the early stages of disease there is not a clear-cut difference between PRP and secondary PR. Indeed, many RP patients have no sign of systemic disease, although they do present with subtle nailfold abnormalities. Conversely, the differential diagnosis of PRP versus secondary RP is of utmost importance to allow for the earliest successful treatment of this condition and the associated underlying disease. Since RP impacts the finger thermoregulatory system, the evaluation of the finger thermoregulatory impairment is crucial to distinguish between PRP and RP secondary to SSc [13]. To this aim, finger temperature can be monitored through thermal infrared (IR) imaging technique.
Thermal IR imaging is a contactless, non-invasive technique which provides a map of a body's superficial temperature by measuring its emitted infrared radiation [14][15][16]. It has been widely used in medicine to assess cutaneous temperature and its topographic distribution as well as to monitor the psychophysiological state of an individual [17,18]. Considering that the skin temperature depends on local blood perfusion and thermal tissue properties, thermal IR imaging provides important indirect information concerning circulation and thermoregulatory functionality of the cutaneous tissue [14]. This technique has been employed to differentiate primary from secondary RP, often by monitoring the finger response to a controlled cold challenge [19]. Indeed, SSc, PRP, and healthy controls (HC) show different thermal recovery to the same functional stimulation (i.e., cold challenge) [13]. Therefore, the differentiation among HC, SSc, and PRP was usually performed based on statistical analysis of simple descriptors of the cutaneous temperature recovery (e.g., lag time, time to reach a given recovery threshold) [19]. However, such an analysis procedure involves identifying the regions of interest on the fingers from which the statistical descriptors of the temperature recovery are computed as well as choosing the best descriptors to use for the classification. Hence, the intervention of an expert operator is required, thus introducing operator-dependent bias on the classification outcome.
Furthermore, since in PRP or SSc patients the blood vessels in the extremities are over-sensitive to changes in temperature, the administration of a cold challenge can be a very uncomfortable, and frequently painful, process. In addition, a major problem has been the lack of a standardized protocol that regularize such a challenges' administration procedure. It would therefore be desirable to develop an automated classification approach able to differentiate between PRP, SSc, and HC, thus limiting the human intervention and without the administration of a cold stimulus.
In this perspective, machine learning approaches are valuable to minimize or avoid the clinician's intervention and to speed up diagnosis. Indeed, such approaches are suitable for solving complex task and limiting human intervention. Presently, machine learning algorithms are gaining popularity across a wide range of innovative applications, such as smart houses [20] and autonomous vehicles [21] but also in physiological signal classification [22,23] or to support disease diagnosis [24,25]. Machine learning approaches can be feature-based or data-driven [26]. In the traditional feature-based machine learning approach, features are manually selected and extracted. The difficulty with this approach is that it is necessary to choose which features are important and, as the number of classes to classify increases, feature extraction can become cumbersome. On the other hand, data-driven methods provide a principled set of mathematical methods for extracting meaningful features from data [27]. Specifically, it learns from and makes predictions based on data. Those approaches are usually performed by employing advanced machine learning algorithm such as a deep neural network. Indeed, a deep neural network model exploits multiple layers of nonlinear information processing for feature extraction and transformation as well as for pattern analysis and classification [28]. Figure 1 shows the difference between the data driven and feature-based methods.
Appl. Sci. 2021, 11, 3614 3 of 18 multiple layers of nonlinear information processing for feature extraction and transformation as well as for pattern analysis and classification [28]. Figure 1 shows the difference between the data driven and feature-based methods. With respect to a deep neural network, deep convolutional neural networks (DCNNs) have become the leading architecture for most image classification tasks [29]. DCNNs make use of kernels (also known as filters), to detect features throughout an image. A kernel is a matrix of values, called weights [28]. They are typically composed of three types of layers: convolution, pooling, and fully connected layers. The first two perform feature extraction, whereas the third maps the extracted features into a final output, which is often a classifier. DCNNs find complex relationships by minimizing a cost function (a measure of error between the DCNN and real outputs) through the use of gradient descent approaches and a backpropagation algorithm [30]. DCNNs have been widely used in biomedical images analysis and they have reached excellent classification outcomes [31,32].
In this study, a novel automated DCNN classification methodology that aims to distinguish among PRP, SSc, and HC based on patient's hand thermal patterns measured at rest (i.e., without the administration of a cold challenge) is presented. The performances of the proposed classifier were compared to that of a DCNN model implemented to classify hands thermal images of participants undergoing the cold challenge procedure (CCP). Moreover, for both the analyses, the DCNN models' performances were compared to those of a feature-based machine learning approach. The comparison between different models allowed us to investigate the capability of the DCNN classifier based on the thermal image at rest to identify PRP, SSc, and HC with respect to approaches that require the administration of a cold challenge and the intervention of an expert operator.

Paticipants
The experimental session involved 36 participants: 13 healthy, 11 PRP, and 12 SSc participants (Table 1). Participation was strictly voluntary. Before the start of the experimental trials, the participants were adequately informed about the purpose and protocol With respect to a deep neural network, deep convolutional neural networks (DCNNs) have become the leading architecture for most image classification tasks [29]. DCNNs make use of kernels (also known as filters), to detect features throughout an image. A kernel is a matrix of values, called weights [28]. They are typically composed of three types of layers: convolution, pooling, and fully connected layers. The first two perform feature extraction, whereas the third maps the extracted features into a final output, which is often a classifier. DCNNs find complex relationships by minimizing a cost function (a measure of error between the DCNN and real outputs) through the use of gradient descent approaches and a backpropagation algorithm [30]. DCNNs have been widely used in biomedical images analysis and they have reached excellent classification outcomes [31,32].
In this study, a novel automated DCNN classification methodology that aims to distinguish among PRP, SSc, and HC based on patient's hand thermal patterns measured at rest (i.e., without the administration of a cold challenge) is presented. The performances of the proposed classifier were compared to that of a DCNN model implemented to classify hands thermal images of participants undergoing the cold challenge procedure (CCP). Moreover, for both the analyses, the DCNN models' performances were compared to those of a feature-based machine learning approach. The comparison between different models allowed us to investigate the capability of the DCNN classifier based on the thermal image at rest to identify PRP, SSc, and HC with respect to approaches that require the administration of a cold challenge and the intervention of an expert operator.

Paticipants
The experimental session involved 36 participants: 13 healthy, 11 PRP, and 12 SSc participants (Table 1). Participation was strictly voluntary. Before the start of the experimental trials, the participants were adequately informed about the purpose and protocol of the study. All participants signed an informed consent form, which outlined the methods and the purposes of the experimentation in accordance with the Declaration of Helsinki [33]. The study was approved by the Institutional Review Board and Local Ethical Committee of the School of Medicine of the University of Chieti-Pescara (protocol code AC052514 12/07/2016).  [34]. PRP patients were recruited from the Capillaroscopy outpatients service of the Dermatologic Clinic of the University G. d'Annunzio, Chieti, Italy from December 2016 to February 2017. Healthy individuals were recruited from parents of dermatologic oncology patients attending the outpatients service of the Dermatologic Clinic of the University G. d'Annunzio, Chieti, Italy from December 2016 to February 2017. All healthy individuals did not disclose any history of vascular disease or present with any type of vascular disease or vasoactive drug.
PRP and SSc patients were a priori classified according to the criteria and the methods established in 2001 by the American College of Rheumatology [35,36]. All the patients received continuative vasodilator therapy for RP (Pentoxyfillin, calcium channel blockers) which was not discontinued for the purposes of the study. PRP and SSc patients' exclusion criteria were a history of bronchial asthma; renal or hepatic failure; hypotension; moderate or severe arterial hypertension; history of drug or alcohol abuse, smoking, gout, gastric ulcers, or cerebral or cardiac ischemic disease; and sympathectomy of the upper limb performed within 12 months of the beginning of the study. HC exclusion criteria were cigarette smoking; cardiovascular, or neurovascular disorders; hypertension; any overt dermatological or immunological disease; all types of therapeutic treatment; and history of drug or alcohol abuse. Moreover, participants were requested to refrain from vigorous exercise, caffeine, and alcohol for 4 hours prior to the assessment.

Procedure
Upon arrival, each participant was left in the experimental room for 15 min [37] to allow participants to achieve proper acclimatization to the room environmental conditions and the baseline skin temperature to stabilize. The recording room was set at a standardized temperature (23 ± 0.5 • C) and humidity (55%) by a thermostat according to the International Academy of Thermology guidelines [38]. Participants sat comfortably on a chair during acclimatization and measurement periods. The experimental paradigm consisted of the IR-images recording of the dorsal aspect of both hands, before and after the administration of a cold challenge. The first ones were necessary to obtain the baseline of the fingers' temperature and the remaining ones were useful to monitor the temperature recovery. The participant's hands were placed on a non-reflective, black surface, where the hands shape was drawn to maintain the same position before and after the cold stimulus. The cold challenge was administered by immersing the hands (protected from getting wet by thin, disposable latex gloves) for 2 min in a 3 L water bath maintained at 10 • C. After the cold challenge, the gloves were removed, and the hands were returned to their original position on the non-reflective surface. Each recording session lasted 23 min including both baseline (3 min), recovery phases (20 min), beside the time needed for the cold stress (2 min) that was not recorded. The images were acquired every 30 s.

Data Acquisition
A FLIR SC3000 digital thermal camera was used in the experiment. It is characterized by a Focal Plane Array of 320 × 240 Quantum Well Infrared Photodetectors, sensitive to the thermal radiation in the 8-9 µm band. The temperature sensitivity/noise equivalent temperature difference of the thermal camera is 0.02 K. The thermal camera was blackbodycalibrated to factory specifications by means of periodic calibration performed by the Quality Management Systems of FLIR, which is certified to comply with ISO 9001:2008. The process of blackbody calibration is described in [37]. Such a process is useful to remove noise-effects related to the sensor drift/shift dynamics and optical artefacts. In accordance to literature [39], cutaneous emissivity was considered as ε ≈ 0.98. The thermal camera was placed 1.5 m distant from the hands' dorsum and it was placed perpendicular to the analyzed region [40].
The thermal images acquired were used to feed the DCNN models. In detail, the thermal image corresponding to the central part of the baseline period was chosen as representative of the resting condition for each participant. The whole CCP including the rewarming condition was instead represented by a sub-sample of thermal images constituted by extracting an image every three minutes.

Data Analysis
The data were analyzed considering the two conditions: thermal data acquired before the cold challenge (i.e., baseline data or hands' thermal images at rest) and data related to the whole CCP (i.e., baseline and recovery data). For each of the two conditions, a DCNN and a feature-based model such as the support vector classifier (SVC) were implemented to classify PRP, SSc, and Healthy participants. The performances of all the models were compared employing the McNemar-Bowker test. Figure 1 shows the DCNN and SVC processing pipeline implemented for the baseline analysis.
Concerning the feature-based approach for each finger of both hands, baseline and rewarming curves were obtained by averaging the temperature of the pixels within a specific region of interest. The regions of interest were identified as the nail-bed regions [19] ( Figure 2). Displacement between images were corrected manually using anatomical landmarks based on the fingers profile [19].

Data Acquisition
A FLIR SC3000 digital thermal camera was used in the experiment. It is chara by a Focal Plane Array of 320 × 240 Quantum Well Infrared Photodetectors, sen the thermal radiation in the 8-9 μm band. The temperature sensitivity/noise eq temperature difference of the thermal camera is 0.02 K. The thermal camera w body-calibrated to factory specifications by means of periodic calibration perfo the Quality Management Systems of FLIR, which is certified to comply w 9001:2008. The process of blackbody calibration is described in [37]. Such a proce ful to remove noise-effects related to the sensor drift/shift dynamics and optical In accordance to literature [39], cutaneous emissivity was considered as ε ≈ 0.98. T mal camera was placed 1.5 m distant from the hands' dorsum and it was placed dicular to the analyzed region [40].
The thermal images acquired were used to feed the DCNN models. In d thermal image corresponding to the central part of the baseline period was chose resentative of the resting condition for each participant. The whole CCP inclu rewarming condition was instead represented by a sub-sample of thermal image tuted by extracting an image every three minutes.

Data Analysis
The data were analyzed considering the two conditions: thermal data acquire the cold challenge (i.e., baseline data or hands' thermal images at rest) and data r the whole CCP (i.e., baseline and recovery data). For each of the two conditions, and a feature-based model such as the support vector classifier (SVC) were impl to classify PRP, SSc, and Healthy participants. The performances of all the mod compared employing the McNemar-Bowker test. Figure 1 shows the DCNN a processing pipeline implemented for the baseline analysis.
Concerning the feature-based approach for each finger of both hands, base rewarming curves were obtained by averaging the temperature of the pixels with cific region of interest. The regions of interest were identified as the nail-bed reg ( Figure 2). Displacement between images were corrected manually using anatomi marks based on the fingers profile [19].

Baseline Analysis DCNN Model
The IR images of the dorsum of both participants' hands during the baseline condition were used as the input of the DCNN model implemented to differentiate PRP from SSc and HC. The DCNN model was developed according to the following processing steps: • Input images preprocess. Firstly, the IR images were down sampled from 320 × 240 to 80 × 107 pixels and normalized by subtracting the average value of the entire set of images. A Principal Components Analysis was then adopted to reduce the number of spatial features while keeping the information characterizing the object to be analyzed. The number of components kept was 20 with a cumulative explained variance ratio of 0.95. • Model architecture design. The DCNN structure employed in this work was heuristically chosen in similarity with previously reported DCNN structures on biological images or signals classification [41]. The DCNN was composed of 3 convolutional layers, 3 pooling layers, and 1 fully connected prior to the output layer. The first convolutional layer consisted in 32 filters (size 5 × 5) applied to the input images to obtain 32 feature maps of the images. The other 2 convolutional layer were both composed of 10 filters (size 5 × 5). The activation function employed in all the 3 convolutional layers to add nonlinearity to the network, were the Rectified Linear Unit (ReLU) function. Then, as pooling layer (or down-sampling layer) a MaxPooling were chosen, where the largest element from the rectified feature map was retained. A filter size of 3 × 3, 3 × 4 and 2 × 2 for the 3 MaxPooling layers, respectively, was implemented to reduce the dimensionality of each feature to 1 after the 3 convolutional layers and before the fully connected layer. The fully connected layer consisted in 20 neurons employed to summarize information and compute the class score. Lastly, a softamx function was used in the output layer, which outputs a probability value from 0 to 1 for each of the 3 classification classes (PRP, SSc, and healthy). All the biases of the DCNN were initialized to a small constant, i.e., 0.1, whereas the weights were initialized in a pseudo-random manner employing a truncated normal distribution (standard deviation = 0.1). The DCNN architecture is shown in Figure 1a. • Model optimization. The model optimization was primarily focused on how to reduce overfitting. Indeed, developing a DCNN model with a small sample size inevitably involves high risk of overfitting. A technique used to address overfitting is regularization [42]. In this study, a Ridge Regression (L2 regularization) was implemented by adding the sum of the squared values of the model coefficient (weights) as a penalty term to the loss function. The loss function employed in this study was the categorical cross-entropy, therefore after performing the regularization technique it resulted in the following Equation: whereŷ k is the kth scalar value of the model output, y k is the corresponding target value, and the constant λ times the sum of the squared weight (ω) values is the regularization term. The λ value was set to 0.01. This was intended to reduce model complexity and make the model less prone to overfitting. In addition, instead of using a fixed learning rate hyperparameter for the presented model, which could lead the model to converge too quickly to a suboptimal solution, a tunable learning rate over the training process was implemented. In detail, a function was implemented to reduce learning rate by a factor of 0.1 once learning stop improving during at least 10 epochs.
• Model evaluation. To evaluate the model performance, a categorical cross-entropy was employed, which is suitable for multiclass classification task. The optimization procedure was iterated for 100 epochs with a batch size of 6 samples. To address the model generalization performance, a leave-one-out cross-validation procedure was performed [43]. The metrics used for evaluating the model was the accuracy, sensitivity, and specificity. Accuracy represents the percentage of correct predictions out of the total number of test samples. Sensitivity is the proportion of predicted true positives out of all patients with the disease, whereas specificity represent the percentage of the predicted true negatives out of all participants who do not have the disease [44]. Those metrics were performed by counting the number of correct DCNN predictors after an argmax evaluation of the DCNN output vector, averaged among the plateau iterations. This procedure was conducted following the leave one-out cross-validation. The overall sensitivity and specificity of the classifier were obtained by averaging the sensitivity and specificity of each class, respectively.
The described DCNN model was implemented in Python using Keras API with TensorFlow backend. For model evaluation, the scikit learn library was utilized. Figure 3 shows an example of the input images, respectively, for HC, SSc, and PRP participants, and the related feature maps resulting as output from the first two convolutional layers.
performed [43]. The metrics used for evaluating the model was the accuracy, sensiti ity, and specificity. Accuracy represents the percentage of correct predictions out the total number of test samples. Sensitivity is the proportion of predicted true po tives out of all patients with the disease, whereas specificity represent the percenta of the predicted true negatives out of all participants who do not have the disease [4 Those metrics were performed by counting the number of correct DCNN predicto after an argmax evaluation of the DCNN output vector, averaged among the plate iterations. This procedure was conducted following the leave one-out cross-valid tion. The overall sensitivity and specificity of the classifier were obtained by avera ing the sensitivity and specificity of each class, respectively. The described DCNN model was implemented in Python using Keras API with Te sorFlow backend. For model evaluation, the scikit learn library was utilized. Figure  shows an example of the input images, respectively, for HC, SSc, and PRP participan and the related feature maps resulting as output from the first two convolutional layers

Feature-Based Analysis
A feature extraction algorithm was then implemented to extract features of intere from the same IR-images of the baseline condition. In detail, a region of interest prese on each fingers' nail bed was manually selected. For each region of interest, the temper ture average value was extracted. Region of interests are shown in Figure 2. The extract temperature data were then normalized by subtracting the average value of the enti dataset. In addition, for each participant, features related to the 10 fingers normaliz temperature distribution were estimated. These features were: the standard deviatio

Feature-Based Analysis
A feature extraction algorithm was then implemented to extract features of interest from the same IR-images of the baseline condition. In detail, a region of interest present on each fingers' nail bed was manually selected. For each region of interest, the temperature average value was extracted. Region of interests are shown in Figure 2. The extracted temperature data were then normalized by subtracting the average value of the entire dataset. In addition, for each participant, features related to the 10 fingers normalized temperature distribution were estimated. These features were: the standard deviation (STD), kurtosis, skewness, range (i.e., the difference between the largest and smallest values Appl. Sci. 2021, 11, 3614 8 of 18 of the distribution), and the inter-quartile range (i.e., the difference between the third and first quartile). Figure 4 shows the group mean and standard deviation of the 10 fingers' temperature average value, and of all the other features z-score normalized.
Appl. Sci. 2021, 11,3614 (STD), kurtosis, skewness, range (i.e., the difference between the largest an ues of the distribution), and the inter-quartile range (i.e., the difference bet and first quartile). Figure 4 shows the group mean and standard deviation gers' temperature average value, and of all the other features z-score norm As a machine learning approach, the SVC was implemented. Specifi was computed using radial basis function (RBF) as a kernel function (nonfunction has the following formula for two vectors u and v: where γ is a hyperparameter used as similarity measure between two data purpose of our study γ was set as follows: whereas the regularization hyperparameter C was set to 5. The SVC gener bilities were assessed through cross-validation. The same cross-validation ployed for the DCNN model were utilized. The metrics employed to be e the accuracy, sensitivity, and specificity. The z-score normalized features c SVC inputs.  As a machine learning approach, the SVC was implemented. Specifically, the SVC was computed using radial basis function (RBF) as a kernel function (non-linear). Such a function has the following formula for two vectors u and v:

3-Dimensional DCNN
where γ is a hyperparameter used as similarity measure between two data point. For the purpose of our study γ was set as follows: whereas the regularization hyperparameter C was set to 5. The SVC generalization capabilities were assessed through cross-validation. The same cross-validation procedure employed for the DCNN model were utilized. The metrics employed to be evaluated were the accuracy, sensitivity, and specificity. The z-score normalized features constituted the SVC inputs.

CCP Analysis 3-Dimensional DCNN
A 3-dimensional DCNN (3D-DCNN) model was implemented to classify the thermal images recorded during the whole CCP (i.e., baseline and recovery conditions) in the three classes (HC, SSc, and PRP). Due to the high computational load and the time required to analyze all the images of the recorded IR video (one every 30 s for 20 min, 40 images in total), a representative sample was considered for the successive analyses. This sample was composed of IR-images extracted every three minutes. The baseline image was also added to the set of images to better describe the rewarming phenomenon and in accordance with previous studies [13,19].
The 3D-DCNN differs from the 2D in the convolutional filter type. Indeed, it applies a three-dimensional filter to the dataset and the filter moves in three-directions (x, y, z) to calculate the low-level feature representations. Their output shape is a three-dimensional volume space. Architectures with volumetric (i.e., spatially 3D) convolutions have been successfully used in video analysis or 3D medical images [45,46]. In this case, time acts as the third dimension [47].
The 3D-DCNN model was implemented following the same processing step used to develop the DCNN model (Section 2.4.1). The batch of input images underwent the same preprocessing procedure as described in Section 2.4.1, whereas the model architecture design slightly changed. The number of 3D-filters in the three convolutional layers remained the same, whereas the filter sizes were, respectively, 5 × 5 × 4, 3 × 3 × 2, 3 × 4 × 3, and the three MaxPooling, respectively, 3 × 3 × 1, 3 × 3 × 1, 3 × 4 × 2. In this way the dimensionality of each feature was reduced to 1 before the fully connected layer. Finally, the model optimization and evaluation were conducted as previously described. The features map resulting as output from the first 2 convolutional layers together with the batch of input images, respectively for HC, SSc, and PRP participants are shown in Figure 5.
The 3D-DCNN model was implemented following the same processing step used to develop the DCNN model (Section 2.4.1). The batch of input images underwent the same preprocessing procedure as described in Section 2.4.1, whereas the model architecture design slightly changed. The number of 3D-filters in the three convolutional layers remained the same, whereas the filter sizes were, respectively, 5 × 5 × 4, 3 × 3 × 2, 3 × 4 × 3, and the three MaxPooling, respectively, 3 × 3 × 1, 3 × 3 × 1, 3 × 4 × 2. In this way the dimensionality of each feature was reduced to 1 before the fully connected layer. Finally, the model optimization and evaluation were conducted as previously described. The features map resulting as output from the first 2 convolutional layers together with the batch of input images, respectively for HC, SSc, and PRP participants are shown in Figure 5.

Feature-Based Analysis
The feature-based analysis on the CCP data was performed by extracting features of interest from the fingers' temperature trend. Such a trend was obtained by averaging the

Feature-Based Analysis
The feature-based analysis on the CCP data was performed by extracting features of interest from the fingers' temperature trend. Such a trend was obtained by averaging the temperatures of the selected regions of interest (see Figure 2) throughout the experiment. The temperature time courses of each finger of a representative HC, SSc, and PRP participant are shown in Figure 6.

of 18
The temperature time courses of each finger of a representative HC, SSc, and PRP participant are shown in Figure 6. The extracted temperature data were then normalized by subtracting the average value of the entire dataset. The features identified to characterize the dynamic response of the finger's temperature to the cold challenge were the following: 1. Delay time: the time required for the finger's temperature after the cold challenge to reach 50% of its final value (i.e., the recovery temperature after 20 min from the cold stress). 2. Rise time: the time required for the finger's temperature to rise from 10% to 90% of its final value. 3. Recovery time: time required for the finger's temperature to reach the 68% of the difference value between the baseline temperature and the temperature soon after the cold challenge. 4. Steady state error: the difference between the baseline temperature value and the final recovery temperature value. 5. Delta: difference between the finger's temperature on its final recovery point and the temperature soon after the stimulus.
These five features were collected from each of the participants' fingers, for a total of 50 features per participant. Indeed, these features are commonly used in literature to describe the fingers' recovery phase from a cold stimulus [13,19,48] in PR patients. The group mean of all the features averaged among fingers and the related standard deviation are reported in Figure 7. These features were z-score normalized and used as input to the SVC, which was preformed to classify HC, SSc, and PRP participants. The SVC was implemented using RBF as a kernel function with γ = 1/50, and the same hyperparameters The extracted temperature data were then normalized by subtracting the average value of the entire dataset. The features identified to characterize the dynamic response of the finger's temperature to the cold challenge were the following:

1.
Delay time: the time required for the finger's temperature after the cold challenge to reach 50% of its final value (i.e., the recovery temperature after 20 min from the cold stress).

2.
Rise time: the time required for the finger's temperature to rise from 10% to 90% of its final value.

3.
Recovery time: time required for the finger's temperature to reach the 68% of the difference value between the baseline temperature and the temperature soon after the cold challenge.

4.
Steady state error: the difference between the baseline temperature value and the final recovery temperature value.

5.
Delta: difference between the finger's temperature on its final recovery point and the temperature soon after the stimulus.
These five features were collected from each of the participants' fingers, for a total of 50 features per participant. Indeed, these features are commonly used in literature to describe the fingers' recovery phase from a cold stimulus [13,19,48] in PR patients. The group mean of all the features averaged among fingers and the related standard deviation are reported in Figure 7. These features were z-score normalized and used as input to the SVC, which was preformed to classify HC, SSc, and PRP participants. The SVC was implemented using RBF as a kernel function with γ = 1/50, and the same hyperparameters were employed for the baseline classification.

Baseline Results
The DCNN average accuracy and the related standard error are reported as a function of the training epoch for training and the testing set, respectivel fitting effect (decrease of the accuracy at increasing epoch) is visible in the proving the efficacy of the employed procedure. The DCNN accuracy in the reached a plateau value of 0.84 ± 0.05 and a maximum value of 0.91 ± 0.04. With respect to the feature-based analysis, performed by using the SVC, of 0.77 was obtained. The sensitivity and specificity values for each of the 3 lyzed are shown in Table 2.

Baseline Results
The DCNN average accuracy and the related standard error are reported in Figure 8

Baseline Results
The DCNN average accuracy and the related standard error are reporte as a function of the training epoch for training and the testing set, respective fitting effect (decrease of the accuracy at increasing epoch) is visible in the proving the efficacy of the employed procedure. The DCNN accuracy in the reached a plateau value of 0.84 ± 0.05 and a maximum value of 0.91 ± 0.04. With respect to the feature-based analysis, performed by using the SVC, of 0.77 was obtained. The sensitivity and specificity values for each of the 3 lyzed are shown in Table 2. Table 2. Sensitivity and specificity of DCNN and SVC models on HC, SSc, and PRP p classification. With respect to the feature-based analysis, performed by using the SVC, an accuracy of 0.77 was obtained. The sensitivity and specificity values for each of the 3 classes analyzed are shown in Table 2.

CCP Results
The accuracy trend of the 3D-DCNN model as a function of training epochs for training and test set is reported in Figure 9. The model did not show any overfitting effect. The plateau value of the average accuracy was reached at 0.88 ± 0.05. pl. Sci. 2021, 11, 3614

CCP Results
The accuracy trend of the 3D-DCNN model as a function of training epoc ing and test set is reported in Figure 9. The model did not show any overfittin plateau value of the average accuracy was reached at 0.88 ± 0.05. The SVC accuracy achieved a value of 0.83. The sensitivity and specificit the three classes analyzed are reported in Table 3 for both the 3D-DCNN and fiers.

Baseline vs. CCP Results
To test the statistical significance of the differences in the classifier perf McNemar-Bowker test was performed on the classifiers' prediction outcome whereas McNemar's test requires that there are only two possible categories f sification outcome to be tested, in the McNemar-Bowker test the outcome a be classified in more than two classes. No statistical difference was found The SVC accuracy achieved a value of 0.83. The sensitivity and specificity for each of the three classes analyzed are reported in Table 3 for both the 3D-DCNN and SVC classifiers. Table 3. Sensitivity and specificity of 3D-DCNN and SVC models on HC, SSc, and PRP participants' classification.

Baseline vs. CCP Results
To test the statistical significance of the differences in the classifier performances, a McNemar-Bowker test was performed on the classifiers' prediction outcome [49]. Indeed, whereas McNemar's test requires that there are only two possible categories for each classification outcome to be tested, in the McNemar-Bowker test the outcome analyzed can be classified in more than two classes. No statistical difference was found between the classifiers' performance in the two experimental procedures (i.e., baseline vs. CCP). In detail, both comparisons between the DCNN and the 3D-DCNN performance and between the baseline SVC and the CCP SVC performance were found not significant (B = 2, df = 3, p = n.s. and B = 0.8, df = 3, p = n.s, respectively).
Within the procedures, the difference between the performances of the machine learning approaches employed were also tested and no statistically significant difference was reported. In detail, both comparisons between the DCNN and the SVC performance in the baseline condition (B = 0.6, df = 3, p = n.s.) and between the 3D-DCNN and the SVC in CCP (B = 2.7, df = 3, p = n.s.) were not significant. The overall performance metrics of both classifiers in both procedures are shown in Table 4.

Discussion
RP is associated with characteristic abnormalities in the function and morphology of the vasculature, which can result in irreversible digital ischemia. Although RP is usually idiopathic (PRP), it can occur as part of an underlying disorder such as in SSc. The condition of vasoconstriction of PRP or SSc require different clinical treatments. It is therefore important to develop reliable methods able to differentiate between a PRP patient and an RP secondary to SSc patient with high specificity. To this end, thermal IR imaging has been widely used as a support to clinical diagnosis. However, the thermographic protocols often incorporate some form of a temperature challenge, usually cold, and need to be performed by an experienced operator. In this study, a completely automated methodology was developed to differentiate SSc from PRP patients by relying uniquely on IR images of the hands acquired at rest. The outcome of such an automated procedure was compared to those obtained with procedures involving the cold challenge and manual selection of the features. No statistically significant differences were found in the comparison, thus favoring the automated procedure performed at basal condition. The differences found between the classes of participants, and the procedures adopted are detailed in the following subsections.

Thermoregulatory Difference between Classes
Thermoregulation is a complex mechanism that is mostly regulated by the autonomic nervous system. Specifically, sympathetic cholinergic nerves mediate the cutaneous vasodilation in response to whole-body heating, whereas noradrenergic nerves are involved in the cutaneous vasoconstriction during whole-body cooling [50]. Concerning localized stimuli, skin warming induces cutaneous vascular responses due to temperature-sensitive afferent neurons and nitric oxide, whereas local cutaneous vasoconstriction in response to direct cooling of the skin is due to sensory and sympathetic noradrenergic nerves and non-neural mechanisms [50]. The thermal responses of an HC, PRP, and SSc participant's hands to a cold challenge are shown in Figure 6.
PRP is associated with abnormal responses to the environmental temperature and activity (i.e., vasospasm) related to vessels' diseases [51]. Typically, PRP produces symmetrical ischemia lasting 15-20 min, involving both hands with a more sensitive finger. Patients with PRP usually exhibit mild episodes that do not hamper daily activities and generally improve with aging [52,53], whereas SSc patients are affected by intense and frequent ischemic events, associated with ulcers in 25-39% of cases [54,55].
SSc's vascular reactivity and occlusive disease are related to a complex interaction between endothelial cells, smooth muscle cells, the extracellular matrix, and intravascular circulating factors [51]. The impairment of the endothelium induces overproduction of the vasoconstrictor endothelin-1 and underproduction of the vasodilator nitric oxide and prostacyclin. The diminished production of vasodilatory neuropeptides together with an over-regulation of the vascular smooth muscle adrenoreceptor (α2c-AR), leads to an abnormal vasoconstrictive response to stress or cold stimuli. Moreover, severe vasospasms induce repeated episodes of vasoconstriction, provoking occlusion of the microcirculation and injuries [56].
The findings of the present study further confirm the capability of thermal IR imaging to provide information regarding the physiology of the superficial blood circulation [57]. Moreover, the good classification performance obtained using images acquired at rest demonstrated that the vascular impairment associated to RP and SSc generated peculiar superficial temperature distribution of the hands even in the absence of an ongoing bout of the disease.

Comparisons with Previous Studies and Discussion on Machine Learning Results
Several research studies focused on the development of reliable methods able to differentiate between PRP patients and RP secondary to SSc patients with high specificity. Since changes in temperature pattern of the finger and toes are a clinical manifestation of RP, the evaluation of the finger thermoregulatory impairment is essential to detect the presence of the disease and to differentiate the two forms of this disorder. To this end, thermal IR imaging has been widely used as a support to clinical diagnosis. However, the thermographic protocols often incorporate some form of temperature challenge, usually cold, and need to be performed by an experienced operator. For instance, de Campos et al. employed thermal IR imaging together with cold stress procedure to classify RP patients [58]. The authors used a linear discriminant analysis as a classification method. The best result obtained on the classification of RP, in primary and secondary reached an accuracy of 0.80. Ismail et al., performed RP classification through the use of thermal IR imaging and cold challenge procedure [19]. The overall classification outcome of 0.87 of correctly classified participants was achieved by employing a multiple logistic regression algorithm followed by a receiver operating characteristic curve analysis. Moreover, studies such as Viana et al., demonstrated a statistically significant difference between the thermal IR time course of RP patients versus healthy ones during the recovery from a cold challenge [59]. Similar results were obtained in [2,60]. Recent studies explored the possibility to classify RP patients based on their hands' temperature in baseline condition [61,62]. Horikoshi et al. demonstrated that the baseline nail fold temperature was significantly lower in RP patients than in controls [48]. Martini et al., proved that at baseline, higher temperatures at the distal interphalangeal joint and lower temperatures at metacarpophalangeal joints were observed in PRP compared to the secondary RP [61]. A considerable step forward would be the implementation of a completely automated procedure to avoid the use of a cold challenge and limit human intervention. To these aims, an automated data-driven DCNNs classifier that allows differentiation among PRP, SSc, and HC based on their hands' baseline IR image is presented in this study. The classifier's performance was compared to those of a feature-based approach. In addition, the DCNN model for IR images at rest was compared to a 3D-DCNN model fed with the hands' IR images acquired during the whole CCP. Each implemented model was cross-checked for performances evaluation.
Regarding the basal condition, the DCNN implemented was able to classify PRP, SSc, and HC with a high degree of accuracy, i.e., an overall accuracy of 0.84, an overall sensitivity of 0.83, and specificity of 0.91. On the other hand, the feature-based approaches achieved an overall accuracy of 0.77, sensitivity of 0.77, and specificity of 0.88. With respect to the CCP, the 3D-DCNN model was able to classify PRP, SSc, and HC with an overall accuracy of 0.88, sensitivity of 0.88, and specificity of 0.94, whereas the SVC delivered a classification with an overall accuracy of 0.83, a sensitivity of 0.80, and specificity of 0.90. An investigatory analysis of the feature maps resulting from the convolutional layers (Figures 3 and 5) permits an insight on the feature extraction process performed by the DCNN and 3D-DCNN models. Both models seem to focus on the difference among fingers and dorsum temperature highlighting those fingers where the difference is smaller to differentiate between HC, SSc, and PRP participants.
Although the accuracy of the classification performed on the IR imaging recorded during CCP was better than the accuracy achieved by classifying baseline IR images only, the differences were not statistically significant. This result highlights the importance of the baseline temperature as a stable physiological variable that might provide insight into the etiology of RP. Most importantly, it provides a valid, pioneering solution for differentiating PRP, SSc, and HC with performance comparable to those reported in literature but in a completely automated way and without requiring the administration of a cold challenge.
With respect to the machine learning algorithms used in this study, even though no significant difference was found between DCNN and SVC accuracy, DCNN provided better performance and was much more agile. Indeed, manually identifying and extracting features can be time consuming as well as being influenced by human errors. Moreover, the use of the DCNN model for baseline IR images classification would avoid the need for a standardized protocol, as it would only require the acquisition of a single IR image during rest condition.

Study's Limitations and Future Directions
Finally, it is worth mentioning that despite the very promising results, the low sample size can be considered a limitation of the study. In fact, the classification outcome, might increase its performance with a larger study sample, relying it on a multivariate analysis approach. Although the sample size of the study could be considered rather small, the classification was performed implementing a leave-one-out cross-validation procedure, thus basically evaluating the out-of-sample performance. Hence, the results obtained are indeed generalizable. Increasing the sample numerosity may produce a further improvement of the classifier's performance by decreasing a possible in-sample overfitting issue. Moreover, a larger sample size will allow analyzing the effect of gender and age on the classification performance. Indeed, such factors affect the skin vasoconstriction /vasodilation capacity. Therefore, a future purpose is to validate the technique on a larger sample of participants. Furthermore, the recent introduction of low-cost IR cameras can increase the availability of thermal IR imaging. Considerations on the use of such low-cost thermal technology for RP patients classification are already addressed in the literature [62,63]. In addition, the introduction of such an automated procedure based on the recording of a single IR image of the hands at rest, will possibly overcome the implementation difficulties referred to by Maverakis et al. [64]. Besides, it is likely that thermography may become more widely used given the increased accessibility of the equipment. Both these aspects could pave the way toward the application of the proposed model in everyday clinical practice. However, in this perspective, it is worth noticing that during thermal imaging measurements, the environmental conditions must be controlled and standardized as much as possible to minimize physiological variability.

Conclusions
In this study, an innovative automatic procedure to differentiate PRP from SSc and HC based on machine learning algorithms and IR images of the hands was presented. Different classification procedures were compared, involving the use or not of a cold challenge and the manual or automated features selection. The results revealed that the use of a cold challenge did not statistically improve the classification accuracy and that an automated feature selection approach performed better than the manual ones. Indeed, this study demonstrated that, by employing a DCNN model and IR images of the hands in basal condition, it is possible to achieve high level of accuracy and specificity in the differentiation among PRP, SSc, and HC. This approach allowed for overcoming issues related to the administration of a cold stimulus to the patients, as well as to avoid biases in the classification introduced by the operator, thus representing a great improvement with respect to the standard procedure. However, due to the limited sample size, this study is intended to provide evidence of the feasibility of such an automated approach. Further studies are needed to corroborate the results on a large-scale population. Although in literature it is reported that thermographic examinations are useful for differentiating secondary RP, such as SSc, from PRP, to the best of our knowledge this is the first study employing such an automated approach for this task. Finally, the implemented model, together with the recent introduction of low-cost thermal systems, can provide a new, quick, automated, and accurate classification method suitable for everyday clinical practice environment. Informed Consent Statement: Informed consent was obtained from all participants involved in the study. Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy issues.