Concentrated Thermomics for Early Diagnosis of Breast Cancer

Thermography has been employed broadly as a complementary diagnostic tool in breast cancer diagnosis. Among thermographic techniques, deep neural networks show clear potential to detect heterogeneous thermal patterns related to vasodilation in breast cancer cases. Such methods are used to extract high-dimensional thermal features, known as deep thermomics. In this study, we applied convex non-negative matrix factorization (convex NMF) to extract three predominant bases of thermal sequences. The bases were then fed into a sparse autoencoder model, known as SPAER, to extract low-dimensional deep thermomics, which were used to assist the clinical breast exam (CBE) in breast cancer screening. Convex NMF-SPAER, combined with clinical and demographic covariates, yielded an accuracy of 79.3% (73.5%, 86.9%); the highest accuracy belonged to NMF-SPAER at 84.9% (79.3%, 88.7%). The proposed approach preserved thermal heterogeneity and led to early detection of breast cancer. It can be used as a noninvasive tool aiding CBE.


Introduction
Breast cancer survival rates are high owing to recent progress in imaging modalities and treatment planning, yet it is still categorized as the most fatal cancer among women [1]. Here, a deep learning model is applied to dynamic thermography to aid screening prior to mammography and during a clinical breast exam (CBE), with the aim of expediting diagnosis at an early stage. Several studies have confirmed the value of infrared thermography in detecting hypervascularity and hyperthermia in non-palpable breast cancer [2][3][4]. This study applies a convex matrix approximation and a deep learning model to extract the most predominant thermal patterns, called latent space thermomics. The model was trained and used alongside CBE for early detection of breast cancer and showed high accuracy in detecting symptomatic patients.

Materials and Methods
Thermal heterogeneity can be detected by using different matrix factorizations (MF) [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. Thermal variance can be used to evaluate the thermal heterogeneity of different components. Low-rank matrix approximation using semi- and convex NMF gives the bases more freedom than NMF does, while the coefficients remain constrained to be non-negative. This means that the bases do not overlap, even though they may contain negative values. As a result, these algorithms group the bases according to the thermal variability of the input thermal data. Figure 1 shows the low-rank representation of thermal images in the breast area for healthy and symptomatic participants, using convex NMF.
Eng. Proc. 2021, 8, 30
Figure 1. Twelve examples of convex NMF-driven low-rank thermal matrix approximation for six healthy cases and six symptomatic or cancer cases. Some otherwise healthy cases reported pain and changes in the breast area that are clearly projected in the thermal images.
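The convex NMF step described above can be illustrated with a minimal NumPy sketch. It assumes the common formulation X ≈ X W Gᵀ with non-negative W and G, solved by multiplicative updates; the function name, iteration count, and synthetic input are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def convex_nmf(X, k=3, n_iter=300, eps=1e-9, seed=0):
    """Approximate X (pixels x frames) as X @ W @ G.T with W, G >= 0.

    The bases F = X @ W are non-negative combinations of the input frames,
    so they may take any sign that X itself takes.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]                      # number of frames
    W = rng.random((n, k))              # frame weights defining the bases
    G = rng.random((n, k))              # non-negative coefficients
    A = X.T @ X
    A_pos = (np.abs(A) + A) / 2         # positive part of X^T X
    A_neg = (np.abs(A) - A) / 2         # negative part of X^T X
    for _ in range(n_iter):
        GWt = G @ W.T
        G *= np.sqrt((A_pos @ W + GWt @ A_neg @ W + eps)
                     / (A_neg @ W + GWt @ A_pos @ W + eps))
        WGtG = W @ (G.T @ G)
        W *= np.sqrt((A_pos @ G + A_neg @ WGtG + eps)
                     / (A_neg @ G + A_pos @ WGtG + eps))
    return W, G

# Synthetic stand-in for a thermal sequence: 64 pixels x 23 frames (not real data).
rng = np.random.default_rng(1)
X = rng.random((64, 3)) @ rng.random((3, 23)) + 0.01 * rng.random((64, 23))

W, G = convex_nmf(X, k=3)
F = X @ W                               # three predominant bases, one per column
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
print(F.shape, round(float(rel_err), 3))
```

Each column of `F` plays the role of one low-rank basis image; for real data, every frame would first be flattened into a column of `X`.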

Thermal imaging throughputs extracted from thermal images, called thermomics [14][15][16][17], are used extensively, in the same way as radiomics, to deliver diagnostic solutions for early-stage breast cancers [18][19][20]. Thermomics are known to be more effective when accompanied by clinical variables. However, an abundance of thermomics impedes the system's performance through overfitting, a problem known as the curse of dimensionality [2,14].
Deep learning has influenced thermomics and automated diagnosis through thermographic imaging. A sparse deep convolutional autoencoder model (SPAER) is used to extract low-dimensional features from thermographic images [14].
Three low-ranked representative bases, using convex NMF, were extracted and used as three channels of input data that were fed into the trained SPAER model. Then, SPAER extracted 16 deep thermomics from the input images, while encoding heterogeneous thermal patterns within this output vector. These 16 deep thermomics were then used to train a random forest model in making diagnostic decisions.
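As a rough sketch of how three flattened bases could be assembled into the three-channel input described above, the snippet below reshapes a pixels × 3 matrix into a 512 × 512 × 3 array. The helper name, the row-major pixel ordering, and the per-channel min-max scaling are assumptions made for illustration; the source does not state the preprocessing used.

```python
import numpy as np

def bases_to_channels(F, height, width):
    """Reshape k flattened bases (pixels x k) into a height x width x k image.

    Assumes pixels were flattened in row-major order. Each channel is
    min-max scaled to [0, 1], which is an assumed preprocessing step.
    """
    pixels, k = F.shape
    if pixels != height * width:
        raise ValueError("number of pixels does not match height * width")
    img = F.reshape(height, width, k)
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / np.maximum(hi - lo, 1e-12)

# Random placeholder bases standing in for the three convex NMF bases.
rng = np.random.default_rng(0)
F = rng.random((512 * 512, 3))

x = bases_to_channels(F, 512, 512)
print(x.shape, x.size)   # input shape and number of input values
```

The flattened size of this input, 512 × 512 × 3 = 786,432 values, is what the encoder compresses to 16 deep thermomics.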

Study Cohort
We used 55 breast cancer screening participants from the mastology research database (DMR) to evaluate the strength of the proposed method. This cohort was obtained from the Hospital Universitário Antônio Pedro (HUAP) of the Federal University Fluminense [20]. The Ethical Committee of the HUAP approved the use of these data under registration number CAAE: 01042812.0.0000.5243, supported by the Brazilian Ministry of Health. The data include healthy and symptomatic patients with a median age of 60 (20, 120) and multiple races (15 African (27.3%), 28 Caucasian (51%), 11 Parda (20%), and one Mulatto (1.8%)); 18 had diabetes in their family history, and 9 women were undergoing hormone replacement [20].

Results
The presented approach was tested on 55 breast cancer screening participants to investigate the capability of the model in diagnosing abnormality. The obtained results were compared to the ground truth (the gold standard) to determine the system's accuracy. Three predominant low-rank matrices were calculated for 23 baseline thermal images using matrix factorization. The results supported our hypothesis: thermal heterogeneity was intensified in the breast area for the symptomatic cases that had potential angiogenesis (blood vessel formation) and vasodilation, as more heterogeneity was exhibited in the ROI of symptomatic cases ("healthy with symptoms" or "cancerous" cases) than in the ROI of "healthy" cases (Figure 1). Some otherwise healthy cases were categorized as symptomatic participants if they reported nipple changes and pain, even though the results of mammography and biopsy showed no signs of cancer.
Our trained SPAER model [14] extracted 16 deep thermomics from the three-channel convex low-rank thermal matrix approximation, with a dimensionality of 512 × 512 × 3. The model was trained using an Adam optimizer for 200 epochs with a batch size of 8 and an ℓ1 sparsity regularization value of 10^-5 in dense layers.
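To make the role of the sparsity term concrete, the following NumPy sketch computes a reconstruction loss with an added L1 penalty weighted by 10^-5. It assumes that the regularization is an L1 penalty on the dense-layer weights and that the reconstruction loss is mean squared error; both are assumptions, and the toy weight matrices are hypothetical placeholders rather than SPAER parameters.

```python
import numpy as np

L1_WEIGHT = 1e-5  # regularization value reported in the text

def l1_penalty(weights, lam=L1_WEIGHT):
    """Sum of absolute weight values over all dense layers, scaled by lam."""
    return lam * sum(np.abs(w).sum() for w in weights)

def regularized_loss(x, x_hat, weights, lam=L1_WEIGHT):
    """Mean squared reconstruction error plus the L1 sparsity penalty."""
    mse = np.mean((x - x_hat) ** 2)
    return mse + l1_penalty(weights, lam)

# Hypothetical dense-layer weights (16 entries in total, each of magnitude 1).
weights = [np.ones((4, 2)), -np.ones((2, 4))]
x = np.zeros((3, 4))

penalty = l1_penalty(weights)
loss_perfect = regularized_loss(x, x, weights)   # zero reconstruction error
print(penalty, loss_perfect)
```

Because the penalty grows with the absolute size of every weight, minimizing the combined loss pushes small weights toward zero, which is the source of sparsity in this kind of model.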

Classification Outcome
The deep thermomics extracted by the SPAER model were used to train a random forest model that made preliminary diagnostic decisions for both symptomatic and healthy participants. The model's accuracy was assessed by comparing its predictions with the ground truth obtained from mammography and breast biopsy. We evaluated the system's accuracy using 16 convex deep thermomics biomarkers, along with clinical and demographic covariates (i.e., age, marital status, and family history), with leave-one-out cross-validation. The accuracy yielded by the maximal multivariate model was 84.9% (79.3%, 88.7%) for NMF deep thermomics, followed by NMF-SPAER + Clinical, convex NMF-SPAER + Clinical, and PCT-SPAER + Clinical, which yielded accuracies of 83.02% (79.2%, 86.8%), 79.3% (73.5%, 86.9%), and 81.1% (75.5%, 84.9%), respectively. IPCT-SPAER + Clinical, sparse PCT-SPAER + Clinical, sparse NMF-SPAER + Clinical, and convex NMF-SPAER + Clinical showed very similar diagnostic accuracy, yielding median accuracies of 79.2% with different variation ranges. The lowest accuracy among these methods belonged to PCT-SPAER, which showed a diagnostic accuracy of 75.5% (67.9%, 81.1%). The minimal full multivariate model belonged to sparse NMF-SPAER + Clinical, with an accuracy of 79.2% (73.6%, 83.02%), demonstrating a more restricted solution domain due to the two-fold sparsity and non-negativity in the NMF and SPAER models (see Table 1). The model with clinical variables yielded an accuracy of 71.7% (66.04%, 79.3%). Moreover, we measured the statistical significance of the cross-validated convex NMF-SPAER + Clinical results versus the other approaches using a t-test (see Table 1). Convex NMF-SPAER + Clinical differed significantly from NMF-SPAER + Clinical (t-statistic = 2.9, p-value = 0.004) and NMF-SPAER + Clinical (t-statistic = 2.4, p-value = 0.01). Nevertheless, the other methods exhibited statistical similarity to the cross-validated results of convex NMF-SPAER + Clinical.
This study was conducted using the Python programming language and the TensorFlow and Keras Python libraries [21,22].
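The evaluation protocol described in this section, a random forest trained on 16 deep thermomics plus three covariates and scored with leave-one-out cross-validation, can be sketched with scikit-learn as follows. The data are synthetic placeholders, and the number of trees and random seed are assumptions; the sketch shows the protocol only and does not reproduce the reported accuracies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut

# Synthetic placeholders: 55 participants, 16 thermomics, 3 covariates.
rng = np.random.default_rng(0)
n = 55
thermomics = rng.normal(size=(n, 16))
covariates = rng.normal(size=(n, 3))      # stand-ins for age, marital status, family history
y = rng.integers(0, 2, size=n)            # 0 = healthy, 1 = symptomatic (synthetic labels)
thermomics[:, 0] += 2.0 * y               # inject a signal so the demo is not pure noise

X = np.hstack([thermomics, covariates])   # 19 features per participant

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # Fit on all participants but one, then predict the held-out participant.
    clf = RandomForestClassifier(n_estimators=25, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

accuracy = correct / n
print(f"leave-one-out accuracy on synthetic data: {accuracy:.3f}")
```

Leave-one-out cross-validation suits a cohort of this size because every participant serves as a test case exactly once while the model is trained on the remaining 54.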

Conclusions
This study challenged the thermal diagnostic system with a thermomics approach generated by a sparse deep convolutional autoencoder, known as the SPAER model, while using convex low-rank thermal matrix approximation. This approach integrated convexity and sparsity to generate low-dimensional thermomics, while simultaneously preserving thermal patterns that are vital in determining angiogenesis and vasodilation in symptomatic patients. Convex non-negative matrix factorization (convex NMF) was used to reduce 23 thermal images to three channels. Then, the SPAER model reduced the dimensionality of the input data from 786,432 to 16 imaging biomarkers. The method was compared to five state-of-the-art matrix factorizations in thermography and tested on 55 breast cancer screening cases that underwent dynamic thermography. The best accuracy reported in this study, 84.9% (79.3%, 88.7%), belonged to NMF-SPAER without demographic and clinical information, which preserved heterogeneous thermal patterns. In future work, we can analyze the effect of aggregating other available clinical factors or imaging modalities to enhance the diagnostic power of the system.