Radial Basis Function for Breast Lesion Detection from MammoWave Clinical Data

Recently, a novel microwave apparatus for breast lesion detection (MammoWave) has been developed; it is uniquely able to function in air, with 2 antennas rotating in the azimuth plane and operating within the 1–9 GHz band. Machine learning (ML) has been implemented to interpret the frequency spectrum collected through MammoWave in response to the stimulus, segregating breasts with and without lesions. The study comprises 61 breasts (from 35 patients), each one with the corresponding radiologist's conclusion (i.e., gold standard) obtained from echography and/or mammography and/or MRI, plus pathology or 1-year clinical follow-up when required. During the MammoWave examinations, the frequency spectrum is recorded; its magnitudes show substantial discrepancies and reveal dissimilar behaviours when reflected from tissues with and without lesions. Principal component analysis is implemented to extract the unique quantitative response from the frequency response for automated breast lesion identification, engaging the support vector machine (SVM) with a radial basis function kernel. In-vivo feasibility validation (now ended) of MammoWave was approved in 2015 by the Ethical Committee of Umbria, Italy (N. 6845/15/AV/DM of 14 October 2015, N. 10352/17/NCAV of 16 March 2017, N 13203/18/NCAV of 17 April 2018). Here, we used a set of 35 patients. According to the radiologists' conclusions, 25 breasts without lesions and 36 breasts with lesions underwent a MammoWave examination. The proposed SVM model achieved an accuracy, sensitivity, and specificity of 91%, 84.40%, and 97.20%, respectively. The proposed ML-augmented MammoWave can identify breast lesions with high accuracy.


Introduction
Mammography is considered the gold standard technology for breast screening, where age and screening frequency are defined by weighing the mammography risk-benefit ratio [1,2]. Indeed, risks associated with X-ray cumulative effects (and low sensitivity in dense breasts) limit the use of mammography; usually, women between 50 and 69 years of age are invited for screening once every two or three years, which reduces the impact of ionizing radiation [3][4][5]. However, recent studies have reported that lowering the screening age limit to 40 years could potentially reduce breast cancer mortality rates [6][7][8].
Microwave-based techniques have recently been developed as a potential breast screening tool [9][10][11][12]. They are non-ionizing, non-invasive, and painless, since they do not involve breast compression during screening. Microwave-based systems exploit the contrast in dielectric properties, i.e., permittivity and conductivity, between healthy tissues and tissues with lesions within the microwave spectrum (approximately 1 to 10 GHz). A large difference in one or both dielectric properties (up to 5) was reported between healthy tissues and tissues with lesions [13]; newer studies confirm that such high contrast exists between fatty breast tissues and lesions (WF), while it declines when considering fibroglandular breast tissues [14,15]. Wide-ranging research on microwave-based procedures began in the late 1990s, with a number of different prototypes developed [16]. To date, only a few clinically tested microwave breast imaging systems have been reported in the literature; these were developed by Dartmouth College, USA [17], the University of Bristol, UK, jointly with Micrima Limited, UK [18][19][20], UBT Srl, Italy [21], the University of Calgary, Canada [22,23], Southern University of Science and Technology, China [24], Hiroshima University, Japan [25], McGill University, Canada [26], and Shizuoka University, Japan [27]. Only two of the aforementioned systems have so far cleared a regulatory path (CE marking), i.e., MARIA (Micrima Limited, UK) and MammoWave® (UBT Srl, Perugia, Italy). The MARIA system utilizes an array of 60 antennas (operating within the 3-8 GHz frequency band) and a matching liquid to perform the radar approach, with a sensitivity of 76% [18][19][20]. MammoWave is uniquely able to work in air, with two antennas that rotate in the azimuth plane and operate within the frequency band of 1-9 GHz.
MammoWave examinations are performed in a multi-bistatic fashion, measuring the complex $S_{21}$ in the frequency domain. In more detail, the device transmits non-invasive, low-power microwave signals through the breast and collects the backscattered signatures (commonly denoted as the $S_{21}$ signals in engineering terminology) from a plurality of angular directions. A sensitivity of up to 82% has been reported [28]. An initial machine learning (ML) experiment [21] was performed on a limited number of subjects, employing popular ML tools to classify the frequency response signals backscattered from breasts with radiological findings (WF) and with no radiological findings (NF); it was found that the support vector machine (SVM) with a quadratic kernel outperformed the other methods tested.
The aim of this paper is to apply principal component analysis (PCA) to extract the unique quantitative responses from MammoWave raw-data frequency responses for an automated classification into WF and NF breasts, engaging the support vector machine (SVM) with a radial basis function (RBF) kernel. The procedure is verified using clinical data collected from 61 breasts, each one having conventional exams reviewed by radiologists (which were used as the gold standard for our investigation). The contributions of the study are:
• The experimentation was completed on 61 breasts, allowing the exploration of lesions with different dimensions.
• The newly collected data are distributed differently in the feature space, motivating the authors to explore a radial basis function (RBF) kernel of SVM instead of a quadratic kernel; SVM with an RBF kernel is found to be more efficient.
• The optimal method for using the frequency response signals was explored. The experiment shows that the 50 components obtained by applying a principal component analysis (PCA) to the real parts of the $S_{21}$ parameters (engaging SVM with an RBF kernel) form the best-performing combination among those tested for classifying NF and WF signals.
• The prediction results have been analyzed by the team of researchers and radiologists through statistical measurements to understand the false positive and false negative classifications, revealing that lesion size and breast density affect the microwave response as well as the ML predictions.

Methods
A diagrammatic flow chart of the proposed work is shown in Figure 1. In more detail, each breast has its own corresponding radiologist's study review, which has been used as the gold standard for classifying the breasts into two categories: breasts with no radiological findings (NF), and breasts with radiological findings (WF), i.e., with lesions which may be benign or malignant. The gold standard labels of the breasts (NF or WF) have been employed to train and test the ML algorithms to automatically identify the microwave signals backscattered from the breasts and acquired via MammoWave.

Device Description
MammoWave (shown in Figure 2a) employs low-power (1 mW) microwave signals in the 1-9 GHz frequency band. The device contains two antennas (Figure 2d) held in free space, which illuminate the breast using electromagnetic signals and measure the corresponding scattered electromagnetic fields from different angular positions around the azimuth. The two antennas are connected to a 2-port VNA (Cobalt C1209, Copper Mountain, Indianapolis, IN, USA). For each breast, measurements have been performed by recording the complex $S_{21}$, i.e., a parameter which is proportional to the electromagnetic field travelling from the transmitting antenna to the receiving one after having interacted with the breast. The complex $S_{21}$ is recorded in a multi-bistatic fashion, i.e., for each transmitting position $tx_m$, the receiving antenna is moved to measure the received signal at the receiving points $rx_n$. In the current set-up, the receiving points are equally spaced every 4.5°, leading to a total of $N_{RX} = 80$ receiving points (Figure 2b). Measurements are taken (Figure 2c) with the breast (one at a time) positioned in a cup, applying no compression. The $S_{21}$ values (i.e., raw data) may then be used to generate microwave images; however, in this paper, only the $S_{21}$ data are used.
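The receiving-point geometry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `receiving_angles` and the choice of 0° as the first receiving point are hypothetical, and the transmitting positions are not modelled.

```python
def receiving_angles(step_deg=4.5, full_circle_deg=360.0):
    """Return equally spaced receiving angles (degrees) around the azimuth.

    Assumption: the first receiving point sits at 0 degrees; the text only
    states the 4.5 degree spacing and the total of 80 receiving points.
    """
    n_rx = round(full_circle_deg / step_deg)  # 360 / 4.5 = 80
    return [i * step_deg for i in range(n_rx)]


rx = receiving_angles()
print(len(rx), rx[0], rx[-1])  # 80 0.0 355.5
```

For every transmitting position, one complex $S_{21}$ spectrum would be recorded at each of these 80 angles.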

Data Collection
This study comprises 61 breasts, 25 of which were found to be NF and 36 of which were determined as WF, from 35 patients participating in the feasibility clinical trial. Microwave imaging was performed on patients who had already undergone a conventional radiologist's examination review (used as the gold standard for our investigation). The average patient age was 52 years. Specifically, the radiologists reviewed the conventional exams of each patient who agreed to participate in the study, classifying the breasts into two groups: breasts with no radiological findings (NF) and breasts with radiological findings (WF), i.e., with lesions which may be benign or malignant. In this context, the radiological study examination included mammography, performed using a Selenia LORAD Mammography System (Hologic, Marlborough, MA), and/or echography, performed using the MyLab 70 xvg Ultrasound Scanner (Esaote, Genova, Italy), and/or magnetic resonance imaging, performed using a 3.0T MAGNETOM scanner (Siemens Healthcare, Erlangen, Germany). The final lesion assessment, performed using pathology or at least one year of clinical follow-up as reference standards, yielded 22 benign lesions and 11 malignant lesions (while in three cases the final assessment was not available). All lesion details are given in Table 1 (where possible, lesion details, dimensions, and final lesion assessment have been included).

MammoWave Signal Classification: Real Parts of $S_{21}$ & RBF Kernel Approach
The raw frequency response backscattered from the breast includes a real and an imaginary component, i.e., $\lambda = \sum_{n=1}^{N_F} \left[ \mathrm{Real}_{S_{21}}(n) + j\,\mathrm{Img}_{S_{21}}(n) \right]$, where $n$ indexes the frequencies, $N_F = 1601$ is the number of frequencies, and $\mathrm{Real}_{S_{21}}$ and $\mathrm{Img}_{S_{21}}$ represent the real and imaginary components, respectively. Initial studies performed by the authors on the classification of MammoWave's complex $S_{21}$ signals [21] indicate that SVM with a quadratic kernel ($\mathrm{SVM}_{Q}$) is better able to categorise NF and WF signals than the other conventional ML methods tested. Hence, this work aims to further investigate the classification performance by considering the real parts of the $S_{21}$ signals (in the form $\sum_{n=1}^{N_F} \mathrm{Real}_{S_{21}}(n)$, with $N_F = 1601$) as feature values for NF-WF signal classification through an SVM model. There are two $\mathrm{Real}_{S_{21}}$ groups, NF and WF. A two-sample t-test on these groups has been performed to begin the experiment. The t-test has been conducted to check whether the two groups of $\mathrm{Real}_{S_{21}}$ values differ significantly; in other words, its outcome indicates the suitability of the $\mathrm{Real}_{S_{21}}$ values for classifying NF and WF signals. The null hypothesis ($H_0$) assumes that the two groups of $\mathrm{Real}_{S_{21}}$ data samples are from populations with equal means. Therefore, the two groups can be employed for the classification task if the t-test rejects $H_0$ and accepts the alternative hypothesis ($H_a$), which states that the $\mathrm{Real}_{S_{21}}$ data come from two different populations with unequal means. A significance level of $\alpha = 0.05$ has been assumed for accepting or rejecting the null hypothesis, against which the p-value has been compared to decide statistical significance. Furthermore, the confidence interval for the difference in the population means of the NF and WF $\mathrm{Real}_{S_{21}}$ values has been studied, where $C_L$ and $C_U$ denote the lower and upper boundaries of the confidence interval.
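The hypothesis test described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function name `two_sample_test`, the Welch-style (unequal-variance) standard error, the large-sample normal approximation used in place of the t distribution, and the synthetic NF and WF values are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def two_sample_test(a, b, alpha=0.05):
    """Welch-style two-sample test on the difference of means (a - b).

    Assumption: the samples are large, so the t distribution is replaced
    by the standard normal when computing the p-value and the interval.
    """
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t_stat = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # C_L and C_U bound the confidence interval for the mean difference.
    c_l, c_u = diff - z_crit * se, diff + z_crit * se
    return {"t": t_stat, "p": p_value, "reject_h0": p_value < alpha,
            "C_L": c_l, "C_U": c_u}


# Synthetic stand-ins for the NF and WF real-part values (not clinical data).
rng = random.Random(0)
nf_values = [rng.gauss(0.0, 1.0) for _ in range(2000)]
wf_values = [rng.gauss(0.5, 1.0) for _ in range(2000)]

result = two_sample_test(nf_values, wf_values)
print(result["reject_h0"], result["C_L"], result["C_U"])
```

When the null hypothesis is rejected and the interval between `C_L` and `C_U` excludes zero, the two groups can be treated as coming from populations with different means, which is the condition the text uses to justify the classification task.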
Table 2 shows the outcomes of the t-test, where $p < \alpha$ rejects the null hypothesis $H_0$ ($H_0 = 1$) and accepts the alternative hypothesis $H_a$, and the true difference in population means lies between $-6.600 \times 10^{-5}$ and $-4.600 \times 10^{-5}$. Hence, the acceptance of the alternative hypothesis indicates that the $\mathrm{Real}_{S_{21}}$ data come from populations with unequal means and can be employed for the NF-WF signal classification task. The $\mathrm{Real}_{S_{21}}$ data of the NF and WF groups have been visualized in 3D space, where they appear spherically distributed and might therefore be classified better with the radial basis function (RBF) kernel than with the quadratic kernel of SVM. Thus, $\mathrm{SVM}_{RBF}$ has been employed to classify NF and WF breast signals. The training and testing data have been divided using Monte Carlo cross-validation (MCCV) [29], starting with 5% of the data for training and 95% for testing; the training share has been incremented by 5% in each simulation. The whole simulation has been repeated twenty-five times and the performance metrics have been averaged. $\mathrm{SVM}_{RBF}$ computes dissimilarity by measuring the squared Euclidean distance, which has been found to be more effective in the MammoWave breast classification task. Thus, the experiment has been built on $\mathrm{SVM}_{RBF}$ to improve the true positive predictions and reduce the false negative predictions. Figure 3 shows the outcomes of the MammoWave signal classification for NF and WF detection using the real components of the complex signals. An accuracy, sensitivity, and specificity of 79.80%, 70.40%, and 86.30%, respectively, were obtained, which indicates that the real parts in the original feature dimension (real parts of the complex $S_{21}$) are not discriminative enough to be employed directly as features in this classification task, and that feature extraction may be needed to improve the classification performance.
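The building blocks named in this paragraph (the RBF kernel, the MCCV splits, and the three performance metrics) can be sketched in plain Python as follows. The function names, the `gamma` parameter, the unstratified shuffling, the use of 61 samples, and the reading of "repeated twenty-five times" as 25 random splits per training fraction are assumptions for illustration; the text does not specify these details, and the SVM training step itself is omitted.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mccv_splits(n_samples, train_fraction, repeats=25, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_train = max(1, round(train_fraction * n_samples))
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]


def metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Training fractions 5%, 10%, ..., 95%, each with 25 random splits.
fractions = [k / 100 for k in range(5, 100, 5)]
splits = list(mccv_splits(n_samples=61, train_fraction=fractions[0]))
print(len(fractions), len(splits), len(splits[0][0]), len(splits[0][1]))
```

In a full experiment, a classifier would be trained on each `train_idx`, evaluated on the matching `test_idx` with `metrics`, and the results averaged over the repeats for every training fraction. Here WF is taken as the positive class, so sensitivity measures how many breasts with findings are detected.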

MammoWave Signal Classification Results: PCA on Real Parts of $S_{21}$ & RBF Kernel Approach
Hence, principal component analysis (PCA), one of the most popular feature extraction techniques, was applied to $\mathrm{Real}_{S_{21}}$ to obtain more meaningful features for the classification task, in a manner similar to that adopted to calculate the principal components (PCs) from the original complex signals of MammoWave. Here, two vectors of variances (after PCA computation) have been selected from NF and WF breasts to study the magnitude of the variance and so select the number of PCs for the classification, as shown in Figure 4. Figure 4 shows the percentage of the total variance explained by each PC for the $S_{21}$ of two different breasts, where the first 80 PCs are found to be quantitatively significant. The variances of the PCs are close to each other in Figure 4. Two-sample t-tests were conducted on the PCs to understand their capability to represent the two signal groups and the data compactness, as shown in Table 3. The probability has been found to be less than the significance level, $p < \alpha$. Hence, the t-test accepts the alternative hypothesis $H_a$ and clearly demonstrates the presence of two different means for the two populations. Moreover, the difference between the lower and upper boundaries ($-1.770 \times 10^{-4}$ and $-1.570 \times 10^{-4}$) reduced, which implies an improved data compactness over the prior result. NF and WF signal classification has been performed by employing $\mathrm{SVM}_{RBF}$ and varying the number of PCs from 80 down to 40. The obtained results are shown in Figure 5. Figure 5a-c shows the accuracy, sensitivity, and specificity, respectively, for different numbers of PCs, where the x-axis represents the amount of training data used and the y-axis the value of the performance metric. The accuracy, sensitivity, and specificity have improved over the previous results, from 79.80% to 91%, 70.40% to 84.40%, and 86.30% to 97.20%, respectively.
This is the best performance achieved, obtained with 50 PCs; a further reduction of the feature length (by 10 PCs) degrades the classification metrics.
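The feature extraction step can be illustrated with a short NumPy sketch that computes the per-component share of variance and projects each signal onto 50 PCs. The helper names `pca_fit` and `pca_transform`, the layout of the data matrix (one row per signal, one column per frequency), and the random stand-in data are assumptions for illustration, not the authors' code.

```python
import numpy as np


def pca_fit(X):
    """Return the feature mean, principal axes and explained-variance ratios."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, vt, explained


def pca_transform(X, mean, vt, n_components):
    """Project samples onto the first n_components principal axes."""
    return (X - mean) @ vt[:n_components].T


# Synthetic stand-in for the real parts of S21: 200 signals x 1601 frequencies.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1601))

mean, vt, explained = pca_fit(X)
Z = pca_transform(X, mean, vt, n_components=50)
print(Z.shape, float(explained[:80].sum()))
```

The `explained` vector corresponds to the kind of per-PC variance percentages plotted in Figure 4, and `Z` would be the 50-dimensional feature matrix passed to the RBF-kernel SVM. In a cross-validated experiment, `pca_fit` would normally be run on the training split only, so that no information from the test split leaks into the features.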

Discussion & Conclusions
The results demonstrate that a microwave breast imaging device (in this case MammoWave), when augmented by ML, could be employed to identify the presence of breast lesions with an accuracy of 91%, sensitivity > 84%, and specificity > 97%. Therefore, a non-ionizing and patient-comfort focused platform (MammoWave augmented with ML) could be used to identify breast lesions in asymptomatic women of any age and without any safety restrictions. This study comprises 61 breasts, of which 25 were NF and 36 were WF, from 35 patients participating in the feasibility clinical trial. Patients' pre-menstrual information was not considered. False negatives occurred in some cases, particularly in the presence of small lesions (<10 mm). This issue will be addressed in our future work by modifying the conventional $\mathrm{SVM}_{RBF}$ kernel structure and performing advanced research on feature representation. Also, the current WF breasts include both benign and malignant lesions; a three-class breast classification (i.e., no finding, benign finding, and malignant finding) will be adopted in future. The ML experiments will be continued with ongoing clinical trial data [30] to enhance the decision-making process and help in breast lesion identification for asymptomatic women of any age and without any safety restrictions.
Author Contributions: G.T., L.S. and A.V. designed and manufactured the UWB microwave non-ionising apparatus with the associated signal processing techniques comprising MammoWave. Subjects were recruited and screened through MammoWave. R.L. and M.D. (Michele Duranti) conducted the conventional radiological investigation on the same subjects and confirmed the outcomes of the UWB MammoWave experiment. S.P.R. analysed and interpreted the data for the ML application, implemented the ML algorithms, and drafted the manuscript. S.P.R. and M.D. (Maitreyee Dey) analysed the prediction outcomes for automatic breast lesion detection through clinical UWB MammoWave. S.D. and G.T. supervised the work and managed the experiments performed along with the co-authors at LSBU. S.D., G.T. and M.G. instigated the collaborative work on this paper between the teams at UBT, Perugia and LSBU. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The datasets that support the findings of this study are not publicly available, but will be made available upon reasonable request, following ethics committee approval and a data transfer agreement to guarantee compliance with the General Data Protection Regulation. Please contact the authors, Dr. Soumya Prakash Rana (Email: ranas11@lsbu.ac.uk, soumyaprakash.rana@gmail.com), or Dr. Gianluigi Tiberi (Email: tiberig@lsbu.ac.uk, gianluigi@ubt-tech.com) to request access to the data.