A Fair Performance Comparison between Complex-Valued and Real-Valued Neural Networks for Disease Detection

Our aim is to contribute to the classification of anomalous patterns in biosignals using complex-valued neural networks. We specifically focus on melanoma and heart murmurs. We carry out a comparative study of two convolution networks, one in the complex and one in the real numerical domain. The idea is to obtain a powerful approach for building portable systems for early disease detection. Two similar algorithmic structures were chosen so that there is no bias determined by the number of parameters to train. Three clinical datasets, ISIC2017, PH2, and Pascal, were used to carry out the experiments. Mean comparison hypothesis tests were performed to ensure statistical objectivity in the conclusions. In all cases, complex-valued networks presented a superior performance for the Precision, Recall, F1 Score, Accuracy, and Specificity metrics in the detection of associated anomalies. The best complex number-based classifier obtained in the Receiver Operating Characteristic (ROC) space presents a Euclidean distance of 0.26127 with respect to the ideal classifier, as opposed to the best real number-based classifier, whose Euclidean distance to the ideal is 0.36022 for the same task of melanoma detection. The 27.46% superiority in this metric, as in the others reported in this work, suggests that complex-valued networks have a greater ability to extract features for more efficient discrimination in the dataset.


Introduction
Deep learning has marked a milestone in every field of science. Its application in medicine has spread quickly in recent years. Progress has been made from the augmented intelligence view [1], in which efforts focus on increasing the doctor's capacity to detect pathology patterns that are not easily visible to the human eye. Hence, developing algorithms that perform well is very important for the scientific community, and new structures are needed to obtain better results and to support disease detection with high confidence. Deep learning structures based on complex numbers [2] have gradually been developed; however, their theoretical mathematical support is limited, which slows the evolution of algorithms of this type. On the other hand, in recent years, works [3][4][5][6] have addressed these new deep learning models in applications in which the input data carry information in both magnitude and phase. This suggests that these novel algorithms have a greater ability to use the information analyzed [7,8]. Furthermore, complex-valued deep learning (CVDL) can be regarded as a generalization of real-valued deep learning, as the latter is the particular case in which the imaginary part of the CVDL is zero. We have, therefore, focused on a fair comparison of real-value-based and complex-value-based classification structures to discover which one shows the best performance for the same task under similar conditions. It is important to highlight that the use of data in the real number domain at the input does not affect the performance of the structures being studied. Instead, this condition enables the observation

Shizhen Hu, Seko Nagae, Akira Hirose [13]. They prepared 7 different concentration samples and measured each sample 30 times. Glucose concentration estimation. In this paper, an adaptive glucose concentration estimation system is proposed. The system estimates glucose concentration values non-invasively by making full use of transmission magnitude and phase data. The 60-80 GHz millimeter-wave frequency band is chosen, and a single-output-neuron complex-valued neural network (CVNN) is built for adaptive concentration estimation. The system shows a good generalization ability to estimate the concentration for unknown samples. It is effective in the estimation of the glucose concentration in the clinically practical range. The mean squared error (MSE) for the CVNN is 0.011, while the MSE for the real-valued neural network (RVNN) is 0.099.

Joshua Bassey, Xiangfang Li, Lijun Qian [3]. Used 167 publications. Discuss the recent development of CVNNs. A detailed review of various CVNNs in terms of activation function, learning and optimization, input and output representations, and their applications in tasks such as signal processing and computer vision is provided, followed by a discussion of some pertinent challenges and future research directions. Complex-valued neural networks, compared to their real-valued counterparts, are still considered an emerging field and require more attention and action from the deep learning and signal processing research community.
Yang Ximei [6]. A total of 5 radar data pre-processing approaches were implemented to generate dataset samples, including FFT and STFT. Human-motion classification based on monostatic radar. This thesis proposes three complex-valued convolutional neural networks (CNNs) for human-motion classification based on monostatic radar. The range-time, range-Doppler, range-spectrum-time, and time-frequency spectrograms of micro-Doppler signatures are adopted as the input to CVNNs with different plural-handled approaches. A series of experiments determine the optimal approach and data format that achieves the highest classification accuracy.

The proposed method outperformed the iterative POCS PF reconstruction method. It produced better artifact suppression and recovery of both image magnitude and phase details in the presence of local phase changes. Moreover, the network trained on axial brain data could reconstruct sagittal and coronal brain and knee data.

As can be observed, complex-valued neural networks have been applied in health as well as in other fields. In [8], the authors propose the use of a complex-valued neural network to classify complex fMRI data. They reached high accuracies, but they did not compare their results with similar real-valued neural networks, and they did not use real-valued input data for their experiments. In [13], the authors used the CVDL to estimate glucose concentration with a mean squared error (MSE) of 0.011. They did not study the behavior of this network for classification problems. In [3], the authors carried out a state-of-the-art study to show that complex-valued neural networks have special characteristics that make them powerful, while underscoring that their properties must be explored in much greater depth to exploit the ability of these emerging deep learning algorithms.
In [4][5][6][14], the authors proposed different applications of the CVDL, such as motion estimation, InSAR and SAR radar complex data classification, channel state information for high-performance decoding tasks, and others. However, they did not compare the performance of complex-valued and equivalent real-valued algorithms. In [4], the use of the Fast Fourier Transform (FFT) is proposed to represent the data in the complex number domain. On the other hand, in [7], X-ray chest images were denoised using complex-valued neural networks, showing the high capacity of this kind of structure in health applications. In [15][16][17][18], the authors used the CVDL and complex-valued data to detect brain diseases using fMRI data, reaching outstanding performance. The key factor differentiating these works from ours is the nature of the input data, because we use skin images and scalograms built from heart sounds. Moreover, they did not use raw real-valued data as input, and they did not compare the results with similar real-valued homologous structures. Based on the above, we have focused our efforts on demonstrating the better performance of complex-valued deep learning compared with real-valued deep learning when solving real-valued health data classification problems. To carry out a fair comparison, we have used a similar number of trainable parameters to make clear that the power of these new algorithms is the consequence of their complex-number nature and not of a difference in the number of trainable parameters.

Materials
To compare complex-valued and real-valued structures, three clinical datasets of different types were selected: ISIC2017 and PH2, related to melanoma, and Pascal, associated with heart sounds. They are described below.

ISIC2017
The dataset is composed of 1995 images for deep learning analysis [19,20]. This dataset was published with a challenge [21] to researchers across the world to join forces to achieve good enough performance metrics to bring these models into production. This dataset is available at https://challenge.isic-archive.com/data/ (accessed on 1 February 2022). Figure 1 shows an example of the images contained in the dataset.

PH2
The increase in cases of melanoma [22] has recently prompted the development of computer-assisted diagnostic systems to classify dermatoscopy images [23]. Fortunately, the performance of such systems can be compared, as they can be evaluated on different sets of images. Public databases are available to make a fair assessment of multiple systems. We chose to use the PH2 dermatoscopy image database for this research. It includes manual segmentation, clinical diagnosis, and identification of several dermatoscopy structures performed by expert dermatologists on a set of 200 images. The PH2 database is available free of charge for research and benchmarking purposes [24]. It can be accessed at https://www.fc.up.pt/addi/ph2%20database.html (accessed on 1 February 2022). Figure 2 shows an example of the images contained in the dataset.

PASCAL
The PASCAL database comprises 461 recordings for the classification of heart sounds. Of these, 320 are normal sounds and 141 are abnormal/pathological sounds. Although the number of recordings is relatively large, the author's version of the article published in Physiological Measurement [25] states that the recordings last from 1 to 30 s. They also have a limited frequency range, under 195 Hz, due to the low-pass filter applied. The dataset is available at www.peterjbentley.com/heartchallenge (accessed on 1 February 2022). Figure 3 shows an example of the scalograms obtained with the sounds from the dataset. Figure 4 shows the block diagram with the steps performed to achieve the research objective.

Dataset Preprocessing
The images were scaled to a standard dimension of 224 × 224 pixels and normalized using the mean and standard deviation of their pixel values. Equation (1) describes this procedure.
$$\hat{x} = \frac{x - \mu}{\sigma} \qquad (1)$$
Equation (1). Image centered with respect to the mean and normalized by dividing by the standard deviation; here $x$ is a pixel value, and $\mu$ and $\sigma$ are the mean and standard deviation of the pixel set.
It is worth noting that this normalization is applied with the same values in the training stage and in the testing stage to reduce the distortion caused by data normalization. Scalograms were chosen as inputs for heart sounds. We applied algorithms for automatic trimming and for obtaining the scalogram through the wavelet transform. The methods used were published by Jojoa et al. in [26].
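The normalization step above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the toy pixel values are hypothetical, and it assumes that the mean and standard deviation are computed once on the training pixels and then reused unchanged for the test pixels.

```python
from statistics import mean, pstdev


def fit_normalizer(train_pixels):
    """Compute the mean and standard deviation on the training pixels only."""
    return mean(train_pixels), pstdev(train_pixels)


def normalize(pixels, mu, sigma):
    """Center on the mean and divide by the standard deviation."""
    return [(p - mu) / sigma for p in pixels]


# Hypothetical pixel intensities standing in for flattened images.
train_pixels = [0.0, 64.0, 128.0, 255.0]
test_pixels = [32.0, 200.0]

mu, sigma = fit_normalizer(train_pixels)
train_norm = normalize(train_pixels, mu, sigma)
# The test data reuse the training statistics (same values in both stages).
test_norm = normalize(test_pixels, mu, sigma)
```

Reusing the training statistics keeps both stages on the same scale, which is one way to read the statement that the same values are used for training and testing.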

Experiment Design
We found it necessary to design two-factor experiments in this stage of the research. The structure factor has two levels, which are complex-valued and real-valued. Moreover, the database factor has the ISIC2017, PH2, and PASCAL levels. Table 3 shows a summary of the experiment design. The F1 Score, Precision, Recall, Accuracy, and Specificity metrics were calculated for each factor combination in all the datasets studied.
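As an illustration of this design, the sketch below enumerates the six factor combinations and computes the five metrics from a binary confusion matrix using their standard definitions. The function name and the confusion-matrix counts are hypothetical and are not taken from the reported tables.

```python
from itertools import product

STRUCTURES = ("complex-valued", "real-valued")
DATASETS = ("ISIC2017", "PH2", "PASCAL")

# Two-factor design: every structure is evaluated on every dataset.
design = list(product(STRUCTURES, DATASETS))


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for one fold of one factor combination.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```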

Structure Factor
To make a fair comparison of the performance of real-valued and complex-valued networks, the design must be as similar as possible concerning the number of parameters and operations. This is achieved by obtaining two structures with the same layers and an equivalent number of trainable parameters.
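To make the parameter-matching idea concrete, the following sketch counts the trainable parameters of a single convolutional layer in both domains. It assumes that each complex weight is counted as two real scalars (real and imaginary parts); the layer sizes are hypothetical and do not reproduce the values reported in Table 5.

```python
def real_conv_params(c_in, c_out, k, bias=True):
    """Trainable real scalars in a k x k real-valued convolutional layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def complex_conv_params(c_in, c_out, k, bias=True):
    """Return (complex parameters, equivalent real scalars) for a complex layer."""
    n_complex = c_in * c_out * k * k + (c_out if bias else 0)
    # Each complex parameter stores a real part and an imaginary part.
    return n_complex, 2 * n_complex


# Hypothetical first layer: 3 input channels, 3 x 3 kernels.
real_count = real_conv_params(c_in=3, c_out=16, k=3)
complex_count, complex_as_real = complex_conv_params(c_in=3, c_out=8, k=3)
```

In this toy case, the complex layer with half the number of filters holds half as many complex parameters as the real layer has weights, yet the same number of real scalars, which is the kind of balance the comparison aims for.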

Complex-Valued Structure
We aim to run this algorithm to extract the largest amount of information from the inputs in all proposed experiments, in other words, from all the floating-point tensor inputs that may or may not be images [27]. If we limit these values to integers in the interval [0, 255], they will match an image. We, therefore, decided to use different types of signals to observe the behavior for the different cases. We assumed that a transformation in the numerical domain could find useful information to improve the classification capacity of a deep learning model. This would be achieved with dimensional improvement (increasing dimensions and/or components) of the class separability index since it would allow the drawing of decision regions that would better separate the classes involved and, thus, the performance of the system. Based on this, the convolution network design is carried out from a finite impulse response (FIR) filter approach, whose coefficients are learned and belong to the proposed numerical set. Equation (2) specifies the convolution process in this numerical domain.
Equation (2). Complex convolution between Z and ZZ complex functions.
It is noteworthy that the filters execute operations in the complex number domain; we have therefore selected the Hilbert space as the space in which the necessary complex operations are performed [28]. It must also be considered that the dot product is the basis of forward propagation operations in deep learning. Furthermore, average pooling [29] must be defined in this numerical domain, as it is the most intuitive pooling operation that can be applied in the selected numerical set. The activation function used in the proposed classification structure is Complex ReLU [26], which is described in Equation (3).
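Since Equations (2) and (3) are not reproduced in this text, the sketch below illustrates both operations under stated assumptions. It uses a one-dimensional FIR-style convolution for brevity, shows that a complex convolution can be computed from four real convolutions, and assumes the common split definition of Complex ReLU, in which the ReLU is applied separately to the real and imaginary parts. All function names and sample values are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """FIR filtering ('valid' convolution, kernel flipped); works for real or complex values."""
    n, m = len(signal), len(kernel)
    return [
        sum(kernel[j] * signal[i + m - 1 - j] for j in range(m))
        for i in range(n - m + 1)
    ]


def complex_conv1d_split(signal, kernel):
    """Complex convolution built from four real convolutions.

    With kernel w = a + ib and input z = x + iy:
    w * z = (a * x - b * y) + i (a * y + b * x).
    """
    x = [z.real for z in signal]
    y = [z.imag for z in signal]
    a = [w.real for w in kernel]
    b = [w.imag for w in kernel]
    real_part = [p - q for p, q in zip(conv1d_valid(x, a), conv1d_valid(y, b))]
    imag_part = [p + q for p, q in zip(conv1d_valid(x, b), conv1d_valid(y, a))]
    return [complex(r, i) for r, i in zip(real_part, imag_part)]


def complex_relu(z):
    """Split Complex ReLU (assumed definition): ReLU on each component."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


# Hypothetical complex input and learned filter coefficients.
signal = [1 + 2j, -1 + 0.5j, 0.5 - 1j, 2 + 0j]
kernel = [0.5 + 0.5j, -1j]

direct = conv1d_valid(signal, kernel)          # native complex arithmetic
split = complex_conv1d_split(signal, kernel)   # four real convolutions
activated = [complex_relu(z) for z in direct]
```

The split form makes explicit that the real and imaginary parts of the filter are learned jointly, since each output component mixes both parts of the input.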
Once the basic operations needed to build the complex number-based machine learning structure have been defined, the input data must be converted from their original real-valued nature to the complex number domain. For this purpose, we decided to use the Fourier transform. However, to reduce the computational cost, we decided to exploit its Hermitian symmetry, which requires fewer parameters than the original Fourier matrix. This is shown in Equation (4).
Equation (4). Vandermonde matrix from the Fourier transform.
All the above enables a fair comparison of the results obtained. The convolution network structure in the complex number domain is shown in Figure 5. As can be observed, the network uses only convolutional layers, average pooling, and Complex ReLU. Based on this approach, we have built an equivalent convolution network in the real number domain. This is shown in Figure 6. Furthermore, each one contains an equivalent number of parameters. In other words, correct conclusions are sought for the performance associated with the depth or the number of trainable parameters in the algorithms. Lastly, Table 4 shows the hyperparameters used for both structures involved in this study. Similarly, Table 5 shows the number of trainable parameters per layer. Focusing on maintaining an equivalent structure approach, we have attempted to ensure that the complex-valued network maintains at least half as many parameters as the real-valued network. This was done to avoid the bias caused by the two components, a real part and an imaginary part, of the complex numbers.
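The change of domain through the Fourier transform, and the saving offered by Hermitian symmetry, can be illustrated with a small one-dimensional sketch. It builds the discrete Fourier transform as a Vandermonde matrix with entries ω^(rc), where ω = exp(-2πi/N), and checks that the spectrum of a real input satisfies X[N-k] = conj(X[k]), so that only the first N/2 + 1 coefficients need to be kept. The function names and the sample signal are hypothetical, and the images in this study would require the two-dimensional counterpart.

```python
import cmath


def dft_matrix(n):
    """Vandermonde matrix of the DFT: entry (r, c) is omega ** (r * c)."""
    omega = cmath.exp(-2j * cmath.pi / n)
    return [[omega ** (r * c) for c in range(n)] for r in range(n)]


def dft(x):
    """Discrete Fourier transform as a matrix-vector product."""
    n = len(x)
    w = dft_matrix(n)
    return [sum(w[r][c] * x[c] for c in range(n)) for r in range(n)]


# Hypothetical real-valued input (for example, one row of a normalized image).
x = [0.2, 0.8, 0.5, 0.1, 0.9, 0.4]
n = len(x)
spectrum = dft(x)

# For real inputs the spectrum is Hermitian, so the second half is redundant.
half_spectrum = spectrum[: n // 2 + 1]
```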

Measurement and Cross-Validation
To compare the results obtained for each structure, we considered it necessary to repeat the measurements several times so that a comparative statistical analysis could remove subjectivity from the comparison. For this study, we decided to use k-fold cross-validation with k = 10 [30]. According to theory, data normality and correlation tests should be performed [31] before applying a Student's t-test for the comparison of means.
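A minimal sketch of how the 10 folds could be generated is shown below. It is not the authors' implementation: the function name, the shuffling seed, and the use of 200 samples (the size of the PH2 image set) are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Each sample is used for testing exactly once across the 10 folds.
splits = list(k_fold_indices(n_samples=200, k=10))
```

Each fold yields one value per metric, so every structure and dataset combination produces a sample of 10 measurements for the hypothesis tests.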

Hypothesis Test
The comparison of the performance of the used metrics from a statistical approach is important in the scientific method. Hence, the two-tailed Student's t-test [31] was chosen to observe whether there was sufficient statistical evidence to indicate that the means of the metrics calculated for the complex-valued networks are different from the means of the metrics calculated for the real-valued networks. Moreover, this test is used for its reliability with small samples [31].
Shapiro-Wilk [32]: The Student's t-test is highly sensitive to data normality. As a consequence, it is necessary to apply a data normality test before executing the Student's t-test. The Shapiro-Wilk test is based on the following hypotheses, which are accepted or rejected according to the p-value.

H0: The data come from a normal distribution.

H1: The data do not come from a normal distribution.
Student's t-test for comparison of means [31]: It is important to perform a means comparison test to observe the differences from a statistical approach, reducing the subjectivity that may appear in the comparison procedure. We decided to use the Student's t-test to compare the means of the F1, Precision, Recall, and Specificity metrics. The Mann-Whitney U test was applied to compare the Accuracy metric, since it did not pass the Shapiro-Wilk normality test. The hypotheses are accepted or rejected based on the p-value [32].

H0: There is no statistical evidence to differentiate the means of the samples.

H1: There is statistical evidence to differentiate the means of the samples.

We have used a significance level of 5% [31] for all the hypothesis tests in this study.
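The decision rule of the two-tailed test can be sketched as follows. The per-fold F1 values are hypothetical, and the sketch assumes an independent two-sample Student's t-test with pooled variance; a paired test would be the alternative if both networks were evaluated on identical folds. The critical value 2.101 is the tabulated two-tailed value for 18 degrees of freedom at the 5% significance level. In practice, a statistics library such as SciPy (`scipy.stats.shapiro`, `scipy.stats.ttest_ind`, `scipy.stats.mannwhitneyu`) could be used to obtain p-values directly.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, dof


# Hypothetical F1 scores over 10 folds (not the values reported in the tables).
complex_f1 = [0.78, 0.80, 0.79, 0.82, 0.81, 0.77, 0.80, 0.83, 0.79, 0.81]
real_f1 = [0.70, 0.72, 0.69, 0.73, 0.71, 0.68, 0.72, 0.74, 0.70, 0.71]

t_stat, dof = pooled_t_statistic(complex_f1, real_f1)

# Two-tailed critical value of Student's t for 18 degrees of freedom, alpha = 0.05.
T_CRITICAL = 2.101
reject_h0 = abs(t_stat) > T_CRITICAL
```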

Results and Discussion
After building the convolution networks, we ran the experiments using the Python 3.7 programming language, the Complex-Pytorch framework, and an Nvidia RTX2080Ti GPU. The code we used can be found in the following repository: https://github.com/mario42004/ComplexValuedDeepLearning (accessed on 1 June 2022). We present the results for each of the case studies below.

Table 6 shows the results obtained for 10 folds using the complex-valued convolution structure for melanoma detection in the set of dermatoscopy images from the ISIC2017 repository. We observe that the data meet the normality criteria to perform the Student's t-test in all cases. Table 7 shows the results obtained for 10 folds using the real-valued convolution structure for the same task on the same dataset. Again, the data satisfy the normality criteria to perform the Student's t-test in all cases. Table 8 shows the results obtained after applying the Student's t-test. We underscore that hypothesis H0 was rejected at the 5% significance level.

Table 9 shows the results obtained for 10 folds using the complex-valued convolution structure for melanoma detection in the set of dermatoscopy images from the PH2 repository. We observe that the data satisfy the normality criteria to perform the Student's t-test in all cases except Accuracy. Table 10 shows the results obtained for 10 folds using the real-valued convolution structure for melanoma detection on the same dataset. The data satisfy the normality criteria to perform the Student's t-test in all cases except Accuracy. Table 11 shows the results obtained after applying the Student's t-test. It should be noted that hypothesis H0 was rejected at the 5% significance level for all the samples of metrics obtained except Accuracy. We can thus conclude that the complex-valued convolution network performs better than its real counterpart for the metrics used in the classification task on the PH2 dataset.

Table 12 shows the results obtained for 10 folds using the complex-valued convolution structure for abnormality detection in the set of scalograms from the Pascal repository. We observe that the data satisfy the normality criteria to perform the subsequent hypothesis test in all cases except Accuracy. Table 13 shows the results obtained for 10 folds using the real-valued convolution structure for abnormality detection in the set of scalograms obtained from the Pascal repository. The data satisfy the normality criteria to perform the Student's t-test in all cases except Accuracy. Table 14 shows the results obtained after applying the Student's t-test. It should be noted that hypothesis H0 was rejected at the 5% significance level for all the samples of metrics obtained except Accuracy. We can thus conclude that the complex-valued convolution network performs better than its real counterpart for the metrics used in the classification task on the Pascal dataset.

As can be seen, the Student's t-test indicates that the complex-valued network performs better, on average, in almost all cases. For the Accuracy metric, the only one that did not meet the normality condition, we carried out the Mann-Whitney U test. The results are shown in Table 15. In addition, we plotted the specificity and sensitivity/recall metrics to observe the behavior of the classifiers in the ROC space (Figure 7).
As can be seen, the classifiers in the complex number domain show a better performance. We achieved the best behavior in the ROC space for the classifier trained with the ISIC2017 dataset, with a Euclidean distance to the ideal classifier coordinate (0, 1) of 0.26127. Similarly, for the PH2 and Pascal datasets, better results were obtained with classifiers in the complex number domain, with distances of 0.31681 and 0.29447, respectively. It is noteworthy that the Adam training algorithm, which was initially designed for real-valued networks, showed good behavior for training complex-valued networks with the adaptation described in [33].
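The distance used in this comparison can be computed as sketched below, taking the ROC coordinates as (false positive rate, true positive rate) = (1 − Specificity, Recall) and the ideal classifier at (0, 1). The function names are hypothetical. The relative improvement is computed from the two distances reported for melanoma detection on ISIC2017 and comes out at roughly 27%, consistent with the reported figure.

```python
from math import hypot


def distance_to_ideal(specificity, recall):
    """Euclidean distance in ROC space from (FPR, TPR) to the ideal point (0, 1)."""
    false_positive_rate = 1.0 - specificity
    return hypot(false_positive_rate, 1.0 - recall)


def relative_improvement(better, worse):
    """Fraction by which the smaller distance improves on the larger one."""
    return (worse - better) / worse


# Hypothetical classifier with Specificity 0.8 and Recall 0.9.
example_distance = distance_to_ideal(specificity=0.8, recall=0.9)

# Distances reported in the text for the best complex- and real-valued classifiers.
gain = relative_improvement(better=0.26127, worse=0.36022)
```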
Furthermore, the comparison we propose in this study focuses on demonstrating this ability with simple structures, so as not to deal with black box processes [34] that could cloud the objective comparison of these two classes of algorithms. However, we consider it important to assess the use of the transfer learning technique [35]. This would involve building deeper complex-valued structures, which would probably perform better than those presented in this research work. On the other hand, this would require higher computational costs and longer training times.
Lastly, we believe that the exponential growth of real-valued convolution networks was mainly a result of having access to big enough datasets to train the millions of parameters that form them. This can be verified with the Imagenet challenge [36]. Notwithstanding, the datasets that appeared were all defined in the real number set. Our opinion is that this could stop the growth and development of complex-valued networks, as there were no data that would allow experimentation with the algorithms in this numerical domain. We, therefore, decided to assess the potentiality of the complex-valued convolution networks with datasets in the real number domain, seeking to observe their capacity for this type of task [10].

Conclusions
Complex-valued networks show better performance for the F1 Score, Precision, Recall, and Specificity metrics in comparison to real-valued networks. This suggests a higher potential for the classification of melanoma using dermatoscopy images. However, it should be noted that a change of numerical domain must be performed before the real-valued inputs are processed. We made use of the Fourier transform, although it is not the only option available for this task.
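The change of numerical domain can be illustrated with a minimal sketch, assuming a single-channel image and NumPy's 2-D FFT. The function name `to_complex_domain`, the random stand-in image, and the centring with `fftshift` are illustrative assumptions and are not taken from the preprocessing used in this study.

```python
import numpy as np

def to_complex_domain(image):
    """Map a real-valued 2-D image to a complex-valued array via the 2-D FFT."""
    image = np.asarray(image, dtype=np.float64)
    # fft2 returns a complex array of the same shape;
    # fftshift moves the zero-frequency term to the centre (an optional choice).
    return np.fft.fftshift(np.fft.fft2(image))

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # hypothetical stand-in for a grayscale image

spectrum = to_complex_domain(image)
magnitude, phase = np.abs(spectrum), np.angle(spectrum)

# The transform is invertible, so the original image can be recovered.
recovered = np.fft.ifft2(np.fft.ifftshift(spectrum)).real
```

The resulting array carries magnitude and phase for every frequency, which is the kind of input a complex-valued network can consume directly.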
In terms of trainable parameters, a fair comparison opens the door to building deeper structures that may perform better than those presented in this study. It should be noted that complex-valued networks are defined as pairs of real-valued tensors (real and imaginary parts), but the learning process is performed jointly and not independently.
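A minimal sketch of this idea, assuming a one-dimensional signal and NumPy, is a complex convolution written with real-valued arrays only. The helper `complex_conv1d` and the random data are hypothetical and do not reproduce the layers used in this study.

```python
import numpy as np

def complex_conv1d(x_re, x_im, w_re, w_im):
    """Convolve (x_re + i*x_im) with (w_re + i*w_im) using real convolutions only."""
    # (a + ib) * (c + id) = (ac - bd) + i(ad + bc), applied to convolutions
    out_re = np.convolve(x_re, w_re) - np.convolve(x_im, w_im)
    out_im = np.convolve(x_re, w_im) + np.convolve(x_im, w_re)
    return out_re, out_im

rng = np.random.default_rng(0)
x_re, x_im = rng.normal(size=16), rng.normal(size=16)  # input signal parts
w_re, w_im = rng.normal(size=3), rng.normal(size=3)    # kernel parts

out_re, out_im = complex_conv1d(x_re, x_im, w_re, w_im)

# Reference result computed directly with complex arithmetic
reference = np.convolve(x_re + 1j * x_im, w_re + 1j * w_im)
```

Both kernel parts, `w_re` and `w_im`, contribute to both output parts, which is why the two real-valued tensors cannot be trained independently of each other.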
Complex-valued convolution networks are limited from a theoretical point of view by the scope of the operations that they can carry out in the complex number domain. Not all the layers defined for a real-valued structure can be reproduced in the complex number domain, so the two kinds of structures are not completely comparable. This limitation arises because the existing theoretical base was developed for real-valued networks. Further study of these novel algorithms is, therefore, required, with the aim of finding layers that use the information in the same way in real-valued and complex-valued structures.
Similarly, complex-valued activation functions are limited in the Hilbert space, as the Cauchy-Riemann conditions must be met over the entire space on which they are defined. This is a very relevant consideration if we intend to build a complex-valued gradient-based training algorithm. Requiring a holomorphic activation function initially limits the possibilities because, by Liouville's theorem, the only functions that are both bounded and holomorphic over the entire complex plane are constants. We, therefore, believe that deeper mathematical study is needed to find suitable activation functions. This limitation gives rise to the future alternative of building a non-gradient-based training algorithm [37] or adapting (as we have done in this research work) a real-valued algorithm [33,38] to train complex-valued structures.
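The Cauchy-Riemann conditions can be checked numerically. The sketch below assumes that a function is holomorphic at a point when its derivative with respect to the conjugate variable vanishes there, and approximates that derivative by central differences. The helper names and the split ReLU activation are illustrative assumptions, not the activation functions used in this study.

```python
def conj_derivative(f, z, h=1e-6):
    """Approximate df/d(conj z) = 0.5 * (df/dx + 1j * df/dy) by central differences."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx + 1j * df_dy)

def square(z):
    # z**2 is holomorphic everywhere, so it satisfies Cauchy-Riemann
    return z ** 2

def split_relu(z):
    # ReLU applied separately to the real and imaginary parts (illustrative choice)
    return max(z.real, 0.0) + 1j * max(z.imag, 0.0)

z0 = complex(0.7, -0.4)
residual_square = abs(conj_derivative(square, z0))     # close to zero
residual_relu = abs(conj_derivative(split_relu, z0))   # clearly non-zero
```

A residual near zero indicates that the conditions hold at `z0`. The split ReLU fails the check, which illustrates why such activations are usually trained with an adapted real-valued algorithm rather than a purely complex-analytic gradient.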
For future lines of research, real-valued and complex-valued networks need to be tested for data of a complex nature, i.e., a fair comparison to complement the approach presented herein. This may lead to the generalization of the ability of complex-valued convolution neural networks, transforming them into a universal algorithm for classification problems, regardless of the numerical nature of the input data.
The transfer learning technique must be addressed to work with deeper networks that show higher performance than the state of the art for the classification of the datasets used in this study. Nevertheless, hybrid approaches should equally be considered, i.e., layers totally in parallel, in the real number and complex number domains, which can perform simultaneous feature extractions that can improve the performance of the system studied with lower computational costs.
Once the desired model is obtained, it should be tested in different hospital scenarios. The validated model can be connected to a web interface through a cloud application. This approach will allow healthcare staff to access it easily, giving them artificial intelligence support for the detection of melanoma and heart murmurs. This will be very useful in places with limited access to health care services or where the probability of the disease is high, and a quick and early diagnosis is required to prevent the evolution of the anomalies presented by patients of all ages.
To better explain the high capacity of complex-valued deep learning, it is necessary to carry out a comprehensive study of the learned features, using saliency maps, activation maps, gradient maps, and similar tools to understand in an intuitive way how this novel algorithm performs better than real-valued deep learning. We consider it very important to highlight that the result of this process will be substantially different from the analysis of real-valued deep learning due to the abstract nature of complex numbers.
Although complex-valued algorithms are evolving, their use has not spread as widely as that of real-valued algorithms. In general terms, real-valued algorithms can be regarded as a particular case of complex-valued algorithms in which the imaginary part is zero. However, we consider it very important to know the performance of both approaches under similar operating conditions, i.e., with an equivalent number of trainable parameters. Our purpose is to draw conclusions about the higher ability of complex-valued algorithms to extract features, compared with real-valued algorithms, which increases the discriminative ability in the classification of real-valued input data. In this way, we have eliminated the possible bias caused by a difference in the number of trainable parameters. Moreover, we have added a numerical domain conversion step (from real to complex) based on the Fourier transform, although this paper does not establish the specific reason for the higher capacity of these networks in the task of melanoma and heart sound detection. Our contribution is that we statistically evidenced the superiority of complex-valued convolution networks for the same classification task under equivalent conditions. Our work establishes a comparative precedent of the studied algorithms for disease detection applications on real-valued signals.
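The notion of an equivalent number of trainable parameters can be made concrete for a single convolution layer. The sketch below assumes that each complex weight and bias counts as two real parameters. The helper names, kernel size, and filter counts are hypothetical and are not the configurations used in this study.

```python
def real_conv_params(kernel, c_in, c_out):
    """Trainable parameters of a real 2-D convolution layer with bias."""
    return kernel * kernel * c_in * c_out + c_out

def complex_conv_params(kernel, c_in, c_out):
    """Same layer with complex weights: each weight and bias has two real parts."""
    return 2 * real_conv_params(kernel, c_in, c_out)

# Halving the number of complex filters matches the real-valued budget here.
real_total = real_conv_params(3, 1, 32)        # 3*3*1*32 + 32 = 320
complex_total = complex_conv_params(3, 1, 16)  # 2 * (3*3*1*16 + 16) = 320
```

For deeper networks the input channels of each layer also change, so matching the totals generally requires adjusting the widths layer by layer rather than simply halving them.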