MR Images, Brain Lesions, and Deep Learning

Featured Application: This article provides a critical review of deep/machine learning algorithms used in the identification of ischemic stroke and demyelinating brain diseases. It evaluates their strengths and weaknesses when applied to real-world clinical data.

Abstract: Medical brain image analysis is a necessary step in computer-assisted/computer-aided diagnosis (CAD) systems. Advancements in both hardware and software in the past few years have led to improved segmentation and classification of various diseases. In the present work, we review the published literature on systems and algorithms that allow for classification, identification, and detection of white matter hyperintensities (WMHs) in brain magnetic resonance (MR) images, specifically in cases of ischemic stroke and demyelinating diseases. For the selection criteria, we used bibliometric networks. Of a total of 140 documents, we selected 38 articles that deal with the main objectives of this study. Based on the analysis and discussion of the reviewed documents, there is constant growth in the research and development of new deep learning models intended to achieve the highest accuracy and reliability in the segmentation of ischemic and demyelinating lesions. Models with good performance metrics (e.g., Dice similarity coefficient, DSC: 0.99) were found; however, there is little practical application due to the use of small datasets and a lack of reproducibility. Therefore, the main conclusion is that there should be multidisciplinary research groups to overcome the gap between CAD developments and their deployment in the clinical environment.


Introduction
There are estimated to be as many as one billion people worldwide affected by peripheral and central neurological disorders [1,2]. Some of these disorders include brain tumors, Parkinson's disease (PD), Alzheimer's disease (AD), multiple sclerosis (MS), epilepsy, dementia, neuroinfectious diseases, stroke, and traumatic brain injuries [1]. According to the World Health Organization (WHO), ischemic stroke and "Alzheimer disease with other dementias" are the second and fifth leading causes of death, respectively [2].
Biomedical images provide fundamental information for the diagnosis, prognosis, and treatment of different pathologies. Hence, neuroimaging plays a fundamental role in understanding how the brain and the nervous system function [3] and in discovering how structural or functional anatomical alterations correlate with different neurological disorders [4] and brain lesions. Currently, research on artificial intelligence (AI) and diverse imaging techniques constitutes a crucial tool for studying the brain [5][6][7][8][9][10][11]. The diseases considered in this review are: (a) ischemic stroke, which occurs when a vessel in the brain is blocked; (b) demyelinating disease, the loss of the myelin layer in the axons of neurons; and (c) the white matter hyperintensities (WMHs) produced by ischemic stroke and demyelination, as shown by the magnetic resonance imaging-fluid attenuated inversion recovery (MRI-FLAIR) modality, in which, without an expert, it is difficult to distinguish one disease from the other because of the similarity of their WMHs.
A stroke occurs when the blood flow to an area of the brain is interrupted [21,23]. There are three types of ischemic stroke according to the Bamford clinical classification system [24]: (1) partial anterior circulation syndrome (PACS), where the middle/anterior cerebral regions are affected; (2) lacunar anterior circulation syndrome (LACS), where the occlusion is present in vessels that supply blood to the deep-brain regions; and (3) total anterior circulation stroke (TACS), when the middle/anterior cerebral regions are affected by a massive brain stroke [24,25]. Ischemic stroke is a common cerebrovascular disease [1,26,27] and one of the principal causes of death and disability in low- and middle-income countries [1,4,6,7,[27][28][29]. In developed countries, brain ischemia is responsible for 75-80% of strokes, and 10-15% are attributed to hemorrhagic brain stroke [4,25].
A demyelinating disease is described as the loss of myelin with relative preservation of axons [8,22,29]. Love [22] notes that there are demyelinating diseases in which axonal degeneration occurs first and the degradation of myelin is secondary [7,22]. For accurate diagnosis, the demyelinating diseases of the central nervous system (CNS) are classified according to their pathogenesis into "demyelination due to inflammatory processes, demyelination caused by developed metabolic disorders, viral demyelination, hypoxic-ischemic forms of demyelination and demyelination produced by focal compression" [22].
The inflammatory demyelination of the CNS is the principal cause of multiple sclerosis (MS) [8,19,20,22,30], a common neurological disorder [31] characterized by lesions produced in the white matter (WM) of the brain [32]. MS affects nearly 2.5 million people worldwide, especially young adults (ages 18-35 years) [4,30,31].
The detection, identification, classification, and diagnosis of stroke is often based on clinical decisions made using computed tomography (CT) and MRI [33]. Using MRI, it is possible to detect the presence of small infarcts and to assess a stroke lesion in both the superficial and deep regions of the brain more accurately, because even small stroke regions are clearly visible in MR images compared to CT [4,21,25,26,28,34]. The delimitation of the lesion area plays a fundamental role in the diagnosis, since it is possible to misdiagnose stroke as other disorders [35,36], e.g., glioma lesions and demyelinating diseases [19,20].
For neurological disorders like stroke and demyelinating disease, manual segmentation and delineation of anomalous brain tissue is the gold standard for lesion identification. However, this method is very time consuming and depends on specialist experience [25,37]. Because of these limitations, automatic detection of neurological disorders is necessary, even though it is a complex task due to data variability; in the case of ischemic stroke lesions, for example, variability includes the lesion shape and location, as well as factors like symptom onset, occlusion site, and patient differences [38].
This critical review summarizes the literature on deep learning and machine learning techniques in the processing, segmentation, and detection of features of WMHs found in ischemic and demyelinating diseases in brain MR images.
The principal research questions asked here are:
- Why is research on algorithms to identify ischemia and demyelination through the processing of medical images important?
- What are the techniques and methods used in developing automatic algorithms for the detection of ischemia and demyelinating diseases in the brain?
- What are the performance metrics and common problems of the deep learning systems proposed to date?
This paper is organized as follows. Section 2 outlines the literature review selection criteria. Section 3 describes the principal machine learning and deep learning methods used in this application. Section 4 summarizes the principal constraints and common problems encountered in these CAD systems, and Section 5 concludes with a brief discussion.

The Literature Review: Selection Criteria
The literature review was conducted using the recommendations given by Khan et al. [52], the methodology proposed by Torres-Carrión [53,54], and the protocol proposed by Moher et al. [55]. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram [55] is shown in Figure 2. We generated and analyzed bibliometric maps and identified clusters and their reference networks [56,57]. We also used the methods given in [58,59] to identify the strength of the research, as well as authors and principal research centers that work with MR images and machine/deep learning for the identification of brain diseases.
To conduct an appropriate search, it is important to focus on the real context of the research. For this, we used the method proposed by Torres-Carrión [54], the so-called conceptual mindfact (mentefacto conceptual), which organizes the scientific thesaurus of the research theme [53]. Figure 3 describes the conceptual mindfact used in this work to focus and constrain the topic to "MRI Brain Algorithm Difference Ischemic and Demyelinating Diseases" and to obtain an adequate semantic search structure for the literature in the relevant scientific databases.

Figure 3. Conceptual mindfact (mentefacto conceptual) according to [53,54]. This allows keyword identification for a systematic search of the literature in scientific databases.

Table 1 presents the semantic search structure [54] used as the input for searching specific literature (documents) in the scientific databases. The first layer is an abstraction of the conceptual mindfact; the second corresponds to the specific technicality, namely brain processing; the third level is relevant to the application, namely ischemic and demyelinating diseases; and the fourth level is the global semantic structure search.

Table 1. Key words used in the global semantic structure search.
The global semantic structure search (Figure 2) resulted in 140 documents related to the central theme of this work. Figure 4 shows the evolution in the number and type (article, conference paper, and review) of the 140 documents published from 2001 to December 2020. The first article related to the area of study was published in 2001, and there has been a significant increase in the number of publications in the past three years: 2018 (21), 2019 (30), and 2020 (33, until 1 December). Figure 4 also shows that journal articles predominate (99), followed by conference papers (27) and, finally, review articles (9). The first reviews were published in 2012 (2), followed by 2013 (1), 2014 (1), 2015 (1), and 2020 (4). Five other documents correspond to conference reviews (3), an editorial (1), and a book chapter (1). It can be seen that Dr. Ona Wu [11,66,67] from Harvard Medical School, Boston, United States, has published the most documents (7) related to the research area of this review, which correlates with her publication record in the Scopus database related to ischemic stroke. To analyze and answer the three central research questions of this work, the global search of the 140 documents was further refined. This filter complied with the categories given by Fourcade and Khonsari [68], which were applied only to "article" documents.
These criteria included the aim of the study: the processing of ischemia and demyelinating disease in MRI brain images, including identification, detection, classification, or differentiation. According to the second selection criterion, we found 38 documents to include in the analysis of this work that were related to and in agreement with the items described above.
For analysis, we used VOSviewer software version 1.6.15 [69] in order to construct and display bibliometric maps. The data used for this objective were obtained from Scopus due to its coverage of a wider range of journals [56,70].
In terms of citations and the countries of origin of these publications (Figure 6), we observed that the United States has the largest number of citations, followed by Germany, India, and the United Kingdom. This relationship was determined by analyzing the number of citations of the documents generated by each country, according to the affiliation of the first author, together with each country's total citation link strength [58]. The minimum number of documents for any individual country was five, and the minimum number of citations a country received was one. In the map, the density of the yellow color in each country indicates the number of citations. Figure 7 shows the network of documents and citations; this map relates the different connections between the documents through their citations. The citation map between documents was generated in VOSviewer [57]. The scale of colors (purple to yellow) indicates the number of citations received per document, together with the year of publication, and the diameter of the points shows the normalization of the citations according to Van Eck and Waltman [59,71]. The purple points are documents with fewer than 10 citations, and the yellow points represent documents with more than 60 citations.
In Table 2, we list the 10 most cited articles according to the normalization of the citations [71]. Waltman et al. [58] state that "the normalization corrects for the fact that older documents have had more time to receive citations than more recent documents" [58,69]. In addition, Table 2 shows the dataset, methodology, techniques, and metrics used to develop and validate the algorithm or CAD systems proposed by these authors.
In bibliometric networks or science mapping, there are large differences between nodes in the number of edges they have to other nodes [57]. To reduce these differences, VOSviewer uses association strength normalization [71], that is, a probabilistic measure of co-occurrence data.
Association strength normalization was discussed by Van Eck and Waltman [71]; here, we construct a normalized network [57] in which the weight of an edge between nodes i and j is given by:

s_ij = (2 m w_ij) / (w_i w_j),

where s_ij is also known as the similarity of nodes i and j, w_i (w_j) denotes the total weight of all edges of node i (node j), and m denotes the total weight of all edges in the network [57].
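The normalization above can be sketched in a few lines of NumPy. This is a minimal illustration on a toy co-occurrence matrix; the function name and the toy data are ours, not part of VOSviewer.

```python
import numpy as np

def association_strength(W):
    """Normalize a symmetric co-occurrence matrix W using the
    association strength measure s_ij = 2*m*w_ij / (w_i * w_j)."""
    W = np.asarray(W, dtype=float)
    node_weight = W.sum(axis=1)      # w_i: total weight of all edges of node i
    m = W.sum() / 2.0                # m: total weight of all edges in the network
    denom = np.outer(node_weight, node_weight)  # w_i * w_j for every pair (i, j)
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.where(denom > 0, 2.0 * m * W / denom, 0.0)
    return S

# Toy co-citation network of three documents
W = np.array([[0, 2, 1],
              [2, 0, 0],
              [1, 0, 0]])
S = association_strength(W)
```

For this toy network, m = 3 and the node weights are (3, 2, 1), so, for instance, s_01 = 2*3*2/(3*2) = 2.0.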
For more information related to normalization, mapping, and clustering techniques used by VOSviewer, the reader is referred to the relevant literature [57,69,71].
From Table 2, it can be seen that the most-cited articles deal with ischemic stroke more often than demyelinating disease. The methods and techniques used were support vector machines (SVMs) [72], random forests (RFs) [38], classical segmentation algorithms like the watershed (WS) algorithm [73], and deep learning techniques such as convolutional neural networks (CNNs) [42,74], as well as their combinations: SVM-RF [28] and CNN-RF [26,75].

Table 2. List of the 10 most cited articles according to the normalization of the citations [58]. This table also shows the central theme of research, the type of image, and the methodology used in the processing.

Machine Learning/Deep Learning Methods in the Diagnosis of Ischemic Stroke and Demyelinating Disease
In the following subsections, we discuss how artificial intelligence (AI) through ML and DL methods is used in the development of algorithms for brain disease diagnosis and their relation to the central theme of this review.

Machine Learning and Deep Learning
Machine learning and deep learning are sub-fields of artificial intelligence (AI). AI is defined as the ability of a computer to imitate the cognitive abilities of a human being [68]. There are two general concepts of AI: (1) cognitivism, related to the development of rule-based programs referred to as expert systems, and (2) connectionism, associated with the development of simple programs educated or trained by data [68,81]. Figure 8 presents a very general timeline of the evolution of AI and the principal facts relevant to the field of medicine. Note that not all applications of AI to medicine and health are covered, e.g., ophthalmology, where AI has had tremendous success (see [82][83][84][85][86][87]).

Machine Learning Methods
Machine learning (ML) can be considered a subfield of artificial intelligence (AI). Lundervold and Lundervold [16] and Noguerol et al. [90] state that the main aim of ML is to develop mathematical models and computational algorithms able to solve problems by learning from experience with little or no human intervention; in other words, the model can be trained to produce useful outputs when fed input data [90]. Lakhani et al. [91] state that recent studies demonstrate that machine learning algorithms give accurate results in determining study protocols for both brain and body MRI.
Machine learning can be classified into (1) supervised learning methods (e.g., support vector machine, decision tree, logistic regression, linear regression, naive Bayes, and random forest) and (2) unsupervised learning methods (k-means, mean shift, affinity propagation, hierarchical clustering, and Gaussian mixture modeling) [92] (Figure 9). Support vector machine (SVM): This is an algorithm used for classification, regression, and clustering. An SVM is driven by a linear function similar to logistic regression [93], with the difference that the SVM only outputs class identities and does not provide probabilities. An SVM classifies between two classes by constructing a hyperplane in a high-dimensional feature space [94], with a decision function of the form

f(x) = w^T x + b, (3)

where the class identity is positive or negative when Equation (3) is positive or negative, respectively. For the optimal separation of the hyperplane between classes, the SVM uses different kernels (dot products) [95,96]. More information and details of SVMs are given in the literature [93][94][95][96].
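The SVM behavior described above (a class identity from the sign of a linear decision function, with no probabilities by default) can be sketched with scikit-learn's `SVC`, assuming that library is available. The feature vectors below are invented purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two toy feature vectors per class (e.g., mean intensity and a texture score
# of a candidate region); the values are illustrative only.
X = np.array([[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])  # 0 = non-lesion, 1 = lesion

clf = SVC(kernel="linear")  # the kernel choice shapes the separating hyperplane
clf.fit(X, y)

# The classifier outputs a class identity, not a probability:
pred = clf.predict([[0.85, 0.85]])
# The sign of the decision function w^T x + b determines the class:
score = clf.decision_function([[0.85, 0.85]])
```

With these clearly separable points, the test sample falls on the positive side of the hyperplane and is assigned to class 1.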
k-Nearest neighbor (k-NN): The k-NN is a non-parametric algorithm (i.e., it makes no assumptions about the underlying data distribution) and can be used for classification or regression [93,97]. Given N training vectors, the k-NN is based on a Euclidean distance measure (distance function) and a voting function over the k nearest neighbors [98]. The value of k (the number of nearest neighbors) decides the classification of points between classes. The k-NN has the following basic steps: (1) calculate the distances, (2) find the closest neighbors, and (3) vote for the labels [97]. More details of the k-NN algorithm can be found in references [93,98,99]. Programming libraries such as Scikit-Learn implement the k-NN [97]. The k-NN has high accuracy and stability for MRI data but is relatively slow in terms of computational time [99]. As an aside, it is interesting to note that the nearest-neighbor formulation may have first been described by the Islamic polymath Ibn al-Haytham in his famous book Kitab al-Manazir (The Book of Optics, [100]) over 1000 years ago.
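The three basic steps above can be sketched directly in NumPy. This is a toy illustration with made-up training vectors, not code from any of the reviewed systems.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by (1) computing Euclidean distances to all training
    vectors, (2) finding the k closest neighbors, (3) voting on labels."""
    dists = np.linalg.norm(X_train - x, axis=1)     # step 1: distances
    neighbors = np.argsort(dists)[:k]               # step 2: k nearest
    votes = Counter(y_train[i] for i in neighbors)  # step 3: majority vote
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
label = knn_predict(X_train, y_train, np.array([0.95, 0.95]), k=3)
```

With k = 3, two of the three nearest neighbors of the query carry label 1, so the vote assigns it to class 1.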
Random forest (RF): This technique is a collection of classification and regression trees [101]. Here, a forest of classification trees is generated, where each tree is grown on a bootstrap sample of the data [102]. In that way, the RF classifier consists of a collection of binary classifiers where each decision tree casts a unit vote for the most popular class label (see Figure 9d) [103]. More information is given elsewhere [104].
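The bootstrap-and-vote behavior described above can be illustrated with scikit-learn's `RandomForestClassifier`, assuming that library is available; the toy features are invented for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature vectors (e.g., intensity statistics of candidate regions)
X = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
              [0.80, 0.90], [0.90, 0.85], [0.85, 0.80]])
y = np.array([0, 0, 0, 1, 1, 1])

# Each of the 50 trees is grown on a bootstrap sample of the data and casts
# a unit vote; the predicted label is the most popular class.
forest = RandomForestClassifier(n_estimators=50, bootstrap=True, random_state=0)
forest.fit(X, y)
pred = forest.predict([[0.88, 0.90]])
```

On this cleanly separable toy set, essentially every tree votes for class 1 on the test sample.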

k-Means clustering (k-means):
The k-means clustering algorithm is used for segmentation in medical imaging due to its relatively low computational complexity [105,106] and minimum computation time [107]. It is an unsupervised algorithm based on the concept of clustering. Clustering is a technique of grouping pixels of an image according to their intensity values [108,109]. It divides the training set into k different clusters of examples that are near each other [93]. The properties of the clustering are measures such as the average Euclidean distance from a cluster centroid to the members of the cluster [93]. The input data for use with this algorithm should be numeric values, with continuous values being better than discrete values, and the algorithm performs well when used with unlabeled datasets.
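The grouping of pixels by intensity can be sketched with a minimal one-dimensional k-means in NumPy. The intensity values below are synthetic; real medical-image pipelines operate on full 2D/3D volumes.

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Cluster pixel intensity values into k groups by alternating
    (1) assignment to the nearest centroid and (2) centroid update."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False).astype(float)
    for _ in range(iters):
        # assign each value to its nearest centroid
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # recompute each centroid as the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return labels, centroids

# Synthetic slice intensities: dark background near 20, bright lesion near 200
pixels = np.array([18, 22, 19, 21, 198, 202, 200, 199], dtype=float)
labels, centroids = kmeans_1d(pixels, k=2)
```

The two recovered centroids settle near the background mean (20) and the lesion mean (199.75), separating the two pixel groups.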

Deep Learning Methods
Deep learning (DL) is a subfield of ML [110] that uses artificial neural networks (ANNs) to develop decision-making algorithms [90]. Artificial neural networks employ learning algorithms [111] and infer rules for learning; to do so, a set of training examples is needed. The idea is derived from the concept of the biological neuron (Figure 9e). An artificial neuron receives inputs from other neurons, integrates the inputs with weights, and activates (or "fires", in the language of biology) when a pre-defined condition is satisfied [92]. There are many books describing ANNs; see, for example, [93].

Figure 9. Graphical representation of some machine learning (ML) algorithms and the representations of an artificial neural network (ANN) and a DL neural network: (a) corresponds to the k-nearest neighbor (k-NN) algorithm with a representation of k = 5 (the number of nearest neighbors); (b) represents the k-means clustering algorithm with k = 2 clusters, with the blue circle representing the cluster centroid; (c) represents the support vector machine (SVM) algorithm with the optimal separation of classes by a hyperplane; (d) corresponds to a random forest (RF) algorithm and represents a forest of classification trees; and finally, (e) represents the similarity between the concepts of an artificial neuron and a true neuron with inputs and outputs. It also shows the architecture of an ANN and a DL neural network, where IL is the input layer, HL the hidden layers, and OL the output layer. This figure was created and adapted using references [112][113][114].
The fundamental unit of a neural network is the neuron, which has a bias w0 and a weight vector w = (w1, ..., wn) as parameters θ = (w0, ..., wn) to model a decision using a nonlinear activation function h(x) [115]. The activation functions commonly used are sign(x), the sigmoid σ(x) = 1/(1 + e^(-x)), and tanh(x).

An interconnected group of nodes comprises the ANN, with each node representing a neuron arranged in layers [16] and each arrow representing a connection from the output of one neuron to the input of another [103]. ANNs have an input layer, which receives the observed values, and an output layer, which represents the target (a value or class); the layers between the input and output layers are called hidden layers [92].
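The neuron model above can be sketched directly: a weighted sum of the inputs plus the bias w0, passed through a sigmoid activation. The weights and inputs below are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(x, w, w0):
    """A single artificial neuron: weighted sum of inputs plus bias w0,
    passed through the nonlinear activation h(x)."""
    return sigmoid(np.dot(w, x) + w0)

x = np.array([0.5, -0.2])   # inputs received from other neurons
w = np.array([1.0, 2.0])    # weight vector w = (w1, ..., wn)
w0 = 0.1                    # bias
out = neuron(x, w, w0)      # the neuron "fires" strongly for large sums
```

Here the weighted sum is 0.5 - 0.4 + 0.1 = 0.2, so the output is σ(0.2) ≈ 0.55.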
There are different types of ANNs [116]; the most common types are convolutional neural networks (CNNs) [117], recurrent neural networks (RNNs) [118], long short-term memory (LSTM) networks [119], and generative adversarial networks (GANs) [120]. In practice, these types of networks can be combined [116] with each other and with classical machine learning algorithms. CNNs are most commonly used for the processing of medical images because of their success in the processing and recognition of patterns in vision systems [49].
CNNs are inspired by the biological visual cortex and are also called multi-layer perceptrons (MLPs) [49,121,122]. An MLP consists of a stack of layers: convolutional, max pooling, and fully connected layers. Each intermediate layer is fed by the output of the previous layer; e.g., the convolutional layers create feature maps of different sizes, and the pooling layers reduce the sizes of the feature maps to be fed to the following layers. The final fully connected layers produce the specified class prediction at the output [49]. The general CNN architecture is presented in Figure 10. There is a compromise between the number of neurons in each layer, the connections between them, and the number of layers, which together define the parameters of the network [49]. Table 3 presents a summary of the principal CNN structures and the commonly used DL libraries.

- DenseNet (Dense Convolutional Network): connects each layer to every other layer in a feed-forward fashion.
- SENets (Squeeze-and-Excitation Networks) [137]: model the interdependencies between channels of the feature relationships used in a traditional CNN.
- NASNet (Neural Architecture Search Network) [138]: the authors propose searching for an architectural building block on a small dataset and then transferring the block to a larger dataset.
- YOLO (You Only Look Once) [139]: a unified model for object detection.
- GANs (Generative Adversarial Networks) [120]: a framework for estimating generative models via adversarial nets.
- Siamese nets (Siamese Networks) [140]: a class of neural network architectures that contain two or more identical subnetworks; "identical" here means that they have the same configuration with the same parameters and weights.
- U-Net (Convolutional Networks for Biomedical Image Segmentation) [141]: the architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
- PyTorch (DL library): a Python front end to the Torch computational engine, integrating Python with Torch; its performance is higher than Torch, with a GPU integration facility.
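The two core CNN operations described above, convolution (producing feature maps) and max pooling (reducing their size), can be illustrated with a minimal pure-NumPy sketch. The toy image and kernel are ours; real CNNs learn the kernel weights during training and use optimized library implementations.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding): produces a feature map
    smaller than the input, as in a convolutional layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Max pooling: shrink the feature map by keeping the maximum of
    each non-overlapping size x size window."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i * size:(i + 1) * size,
                             j * size:(j + 1) * size].max()
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((2, 2)) / 4.0                    # simple 2x2 averaging filter
fmap = conv2d(image, kernel)                      # 4x4 feature map
pooled = max_pool(fmap, size=2)                   # 2x2 map after pooling
```

Note that, as in most DL libraries, this "convolution" is really a cross-correlation (the kernel is not flipped), which makes no difference for a symmetric kernel.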

Computer-Aided Diagnosis in Medical Imaging (CADx System)
Computer-aided diagnosis has its origins in the 1980s at the Kurt Rossmann Laboratories for Radiologic Image Research in the Department of Radiology at the University of Chicago [153]. The initial work was on the detection of breast cancer [35,153,154], and the reader is referred to a recent review [155].
There has been much research and development of CADx systems using different modalities of medical images. CAD is not a substitute for the specialist but can assist or be an adjunct to the specialist in the interpretation of the images [40]. In other words, CADx systems can provide a second objective opinion [89,99] to support the final disease decision made from image-based information and the discrimination of lesions, complementing a radiologist's assessment [123].
CAD development takes into consideration the principles of radiomics [45,[156][157][158][159][160]. The term radiomics is defined as the extraction and analysis of quantitative features of medical images; in other words, the conversion of medical images into mineable data with high fidelity and high throughput for decision support [45,156,157]. The medical images used in radiomics are obtained principally with CT, PET, or MRI [45].
The steps utilized by a CAD system consist of [45]: (a) image data and preprocessing, (b) image segmentation, (c) feature extraction and quantification, and (d) classification. In general, the feature extraction stage may change depending on the techniques used to extract the features (ML or DL algorithms) [161].
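The four stages (a)-(d) can be sketched as a toy pipeline. Every function here is a simplified placeholder of our own invention, not an implementation from any of the reviewed systems; a real CAD system would substitute trained ML/DL models for these rules.

```python
import numpy as np

def preprocess(image):
    """(a) Normalize intensities to [0, 1] (stand-in for bias-field
    correction, registration, skull stripping, etc.)."""
    img = image.astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def segment(image, threshold=0.5):
    """(b) Separate candidate lesion pixels from background."""
    return image > threshold

def extract_features(image, mask):
    """(c) Quantify the segmented region (here: size and mean intensity)."""
    return np.array([mask.sum(), image[mask].mean() if mask.any() else 0.0])

def classify(features, min_size=3):
    """(d) A trivial rule-based classifier standing in for an ML/DL model."""
    return "lesion" if features[0] >= min_size else "no lesion"

# Toy 3x3 "scan" with a bright region in the lower-right corner
scan = np.array([[10, 12, 11], [10, 250, 240], [11, 245, 248]], dtype=float)
norm = preprocess(scan)
mask = segment(norm)
label = classify(extract_features(norm, mask))
```

The bright four-pixel region survives thresholding and, being larger than `min_size`, is labeled as a lesion.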

Image Data
The dataset is the principal component in developing an algorithm because it is the nucleus of the processing. Razzak et al. [145] state that the accuracy of the diagnosis of a disease depends upon image acquisition and image interpretation. However, Shen et al. [121] add the caveat that image features obtained from one method are not guaranteed to hold for images acquired using different equipment [121,162,163]. For example, it has been shown that methods of image segmentation and registration designed for 1.5-Tesla T1-weighted brain MR images are not applicable to 7.0-Tesla T1-weighted MR images [43,57,58].
There are different datasets of images for brain medical image processing. In the case of stroke, the most famous datasets used are the Ischemic Stroke Lesion Segmentation (ISLES) [26,75] and Anatomical Tracings of Lesions After Stroke (ATLAS) datasets [164]. For demyelinating disease, there is not a specific dataset, but datasets for multiple sclerosis are often used, e.g., MS segmentation (MSSEG) [165]. Table 4 lists the datasets that have been used in the publications under consideration in this review.

Image Preprocessing
There are several preprocessing steps necessary to reduce noise and artifacts in the medical images, which should be performed before the segmentation [40,166,167].
The preprocessing steps commonly used are (1) grayscale conversion and image resizing [167] to get better contrast and enhancement; (2) bias field correction to correct the intensity inhomogeneity [30,166]; (3) image registration, a process for spatial alignment [166]; and (4) removal of nonbrain tissue such as fat, skull, or neck, which has intensities overlapping with intensities of brain tissues [27,166,168].
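A few of these steps can be sketched in NumPy. The functions below are simplified stand-ins of our own; production pipelines use dedicated neuroimaging tools for bias-field correction, registration, and skull stripping.

```python
import numpy as np

def to_grayscale(rgb):
    """(1) Grayscale conversion using the common luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, new_h, new_w):
    """(1) Nearest-neighbor resizing to a fixed input size."""
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[np.ix_(rows, cols)]

def strip_background(img, threshold):
    """(4) Crude non-brain removal: zero out voxels below a threshold
    (real pipelines use dedicated brain-extraction methods)."""
    out = img.copy()
    out[out < threshold] = 0.0
    return out

# Toy 4x4 RGB slice with a uniform color
rgb = np.ones((4, 4, 3)) * np.array([100.0, 150.0, 200.0])
gray = to_grayscale(rgb)             # 4x4 grayscale slice
small = resize_nearest(gray, 2, 2)   # downsampled to 2x2
brain = strip_background(gray, threshold=50.0)
```

Each step maps cleanly onto one of the numbered preprocessing operations; bias-field correction and registration are intentionally omitted because they require model fitting and spatial transforms beyond a short sketch.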

Image Segmentation
In simple terms, image segmentation is the procedure of partitioning a digital image into different sets of pixels [37]; it is considered the most fundamental process, as it extracts the region of interest (ROI) through a semiautomatic or automatic process [176]. It divides the image into areas according to a specific description in order to obtain the anatomical structures and patterns of disease. Despotović et al. [166] and Merjulah and Chandra [37] indicate that the principal goal of medical image segmentation is to simplify the image and transform it "into a set of semantically meaningful, homogeneous, and nonoverlapping regions of similar attributes such as intensity, depth, color, or texture" [166], because segmentation assists doctors in diagnosis and decision making [37].
To evaluate, validate, and measure the performance of every automated lesion segmentation methodology compared to expert segmentation [177], one needs to consider the accuracy (evaluation measurements) and reproducibility of the model [178]. The evaluation measurements compare the output of segmentation algorithms with the ground truth on either a pixel-wise or a volume-wise basis [5].
The accuracy is related to the grade of closeness of the estimated measure to the true measure [178], and for that, four situations are possible: true positives (TPs) and true negatives (TNs), where the segmentation is correct, and false positives (FPs) and false negatives (FNs), where there is disagreement between the two segmentations.
The most commonly used metrics to evaluate the automatic segmentation accuracy, quality, and strength of the model are [179]:
- Precision: a measure between 0 and 1 of over-segmentation, defined as the proportion of the computed segmentation that overlaps with the reference segmentation, precision = TP/(TP + FP) [179,180]. It is also called the positive predictive value (PPV); a high PPV indicates that a patient identified with a lesion does actually have the lesion [182].
- Recall, also known as sensitivity: a metric between 0 and 1 indicating under-segmentation, measuring the proportion of the reference segmentation that overlaps with the computed segmentation, recall = TP/(TP + FN) [179,180].
Overlap measures that are used less often are the sensitivity, specificity (which measures the portion of negative voxels in the ground-truth segmentation [183]), and accuracy, defined as sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and accuracy = (TP + TN)/(TP + TN + FP + FN). According to García-Lorenzo et al. [178] and Taha and Hanbury [183], these measures should be considered carefully because they penalize errors in small segments more than in large segments. The distance-based measures are defined as follows:
- Average symmetric surface distance (ASSD, mm): represents the average surface distance between the two segmentations (computed to reference and vice versa) and indicates how well the boundaries of the two segmentations align. The ASSD is measured in millimeters, and a smaller value indicates higher accuracy [75,177,180]. With d(x, Y) denoting the minimum Euclidean distance from a surface point x of volume X to the surface of volume Y, the average surface distance (ASD) from X to Y is

ASD(X, Y) = (1/|X|) Σ_{x∈X} d(x, Y),

and the ASSD is defined as the average of the two directed distances [177]:

ASSD(X, Y) = [ASD(X, Y) + ASD(Y, X)] / 2.

- Hausdorff distance (HD, mm): more sensitive to segmentation errors appearing away from the segmentation frontiers than the ASSD [180]. The Hausdorff measure indicates the maximal distance between the surfaces of the two image volumes (the computed and reference segmentations) [26,180]:

HD(X, Y) = max{ max_{x∈X} d(x, Y), max_{y∈Y} d(y, X) }.

The HD is measured in millimeters and, like the ASSD, a smaller value indicates higher accuracy [177].
where and are points of lesion segmentations and , respectively, and , is a 3D matrix consisting of all Euclidean distances between these points [177].  Intra-class correlation (ICC): Is a measure of correlation between volumes segmented and ground-truth lesion volume [180].  Correlation with Fazekas score: A Fazekas score is a clinical measure of the WMH, comprising two integers in the range [0, 3] reflecting the degree of a periventricular WMH and a deep WMH, respectively [180].  Relative volume difference (VD, %): It measures the agreement between the lesion volume and the ground-truth lesion volume. A low VD means more agreement [182]. (16) where and are segmented and ground-truth lesion volumes, respectively. Lastly, we define [178] reproducibility, which is a measure of the degree of agreement between several identical experiments. Reproducibility guarantees that differences in segmentations as a function of time result from changes in the pathology and not from the variability of the automatic method [178]. Tables 2 and 5 tabulate databases, modalities, and the evaluation measures reported in the literature. CBF: cerebral blood flow; ADC: apparent diffusion coefficient; SVD: small-vessel disease; TACS: total anterior circulation stroke syndrome; PACS: partial anterior circulation stroke syndrome; LACS: lacunar stroke syndrome; SISS: sub-acute ischemic stroke segmentation; SPES: stroke penumbra estimation; FROC: free-response receiving operating characteristic; COMBAT Stroke: computer-based decision support system for Thrombolysis in stroke.
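To make the definitions above concrete, the following sketch computes precision, recall, DSC, ASSD, HD, and VD on toy inputs. It is an illustrative simplification, not code from the reviewed studies: it assumes binary NumPy masks for the overlap metrics, arrays of surface-point coordinates (in mm) given directly as point sets for the distance metrics, and scalar volumes for VD.

```python
import numpy as np

def overlap_metrics(pred, ref):
    """Precision, recall, and Dice (DSC) for two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()    # true positives
    fp = np.logical_and(pred, ~ref).sum()   # false positives
    fn = np.logical_and(~pred, ref).sum()   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dsc = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, dsc

def _nearest(A, B):
    """For each surface point in A, distance to the nearest point in B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).min(axis=1)

def assd(A, B):
    """Average symmetric surface distance between two point sets (mm)."""
    return 0.5 * (_nearest(A, B).mean() + _nearest(B, A).mean())

def hausdorff(A, B):
    """Hausdorff distance: maximal nearest-neighbour distance, either direction (mm)."""
    return max(_nearest(A, B).max(), _nearest(B, A).max())

def volume_difference(v_seg, v_gt):
    """Relative volume difference (%) between segmented and ground-truth volumes."""
    return abs(v_seg - v_gt) / v_gt * 100.0

# toy example: 2x2 masks and two "surfaces" 1 mm apart
ref = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [1, 0]])       # one TP, one FP, one FN
p, r, d = overlap_metrics(pred, ref)    # each 0.5

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
print(p, r, d, assd(A, B), hausdorff(A, B), volume_difference(90, 100))
```

In practice, surface points would be extracted from the boundary voxels of the segmentation masks, and coordinates would be scaled by the voxel spacing before computing the distances.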

Feature Extraction
An ML or DL algorithm is often a classifier [148] of objects (e.g., lesions in medical images). Feature selection is a fundamental step in medical image processing: it allows us to investigate which features are relevant to the specific classification problem of interest, and it also helps achieve higher accuracy rates [47].
Feature extraction is complex because it requires determining an algorithm that can extract a distinctive and complete feature representation; for that principal reason, it is very difficult to generalize, and a featurization method must be designed for every new application [115]. In the DL literature, this manual process is called hand-crafting features [115].
In classical ML, classification is based on extracted features that are entered as input to the model [148], while a DL model uses the pixel values of images directly as input instead of features calculated from segmented objects [148].
In the case of processing stroke images with CNNs, featurization is a key step [75,124] and depends on the signal-to-noise ratio of the image, which can be improved by target identification via segmentation to select regions of interest [124]. According to Praveen et al. [193], a CNN learns to discriminate local features and returns better performance than hand-crafted features.
Texture analysis is a common technique for determining features in medical pattern-recognition tasks, using second-order statistics or co-occurrence matrix features [45]. Mitra et al. [182] indicate that they derive local features, spatial features, and context-rich features from the input MRI channels.
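As a minimal illustration of the second-order (co-occurrence) statistics mentioned above, the following sketch builds a grey-level co-occurrence matrix (GLCM) for a single pixel offset and derives two common Haralick-style texture features from it. The toy image, the offset, and the choice of features are illustrative assumptions, not taken from the reviewed studies.

```python
import numpy as np

def glcm(img, levels, offset=(0, 1)):
    """Grey-level co-occurrence matrix for one pixel offset (second-order statistics)."""
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    # count how often grey level i co-occurs with grey level j at the given offset
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            P[img[r, c], img[r + dr, c + dc]] += 1
    P /= P.sum()  # normalise to joint probabilities
    return P

def texture_features(P):
    """Haralick-style contrast and energy from a normalised GLCM."""
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()   # local intensity variation
    energy = (P ** 2).sum()               # uniformity of the texture
    return contrast, energy

img = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 2]])
P = glcm(img, levels=3)
contrast, energy = texture_features(P)
print(contrast, energy)
```

Libraries such as scikit-image provide optimized GLCM implementations; the loop above is written out only to make the definition explicit.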
It is clear that, currently, DL algorithms, especially those that combine CNNs with machine learning classifiers, are producing a marked transformation [197] in featurization and segmentation in medical image processing [16,124]. CNNs are highly useful for tasks such as identifying compositional hierarchies of features and low-level features (e.g., edges), recognizing specific pattern forms, and developing intrinsic structures (e.g., shapes, textures) [5], as well as generating spatial features from an n-dimensional array of essentially arbitrary size [43,144]. An example is the U-Net model proposed by Ronneberger et al. [141], which employs parameter sharing between the encoder and decoder paths to incorporate spatial and semantic information, allowing better segmentation performance [179]. Based on the U-Net model, there are now novel variants of the design; for example, Bamba et al. [198] used a U-Net architecture with 3D convolutions and an attention gate in the decoder to suppress unimportant parts of the input while emphasizing the relevant features. There is considerable room for improvement and innovation (e.g., [199]).
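The U-Net idea described above (a contracting encoder path, an expanding decoder path, and skip connections that concatenate encoder features onto the decoder) can be sketched at the level of array shapes. This is an illustrative NumPy sketch, not the original implementation: mean pooling and nearest-neighbour upsampling stand in for the learned convolutions.

```python
import numpy as np

def pool2x2(x):
    """2x2 mean pooling over (H, W, C): the encoder's downsampling step."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbour 2x upsampling: the decoder's expanding step."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_concat(decoder_feat, encoder_feat):
    """U-Net skip connection: concatenate encoder features onto the decoder path,
    combining semantic (deep) and spatial (shallow) information."""
    return np.concatenate([decoder_feat, encoder_feat], axis=-1)

x = np.random.rand(8, 8, 4)   # toy (H, W, C) feature map
enc = pool2x2(x)              # (4, 4, 4): coarser, more semantic
dec = upsample2x2(enc)        # (8, 8, 4): restored resolution
merged = skip_concat(dec, x)  # (8, 8, 8): decoder + skip features
print(merged.shape)
```

In a real U-Net, each step also applies learned 3x3 convolutions and nonlinearities; the sketch only traces how the skip connection restores spatial detail lost by downsampling.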
The process of converting a raw signal into a predictor (automatization of the featurization) constitutes an advantage of the DL methods over others, which is useful when there are large volumes of data of uncertain relationship to an outcome [124], e.g., the featurization of acute stroke and demyelinating diseases.

ML and DL Classifiers Applied to Diagnosis of Ischemia and Demyelinating Diseases
In this subsection, we discuss the different classifiers that have been utilized in the literature. Additional details such as datasets and the measure metrics of the algorithms and the tasks are presented in Tables 2 and 5.
Different studies [21,72,92,189] related to stroke (see Table 5 and Figure 1) and its different types principally use ML classifiers to determine the properties of the lesion. The most commonly used classifiers are the SVM and random forest (RF) [189].
According to Lee et al. [189], the RF has some advantages over the SVM because the RF can be trained quickly and provides insight into the features that can predict the target outcome [189]; in addition, the RF can automatically perform the task of feature selection and provide a reliable feature importance estimate. Additionally, the SVM is effective only in cases where the number of samples is small compared to the number of features [92,189]. Along similar lines, Subudhi et al. [28] reported that the RF algorithm works better when one has a large dataset, and it is more robust when there are a higher number of trees in the decision-making process; they reported an accuracy of 93.4% and a DSC index of 0.94 in their study.
Huang et al. [72] presented results that predict ischemic tissue fate pixel by pixel based on multi-modal MRI data of acute stroke using a flexible support vector machine algorithm [72]. Nazari-Farsani et al. [33] proposed identifying ischemic stroke through an SVM with a linear kernel and cross-validation folds, achieving an accuracy of 73% on a private dataset of 192 patient scans, while Qiu et al. [184], using a private dataset of 1000 patients for the same task, used only the random forest (RF) classifier and obtained an accuracy of 95%.
Combining traditional classifiers like the SVM and RF with a CNN shows better results; for example, [38,72,193] report DSC values between 0.80 and 0.86. Melingi and Vivekanand [167] reported that, through a combination of kernelized fuzzy C-means clustering and an SVM, they achieved an accuracy of 98.8% and a sensitivity of 99%.
A method for detecting the presence of stroke using SVM and feed-forward backpropagation neural network classifiers is presented in [21]. For feature extraction and segmentation of the stroke region, k-means clustering was used along with an adaptive neuro-fuzzy inference system (ANFIS) classifier, since the other two methods failed to detect the stroke region in low-edge brain images; this resulted in an accuracy of 99.8% and a precision of 97.3%.
The different developments in DL model architectures contribute to better evaluation and segmentation results. For example, Kumar et al. [179] proposed a combination of U-Net and fractal networks; fractal networks are based on the repetitive generation of self-similar objects and rule out residual connections [134,179]. They reported on sub-acute ischemic stroke lesion segmentation (SISS) and acute stroke penumbra estimation (SPES) using public databases (ISLES 2015, ISLES 2017), with an accuracy of 0.9908 and a DSC of 0.8993 for SPES, and an accuracy of 0.9914 and a DSC of 0.883 for SISS. Using the same public databases and tasks, Clèrigues et al. [190] proposed a U-Net-based 3D CNN architecture with 32 filters and obtained DSC values of 0.59 for SISS and 0.84 for SPES.
Multiple sclerosis (MS) is characterized by the presence of white matter (WM) lesions and constitutes the most common inflammatory demyelinating disease of the central nervous system [8,200,201]; for that reason, it is often confused with other pathologies, and the key to differentiation is the determination and characterization of the WMHs. Guerrero et al. [125], using a CNN with a u-shaped residual network architecture (uResNet) whose principal task was differentiating WMHs, found DSC values of 69.5 for WMHs and 40.0 for ischemic stroke.
In their lesion segmentation work, Mitra et al. [182] also presented differentiation of ischemic stroke and MS through the analysis of WMHs and reported a DSC of 0.60 while using only the classical RF classifier. Similar work by Ghafoorian et al. [19], but with the central aim of detecting WMHs that correspond to cerebral small-vessel disease (SVD), reported a sensitivity of 0.73 with 28 false positives using a combination of AdaBoost and RF algorithms.

Common Problems in Medical Image Processing for Ischemia and Demyelinating Brain Diseases
This section presents a brief summary of some common problems found in the processing of ischemia and demyelinating disease images.

The Dataset
The limited availability of large datasets is a major problem in medical imaging studies, and there are few datasets related to specific diseases [27]. This lack of datasets is a challenge, since deep learning methods require a large amount of data for training, testing, and validation [33].
Another major problem is that even though algorithms for ischemic stroke segmentation in MRI scans have been (and are) intensively researched, the reported results in general do not allow us to establish a comparative analysis due to the use of different databases (private and public) with different validation schemes [35,40].
The Ischemic Stroke Lesion Segmentation (ISLES) challenge was designed to facilitate the development of tools for the segmentation of stroke lesions [26,75,124]. The ISLES group [26,75] provides a set of stroke images, but the dataset needs to be enriched with clinical information (annotations) in order to obtain better performance with CNNs.
Another problem with the datasets is the need for accurately labeled data [43]. This lack of annotated data constitutes a major challenge for supervised ML algorithms [202], because the methods have to learn and train with limited annotated data, which in most cases contain weak annotations (sparse annotations, noisy annotations, or only image-level annotations) [197]. Therefore, collecting image data in a structured and systematic way is imperative [92], given the large databases that AI techniques require to function efficiently.
An example of a good practice of health data (images and health information) is exemplified by the UK Biobank [203], which has health data from half a million UK participants. The UK Biobank aims to create a large-scale biomedical database that can be accessed globally for public health research. However, the access depends on administrator approval and payment of a fee.
Other difficulties that accompany the labeling of the images in a dataset include a lack of collaboration between clinical specialists and academics, patient privacy issues, and, most importantly, the costly, time-consuming task of manual labeling of data by clinicians [40].
With CNNs, overfitting is a common problem due to the small size of the training data [150]; therefore, it is important to increase the size of the training data. One solution is data augmentation, which, according to [204], helps improve the generalization capabilities of deep neural networks and can be perceived as implicit regularization. For example, Tajbakhsh et al. [197,205] reported that the sensitivity of a model improved by 10% (from 62% to 72%) when the training dataset was increased from a quarter of its size to its full size. Various methods of data augmentation for medical images are reviewed in [206].
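As a concrete illustration of data augmentation, the sketch below applies simple label-preserving transformations (random flips, 90-degree rotations, and mild intensity jitter) to a toy 2D slice. The specific transformations and parameters are illustrative assumptions; the cited studies use a variety of schemes (see [206]).

```python
import numpy as np

def augment(slice2d, rng):
    """Simple label-preserving augmentations for a 2D image slice:
    random flips, 90-degree rotations, and +/-10% intensity scaling."""
    out = slice2d
    if rng.random() < 0.5:
        out = np.fliplr(out)                   # left-right flip
    if rng.random() < 0.5:
        out = np.flipud(out)                   # up-down flip
    out = np.rot90(out, k=rng.integers(0, 4))  # random 90-degree rotation
    return out * rng.uniform(0.9, 1.1)         # mild intensity jitter

rng = np.random.default_rng(0)
base = rng.random((16, 16))                    # toy "MR slice"
augmented = [augment(base, rng) for _ in range(8)]  # 8 extra training samples
print(len(augmented), augmented[0].shape)
```

For segmentation tasks, the same geometric transformation must be applied to the label mask (but not the intensity jitter), so that image and annotation stay aligned.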
However, in [192], it is suggested that cascaded CNN architectures are a practical solution for the problem of limited annotated data, and the proposed architecture tends to learn well from small sets of data [192].
An additional but no less important problem is the availability of equipment for collecting image data. Even though MRI is better than CT for stroke diagnosis [18], there is also the fact that in some developing countries, the availability of CT and MRI facilities is very limited and relatively expensive. This is coupled with a lack of suitably trained technical personnel and information [40]. Even in developed countries, there are disparities in the availability of equipment between urban and rural areas. These issues are discussed, for example, in a report published by the Organisation for Economic Co-operation and Development (OECD) [207].

Detection of Lesions
It is known that brain lesions, e.g., stroke lesions and tumors, have a high degree of variability [8,64]; hence, it is a hard and complex challenge to develop a system with great fidelity and precision. As an example, the lesion size and contrast affect the performance of the segmentation [18].
In the case of WMHs and their association with a disease like ischemic stroke, demyelinating disease, or any other disorder, the set of features describing their appearances and locations [19] plays a fundamental role in training and must be learned with minimal error by any model.

Computational Cost
In medical image processing, the computational cost is a fundamental factor, since ML algorithms often require a large amount of data to learn to provide useful answers [116], with correspondingly increased computational costs. Different studies [146,148,208] report that training neural networks that are efficient and make accurate predictions has a high computational cost (e.g., time, memory, and energy) [146]. This problem is often a limitation with CNNs due to the high dimensionality of the input data and the large number of training images required [148]. However, graphical processing units (GPUs) have proven to be flexible and efficient hardware for ML purposes [116]. GPUs are highly specialized processors for image processing, and general-purpose GPU (GPGPU) computing is a growing area that has become an essential part of many scientific computing applications. The basic architecture of a GPU differs considerably from that of a central processing unit (CPU): a GPU is optimized for high computational power and high throughput, whereas CPUs are designed for more general computing workloads. GPUs are less flexible, but they are designed to execute the same instructions in parallel. As noted earlier, neural networks are structured in a very uniform manner, such that at each layer of the network identical artificial neurons perform the same computation; the structure of a network is therefore highly suited to the kinds of computation that a GPU can perform efficiently. GPUs have additional advantages over CPUs, such as more computational units and higher memory bandwidth. Furthermore, in applications requiring image processing, GPU graphics-specific capabilities can be exploited to further speed up calculations. As noted by Greengard, "Graphical processing units have emerged as a major powerhouse in the computing world, unleashing huge advancements in deep learning and AI" [209,210].
Suzuki et al. [148,211] propose the use of a massive-training artificial neural network (MTANN) [212] instead of a CNN, because a CNN requires a huge number of training images (e.g., 1,000,000), while the MTANN requires a small number (e.g., 20) because of its simpler architecture. They note that, with a GPU implementation, an MTANN completes training in a few hours, whereas a deep CNN takes several days [148]; the time taken depends on the task as well as the processor speed.
It has been proposed that one can use small convolutional kernels in 3D CNNs [144]. This architecture seems to be more discriminative without increasing the computational cost and the number of trainable parameters in relation to the task of identification [76].
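The parameter-count argument can be illustrated with a quick back-of-the-envelope calculation: two stacked 3×3×3 convolutions cover the same 5×5×5 receptive field as a single large kernel while using fewer parameters. The channel width of 32 is an illustrative assumption, and bias terms are ignored.

```python
def conv3d_params(kernel, c_in, c_out):
    """Weights in one 3D convolution layer (bias terms ignored)."""
    return kernel ** 3 * c_in * c_out

C = 32  # illustrative channel width
stacked_3x3x3 = 2 * conv3d_params(3, C, C)  # two small kernels, 5^3 receptive field
single_5x5x5 = conv3d_params(5, C, C)       # one large kernel, same receptive field
print(stacked_3x3x3, single_5x5x5)          # 55296 vs 128000
```

The stacked small kernels also interpose an extra nonlinearity between the two layers, which is one reason such architectures can be more discriminative at no additional parameter cost.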

Discussion and Conclusions
The techniques of deep learning are going to play a major role in medical diagnosis in the future, and even with the high training cost, CNNs appear to have great potential and can serve as a preliminary step in the design and implementation of a CAD system [40].
However, brain lesions, especially WMHs, show significant variation in size, shape, intensity, and location, which makes their automatic and accurate segmentation challenging [197]. For example, even though stroke is considered easy for experienced neuroradiologists to recognize and differentiate from other WMHs, it can be a difficult task for general physicians, especially in rural areas or in developing countries where there are shortages of radiologists and neurologists; for that reason, it is important to employ computer-assisted methods as well as telemedicine [213,214]. Montemurro and Perrini [215] state that the current COVID-19 pandemic further underscores the importance of telemedicine in neurology and other areas of health care (e.g., ophthalmology [216]); telemedicine is no longer a futuristic concept and has become the new normal (see, for example, [217]). An example of its utility is the successful experience reported by Hong et al. [218], who detail how telemedicine during the COVID-19 pandemic provided rapid access to specialists who are unavailable in West China, a region with fewer economic resources and less healthcare infrastructure than the eastern part of the country [218]. It should be noted that telemedicine "was more a concept than a fully developed reality" [215], due principally to limitations such as a lack of financial resources, technological infrastructure, regulatory protocols, safety data, and trained personnel, as well as ethical questions [215,218,219]; these aspects are especially challenging in developing countries [220,221].
To identify stroke, according to Huang et al. [72], the SVM method provides better prediction and quantitative metrics compared to the ANN. In addition, they note that the SVM provides accurate prediction with a small sample size [72,222]. Feng et al. [124] indicate that the biggest barriers in applying deep learning techniques to medical data are the insufficiency of the large datasets that are needed to train deep neural networks (DNNs) [124].
In the ISLES 2015 [26] and ISLES 2016 [75] competitions, the best results were obtained for stroke lesion segmentation and outcome prediction using the classic machine learning models, specifically the random forest (RF), whereas in ISLES 2017 [75], the participants offered algorithms that use CNNs, but the overall performance was not much different from ISLES 2016. However, the ISLES team states that despite this, deep learning has the potential to influence clinical decision making for stroke lesion patients [75]. However, this is only in the research setting and has not been applied to a real clinical environment, in spite of the development of many CAD systems [116].
Although various models trained with small datasets report good results (DSC values > 0.90) in their classifications or segmentations (Table 4 [21,77,190]), Davatzikos [223] recommends avoiding methods trained with small datasets because of replicability and reproducibility issues [90,223]. Therefore, it is important to have multidisciplinary groups [90,111,224] involving representatives from the clinical, academic, and industrial communities in order to create efficient processes that can validate the algorithms and hence approve or refute the recommendations made by software [90]. Relatedly, algorithmic development has to take into consideration that real-life performance by clinicians differs from that of models.
However, other areas of medicine, for example, ophthalmology, have shown that certain classifiers approach clinician-level performance. Of further importance is the development of explainable AI methods, which have been applied in ophthalmology to correlate the areas of the image that the clinician uses to make decisions with those used by the algorithms to arrive at the result (i.e., the portions of the image that most heavily weight the neural connections) [83,225–227].
Thus, it is important to actively involve multidisciplinary communities in order to cross the "valley of death" [116], namely the lack of resources and expertise often encountered in translational research. This must take into account the fact that, currently, deep learning is a black box [49]: the inputs and outputs are known, but the inner representations are not well understood. This is being alleviated by the development of explainable AI [84].
Even though there have been remarkable advances, there are only a few methods that are able to handle the vast range of radiological presentations of subtle disease states. There is a tremendous need for large annotated clinical datasets, a problem that can be (partially) solved by data augmentation and by methods of transfer learning [228,229] used in the models principally with different CNN architectures.
Although processing diseases or tasks in medical images is not the same as processing general pictures of, say, dogs or cats, it is possible to take a set of generic features already learned by CNNs trained for one task and transfer them as input features to classifiers focused on other medical imaging tasks; for examples in medical imaging, see [230–233]. It is therefore important to keep in mind the point made by Bini [234]: like humans, the software is only as good as the data on which it is trained.
In summary, through our analysis of the reviewed literature, we can conclude the following:
 Although some of the developed models have good metrics, it is clear that not all inspire enough confidence to be applied in a real clinical environment, due to reproducibility and replicability issues.
 We noted diverse approaches to the detection and differentiation of WMHs, especially for ischemic stroke and demyelinating diseases like MS. These include methods such as support vector machines (SVMs), neural networks, decision trees, and linear discriminant analysis.
 A large annotated dataset is needed to train models and obtain better results. For that reason, it would be ideal if the scientific and medical communities could build a global repository of medical images, yielding models that are universally applicable and overcoming the fact that current models are developed for, and applicable to, only a specific population.
Finally, we can say that further research on deep learning techniques like CNNs, transfer learning, and data augmentation can help improve the efficiency of CAD systems. In addition, in medical image analysis and diagnosis, it is important to include clinical as well as basic scientific and computational knowledge in order to develop models that could be useful to humanity and allow us to deal with health crises like the current COVID-19 pandemic, where, for example, the analysis and processing of chest X-ray images [233,235,236] constitute an important tool to help in the diagnosis of the disease.