Transfer Learning from Healthy to Unhealthy Patients for the Automated Classification of Functional Brain Networks in fMRI

Functional Magnetic Resonance Imaging (fMRI) is an essential tool for the pre-surgical planning of brain tumor removal: it allows the identification of functional brain networks so that the patient’s neurological functions can be preserved. One fMRI technique used to identify functional brain networks is resting-state fMRI (rs-fMRI). This technique is not routinely available because it requires an expert reviewer to manually identify each functional network. The lack of sufficient unhealthy data has so far hindered a data-driven approach based on machine learning tools for full automation of this clinical task. In this article, we investigate the possibility of such an approach via transfer learning from healthy control data to unhealthy patient data to boost the detection of functional brain networks in rs-fMRI data. The end-to-end deep learning model implemented in this article distinguishes seven principal functional brain networks using fMRI images. The best performance, a 75% correct recognition rate, is obtained with the proposed deep learning architecture, which outperforms the other machine learning algorithms tested on this classification task. Based on this best reference model, we demonstrate the possibility of boosting the results of our algorithm with transfer learning from healthy subjects to unhealthy patients. This application of transfer learning opens interesting possibilities because healthy control subjects can be easily enrolled for fMRI data acquisition, which is non-invasive. Consequently, this process helps to compensate for the usually small cohorts of unhealthy patient data. This transfer learning approach could be extended to other medical imaging modalities and pathologies.


Introduction
Medical imaging is one of the most investigated use cases for machine learning in healthcare [1]. While effort remains consistent in developing and improving algorithms, data availability is crucial for deploying efficient machine learning solutions [2]. The recent COVID-19 pandemic has demonstrated, for instance, how the availability of a large annotated dataset could significantly boost the power of machine learning [3]. However, in most clinical practices, such an initiative to share a large dataset is still limited.
The machine learning community has developed several workaround approaches to compensate for the lack of data, such as transfer learning and data augmentation, both of which are investigated in this article.
Functional MRI (fMRI) is a method that eases the understanding of brain activation by analyzing the blood-oxygen-level-dependent (BOLD) signal, allowing the identification and localization of functional brain areas. The development of this technique promotes a better understanding of the functional anatomy of the human brain and a more accurate characterization of the inter-individual topographical variability in functional brain areas, such as language areas [13]. Thus, some fMRI techniques are progressively being included as a procedure for surgical planning in several pathologies [14][15][16][17].
The standard fMRI approach is a task-based block paradigm contrasting brain activation at rest and when performing a specific task. However, despite its usefulness, this technique presents several drawbacks and inconsistencies [18]: the patient's cooperation is needed, and it is unsuitable for young children and patients unable to perform the task. In addition, the study of several functional networks is time-consuming, since each network requires its own acquisition and the development of a specific activation task paradigm [19]. An alternative to the task-based characterization of functional networks is resting-state fMRI (rs-fMRI), which studies the synchronization of low-frequency oscillations between brain areas at rest [20,21]. From these signals, it is possible and practical to identify the so-called Intrinsic Connectivity Networks (ICNs), which reflect the neuro-anatomical substrate that corresponds to the brain's functional networks [22,23]. However, rs-fMRI for functional network identification is not yet part of the pre-operative routine because of the high level of expertise needed for ICN identification. Indeed, each of the ICNs needs to be visually reviewed by an expert to identify an individual functional network of interest [24]. To broaden the use of this technique in pre-surgical planning for various surgical procedures, the first step is the effective automation of fMRI brain network identification in patients' data.
In the literature, automated machine learning algorithms have been the subject of several studies to identify disease patterns in rs-fMRI data, especially in epilepsy [25,26], as well as traumatic brain injuries [27], addiction [28], cognitive impairment [29], and psychiatric disorders such as depression and schizophrenia [26,28]. There have been relatively few attempts [22,24,30] to automatically identify functional networks on rs-fMRI data using machine learning. Lu et al. [30] developed an instance-based automated method for identifying language networks in brain tumor subjects using independent component analysis (ICA)-based mapping on rs-fMRI. By contrast, our approach is data-driven and is not limited to language networks: our study considers seven functional networks. Each of these studies has its own scope, data variants, and functional networks used for automated identification in rs-fMRI for pre-surgical planning. In [24], the authors proposed a task-free paradigm for acquiring fMRI data, which was less demanding for patients and easy to administer. Further investigation was carried out on right-handed healthy control subjects, and a semi-automated language component identification procedure was proposed and tested on healthy subjects [24]. In this article, we consider unhealthy patients in addition to healthy subject data. In the study by [22], a model was trained to identify the main functional networks in a small number of healthy volunteers. The performance of the simple feed-forward network proposed in [22] is ultimately dependent on handcrafted features extracted from fMRI images. The above concerns motivated our proposal to design a specific end-to-end deep learning [31,32] knowledge transfer method to automate the detection of functional networks in the rs-fMRI of unhealthy patients. This approach has the advantage of being applicable to patients in need of brain surgery due to brain tumors or other reasons.
While efforts are being made to provide more public datasets of medical images of broad interest, there are currently still few public datasets of resting-state fMRI of healthy or unhealthy individuals [33,34]. Moreover, these datasets have been produced with slightly different protocols than ours. The differences include the type of disease, the number of participants, and the MRI sequence for some areas, and they would prevent a transfer learning approach on our dataset. Other related datasets include the database of [35], made of 227 healthy individuals aged 18 to 74 to investigate the impact of adult age on functional brain connectivity, and the database of [36], which includes 993 patients and 1421 healthy individuals to classify psychiatric disorders. Since we investigate patients with brain tumors, these datasets would also not allow a direct transfer learning approach from healthy to unhealthy on our data. By contrast, the clinical situation considered in this study, with healthy controls and brain tumor patients, is well suited to test the possibility of transferring knowledge from healthy to unhealthy patients.
As innovative elements, we (i) automatically identify functional networks on rs-fMRI data for the first time with an end-to-end deep learning method, as opposed to the handcrafted features previously proposed in the closest literature for this problem [22], and (ii) demonstrate the value of transfer learning from a model of healthy control subjects to unhealthy patients with a brain tumor.

Database
We obtained data from 81 healthy subjects and 55 unhealthy patients. While healthy data were acquired from regular volunteers, unhealthy data were obtained from patients with brain tumors with a specific lesion region, as indicated by the provided binary lesion mask. A detailed description of the unhealthy population is provided in [13]. This is a single-center, prospective, open-label trial, in compliance with regulation and ethical guidelines for clinical research, approved by the local ethics committee (Comité de protection des personnes Ouest II, decision reference CPP 2012-25). A total of 81 healthy volunteers (36 females and 45 males) aged from 23 to 38 years old were included and signed written informed consent. Fifty-five adult patients with a brain lesion treated in the Department of Neurosurgery of the university hospital of Angers underwent a preoperative fMRI language mapping with both rs-fMRI and task fMRI, as well as a perioperative cortical mapping of eloquent brain language areas in awake condition. All subjects gave their written, informed consent before enrolling in this study.
For all healthy and unhealthy data, we extracted 55 independent components (ICs) by independent component analysis (ICA), with a specific interest in 7 of them. One of the main difficulties with independent component analysis in resting-state fMRI is the determination of the total number of components (TNC) to be used: a low TNC may merge multiple networks into one component, while a high TNC may fragment a functional network into multiple components, both leading to suboptimal decompositions [37,38]. Our choice to analyze 55 ICs for all patients was based on previous works and appeared to be a good compromise to identify functional brain networks [23,39]. The seven components of interest correspond to seven biological networks of the brain: the Language Network (LANG), Salience Network (SAL), Ventral Attention Network (VAN), Default Mode Network (DMN), Left Fronto-parietal Control Network (lFPCN), Right Fronto-parietal Control Network (rFPCN), and Dorsal Attention Network (DAN). They represent the main ICNs identified and described in the resting-state fMRI literature. These particular networks were selected, with the DMN serving as a control for the others, because of the inter-individual variability that makes them difficult to identify using detection software or by non-expert reviewers. These connectivity networks correspond to known functional networks that support cognitive functions and have been used for pre-surgical planning [38,40]. They were also found to be consistent between rs-fMRI and various fMRI data acquisition and analysis techniques [41]. Functional networks without anatomical variability, such as the motor, sensory, or visual cortex, were not considered for algorithm training and automated identification because of their consistent anatomical location.
Image labels for each healthy and unhealthy data file marked by domain experts were used to assign each image to its respective network class. In addition to the two variants of network images provided for both healthy and unhealthy, unhealthy data include details of the brain tumor as described in Table 1 and shown in Figure 1.

Figure 1 (panels as described in Table 1): (a) lesion mask, (b) grey matter mask, (c) white matter mask, (d) cerebrospinal fluid mask, (e) whole brain, cerebrospinal fluid (skull and skin included), (f) whole brain (white and grey matter).

Table 1 (entries recovered from the source):
White matter mask: the mask for the white matter (no activation inside the white matter, but may be a good way to estimate the brain deformations linked to the tumor and the peritumoral edema).
Cerebrospinal fluid mask: the mask for the cerebrospinal fluid (like for the white matter, no activation inside, but may be useful to estimate brain deformations).
5. Whole brain-white gray matter (wms): the whole brain (white and gray matter) in T1 anatomical MRI sequence, with the skin and skull clipped.
6. Whole brain (wmrs): a view of the whole brain, with cerebrospinal fluid, skull and skin included.

Data Acquisitions and Preprocessing
All fMRI acquisitions were performed using a 3 Tesla MRI scanner (Magnetom Skyra, Siemens medical systems, Erlangen, Germany) with a slice thickness of 4 mm, which yielded a voxel size of 3 × 3 × 4 mm³ and consequently a 3-dimensional image of 42 px × 51 px × 34 channels. The fMRI sequences were acquired for each patient in the following order: an anatomical 3D T1, one resting-state acquisition, and two task-induced activities. None of the enrolled patients or healthy volunteers had a language impairment at the time of the fMRI acquisition or during the surgical procedure. The first three volumes acquired in each sequence were discarded to allow the stabilization of the magnetic field gradients.
Data preprocessing was performed using MATLAB (The MathWorks, Natick, MA, USA) with the Anatomy, SPM8, and VBM 12 toolboxes. The preprocessing of fMRI data was performed using the following steps: slice-timing correction, realignment to the first volume of the first session, and unwarping to correct head movements and magnetic distortions. Images were then segmented and normalized to the Montreal Neurological Institute template [42]. Rs-fMRI data of each patient were decomposed into 55 spatial independent components (ICs) through an intrinsic connectivity network spatial independent component analysis (SICA) approach employing a customized version of the infomax algorithm running under MATLAB [18,43]. ICs correspond to 3D fMRI activation volumes of brain areas with spontaneous synchronous activity.
The identification of reference fMRI brain networks was performed manually for each subject by two independent and experienced reviewers (J.-M.L. and A.T.M.), without any disagreement. Based on the fMRI activation peaks and the spatial distribution of these activations, we selected the seven main networks (DMN, LANG, VAN, lFPCN, rFPCN, SAL, and DAN) among the 55 generated ICs for each patient. The annotated images were used in two versions: full images (connectivity maps) and the corresponding thresholded images. Figure 2 shows sample images of the networks. Individual spatial components were thresholded at z = 2 at the cluster level, corresponding to the 5% most activated voxels in each intrinsic connectivity network. This methodology is consistent with the literature and allowed us to overcome the background activation noise to identify the anatomical location of specifically activated brain areas [44].
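As an illustration of the thresholding step, the sketch below binarizes a connectivity map at z = 2. It is a minimal example under stated assumptions rather than the processing pipeline used in the study: the function names `threshold_map` and `activated_fraction`, the flat-list representation of the volume, and the strict inequality are all assumptions, and the z-scores are invented toy values.

```python
def threshold_map(z_values, z_min=2.0):
    """Binarize a flattened z-score map: 1 where z > z_min, 0 elsewhere.

    Assumption: the volume is given as a flat list of z-scores and the
    comparison is strict; the text only states a threshold of z = 2.
    """
    return [1 if z > z_min else 0 for z in z_values]


def activated_fraction(mask):
    """Fraction of voxels that survive the threshold."""
    return sum(mask) / len(mask)


# Toy z-scores (invented for illustration, not study data).
toy_z_map = [0.1, 2.5, -0.3, 3.1, 1.9, 2.0, 0.7, -1.2]
toy_mask = threshold_map(toy_z_map)
toy_fraction = activated_fraction(toy_mask)
```

The binary mask produced this way plays the role of the "thresholded image" variant, while the unthresholded z-scores correspond to the full connectivity map.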

Identification of Functional Networks through Machine Learning Algorithms
Among the 55 ICs identified using the SICA approach, only a few correspond to functional networks. Several ICs were, in fact, background noise, characterized by a low number of activated voxels, whereas functional networks generally comprise between 1200 and 3000 activated voxels. To reduce the number of ICs for each patient and improve the performance of the functional network identification, we added a preliminary step that excludes all ICs with fewer than 850 activated voxels. This threshold fixes a minimal number of activated voxels above which an IC can be considered a network: during the manual review by the two expert reviewers, ICs with too few activated voxels were found to be only noise, unrelated to the connectivity networks of interest. This procedure was performed to discard "noise" networks and increase the algorithm's detection sensitivity. Then, we extracted the coordinates of the maximal activation peak of each cluster in order to minimize the number of variables considered for training before feeding the data into the algorithms.
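The exclusion and peak-extraction steps can be sketched as follows. This is a simplified illustration, not the study's code: each component is assumed to be stored as a dictionary mapping activated voxel coordinates to activation values, a single peak is taken per component rather than per cluster, and the helper names and toy components are hypothetical.

```python
MIN_ACTIVATED_VOXELS = 850  # exclusion threshold stated in the text


def keep_component(activated, min_voxels=MIN_ACTIVATED_VOXELS):
    """True if the component has at least min_voxels activated voxels."""
    return len(activated) >= min_voxels


def peak_coordinate(activated):
    """Coordinate (x, y, z) of the voxel with the maximal activation."""
    return max(activated, key=activated.get)


def select_components(components):
    """Drop noise components; return {name: peak coordinate} for the rest."""
    return {
        name: peak_coordinate(activated)
        for name, activated in components.items()
        if keep_component(activated)
    }


# Toy components (invented): a 40-voxel "noise" IC and a 1200-voxel IC.
toy_components = {
    "IC03": {(x, 0, 0): 2.1 for x in range(40)},
    "IC07": {(x, y, 10): 2.0 + 0.001 * (x * 51 + y)
             for x in range(30) for y in range(40)},
}
selected = select_components(toy_components)
```

Only the peak coordinates of the surviving components would then be passed to the shallow classifiers, which keeps the number of input variables small.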
To identify the most suitable family of machine learning algorithms for functional network classification, we implemented six machine learning algorithms: random forest, feed-forward neural network, naive Bayesian classifier, k-nearest neighbors, support vector machine, and classification tree. The random forest classifier consists of a combination of tree classifiers, 100 in our experiment. Each classifier is generated using a random vector sampled independently from the input vector. To classify an input vector, each tree casts a unit vote for the most popular class. Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. The k-nearest neighbors classifier [45] stores the complete training data. New examples are classified by choosing the majority class among the k closest examples in the training data. We used the Euclidean distance to measure the distance between examples for our particular problem. The support vector machine is a powerful method for building a classifier. It aims to create a decision boundary between two classes that enables the prediction of labels from one or more feature vectors. This decision boundary, known as the hyperplane, is oriented so that it is as far as possible from the closest data points of each class. Decision trees [46] recursively split the feature space based on tests that evaluate one feature variable against a threshold value. We used the information gain criterion for choosing the best test and top-down pruning with a value of 0.95 to reduce over-fitting.
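To make the k-nearest neighbors rule concrete, the following sketch classifies a peak coordinate by majority vote among its k closest training examples under the Euclidean distance. It is a minimal illustration, assuming peak coordinates as the only features; the function name `knn_predict`, the value k = 3, and the coordinates are invented for the example and do not come from the study.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy peak coordinates (invented, in voxel units) with their network labels.
toy_points = [(21, 40, 20), (22, 41, 21), (20, 39, 19),
              (8, 25, 15), (9, 26, 14), (7, 24, 16)]
toy_labels = ["DMN", "DMN", "DMN", "LANG", "LANG", "LANG"]

prediction_a = knn_predict(toy_points, toy_labels, (21, 38, 20))
prediction_b = knn_predict(toy_points, toy_labels, (8, 26, 15))
```

Because the classifier stores the whole training set, prediction cost grows with the number of training examples, which is not a concern at the dataset sizes discussed here.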
In addition to the six shallow learning methods, we included deep learning methods in our benchmark test. Deep learning aims at jointly learning feature representations with the required prediction models. We chose the predominant approach in computer vision, namely, deep convolutional neural networks [47]. The baseline approach resorts to standard supervised training of the prediction model (the neural network) on the target training data. No additional data sources were used. In particular, given a training set comprised of $K$ pairs of images $f_i$ and labels $\hat{y}_i$, we train the parameters $\theta$ of the network $r$ using stochastic gradient descent to minimize the empirical risk:

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{K} \mathcal{L}\big(r(f_i, \theta), \hat{y}_i\big),$$

where $\mathcal{L}$ denotes the loss function, which is cross-entropy in our case. The minimization is carried out using the Adam optimizer [48] with a learning rate of 0.001. The architecture of the network $r(\cdot, \cdot)$, shown in Figure 3, has been optimized on a cross-sample set and is given as follows: three convolutional layers with filters of size 3 × 3 and respective numbers of filters 64, 128, and 256, each followed by a ReLU activation and 2 × 2 max pooling; a fully connected layer with 256 units, ReLU activation and dropout (0.5); and a fully connected output layer for 7 classes with a softmax activation. The hyperparameters of the optimized CNN were based on a grid search operating on the depth of the neural network. Other dimensions, such as width, could be further investigated, as in EfficientNet [49]. Here, we do not seek an absolute best performance but rather focus on the possible relative gain in performance brought by transfer learning from healthy controls to unhealthy patients. In addition to the optimized CNN of Figure 3, we also included a comparison with standard CNN architectures such as VGG16 [50], ResNet [51] and DenseNet [52]. The tested shallow and deep supervised learning classification algorithms were implemented based on fMRI data from 81 healthy subjects.
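To give an order of magnitude for the size of the CNN described above, the sketch below walks through its layers and counts trainable parameters. Several details are assumptions not stated in the text: the 34 slices are treated as input channels of 2D convolutions, convolutions use "same" padding with stride 1, pooling uses stride 2 with rounding down, and every layer has a bias. The function names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution with square kernels."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def describe_cnn(size=(42, 51), channels=34, filters=(64, 128, 256),
                 dense_units=256, n_classes=7):
    """Return (feature-map shape before flattening, total parameter count)."""
    a, b = size
    c_in = channels
    total = 0
    for c_out in filters:
        total += conv_params(3, c_in, c_out)  # 3 x 3 convolution (ReLU adds none)
        a, b = a // 2, b // 2                 # 2 x 2 max pooling, assumed stride 2
        c_in = c_out
    total += dense_params(a * b * c_in, dense_units)  # dropout adds no parameters
    total += dense_params(dense_units, n_classes)     # 7-class softmax output
    return (a, b, c_in), total


feature_shape, n_parameters = describe_cnn()
```

Under these assumptions the feature map before flattening is 5 × 6 × 256 and the network has about 2.36 million parameters, most of them in the first fully connected layer; a different padding or pooling convention would change these figures.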
The training dataset included 78 individual maps of each of the seven main functional networks, corresponding to the seven networks identified among the 55 ICs generated for each of the 78 healthy control subjects in the training group. To reduce the dimensionality and minimize over-fitting in shallow learning algorithms, we extracted the coordinates of the network activation peak of each cluster before feeding the data into the algorithms. Each algorithm was trained ten times with a cross-validation strategy to ensure robustness and confidence. Algorithms were then tested using the fMRI data from the four other healthy subjects. For each subject, we used each of these algorithms to identify the seven main functional networks among the 55 generated ICs. The identified networks were then compared to the reference networks by our two expert reviewers for validation. We identified the most suitable algorithms for identifying the seven main functional networks (DMN, lFPCN, LANG, rFPCN, SAL, DAN, and VAN). Finally, we tested the different parameters of the model to optimize the results. The best method was selected based on the highest classification performance.
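The text does not specify the exact cross-validation scheme, so the following is only one plausible reading: ten repeated random hold-out splits made at the subject level, so that the seven network images of a given subject never appear on both sides of a split. The function names, the seed, and the toy subject identifiers are assumptions.

```python
import random

NETWORKS = ["DMN", "LANG", "VAN", "lFPCN", "rFPCN", "SAL", "DAN"]


def subject_level_splits(subject_ids, n_test, n_repeats=10, seed=0):
    """Yield (train_ids, test_ids) pairs, one per repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(subject_ids)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


def images_for(subjects, networks=NETWORKS):
    """One (subject, network) pair per labelled network image."""
    return [(subject, network) for subject in subjects for network in networks]


# Hypothetical identifiers for a small toy cohort of 12 subjects.
toy_subjects = [f"H{i:02d}" for i in range(12)]
splits = list(subject_level_splits(toy_subjects, n_test=2))
```

Splitting by subject rather than by image avoids leakage between training and test sets, since the seven images of one subject come from the same acquisition.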

Transfer Learning Strategies
The best model from the previous section was then investigated for its capability to transfer to unhealthy patients. We explored three main transfer learning techniques: brute transfer, mix transfer, and weight transfer. These techniques allow knowledge from healthy data and augmented data to be used when classifying unhealthy test data. In the brute transfer, a model was entirely trained on data from healthy controls, while in the mix transfer, the training database also contained some unhealthy data. For the weight transfer method, our saved model weights from healthy data were loaded for further training and fine-tuning with unhealthy patient data. We tested the model with unseen unhealthy data (patients with tumors). We trained all transfer learning models at a learning rate of 1 × 10⁻⁵ for 500 to 1000 epochs. To minimize over-fitting, we used an early stopping method based on the increase of the validation error. A grid-search algorithm chose optimal hyperparameters for the CNN model based on maximized precision on the training data: the stopping points for network training were ten validation failures followed by a model checkpoint.
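The stopping rule described above (ten validation failures, with a checkpoint of the best model) can be sketched independently of any deep learning framework. The class below is an illustrative assumption, not the training code of the study: the name `EarlyStopping`, the use of validation loss as the monitored quantity, and the toy loss curve are all hypothetical.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.failures = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real loop would checkpoint the weights here
            self.failures = 0
        else:
            self.failures += 1
        return self.failures >= self.patience


# Toy validation losses (invented): improvement up to epoch 3, then a slow rise.
toy_val_losses = [1.0, 0.8, 0.7, 0.65] + [0.66 + 0.01 * i for i in range(20)]

stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch, loss in enumerate(toy_val_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

With this toy curve, training halts at epoch 13 and the retained checkpoint is the one from epoch 3, the last epoch at which the validation loss improved.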

Data Augmentation
Data augmentation was achieved in two ways. First, we computed a spatial stretch on the healthy fMRI network images, similar to the effect of a brain tumor on the area within and about 3-5 px around the region of the lesion mask (see Figure 1). A classical filter known as pinch-explode was used for this purpose (Figure 4). Second, we introduced a randomly generated 3D lesion mask. The lesion masks were chosen with a radius of 0-10 px across the 10th to 32nd channels of our image data with dimensions 42 px × 51 px × 34 channels, comparable to real tumor masks, as shown in Figures 5 and 6. Using these masks, we set the image voxels of the simulated tumor region to zero, i.e., no signal, to mimic the expected drop of fMRI signal inside the tumor. In both cases, the input images came from healthy subjects. The two transformations (stretch and signal void) were chosen to simulate the expected impact of the tumor on the fMRI signal. In this spirit, data augmentation is another form of transfer learning from healthy to unhealthy patients, to be compared with the other transfer learning approaches of the previous sections.
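The signal-void augmentation can be sketched as follows (the pinch-explode stretch is not covered here). The example assumes a spherical lesion, a uniformly random center and radius, zero-based channel indices, and a nested-list volume indexed as `volume[x][y][z]`; none of these details are specified in the text, and the function names are hypothetical.

```python
import random

SHAPE = (42, 51, 34)  # image dimensions given in the text


def random_lesion_mask(shape=SHAPE, max_radius=10, z_range=(10, 32), rng=None):
    """Return the set of voxel coordinates inside a randomly placed sphere.

    Assumptions: spherical lesion, radius drawn uniformly in [0, max_radius],
    lesion restricted to channels z_range[0]..z_range[1] (inclusive).
    """
    rng = rng or random.Random()
    radius = rng.uniform(0, max_radius)
    cx, cy = rng.randrange(shape[0]), rng.randrange(shape[1])
    cz = rng.randint(*z_range)
    lesion = set()
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(z_range[0], z_range[1] + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2:
                    lesion.add((x, y, z))
    return lesion


def apply_signal_void(volume, lesion):
    """Copy the volume and set every voxel inside the lesion to zero."""
    out = [[row[:] for row in plane] for plane in volume]
    for x, y, z in lesion:
        out[x][y][z] = 0.0
    return out


# Toy volume of ones standing in for a healthy network image.
toy_volume = [[[1.0] * SHAPE[2] for _ in range(SHAPE[1])] for _ in range(SHAPE[0])]
toy_lesion = random_lesion_mask(rng=random.Random(1))
augmented = apply_signal_void(toy_volume, toy_lesion)
```

Each call with a different random generator yields a new simulated patient image from the same healthy input, which is how such a scheme enlarges the training set.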

Experimental Results
In this section, we give experimental results obtained with the acquisition protocol and training strategies described in Section 2. In the first subsection, we compare the performance of several ML techniques to find the best baseline method, which is then used in the second subsection for our transfer learning experiments. Finally, in the last subsection, we compare our results with the closely related literature.

Performance Comparisons
The comparison of the different algorithms in Table 2 identified the proposed CNN model as the most efficient approach for identifying the functional networks of interest in healthy subjects. In addition to the comparison presented in Table 2, we implemented other well-known CNN architectures such as VGG16, ResNet, and DenseNet on our dataset. However, the performance of these models was in the range of 50% to 55% on healthy data and was therefore considered unreliable. The difficulty lay in the dimensions of the original images and the total number of images in our dataset. The typical input size for well-known computer vision CNN architectures (such as VGG16, ResNet, and DenseNet) is 224 pixels × 224 pixels, as they are mainly designed to work on the ImageNet database [53]. Our original images are in multi-channel format and have a size of 42 pixels × 51 pixels × 34 (width, height, channel). To adjust the image size, a bi-cubic interpolation was used to up-sample the images by a factor of 4. This up-sampling reduced the quality of the images and caused a significant drop in the performance of the models. In addition, the number of training images is much lower than the number of parameters in these well-known CNN architectures, which leads the models to over-fit and reduces their performance.

Transfer Learning
We selected the best method identified in Table 2 for healthy data and applied the transfer learning approaches with this method to data from unhealthy patients. Table 3 reports the accuracy values recorded for several experiments on the proposed CNN model. Each row defines the data used for training and testing with their respective data sizes. Note that the trained model never sees the testing data, neither during the training process nor during hyper-parameter tuning. Several baseline experiments were conducted to assess the added value of the transfer learning approaches. First, we trained on healthy control data and tested on healthy control data. This experiment provided an upper bound of performance with the highest accuracy of 86%. This high score is possibly also due to the expected higher homogeneity of healthy controls. The same experiment was carried out by training on unhealthy patients and testing on unhealthy patients. A drop of about 10% in accuracy was observed, which builds a second baseline with fewer patients. The investigated transfer learning approaches were expected to provide performances between these two bounds. We considered four transfer learning strategies for this experiment: (i) brute transfer (training on healthy and testing on unhealthy data), (ii) mix transfer (adding some unhealthy data to healthy data to train the model), (iii) weight transfer (fine-tuning on unhealthy data) and (iv) transfer learning with data augmentation.
For the brute transfer strategy (Table 3, row 3), we trained our model with 81 healthy control subjects and conducted testing on all 55 unhealthy patients. We recorded an average accuracy of 0.74 ± 0.01 for all test data size ranges. The brute transfer therefore does not bring any improvement here. For the mix transfer strategy (Table 3, row 4), we trained our model with 81 healthy control subjects and 45 unhealthy patients, and tested it on the ten remaining unhealthy patients. An improvement in accuracy to 0.77 ± 0.01 on test data was observed by comparison with the brute transfer. The addition of data helps, even with a mixture of healthy and unhealthy patients, by comparison with the pure unhealthy patients experiment of row 2. However, we do not reach the upper bound performance of row 1 despite having more data than in that experiment. This performance demonstrates a discrepancy between healthy and unhealthy patients. Figure 7 shows the validation accuracy (from validation data) of the model trained on healthy data for various numbers of added unhealthy patients (10, 20, 30, 45). We recorded an increase of approximately 1% in validation accuracy for every ten unhealthy patients added to the training data (seven functional network images per patient). As the third transfer learning strategy (Table 3, row 5), we transferred the weights and biases of a model fully trained on healthy data (model of row 1) to a model for training on unhealthy data. The model was retrained and fine-tuned on 45 unhealthy patients and tested on the 10 remaining patients. A performance of 0.78 ± 0.01 was obtained on unhealthy test data. This result is the highest performance among all tested transfer learning strategies. The three transfer learning strategies were repeated in the presence of augmented data (Table 3, rows 6 to 10). Augmented data were produced by the data augmentation techniques (Section 2.5) from healthy data to simulate unhealthy data. The recorded performances in these experiments remained in the same range as those of the other transfer learning approaches.
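The data composition of the three strategies, with the cohort sizes quoted above (81 healthy subjects, 55 unhealthy patients, 10 held out for testing), can be summarized in a short sketch. The function name `build_experiment`, the subject identifiers, and the choice of holding out the last ten patients are illustrative assumptions; an actual experiment would draw the test patients at random.

```python
def build_experiment(strategy, healthy, unhealthy, n_test=10):
    """Return (pretrain, train, test) subject lists for one transfer strategy."""
    if strategy == "brute":
        # Train on healthy controls only, test on every unhealthy patient.
        return [], list(healthy), list(unhealthy)
    test = list(unhealthy[-n_test:])
    train_unhealthy = list(unhealthy[:-n_test])
    if strategy == "mix":
        # Single training run on healthy and unhealthy data together.
        return [], list(healthy) + train_unhealthy, test
    if strategy == "weight":
        # Pre-train on healthy controls, then fine-tune on unhealthy patients.
        return list(healthy), train_unhealthy, test
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical identifiers matching the cohort sizes given in the text.
healthy_ids = [f"H{i:02d}" for i in range(81)]
unhealthy_ids = [f"P{i:02d}" for i in range(55)]

sizes = {
    name: tuple(len(part) for part in build_experiment(name, healthy_ids, unhealthy_ids))
    for name in ("brute", "mix", "weight")
}
```

The resulting sizes, as (pretrain, train, test) subject counts, are (0, 81, 55) for brute transfer, (0, 126, 10) for mix transfer, and (81, 45, 10) for weight transfer.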

Comparison with Prior Works
As a closely related work, Mitchell et al. [22] focus on identifying selected functional networks in 21 healthy volunteers by training a simple feed-forward neural network model. This approach relies on a Multilayer Perceptron (MLP), which usually operates on hand-crafted features extracted from the data. Generally, MLPs are fully connected neural networks which generate outputs based on inputs. The literature sometimes uses MLP interchangeably with Deep Neural Network (DNN); however, the two are distinct because MLPs are a subset of DNNs. In this case, there is a pre-selection of ICs of interest. Our ICs were generated using a bottom-up, data-driven approach based on independent component analysis. ICA has gained popularity as one of the two frequently selected analytical methods for rs-fMRI data, and it requires no seed in any predefined region [54,55]. In contrast, the ICs generated in the study by Mitchell et al. [22] used canonical seed regions of interest scattered across the brain. These two approaches may provide similar features for further analysis. However, hand-crafted feature extraction can limit the flexibility and potential of identifying certain functional brain areas, as demonstrated in our approach. In addition, the location of the seed regions could significantly impact the resulting pattern of a functional system such as the language network. Furthermore, sensitivity to systematic noise such as head movement and physiological nuisance signals causes false identification of non-language areas (false positives) and missed detection of putative language areas (false negatives), which limits the clinical application of seed-based rs-fMRI in language mapping [30]. The comparison between our proposed CNN, run under the same conditions as the work of Mitchell et al., and the method of [22] is given in Table 4 and demonstrates the value of our approach.

Discussion and Error Analysis
The results of this study indicate that healthy control data can help to boost functional network identification for unhealthy patient data when healthy data are added during the training process. In this section, we discuss the observed errors and further analyze the origin of the transferability from healthy to unhealthy data.
One may wonder where the classification errors in this experiment come from. We generated the confusion matrix (Figure 8) as well as the sensitivity (true positive rate) and specificity (true negative rate) of the classification of individual functional brain networks to discover the most sensitive cases. Table 5 shows the model evaluation of each individual network for the classification of healthy subjects, unhealthy patients and transfer learning. The primary source of confusion between the different functional networks is the spatial overlap between the activated areas. We segmented the functional network identification into classification steps, identifying in each of them, among the 55 ICs, the best-fitted ICs for all 7 functional networks. We found that the main sources of error came from the confusion between LANG and VAN, as well as between DAN and rFPCN, as shown in Figure 8. The difficulty in differentiating between DAN and rFPCN may be explained by the spatial overlap between the two networks [56]. In contrast, the relationship between the VAN and LANG networks is more complex than in other networks. The distinction between the language and ventral attentional networks in rs-fMRI may be difficult, as they present similar activations in the ventrolateral prefrontal cortex, inferior frontal cortex and temporal gyrus in right-handed patients [57]. However, slight differences in activation in the inferior parietal lobule may allow for discrimination between these two networks: there, the activation is more anterior, located in the temporoparietal junction and the supramarginal gyrus, for the attentional network, and more posterior, in the angular gyrus, for the language network [13,57,58]. The ventral attentional network is also located in the non-dominant hemisphere, almost symmetrical to the language network in the dominant hemisphere, which may also explain the difficulties of discriminating between these two networks.
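For reference, per-network sensitivity and specificity can be derived from a multi-class confusion matrix in a one-versus-rest fashion, as sketched below. The function name is hypothetical and the three-class confusion counts are invented for illustration; they are not the values of Figure 8 or Table 5.

```python
def sensitivity_specificity(confusion, labels):
    """Per-class (sensitivity, specificity), one class versus the rest.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    result = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k missed
        fp = sum(row[k] for row in confusion) - tp  # others predicted as k
        tn = total - tp - fn - fp
        result[label] = (tp / (tp + fn), tn / (tn + fp))
    return result


# Toy confusion matrix (invented counts) with a LANG/VAN confusion pattern.
toy_labels = ["LANG", "VAN", "DMN"]
toy_confusion = [
    [8, 2, 0],   # true LANG
    [3, 7, 0],   # true VAN
    [0, 0, 10],  # true DMN
]
toy_scores = sensitivity_specificity(toy_confusion, toy_labels)
```

In this toy case LANG has a sensitivity of 0.80 and a specificity of 0.85, VAN 0.70 and 0.90, and DMN 1.00 and 1.00, which shows how confusion between two overlapping networks lowers both of their scores while leaving a well-separated network unaffected.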
Considering the lateralization of these two networks, the handedness assessment using the Edinburgh handedness inventory has been considered as a supplement to discriminate between ventral attentional and language networks [13]. However, while this information may be useful in right-handed patients, where left-hemisphere dominance exists in 96% of patients, left-handed patients should be considered with caution since only 27% of left-handed patients have a dominant right hemisphere and, therefore, a left-lateralized ventral attentional network [59].
We investigated the overlap between the thresholded functional networks and the lesion mask in unhealthy patients to better understand the possibility of transfer from healthy to unhealthy data. The distribution of the intersection over union (IoU) values of the 3D binary images of all unhealthy patient data is shown in Figure 9 for correct and wrong classifications. Most of the thresholded functional networks have little or no overlap with the lesion mask. The normalized versions of these histograms are provided in Figure 10. The two distributions were observed to be highly skewed, with skewness values of 2.11 and 1.94 for the IoU of correctly and wrongly classified images, respectively, indicating non-Gaussian distributions. These histograms show that the proportions of correctly and wrongly classified images are similar across the different IoU values, as also confirmed by the p-value of 0.75 in the t-test carried out on the IoU distributions, which indicates no significant difference (p > 0.05) between the two categories. To qualitatively illustrate this statistical result, Figure 11 provides examples where images with or without overlap are correctly or wrongly classified. No direct effect of the tumor on the targeted thresholded functional networks is observed in our dataset. This observation can explain the possibility of transfer learning from healthy to unhealthy data.
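The overlap analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names, the convention of returning 0 when both masks are empty, and the toy 4×4×4 volumes are hypothetical. The statistics rely on `scipy.stats.skew` and `scipy.stats.ttest_ind`; the exact t-test variant used in the study is not specified in the text, so the default independent two-sample test is assumed here.

```python
import numpy as np
from scipy import stats


def iou_3d(network_mask, lesion_mask):
    """Intersection over union of two 3D binary volumes of equal shape."""
    a = np.asarray(network_mask, dtype=bool)
    b = np.asarray(lesion_mask, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two empty masks are reported as having no overlap.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def compare_iou_groups(iou_correct, iou_wrong):
    """Skewness of each IoU group and p-value of a two-sample t-test."""
    result = stats.ttest_ind(iou_correct, iou_wrong)
    return {
        "skew_correct": float(stats.skew(iou_correct)),
        "skew_wrong": float(stats.skew(iou_wrong)),
        "p_value": float(result.pvalue),
    }


# Hypothetical toy volumes: the network covers slices 0-1, the lesion slices 1-2.
network = np.zeros((4, 4, 4), dtype=bool)
lesion = np.zeros((4, 4, 4), dtype=bool)
network[0:2] = True
lesion[1:3] = True
print("IoU:", iou_3d(network, lesion))

# Hypothetical IoU values for correctly and wrongly classified images.
summary = compare_iou_groups([0.0, 0.0, 0.05, 0.4], [0.0, 0.01, 0.02, 0.35])
print(summary)
```

A p-value above 0.05 in such a comparison would be read, as in the text, as no significant difference in lesion overlap between correctly and wrongly classified images.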
Nonetheless, the transferability we found is useful but not perfect; therefore, some discrepancy between healthy and unhealthy data must exist. This discrepancy could lie in the intrinsic shape of the functional networks of unhealthy patients, which may be distorted when located in the vicinity of the tumor.

Conclusions
This work demonstrated the interesting possibility of transfer learning from healthy controls to unhealthy patients. This was illustrated for the automatic identification of functional brain networks in rs-fMRI for patients with brain tumors. This result is important as it opens up an easy way to overcome the lack of data in machine learning for biomedical imaging. We demonstrated that healthy control data could boost the classification of functional brain networks in rs-fMRI for patients with brain tumors. This was obtained with an optimized classical CNN, which was shown to outperform standard CNN architectures and shallow learning methods, including the one previously tested in the literature on healthy subjects. The overall best performance obtained with unhealthy patients after transfer learning was 78%. The remaining errors were found to correspond to difficult cases. The gain brought by the transfer from healthy subjects was about 4%, which is a typical order of magnitude in transfer learning. This performance remains lower than the best performance obtained on healthy control subjects alone (86%). Brain tumors make the classification harder than in healthy subjects; nonetheless, the knowledge gained from healthy control subjects can help classify functional brain networks in rs-fMRI of unhealthy patients. This is, therefore, an interesting result, since healthy control subjects can be enrolled relatively quickly in hospitals for non-invasive rs-fMRI studies.
The limiting factor in transferring knowledge from healthy to unhealthy patients may be the discrepancy between healthy controls and unhealthy patients, which occurs due to the influence of the tumor on a region of the functional brain network. Several paths to compensate for this discrepancy could be investigated. Style transfer from healthy to unhealthy data could be used to perform this compensation in the image domain. In addition, one could consider domain adaptation in the neural network to operate this shift in the latent space rather than in the image. Lastly, one could also consider an image pre-processing approach to compensate in the image domain for the distortion (spatial deformation, BOLD signal attenuation, etc.) introduced by the tumors. In this article, we demonstrated the possibility of a transfer of knowledge from healthy to unhealthy patients. The same methodology could be extended to other biomedical imaging modalities for which the production of large cohorts is critical to benefit from machine learning.