Semi-Supervised Learning in Medical MRI Segmentation: Brain Tissue with White Matter Hyperintensity Segmentation Using FLAIR MRI

White-matter hyperintensity (WMH) is a primary biomarker for small-vessel cerebrovascular disease, Alzheimer's disease (AD), and other conditions. The association of WMH with brain structural changes has also recently been reported. Although fluid-attenuated inversion recovery (FLAIR) magnetic resonance imaging (MRI) provides valuable information about WMH, FLAIR does not provide information about other, normal tissue. Multi-modal analysis of FLAIR and T1-weighted (T1w) MRI is thus desirable for WMH-related brain aging studies. In clinical settings, however, FLAIR is often the only available modality. In this study, we therefore propose a semi-supervised learning method for full brain segmentation using FLAIR alone. The results of our proposed method were compared with reference labels obtained by FreeSurfer segmentation on T1w MRI. The relative volume difference between the two sets of results shows that our proposed method is highly reliable. We further evaluated our WMH segmentation by comparing the Dice similarity coefficients of the reference labels and our results. We believe our semi-supervised learning method has great potential for use with other MRI sequences and will encourage brain tissue segmentation using MRI modalities other than T1w.


Introduction
Automated quantitative metrics of structural magnetic resonance imaging (MRI), such as cortical volume or thickness, have commonly been used as objective indicators of neurodegeneration related to aging, stroke, and dementia. Recently, they have been combined with other MRI-based biomarkers such as white-matter hyperintensity (WMH) for investigation in Alzheimer's disease (AD) or aging research [1,2].
The WMH biomarker refers to bright areas that appear in the white matter on T2 fluid-attenuated inversion recovery (FLAIR) sequences. The etiologies of WMH are diverse, but it is considered primarily a marker of small-vessel cerebrovascular disease. WMH represents increased blood-brain barrier permeability, plasma leakage, and the degeneration of axons and myelin [3].
Larger WMH regions are associated with accelerated cognitive decline and an increased risk for AD [4]. Recent studies suggest that WMH plays a role in the clinical symptoms of AD and that medial temporal lobe atrophy and WMH contribute synergistically to cognitive impairment and dementia severity [5]. Patients with mild cognitive impairment or early AD who had concurrent WMH showed more significant cognitive dysfunction than those with a low WMH burden [5]. WMH also predicts conversion from mild cognitive impairment to AD [6]. In addition, WMH has been reported to be related to structural changes and cognitive performance, especially processing speed, even in cognitively unimpaired participants [7].
In clinical practice, WMH burden is usually estimated using a visual scale such as the Fazekas scale, but this cannot be used as an objective indicator without analyzing the volumetric ratio between WMH and white matter (WM). The quantification of WMH is essential for evaluating the association of WMH burden with cognitive dysfunction and longitudinal changes in WMH volume. Hence, a reliable automated method for measuring WMH and cortical volume would be helpful in clinical practice. Recently, it was reported that WMH progression is associated with changes in cortical thinning. Therefore, automatic measurement methods for WMH burden and cortical volume measurement on FLAIR MRI would be clinically valuable for tracing the longitudinal change in patients with cognitive impairment [8].
Thus, brain structural analysis, especially volumetric analysis, combined with WMH could provide more descriptive information to reveal the relationship between cognitive performance and MRI-based biomarkers. There are various brain tissue segmentation tools for three-dimensional (3D) T1-weighted (T1w) MRIs, such as FreeSurfer [9], SPM [10], and FSL [11]. However, brain tissue segmentation tools for other magnetic resonance sequences (such as FLAIR, susceptibility-weighted imaging (SWI), and gradient echo (GRE) sequences) are rarely developed because their aim is not to measure brain volume or analyze brain morphology precisely.
A deep learning-based automatic segmentation algorithm was first used to diagnose brain tumors and showed a high automatic detection rate and high accuracy [12]. However, accurate and reproducible automatic segmentation of WMH has not been achieved because the gold standard of manual segmentation is time-consuming to create, and it is difficult to obtain intra- and inter-rater reliability. In practice, clinicians rarely obtain both T1w and FLAIR MRIs because of the added scanning time. Ultimately, this situation hinders approaches that perform brain tissue segmentation on non-T1w sequences.
In this study, we propose a brain tissue segmentation method that produces trainable brain labels on FLAIR MRIs. Given paired T1w and FLAIR MRI datasets, we first generated brain labels on the T1w MRIs and aligned them to the FLAIR MRIs using co-registration. The label quality was then improved using a semi-supervised learning method [13]. Finally, we trained a deep neural network-based brain segmentation model for FLAIR MRI.

Subjects
This study was approved by the Institutional Review Board (IRB). The characteristics of the subjects are summarized in Table 1.

Overview of the Proposed Method
Our goal is to produce brain tissue and WMH segmentation exclusively on FLAIR MRI. However, it is impractical to generate the ground truth from FLAIR MRI directly because it lacks structural information compared with T1w MRI. Therefore, we propose the following steps, as shown in Figure 1: (A) Brain tissue segmentation from FLAIR MRI: brain tissue labels are generated from the T1w MRI using FreeSurfer, and the labels are co-registered with the FLAIR MRI. (B) Brain tissue segmentation enhancement: the co-registered pseudo-labels are enhanced with an initial semi-supervised learning step using a deep learning segmentation architecture, followed by a morphological correction. (C) Brain tissue and WMH segmentation: the brain tissue and WMH segmentation models are trained individually, and their predictions are merged into one label.

Pseudo-Labeling from T1w MRI
For the T1w MRIs, we used FreeSurfer (version 6.0, Boston, MA, USA) with the "recon-all" pipeline and extracted the brain labels as pseudo-labels, consisting of cerebral gray matter, cerebral white matter, cerebellar gray matter, cerebellar white matter, and the lateral ventricles, from aparc+aseg.mgz [9].

Co-Registration
Co-registration is a method that aligns two individual MRIs (e.g., of different modalities) obtained from the same subject; in our case, it aligns the T1w MRI with the FLAIR MRI. Because the primary purpose of the first step of our process is to generate initial brain tissue labels on the FLAIR MRI, we calculated the transform matrix from the T1w MRI to the FLAIR MRI using spatial co-registration with a rigid transformation from the SimpleITK library [14]. We then transformed the pseudo-labels into FLAIR space using the resulting transform. However, because of differences in MRI spacing and dimensions, the result did not delineate the brain tissue structure accurately. Therefore, we iteratively enhanced the brain tissue segmentation labels on the FLAIR MRIs.

Deep Learning-Based Initial Segmentation
We trained a convolutional neural network (CNN) on the FLAIR MRIs and the co-registered pseudo-labels of the brain tissue, as shown in Figure 2. For the initial segmentation model, we used U-Net [15] with an evolving-normalization (EvoNorm) activation layer [16]. In the preprocessing step, histogram-based intensity regularization, min-max normalization with percentile cut-offs of (0.05, 99.95), and z-score normalization were performed, and the input shape was set to 196 × 196. We used medical MRI-based augmentation techniques to improve the robustness of the CNN-based segmentation architecture; the augmentations were applied using TorchIO [17].

Morphological Label Correction
After training the brain tissue segmentation on the FLAIR MRIs, some noise remained, leaving the training label data incomplete. We therefore applied a simple morphological correction based on brain structure characteristics to enhance the brain tissue labels. In addition, we performed connected-component-based noise reduction using the fill-hole method [18], connecting the nearest 26 voxels in three dimensions. The morphologically processed brain tissue labels contain neither isolated labels nor holes.

Reference segmentation of the WMH was performed by manual outlining on the FLAIR MRIs. A total of 308 FLAIR MRI datasets were manually segmented, producing binary masks with a value of 0 (non-WMH class) or 1 (WMH class). The manual segmentation was performed through the consensus of three certified radiologists (J.Y. Kim, S.W. Oh, and M.K. Lee), who did not have access to the subjects' T1w MRIs. To reach consensus, the radiologists discussed the criteria for manual segmentation and held training sessions to standardize their visual assessment. Manual segmentation was performed independently, and the segmentation results were then exchanged for confirmation. Chronic infarcts, which appear hypointense with a hyperintense rim on FLAIR MRI, were excluded from WMH labeling.
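The correction step can be sketched with SciPy's morphology tools. The helper name and the `min_size` threshold are our assumptions, but the 26-voxel (3 × 3 × 3) connectivity matches the description above:

```python
import numpy as np
from scipy import ndimage

def clean_tissue_mask(mask: np.ndarray, min_size: int = 64) -> np.ndarray:
    """Fill holes and drop small isolated components (26-connectivity).
    `min_size` is an assumed threshold, not taken from the article."""
    struct = np.ones((3, 3, 3), dtype=bool)  # 26-neighborhood in 3D
    filled = ndimage.binary_fill_holes(mask.astype(bool), structure=struct)
    labels, n = ndimage.label(filled, structure=struct)
    # Voxel count of each connected component (labels start at 1)
    sizes = ndimage.sum(filled, labels, index=range(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_size]
    return np.isin(labels, keep)
```

The result has neither internal holes nor isolated specks, matching the stated property of the processed labels.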

Preprocessing
We resampled the images to 1 mm² in-plane spacing (the z-direction was excluded from this process because our training uses 2D slices). For WMH segmentation, we applied skull stripping with HD-BET [19] to the FLAIR MRIs of the WMH dataset to focus training on the white-matter regions. Moreover, as shown in Figure 3, we used histogram-based intensity regularization, min-max normalization with percentile cut-offs of (0.05, 99.95), and z-normalization using TorchIO [17] to deal with differences in MRI intensity variance.
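As a stand-in for the TorchIO transforms, the intensity normalization chain (percentile-clipped min-max scaling followed by z-scoring) can be sketched in NumPy; the function name is ours:

```python
import numpy as np

def normalize_flair(img: np.ndarray) -> np.ndarray:
    """Percentile-clipped min-max normalization followed by z-scoring,
    mirroring the (0.05, 99.95) cut-offs described in the text."""
    lo, hi = np.percentile(img, (0.05, 99.95))
    clipped = np.clip(img, lo, hi)
    scaled = (clipped - lo) / (hi - lo)             # min-max to [0, 1]
    return (scaled - scaled.mean()) / scaled.std()  # z-score
```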

Training
We compared three well-known segmentation architectures for the brain tissue segmentation and WMH segmentation: U-Net [15], U-Net++ [15], and HighRes3DNet [20]. We used the kernel sizes specified in the original article for each architecture. We set the input and output shapes to 196 × 196 and used the EvoNorm activation layer [16] instead of batch normalization and an activation function. The only difference between the training processes of brain tissue segmentation and WMH segmentation is the data augmentation.
For brain tissue segmentation, we used the following TorchIO augmentation methods [17].

• RandomAffine, with a scale parameter in the range of 0.85–1.15.
• RandomMotion, with a degree value up to 10 and a translation value up to 10 mm.
• RandomBiasField, with a magnitude coefficient parameter ranging between −0.5 and 0.5.
• RandomNoise, with the mean of the Gaussian distribution in the range of 0 to 0.025.
• RandomFlip, with a spatial transform value of 2, which inverts the Z axis.
For WMH segmentation, we did not include data augmentation in our method so that we could focus on the actual intensity range of the WMH regions.

Experiment Setup
We used the PyTorch deep learning library [21] as our main framework on a workstation with an Intel i9-9900X 3.5 GHz CPU, 128 GB of RAM, and two NVIDIA RTX 2080 11 GB GPUs. For preprocessing and augmentation, we used the TorchIO library [17]. For brain tissue segmentation enhancement (Figure 1B), we trained the CNN-based segmentation model using patches, with 128 samples per FLAIR MRI cropped from randomly selected locations; the size of each cropped patch is 128 × 128. We used the cross-entropy loss function [22] and the AdamW optimizer [23] with a learning rate of 0.001 and weight decay of 0.01. For brain tissue segmentation, we divided the 68 subjects with a split ratio of 0.8: 54 subjects for training and 14 for validation. After model training was complete, we used grid-based sampling and aggregation to perform inference.
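A minimal sketch of one training step on random patches, using the stated cross-entropy loss and AdamW settings. The single Conv2d is only a stand-in for the actual U-Net, and six output classes (five tissue labels plus background) is our assumption:

```python
import torch
from torch import nn, optim

# Stand-in for the U-Net; 6 output classes assumed (5 tissue labels + background)
model = nn.Conv2d(1, 6, kernel_size=3, padding=1)
criterion = nn.CrossEntropyLoss()                                         # [22]
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)  # [23]

patches = torch.rand(8, 1, 128, 128)          # random 128 x 128 FLAIR patches
labels = torch.randint(0, 6, (8, 128, 128))   # per-pixel tissue classes

optimizer.zero_grad()
loss = criterion(model(patches), labels)      # logits: (8, 6, 128, 128)
loss.backward()
optimizer.step()
```

At inference time, TorchIO's grid sampler and aggregator reassemble patch-wise predictions into a whole-volume label map, as the text describes.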
For WMH segmentation, we distributed the 308 subjects into 277 for training and 31 for validation. No medical MRI-based augmentation was used for WMH segmentation, to avoid distorting the intensity information of the already sensitive target. The rest of the experimental setup was the same as for brain tissue segmentation, except for the loss function: we used the DiceBCE loss, a combination of the Dice loss [24] and the binary cross-entropy loss [22], to handle the varying sizes of WMH regions. We again used the AdamW optimizer [23] with a learning rate of 0.001 and weight decay of 0.01.
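The exact formulation of the DiceBCE loss is not given in the article; a common sketch sums a soft-Dice term and BCE-with-logits, and the smoothing constant is an assumption:

```python
import torch
from torch import nn

class DiceBCELoss(nn.Module):
    """Soft-Dice + binary cross-entropy; one plausible 'DiceBCE' formulation."""
    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.smooth = smooth  # avoids division by zero on empty masks
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        inter = (probs * targets).sum()
        dice = (2 * inter + self.smooth) / (probs.sum() + targets.sum() + self.smooth)
        return (1 - dice) + self.bce(logits, targets)
```

The Dice term keeps small WMH lesions from being swamped by the background, while the BCE term provides stable per-voxel gradients.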

Evaluation for Brain Tissue Segmentation
The co-registered pseudo-labels of the brain tissue cannot be treated as ground truth because of noise, isolated regions, and mislabeled holes resulting from the co-registration. Therefore, we measured the relative volume difference between the labels obtained from the T1w MRI using FreeSurfer and the labels predicted from the FLAIR MRI using our proposed method. The relative volume difference is defined as:

Relative Difference(X, X_reference) = |X − X_reference| / X_reference × 100
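In code form, this is a direct transcription of the relative volume difference formula above (the function name is ours):

```python
def relative_volume_difference(volume: float, reference: float) -> float:
    """Relative volume difference (%) against the reference volume
    (e.g., the FreeSurfer-derived volume from T1w MRI)."""
    return abs(volume - reference) / reference * 100
```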

Evaluation for WMH Segmentation
To evaluate the performance of WMH segmentation, we measured the Dice overlap score [25], which measures the similarity between the ground truth label and the prediction label.
Because of the variance in WMH region sizes, we expect several false positives and false negatives in the predicted WMH segmentation. Therefore, we also measured the ratio of true positives to positive predictions (precision), the ratio of true positives to actual positives (recall), and the harmonic mean of precision and recall (the F1 score).

Figure 4 presents the brain tissue segmentation results of each model. Comparing each model's predicted label with its pseudo-label, we observe that all three models (U-Net++, U-Net, and HighRes3DNet) made changes in initially empty regions.
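These voxel-wise metrics can be computed from binary masks as below (our helper; note that for binary masks the Dice score coincides with the F1 score, and empty masks are not handled):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Dice, precision, recall, and F1 for binary masks (voxel-wise)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"dice": dice, "precision": precision, "recall": recall, "f1": f1}
```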

Measured Volume Comparisons: Brain Tissue Segmentation
By comparing the volume obtained by each model with the volume obtained by FreeSurfer from the T1w MRI, we measured the average relative volume difference for each model (U-Net++, 4.8 ± 2.0; U-Net, 4 ; see Table 3).

Table 3. Relative difference between the FreeSurfer labels (T1w) and the pseudo-labels, U-Net++, U-Net, and HighRes3DNet. GM, gray matter; SD, standard deviation; WM, white matter.

[Table 3 layout: rows give the relative difference (%, mean ± SD) for the brain tissue pseudo-label (FLAIR), U-Net++, U-Net, and HighRes3DNet.]

Figure 5 shows the results of WMH segmentation from each model. For comparably large WMH regions, all three models produced predictions similar to the ground truth. However, many false positives and false negatives were found in small WMH regions.

Dice Overlap Scores: WMH Segmentation
As shown in Table 4, U-Net performed best on the Dice overlap score, with 0.81 ± 0.07, and the F1 score, with 0.84 ± 0.04. HighRes3DNet performed best on recall, with 0.92 ± 0.06, yet showed the lowest Dice overlap score. As Figure 6 demonstrates, U-Net had the best performance, with the narrowest interquartile range (IQR) and whiskers.

Discussion
In this study, we developed a reliable automated segmentation method using FLAIR for WMH and cortical volume without the need for 3D T1w MRIs. In clinical practice, it can be difficult to obtain 3D T1w MRIs because of the long scan time, magnetic resonance machine performance, and the patient's condition. In contrast, FLAIR MRI is a more common and essential sequence for evaluating the brain and is easy to obtain in routine practice, so this method is applicable to more patients.
As shown in Figure 7, the brain tissue segmentation enhancement results suggest that semi-supervised learning enabled the direct training of brain tissue segmentation on FLAIR MRI alone. By following the procedure shown in Figure 1, labels could be transferred between the two MRI modalities (T1w and FLAIR) while improving label quality. As intended, our method was able to measure the volume of brain tissue and WMH using only FLAIR MRI.

Performance of Brain Tissue Segmentation
We evaluated the segmentation performance by comparing the relative volume difference between the results obtained from the T1w MRI and its paired FLAIR MRI. The relative volume difference was less than 10% for all three models: 0.86 for U-Net, 0.85 for U-Net++, and 0.81 for HighRes3DNet. Considering that a relative difference of 3.4% already existed in the pseudo-labels for the FLAIR MRI, we conclude that our method is sufficient for FLAIR MRI segmentation.
However, our segmentation method is limited by the absence of ground truth. In a further study, we would therefore like to evaluate our generated brain tissue pseudo-labels by having radiologists produce manual labels and measuring the Dice overlap score against them.

Performance of WMH Segmentation
We compared the Dice overlap between the ground truth and the prediction of each model. We found that U-Net, with a Dice overlap score of 0.81 ± 0.07, a precision of 0.88 ± 0.05, a recall of 0.80 ± 0.08, and an F1 score of 0.83 ± 0.05, was the most balanced of the compared segmentation architectures.
Even though U-Net++ and HighRes3DNet had lower Dice overlap scores than U-Net, as shown in Figure 6, the results still demonstrate that any convolutional network-based segmentation architecture is suitable for WMH segmentation.

Clinical Relevance and Application
In clinical practice, the proposed algorithm would be a useful screening tool for quantifying cortical volume and WMH burden in elderly patients with cognitive impairment, using only 2D FLAIR MRIs and without the need for 3D T1 volume MRIs. This is beneficial because 3D T1 volume MRIs are difficult to obtain in the elderly, owing to the long scan time and the need for a high-performance magnetic resonance machine to acquire good-quality data. It is also known to be difficult to obtain brain tissue labels from T2, SWI, and GRE sequences, because it is effectively impossible to obtain structural information from these MRIs without a paired T1w MRI. As mentioned before, there are many deep learning-based medical segmentation studies, but few consider obtaining structural information from any single modality other than T1w MRI.

Conclusions
We introduced a semi-supervised learning method for brain tissue segmentation using only FLAIR MRI. With our brain segmentation results, we demonstrated that our FLAIR MRI segmentation is as reliable as segmentation using its paired T1w MRI. We also showed that brain tissue segmentation and WMH segmentation can be performed from a single FLAIR MRI. Furthermore, the results indicate that our semi-supervised learning method is not limited to FLAIR MRI and could also be applied to T2, SWI, and GRE MRIs without the need to obtain brain tissue labels from paired T1w MRIs. We believe our semi-supervised learning method has the clinical potential to be a key solution for quantifying cortical volume and WMH burden using FLAIR MRI exclusively.