1. Introduction
Brain diseases such as tumors, Alzheimer’s disease, multiple sclerosis, and stroke affect millions worldwide, leading to significant health and societal burdens [
1]. Magnetic resonance imaging (MRI) has become the central modality for studying these conditions due to its non-invasive nature, superior soft tissue contrast, and ability to capture diverse anatomical and physiological information across multiple sequences. While originally developed for clinical decision-making, the rapid expansion of publicly available MRI datasets has transformed neuroimaging into a data-driven domain where large-scale machine learning, particularly foundation models, now plays a pivotal role.
Foundation models, first established in natural language processing and computer vision [
2], are increasingly being explored for medical imaging [
3,
4,
5,
6,
7]. Their promise lies in learning generalizable representations from heterogeneous data and transferring them to a wide range of downstream tasks. However, the success of such models in brain MRI crucially depends on the availability of harmonized, large-scale datasets. Unlike other imaging domains, brain MRI suffers from high heterogeneity: multiple acquisition protocols, diverse sequence types (e.g., T1w, T2w, FLAIR, DWI), inconsistent annotations, fragmented repositories, and variable licensing terms. This fragmentation presents a unique challenge for developing general-purpose models.
A number of surveys and benchmarks have attempted to catalog medical imaging datasets, but they fall short in key ways when viewed from the perspective of brain MRI foundation models. For instance, MedSegBench [
8] aggregates 35 datasets across modalities yet includes only one brain MRI dataset and provides no analysis of voxel-level heterogeneity or preprocessing standards. Similarly, Dishner et al. [
9] cataloged 110 radiology datasets (49 brain MRI) but spanned too broad an anatomical scope to address brain-specific challenges such as sequence diversity and harmonization [
10]. Other focused reviews, e.g., on glioma datasets [
11,
12], provide valuable clinical and molecular context but rarely analyze imaging metadata (e.g., voxel resolution, intensity distributions, or missing modalities) that directly influence pretraining strategies. Even highly influential initiatives like the BraTS Challenge [
13,
14,
15,
16] have advanced reproducibility and benchmarking but rely on heavily preprocessed data, which reduces heterogeneity and thus limits real-world generalization. In short, prior surveys tend to be either too broad (spanning many anatomical domains) or too narrow (focusing on a single disease), and they often omit the image- and preprocessing-level variability most relevant for foundation model development.
This review addresses these gaps. We provide a structured and multi-level assessment of public brain MRI datasets with a specific focus on their suitability for foundation model training. Unlike prior works, we move beyond cataloging and explicitly quantify variability across dataset-level and image-level properties. We also evaluate the effects of preprocessing choices, which remain a largely underexplored source of covariate shift. Our analysis is designed to bridge the disconnect between dataset curation and model pretraining, highlighting practical considerations for building harmonized resources.
Our contributions are fourfold:
- (i)
Dataset-level review: We review 54 adult 3D structural brain MRI datasets covering 538,031 subjects in total. This includes detailed analysis of modality composition, disease coverage, dataset scale, and licensing diversity, revealing major imbalances between healthy and clinical populations that influence pretraining data design.
- (ii)
Image-level profiling: We perform a quantitative comparison of voxel spacing, image orientation, and intensity statistics across 14 representative datasets. This analysis exposes strong variation in geometric resolution and contrast distribution, which can affect how foundation models learn anatomical and pathological features.
- (iii)
Quantitative evaluation of preprocessing variability: We measure how bias field correction, intensity normalization, skull stripping, registration, and interpolation modify voxel-level statistics and geometry across datasets.
- (iv)
Feature-space analysis of residual covariate shift: Using a 3D DenseNet121, we quantify cross-dataset divergence that remains after full preprocessing, linking voxel-level variability to learned representations.
Together, these contributions provide the first structured review that unifies dataset-, image-, and preprocessing-level analyses, offering practical guidelines for building harmonized and generalizable brain MRI foundation models.
2. Review Methodology
2.1. Data Collection and Selection Process
We performed a structured search for publicly available brain MRI datasets between May and June 2025. Sources included Google, Google Dataset Search, PubMed, Scientific Data, and major neuroimaging repositories such as TCIA, OpenNeuro, NITRC, CONP Portal and Synapse. Search terms combined phrases such as “public brain MRI dataset,” “open access brain MRI,” “3D structural brain MRI for AI,” and “MRI segmentation dataset,” with variations replacing “dataset” by “database.” No date restrictions were applied. Each repository entry or publication was manually reviewed to determine eligibility, and the process was repeated iteratively until no new datasets were identified, achieving data saturation.
This review focused exclusively on datasets containing 3D structural MRI of the adult human brain. Datasets were included only if they satisfied all of the following criteria:
- (i)
volumetric 3D structural MRI scans were available (not 2D slices or statistical maps);
- (ii)
subjects were adults;
- (iii)
at least one structural modality (e.g., T1-weighted) was included, rather than only functional or diffusion modalities (e.g., fMRI, DTI, MRA);
- (iv)
acquisitions were 3D static volumes (not 4D dynamic or time-resolved scans);
- (v)
at least 20 unique 3D scans were provided.
For multimodal datasets that additionally included fMRI, DTI, PET, or clinical assessments, only the structural MRI scans were considered in this review.
2.1.1. Screening Outcome
Our search yielded more than one hundred candidate entries across repositories and publications. After removing duplicates and excluding pediatric-only cohorts, 2D or statistical map datasets, collections with fewer than 20 scans, and datasets without accessible images, a total of 54 datasets were retained. Together, these cover 538,031 subjects and form the basis of our review.
2.1.2. Standardization of Modalities and Cohort Labels
To enable consistent comparison across heterogeneous datasets, we standardized both imaging modalities and cohort labels. The detailed mapping rules are summarized in
Appendix A Table A1 (modalities) and
Table A2 (cohorts). These datasets span a broad range of neurological and psychiatric conditions alongside healthy controls and vary in imaging protocols, scanner characteristics, and subject demographics. To maintain readability, only representative datasets are shown here in
Table 1. A full version of this table with all 54 datasets is provided in
Appendix A,
Table A3.
2.1.3. Subset for Image-Level Analysis
Due to licensing restrictions and regional access limitations, only a portion of the identified datasets could be downloaded for direct inspection. We selected a group of datasets that, together, represent several major brain conditions seen in structural MRI, including brain tumors, multiple sclerosis, stroke, epilepsy, neurodegenerative diseases, and healthy controls. We also required clear and consistent NIfTI files and no obvious subject overlap, so that voxel-level measurements could be compared reliably. The selected datasets therefore offer a balanced and representative sample of clinical situations while keeping the analysis focused and tractable. To avoid redundancy, we excluded benchmark collections that merely aggregate scans from other public sources, retaining only the original datasets. The subset used for image-level profiling includes MSLesSeg [
17], MS-60 [
18], MSSEG-2 [
19], BraTS25-MET [
20], BraTS25-SSA [
21,
22], BraTS25-MEN [
23,
24,
25], ISLES22 [
26], EPISURG [
27], OASIS-1 [
28], OASIS-2 [
29], IXI [
30], UMF-PD [
31], NFBS [
32], and BrainMetShare [
33].
2.2. Metadata Extraction
To enable consistent cross-dataset analysis, we programmatically loaded each image file and extracted key metadata. For every scan, we recorded spatial attributes (image dimensions, voxel spacing, orientation codes, affine matrix) and non-image attributes (modality, subject ID, session ID when available). Images outside the inclusion scope, such as DTI sequences in IXI, were excluded at this stage. All extracted metadata were stored in standardized per-dataset CSV files following a uniform schema. This structured resource forms the foundation for subsequent dataset- and image-level analyses presented in this review and is designed to facilitate reproducibility and reuse by the wider community.
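Concretely, the spatial attributes recorded above can be derived from a NIfTI affine matrix alone. The sketch below (pure NumPy; the function names are ours for illustration, not from our pipeline) recovers voxel spacing as the column norms of the affine's 3×3 rotation block, and a three-letter orientation code from the dominant anatomical direction of each voxel axis:

```python
import numpy as np

def spacing_from_affine(affine):
    """Voxel spacing (mm) = column norms of the affine's 3x3 block."""
    return tuple(np.linalg.norm(affine[:3, :3], axis=0))

def orientation_code_from_affine(affine):
    """Three-letter axis code (e.g., 'RAS'): for each voxel axis, the
    anatomical direction it most strongly points toward."""
    labels = (("L", "R"), ("P", "A"), ("I", "S"))  # -/+ along world x, y, z
    code = ""
    for col in affine[:3, :3].T:           # one column per voxel axis
        world_axis = int(np.argmax(np.abs(col)))
        code += labels[world_axis][int(col[world_axis] > 0)]
    return code

# A canonical 1 mm RAS affine, as produced by many neuroimaging tools
affine = np.diag([1.0, 1.0, 1.0, 1.0])
print(spacing_from_affine(affine), orientation_code_from_affine(affine))
```

In practice, libraries such as nibabel expose equivalent utilities (e.g., `nib.aff2axcodes` and `img.header.get_zooms()`), which additionally handle oblique acquisitions.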
Computational Resources
Computations were performed on a workstation with an NVIDIA Quadro RTX 8000 (48 GB) running CUDA 12.5 and 128 GB RAM, using Python 3.10.12 for preprocessing and PyTorch 2.2.1 for feature extraction.
3. Dataset-Level Analysis
3.1. Disease Coverage
The disease distribution analysis shown in
Figure 1 reveals a pronounced imbalance across public brain MRI datasets. After separating combined cohort labels and removing the undefined “Multiple Diseases” category, Healthy subjects form the largest group, followed by Neurodegenerative disorders (approximately 8800 subjects) and Brain Tumors (around 8400 subjects). Medium-scale categories include Stroke (2300 subjects), Autism (2200 subjects), and Epilepsy (870 subjects). Smaller datasets correspond to Psychiatric Disorders (455 subjects), Multiple Sclerosis (319 subjects), and White Matter Hyperintensities (170 subjects).
This distribution highlights the structural bias of the open neuroimaging landscape. The abundance of healthy and neurodegenerative cohorts reflects the historical focus on population-based and aging studies, while chronic, diffuse, or subtle pathologies remain underrepresented. Despite the diversity of available datasets, the dominance of a few diagnostic categories implies that current public MRI data cannot fully capture the clinical heterogeneity of the brain. This skewed representation constrains comparative analysis across disease types and may perpetuate overrepresentation of high-resource conditions in future benchmarks.
For foundation models, the imbalance in disease coverage directly influences representational learning. Pretraining dominated by T1-weighted healthy and Alzheimer’s data encourages the model to learn structural regularities and global contrast variations, while subtle lesion characteristics typical of demyelinating or vascular diseases remain statistically rare. Such bias limits transferability to small-lesion or microstructural disorders. To mitigate this, pretraining datasets should deliberately balance disease composition, incorporate underrepresented conditions (e.g., MS, WMH, psychiatric disorders), and include healthy scans primarily as anatomical anchors. Transparent reporting of disease proportions is essential for understanding bias propagation during large-scale pretraining.
3.2. Dataset Scale
The analysis of dataset sizes (
Figure 2) exposes an extreme imbalance in the public brain MRI landscape. A single dataset, UK Biobank, accounts for more than 500,000 subjects, while nearly all other datasets range from a few dozen to a few thousand participants. Yet when examined alongside disease coverage, the relationship between scale and content becomes more revealing: the largest datasets are almost exclusively composed of healthy or aging populations, whereas smaller datasets concentrate on specific pathologies such as brain tumors, stroke, and multiple sclerosis. In other words, data abundance is inversely correlated with clinical complexity.
For foundation models, the insight from this scale–disease relationship is profound. Pretraining must not simply accumulate images—it must balance information density against population scale. Large healthy datasets can anchor the model’s low-level feature representation, but meaningful generalization arises only when smaller, heterogeneous clinical datasets are interleaved to inject structural variability and abnormal morphology. The optimal training corpus is therefore not the largest one, but the one that combines datasets across scales and disease domains in a way that maximizes representational complementarity.
When merging datasets, several considerations follow:
Sampling balance: Naive aggregation will cause population-scale datasets to dominate optimization; adaptive weighting or stratified sampling is necessary to preserve rare clinical features.
Harmonization: Resolution, voxel spacing, and intensity normalization must be aligned to prevent the model from interpreting acquisition differences as anatomical variations.
Domain alignment: Cross-dataset normalization in feature space (e.g., domain-adversarial training or latent alignment) can reduce the domain gap between healthy and disease cohorts.
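The sampling-balance point can be made concrete with inverse-frequency weights. The sketch below (dataset names hypothetical) assigns each scan a weight such that every dataset receives equal total probability mass, so a 1000-scan population study no longer drowns out a 10-scan clinical cohort:

```python
from collections import Counter

def balanced_weights(dataset_ids):
    """Per-scan sampling weights giving each dataset equal expected
    representation, regardless of its size."""
    counts = Counter(dataset_ids)
    n_datasets = len(counts)
    # each dataset gets total weight 1/n_datasets, split over its scans
    return [1.0 / (n_datasets * counts[d]) for d in dataset_ids]

# Toy corpus: one large "healthy" dataset dwarfing a small clinical one
ids = ["ukb"] * 1000 + ["ms"] * 10
w = balanced_weights(ids)
print(sum(w[:1000]), sum(w[1000:]))  # both datasets contribute ~0.5
```

Such per-scan weights can be passed directly to a weighted sampler (e.g., PyTorch's `WeightedRandomSampler`) during pretraining; intermediate schemes (e.g., weights proportional to a fractional power of dataset size) trade off between naive pooling and full equalization.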
The scale analysis reveals that the most informative foundation model will not come from the largest dataset, but from the strategic fusion of small, diverse datasets with large, stable ones. Quantity establishes the foundation; diversity defines intelligence. A model pretrained under this philosophy learns both the invariant anatomy of the healthy brain and the variable morphology of disease, achieving robustness not through volume but through representational balance.
3.3. Modality Composition
The modality co-occurrence analysis (
Figure 3) reveals distinct pairing patterns among structural MRI sequences across public datasets. The most frequent combination is between T1-weighted and FLAIR scans, followed by T1–T2 and T1–T1C pairs. These sequences commonly co-occur within multi-contrast structural datasets such as BraTS, ADNI, and MSSEG, where complementary contrasts are used to capture both anatomical boundaries and pathological hyperintensities. Moderate co-occurrence is also observed among FLAIR, T2, and T1C, indicating a tendency for lesion-focused studies to integrate multiple structural contrasts that highlight different tissue characteristics. In contrast, single-modality datasets remain prevalent, particularly among population studies (e.g., IXI, OASIS), which provide only T1-weighted scans.
This co-occurrence pattern demonstrates that public brain MRI datasets—though diverse—are structurally interlinked through a limited but consistent set of core modalities. The strong correlation between T1 and FLAIR availability reflects a shared acquisition strategy for anatomical delineation and lesion sensitivity, while the partial inclusion of T2 and T1C indicates dataset-specific clinical emphasis (e.g., edema or contrast enhancement). The heatmap also reveals that cross-dataset modality overlap is incomplete: no single dataset provides full structural coverage, and different combinations dominate different disease domains. This partial alignment introduces redundancy in some modalities but gaps in others when datasets are combined.
For foundation models trained on aggregated public datasets, these co-occurrence dynamics carry important consequences. The uneven intersection of modalities across datasets means that multi-contrast information is not uniformly available for all subjects. This heterogeneity can lead to modality imbalance during pretraining and complicate cross-dataset harmonization. To address this, foundation models must incorporate modality-aware mechanisms—such as learned modality embeddings or masked reconstruction objectives—that can leverage overlapping contrasts while remaining robust to missing ones. The observed co-occurrence structure also suggests that structural modalities share sufficient anatomical redundancy to enable joint representation learning: by training across datasets with partially overlapping contrasts (e.g., T1 + FLAIR from one source, T1 + T2 from another), the model can implicitly learn a unified structural feature space that generalizes across acquisition protocols. Consequently, modality co-occurrence is not merely a dataset property but a key enabler of scalable, harmonized pretraining across heterogeneous MRI corpora.
4. Image-Level Analysis
At the image level, heterogeneity in voxel geometry, orientation, and intensity introduces latent biases that can substantially affect representation learning. These properties define the physical scale, spatial consistency, and dynamic range of brain MRI data—factors that determine whether a foundation model learns anatomical invariants or dataset-specific artifacts. Our image-level analysis quantifies these factors across 14 public datasets and provides interpretative insights for model design and harmonization.
4.1. Voxel Spacing
Voxel spacing defines the physical size of each voxel along the x, y, and z axes in millimeters, determining how finely anatomical structures are represented in the image and directly influencing the learning behavior of foundation models. When voxel spacing varies across datasets, the same convolution or attention kernel covers different physical regions, leading to inconsistent representation of anatomical details, blurred or missing small lesions in thicker slices, and domain shifts when combining data. This makes voxel spacing not just a technical aspect of MRI acquisition but a key factor that shapes model generalization. It affects architectures differently: CNNs may learn biased features when scale changes, transformers can misalign patches or positional encodings, and SAM-style models often lose boundary accuracy when slices are uneven—making anisotropy a hidden source of error that limits transferability.
Figure 4 shows the 3D distribution of voxel spacings across 14 representative datasets. Most datasets cluster near isotropic spacing of approximately 1 mm along each axis, indicating uniform resolution. The three BraTS collections (BraTS-MET, BraTS-SSA, BraTS-MEN), OASIS-1/2, NFBS, and IXI fall into this group, providing consistent high-quality data for model pretraining. In contrast, multiple sclerosis datasets (MS-60, MSLesSeg, MSSEG-2) and BrainMetShare exhibit moderate anisotropy, combining fine in-plane resolution with markedly thicker slices along the z-axis. This reduces sensitivity to small or thin lesions that appear across only one or two slices. Stroke and surgical datasets, such as ISLES22 and EPISURG, show the widest variability, including cases with very thick slices and highly variable in-plane spacing. Such heterogeneity reflects differences in acquisition protocols across centers and scanners. Finally, mixed clinical datasets like UMF-PD and BrainMetShare include both near-isotropic and anisotropic scans, representing real-world diversity in clinical imaging practices. These observations lead to three key insights with direct implications for the development of foundation models: (i) many research datasets share near-isotropic resolution and are well-suited for standardized pretraining; (ii) clinical and disease-specific datasets tend to be anisotropic, introducing geometric inconsistencies that require explicit modeling; and (iii) spacing variability alone can cause measurable distribution shifts between datasets, even after resampling.
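A common remedy for spacing heterogeneity is resampling every volume to a shared target spacing at ingestion. A minimal sketch using `scipy.ndimage.zoom` (the 1 mm target and trilinear order are illustrative defaults, not the settings used in our analysis):

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_spacing(volume, spacing, target=(1.0, 1.0, 1.0), order=1):
    """Resample a 3D volume from `spacing` (mm/voxel) to `target` spacing
    using spline interpolation of the given order (1 = trilinear)."""
    factors = [s / t for s, t in zip(spacing, target)]
    return zoom(volume, factors, order=order)

# An anisotropic toy volume: 1 x 1 mm in-plane, 3 mm slices
vol = np.random.rand(64, 64, 20).astype(np.float32)
iso = resample_to_spacing(vol, spacing=(1.0, 1.0, 3.0))
print(vol.shape, "->", iso.shape)  # (64, 64, 20) -> (64, 64, 60)
```

Note that interpolation cannot recover detail lost to thick slices; upsampled volumes remain blurrier along the original through-plane axis, which is one reason spacing shifts persist even after resampling.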
To further characterize these differences, we grouped each image into three categories based on the degree of anisotropy. We computed, for each image, the ratio between the largest and smallest spacing values among the three axes. If all spacings were equal (ratio = 1.0), the image was labeled as isotropic. If the ratio was greater than 1.0 but less than 2.0, it was labeled mildly anisotropic. Ratios of 2.0 or higher were labeled highly anisotropic.
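The categorization rule just described maps directly to code (a small sketch; the 2.0 threshold follows the text):

```python
def anisotropy_category(spacing, mild_threshold=2.0):
    """Classify voxel spacing by the max/min ratio across the three axes."""
    ratio = max(spacing) / min(spacing)
    if ratio == 1.0:
        return "isotropic"
    if ratio < mild_threshold:
        return "mildly anisotropic"
    return "highly anisotropic"

print(anisotropy_category((1.0, 1.0, 1.0)))  # isotropic
print(anisotropy_category((0.9, 0.9, 1.2)))  # mildly anisotropic
print(anisotropy_category((1.0, 1.0, 5.0)))  # highly anisotropic
```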
As shown in
Table 2, most images fall into the isotropic or mildly anisotropic categories—7968 and 7152 images, respectively. However, over 1700 images are highly anisotropic, indicating substantial geometric distortion, especially in slice thickness. If left uncorrected, these differences can lead to biased model learning and performance degradation across datasets.
4.2. Orientation
The orientation of MRI volumes defines how the anatomical axes of the brain are mapped to the voxel coordinate system. Each MRI scan stores its orientation using a three-letter code (e.g., RAS, LAS, LPS), which specifies the direction of the x, y, and z axes relative to the patient’s anatomy. While orientation may appear as a technical metadata field, it has a direct and critical influence on the learning behavior of foundation models. When images are stored in inconsistent orientations across datasets, identical brain structures appear in different spatial locations or mirrored configurations. This leads to misalignment in anatomical correspondences, causing the model to learn orientation-specific patterns instead of generalizable anatomical features. Therefore, harmonizing orientation is essential for foundation models to learn consistent spatial representations that can generalize across diverse datasets.
Table 3 summarizes the orientation distribution across datasets. The most common orientation is RAS (6592 images), which is the standard convention in neuroimaging software such as FSL and FreeSurfer. However, a considerable number of datasets adopt alternative conventions, including LPS (5012 images) and LAS (3473 images). These three orientations together account for over 90% of all images analyzed. Notably, several datasets contain multiple orientations internally—for instance, BraTS-MET and EPISURG each include images in both RAS and LPS forms. Less frequent orientations such as RSA, PSR, or ASL are observed in smaller datasets (e.g., OASIS, NFBS, UMF-PD). The presence of such variability reflects the absence of a unified orientation policy among dataset providers, even within well-curated public repositories.
The observed orientation heterogeneity introduces a subtle but significant source of distributional shift that can impair model transferability. Models trained on mixed-orientation data without explicit normalization may implicitly encode orientation-specific spatial priors. For example, left–right inversions between RAS and LAS orientations can confuse the model’s learned feature alignment, leading to inconsistent activation patterns for homologous brain regions. Similarly, inconsistent superior–inferior axis definitions can distort 3D spatial context, reducing the model’s ability to capture global anatomical symmetry.
For foundation model pretraining, these inconsistencies compound across large-scale datasets. Since pretraining relies on learning generic spatial and structural representations, uncorrected orientation differences can fragment the learned latent space, causing the model to associate the same anatomy with distinct feature embeddings depending on orientation. This weakens the universality of learned representations and increases the burden on fine-tuning.
Hence, orientation harmonization is not merely a preprocessing detail but a foundational requirement for effective cross-dataset learning. Converting all volumes to a common convention (typically RAS) before model training ensures that spatial relationships are consistent across datasets. For large-scale pretraining pipelines, we recommend enforcing explicit orientation standardization as part of dataset ingestion. Such harmonization minimizes unnecessary domain shifts, allowing the foundation model to focus on learning biologically meaningful anatomy rather than orientation artifacts.
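For axis-aligned volumes, RAS standardization reduces to a permutation plus flips derived from the orientation code. A self-contained NumPy sketch follows (real pipelines typically work from the affine instead, e.g., nibabel's `as_closest_canonical`, which also handles oblique cases):

```python
import numpy as np

# Each letter: (anatomical axis it lies on, sign toward that letter)
AXIS = {"R": (0, +1), "L": (0, -1),
        "A": (1, +1), "P": (1, -1),
        "S": (2, +1), "I": (2, -1)}

def to_ras(volume, code):
    """Reorient an axis-aligned 3D volume whose array axes point toward
    the directions named by `code` (e.g., 'LPS') into RAS order."""
    perm, flips = [None, None, None], []
    for array_axis, letter in enumerate(code):
        anat_axis, sign = AXIS[letter]
        perm[anat_axis] = array_axis
        if sign < 0:
            flips.append(anat_axis)   # needs flipping after transpose
    out = np.transpose(volume, perm)
    for ax in flips:
        out = np.flip(out, axis=ax)
    return out

vol = np.arange(24).reshape(2, 3, 4)
assert np.array_equal(to_ras(vol, "RAS"), vol)       # already RAS: unchanged
assert to_ras(vol, "LAS")[0, 0, 0] == vol[-1, 0, 0]  # left-right flip only
```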
4.3. Image Intensity Distribution
Image intensity represents the voxel-wise signal values within MRI scans and encapsulates the physical properties of tissues as captured by different imaging sequences. Intensity distributions are shaped by scanner hardware, acquisition protocols, and post-processing pipelines such as bias-field correction or intensity normalization. For foundation models, which depend on large-scale data aggregation from diverse sources, inconsistent intensity scaling or contrast profiles can substantially affect representation learning. A model trained on non-harmonized intensity profiles may implicitly overfit to dataset-specific brightness ranges, thereby reducing its ability to generalize across unseen domains.
Figure 5 illustrates the distribution of median voxel intensities across representative datasets. Datasets such as EPISURG, OASIS-1, OASIS-2, and IXI exhibit wide intensity variability, whereas others (e.g., the BraTS series, ISLES22, MSLesSeg, and BrainMetShare) show lower and more stable median values. This disparity likely arises from differences in scanner calibration, rescaling conventions (e.g., 0–255 versus z-scored), and preprocessing intensity normalization methods. The OASIS datasets, for example, show extensive dispersion with median intensities exceeding 300, reflecting a broad dynamic range and the absence of uniform scaling. In contrast, the BraTS and MS-related datasets exhibit tight clusters around zero, suggesting that bias correction and standardized normalization were consistently applied.
These differences have several implications for foundation model development. First, heterogeneous intensity distributions introduce latent biases that may lead a model to associate tissue contrast with dataset identity rather than underlying anatomy. This undermines the objective of learning scanner- and modality-invariant representations. Second, extreme intensity outliers—particularly in datasets with mixed acquisition conditions—can destabilize loss optimization during pretraining by distorting the input statistics used by normalization layers. Conversely, datasets with highly standardized intensity ranges, while beneficial for stable convergence, may limit the model’s exposure to real-world variability and thus reduce robustness during fine-tuning on unnormalized clinical data.
From a model design perspective, these findings highlight the importance of preprocessing-aware normalization strategies. Dynamic intensity scaling or adaptive histogram alignment could be implemented within the data loading pipeline to ensure consistent contrast across datasets. Alternatively, self-supervised objectives that promote intensity-invariant representations (e.g., histogram-matching augmentations or contrast consistency losses) may help the model decouple anatomical features from brightness variations. Ultimately, balancing intensity harmonization for stable training with sufficient distributional diversity for adaptability remains a key challenge for developing robust and generalizable MRI foundation models.
To quantitatively assess whether these intensity differences are statistically significant, we applied the Kruskal–Wallis H test to the per-image median values grouped by dataset. The result was highly significant, confirming that the observed inter-dataset variations are not due to random fluctuation. This non-parametric test evaluates whether at least one group differs in median from the others, without assuming a specific underlying distribution. The extremely low
p-value supports the visual findings in
Figure 5, indicating that intensity scaling differences across datasets are real, systematic, and substantial.
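The test itself is a one-liner with `scipy.stats.kruskal`. The sketch below mirrors the analysis on synthetic per-image medians (the values are illustrative, not our measured data) for three "datasets" with clearly different scaling conventions:

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
# Synthetic per-image median intensities: z-scored vs. wide dynamic ranges
medians_by_dataset = {
    "brats": rng.normal(0.0, 1.0, size=80),
    "oasis": rng.normal(300.0, 60.0, size=80),
    "ixi":   rng.normal(150.0, 40.0, size=80),
}

h_stat, p_value = kruskal(*medians_by_dataset.values())
print(f"H = {h_stat:.1f}, p = {p_value:.3g}")
assert p_value < 0.001  # scaling differences are systematic, not noise
```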
5. Intra-Dataset Patient-Level Analysis
To understand how variability appears not only across datasets but also within a single dataset, we conducted patient-level analyses on three representative collections: MSLesSeg, BraTS-MET, and IXI. These datasets were chosen because they highlight different kinds of internal heterogeneity. MSLesSeg shows variation in the number of longitudinal timepoints per patient, BraTS-MET illustrates multi-center and multi-orientation effects even within a curated challenge dataset, and IXI demonstrates scanner-related differences in a healthy cohort. For each dataset, we extracted patient-level metadata to summarize characteristics that may affect model design, training stability, or evaluation reliability. A broader summary of patient-level heterogeneity across all datasets considered in this study—including variation in timepoints, sites, and scanner field strengths—is provided in
Table A4. The complete metadata in CSV format is available in the
Supplementary Materials.
In MSLesSeg, patients do not all have the same number of MRI timepoints. Most patients have only one timepoint, while others have two, three, or four. This means the dataset mixes mostly single-timepoint data with a smaller amount of longitudinal follow-up. Such differences matter because patients with several timepoints show how lesions change over time, while single-timepoint patients provide only a snapshot. Models that use temporal information need to account for this mixture to avoid biasing toward the much more common single-timepoint cases.
BraTS-MET, despite being a standardized competition dataset, exhibits its own form of intra-dataset variation due to its multi-center construction. The cohort aggregates scans from numerous institutions with differing acquisition practices, leading to variability in voxel geometry, intensity behavior, and image orientation. Orientation is one clear example of this internal diversity: while most patients follow the RAS convention, a substantial subset uses LPS, and a very small number use LAS. Such differences can affect both preprocessing and training, as inconsistent handling of axes may result in flipped anatomy, misaligned labels, or unintended orientation biases. Even in competition-grade datasets, this type of internal heterogeneity underscores the need for careful preprocessing and standardized orientation handling, especially when training large foundation models.
The IXI dataset illustrates yet another kind of internal variability. Although often treated as a uniform dataset, IXI contains images from different hospitals and MRI scanners. As shown in
Figure 6, the scanners differ in how many scans they contribute, the voxel spacing they use, their intensity distributions, and how much of the brain each scan covers. As a result, a model trained solely on IXI is still exposed to multiple acquisition styles rather than a single, consistent imaging domain.
Variation inside a single dataset is not unusual; similar effects appear in natural-image datasets, but MRI variation is more difficult for models to handle because voxel geometry and intensity scale are tied directly to the physical acquisition process. If such differences are not addressed, a model may learn scanner-specific or site-specific cues instead of anatomical patterns, leading to less robust and less generalizable representations. This risk is especially strong when one scanner or orientation dominates the dataset, allowing its acquisition style to overshadow minority subgroups.
In practice, medical foundation models benefit from simple but important preprocessing steps such as resampling to a common spacing, intensity normalization, orientation standardization, and balanced sampling across subgroups (e.g., scanners, orientations, or timepoint counts). These steps help the model focus on anatomy rather than acquisition identity and help reduce internal domain shifts. Addressing intra-dataset variability is necessary even before combining datasets, as meaningful differences already exist within each dataset on its own.
6. Evaluation of Preprocessing Effects on Image Harmonization
To systematically evaluate the impact of preprocessing on data harmonization, we randomly sampled images from the curated datasets and applied a standardized pipeline comprising bias-field correction, intensity normalization, skull stripping, and spatial registration. The resulting images were analyzed through voxel-wise statistical comparisons and qualitative visual inspection to assess improvements in inter-dataset consistency and anatomical fidelity.
6.1. Intensity Normalization
Intensity normalization is the process of adjusting MRI voxel values to a common scale so that images from different scanners or subjects become comparable. The most common techniques include z-score normalization, histogram matching, and WhiteStripe normalization. Z-score normalization rescales each image to have zero mean and unit variance, reducing intensity range differences; it is best used as a simple, general method when datasets are diverse or lack a consistent reference. Histogram matching aligns the intensity distribution of each image to that of a reference scan or template, making it ideal for multi-site datasets with large scanner or protocol variability. WhiteStripe normalization uses the intensity range of normal-appearing white matter to anchor scaling, which is most effective for brain studies where maintaining tissue contrast is important.
As summarized in
Table 4, the original voxel intensities span a wide range, reflecting strong contrast between bright enhancement regions and darker tissues. After applying z-score normalization, the intensity distribution becomes centered around zero with reduced variance, resulting in a more uniform and balanced appearance across tissues. However, this transformation also alters the visual contrast, as shown in
Figure 7: some brain regions appear brighter, while fine structural details become less pronounced. This effect occurs because z-score normalization rescales voxel values relative to the global mean and standard deviation, thereby compressing the overall dynamic range and reducing intensity extremes.
When building foundation models, intensity normalization should be applied consistently across all datasets to prevent artificial domain shifts. The chosen method must preserve relative tissue contrast while harmonizing global intensity ranges. It is also beneficial to expose the model to multiple normalization styles during pretraining, helping it learn invariance to contrast variations. Finally, combining preprocessing-based normalization with learnable normalization layers (e.g., instance or adaptive layer normalization) allows the model to adapt dynamically to unseen data while maintaining stable, harmonized feature representations.
6.2. Bias Field Correction
Bias field correction adjusts MRI images to remove gradual brightness variations caused by uneven magnetic fields or coil sensitivity. These variations make some regions look brighter or darker even when the tissue is the same, so correction helps make the intensity more uniform across the brain. Popular methods include N4ITK (N4 bias field correction), N3 (nonparametric nonuniform intensity normalization), and SPM’s unified segmentation approach. In this review, we applied N4ITK bias correction with modality-specific tuning using the SimpleITK implementation: adjustments included enhanced smoothing for FLAIR, brain masking for T1C images, and balanced settings for T1 and T2.
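The sketch below is not N4ITK itself, which fits a B-spline field; it illustrates the shared underlying model, in which a smooth multiplicative field is estimated (here by heavy Gaussian smoothing in log space) and divided out:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def remove_smooth_bias(img, sigma=15.0, eps=1e-6):
    """Conceptual stand-in for bias correction: in log space the
    multiplicative field becomes additive, so a heavily smoothed copy
    of the image approximates the field, which is then subtracted."""
    log_img = np.log(np.clip(img, eps, None))
    log_field = gaussian_filter(log_img, sigma)
    corrected = np.exp(log_img - log_field)
    # restore the original mean intensity level
    return corrected * img.mean() / corrected.mean()

def coefficient_of_variation(img):
    """std/mean -- the within-tissue variability metric reported above."""
    return img.std() / img.mean()
```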
Representative examples are shown in
Figure 8. The raw image (top-left) displays uneven brightness—the left side of the brain appears darker due to scanner-related field inhomogeneity. After correction (second panel, top row), the preprocessed image shows more uniform brightness across tissue regions, while the estimated bias field map (third panel, top row) captures the smooth multiplicative field responsible for this nonuniformity. The intensity histograms reveal that voxel intensities have shifted and become more compact, indicating reduced variation between bright and dark areas. The horizontal and vertical profiles show that peaks corresponding to white matter and gray matter are now closer in amplitude, confirming improved intensity consistency. The intensity correlation plot (r = 0.823) shows that the correction maintains overall intensity relationships but rescales them toward a more uniform distribution. Quantitatively, as shown in
Table 5, the coefficient of variation decreases (0.207 → 0.163), meaning intensity variability within tissue is reduced, while the signal-to-noise ratio (SNR) remains similar (6.87 → 6.50), suggesting correction did not distort contrast or amplify noise. The difference map highlights smooth intensity shifts, with no sharp artifacts.
While bias correction helps standardize input intensities for foundation model training, its effects vary with modality, anatomy, and pathology. Overcorrection may reduce lesion contrast or introduce distortions, while undercorrection can leave scanner-specific artifacts. Hence, visual and quantitative validation is essential, particularly when aggregating multi-source data.
6.3. Skull Stripping
The primary goal of skull stripping is to remove non-brain tissue, such as the skull, scalp, and dura mater, from the image. This is a critical step as these tissues have high-intensity signals that can interfere with intensity normalization and confuse segmentation algorithms. Common tools include FSL’s Brain Extraction Tool (BET) [
34], AFNI’s 3dSkullStrip [
35], and more recently, deep learning-based methods like HD-BET [
36], which often provide more accurate results. While most datasets in our analysis are provided pre-stripped (e.g., BraTS, ISLES22), the specific algorithm used often varies or is not documented, leading to subtle differences in the final brain mask.
Figure 9 illustrates the effect of skull stripping on a PD image from the IXI dataset, where non-brain tissues such as the scalp and skull are successfully removed, leaving only the intracranial structures for further analysis.
From a foundation model standpoint, skull stripping can influence both pretraining and downstream transfer. When training models across multiple datasets, consistent skull stripping helps reduce non-biological variability and ensures that the model focuses on relevant brain structures. However, inconsistency across datasets—where some scans are stripped and others are not—can lead to feature-space fragmentation, causing the model to learn dataset-specific biases rather than generalizable brain representations. Therefore, strict harmonization of preprocessing pipelines, including identical skull stripping tools, thresholds, and quality-control procedures, is essential.
Moreover, the choice to strip or retain the skull should align with the model’s target scope. For models designed to capture brain-centric features—such as lesion segmentation, cortical parcellation, or morphometric analysis—skull stripping is generally beneficial, as it directs attention to intracranial tissues. Conversely, for models intended to generalize across multi-modal or multi-organ contexts (e.g., MRI–CT alignment, PET fusion, or structural-to-functional transfer), removing the skull can limit cross-modality correspondence and reduce anatomical completeness. A practical strategy for large-scale foundation model pretraining is to include both stripped and unstripped variants of each scan and use metadata tags or preprocessing embeddings to inform the model about their origin. This dual representation encourages robustness to preprocessing differences and enables the model to learn invariance to skull presence—an increasingly important capability for generalizable medical foundation models.
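The dual-representation strategy above can be sketched minimally as follows, assuming a precomputed binary brain mask (e.g., produced by HD-BET) and an illustrative, non-standard record layout for the metadata tag:

```python
import numpy as np

def apply_brain_mask(img, mask):
    """Zero out non-brain voxels using a precomputed binary mask;
    producing the mask itself is left to a dedicated tool."""
    return np.where(mask > 0, img, 0)

def dual_variants(img, mask):
    """Keep stripped and unstripped versions of the same scan, tagged
    so a model can condition on (or learn invariance to) skull
    presence. The record layout here is illustrative only."""
    return [
        {"image": img, "skull_stripped": False},
        {"image": apply_brain_mask(img, mask), "skull_stripped": True},
    ]
```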
6.4. Spatial Registration to MNI152
Spatial registration aims to align MRI volumes into a common anatomical space, reducing spatial variability across datasets. Using a modality-aware ANTs pipeline with rigid–affine–SyN transformations, we aligned representative scans to the MNI152 template. This process standardizes brain geometry but also exposes how registration can reshape anatomical statistics in subtle, dataset-specific ways.
Figure 10 shows the registration effect for a T2-weighted image from BraTS-MEN. The aligned scan closely matches the MNI template, and quantitative metrics confirm high structural similarity (mutual information = 0.974, structural similarity = 0.641). However, resampling expanded the image volume by 26.9%, and the local correlation metric indicates that voxel intensity relationships were partially altered. Overlay maps and checkerboard comparisons highlight that most deviations occur near lesion borders and ventricles—regions where pathology or intensity nonuniformity interacts poorly with the template deformation.
These findings reveal an essential trade-off. Registration improves spatial consistency across datasets, supporting template-based feature extraction and patch sampling. Yet, excessive geometric forcing can distort pathological anatomy and attenuate lesion contrast, especially in heterogeneous clinical data. For foundation model pretraining, this suggests that full MNI normalization may be beneficial only for structural harmonization, while native-space training augmented with local spatial perturbations could better preserve disease-specific variability and improve cross-domain generalization.
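For reference, a mutual information score of the kind reported for the registration result can be computed from a joint intensity histogram. This is one common formulation, not necessarily the exact metric the ANTs pipeline reports:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information between two equally shaped
    images; higher values indicate stronger statistical dependence."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of image b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

An image compared with itself scores far higher than the same image compared with a voxel-shuffled copy, which is the behavior a registration metric exploits.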
6.5. Interpolation of Thin-Slice Volumes
Several clinical datasets, such as MS-60, contain scans with limited z-axis coverage or thick slices, producing anisotropic volumes that hinder 3D convolutional learning. To mitigate this, we applied an automated interpolation procedure that increases through-plane resolution while maintaining anatomical scale. This step is not simply geometric resampling—it directly determines how well small, low-contrast lesions are represented in 3D feature space.
Figure 11 illustrates a FLAIR image from the MS-60 dataset before and after interpolation. The original scan (13 slices) shows severe discontinuities and collapsed tissue boundaries, whereas the interpolated version (64 slices) restores smoother cortical contours and continuous sulcal structures without distorting global shape. Quantitatively, the effective slice thickness decreased by approximately 4.8×, enabling isotropic patch extraction for pretraining and consistent input dimensions across datasets.
From a foundation model perspective, interpolation functions as a structural equalizer: it harmonizes volumetric resolution across sources, improving patch uniformity and kernel receptive fields. However, it also generates synthetic voxels that may obscure very small hyperintensities or produce interpolation artifacts along lesion edges. Thus, interpolation should be applied selectively—preferably on high-anisotropy datasets or in conjunction with uncertainty-aware augmentations—to balance geometric consistency and lesion fidelity.
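The through-plane upsampling described above can be sketched with linear interpolation along the z-axis; the actual pipeline may use higher-order splines, but the geometry is the same:

```python
import numpy as np

def interpolate_slices(vol, target_slices):
    """Linearly interpolate the through-plane (z) axis of a (Z, H, W)
    volume to exactly `target_slices`, leaving in-plane resolution
    untouched."""
    z_new = np.linspace(0, vol.shape[0] - 1, target_slices)
    lo = np.floor(z_new).astype(int)              # lower neighbor slice
    hi = np.minimum(lo + 1, vol.shape[0] - 1)     # upper neighbor slice
    w = (z_new - lo)[:, None, None]               # interpolation weight
    return (1 - w) * vol[lo] + w * vol[hi]
```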
7. Residual Covariate Shift After Preprocessing
Despite standardized preprocessing, inherent heterogeneity in MRI data from diverse sources introduces residual covariate shift, which can impede the generalizability of deep learning models. This shift appears as subtle variations in noise patterns, intensity scaling, and remaining artifacts that preprocessing cannot fully remove. To examine this effect, we analyzed T1-weighted MRI scans of healthy subjects from two public datasets—NFBS (125 images) and a subset of IXI (54 images)—after applying a uniform pipeline consisting of skull stripping, N4 bias field correction, MNI152 registration, and intensity normalization. A single central axial slice was extracted from each volume to reduce computational cost while preserving key contrast differences. To characterize the remaining variability in feature space, we used DenseNet121 pretrained on ImageNet as a fixed feature extractor. This model offers a neutral and widely adopted representation that does not depend on either dataset, allowing observed differences to reflect genuine dataset variation rather than model training effects. From this network, 1024-dimensional feature vectors were obtained from the penultimate layer without fine-tuning. Quantitative assessment of the resulting feature differences between datasets is summarized in
Table 6.
Despite a high cosine similarity, indicative of similar vector directions, a substantial Euclidean distance and average Wasserstein distance highlight significant shifts in the magnitude and distribution of features. Statistical analysis further confirmed this divergence: 83.89% of all features exhibited statistically significant differences after Bonferroni correction. These findings demonstrate that standard preprocessing is insufficient for complete MRI data harmonization. The persistent residual covariate shift in the learned feature space critically impairs model robustness and transferability across unseen domains. Therefore, developing and implementing explicit domain adaptation strategies—such as disentangled representation learning, meta-learning for domain generalization, and robust uncertainty estimation—is paramount for building truly generalizable and clinically reliable models. This is particularly crucial for the advancement of foundation models in high-stakes medical imaging applications.
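The summary metrics used in this comparison can be sketched as follows, assuming two feature matrices of shape (n_samples, n_dims); the exact definitions in the analysis may differ:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def feature_shift_metrics(feats_a, feats_b):
    """Compare two datasets' feature matrices: cosine similarity and
    Euclidean distance between the mean feature vectors, plus the mean
    per-dimension 1D Wasserstein distance between the distributions."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cos = float(mu_a @ mu_b / (np.linalg.norm(mu_a) * np.linalg.norm(mu_b)))
    euc = float(np.linalg.norm(mu_a - mu_b))
    w = float(np.mean([wasserstein_distance(feats_a[:, j], feats_b[:, j])
                       for j in range(feats_a.shape[1])]))
    return {"cosine": cos, "euclidean": euc, "wasserstein": w}
```

Note how a pure magnitude shift leaves cosine similarity near 1 while the Euclidean and Wasserstein terms grow, which is exactly the pattern of divergence described above.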
8. Practical Considerations for Using Public Brain MRI Datasets
When combining open-source datasets from many public repositories—many of which contain raw scans with artifacts and large differences in acquisition—machine learning models often struggle to learn useful patterns. Without consistent preprocessing, the noise and inconsistencies make training less effective. In addition, because hospitals and imaging centers use different storage formats and protocols, patient data can be split across systems, duplicated, or incomplete. These issues create misleading patterns and hide important clinical signals, even for advanced models.
To reduce these problems, researchers should standardize all scans with steps such as resampling to isotropic voxel spacing, intensity normalization, and consistent orientation. For datasets collected from multiple sites or using different protocols, it is important to use consistent sequence names, keep scanner and site information as metadata, and apply corrections like z-score normalization or bias-field correction when needed. Missing modalities should be handled with flexible model designs (e.g., modality-aware embeddings or placeholder channels). Balanced sampling or stratified batches can help prevent models from overfitting to specific subgroups in diverse datasets. Together, these choices support the creation of strong and reliable pretraining datasets for brain MRI foundation models.
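The placeholder-channel idea for missing modalities can be sketched minimally as follows; the canonical channel order here is hypothetical, chosen only for illustration:

```python
import numpy as np

# illustrative canonical channel order; a real pipeline would fix its own
MODALITIES = ["T1w", "T2w", "FLAIR", "DWI"]

def stack_with_placeholders(scans, shape):
    """Build a fixed-order multi-channel input from whatever modalities
    are available; missing ones get a zero placeholder channel plus a
    presence flag the model can read."""
    channels, present = [], []
    for m in MODALITIES:
        if m in scans:
            channels.append(scans[m])
            present.append(1.0)
        else:
            channels.append(np.zeros(shape, dtype=np.float32))
            present.append(0.0)
    return np.stack(channels), np.asarray(present, dtype=np.float32)
```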
Beyond preprocessing, strict standardization and quality checks—including automated error detection—are essential. Expert clinical validation is also critical. Statistical improvements alone are not enough: a model that performs well on one dataset may fail—or even cause harm—when applied in a different clinical setting.
9. Limitations
This review has several limitations that reflect the scope we intentionally defined for this study. First, we focused exclusively on adult structural MRI datasets. This allowed us to compare voxel geometry, intensity behavior, and preprocessing effects in a consistent way, but it means that diffusion MRI, fMRI, quantitative MRI, and pediatric datasets were not included. These modalities and populations involve different acquisition characteristics and would require separate, modality-specific analyses.
Second, the landscape of publicly available brain MRI datasets is shaped by what institutions are able and willing to release. Most datasets come from Western or other highly developed regions, which creates a geographic imbalance that reflects the current availability of open data rather than a choice made in this review. Public datasets also carry selection bias: they are often collected in research-oriented or well-resourced clinical environments, and therefore tend to include higher-quality scans and focus on certain conditions such as Alzheimer’s disease, brain tumors, and healthy adults. Other clinical populations—such as psychiatric disorders, multiple sclerosis, vascular diseases, and routine lower-quality scans—are less represented or absent. As a result, the dataset-level and image-level patterns observed here may not fully generalize to the broader clinical landscape.
Finally, this study did not evaluate annotation quality or perform model benchmarking. These analyses are important for understanding how dataset variability affects downstream performance but fall outside the scope of the present work. Future studies could extend this review by examining annotation consistency, including more diverse datasets, and assessing how different pretraining corpora influence model behavior.
10. Conclusions & Discussion
In this study, we conducted a structured, multi-level analysis of 54 publicly accessible adult structural brain MRI datasets to characterize the variability most relevant to foundation model development. At the dataset level, our review highlights substantial imbalance in both scale and diagnostic coverage: large healthy and aging cohorts dominate the landscape, whereas clinically complex populations such as stroke, multiple sclerosis, and psychiatric disorders remain comparatively underrepresented. This skewed distribution implies that naïvely aggregated pretraining corpora may disproportionately reflect healthy anatomical priors while providing limited exposure to subtle or heterogeneous pathology.
Image-level profiling further revealed considerable heterogeneity in voxel spacing, orientation conventions, and intensity distributions across datasets. Although many research collections share near-isotropic resolution, clinical datasets frequently exhibit strong anisotropy and mixed orientation formats. Intensity statistics likewise vary systematically between datasets, as confirmed by the Kruskal–Wallis analysis. These findings indicate that foundational representation learning is shaped not only by biological variation but also by acquisition- and site-dependent factors that can introduce measurable covariate shifts.
Our quantitative evaluation of preprocessing pipelines demonstrates that standard steps such as bias-field correction, intensity normalization, skull stripping, registration, and interpolation improve within-dataset consistency but do not fully harmonize cross-dataset distributions. The feature-space assessment using an ImageNet-pretrained DenseNet121 supports this observation: even after full preprocessing, non-trivial divergence persists between datasets, suggesting that harmonization must extend beyond conventional preprocessing. This underscores the need for preprocessing-aware architectures, modality-robust sampling, and domain adaptation strategies capable of handling real-world variability.
Together, our analyses provide an integrated view of dataset-, image-, and preprocessing-level variability and outline practical considerations for developing harmonized, robust, and generalizable brain MRI foundation models.