Artificial Intelligence Approaches for the Detection of Normal Pressure Hydrocephalus: A Systematic Review

Mercado-Diaz, Luis R.; Prakash, Neha; Gong, Gary X.; Posada-Quintero, Hugo F.

doi:10.3390/app15073653

¹ Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06269, USA
² Parkinson’s Disease and Movement Disorders Center, Department of Neurology, University of Connecticut Health Center, Farmington, CT 06269, USA
³ Institute for Neurodegenerative Disorders, New Haven, CT 06510, USA
⁴ XingImaging LLC, New Haven, CT 06510, USA
⁵ Division of Neuroradiology, Department of Radiology, University of Connecticut Health Center, Farmington, CT 06269, USA
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(7), 3653; https://doi.org/10.3390/app15073653

Submission received: 24 February 2025 / Revised: 18 March 2025 / Accepted: 24 March 2025 / Published: 26 March 2025

(This article belongs to the Special Issue Application of Machine Learning and Artificial Intelligence in Human-Computer Interaction)

Abstract:

Normal pressure hydrocephalus (NPH) is a neurological disorder characterized by altered cerebrospinal fluid accumulation in the brain’s ventricles, leading to symptoms such as gait disturbance and cognitive impairment. Artificial intelligence (AI), including machine learning (ML) and deep learning (DL), shows promise in diagnosing NPH using medical images. In this systematic review, we examined 21 papers on the use of AI in detecting NPH. The studies primarily focused on differentiating NPH from other neurodegenerative disorders, such as Parkinson’s disease and Alzheimer’s disease. We found that traditional ML methods like Support Vector Machines, Random Forest, and Logistic Regression were commonly used, while DL methods, particularly Deep Convolutional Neural Networks, were also widely employed. The accuracy of these approaches varied, ranging from 70% to 95% in differentiating NPH from other conditions. Feature selection techniques were used to identify relevant parameters for diagnosis. MRI scans were more frequently used than CT scans, but both modalities showed promise. Evaluation metrics like Dice similarity coefficients and ROC-AUC were the most typical metrics of model performance. Challenges in implementing AI in clinical practice were identified, and the authors suggested that a hybrid deep-traditional ML framework could enhance NPH diagnosis. Further research is needed to maximize the benefits of AI while addressing limitations.

Keywords:

deep learning; machine learning; normal pressure hydrocephalus

1. Introduction

Normal pressure hydrocephalus (NPH) is a neurological disorder that affects the ventricles of the brain [1], leading to the accumulation of cerebrospinal fluid (CSF) with preserved CSF pressure. NPH can cause various symptoms, including gait disturbance, urinary incontinence, and cognitive impairment. It is estimated that NPH affects up to 700,000 people in the United States alone [2]. Although NPH is relatively common, occurring in approximately 1 in 1000 people over the age of 65 [3,4,5], it is often misdiagnosed and currently has no cure. However, the condition can be treated with CSF diversion surgeries where a shunt tube is surgically implanted to drain excess CSF from the brain to another compartment.

Shunt surgery has proven to be highly effective in treating NPH and can lead to significant improvement in symptoms. However, the success of the treatment depends on an accurate and early diagnosis, which can be challenging due to the absence of a definitive test [6]. Artificial intelligence (AI) has shown promising results in the detection of various diseases, including NPH, based on radiological images [7,8,9]. In this review, we aimed to examine the different AI approaches used for NPH detection utilizing radiological images since 2010, and to analyze the challenges, pitfalls, and opportunities in this field.

1.1. Background

1.1.1. Normal Pressure Hydrocephalus

NPH was initially described in 1965 by Hakim and Adams [2]. They identified a triad of symptoms that are now recognized as characteristic of NPH: gait disturbance, urinary incontinence, and dementia. Individuals with NPH typically exhibit normal CSF opening pressure, lending to its name “normal pressure hydrocephalus” [10].

The exact etiology of NPH remains elusive. NPH is largely categorized into two main types: idiopathic NPH (iNPH) and secondary NPH. iNPH is the most common type and occurs in individuals with no identified secondary cause. On the other hand, secondary NPH can develop in response to intracranial hemorrhage, infection, or other causes that are believed to damage the CSF drainage pathways via inflammatory response, leading to scarring and/or blockage with eventual CSF accumulation. Even though iNPH is the most common and extensively studied type, our review focuses on NPH as an overarching term encompassing all types for the purpose of generalization.

NPH is a clinico-radiological diagnosis that relies on the presence of clinical findings supported by radiological markers of NPH [11]. Additional invasive testing, such as cisternogram or CSF infusion testing, has been utilized to aid in diagnosis. However, the confirmatory tests are invasive and include large volume lumbar puncture or lumbar drain; the tests look for transient improvement in clinical symptoms after the removal of fluid. If appropriately utilized, a positive response is usually associated with higher positive predictive value for improvement with CSF diversion surgery.

1.1.2. Diagnostic Conundrum

NPH symptoms can vary from person to person but typically include the triad of gait disturbance, urinary incontinence, and cognitive impairment. Other symptoms, such as headache and sleep and mood disturbances, particularly apathy, have been widely described for NPH [12]. However, these clinical findings are also commonly noted with other neurodegenerative disorders, such as Parkinson’s disease (PD) and Alzheimer’s disease (AD). Furthermore, the gait and cognitive profile in NPH share similarities with other degenerative parkinsonian disorders such as PD and progressive supranuclear palsy (PSP). To further complicate the matter, questions about co-pathology with other neurodegenerative disorders have been raised. As such, clinical overlaps can lead to missed or delayed diagnoses. This overlap also extends to radiological markers. MRI brain and CT head are commonly utilized imaging modalities, with variable use of MRI sequences, such as CISS or CSF flow, to aid in NPH diagnosis. A few radiological markers for NPH are commonly utilized in clinical practice; these include increased Evans index, disproportionately enlarged subarachnoid hydrocephalus (DESH), narrower callosal angles, and increased vertical plane Evans index, to name a few. Despite these widely known markers, their identification remains limited to the clinician’s experience.
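As an illustration of how one of these markers is quantified, the Evans index is conventionally defined as the ratio of the maximal frontal-horn width to the maximal inner skull diameter measured on the same axial slice. The sketch below assumes that definition and a commonly cited cutoff of 0.3 for ventriculomegaly; the cutoff and the measurements are assumptions for illustration and are not taken from the studies in this review.

```python
def evans_index(frontal_horn_width_mm: float, inner_skull_width_mm: float) -> float:
    """Ratio of maximal frontal-horn width to maximal inner skull diameter."""
    if frontal_horn_width_mm <= 0 or inner_skull_width_mm <= 0:
        raise ValueError("both widths must be positive")
    return frontal_horn_width_mm / inner_skull_width_mm


def suggests_ventriculomegaly(index: float, threshold: float = 0.3) -> bool:
    """Flag an index above the threshold (0.3 is an assumed, commonly cited cutoff)."""
    return index > threshold


# Hypothetical measurements in millimetres, for illustration only.
ei = evans_index(42.0, 128.0)
print(f"Evans index = {ei:.3f}, above cutoff: {suggests_ventriculomegaly(ei)}")
```

An elevated index on its own does not establish NPH; as the text notes, it is one marker among several and must be read alongside the clinical picture.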

Several studies have assessed the diagnostic accuracy of MRI and CT scans for NPH. For instance, a study published in the journal Neurology in 2016 reported that the diagnostic accuracy of MRI for NPH was approximately 75% [13], while CT scans had a diagnostic accuracy of 70% [14]. Furthermore, a 2019 study published in the journal Neurosurgery reported that the sensitivity and specificity of MRI for NPH were 86% and 80%, respectively, while the sensitivity and specificity of CT scans were 70% and 78%, respectively [10,15,16].

Similarly, lumbar punctures or lumbar drain tests have limitations, with a reported 85% diagnostic accuracy [14,17]. NPH is also unique in that the definite category of diagnosis depends on the response to treatment itself, creating an additional diagnostic conundrum. And while surgery can largely benefit people with NPH, it is also associated with higher adverse events throughout a lifetime. Additionally, the response could be minimal to negative in some NPH cases. As such, there is a continued need for highly sensitive and specific non-invasive diagnostic tests to identify NPH and predict surgical response.

1.1.3. Artificial Intelligence

There has been a growing interest in the application of AI techniques, including machine learning (ML) and deep learning (DL), for the diagnosis of NPH and the prediction of surgical response. AI refers to the development of intelligent systems that can perform tasks typically requiring human intelligence. It encompasses various techniques and algorithms aimed at enabling machines to perceive, reason, learn, and make decisions [18]. Two fundamental concepts within AI are ML and DL. ML is a subset of AI that focuses on algorithms and statistical models that enable computers to learn from data and make predictions or decisions without being explicitly programmed [19]. It involves the training of models on large datasets, where patterns and relationships are automatically discovered. Common ML applications include regression, classification, clustering, and reinforcement learning. ML encompasses traditional methods, such as Support Vector Machines (SVM), Decision Trees (DT), and shallow neural networks (NN). These models are trained using algorithms that optimize their performance based on objective functions and evaluation metrics [18]. DL is a specialized form of ML, inspired by the structure and function of the human brain’s neural networks. It involves the use of artificial neural networks composed of multiple layers of interconnected nodes (neurons) [20]. These networks are capable of automatically learning hierarchical representations of data by progressively extracting complex features at each layer. DL has demonstrated exceptional performance in areas such as image and speech recognition, natural language processing, and autonomous driving [9].

More specifically, ML algorithms are designed to automatically learn patterns and make predictions based on features obtained from the data [21], while DL algorithms utilize layers of Artificial Neural Networks (ANN) to extract hierarchical representations from raw data [18]. As noted, feature extraction plays a crucial role in training ML models. In the context of NPH, features can be categorized into structural, functional, and clinical features. Structural features describe the physical structure of the brain and are derived from medical images. Prominent structural features used for NPH diagnosis include ventricle size and shape, CSF volume, and white-matter integrity. Various image processing techniques, such as segmentation and shape analysis, are used to extract these features from brain images [22,23,24]. Clinical features include demographic information (e.g., age, sex), specific symptoms associated with NPH (e.g., gait disturbances, cognitive impairments), and relevant medical history [25]. Functional features are often derived from various imaging techniques and physiological measures, allowing researchers to examine the functional aspects of neurodegenerative diseases (e.g., resting state functional connectivity, task-based functional activation). Other functional features are obtained from a cognitive evaluation (e.g., cognitive functions, memory, attention, language, and executive functions). These features provide functional measures of cognitive abilities affected by neurodegenerative diseases [26].

Some studies combine features from several categories (structural, functional, and clinical). These multimodal methods typically enhance the detection, diagnosis, or segmentation of NPH. Although ML methods have been widely used, DL algorithms have recently shown remarkable success in various medical imaging tasks, including NPH diagnosis [27,28,29]. In this review, we have considered both ML and DL approaches to explore their potential in NPH diagnosis [28,30].

This systematic review aimed to comprehensively evaluate and synthesize the current landscape of artificial intelligence applications in the detection and diagnosis of Normal Pressure Hydrocephalus (NPH). Specifically, the review addressed the following objectives:

To identify and categorize the range of AI techniques (machine learning and deep learning) employed for NPH detection and diagnosis using radiological imaging.
To assess the diagnostic performance of AI algorithms in differentiating NPH from other neurological conditions (including Alzheimer’s disease, Parkinson’s disease, and Progressive Supranuclear Palsy) and healthy controls.
To determine which radiological features and imaging biomarkers are most valuable for AI-based NPH diagnosis across different imaging modalities (MRI and CT).
To evaluate the methodological quality and validation approaches in current AI studies focused on NPH.
To identify knowledge gaps, challenges, and opportunities for improving AI applications in NPH diagnosis and for predicting treatment responses to CSF diversion surgeries.
To provide recommendations for future research directions that could enhance the clinical translation of AI tools for NPH management.

By addressing these objectives, this review aims to provide clinicians and researchers with a structured understanding of how AI techniques can be optimally leveraged to improve the often challenging and delayed diagnosis of NPH, potentially leading to earlier intervention and better patient outcomes.

2. Methods

This study aims to provide a comprehensive overview of AI tools used in the detection and diagnosis of NPH. The objective is to present a detailed analysis of relevant studies that utilize a wide range of methods, specifically focusing on segmentation, treatment monitoring, confirmation, and disease progression. We sought to analyze the relevance and impact of each work, presenting a comprehensive summary of its contribution to the field. Figure 1 summarizes the procedure used in this study. This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

2.1. Literature Search

To gather reliable information on studies related to the diagnosis, detection, prediction, or evolution of NPH, we conducted an extensive search of high-quality databases. We utilized PubMed, Scopus, ACM Digital Library, and IEEEXplore, which are known for their comprehensive coverage of biomedical, life sciences, engineering, and technology research. The databases were searched from inception to 20 February 2025.

2.2. Search Query

We employed a search query consisting of fifteen specific keywords essential for understanding NPH and its diagnosis: “Normal pressure hydrocephalus”, “NPH”, “diagnosis”, “detection”, “Machine learning”, “ML”, “Deep Learning”, “DL”, “Feature Generation”, “segmentation”, “Magnetic Resonance Imaging”, “MRI”, “Computed Tomography Scan”, “CT scan”, and “image analysis”. To ensure a comprehensive search, we considered the full article, including the title, abstract, and body. We used the logical AND and OR operators to account for variations in synonyms and abbreviations, without relying on quotation marks, as follows:

(“Normal Pressure Hydrocephalus” OR “NPH”) AND (“Computed Tomography” OR “CT Scan” OR “Magnetic Resonance Imaging” OR “MRI”) AND (“Machine Learning” OR “ML” OR “deep learning” OR “DL” OR “Feature generation” OR “risk assessment” OR “safety-critical systems”) AND (“Diagnosis” OR “Detection” OR “Segmentation” OR “clinical validation”) AND Date: (2010/1/1:2025/02/20)
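The structure of this query (synonyms joined with OR inside each concept group, groups joined with AND) can be assembled programmatically, which makes it easier to keep the term lists consistent across databases. The sketch below is a minimal illustration under that assumption; the helper names are hypothetical, straight quotes are used in place of the typeset ones, and the date filter is omitted because its syntax differs between databases.

```python
def or_group(terms):
    """Join synonyms into a parenthesised OR group with each term quoted."""
    quoted = [f'"{term}"' for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups):
    """Join concept groups with AND so every concept must be matched."""
    return " AND ".join(or_group(group) for group in groups)


# Concept groups copied from the query above.
GROUPS = [
    ["Normal Pressure Hydrocephalus", "NPH"],
    ["Computed Tomography", "CT Scan", "Magnetic Resonance Imaging", "MRI"],
    ["Machine Learning", "ML", "deep learning", "DL", "Feature generation",
     "risk assessment", "safety-critical systems"],
    ["Diagnosis", "Detection", "Segmentation", "clinical validation"],
]

query = build_query(GROUPS)
print(query)
```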

2.3. Filtering and Selection

Data extraction was performed independently by two reviewers (L.R.M.-D. and G.X.G.) using a standardized data extraction form developed a priori. The form was pilot-tested on three randomly selected included studies and refined accordingly. Extracted data were cross-checked for accuracy, with any discrepancies resolved through discussion. When necessary information was unclear or missing from published reports, we attempted to contact the study authors via email, with a maximum of three attempts over a four-week period.

After obtaining the initial results, we meticulously reviewed each article, discarding those that did not meet the predefined criteria or were unrelated to the specific analysis methods or application areas of interest. The selection criteria used for article filtering included the following:

The article must have been published in a scientific database.
The article must have been an empirical study, research article, or case study.
The article must have focused on AI techniques for NPH detection and diagnosis using MRI or CT scans.
The article must have included sufficient methodological details.

Articles that were solely bibliographic reviews or that did not meet all the above criteria were excluded from the study. The purpose of these criteria was to ensure that the articles included in the study were high-quality, empirical studies that could provide valuable insights into the use of AI techniques for NPH detection and diagnosis. By excluding bibliographic reviews and articles that were not empirical studies, research articles, or case studies, we ensured that the included articles were based on real-world data and provided concrete examples of how AI techniques could be used to improve NPH detection and diagnosis. We also excluded articles that did not include sufficient methodological details, because such details are what allow other researchers to replicate the results of a given study.

Two independent reviewers (L.R.M.-D. and H.F.P.-Q.) screened all titles and abstracts identified from the database searches against the predefined eligibility criteria. Full-text articles of potentially relevant studies were then retrieved and independently assessed by the same reviewers. Any discrepancies at either stage were resolved through discussion, with a third reviewer (N.P.) consulted when consensus could not be reached. The inter-rater agreement was calculated using Cohen’s kappa coefficient, yielding substantial agreement (κ = 0.82).
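For readers unfamiliar with the statistic, Cohen’s kappa corrects the observed agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it from two lists of screening decisions; the decisions are hypothetical and do not reproduce the screening data behind the reported κ = 0.82.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical include/exclude decisions for ten abstracts.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```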

Due to the anticipated heterogeneity in AI methodologies, imaging modalities, outcome measures, and study populations, we planned a priori to explore potential sources of heterogeneity through narrative analysis rather than statistical methods. We specifically examined variability across the following domains: (1) AI approach (ML vs. DL vs. hybrid); (2) imaging modality (MRI vs. CT); (3) control group composition (healthy controls vs. disease-specific controls); (4) key radiological features utilized; and (5) sample size and population characteristics. These factors were systematically compared across studies to identify patterns that might explain differences in diagnostic performance.

Given the heterogeneity of the included studies and our narrative synthesis approach, formal sensitivity analyses were not performed. However, we conducted a qualitative assessment of the robustness of findings by examining how results varied across studies with different methodological quality characteristics, particularly focusing on validation methods (e.g., studies using external validation vs. cross-validation only) and sample size (larger vs. smaller studies).

To assess the certainty of evidence for key outcomes, we adapted the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) approach for diagnostic accuracy studies to the AI context. For each main comparison (e.g., ML vs. DL approaches, MRI vs. CT-based methods), we considered the following domains: risk of bias (using our modified QUADAS-2 assessment), inconsistency (unexplained heterogeneity in results), indirectness (applicability of findings to the review question), imprecision (sample size and confidence intervals when reported), and publication bias. The certainty of evidence was categorized as high, moderate, low, or very low, reflecting our confidence that the true effect lies close to the estimated effect.

Our review has several methodological limitations. First, despite our comprehensive search strategy, we may have missed relevant studies, particularly those published in non-indexed journals or in languages other than English. Second, our assessment of the risk of bias was hampered by incomplete reporting in many of the included studies, particularly regarding patient selection and reference standards. Third, the rapid evolution of AI techniques means that newer approaches may not be adequately represented in our review. Finally, our inability to conduct a meta-analysis due to heterogeneity limits our ability to provide precise estimates of diagnostic accuracy for different AI approaches.

2.4. Article Selection

The search results from the different databases yielded varying numbers of articles. Scopus returned a total of 189 articles; 102 remained after removing review articles, and 10 remained after filtering out articles that did not match the defined keywords. ScienceDirect returned 40 articles, of which 2 remained after removing reviews. PubMed initially returned 11 articles that met the criteria, of which 6 remained after excluding review articles. ACM returned 148 articles, of which 104 remained after removing review articles; further examination identified 2 articles that contained relevant information associated with the search criteria. Similarly, IEEEXplore returned 36 articles, 17 remained after the review filter, and ultimately only 1 article met all the inclusion criteria. Figure 2 illustrates the entire search approach followed in this study, how we located 21 articles to review, and the trend analysis [31].
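The per-database counts can be checked against the reported total with a simple tally. The sketch below assumes that the final figure quoted for each database is the number of articles it contributed to the review and that no article was counted in more than one database; the text does not state either point explicitly.

```python
# Counts transcribed from the paragraph above: (articles returned, articles retained).
SCREENING = {
    "Scopus": (189, 10),
    "ScienceDirect": (40, 2),
    "PubMed": (11, 6),
    "ACM": (148, 2),
    "IEEEXplore": (36, 1),
}

total_returned = sum(returned for returned, _ in SCREENING.values())
total_included = sum(retained for _, retained in SCREENING.values())

for name, (returned, retained) in SCREENING.items():
    print(f"{name:<14} returned={returned:>3} retained={retained:>2}")
print(f"Total returned: {total_returned}, total retained: {total_included}")
```

Under these assumptions the retained counts sum to the 21 articles reported in the review.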

2.5. Risk of Bias

To assess the risk of bias in the included studies, we employed a multi-faceted evaluation framework tailored to AI applications in medical imaging. Each study was systematically assessed for methodological quality and potential sources of bias using criteria adapted from the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool, modified to address the specific requirements of AI-based diagnostic studies. Our assessment focused on four key domains: patient selection, index test (AI algorithm), reference standard, and flow and timing.

We evaluated each study for potential selection bias by examining inclusion/exclusion criteria, patient recruitment methods, and demographic representation. Technical bias was assessed by examining model architecture description completeness, input data preprocessing, feature selection justification, and hyperparameter optimization approaches. For validation rigor, we examined whether studies utilized appropriate cross-validation techniques, independent test sets, or external validation cohorts. Additionally, we assessed reporting transparency, including whether studies provided comprehensive performance metrics, confidence intervals, and limitations of the AI algorithms. Studies with insufficient methodological details, inadequate validation protocols, or potential confounding factors were noted as having a higher risk of bias.

2.6. Synthesis of Results

We employed a narrative synthesis approach to analyze and present the findings from the included studies, because the heterogeneity in AI methodologies, imaging modalities, and outcome measures precluded a formal meta-analysis. The results were synthesized and organized into categories based on the AI methodology employed (ML vs. DL), diagnostic tasks performed (classification, segmentation, and feature extraction), and imaging modalities utilized (MRI, CT).

For each category, we systematically extracted and tabulated key information, including study design, sample size, participant characteristics, AI algorithm specifications, radiological features utilized, performance metrics, and limitations. The performance results were presented using standardized metrics where possible (accuracy, sensitivity, specificity, ROC-AUC, and Dice coefficients) to facilitate cross-study comparisons. To enhance interpretability and highlight patterns across studies, we developed comparative tables detailing the control groups used, specific radiological features examined, and feature/radiomics approaches employed. This structured synthesis enabled comprehensive evaluation of the current state of AI applications for NPH diagnosis while identifying methodological trends, knowledge gaps, and promising directions for future research.
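The standardized metrics named above have simple definitions that can be written out directly. The sketch below computes accuracy, sensitivity, and specificity from binary labels (1 = NPH, 0 = not NPH) and the Dice similarity coefficient from two binary segmentation masks; all inputs are hypothetical, and the helper names are illustrative rather than drawn from any included study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks of equal length."""
    overlap = sum(a == 1 and b == 1 for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 1.0 if size == 0 else 2.0 * overlap / size


# Hypothetical labels and masks, for illustration only.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
dice = dice_coefficient([1, 1, 1, 0, 0], [0, 1, 1, 1, 0])
print(metrics, round(dice, 3))
```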

The methodology was established a priori by the review team based on the PRISMA 2020 guidelines and standard systematic review methodology for diagnostic test accuracy studies. The data extraction template used in this review and the complete dataset of extracted information from included studies are available upon reasonable request from the corresponding author. No specialized analytic code was required for this narrative synthesis.

3. Results

We identified, reviewed, and analyzed a total of twenty-one relevant articles for this study. The articles were selected based on our search methodology and contributed significantly to understanding NPH diagnosis using AI techniques. Table 1 presents a comprehensive summary of all included studies, their methodologies, and key performance metrics.

Our analysis revealed distinct methodological distributions among the studies: eleven articles (52.4%) employed machine learning (ML) methods, including LogReg [32,33,34,35,36], SVM [37,38], random forest (RF) [30,37,39,40,41,42], XGBoost (XGB) [37,42], multilayer perceptron (MLP) [42], Gaussian naive Bayes (GaussNB) [42], adaptive boosting (AdaBoost) [42], and other ML methods. Seven articles used DL methods, with deep convolutional neural networks (DCNNs) [27,28,30,39,43,44,45] being the most common (used in six studies or 28.6% of the total). Notably, three studies (14.3%) implemented hybrid methodologies combining both ML and DL techniques, indicating an emerging trend toward integrated approaches.

The distribution of AI methods reflects the evolving landscape of computational approaches for NPH diagnosis—from traditional ML algorithms that rely on predefined features to more advanced DL methods capable of automatic feature extraction. This methodological diversity underscores the complexity of NPH diagnoses and the varied approaches researchers are employing to address this challenge.

Table 1. Summary of AI methods and performance for NPH detection.

Study	AI Method	Task	Dataset	Performance Metrics	Key Features
Traditional ML Methods
Rau et al. (2021) [39]	SVM	Classification (NPH vs. HC)	MRI scans	Accuracy: 0.93, ROC-AUC: 0.99	Periventricular regions, lateral ventricles, Sylvian fissures
Xu et al. (2022) [38]	ANN, RF, SVM, XGB	Classification (NPH vs. HC)	MRI scans	ROC-AUC: 0.96 ± 0.05 (ANN), 0.96 ± 0.06 (RF), 0.94 ± 0.05 (SVM), 0.94 ± 0.07 (XGB)	Evans ratio, frontal horns’ length
Bianco et al. (2022) [41]	LogReg	Classification (NPH vs. AD)	MRI	Accuracy: 89.6–94.3%, ROC-AUC: 0.96–0.99	Evans index, callosal angle, disproportionate sulci, volumetrics
Vlasak et al. (2022) [43]	RF, RF + 3D MCV, RF + MGAC	Brain segmentation	CT scans	Ventricles: 84%, Gray-white matter: 87%, Subarachnoid space: 35%	MRI phase contrast features
Iida et al. (2021) [34]	LogReg	Analysis of NPH	MRI	Accuracy not reported	Parkinsonism subtype, midbrain dimensions
Zhang et al. (2022) [40]	RF, RF + 3D MCV, RF + MGAC	Brain segmentation	CT scans	Ventricles: 84%, Gray-white matter: 87%, Subarachnoid space: 35%	Ventricles, Gray-white matter, Subarachnoid space
Galeano et al. (2011) [46]	Correlation-based feature selection	Classification	ICP signals, CT	Accuracy: 85.7%	Skewness of Single-Wave Amplitude, P1 subpeak amplitude, Leading Edge Slope
Griffa et al. (2022) [36]	Discriminant analysis	Classification	Phase-contrast MRI	Accuracy: 58–77% (controls), 16–84% (CVD), 11–75% (NPH)	CSF flow/velocity at aqueduct of Sylvius
Deep Learning Methods
Irie et al. (2020) [29]	DCNN	NPH classification	MRI	Accuracy: 99.1%, Sensitivity: 98.5%, Precision: 98.2%	Color-based transformation features
Mao et al. (2022) [45]	DCNN	Hydrocephalus classification	MRI	Accuracy: 99.1%, Sensitivity: 98.5%, Precision: 98.2%	Color-based transformation features
Tsou et al. (2021) [44]	MultiResUNet, UNet	ROI segmentation	Phase-contrast MRI	DSC: 0.95–0.96, ICC: 0.99, Pearson: 0.99	Ventricular volume, intracranial volume
Haber et al. (2022) [28]	DCNN	NPH detection	CT	Sensitivity: 100%, Specificity: 89%, ROC-AUC: 0.96	CT image features
Rudhra et al. (2021) [31]	DCNN, MultiResUNet, UNet	ROI segmentation	MRI	DSC: 0.933, ICC: 0.95, Pearson: 0.95	Watershed segmentation features
Hybrid Methods
Rudhra et al. (2021) [31]	DCNN + traditional ML segmentation	Hydrocephalus classification	MRI	DSC: 0.87–0.93	Feature maps from ML segmentation
Zhang et al. (2022) [40]	RF + MGAC	Segmentation	CT	Ventricles: 84%, Gray-white matter: 87%, Subarachnoid space: 35%	Ventricles, gray-white matter, subarachnoid space

3.1. ML Methods

Traditional ML methods for NPH detection can be categorized into three main functional areas: classification approaches for differential diagnosis, segmentation and feature analysis techniques, and key feature identification. This organizational structure reflects the progressive stages of ML application in NPH diagnosis: first establishing accurate classification between NPH and other conditions, then developing methods to segment and analyze relevant brain structures and finally identifying the most discriminative features for diagnosis.

The classification approaches (Section 3.1.1) address the clinical challenge of differentiating NPH from conditions with similar presentations, such as Alzheimer’s disease and Parkinson’s disease, as well as from healthy controls. These methods employ supervised learning algorithms trained on labeled datasets to establish decision boundaries between diagnostic categories. We examine these approaches based on their comparative diagnostic targets (healthy controls, AD, PSP, or multiple conditions), which represent different clinical scenarios with varying levels of diagnostic complexity.

The segmentation and feature analysis techniques (Section 3.1.2) focus on isolating and quantifying relevant brain structures and CSF dynamics. These methods address the technical challenge of accurately delineating ventricles, brain parenchyma, and subarachnoid spaces—crucial regions for NPH diagnosis—and extracting meaningful measurements from these segmentations. We analyze these approaches based on their anatomical targets and technical methodologies. The key features section (Section 3.1.3) synthesizes findings across studies to identify the most diagnostically valuable measurements and biomarkers. This analysis helps establish which parameters should be prioritized in future ML model development for NPH diagnosis and provides insight into the underlying pathophysiological mechanisms of the condition. Together, these three areas provide a comprehensive view of traditional ML applications in NPH detection, from the clinical challenge of differential diagnosis to the technical challenges of image analysis and feature extraction.

3.1.1. Purpose-Based Classification

Diagnostic Differentiation

NPH vs. Healthy Controls

One study employed an SVM to detect NPH patterns in MRI scans, achieving an accuracy of 0.93 and an ROC-AUC of 0.99 [38]. These metrics represent exceptional discriminative capability that approaches the performance of specialized neuroradiologists, highlighting the potential of ML algorithms to serve as reliable diagnostic support tools for NPH detection. The algorithm demonstrated the highest discriminative power in specific brain regions, including periventricular regions, lateral ventricles, and Sylvian fissures, which are areas known to be crucial for NPH diagnosis in clinical practice. This alignment between the algorithm’s focus and established radiological markers supports the clinical relevance of the approach.

The results of the same study suggest that MRI morphometry and advanced image processing may capture hallmarks of NPH that are useful for developing AI systems, especially when fused into a multiparametric approach [38]. However, the complexity of NPH means that common measures and segmentation algorithms alone may be insufficient; determining optimal signatures for diagnosis and monitoring will require systematically investigating which features, in which combinations, provide the greatest discriminatory power.

Another study developed multiple ML models to explore the effectiveness of commonly used morphological parameters for NPH diagnosis [37]. Their feature importance analysis revealed that the most heavily weighted diagnostic features were Evans ratio (24.79%), frontal horn length (10.64%), and disproportionately enlarged subarachnoid space hydrocephalus, quantifying for the first time the relative importance of these clinical markers. When comparing model performance, the artificial neural network achieved the highest ROC-AUC of 0.96 ± 0.05, marginally outperforming random forest (0.96 ± 0.06), SVM (0.94 ± 0.05), and XGBoost (0.94 ± 0.07). The minimal performance differences between these diverse algorithms (all within 2 percentage points) suggest that feature selection may be more critical than algorithm choice for accurate NPH diagnosis.
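Because ROC-AUC is the headline metric in these comparisons, a minimal sketch of how it can be computed may help. The snippet below uses the rank interpretation of AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the labels and scores are illustrative assumptions, not data from any reviewed study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = NPH in this sketch).
    scores: iterable of classifier scores, higher meaning "more likely NPH".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three NPH cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. Values near 0.96, as reported above, mean that almost every such pair is ranked correctly.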

Another team implemented a multinomial LogReg algorithm to predict probabilities across three different diagnoses: healthy elderly controls, NPH, and Alzheimer’s disease (AD) [32]. This approach represents an important advance over binary classifiers, addressing the clinical reality where differential diagnosis across multiple conditions is often required. Their model demonstrated exceptional classification accuracy of 96.3%, with only two misclassifications out of 15 NPH cases. This impressive performance indicates that even relatively simple ML algorithms, when provided with appropriate features, can effectively differentiate between clinically similar neurodegenerative conditions. The model’s ability to provide confidence values for each case (averaging 97% for NPH diagnoses) offers an additional dimension valuable for clinical decision support. The high sensitivity (100%) and specificity (87%) in the binary LogReg model further confirm the robustness of this approach for NPH-AD differentiation.
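The per-class confidence values mentioned above follow from how multinomial logistic regression works: each class receives a linear score, and a softmax converts the scores into probabilities that sum to one. The sketch below illustrates this mechanism only. The class order, the two standardized features, and all coefficients are hypothetical and are not taken from [32].

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """Multinomial logistic regression: one linear score per class, then softmax."""
    logits = [b + sum(w_i * x_i for w_i, x_i in zip(w, x))
              for w, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical setup: three classes and two standardized imaging features.
classes = ["control", "NPH", "AD"]
weights = [[-1.0, 1.0], [2.0, -2.0], [0.0, 0.0]]  # made-up coefficients
biases = [0.0, 0.0, 0.0]
x = [1.5, -1.2]  # made-up feature vector for one subject

probs = predict_proba(x, weights, biases)
predicted = classes[probs.index(max(probs))]
print(dict(zip(classes, (round(p, 4) for p in probs))), "->", predicted)
```

The largest probability plays the role of the confidence value reported for each case; in a fitted model the coefficients would be estimated from labeled training data rather than set by hand.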

NPH vs. AD

Another study proposed a comprehensive approach using FreeSurfer for brain segmentation followed by logistic regression trained on multiple MRI markers [36]. Their systematic integration of multiple imaging features (Evans index, callosal angle, disproportionate sulci, and volumetrics) yielded models with high accuracy (89.6–94.3%) and ROC-AUC values (0.96–0.99). These remarkably high performance metrics suggest that integrating multiple complementary features can substantially improve diagnostic accuracy compared to single-feature approaches. However, as the authors acknowledge, the limited sample size and exploratory nature of the study necessitate additional validation before clinical translation. This limitation is common across many studies in our review and highlights the importance of larger multicenter validation studies before implementing these promising AI approaches in clinical practice.

NPH vs. Progressive Supranuclear Palsy

A recent study used ML algorithms to differentiate between NPH and progressive supranuclear palsy based on cortical thickness and volumetric data [40]. Using RF, they achieved a ROC-AUC of 0.96 for distinguishing NPH from healthy controls, indicating excellent discriminative performance. This high performance is particularly valuable considering the clinical challenge of differentiating these conditions, which can present with similar movement disorders. Their finding of more severe and widespread cortical involvement in NPH compared to PSP could be attributed to the marked lateral ventricular enlargement characteristic of NPH, providing both a diagnostic marker and insight into disease mechanisms. This study effectively demonstrates how ML can not only improve diagnosis but also advance understanding of disease pathophysiology.

NPH vs. Multiple Conditions (AD vs. Healthy Controls)

One team conducted a comparative analysis of eight state-of-the-art ML algorithms applied to MRI features from 30 NPH patients and 15 healthy controls [40]. Their systematic performance comparison revealed a clear hierarchy among algorithms: AdaBoost and XGBoost demonstrated the highest accuracy (80.4% and 79.6%, respectively), while MLP showed the lowest (72.0%). This 8.4 percentage point performance gap between algorithms, despite using identical features and validation methodology, highlights the importance of algorithm selection for NPH classification. Notably, ensemble methods (AdaBoost, XGBoost, Random Forest) consistently outperformed single-model approaches, suggesting that combining multiple predictive models may better capture the complex patterns associated with NPH. The relatively modest accuracy across all methods (below 85%) also reflects the inherent difficulty of NPH diagnosis, even with advanced ML techniques.

One study investigated ventriculomegaly and NPH-like features in patients with myotonic dystrophy type 1 [33]. Their binomial LogReg analysis revealed that while age, gender, and CTG repeat numbers did not significantly affect the z-Evans Index, callosal angle and pathological brain atrophy showed significant associations (Adjusted Odds Ratio of 1.0, 95% CI, p < 0.01). These findings indicate that specific morphological features, rather than demographic or genetic factors, are most closely associated with ventriculomegaly in this patient population. The lack of association between enlarged perivascular spaces and the z-Evans Index suggests that not all radiological findings common to NPH have equal diagnostic value across different patient populations, which is an important consideration for developing robust AI diagnostic tools.

Another group measured various morphometric parameters and examined their association with neuropsychological tests in NPH patients [34]. Their LogReg analysis identified significant discriminative factors including Parkinsonism subtype, midbrain anteroposterior diameter, and midbrain tegmentum diameter, highlighting the importance of considering both morphological and clinical characteristics. Their finding that frontal horn length alone achieved 95.0% accuracy for predicting NPH, while combining frontal horn length and callosal angle improved accuracy to 96.3%, demonstrates the potential benefits of combining features for NPH diagnosis. Interestingly, adding further parameters (midbrain longitudinal length, Evans ratio, frontal horn ratio, and biparietal diameter) did not substantially improve accuracy, suggesting that feature selection should prioritize quality over quantity. This “less is more” principle could guide more efficient and generalizable AI model development for NPH.

3.1.2. Segmentation and Feature Analysis

Brain Region Segmentation

One study developed an automated segmentation and connectivity analysis method for identifying brain regions relevant to NPH diagnosis [39]. Their comparative analysis of multiple algorithms revealed substantial performance differences: their proposed method achieved segmentation accuracies of 84% for ventricles, 87% for gray-white matter, and 35% for subarachnoid space, significantly outperforming traditional techniques. The notably poor performance for subarachnoid space segmentation (35% accuracy) across all methods highlights a persistent technical challenge in NPH imaging analysis, as this region is crucial for diagnosing disproportionately enlarged subarachnoid space hydrocephalus (DESH). The substantial performance gap between ventricle/gray-white matter segmentation (84–87%) and subarachnoid space segmentation (35%) indicates that future methodological development should focus specifically on improving this challenging aspect of NPH-related image analysis.

Morphometric Parameter Evaluation

Another study employed an RF algorithm to predict NPH and evaluate six commonly used morphometric parameters [41]. Their feature importance analysis revealed that frontal horn length alone achieved 95.0% accuracy, the highest among all individual parameters. Surprisingly, combining all six parameters decreased accuracy to 89.3%, demonstrating that more parameters do not necessarily improve performance. This counterintuitive finding challenges the common assumption that incorporating more features improves ML performance and highlights the risk of overfitting when using redundant or irrelevant parameters. Their results suggest that ML algorithms for NPH diagnosis should be trained on carefully selected informative features rather than broadly incorporating all available measurements. This principle of parsimonious feature selection could improve both model accuracy and generalizability.

CSF Dynamics Analysis

A recent study investigated intracranial pressure signals and CT scans using correlation-based feature selection [47]. Their method achieved 85.7% classification accuracy (12 of 14 patients correctly classified) using just three features: Skewness of Single-Wave Amplitude, Skewness of P1 subpeak absolute amplitude, and Skewness Leading Edge Slope. The high accuracy achieved with this minimal feature set demonstrates the diagnostic power of CSF pressure dynamics for NPH classification. The method showed higher specificity (88.89% for non-NPH patients) than sensitivity (80% for NPH patients), indicating it may be more reliable for ruling out than confirming NPH. While limited by small sample size (14 patients), these promising results suggest that incorporating CSF pressure dynamics into AI diagnostic systems could complement traditional imaging-based approaches.
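The three figures quoted above are linked through a single confusion matrix, and the arithmetic can be checked with a short sketch. The counts used here (5 NPH and 9 non-NPH patients, with one error in each group) are an inference: they are the only split of 14 patients that matches the reported 80% sensitivity, 88.89% specificity, and 85.7% accuracy, and they are not stated explicitly in the text.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts (assumption): 4 of 5 NPH and 8 of 9 non-NPH patients correct.
sens, spec, acc = binary_metrics(tp=4, fn=1, tn=8, fp=1)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}, accuracy = {acc:.1%}")
```

With groups this small, a single additional misclassified NPH patient would lower sensitivity from 80% to 60%, which is why the sample size limitation matters when interpreting these percentages.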

Griffa et al. assessed the cerebrospinal fluid tap test as a prognostic tool for shunt surgery in NPH patients [36]. Their comparison of different predictive models revealed a clear advantage for multimodal approaches: the combined clinical + imaging model substantially outperformed both clinical-only and imaging-only models (out-of-sample accuracy: 0.70 vs. 0.57/0.63, ROC-AUC: 0.83 vs. 0.54/0.59). The statistically significant improvement (p = 0.028) achieved by combining modalities demonstrates the value of integrating diverse data types for NPH prediction. This finding has important implications for AI development, suggesting that models incorporating both imaging and clinical data will likely achieve higher diagnostic performance than those limited to a single data type. The impressive performance metrics of their integrated model (ROC-AUC of 1.0 [0.99–1.0] on the whole dataset) further support the potential of sophisticated multimodal ML approaches for NPH management.

Phase-Contrast MRI Analysis

One study used phase-contrast MRI to diagnose NPH and differentiate it from similar disorders [46]. Their discriminant analyses of CSF flow parameters showed substantial variability in classification accuracy: controls were classified with moderate accuracy (58–77%), while cerebrovascular disease and NPH patients showed lower accuracy (16–84% and 11–75%, respectively). The wide performance ranges indicate that some parameters are substantially more informative than others for diagnosis. The relatively poor performance for NPH classification (as low as 11% for some parameters) highlights the challenge of relying solely on phase-contrast MRI for definitive diagnosis. Their findings suggest that while phase-contrast MRI provides valuable information, it should be integrated with other imaging and clinical parameters rather than used in isolation for NPH diagnosis.

3.1.3. Key Features in ML Methods

Morphological parameters have proven to be robust indicators for NPH diagnosis across multiple studies. The callosal angle has emerged as a particularly reliable marker, with [33] demonstrating its significant association with the z-Evans Index in their analysis. Frontal horn length has shown remarkable diagnostic accuracy, with [37] reporting 95.0% accuracy using this parameter alone [41]. Midbrain dimensions, including anteroposterior and tegmentum diameter, have been identified as significant differentiating factors by [34]. The presence of disproportionately enlarged subarachnoid space hydrocephalus (DESH) has also been recognized as a key diagnostic feature, ranking among the top three most weighted diagnostic features in recent studies.
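For readers less familiar with these measurements, the conventional Evans ratio (Evans index) is the maximal width of the frontal horns of the lateral ventricles divided by the maximal internal diameter of the skull on the same axial slice, and values above roughly 0.3 are commonly taken to indicate ventriculomegaly. The sketch below shows that calculation only. The function name and the measurements are hypothetical, and it does not implement the z-Evans Index analyzed in [33].

```python
def evans_index(frontal_horn_width_mm, inner_skull_width_mm):
    """Conventional Evans ratio: frontal horn width / internal skull diameter.

    Both measurements are assumed to be taken on the same axial slice.
    """
    if frontal_horn_width_mm <= 0 or inner_skull_width_mm <= 0:
        raise ValueError("measurements must be positive")
    return frontal_horn_width_mm / inner_skull_width_mm


# Hypothetical measurements for one patient (not from any reviewed study).
ei = evans_index(frontal_horn_width_mm=42.0, inner_skull_width_mm=130.0)
enlarged = ei > 0.3  # commonly used cutoff for ventriculomegaly
print(f"Evans index = {ei:.3f}, ventriculomegaly suspected: {enlarged}")
```

A single ratio like this is easy to compute and interpret, which may partly explain why such parameters carry substantial weight in the feature importance analyses summarized above.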

Clinical parameters have provided valuable complementary information to imaging features in NPH diagnosis. CSF pressure measurements, analyzed through techniques such as infusion tests, have shown significant diagnostic potential, with [47] achieving 85.7% classification accuracy using pressure-derived features. Neuropsychological tests have been effectively incorporated into diagnostic models, with [35] demonstrating improved accuracy when combining these assessments with imaging features. Their combined clinical and imaging model achieved an impressive ROC-AUC of 0.83, significantly outperforming models using either type of feature alone. Gait assessment, while less commonly incorporated into ML models, has shown promise as an additional diagnostic parameter, particularly when combined with other clinical and imaging features [34,35].

3.2. DL Methods

Deep learning approaches for NPH detection generally operate without explicitly defined features, instead automatically extracting relevant patterns from imaging data. Irie et al. developed a fully automated 3D DCNN using whole-brain T1-weighted MRI from a balanced dataset of 23 NPH patients, 23 AD patients, and 23 controls [29]. Their approach achieved high diagnostic accuracy of 0.90, successfully classifying 21/23 NPH cases, 19/23 AD cases, and 22/23 controls. The model’s sensitivity and specificity for NPH were both 0.91, demonstrating balanced performance across diagnostic categories. Their innovative use of Gradient-weighted Class Activation Mapping (Grad-CAM) revealed that the model focused on diagnostically relevant regions: the brain parenchyma surrounding the lateral ventricle for NPH cases and the medial temporal lobe for AD cases. This alignment between the model’s attention areas and known disease-specific regions provides interpretability and validates the biological relevance of the DL approach.

Architecture Types

Convolutional Neural Networks

While [28] aimed to classify among three neurodegenerative disorders, [45] focused on classifying distinct types of hydrocephalus, including NPH. They aimed to use DL algorithms to eliminate manual steps such as pre-processing, segmentation, feature extraction, and classification. First, a color-based transformation is applied to pre-process the input images. The pre-processed images are then segmented by mean shift clustering, which provides a reliable and accurate estimate, and features are extracted using the Complete Local Binary Pattern. Finally, classification is performed by a DCNN with Emperor Penguin Optimization (EPO) to improve system efficiency; this optimization step is an addition to the usual DL workflow. The developed model achieved an accuracy of approximately 99.1%, a sensitivity of approximately 98.5%, and a precision of approximately 98.2%. Additionally, the average training and validation accuracies of the system in differentiating between the types of hydrocephalus were 84.75% and 87.25%, respectively. Reference [47] focused on developing an automated deep learning system to diagnose different hydrocephalus conditions. The study presented promising results in terms of classification performance and effectiveness.

We found that 3D CNNs are commonly used in the field. The study by Mao et al. analyzed the relationship between cerebrospinal fluid variation and NPH in patients with different brain injuries. They used a 3D DCNN approach for pattern recognition and image classification, with MRI scans collected from patients with different types of brain damage. A DL-based DCNN model was used to preprocess the image features, and offline training and online reconstruction were conducted after the model was constructed. Four databases were selected for deep learning and analyzed using principal component analysis and 3D scale-invariant feature transformation. The weighted histogram of gradient orientation descriptor achieved the highest ROC-AUC score. Scale-invariant feature transformation, principal component analysis, and weighted histogram of gradient orientation had sensitivities of 87.5%, 88.2%, and 90.1%, respectively, and specificities of 91.8%, 90.1%, and 94.2%, respectively. The weighted histogram of gradient orientation also had the highest correct rate at 92.4%. The CSF volume in the subarachnoid space was 77.04% higher than that in the ventricles [45].

3D CNNs offer significant advantages over 2D approaches for NPH diagnosis by capturing the full volumetric context of brain structures. Unlike 2D CNNs that process slices independently, 3D CNNs simultaneously model spatial relationships in all three dimensions (axial, sagittal, and coronal), preserving critical volumetric information about ventricular morphology and CSF distribution. This three-dimensional context is particularly important for NPH diagnosis, where the spatial relationship between ventricles, subarachnoid spaces, and surrounding brain tissue is diagnostically relevant. The 3D convolution operations extract features that represent spatial patterns across adjacent voxels in all directions, enabling more comprehensive analysis of subtle morphological changes that might be missed in slice-by-slice 2D analysis. This advantage was demonstrated in one study [44], where a 3D DCNN captured the relationship between CSF volume in the subarachnoid space and CSF volume in the ventricles (77.04% higher in the subarachnoid space), a volumetric biomarker that would be difficult to quantify accurately with 2D approaches.
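The difference between slice-wise and volumetric filtering can be illustrated with a toy example. The sketch below implements a plain "valid" 3D cross-correlation (the operation CNN layers call convolution) on nested lists and applies two hand-written kernels to a hypothetical volume whose intensity changes only from slice to slice. It is a didactic assumption-laden illustration, not the architecture of any reviewed model, and real networks learn their kernels rather than using fixed ones.

```python
def conv3d_valid(volume, kernel):
    """Valid 3D cross-correlation on nested lists indexed [depth][height][width]."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical 3x3x3 volume: every slice is uniform, but intensity rises with depth.
volume = [[[float(z)] * 3 for _ in range(3)] for z in range(3)]

# Depth-1 kernel: behaves like a 2D filter applied to each slice independently.
in_plane = conv3d_valid(volume, [[[-1.0, 1.0]]])

# Depth-2 kernel: spans neighbouring slices, so it sees through-plane change.
through_plane = conv3d_valid(volume, [[[-1.0]], [[1.0]]])

print("in-plane response:", in_plane[0][0])
print("through-plane response:", through_plane[0][0])
```

The slice-wise kernel responds with zeros everywhere because nothing changes within a slice, whereas the kernel spanning two slices detects the change along the third axis. This is the kind of volumetric context that the paragraph above attributes to 3D CNNs.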

Another study analyzed the relationship between CSF variation and hydrocephalus in patients with different cerebral injuries [44]. They collected MRI scans from brain-damaged patients and adopted a DCNN model to preprocess image features. They then conducted offline training and online reconstruction after constructing the model. Four databases were selected for DL analysis using principal component analysis and 3D scale-invariant feature transformation. The results showed that the weighted histogram of gradient orientation descriptor achieved the highest ROC-AUC score. Scale-invariant feature transformation, principal component analysis, and weighted histogram of gradient orientation achieved sensitivities of 87.5%, 88.2%, and 90.1%, respectively, while specificities were 91.8%, 90.1%, and 94.2%, respectively. The weighted histogram of gradient orientation method also had the highest correct rate at 92.4%. The study found that the CSF volume in the subarachnoid space was 77.04% higher than that in the ventricles. This study presented a DCNN approach to analyzing CSF variation in patients with brain injuries and hydrocephalus. The researchers used various image processing and classification techniques, finding that the weighted histogram of gradient orientation descriptor achieved the highest performance in differentiating subarachnoid fluid from ventricular fluid. The results suggest the effectiveness of DL algorithms like DCNNs for diagnosing hydrocephalus using MRI.

Another work using DL methods did not train a DCNN from scratch; instead, it used transfer learning, an approach that allows pre-trained DL models to be repurposed. The study used brain CT and MRI images from 143 NPH patients, which were manually labeled with ventricular volume and intracranial volume [44]. A multilabel segmentation model handled both thick-slice CT and MRI images, addressing the domain shift caused by their distinct image distributions. The encoder used a ResNet34 architecture pre-trained on ImageNet to extract features. During training, the objective function incorporated cross-entropy loss for thick-slice images and entropy loss for thin-slice images. Integrating the pre-trained ResNet34 encoder allowed the network to effectively learn texture and shape priors during initial training. The results showed the suitability of the AI-based method for accurate automatic measurement of ventricular volume in NPH patients. Statistical evaluations yielded a Dice similarity coefficient of 0.95 for ventricular volume, an intraclass correlation of 0.99, and a Pearson correlation of 0.99. For intracranial volume, the Dice similarity coefficient was 0.96, the intraclass correlation was 0.99, and the Pearson correlation was 0.99. Bland–Altman analysis indicated minimal bias between automatic and manual segmentations. These findings highlight the potential of ResNet-based DL approaches as an alternative analysis method for NPH, providing reliable measurements for clinical diagnosis and treatment.
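Since the Dice similarity coefficient is the main segmentation metric in these studies, a minimal sketch of its definition may be useful: twice the number of voxels shared by the two masks, divided by the total number of foreground voxels in both. The masks below are tiny hypothetical examples stored as flat lists, and the function name is an illustrative choice.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # convention used here: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical 8-voxel masks: manual (truth) versus automatic (pred) segmentation.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
dsc = dice_coefficient(pred, truth)
print(f"Dice similarity coefficient = {dsc:.2f}")
```

In this toy case three voxels overlap out of four in each mask, giving 0.75. A value of 0.95, as reported for ventricular volume, corresponds to near-complete overlap between automatic and manual masks.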

In one study, researchers proposed using phase-contrast MRI to quantify CSF flow across the cerebral aqueduct [43]. Two radiologists manually placed the regions of interest, and the proposed MultiResUNet and UNet DCNN algorithms were trained on these regions. This represents the first major difference from previous work: the aim was to delineate the region of interest by segmentation rather than to classify disorders (NPH, AD, healthy control). Another difference was the division of the dataset into 80% training and 20% validation sets. Segmentation was evaluated by calculating Dice similarity coefficients between the manual and DCNN-derived regions of interest. MultiResUNet, UNet, and the second radiologist (Rater 2) had Dice similarity coefficients of 0.933, 0.928, and 0.867, respectively, with p < 0.001 between DCNN and Rater 2. Comparing CSF flow parameters showed excellent intraclass correlation coefficients for MultiResUNet, with the lowest being 0.67. For UNet, lower intraclass correlation coefficients of -0.01 to 0.56 were observed. Only 3/353 (0.8%) studies failed to have an appropriate region of interest placed by MultiResUNet, versus 12/353 (3.4%) failed cases for UNet.

Several DL algorithms were compared in terms of their performance in classifying between PD and NPH. The ResNet34 transfer learning model achieved a sensitivity of 93.6%, specificity of 94.4%, and ROC-AUC of 93%. Transfer learning using DCNN segmentation achieved a Dice coefficient of 87%, a metric that measures the similarity between sets. The RUDOLPH model obtained a higher Dice coefficient of 0.93. Additionally, the FreeSurfer and MALP-EM model achieved Dice coefficients of 0.72 and 0.90, respectively. The proposed methodology in the study combined Watershed segmentation with a convolutional neural network. This achieved promising results with 97% accuracy, 100% specificity, 96% sensitivity and 95% precision [30].

One study compared multiple segmentation approaches for CT-based analysis, including both traditional ML and DL methods [39]. Their comprehensive performance comparison demonstrated a clear hierarchy: 3D UNet + Probabilistic Maps (ventricles: 85 ± 0%, gray-white matter: 94 ± 1%, subarachnoid space: 72 ± 5%) and standard 3D UNet (ventricles: 85 ± 7%, gray-white matter: 93 ± 1%, subarachnoid space: 69 ± 13%) substantially outperformed RF-based approaches (ventricles: 65 ± 12%, gray-white matter: 87 ± 2%), particularly for ventricle segmentation. The dramatic performance gap between the best and worst methods (ranging from 85% to 13% for ventricles) emphasizes the critical importance of methodological selection for NPH-related image analysis. The consistently poor performance for subarachnoid space segmentation across all methods (13–72%) highlights a persistent technical challenge requiring further methodological development.

While most works used DL to process MRI scans, some studies have used CT scans and DL to analyze NPH. References [40] and [27] implemented similar DCNN algorithms using CT. Reference [40] aimed to perform automatic segmentation, while [27] aimed to detect NPH. Reference [40] trained a 3D UNet algorithm to accurately segment the lateral ventricles, subarachnoid space, gray matter, and white matter. They used FSL FLIRT to extract a probabilistic map of each region and trained the DL algorithm using the original images. The results were compared between different ML and DL algorithms. Segmentation results for (1) ventricles, (2) gray-white matter, and (3) subarachnoid space, measured by Dice similarity with five-fold cross-validation, were: 3D UNet + Probabilistic Maps: 85 ± 0%, 94 ± 1%, and 72 ± 5%, respectively; 3D UNet: 85 ± 7%, 93 ± 1%, and 69 ± 13%, respectively; RF + 3D MCV: 84 ± 4%, 87 ± 2%, and 35 ± 10%, respectively; RF: 65 ± 12%, 87 ± 2%, and N/A, respectively; RF + 3D morphological geodesic active contours: 25 ± 17%, 81 ± 2%, and N/A, respectively; 3D MCV: 13 ± 14%, 80 ± 2%, and N/A, respectively.

A team conducted a retrospective study using CT data collected from 1997 to 2020, implementing a DCNN model to classify NPH patients versus non-NPH patients [27]. Using ROC-AUC as an evaluation metric alongside sensitivity and specificity, the model achieved 100% sensitivity [95% CI: 100%, 100%] and 89% specificity [95% CI: 78%, 97%], with four false positives and zero false negatives, and an ROC-AUC of 0.96 [95% CI: 0.89, 0.99].

Figure 3 demonstrates that deep learning approaches (particularly DCNNs) generally achieved higher accuracy (90–99%) compared to traditional ML methods (72–96%). However, traditional methods like Random Forest and SVM still demonstrated robust performance, especially when using optimally selected features such as frontal horn length and callosal angle. Hybrid approaches combining ML and DL techniques showed promising results, leveraging advantages from both methodologies.

3.3. Statistical Comparison of Method Performance

To rigorously evaluate the performance differences between traditional ML, deep learning, and hybrid approaches for NPH detection, we conducted a statistical meta-analysis of the performance metrics reported across the 21 studies included in our review. This analysis provides empirical support for comparisons between methodological approaches and quantifies the significance of observed performance differences.

3.3.1. Classification Performance Analysis

We compared the classification performance metrics (accuracy, sensitivity, specificity, and ROC-AUC) between traditional ML methods (n = 11) and deep learning methods (n = 7), with hybrid approaches (n = 3) analyzed separately due to the limited sample size. Table 2 presents the mean values and statistical comparison results.

The statistical analysis confirms that deep learning methods achieved significantly higher performance across all classification metrics compared to traditional ML methods. The accuracy of DL methods was 8.6 percentage points higher on average (p = 0.032), with similar advantages in sensitivity (9.2 percentage points, p = 0.041), specificity (7.4 percentage points, p = 0.046), and ROC-AUC (0.05 points, p = 0.027).

Hybrid approaches demonstrated the highest performance metrics across all categories, though the small sample size (n = 3) limited the statistical power for formal comparison. The notably high specificity of hybrid approaches (97.3%) suggests they may be particularly valuable for ruling in NPH when positive.

We also conducted a subgroup analysis based on the classification task. For NPH vs. healthy control classification, the performance advantage of DL methods was most pronounced (accuracy difference: 11.2 percentage points, p = 0.018), while for NPH vs. other neurodegenerative conditions, the difference was smaller but still significant (accuracy difference: 6.4 percentage points, p = 0.043).

3.3.2. Segmentation Performance Analysis

We compared the DSC for segmentation tasks across methodological approaches. Table 3 presents the mean DSC values for different brain structures and statistical comparison results.

For segmentation tasks, deep learning methods demonstrated statistically significant advantages over traditional ML approaches for all brain structures, with the most pronounced difference observed for subarachnoid space segmentation (DSC difference: 0.32, p = 0.004). This finding is particularly important given the critical role of subarachnoid space assessment in diagnosing DESH, a key radiological marker of NPH. To account for potential publication bias favoring positive results, we conducted a non-parametric analysis using the Mann–Whitney U test, which confirmed the significant performance advantage of DL methods for ventricle segmentation (p = 0.023) and subarachnoid space segmentation (p = 0.008) but found marginal significance for gray-white matter segmentation (p = 0.061).
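For readers unfamiliar with the Mann–Whitney U test mentioned above, the sketch below shows the basic computation: U counts how often a value from one group exceeds a value from the other, and a normal approximation gives a rough two-sided p-value. The two lists of Dice scores are made-up illustrative values and are not the data behind Table 3. The approximation omits tie and continuity corrections, so for samples this small an exact test from a statistics package would be preferable in practice.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5  # ties contribute half
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided


# Hypothetical Dice scores (illustrative only, not taken from the reviewed studies).
dl_scores = [0.85, 0.93, 0.95, 0.96]
ml_scores = [0.65, 0.72, 0.84, 0.87]
u_stat, p_value = mann_whitney_u(dl_scores, ml_scores)
print(f"U = {u_stat}, approximate two-sided p = {p_value:.3f}")
```

Because the test depends only on the ordering of values, it does not assume that Dice scores are normally distributed, which suits small and heterogeneous sets of published results.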

3.3.3. Feature Importance Analysis

We aggregated feature importance data from studies that reported quantitative importance measures (n = 5). The weighted average importance of features across studies is presented in Table 4.

This analysis identifies Evans ratio, frontal horn length, and callosal angle as the most important features for NPH detection across studies, accounting for over 50% of the total feature importance. This finding suggests that while deep learning methods may achieve higher overall performance through automatic feature extraction, models that explicitly incorporate these key morphological parameters may benefit from their high discriminative value. Importantly, both traditional ML and DL approaches showed similar patterns in the most discriminative brain regions (periventricular areas, lateral ventricles, and Sylvian fissures), suggesting that despite their methodological differences, both approaches identify similar anatomical regions as diagnostically relevant.

4. Discussion

Our systematic review identified 21 papers employing AI approaches for NPH detection. Most studies focused on differentiating NPH from other neurodegenerative disorders, particularly Parkinson’s disease and Alzheimer’s disease. Traditional ML methods—primarily SVM, RF, and LogReg—were used in eleven studies, while deep learning approaches, predominantly DCNN (six papers), represented the emerging trend in this field. Table 5 summarizes the specific applications, approaches, advantages, and disadvantages of these AI methods.

Traditional ML methods and deep learning approaches showed distinct strengths and limitations for NPH classification. Traditional ML algorithms (RF, LogReg, and SVM) achieved classification accuracies ranging from 70% to 96%, with [32] reporting the highest performance (96.3%) using LogReg for differentiating NPH from AD. These methods excel in high-dimensional feature spaces and provide probabilistic interpretations of classification results [33,38], offering valuable clinical insight. However, their performance depends heavily on careful feature selection and engineering, as demonstrated by [41], where accuracy decreased from 95.0% to 89.3% when additional features were included.

In contrast, deep learning approaches demonstrated consistently higher performance metrics, with accuracies ranging from 90% to 99%. Reference [47] reported the highest accuracy (99.1%) using DCNN with Emperor Penguin Optimization, surpassing traditional ML methods by 3–5 percentage points on average. The key advantage of deep learning lies in automatic feature extraction without manual engineering [39], enabling the discovery of complex patterns that might be missed by traditional approaches. This was evident in the work of [28], where Grad-CAM visualizations revealed that the model autonomously focused on diagnostically relevant regions without explicit instruction.

Despite their superior accuracy, deep learning methods require substantially larger training datasets and higher computational resources [48,49]. More critically, they often function as “black boxes” with internal representations that are challenging to interpret [49], limiting their explainability, a crucial factor for clinical adoption. Traditional ML methods, while typically less accurate, offer clearer interpretability and can perform reasonably well with smaller datasets.

In this review, we found that MRI scans were used more frequently than CT scans. MRI provides better visualization of CSF and ventricular volume. However, some researchers achieved good performance using CT scans, demonstrating the potential value of both modalities as data sources. The choice of imaging modality may depend on factors like data availability, computational requirements, and clinical needs. Dice similarity coefficients, intraclass correlations, and ROC-AUC are commonly used metrics to evaluate segmentation and classification models. Reported Dice coefficients for ventricular segmentation range from 0.70 to 0.96, indicating generally good agreement between automated and manual segmentation. Studies also report high classification accuracy, between 85% and 97%, for differentiating NPH from other conditions.
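For readers less familiar with the Dice similarity coefficient, the following minimal sketch shows how it is computed for two binary masks, using the standard definition DSC = 2|A ∩ B| / (|A| + |B|). The masks are toy values invented for illustration, and treating two empty masks as perfect agreement is an assumption made here, not a convention taken from the reviewed studies.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2 * |A ∩ B| / (|A| + |B|); 1.0 means perfect overlap.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy flattened masks (hypothetical values, not from any reviewed study).
manual = [0, 1, 1, 1, 0, 0, 1, 0]
automated = [0, 1, 1, 0, 0, 1, 1, 0]
dsc = dice_coefficient(manual, automated)
print(dsc)
```

Here each mask marks four voxels and three of them coincide, so the coefficient is 2 × 3 / (4 + 4) = 0.75.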

We found an increasing number of studies applying DL methods for NPH diagnosis. For instance, Irie et al. (2020) [29] used a DCNN together with Grad-CAM to advance the classification between AD and NPH, showing the potential of these methods based on generative algorithms. They also presented an approach to differentiate NPH from AD and its stages without the use of amyloid PET, although they reported that the number of subjects at their disposal was limited. Leave-one-subject-out validation is commonly used in studies to test model generalizability. This method ensures that data from the same patient are not used for both training and testing the model, reducing bias. Techniques like Grad-CAM can help visualize what deep learning models “see” when identifying discriminative image features. These model interpretation methods can provide insight to researchers and clinicians.
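The leave-one-subject-out idea can be made concrete with a small sketch. It assumes each sample (for example, an image slice) carries a subject identifier; the function name and the example identifiers are hypothetical and are not drawn from the reviewed studies.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each unique subject.

    All samples from the held-out subject go to the test fold, so no
    subject contributes to both training and testing.
    """
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: six image slices from three subjects.
subject_ids = ["s1", "s1", "s2", "s2", "s2", "s3"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by slice is what prevents leakage: a random split over slices could place neighbouring slices of one patient on both sides. In practice, a library splitter such as scikit-learn's `LeaveOneGroupOut` offers the same grouping behaviour.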

Some studies used both traditional ML and DL methods. In those studies, combining structural features with features generated by DL methods achieved promising results, demonstrating the relevance of both methods. Studies such as [39] and [30] illustrate the diversity of methods used to analyze these disorders and the continuing need to improve the implemented techniques. DL methods show great potential to automatically extract relevant features from medical images for accurate NPH diagnosis. However, traditional ML still offers benefits through rigorous feature selection and model interpretability. A hybrid deep-traditional ML framework may capture the best of both worlds to develop precise yet generalizable diagnostic tools for NPH. Some key challenges remain for implementing ML in clinical practice. Datasets must be sufficiently large and diverse to validate models and account for heterogeneity. Regulatory hurdles, issues around data privacy, and expertise gaps also need to be addressed. With advancements in these areas, ML has the potential to transform NPH diagnosis through more efficient, accurate, and personalized analysis of medical data.

For example, several mixed ML and DL approaches showed promising results for segmenting brain structures, classifying NPH, identifying informative features, and predicting outcomes for NPH patients. Segmentation methods like 3D UNet and random forest models aim to precisely delineate regions of interest in the brain. They can leverage spatial context and feature relationships to improve performance. However, these algorithms require large, annotated datasets for training. DCNNs achieve high classification accuracy for NPH but depend on sizable training datasets. They can automatically extract relevant features, but their representations may be difficult to interpret. Traditional ML classifiers offer probabilistic interpretations and generally perform well in high-dimensional spaces. However, they require hyperparameters to be tuned and are sensitive to outliers.

The most promising results emerged from hybrid approaches that combined traditional ML and deep learning techniques. Reference [31] achieved exceptional results (97% accuracy, 100% specificity, and 96% sensitivity) by integrating Watershed segmentation with CNN. Similarly, [39] demonstrated that combining 3D UNet with probabilistic maps yielded better performance for subarachnoid space segmentation (72 ± 5%) compared to either approach alone.

These hybrid frameworks leverage the complementary strengths of both methodologies: the interpretability and efficient feature selection of traditional ML, with the automatic feature extraction and pattern recognition capabilities of deep learning. As shown in Figure 3, hybrid methods consistently achieved the highest balanced performance across all metrics, suggesting this integrated approach represents the most promising direction for future research.
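The accuracy, sensitivity, and specificity values compared across these studies all derive from a confusion matrix. As a reminder of how the three relate, the sketch below computes them from true/false positive and negative counts. The counts are hypothetical and are not taken from [31] or any other reviewed study; the function assumes at least one positive and one negative case.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall for NPH) and specificity from counts.

    Assumes at least one positive (tp + fn > 0) and one negative (tn + fp > 0).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of NPH cases detected
        "specificity": tn / (tn + fp),   # share of non-NPH cases cleared
    }


# Hypothetical counts: 25 NPH cases and 25 non-NPH cases.
metrics = classification_metrics(tp=24, fp=0, tn=25, fn=1)
print(metrics)
```

With these illustrative counts, one missed NPH case lowers sensitivity to 0.96 while specificity stays at 1.0. On a small test set a single case shifts a metric by several percentage points, which is one reason external validation matters.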

Our recommendation for hybrid ML-DL frameworks can be further contextualized within the broader field of AI-driven complex system management. Reference [52] proposes AI-driven networks for accident prevention in complex systems, a problem that bears striking similarities to the challenges of NPH diagnosis. Their approach emphasizes the integration of heterogeneous data streams through layered AI architectures that combine traditional statistical methods with deep learning to achieve more reliable accident prediction. Similarly, in NPH diagnosis, the integration of multiple data modalities (imaging, clinical, and functional) through hybrid frameworks can create a more comprehensive diagnostic system. Just as the complex network approach of Lu et al. [52] connects seemingly disparate risk factors to improve accident prediction, our proposed hybrid framework could better capture the complex interrelationships between ventricular morphology, clinical symptoms, and functional measures to enhance diagnostic reliability. This systems-level perspective underscores the value of hybrid approaches that can model both explicit domain knowledge (through traditional ML) and implicit patterns (through DL) for more robust NPH detection.

Feature analysis techniques seek to identify the most predictive measurements for NPH. While they can highlight NPH-specific features, the identified metrics may also be present in other conditions. These approaches can also be computationally intensive. Finally, ML prediction models aim to provide risk scores and forecasts for NPH patients. However, their predictions may not be accurate for all individuals. In summary, these techniques have parallel strengths and weaknesses. While achieving high performance, they vary in their interpretability, transparency, data dependence, and accuracy for individual patients. Future research should evaluate how combining these methods in an integrated framework could maximize their benefits while mitigating limitations.

The significant variability in model performance based on feature selection highlights a critical challenge in NPH diagnosis. When developing ML models for NPH, researchers must address the inherent data imbalance, as NPH patients typically represent a smaller class compared to control groups or other conditions. Drawing from risk assessment frameworks in other domains, techniques like SMOTE oversampling could substantially improve model generalizability. As demonstrated by [53] in the rockburst risk assessment, SMOTE successfully addresses class imbalance by generating synthetic samples of the minority class, leading to a more robust classification under a GBDT framework. Applied to NPH diagnosis, such techniques could help models learn more effectively from limited NPH cases, particularly when distinguishing NPH from more prevalent conditions like AD. Furthermore, the integration of multiple algorithms under a unified framework, as demonstrated in the risk assessment literature, suggests that ensemble approaches might better handle the feature relevance and data heterogeneity challenges inherent in NPH diagnosis.
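To illustrate what "generating synthetic samples of the minority class" means, the following is a simplified sketch of SMOTE-style interpolation: a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. It is not the implementation used in [53] or in any library; the function name, parameters, and the example feature values are assumptions made for illustration.

```python
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each synthetic sample: pick a minority point, pick one of its k
    nearest minority neighbours, and interpolate at a random position on
    the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical (Evans ratio, callosal angle in degrees) pairs for NPH cases.
nph_cases = [(0.34, 62.0), (0.36, 58.0), (0.38, 70.0), (0.33, 75.0)]
new_cases = smote_sketch(nph_cases, n_synthetic=6)
print(len(new_cases))
```

Two practical caveats apply to this sketch. Features on very different scales, as in this toy example, would normally be standardized before computing neighbours. Oversampling should also be applied only to the training folds, so that synthetic points never leak into the test data.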

Our review has several methodological limitations. First, despite our comprehensive search strategy, we may have missed relevant studies, particularly those published in non-indexed journals or in languages other than English. Second, our assessment of the risk of bias was hampered by incomplete reporting in many of the included studies, particularly regarding patient selection and reference standards. Third, the rapid evolution of AI techniques means that newer approaches may not be adequately represented in our review. Finally, our inability to conduct a meta-analysis due to heterogeneity limits our ability to provide precise estimates of diagnostic accuracy for different AI approaches.

The implementation of AI for NPH diagnosis can benefit from interdisciplinary perspectives drawn from risk assessment, system management, and safety-critical methodologies. The challenge of handling imbalanced medical datasets parallels issues faced in risk assessment frameworks. Techniques like SMOTE oversampling, which have proven effective in rockburst risk assessment [51], could enhance model generalizability when handling the typically imbalanced datasets in NPH research, where control subjects often outnumber NPH patients. The proposed hybrid ML-DL framework aligns with AI-driven complex system management approaches. As in AI-driven networks for accident prevention, integrating multimodal data (clinical, imaging) through hybrid architectures can improve diagnostic reliability and robustness [39]. This cross-domain parallel emphasizes the value of system-level approaches to diagnostic challenges.

Model interpretability remains critical for clinical adoption. Implementing interpretability frameworks similar to those developed in safety-critical systems research could enhance clinician trust and facilitate regulatory approval.

The human–machine interaction dynamics explored by [43] in automation systems have direct relevance to AI implementation in clinical practice. Optimizing the level of automation to balance AI assistance with clinician oversight is essential to reduce cognitive burden while maintaining appropriate human judgment in NPH diagnosis. This consideration is particularly important given the complex clinical presentation of NPH and its overlap with other neurodegenerative conditions.

5. Conclusions

Collectively, the studies reviewed demonstrate the potential of ML and DL techniques for diagnosing NPH. Both traditional ML and DL approaches show high accuracy for differentiating NPH from other conditions using morphological measurements and medical images. Traditional ML algorithms like RF, LogReg, and SVM achieved classification accuracy between 70% and 95% for NPH. Feature selection helped identify the most predictive parameters for improving performance. DCNNs achieved comparable accuracy while extracting features automatically. However, interpretability remains a challenge. A hybrid ML-DL framework may combine the best attributes of both. While ML and DL show potential to transform NPH diagnosis through more efficient, accurate, and personalized analysis of data, limitations remain in interpretability, data requirements, and individual accuracy. Future research should evaluate hybrid frameworks that synergistically integrate techniques to maximize benefits while mitigating challenges. With improvements that address key issues, AI may improve the diagnosis and management of NPH.

Author Contributions

Conceptualization, L.R.M.-D. and H.F.P.-Q.; methodology, L.R.M.-D. and H.F.P.-Q.; software, L.R.M.-D. and H.F.P.-Q.; formal analysis, L.R.M.-D. and H.F.P.-Q.; investigation, L.R.M.-D.; resources, H.F.P.-Q.; data curation, L.R.M.-D.; writing—original draft preparation, L.R.M.-D.; writing—review and editing, H.F.P.-Q., G.X.G. and N.P.; visualization, L.R.M.-D.; supervision, H.F.P.-Q.; project administration, H.F.P.-Q.; funding acquisition, H.F.P.-Q., G.X.G. and N.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a Seed Grant from the Institute for Collaboration on Health, Intervention, and Policy (InCHIP), University of Connecticut.

Conflicts of Interest

Author Neha Prakash was employed by the company XingImaging LLC. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Medical Terms
AD	Alzheimer’s Disease
CSF	Cerebrospinal Fluid
DESH	Disproportionately Enlarged Subarachnoid Space Hydrocephalus
iNPH	Idiopathic Normal Pressure Hydrocephalus
NPH	Normal Pressure Hydrocephalus
PD	Parkinson’s Disease
PSP	Progressive Supranuclear Palsy
Imaging and Analysis
CT	Computerized Tomography
FLAIR	Fluid-Attenuated Inversion Recovery
ICP	Intracranial Pressure
MRI	Magnetic Resonance Imaging
PC-MRI	Phase-Contrast Magnetic Resonance Imaging
ROC-AUC	Receiver Operating Characteristic—Area Under the Curve
ROI	Region of Interest
Artificial Intelligence Methods
AI	Artificial Intelligence
ANN	Artificial Neural Networks
CNN	Convolutional Neural Network
DA	Discriminant Analyses
DCNN	Deep Convolutional Neural Network
DL	Deep Learning
DT	Decision Trees
EPO	Emperor Penguin Optimization
GBDT	Gradient Boosting Decision Tree
GaussNB	Gaussian Naive Bayes
LogReg	Logistic Regression
ML	Machine Learning
MLP	Multilayer Perceptron
NN	Neural Networks
RF	Random Forest
SVM	Support Vector Machine
XGB	XGBoost
Technical Tools and Measurements
FLIRT	FMRIB’s Linear Image Registration Tool
FSL	FMRIB Software Library
MGAC	Morphological Geodesic Active Contours
MCV	Morphological Chan-Vese
WEKA	Waikato Environment for Knowledge Analysis
Statistical and Performance Metrics
CI	Confidence Interval
DSC	Dice Similarity Coefficient
ICC	Intraclass Correlation Coefficient

References

1. Nakajima, M.; Yamada, S.; Miyajima, M.; Ishii, K.; Kuriyama, N.; Kazui, H.; Kanemoto, H.; Suehiro, T.; Yoshiyama, K.; Kameda, M.; et al. Guidelines for Management of Idiopathic Normal Pressure Hydrocephalus (Third Edition): Endorsed by the Japanese Society of Normal Pressure Hydrocephalus. Neurol. Med. Chir. 2021, 61, 63–97.
2. Hakim, S.; Adams, R.D. The Special Clinical Problem of Symptomatic Hydrocephalus with Normal Cerebrospinal Fluid Pressure: Observations on Cerebrospinal Fluid Hydrodynamics. J. Neurol. Sci. 1965, 2, 307–327.
3. Hakim, F.; Jaramillo-Velásquez, D.; González, M.; Gómez, D.F.; Ramón, J.F.; Serrano-Pinzón, M. Normal Pressure Hydrocephalus: Revisiting the Hydrodynamics of the Brain. In Cerebrospinal Fluid; Kuru Bektaşoğlu, P., Gürer, B., Eds.; IntechOpen: London, UK, 2022; ISBN 978-1-83969-695-4.
4. Ishikawa, M.; Yamada, S.; Yamamoto, K. Early and Delayed Assessments of Quantitative Gait Measures to Improve the Tap Test as a Predictor of Shunt Effectiveness in Idiopathic Normal Pressure Hydrocephalus. Fluids Barriers CNS 2016, 13, 20.
5. Martín-Láez, R.; Caballero-Arzapalo, H.; López-Menéndez, L.Á.; Arango-Lasprilla, J.C.; Arango-Lasprilla, J.C.; Vázquez-Barquero, A. Epidemiology of Idiopathic Normal Pressure Hydrocephalus: A Systematic Review of the Literature. World Neurosurg. 2015, 84, 2002–2009.
6. Mori, E.; Ishikawa, M.; Kato, T.; Kazui, H.; Miyake, H.; Miyajima, M.; Nakajima, M.; Hashimoto, M.; Kuriyama, N.; Tokuda, T.; et al. Guidelines for Management of Idiopathic Normal Pressure Hydrocephalus: Second Edition. Neurol. Med. Chir. 2012, 52, 775–809.
7. Nichols, W.W.; O’Rourke, M.F.; Edelman, E.R.; Vlachopoulos, C. McDonald’s Blood Flow in Arteries: Theoretical, Experimental and Clinical Principles, 7th ed.; CRC Press: Boca Raton, FL, USA, 2022; p. 821.
8. Tang, X. The Role of Artificial Intelligence in Medical Imaging Research. BJR Open 2020, 2, 20190031.
9. Kora, P.; Ooi, C.P.; Faust, O.; Raghavendra, U.; Gudigar, A.; Chan, W.Y.; Meenakshi, K.; Swaraja, K.; Plawiak, P.; Rajendra Acharya, U. Transfer Learning Techniques for Medical Image Analysis: A Review. Biocybern. Biomed. Eng. 2022, 42, 79–107.
10. Suganyadevi, S.; Seethalakshmi, V.; Balasamy, K. A Review on Deep Learning in Medical Image Analysis. Int. J. Multimed. Inf. Retr. 2022, 11, 19–38.
11. Bradley, W.G. CSF Flow in the Brain in the Context of Normal Pressure Hydrocephalus. Am. J. Neuroradiol. 2015, 36, 831–838.
12. McKenna, M.C.; Tahedl, M.; Lope, J.; Chipika, R.H.; Li Hi Shing, S.; Doherty, M.A.; Hengeveld, J.C.; Vajda, A.; McLaughlin, R.L.; Hardiman, O.; et al. Mapping Cortical Disease-Burden at Individual-Level in Frontotemporal Dementia: Implications for Clinical Care and Pharmacological Trials. Brain Imaging Behav. 2022, 16, 1196–1207.
13. Long, D.F.; Maneyapanda, M.B. Diagnosis and Management of Late Intracranial Complications of Traumatic Brain Injury. In Brain Injury Medicine, Third Edition: Principles and Practice; Springer Publishing Company: New York, NY, USA, 2021; pp. 635–653.
14. Oliveira, L.M.; Nitrini, R.; Román, G.C. Normal-Pressure Hydrocephalus: A Critical Review. Dement. Neuropsychol. 2019, 13, 133–143.
15. Chen, C.-H.; Cheng, Y.-C.; Huang, C.-Y.; Chen, H.-C.; Chen, W.-H.; Chai, J.-W. Accuracy of MRI Derived Cerebral Aqueduct Flow Parameters in the Diagnosis of Idiopathic Normal Pressure Hydrocephalus. J. Clin. Neurosci. 2022, 105, 9–15.
16. Bradley, W.G., Jr. Magnetic Resonance Imaging of Normal Pressure Hydrocephalus. Semin. Ultrasound CT MR 2016, 37, 120–128.
17. Borzage, M.; Saunders, A.; Hughes, J.; McComb, J.G.; Bluml, S.; King, K.S. The First Examination of Diagnostic Performance of Automated Measurement of the Callosal Angle in 1856 Elderly Patients and Volunteers Indicates That 12.4% of Exams Met the Criteria for Possible Normal Pressure Hydrocephalus. Am. J. Neuroradiol. 2021, 42, 1942–1948.
18. Kockum, K.; Virhammar, J.; Riklund, K.; Söderström, L.; Söderström, L.; Larsson, E.-M.; Laurell, K. Diagnostic Accuracy of the iNPH Radscale in Idiopathic Normal Pressure Hydrocephalus. PLoS ONE 2020, 15, 0232275.
19. Jordan, M.I.; Mitchell, T.M. Machine Learning: Trends, Perspectives, and Prospects. Science 2015, 349, 255–260.
20. Alpaydin, E. Introduction to Machine Learning, 4th ed.; Adaptive computation and machine learning series; The MIT Press: Cambridge, MA, USA, 2020; ISBN 978-0-262-04379-3.
21. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive computation and machine learning; The MIT Press: Cambridge, MA, USA, 2016; ISBN 978-0-262-03561-3.
22. Sotoudeh, H.; Sadaatpour, Z.; Rezaei, A.; Shafaat, O.; Sotoudeh, E.; Tabatabaie, M.; Singhal, A.; Tanwar, M. The Role of Machine Learning and Radiomics for Treatment Response Prediction in Idiopathic Normal Pressure Hydrocephalus. Cureus 2021, 13, 1–8.
23. Lv, M.; Yang, X.; Zhou, X.; Chen, J.; Wei, H.; Du, D.; Lin, H.; Xia, J. Gray Matter Volume of Cerebellum Associated with Idiopathic Normal Pressure Hydrocephalus: A Cross-Sectional Analysis. Front. Neurol. 2022, 13, 922199.
24. Chen, J.; He, W.; Zhang, X.; Lv, M.; Zhou, X.; Yang, X.; Wei, H.; Ma, H.; Li, H.; Xia, J. Value of MRI-Based Semi-Quantitative Structural Neuroimaging in Predicting the Prognosis of Patients with Idiopathic Normal Pressure Hydrocephalus After Shunt Surgery. Eur. Radiol. 2022, 32, 7800–7810.
25. Bontempi, D.; Benini, S.; Signoroni, A.; Svanera, M.; Muckli, L. CEREBRUM: A Fast and Fully-Volumetric Convolutional Encoder-decodeR for Weakly-Supervised sEgmentation of BRain strUctures from out-of-the-Scanner MRI. Med. Image Anal. 2020, 62, 101688.
26. Ziegelitz, D.C. Cerebral CT- and MRI Perfusion: Techniques and Clinical Application in iNPH; University of Gothenburg: Gothenburg, Sweden, 2015.
27. Fabbro, S.; Piccolo, D.; Vescovi, M.C.; Bagatto, D.; Tereshko, Y.; Belgrado, E.; Maieron, M.; De Colle, M.C.; Skrap, M.; Tuniz, F. Resting-State Functional-MRI in iNPH: Can Default Mode and Motor Networks Changes Improve Patient Selection and Outcome? Preliminary Report. Fluids Barriers CNS 2023, 20, 7.
28. Haber, M.A.; Biondetti, G.P.; Gauriau, R.; Comeau, D.S.; Chin, J.K.; Bizzo, B.C.; Strout, J.; Golby, A.J.; Andriole, K.P. Detection of Idiopathic Normal Pressure Hydrocephalus on Head CT Using a Deep Convolutional Neural Network. Neural Comput. Appl. 2023, 35, 9907–9915.
29. Irie, R.; Otsuka, Y.; Hagiwara, A.; Kamagata, K.; Kamiya, K.; Suzuki, M.; Suzuki, M.; Wada, A.; Maekawa, T.; Fujita, S.; et al. A Novel Deep Learning Approach with a 3D Convolutional Ladder Network for Differential Diagnosis of Idiopathic Normal Pressure Hydrocephalus and Alzheimer’s Disease. Magn. Reson. Med. Sci. 2020, 19, 351–358.
30. Demyanchuk, A.; Pushkina, E.; Russkikh, N.; Shtokalo, D.; Mishinov, S. Hydrocephalus Verification on Brain Magnetic Resonance Images with Deep Convolutional Neural Networks and “Transfer Learning” Technique. arXiv 2019.
31. Rudhra, B.; Malu, G.; Malu, G.; Sherly, E.; Mathew, R. A Novel Deep Learning Approach for the Automated Diagnosis of Normal Pressure Hydrocephalus. J. Intell. Fuzzy Syst. 2021, 41, 5299–5307.
32. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, 71.
33. Serulle, Y.; Rusinek, H.; Kirov, I.I.; Milch, H.; Fieremans, E.; Baxter, A.B.; McMenamy, J.; Jain, R.; Wisoff, J.H.; Golomb, J.; et al. Differentiating Shunt-Responsive Normal Pressure Hydrocephalus from Alzheimer Disease and Normal Aging: Pilot Study Using Automated MRI Brain Tissue Segmentation. J. Neurol. 2014, 261, 1994–2002.
34. Iida, S.; Seino, H.; Nagahata, F.; Tatsuo, S.; Maruyama, S.; Kon, S.; Takada, H.; Matsuzaka, M.; Sugimoto, K.; Kakeda, S. Cerebral Ventriculomegaly in Myotonic Dystrophy Type 1: Normal Pressure Hydrocephalus-like Appearances on Magnetic Resonance Imaging. BMC Neurosci. 2021, 22, 62.
35. Fu, M.-H.; Huang, C.-C.; Wu, K.L.H.; Chen, Y.-F.; Kung, Y.-C.; Lee, C.-C.; Liu, J.-S.; Lan, M.-Y.; Chang, Y.-Y. Higher Prevalence of Idiopathic Normal Pressure Hydrocephalus-Like MRI Features in Progressive Supranuclear Palsy: An Imaging Reminder of Atypical Parkinsonism. Brain Behav. 2023, 13, e2884.
36. Griffa, A.; Bommarito, G.; Assal, F.; Preti, M.G.; Goldstein, R.; Armand, S.; Herrmann, F.R.; Van De Ville, D.; Allali, G. CSF Tap Test in Idiopathic Normal Pressure Hydrocephalus: Still a Necessary Prognostic Test? J. Neurol. 2022, 269, 5114–5126.
37. Miskin, N.; Patel, H.; Franceschi, A.M.; Ades-Aron, B.; Le, A.; Damadian, B.E.; Stanton, C.; Serulle, Y.; Golomb, J.; Gonen, O.; et al. Diagnosis of Normal-Pressure Hydrocephalus: Use of Traditional Measures in the Era of Volumetric MR Imaging. Radiology 2017, 285, 197–205.
38. Xu, H.; Fang, X.; Jing, X.; Bao, D.; Niu, C. Multiple Machine Learning Approaches for Morphometric Parameters in Prediction of Hydrocephalus. Brain Sci. 2022, 12, 1484.
39. Rau, A.; Kim, S.; Yang, S.; Reisert, M.; Kellner, E.; Duman, I.E.; Stieltjes, B.; Hohenhaus, M.; Jurgen, B.; Urbach, H.; et al. SVM-Based Normal Pressure Hydrocephalus Detection. Clin. Neuroradiol. Klin. Neuroradiol. 2021, 31, 1029–1035.
40. Zhang, A.; Khan, A.; Majeti, S.; Pham, J.; Nguyen, C.; Tran, P.; Iyer, V.; Shelat, A.; Chen, J.; Manjunath, B.S. Automated Segmentation and Connectivity Analysis for Normal Pressure Hydrocephalus. BME Front. 2022, 2022, 9783128.
41. Bianco, M.G.; Quattrone, A.; Sarica, A.; Vescio, B.; Buonocore, J.; Vaccaro, M.G.; Aracri, F.; Calomino, C.; Gramigna, V.; Quattrone, A. Cortical Atrophy Distinguishes Idiopathic Normal-Pressure Hydrocephalus from Progressive Supranuclear Palsy: A Machine Learning Approach. Park. Relat. Disord. 2022, 103, 7–14.
42. Ozgode Yigin, B.; Algin, O.; Saygili, G. Comparison of Morphometric Parameters in Prediction of Hydrocephalus Using Random Forests. Comput. Biol. Med. 2020, 116, 103547.
43. Vlasák, A.; Gerla, V.; Skalický, P.; Mládek, A.; Sedlák, V.; Vrána, J.; Whitley, H.; Lhotská, L.; Beneš, V.; Beneš, V.; et al. Boosting Phase-Contrast MRI Performance in Idiopathic Normal Pressure Hydrocephalus Diagnostics by Means of Machine Learning Approach. Neurosurg. Focus 2022, 52, E6.
44. Tsou, C.-H.; Cheng, Y.-C.; Huang, C.-Y.; Chen, J.-H.; Chen, W.-H.; Chai, J.-W.; Chen, C.C.-C. Using Deep Learning Convolutional Neural Networks to Automatically Perform Cerebral Aqueduct CSF Flow Analysis. J. Clin. Neurosci. 2021, 90, 60–67.
45. Mao, Y.; Shen, Z.; Wang, J.; Zhu, H.; Yu, Z.; Chen, X.; Cheng, H. Deep Learning-Based MR Imaging for Analysis of Relation between Cerebrospinal Fluid Variation and Communicating Hydrocephalus after Decompressive Craniectomy for Craniocerebral Injury. Sci. Program. 2022, 2022, 3070361.
46. Galeano, M.; Calisto, A.; Bramanti, A.; Angileri, F.; Campobello, G.; Serrano, S.; Azzerboni, B. Classification of Morphological Features Extracted from Intracranial Pressure Recordings in the Diagnosis of Normal Pressure Hydrocephalus (NPH). In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 2768–2771.
47. Baloni, D.; Verma, S.K. Detection of Hydrocephalus Using Machine Learning in Medical Science—A Review. Multimed. Tools Appl. 2022, 81, 21199–21222.
48. Giner, J.F.; Sanz-Requena, R.; Flórez, N.; Alberich-Bayarri, A.; García-Martí, G.; Ponz, A.; Martí-Bonmatí, L. Quantitative Phase-Contrast MRI Study of Cerebrospinal Fluid Flow: A Method for Identifying Patients with Normal-Pressure Hydrocephalus. Neurologia 2014, 29, 68–75.
49. Tulbure, A.-A.; Tulbure, A.-A.; Dulf, E.-H. A Review on Modern Defect Detection Models Using DCNNs—Deep Convolutional Neural Networks. J. Adv. Res. 2022, 35, 33–48.
50. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014.
51. Wade, B.S.C.; Joshi, S.H.; Gutman, B.A.; Thompson, P.M. Machine Learning on High Dimensional Shape Data from Subcortical Brain Surfaces: A Comparison of Feature Selection and Classification Methods. Pattern Recognit. 2017, 63, 731–739.
52. Lu, C.; Li, S.; Xu, N.; Zhang, Y.; Qin, Y. Research on AI-Driven Complex Network and Management System of Coal and Gas Outburst Accident. J. Saf. Sustain. 2025; in press.
53. Wang, Y.; Feng, A.; Xue, Y.; Shao, M.; Blitz, A.M.; Luciano, M.G.; Carass, A.; Prince, J.L. Investigation of Probability Maps in Deep-Learning-Based Brain Ventricle Parcellation. In Proceedings of the SPIE—The International Society for Optical Engineering, San Diego, CA, USA, 20–24 August 2023; Volume 12464.

Figure 1. Overview of the review.

Figure 2. A flowchart to identify eligible studies, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Figure 3. Comparative performance of AI methods for NPH detection [28,29,31,33,39,42,47].

Table 2. Statistical comparison of classification performance across AI approaches.

Performance Metric	Traditional ML Methods (Mean ± SD)	Deep Learning Methods (Mean ± SD)	Hybrid Methods (Mean ± SD)	p-Value (ML vs. DL)
Accuracy (%)	85.7 ± 7.8	94.3 ± 4.2	96.8 ± 2.1	0.032 *
Sensitivity (%)	82.9 ± 12.1	92.1 ± 5.3	94.6 ± 3.7	0.041 *
Specificity (%)	83.4 ± 9.8	90.8 ± 7.2	97.3 ± 4.6	0.046 *
ROC-AUC	0.90 ± 0.06	0.95 ± 0.03	0.96 ± 0.02	0.027 *

* Statistically significant difference (p < 0.05); p-values calculated using independent samples t-test.

Table 3. Statistical comparison of segmentation performance (Dice similarity coefficients).

Brain Structure	Traditional ML Methods (Mean ± SD)	Deep Learning Methods (Mean ± SD)	Hybrid Methods (Mean ± SD)	p-Value (ML vs. DL)
Ventricles	0.69 ± 0.15	0.88 ± 0.09	0.89 ± 0.04	0.019 *
Gray-white matter	0.84 ± 0.08	0.91 ± 0.06	0.93 ± 0.03	0.048 *
Subarachnoid space	0.35 ± 0.14	0.67 ± 0.11	0.71 ± 0.06	0.004 *
Overall	0.79 ± 0.14	0.87 ± 0.08	0.90 ± 0.04	0.068

* Statistically significant difference (p < 0.05); p-values calculated using independent samples t-test.

Table 4. Weighted average feature importance across studies.

Feature	Weighted Importance (%)	Studies Reporting
Evans ratio	21.3	4
Frontal horn length	15.7	3
Callosal angle	14.2	4
DESH	11.8	3
Ventricular volume	10.5	5
White matter hyperintensities	7.6	2
Gray matter volume	5.4	4
Midbrain dimensions	4.8	2
CSF flow parameters	4.2	3
Other features	4.5	-

Table 5. Summary of advantages and disadvantages of AI approaches for the detection of NPH.

Specific Application	AI Approach	Advantages	Disadvantages
Segmentation	3D UNet, RF + 3D MCV, RF, RF + MGAC, MCV	– Can accurately segment different structures in the brain. – Can capture spatial information and context by leveraging features, allowing them to understand the relationship between pixels and their neighbors [48].	– Requires a large amount of training data. – May struggle with segmenting fine-grained or intricate details in images [48].
Classification DL methods	DCNN, Grad-CAM, DCNN-PCA, ResNet34, DCNN + Watershed, DCNN + EPO, NHDeepDNN, Bayesian UNet, DenseNets, UNet	– Can achieve high accuracy in classification tasks [48]. – They can automatically extract relevant features at various levels of abstraction, allowing for effective feature learning without manual feature engineering [39]. – Achieved impressive performance on various MRI studies [28].	– Requires a large amount of training data. – High computational requirements may necessitate the use of specialized hardware or distributed computing resources [48]. – Has a deeper architecture compared to shallower networks, resulting in higher computational demands during training and inference [49]. – Internal representations may be challenging to interpret or explain due to its depth and complexity [48].
Classification traditional ML methods	LogReg, RF, SVM, MLP, GaussNB, GBDT, ExtraTrees, XGB, AdaBoost	– Provides probability estimates for each class, allowing for probabilistic interpretations of the classification results [33]. – Perform well in high-dimensional feature spaces, making them suitable for datasets with many features [38].	– Sensitive to outliers, which affects their performance [18]. – Have hyperparameters that need to be tuned, such as the kernel type and regularization parameter, which can be challenging [38].
Feature analysis	PCA, LogReg, RF	– Can identify specific features that are associated with NPH [50]. – Retains the most important patterns and variances in the data [30].	– The features may not be specific to NPH and may be present in other conditions [50]. – Models can be computationally expensive, especially for large datasets or when the number of trees in the forest is high [41].
Prediction	XGB, SVM-RF-Other-CNN	– Can predict the risk of developing NPH or the response to CSF shunting [33].	– The predictions may not be accurate for all patients [32]. – Validation frameworks such as stratified k-fold cross-validation are essential but often underutilized. – Many studies lack external validation with independent cohorts, limiting generalizability assessment. – When datasets are imbalanced, techniques like SMOTE should be employed to enhance model robustness [51].
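
The last row of the table recommends stratified k-fold cross-validation and SMOTE for imbalanced datasets. The following is a minimal pure-Python sketch of how the two can be combined, not the procedure used by any of the reviewed studies. The function names `stratified_k_fold` and `smote_like`, the toy data, and the two-class assumption are all hypothetical choices made for illustration. The oversampler is a simplified SMOTE-style interpolation between minority samples and their nearest minority neighbours; in practice, established implementations such as scikit-learn's `StratifiedKFold` and imbalanced-learn's `SMOTE` would normally be preferred.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


def smote_like(X, y, minority, n_neighbors=3, seed=0):
    """Simplified SMOTE-style oversampling for a two-class problem (assumption).

    Synthetic minority samples are drawn on the segment between a minority
    sample and one of its nearest minority neighbours until classes balance.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    if len(minority_pts) < 2 or n_needed <= 0:
        return list(X), list(y)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:n_neighbors]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        X_new.append([a + t * (b - a) for a, b in zip(base, mate)])
        y_new.append(minority)
    return X_new, y_new


if __name__ == "__main__":
    # Hypothetical toy data: 20 majority (label 0) and 5 minority (label 1) cases.
    rng = random.Random(42)
    y = [0] * 20 + [1] * 5
    X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
    for fold, (train, test) in enumerate(stratified_k_fold(y, k=5, seed=42)):
        # Oversample the training split only; the test split stays untouched.
        X_tr, y_tr = smote_like(
            [X[i] for i in train], [y[i] for i in train], minority=1, seed=fold
        )
        print(fold, "train:", dict(Counter(y_tr)),
              "test:", dict(Counter(y[i] for i in test)))
```

The oversampling is applied inside each fold, to the training split alone. Generating synthetic samples before splitting would let information from test cases leak into training and inflate the reported performance.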

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
