Article

Machine Learning-Driven GLCM Analysis of Structural MRI for Alzheimer’s Disease Diagnosis

by Maria João Oliveira, Pedro Ribeiro and Pedro Miguel Rodrigues *
CBQF—Centro de Biotecnologia e Química Fina—Laboratório Associado, Escola Superior de Biotecnologia, Universidade Católica Portuguesa, Rua de Diogo Botelho 1327, 4169-005 Porto, Portugal
* Author to whom correspondence should be addressed.
Bioengineering 2024, 11(11), 1153; https://doi.org/10.3390/bioengineering11111153
Submission received: 9 October 2024 / Revised: 7 November 2024 / Accepted: 13 November 2024 / Published: 15 November 2024

Abstract: Background: Alzheimer’s disease (AD) is a progressive and irreversible neurodegenerative condition that increasingly impairs cognitive functions and daily activities. Given the incurable nature of AD and its profound impact on the elderly, early diagnosis (at the mild cognitive impairment (MCI) stage) and intervention are crucial, focusing on delaying disease progression and improving patients’ quality of life. Methods: This work aimed to develop an automatic structural MRI (sMRI)-based method to discriminate between three diagnostic groups: healthy controls (CN), mild cognitive impairment (MCI), and AD itself. For such a purpose, brain sMRI images from the ADNI database were pre-processed, and a set of 22 texture statistical features from the sMRI gray-level co-occurrence matrix (GLCM) were extracted from various slices within different anatomical planes. Different combinations of features and planes were used to feed classical machine learning (cML) algorithms to analyze their discrimination power between the groups. Results: The cML algorithms achieved the following classification accuracies: 85.2% for AD vs. CN, 98.5% for AD vs. MCI, 95.1% for CN vs. MCI, and 87.1% for all vs. all. Conclusions: For the pair AD vs. MCI, the proposed model outperformed state-of-the-art imaging source studies by 0.1% and non-imaging source studies by 4.6%. These results are particularly significant in the field of AD classification, opening the door to more efficient early diagnosis in real-world settings since MCI is considered a precursor to AD.

1. Introduction

Alzheimer’s disease (AD) is a progressive neurodegenerative disease characterized by memory loss and multiple cognitive impairments (memory, attention, concentration, speech, and thinking, among others) [1]. AD represents one of the most costly, lethal, and burdensome diseases of this century. This deterioration leads to alterations in an individual’s behavior, personality, and functional capacity, impeding their ability to perform activities of daily living [2].
The World Health Organization (WHO) estimates that more than 55 million people (8.1% of women and 5.4% of men over 65 years) are living with dementia, a number that could reach 78 million by 2030 and nearly triple to 139 million by 2050 [3]. Within this context, AD assumes a prominent position, accounting for approximately 60% to 80% of all instances of dementia [4]. Given that the predominant risk factor for dementia is advancing age, the ongoing rise in life expectancy and demographic aging further amplifies the probability of individuals developing this condition. Thus, AD’s direct impact on cognitive abilities and neurocognitive function presents significant challenges, particularly in populations with increasing life expectancies [5]. These are clear indicators that this dementia-related disorder will pose considerable challenges to public health and elderly care systems across the world [6].
AD follows a progressive disease sequence that extends from an asymptomatic phase with biomarker evidence of AD through minor cognitive and/or neurobehavioral changes to, ultimately, AD dementia [4]. This translates into the AD continuum, where three broad phases exist: preclinical Alzheimer’s disease, mild cognitive impairment (MCI) due to AD, and dementia due to AD, also called Alzheimer’s dementia [7].
During the preclinical phase, individuals exhibit signs of AD pathology without apparent cognitive or functional deterioration, comprising a long asymptomatic phase where day-to-day routines remain unperturbed [4]. In turn, MCI identifies individuals who do not have dementia but do have biomarker evidence of Alzheimer’s brain changes, as well as new but subtle deficits in cognition. Difficulties in memory, language, and thought processes arise when the brain reaches a point where it can no longer offset the detrimental impact and loss of neurons [7,8]. Within the subset of individuals experiencing MCI, around 15% undergo progression to dementia within two years. Additionally, approximately one-third of those with MCI develop dementia associated with AD within five years [7]. The AD dementia stage itself can be divided into mild, moderate, and severe, depending on the magnitude of the disease’s manifestation and the resulting decline in the patient’s level of independence.
Despite clinical measures being available for the diagnosis of AD progression, the process is notably time-intensive. Diagnosis typically begins when a patient presents symptoms consistent with the early stages of AD. In clinical practice, a two-stage process is often employed. This involves an initial “triage”, typically conducted through primary care, where it is essential to review the clinical background and family history and exclude possible reversible causes of cognitive impairment, such as depression or vitamin, hormone, and electrolyte deficiencies, through routine examinations, namely blood analyses [4]. One of the most frequently reported manifestations involves memory-related symptoms. Therefore, clinical evaluations are performed to gauge the existence of cognitive and functional impairments. A cognitive assessment that can be efficiently conducted (within 10 min), albeit with certain constraints, is the Mini-Mental State Examination [4]. This assessment scores memory and language impairments on a scale from 0 to 30, with lower scores indicating more severe impairments. In the second stage, once the initial assessment confirms the presence of cognitive impairment, specialists employ supplementary clinical evaluations to ascertain the underlying causes of the impairment and validate the diagnosis through the identification of neurodegeneration and biomarker presence via magnetic resonance imaging (MRI), positron emission tomography (PET), electroencephalography (EEG), or cerebrospinal fluid (CSF) analysis. Table 1 shows the state of the art in AD detection through a comprehensive comparison of studies using the ADNI and other imaging and non-imaging databases.
Although initially mainly used to rule out other causes of cognitive impairment, MRI has demonstrated positive predictive value for AD now that new imaging analysis modalities have emerged [31]. Structural MRI directs its focus towards the physical modifications occurring in the brain regions affected by the neuropathology of AD, such as atrophy and changes in tissue characteristics [31,32]. For instance, hippocampal degeneration is correlated with increased relevant biomarker deposition, leading to a smaller hippocampal volume (HV) in AD patients showing declined cognitive function. Thus, HV serves as a recommended biomarker, indicating neuronal damage, which supports the diagnosis of AD in the presence of clinical symptoms. Notably, the accelerated atrophy of the hippocampus is a reliable predictor for an increased likelihood of conversion from MCI to AD [32]. Despite the absence of molecular specificity, sMRI is regarded as secure and effective, providing valuable measures of brain atrophy. These metrics encapsulate the accrued neuronal damage, which, in turn, is a direct determinant of the clinical state [32].
While clinical measures and diagnostic tools are available for the assessment of AD progression, the process is notably time-intensive. Moreover, accurately evaluating the disease’s development can be challenging until the symptoms become overt, even for experts. Therefore, early detection has become a focal point for the scientific community. Early diagnosis not only prompts precautionary measures but also has the potential to mitigate the disease’s non-curable adverse effects on the patient’s daily life in the foreseeable future [5]. With this in mind, our work focuses on developing an automated diagnostic tool for discrimination between MCI and AD using sMRI texture features. Our primary objectives include the following:
  • To introduce the utilization of 22 sMRI gray-level co-occurrence matrix features for the characterization of AD activity;
  • To strengthen the differentiation between healthy control, MCI, and AD patients by systematically comparing the synergistic power of GLCM features extracted across different sMRI planes;
  • To thoroughly evaluate the discriminatory performance of the GLCM features by employing an extensive set of machine learning models.

2. Materials and Methods

The methodology conducted in this study is illustrated in Figure 1. It is divided into three main phases: data collection/pre-processing, feature extraction, and machine learning classification. This research was conducted using a MacBook Pro 14 (Apple Inc., Cupertino, CA, USA) equipped with an M3 Pro chip featuring an 11-core CPU, a 14-core GPU, 18 GB of RAM, and 512 GB SSD.

2.1. Dataset Description

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) database provided the brain MRI datasets used in this work. Specifically, data from the first ADNI phase (ADNI1) are available for download at http://adni.loni.usc.edu (accessed on 14 September 2024) [33]. The ADNI1 MRI protocol used dual-echo T2-weighted and T1-weighted sequences to provide consistent longitudinal structural imaging on 3T and 1.5T scanners. Each patient’s average scan time was about forty-five minutes per session. Several imaging vendors, including Siemens, GE Healthcare, and Philips Healthcare, adopted this uniform protocol. Strict quality control methods were applied to every examination to find and eliminate images judged unsuitable due to subject motion or inadequate anatomical coverage. A total of 504 images from individual participants in the ADNI1 acquisitions, representing a range of diagnostic groupings, were included in the dataset: 167 healthy controls, 235 patients with MCI, and 102 patients diagnosed with AD. It is important to note that not all MCI patients were β-amyloid positive in this dataset.
Table 2 presents details regarding each group’s demographic characteristics.

2.2. Image Pre-Processing

2.2.1. Data Cleaning

From the different ADNI1 collections, a single image was selected for each subject, namely the one associated with the most recent acquisition date.
One of the objectives of ADNI1 included evaluating structural MRI metrics at both 1.5T and 3T to assess whether the strength of the magnetic field significantly influenced the quantitative measures that are critical for AD. The researchers concluded that the performance with both field strengths was similar, with neither presenting significant disadvantages over the other [34]. Thus, 3T and 1.5T scans were both used to maximize the number of single subjects’ images, as the differences in the image quality between 1.5T and 3T MRI are minimal. This approach enabled the expansion of the study groups’ sizes for the present investigation.

2.2.2. Skull Stripping

Although some image corrections were already provided by the ADNI, such as the correction of image geometry distortion or intensity non-uniformity, the selected images were loaded into the MATLAB 2023b software, where a set of structural pre-processing steps was performed with the help of the SPM12 (Statistical Parametric Mapping) package (freely available at https://www.fil.ion.ucl.ac.uk/spm/, accessed on 15 September 2024), which is designed for the analysis of brain imaging data sequences. These structural steps were applied according to [35,36] and included segmentation, skull stripping, and normalization. Segmentation is essential for the differentiation of tissue types and the calculation of the deformation matrix, which dictates the location of the result on a specific brain coordinate template. Extracting the skull region, known as skull stripping, is a crucial step in brain segmentation tasks for clinical analysis. Effective skull stripping is essential because of the brain’s complex anatomical structure and intensity variations in MRI, and this process’s precision and efficiency are paramount for accurate diagnostics [37]. The normalization step was important to map each image to a standard space, defining the boundaries around the brain from a set origin. An example of the resulting image from these structural steps is provided in Figure 2.

2.2.3. Slicing and Final Pre-Processing Steps

The resulting images underwent decomposition into 2D slices representing three distinct anatomical planes: coronal, sagittal, and axial. Employing a MATLAB 2023b routine, eight slices were generated from each image for every anatomical plane. This specific number of slices was chosen due to computational constraints, as it balanced detailed anatomical representation with a manageable data volume and processing time. Additionally, it is essential to note that these slices were consistently extracted from identical positions across all images to ensure uniformity in the data analysis.
Subsequently, all acquired 2D slices were converted from the initial Neuroimaging Informatics Technology Initiative (NIfTI) format to Portable Network Graphics (PNG) and resized to dimensions of 156 × 156 pixels. This resizing significantly reduced the computational time during the feature extraction phase, and the use of square images streamlined the subsequent analysis. Following image resizing, a normalization process was implemented for each slice, scaling the pixel values from their original range (0 to 255) to a standardized range between 0 and 1 [38].
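For illustration, these slicing and normalization steps can be sketched in Python as follows. This is a minimal sketch rather than the exact MATLAB routine used in the study: the nibabel and Pillow libraries are assumed, and the fixed slice positions shown are illustrative, since the paper states only that eight identically positioned slices were taken per plane.

```python
import numpy as np
import nibabel as nib
from PIL import Image

def extract_slices(nifti_path, n_slices=8, size=(156, 156)):
    """Slice a 3D volume along each anatomical axis, resize, and scale to [0, 1]."""
    volume = nib.load(nifti_path).get_fdata()
    slices = []
    for axis in range(3):  # sagittal, coronal, and axial axes (orientation-dependent)
        # Fixed, evenly spaced positions so every subject is sliced identically
        positions = (np.linspace(0.3, 0.7, n_slices) * volume.shape[axis]).astype(int)
        for pos in positions:
            sl = np.take(volume, pos, axis=axis)
            # Map intensities to 8-bit (0-255), resize to 156 x 156, rescale to [0, 1]
            sl = (255 * (sl - sl.min()) / (np.ptp(sl) + 1e-8)).astype(np.uint8)
            slices.append(np.asarray(Image.fromarray(sl).resize(size)) / 255.0)
    return slices  # 24 slices per subject: 8 per anatomical plane
```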

2.3. Feature Extraction

For each subject, eight images per anatomical view were used, resulting in a total of 24 images per subject and a cumulative total of 12,096 images for the entire dataset of 504 study participants. From each image, 22 texture features were computed using the gray-level co-occurrence matrix (GLCM) method.

Texture Analysis via GLCM

Using MATLAB 2023b and the function graycomatrix (available in the MATLAB Image Processing Toolbox), a matrix was generated by counting how often pairs of gray-level values co-occur at a given spatial offset. In this work’s context, the function output consisted of multiple GLCMs, specified to the graycomatrix function by an array of offsets, which defined pixel relationships of varying directions and distances. Specifically, the array utilized defined four offsets, [0 1; -1 1; -1 0; -1 -1], representing the distance between a pixel of interest and its neighboring pixels in four distinct directions: horizontal, vertical, and the two diagonals. Employing multiple directions facilitates the more sensitive characterization of the texture within the image, encompassing texture patterns across various orientations [39].
Additionally, the number of gray levels used was 8, the default value for numeric images, which is the case for MRI images. This determines the discrete intensity levels considered in GLCM computation and, ultimately, the size of the GLCM (8 × 8). Moreover, this selection balances the capture of textural intricacies and the management of the computational demands [39].
The function graycoprops (available in the MATLAB Image Processing Toolbox) extracts four statistics specified in the properties of the GLCMs—contrast, correlation, energy, and homogeneity—all of which were used, and the remaining parameters were computed according to [40]. Each feature output vector had four dimensions corresponding to the four-direction offset array. Table 3 provides a comprehensive overview of all extracted features [40]. To fully understand the data presented, the specific notations used to describe the matrix entries and associated calculations are provided in the Notation part of the manuscript.
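As an illustration of this step, the sketch below mirrors the MATLAB workflow in Python using scikit-image’s graycomatrix and graycoprops, which are close analogues of the MATLAB functions. This is an assumption for illustration only, and only one of the additional statistics from [40] (cluster shade, per Table 3) is shown.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(image_01):
    """Compute GLCM texture statistics for one slice scaled to [0, 1]."""
    # Quantize to 8 gray levels, matching the 8 x 8 GLCM used in the paper
    img8 = np.clip((image_01 * 8).astype(np.uint8), 0, 7)
    # Four directions (0, 45, 90, 135 degrees) at distance 1, analogous to
    # MATLAB's offset array [0 1; -1 1; -1 0; -1 -1]
    glcm = graycomatrix(img8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=8, normed=True)
    feats = {name: graycoprops(glcm, name).ravel()  # 4 values, one per direction
             for name in ('contrast', 'correlation', 'energy', 'homogeneity')}
    # Example of an additional statistic computed directly from p(i, j):
    # cluster shade, one value per direction
    i, j = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
    shade = []
    for d in range(4):
        p = glcm[:, :, 0, d]
        mu_x, mu_y = (i * p).sum(), (j * p).sum()
        shade.append((((i + j - mu_x - mu_y) ** 3) * p).sum())
    feats['cluster_shade'] = np.array(shade)
    return feats
```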
All of the extracted feature information was organized into matrices, with subjects as rows and features as columns, categorized by group (AD, CN, MCI) and anatomical view (axial, coronal, sagittal, and combined views). A comprehensive matrix was also constructed by combining the information from all three groups for all anatomical views.
The z-score normalization method was applied to each matrix to ensure the consistent treatment of each feature. In MATLAB, z-score normalization [41] is performed by column, which standardizes each feature across the subjects.
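A minimal Python equivalent of this column-wise standardization (illustrative only; the study used MATLAB’s zscore) is:

```python
import numpy as np

def zscore_columns(X):
    """Standardize each feature (column) to zero mean and unit variance across subjects."""
    # ddof=1 uses the sample standard deviation (N - 1), matching MATLAB's zscore default
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```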

2.4. Classification Framework

The normalized matrices were subsequently loaded into Python (version 3.9.12) to execute a set of machine learning models and generate discrimination reports.

2.4.1. ML Discrimination Between Study Groups—No Feature Selection Approach

The models’ performance in discriminating between study groups (AD vs. CN, AD vs. MCI, CN vs. MCI, and all vs. all) was evaluated using eighteen selected Scikit-learn ML models, employing a hold-out method. Each algorithm processed either 2112 features when utilizing data from all three anatomical planes (22 features × 4 offsets × 8 slices × 3 planes, for each group comparison) or 704 features when analyzing each plane individually (22 features × 4 offsets × 8 slices × 1 plane, for each group comparison).
In the hold-out method, the dataset was divided randomly into a train and test set. Specifically, 80% of the data was allocated for training, and the remaining 20% was used for testing. The training set was utilized for model training, while the test set was used to evaluate the model’s performance on unseen data.
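A sketch of this hold-out protocol with three of the classifiers that appear in the results (LinSVC, LogReg, and ExTreeC) might look as follows; the synthetic data stand-in, random seed, stratification, and hyperparameters shown are illustrative assumptions, and the full list of eighteen models and their settings is given in Table 4.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Stand-in for the real (504 x 2112) normalized feature matrix and group labels
X, y = make_classification(n_samples=504, n_features=2112, n_informative=50,
                           random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)  # 80/20 hold-out split

models = {
    'LinSVC': LinearSVC(),
    'LogReg': LogisticRegression(max_iter=1000),
    'ExTreeC': ExtraTreesClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)                          # train on the 80% split
    acc = accuracy_score(y_test, model.predict(X_test))  # evaluate on unseen 20%
    print(f'{name}: {acc:.1%}')
```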
Table 4 provides details of each classifier used and the corresponding hyperparameters. For this purpose, some configurations were created, but the default hyperparameters provided by Scikit-learn were also used [42].

2.4.2. ML Discrimination Between Study Groups—Feature Selection Approach

Following an analysis of the outcomes from the previous phase (no feature selection approach), a subset of seven classifiers demonstrating superior accuracy was selected for classification between the study groups using a feature selection approach. In the evaluation of the models’ capability to discriminate between the study groups (using the same hold-out process), feature selection, utilizing the F-score algorithm [43], was systematically conducted by incrementally adding one feature per iteration to the models’ inputs, ranging from 2 to 2111 features (combined-plane variant) or from 2 to 703 features (per plane). The feature selection process was strictly performed using the training data to prevent any data leakage between training and testing.
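Continuing the previous sketch, the incremental F-score selection could be expressed as follows; again illustrative, with the selector fitted on the training data only, as the text requires.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC

# X_train, X_test, y_train, y_test: the hold-out split from the previous sketch
best = {'accuracy': 0.0, 'n_features': 0}
for k in range(2, X_train.shape[1]):  # 2 ... 2111 for the combined-plane variant
    selector = SelectKBest(f_classif, k=k).fit(X_train, y_train)  # training data only
    clf = LinearSVC().fit(selector.transform(X_train), y_train)
    acc = clf.score(selector.transform(X_test), y_test)
    if acc > best['accuracy']:
        best = {'accuracy': acc, 'n_features': k}
print(best)
```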

2.4.3. Classification Metrics

The performance evaluation of the proposed model was conducted using seven metrics: accuracy, specificity, precision, recall, F1-score, AUC (area under the curve), and Gmean.
Accuracy represents the proportion of correctly classified cases among all cases and can be defined as
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$
where $TP$, $TN$, $FP$, and $FN$ are, respectively, the true positives, true negatives, false positives, and false negatives [42].
Precision, also known as the positive predictive value, indicates the proportion of correctly classified positive cases among all cases predicted as positive. It can be defined as [42]
$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%$
Recall, also known as sensitivity, represents the proportion of correctly predicted positive cases among the total number of actual positive cases, being defined as [42,44]
$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%$
Specificity is the negative-class counterpart of the recall and denotes the rate of negative samples that are correctly classified. The specificity ranges from 0 to 1, where 1 indicates the perfect prediction of the negative class, and 0 means that all negative class samples are incorrectly predicted. It is defined as [44]
$\mathrm{Specificity} = \frac{TN}{TN + FP} \times 100\%$
The F1-score is the harmonic mean of the recall and precision, penalizing extreme values of either [44]. The corresponding equation is defined as [42]
$F1\text{-}\mathrm{Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\%$
The geometric mean (Gmean) is a metric that evaluates the balance in performance across all classes. A higher Gmean value indicates a reduced risk of overfitting, i.e., of the model being fitted so specifically to the training sample that its ability to generalize to new data is compromised [45]. The Gmean is mathematically defined as [42]
$\mathrm{Gmean} = \sqrt{\mathrm{Recall} \times \mathrm{Specificity}} \times 100\%$
The area under the curve (AUC) of the receiver operating characteristic (ROC) curve assesses a model’s ability to discriminate between positive and negative classes. It achieves this by comparing the true and false positive rates across various classification thresholds. The AUC values range from 0 to 1, with a perfect classifier yielding 1 and a random classifier yielding 0.5. The AUC concisely measures a model’s performance, enabling straightforward comparison between models and evaluations in scenarios with class imbalances [42].
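All of these quantities can be derived from a confusion matrix together with class scores; the compact sketch below illustrates the binary case (the function name and interface are illustrative, not from the paper).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def binary_report(y_true, y_pred, y_score):
    """Compute the Section 2.4.3 metrics (as fractions) for a binary problem."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'precision': precision,
        'recall': recall,
        'specificity': specificity,
        'f1_score': 2 * precision * recall / (precision + recall),
        'gmean': np.sqrt(recall * specificity),
        'auc': roc_auc_score(y_true, y_score),  # threshold-independent
    }
```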

3. Classification Results

The obtained discrimination report metrics for both variations (without and with feature selection) are documented in the following subsections.

3.1. Discrimination Results Without Feature Selection

The discrimination results when utilizing the complete feature set are presented in Table 5, Table 6, Table 7 and Table 8. The best results are highlighted in blue.
  • CN vs. AD: Referring to Table 5, the highest accuracy of 85.2% was achieved using information from the three planes, obtained through the cML algorithms LinSVC or OvsR. The lowest classification accuracy was 66.7%, obtained using the axial plane.
  • AD vs. MCI: From Table 6, the highest classification accuracy of 98.5% was achieved from the three planes using the LogReg classifier. The axial plane exhibited the lowest classification accuracy, recording a value of 64.7%.
  • CN vs. MCI: As seen in Table 7, the most notable classification outcome was 88.9%, achieved from the coronal plane using the ExTreeC cML algorithm. The axial plane yielded the least favorable outcome, demonstrating an accuracy of 56.8%.
  • all vs. all: Table 8 shows that the most significant classification accuracy of 82.2% was achieved through the cML algorithms LinSVC or OvsR. The minimum accuracy attained was 49.5%, again from the axial plane.

3.2. Discrimination Results with Feature Selection

The discrimination results obtained using the F-score method for feature selection are presented in Table 9, Table 10, Table 11 and Table 12. The best results are highlighted in blue.
  • CN vs. AD: Referring to Table 9, the highest classification accuracy of 85.2% was achieved utilizing the sagittal plane, incorporating 491 selected features and employing the ExTreeC classifier. The lowest classification accuracy was 72.2% using the axial plane.
  • AD vs. MCI: From Table 10, the highest classification accuracy of 98.5% was attained through the ML algorithms LinSVC or OvsR, using the three planes and 1379 features. The lowest classification accuracy attained was 80.9% using the axial plane.
  • CN vs. MCI: From Table 11, the peak accuracy value of 95.1% was obtained using 1129 features selected from the three planes and the ExTreeC classifier. The minimum accuracy reached was 67.9% from the axial plane.
  • all vs. all: As indicated in Table 12, the classification presented the highest accuracy, 87.1%, for the three planes using 1366 selected features and the LinSVC classifier. The axial plane demonstrated the lowest classification accuracy, registering 55.4%.

4. Discussion

We divide this section into two subsections to provide a comprehensive discussion. The first subsection compares the approach with no feature selection to the feature selection approach. The second subsection compares the results of the present work with the state-of-the-art results.

4.1. No Feature Selection Approach vs. Feature Selection Approach

Checking the results presented in Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12, it is possible to see that feature selection overall enhances the classification results.
Regarding the accuracy, the FS procedure outperformed the non-FS procedure by 0.9% for the pair CN vs. AD, 5.4% for CN vs. MCI, and 3.8% for all vs. all.
The recall also showed significant improvements from non-FS to FS, indicating a reduction in false negatives (FN). Improving the recall scores is essential since failing to detect positive cases of AD or misclassifying its stages can have severe consequences.
The F1-score also increased when applying the FS procedure, reflecting a balanced enhancement in both identifying more real positive cases (higher recall) and making fewer false positive errors (higher precision). In the AD diagnosis context, where classification errors must be minimized, this improvement demonstrates the robustness of the model when applying FS.
The AUC score also improved, enhancing the model’s ability to accurately classify patients into AD, MCI, or CN. This improvement aids in better evaluating the model’s performance, particularly in scenarios with class imbalances, such as the one involving MCI in the present study.
The above coincides with the known advantages of feature selection—identifying and retaining only the essential or relevant features for the classification task, culminating not only in better classification results but also in reducing the dimensionality of the data, improving the overall performance [46].
Focusing on the feature selection results, the two result values above 90% correspond to the pairs CN vs. MCI and AD vs. MCI. This is of high importance because, firstly, there is no active cure for AD at present. As MCI is considered a transitional stage between normal cognitive aging and AD, monitoring individuals diagnosed with MCI allows for the early detection of those who are more likely to progress to AD, enabling interventions and treatments to be initiated earlier. Thus, distinguishing between CN and MCI, as well as AD and MCI, can be helpful for clinicians and caregivers in the early development of personalized treatment plans tailored to the individual’s stage of cognitive decline.
The CN vs. MCI and AD vs. MCI groups show consistently good performance across all metrics reported. For the pair CN vs. MCI, the high specificity (96.2%) demonstrates the model’s exceptional capability in correctly identifying CN individuals (the negative class), which highlights its efficiency in avoiding false positive diagnoses, ensuring that individuals without MCI are rarely misclassified, and avoiding inappropriate medical interventions for those not affected by MCI. Furthermore, the Gmean of 94.6% ensures that the specificity is well balanced with the sensitivity (recall), indicating that the model effectively manages classification errors (FP and FN). Additionally, the AUC score of 94.6% signifies the model’s robust ability to discriminate between the CN and MCI classes, performing well even in scenarios where the class proportions are not even. Concerning the pair AD vs. MCI, the precision of 100.0%, paired with the recall of 94.1%, suggests a highly reliable model that is excellent in correctly identifying MCI cases without falsely labeling AD patients as MCI. The Gmean of 99.0% is highly relevant in this case, enhancing the confidence that, even in the presence of class imbalances, which happens for this study pair, the model does not disproportionately favor one class over the other.
Typically, the pair CN vs. AD would exhibit the highest classification performance, since these groups present the most significant anatomical differences in the brain [12]. This is also supported by the significant differences between the MMSE scores reported in Table 2, which are related to the subject’s level of symptomatology. Nevertheless, it is important to note that accurately distinguishing between cognitively normal individuals and those with AD presents significant challenges due to the complex and heterogeneous nature of the condition [47]. Despite these challenges, this work has demonstrated considerable strength in precision, showing that, when the model predicts AD, these predictions are correct about 88.2% of the time, which is crucial to avoid the consequences of FP, such as unnecessary testing or treatment for individuals incorrectly diagnosed with AD. Moreover, the Gmean of 86.0% suggests that the model maintains a good balance between recall and specificity, essential in minimizing both types of classification errors, indicating that the model performs satisfactorily in identifying both CN and AD conditions with a low error rate.
Regarding the all vs. all pair, although the multi-class classification proved to be challenging, the model’s capability is affirmed by key metrics. Alongside the accuracy of 87.1%, the model achieved an F1-score of 87.3%, indicating a well-tuned model that not only captures most positive cases (recall) but also ensures that its predictions of the positive class are accurate (precision), minimizing the risk of harmful FP. Concurrently, the specificity (92.7%) and Gmean (90.0%) show that the model correctly identifies negative cases and provides balanced performance across various class distinctions, suggesting a reduced risk of overfitting.
Examining the outcomes across various study planes in Section 3.2, it is evident that the most successful results were obtained when utilizing the brain’s three-dimensional representation (using all three planes). This outcome was expected, since combining the coronal, axial, and sagittal planes provides a holistic perspective on the brain, enabling the more precise visualization of critical structures for AD detection, namely the cerebral cortex, ventricles, and hippocampus, which are among the principal regions implicated in AD pathology [48]. Incorporating data from all three anatomical planes can offer complementary features, beneficial for classification endeavors, as evidenced by research findings indicating that multi-view models tend to exhibit superior accuracy compared to single-view models [49].

4.2. Study Results vs. State-of-the-Art Results

While acknowledging that, in medicine, the accuracy may not fully capture the balance between recall and specificity, the following comparison between the present study’s results and state-of-the-art results will primarily focus on the accuracy to check the model’s performance, as it enables a more direct comparison; see Table 1 and Figure 3.
Focusing on the findings from the ADNI database, the following should be noted.
  • Not all state-of-the-art studies attempted the three binary classifications conducted in the present work.
  • Only four state-of-the-art studies performed multi-classification—all vs. all. The present study outperformed the work of [12,20] by 11.8% and 16.1%, respectively. However, the present study showed 3.4% lower accuracy compared to the findings in [17].
  • The present work did not outperform the state-of-the-art work concerning the pair CN vs. AD, where it was surpassed by margins of accuracy ranging from 2.3% [16] to 13.8% [11].
  • In terms of the pair CN vs. MCI, the proposed model outperformed the majority of the consulted state-of-the-art sMRI-based studies [9,12,15,16,17,18,19,20] by 9.5%, 6.9%, 2.2%, 15.9%, 17.1%, 22.7%, 18.2%, and 5.9%, respectively. The one that yielded better performance than the current work did so by a maximum of 1.4% accuracy [14].
  • For the pair with the highest classification accuracy, AD vs. MCI, the proposed model successfully outperformed not only all sMRI-based studies but also the ones related to fMRI and PET, enhancing the importance of the present work’s findings. Higher results were obtained with differences ranging from 0.10% [14] to 25.8% [19] regarding accuracy. Although the classes were imbalanced in the present work, due to the high number of MCI patients when compared with AD patients, the accuracy attained was accompanied by a Gmean value of 99.0%, suggesting that the model generalizes well, avoiding overfitting. This is a significant finding, opening the door to more efficient early diagnosis in real-world settings since MCI is considered a precursor to AD.
When comparing the present work with other sMRI-based studies that used other databases rather than the ADNI, some conclusions can be obtained.
  • The performance of the present work in distinguishing between CN and AD did not surpass that of the state-of-the-art studies. Specifically, it was outperformed with accuracy margins ranging from 2.8% [21] to 12.6% [22].
  • Compared with the study in [23], the present study shows a gain of 4.4% regarding the accuracy for the AD vs. MCI pair, but a loss of 0.4% for the CN vs. MCI pair.
Finally, comparing the present results with those obtained from other examination modalities, the proposed model was able to achieve the following:
  • It surpassed the achieved accuracies for the AD vs. MCI pair when compared with the studies in [26,27] by 15.5% and 4.6%, respectively;
  • It surpassed the reported accuracies for CN vs. MCI by 18.1% [28] and by 0.1% [26], although it showed a loss of 3.0% when compared with [27];
  • For the multi-classification set (all vs. all), it outperformed the study in [26] by 12.1%, but was outperformed by the algorithms employed in the studies in [27,28] by 8.5% and 2.9% in terms of accuracy, respectively; it was also unable to outperform the state-of-the-art accuracies for the pair CN vs. AD, being surpassed by 11.8% [26], 10.8% [27], and 9.8% [28].
Although enriching, these comparisons must be carefully analyzed, because different signals, image databases and sources, feature selection methods, and classification means were employed in the state-of-the-art studies. It should also be noted that most state-of-the-art studies did not report other classification metrics, focusing only on the a c c u r a c y , which can be misleading. The proposed model presents a comprehensive set of metrics, demonstrating its a c c u r a c y in correctly classifying instances and effectively discriminating between classes, making it ready for implementation. Furthermore, this work introduces a model closer to market implementation using a hold-out validation method, which splits the dataset into training and testing subsets. This approach leverages the dataset’s considerable size to simulate the performance on unseen data, providing a realistic assessment of the model’s generalization capabilities, unlike most state-of-the-art methods presented in Table 1, which rely on cross-validation.

5. Conclusions

Alzheimer’s disease poses significant challenges due to its incurability and demanding diagnostic process. Early detection is of great interest, offering the possibility to delay the onset of debilitating symptoms that erode patients’ quality of life and shorten their lifespans. This work aimed to introduce an artificial intelligence model that can discriminate between CN, MCI, and AD subjects using sMRI. By analyzing pre-processed brain images, 22 features computed from the gray-level co-occurrence matrix were extracted. Initially, machine learning algorithms were trained using the complete feature set, followed by a refinement step where only the most discriminant and compatible features were utilized, thereby incorporating a feature selection process. The findings demonstrated that feature selection consistently improved the model performance, resulting in heightened accuracies and enhancements in other relevant classification metrics to support more reliable and adequate decision-making in practical settings. Consequently, the algorithms achieved the following classification accuracies: 85.2% for CN vs. AD, 98.5% for AD vs. MCI, 95.1% for CN vs. MCI, and 87.1% for all vs. all. For the pair AD vs. MCI, the proposed model outperformed all of the consulted state-of-the-art imaging studies by 0.1% and the non-imaging source studies by 4.6%, as shown in Figure 3. It should be noted that Figure 3 is designed to illustrate both the improvements and limitations in terms of the discrimination accuracy of our method compared with the current state-of-the-art approaches, while acknowledging that direct comparisons and conclusions should always be considered carefully given the varied methodologies employed, database sizes, source data, and the potential influence of the applied processes’ margins of error.
Considering that most studies did not present results for all vs. all classification, the proposed model added this variant and performed well. Additionally, the use of a wide array of classification metrics, rather than only focusing on the accuracy, offers a broader and more profound view of the model’s performance, boosting our confidence in it. Therefore, the potential of this work in facilitating the diagnosis of AD in real-world circumstances, particularly in the early stages, is reinforced.
Although the results suggest that the proposed model can aid in the diagnostic process, providing medical practitioners with an additional tool for more confident decision-making, future work should focus on balancing the number of subjects within the classified groups and expanding the overall dataset. This will ensure more reliable generalization and, thus, a more robust model that is better prepared for real-world application. Furthermore, it could be beneficial to understand the impact of applying other forward-based feature selection methods instead of the F-score, e.g., Pearson’s correlation, linear discriminant analysis, ANOVA, or chi-squared tests, to maximize the discriminative performance of the cML models. Finally, it could also be interesting to extend and test the capability of the presented algorithm solution to detect preclinical AD pathologies in CN and cases of subjective cognitive impairment (SCI), which is another key medical need in the field of early AD diagnosis.

Author Contributions

Conceptualization, M.J.O. and P.M.R.; methodology, M.J.O., P.R. and P.M.R.; validation, P.M.R.; investigation, M.J.O., P.R. and P.M.R.; writing—original draft, M.J.O.; writing—review and editing, P.R. and P.M.R.; supervision, P.M.R.; funding acquisition, P.M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu, accessed on 21 September 2024). As such, the investigators within the ADNI contributed to the design and implementation of the work and/or provided data but did not participate in the analysis or writing of this report. A complete list of the ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf, (accessed on 21 September 2024).

Acknowledgments

This work was supported by Fundação para a Ciência e a Tecnologia (FCT), Portugal, through the project UIDB/50016/2020. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense, award number W81XWH-12-2-0012). The ADNI is funded by the National Institute on Aging and the National Institute of Biomedical Imaging and Bioengineering and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai, Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche, Ltd., and its affiliated company, Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development, LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer, Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institute of Health Research provides funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org, accessed on 23 September 2024). The grantee organization is the Northern California Institute for Research and Education, and the study was coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. The ADNI data are disseminated by the Laboratory for Neuro-Imaging at the University of Southern California.

Conflicts of Interest

The authors declare no conflicts of interest.

Notation

$p(i,j)$: the $(i,j)$th entry in a normalized gray-tone spatial-dependence matrix
$N_g$: number of distinct gray levels in the quantized image
$\mu_x$, $\mu_y$: mean gray-level intensities of $p_x$ and $p_y$, respectively
$\sigma_x$, $\sigma_y$: standard deviations of $p_x$ and $p_y$, respectively
$H(X)$: entropy of $p_x$
$H(Y)$: entropy of $p_y$
$p_x(i) = \sum_{j=1}^{N_g} p(i,j)$: marginal row probabilities
$p_y(j) = \sum_{i=1}^{N_g} p(i,j)$: marginal column probabilities
$p_{x+y}(k) = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j)$, where $i + j = k$ and $k = 2, 3, \ldots, 2N_g$
$p_{x-y}(k) = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j)$, where $|i - j| = k$ and $k = 0, 1, \ldots, N_g - 1$
$HXY = -\sum_{i}\sum_{j} p(i,j)\,\log(p(i,j))$
$HXY1 = -\sum_{i}\sum_{j} p(i,j)\,\log(p_x(i)\,p_y(j))$
$HXY2 = -\sum_{i}\sum_{j} p_x(i)\,p_y(j)\,\log(p_x(i)\,p_y(j))$

References

  1. Scheltens, P.; De Strooper, B.; Kivipelto, M.; Holstege, H.; Chételat, G.; Teunissen, C.E.; Cummings, J.; van der Flier, W.M. Alzheimer’s disease. Lancet 2021, 397, 1577–1590. [Google Scholar] [CrossRef] [PubMed]
  2. Alzheimer Portugal. A Doença de Alzheimer. Available online: https://alzheimerportugal.org/a-doenca-de-alzheimer/ (accessed on 21 January 2024).
  3. World Health Organization. World Failing to Address Dementia Challenge. Available online: https://www.who.int/news/item/02-09-2021-world-failing-to-address-dementia-challenge (accessed on 21 January 2024).
  4. Porsteinsson, A.; Isaacson, R.; Knox, S.; Sabbagh, M.; Rubino, I. Diagnosis of Early Alzheimer’s Disease: Clinical Practice in 2021. J. Prev. Alzheimer’s Dis. 2021, 8, 371–386. [Google Scholar] [CrossRef] [PubMed]
  5. Raza, N.; Naseer, A.; Tamoor, M.; Zafar, K. Alzheimer Disease Classification through Transfer Learning Approach. Diagnostics 2023, 13, 801. [Google Scholar] [CrossRef] [PubMed]
  6. Qiu, C.; Kivipelto, M.; von Strauss, E. Epidemiology of Alzheimer’s disease: Occurrence, determinants, and strategies toward intervention. Dialogues Clin. Neurosci. 2009, 11, 111–128. [Google Scholar] [CrossRef]
  7. Alzheimer’s Association. 2023 Alzheimer’s disease facts and figures. Alzheimer’s Dement. 2023, 19, 1598–1695. [Google Scholar] [CrossRef]
  8. Tahami Monfared, A.A.; Byrnes, M.; White, L.; Zhang, Q. Alzheimer’s Disease: Epidemiology and Clinical Progression. Neurol. Ther. 2022, 11, 553–569. [Google Scholar] [CrossRef]
  9. Zhang, Q.; Long, Y.; Cai, H.; Chen, Y.W. Lightweight neural network for Alzheimer’s disease classification using multi-slice sMRI. Magn. Reson. Imaging 2024, 107, 164–170. [Google Scholar] [CrossRef]
  10. Srividhya, L.; Sowmya, V.; Vinayakumar, R.; Gopalakrishnan, E.A.; Sowmya, K.P. Deep learning-based approach for multi-stage diagnosis of Alzheimer’s disease. Multimed. Tools Appl. 2023, 83, 16799–16822. [Google Scholar] [CrossRef]
  11. Shukla, A.; Tiwari, R.; Tiwari, S. Structural biomarker-based Alzheimer’s disease detection via ensemble learning techniques. Int. J. Imaging Syst. Technol. 2023, 34, e22967. [Google Scholar] [CrossRef]
  12. Silva, J.; Bispo, B.C.; Rodrigues, P.M. Structural MRI Texture Analysis for Detecting Alzheimer’s Disease. J. Med. Biol. Eng. 2023, 43, 227–238. [Google Scholar] [CrossRef]
  13. Wang, L.; Sheng, J.; Zhang, Q.; Zhou, R.; Li, Z.; Xin, Y.; Zhang, Q. Functional Brain Network Measures for Alzheimer’s Disease Classification. IEEE Access 2023, 11, 111832–111845. [Google Scholar] [CrossRef]
  14. Lama, R.K.; Kim, J.I.; Kwon, G.R. Classification of Alzheimer’s Disease Based on Core-Large Scale Brain Network Using Multilayer Extreme Learning Machine. Mathematics 2022, 10, 1967. [Google Scholar] [CrossRef]
  15. Pei, Z.; Wan, Z.; Zhang, Y.; Wang, M.; Leng, C.; Yang, Y.H. Multi-scale attention-based pseudo-3D convolution neural network for Alzheimer’s disease diagnosis using structural MRI. Pattern Recognit. 2022, 131, 108825. [Google Scholar] [CrossRef]
  16. Prajapati, R.; Kwon, G.R. A Binary Classifier Using Fully Connected Neural Network for Alzheimer’s Disease Classification. J. Multimed. Inf. Syst. 2022, 9, 21–32. [Google Scholar] [CrossRef]
  17. Goenka, N.; Tiwari, S. Multi-class classification of Alzheimer’s disease through distinct neuroimaging computational approaches using Florbetapir PET scans. Evol. Syst. 2022, 14, 801–824. [Google Scholar] [CrossRef]
  18. Kang, W.; Lin, L.; Zhang, B.; Shen, X.; Wu, S. Multi-model and multi-slice ensemble learning architecture based on 2D convolutional neural networks for Alzheimer’s disease diagnosis. Comput. Biol. Med. 2021, 136, 104678. [Google Scholar] [CrossRef]
  19. Prajapati, R.; Khatri, U.; Kwon, G.R. An Efficient Deep Neural Network Binary Classifier for Alzheimer’s Disease Classification. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021; IEEE: Piscataway, NJ, USA, 2021. [Google Scholar] [CrossRef]
  20. Lee, E.; Choi, J.S.; Kim, M.; Suk, H.I. Toward an interpretable Alzheimer’s disease diagnostic model with regional abnormality representation via deep learning. NeuroImage 2019, 202, 116113. [Google Scholar] [CrossRef]
  21. Wen, J.; Thibeau-Sutre, E.; Diaz-Melo, M.; Samper-González, J.; Routier, A.; Bottani, S.; Dormont, D.; Durrleman, S.; Burgos, N.; Colliot, O. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Med. Image Anal. 2020, 63, 101694. [Google Scholar] [CrossRef]
  22. Hussain, E.; Hasan, M.; Hassan, S.; Azmi, T.; Rahman, M.A.; Parvez, M.Z. Deep Learning Based Binary Classification for Alzheimer’s Disease Detection using Brain MRI Images. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13 November 2020; pp. 1115–1120. [Google Scholar] [CrossRef]
  23. Rallabandi, V.S.; Seetharaman, K. Deep learning-based classification of healthy aging controls, mild cognitive impairment and Alzheimer’s disease using fusion of MRI-PET imaging. Biomed. Signal Process. Control 2023, 80, 104312. [Google Scholar] [CrossRef]
  24. Lahmiri, S. Integrating convolutional neural networks, kNN, and Bayesian optimization for efficient diagnosis of Alzheimer’s disease in magnetic resonance images. Biomed. Signal Process. Control 2023, 80, 104375. [Google Scholar] [CrossRef]
  25. Qiu, S.; Joshi, P.S.; Miller, M.I.; Xue, C.; Zhou, X.; Karjadi, C.; Chang, G.H.; Joshi, A.S.; Dwyer, B.; Zhu, S.; et al. Development and validation of an interpretable deep learning framework for Alzheimer’s disease classification. Brain 2020, 143, 1920–1933. [Google Scholar] [CrossRef] [PubMed]
  26. Pirrone, D.; Weitschek, E.; Di Paolo, P.; De Salvo, S.; De Cola, M.C. EEG Signal Processing and Supervised Machine Learning to Early Diagnose Alzheimer’s Disease. Appl. Sci. 2022, 12, 5413. [Google Scholar] [CrossRef]
  27. Rodrigues, P.M.; Bispo, B.C.; Garrett, C.; Alves, D.; Teixeira, J.P.; Freitas, D. Lacsogram: A New EEG Tool to Diagnose Alzheimer’s Disease. IEEE J. Biomed. Health Inform. 2021, 25, 3384–3395. [Google Scholar] [CrossRef] [PubMed]
  28. Rodrigues, P.M.; Freitas, D.R.; Teixeira, J.P.; Alves, D.; Garrett, C. Electroencephalogram Signal Analysis in Alzheimer’s Disease Early Detection. Int. J. Reliab. Qual. E-Healthc. 2018, 7, 40–59. [Google Scholar] [CrossRef]
  29. AlMansoori, M.E.; Jemimah, S.; Abuhantash, F.; AlShehhi, A. Predicting early Alzheimer’s with blood biomarkers and clinical features. Sci. Rep. 2024, 14, 6039. [Google Scholar] [CrossRef]
  30. Silva, M.; Ribeiro, P.; Bispo, B.C.; Rodrigues, P.M. Detecção da Doença de Alzheimer através de Parâmetros Não-Lineares de Sinais de Fala. In Proceedings of the Anais do XLI Simpósio Brasileiro de Telecomunicações e Processamento de Sinais, Rio de Janeiro, Brazil, 8–11 October 2023; Sociedade Brasileira de Telecomunicações: Rio de Janeiro, Brazil, 2023. SBrT2023. [Google Scholar] [CrossRef]
  31. Johnson, K.A.; Fox, N.C.; Sperling, R.A.; Klunk, W.E. Brain Imaging in Alzheimer Disease. Cold Spring Harb. Perspect. Med. 2012, 2, a006213. [Google Scholar] [CrossRef]
  32. Hu, X.; Meier, M.; Pruessner, J. Challenges and opportunities of diagnostic markers of Alzheimer’s disease based on structural magnetic resonance imaging. Brain Behav. 2023, 13, e2925. [Google Scholar] [CrossRef]
  33. Alzheimer’s Disease Neuroimaging Initiative-ADNI. Data Sharing and Publication Policy. Available online: https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf (accessed on 15 May 2023).
  34. Jack, C.R.; Barnes, J.; Bernstein, M.A.; Borowski, B.J.; Brewer, J.; Clegg, S.; Dale, A.M.; Carmichael, O.; Ching, C.; DeCarli, C.; et al. Magnetic resonance imaging in Alzheimer’s Disease Neuroimaging Initiative 2. Alzheimer’s Dement. 2015, 11, 740–756. [Google Scholar] [CrossRef]
  35. Kuhnke, P. Structural Pre-Processing Batch. Available online: https://github.com/PhilKuhnke/fMRI_analysis/blob/main/1Preprocessing/SPM12/batch_structural_preproc_job.m (accessed on 25 May 2023).
  36. Kuhnke, P.; Kiefer, M.; Hartwigsen, G. Task-Dependent Recruitment of Modality-Specific and Multimodal Regions during Conceptual Processing. Cereb. Cortex 2020, 30, 3938–3959. [Google Scholar] [CrossRef]
  37. Fatima, A.; Shahid, A.R.; Raza, B.; Madni, T.M.; Janjua, U.I. State-of-the-Art Traditional to the Machine- and Deep-Learning-Based Skull Stripping Techniques, Models, and Algorithms. J. Digit. Imaging 2020, 33, 1443–1464. [Google Scholar] [CrossRef]
  38. Arafa, D.; Moustafa, H.E.D.; Ali, H.; Ali-Eldin, A.; Saraya, S. A deep learning framework for early diagnosis of Alzheimer’s disease on MRI images. Multimed. Tools Appl. 2023, 83, 3767–3799. [Google Scholar] [CrossRef]
  39. MathWorks Help Center. Graycomatrix. Available online: https://www.mathworks.com/help/images/ref/graycomatrix.html (accessed on 7 October 2023).
  40. Uppuluri, A. GLCM Texture Features. Available online: https://www.mathworks.com/matlabcentral/fileexchange/22187-glcm-texture-features (accessed on 8 October 2023).
  41. Peck, R.; Olsen, C.; Devore, J. Introduction to Statistics and Data Analysis; Cengage Learning: Belmont, CA, USA, 2008; p. 880. [Google Scholar]
  42. Ribeiro, P.; Sá, J.; Paiva, D.; Rodrigues, P.M. Cardiovascular Diseases Diagnosis Using an ECG Multi-Band Non-Linear Machine Learning Framework Analysis. Bioengineering 2024, 11, 58. [Google Scholar] [CrossRef] [PubMed]
  43. Sevani, N.; Hermawan, I.; Jatmiko, W. Feature Selection based on F-score for Enhancing CTG Data Classification. In Proceedings of the 2019 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), Banda Aceh, Indonesia, 22–24 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 18–22. [Google Scholar] [CrossRef]
  44. Hicks, S.A.; Strümke, I.; Thambawita, V.; Hammou, M.; Riegler, M.A.; Halvorsen, P.; Parasa, S. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 2022, 12, 5979. [Google Scholar] [CrossRef] [PubMed]
  45. Mutasa, S.; Sun, S.; Ha, R. Understanding artificial intelligence based radiology studies: What is overfitting? Clin. Imaging 2020, 65, 96–99. [Google Scholar] [CrossRef]
  46. Dhal, P.; Azad, C. A comprehensive survey on feature selection in the various fields of machine learning. Appl. Intell. 2021, 52, 4543–4581. [Google Scholar] [CrossRef]
  47. Duara, R.; Barker, W. Heterogeneity in Alzheimer’s Disease Diagnosis and Progression Rates: Implications for Therapeutic Trials. Neurotherapeutics 2022, 19, 8–25. [Google Scholar] [CrossRef]
  48. Saleem, T.J.; Zahra, S.R.; Wu, F.; Alwakeel, A.; Alwakeel, M.; Jeribi, F.; Hijji, M. Deep Learning-Based Diagnosis of Alzheimer’s Disease. J. Pers. Med. 2022, 12, 815. [Google Scholar] [CrossRef]
  49. Ebrahimi-Ghahnavieh, A.; Luo, S.; Chiong, R. Transfer Learning for Alzheimer’s Disease Detection on MRI Images. In Proceedings of the 2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology(IAICT), Bali, Indonesia, 1–3 July 2019; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar] [CrossRef]
Figure 1. Methodology workflow diagram.
Figure 2. Skull stripping process in SPM: (a) original image; (b) processed image.
Figure 3. State-of-the-art comparison with the present study (best accuracy). For reference: (Shukla et al. 2023 [11]), (Hussain et al. 2020 [22]), (Pirrone et al. 2022 [26]), (Lama et al. 2022 [14]), (Rallabandi and Seetharaman 2023 [23]), (Rodrigues et al. 2021 [27]), and (Goenka et al. 2022 [17]).
Table 1. Comprehensive comparison of studies on ADNI and on other databases.

Study | Database | # of Subjects | Exam Modality | Features Extracted | Best Classifier | Feature Selection | Validation Approach | Accuracy

Within ADNI Database:
[9] | ADNI | 1200 | sMRI | Spatial Information | Lightweight Neural Network | Not Applied | Hold-out | CN vs. AD—95.0%; CN vs. MCI—85.6%; AD vs. MCI—87.5%
[10] | ADNI | 1296 | sMRI | Hierarchical Patterns, Textures, and Other Structural Features | 2D CNN (ResNet-50v2) | Not Applied | Cross-validation | CN vs. AD—91.2%; All vs. All—90.3%
[11] | ADNI | 600 | sMRI | Statistical Features | Ensemble LR_SVM | K-Best | Cross-validation | CN vs. AD—99.0%; CN vs. MCI—96.0%; AD vs. MCI—85.0%
[12] | ADNI | 89 | sMRI | Statistical and Textural Features | BagT, FGSVM, QSVM, SubKNN | F-score | Cross-validation | CN vs. AD—93.3%; CN vs. MCI—88.2%; AD vs. MCI—87.7%; All vs. All—75.3%
[13] | ADNI | 96 | fMRI | Graph Theoretical Measures | Linear SVM | MRMR | Cross-validation | CN vs. AD—96.8%
[14] | ADNI | 95 | fMRI | Correlation Matrix Between Different ROIs | SL-RELM | Adaptive Structure Learning | Cross-validation | CN vs. AD—95.4%; CN vs. MCI—96.5%; AD vs. MCI—98.4%
[15] | ADNI | 2116 | sMRI | Multi-Scale and Hierarchical Transformed Features | CNN (PKG-Net) | Not Applied | Cross-validation | CN vs. AD—94.3%; CN vs. MCI—92.9%; AD vs. MCI—92.1%
[16] | ADNI | 178 | sMRI | Cortical Volumetric Features | Fully Connected Neural Network | PCA | Cross-validation | CN vs. AD—87.5%; CN vs. MCI—79.2%; AD vs. MCI—83.3%
[17] | ADNI | 381 | Amyloid PET | 3D Slice-Based Features | 3D CNN | Not Applied | Hold-out | CN vs. AD—92.9%; CN vs. MCI—78.0%; AD vs. MCI—87.2%; All vs. All—90.5%
[18] | ADNI | 798 | sMRI | Spatial Features | Ensemble based on 2D CNN | Not Applied | Hold-out | CN vs. AD—90.4%; CN vs. MCI—72.4%; AD vs. MCI—77.2%
[19] | ADNI | 179 | sMRI | Cortical Surface Labels | DNN | Not Applied | Cross-validation | CN vs. AD—85.2%; CN vs. MCI—76.9%; AD vs. MCI—72.7%
[20] | ADNI | 801 | sMRI | Voxel-Level GM Volume Density | CNN Ensemble Model | Not Applied | Cross-validation | CN vs. AD—92.8%; CN vs. MCI—89.2%; AD vs. MCI—81.5%; All vs. All—71%
Present Study | ADNI | 504 | sMRI | Statistical and Textural Features | ExTreeC, LinSVC, OvsR | F-Score | Hold-out | CN vs. AD—85.2%; CN vs. MCI—95.1%; AD vs. MCI—98.5%; All vs. All—87.1%

Different Imaging Databases:
[21] | AIBL | 1455 | sMRI | Voxel-Based Features | SVM | Not Applied | Cross-validation | CN vs. AD—88.0%
[22] | OASIS | 416 | sMRI | Model Components | 12-layer CNN | Not Applied | Hold-out | CN vs. AD—97.8%
[23] | OASIS | 1098 | MRI-PET Fusion | MRI-PET Feature Maps | Inception–ResNet CNN Model | CNN-Based Ranking | Hold-out | CN vs. AD—95.9%; CN vs. MCI—95.5%; AD vs. MCI—94.1%
[24] | OASIS | 77 | MRI | CNN-Based Features | KNN | ROC-Based Ranking | Cross-validation | CN vs. AD—95.0%
[25] | FHS | 102 | sMRI | Disease Probability Maps | CNN | Not Applied | Hold-out | CN vs. AD—76.6%
[25] | NACC | 582 | sMRI | Disease Probability Maps | CNN | Not Applied | Hold-out | CN vs. AD—81.8%

Non-Imaging Modalities:
[26] | IRCCS | 105 | EEG | Frequency Domain Features | KNN | Information Gain Filter | Cross-validation | CN vs. AD—97.0%; CN vs. MCI—95.0%; AD vs. MCI—83.0%; All vs. All—75.0%
[27] | University Hospital Center of São João | 38 | EEG | Cepstral and Lacstral Distances | ANN | KW test | Cross-validation | CN vs. ADM—96.0%; CN vs. MCI—98.1%; ADM-ADA vs. MCI—93.9%; All vs. All—95.6%
[28] | University Hospital Center of São João | 38 | EEG | Relative Power, Spectral Ratios, Maxima, Minima, and Zero Crossing Distances | ANN | KW test | Cross-validation | CN vs. AD—95.0%; CN vs. MCI—77.0%; AD vs. MCI—83.0%; All vs. All—90.0%
[29] | ADNI | 623 | Biomarkers | SNPs, Gene and Clinical Data | SVM | Mutual Information | Cross-validation | CN vs. MCI/AD—95.0%
[30] | DementiaBank | 269 | Speech Signals | Statistical and Non-Linear Parameters | LogReg | Not Applied | Cross-validation | CN vs. AD—85.2%
Table 2. Database demographics overview.

| Group | # of Subjects | Age Average ± SD | Gender: F | Gender: M | MMSE Average ± SD |
|---|---|---|---|---|---|
| CN | 167 | 77.9 ± 5.33 | 82 | 85 | 29.1 ± 1.16 |
| MCI | 235 | 77.0 ± 7.04 | 77 | 158 | 25.0 ± 4.22 |
| AD | 102 | 77.0 ± 7.33 | 50 | 52 | 18.9 ± 6.12 |
Table 3. Feature description.

| Feature | Formula | Description |
|---|---|---|
| Autocorrelation | $\sum_{i}\sum_{j}(i \cdot j)\,p(i,j)$ | Measures the magnitude of the fineness and coarseness of the texture |
| Contrast | $\sum_{n=0}^{N_g-1} n^2 \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j),\ \lvert i-j\rvert = n$ | Measures the local variations between pixels |
| Correlation 1 | $\sum_{i}\sum_{j}\frac{(i-\mu_x)(j-\mu_y)\,p(i,j)}{\sigma_x \sigma_y}$ | Estimates the combined probability of occurrence of the indicated pixel pairs |
| Correlation 2 | $\frac{\sum_{i}\sum_{j}(i \cdot j)\,p(i,j)-\mu_x \mu_y}{\sigma_x \sigma_y}$ | Estimates the combined probability of occurrence of the indicated pixel pairs |
| Cluster Prominence | $\sum_{i}\sum_{j}(i+j-\mu_x-\mu_y)^4\,p(i,j)$ | Measures the skewness and asymmetry of the GLCM |
| Cluster Shade | $\sum_{i}\sum_{j}(i+j-\mu_x-\mu_y)^3\,p(i,j)$ | Measures the skewness and uniformity of the GLCM |
| Dissimilarity | $\sum_{i}\sum_{j}\lvert i-j\rvert \cdot p(i,j)$ | Measures the local intensity variation, defined as the mean absolute difference between neighboring pairs |
| Energy | $\sum_{i}\sum_{j}p(i,j)^2$ | Specifies the sum of squared elements in the GLCM |
| Entropy | $-\sum_{i}\sum_{j}p(i,j)\log(p(i,j))$ | Assesses the randomness of an intensity image |
| Homogeneity 1 | $\sum_{i}\sum_{j}\frac{p(i,j)}{1+\lvert i-j\rvert}$ | Measures the nearness of the distribution of elements in the GLCM to the GLCM diagonal |
| Homogeneity 2 | $\sum_{i}\sum_{j}\frac{p(i,j)}{1+(i-j)^2}$ | Measures the nearness of the distribution of elements in the GLCM to the GLCM diagonal |
| Maximum Probability | $\max_{i,j} p(i,j)$ | Occurrence of the most predominant pair of neighboring intensity values |
| Variance | $\sum_{i}\sum_{j}(i-\mu)^2\,p(i,j)$ | Measures the distribution of neighboring intensity level pairs about the mean intensity level in the GLCM |
| Sum Average | $\sum_{i=2}^{2N_g} i\,p_{x+y}(i)$ | Measures the relationship between the occurrences of pairs with lower intensity values and the occurrences of pairs with higher intensity values |
| Sum Variance | $\sum_{i=2}^{2N_g}(i-f_{16})^2\,p_{x+y}(i)$ | Measures groupings of voxels with similar gray-level values ($f_{16}$ denotes the Sum Entropy feature) |
| Sum Entropy | $-\sum_{i=2}^{2N_g} p_{x+y}(i)\log(p_{x+y}(i))$ | Measures neighborhood intensity value differences |
| Difference Variance | variance of $p_{x-y}$ | Measures the heterogeneity, placing higher weights on differing intensity level pairs that deviate more from the mean |
| Difference Entropy | $-\sum_{i=0}^{N_g-1} p_{x-y}(i)\log(p_{x-y}(i))$ | Measures the randomness/variability in the neighborhood intensity value differences |
| Information Measure of Correlation 1 | $\frac{HXY-HXY1}{\max\{HX,HY\}}$ | Assesses the correlation between the probability distributions of $i$ and $j$ |
| Information Measure of Correlation 2 | $\left(1-\exp[-2.0\,(HXY2-HXY)]\right)^{1/2}$ | Assesses the correlation between the probability distributions of $i$ and $j$ |
| Inverse Difference Normalized | $\sum_{k=0}^{N_g-1}\frac{p_{x-y}(k)}{1+k/N_g}$ | Measures the local homogeneity of an image |
| Inverse Difference Moment Normalized | $\sum_{k=0}^{N_g-1}\frac{p_{x-y}(k)}{1+k^2/N_g^2}$ | Measures the local homogeneity of an image |
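All Table 3 descriptors are derived from the normalized co-occurrence matrix $p(i,j)$. As a minimal sketch of how such features can be computed in practice (an illustration with scikit-image, not the authors' exact extraction pipeline; the random stand-in slice and the distance/angle choices are assumptions):

```python
# Minimal sketch: GLCM texture features from one 2D slice.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
slice_2d = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)  # stand-in for an sMRI slice

# GLCM at distance 1 over 4 directions, normalized so entries behave as p(i, j)
glcm = graycomatrix(slice_2d, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

# Several Table 3 features are available directly from scikit-image
for prop in ("contrast", "dissimilarity", "homogeneity", "energy", "correlation"):
    print(prop, graycoprops(glcm, prop).mean())  # averaged over the 4 angles

# Features without a built-in helper follow from p(i, j) itself,
# e.g., GLCM entropy for the first distance/angle:
p = glcm[:, :, 0, 0]
entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
print("entropy", entropy)
```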
Table 4. Scikit-learn ML classifier configurations.

| Classifier | Hyperparameters |
|---|---|
| GaussianProcessClassifier (GauPro) | 1.0 × RBF(1.0) |
| LinearSVC (LinSVC) | Default parameters |
| SGDClassifier (SGD) | max_iter: 100, tol: 0.001 |
| KNeighborsClassifier (KNN) | Default parameters |
| LogisticRegression (LogReg) | solver: lbfgs |
| LogisticRegressionCV (LogRegCV) | cv: 3 |
| BaggingClassifier (BaggC) | Default parameters |
| ExtraTreesClassifier (ExTreeC) | n_estimators: 300 |
| RandomForestClassifier (RF) | max_depth: 5, n_estimators: 300, max_features: 1 |
| GaussianNB (GauNB) | Default parameters |
| DecisionTreeClassifier (DeTreeC) | max_depth: 5 |
| MLPClassifier (MLP) | alpha: 1, max_iter: 1000 |
| AdaBoostClassifier (AdaBoost) | Default parameters |
| QuadraticDiscriminantAnalysis (QuaDis) | Default parameters |
| OneVsRestClassifier (OvsR) | random_state: 0 |
| LGBMClassifier (LGBM) | Default parameters |
| GradientBoostingClassifier (GradBoost) | Default parameters |
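For readers reproducing the Table 4 setups, the following hedged sketch shows how a few of the listed configurations map onto scikit-learn objects. The synthetic data stand in for the GLCM feature matrix, and the assumption that OvsR's random_state seeds the wrapped base estimator is ours, since OneVsRestClassifier itself takes no such argument:

```python
# Hedged sketch: instantiating a few Table 4 configurations in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "GauPro": GaussianProcessClassifier(kernel=1.0 * RBF(1.0)),
    "LinSVC": LinearSVC(),  # default parameters, as in Table 4
    "ExTreeC": ExtraTreesClassifier(n_estimators=300),
    "RF": RandomForestClassifier(max_depth=5, n_estimators=300, max_features=1),
    # random_state: 0 presumably seeds the wrapped estimator
    "OvsR": OneVsRestClassifier(LinearSVC(random_state=0)),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: hold-out accuracy = {model.score(X_te, y_te):.3f}")
```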
Table 5. Classification metrics for the pair CN vs. AD.

| CN vs. AD | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|
| 3 Planes | LinSVC/OvsR | 85.2% | 77.3% | 85.0% | 85.3% | 81.0% | 83.9% | 85.1% |
| Axial Plane | DeTreeC | 66.7% | 10.5% | 66.7% | 66.7% | 18.2% | 53.8% | 66.7% |
| Coronal Plane | LinSVC | 70.4% | 52.4% | 64.7% | 73.0% | 57.9% | 67.1% | 68.7% |
| Sagittal Plane | LogRegCV | 79.6% | 57.9% | 78.6% | 80.0% | 66.7% | 74.7% | 79.3% |
Table 6. Classification metrics for the pair AD vs. MCI.

| AD vs. MCI | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|
| 3 Planes | LogReg | 98.5% | 94.1% | 100.0% | 98.1% | 97.0% | 97.1% | 99.0% |
| Axial Plane | BaggC | 64.7% | 40.0% | 40.0% | 75.0% | 40.0% | 57.5% | 54.8% |
| Coronal Plane | ExTreeC | 94.1% | 73.3% | 100.0% | 93.0% | 84.6% | 86.7% | 96.4% |
| Sagittal Plane | LinSVC/OvsR | 95.6% | 95.2% | 90.9% | 97.8% | 93.0% | 95.5% | 94.3% |
Table 7. Classification metrics for the pair CN vs. MCI.

| CN vs. MCI | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|
| 3 Planes | LinSVC/OvsR | 80.2% | 74.4% | 82.9% | 78.3% | 78.4% | 80.0% | 80.5% |
| Axial Plane | BaggC | 56.8% | 40.0% | 50.0% | 60.4% | 44.4% | 54.8% | 54.9% |
| Coronal Plane | ExTreeC | 88.9% | 83.8% | 91.2% | 87.2% | 87.3% | 88.5% | 89.2% |
| Sagittal Plane | ExTreeC | 82.7% | 80.6% | 75.8% | 87.5% | 78.1% | 82.3% | 81.4% |
Table 8. Classification metrics for the pair all vs. all.

| All vs. All | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|
| 3 Planes | LinSVC/OvsR | 82.2% | 82.2% | 83.0% | 89.9% | 81.9% | 89.8% | 86.2% |
| Axial Plane | BaggC | 49.5% | 49.5% | 48.4% | 74.5% | 48.1% | 62.2% | 60.0% |
| Coronal Plane | ExTreeC | 65.3% | 65.3% | 62.5% | 84.2% | 63.6% | 86.6% | 71.8% |
| Sagittal Plane | ExTreeC | 68.3% | 68.3% | 68.1% | 85.6% | 66.3% | 86.4% | 76.2% |
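Tables 5–12 report seven metrics per configuration. As a hedged sketch of how they can be derived from a binary confusion matrix (the toy labels and scores below are invented for illustration, and the G-mean is shown in its common sensitivity–specificity form, since exact definitions vary across papers):

```python
# Hedged sketch: the reported metrics from a binary confusion matrix.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])                      # illustrative labels
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])                      # illustrative predictions
y_score = np.array([0.2, 0.1, 0.7, 0.3, 0.9, 0.8, 0.4, 0.95, 0.6, 0.2])  # scores for AUC

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)            # not provided directly by scikit-learn
recall = recall_score(y_true, y_pred)   # a.k.a. sensitivity
gmean = np.sqrt(recall * specificity)   # one common G-mean definition

print("accuracy   ", accuracy_score(y_true, y_pred))
print("recall     ", recall)
print("precision  ", precision_score(y_true, y_pred))
print("specificity", specificity)
print("F1-score   ", f1_score(y_true, y_pred))
print("AUC        ", roc_auc_score(y_true, y_score))
print("G-mean     ", gmean)
```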
Table 9. Classification metrics for the pair CN vs. AD with feature selection (FS).

| CN vs. AD | # of Features | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|---|
| 3 Planes | 943 | LogRegCV | 77.8% | 70.8% | 77.3% | 78.1% | 73.9% | 77.1% | 77.7% |
| Axial Plane | 204 | OvsR | 72.2% | 42.9% | 75.0% | 71.4% | 54.5% | 66.9% | 73.2% |
| Coronal Plane | 533 | LinSVC/OvsR | 79.6% | 65.2% | 83.3% | 77.8% | 73.2% | 77.8% | 80.5% |
| Sagittal Plane | 491 | ExTreeC | 85.2% | 71.4% | 88.2% | 83.8% | 78.9% | 82.7% | 86.0% |
Table 10. Classification metrics for the pair AD vs. MCI with feature selection (FS).

| AD vs. MCI | # of Features | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|---|
| 3 Planes | 1379 | LinSVC/OvsR | 98.5% | 94.1% | 100.0% | 98.1% | 97.0% | 97.1% | 99.0% |
| Axial Plane | 8 | BaggC | 80.9% | 50.0% | 69.2% | 83.6% | 58.1% | 71.0% | 76.1% |
| Coronal Plane | 645 | ExTreeC | 91.2% | 80.0% | 95.2% | 89.4% | 87.0% | 88.8% | 92.3% |
| Sagittal Plane | 199 | ExTreeC | 95.6% | 95.8% | 92.0% | 97.7% | 93.9% | 95.6% | 94.8% |
Table 11. Classification metrics for the pair CN vs. MCI with feature selection (FS).

| CN vs. MCI | # of Features | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|---|
| 3 Planes | 1129 | ExTreeC | 95.1% | 93.1% | 93.1% | 96.2% | 93.1% | 94.6% | 94.6% |
| Axial Plane | 5 | ExTreeC | 67.9% | 58.8% | 62.5% | 71.4% | 60.6% | 66.6% | 66.8% |
| Coronal Plane | 606 | ExTreeC | 86.4% | 75.0% | 93.1% | 82.7% | 83.1% | 85.3% | 87.7% |
| Sagittal Plane | 509 | ExTreeC | 88.9% | 75.9% | 91.7% | 87.7% | 83.0% | 86.0% | 89.7% |
Table 12. Classification metrics for the pair all vs. all with feature selection (FS).

| All vs. All | # of Features | Classifier | Accuracy | Recall | Precision | Specificity | F1-Score | AUC | Gmean |
|---|---|---|---|---|---|---|---|---|---|
| 3 Planes | 1366 | LinSVC | 87.1% | 87.1% | 87.7% | 92.7% | 87.3% | 94.7% | 90.0% |
| Axial Plane | 694 | BaggC | 55.4% | 55.4% | 57.0% | 75.2% | 55.2% | 73.8% | 64.7% |
| Coronal Plane | 495 | ExTreeC | 75.2% | 75.2% | 73.6% | 89.5% | 73.4% | 87.2% | 81.1% |
| Sagittal Plane | 608 | ExTreeC | 71.3% | 71.3% | 68.6% | 89.5% | 67.7% | 89.3% | 78.1% |
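The "# of Features" columns in Tables 9–12 reflect F-score-based feature selection [43]. A minimal sketch of one standard way to implement such a filter, using scikit-learn's ANOVA F-value ranking (the synthetic matrix and the choice of k = 500 are illustrative assumptions, not the paper's exact setup):

```python
# Hedged sketch: F-score (ANOVA F-value) feature selection.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Stand-in for the GLCM feature matrix: 300 subjects x 2000 features
X, y = make_classification(n_samples=300, n_features=2000, n_informative=50,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=500)  # keep the 500 top-ranked features
X_sel = selector.fit_transform(X, y)

print(X_sel.shape)           # (300, 500)
print(selector.scores_[:5])  # per-feature F-scores used for the ranking
```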
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
