DTI-Based Structural Connectome Analysis of SCLC Patients After Chemotherapy via Machine Learning

Stavros Theofanis Miloulis; Ioannis Kakkos; Ioannis Zorzos; Ioannis A. Vezakis; Eleftherios Kontopodis; Ourania Petropoulou; Errikos M. Ventouras; Yu Sun; George K. Matsopoulos

doi:10.3390/app152312458

¹ Biomedical Engineering Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, 15790 Athens, Greece

² Department of Biomedical Engineering, University of West Attica, 12243 Athens, Greece

³ Key Laboratory for Biomedical Engineering of MOE of China, Department of Biomedical Engineering, Zhejiang University, Hangzhou 310027, China

⁴ College of Mathematical Medicine, Zhejiang Normal University, Jinhua 321004, China

Appl. Sci. 2025, 15(23), 12458; https://doi.org/10.3390/app152312458

This article belongs to the Special Issue Advanced Technologies in Medical/Health Informatics

Abstract

Small-cell lung cancer (SCLC) is an aggressive malignancy with a high propensity for brain metastases. Moreover, chemotherapy and metastasis-preventive approaches are linked to neurotoxicity, further aggravating cognitive impairment. Despite evidence supporting structural and functional brain alterations in SCLC, the application of machine learning (ML) to new connectivity biomarkers has remained unexplored. This study is, to the best of our knowledge, the first to apply ML to structural brain connectomics in SCLC, using diffusion tensor imaging (DTI) to identify features discriminating between post-chemotherapy SCLC patients and healthy controls. Specifically, we constructed structural networks via deterministic tractography and applied an adapted feature reduction technique to identify the most informative connections without selection bias. This process isolated 16 connections involving 26 brain regions, predominantly in the frontal, temporal, and parietal lobes, showcasing primarily intra-hemispheric and left-lateralized alterations. Our optimal model leveraged a Gaussian Support Vector Machine (SVM), achieving a weighted accuracy of 0.92, a sensitivity of 0.93, a specificity of 0.91, and an area under the curve of 0.94. The selected feature subset retained high performance when tested with other classifiers, confirming its robustness. Our findings differ from prior studies based on statistically derived features, highlighting the potential of ML-driven connectomics to uncover DTI-derived SCLC patterns and offer interpretable insights for neuroimaging-based diagnostics.

Keywords:

small-cell lung cancer; DTI; tractography; structural connectivity; machine-learning

1. Introduction

Small-cell lung cancer (SCLC) is the most aggressive lung cancer subtype [], representing approximately 15% of global lung cancer cases, and is associated with a poor prognosis and a 5-year survival rate below 7%. The severity of SCLC is further exemplified by its high propensity for metastasis exhibited early on during the disease, with around 70% of new diagnoses at stage IV []. The brain constitutes a particularly frequent site for SCLC metastasis, with around 10–20% of patients presenting brain metastasis at diagnosis and more than 50% developing it over the disease course [].

In this light, clinicians are striving to adjust treatment strategies to also mitigate cognitive effects, with key challenges encountered both in first-line treatment and in additional protective methods. For standard treatment, the current approach (systemic therapy or platinum-based combination chemotherapy []) is limited in preventing brain metastases due to the blood–brain barrier hindering CNS penetration [,]. Regarding protective measures, the typically applied technique for both limited-stage and extended-stage SCLC [] (prophylactic cranial irradiation—PCI []) does not fully mitigate the risk of brain metastases, while cognitive impact may be further aggravated due to the innate neurotoxicity effects of PCI itself [,]. In this context, there is a pressing need in clinical practice for further evidence on the cognitive impact of both SCLC and its standard treatments, as well as on strategies aimed at protecting brain function.

In response, several neuroimaging studies—often combined with neuropsychological batteries []—have attempted to characterize the cognitive impact of SCLC and its treatment. Research has mainly focused on magnetic resonance imaging (MRI) variations such as T1-weighted MRI, diffusion tensor imaging (DTI), and functional MRI (fMRI), assessing gray matter integrity and structural/functional connectivity in both SCLC and NSCLC populations at various stages. As such, voxel-based morphometry (VBM) has been applied on structural T1-weighted MRI in conjunction with serological markers of SCLC patients (a) at diagnosis, (b) post-chemotherapy and pre-PCI, and (c) in follow-ups post-PCI [] in pursuit of early biomarkers for cognitive impairment. In their study, Simó et al. (2023) [] identified alterations (even prior to chemotherapy) in gray matter integrity measurements, indicating disruption in anterior, mid-cingulate, superior, inferior temporal, and fusiform gyri, as well as in the precuneus. Similarly, Vaquero et al. (2021) [] assessed the effect of physical exercise on a population of SCLC and NSCLC patients as a countermeasure against cognitive effects after both chemotherapy and PCI, demonstrating a decreased loss of gray matter volume (GMV) for the SCLC patients who followed the intervention.

The above modalities have also been used in conjunction with graph theory, which has been ideal in describing neuronal communication on an anatomical and functional processing level []. Namely, the cohort analyzed by Bromis et al. (2017) [] via resting-state fMRI (rs-fMRI) included SCLC patients after chemotherapy and prior to PCI treatment (together with healthy controls), to investigate functional connectivity disturbances in the default mode network (DMN), the sensorimotor network (SMN), and the task-positive network (TPN). For all these resting state networks, the results revealed lower connectivity for the patient group in multiple regions. Another study by Simó et al. (2018) [] applied rs-fMRI to NSCLC patients pre-chemotherapy, SCLC patients post-chemotherapy, and healthy controls, attempting to classify groups based on functional connectivity profiles involving 15 resting state networks. Interestingly, while the DMN showed decreased connectivity for both patient groups, increased connectivity was observed in the left/right anterior temporal (LAT/RAT) and cerebellum networks, possibly indicating a compensatory response. A population of similar characteristics (NSCLC prior to chemotherapy, SCLC after chemotherapy and prior to PCI, and healthy controls) was evaluated by the same team from a brain structure perspective [], utilizing T1-weighted MRI and DTI to identify cognitive impairment in cancer patients before and after chemotherapy alike. Finally, another multimodal study has utilized T1-weighted MRI, DTI, and rs-fMRI on pre-PCI SCLC patients [], revealing statistically significant differences across all modalities in accordance with the other studies. Overall, despite a general consensus acknowledging a cognitive impact of SCLC and chemotherapy, the number of studies remains limited [,,,,,].

Despite these important findings, most approaches have relied on statistical group comparisons, which may overlook subtle or multivariate connectivity patterns. As a result, the literature provides limited evidence and lacks data-driven frameworks capable of uncovering hidden substrates of cognitive impairment in SCLC [,,]. This is in contrast to state-of-the-art brain connectivity analyses employing artificial intelligence in other neuroimaging applications, enabling efficient modeling and classification under multiple conditions such as mild cognitive impairment [], Alzheimer’s [], Parkinson’s [], schizophrenia [], epilepsy [], and aphasia []. From this standpoint, DTI-derived connectivity features have been used as inputs to machine learning algorithms, allowing for the differentiation between healthy and patient populations [,] and yielding a high performance (measured via classification accuracy and AUC). Particularly, DTI features have appeared to outperform clinical data in some cases [].

Overall, despite the success of the combined application of connectivity analysis and machine learning in other conditions, there exists a glaring gap in SCLC surveys, with only one study having employed multivariate pattern analysis (MVPA) on functional connectivity data from both SCLC and NSCLC patients [] and no prior studies having applied ML with structural connectivity in SCLC. Motivated by this gap and inspired by similar ML-based approaches in other disorders, the present study applies a data-driven machine learning approach utilizing DTI-derived structural connectivity information to identify and interpret prominent features distinguishing between SCLC patients and healthy controls, reflecting cognitive impact linked to neurotoxicity. Accordingly, our objectives are to (i) develop an interpretable DTI-based ML framework to discriminate SCLC patients from healthy controls; (ii) derive a stable, low-dimensional subset of discriminative connections; and (iii) interpret the selected features in terms of neuroscience, highlighting their anatomical organization and implications.

2. Materials and Methods

2.1. Subjects

This study included 25 SCLC patients (after chemotherapy and prior to PCI) and 14 healthy controls (HC), who underwent MRI anisotropic diffusion (DTI) recording in the Radiology Research Unit of the Evgenidion Hospital between July 2012 and June 2014. Although the small cohort size reflects the clinical rarity of SCLC cases undergoing DTI, its adequacy is supported by a post hoc ROC-based analytic power estimation (Appendix C), which indicated sufficient power for detecting group-level effects. All participants were right-handed with no history of neurological or psychiatric disorders. The study was approved by the Evgenidion Hospital’s relevant ethics committee (Protocol Code: 145/25-07-2012), complying with the MRI safety criteria, while written informed consent was obtained from all participants. Exclusion criteria included the following: (1) motion artifacts in the DTI sequence (assessed during the preprocessing step), (2) presence of brain metastases, (3) psychotropic medication treatment, and (4) alcohol or drug abuse. Three patients did not meet the aforementioned criteria and thus were excluded from the subsequent analysis. The demographic data of the included participants are presented in Table A1 of Appendix A along with statistical analysis, which demonstrated no statistically significant differences with respect to the age, gender, or education level, with the single difference being associated with the smoking ratio.

2.2. Data Acquisition and Preprocessing

MRI/DTI data were collected at the Radiology Research Unit of the Medical Imaging Department at Evgenidion Hospital, Athens, Greece. Scans were performed using a 3.0 Tesla Philips Achieva MRI scanner (Philips Medical Systems, Best, The Netherlands) equipped with an 8-channel SENSE head coil. High-resolution structural T1-weighted images were acquired using a sagittal gradient echo (GE) sequence with the following parameters: repetition time (TR) = 9.8 ms, echo time (TE) = 3.7 ms, flip angle = 7°, field of view (FOV) = 256 × 256 mm², acquisition matrix = 256 × 240, and reconstructed voxel resolution of 1 × 1 × 1 mm³. Diffusion-weighted images were obtained using a single-shot spin echo-planar imaging (EPI) sequence with 60 gradient directions (b-value = 1000 s/mm²), along with a reference image without diffusion weighting (b₀ = 0 s/mm²). Imaging parameters included TR = 4156 ms, TE = 69 ms, flip angle = 90°, FOV = 256 × 256 mm², acquisition matrix = 128 × 126, and a reconstructed isotropic voxel size of 2 × 2 × 2 mm³. Corrections for eddy current-induced distortions and subject motion were applied via affine registration of each diffusion-weighted volume to the corresponding b₀ image []. A diffusion tensor model was then fitted to the data at each voxel, allowing for the computation of fractional anisotropy (FA) maps on a per-subject basis. DTI preprocessing was conducted using the FMRIB Software Library (FSL, version 6.0.7; www.fmrib.ox.ac.uk/fsl, accessed on 17 April 2025).

2.3. Structural Connectivity Estimation

To quantify structural brain connectivity, a subject-specific network was generated for each participant. As such, the MRI/DTI data were parcellated utilizing the Automated Anatomical Labeling (AAL) atlas [,,], dividing the gray matter into 90 cortical and subcortical regions of interest (ROIs) (45 per hemisphere). T1-weighted anatomical images were co-registered to diffusion-derived fractional anisotropy (FA) images via affine transformation, followed by non-linear registration to the MNI ICBM152 template []. The AAL atlas was then transformed into native space through inverse warping. Structural connectivity edges were estimated using whole-brain deterministic tractography (WBDT), applying the Fiber Assignment by Continuous Tracking (FACT) algorithm [], with a minimum FA threshold of 0.2 and a maximum turning angle of 45°. For each pair of ROIs, a connection was retained if at least three fiber tracts (Ν ≥ 3) were present, in line with previous structural connectome studies to reduce false-positive connections and improve anatomical plausibility [,]. Edge weights were defined using the corresponding FA values to reflect white matter integrity. All connectivity estimation procedures were carried out using the PANDA (Pipeline for Analyzing Brain Diffusion Images) toolbox [] in MATLAB R2024b (MathWorks, Natick, MA, USA). The full pipeline of the implemented framework is presented in Figure 1.
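The edge-retention rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (a per-pair fiber count and a mean FA value), the function name `build_fa_edges`, and all numeric values are hypothetical and are not taken from the PANDA toolbox, which performed the actual computation in the study.

```python
# Hypothetical per-pair tractography summaries for one subject:
# (roi_i, roi_j) -> (number of reconstructed fibers, mean FA along them).
# The values are made up for illustration.
MIN_FIBERS = 3  # retention threshold stated in the text (N >= 3)


def build_fa_edges(pair_stats, min_fibers=MIN_FIBERS):
    """Return {(roi_i, roi_j): FA weight} for pairs passing the fiber threshold."""
    edges = {}
    for (roi_i, roi_j), (n_fibers, mean_fa) in pair_stats.items():
        if n_fibers >= min_fibers:
            # store each undirected edge once, with sorted endpoints
            key = tuple(sorted((roi_i, roi_j)))
            edges[key] = mean_fa
    return edges


pair_stats = {
    ("PCUN.L", "SPG.L"): (12, 0.41),
    ("HIP.R", "PHG.R"): (3, 0.33),
    ("ACG.L", "ACG.R"): (2, 0.38),  # below the threshold, so it is dropped
}
edges = build_fa_edges(pair_stats)
```

Pairs that fail the threshold simply have no entry, which corresponds to a zero in the subject's connectivity matrix.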

Figure 1. Schematic representation of the methodological pipeline. Upon data acquisition, structural networks were extracted for each participant, linking 90 (45 R + 45 L) regions by quantifying whole-brain deterministic tractography data. As such, the initial feature space includes the connections selected after applying the Ν ≥ 3 threshold for fiber tracts. This was further reduced by maintaining only features with non-zero values in at least 75% of the subjects, and by applying the RFE-CBR method described in Section 2.4. Subsequently, classification results were elicited for the optimal feature subset, using leave-one-out cross-validation supported by permutation testing.

2.4. Network-Based Feature Selection

To explore the feasibility of efficiently distinguishing between the brain structures of the SCLC and HC groups and to investigate their connectivity attributes, classification was applied utilizing the network edges as discriminative variables. In this regard, the FA structural connectivity values of each individual were considered features, leading to 90 × (90 − 1)/2 = 4005 unique values. Furthermore, with a focus on providing a robust and effective machine learning design and due to the high sparsity of the FA matrix, only features with non-zero values in at least 75% of the subjects were retained, leading to a total of 185 features.
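As a minimal sketch of the two counting steps above, the snippet below enumerates the unordered ROI pairs and applies a 75% presence filter. The data layout (one dictionary of edges per subject), the helper name `keep_dense_features`, and the toy cohort are assumptions made for illustration; they are not the authors' implementation.

```python
from itertools import combinations

N_ROIS = 90
MIN_PRESENCE = 0.75  # minimum fraction of subjects with a non-zero value

# every unordered ROI pair is one candidate feature
all_pairs = list(combinations(range(N_ROIS), 2))
n_candidate_features = len(all_pairs)  # 90 * 89 / 2


def keep_dense_features(subject_edges, pairs, min_presence=MIN_PRESENCE):
    """subject_edges: list of dicts {(i, j): FA}; missing keys count as zero."""
    n_subjects = len(subject_edges)
    kept = []
    for pair in pairs:
        present = sum(1 for edges in subject_edges if edges.get(pair, 0.0) != 0.0)
        if present / n_subjects >= min_presence:
            kept.append(pair)
    return kept


# toy cohort of 4 subjects (values are illustrative only)
toy = [
    {(0, 1): 0.40, (0, 2): 0.35},
    {(0, 1): 0.42},
    {(0, 1): 0.39, (0, 2): 0.31},
    {(0, 1): 0.41, (0, 2): 0.30},
]
kept = keep_dense_features(toy, all_pairs)
```

In the toy cohort, edge (0, 2) is present in exactly 3 of 4 subjects and therefore just meets the "at least 75%" criterion.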

Nevertheless, the ratio of retained features to available samples remains relatively high, which may increase the risk of overfitting during the classification process. In consequence, a feature selection procedure was employed to remove redundant features and establish an optimal subset for improving generalization performance and robustness. To that end, a previously validated recursive feature elimination algorithm with correlation bias reduction (RFE-CBR) was employed [], applying an SVM (support vector machine) classifier to the full feature set to assess each feature’s prominence.

In each iteration, an SVM model is generated, and a ranking criterion is calculated for each feature. Since SVMs utilize the input values to estimate the maximum-margin decision boundary for the separation of the two classes, the ranking criterion is based on the weight vector of the hyperplane. Subsequently, the feature with the smallest ranking criterion is removed from the feature space, and the remaining features constitute the feature set for the next iteration. This backward elimination procedure is repeated until all features have been removed in succession. In addition, the RFE-CBR feature selection method evaluates the correlation between the features, accounting for their potential influence on classification modeling and, therefore, leading to more robust results []. After the algorithm concludes, all features are sorted in reverse order of removal, thus comprising the ranked feature set.
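The backward elimination loop can be illustrated as follows. This sketch covers only the generic "score, drop the weakest, repeat" structure. The scoring function is a stand-in with fixed weights rather than an SVM refit at each step, the squared-weight criterion follows standard SVM-RFE and is assumed here, and the correlation bias reduction step of RFE-CBR is omitted entirely. All names are hypothetical.

```python
def backward_elimination(feature_names, score_features):
    """Rank features by repeatedly dropping the lowest-scoring one.

    score_features(remaining) must return {name: criterion}. In the text this
    criterion is derived from the weights of an SVM fitted on the remaining
    features. Returns names ordered from most to least important, i.e. in
    reverse order of removal.
    """
    remaining = list(feature_names)
    removed = []
    while remaining:
        scores = score_features(remaining)
        worst = min(remaining, key=lambda name: scores[name])
        remaining.remove(worst)
        removed.append(worst)
    return removed[::-1]


# Stand-in criterion (NOT an SVM): fixed weights, squared, for illustration.
toy_weights = {"edge_a": 0.9, "edge_b": -0.1, "edge_c": 0.4}


def toy_scores(remaining):
    return {name: toy_weights[name] ** 2 for name in remaining}


ranking = backward_elimination(toy_weights, toy_scores)
```

With a real classifier, `score_features` would refit the model on the remaining features at every call, which is what makes the procedure recursive rather than a one-shot ranking.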

2.5. Classification

To distinguish between SCLC and HC groups, the SVM classifier was employed, being a supervised-learning method that maps known examples (data points) in an N-dimensional space and determines the optimal decision boundary that splits the classes while maintaining a maximum-margin between the hyperplane and the input data points. To address the cases in which input data are not linearly separable, a modified feature space (kernel trick) was applied to transform the dimensionality of the data and thus identify the optimal non-linear hyperplane. Specifically, a Gaussian radial basis function (RBF) kernel was employed for the SVM, accounting for the non-linear interactions and dependence in the data and between the feature vectors in feature space [].
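For readers unfamiliar with the Gaussian RBF kernel, a minimal sketch is given below. It assumes the common parametrization k(x, y) = exp(−γ‖x − y‖²); the text does not state which parametrization the software used, and the feature vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# two hypothetical subjects, each described by three FA edge weights
subject_a = [0.40, 0.35, 0.52]
subject_b = [0.38, 0.30, 0.55]
similarity = rbf_kernel(subject_a, subject_b, gamma=0.25)
```

The kernel equals 1 for identical inputs and decays toward 0 as the feature vectors move apart, with γ controlling how quickly that decay happens.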

Both the feature selection and classification procedures were implemented using a Leave-One-Out Cross Validation (LOOCV) approach to alleviate subject-specific dependencies while allowing for the detection of subject-invariant variables able to encode information related to SCLC. As such, one instance (corresponding to one subject) was used as the testing set, while the remaining instances formed the training set, in a sequential manner until every instance had been utilized as a testing set. During the feature selection, the RFE-CBR algorithm only employed data from the training set, generating 36 ranked feature sets. In a similar fashion, the SVM models were constructed by utilizing the training sets, estimating performance by applying the produced classification models to the corresponding testing sets. Performance was assessed by averaging the SVM models’ various metrics across all folds. However, due to class imbalance, conventional classification accuracy (i.e., total correct predictions, divided by total instances) would not successfully demonstrate the validity of the SVM models, so the weighted accuracy (assigning different weights to each class based on the number of samples) was used for evaluation, henceforth simply denoted as accuracy.
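The following sketch illustrates the leave-one-out splitting scheme and one common form of class-weighted accuracy. The exact weighting used in the study is not specified here, so balanced accuracy (the mean of sensitivity and specificity) is an assumption; it is at least consistent with the reported figures, since (0.93 + 0.91)/2 = 0.92. Function names are hypothetical.

```python
def leave_one_out_splits(n_subjects):
    """Yield (train_indices, test_index), holding out each subject once."""
    for test_idx in range(n_subjects):
        train = [i for i in range(n_subjects) if i != test_idx]
        yield train, test_idx


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity, so each class counts equally."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# cohort sizes from the text: 22 SCLC (label 1) and 14 HC (label 0)
y_true = [1] * 22 + [0] * 14
splits = list(leave_one_out_splits(len(y_true)))

# a trivial classifier that always predicts the majority class
y_majority = [1] * len(y_true)
plain_acc = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_majority)
```

The majority-class predictor reaches a plain accuracy of about 0.61 on this cohort but only 0.5 in balanced accuracy, which shows why an imbalance-aware metric is the more informative choice.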

In accordance with the feature selection method described above, a sequential process was followed to identify the optimal subset, starting with a null feature subset and adding, in succession, the most frequently shared feature from the ranked RFE-CBR set. In each step, an SVM model was generated, repetitively assessing the classification performance, until the feature space included all features. The optimal feature subset was deemed the one with the highest classification accuracy. Furthermore, to achieve maximum performance, SVM model parameter fine-tuning was applied. In this context, the box constraint (C) was varied from 10⁻² to 10² with logarithmic steps, while the kernel coefficient (γ) employed an exponential scaling from 2⁻⁵ to 2⁵. Moreover, to address potential selection bias or overfitting during the classification procedures, 1000 random permutations of the class labels (under the same LOOCV design) were implemented. Decision function scores from each classifier were used to generate ROC curves and compute AUC. Therefore, an empirical classification accuracy and area under the ROC curve (AUC) distribution and a corresponding p-value were estimated by calculating the ratio of the number of permutations that achieved higher accuracy/AUC than the original samples to the number of total permutations [].
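The hyperparameter grid and the permutation p-value described above can be sketched as follows. The grid spacing is an assumption (one value per integer exponent, since the step size is not stated), and the null scores are invented for illustration. The p-value follows the description in the text, i.e. the fraction of permutations scoring strictly higher than the original labels.

```python
import random

# Assumed grids: one value per integer exponent (the step size is not stated).
C_GRID = [10.0 ** k for k in range(-2, 3)]     # 0.01 ... 100
GAMMA_GRID = [2.0 ** k for k in range(-5, 6)]  # 2^-5 ... 2^5


def permutation_p_value(observed, permuted_scores):
    """Fraction of permutations scoring strictly higher than the observed value."""
    higher = sum(1 for s in permuted_scores if s > observed)
    return higher / len(permuted_scores)


def shuffled_labels(labels, rng):
    """Return a random permutation of the class labels (input left untouched)."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)
labels = [1] * 22 + [0] * 14
one_permutation = shuffled_labels(labels, rng)

# hypothetical null distribution of accuracies, for illustration only
null_scores = [0.45, 0.52, 0.95, 0.50, 0.61, 0.48, 0.55, 0.49, 0.58, 0.47]
p_value = permutation_p_value(0.92, null_scores)
```

In a full run, each shuffled label vector would be passed through the same LOOCV pipeline (feature selection included) to produce one null score; 1000 such scores would replace the toy list above.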

2.6. Feature-Related Performance Validation

For a more rigorous estimation of the performance of the features employed, we also investigated different classification methods. The assumption driving this step was that a high classification performance across models would indicate that the discriminative ability of the selected features does not depend on a particular model []. Classification was thus additionally performed by applying the optimal RFE-CBR subset to linear-kernel SVM, k-Nearest Neighbor (kNN), Linear Discriminant Analysis (LDA), and Random Forest (RF) classifiers, estimating the resulting performance.

3. Results

3.1. White Matter Connectivity Patterns

As mentioned earlier, a total of 185 features (i.e., 185 connections between different ROIs) were incorporated in the subsequent machine learning pipeline (Figure 2). Nodes included all 90 AAL ROIs, with the degree centrality ranging from 1 to 10 and the maximum values observed for the left and right precuneus regions (Figure 3). In addition, the majority of connections present a frontal predominance, with 73 out of 185 network edges involving frontal regions, 34 of which are frontal-to-frontal connections. The same trend is demonstrated with respect to both temporal connections (27 out of 50 (54%) being temporal-to-temporal) and parietal connections (17 out of 51 (33%) being parietal-to-parietal). On the other hand, occipital ROIs demonstrate both occipital-to-occipital (15 out of 40) and occipital-to-parietal connectivity (14 out of 40), while no clear trend could be discerned regarding cingulate gyri, insula gyri, and central structures. Overall connectivity demonstrated an intra-hemispheric pattern with the majority of edges (176 out of 185) lying in the same hemisphere, although no predominant hemisphere was discerned (92 left-to-left, 84 right-to-right). The abbreviations of the ROIs are presented in Table A2 of Appendix B.

Figure 2. Summary of the 185 structural connections remaining after the threshold. The circular plot displays individual connections between different ROIs. Connections have been color-coded to mark left intra-hemispheric, right intra-hemispheric, and inter-hemispheric connections. The box colors correspond to the centrality degree of each ROI.

Figure 3. Connectivity matrix showing connections between brain areas, with each color denoting the number of between-area connections.

3.2. Classification Performance and Feature Robustness

The Gaussian SVM classification model achieved 0.92 accuracy and 0.94 AUC (sensitivity = 0.93; specificity = 0.91) (p < 0.001, 1000 permutations), employing the selected 16 features, with optimal values for parameters C and γ being 1 and 0.25, respectively (Table 1). Furthermore, in order to validate whether the optimal feature subset identified via the RFE-CBR selection algorithm would attain high performance regardless of the classification method, a pool of additional machine learning models was tested, including linear kernel SVM (C = 1), kNN (number of neighbors, k = 3), LDA (no regularization and no score transformation) and RF classifiers (Number of Trees, N = 30; Split Criterion = Gini’s Diversity Index; Maximum Depth of Each Decision Tree = No Limit; Number of Features Considered at Each Split = All; Minimum Number of Samples Required to Split an Internal Node = 1; Minimum Number of Samples at a Leaf Node = 1; Bootstrap Sampling = Enabled). In each case, the resulting performance was calculated, with the results being presented in Table 1. The confusion matrix of the Gaussian SVM model and the Receiver Operating Characteristic (ROC) curves and their corresponding AUC are presented in Figure 4. As illustrated, all methods achieved an AUC of at least 0.92, although none outperformed the proposed Gaussian SVM classifier. This finding supports our hypothesis that the features identified as part of the optimal set exhibit global discriminative characteristics, remaining relatively robust to the algorithmic differences between methods.

Table 1. Classification performance and validation results of optimal feature subset using different classification methods.

Figure 4. The confusion matrix of the Gaussian SVM is presented in the upper left corner. The performance of the Gaussian SVM (upper middle), linear SVM (upper right), k-NN (bottom left), LDA (bottom middle), and RF (bottom right) models is illustrated through the ROC curves.

To further evaluate model generalization, we implemented a soft-voting ensemble combining Gaussian SVM, kNN, and RF classifiers, achieving 0.86 accuracy (AUC = 0.90), slightly below that of the Gaussian SVM, indicating that, within the current feature space and sample size, the Gaussian SVM retained the best generalization capability.

3.3. ROC-Based Analytic Power Estimation

To justify sample adequacy, a ROC-based analytic power estimation was employed. Treating the classifier as a diagnostic test, we estimated the variance of the area under the ROC curve (AUC) using the Hanley–McNeil approach under the binormal model []. For our observed AUC = 0.94 with n₁ = 22 SCLC and n₂ = 14 HC, the standard error was SE = 0.04, yielding z = 11 versus AUC = 0.5 (two-sided α = 0.05), 95% CI [0.862, 1.000], and power ≈ 1.00. Under a conservative scenario, assuming a true AUC of 0.75 with the same sample sizes, SE = 0.081, z = 3.09, and two-sided power ≈ 0.87. The formulas and calculations are provided in Appendix C.
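The reported standard errors can be reproduced with the closed-form Hanley–McNeil expression, as sketched below. Appendix C is not reproduced here, so the power approximation (a Wald-type z test against AUC = 0.5, upper tail only) is an assumption chosen because it matches the reported values; the function names are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC using the Hanley-McNeil closed form."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(auc, n_pos, n_neg, z_crit=1.959964):
    """Approximate power of a two-sided test of AUC = 0.5 (upper tail only)."""
    z = (auc - 0.5) / hanley_mcneil_se(auc, n_pos, n_neg)
    return normal_cdf(z - z_crit)


# cohort sizes from the text: 22 SCLC patients and 14 healthy controls
se_obs = hanley_mcneil_se(0.94, 22, 14)   # observed AUC
se_cons = hanley_mcneil_se(0.75, 22, 14)  # conservative scenario
power_cons = approx_power(0.75, 22, 14)
```

Under these assumptions the sketch gives standard errors of roughly 0.039 and 0.081 and a conservative-scenario power of about 0.87, in line with the figures quoted above.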

3.4. Interpretation of Discriminative Predictors

The optimal feature subset incorporated 16 features (connections), engaging 26 unique brain areas (Figure 5). As such, the majority of the selected features link ROIs in the frontal (five), temporal (five), and parietal (four) nodes, whereas the edges that involve the insula gyri regions were excluded by the feature selection algorithm. Regarding the strength of the connections, 10 out of 16 demonstrated an increase from HC to SCLC groups, while 6 out of 16 demonstrated an overall decrease. In detail, the connections with increased strength included four temporal-to-temporal (HIP.R-PHG.R, FFG.L-TPOmid.L, PHG.L-TPOsup.L, and STG.L-TPOsup.L), one frontal-to-frontal (SFGdor.R-SFGmed.R), one frontal-to-temporal (PreCG.L-MTG.L), one parietal-to-parietal (PCUN.R-SPG.R), one occipital-to-occipital (CAL.L-SOG.L), and two inter-hemispheric cingulate gyri connections (ACG.R-ACG.L and PCG.R-PCG.L). Correspondingly, the six connections with a decrease in strength included one cingulate-to-cingulate (DCG.L-PCG.L), one parietal-to-parietal (ANG.L-SPG.L), one frontal-to-parietal (PCUN.L-PCL.L), two central-to-frontal (PreCG.L-PUT.L, SFGdor.L-PUT.L), and one inter-hemispheric parietal connection (PCUN.R-PCUN.L). Notably, all connections linking temporal regions exhibited reduced strength in the HC group, although the limited number of these connections constrains the ability to infer a broader trend. Moreover, the optimal subset retained the intra-hemispheric predominance of the full feature set but, unlike the full set, was left-lateralized, with 10 left-to-left and 3 right-to-right edges, along with 3 inter-hemispheric connections. Interestingly, a subsequent statistical analysis on all features via a one-way ANOVA did not reveal any statistically significant (p-value < 0.05) alterations among the classes, with the exception of PHG-TPOsup and PreCG-PUT connections within the left hemisphere.
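The counts quoted in this paragraph can be checked directly from the listed connection labels. The short sketch below tallies hemispheric patterns and unique regions, treating left and right homologues as distinct regions. The helper name is hypothetical, and the labels are transcribed from the text.

```python
from collections import Counter

# The 16 selected connections as listed in the text.
increased = ["HIP.R-PHG.R", "FFG.L-TPOmid.L", "PHG.L-TPOsup.L", "STG.L-TPOsup.L",
             "SFGdor.R-SFGmed.R", "PreCG.L-MTG.L", "PCUN.R-SPG.R", "CAL.L-SOG.L",
             "ACG.R-ACG.L", "PCG.R-PCG.L"]
decreased = ["DCG.L-PCG.L", "ANG.L-SPG.L", "PCUN.L-PCL.L", "PreCG.L-PUT.L",
             "SFGdor.L-PUT.L", "PCUN.R-PCUN.L"]


def hemisphere_pattern(edge):
    """Classify an edge as 'left', 'right', or 'inter' from its .L/.R suffixes."""
    node_a, node_b = edge.split("-")
    sides = {node_a.rsplit(".", 1)[1], node_b.rsplit(".", 1)[1]}
    if sides == {"L"}:
        return "left"
    if sides == {"R"}:
        return "right"
    return "inter"


all_edges = increased + decreased
pattern_counts = Counter(hemisphere_pattern(edge) for edge in all_edges)
unique_regions = {node for edge in all_edges for node in edge.split("-")}
```

The tally yields 10 left, 3 right, and 3 inter-hemispheric edges across 26 distinct regions, matching the figures stated in the paragraph.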

Figure 5. The optimal features regarding (a) the individual feature mean value for the HC and SCLC group. At the top, the left-to-left hemisphere, while at the bottom, the right-to-right (left corner) hemisphere and interhemispheric (left) connections; (b) the distribution of the selected connectivity features. The color of each connection represents the difference in the mean value of the connectivity strength between HC and SCLC. The color bar on the bottom right indicates the connectivity alteration from HC to SCLC.

4. Discussion

In this paper, we applied a data-driven ML analysis of DTI-derived structural connectivity networks to investigate the neurocognitive alterations attributed to SCLC and chemotherapy compared with a group of healthy controls. Particularly, the initial features corresponded to strength values of fractional anisotropy (FA) connections, engaging 45 brain regions (a total of 90 nodes considering both left and right hemispheres). Upon filtering across the whole feature space for sparsity, the total 185 connections mainly involved frontal regions, which have been associated with cognitive impairment induced by cancer and chemotherapy [,,]. Similarly, temporal, parietal, and occipital regions have also been associated with the same processes in other cancer types [,]. Moreover, the predominance of intra-hemispheric and intra-lobar connections, highlighting the presence of segregation connectivity patterns, is consistent with the inherent characteristics of DTI FA measurements, favoring detection of localized tracts compared to the longer and fewer long-range connections [,]. To the best of the authors’ knowledge, no prior study has applied ML to analyze structural connectivity in an SCLC cohort. The implications of this paper are further discussed below.

To navigate within the feature space, a feature selection and classification framework was applied, utilizing the FA connections to discriminate between SCLC and HC. Our approach achieved a high performance of 0.92 accuracy and 0.94 AUC (0.93 sensitivity, 0.91 specificity), utilizing 16 out of the 185 features considered. A noteworthy observation is that the optimal standalone SVM-based framework outperformed the soft-voting ensemble. This could potentially be linked to the modest sample size, since ensemble methods may be more susceptible to overfitting [], especially when they incorporate classifiers that individually perform less reliably on high-dimensional neuroimaging data []. On another note, it could also be hypothesized that this result is inherent to our methodology, since the RFE-based feature subset is particularly well aligned with the Gaussian SVM; therefore, inclusion of additional models may confound the discriminative characteristics captured by the SVM, leading to reduced overall accuracy. This suggests that, in the current application setting, the selected connectivity features offer maximal discriminative power when paired with a single model specifically tuned for these connections.

However, it should be noted that more modern techniques—such as deep learning classifiers—have the potential to yield superior performance, though they often obscure the decision-making process, as they represent information through complex transformations that are challenging to interpret []. As such, even explainability methodologies are calculated based on layered representations that abstract away from the original input features, making it difficult to trace back which features drive the classification outcomes []. Given our aim to achieve not only high performance but also meaningful insights into the brain structures associated with SCLC, we opted to employ more traditional, interpretable machine learning approaches.

Examining the optimal feature subset, our first observation is the small number of connections engaging a few brain regions (16 out of 185 total connections and 26 out of 90 brain regions), which facilitates the interpretability of the resulting model. Moreover, the recurrence of these features across cross-validation folds, together with the low p-values obtained from permutation testing, supports the stability of the selected features and reduces the likelihood of overfitting. Coupled with the high classification metrics, this is an indicator of the robustness of our approach []. This trait is also corroborated by our feature performance validation step, with all other classifiers exhibiting high values in all metrics [] (accuracy, sensitivity, specificity, AUC, and F1-score). In particular, the selected features included connections between frontal–temporal, frontal–parietal, frontal–central, frontal–frontal, temporal–temporal, parietal–parietal, cingulate–cingulate, and occipital–occipital regions, in accordance with prior DTI and fMRI studies that focused on the same group of SCLC patients [,]. Frontal and temporal regions were particularly prominent among the identified connections, in accordance with the full feature space and prior research establishing these regions as especially susceptible to chemotherapy effects. Moreover, temporal areas were exclusively involved in connections exhibiting increased strength in the SCLC group, while other regions were involved in connections of both increased and decreased strength in patients relative to controls. In this regard, while decreased connection strength in the patient group may indicate treatment-related or displacement-associated disruption of white matter connectivity [] affecting cognition, the opposite pattern could reflect compensatory hyperconnectivity mechanisms within temporal circuits, a hypothesis supported by previous findings of temporal lobe vulnerability. This finding is consistent across studies examining both multiple non-central-nervous-system cancer types [,,] and cancer-type-specific cohorts, such as ovarian [] and breast cancer [].
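The permutation-testing logic referred to in this paragraph can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline used in the study: a simple nearest-centroid classifier stands in for the SVM, and the function names (`loocv_accuracy`, `permutation_p_value`), group sizes, and feature values are all hypothetical.

```python
import random

def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean feature vector is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(xs, ys):
    """Leave-one-out cross-validation: each subject is held out once."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

def permutation_p_value(xs, ys, n_permutations=200, seed=0):
    """Share of label shufflings whose accuracy matches or beats the observed one."""
    rng = random.Random(seed)
    observed = loocv_accuracy(xs, ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loocv_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Synthetic data (assumption): 10 "controls" (0) and 10 "patients" (1),
# three features each, with well-separated group means.
data_rng = random.Random(42)
xs = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]
xs += [[data_rng.gauss(4.0, 1.0) for _ in range(3)] for _ in range(10)]
ys = [0] * 10 + [1] * 10

accuracy, p_value = permutation_p_value(xs, ys)
print(f"LOOCV accuracy = {accuracy:.2f}, permutation p = {p_value:.3f}")
```

A low p-value here means that few random relabellings reach the observed accuracy, which is the sense in which permutation testing argues against a chance-level or overfitted result.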

With regard to segregation mechanisms in the optimal feature subset, intra-hemispheric communication remained prevalent, with the left hemisphere predominating (10/16 connections). Inter-hemispheric connections were heavily under-represented (3/16), reflecting their low incidence within the full set (9/185). While this intra-hemispheric predominance has already been pointed out as method-inherent, the discriminative power of these connections suggests that the impact of cancer and/or treatment may indeed manifest as intra-hemispheric reorganization, reflecting changes in commissural pathways such as the corpus callosum, or a more general localized compensation as part of new segregation patterns. This assumption is also supported by similar findings in breast cancer patients after chemotherapy (a widely studied population for chemotherapy-related cognitive impairment [], hence used as contextual evidence), which additionally report a disruption of inter-hemispheric integration [,]. Finally, the participants’ handedness (all right-handed) could explain the prominence of the left hemisphere among connections, since this side of the brain is typically dominant for language and executive functions in right-handed individuals. This is also consistent with prior DTI studies reporting lateralized vulnerability in white matter tracts engaging regions typically implicated in executive and verbal processing []. The importance of the regions selected as features by our model is also highlighted by earlier DTI and fMRI studies on breast cancer patients after chemotherapy, reporting disorders related to attention, language, and memory [,], consistent with functional consequences of altered connectivity that may arise from displacement, chemotherapy-related microstructural effects, or network-level reorganization [,]. However, since no standalone hemispheric comparison was implemented, any apparent left-hemispheric predominance should be interpreted with caution and viewed as an observation emerging from the selected feature subset rather than evidence of a definitive hemispheric effect.

Placing the identified discriminative features in the context of prior neuroimaging research on SCLC, our findings implicate several regions functionally associated with the default mode network (DMN), as well as, to a lesser extent, the task-positive network (TPN) and the sensorimotor network (SMN). Notably, previous fMRI studies of lung cancer patients have reported decreased DMN functional connectivity following chemotherapy or prophylactic cranial irradiation [,]. Although a substantial part of the 26 brain regions comprising our optimal feature set is DMN-related, the observed structural connectivity changes are heterogeneous, including both increased and decreased connection strengths, suggesting complex reorganization rather than a uniform disruption. In contrast, regions associated with the TPN and SMN were less represented in our findings, with only three connections involving these networks appearing in the optimal subset []. Moving beyond high-level associations with standard networks, the selected features include connections (e.g., HIP.R-PHG.R and PHG.L-TPOsup.L) involving brain areas that are essential for episodic and contextual memory; thus, the corresponding alterations are consistent with memory-related deficits frequently reported in lung cancer populations (not necessarily SCLC) []. Connections linking the superior temporal and fusiform gyri with the temporal poles (e.g., STG.L-TPOsup.L and FFG.L-TPOmid.L) can also be linked to previous lung cancer studies, potentially associated with observed learning impairments []. Similarly, parietal/precuneus-based (PCUN.R-SPG.R, ANG.L-SPG.L, and PCUN.R-PCUN.L) and frontal/frontostriatal (SFGdor.R-SFGmed.R, PreCG.L-PUT.L, and SFGdor.L-PUT.L) alterations have been associated with visuospatial processing [] and overall executive dysfunctions [] that have been observed in patients.
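As a small illustration of how connection labels of this form map onto the intra-/inter-hemispheric categories discussed earlier, the sketch below classifies the ten connections named in this paragraph. It assumes the `REGION.H-REGION.H` naming used in the text (with `H` being `L` or `R`); the helper names are hypothetical. Because the remaining six selected connections are not listed in this section, the counts cover only this named subset and are not expected to reproduce the 10/16 and 3/16 figures.

```python
from collections import Counter

def hemisphere(region):
    """Return 'L' or 'R' from an AAL-style label such as 'PHG.L'."""
    return region.rsplit(".", 1)[1]

def connection_type(connection):
    """Classify a 'REGION.H-REGION.H' label by the hemispheres it links."""
    first, second = connection.split("-")
    h1, h2 = hemisphere(first), hemisphere(second)
    if h1 != h2:
        return "inter-hemispheric"
    return "left intra-hemispheric" if h1 == "L" else "right intra-hemispheric"

# Only the connections explicitly named in the text (10 of the 16 features).
named_connections = [
    "HIP.R-PHG.R", "PHG.L-TPOsup.L", "STG.L-TPOsup.L", "FFG.L-TPOmid.L",
    "PCUN.R-SPG.R", "ANG.L-SPG.L", "PCUN.R-PCUN.L",
    "SFGdor.R-SFGmed.R", "PreCG.L-PUT.L", "SFGdor.L-PUT.L",
]

counts = Counter(connection_type(c) for c in named_connections)
for kind, n in sorted(counts.items()):
    print(f"{kind}: {n}")
```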

In relation to existing DTI-based work employing statistical analyses, the results partially overlap with prior findings. Specifically, the intersection of AAL regions exhibiting statistically significant differences in connectivity metrics with our feature set included 7 (PHG.L, STG.L, SPG.R, SPG.L, PreCG.L, SFGmed.R, and ANG.L []) and 12 (SFGdor.R, ANG.L, TPOsup.L, MTG.R, CAL.L, SFGmed.L, ACG.R, DCG.L, PCUN.R, PCUN.L, PCL.R, and STG.L []) AAL regions in prior DTI studies. Nevertheless, a notable discrepancy still exists between ML-derived and statistically derived regions based on prior research on SCLC cohorts. The originality of our findings is further underscored by the fact that a conventional statistical analysis (one-way ANOVA) identified only two connections as significantly differentiating SCLC patients from healthy controls (PHG.L-TPOsup.L and PreCG.L-PUT.L). This contrast reflects the fundamental methodological difference between the two approaches and highlights the added value of incorporating machine learning to uncover discriminative patterns that may remain undetected through traditional statistical approaches alone []. Specifically, ANOVA evaluates each connection independently and, therefore, cannot detect subtle but coordinated alterations across multiple features, whereas ML leverages multivariate patterns—including several sub-threshold differences—that collectively provide strong discriminative power.
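The contrast between per-connection testing and multivariate patterns can be made concrete with a toy example. The sketch below uses made-up values for two hypothetical connections that are strongly correlated across subjects; neither differs clearly between groups on its own, but their difference does. The combination is hand-picked here purely for illustration, whereas a classifier would learn such weights from data, and none of the numbers come from the study.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of groups of values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def column(rows, index):
    """Extract one feature from a list of (feature_a, feature_b) tuples."""
    return [row[index] for row in rows]

# Hypothetical data: a shared subject-level baseline dominates both features,
# while the groups differ by a small, opposite shift in each feature.
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
jitter = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
controls = [(b, b + e) for b, e in zip(base, jitter)]
patients = [(b + 0.5, b - 0.5 + e) for b, e in zip(base, jitter)]

f_a = one_way_anova_f([column(controls, 0), column(patients, 0)])
f_b = one_way_anova_f([column(controls, 1), column(patients, 1)])
f_combined = one_way_anova_f([
    [a - b for a, b in controls],
    [a - b for a, b in patients],
])

# For reference, the 5% critical value of F with (1, 10) degrees of freedom is about 4.96.
print(f"feature A alone: F = {f_a:.2f}")
print(f"feature B alone: F = {f_b:.2f}")
print(f"A - B combined:  F = {f_combined:.1f}")
```

Each feature alone stays well below the critical value, while the combined pattern exceeds it by a wide margin, which mirrors the argument that coordinated sub-threshold differences can carry discriminative information.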

Nonetheless, the interpretation of the identified structural connections and associated brain regions (i.e., network nodes) should take into account potential confounding factors, including subject-specific clinical variability and behavioral parameters such as smoking, which differed significantly between the groups in our dataset (Table A1). Moreover, a larger cohort could reduce the risk of overfitting within ML modeling [], although our approach showcased the inherent quality of the selected features as discriminative markers of SCLC patients. A larger dataset would also reduce the variance of cross-validated estimates and allow statistical analyses to be carried out alongside the ML pipeline, shedding light on the underlying neural processes expressed through brain connectivity patterns and further supporting interpretability. Furthermore, imaging-based biomarkers can also be combined with neuropsychological batteries for the assessment of patients [], providing added value by correlating connectivity observations with behavioral alterations as a result of cognitive impairment.

5. Conclusions

In this study, a machine learning framework was employed, to the best of our knowledge for the first time, to investigate and interpret structural brain connectivity alterations in post-chemotherapy SCLC patients vs. healthy controls using DTI data. To this end, we utilized fractional anisotropy to construct structural brain networks, which served as input features for subsequent selection and classification. To identify the most informative connections, we applied the RFE-CBR algorithm, resulting in a salient subset capable of distinguishing SCLC patients from healthy controls with high discriminative power. The robustness of the selected features was supported by consistent performance across multiple classifiers, with the SVM yielding the highest classification performance. Analysis of the optimal feature set showed evidence of intra-hemispheric reorganization, particularly left-lateralized alterations involving frontal and temporal regions, though these observations should be interpreted with caution given the modest sample size. These findings support the capacity of machine learning-based structural connectomics to detect neural alterations driven by complex connectivity patterns and treatment effects, highlighting its relevance for neuroimaging-based biomarker discovery. Future work should focus on validating these findings in larger cohorts, correlating statistical and ML results, and incorporating longitudinal assessments (including cognitive batteries) to track treatment-related changes over time, in order to further refine the underlying connectivity signature of SCLC and of chemotherapy in a broader sense.

Author Contributions

All authors contributed to the writing of the original draft, review, and editing of this study. Additional contributions included: Conceptualization: S.T.M. and G.K.M.; data curation: I.Z. and I.A.V.; formal analysis: O.P., Y.S., and E.K.; funding acquisition: G.K.M.; investigation: S.T.M. and I.K.; methodology: S.T.M., I.Z., and I.K.; project administration: G.K.M.; resources: E.M.V. and Y.S.; software: S.T.M. and I.A.V.; supervision: G.K.M.; validation: O.P. and E.M.V.; visualization: I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by Evgenidion Hospital’s relevant ethics committee (Protocol Code: 145/25-07-2012).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are not publicly available due to privacy reasons and can be made available upon request from the corresponding authors.

Acknowledgments

We would like to thank Kostakis Gkiatis and Anastasios Metzelopoulos for their support with data preprocessing, as well as Matilda Papathanasiou, Nikolaos Kelekis, and Vasileios Kouloulias for their valuable assistance in data collection.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AAL	Automated Anatomical Labeling
ANOVA	Analysis of Variance
AUC	Area Under the Curve
CI	Confidence Interval
CNS	Central Nervous System
DMN	Default Mode Network
DTI	Diffusion Tensor Imaging
EPI	Echo-Planar Imaging
fMRI	Functional Magnetic Resonance Imaging
FA	Fractional Anisotropy
FACT	Fiber Assignment by Continuous Tracking
FOV	Field of View
GE	Gradient Echo
GMV	Gray Matter Volume
HC	Healthy Control
kNN	k-Nearest Neighbor
LAT	Left Anterior Temporal
LDA	Linear Discriminant Analysis
LOOCV	Leave-One-Out Cross Validation
MCI	Mild Cognitive Impairment
ML	Machine Learning
MR	Magnetic Resonance
MRI	Magnetic Resonance Imaging
MVPA	Multivariate Pattern Analysis
NB	Naïve Bayes
NSCLC	Non-Small-Cell Lung Cancer
PANDA	Pipeline for Analyzing Brain Diffusion Images
PCI	Prophylactic Cranial Irradiation
rs-fMRI	Resting-State Functional Magnetic Resonance Imaging
RAT	Right Anterior Temporal
RBF	Radial Basis Function
RF	Random Forest
RFE-CBR	Recursive Feature Elimination algorithm with Correlation Bias Reduction
ROC	Receiver Operating Characteristic
ROI	Region of Interest
SCLC	Small-Cell Lung Cancer
SE	Standard Error
SMN	Sensorimotor Network
SVM	Support Vector Machine
SVR	Support Vector Regression
TE	Echo Time
TPN	Task-Positive Network
TR	Repetition Time
WBDT	Whole-Brain Deterministic Tractography

Appendix A. Demographics

Table A1 presents the demographic data for the SCLC and HC groups, as well as a summary of the cancer stage and regimens. Statistical analysis demonstrated no statistically significant differences with respect to age, gender, or education level (evaluated as years of education); the only significant difference concerned the smoking ratio.

Table A1. Summary of demographics between the two groups, including cancer stage and chemotherapy regimens.

Characteristic	Controls (n = 14)	Patients (n = 20)	p-Value
Age (years)	56.42 (7.62)	54.81 (5.98)	0.49
Education (years)	17 (6.44)	13.31 (4.9)	0.07
Gender			0.46
Male	8 (57%)	13 (65%)
Female	6 (43%)	7 (35%)
Smoking	2 (14%)	12 (60%)	0.002
Stage	-	IIB—10 (50%)
	-	IIIA—8 (40%)
	-	IIIB—2 (10%)
Regimen_1 ¹	-	15 (75%)
Regimen_2 ²	-	5 (25%)

¹ Regimen_1: Cisplatin 60–80 mg/m² day 1 + Etoposide 100–120 mg/m² days 1–3 (every 21 days); ² Regimen_2: Carboplatin 5 AUC day 1 + Etoposide 100–120 mg/m² days 1–3 (every 21 days).

Appendix B. AAL Regions of Interest (ROIs)

Table A2 displays the names, abbreviations, and brain areas of the AAL regions of interest (ROIs).

Table A2. Names and corresponding abbreviations of the regions of interest (ROIs).

Region Name	Abbreviation	Area
Precentral gyrus	PreCG	Frontal Lobe
Superior frontal gyrus, dorsolateral	SFGdor	Frontal Lobe
Superior frontal gyrus, orbital part	ORBsup	Frontal Lobe
Middle frontal gyrus	MFG	Frontal Lobe
Middle frontal gyrus, orbital part	ORBmid	Frontal Lobe
Inferior frontal gyrus, opercular part	IFGoperc	Frontal Lobe
Inferior frontal gyrus, triangular part	IFGtriang	Frontal Lobe
Inferior frontal gyrus, orbital part	ORBinf	Frontal Lobe
Rolandic operculum	ROL	Frontal Lobe
Supplementary motor area	SMA	Frontal Lobe
Olfactory cortex	OLF	Frontal Lobe
Superior frontal gyrus, medial	SFGmed	Frontal Lobe
Superior frontal gyrus, medial orbital	ORBsupmed	Frontal Lobe
Gyrus rectus	REC	Frontal Lobe
Insula	INS	Insula Gyri
Anterior cingulate, paracingulate gyri	ACG	Cingulate Gyri
Median cingulate, paracingulate gyri	DCG	Cingulate Gyri
Posterior cingulate gyrus	PCG	Cingulate Gyri
Hippocampus	HIP	Temporal Lobe
Parahippocampal gyrus	PHG	Temporal Lobe
Amygdala	AMYG	Temporal Lobe
Calcarine fissure	CAL	Occipital Lobe
Cuneus	CUN	Occipital Lobe
Lingual gyrus	LING	Occipital Lobe
Superior occipital gyrus	SOG	Occipital Lobe
Middle occipital gyrus	MOG	Occipital Lobe
Inferior occipital gyrus	IOG	Occipital Lobe
Fusiform gyrus	FFG	Temporal Lobe
Postcentral gyrus	PoCG	Parietal Lobe
Superior parietal gyrus	SPG	Parietal Lobe
Inferior parietal lobule	IPL	Parietal Lobe
Supramarginal gyrus	SMG	Parietal Lobe
Angular gyrus	ANG	Parietal Lobe
Precuneus	PCUN	Parietal Lobe
Paracentral lobule	PCL	Frontal Lobe
Caudate nucleus	CAU	Central Structures
Lenticular nucleus, putamen	PUT	Central Structures
Lenticular nucleus, pallidum	PAL	Central Structures
Thalamus	THA	Central Structures
Heschl gyrus	HES	Temporal Lobe
Superior temporal gyrus	STG	Temporal Lobe
Temporal pole (superior)	TPOsup	Temporal Lobe
Middle temporal gyrus	MTG	Temporal Lobe
Temporal pole (middle)	TPOmid	Temporal Lobe
Inferior temporal gyrus	ITG	Temporal Lobe

Appendix C. ROC-Based Analytic Power Estimation

Treating the classifier as a diagnostic test, we used the Hanley–McNeil variance for the area under the ROC curve (AUC) []. The Hanley–McNeil approach provides an approximate variance of the AUC, with the quantities Q₁ and Q₂ approximated under a negative exponential model, given the number of positive and negative cases.

For an AUC estimate, Â, with n₁ = number of positive cases (SCLC) and n₂ = number of negative cases (HC), the variance is

\mathrm{Var}(\hat{A}) = \frac{\hat{A} (1 - \hat{A}) + (n_{1} - 1) (Q_{1} - \hat{A}^{2}) + (n_{2} - 1) (Q_{2} - \hat{A}^{2})}{n_{1} n_{2}}

(A1)

where

Q_{1} = \frac{\hat{A}}{2 - \hat{A}}

(A2)

Q_{2} = \frac{2 {\hat{A}}^{2}}{1 + \hat{A}}

(A3)

The standard error (SE) is

\mathrm{SE} = \sqrt{\mathrm{Var}(\hat{A})}

(A4)

For the power calculation, we tested the null hypothesis H₀: AUC = 0.5 and calculated the z-statistic:

z = \frac{\hat{A} - 0.5}{\mathrm{SE}}

(A5)

Power is then the probability that a normal deviate exceeds the critical value, z_α, given the alternative hypothesis. For practical reporting, if z is very large (e.g., >6), power ≈ 1. In this study, Â = 0.94, n₁ = 22, and n₂ = 14; therefore,

Q_{1} = \frac{0.94}{2 - 0.94} = \frac{0.94}{1.06} \approx 0.887

(A6)

Q_{2} = \frac{2 \times 0.94^{2}}{1 + 0.94} = \frac{1.767}{1.94} \approx 0.911

(A7)

\mathrm{Var} = \frac{0.94 \times 0.06 + 21 \times (0.88679 - 0.8836) + 13 \times (0.91093 - 0.8836)}{308} = \frac{0.4787}{308} \approx 0.0016

(A8)

The standard error (SE) is

\mathrm{SE} = \sqrt{0.0016} = 0.04

(A9)

(A9)

Therefore, z will be calculated as follows:

z = \frac{(0.94 - 0.5)}{0.04} = 11

(A10)

This is far above the critical value of 1.96 for a two-sided α = 0.05; therefore, power ≈ 1.00.

For the observed AUC = 0.94 with SE = 0.04, the 95% confidence interval was calculated as

0.94 \pm 1.96 \times 0.04 = [0.8616, 1.0184] \approx [0.862, 1.000]

(A11)

Even under a conservative assumption of a true AUC = 0.75, with the same sample size, we obtain Q₁ = 0.6, Q₂ = 0.642857, Var = 0.0065572820, SE = 0.08097705, and z = 3.0873, giving power ≈ 0.87.
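The calculation in this appendix can be reproduced with a few lines of Python. The sketch below is one possible implementation of Equations (A1)–(A5), not the authors' code: the function names are hypothetical, the group sizes (22 and 14) are taken from the n − 1 terms (21 and 13) and the denominator (308) in Equation (A8), and power is approximated as Φ(z − 1.96), which is an assumption that matches the reported value of about 0.87 for a true AUC of 0.75.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC following Equations (A1)-(A4)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approximate_power(auc, n_pos, n_neg, z_crit=1.96):
    """z-statistic against AUC = 0.5 and the approximate power (assumed form)."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    z = (auc - 0.5) / se
    return se, z, normal_cdf(z - z_crit)

# Group sizes inferred from Equation (A8); treat them as an assumption.
N_POS, N_NEG = 22, 14

se_obs, z_obs, power_obs = approximate_power(0.94, N_POS, N_NEG)
se_alt, z_alt, power_alt = approximate_power(0.75, N_POS, N_NEG)

print(f"AUC 0.94: SE = {se_obs:.4f}, z = {z_obs:.2f}, power = {power_obs:.3f}")
print(f"AUC 0.75: SE = {se_alt:.4f}, z = {z_alt:.4f}, power = {power_alt:.3f}")
```

Working from unrounded intermediate values gives z ≈ 11.2 for the observed AUC rather than exactly 11, since the appendix rounds SE to 0.04 before dividing.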

References

Tian, Y.; Li, Q.; Yang, Z.; Zhang, S.; Xu, J.; Wang, Z.; Bai, H.; Duan, J.; Zheng, B.; Li, W.; et al. Single-Cell Transcriptomic Profiling Reveals the Tumor Heterogeneity of Small-Cell Lung Cancer. Signal Transduct. Target. Ther. 2022, 7, 346.
Megyesfalvi, Z.; Gay, C.M.; Popper, H.; Pirker, R.; Ostoros, G.; Heeke, S.; Lang, C.; Hoetzenecker, K.; Schwendenwein, A.; Boettiger, K.; et al. Clinical Insights into Small Cell Lung Cancer: Tumor Heterogeneity, Diagnosis, Therapy, and Future Directions. CA Cancer J. Clin. 2023, 73, 620–652.
Steindl, A.; Schlieter, F.; Klikovits, T.; Leber, E.; Gatterbauer, B.; Frischer, J.M.; Dieckmann, K.; Widhalm, G.; Zöchbauer-Müller, S.; Hoda, M.A.R.; et al. Prognostic Assessment in Patients with Newly Diagnosed Small Cell Lung Cancer Brain Metastases: Results from a Real-Life Cohort. J. Neurooncol. 2019, 145, 85–95.
Saltos, A.; Shafique, M.; Chiappori, A. Update on the Biology, Management, and Treatment of Small Cell Lung Cancer (SCLC). Front. Oncol. 2020, 10, 1074.
Rittberg, R.; Banerji, S.; Kim, J.O.; Rathod, S.; Dawe, D.E. Treatment and Prevention of Brain Metastases in Small Cell Lung Cancer. Am. J. Clin. Oncol. 2021, 44, 629–638.
Zhu, Y.; Cui, Y.; Zheng, X.; Zhao, Y.; Sun, G. Small-Cell Lung Cancer Brain Metastasis: From Molecular Mechanisms to Diagnosis and Treatment. Biochim. Biophys. Acta (BBA)-Mol. Basis Dis. 2022, 1868, 166557.
Chen, Y.; Li, J.; Hu, Y.; Zhang, Y.; Lin, Z.; Zhao, Z.; Jiao, S. Prophylactic Cranial Irradiation Could Improve Overall Survival in Patients with Extensive Small Cell Lung Cancer: A Retrospective Study. Strahlenther. Onkol. 2016, 192, 905–912.
Tang, L.; Tian, G.; Li, N. Current Dilemma and Future Directions over Prophylactic Cranial Irradiation in SCLC: A Systematic Review in MRI and Immunotherapy Era. Front. Oncol. 2024, 14, 1382220.
Simó, M.; Vaquero, L.; Alemany, M.; Padrones, S.; Caravaca, A.; Padró, A.; Navarro, A.; Palmero, R.; Nadal, E.; Bruna, J. P01.04.B LONG-TERM COGNITIVE TOXICITY IN SMALL CELL LUNG CANCER POPULATION: EARLY CLINICAL, NEUROIMAGING, AND SEROLOGICAL BIOMARKERS. Neuro-Oncology 2023, 25, ii26–ii27.
Vaquero, L.; Rodríguez-Fornells, A.; Pera-Jambrina, M.Á.; Bruna, J.; Simó, M. Plasticity in Bilateral Hippocampi after a 3-Month Physical Activity Programme in Lung Cancer Patients. Eur. J. Neurol. 2021, 28, 1324–1333.
Bassett, D.S.; Sporns, O. Network Neuroscience. Nat. Neurosci. 2017, 20, 353–364.
Bromis, K.; Gkiatis, K.; Karanasiou, I.; Matsopoulos, G.; Karavasilis, E.; Papathanasiou, M.; Efstathopoulos, E.; Kelekis, N.; Kouloulias, V. Altered Brain Functional Connectivity in Small-Cell Lung Cancer Patients after Chemotherapy Treatment: A Resting-State fMRI Study. Comput. Math. Methods Med. 2017, 2017, 1403940.
Simó, M.; Rifà-Ros, X.; Vaquero, L.; Ripollés, P.; Cayuela, N.; Jové, J.; Navarro, A.; Cardenal, F.; Bruna, J.; Rodríguez-Fornells, A. Brain Functional Connectivity in Lung Cancer Population: An Exploratory Study. Brain Imaging Behav. 2018, 12, 369–382.
Simó, M.; Root, J.C.; Vaquero, L.; Ripollés, P.; Jové, J.; Ahles, T.; Navarro, A.; Cardenal, F.; Bruna, J.; Rodríguez-Fornells, A. Cognitive and Brain Structural Changes in a Lung Cancer Population. J. Thorac. Oncol. 2015, 10, 38–45.
Mentzelopoulos, A.; Gkiatis, K.; Karanasiou, I.; Karavasilis, E.; Papathanasiou, M.; Efstathopoulos, E.; Kelekis, N.; Kouloulias, V.; Matsopoulos, G.K. Chemotherapy-Induced Brain Effects in Small-Cell Lung Cancer Patients: A Multimodal MRI Study. Brain Topogr. 2021, 34, 167–181.
Bzdok, D.; Altman, N.; Krzywinski, M. Statistics versus Machine Learning. Nat. Methods 2018, 15, 233–234.
Wein, S.; Deco, G.; Tomé, A.M.; Goldhacker, M.; Malloni, W.M.; Greenlee, M.W.; Lang, E.W. Brain Connectivity Studies on Structure-Function Relationships: A Short Survey with an Emphasis on Machine Learning. Comput. Intell. Neurosci. 2021, 2021, 5573740.
Singh, N.M.; Harrod, J.B.; Subramanian, S.; Robinson, M.; Chang, K.; Cetin-Karayumak, S.; Dalca, A.V.; Eickhoff, S.; Fox, M.; Franke, L.; et al. How Machine Learning Is Powering Neuroimaging to Improve Brain Health. Neuroinformatics 2022, 20, 943–964.
Li, Y.; Shao, Y.; Wang, J.; Liu, Y.; Yang, Y.; Wang, Z.; Xi, Q. Machine Learning Based on Functional and Structural Connectivity in Mild Cognitive Impairment. Magn. Reson. Imaging 2024, 109, 10–17.
Billeci, L.; Badolato, A.; Bachi, L.; Tonacci, A. Machine Learning for the Classification of Alzheimer’s Disease and Its Prodromal Stage Using Brain Diffusion Tensor Imaging Data: A Systematic Review. Processes 2020, 8, 1071.
Huang, X.; He, Q.; Ruan, X.; Li, Y.; Kuang, Z.; Wang, M.; Guo, R.; Bu, S.; Wang, Z.; Yu, S.; et al. Structural Connectivity from DTI to Predict Mild Cognitive Impairment in de Novo Parkinson’s Disease. NeuroImage Clin. 2024, 41, 103548.
Sun, Y.; Zhang, Z.; Kakkos, I.; Matsopoulos, G.K.; Yuan, J.; Suckling, J.; Xu, L.; Cao, S.; Chen, W.; Hu, X.; et al. Inferring the Individual Psychopathologic Deficits with Structural Connectivity in a Longitudinal Cohort of Schizophrenia. IEEE J. Biomed. Health Inform. 2022, 26, 2536–2546.
Kamiya, K.; Amemiya, S.; Suzuki, Y.; Kunii, N.; Kawai, K.; Mori, H.; Kunimatsu, A.; Saito, N.; Aoki, S.; Ohtomo, K. Machine Learning of DTI Structural Brain Connectomes for Lateralization of Temporal Lobe Epilepsy. Magn. Reson. Med. Sci. 2016, 15, 121–129.
Hu, X.; Varkanitsa, M.; Kropp, E.; Betke, M.; Ishwar, P.; Kiran, S. Aphasia Severity Prediction Using a Multi-Modal Machine Learning Approach. NeuroImage 2025, 317, 121300.
Mateos-Pérez, J.M.; Dadar, M.; Lacalle-Aurioles, M.; Iturria-Medina, Y.; Zeighami, Y.; Evans, A.C. Structural Neuroimaging as Clinical Predictor: A Review of Machine Learning Applications. NeuroImage Clin. 2018, 20, 506–522.
Tax, C.M.W.; Bastiani, M.; Veraart, J.; Garyfallidis, E.; Okan Irfanoglu, M. What’s New and What’s next in Diffusion MRI Preprocessing. NeuroImage 2022, 249, 118830.
Tzourio-Mazoyer, N.; Landeau, B.; Papathanassiou, D.; Crivello, F.; Etard, O.; Delcroix, N.; Mazoyer, B.; Joliot, M. Automated Anatomical Labeling of Activations in SPM Using a Macroscopic Anatomical Parcellation of the MNI MRI Single-Subject Brain. NeuroImage 2002, 15, 273–289.
Rolls, E.T.; Joliot, M.; Tzourio-Mazoyer, N. Implementation of a New Parcellation of the Orbitofrontal Cortex in the Automated Anatomical Labeling Atlas. NeuroImage 2015, 122, 1–5.
Rolls, E.T.; Huang, C.-C.; Lin, C.-P.; Feng, J.; Joliot, M. Automated Anatomical Labelling Atlas 3. NeuroImage 2020, 206, 116189.
Grabner, G.; Janke, A.L.; Budge, M.M.; Smith, D.; Pruessner, J.; Collins, D.L. Symmetric Atlasing and Model Based Segmentation: An Application to the Hippocampus in Older Adults. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2006; Larsen, R., Nielsen, M., Sporring, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 58–66.
Mori, S.; Crain, B.J.; Chacko, V.P.; Van Zijl, P.C.M. Three-Dimensional Tracking of Axonal Projections in the Brain by Magnetic Resonance Imaging. Ann. Neurol. 1999, 45, 265–269.
Kwon, H.; Choi, Y.-H.; Lee, J.-M. A Physarum Centrality Measure of the Human Brain Network. Sci. Rep. 2019, 9, 5907.
Cha, J.H.; Choi, Y.-H.; Lee, J.-M.; Lee, J.Y.; Park, H.-K.; Kim, J.; Kim, I.-K.; Lee, H.J. Altered Structural Brain Networks at Term-Equivalent Age in Preterm Infants with Grade 1 Intraventricular Hemorrhage. Ital. J. Pediatr. 2020, 46, 43.
Cui, Z.; Zhong, S.; Xu, P.; He, Y.; Gong, G. PANDA: A Pipeline Toolbox for Analyzing Brain Diffusion Images. Front. Hum. Neurosci. 2013, 7, 42.
Kakkos, I.; Dimitrakopoulos, G.N.; Sun, Y.; Yuan, J.; Matsopoulos, G.K.; Bezerianos, A.; Sun, Y. EEG Fingerprints of Task-Independent Mental Workload Discrimination. IEEE J. Biomed. Health Inform. 2021, 25, 3824–3833.
Toloşi, L.; Lengauer, T. Classification with Correlated Features: Unreliability of Feature Ranking and Solutions. Bioinformatics 2011, 27, 1986–1994.
Kakkos, I.; Ventouras, E.M.; Asvestas, P.A.; Karanasiou, I.S.; Matsopoulos, G.K. A Condition-Independent Framework for the Classification of Error-Related Brain Activity. Med. Biol. Eng. Comput. 2020, 58, 573–587.
Golland, P.; Liang, F.; Mukherjee, S.; Panchenko, D. Permutation Tests for Classification. In Proceedings of the Learning Theory; Auer, P., Meir, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 501–515.
Graf, R.; Zeldovich, M.; Friedrich, S. Comparing Linear Discriminant Analysis and Supervised Learning Algorithms for Binary Classification—A Method Comparison Study. Biometrical J. 2024, 66, 2200098.
Hanley, J.A.; McNeil, B.J. A Method of Comparing the Areas under Receiver Operating Characteristic Curves Derived from the Same Cases. Radiology 1983, 148, 839–843.
Pendergrass, J.C.; Targum, S.D.; Harrison, J.E. Cognitive Impairment Associated with Cancer. Innov. Clin. Neurosci. 2018, 15, 36–44.
Lange, M.; Joly, F.; Vardy, J.; Ahles, T.; Dubois, M.; Tron, L.; Winocur, G.; De Ruiter, M.B.; Castel, H. Cancer-Related Cognitive Impairment: An Update on State of the Art, Detection, and Management Strategies in Cancer Survivors. Ann. Oncol. 2019, 30, 1925–1940.
Wang, L.; Hu, S.; Yao, Z.; Xue, M.; Lu, Z.; Xiao-ju, Z.; Ding, Y. Correlation between Cancer-Related Cognitive Impairment and Resting Cerebral Glucose Metabolism in Patients with Ovarian Cancer. Heliyon 2024, 10, e34106.
Tzourio-Mazoyer, N. Intra- and Inter-Hemispheric Connectivity Supporting Hemispheric Specialization. In Micro-, Meso- and Macro-Connectomics of the Brain; Kennedy, H., Van Essen, D.C., Christen, Y., Eds.; Springer: Cham, Switzerland, 2016; ISBN 978-3-319-27776-9.
Bullmore, E.; Sporns, O. Complex Brain Networks: Graph Theoretical Analysis of Structural and Functional Systems. Nat. Rev. Neurosci. 2009, 10, 186–198.
Aceña, V.; Martín de Diego, I.; Fernández, R.R.; Moguerza, J.M. Minimally Overfitted Learners: A General Framework for Ensemble Learning. Knowl.-Based Syst. 2022, 254, 109669.
Jollans, L.; Boyle, R.; Artiges, E.; Banaschewski, T.; Desrivières, S.; Grigis, A.; Martinot, J.-L.; Paus, T.; Smolka, M.N.; Walter, H.; et al. Quantifying Performance of Machine Learning Methods for Neuroimaging Data. NeuroImage 2019, 199, 351–365.
Zhang, Q.; Zhu, S. Visual Interpretability for Deep Learning: A Survey. Front. Inf. Technol. Electron. Eng. 2018, 19, 27–39.
Montavon, G.; Samek, W.; Müller, K.-R. Methods for Interpreting and Understanding Deep Neural Networks. Digit. Signal Process. 2018, 73, 1–15.
Alelyani, S. Stable Bagging Feature Selection on Medical Data. J. Big Data 2021, 8, 11.
Khaire, U.M.; Dhanalakshmi, R. Stability of Feature Selection Algorithm: A Review. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 1060–1073.
Mentzelopoulos, A.; Karanasiou, I.; Papathanasiou, M.; Kelekis, N.; Kouloulias, V.; Matsopoulos, G.K. A Comparative Analysis of White Matter Structural Networks on SCLC Patients After Chemotherapy. Brain Topogr. 2022, 35, 352–362.
Manan, A.A.; Yahya, N.A.; Taib, N.H.M.; Idris, Z.; Manan, H.A. The Assessment of White Matter Integrity Alteration Pattern in Patients with Brain Tumor Utilizing Diffusion Tensor Imaging: A Systematic Review. Cancers 2023, 15, 3326.
Saward, J.B.; Ellis, E.G.; Cobden, A.L.; Caeyenberghs, K. Mapping Cognitive Deficits in Cancer Patients after Chemotherapy: An Activation Likelihood Estimation Meta-Analysis of Task-Related fMRI Studies. Brain Imaging Behav. 2022, 16, 2320–2334.
Mulholland, M.M.; Stuifbergen, A.; Schutz, A.D.L.T.; Rocha, O.Y.F.; Blayney, D.W.; Kesler, S.R. Evidence of Compensatory Neural Hyperactivity in a Subgroup of Chemotherapy-Treated Breast Cancer Survivors and Its Association with Brain Aging. medRxiv 2024.
Oliva, G.; Giustiniani, A.; Danesin, L.; Burgio, F.; Arcara, G.; Conte, P. Cognitive Impairment Following Breast Cancer Treatments: An Umbrella Review. Oncologist 2024, 29, e848–e863.
Xue, M.; Du, W.; Cao, J.; Jiang, Y.; Song, D.; Yu, D.; Zhang, J.; Guo, J.; Xie, X.; Xie, L.; et al. Relationship between δ-Catenin Expression and Whole-Brain Small-World Network in Breast Cancer Patients before Chemotherapy. Sci. Rep. 2024, 14, 31119.
Tao, L.; Wang, L.; Chen, X.; Liu, F.; Ruan, F.; Zhang, J.; Shen, L.; Yu, Y. Modulation of Interhemispheric Functional Coordination in Breast Cancer Patients Receiving Chemotherapy. Front. Psychol. 2020, 11, 1689.
Vasaghi Gharamaleki, M.; Mousavi, S.Z.; Owrangi, M.; Gholamzadeh, M.J.; Kamali, A.-M.; Dehghani, M.; Chakrabarti, P.; Nami, M. Neural Correlates in Functional Brain Mapping among Breast Cancer Survivors Receiving Different Chemotherapy Regimens: A qEEG/HEG-Based Investigation. Jpn. J. Clin. Oncol. 2022, 52, 1253–1264.
Feng, Y.; Zhang, X.D.; Zheng, G.; Zhang, L.J. Chemotherapy-Induced Brain Changes in Breast Cancer Survivors: Evaluation with Multimodality Magnetic Resonance Imaging. Brain Imaging Behav. 2019, 13, 1799–1814.
McDonald, B.C. Structural Neuroimaging Findings Related to Adult Non-CNS Cancer and Treatment: Review, Integration, and Implications for Treatment of Cognitive Dysfunction. Neurotherapeutics 2021, 18, 792–810.
McDonald, C.R.; White, N.S.; Farid, N.; Lai, G.; Kuperman, J.M.; Bartsch, H.; Hagler, D.J.; Kesari, S.; Carter, B.S.; Chen, C.C.; et al. Recovery of White Matter Tracts in Regions of Peritumoral FLAIR Hyperintensity with Use of Restriction Spectrum Imaging. AJNR Am. J. Neuroradiol. 2013, 34, 1157–1163.
Ahn, S.J.; Kwon, H.; Kim, J.W.; Park, G.; Park, M.; Joo, B.; Suh, S.H.; Chang, Y.S.; Lee, J.-M. Hippocampal Metastasis Rate Based on Non-Small Lung Cancer TNM Stage and Molecular Markers. Front. Oncol. 2022, 12, 781818.
Zhang, D.-F.; Li, Z.-H.; Zhang, Z.-P.; He, Y.-F.; Shang, B.-L.; Xu, X.-F.; Ding, Y.-Y.; Cheng, Y.-Q. Cognitive Changes Are Associated with Increased Blood-Brain Barrier Leakage in Non-Brain Metastases Lung Cancer Patients. Brain Imaging Behav. 2023, 17, 90–99.
Silverman, D.H.S.; Sleurs, C.; Gavrila Laic, R.A.; Amidi, A.; Chen, B.T.; Deprez, S.; McDonald, B.C. Neuroimaging Studies of Cognitive Dysfunction Following Cancer and Treatment. J. Clin. Exp. Neuropsychol. 2025, 1–27.
Liu, S.; Ni, J.; Yan, F.; Yin, N.; Li, X.; Ma, R.; Wu, J.; Zhou, G.; Feng, J. Functional Changes of the Prefrontal Cortex, Insula, Caudate and Associated Cognitive Impairment (Chemobrain) in NSCLC Patients Receiving Different Chemotherapy Regimen. Front. Oncol. 2022, 12, 1027515.
Ivanoska, I.; Trivodaliev, K.; Kalajdziski, S.; Zanin, M. Statistical and Machine Learning Link Selection Methods for Brain Functional Networks: Review and Comparison. Brain Sci. 2021, 11, 735.
Tsegaye, B.; Snell, K.I.E.; Archer, L.; Kirtley, S.; Riley, R.D.; Sperrin, M.; Van Calster, B.; Collins, G.S.; Dhiman, P. Larger Sample Sizes Are Needed When Developing a Clinical Prediction Model Using Machine Learning in Oncology: Methodological Systematic Review. J. Clin. Epidemiol. 2025, 180, 111675.

Figure 1. Schematic representation of the methodological pipeline. Following data acquisition, a structural network was extracted for each participant, linking 90 regions (45 R + 45 L) by quantifying whole-brain deterministic tractography data. The initial feature space comprises the connections retained after applying the N ≥ 3 fiber-tract threshold. This space was further reduced by keeping only features with non-zero values in at least 75% of the subjects and then applying the RFE-CBR method described in Section 2.4. Classification results were subsequently obtained for the optimal feature subset using leave-one-out cross-validation supported by permutation testing.
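The two filtering steps named in the caption (the fiber-count threshold and the 75% prevalence criterion) can be illustrated with a minimal sketch. It is not the authors' implementation: the array layout (subjects × 90 × 90 fiber counts), the choice to zero out sub-threshold edges, and the names `filter_connections`, `min_fibers`, and `min_prevalence` are assumptions made for illustration, and the RFE-CBR step is not reproduced here.

```python
import numpy as np

def filter_connections(fiber_counts, min_fibers=3, min_prevalence=0.75):
    """Reduce symmetric fiber-count matrices to a per-subject edge feature table.

    fiber_counts: array of shape (n_subjects, n_rois, n_rois) (assumed layout).
    Returns the kept edge values and the ROI index pair of each kept edge.
    """
    counts = np.asarray(fiber_counts, dtype=float)
    n_rois = counts.shape[1]
    # Each undirected connection is taken once from the upper triangle.
    rows, cols = np.triu_indices(n_rois, k=1)
    edges = counts[:, rows, cols]  # shape: (n_subjects, n_edges)
    # Assumption: edges with fewer than `min_fibers` tracts count as absent.
    edges = np.where(edges >= min_fibers, edges, 0.0)
    # Fraction of subjects in which each edge is present (non-zero).
    prevalence = (edges > 0).mean(axis=0)
    keep = prevalence >= min_prevalence
    return edges[:, keep], rows[keep], cols[keep]

# Toy data (hypothetical): 8 subjects, 90 ROIs.
n_subjects, n_rois = 8, 90
toy = np.zeros((n_subjects, n_rois, n_rois))
toy[:, 0, 1] = 10   # present in all subjects: kept
toy[:6, 2, 3] = 5   # present in 6 of 8 subjects (75%): kept
toy[:5, 4, 5] = 7   # present in 5 of 8 subjects (62.5%): dropped
toy[:, 6, 7] = 2    # below the fiber threshold everywhere: dropped
toy = toy + toy.transpose(0, 2, 1)  # make each matrix symmetric

features, roi_a, roi_b = filter_connections(toy)
print(features.shape, list(zip(roi_a.tolist(), roi_b.tolist())))
```

The returned index arrays make it possible to map each surviving feature column back to its pair of regions, which is what any later interpretation of selected connections would rely on.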

Figure 2. Summary of the 185 structural connections remaining after thresholding. The circular plot displays individual connections between ROIs, color-coded as left intra-hemispheric, right intra-hemispheric, or inter-hemispheric. The box colors correspond to the degree centrality of each ROI.

Figure 3. Connectivity matrix showing connections between brain areas, with each color denoting the number of between-area connections.

Figure 4. Confusion matrix of the Gaussian SVM (upper left) and ROC curves illustrating the performance of the Gaussian SVM (upper middle), linear SVM (upper right), k-NN (bottom left), LDA (bottom middle), and RF (bottom right) models.
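For readers relating a confusion matrix to the scores reported in Table 1, the following minimal sketch shows the standard definitions of sensitivity, specificity, F1-score, and accuracy. The counts are hypothetical and are not the study's confusion matrix. Treating "weighted accuracy" as balanced accuracy (the mean of sensitivity and specificity) is an assumption; the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Assumption: one common reading of "weighted accuracy".
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=9, fn=1, tn=8, fp=2)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

When the two groups differ in size, plain accuracy and balanced accuracy diverge, which is one reason a class-weighted figure is often preferred for patient-versus-control comparisons.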

Figure 5. The optimal features: (a) mean value of each selected feature for the HC and SCLC groups, with left intra-hemispheric connections at the top and, at the bottom, right intra-hemispheric (left corner) and inter-hemispheric (left) connections; (b) the distribution of the selected connectivity features. The color of each connection represents the difference in mean connectivity strength between HC and SCLC. The color bar at the bottom right indicates the connectivity alteration from HC to SCLC.

Table 1. Classification performance and validation results of optimal feature subset using different classification methods.

Classifier	Accuracy	Sensitivity	Specificity	F1-Score	Area Under the ROC Curve
SVM-gaussian	0.92 ¹	0.93	0.91	0.92	0.94 ¹
SVM-linear	0.82 ¹	0.79	0.86	0.82	0.93 ¹
k-NN	0.77 ²	0.71	0.82	0.76	0.92 ¹
LDA	0.78 ²	0.79	0.77	0.78	0.92 ¹
RF	0.75 ²	0.73	0.76	0.74	0.85 ²
Soft-Voting Ensemble	0.86 ¹	0.88	0.85	0.86	0.90 ¹

¹ p < 0.001, 1000 permutations; ² p < 0.01, 1000 permutations.
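The permutation-based significance levels in the footnote can be illustrated with a minimal sketch of a label-shuffling test. Several simplifications are assumed here: the labels and predictions are hypothetical toy values, the predictions are held fixed while labels are shuffled (a full analysis would refit the classifier on each shuffled label set inside the cross-validation loop), and the p-value uses the common "add one" estimator, which may differ from the one used in the study. The names `permutation_p_value` and `accuracy` are illustrative.

```python
import random

def permutation_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score,
    with the +1 correction so the estimate is never exactly zero."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

def accuracy(labels, predictions):
    """Proportion of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

# Hypothetical toy data: 20 subjects, 18 classified correctly.
labels = [0] * 10 + [1] * 10
predictions = [0] * 9 + [1] + [1] * 9 + [0]
observed = accuracy(labels, predictions)

rng = random.Random(0)  # fixed seed for repeatability
null_scores = []
for _ in range(1000):
    shuffled = labels[:]
    rng.shuffle(shuffled)  # break the link between labels and predictions
    null_scores.append(accuracy(shuffled, predictions))

p_value = permutation_p_value(observed, null_scores)
print(f"observed accuracy = {observed:.2f}, permutation p = {p_value:.4f}")
```

Under this estimator, 1000 permutations give a smallest attainable value of 1/1001 (about 0.000999), so a reported p < 0.001 would correspond to no shuffled run matching or exceeding the observed score.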

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
