Papillary-Muscle-Derived Radiomic Features for Hypertrophic Cardiomyopathy versus Hypertensive Heart Disease Classification

Purpose: This study aimed to assess the value of radiomic features derived from the myocardium (MYO) and papillary muscle (PM) for left ventricular hypertrophy (LVH) detection and hypertrophic cardiomyopathy (HCM) versus hypertensive heart disease (HHD) differentiation. Methods: A total of 345 subjects who underwent cardiovascular magnetic resonance (CMR) examinations were analyzed. After quality control and manual segmentation, 3D radiomic features were extracted from the MYO and PM. The data were randomly split into training (70%) and testing (30%) datasets. Feature selection was performed on the training dataset. Five machine learning models were evaluated using the MYO, PM, and MYO+PM features in the detection and differentiation tasks. The optimal differentiation model was further evaluated using CMR parameters and combined features. Results: Six features were selected for each of the MYO, PM, and MYO+PM groups. The support vector machine models performed best in both the detection and differentiation tasks. For LVH detection, the highest area under the curve (AUC) was 0.966 in the MYO group. For HCM vs. HHD differentiation, the best AUC was 0.935 in the MYO+PM group. In the differentiation task, the radiomics models performed significantly better than the CMR parameter models (p = 0.002). Conclusions: The radiomics model with the MYO+PM features showed similar performance to the models developed from the MYO features in the detection task, but outperformed the models developed from the MYO or PM features in the differentiation task. In addition, the radiomics models performed better than the CMR parameter models.


Introduction
Among the various causes of left ventricular hypertrophy (LVH), hypertrophic cardiomyopathy (HCM) and hypertensive heart disease (HHD) are the most common. However, the detection of LVH and the differentiation between HCM and HHD are often clinically challenging, especially when the hypertrophy is mild to moderate [1]. Meanwhile, the prognosis of patients with HCM varies. In severe and untreated HCM patients, complications include heart failure or even sudden cardiac death [2,3]. For HHD patients, LVH is typically observed in those with long-term poorly controlled hypertension, and without proper treatment, this can progressively develop into heart failure [4]. More importantly, the treatments for these two diseases are quite different [5]. Therefore, based on accurate LVH detection, the early and precise differential diagnosis between HCM and HHD is crucial for treatment plan decisions and prognosis improvement [2].
[Figure caption, beginning missing] ... together in an HC subject (visualization was performed with 3D Slicer); (c) describes the methods used in the feature selection; (d) shows how the data partition was performed; (e) describes the ML pipeline: selected MYO and PM features were evaluated for detection and differentiation performance with different ML methods; (f) results were compared between different groups and ML methods, and the evaluation was performed with the ROC curve, calibration curve, and decision curve. HCM, hypertrophic cardiomyopathy; HHD, hypertensive heart disease; VOI, volume of interest; LASSO, least absolute shrinkage and selection operator; ML, machine learning; MYO, myocardium; PM, papillary muscle; LVH, left ventricular hypertrophy.

Study Population
In this retrospective study, patients with LVH were consecutively enrolled at Renji hospitals from August 2018 to December 2021, and healthy control (HC) subjects were randomly selected from our database.
The inclusion criteria for the HHD group were as follows: (1) echocardiogram (ECG) demonstration of a hypertrophic LV (maximal LV wall thickness > 11 mm or LV-mass-to-body-surface-area ratio > 115 g/m² for men or > 95 g/m² for women) in the absence of other cardiac or systemic diseases [20]; (2) diagnosis of arterial hypertension [21].
The HC group consisted of healthy volunteers who demonstrated normal cardiac dimensions and volumes, normal cardiac function, and the absence of late gadolinium enhancement. None of the control subjects had a history of known cardiac disease, including cardiac surgery or interventions.
Exclusion criteria for all subjects were an established diagnosis of FD, cardiac amyloidosis, severe valvular disease, aortic stenosis, iron deposition, evidence of inflammatory processes in the myocardium or pericardium, history of ST segment elevation myocardial infarction, and LVH caused by athlete's heart [22].
The CMR data were exported in the Digital Imaging and Communications in Medicine (DICOM) format. Anonymization and CMR function analyses were performed on a workstation with the dedicated post-processing software cvi42 (Version 5.13.0, Alberta, Canada). All images were subjected to manual quality control to exclude those of inadequate quality before further assessment. Basic CMR parameters that summarize LV function (including LV mass, LV end-diastolic volume (LVEDV), and LV ejection fraction (LVEF)) were obtained using cvi42.

Definition of Volume of Interest
Limited by copyright, we were not able to export contours directly from the cvi42 software. To export the MYO and PM contours, the open-source software ITK-SNAP (Version 3.8.0) was used for the following analysis. All images were segmented simultaneously by two cardiologists (>4 years of CMR experience) in consensus. After that, another experienced cardiologist (>5 years of CMR experience) checked the contours and made adjustments if necessary. In this study, only end-diastolic (ED) cine images were considered for analysis because the border between the endocardium and PM is ambiguous in end systole. The VOI of the MYO was defined as the area between the LV endocardium and epicardium [24]. For the VOI of the PM, we delineated the PM contours according to anatomical characteristics, movement pattern, and previous segmentation examples [12,25]. Non-PM structures (e.g., papillary muscle variants and apical-basal muscle bundles) and trabeculations were excluded [9,26].
Supplementary Figure S1 shows a representative apical-basal muscle bundle, which in our datasets is hard to distinguish from the PM using short-axis cine images alone.

Feature Extraction and Feature Selection
Image filters (wavelet, Laplacian of Gaussian (LoG), and gradient) were applied to our datasets, and 3D radiomic features were extracted for both the MYO and PM VOIs using PyRadiomics [27]. Detailed feature descriptions and calculation equations are available on the official PyRadiomics website: https://pyradiomics.readthedocs.io/en/v3.0.1/radiomics.html (accessed on 1 October 2022).
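As an illustration of the extraction step, the following is a minimal sketch of how the three filters could be enabled in PyRadiomics. It is not the authors' script: the inclusion of the unfiltered "Original" image type, the LoG sigma values, the file paths, and the mask label values are all assumptions, and the function name `extract_voi_features` is hypothetical.

```python
# Image types to enable. The paper names wavelet, LoG, and gradient filters;
# "Original" and the LoG sigma values below are assumptions for illustration.
IMAGE_TYPES = {
    "Original": {},
    "Wavelet": {},
    "LoG": {"sigma": [1.0, 2.0, 3.0]},
    "Gradient": {},
}


def extract_voi_features(image_path, mask_path, label):
    """Extract radiomic features for one VOI (hypothetical helper).

    image_path and mask_path point to an ED cine volume and its segmentation;
    label is the integer value that marks the VOI (MYO or PM) in the mask.
    """
    # Imported lazily so the configuration above can be inspected
    # without PyRadiomics being installed.
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor()
    extractor.disableAllImageTypes()
    extractor.enableImageTypes(**IMAGE_TYPES)
    extractor.enableAllFeatures()

    result = extractor.execute(image_path, mask_path, label=label)
    # Drop the "diagnostics_" entries, which describe the run, not the VOI.
    return {k: v for k, v in result.items() if not k.startswith("diagnostics_")}
```

Calling the helper once per VOI, for example with one label value for the MYO and another for the PM in the same mask file, would yield one feature dictionary per structure and subject.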
After feature extraction, the data were randomly split into training and testing datasets in a 7:3 ratio.
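A 7:3 split of this kind can be sketched with scikit-learn as follows. The group sizes come from the cohort described in this paper (158 HCM, 72 HHD, and 345 − 230 = 115 HC subjects); stratifying by group and the random seed are assumptions, since the text only states that the split was random.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Group labels for the 345 subjects; subject IDs are placeholders.
labels = ["HC"] * 115 + ["HCM"] * 158 + ["HHD"] * 72
subject_ids = list(range(len(labels)))

# 70% training, 30% testing. Stratification (an assumption) keeps the
# group proportions similar in both partitions.
train_ids, test_ids, y_train, y_test = train_test_split(
    subject_ids, labels, test_size=0.3, stratify=labels, random_state=42
)

print(len(train_ids), len(test_ids))
print(Counter(y_train), Counter(y_test))
```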
Feature selection was performed on the training dataset in three steps: (1) the Pearson correlation coefficient (ρ) was calculated between features, and highly correlated features (ρ > 0.8) were excluded; (2) a least absolute shrinkage and selection operator (LASSO) regression algorithm was employed to select important features [28]; (3) the remaining features were ranked using the Boruta method [29].
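The first two selection steps can be sketched as below on synthetic data. This is an illustration under stated assumptions, not the authors' code: the greedy keep-first rule, the use of the absolute correlation, feature standardization, and cross-validated choice of the LASSO penalty are all assumptions. The Boruta ranking step is omitted because it relies on a separate third-party package.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.8):
    """Greedy filter: keep a column only if its absolute Pearson correlation
    with every previously kept column does not exceed the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def lasso_select(X, y, random_state=0):
    """Return indices of columns with non-zero LASSO coefficients."""
    X_scaled = StandardScaler().fit_transform(X)
    model = LassoCV(cv=5, random_state=random_state).fit(X_scaled, y)
    return [j for j, coef in enumerate(model.coef_) if coef != 0.0]


# Synthetic training data: columns 0 and 1 carry signal, column 2 is a
# near-copy of column 0, and columns 3 to 5 are pure noise.
rng = np.random.default_rng(0)
n_subjects = 160
informative = rng.normal(size=(n_subjects, 2))
duplicate = informative[:, [0]] + 0.01 * rng.normal(size=(n_subjects, 1))
noise = rng.normal(size=(n_subjects, 3))
X_train = np.hstack([informative, duplicate, noise])
y_train = (informative[:, 0] + informative[:, 1] > 0).astype(float)

kept = drop_correlated(X_train)
selected = [kept[j] for j in lasso_select(X_train[:, kept], y_train)]
print("after correlation filter:", kept)
print("after LASSO:", selected)
```

Both steps use the training partition only, which matches the description above and avoids leaking information from the testing dataset into the selection.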

Model Development
Five machine learning classifiers were evaluated: AdaBoost (AB), K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and decision tree (DT). A fivefold cross-validation (CV) with a grid search was performed within the training dataset to determine the best combination of model parameters. The performance of the different models was compared based on the area under the curve (AUC), accuracy, precision, recall, and calibration curve.
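The model development step could look roughly like the sketch below, which runs a fivefold cross-validated grid search for each of the five classifiers on synthetic data. The parameter grids, the standardization step, the stratified folds, and the use of AUC as the selection score are assumptions; the paper does not report the values searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for six selected training features (not the study data).
X_train, y_train = make_classification(
    n_samples=160, n_features=6, n_informative=4, n_redundant=0,
    weights=[0.7, 0.3], class_sep=2.0, random_state=0,
)

# Hypothetical search grids for the five classifiers named in the text.
candidates = {
    "AB": (AdaBoostClassifier(random_state=0),
           {"clf__n_estimators": [50, 100], "clf__learning_rate": [0.5, 1.0]}),
    "KNN": (KNeighborsClassifier(),
            {"clf__n_neighbors": [3, 5, 7]}),
    "SVM": (SVC(class_weight="balanced", random_state=0),
            {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"clf__n_estimators": [100, 200], "clf__max_depth": [3, None]}),
    "DT": (DecisionTreeClassifier(random_state=0),
           {"clf__max_depth": [2, 3, 5]}),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipeline = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipeline, grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    cv_auc[name] = search.best_score_
    print(f"{name}: CV AUC = {search.best_score_:.3f}, {search.best_params_}")
```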

Evaluation of MYO and PM Features
For the multi-feature analysis, we aimed to assess the value of the PM features. Therefore, three radiomics groups were designed: (1) the MYO group, including the best 2 × N MYO features; (2) the PM group, including the best 2 × N PM features; (3) the MYO+PM group, including the best N MYO and N PM features. Subsequently, different machine learning (ML) models were tested and evaluated in the three groups.
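The group construction described above can be written compactly as follows. The function name and the feature names are placeholders, not the features reported in this study; only the 2 × N, 2 × N, and N + N rule is taken from the text.

```python
def build_feature_groups(myo_ranked, pm_ranked, n):
    """Build the three radiomics groups from importance-ranked feature lists.

    Each group contains 2 * n features, so the groups are compared
    with the same number of model inputs.
    """
    if len(myo_ranked) < 2 * n or len(pm_ranked) < 2 * n:
        raise ValueError("each ranked list needs at least 2 * n features")
    return {
        "MYO": myo_ranked[: 2 * n],
        "PM": pm_ranked[: 2 * n],
        "MYO+PM": myo_ranked[:n] + pm_ranked[:n],
    }


# Placeholder names, ordered from most to least important.
myo_ranked = [f"myo_feature_{i}" for i in range(1, 9)]
pm_ranked = [f"pm_feature_{i}" for i in range(1, 13)]

groups = build_feature_groups(myo_ranked, pm_ranked, n=3)
for name, features in groups.items():
    print(name, features)
```

Keeping the feature count equal across groups means that any gain in the MYO+PM group cannot be attributed simply to a larger number of inputs.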

Comparison of Radiomic Models and Clinical Data Models
To compare our proposed radiomics models with clinical data models, an optimal ML model was selected and developed using CMR parameters (LVEF, LVEDV index, and LV mass index). We also combined the CMR parameters and radiomic features to examine the possible synergistic effects between CMR parameters and radiomics information. The development and evaluation processes were the same as those previously described.

Statistics
All statistical analyses were performed using SPSS (Version 26) and Python (Version 3.7). To compare the means, Student's t-test, the paired t-test, and the Mann-Whitney U-test were conducted as appropriate. Statistical significance was set at p < 0.05. Model performance was evaluated using the receiver operating characteristic (ROC) curves and AUC, accuracy, precision, and recall, and a comparison between the AUCs was performed with the DeLong test [30]. The Hosmer-Lemeshow test was used for the calibration curve evaluation [31]. Weights were calculated for the HC, HCM, and HHD groups, where appropriate, to compensate for the unbalanced datasets [32].
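One common way to compute such group weights is the inverse-frequency ("balanced") rule, sketched below. Whether the authors used exactly this rule is an assumption, and the counts shown are the whole-cohort HCM and HHD numbers reported in this paper rather than the training-set counts on which weights would normally be computed.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


weights = balanced_class_weights({"HCM": 158, "HHD": 72})
for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

With these weights, each group contributes the same total weight (158 × 0.7278 ≈ 72 × 1.5972 ≈ 115), so the minority HHD group is not dominated by the larger HCM group during training.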

Demographic and CMR-Based Clinical Characteristics
The detailed inclusion and exclusion pipeline is shown in Figure 2. A total of 230 LVH patients (N_HCM = 158, N_HHD = 72) who satisfied the inclusion criteria and passed quality control were included. Subjects in the HC group were matched with patients with LVH in terms of age and sex (p = 0.866 and 0.364, respectively) at a 1:2 ratio. Table 1 shows the demographic and CMR data of the cohort.
For the CMR parameters, subjects with LVH showed a significantly increased LV mass (144.3 g vs. 83.2 g, p < 0.001) and LV mass index (80.0 g/m² vs. 46.3 g/m², p < 0.001). However, for the HHD group, we found that their LVEFs were lower than those in the HCM and HC groups (55% vs. 68% and 65%, respectively, p < 0.001 for both groups), whereas the LVEDVs were higher than those in the HCM and HC groups (163 mL vs. 130 mL and 126 mL, respectively, p < 0.001 for both groups). This phenomenon could be partly explained by the fact that HHD patients usually have had hypertension for many years; therefore, some patients already had an enlarged left ventricle.
Figure 2. Inclusion and exclusion of LVH and HC subjects. All subjects were obtained from our datasets between 2018 and 2021. For the LVH group, images with inadequate quality were excluded first. Subjects with myocardial infarction, systemic diseases, or cardiac surgery were then excluded. For the HC group, images with satisfactory quality were matched to the LVH group by sex and age at a 1:2 ratio.

Feature Extraction and Selection
A total of 2632 features (N_MYO = 1316 and N_PM = 1316) were extracted; the detailed feature information is available in Supplementary Table S1. After correlation and LASSO feature selection, 60 features (N_MYO = 28, N_PM = 32) remained for the differentiation task and 20 features (N_MYO = 8, N_PM = 12) remained for the detection task. These features were then ranked by the relative importance calculated with the Boruta method. Table 2 shows the six most important features for the differentiation task derived from the MYO and PM. The gradient gray-level co-occurrence matrix (GLCM) correlation was the most important in the MYO group, and the shape maximum 2D diameter slice was the most important in the PM group. Complete ranked feature lists for the detection and differentiation tasks are provided in Supplementary Tables S2 and S3.

Multi-Feature Analysis of MYO, PM, and MYO+PM Groups
For both the detection and differentiation tasks, we used N = 3 for further analysis; therefore, we had six features in the MYO, PM, and MYO+PM groups. Among the six features in the MYO+PM group, there were three shape features (MYO: shape sphericity and shape elongation; PM: shape maximum 2D diameter slice).
For the detection task, the performances of the different models on the training dataset are shown in Figure 3. After comparing the AUC, accuracy, and calibration curves (Supplementary Figure S2a), the SVM models were selected. The performance of the SVM models with the testing dataset is presented in Table 3. Among the MYO, PM, and MYO+PM groups, the model developed with the MYO features achieved the highest AUC (0.966 vs. 0.944 (MYO+PM) and 0.772 (PM), p = 0.924 and p < 0.001, respectively); however, accuracy, precision, and recall were slightly higher in the MYO+PM group. The results indicated that the MYO features had efficacy similar to that of the MYO+PM features in the detection task.
For the differentiation task, the z-score distribution of the MYO+PM group is shown in Figure 4. The performances of the different models on the training dataset are shown in Figure 3, and the calibration curves are shown in Supplementary Figure S2b. Similar to the detection task, SVM was selected for further analysis. The results of the SVM model showed that the MYO+PM group had a significantly higher AUC (0.935 vs. 0.875 (MYO) and 0.716 (PM), p = 0.040 and 0.002, respectively), and the accuracy increased by 4.4 percentage points (87.0% vs. 82.6%). The ROC and calibration curves for the differentiation task with the SVM models are plotted in Figure 5, which shows excellent calibration (all p > 0.05) for both the training and testing data. The clinical usefulness of the MYO, PM, and MYO+PM models is shown as decision curves in Figure 6.
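To make the "p > 0.05" calibration criterion concrete, the following is a minimal sketch of the Hosmer-Lemeshow test on simulated data. The number of risk groups (10), the equal-size grouping, and the degrees of freedom (groups minus 2) follow the usual textbook form of the test and are assumptions about how it was applied here; the data are synthetic.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow statistic and p-value over equal-size risk groups."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    order = np.argsort(y_prob)
    statistic = 0.0
    for idx in np.array_split(order, n_groups):
        n = len(idx)
        observed = y_true[idx].sum()   # observed events in the group
        expected = y_prob[idx].sum()   # events predicted by the model
        mean_p = expected / n
        statistic += (observed - expected) ** 2 / (n * mean_p * (1 - mean_p))
    p_value = chi2.sf(statistic, n_groups - 2)
    return statistic, p_value


# Simulated outcomes drawn from known probabilities.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.95, size=2000)
y = rng.binomial(1, p_true)

stat_good, p_good = hosmer_lemeshow(y, p_true)        # calibrated predictions
stat_bad, p_bad = hosmer_lemeshow(y, 0.5 * p_true)    # risks halved on purpose
print(f"calibrated:    chi2 = {stat_good:.2f}, p = {p_good:.3g}")
print(f"miscalibrated: chi2 = {stat_bad:.2f}, p = {p_bad:.3g}")
```

A large p-value means the test found no evidence of disagreement between predicted and observed event rates; it does not by itself prove perfect calibration, especially in small test sets.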

Comparison of Radiomics Models to CMR Parameter Models
As shown in Table 4, the radiomics model showed significant improvements compared with the CMR parameter model (AUC: 0.935 vs. 0.774, p = 0.002), and no significant improvement was observed when comparing the radiomics + CMR parameter model with the radiomics model (AUC: 0.935 vs. 0.906, p = 0.117). We also observed that the CMR parameter group showed biased results (precision = 0.693, recall = 0.409, and F1 = 0.474) in the differentiation task. The calibration curves for the models developed with the CMR parameters, radiomics, and radiomics + CMR parameters are shown in Figure 7. The clinical usefulness of the radiomics, CMR, and radiomics + CMR models is shown as decision curves in Figure 6.
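Decision curves of the kind referred to above plot net benefit against the threshold probability. The sketch below uses the standard net-benefit definition, TP/n − (FP/n) × pt/(1 − pt), on a ten-subject toy example; the data are invented for illustration and the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every subject as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example: 4 positives among 10 subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.3, 0.4, 0.05, 0.15]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}, treat none: 0.00")
```

At a threshold of 0.5, the toy model flags four subjects (three true positives and one false positive), giving a net benefit of 0.3 − 0.1 = 0.2, compared with −0.2 for classifying everyone as positive. A model is considered clinically useful over the range of thresholds where its curve lies above both reference strategies.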

Discussion
In this study, we investigated the LVH detection and HCM vs. HHD differentiation ability of the MYO, PM, and MYO+PM radiomic features; we also compared the results of the MYO+PM group with those of the CMR parameter group.

Summary of Main Findings
The main findings of our study were as follows:
• The MYO and MYO+PM groups showed great LVH detection based on the AUC and accuracy;
• The MYO+PM group outperformed the MYO group on the HCM vs. HHD differentiation task;
• Our proposed radiomics models showed significantly better performance than the CMR parameter models;
• Our methods showed excellent calibration results and high clinical usefulness, as shown by the calibration curves and decision curves.
The main findings are summarized in Table 5. Bold numbers indicate the highest AUC and accuracy with the selected feature group.

Discussion Based on Results
Radiomics analysis has been widely applied for the differentiation of cardiomyopathies. However, to our knowledge, this is the first study focusing on the radiomics of the PM. Previous radiomics studies on cardiomyopathy classification were mainly based on the MYO features. Ulf et al. extracted radiomic features from native T1 MYO and selected six texture features; their final model achieved a 0.80 accuracy and 0.89 AUC on the test data [24]. Xu et al. combined deep learning (DL) and radiomics methods for ECG-based LVH aetiology differentiation; their final results showed an AUC of 0.839 for HCM vs. HHD classification [33]. Izquierdo et al. compared their radiomics models to CMR-index-based models and found no significant differences [19]. Although the model performance varied with the datasets, in this study, our models showed satisfactory HCM vs. HHD classification results (accuracy = 87.0% vs. 80.0%, AUC = 0.94 vs. 0.89) compared with previous studies. Using the CMR parameter model as the baseline, our radiomics model showed significant improvement (AUC: 0.935 vs. 0.774, p = 0.002). Based on these results, we demonstrated the effectiveness and superiority of the proposed radiomics models.
In addition, our models achieved excellent results with limited features (three MYO + three PM features), and the number of features selected was also consistent with several previous studies [24,34]. Among the six features selected, three belonged to the shape feature class (MYO: shape sphericity and shape elongation; PM: shape maximum 2D diameter slice), which made our concise MYO+PM radiomics model more explainable. Radiologically, hypertrophy in HCM patients usually shows a concentric pattern, while the hypertrophy of HHD is mainly affected by high blood pressure, which made "MYO sphericity" and "MYO elongation" more reasonable from a clinical perspective. "PM maximum 2D diameter slice" represented the largest distance between the surface mesh vertices at the slice level, which could reflect the different distribution patterns of the PM in HCM and HHD. These results also validated our observation that the PM's morphology can facilitate the classification between HCM and HHD.
From a clinical point of view, no diagnosis relies purely on radiomic features; therefore, we also developed CMR parameter models and radiomics + CMR combined models. Consistent with a previous study, our CMR-parameter-based model showed unsatisfactory prediction accuracy of only 81.7% and 71.0% on LVH detection and HCM vs. HHD differentiation, respectively [1]. After the incorporation of radiomic features, the performance improved in both tasks, but for HCM vs. HHD, the AUC and accuracy were still worse than those of the models using radiomic features alone (Table 5).

Technical Perspectives
Despite these promising findings, we considered the possibility of overfitting. Overfitting is a common problem in radiomics studies. Therefore, we implemented calibration curves for different ML models to facilitate both model selection (Supplementary Figure S2) and model performance evaluation (Figures 5 and 7). Although the SVM and KNN methods both showed excellent calibration results, the SVM models showed higher performance than the KNN models in both the detection and differentiation tasks (Figure 4). Therefore, the SVM models were selected for further analysis. The following experimental results validated that the SVM models performed well and did not show an overfitting tendency with the six radiomic features as the inputs (Figure 5).
From a technical perspective, to deal with unbalanced datasets, class weights were calculated for each group when appropriate. Although we did not compare the performance of our "balanced" dataset with that of an unbalanced dataset, our results showed the following: except for the PM group, the MYO and MYO+PM groups exhibited relatively high precision (all > 0.8) and recall (all > 0.75) on both the training (Figure 3) and testing (Tables 3 and 4) datasets.

Clinical Perspectives
The implementation of both the MYO and MYO+PM features showed excellent LVH detection performance; however, the MYO+PM group outperformed the MYO group on the HCM vs. HHD differentiation (p = 0.040). We also found that the incorporation of the CMR parameters into the radiomic information did not facilitate HCM vs. HHD differentiation. Our results suggest to clinicians that the PM is a potentially useful diagnostic tool for LVH. In addition, we suggest that the calibration curve should be examined for every machine learning algorithm, if appropriate.

Limitations
This study had some limitations. First, DL-based automatic segmentation of the PM remains immature (DSC = 0.79 in previous studies) [25,35]. In this study, we performed manual segmentation and quality control of selected VOIs. To increase the segmentation accuracy and minimize interobserver variance, two cardiologists completed the segmentation together in consensus. Although time consuming, accurate manual segmentation is crucial for study reliability and repeatability. We also look forward to developing a precise and stable DL segmentation model for further implementation of deep PM radiomics studies.
Second, LVH is not a singular disease entity. The different pathologies of LVH exhibit various clinical behaviors, treatment responses, and prognoses. HCM and HHD are the two most common causes of LVH. Therefore, only patients with HCM or HHD were included in this study. If the sample size permits, we would also like to validate our methods using other available LVH-related disease datasets.

Conclusions
To the best of our knowledge, this is the first study on PM-derived radiomic features. Our results showed that the proposed PM radiomics exhibited excellent performance in LVH detection and HCM vs. HHD differentiation and outperformed clinical data models. Moreover, based on our experiments, we hypothesize that PM radiomics have great potential for the classification of other cardiac diseases and could facilitate clinicians' decision-making.

Supplementary Materials:
The following Supporting Information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13091544/s1: Figure S1: An example of apical basal muscle bundle; Figure S2: Comparison of models using calibration curves; Table S1: Full list of extracted features; Table S2: Full list of features ranked using Boruta method for detection task; Table S3: Full list of features ranked using Boruta method for differentiation task. Refs. [27,36] are cited in the supplementary materials.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
No new data were created, and the data are unavailable due to privacy or ethical restrictions.

Acknowledgments:
We would like to thank Chongwen Wu in Shanghai Renji Hospital for technical support.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: