Machine Learning Based on Computed Tomography Pulmonary Angiography in Evaluating Pulmonary Artery Pressure in Patients with Pulmonary Hypertension

Background: Right heart catheterization (RHC) is the gold standard for evaluating hemodynamic parameters of pulmonary circulation, especially pulmonary artery pressure (PAP) for diagnosis of pulmonary hypertension (PH). However, the invasive and costly nature of RHC limits its widespread application in daily practice. Purpose: To develop a fully automatic framework for PAP assessment via machine learning based on computed tomography pulmonary angiography (CTPA). Materials and Methods: A machine learning model was developed to automatically extract morphological features of the pulmonary artery and the heart from CTPA cases collected at a single center between June 2017 and July 2021. Patients with PH received CTPA and RHC examinations within 1 week. Eight substructures of the pulmonary artery and heart were automatically segmented through our proposed segmentation framework. Eighty percent of patients were used for the training data set and twenty percent for the independent testing data set. PAP parameters, including mPAP, sPAP, dPAP, and TPR, were defined as ground-truth. A regression model was built to predict PAP parameters, and a classification model was built to separate PH patients by mPAP and sPAP with cut-off values of 40 mm Hg and 55 mm Hg, respectively. The performances of the regression model and the classification model were evaluated by the intraclass correlation coefficient (ICC) and the area under the receiver operating characteristic curve (AUC), respectively. Results: Study participants included 55 patients with PH (men 13; age 47.75 ± 14.87 years). The average Dice score for segmentation increased from 87.3% ± 2.9 to 88.2% ± 2.9 with the proposed segmentation framework. After feature extraction, some of the AI automatic extractions (RPAd, AAd, RVd, and LAd) achieved good consistency with the manual measurements.
The differences between them were not statistically significant (t = 1.222, p = 0.227; t = −0.347, p = 0.730; t = 0.484, p = 0.630; t = −0.320, p = 0.750, respectively). The Spearman test was used to find key features that are highly correlated with PAP parameters. Correlations between pulmonary artery pressure and CTPA features show correlations between mPAP and RAd/LAd, LAd, LVd, and LAa (r = 0.333, p = 0.012; r = −0.400, p = 0.002; r = −0.208, p = 0.123; r = −0.470, p < 0.001; respectively). The ICC between the output of the regression model and the ground-truth from RHC of mPAP, sPAP, and dPAP were 0.934, 0.903, and 0.981, respectively. The AUC of the receiver operating characteristic curve of the classification model of mPAP and sPAP were 0.911 and 0.833, respectively. Conclusions: The proposed machine learning framework on CTPA enables accurate segmentation of the pulmonary artery and heart and automatic assessment of the PAP parameters, and it can accurately stratify PH patients by mPAP and sPAP. Results of this study may provide additional risk stratification indicators in the future with non-invasive CTPA data.


Introduction
Pulmonary hypertension (PH) is a malignant pulmonary circulation disease characterized by two aspects: a progressive increase in pulmonary artery pressure and a poor natural prognosis. Untreated PH can lead to right ventricular failure due to hypertrophy and remodeling of the right ventricle [1,2]. Further, the median survival in patients with PH without therapy is approximately 2.8 years [2]. Given that PH is a chronic and progressive disease, it should be diagnosed and treated as early as possible.
PH is defined as a resting mean pulmonary artery pressure (mPAP) of 25 mm Hg or above, and critical PH is defined as 20-25 mm Hg [1,3]. Currently, right heart catheterization (RHC) is the only way to measure mPAP accurately. Therefore, a diagnosis of PH is only accepted when confirmed (or excluded) by RHC [4].
A non-invasive and effective method to diagnose PH is essential [5]. Although RHC is the most reliable way to establish the diagnosis of PH, as an invasive procedure, it might be delayed due to its potential complications [6]. In addition, this invasive procedure requires anesthesia, so it is not suitable for early screening, and not all medical institutions are equipped to perform RHC. As a non-invasive screening method that provides more detail in high-resolution 3D images [7], computed tomography pulmonary angiography (CTPA) can be routinely used to observe the structure of pulmonary blood vessels, pulmonary parenchyma, and the heart to analyze or exclude possible PH causes. The morphological features in CTPA images serve as important references to assist clinicians in diagnosing PH. If the morphological features hidden in CTPA images are mined further, we might be able to build a regression model to predict the PAP value based on CTPA images.
Using CTPA images to assist physicians in risk stratification of patients with PH is another meaningful line of study. Previous studies have shown that chronic obstructive pulmonary disease (COPD) is one of the most common causes of PH [8,9]. Although patients with COPD often present mild or moderate PH, typically with mPAP between 20 and 40 mm Hg, mild mPAP elevations can also lead to poor prognosis [9]. Patients with severe PH are defined as having a resting mPAP > 35 to 40 mm Hg, and a sharp increase in mPAP often leads to right ventricular failure. Moreover, these patients often have severe hypoxemia and exhibit hypocapnia, reducing their life expectancy [10]. In addition, Nadrous et al. [11] reported that idiopathic pulmonary fibrosis patients with pulmonary artery systolic pressure (sPAP) of >50 mm Hg had a higher mortality, with 1-year and 3-year mortalities of 56% and 68%. Further, in patients with ≥moderate primary mitral regurgitation [12], the risk of adverse events is higher for patients with sPAP ≥ 55 mm Hg. Another study pointed out that in patients with asymptomatic primary mitral valve regurgitation or flail leaflet, the prognostic value of peak exercise sPAP ≥ 50 mm Hg is significant [13]. Therefore, it is of great significance to explore whether patients with PH can be classified by an mPAP of 40 mm Hg and an sPAP of 55 mm Hg using non-invasive measurements, in order to carry out risk stratification of PH patients.
Artificial intelligence (AI) has great potential in cardiovascular function assessment. One of the major applications of AI in this field is to assess functional parameters from morphological features in CTPA of patients, which includes constructing regression models and classification models to help with the diagnosis of PH. Liu et al. [14,15] have demonstrated that some specific cardiovascular parameters of CTPA could serve as predictors to assess chronic thromboembolic pulmonary hypertension (CTEPH) severity and right ventricular function. Melzig et al. [16] pointed out that some morphological parameters of CTPA such as 3D volumes are valuable in noninvasively estimating pulmonary artery functional parameters, which means that CTPA data are beneficial for noninvasively predicting PH.
Dong et al. [17] claimed that several morphological parameters of CTPA performed well in evaluating the severity of right ventricular dysfunction. Beyond the pulmonary field, in the study that shows the performance of coronary CT angiography-derived fractional flow reserve (CT-FFR) in diagnosing ischemia in myocardial bridging (MB), Zhou et al. [18] found that machine learning-based coronary CT-FFR showed good diagnostic performance compared to invasive FFR.
In this study, aiming to evaluate the value of CTPA morphological data analysis in the diagnosis of PH, we proposed a methodology to extract and select the morphological features from CTPA images, and then build regression and classification models based on machine learning methods. Using this methodology, we implemented a workflow for the auxiliary diagnosis of PH based on the analysis of CTPA images.

Materials and Methods
The overall workflow of the proposed method is shown in Figure 1. After the acquisition and selection of the study population, CTPA data and RHC data of the corresponding patients were both collected and recorded. Then, four main steps were followed to finish the work. First, we accomplished the annotation task via computer software and the segmentation of eight selected substructures of the heart and pulmonary artery from the CTPA images using a proposed segmentation framework. After that, we extracted the morphological features and their second-order features within the substructures for subsequent analysis. To establish a better regression model to predict the mPAP, sPAP, pulmonary artery diastolic pressure (dPAP), and total pulmonary resistance (TPR) of a specific patient, we selected morphological features that are highly correlated with the pulmonary arterial pressure and reduced the dimensions of the selected feature matrix. Then, we built a regression model to predict the PAP parameters and implemented a series of statistical analyses to verify the consistency between predicted and true values. In addition, we used 10-fold cross-validation to establish a classification model to determine whether a patient has severe pulmonary arterial pressure or not based on mPAP and sPAP.

Patients
A total of 55 patients with PH were retrospectively enrolled in this study between June 2017 and July 2021 from a single tertiary center, Beijing Anzhen Hospital. PH was diagnosed as a mean pulmonary artery pressure ≥25 mm Hg at rest as evaluated by right-heart catheterization (RHC) [19]. The PH types were defined according to the 5-classification standard [20]. Patients who received CTPA and right-heart catheterization exams within 1 week were included. Demographic and clinical data were retrieved from each patient's electronic medical record. Exclusion criteria included the following: (i) an interval between the CTPA and RHC exams longer than 1 week; (ii) inferior image quality of CTPA; and (iii) inadequate clinical data acquired from the electronic medical record system. The study was approved by the local institutional ethics review committee. Informed consent was waived because of the retrospective nature of the study.

Imaging Protocol
A 320-row CT system (Aquilion ONE, Toshiba, Otawara, Japan) was used in all patients for CTPA scanning. All the exams were performed with non-ECG-gated helical scan protocol. Patients were positioned supine and feet first into the gantry. Dual scanograms were used for determination of the anatomical coverage. The volume was placed to cover the entire lung fields from the pulmonary apex to the posterior costophrenic angle. Each volume CTPA data acquisition was acquired with a single breath-hold. The CT gantry rotation time was 330 ms. The tube voltage was 100-120 kV; effective tube current was 200-300 mA adjusted by personal body mass index (BMI). The collimation was 0.625 mm; pitch was 0.99. All the data were reconstructed using a standard soft-tissue and lung kernel (FC56). Images were reconstructed with slice thickness of 0.9 mm, interval of 0.45 mm.
A total of 40-50 mL contrast medium (Omnipaque 350, GE Healthcare, Shanghai, China) was intravenously injected by using a dual-head power injector with the injection rate of 3.5-4.5 mL/s adjusted according to BMI and the CT data acquisition time. A saline chaser bolus of 30 mL was injected with the same rate as the contrast medium. A region of interest was placed at the level of the main pulmonary artery for bolus tracking. The exposure was triggered with a 5 s delay after the 150 HU threshold was reached.
Figure 1. The overall workflow. After the selection and exclusion of patient population and CTPA images acquisition, eight substructures of heart and pulmonary arteries were automatically segmented as regions of interest (ROIs). Morphological features and their second-order features are extracted from these ROIs. Then, the essential features were selected through statistical methods and then reconstructed to reduce the redundancy of the features matrix. In the last step, meaningful features and corresponding results are set as an input through machine learning algorithms to obtain a regression model and a classification model.

Right-Heart Catheterization
Right-heart catheterization was performed with the Seldinger technique. Under X-ray fluoroscopic guidance, an 8F Swan-Ganz catheter (Baxter Healthcare, Irvine, CA, USA) was introduced through the right internal jugular vein. After 10 min of rest, the hemodynamic parameters including sPAP, dPAP, pulmonary capillary wedge pressure (PCWP), and cardiac output (CO) were obtained at end-expiration. The mPAP and pulmonary vascular resistance (PVR) were calculated.
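The text does not state which formulas were used for the calculated quantities. As a hedged illustration only, the sketch below uses the common textbook relations: mPAP approximated as dPAP plus one third of the pulse pressure, PVR = (mPAP − PCWP) / CO, and TPR = mPAP / CO, with 1 Wood unit = 80 dyn·s·cm⁻⁵. The function names and the patient values are hypothetical, and catheterization systems often report a time-averaged mPAP rather than this approximation.

```python
WOOD_TO_DYN = 80.0  # 1 Wood unit (mm Hg·min/L) = 80 dyn·s·cm^-5


def mean_pap(spap, dpap):
    """Approximate mPAP (mm Hg) as dPAP plus one third of the pulse pressure."""
    return dpap + (spap - dpap) / 3.0


def pvr_wood(mpap, pcwp, co):
    """Pulmonary vascular resistance in Wood units: (mPAP - PCWP) / CO."""
    if co <= 0:
        raise ValueError("cardiac output must be positive")
    return (mpap - pcwp) / co


def tpr_wood(mpap, co):
    """Total pulmonary resistance in Wood units: mPAP / CO."""
    if co <= 0:
        raise ValueError("cardiac output must be positive")
    return mpap / co


# Hypothetical patient values (not taken from the study).
mpap = mean_pap(spap=75.0, dpap=30.0)       # 30 + 45 / 3 = 45 mm Hg
pvr = pvr_wood(mpap, pcwp=10.0, co=5.0)     # (45 - 10) / 5 = 7 Wood units
tpr = tpr_wood(mpap, co=5.0)                # 45 / 5 = 9 Wood units
tpr_dyn = tpr * WOOD_TO_DYN                 # 720 dyn·s·cm^-5
print(mpap, pvr, tpr, tpr_dyn)
```

With these example values the patient would fall above the 40 mm Hg mPAP cut-off used later for risk stratification.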

Data Standardization and the Localization of Heart and Pulmonary Artery
CTPA images were all automatically cropped to 512 × 512 × 480 voxels, which included the full volume from the top of the main pulmonary artery to the bottom of the right ventricle.

Image Segmentation
Prior to training our model, the dataset was randomly split into training (80%; 44 of 55) and test (20%; 11 of 55) sets. After data preprocessing, the training set was further divided into training (80%; 35 of 44) and validation (20%; 9 of 44) sets. We adopted a deep learning framework referred to as a nnU-Net [22] to perform segmentation on the preprocessed data. Then, we selected a full resolution 3D U-Net architecture to perform segmentation. Using this architecture, eight substructures of the lungs and heart were segmented, with four substructures for each part. Therefore, two independent 3D U-Net networks were trained for segmentation. The convolutional neural network architecture is shown in Figure 2. The batch-size is 2 and the patch-size is 128 × 128 × 128. In addition, the loss function is a combination of the cross-entropy loss function and the Dice loss function.
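The nested split can be reproduced arithmetically. The following is a minimal sketch, assuming a plain random shuffle with fixed seeds and rounding to the nearest patient; the helper `split_ids`, the seeds, and the integer patient identifiers are illustrative and not taken from the study.

```python
import random


def split_ids(ids, train_fraction, seed):
    """Shuffle ids reproducibly and cut them into (train, held_out)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


patient_ids = list(range(55))                            # placeholder identifiers
train_val, test = split_ids(patient_ids, 0.8, seed=0)    # 44 / 11
train, val = split_ids(train_val, 0.8, seed=1)           # 35 / 9
print(len(train), len(val), len(test))
```

Splitting at the patient level before any preprocessing keeps the 11 test cases unseen by both the segmentation networks and the later models.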
However, some of the outputs were larger or smaller than the ground-truth due to the low contrast between the segmented structure and its adjacent structures, which is mainly caused by insufficient contrast agent. To optimize these segmentation results, we proposed a framework to create a better mask for poor results. First, we selected the clearly good results as positive masks and the others as negative masks. Second, we expanded and contracted every mask slightly to obtain a group of masks including the current, the expanded, and the contracted one. Then, the image intensities along the mask contours were extracted and flattened into a one-dimensional vector. After zero padding, the vectors were concatenated and then fed into a self-attention module followed by a fully connected neural network, which determined whether the input was a positive mask or not. The program would repeatedly resize the input mask until the output became a positive mask. Finally, all the segmentations were used to automatically and reliably measure the morphological features, which were then fed into the predictive models.
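The resize-and-check loop described above can be sketched on a toy 2D example. This is an illustration under stated assumptions, not the authors' implementation: masks are 2D lists rather than 3D volumes, expansion and contraction use a one-pixel 4-connected neighbourhood, the search order (grow, then shrink, at each step) is a guess, and `toy_is_positive` is a hypothetical stand-in for the self-attention classifier.

```python
N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected neighbourhood


def dilate(mask):
    """Expand a binary 2D mask by one pixel."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in N4:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out


def erode(mask):
    """Contract a binary 2D mask by one pixel (outside the image counts as background)."""
    h, w = len(mask), len(mask[0])
    return [[1 if mask[y][x] and all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy, dx in N4) else 0
             for x in range(w)] for y in range(h)]


def contour_intensities(image, mask):
    """Flattened image intensities at mask pixels that touch the background."""
    inner = erode(mask)
    return [image[y][x] for y in range(len(mask)) for x in range(len(mask[0]))
            if mask[y][x] and not inner[y][x]]


def refine(image, mask, is_positive, max_steps=3):
    """Resize the mask until the classifier accepts it, or give up after max_steps."""
    if is_positive(image, mask):
        return mask
    grown, shrunk = mask, mask
    for _ in range(max_steps):
        grown, shrunk = dilate(grown), erode(shrunk)
        for candidate in (grown, shrunk):
            if is_positive(image, candidate):
                return candidate
    return mask


# Toy data: the true structure is one pixel wider than the initial mask.
block = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)] for y in range(7)]
vessel = dilate(block)
image = [[100 if v else 0 for v in row] for row in vessel]


def toy_is_positive(image, mask):
    """Hypothetical classifier: accept only a mask covering exactly the bright pixels."""
    return all((image[y][x] > 50) == bool(mask[y][x])
               for y in range(len(mask)) for x in range(len(mask[0])))


refined = refine(image, block, toy_is_positive)
print(sum(map(sum, block)), "->", sum(map(sum, refined)))
```

In the framework described above, the vector returned by something like `contour_intensities` for each of the three masks would be zero-padded and concatenated before being passed to the learned classifier.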

The Extraction of Morphological and Second-Order Features
Since it is proven that pulmonary artery and ventricles are strongly correlated with pulmonary arterial hypertension, the morphological features of all the eight substructures and their second-order features may contain direct correlations to the medical metrics such as the mean pulmonary artery pressure. Therefore, we obtained as many morphological features and their second-order features from the segmentations as possible. Liu et al. [14] demonstrated that some specific morphological parameters of CTPA could be used to assess chronic thromboembolic pulmonary hypertension (CTEPH).

Feature Selection and Dimension Reduction
As shown in Figure 3, the following four methods were used to reduce data redundancy and select meaningful morphological features and second-order features. First, the Spearman test was used to pick out significant features. Second, the Pearson correlation coefficient r between any two features was calculated. When r > 0.9, the feature with the lower Pearson correlation coefficient with the dependent variable was removed. Third, we standardized the features to mean 0 and variance 1. This can be implemented with StandardScaler in the Python scikit-learn environment. Finally, principal component analysis was adopted to reduce the dimension of the features in an unsupervised way. In this way, we selected reasonable morphological features and second-order features.
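The first three steps can be sketched in plain Python as follows. This is a simplified illustration with several assumptions: the Spearman step uses an absolute-correlation threshold (`min_abs_rho`, a hypothetical parameter) in place of a significance test, "lower correlation with the dependent variable" is interpreted as lower absolute Pearson r, and the feature values are invented. The final PCA step is omitted; in practice it could be applied to the standardized columns, for example with scikit-learn.

```python
import math
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient r; returns 0.0 for a constant input."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = shared
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def zscore(column):
    """Standardize to mean 0 and variance 1 (population variance, as StandardScaler does)."""
    mean, sd = fmean(column), pstdev(column)
    return [(v - mean) / sd for v in column]


def select_features(features, target, min_abs_rho=0.3, max_pair_r=0.9):
    # Step 1: keep features whose rank correlation with the target is strong enough.
    kept = [name for name, col in features.items()
            if abs(spearman(col, target)) >= min_abs_rho]
    # Step 2: for each redundant pair (r > 0.9), drop the one less correlated with the target.
    dropped = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if pearson(features[a], features[b]) > max_pair_r:
                dropped.add(min((a, b), key=lambda n: abs(pearson(features[n], target))))
    # Step 3: standardize the surviving columns.
    return {name: zscore(features[name]) for name in kept if name not in dropped}


# Invented values for six hypothetical patients.
mpap = [30, 35, 42, 50, 58, 66]
features = {
    "LVd": [48, 46, 44, 41, 39, 35],
    "LVd_dup": [47.5, 46.5, 43, 41.5, 38, 36],   # nearly redundant with LVd
    "noise": [5, 1, 4, 2, 6, 3],                 # weakly related to mPAP
}
selected = select_features(features, mpap)
print(sorted(selected))
```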

Regression and Classification Model Construction
We used statistical analysis to compare the manual and automated measurement methods and to verify the correlation of the two measurements. At the same time, we also compared the performance of different models on the AI automatic measurement features and finally selected the XGBoost model [23] for regression and classification after comparing it with the SVM and CatBoost models. The features of the 55 patients were used as input for the XGBoost regression and classification models. The PAP parameters that the regression model aimed to predict included mPAP, sPAP, dPAP, and TPR, while the classification model only used mPAP and sPAP to stratify the risk of patients with PH. The max depth of the model was set to 3. We repeated ten-fold cross-validation 20 times, stratified on the patient level, to ensure the validity of the results [24]. Finally, we showed that our workflow could assess functional parameters through CTPA to assist the auxiliary diagnosis of PH.
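The repeated cross-validation scheme can be sketched without any modelling library. The code below is an illustration under assumptions: stratification is done on the binary risk label, each index stands for one patient, and the 30/25 class balance is invented because the text does not report it. The model fitting itself (for example an XGBoost model with a maximum depth of 3) would happen inside the loop over the yielded splits.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, rng):
    """Assign each sample a fold index so that every fold has a similar class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            fold_of[idx] = (offset + pos) % n_folds
        offset += len(members)
    return fold_of


def repeated_cv(labels, n_folds=10, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        fold_of = stratified_folds(labels, n_folds, rng)
        for fold in range(n_folds):
            test = [i for i, f in enumerate(fold_of) if f == fold]
            train = [i for i, f in enumerate(fold_of) if f != fold]
            yield repeat, fold, train, test


# Hypothetical labels: 1 = mPAP at or above the cut-off, 0 = below (class sizes invented).
labels = [1] * 30 + [0] * 25
splits = list(repeated_cv(labels))
print(len(splits))  # 20 repeats x 10 folds
```

Averaging a metric over all 200 splits reduces the dependence of the estimate on any single partition, which matters with only 55 patients.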

Statistical Analysis
All statistical analyses were performed using SPSS ver. 27.0 (SPSS Inc., Chicago, IL, USA). We used the Dice score to evaluate the performance of the proposed segmentation framework compared with the nnU-Net. After that, we compared the AI automatic measurements and manual measurements through paired t-tests and Bland-Altman analyses. Then, the correlation of morphological features with mPAP, sPAP, dPAP, and TPR was determined by the Spearman test. After that, the statistical differences of selected features were determined by the Pearson test. Using RHC results as the ground truth, intraclass correlation coefficients (ICC) between mPAP, sPAP, dPAP, and TPR measured by RHC and the output of our regression model were calculated. After that, we set the classification thresholds to 40 mm Hg and 55 mm Hg for mPAP and sPAP, respectively, to separate the patients into different risk classes. Then, we calculated the sensitivity and specificity of the output of our classification model against the patient risk level derived from RHC results. In addition, the receiver operating characteristic (ROC) curve was used to evaluate the classification performance.
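For reference, the Dice score of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened masks follows; the example masks are invented, and returning 1.0 when both masks are empty is a convention chosen here rather than something stated in the text.

```python
def dice_score(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Invented 8-voxel masks: 3 voxels overlap, each mask has 4 voxels.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
truth = [0, 1, 1, 0, 0, 1, 1, 0]
score = dice_score(pred, truth)
print(score)  # 2 * 3 / (4 + 4) = 0.75
```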

Study Population Characteristics
A total of 55 patients with PH were included in this study, with clinical characteristics shown in Table 1. Chronic pulmonary embolism (15/55, 27.27%) was the most common cause of pulmonary hypertension in this study. There was only 1 patient diagnosed with mPAP ≥ 70 mm Hg.

Computational Time
The average training time of the segmentation network on each patient is about 6 min. During the testing, the average inference time of the network model on each patient is about 10 min. The training and testing of the segmentation network were all done on NVIDIA A100 Tensor Core GPU. Moreover, the average time for feature extraction on each case is about 17 s. In the prediction phase, the computational time is about 8 s for the regression and classification for each case. The feature extraction and prediction of the regression and classification model were all tested on AMD Ryzen 7 5800H with Radeon Graphics CPU.

The Performance of the Segmentation Framework
Using an independent testing dataset, the average Dice score for segmentation was 87.3% ± 2.9 with the original nnU-Net. However, from Table 2, we can see that the Dice score of each part improved significantly with the proposed framework, and the average Dice score became 88.2% ± 2.9. Apart from the Dice score, the improvements can be seen in Figure 4. Compared with the result of the proposed framework in Figure 4c, the myocardium of the LV is mis-segmented as the LV chamber by nnU-Net in Figure 4b, which overestimates the area of the LV. In addition, the MPA segmentation mask, which does not cover the original pulmonary artery in Figure 4e, is slightly expanded to a plausible mask in Figure 4f.
Table 3 shows that the differences in RPAd, AAd, RVd, and LAd measured by the manual measurement and the AI automatic measurement were not statistically significant (t = 1.222, p = 0.227; t = −0.347, p = 0.730; t = 0.484, p = 0.630; t = −0.320, p = 0.750, respectively). However, the results also suggest that the differences in MPAd, LPAd, LVd, and RAd measured by the manual and the AI automatic measurements were statistically significant (t = −3.573, p = 0.001; t = 4.394, p < 0.001; t = 4.255, p < 0.001; t = −7.096, p < 0.001; respectively). Nevertheless, according to the correlation between these features and the PAP parameters, only LVd is valuable for regression and classification. Then, Bland-Altman analyses [25] for features assessed by manual and AI automatic measurements were carried out, and the corresponding biases (limits of agreement) of MPAd, RPAd, LPAd, AAd, LVd, RVd, LAd, and RAd are −2.25 mm (−11.4 mm, 6.90 mm), 0.575 mm (−6.28 mm, 7.43 mm), … Figure 5 shows Bland-Altman analyses between manual and AI measurements.
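The paired t statistic and the Bland-Altman bias with 95% limits of agreement can be computed from the per-patient differences. The sketch below assumes the difference is taken as manual minus AI (the text does not state the direction) and uses invented diameters in mm; the limits are bias ± 1.96 times the sample standard deviation of the differences.

```python
import math
from statistics import fmean, stdev


def paired_t(manual, auto):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [m - a for m, a in zip(manual, auto)]
    return fmean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def bland_altman(manual, auto):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    diffs = [m - a for m, a in zip(manual, auto)]
    bias = fmean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Invented diameters (mm) for five hypothetical patients.
manual = [30.0, 28.5, 33.0, 27.0, 31.5]
auto = [29.0, 29.5, 32.0, 27.5, 30.5]
t_stat = paired_t(manual, auto)
bias, lower, upper = bland_altman(manual, auto)
print(round(bias, 3), round(lower, 3), round(upper, 3), round(t_stat, 3))
```

A bias close to zero with narrow limits indicates that the automatic measurement could replace the manual one; a significant t statistic, as reported for MPAd, LPAd, LVd, and RAd, indicates a systematic offset.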

The Correlations between Pulmonary Artery Pressure and Selected Features
The correlations between the automatically measured morphological features (and their second-order features) and the pulmonary artery pressure obtained by RHC are shown in Table 4. The table lists the features that are highly correlated with the four hemodynamic parameters: mean pulmonary artery pressure, pulmonary artery systolic pressure, pulmonary artery diastolic pressure, and total pulmonary resistance. Correlations between pulmonary artery pressure and CTPA features show a positive correlation between mPAP and RAd/LAd (r = 0.333, p = 0.012), and a negative correlation between mPAP and LAd, LVd, LAa (r = −0.400, p = 0.002; r = −0.208, p = 0.123; r = −0.470, p < 0.001; respectively), but no correlation was found between mPAP and MPAd, MPAd/AAd, RPAd, LPAd, RVd, RAd.


The Performance of the Regression Model
In the regression task, we compared three commonly used regression models: XGBoost, CatBoost, and SVM. As shown in Figure 6, with the same testing dataset of patients in the mPAP regression task, XGBoost (MSE = 12.81) performed better than SVM (MSE = 60.24) and CatBoost (MSE = 28.37). Thus, we chose the XGBoost regressor to predict the sPAP, dPAP, and TPR values of patients in this study. Figure 6 shows that the MSE of the results on dPAP, sPAP, and TPR is 16.94, 21.88, and 69,862.59, respectively. To demonstrate the consistency between the predicted and true values with different models and pressure types, we calculated the ICC of these two groups. As shown in Table 5, assuming the interaction effect is absent, the ICC of the CatBoost regressor, SVM regressor, and XGBoost regressor to predict the mPAP value is 0.689, 0.138, and 0.934, respectively. Moreover, the ICC of the XGBoost regressor to predict sPAP, dPAP, and TPR is 0.981, 0.903, and 0.685, respectively. The above results show that the XGBoost regressor is able to predict mPAP, sPAP, and dPAP on a small dataset of adult patients with PH.
Note. Two-way mixed effects model where people effects are random and measures effects are fixed; a The estimator is the same, whether the interaction effect is present or not; b Type C intraclass correlation coefficient using a consistency definition. The between-measure variance is excluded from the denominator variance; c This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.
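A consistency-type ICC of the kind described in the note can be computed from the two-way ANOVA mean squares as (MS_rows − MS_error) / (MS_rows + (k − 1) MS_error), where rows are patients and the k = 2 measures are the RHC value and the model prediction. The sketch below assumes the single-measure form (the text does not say whether single or average measures were reported) and uses invented pressures.

```python
from statistics import fmean


def icc_consistency(ratings):
    """Single-measure consistency ICC for an n-subjects x k-measures table."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Invented mPAP values (mm Hg): RHC measurement vs. model prediction.
rhc = [30.0, 42.0, 55.0, 61.0, 38.0]
shifted = [r + 5.0 for r in rhc]               # constant offset only
noisy = [33.0, 40.0, 50.0, 65.0, 36.0]         # offset plus scatter

icc_shifted = icc_consistency(list(zip(rhc, shifted)))
icc_noisy = icc_consistency(list(zip(rhc, noisy)))
print(round(icc_shifted, 3), round(icc_noisy, 3))
```

Because the between-measure variance is excluded from the denominator, a prediction that differs from RHC by a constant offset still scores 1.0; this is the sense in which the consistency definition differs from absolute agreement.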

The Performance of the Classification Model
In the classification task, we also compared three commonly used classification models, namely XGBoost, CatBoost, and SVM. Setting the cut-off value as 40 mm Hg, we tested the above three different machine learning models to classify patients with PH based on mPAP. As shown in Figure 7, using the same testing dataset of patients in the mPAP classification task, XGBoost (AUC = 0.911, p < 0.001) performed better than SVM (AUC = 0.679, p = 0.2846) and CatBoost (AUC = 0.893, p < 0.001). Similarly, we used these three models to classify patients with PH by sPAP with a cut-off value of 55 mm Hg. The results are also shown in Figure 6, yielding AUC of 0.556 (p = 0.8257), 0.639 (p = 0.6262) and 0.833 (p = 0.0057) for CatBoost, SVM and XGBoost, respectively.
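The evaluation of such a classifier can be sketched as follows. The AUC is computed here as the probability that a randomly chosen high-risk patient receives a higher score than a randomly chosen low-risk patient (ties count one half), which is equivalent to the area under the ROC curve. The pressures and scores are invented, treating mPAP ≥ 40 mm Hg (rather than > 40) as the positive class is an assumption, and the 0.5 score threshold is illustrative.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(predicted, labels):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    tn = sum(1 for p, l in zip(predicted, labels) if not p and not l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    return tp / (tp + fn), tn / (tn + fp)


# Invented RHC mPAP values (mm Hg) and classifier scores for eight hypothetical patients.
mpap_rhc = [32, 38, 41, 47, 55, 36, 62, 44]
scores = [0.20, 0.45, 0.40, 0.70, 0.90, 0.30, 0.80, 0.60]

labels = [m >= 40 for m in mpap_rhc]           # ground-truth risk class from RHC
predicted = [s >= 0.5 for s in scores]         # illustrative decision threshold

area = auc(scores, labels)                     # 14 of 15 pairs ranked correctly
sens, spec = sensitivity_specificity(predicted, labels)
print(round(area, 3), sens, spec)
```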

Discussion
In this study, we developed a fully automated CTPA image-based framework for the auxiliary diagnosis of PH. First, this framework can segment eight substructures of the pulmonary artery and heart (LV, RV, LA, RA, LPA, RPA, MPA, and AA). Using an independent testing dataset, the average Dice score for segmentation with our proposed framework reached 88.2%. Second, we completed the feature extraction based on the segmentation outcome. Some of the AI automatic extractions (AAd, RVd, LAd, and RPAd) achieved good consistency with the manual measurements. However, the differences in MPAd, LPAd, LVd, and RAd between the AI and physician measurements were statistically significant (p ≤ 0.001). Then, we selected morphological features or their second-order features that were highly correlated with mPAP, sPAP, dPAP, and TPR, and reduced the feature dimension using principal component analysis. Finally, the regression model for predicting mPAP, sPAP, dPAP, and TPR and the classification model for separating patients with PH into different risk levels by mPAP or sPAP were built. Good consistency existed between the regression model predictions of mPAP, dPAP, and sPAP and the ground-truth from RHC (ICC = 0.934, p = 0.002; ICC = 0.903, p = 0.006; ICC = 0.981, p < 0.001, respectively). The AUC of the ROC curve of the classification model reached 0.911 (p < 0.001) for mPAP data and 0.833 (p = 0.0057) for sPAP data.
The segmentation of the eight substructures was based on our proposed framework. The part of this framework that follows pre-segmentation is a complement that relies on the performance of the nnU-Net based pre-segmentation. The proposed method utilizes the image intensity information near the edge of the mask to obtain a feature map through an attention mechanism. The feature map, which aims to represent the drastic intensity changes between the segmented structure and the adjacent area, is a useful input for the classification neural network. However, the pre-segmentation outputs for some parts may be unpredictable, such as a structure suddenly disappearing in a certain plane or appearing as two separate pieces. These problems cannot be solved by the current framework, so it is necessary to propose a better segmentation method that incorporates the latest advances in machine learning.
In this study, the morphological features contained in CTPA images were extracted based on 3D image segmentations. Previous studies have shown that some morphological parameters manually measured by physicians are highly correlated with pulmonary vascular resistance in patients with chronic thromboembolic pulmonary hypertension [7,14,15]. However, Liu's study did not achieve automatic extraction of morphological parameters and thus lacked repeatability. Our study is more efficient and ensures strong repeatability through automatic segmentation and morphological parameter extraction. Nevertheless, the present nnU-Net based segmentation network takes about 10 min for inference. Therefore, we may try to improve the network's segmentation performance and prediction speed by improving the network structure or using methods such as model pruning.
The analysis of morphological features in this study represents mainly an extension of the existing research. Liu [14] and Jia [17] et al. proposed that not only the morphological features themselves but also the ratio between them were correlated to the mPAP levels or right ventricular dysfunction. Melzig et al. [16] used the volume of main, right, and left pulmonary arteries and combined echocardiographic sPAP to achieve a higher accuracy for the prediction of mPAP. Our study included one-dimensional and two-dimensional morphological features of heart and pulmonary arteries and the ratio between selected features, such as RAd/LAd, LVd, and LAa. In recent years, radiomics has gradually emerged, which can extract many invisible shapes, textures, and image intensity features. Cetin et al. [26] demonstrated the feasibility and the clinical value of the cardiac MRI radiomics in analyzing the cardiovascular risk factors including diabetes, hypertension, high cholesterol, and smoking. Moreover, Lu et al. [27] proposed a combined model, including morphological features and radiomics features from CT scan, to distinguish minimally invasive adenocarcinomas and invasive adenocarcinomas, with high potential to provide for the auxiliary diagnosis. Therefore, if hundreds of thousands of features could be extracted from images by radiomics, and features extraction and dimension reduction can be performed on them, then the performance of the regression and classification model in our study may be better.
The regression model proposed in this study is designed to assess pulmonary arterial pressure for the diagnosis of PH based on CTPA images. The ICCs between the predicted and actual values of mPAP, sPAP, and dPAP were 0.934, 0.903, and 0.981, respectively. These results demonstrate the potential feasibility of inferring patients' functional parameters from morphological features obtained from CTPA images without RHC. However, the sample size in this study is small, and all participants are adult patients with PH. Samples from other groups, such as people without PH and children with PH, are lacking. Therefore, future research with larger and more diverse samples is needed to verify the generalizability of the results of this model.
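As an illustration of how agreement between predicted and RHC-measured pressures can be quantified, the sketch below computes an intraclass correlation coefficient for two sets of values. The ICC form used in this study is not stated here, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function name and the example values are hypothetical.

```python
def icc_2_1(measured, predicted):
    """ICC(2,1) for two 'raters' (measured vs. predicted) over n subjects.

    Assumed form: two-way random effects, absolute agreement, single measurement.
    """
    n, k = len(measured), 2
    pairs = list(zip(measured, predicted))
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(measured) / n, sum(predicted) / n]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for pair in pairs for v in pair)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Made-up illustrative values in mm Hg, not data from this study.
rhc_mpap = [30.0, 40.0, 50.0, 60.0]
biased_prediction = [p + 5.0 for p in rhc_mpap]  # constant 5 mm Hg offset

icc_identical = icc_2_1(rhc_mpap, rhc_mpap)
icc_biased = icc_2_1(rhc_mpap, biased_prediction)
```

Because this form measures absolute agreement, a constant offset between prediction and measurement lowers the ICC even when the two are perfectly correlated, which is the relevant behavior when the predicted value is meant to replace the measured one.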
The classification model in this study uses 40 mm Hg and 55 mm Hg as the cut-off values of mPAP and sPAP, respectively, to divide PH patients into two categories. The classification model is a further extension and application of the regression model proposed in this study, and has some potential to achieve risk stratification for patients with PH. However, since the samples in this study did not include patients without PH, the ability of the model to discriminate between PH and non-PH patients remains to be determined. Therefore, patients without PH can be included in the future to construct a classification model capable of diagnosing PH. On the other hand, the mPAP values of patients with PH in this study are concentrated and patients with lower sPAP values are too few, resulting in imbalanced samples. In the future, we may collect a large number of samples from patients with a balanced distribution of mPAP and sPAP values. We may then be able to perform multi-class classification of patients with PH.
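The cut-off-based labeling and the AUC evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier used in this study: whether a value exactly at the cut-off counts as the higher class is not stated, so `>=` is assumed; the classifier scores are hypothetical; and the AUC is computed directly from its pairwise definition (the fraction of positive-negative pairs that are ranked correctly, with ties counted as one half).

```python
MPAP_CUTOFF = 40.0  # mm Hg, cut-off for mPAP
SPAP_CUTOFF = 55.0  # mm Hg, cut-off for sPAP


def binarize(pressures, cutoff):
    """Label 1 if pressure >= cutoff, else 0 (the >= convention is an assumption)."""
    return [1 if p >= cutoff else 0 for p in pressures]


def auc(labels, scores):
    """AUC as the probability that a positive case scores above a negative case."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample in each class")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up illustrative values in mm Hg, not data from this study.
rhc_mpap = [30.0, 35.0, 45.0, 60.0]
model_scores = [32.0, 41.0, 39.0, 58.0]  # hypothetical classifier outputs

labels = binarize(rhc_mpap, MPAP_CUTOFF)
mpap_auc = auc(labels, model_scores)
```

The guard against an empty class matters for the imbalance noted above: if a test split contains no patients below a cut-off, the AUC is undefined and the split needs to be redrawn or stratified.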
At present, the classification model proposed in this study focuses on auxiliary diagnosis. To achieve a clear pathological classification of patients with PH (pulmonary arterial hypertension, pulmonary hypertension caused by left heart disease, pulmonary hypertension caused by pulmonary disease and hypoxia, chronic thromboembolic pulmonary hypertension, and pulmonary hypertension with unknown multi-factor mechanisms), it is necessary to further improve the diversity of the collected samples and to integrate more information from multimodal medical images into the analysis. As discussed previously, our study was limited to analyzing one-dimensional and two-dimensional morphological features of the heart and pulmonary arteries and the ratios between the selected features. In future studies, we will explore the volumetric information of heart and pulmonary artery morphology as well as the spatial relationships between different intra- and extra-cardiac structures to improve the accuracy of PAP parameter evaluation.

Conclusions
In this study, we developed a fully automatic framework for the supplementary diagnosis of PH based on CTPA images. Our proposed segmentation framework achieved automatic segmentation of eight substructures (LV, RV, LA, RA, LPA, RPA, MPA, and AA) of the pulmonary artery and the heart with high accuracy. Based on this segmentation result, morphological features were extracted automatically by machine learning, which was highly repeatable. Our results showed that it is feasible to assess PAP parameters in patients with PH from CTPA images rather than from invasive RHC examinations. Furthermore, our proposed framework can also perform a preliminary classification of PH patients, which contributes to the diagnosis and management of PH patients.

Institutional Review Board Statement:
This retrospective study was approved by our institutional review board in accordance with local ethics procedures (2021164X).

Informed Consent Statement:
Further consent was waived with approval.

Data Availability Statement:
Data are unavailable due to privacy.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
AUC: area under the receiver operating characteristic curve
CTPA: computed tomography pulmonary angiography
dPAP: pulmonary artery diastolic pressure
ICC: intraclass correlation coefficient
mPAP: mean pulmonary artery pressure
MSE: mean square error
PH: pulmonary hypertension
RHC: right heart catheterization
sPAP: pulmonary artery systolic pressure
TPR: total pulmonary resistance